DDIA | Partitioning | Partitioning of Key-Value Data

In the previous few posts, we learned about replication; starting with this post, we will look into partitioning.

Replication means keeping multiple copies of the same data on multiple nodes. For very large datasets, or very high query throughput, that is not sufficient: we need to break the data up into partitions, also known as sharding.

The main reason for wanting to partition data is scalability. Different partitions can be placed on different nodes in a shared-nothing cluster.  Thus, a large dataset can be distributed across many disks, and the query load can be distributed across many processors.

Partitioning of Key-Value Data

Skewed partitioning – If the partitioning is unfair, so that some partitions have more data or queries than others, we call the partitioning skewed.

Hot spot – In the extreme case, all the load could end up on one partition, so 9 out of 10 nodes sit idle and your bottleneck is the single busy node. A partition with disproportionately high load is called a hot spot.

  1. Partitioning by Key Range – One way of partitioning is to assign a continuous range of keys (from some minimum to some maximum) to each partition. If you know the boundaries between the ranges, you can easily determine which partition contains a given key. If you also know which partition is assigned to which node, then you can make your request directly to the appropriate node.

    The downside of key-range partitioning is that certain access patterns can lead to hot spots. If the key is a timestamp, the partitions correspond to ranges of time, and all the writes end up going to the partition for the current time range while the other partitions sit idle.

  2. Partitioning by Hash of Key – Use a hash function to determine the partition for a given key. Once you have a suitable hash function for keys, you can assign each partition a range of hashes (rather than a range of keys), and every key whose hash falls within a partition’s range will be stored in that partition.

    Unfortunately, by using the hash of the key for partitioning we lose a nice property of key-range partitioning: the ability to do efficient range queries. Keys that were once adjacent are now scattered across all the partitions, so their sort order is lost. This can be mitigated with a compound primary key consisting of several columns: only the first part of the key is hashed to determine the partition, while the other columns are used as a concatenated index for sorting the data within that partition (Cassandra takes this approach). A small sketch of both partitioning schemes follows after this list.
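To make the two schemes concrete, here is a minimal Python sketch. The boundary values, the partition count, and the use of MD5 as the hash function are all assumptions made for illustration; real systems choose partition boundaries dynamically and rebalance them as data grows.

    # A minimal sketch of key-range vs. hash partitioning (boundaries, partition
    # count, and MD5 are illustrative assumptions, not any particular database).
    import bisect
    import hashlib

    # Key-range partitioning: each partition owns a contiguous range of keys.
    # The boundaries are the upper bounds of partitions 0..2; partition 3 takes
    # everything from "t" onwards. Keys compare lexicographically here.
    RANGE_BOUNDARIES = ["g", "n", "t"]

    def range_partition(key: str) -> int:
        # Binary-search the sorted boundaries to find the partition for this key.
        return bisect.bisect_right(RANGE_BOUNDARIES, key)

    # Hash partitioning: split the hash space into equal ranges of hash values
    # and store each key in the partition whose hash range contains its hash.
    NUM_PARTITIONS = 4
    HASH_SPACE = 2 ** 32

    def hash_partition(key: str) -> int:
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % HASH_SPACE
        return h * NUM_PARTITIONS // HASH_SPACE

    # Adjacent keys stay together under range partitioning (cheap range scans)
    # but scatter under hash partitioning (even load, no efficient range scans).
    for key in ("apple", "apricot", "banana", "zebra"):
        print(key, range_partition(key), hash_partition(key))

Under key-range partitioning the first three keys land in the same partition, so a scan from "apple" to "banana" touches only that partition; under hash partitioning they are likely spread across several partitions, which is exactly the trade-off described above.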


Skewed Workloads and Relieving Hot Spots

Consider this example – on a social media site, a celebrity user with millions of followers may cause a storm of activity when they do something. This event can result in a large volume of writes to the same key (where the key is perhaps the user ID of the celebrity, or the ID of the action that people are commenting on). Hashing the key doesn’t help, as the hash of two identical IDs is still the same.

Today, most data systems are not able to automatically compensate for such a highly skewed workload, so it’s the responsibility of the application to reduce the skew.

For example, if one key is known to be very hot, a simple technique is to add a random number to the beginning or end of the key; just a two-digit decimal number would split the writes to that key evenly across 100 different keys. However, having split the writes across different keys, any reads now have to do additional work, as they have to read the data from all of those keys and combine it.

This technique also requires additional bookkeeping: it only makes sense to append the random number for the small number of hot keys; for the vast majority of keys with low write throughput this would be unnecessary overhead. Thus, you also need some way of keeping track of which keys are being split.
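As a concrete illustration of this technique, here is a small Python sketch. The in-memory dictionary standing in for the key-value store, the chosen hot key, and the split count of ten are all assumptions for the example; as noted above, the bookkeeping of which keys are split lives in the application.

    # A small sketch of splitting a hot key (an in-memory dict stands in for the
    # partitioned store; the hot keys and split count are application bookkeeping).
    import random
    from collections import defaultdict

    store = defaultdict(list)        # stand-in for the key-value store
    HOT_KEYS = {"celebrity:42"}      # keys the application knows to be hot
    NUM_SPLITS = 10                  # spread each hot key over 10 sub-keys

    def write(key: str, value) -> None:
        if key in HOT_KEYS:
            # Append a random suffix so the writes hash to different partitions.
            key = f"{key}#{random.randrange(NUM_SPLITS)}"
        store[key].append(value)

    def read(key: str) -> list:
        if key not in HOT_KEYS:
            return store[key]
        # Reads of a hot key must fan out over every sub-key and combine results.
        combined = []
        for i in range(NUM_SPLITS):
            combined.extend(store[f"{key}#{i}"])
        return combined

    write("celebrity:42", "new post")   # goes to one of celebrity:42#0 .. #9
    write("user:7", "comment")          # ordinary key, stored as-is
    print(read("celebrity:42"))         # ["new post"], gathered from all sub-keys

The extra read fan-out is the price of smoothing out the writes, which is why it is only worth paying for the handful of keys that are actually hot.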

Thanks for stopping by! Hope this gives you a brief overview into partitioning. Eager to hear your thoughts and chat, please leave comments below and we can discuss.

