6. Partitioning
Partitioning of key-value data
- key-range partitioning
- each partition owns a contiguous range of keys, and keys are kept sorted within a partition
- good for range scans
- rebalancing by splitting a range into two subranges
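A minimal sketch of key-range lookup and splitting, assuming string keys and a sorted list of partition lower bounds. The class name `KeyRangePartitions` and its methods are hypothetical, made up for illustration only.

```python
import bisect


class KeyRangePartitions:
    """Partition i owns the key range [bounds[i], bounds[i + 1])."""

    def __init__(self, lower_bounds):
        # Assumption: the first bound is "" so that every string key has an owner.
        self.bounds = sorted(lower_bounds)

    def partition_for(self, key):
        # Rightmost partition whose lower bound is <= key.
        return bisect.bisect_right(self.bounds, key) - 1

    def partitions_for_range(self, start, end):
        # A range scan only touches the partitions overlapping [start, end].
        return list(range(self.partition_for(start), self.partition_for(end) + 1))

    def split(self, split_key):
        # Rebalance by cutting the range that contains split_key into two subranges.
        if split_key not in self.bounds:
            bisect.insort(self.bounds, split_key)


parts = KeyRangePartitions(["", "h", "p"])
print(parts.partition_for("kafka"))          # 1
print(parts.partitions_for_range("a", "j"))  # [0, 1]
parts.split("d")
print(parts.bounds)                          # ['', 'd', 'h', 'p']
```

Because the bounds are sorted, a range scan maps to a run of adjacent partitions, which is why this scheme suits range queries.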
- hash partitioning
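- apply a hash function to the key and give each partition a range of hash values; keys spread evenly, but efficient range scans are lost
- for a known hot key, a random number can be added at the beginning or end of the key to spread its writes over several partitions (reads then have to combine all variants)
- common to create a fixed number of partitions in advance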
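A rough sketch of hash partitioning with a fixed partition count, plus the random-suffix trick for a hot key. The partition count, the `#` separator, the `fanout` parameter and all function names are assumptions for illustration; MD5 is used here only as a stable hash, not for security.

```python
import hashlib
import random

NUM_PARTITIONS = 8  # assumed: fixed in advance and never changed


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # A stable hash is needed: Python's built-in hash() of a str differs between processes.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def salted_key(key, fanout=10):
    # Writes to a known hot key are spread over up to `fanout` derived keys.
    return f"{key}#{random.randrange(fanout)}"


def all_salted_keys(key, fanout=10):
    # Reads of a salted key must fetch every variant and merge the results.
    return [f"{key}#{i}" for i in range(fanout)]


print(partition_for("user:42") == partition_for("user:42"))  # True
print(sorted({partition_for(k) for k in all_salted_keys("celebrity")}))
```

The salt only pays off for the few keys known to be hot, since every read of a salted key fans out to all of its variants.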
Partitioning and Secondary Indexes
Document-partitioned index (local index)
- each partition keeps its own secondary index, covering only the documents stored in that partition
- one write requires a write to only one partition
- one read may require reads from multiple partitions (scatter/gather)
Term-partitioned index (global index)
- the secondary index is partitioned too, based on the indexed term
- one write may require writes to multiple partitions
- one read requires a read from only one partition
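The difference between the two index types can be sketched in memory as below. Everything here is a simplifying assumption: two partitions, integer document IDs partitioned by modulo, a single indexed field (`color`), and a made-up rule for placing terms. Updates and deletes of existing documents are ignored.

```python
from collections import defaultdict

NUM_PARTITIONS = 2


def primary_partition(doc_id):
    return doc_id % NUM_PARTITIONS


def term_partition(term):
    # Hypothetical rule: terms starting with a-m live on partition 0, the rest on 1.
    return 0 if term[0].lower() <= "m" else 1


class Cluster:
    def __init__(self):
        self.docs = [dict() for _ in range(NUM_PARTITIONS)]
        self.local_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]
        self.global_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, color):
        p = primary_partition(doc_id)
        self.docs[p][doc_id] = color
        # Local index: updated on the same partition as the document.
        self.local_index[p][color].add(doc_id)
        # Global index: the term may live on a different partition.
        self.global_index[term_partition(color)][color].add(doc_id)

    def read_local(self, color):
        # Scatter/gather: every partition's local index must be consulted.
        result = set()
        for index in self.local_index:
            result |= index.get(color, set())
        return result

    def read_global(self, color):
        # Only the partition that owns the term is consulted.
        return set(self.global_index[term_partition(color)].get(color, set()))


cluster = Cluster()
cluster.write(1, "red")
cluster.write(2, "red")
cluster.write(3, "blue")
print(cluster.read_local("red"))   # {1, 2}
print(cluster.read_global("red"))  # {1, 2}
```

Both reads return the same answer; what differs is how many partitions the write and the read have to touch.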
Rebalancing Partitions
Strategies for rebalancing
fixed number of partitions
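- create many more partitions than there are nodes and assign several to each node; when a node is added, it takes over a few partitions from the existing nodes
- The number of partitions and the key-to-partition mapping do not change. Only the assignment of partitions to nodes changes.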
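As an illustration of the fixed-partition scheme, here is a small sketch in which a new node takes whole partitions from the most loaded nodes until it holds a fair share. The function `add_node` and the dict-based `assignment` are assumptions, not any particular database's mechanism.

```python
from collections import Counter


def add_node(assignment, new_node):
    """assignment maps partition id -> node name; returns the partitions that moved."""
    node_count = len(set(assignment.values()) | {new_node})
    fair_share = len(assignment) // node_count
    moved = []
    for _ in range(fair_share):
        load = Counter(n for n in assignment.values() if n != new_node)
        donor = max(sorted(load), key=load.get)  # most loaded node, ties broken by name
        partition = min(p for p, n in assignment.items() if n == donor)
        assignment[partition] = new_node         # the whole partition moves, no keys are rehashed
        moved.append(partition)
    return moved


assignment = {p: f"node{p % 3}" for p in range(12)}  # 12 partitions on 3 nodes
moved = add_node(assignment, "node3")
print(moved)                         # [0, 1, 2]
print(Counter(assignment.values()))  # every node now holds 3 partitions
```

Only three of the twelve partitions move, and no key changes partition.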
dynamic partitioning
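- a partition that grows beyond a configured size is split in two, and one that shrinks enough is merged with a neighbor, similar to what happens at the top level of a B-tree
- Suitable for key-range partitioned data, but hash-partitioned data can also use it.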
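A toy sketch of size-triggered splitting, assuming partitions are sorted lists of keys and the limit is a key count (real systems would use a configured byte size). `MAX_KEYS` and `insert` are hypothetical names, and merging of shrunken partitions is left out.

```python
import bisect

MAX_KEYS = 4  # assumed threshold, standing in for a configured partition size


def insert(partitions, key):
    """partitions: list of sorted key lists, ordered by key range."""
    # Pick the last partition whose first key is <= key (default: the first one).
    idx = 0
    for i, part in enumerate(partitions):
        if part and part[0] <= key:
            idx = i
    bisect.insort(partitions[idx], key)
    if len(partitions[idx]) > MAX_KEYS:
        # Split the oversized partition at its median key into two subranges.
        full = partitions[idx]
        mid = len(full) // 2
        partitions[idx:idx + 1] = [full[:mid], full[mid:]]


partitions = [[]]  # start with a single empty partition
for key in "abcde":
    insert(partitions, key)
print(partitions)  # [['a', 'b'], ['c', 'd', 'e']]
```

The number of partitions adapts to the data volume, at the cost of starting with a single partition unless some are pre-split.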
Partitioning proportionally to nodes
- a fixed number of partitions per node. When a new node joins the cluster, it randomly chooses a fixed number of existing partitions to split and takes over half of each.
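A sketch of a node joining under this scheme, assuming hash-range partitions stored as `(start, end, owner)` tuples with an exclusive end. The `join` function, the `per_node` parameter and the tuple layout are made up for illustration.

```python
import random


def join(partitions, new_node, per_node=2, rng=random):
    """Split `per_node` randomly chosen partitions; the new node takes the upper half of each."""
    chosen = rng.sample(range(len(partitions)), per_node)
    for i in chosen:
        start, end, owner = partitions[i]
        mid = (start + end) // 2
        partitions[i] = (start, mid, owner)      # old owner keeps the lower half
        partitions.append((mid, end, new_node))  # new node takes the upper half


partitions = [(0, 256, "A"), (256, 512, "B"), (512, 768, "A"), (768, 1024, "B")]
join(partitions, "C", per_node=2)
for p in sorted(partitions):
    print(p)
```

Because the picks are random, the split can be uneven for a small cluster; it evens out when many partitions are involved.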
Automatic or manual?
Because rebalancing is expensive and risky, it is good to have a human in the loop to confirm the plan, even when the system proposes it automatically.
Request Routing
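Three approaches
- the client contacts any node; if that node does not own the partition, it forwards the request to the node that does and relays the reply
- the client contacts a routing tier, which forwards each request to the right node (the routing tier can learn the partition map from a coordination service such as ZooKeeper)
- the client itself is aware of the partitioning and connects directly to the right node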
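The first approach (contact any node, forward if needed) can be sketched as follows. This is an in-process toy under stated assumptions: the `Node` class, the shared `partition_map` dict and the `cluster` dict standing in for the network are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 4


def partition_for(key):
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class Node:
    def __init__(self, name, partition_map, cluster):
        self.name = name
        self.partition_map = partition_map  # partition id -> owning node name
        self.cluster = cluster              # node name -> Node, stands in for the network
        self.store = {}

    def put(self, key, value):
        owner = self.partition_map[partition_for(key)]
        if owner != self.name:
            return self.cluster[owner].put(key, value)  # forward to the owner
        self.store[key] = value
        return self.name  # name of the node that actually stored the key

    def get(self, key):
        owner = self.partition_map[partition_for(key)]
        if owner != self.name:
            return self.cluster[owner].get(key)         # forward and relay the reply
        return self.store.get(key)


partition_map = {0: "n1", 1: "n2", 2: "n1", 3: "n2"}
cluster = {}
for name in ("n1", "n2"):
    cluster[name] = Node(name, partition_map, cluster)

stored_on = cluster["n1"].put("user:42", "alice")  # the client may contact any node
print(stored_on == partition_map[partition_for("user:42")])  # True
print(cluster["n2"].get("user:42"))                          # alice
```

The sketch assumes every node sees the same, current `partition_map`; keeping that view consistent across nodes is the hard part.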
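Common issue
- how does the component making the routing decision learn about changes in the assignment of partitions to nodes?
- a hard problem, because all participants must agree on the current assignment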
With ZooKeeper:
- each node registers itself in ZooKeeper.
- ZooKeeper maintains the authoritative mapping of partitions to nodes. Whenever t