6. Partitioning

2021. 1. 8. 07:56

Partitioning of key-value data

Key-range partitioning
- each partition owns a contiguous range of keys
- good for range queries
- rebalancing: a partition that grows too large is split into two subranges
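A minimal sketch of key-range lookup and splitting, assuming string keys and a sorted list of boundary keys. The class name `RangePartitioner` and its methods are made up for illustration and do not come from any particular database.

```python
import bisect


class RangePartitioner:
    """Maps keys to partitions by contiguous key ranges (illustrative only)."""

    def __init__(self, boundaries):
        # boundaries[i] is the smallest key that belongs to partition i + 1.
        self.boundaries = sorted(boundaries)

    def partition_for(self, key):
        # Binary search: count how many boundaries are <= key.
        return bisect.bisect_right(self.boundaries, key)

    def partitions_for_range(self, lo, hi):
        # A range query only touches the partitions between lo and hi.
        return list(range(self.partition_for(lo), self.partition_for(hi) + 1))

    def split(self, split_key):
        # Splitting a partition into two subranges adds one boundary.
        bisect.insort(self.boundaries, split_key)


p = RangePartitioner(["g", "n", "t"])   # 4 partitions: [..g), [g..n), [n..t), [t..]
print(p.partition_for("melon"))         # partition 1
print(p.partitions_for_range("h", "p")) # partitions 1 and 2
p.split("j")                            # partition 1 becomes [g..j) and [j..n)
print(p.partition_for("melon"))         # now partition 2
```

Because neighbouring keys stay in the same or adjacent partitions, a range scan reads only a few partitions instead of all of them.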
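Hash partitioning
- apply a hash function to the key and partition by the hash. Keys spread evenly, but range queries become inefficient.
- for a very hot key, a random prefix or suffix can be added to the key to spread its writes across partitions.
- common to create a fixed number of partitions in advance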
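As a rough sketch of the two ideas above, assuming MD5 as the hash function and 16 partitions (both arbitrary choices for illustration): a key is mapped to one of a fixed number of partitions, and a hot key is salted with a random suffix so that its writes are spread out. The function names are hypothetical.

```python
import hashlib
import random

NUM_PARTITIONS = 16  # assumed value, fixed in advance


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Use a stable hash; Python's built-in hash() of str differs between runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def salted_key(hot_key, fanout=10):
    # Writes to a hot key go to one of `fanout` salted variants.
    return f"{hot_key}#{random.randrange(fanout)}"


def all_salted_keys(hot_key, fanout=10):
    # Reads must query every variant and combine the results.
    return [f"{hot_key}#{i}" for i in range(fanout)]


print(partition_for("user:42"))
print(salted_key("celebrity:1"))
print(sorted({partition_for(k) for k in all_salted_keys("celebrity:1")}))
```

The trade-off is that reads of a salted key become more expensive, so salting is usually worth it only for the few keys that are known to be hot.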

Document-partitioned index (local index)
- the secondary index is stored in the same partition as the primary key and value
- one write requires a write to only one partition
- one read may require reads from all partitions (scatter/gather)
Term-partitioned index (global index)
- the secondary index is partitioned too, based on the term
- one write may require writes to multiple partitions
- one read requires a read from only one partition
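The read/write trade-off between the two index types can be illustrated with a toy in-memory model. Everything here is an assumption made for the sketch: three partitions, documents placed by `doc_id % 3`, terms placed by a byte-sum hash, and the class names `LocalIndex` and `GlobalIndex`. Each method also returns the set of partitions it touched.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed for the example


def doc_partition(doc_id):
    return doc_id % NUM_PARTITIONS


def term_partition(term):
    # Toy deterministic hash of the term.
    return sum(term.encode("utf-8")) % NUM_PARTITIONS


class LocalIndex:
    """Document-partitioned: each partition indexes only its own documents."""

    def __init__(self):
        self.indexes = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, terms):
        p = doc_partition(doc_id)
        for term in terms:
            self.indexes[p][term].add(doc_id)
        return {p}  # a write touches exactly one partition

    def read(self, term):
        found = set()
        for index in self.indexes:  # scatter/gather over all partitions
            found |= index.get(term, set())
        return found, set(range(NUM_PARTITIONS))


class GlobalIndex:
    """Term-partitioned: each term lives on the partition chosen by the term."""

    def __init__(self):
        self.indexes = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, terms):
        touched = set()
        for term in terms:
            p = term_partition(term)
            self.indexes[p][term].add(doc_id)
            touched.add(p)
        return touched  # a write may touch several partitions

    def read(self, term):
        p = term_partition(term)
        return set(self.indexes[p].get(term, set())), {p}


local, glob = LocalIndex(), GlobalIndex()
for index in (local, glob):
    index.write(1, ["red", "suv"])
    index.write(5, ["red"])
print(local.read("red"))  # same documents, but every partition is queried
print(glob.read("red"))   # same documents, only one partition is queried
```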
  
Rebalancing Partitions

Strategies for rebalancing
- fixed number of partitions
  - create many more partitions than there are nodes; when a node is added, it takes over a few partitions from the existing nodes.
  - the number of partitions does not change, only the number of partitions per node.
- dynamic partitioning
  - a partition is split when it grows too large, similar to the top level of a B-tree
  - suitable for key-range-partitioned data, but hash-partitioned data can also use it.
- partitioning proportionally to nodes
  - each node has a fixed number of partitions. When a new node joins the cluster, it randomly chooses existing partitions to split and takes over half of each.
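The first strategy, a fixed number of partitions, can be sketched as follows. This is only one possible way to pick which partitions move, assuming the assignment is a plain dict from partition id to node name; the function `add_node` is hypothetical.

```python
from collections import Counter
from typing import Dict


def add_node(assignment: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Return a new assignment in which new_node has taken its fair share.

    Only whole partitions move; the number of partitions stays the same.
    """
    result = dict(assignment)
    node_count = len(set(result.values()) | {new_node})
    fair_share = len(result) // node_count
    counts = Counter(result.values())
    for _ in range(fair_share):
        donor = max(counts, key=counts.get)  # currently most loaded node
        # Take the lowest-numbered partition still owned by the donor.
        pid = next(p for p, n in sorted(result.items()) if n == donor)
        result[pid] = new_node
        counts[donor] -= 1
    return result


before = {p: f"node{p % 3}" for p in range(12)}  # 12 partitions on 3 nodes
after = add_node(before, "node3")
print(Counter(after.values()))                     # partitions per node afterwards
print([p for p in before if before[p] != after[p]])  # partitions that moved
```

Keys never change partition here, so only the moved partitions are copied over the network; everything else stays where it is.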
Automatic? Manual?
- rebalancing is expensive and risky, so it is good to have a human in the loop to approve it
  
Request Routing

3 ways
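- the client contacts any node → if that node does not own the right partition, it forwards the request to the node that does
- the client contacts a routing tier, which forwards the request to the right node (the routing tier can learn the assignment from a service like ZooKeeper)
- the client is aware of the partitioning and connects directly to the right node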
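The first approach (contact any node, forward if needed) might look like the toy model below. It assumes every node has access to the same partition-to-node mapping through a shared `Cluster` object and uses a byte-sum hash; the classes and the `hops` counter are invented for the sketch and the "forwarding" is just a method call rather than a network request.

```python
NUM_PARTITIONS = 4  # assumed for the example


def partition_for(key):
    # Toy deterministic hash of the key.
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


class Cluster:
    def __init__(self, partition_to_node):
        self.partition_to_node = partition_to_node  # partition id -> node name
        self.nodes = {}                             # node name -> Node


class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.data = {}
        cluster.nodes[name] = self

    def _owner(self, key):
        return self.cluster.partition_to_node[partition_for(key)]

    def put(self, key, value, hops=0):
        owner = self._owner(key)
        if owner != self.name:
            # Not our partition: forward to the owning node.
            return self.cluster.nodes[owner].put(key, value, hops + 1)
        self.data[key] = value
        return hops

    def get(self, key, hops=0):
        owner = self._owner(key)
        if owner != self.name:
            return self.cluster.nodes[owner].get(key, hops + 1)
        return self.data.get(key), hops


cluster = Cluster({0: "a", 1: "b", 2: "a", 3: "b"})
a, b = Node("a", cluster), Node("b", cluster)
a.put("k", "v")
print(a.get("k"))  # value plus the number of forwards needed
print(b.get("k"))
```

The other two approaches move the same lookup elsewhere: into a separate routing tier, or into the client itself.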
  
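Common issue
- how does the component that makes the routing decision learn about changes in the assignment of partitions to nodes?
- hard problem: all participants have to agree on the assignment.
- in the case of ZooKeeper,
  - each node registers itself in ZooKeeper.
  - ZooKeeper maintains the authoritative mapping of partitions to nodes. Whenever t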