In this episode we discuss our study notes on the partitioning chapter of Designing Data-Intensive Applications.

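🔴 This episode leans technical, and we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes the show harder to follow, so we ask for your understanding if that bothers you. If anything here is inaccurate or out of date, corrections are welcome.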

# Show Notes

- 📕 Designing Data-Intensive Applications
- What is partitioning?
  - A partition is a division of a logical database or its constituent elements into distinct independent parts.
  - Main reason: scalability. The query load can be distributed across many processors.
  - YouTube / Vitess scaling story
    - Single MySQL → add read replicas → writes can't catch up → partition
- How to partition?
  - Partitioning by key range (e.g., Bigtable)
    - Assign a continuous range of keys to each partition
    - Pro: range scans are easier, data locality
    - Con: certain access patterns can lead to hot spots (e.g., timestamp keys)
    - Con: finding split points and managing rebalancing is hard
  - Partitioning by hash
    - Good hash function: distributes keys uniformly
    - Con: no easy range queries
    - Cassandra does KKV (partitioning key, sort key, value)
  - Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber
- Secondary indexes: local index
  - Efficient write, expensive read
  - Elasticsearch
- Secondary indexes: global index
  - Efficient read, expensive write
  - Using Global Secondary Indexes in DynamoDB (correction: we misspoke in the episode; DynamoDB supports 20 global secondary indexes per table)
- Rebalancing partitions
  - Move load to other nodes
  - Fixed number of partitions
    - A new node steals partitions from every existing node
    - Notion: 480 partitions
  - Dynamic partitioning
    - 📈: split a partition into 2
    - 📉: merge 2 partitions into 1
  - Fixed number of partitions per node
    - https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30
  - Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
- Request routing
  - 3 approaches: nodes talk to each other, separate routing tier, smart client
  - Separate coordination service such as ZooKeeper

Notes by xg
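To make the "partitioning by hash" and "fixed number of partitions" notes above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular database does it: the function names (`partition_for`, `initial_assignment`, `add_node`), the use of MD5, and the round-robin starting layout are all hypothetical choices made for this example. The 480 figure is borrowed from the Notion note.

```python
import hashlib
from collections import Counter

# Assumption: the partition count is fixed up front and never changes.
# 480 is borrowed from the Notion example in the notes.
NUM_PARTITIONS = 480


def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def initial_assignment(nodes):
    """Spread partitions over the nodes round-robin (hypothetical starting layout)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment, new_node):
    """Return a new assignment where new_node steals partitions from existing nodes.

    Only whole partitions move; the key -> partition mapping never changes.
    """
    result = dict(assignment)
    counts = Counter(result.values())
    # Fair share per node once the new node has joined.
    target = NUM_PARTITIONS // (len(counts) + 1)
    stolen = 0
    for p in sorted(result):
        if stolen >= target:
            break
        owner = result[p]
        # Take only from nodes that hold more than their fair share.
        if counts[owner] > target:
            result[p] = new_node
            counts[owner] -= 1
            stolen += 1
    return result


if __name__ == "__main__":
    before = initial_assignment(["n0", "n1", "n2", "n3"])
    after = add_node(before, "n4")
    moved = sum(1 for p in before if before[p] != after[p])
    print(Counter(after.values()), "moved partitions:", moved)
```

The design point this tries to show is that a key always hashes to the same partition, so adding a node only changes which node owns some partitions. Going from four nodes to five in this sketch moves one fifth of the partitions and leaves the rest where they were.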

# Contact

- Website: eng.cafe
- WeChat official account: Eng Cafe
- Twitter: @engcafefm
- YouTube: Eng Cafe
- Xiaoyuzhou (小宇宙) podcast app
- General-purpose podcast clients: eng.cafe/subscribe
- Email: [email protected]
