관리자

#노마드코더 #북클럽 #노개북

전자책으로 참여 신청...!

 

7. Transactions

The slippery Concept of a Transaction

ACID

  • Atomicity

    • The transaction is aborted if some of writes in one grouped atomic transaction can not be completed.
    • main idea: what happens if a client wants to make several writes but what if some fails?
  • Consistency

    • the data should always follow some rules.
      • ex. in accounting system, credit sum == debit sum
    • it is a property of application
  • Isolation

    • concurrently executing transactions are isolated from each other
    • classic text describes isolation as serializability
    • (serializability): each transaction can pretend that it is the only transaction running on the entire database
  • Durability

    • Once a transaction has committed successfully, data written will not be forgotten, despite of hardware fault or database crash
    • single-node DB : data has been written to nonvolatile storage
    • replicated DB : successfully copied to some number of nodes

Single-Object and Multi-Object Operations

  • common philosophy: Abandon transaction entirely, rather than remain half-finished

  • leaderless replication : best effort

  • retry is good, though not perfect

    • transaction succeed, but network( DB → server ) failed

    • when failed due to overload → worse

    • only worth in transient errors

    • side effects

    • if client process fails while retrying, any data it was trying to write to the DB is lost

      Weak Isolation Levels

    • transaction isolation

      Rather than blindly relying on tools, we need to develop a good understanding of the kinds of concurrency problems that exist and how to prevent them.

      Read committed

    • no dirty read : read only committed

      • when updating multiple objects, keep sync/consistency
      • the same result even after rollback
    • no dirty write : overwrite only committed

      • with row-level locks
      • when transaction is ongoing, others see old value.

      Snapshot Isolation and Repeatable Read

    • Repeatable Read

      • Nonrepeatable read(read skew)
        • The data read at the end of the transaction is different with previous query
    • Snapshot Isolation

      • each transaction reads from a consistent snapshot of database

        (see only the old data)

      • Readers never block writers, writers never block readers

      • multi-version concurrency control(MVCC)

        • maintaining several version of an object side by side
        • data is tagged with its transaction id(txid)
      • visibility rules

        At the start of the transaction, make a list of proceeding transactions.

        1. Any write of above transactions are ignored
        2. Any writes made by aborted transactions are ignored.
        3. Any writes made by transactions with later transaction ID are ignored

        Preventing lost updates

      • client1: read → update

        client2: read - > update

        client1's update is lost

      • Prevent by locking

        • explicit lock
        • compare and set
        • conflict resolution and replication

        Write skews and Phantoms

      • write skew

        • Transactions update two different objects concurrently and there occurs conflict.

        • ex. at least 2 doctors should be on_call.

          • Bob changes to on_cal=false, Alice changes to on_cal=false, concurrently.
          • Now less than 2 doctors are on call.
          • Two different objects(record of Bob, record of Alice) are updated
        • restricted solution

          serialization, or explicit lock on rows

      • phantom

        • a write in one transaction changes the result of a search query in another tranascation

        • ex. select → condition → update

          between select and update, something changed.

        • SELECT FOR UPDATE may not help when SELECT returns 0 rows.

        • materializing conflict : create a table of all possible combinations, and put lock there.

          Serializability

        • the end result is the same as if they had executed one at a time, serially without any concurrency

          Two-phase Locking

        • read : shared lock

        • write : exclusive lock

        • first phase : when the locks are acquired,

          second phase : when all the locks are released

        • predicate lock

          • lock that belongs to all objects that match some search condition
          • applies even to objects that do not yet exist
        • index-range locking(next-key locking)

          • predicate lock has not good performance.
          • lock on index

          Serializable Snapshot Isolation

        • pessimistic concurrency control

          • if anything may go wrong, better wait
        • optimistic concurrency control

          • transaction continues. If anything goes wrong, abort
          • old idea
          • tend to be better than pessimistic control
        • Serializable Snapshot Isolation is promising.

          • optimistic approach
          • allows transactions without blocking

'Books > DDIA' 카테고리의 다른 글

6. Partition  (0) 2021.01.08
5. Replication  (0) 2021.01.07

6. Partition

Partitioning of key-value data

  • key-range partitioning
    • partitioned with range
    • good for range-search
    • rebalancing by spliting into two subranges.
  • hash partitioning
    • apply hash function. might add additional random letters at the end/begining.
      • common to create fixed number of partition in advance

Partitioning and Secondary index

  • Document-partitioned index(local index)

    • secondary index is stored at the same partition with primary key and value
    • one write requires write to one partition.
    • one read may require reads from multiple partitions
  • Term-partitioned index(global index)

    • secondary indices are partitioned too. Based on term

    • one write may require write to multiple partitions

    • one read only require read from one partition.

      Rebalancing Partitions

      Strategies for rebalancing

    • fixed number of partitions

      • create more partitions than the number of nodes and spread it when node is added.
      • Number of partitions does not change. But the number of partition per nodes changes.
    • dynamic partitioning

      • Similar with the top level of B-tree
      • Suitable for key range partitioned data, but hash-partitioned can also use it.
    • Partitioning proportionally to nodes.

      • When new node joins the cluster, randomly choose the node to split and half of the data are moved.

      Automatic? Manual?

    • due to the overhead and risk of rebalancing, manual commit should join the process

      Request Routing

      3 ways

    • client contact any node → if it does not have right partition, forward.

    • client contact to routing tier like zookeeper

    • client be aware of the partitioning

      Common issue

    • how to make routing decisions?

    • hard problem.

    • in case of Zookeeper,

      • each node registers itself in Zookeeper.
      • Zookeeper maintains the authoritative mapping of partitions to nodes. Whenever t

'Books > DDIA' 카테고리의 다른 글

7. Transactions  (0) 2021.01.09
5. Replication  (0) 2021.01.07

5. Replication

Leaders and Followers

  • synchronous replication

  • asynchronous replication

  • setting up new followers

    • take consistent snapshot of leader → pass to follower
    • follower connects to the leader and requests the changes after the snapshot
  • leader failure: failover

    • HOW-TO

      • declare leader has failed(usually with timeout)
      • choose new leader: best candidate - node with latest data changes.
      • configure system with new leader. Let previous leader know he is not the leader anymore
    • problems

      • loss of data from previous leader
      • synchronizing with different storage system
      • two leaders at the same time
      • appropriate timeout for the declaration of leader's death
    • Implementation of replication log

      • statement-based
      • WAL(Write Ahead Log)
        • PostgreSQL, Oracle
        • low level - storage engine dependent
      • Logical log replication(row-based)
        • sequence of records describing writes to DB in row-level
          • insert : new value of column
          • delete : unique identifier of row
          • update : unique identifier, and new value
      • trigger based
        • big overhead
    • Problems of replication log

      • Read After Write Consistency
      • Monotonic Read : Do not read older value than previous read
      • Consistent Prefix Read : data read should appear in the same sequence with writes
      MultiLeader replication
    • Usecases

      • multi-datacenter
      • clients with offline operation
      • collaborative editing ex. google docs
    • Handling write conflicts

      • avoiding conflict

        • certain user writes to certain datacenter → not always the case
      • converging toward a consistent state

        • apply already codede script
        • read both values
        • automatifc conflict resolution
          • conflict-free replicated datatype
          • mergeable persistent data structure: three-way merge like git
          • operational transformation : algorithm
      • topology

        • circular : each write has tag

        • star

          • circular and star may fail if only one node fails.
        • all-to-all : result may be over taken

          Leaderless Replication
        • with coordinator node / or client write to several nodes

        • writing to DB when a node is down

          • write to/read from several nodes
          • read repair : when stale response is found, client writes back
          • anti-entropy process : on background, compare with other nodes
        • quorums for reading and writing

          • w + r > n
          • sloppy quorum: w and r may have nodes that are not included in n home nodes
          • hinted handoff: any writes that was temporarily accepted on behalf of another node are sent to the appropriate home node
        • detecting concurrent writes

          • not so much in database level. application developer should understand DB and handle them.
          Issues
        • last write wins

          • ambiguity of "recent"
          • only safe way of LWW is to use immutable, unique key like uuid
        • happens-before relationship

          • server and clients hold version number for every key.
          • when client reads, compare the version number with the previous read.
          • when server writes, write with new version number
        • version vector : collection of version number from multiple nodes

'Books > DDIA' 카테고리의 다른 글

7. Transactions  (0) 2021.01.09
6. Partition  (0) 2021.01.08

+ Recent posts