On Database Transactions

Transaction: Collections of operations that form a single logical unit of work.

Database-side properties:
Atomicity => Either all actions of a transaction are executed completely or none (indivisible: do-all or rollback-all).
Durability => Once a transaction is successfully executed, its effects must persist in the database (no loss of data).
Isolation => transactions  should be isolated from the effects of other concurrently executing transactions (Serializability / swaping).

Transaction-itself property (programmer responsibility) (beyond integrity):
Consistency => If a transaction is run atomically in isolation starting from a consistent database, the
database must again be consistent at the end of the transaction.

 Transaction states:
    Active, the initial state; the transaction stays in this state while it is executing.
    Partially committed, after the final statement has been executed.
    Failed, after the discovery that normal execution can no longer proceed.
    Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction. Then, the system has two options: Restart or Kill the transaction (be cautious when dealing with "observable external writes")
    Committed, after successful completion.

 Isolation
Let us consider a schedule S in which there are two consecutive instructions, I and J , of transactions Ti and Tj, respectively.
  •  I and J  on conflict: if they operate on the same data item, and at least one of these instructions is a write operation.
  • If a schedule S 1 can be transformed into a schedule S 2 by a series of swaps of nonconflicting instructions, we say that S 1 and S 2 are conflict equivalent. 
  • We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
  • A schedule S can be depicted as precedence graph (google it, if any).
  • A schedule S is conflict serializable <=> precedence graph has no cycle
  • Recoverable schedule: for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
  • Cascade-less schedule : for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj  (Thus, Cascade-less schedule  is Recoverable)
DBMS concurrency-control should provide a high degree of concurrency, while ensuring that all schedules that can be generated are conflict serializable, recoverable, and cascade-less.

 Locking VS isolation level:
Locking prevent concurrent access / modification (e.g Optimistic / pessimistic).
Isolation is about read consistency:

Isolation level:
Given a transaction T, programmers may specify one of the following isolation levels, so that T will be more or less serializable with respect to other transactions:
  • Serializable: obviously, no concurrent execution + strong consistency.
  • Repeatable read: less serializable. Allows only committed data to be read + lock on read (against write)(thus, the transaction will always read the same committed value)
  • Read committed (default value on most DBMS): more concurrent. Allows only committed data to be read + no lock on read (e.g.: between two reads of a data item by the transaction, another transaction may have updated the data item and committed)
  • Read uncommitted: most concurrent execution + weak consistency. Allows uncommitted data to be read
All the isolation levels disallow dirty writes: writes to a data item that has already been written by another transaction that has not yet committed or aborted.

Serializable level implementation (possibly used on Oracle, PostgreSQL) called snapshot isolation:
Each transaction is given its own version of the database when it begins. It reads/writes data from/to this private version. Transaction updates are pushed to database once committed.


From "Ask Tom": Serializable level:  each transaction operates in an environment that makes it appear as if there are no other users modifying data in the database, the database will be "frozen" at the point in time your query began. Side effects (changes) made by other transactions are not visible to it, regardless of how long it has been running.

Lock modes
Shared: allows multiple users to read data, but do not allow any users to write on that data.
Update: allows only one user to read data. No other user can read or write on it.
Exclusive: allows only one user to write on a data. When one user has an exclusive lock on a data, no other lock of any type may be placed on it.


Experiments (Oracle)

Oracle only supports 3 transaction levels: Read Committed, Serializable (and Read-only?). In Oracle, Read Committed is the default and it uses Snapshot semantics.

Schema definition: one table named Customers








Serializable isolation level:
  • Run concurrent oracle session S1 and S2 (e.g. parallel sqlplus terminals)
  • Deactivate autocommit on both sessions
  • Set "Serializable" level:  ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;
  • Insert a new row using S1, le'ts call it R1, without commit 
  • Go to S2, and try to read table rows (select ...), obviously you won't see R1.
  • Being in S2, insert a new row , R2, and commit this time.
  • Go back to S1. If you run a "select * from customers", you won't see R2 even if it's already commited during S2  (this is exactly where the Serializable level manifest itself)
  • Now, commit the R1 transaction in S1.
  • Now you will see R1 and R2 in both sessions, when you select data.
Mix Serializable & Read Committed levels:
  • Run concurrent oracle session S1 and S2 (e.g. parallel sqlplus terminals)
  • Deactivate autocommit on both sessions 
  • Set "Read committed" level in S1:  ALTER SESSION SET ISOLATION_LEVEL = READ COMMITTED;
  • Set "Serializable" level in S2:  ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;
  • Insert a new row using S1, le'ts call it R1, without commit 
  • Go to S2, and try to read table rows (select ...), obviously you won't see R1.
  • Being in S2, insert a new row , R2, and commit this time.
  • Go back to S1. If you run a "select * from customers", you will see R2 (this is exactly where the READ COMITTED level manifest itself in S1)
  • Now, commit the R1 transaction in S1.
  • Now you will see R1 and R2 in both sessions, when you select data.
 

No comments: