Transaction Isolation Levels in Databases

Transaction isolation levels define the degree to which a transaction must be isolated from modifications made by other transactions in a database system.

They’re important to understand in the context of concurrent transactions.

There are 4 isolation levels. Each solves 1 related problem + the problems of all previous levels:

#Isolation levelProblem to be solvedProblem description
1Read UncommittedLost UpdateOnly the last of the concurrent transactions affects the read data. The impact of other transactions is lost
2Read CommittedDirty ReadThe read data was changed by a transaction, that was then rolled back
3Repeatable ReadNon-repeatable ReadSecond read of the same data gives a result that is not equal to the first read, because the data was changed by another transaction between reads
4SerializationPhantom ReadSecond data select by the same parameters is different from the first one, because the data was changed by another transaction between reads

1. Read Uncommitted: The Least Restrictive Level

The Read Uncommitted isolation level is the most permissive in terms of transaction isolation. Transactions operating at this level can read data that has been modified by other transactions but not yet committed. This allows for maximum concurrency but comes with significant risks:

  • Dirty Reads: The most notable issue is the possibility of “dirty reads.” A dirty read occurs when a transaction reads data written by a concurrent uncommitted transaction, which might later be rolled back. This means the reading transaction may obtain incorrect or non-existent data, leading to data inconsistencies.
  • Applicability: Despite its risks, Read Uncommitted is useful in scenarios where absolute accuracy is not critical, and performance is a paramount concern, like in certain types of data analytics operations.

2. Read Committed: Balancing Act

Read Committed is a more conservative isolation level. It addresses the problem of dirty reads by ensuring that a transaction can only read data that has been committed by other transactions.

  • Non-repeatable Reads: However, it’s still susceptible to non-repeatable reads. This phenomenon occurs when a transaction reads the same row twice and finds different data each time because another transaction has modified and committed the data in the meantime.
  • Use Case: This level is often the default setting in many database systems because it strikes a balance between data integrity and performance, making it suitable for a wide range of applications.

3. Repeatable Read: Enhanced Consistency

The Repeatable Read isolation level ensures that any rows read during a transaction are locked until the transaction is completed. This means if a row is read twice in the same transaction, the data will be the same each time.

  • Phantom Reads: However, it is still vulnerable to phantom reads, where a transaction may experience new rows added by other transactions between its own reads.
  • Scenario Fit: This level is well-suited for applications where consistency in repeated reads is crucial, but complete isolation is not required.

4. Serializable: The Most Strict Level

Serializable is the highest isolation level. It ensures complete isolation of transactions, making each transaction appear as if it is the only one interacting with the database.

  • Complete Isolation: At this level, transactions are completely protected from dirty reads, non-repeatable reads, and phantom reads.
  • Performance Trade-off: The downside is a significant reduction in concurrency, potentially leading to performance bottlenecks, as transactions may have to wait for others to complete to maintain isolation.
  • Ideal Use: This level is best used in scenarios where data integrity is paramount and cannot be compromised, such as financial transactions.

A Real-World Example

The best way to understand how these isolation levels come into play is through real-world examples.

Imagine a simple banking system where customers can check their account balance, deposit, and withdraw money. We’ll focus on two customers, Alice and Bob, who are performing transactions concurrently on the same account.

Initial Condition:

  • Account Balance: $1000

1. Read Uncommitted

  • Alice’s Transaction: Begins a transaction to withdraw $200.
  • Bob’s Transaction: Concurrently, Bob checks the account balance.
  • Issue: With Read Uncommitted, Bob might see the account balance as $800 even though Alice’s transaction hasn’t been committed yet. If Alice’s transaction fails or is rolled back, Bob will have seen incorrect data (dirty read).

2. Read Committed

  • Alice’s Transaction: Again, starts withdrawing $200.
  • Bob’s Transaction: Checks the balance after Alice’s transaction is committed.
  • Outcome: Bob sees the correct balance of $800. However, if Bob had checked the balance twice during his transaction, before and after Alice’s withdrawal, he would see two different balances ($1000 and then $800), which is a non-repeatable read.

3. Repeatable Read

  • Alice’s Transaction: Attempts to withdraw $200.
  • Bob’s Transaction: Starts a transaction to check the balance twice during the same transaction period.
  • Outcome: Regardless of when Alice’s transaction is committed, Bob will see the same balance ($1000) in both checks within his transaction. However, if a new deposit (say $300) were made into the account by a third party during Bob’s transaction, Bob wouldn’t see this change, leading to a phantom read.

4. Serializable

  • Alice’s Transaction: Begins to withdraw $200.
  • Bob’s Transaction: Tries to check the balance.
  • Outcome: Bob’s transaction will be blocked until Alice’s transaction is completed. This ensures complete isolation, preventing dirty reads, non-repeatable reads, and phantom reads. The downside is, if many users are transacting simultaneously, this could lead to significant delays.

Conclusion

In our banking example:

  • Read Uncommitted can lead to seeing uncommitted or even potentially rolled-back transactions.
  • Read Committed provides a more accurate view but can show different data if the same data is read multiple times in a transaction.
  • Repeatable Read ensures consistency within a transaction but might miss concurrent changes.
  • Serializable provides the highest level of data integrity but at the cost of concurrency, potentially leading to performance issues.

Each isolation level offers a different trade-off between accuracy of the data seen during transactions and the performance due to concurrency. The choice depends on the specific needs of the application – in banking, higher isolation is usually preferred to ensure data integrity.

Real-World Example #2

Imagine an online system where customers can select and book seats for a movie. We’ll focus on two customers, Emily and John, who are trying to book the last available seat for a popular movie at the same time.

Initial Condition:

  • Available Seats: 1 seat remaining (Seat A10)

1. Read Uncommitted

  • Emily’s Transaction: Begins a transaction to book Seat A10.
  • John’s Transaction: Concurrently, checks available seats.
  • Issue: With Read Uncommitted, John might see no seats available as Emily’s transaction is in progress, even though Emily hasn’t completed the booking yet. If Emily’s transaction fails or is rolled back, John will have seen a misleading “sold out” status.

2. Read Committed

  • Emily’s Transaction: Attempts to book Seat A10 again.
  • John’s Transaction: Checks for available seats after Emily’s transaction is committed.
  • Outcome: John sees the correct “sold out” status only after Emily successfully books the seat. However, if John had checked the availability twice during his transaction, before and after Emily’s booking, he would see different statuses (1 seat available and then sold out), leading to a non-repeatable read.

3. Repeatable Read

  • Emily’s Transaction: Tries to book Seat A10.
  • John’s Transaction: Starts a transaction to check seat availability twice.
  • Outcome: John sees the seat as available in both checks within his transaction, despite Emily booking it concurrently. This can lead to a situation where John tries to book the already booked seat, leading to a conflict or error later on (a phantom read issue).

4. Serializable

  • Emily’s Transaction: Begins to book Seat A10.
  • John’s Transaction: Tries to check the seat availability.
  • Outcome: John’s transaction will wait until Emily’s transaction is completed. This ensures complete isolation and prevents John from seeing the seat as available when it’s actually being booked by Emily. However, this can slow down the system significantly if many users are trying to book tickets simultaneously.

Conclusion

In this movie theater ticket booking scenario:

  • Read Uncommitted could falsely indicate no seats available while a booking is in progress.
  • Read Committed ensures more accurate information but can still show fluctuating seat availability.
  • Repeatable Read maintains consistent view within a transaction but risks missing real-time updates.
  • Serializable ensures that users see the most accurate availability but might slow down the booking process due to sequential processing of transactions.

Databases are crucial to understand deeply.


SWE Quiz is the perfect way to test yourself and fill in any gaps in your software knowledge.

Test Your Databases Knowledge

Topics