Goal
Ensure that a column (c1) in a table (t1) has unique consecutive integer values, independently of rollbacks, simultaneous commits and so on.
Context
I use mysql innodb storage to benefit from transactions and ensure integrity and consistency of data.
Main advantage of transactions is that you can revert them and restore data to their previous content. Can you revert anything? The answer is no. There is one element that cannot be reverted is the value of auto_increment fields.
And it’s absolutely normal. Let’s say you have two transactions, both inserting a new row, but first transaction is rolled back while second transaction is commited. You cannot change already allocated id on transaction 2 and say “sorry guy, but the other transaction was cancelled so you’re no longer row #34 but row #33″. This would be terrible especially if transaction 2 contained multiple statements such as inserting dependent rows on other table based on changing id.
Anyway, this topic has been tackled on mysql documentation page and the answer is simply that’s not possible : http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html (cf. section 13.6.4.3.2. Configurable InnoDB Auto-Increment Locking).
What have we learnt so far?
=> we’ve learnt that we won’t be able to use AUTO_INCREMENT feature if we want to be sure of having consecutive values (and innodb_autoinc_lock_mode config option is helpless on this matter).
=> we will need to increment our value ourself
Where to look now? Innodb locking mechanism.
Locking mechanism
Locking mechanism is where we find our solution.
Default behavior, regarding transactions, is to allow multiple transactions to be run at the same time (‘concurrent’, necessary to improve execution speed and avoid bottlenecks) but have one transaction ignore what another transaction is doing until it has committed its work.
Let’s see an example here:
mysql> CREATE TABLE t1 (c1 INT) ENGINE = InnoDB;
Query OK, 0 rows affected (0.00 sec)
#from connection 1:
mysql> START TRANSACTION ;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO t1 SET c1 = 1;
Query OK, 1 row affected (0.00 sec)
#open another connection (connection 2) and run:
mysql> SELECT * FROM t1 ;
Empty set (0.00 sec)
#from connection 1:
mysql> COMMIT;
Query OK, 0 rows affected (0.01 sec)
#from connection 2:
mysql> SELECT * FROM t1;
+------+
| c1 |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
What does it mean for us? It means that by default we cannot run a simple query to fetch all inserted rows and increment highest value by one, because another transaction could end-up doing the same thing and we would endup with key duplicates.
example:
#from connection 1, execute:
mysql > truncate t1;
insert into t1 set c1 = 1;
start transaction;
select max(c1) from t1;
+-----------+
| max(c1) |
+-----------+
| 1 |
+-----------+
mysql > insert into t1 set c1 = 2;
#from connection 2, execute:
mysql > start transaction;
select max(t1) from c1;
+-----------+
| max(c1) |
+-----------+
| 1 |
+-----------+
mysql > insert into t1 set c1 = 2;
commit;
#commit connection 1 too:
mysql > commit;
select * from t1;
+------+
| c1 |
+------+
| 1 |
| 2 |
| 2 |
+------+
How to we avoid this? We simply ask innodb to process transactions in a FIFO basis (ie. not concurrently) so that each transaction must wait for the previous one to finish (rollback or commit) for it to simply access t1.c1 data for either reading, updating or deleting.
How to we do this? prior to our insert query, we will run a select query on our column of interest t1.c1, ending it with “FOR UPDATE” statement. This statement request “intention of exclusive lock” (IX) that will trigger above mentionned mechanism.
Let’s see it in action.
mysql > truncate t1;
insert into t1 set c1 = 1;
start transaction;
select max(c1) from t1 for update;
+-----------+
| max(c1) |
+-----------+
| 1 |
+-----------+
mysql > insert into t1 set c1 = 2;
#from connection 2, execute:
mysql > start transaction;
select max(c1) from t1 for update;
#note: here, mysql hangs, simply waiting for other transaction to be committed
# => we go back to connection 1 and commit it
# from connection 1:
mysql > commit;
#we go back to connection 2 and we see that our select statement got executed.
+-----------+
| max(c1) |
+-----------+
| 2 |
+-----------+
# now we can insert our increased value based on trustful result we had.
mysql > insert into t1 set c1 = 3;
commit;
select * from t1;
+------+
| c1 |
+------+
| 1 |
| 2 |
| 3 |
+------+
What just happened? This time, thanks to ‘for update’ statement, connection 2 was forced to wait for connection 1 to be completed, for its select query to get executed. As such, returned max() value can be fully trusted => we can increment it and insert it safely. What if connection 1 had been rolledback? then select max(c1) would have returned 1 and we would have incremented it to 2.
What did we just learn? We learnt how to force innodb to handle transactions on fifo basis rather than concurrently. That way we’re sure to have the correct result returned by our select statement. Drawback is on performance side, but it’s quite limited. Our select statement (with “for update” expression) was limited on t1.c1, as such lock is placed on t1.c1 only and if another transactions want to access t1 data other than c1 column, it will be able to do it concurrently.
How does it help us? Well, it helped us better understand how it works, but honestly it did not help us automatically incrementing our column value. We still had to do it manually (ie. after having run our select statement, we manually looked at the value and incremented it by 1). How do we perform automatic incrementation then?
Automatic incrementation
Our final solution will do it all. The bad news is that it’s not 100% mysql, in my case I used php to automatically increment my value.
Workflow is as follow:
- in your php script, start a new transaction
- execute our select statement with “FOR UPDATE” flag
note: at this point, mysql will hang up until your transaction has full access to the table you’re interested into. Then script automatically waits, you have nothing to do, it will automatically resume itself
- retrieve our value from select statement and increment it by one
- execute your update query
- commit / rollback your changes (don’t forget to do it right after all required changes have been performed to minimize lock time on table)
That’s it.
note on http://bugs.mysql.com/bug.php?id=3575:
this bug report provided what I tought first as a solution but in fact is not appropriate (it’s solution rely on “INSERT … SELECT” syntax which is good to insert data that’s already somewhere on your db. While this can be interesting for education purposes, it did not interest me at all since I wanted to update existing data and unfortunately no “UPDATE … SELECT” equivalent syntax exists so far.)
sources