Here is the quick notes from the session scaling heavy concurrent writes in real time by Dathan Pattishall. Its bad that he left Flickr i.e. Yahoo. Hopefully they will find a replacement, if not contact me I have few people who are interested.
- Who am I
- since 1999 working on mysql
- scaled many companies (FriendFinder, Friendster, Flickr, now RockYou)
- Favorites federation, partitioning, shards, RAID-10
- Requirements
- scale better
- store data forever
- associate time with the data
- allow for change
- keep it cheap
- and downtime is not an option
- Spread the data around
- federate
- all referrers are owned by the page owner
- spread data out by that
- but federate in a different direction
- add a new column to the global account lookup
- Did not work
- strings as PK was not good
- inserts slowed when the table grew larger than memory
- not enough I/O to handle double the load
- Start over
- converted URL into a 64-bit ID – conv(substr(md5(url),1,16),16,10) (Dathan had a typo in the function in his slides, thanks to Patrick who tried it when the session is running)
- 64-bit number, its unique
- reduced PK size to 8 bytes + owner, object, object-type
- InnoDB and strings
- index on string takes a lot of space
- each index has its own 16K page
- fragmentation across pages was hurting, chewing up I/O
- InnoDB and High Concurrency of string writes
- requirement: 200 ms for total db access for all apps
- writes gets slowed down when you cross buffer_pool_size
- 10 ms to 20 secs sometimes for full transaction
- replication keeping up
- Buffer the writes
- java daemon that buffers up to 4000 messages grouped to a transaction and apply serially
- does not use much memory or CPU
- written by Zen Java master
- Reduce the use of big strings
- Keep smaller amounts of data
- Use MyISAM for non-pro users and keep only X weeks of data
- MyISAM keeps 1/6th the size of InnoDB
- Migrate the data when they migrate
- Distributed locks
- GET_LOCK and RELEASE_LOCK on the same server for which a user operates on
Doh TYPO! Thanks for that.
[…] Venu Anuganti channels Dathan Pattishall’s talk on scaling heavy concurrent writes in real time. […]