Notes on scaling heavy concurrent writes in real time
Here is the quick notes from the session scaling heavy concurrent writes in real time by Dathan Pattishall. Its bad that he left Flickr i.e. Yahoo. Hopefully they will find a replacement, if not contact me I have few people who are interested.
- Who am I
- since 1999 working on mysql
- scaled many companies (FriendFinder, Friendster, Flickr, now RockYou)
- Favorites federation, partitioning, shards, RAID-10
- Requirements
- scale better
- store data forever
- associate time with the data
- allow for change
- keep it cheap
- and downtime is not an option
- Spread the data around
- federate
- all referrers are owned by the page owner
- spread data out by that
- but federate in a different direction
- add a new column to the global account lookup
- Did not work
- strings as PK was not good
- inserts slowed when the table grew larger than memory
- not enough I/O to handle double the load
- Start over
- converted URL into a 64-bit ID - conv(substr(md5(url),1,16),16,10) (Dathan had a typo in the function in his slides, thanks to Patrick who tried it when the session is running)
- 64-bit number, its unique
- reduced PK size to 8 bytes + owner, object, object-type
- InnoDB and strings
- index on string takes a lot of space
- each index has its own 16K page
- fragmentation across pages was hurting, chewing up I/O
- InnoDB and High Concurrency of string writes
- requirement: 200 ms for total db access for all apps
- writes gets slowed down when you cross buffer_pool_size
- 10 ms to 20 secs sometimes for full transaction
- replication keeping up
- Buffer the writes
- java daemon that buffers up to 4000 messages grouped to a transaction and apply serially
- does not use much memory or CPU
- written by Zen Java master
- Reduce the use of big strings
- Keep smaller amounts of data
- Use MyISAM for non-pro users and keep only X weeks of data
- MyISAM keeps 1/6th the size of InnoDB
- Migrate the data when they migrate
- Distributed locks
- GET_LOCK and RELEASE_LOCK on the same server for which a user operates on
2 Comments
Doh TYPO! Thanks for that.
Comment :: April 21, 2008 @ 11:28 am
[…] Venu Anuganti channels Dathan Pattishall’s talk on scaling heavy concurrent writes in real time. […]
Pingback :: April 22, 2008 @ 4:40 am