From Jeff Rothschild, VP of technology, Facebook
- The power of connectedness
- Impact on the database
- Our challenge
- The power of connectedness
- photo tagging
- getting an email that someone tagged a photo in Facebook
- friends tag each other, and the tagging keeps spreading through the network
- outstanding growth because of the photo tagging
- 26B photos in archive now
- The most trafficked photo application anywhere
- events and invites also feed the social graph
- Opening up the social graph to the outside as a platform and API, so other developers can make use of it (28K applications so far)
- Hypergrowth because of the platform (1M users in four days)
- 200 applications have over a million users each
- the circle of connectivity can span the globe (given positive feedback, performance, etc.)
- the social graph links everything
- and that means a lot of work for the database
- Impact on the database
- user -> connections -> objects
- users interact with people, and each person has a lot of data
- 100 friends, each with 100s of objects: tens of thousands of possible objects
- Horizontal partitioning supports parallel queries
- 50K+ requests per second now
- memcached can keep pace with the application
- multi-threading on multi-core machines to take full advantage of 1G network cards
- New C client that improves run-time performance for PHP
- New binary protocol for efficiency at both ends (app and memcached)
- Introduced memcached proxy to purge dirty records
- MySQL replication to replicate data between data centers
- Proxy is not a solution here due to replication
- Changed INSERT statements to carry MEMCACHE_DIRTY key1, keys …
- Put more RAM into the cache (95% hit rate)
- Of 20M requests, only 500K go to the MySQL server
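
The caching points above can be sketched roughly as follows. This is a minimal illustration, not Facebook's implementation: plain dicts stand in for memcached and MySQL, the `Region` class and the helper functions are hypothetical, and the `/* MEMCACHE_DIRTY ... */` comment format is an assumption, since the notes do not show the real syntax. The idea it tries to capture is that each replicated write carries the cache keys it dirties, so a remote data center can purge its own cache when the statement arrives through replication.

```python
import re

class Region:
    """One data center: dict stand-ins for a memcached pool and a MySQL server."""
    def __init__(self, name):
        self.name = name
        self.cache = {}     # stand-in for memcached
        self.db = {}        # stand-in for MySQL (master or replica)
        self.db_reads = 0   # counts requests that fall through to the database

    def get(self, key):
        """Cache-aside read: try the cache first, fall back to the database."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

# Assumed annotation format; the real syntax is not given in the notes.
DIRTY = re.compile(r"/\* MEMCACHE_DIRTY (.*?) \*/")

def annotate(key, value):
    """Build a replicated 'statement' that names the cache keys it dirties."""
    return {"key": key, "value": value,
            "sql_comment": f"/* MEMCACHE_DIRTY {key} */"}

def apply_statement(region, stmt):
    """Apply the write locally, then purge every cache key it names."""
    region.db[stmt["key"]] = stmt["value"]
    match = DIRTY.search(stmt["sql_comment"])
    if match:
        for dirty_key in match.group(1).split(","):
            region.cache.pop(dirty_key.strip(), None)

def write(master, replicas, key, value):
    """Write on the master; the loop stands in for the replication stream."""
    stmt = annotate(key, value)
    apply_statement(master, stmt)
    for replica in replicas:
        apply_statement(replica, stmt)
```

A short usage example: after `write(west, [east], "user:1:name", "Ann")`, repeated calls to `east.get("user:1:name")` hit the database once and the cache afterwards; a later write purges the key in both regions so the next read is fresh.
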
- Optimizations
- memory + servers (reduce to save $$$)
- Tuning (moderate query scope, aggregate related data)
- Partitioning (selective caching, archiving stale data)
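
To make the "horizontal partitioning supports parallel queries" point concrete, here is a small scatter-gather sketch. Everything in it is an assumption for illustration: the shard count, the modulo placement rule, and the function names are hypothetical, and in-memory dicts stand in for MySQL partitions. It also reproduces the fan-out arithmetic from the notes (100 friends with 100 objects each gives 10,000 objects for a single page view).

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count

# Each shard maps user_id -> list of that user's objects
# (a stand-in for one horizontally partitioned MySQL server).
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    """Assumed placement rule: users are spread across shards by id."""
    return user_id % NUM_SHARDS

def store(user_id, obj):
    shards[shard_for(user_id)].setdefault(user_id, []).append(obj)

def query_shard(shard_index, user_ids):
    """Fetch all objects for the given users from a single shard."""
    shard = shards[shard_index]
    return [obj for uid in user_ids for obj in shard.get(uid, [])]

def objects_for_friends(friend_ids):
    """Group friends by shard, query the shards in parallel, merge results."""
    by_shard = {}
    for uid in friend_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)
    results = []
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        futures = [pool.submit(query_shard, idx, ids)
                   for idx, ids in by_shard.items()]
        for future in futures:
            results.extend(future.result())
    return results
```

Because each shard holds a disjoint set of users, the per-shard queries are independent and can run concurrently; the caller only pays for the slowest shard rather than the sum of all of them.
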
- Our Challenge
- MySQL and memcached are a powerful combination
- MySQL at the speed of memcached
- memcache performance for simple tables
- flash memory storage
- dramatic improvement in I/O
- supports higher miss rate
- allows higher data/ram ratio
- persistent “Cache”
- flash for memcache
- persistent distributed hash table could be a good fit
- structured storage
- not all data benefits
- integrated cache and persistent replication
- unmanaged distribution
- A strong community is our challenge