From Jeff Rothschild, VP of technology, Facebook

  1. The power of connectedness
  2. Impact on the database
  3. Our challenge 
  • The power of connectedness
    • photo tagging
    • users get an email when someone tags a photo on Facebook
    • friends tag each other, and the tagging keeps spreading through the network
    • outstanding growth because of the photo tagging
    • 26B photos in archive now
    • More photo traffic than any other photo application
    • events, invites impact on social graph
    • Opening up the social graph to the outside by creating a platform and API, so other developers can make use of it (28K applications so far)
    • Hypergrowth because of the platform (1M users in four days)
    • 200 applications that have over a million users
    • the circle of connectivity can span the globe (given positive feedback, performance, etc.)
    • the social graph links everything
    • and that means a lot of work for the database
  • Impact on the database
    • user -> connections -> objects
    • users interact with people, and people have a lot of data
    • a user with 100 friends, each friend with hundreds of objects, means tens of thousands of possible objects
    • Horizontal partitioning supports parallel queries
    • 50K+ requests per second now
    • memcached can keep pace with the application
    • multi-threading on multi-core machines to take full advantage of 1G network cards
    • New C client that improves run-time performance for PHP
    • New binary protocol for efficiency on both ends (app and memcached)
    • Introduced memcached proxy to purge dirty records
    • MySQL replication to replicate data between data centers
    • Proxy is not a solution here due to replication
    • Changed INSERT statements to include MEMCACHE_DIRTY key1, keys …
    • Put more RAM into the cache (95% hit rate)
    • Of 20M requests, only 500K go to the MySQL servers
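
The notes do not spell out how the MEMCACHE_DIRTY keys are consumed. One plausible reading is that each replicated write carries the cache keys it invalidates, so a replica in another data center can purge its own local cache when it applies the write. The sketch below illustrates that idea only. Plain dicts stand in for MySQL and memcached, and the `key=value MEMCACHE_DIRTY k1, k2` statement format, the `DataCenter` class, and the `write` helper are all hypothetical, not the syntax or API used in the talk.

```python
# Toy model: dicts stand in for MySQL and memcached (assumption, not real clients).
MARKER = " MEMCACHE_DIRTY "


def split_dirty_keys(statement):
    """Split a toy statement into its write part and its list of dirty cache keys."""
    body, sep, keys = statement.partition(MARKER)
    dirty = [k.strip() for k in keys.split(",") if k.strip()] if sep else []
    return body, dirty


class DataCenter:
    def __init__(self, name):
        self.name = name
        self.db = {}       # stands in for a MySQL table: key -> value
        self.cache = {}    # stands in for the local memcached pool
        self.db_reads = 0  # reads that fell through to the database

    def read(self, key):
        """Cache-aside read: try the cache first, fall back to the database."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def apply(self, statement):
        """Apply one (replicated) statement, then purge the dirty keys it names."""
        body, dirty_keys = split_dirty_keys(statement)
        key, value = body.split("=", 1)
        self.db[key.strip()] = value.strip()
        for dirty in dirty_keys:
            self.cache.pop(dirty, None)


def write(master, replicas, key, value):
    """Write on the master; the same statement, dirty keys included, reaches replicas."""
    statement = f"{key}={value}{MARKER}{key}"
    master.apply(statement)
    for replica in replicas:
        replica.apply(statement)


west, east = DataCenter("west"), DataCenter("east")
write(west, [east], "user:1:name", "Ann")
first = east.read("user:1:name")   # miss: falls through to the database
second = east.read("user:1:name")  # hit: served from the local cache
write(west, [east], "user:1:name", "Anna")
after = east.read("user:1:name")   # miss again: the stale entry was purged
print(first, second, after, east.db_reads)
```

Because the purge travels with the write itself, the remote cache is cleared at the moment the replica's data changes, which is something a proxy sitting only in front of the master's cache cannot guarantee.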
  • Optimizations
    • memory + servers (reduce to save $$$)
    • Tuning (moderate query scope, aggregate related data)
    • Partitioning (selective caching, archiving stale data)
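
Partitioning comes up twice in these notes: as horizontal partitioning that supports parallel queries, and again as an optimization. A minimal sketch of the first idea follows, assuming users are assigned to shards by `user_id % NUM_SHARDS` and that each shard can be queried independently. The shard count, the in-memory dicts, and every function name here are illustrative assumptions; the talk did not describe the actual scheme.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumption: the notes give no shard count

# Each shard is a dict standing in for one database: user_id -> list of objects.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Pick the shard that owns a user's data (simple modulo placement)."""
    return user_id % NUM_SHARDS


def store(user_id, obj):
    shards[shard_for(user_id)].setdefault(user_id, []).append(obj)


def query_shard(shard_index, user_ids):
    """Fetch every object owned by the given users from a single shard."""
    shard = shards[shard_index]
    return [obj for uid in user_ids for obj in shard.get(uid, [])]


def objects_for_friends(friend_ids):
    """Group friends by shard, query the shards in parallel, merge the results."""
    by_shard = {}
    for uid in friend_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)
    results = []
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        futures = [pool.submit(query_shard, idx, ids) for idx, ids in by_shard.items()]
        for future in futures:
            results.extend(future.result())
    return results


# Illustrative load: 100 friends with 100 objects each.
for uid in range(1, 101):
    for n in range(100):
        store(uid, f"photo-{uid}-{n}")

found = objects_for_friends(range(1, 101))
print(len(found))  # 10000
```

One page view for a user with 100 friends turns into a handful of per-shard queries rather than 100 sequential lookups, which is the property the notes attribute to horizontal partitioning.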
  • Our Challenge
    • MySQL and memcached are a powerful combination
    • MySQL at the speed of memcached
    • memcached-level performance for simple tables
    • flash memory storage
    • dramatic improvement in I/O
    • supports higher miss rate
    • allows higher data/ram ratio
    • persistent “Cache”
      • flash for memcache
      • persistent distributed hash table could be a good fit
    • structured storage
      • not all data benefits
      • integrated cache and persistent replication
      • unmanaged distribution
    • A strong community is our challenge