Here are some quick notes from the session "Falcon from the Beginning" by Jim Starkey and Ann Harrison.

  • Why Falcon
    • Hardware is evolving rapidly and the world is changing, so Falcon is designed to take advantage of that
    • Customers need ACID transactions
  • Where hardware is going
    • CPUs breed like rabbits (more sockets, cores, threads/core)
    • Memory is bigger, faster and cheaper
    • Disks are bigger and cheaper but not much faster
    • In general boxes are getting cheaper
  • Where applications are going
    • batch – dead
    • timesharing – dead
    • departmental computing – dead
    • client server – fading fast
    • application servers for most of us
    • web services for the really big guys
  • Database Challenges
    • Traditional challenge: exhaust CPU, memory and disk simultaneously
  • Tradeoffs
    • use memory for a page cache to avoid disk reads
    • use a record cache to avoid page cache manipulation
    • use CPU to find the fastest path to record
    • use CPU to minimize record size
    • Synchronize most data structures with user mode read/write locks
    • Synchronize high contention data structures with interlocked instructions
  • Architecture
    • Incomplete in-memory db with disk backfill
    • Multi-version concurrency control in memory
    • Updates in memory until commit
    • Group commits to a single serial log write
    • post-commit multi-threaded pipeline to move updates to disk
  • Incomplete in-memory database
    • records cached in memory
    • separate cache for disk pages
    • a record cache hit costs about 15% of a page cache hit
    • record cache is more memory efficient than page cache
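The two-level lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not Falcon's code: the class name `TwoLevelCache`, the `(page_no, slot)` record address and the dictionary standing in for the disk are all assumptions made for the example, and eviction is left out entirely.

```python
class TwoLevelCache:
    """Record cache in front of a page cache in front of 'disk' (hypothetical sketch)."""

    def __init__(self, disk_pages):
        self.disk = disk_pages      # page_no -> list of records; stand-in for real disk I/O
        self.page_cache = {}        # page_no -> list of records
        self.record_cache = {}      # (page_no, slot) -> record
        self.stats = {"record_hits": 0, "page_hits": 0, "disk_reads": 0}

    def fetch(self, page_no, slot):
        key = (page_no, slot)
        # Cheapest path: the record itself is already cached.
        if key in self.record_cache:
            self.stats["record_hits"] += 1
            return self.record_cache[key]
        # Next: the page is cached, so the record must be located inside it.
        page = self.page_cache.get(page_no)
        if page is not None:
            self.stats["page_hits"] += 1
        else:
            # Most expensive path: go to disk and remember the page.
            page = self.disk[page_no]
            self.page_cache[page_no] = page
            self.stats["disk_reads"] += 1
        record = page[slot]
        self.record_cache[key] = record
        return record
```

Fetching the same record twice touches the page cache only the first time; the second fetch is answered from the record cache alone.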
  • Record Encoding – cache efficiency
    • records encoded by value, not declaration
    • string “abc” occupies the same space in varchar(3) or varchar(4096)
    • the number 7 takes the same space whether the column is declared smallint, mediumint, int, bigint, decimal or numeric
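To make "encoded by value, not declaration" concrete, here is a small sketch. The byte layout (a one-byte type tag plus a length prefix) is invented for this example and is not Falcon's actual on-disk or in-memory format; `encode_value` and `encode_column` are hypothetical names.

```python
def encode_value(value):
    """Encode a value using only as much space as the value itself needs."""
    if isinstance(value, int):
        # Enough bytes for the magnitude plus a sign bit, never fewer than one.
        size = (value.bit_length() + 8) // 8
        return b"I" + bytes([size]) + value.to_bytes(size, "big", signed=True)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        # Two-byte length prefix followed by the actual characters only.
        return b"S" + len(raw).to_bytes(2, "big") + raw
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_column(declared_type, value):
    """The declared column type is deliberately ignored: only the value matters."""
    return encode_value(value)
```

With this scheme `encode_column("smallint", 7)` and `encode_column("bigint", 7)` produce identical bytes, and "abc" is the same size in a varchar(3) as in a varchar(4096).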
  • MVCC
    • update ops create new record versions
    • the new version is tagged with an id and points to the old version
    • keeps track of which
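A version chain of this kind might look like the sketch below. It assumes the id on each version is the id of the transaction that created it, and it uses a deliberately simplified visibility rule (a reader sees its own versions plus those of transactions in a `committed` set). The notes do not spell out Falcon's real visibility rules, so treat `VersionedRecord` and its methods as hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RecordVersion:
    txn_id: int                               # assumed: id of the creating transaction
    data: Optional[str]
    prior: Optional["RecordVersion"] = None   # pointer to the older version


class VersionedRecord:
    def __init__(self) -> None:
        self.newest: Optional[RecordVersion] = None

    def update(self, txn_id: int, data: str) -> None:
        # An update never overwrites: it pushes a new version onto the chain.
        self.newest = RecordVersion(txn_id, data, prior=self.newest)

    def read(self, reader_txn: int, committed: Set[int]) -> Optional[str]:
        # Walk from newest to oldest until a version this reader may see.
        version = self.newest
        while version is not None:
            if version.txn_id == reader_txn or version.txn_id in committed:
                return version.data
            version = version.prior
        return None
```

For example, after transaction 1 (committed) writes "v1" and transaction 2 (still open) writes "v2", transaction 2 reads "v2" while every other reader still reads "v1".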
  • Updates are in memory
    • held in memory pending commit
    • index changes held in memory
    • verb rollback is dirt cheap
    • transaction rollback is dirt cheap
  • At commit
    • pending record updates flushed to serial log
    • pending index updates flushed to serial log
    • commit record written to serial log
    • serial log flushed to the oxide (that is, to disk)
    • and only then is the transaction committed
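The commit sequence above, together with the "updates held in memory" idea, can be sketched like this. It is a toy model under stated assumptions: the JSON-lines log format and the names `SerialLog`, `Transaction` and `demo` are invented for illustration, and group commit (several transactions sharing one log write) is not modelled.

```python
import json
import os
import tempfile


class SerialLog:
    """Append-only log file (hypothetical JSON-lines format)."""

    def __init__(self, path):
        self._file = open(path, "ab")

    def append(self, entry):
        self._file.write(json.dumps(entry).encode("utf-8") + b"\n")

    def flush_to_disk(self):
        self._file.flush()              # push Python's buffer to the OS
        os.fsync(self._file.fileno())   # ask the OS to push it to the device

    def close(self):
        self._file.close()


class Transaction:
    def __init__(self, txn_id, log):
        self.txn_id = txn_id
        self.log = log
        self.pending_records = []   # held in memory until commit
        self.pending_index = []     # index changes are held in memory too
        self.state = "active"

    def update_record(self, key, value):
        self.pending_records.append((key, value))

    def update_index(self, index, key):
        self.pending_index.append((index, key))

    def rollback(self):
        # Nothing has reached the log, so rollback just drops the in-memory lists.
        self.pending_records.clear()
        self.pending_index.clear()
        self.state = "rolled back"

    def commit(self):
        for key, value in self.pending_records:
            self.log.append({"type": "record", "txn": self.txn_id, "key": key, "value": value})
        for index, key in self.pending_index:
            self.log.append({"type": "index", "txn": self.txn_id, "index": index, "key": key})
        self.log.append({"type": "commit", "txn": self.txn_id})
        self.log.flush_to_disk()
        self.state = "committed"    # only after the log is safely on disk


def demo():
    """Commit one transaction and return the entry types found in the log, in order."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "serial.log")
        log = SerialLog(path)
        txn = Transaction(1, log)
        txn.update_record("r1", "hello")
        txn.update_index("idx_name", "hello")
        txn.commit()
        log.close()
        with open(path, "rb") as f:
            return [json.loads(line)["type"] for line in f]
```

Calling `demo()` shows the ordering: record updates first, then index updates, then the commit record, followed by a single flush.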
  • Memory is not infinite, so
    • large transactions chill uncommitted data (flush it to the log early)
    • chilled records can be thawed
    • scavenger garbage collects unloved records periodically
    • when things get really bad, entire record chains are flushed to the backlog
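As a rough picture of chilling and thawing, the sketch below keeps uncommitted data in memory up to a byte budget, writes it to the log once the budget is exceeded, and reads it back on demand. The budget, the `(offset, length)` bookkeeping and the name `ChillingTransaction` are assumptions for the example; the scavenger and the backlog are not modelled.

```python
import io


class ChillingTransaction:
    """Uncommitted updates that spill to the log when memory runs short (hypothetical)."""

    def __init__(self, log, max_pending_bytes):
        self.log = log                        # seekable binary stream standing in for the serial log
        self.max_pending_bytes = max_pending_bytes
        self.hot = {}                         # key -> bytes still held in memory
        self.chilled = {}                     # key -> (offset, length) inside the log
        self.hot_bytes = 0

    def update(self, key, data):
        # Replace any earlier pending version of this key.
        old = self.hot.pop(key, None)
        if old is not None:
            self.hot_bytes -= len(old)
        self.chilled.pop(key, None)
        self.hot[key] = data
        self.hot_bytes += len(data)
        if self.hot_bytes > self.max_pending_bytes:
            self.chill()

    def chill(self):
        # Write every in-memory update to the end of the log and keep only its location.
        self.log.seek(0, io.SEEK_END)
        for key, data in self.hot.items():
            offset = self.log.tell()
            self.log.write(data)
            self.chilled[key] = (offset, len(data))
        self.hot.clear()
        self.hot_bytes = 0

    def read(self, key):
        if key in self.hot:
            return self.hot[key]
        # Thaw: fetch the chilled bytes back from the log into memory.
        offset, length = self.chilled.pop(key)
        self.log.seek(offset)
        data = self.log.read(length)
        self.hot[key] = data
        self.hot_bytes += len(data)
        return data
```

With an 8-byte budget and an `io.BytesIO()` log, two 5-byte updates trigger a chill, and a later `read` of either key thaws just that record.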
  • Weaknesses
    • transactions are ACID but not serializable
    • latency advantage disappears at saturation
    • very large transactions degrade performance
    • optimized for web, not batch
  • Strengths
    • runs like a memory db when data fits
    • scales like disk-based db when db doesn’t fit in cache
    • lowest possible latency for web apps
    • absorbs huge spiky loads
  • Performance
    • benchmark compares InnoDB and Falcon only
    • DBT2 benchmark (what about sysbench?)
    • High contention
    • Write intensive – 40% of records touched are updated
    • measures only performance at saturation
  • DBT2 is InnoDB’s best spot and Falcon’s worst, so do not take the benchmark results at face value; decide based on what you need
  • When should you use what?
    • don’t need ACID? Then MyISAM is good
    • single processor, small memory – InnoDB is good
    • large transactions, batch inserts/updates, InnoDB is good
    • multiple cores, more memory, more threads: use Falcon
    • For web, Falcon is hard to beat