Here are quick notes from the session “Falcon from the Beginning” by Jim Starkey and Ann Harrison:

  • Why Falcon
    • Hardware is evolving rapidly and the world is changing, so Falcon is built to take advantage of that
    • Customers need ACID transactions
  • Where hardware is going
    • CPUs breed like rabbits (more sockets, cores, threads/core)
    • Memory is bigger, faster and cheaper
    • Disks are bigger and cheaper but not much faster
    • In general boxes are getting cheaper
  • Where applications are going
    • batch – dead
    • timesharing – dead
    • departmental computing – dead
    • client server – fading fast
    • application servers for most of us
    • web services for the really big guys
  • Database Challenges
    • Traditional challenge: exhaust CPU, memory and disk simultaneously
  • Tradeoffs
    • use memory for a page cache to avoid disk reads
    • use a record cache to avoid page cache manipulation
    • use CPU to find the fastest path to record
    • use CPU to minimize record size
    • Synchronize most data structures with user mode read/write locks
    • Synchronize high contention data structures with interlocked instructions
  • Architecture
    • Incomplete in-memory db with disk backfill
    • Multi-version concurrency control in memory
    • Updates in memory until commit
    • Group commits to a single serial log write
    • post-commit multi-threaded pipeline to move updates to disk
  • Incomplete in-memory database
    • records cached in memory
    • separate cache for disk pages
    • a record cache hit costs 15% of a page cache hit
    • record cache is more memory efficient than page cache
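The two-level caching idea above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Falcon's implementation: the class name `TwoTierStore`, the fixed `RECORDS_PER_PAGE` mapping and the LRU eviction policy are all hypothetical and chosen for illustration.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy model: a record cache in front of a page cache in front of 'disk'."""

    RECORDS_PER_PAGE = 10  # hypothetical record-to-page mapping

    def __init__(self, disk_pages, page_cache_size=2):
        self.disk_pages = disk_pages      # page_no -> {record_id: value}
        self.page_cache = OrderedDict()   # small LRU cache of whole pages
        self.record_cache = {}            # record_id -> value
        self.page_cache_size = page_cache_size
        self.stats = {"record_hit": 0, "page_hit": 0, "disk_read": 0}

    def fetch(self, record_id):
        # Cheapest path: the record itself is already in memory.
        if record_id in self.record_cache:
            self.stats["record_hit"] += 1
            return self.record_cache[record_id]
        page_no = record_id // self.RECORDS_PER_PAGE
        if page_no in self.page_cache:
            # Next best: the page is cached, but the record must be located in it.
            self.stats["page_hit"] += 1
            self.page_cache.move_to_end(page_no)
            page = self.page_cache[page_no]
        else:
            # Slowest path: backfill from disk, evicting the oldest page if full.
            self.stats["disk_read"] += 1
            page = self.disk_pages[page_no]
            self.page_cache[page_no] = page
            if len(self.page_cache) > self.page_cache_size:
                self.page_cache.popitem(last=False)
        value = page[record_id]
        self.record_cache[record_id] = value
        return value


disk = {0: {1: "alice", 2: "bob"}, 1: {11: "carol"}}
store = TwoTierStore(disk)
store.fetch(1)  # disk read
store.fetch(2)  # page cache hit
store.fetch(1)  # record cache hit
```

After the three fetches, `store.stats` records one access of each kind, which is the distinction the cost comparison in the notes relies on.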
  • Record Encoding – cache efficiency
    • records encoded by value, not declaration
    • string “abc” occupies the same space in varchar(3) or varchar(4096)
    • the number 7 is stored the same way whether the column is smallint, mediumint, int, bigint, decimal or numeric
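To make "encoded by value, not declaration" concrete, here is a minimal sketch of one way such an encoding could work. The byte layout (a one-byte length followed by the payload) and the function names `encode_int` and `encode_str` are assumptions for illustration; the notes do not describe Falcon's actual on-disk or in-memory format.

```python
def encode_int(value: int) -> bytes:
    """Encode a signed integer using only as many bytes as its value needs."""
    # Magnitude bits plus one sign bit, rounded up to whole bytes.
    magnitude = value if value >= 0 else ~value
    length = max(1, (magnitude.bit_length() + 8) // 8)
    return bytes([length]) + value.to_bytes(length, "big", signed=True)


def encode_str(value: str) -> bytes:
    """Encode a string as its byte length followed by its UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_int(len(data)) + data


# The declared column type never enters the encoding, so the size depends
# only on the value being stored.
seven = encode_int(7)         # same bytes whether the column is smallint or bigint
abc = encode_str("abc")       # same bytes for varchar(3) or varchar(4096)
big = encode_int(2 ** 40)     # larger values simply take more bytes
```

Under this scheme a small value in a wide column costs no more cache space than the same value in a narrow column.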
  • MVCC
    • update ops create new record versions
    • the new version is tagged with an id and points to the old version
    • keeps track of which
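The version chain described above can be sketched as a linked list of record versions. This is a hedged illustration only: the notes say each version is tagged with an id, and the sketch assumes that id is the creating transaction's id. The visibility rule (a reader sees the newest version whose creator is in its set of visible transactions) and the names `RecordVersion`, `update` and `visible_version` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RecordVersion:
    value: str
    txn_id: int                               # assumed: id of the creating transaction
    prior: Optional["RecordVersion"] = None   # pointer to the older version


def update(head: Optional[RecordVersion], value: str, txn_id: int) -> RecordVersion:
    """An update never overwrites; it pushes a new version onto the chain."""
    return RecordVersion(value, txn_id, prior=head)


def visible_version(head: Optional[RecordVersion],
                    visible_txns: Set[int]) -> Optional[RecordVersion]:
    """Walk from newest to oldest until a version this reader may see is found."""
    version = head
    while version is not None and version.txn_id not in visible_txns:
        version = version.prior
    return version


chain = update(None, "v1", txn_id=1)
chain = update(chain, "v2", txn_id=2)
old_reader = visible_version(chain, {1})      # reader that cannot see txn 2 yet
new_reader = visible_version(chain, {1, 2})   # reader that sees both
```

Readers never block writers in this model, because an older reader simply keeps following the chain to the version it is allowed to see.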
  • Updates are in memory
    • held in memory pending commit
    • index changes held in memory
    • verb rollback is dirt cheap
    • transaction rollback is dirt cheap
  • At commit
    • pending record updates flushed to serial log
    • pending index updates flushed to serial log
    • commit record written to serial log
    • serial log flushed to the oxide
    • and the transaction is committed
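The commit sequence above, together with the cheap rollback it makes possible, can be sketched as follows. This is a toy model under stated assumptions: the `SerialLog` and `Transaction` classes and the tuple layout of log entries are hypothetical, and a real engine would write to a file and fsync rather than move entries between lists.

```python
class SerialLog:
    """Stand-in for the serial log: a buffer plus a 'durable' area."""

    def __init__(self):
        self.buffer = []
        self.durable = []

    def append(self, entry):
        self.buffer.append(entry)

    def flush(self):
        # A real log would write and sync to disk here.
        self.durable.extend(self.buffer)
        self.buffer.clear()


class Transaction:
    def __init__(self, txn_id, log):
        self.txn_id = txn_id
        self.log = log
        self.pending_records = []   # record updates held in memory
        self.pending_index = []     # index updates held in memory
        self.committed = False

    def update(self, key, value):
        self.pending_records.append((key, value))
        self.pending_index.append(key)

    def rollback(self):
        # Nothing has reached the log or disk, so rollback just discards memory.
        self.pending_records.clear()
        self.pending_index.clear()

    def commit(self):
        for key, value in self.pending_records:
            self.log.append(("record", self.txn_id, key, value))
        for key in self.pending_index:
            self.log.append(("index", self.txn_id, key))
        self.log.append(("commit", self.txn_id))
        self.log.flush()
        self.committed = True


log = SerialLog()
aborted = Transaction(1, log)
aborted.update("a", 1)
aborted.rollback()              # leaves no trace in the log

done = Transaction(2, log)
done.update("b", 2)
done.commit()
```

The rolled-back transaction never touches the log, which is why rollback is so cheap in this design; only the committed transaction's entries become durable.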
  • Memory is not infinite, so
    • large transactions chill uncommitted data (flush it to the log early)
    • chilled records can be thawed
    • scavenger garbage collects unloved records periodically
    • when things get really bad, entire record chains are flushed to the backlog
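Chilling and thawing as described above might look roughly like this. It is a sketch only: the threshold-based trigger, the class name `ChillableTransaction` and the use of a list index as a log offset are assumptions made for illustration, not details from the talk.

```python
class ChillableTransaction:
    """Toy model of a transaction that spills uncommitted data to the log."""

    def __init__(self, chill_threshold=3):
        self.hot = {}        # key -> value, uncommitted and held in memory
        self.chilled = {}    # key -> offset of the value in the log
        self.log = []        # stand-in for the serial log
        self.chill_threshold = chill_threshold

    def update(self, key, value):
        self.chilled.pop(key, None)   # a new value supersedes a chilled copy
        self.hot[key] = value
        if len(self.hot) > self.chill_threshold:
            self._chill()

    def _chill(self):
        # Write uncommitted values to the log early and free the memory.
        for key, value in self.hot.items():
            self.chilled[key] = len(self.log)
            self.log.append((key, value))
        self.hot.clear()

    def read(self, key):
        if key in self.hot:
            return self.hot[key]
        if key not in self.chilled:
            raise KeyError(key)
        # Thaw: bring the value back from the log into memory.
        offset = self.chilled.pop(key)
        value = self.log[offset][1]
        self.hot[key] = value
        return value


txn = ChillableTransaction(chill_threshold=2)
txn.update("a", 1)
txn.update("b", 2)
txn.update("c", 3)            # exceeds the threshold, so everything is chilled
thawed = txn.read("b")        # thawed back into memory on demand
```

The trade-off is the one the notes list under weaknesses: a very large transaction keeps working, but it pays extra log traffic for every chill and thaw.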
  • Weakness
    • transactions are ACID but not serializable
    • latency advantage disappears at saturation
    • very large transactions degrade performance
    • optimized for web, not batch
  • Strengths
    • runs like a memory db when data fits
    • scales like disk-based db when db doesn’t fit in cache
    • lowest possible latency for web apps
    • absorbs huge spiky loads
  • Performance
    • benchmark compares InnoDB and Falcon only
    • DBT2 benchmark (what about sysbench?)
    • High contention
    • Write intensive – 40% of records touched are updated
    • measures only performance at saturation
  • DBT2 is InnoDB’s best spot and Falcon’s worst, so do not take the benchmark results at face value; decide based on what your workload needs
  • When should you use what?
    • don’t need ACID? Then MyISAM is good
    • single processor, small memory – InnoDB is good
    • large transactions, batch inserts/updates – InnoDB is good
    • multiple cores, more memory, more threads – use Falcon
    • For web, Falcon is hard to beat
