High-Throughput Caching Strategies: Scaling MySQL for Ruby Application APIs
Benchmarking Baseline MySQL Performance
Before implementing any caching, a thorough understanding of the baseline performance is critical. This involves simulating realistic read loads against your MySQL database to identify bottlenecks. We’ll use sysbench for this purpose, focusing on its oltp-read-only workload.
Ensure you have sysbench installed on a separate machine from your database server to avoid resource contention. The test database should mirror your production schema and data volume as closely as possible. We’ll configure sysbench to use 16 threads, simulating a moderate concurrent read load.
Configuring Sysbench for Read-Only Load Testing
First, prepare the test database. This involves creating a schema and populating it with data. For a typical API scenario, a table with indexed columns that are frequently queried is representative.
Schema Preparation
Let’s assume a simple `products` table, common in e-commerce APIs.
SQL Schema Definition
CREATE TABLE products (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
category_id INT UNSIGNED,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY idx_category_id (category_id)
);
Sysbench OLTP-Read-Only Preparation
The following command prepares sysbench with 1 million rows in the `products` table. Adjust the `–tables` and `–table-size` parameters based on your production data scale.
Sysbench Preparation Command
sysbench oltp_read_only --db-driver=mysql \ --mysql-host=your_mysql_host \ --mysql-port=3306 \ --mysql-user=sysbench_user \ --mysql-password=sysbench_password \ --mysql-db=api_test_db \ --tables=1 \ --table-size=1000000 \ --threads=16 \ prepare
After preparation, run the read-only test for 60 seconds. Monitor the output for transactions per second (TPS) and query latency.
Sysbench Read-Only Test Command
sysbench oltp_read_only --db-driver=mysql \ --mysql-host=your_mysql_host \ --mysql-port=3306 \ --mysql-user=sysbench_user \ --mysql-password=sysbench_password \ --mysql-db=api_test_db \ --tables=1 \ --table-size=1000000 \ --threads=16 \ --time=60 \ run
Note down the average TPS and the 95th percentile latency. This is your baseline. For instance, if you observe low TPS and high latency, it indicates a database bottleneck under read load.
Implementing In-Memory Caching with Redis
Redis is an excellent choice for caching frequently accessed, relatively static data. For API endpoints that retrieve product details or lists, caching these responses can dramatically reduce database load.
Redis Setup and Configuration
A basic Redis setup is sufficient for caching. Ensure it’s accessible from your application servers. For production, consider Redis Sentinel for high availability or Redis Cluster for sharding.
Ruby Application Integration (Rails Example)
We’ll use the redis-rb gem to interact with Redis. The strategy involves checking Redis for cached data before querying MySQL. If data is found, return it; otherwise, fetch from MySQL, cache it in Redis, and then return it.
Gemfile Addition
# Gemfile gem 'redis'
Redis Client Initialization
# config/initializers/redis.rb $redis = Redis.new(host: 'your_redis_host', port: 6379, db: 0)
Caching Product Index Endpoint
# app/controllers/products_controller.rb
class ProductsController < ApplicationController
def index
cache_key = "products:all"
cached_products = $redis.get(cache_key)
if cached_products
render json: JSON.parse(cached_products)
else
@products = Product.all # Your ActiveRecord query
# Cache for 1 hour (3600 seconds)
$redis.setex(cache_key, 3600, @products.to_json)
render json: @products
end
end
def show
product_id = params[:id]
cache_key = "products:#{product_id}"
cached_product = $redis.get(cache_key)
if cached_product
render json: JSON.parse(cached_product)
else
@product = Product.find(product_id) # Your ActiveRecord query
# Cache for 1 hour (3600 seconds)
$redis.setex(cache_key, 3600, @product.to_json)
render json: @product
end
end
end
Benchmarking with Redis Caching
Rerun the sysbench test against your MySQL instance. You should observe a significant increase in TPS and a decrease in latency, especially for queries that are now served by Redis. The key is that the sysbench test still hits MySQL, but in a real application, Redis would intercept these reads.
To accurately measure the impact of Redis, you’d ideally instrument your application to track cache hit/miss rates and measure end-to-end API response times. However, observing reduced MySQL CPU/IO utilization during peak loads is a strong indicator of caching effectiveness.
Advanced Caching: Query Result Caching with Memcached
While Redis excels at object caching, Memcached is often preferred for raw query result caching due to its simplicity and performance characteristics for this specific use case. It’s a pure in-memory key-value store optimized for speed.
Memcached Setup
Similar to Redis, Memcached requires a running instance accessible by your application servers. For high availability, consider running multiple Memcached instances and using a consistent hashing algorithm in your application to distribute keys.
Ruby Application Integration (Rails Example)
We’ll use the dalli gem for Memcached integration.
Gemfile Addition
# Gemfile gem 'dalli'
Memcached Client Initialization
# config/initializers/memcached.rb
MEMCACHED_SERVERS = ['your_memcached_host:11211']
$memcached = Dalli::Client.new(MEMCACHED_SERVERS, {
namespace: 'api_cache',
compress: true # Enable compression for larger values
})
Caching Specific SQL Queries
# app/models/product.rb
class Product < ApplicationRecord
# ... other model code
def self.find_by_category_cached(category_id)
cache_key = "products:category:#{category_id}"
cached_data = $memcached.get(cache_key)
if cached_data
# Deserialize if necessary, assuming JSON for simplicity
return JSON.parse(cached_data)
else
# Execute the actual query
products = Product.where(category_id: category_id).to_a
# Cache for 15 minutes (900 seconds)
$memcached.set(cache_key, products.to_json, 900)
return products
end
end
def self.find_expensive_products_cached(limit: 10)
cache_key = "products:expensive:#{limit}"
cached_data = $memcached.get(cache_key)
if cached_data
return JSON.parse(cached_data)
else
# Example of a more complex query
products = Product.order(price: :desc).limit(limit).to_a
$memcached.set(cache_key, products.to_json, 900)
return products
end
end
end
# In your controller:
# class ProductsController < ApplicationController
# def by_category
# category_id = params[:category_id]
# @products = Product.find_by_category_cached(category_id)
# render json: @products
# end
# end
Cache Invalidation Strategies
The biggest challenge with caching is invalidation. When data in MySQL changes, the corresponding cache entries must be updated or removed. For the examples above:
- Product Update/Delete: After updating or deleting a product in MySQL, explicitly delete the relevant cache keys from Memcached (e.g.,
products:ID,products:category:CATEGORY_ID). - Bulk Updates: For operations that affect multiple cached items (e.g., updating all products in a category), consider a “flush” strategy for that specific cache key prefix or a slightly longer cache TTL (Time To Live) to tolerate stale data for a short period.
Example: Cache Invalidation on Save
# app/models/product.rb
class Product < ApplicationRecord
after_save :clear_product_caches
after_destroy :clear_product_caches
private
def clear_product_caches
# Invalidate specific product cache
$memcached.delete("products:#{self.id}")
# Invalidate category cache if category changed or product was deleted/updated
$memcached.delete("products:category:#{self.category_id}")
# Potentially invalidate other related caches (e.g., expensive products if price changed)
# This can become complex and might require a more robust invalidation system.
end
end
Database-Level Caching: Query Cache (Deprecated) and Buffer Pool
While application-level caching is generally more flexible and scalable, understanding MySQL’s internal caching mechanisms is crucial for holistic performance tuning.
MySQL Query Cache (Deprecated in 5.7, Removed in 8.0)
Historically, MySQL had a built-in query cache that stored the text of a SELECT statement together with the result set. It was effective for read-heavy workloads with identical queries but suffered from significant overhead due to invalidation on any table modification. Its performance was often a net negative in busy systems. If you are on an older version (pre-5.7), you might encounter it. It’s generally recommended to disable it.
Disabling Query Cache (if applicable)
[mysqld] query_cache_type = 0 query_cache_size = 0
Restart MySQL after making these changes.
InnoDB Buffer Pool
The InnoDB Buffer Pool is MySQL’s most critical cache. It caches data and indexes for InnoDB tables in memory. A well-sized buffer pool is paramount for performance. The goal is to have as much of your active dataset as possible reside in the buffer pool.
Tuning `innodb_buffer_pool_size`
A common recommendation is to set innodb_buffer_pool_size to 70-80% of your available RAM on a dedicated database server. Monitor buffer pool hit rate using `SHOW ENGINE INNODB STATUS;`.
Monitoring Buffer Pool Hit Rate
SHOW ENGINE INNODB STATUS\G
Look for the BUFFER POOL AND MEMORY section. The Buffer pool hit rate should ideally be above 99%. If it’s lower, consider increasing innodb_buffer_pool_size (if you have RAM available) or optimizing queries to access data more efficiently.
Choosing the Right Caching Strategy
The optimal caching strategy depends on your application’s access patterns and data volatility:
- Object Caching (Redis): Ideal for caching entire objects or collections of objects that are frequently read and infrequently modified. Think user profiles, product details, configuration settings.
- Query Result Caching (Memcached): Suitable for caching the results of specific, often complex, SQL queries. Useful when direct object mapping is less straightforward or when you need to cache raw tabular data.
- Database Buffer Pool (InnoDB): Essential for all InnoDB tables. Maximize its size to keep frequently accessed data and indexes in memory.
A multi-layered approach, combining application-level caching (Redis/Memcached) with a well-tuned InnoDB buffer pool, provides the most robust solution for scaling MySQL-backed APIs under high read loads.