Spring Boot (Java) vs. Axum (Rust): Optimizing for Low-Latency Financial Transaction APIs

Benchmarking Foundation: Defining the Transaction API

To rigorously compare Spring Boot and Axum for low-latency financial transaction APIs, we need a standardized, representative workload. This workload will simulate critical operations such as order placement, status checks, and trade cancellations. The key metrics will be request latency (p95, p99, and average) and throughput (requests per second) under varying load conditions. We’ll focus on a simplified but realistic API contract:

POST /orders: Place a new order.
GET /orders/{orderId}: Retrieve order status.
DELETE /orders/{orderId}: Cancel an existing order.

For this benchmark, we’ll assume a simple in-memory data store shared across request-handling threads, eliminating I/O bottlenecks that aren’t directly related to the web framework’s performance. The goal is to isolate the overhead introduced by the framework itself: its request handling, serialization/deserialization, and routing mechanisms.
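One way to put a number on that isolation is to time the store on its own, with no HTTP, JSON, or routing in the path; whatever a framework adds sits on top of this floor. The following is a minimal sketch under stated assumptions, not part of either implementation: `measure_store_floor` is a hypothetical helper, and `u64` keys stand in for UUID-keyed orders.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Runs `n` insert-then-lookup pairs against a mutex-guarded map and
/// returns (successful lookups, total elapsed time).
/// Hypothetical helper: u64 keys stand in for UUID-keyed orders.
fn measure_store_floor(n: u64) -> (u64, Duration) {
    let store: Mutex<HashMap<u64, u64>> = Mutex::new(HashMap::new());
    let mut hits = 0;
    let start = Instant::now();
    for key in 0..n {
        // Each statement takes and releases the lock, as a handler would.
        store.lock().unwrap().insert(key, key);
        if store.lock().unwrap().get(&key).is_some() {
            hits += 1;
        }
    }
    (hits, start.elapsed())
}
```

Dividing the elapsed time by `n` gives a rough per-operation cost for the store alone, which can be compared against end-to-end request latency to estimate how much of the total the framework accounts for.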

Spring Boot (Java) Implementation: Baseline Performance

We’ll start with a standard Spring Boot application using Spring WebFlux for reactive, non-blocking I/O, which is crucial for high-throughput, low-latency services. We’ll use Jackson for JSON processing and Netty as the underlying HTTP server.

Project Setup and Dependencies

A typical `pom.xml` for this scenario would include:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Core Application Logic

We’ll define simple POJOs for requests and responses and simulate data storage with an in-memory `ConcurrentHashMap` behind service methods that return `Mono`.

// Order.java
import lombok.Data;
import java.util.UUID;

@Data
public class Order {
    private UUID id;
    private String symbol;
    private double quantity;
    private double price;
    private String side; // BUY/SELL
    private String status; // NEW, FILLED, CANCELLED
}

// OrderRequest.java
import lombok.Data;

@Data
public class OrderRequest {
    private String symbol;
    private double quantity;
    private double price;
    private String side;
}

// OrderService.java
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class OrderService {
    private final Map<UUID, Order> orderStore = new ConcurrentHashMap<>();

    public Mono<Order> placeOrder(OrderRequest request) {
        Order order = new Order();
        order.setId(UUID.randomUUID());
        order.setSymbol(request.getSymbol());
        order.setQuantity(request.getQuantity());
        order.setPrice(request.getPrice());
        order.setSide(request.getSide());
        order.setStatus("NEW");
        orderStore.put(order.getId(), order);
        return Mono.just(order);
    }

    public Mono<Order> getOrder(UUID orderId) {
        Order order = orderStore.get(orderId);
        return Mono.justOrEmpty(order);
    }

    public Mono<Void> cancelOrder(UUID orderId) {
        // computeIfPresent runs atomically per key, so two concurrent
        // cancels cannot both observe status NEW.
        boolean[] cancelled = {false};
        orderStore.computeIfPresent(orderId, (id, order) -> {
            if ("NEW".equals(order.getStatus())) {
                order.setStatus("CANCELLED");
                cancelled[0] = true;
            }
            return order;
        });
        return cancelled[0]
                ? Mono.empty()
                : Mono.error(new IllegalArgumentException("Order not found or already processed"));
    }
}

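// OrderController.java
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.server.ResponseStatusException;
import reactor.core.publisher.Mono;
import java.util.UUID;

@RestController
@RequestMapping("/orders")
public class OrderController {

    private final OrderService orderService;

    // Constructor injection; Spring autowires a single constructor automatically.
    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    @ResponseStatus(HttpStatus.CREATED) // same 201 status as the Axum handler
    public Mono<Order> placeOrder(@RequestBody OrderRequest request) {
        return orderService.placeOrder(request);
    }

    @GetMapping("/{orderId}")
    public Mono<Order> getOrder(@PathVariable UUID orderId) {
        // An empty Mono would otherwise produce 200 with an empty body.
        return orderService.getOrder(orderId)
                .switchIfEmpty(Mono.error(
                        new ResponseStatusException(HttpStatus.NOT_FOUND, "Order not found")));
    }

    @DeleteMapping("/{orderId}")
    public Mono<Void> cancelOrder(@PathVariable UUID orderId) {
        // Map the service error to a 4xx instead of the default 500.
        return orderService.cancelOrder(orderId)
                .onErrorMap(IllegalArgumentException.class,
                        e -> new ResponseStatusException(HttpStatus.BAD_REQUEST, e.getMessage()));
    }
}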

Tuning for Performance

Key JVM and Spring Boot configurations for performance:

# application.properties
server.port=8080
spring.main.web-application-type=reactive
spring.web.resources.add-mappings=false

# JVM arguments (example for JDK 15+, where ZGC is a production feature; tune based on hardware and load)
# -XX:+UseZGC -XX:ConcGCThreads=4 -Xms4g -Xmx4g

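For production, consider tuning the number of Netty event loop (worker) threads. Reactor Netty reads it from the `reactor.netty.ioWorkerCount` system property, and the default is derived from the available processor count, which is not always the best fit for a latency-sensitive service. For tail latency, a garbage collector with low pause times, such as ZGC or Shenandoah, is strongly recommended.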

Axum (Rust) Implementation: High-Performance Alternative

Axum is a web framework built by the Tokio team, leveraging Rust’s performance characteristics and Tokio’s asynchronous runtime. Its focus on composability and zero-cost abstractions makes it a strong contender for ultra-low-latency services.

Project Setup and Dependencies

We’ll use Cargo for dependency management. The `Cargo.toml` will look something like this:

[package]
name = "axum-finance-api"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.7"
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
uuid = { version = "1.8", features = ["v4", "serde"] }
tower-http = { version = "0.5", features = ["trace"] }
tracing = "0.1"
tracing-subscriber = "0.3"
http = "1.1"

Core Application Logic

Rust’s strong typing and explicit error handling are key here. We’ll use `serde` for JSON serialization and `uuid` for identifiers.

// src/main.rs
use axum::{
    extract::{Path, State},
    http::StatusCode,
    response::IntoResponse,
    routing::{delete, get, post},
    Json, Router,
};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::net::TcpListener;
use uuid::Uuid;
use tower_http::trace::TraceLayer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

#[derive(Debug, Serialize, Deserialize, Clone)]
struct Order {
    id: Uuid,
    symbol: String,
    quantity: f64,
    price: f64,
    side: String,
    status: String,
}

#[derive(Debug, Deserialize)]
struct OrderRequest {
    symbol: String,
    quantity: f64,
    price: f64,
    side: String,
}

// Use a Mutex for interior mutability of the in-memory store.
// Arc allows sharing across threads safely.
type Db = Arc<Mutex<HashMap<Uuid, Order>>>;

#[tokio::main]
async fn main() {
    // Initialize tracing (logging)
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .init();

    let db: Db = Arc::new(Mutex::new(HashMap::new()));

    // Build our application with routes
    let app = Router::new()
        .route("/orders", post(place_order))
        .route("/orders/:id", get(get_order))
        .route("/orders/:id", delete(cancel_order))
        .layer(TraceLayer::new_for_http()) // Add request tracing
        .with_state(db); // Share the database state

    // Run the server
    let listener = TcpListener::bind("127.0.0.1:8080").await.unwrap();
    tracing::info!("Listening on {}", listener.local_addr().unwrap());
    axum::serve(listener, app).await.unwrap();
}

// --- Handlers ---

async fn place_order(
    State(db): State<Db>,
    Json(payload): Json<OrderRequest>,
) -> impl IntoResponse {
    let mut db_lock = db.lock().unwrap(); // Lock the mutex

    let order_id = Uuid::new_v4();
    let order = Order {
        id: order_id,
        symbol: payload.symbol,
        quantity: payload.quantity,
        price: payload.price,
        side: payload.side,
        status: "NEW".to_string(),
    };

    db_lock.insert(order_id, order.clone());
    (StatusCode::CREATED, Json(order))
}

async fn get_order(
    State(db): State<Db>,
    Path(order_id): Path<Uuid>,
) -> impl IntoResponse {
    let db_lock = db.lock().unwrap();
    // Convert both branches to `Response` so the if/else arms share one type.
    if let Some(order) = db_lock.get(&order_id) {
        (StatusCode::OK, Json(order.clone())).into_response()
    } else {
        (StatusCode::NOT_FOUND, Json("Order not found".to_string())).into_response()
    }
}

async fn cancel_order(
    State(db): State<Db>,
    Path(order_id): Path<Uuid>,
) -> impl IntoResponse {
    let mut db_lock = db.lock().unwrap();
    if let Some(order) = db_lock.get_mut(&order_id) {
        if order.status == "NEW" {
            order.status = "CANCELLED".to_string();
            (StatusCode::OK, Json("Order cancelled".to_string()))
        } else {
            (StatusCode::BAD_REQUEST, Json("Order already processed".to_string()))
        }
    } else {
        (StatusCode::NOT_FOUND, Json("Order not found".to_string()))
    }
}
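The handlers above compare the order status as a plain string. Since this section leans on Rust’s strong typing and explicit error handling, one possible refinement is to model the status as an enum and make the cancel rule a function that returns a `Result`. The following is a standalone sketch, not the code used in the benchmark; `OrderStatus`, `CancelError`, and `cancel` are hypothetical names introduced for illustration.

```rust
use std::fmt;

/// Order lifecycle states, replacing the "NEW" / "FILLED" / "CANCELLED" strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderStatus {
    New,
    Filled,
    Cancelled,
}

/// Why a cancel request was rejected (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
enum CancelError {
    AlreadyProcessed(OrderStatus),
}

impl fmt::Display for CancelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CancelError::AlreadyProcessed(status) => {
                write!(f, "order already processed (status: {:?})", status)
            }
        }
    }
}

impl OrderStatus {
    /// Only NEW orders may be cancelled, mirroring the check in `cancel_order`.
    fn cancel(self) -> Result<OrderStatus, CancelError> {
        match self {
            OrderStatus::New => Ok(OrderStatus::Cancelled),
            other => Err(CancelError::AlreadyProcessed(other)),
        }
    }
}
```

With this shape, a handler could match on the `Result` to choose between a success and an error status code, and the compiler would flag any state that a `match` forgets to handle. If the enum were used in the serialized `Order`, serde’s `rename_all = "UPPERCASE"` container attribute should keep the JSON values as `NEW`, `FILLED`, and `CANCELLED`.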

Tuning for Performance

Rust and Axum generally require less explicit tuning for raw performance due to their nature. However, key considerations include:

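Tokio Runtime Configuration: The number of worker threads in the Tokio runtime can be set with the TOKIO_WORKER_THREADS environment variable or programmatically, either through #[tokio::main(worker_threads = N)] or tokio::runtime::Builder::worker_threads. By default, the multi-threaded runtime starts one worker per available core. For CPU-bound tasks, matching the number of physical cores is often optimal.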
Serialization Performance: While serde_json is fast, for extreme cases, consider alternatives like simd-json or binary formats (e.g., MessagePack, Protobuf) if the API contract allows.
Connection Pooling: For database-bound applications (not this benchmark), robust connection pooling is critical.
Compiler Optimizations: Ensure builds are done in release mode (cargo build --release) to leverage full compiler optimizations.
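For the programmatic route to sizing the runtime, the thread count has to come from somewhere. The following std-only sketch shows one way to resolve it: an explicit override wins when it parses to a positive number, otherwise the value reported by the operating system is used. `resolve_worker_threads` is a hypothetical helper, and note that `available_parallelism` reports logical rather than physical cores.

```rust
use std::thread;

/// Picks a worker-thread count for the runtime.
/// A valid positive override wins; anything else falls back to the
/// parallelism the OS reports (logical cores), or 1 if that is unknown.
/// Hypothetical helper, not part of Tokio or Axum.
fn resolve_worker_threads(override_value: Option<&str>) -> usize {
    override_value
        .and_then(|raw| raw.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or_else(|| {
            thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1)
        })
}
```

The override could be read from an environment variable of your choosing (for example a made-up `BENCH_WORKER_THREADS`) and the result passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)` before building the runtime.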

Benchmarking Methodology and Tools

We will use wrk, a modern HTTP benchmarking tool, to simulate concurrent client requests. It’s known for its high performance and ability to generate significant load.

Test Setup

Both applications will be deployed on identical hardware (e.g., a dedicated cloud instance with sufficient CPU and RAM). Network latency between the client generating load and the server will be minimized.

`wrk` Configuration and Commands

We’ll run tests with varying numbers of threads, connections, and durations to observe performance under different load profiles. A typical command for placing orders might look like this:

# Example: Place orders with 100 threads, 1000 connections, for 30 seconds
wrk -t100 -c1000 -d30s --latency --script=./scripts/place_order.lua http://localhost:8080/orders

The Lua script (`scripts/place_order.lua`) would dynamically generate JSON payloads for each request:

-- scripts/place_order.lua
local symbols = {"AAPL", "GOOG", "MSFT", "AMZN"}
local sides = {"BUY", "SELL"}
local headers = {
    ["Content-Type"] = "application/json",
    ["Accept"] = "application/json"
}

-- wrk calls request() with no arguments, once per request.
function request()
    local symbol = symbols[math.random(#symbols)]
    local side = sides[math.random(#sides)]
    local quantity = math.random(1, 1000)
    local price = math.random(100, 2000) / 10.0

    local body = string.format(
        '{"symbol":"%s","quantity":%d,"price":%.1f,"side":"%s"}',
        symbol, quantity, price, side)

    -- wrk.format builds the raw HTTP request string.
    return wrk.format("POST", "/orders", headers, body)
end

-- The --latency output lists 50/75/90/99%; report p95 explicitly.
-- Latency values are in microseconds.
function done(summary, latency, requests)
    io.write(string.format("p95: %.2f ms\n", latency:percentile(95.0) / 1000))
end

Similar scripts would be created for GET and DELETE operations, potentially using IDs generated from previous POST requests or a pre-populated list.

Performance Analysis and Trade-offs

After running the benchmarks, we’ll analyze the results focusing on:

Latency Distribution: p95, p99, and average latencies for each endpoint and load level.
Throughput (RPS): Maximum sustainable requests per second before performance degrades significantly.
CPU and Memory Usage: Resource consumption of each application under load.
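If raw per-request latencies are exported from a load run, the summary figures above can be computed directly. The following is a minimal sketch, assuming latencies are recorded in microseconds and using the nearest-rank percentile method, which may differ slightly from how a given tool computes its own percentiles. `Summary`, `percentile`, and `summarize` are hypothetical names.

```rust
use std::time::Duration;

/// Summary figures for one benchmark run (hypothetical type).
#[derive(Debug, PartialEq)]
struct Summary {
    mean_us: f64,
    p95_us: u64,
    p99_us: u64,
    rps: f64,
}

/// Nearest-rank percentile over an ascending-sorted slice.
/// `pct` is a whole percentage (e.g. 95); returns None for empty input.
fn percentile(sorted: &[u64], pct: usize) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    // Integer ceiling of pct% of the sample count, clamped to a valid rank.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Computes mean, p95, p99, and throughput from latencies in microseconds
/// and the wall-clock duration of the run.
fn summarize(latencies_us: &[u64], elapsed: Duration) -> Option<Summary> {
    if latencies_us.is_empty() || elapsed.is_zero() {
        return None;
    }
    let mut sorted = latencies_us.to_vec();
    sorted.sort_unstable();
    let sum: u64 = sorted.iter().sum();
    Some(Summary {
        mean_us: sum as f64 / sorted.len() as f64,
        p95_us: percentile(&sorted, 95)?,
        p99_us: percentile(&sorted, 99)?,
        rps: sorted.len() as f64 / elapsed.as_secs_f64(),
    })
}
```

Computing the figures the same way for both implementations avoids comparing numbers that come from different percentile definitions.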

Expected Outcomes and Considerations:

Axum (Rust) is expected to exhibit lower raw latency and higher throughput due to ahead-of-time native compilation, minimal runtime overhead, and deterministic memory management. The absence of a garbage collector (GC) removes GC pauses as a source of tail-latency spikes, which is critical for financial systems.
Spring Boot (Java) with WebFlux, while highly performant for a JVM-based framework, will likely show somewhat higher latency, particularly at p99, due to JIT warm-up, GC pauses (even with low-pause GCs), and the inherent overhead of a managed runtime. JVM startup time affects deployment and scaling rather than steady-state request latency. However, its mature ecosystem, extensive tooling, and developer familiarity can be significant advantages.
Development Velocity: Java developers might find Spring Boot more familiar, leading to faster initial development. Rust’s learning curve can be steeper, but its safety features can prevent entire classes of bugs in production.
Operational Complexity: Both require careful monitoring. Java’s JVM tuning can be complex. Rust’s native compilation simplifies deployment but requires understanding Rust’s build system and tooling.
Ecosystem and Libraries: Spring Boot has a vast ecosystem. Rust’s ecosystem is growing rapidly, particularly in systems programming and performance-critical areas.

The choice between Spring Boot and Axum hinges on the specific latency requirements, team expertise, and tolerance for operational complexity. For absolute minimal latency and predictable performance, Axum is the stronger candidate. For teams prioritizing rapid development within a mature ecosystem, Spring Boot with WebFlux remains a powerful and viable option, provided GC tuning is addressed.
