• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar
  • Home
  • Projects
  • Products
  • Themes
  • Tools
  • Request for Quote

Vengala Vinay

Having 9+ Years of Experience in Software Development

  • Home
  • WordPress
  • PHP
    • Codeigniter
  • Django
  • Magento
  • Selenium
  • Server
Home » How We Audited a High-Traffic Shopify Enterprise Stack on DigitalOcean and Mitigated Race conditions during high-concurrency payment processing

How We Audited a High-Traffic Shopify Enterprise Stack on DigitalOcean and Mitigated Race conditions during high-concurrency payment processing

Deep Dive: Auditing a High-Traffic Shopify Enterprise Stack on DigitalOcean

Our engagement involved a high-traffic Shopify Plus enterprise deployment hosted on DigitalOcean. The primary concern was a series of intermittent, yet critical, race conditions occurring during peak payment processing periods. These events led to duplicate orders, failed transactions, and significant customer dissatisfaction. The stack comprised a custom Shopify theme, several third-party Shopify Apps, a robust backend API layer built with Node.js and Express, a PostgreSQL database, and a Redis cache, all orchestrated within a Kubernetes cluster on DigitalOcean Kubernetes Engine (DOKE).

Identifying the Root Cause: Race Conditions in Payment Processing Logic

The initial investigation focused on the payment gateway integration and the Shopify webhook handlers. We observed that during high concurrency, multiple webhook events for the same order could be processed almost simultaneously. The critical section of code involved updating the order status and initiating fulfillment requests. A typical flow looked like this:

  • Shopify webhook `orders/paid` is received.
  • Backend API retrieves order details from Shopify.
  • Backend API checks if payment has already been processed (e.g., by querying its internal order status table).
  • If not processed, the API marks the order as paid, initiates fulfillment, and updates the internal status.
  • If processed, the webhook is ignored.

The race condition occurred when two `orders/paid` webhooks for the same order arrived concurrently. Both instances of the API handler would execute the check for payment status. If the database update hadn’t yet been committed by the first instance, the second instance would also see the order as unpaid and proceed to process the payment again, leading to duplicate fulfillment requests or incorrect inventory deductions.

Diagnostic Workflow and Tooling

Our diagnostic approach involved a multi-pronged strategy:

1. Log Analysis and Correlation

We leveraged a centralized logging system (ELK stack: Elasticsearch, Logstash, Kibana) to aggregate logs from all microservices and the Kubernetes cluster. Key metrics and log patterns we searched for included:

  • Timestamps of incoming `orders/paid` webhooks.
  • API request/response times for order status checks and updates.
  • Database transaction logs, specifically for order status modifications.
  • Any errors or exceptions related to payment processing or order updates.
  • Correlation IDs were crucial for tracing a single order’s lifecycle across different services.

A sample log snippet illustrating the problem:

{
  "timestamp": "2023-10-27T10:30:01.123Z",
  "level": "info",
  "message": "Received Shopify webhook",
  "webhook_topic": "orders/paid",
  "order_id": "gid://shopify/Order/1234567890",
  "correlation_id": "abc-123"
}
{
  "timestamp": "2023-10-27T10:30:01.150Z",
  "level": "info",
  "message": "Processing payment for order",
  "order_id": "gid://shopify/Order/1234567890",
  "correlation_id": "abc-123"
}
{
  "timestamp": "2023-10-27T10:30:01.160Z",
  "level": "info",
  "message": "Received Shopify webhook",
  "webhook_topic": "orders/paid",
  "order_id": "gid://shopify/Order/1234567890",
  "correlation_id": "def-456"
}
{
  "timestamp": "2023-10-27T10:30:01.180Z",
  "level": "info",
  "message": "Order status check: Not processed",
  "order_id": "gid://shopify/Order/1234567890",
  "correlation_id": "def-456"
}
{
  "timestamp": "2023-10-27T10:30:01.200Z",
  "level": "info",
  "message": "Processing payment for order",
  "order_id": "gid://shopify/Order/1234567890",
  "correlation_id": "def-456"
}

2. Application Performance Monitoring (APM)

We integrated Datadog APM to gain deep visibility into the Node.js application’s performance. This allowed us to:

  • Identify slow database queries.
  • Trace requests across microservices.
  • Pinpoint the exact lines of code responsible for the contention.
  • Monitor the latency of Shopify API calls.

3. Database-Level Analysis

For PostgreSQL, we enabled detailed logging and used tools like pg_stat_statements to identify long-running transactions and lock contention. We also analyzed the query plans for the order status update operations.

-- Example query to find slow queries
SELECT
    query,
    calls,
    total_time,
    rows
FROM
    pg_stat_statements
ORDER BY
    total_time DESC
LIMIT 10;

Mitigation Strategy: Implementing Atomic Operations and Idempotency

The core of the solution was to ensure that the critical section of updating order status and processing payments was atomic and idempotent. We explored several approaches:

1. Database-Level Locking (Optimistic and Pessimistic)

Initially, we considered pessimistic locking (e.g., `SELECT … FOR UPDATE`) on the order record in our internal database. However, this can lead to deadlocks and reduced throughput under high concurrency. Instead, we opted for an optimistic locking strategy combined with a unique transaction identifier.

2. Idempotent Webhook Processing with Unique Transaction IDs

The most effective solution involved making the webhook handler idempotent. Each incoming webhook request from Shopify was assigned a unique `X-Shopify-Webhook-Id` header. We stored this ID alongside the order processing record in our database. Before processing any payment-related action, we would check if a record with that specific `X-Shopify-Webhook-Id` already existed.

Here’s a conceptual Node.js snippet demonstrating this:

const express = require('express');
const bodyParser = require('body-parser');
const crypto = require('crypto'); // For HMAC verification (essential for security)
const db = require('./db'); // Your database connection module

const app = express();
app.use(bodyParser.json());

// Middleware for HMAC verification (crucial for security)
const verifyShopifyWebhook = (req, res, next) => {
    const hmacHeader = req.headers['x-shopify-hmac-sha256'];
    const sharedSecret = process.env.SHOPIFY_SHARED_SECRET; // Load from environment variables
    const body = req.rawBody; // Ensure raw body is available

    if (!hmacHeader || !sharedSecret) {
        return res.status(400).send('HMAC header or shared secret missing.');
    }

    const generatedHash = crypto.createHmac('sha256', sharedSecret).update(body).digest('base64');

    if (!crypto.timingSafeEqual(Buffer.from(hmacHeader), Buffer.createHmac('sha256', sharedSecret).update(body).digest('base64'))) {
        return res.status(401).send('HMAC verification failed.');
    }
    next();
};

// Middleware to capture raw body for HMAC verification
app.use((req, res, next) => {
    req.rawBody = '';
    req.on('data', (chunk) => {
        req.rawBody += chunk;
    });
    req.on('end', () => {
        next();
    });
});

// Endpoint for orders/paid webhooks
app.post('/webhooks/orders/paid', verifyShopifyWebhook, async (req, res) => {
    const webhookId = req.headers['x-shopify-webhook-id'];
    const orderData = req.body; // Parsed JSON body
    const orderId = orderData.id; // Shopify's internal order ID

    try {
        // 1. Check if this webhook has already been processed using its unique ID
        const existingWebhook = await db.getWebhookById(webhookId);
        if (existingWebhook) {
            console.log(`Webhook ${webhookId} for order ${orderId} already processed. Skipping.`);
            return res.status(200).send('Already processed.');
        }

        // 2. Start a database transaction for atomicity
        await db.beginTransaction();

        // 3. Check internal order status to prevent duplicate processing if webhookId check fails (belt-and-suspenders)
        const internalOrder = await db.getOrderById(orderId);
        if (internalOrder && internalOrder.status === 'paid') {
            console.warn(`Order ${orderId} already marked as paid internally. Recording webhook ${webhookId}.`);
            // Still record the webhook to prevent future processing if this was a race condition
            await db.recordWebhook(webhookId, orderId, 'duplicate_internal_status');
            await db.commitTransaction();
            return res.status(200).send('Already paid internally.');
        }

        // 4. Mark order as paid and initiate fulfillment
        await db.updateOrderStatus(orderId, 'paid');
        await db.createFulfillmentRequest(orderId, orderData.line_items); // Example

        // 5. Record the webhook ID to ensure idempotency
        await db.recordWebhook(webhookId, orderId, 'success');

        // 6. Commit the transaction
        await db.commitTransaction();

        console.log(`Successfully processed payment for order ${orderId} with webhook ${webhookId}.`);
        res.status(200).send('Webhook processed successfully.');

    } catch (error) {
        await db.rollbackTransaction(); // Rollback on any error
        console.error(`Error processing webhook ${webhookId} for order ${orderId}:`, error);
        // Depending on the error, you might want to return 500 or retry later
        res.status(500).send('Internal Server Error.');
    }
});

// Placeholder database functions (implement these in ./db.js)
// db.getWebhookById(webhookId)
// db.getOrderById(orderId)
// db.updateOrderStatus(orderId, status)
// db.createFulfillmentRequest(orderId, lineItems)
// db.recordWebhook(webhookId, orderId, status)
// db.beginTransaction()
// db.commitTransaction()
// db.rollbackTransaction()

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`Server listening on port ${PORT}`);
});

The database schema for tracking webhooks would be simple:

CREATE TABLE shopify_webhook_log (
    webhook_id VARCHAR(255) PRIMARY KEY,
    order_id VARCHAR(255) NOT NULL,
    processed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) NOT NULL,
    -- other relevant fields
);

CREATE INDEX idx_shopify_webhook_log_order_id ON shopify_webhook_log (order_id);

This approach ensures that even if multiple `orders/paid` webhooks arrive for the same order concurrently, only the first one to successfully insert its `webhook_id` into the `shopify_webhook_log` table and complete the transaction will proceed. Subsequent attempts with the same `webhook_id` will be immediately rejected.

3. Transactional Integrity and Database Transactions

Within the Node.js application, we enforced transactional integrity using PostgreSQL’s transaction capabilities. All operations related to updating order status, creating fulfillment requests, and logging the webhook processing were wrapped in a single database transaction. This guarantees that either all operations succeed, or none of them do, preventing partial updates and maintaining data consistency.

Deployment and Monitoring on DigitalOcean

The updated application was deployed to the existing DigitalOcean Kubernetes Engine (DOKE) cluster. Key considerations for deployment and ongoing monitoring included:

1. Kubernetes Configuration

We ensured that the webhook processing service had appropriate resource limits and requests defined in its Kubernetes deployment manifests to handle peak loads. Horizontal Pod Autoscaler (HPA) was configured to scale the number of pods based on CPU utilization or custom metrics (e.g., queue depth if using a message queue).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
spec:
  replicas: 3 # Initial replicas
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
      - name: app
        image: your-docker-repo/payment-processor:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: payment-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor
  minReplicas: 3
  maxReplicas: 15
  targetCPUUtilizationPercentage: 70

2. DigitalOcean Load Balancer Configuration

The DigitalOcean Load Balancer was configured to distribute incoming webhook traffic across the available pods of the payment processing service. Health checks were essential to ensure traffic was only sent to healthy instances.

# Example DigitalOcean Load Balancer Health Check Configuration (Conceptual)
health_check:
  protocol: "HTTP"
  port: 3000
  path: "/healthz" # A simple /healthz endpoint in the app
  interval_seconds: 15
  timeout_seconds: 5
  unhealthy_threshold: 3
  healthy_threshold: 2

3. Enhanced Monitoring and Alerting

Post-deployment, we intensified monitoring. Alerts were configured in Datadog for:

  • Increased error rates in the payment processing service.
  • High latency in database transactions related to order updates.
  • Failed health checks on the payment processing pods.
  • Any duplicate `webhook_id` entries detected in the `shopify_webhook_log` table (indicating a potential bypass of the idempotency check).

A specific alert was set up to monitor the rate of `duplicate_internal_status` entries in the `shopify_webhook_log` table, which would signal a potential regression or a new race condition scenario.

Conclusion and Future Considerations

By systematically auditing the stack, identifying the precise race condition in the payment processing logic, and implementing a robust idempotency mechanism using unique webhook IDs and database transactions, we successfully stabilized the high-concurrency payment flow. The use of DigitalOcean’s managed Kubernetes services and integrated monitoring tools facilitated both the diagnosis and the ongoing operational stability of the solution. Future considerations include exploring Shopify’s new Order Editing API for more granular control and potentially offloading more complex order state management to a dedicated event-driven architecture using services like Kafka or RabbitMQ for even greater resilience.

Primary Sidebar

A little about the Author

Having 9+ Years of Experience in Software Development.
Expertised in Php Development, WordPress Custom Theme Development (From scratch using underscores or Genesis Framework or using any blank theme or Premium Theme), Custom Plugin Development. Hands on Experience on 3rd Party Php Extension like Chilkat, nSoftware.

Recent Posts

  • Disaster Recovery 101: Architecting Auto-Failovers for Redis and PHP Deployments on OVH
  • How We Audited a High-Traffic WooCommerce Enterprise Stack on Google Cloud and Mitigated Race conditions during high-concurrency payment processing
  • Disaster Recovery 101: Architecting Auto-Failovers for Elasticsearch and Magento 2 Deployments on DigitalOcean
  • An Auditor’s Checklist for Securing WordPress Backends on OVH
  • Step-by-Step: Diagnosing Perl script high CPU throttling due to unoptimized regular expressions on AWS Servers

Copyright © 2026 · Vinay Vengala