Infrastructure as Code: Provisioning Secure Shopify Clusters on Google Cloud Using Terraform
Terraform Project Structure for Shopify on GCP
A robust Infrastructure as Code (IaC) strategy is essential for deploying and managing a complex, scalable, and secure environment such as a multi-cluster Shopify deployment on Google Cloud Platform (GCP). We use Terraform for this and organize the project for reusability, maintainability, and a clear separation of concerns. The structure targets a production-grade setup covering networking, security, compute, and database resources.
Each environment (e.g., `dev`, `staging`, `prod`) has its own directory under `environments/`, with its own configuration and state. Every environment composes the same reusable modules from `modules/`, which keeps environments isolated from one another while still allowing granular control over what each one provisions.
Directory Layout
A common and effective directory structure for Terraform projects is as follows:
- `.` (root directory)
  - `main.tf`: Root-level provider configurations and backend setup.
  - `variables.tf`: Global input variables.
  - `outputs.tf`: Global output values.
  - `README.md`: Project overview and instructions.
- `environments/`
  - `dev/`
    - `main.tf`: Environment-specific resource definitions.
    - `variables.tf`: Environment-specific input variables.
    - `outputs.tf`: Environment-specific output values.
    - `backend.tf`: Environment-specific Terraform state backend configuration.
  - `staging/` (similar structure to `dev/`)
  - `prod/` (similar structure to `dev/`)
- `modules/`
  - `gcp-network/`
    - `main.tf`: VPC, subnets, firewall rules.
    - `variables.tf`: Network configuration variables.
    - `outputs.tf`: Network output values.
  - `gcp-gke/`
    - `main.tf`: GKE cluster definition, node pools.
    - `variables.tf`: GKE configuration variables.
    - `outputs.tf`: GKE output values.
  - `gcp-sql/`
    - `main.tf`: Cloud SQL instance (PostgreSQL/MySQL) definition.
    - `variables.tf`: Cloud SQL configuration variables.
    - `outputs.tf`: Cloud SQL output values.
  - `gcp-redis/`
    - `main.tf`: Memorystore for Redis instance definition.
    - `variables.tf`: Redis configuration variables.
    - `outputs.tf`: Redis output values.
  - `gcp-loadbalancer/`
    - `main.tf`: Global or regional load balancer, SSL certificates.
    - `variables.tf`: Load balancer configuration variables.
    - `outputs.tf`: Load balancer output values.
  - `gcp-iam/`
    - `main.tf`: Service accounts, IAM policies.
    - `variables.tf`: IAM configuration variables.
    - `outputs.tf`: IAM output values.
Root Configuration: `main.tf`
The root `main.tf` configures the Terraform backend and the GCP provider. For production environments, a remote backend such as Google Cloud Storage (GCS) is essential: it stores state centrally and supports state locking, so team members cannot apply conflicting changes at the same time. The bucket must already exist before `terraform init` is run.
GCS Backend Configuration
# main.tf (Root)
terraform {
backend "gcs" {
bucket = "your-terraform-state-bucket-name" # Replace with your GCS bucket name
prefix = "terraform/state"
}
required_providers {
google = {
source = "hashicorp/google"
version = "~> 4.0" # Specify a version constraint
}
}
}
provider "google" {
project = var.gcp_project_id
region = var.gcp_region
}
variable "gcp_project_id" {
description = "The GCP project ID to deploy resources into."
type = string
}
variable "gcp_region" {
description = "The GCP region for most resources."
type = string
default = "us-central1"
}
output "gcs_state_bucket" {
description = "The GCS bucket used for Terraform state."
value = "your-terraform-state-bucket-name"
}
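The directory layout gives each environment its own `backend.tf`. As a minimal sketch, assuming the same placeholder bucket as the root example and a hypothetical per-environment prefix (`env/prod` is an illustrative choice, not a required naming scheme), `environments/prod/backend.tf` might look like this:

```hcl
# environments/prod/backend.tf (illustrative sketch)
terraform {
  backend "gcs" {
    bucket = "your-terraform-state-bucket-name" # Placeholder: same bucket as the root example
    prefix = "env/prod"                         # Hypothetical prefix: use a distinct one per environment
  }
}
```

Backend blocks cannot reference input variables, which is why the values are literals here. They can instead be supplied at initialization time, for example with `terraform init -backend-config="prefix=env/prod"`.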
Environment-Specific Deployment: `environments/prod/main.tf`
Each environment directory will define the specific resources needed for that deployment by calling the reusable modules. This is where you’ll set environment-specific variables and orchestrate the overall infrastructure.
Production Environment Orchestration
# environments/prod/main.tf
# --- Networking ---
module "gcp_network_prod" {
source = "../../modules/gcp-network"
project_id = var.gcp_project_id
region = var.gcp_region
vpc_name = "shopify-prod-vpc"
subnet_names = ["shopify-prod-app-subnet", "shopify-prod-db-subnet"]
subnet_cidrs = ["10.10.0.0/20", "10.10.16.0/24"]
firewall_rules = [
{
name = "allow-ssh-prod"
description = "Allow SSH from bastion host"
direction = "INGRESS"
priority = 1000
allowed = ["tcp:22"]
source_ranges = ["192.168.1.0/24"] # Replace with your bastion host CIDR
target_tags = ["shopify-app-server"]
},
{
name = "allow-http-https-prod"
description = "Allow HTTP/HTTPS from Load Balancer"
direction = "INGRESS"
priority = 1000
allowed = ["tcp:80", "tcp:443"]
source_ranges = ["130.211.0.0/22", "35.191.0.0/16"] # Google Cloud load balancer and health check source ranges
target_tags = ["shopify-app-server"]
}
]
}
# --- GKE Cluster ---
module "gke_cluster_prod" {
source = "../../modules/gcp-gke"
project_id = var.gcp_project_id
region = var.gcp_region
cluster_name = "shopify-prod-cluster"
network = module.gcp_network_prod.vpc_self_link
subnetwork = module.gcp_network_prod.subnet_self_links["shopify-prod-app-subnet"]
node_count = 3
machine_type = "e2-standard-4"
disk_size_gb = 100
enable_autoscaling = true
min_node_count = 2
max_node_count = 10
release_channel = "REGULAR" # Or "RAPID", "STABLE"
private_cluster = true
master_ipv4_cidr = "10.10.32.0/28" # Private IP range for the control plane; GKE requires a /28
ip_allocation_policy = {
cluster_ipv4_cidr_block = "10.10.64.0/20"
services_ipv4_cidr_block = "10.10.80.0/24"
}
}
# --- Cloud SQL (PostgreSQL for Shopify) ---
module "cloud_sql_prod" {
source = "../../modules/gcp-sql"
project_id = var.gcp_project_id
region = var.gcp_region
instance_name = "shopify-prod-db"
database_version = "POSTGRES_14" # PostgreSQL 14; use the engine and version your application requires
tier = "db-custom-2-7680" # Adjust based on expected load
disk_size = 100 # GB
disk_autoresize = true
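# Note: authorized networks apply only to public IP connections. The gcp-sql module
# disables the public IP, so this list has no effect while private IP is in use.
authorized_networks = [
{
name = "app-subnet-access"
value = module.gcp_network_prod.subnet_cidrs["shopify-prod-app-subnet"] # Already a CIDR; do not append a prefix length
}
]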
private_ip_enabled = true
network = module.gcp_network_prod.vpc_self_link
# For production, consider using a dedicated service account with minimal privileges
# service_account_email = module.gcp_iam_prod.shopify_db_sa_email
}
# --- Memorystore for Redis ---
module "redis_prod" {
source = "../../modules/gcp-redis"
project_id = var.gcp_project_id
region = var.gcp_region
instance_name = "shopify-prod-cache"
memory_size_gb = 5
redis_version = "REDIS_6"
network = module.gcp_network_prod.vpc_self_link
reserved_ip_range = "10.10.96.0/24" # Ensure this range is available and not overlapping
}
# --- Load Balancer ---
module "gcp_load_balancer_prod" {
source = "../../modules/gcp-loadbalancer"
project_id = var.gcp_project_id
region = var.gcp_region
load_balancer_name = "shopify-prod-lb"
network = module.gcp_network_prod.vpc_self_link
backend_service_name = "shopify-prod-backend"
health_check_path = "/healthz" # Define a health check endpoint for your Shopify app
target_tags = ["shopify-app-server"] # Network tag expected on the GKE nodes (tags apply to nodes, not pods)
ssl_certificate_name = "shopify-prod-ssl-cert"
ssl_certificate_domain = "your-shopify-domain.com" # Replace with your domain
frontend_port = 443
backend_port = 8080 # Port your Shopify app listens on within GKE
}
# --- IAM ---
module "gcp_iam_prod" {
source = "../../modules/gcp-iam"
project_id = var.gcp_project_id
# Define service accounts needed for GKE nodes, database access, etc.
# Example:
# gke_node_sa_name = "shopify-prod-gke-node-sa"
# db_access_sa_name = "shopify-prod-db-access-sa"
}
# --- Environment Variables ---
variable "gcp_project_id" {
description = "The GCP project ID for the production environment."
type = string
}
variable "gcp_region" {
description = "The GCP region for the production environment."
type = string
default = "us-central1"
}
# --- Outputs ---
output "gke_cluster_endpoint_prod" {
description = "Endpoint for the production GKE cluster."
value = module.gke_cluster_prod.endpoint
}
output "cloud_sql_instance_name_prod" {
description = "Name of the production Cloud SQL instance."
value = module.cloud_sql_prod.instance_name
}
output "redis_instance_name_prod" {
description = "Name of the production Redis instance."
value = module.redis_prod.instance_name
}
output "load_balancer_ip_prod" {
description = "IP address of the production load balancer."
value = module.gcp_load_balancer_prod.ip_address
}
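The production configuration above indexes `subnet_self_links` and `subnet_cidrs` by subnet name, which only works if the network module exposes those outputs as maps. The following is a rough sketch of how `modules/gcp-network/main.tf` could do that. It is not the author's module: the resource names (`vpc`, `subnets`), the `zipmap` pairing, and the loosely typed `firewall_rules` variable are all assumptions made for illustration, and firewall rule creation is left out for brevity.

```hcl
# modules/gcp-network/main.tf (illustrative sketch; names are assumptions)
variable "project_id" {
  type = string
}

variable "region" {
  type = string
}

variable "vpc_name" {
  type = string
}

variable "subnet_names" {
  type = list(string)
}

variable "subnet_cidrs" {
  type = list(string)
}

variable "firewall_rules" {
  description = "Accepted so the module call validates; rule creation is omitted in this sketch."
  type        = any
  default     = []
}

locals {
  # Pair each subnet name with its CIDR, e.g. { "shopify-prod-app-subnet" = "10.10.0.0/20" }
  subnets = zipmap(var.subnet_names, var.subnet_cidrs)
}

resource "google_compute_network" "vpc" {
  name                    = var.vpc_name
  project                 = var.project_id
  auto_create_subnetworks = false # Subnets are defined explicitly below
}

resource "google_compute_subnetwork" "subnets" {
  for_each = local.subnets

  name          = each.key
  project       = var.project_id
  region        = var.region
  network       = google_compute_network.vpc.id
  ip_cidr_range = each.value
}

output "vpc_self_link" {
  value = google_compute_network.vpc.self_link
}

# Maps keyed by subnet name, matching the lookups in environments/prod/main.tf
output "subnet_self_links" {
  value = { for name, subnet in google_compute_subnetwork.subnets : name => subnet.self_link }
}

output "subnet_cidrs" {
  value = { for name, subnet in google_compute_subnetwork.subnets : name => subnet.ip_cidr_range }
}
```

Using `for_each` over a name-to-CIDR map, rather than `count` over two parallel lists, means that adding or removing a subnet does not shift the addresses of the others in state.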
Module: `modules/gcp-gke/main.tf`
This module encapsulates the creation of a Google Kubernetes Engine (GKE) cluster. It’s designed to be configurable for different environments, supporting features like private clusters, node autoscaling, and specific release channels.
# modules/gcp-gke/main.tf
resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region
  project  = var.project_id

  # A cluster cannot be created without a node pool, so we create the smallest
  # possible default pool and remove it immediately. The real node pool is the
  # separate google_container_node_pool resource below.
  remove_default_node_pool = true
  initial_node_count       = 1

  # Networking
  network    = var.network
  subnetwork = var.subnetwork

  # Private cluster configuration, applied only when var.private_cluster is true
  dynamic "private_cluster_config" {
    for_each = var.private_cluster ? [1] : []
    content {
      enable_private_nodes    = true
      enable_private_endpoint = false # Set to true to reach the control plane only via private IP
      master_ipv4_cidr_block  = var.master_ipv4_cidr # Must be a /28 range
    }
  }

  # IP allocation for pods and services
  ip_allocation_policy {
    cluster_ipv4_cidr_block  = var.ip_allocation_policy.cluster_ipv4_cidr_block
    services_ipv4_cidr_block = var.ip_allocation_policy.services_ipv4_cidr_block
  }

  # Release channel for automatic upgrades
  release_channel {
    channel = var.release_channel
  }

  # Enable network policy for enhanced security (e.g., Calico)
  network_policy {
    enabled = true
  }

  # Enable workload identity for secure access to GCP services
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Other configurations like logging, monitoring, etc. can be added here.
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"
}

# Node pool managed as its own resource, so it can change without touching the cluster.
# In a more complex setup, you might define multiple node pools.
resource "google_container_node_pool" "default" {
  name     = "default-node-pool"
  project  = var.project_id
  location = var.region
  cluster  = google_container_cluster.primary.name

  # In a regional cluster this count applies to each zone.
  initial_node_count = var.node_count

  # To pin zones for HA, set node_locations to a list of zones (not a region),
  # e.g. ["us-central1-a", "us-central1-b"].

  # Autoscaling configuration, applied only when var.enable_autoscaling is true
  dynamic "autoscaling" {
    for_each = var.enable_autoscaling ? [1] : []
    content {
      min_node_count = var.min_node_count
      max_node_count = var.max_node_count
    }
  }

  # Management configuration
  management {
    auto_repair  = true
    auto_upgrade = true
  }

  # Machine settings belong in node_config, not directly in the node pool
  node_config {
    machine_type = var.machine_type
    disk_size_gb = var.disk_size_gb
    # Optional: Specify a service account for nodes
    # service_account = var.node_service_account_email
  }
}
# Output the cluster endpoint
output "endpoint" {
description = "The IP address of the GKE cluster master."
value = google_container_cluster.primary.endpoint
}
# Output the cluster name
output "name" {
description = "The name of the GKE cluster."
value = google_container_cluster.primary.name
}
# Output the cluster location
output "location" {
description = "The location of the GKE cluster."
value = google_container_cluster.primary.location
}
# Define input variables for the module
variable "project_id" {
description = "The GCP project ID."
type = string
}
variable "region" {
description = "The GCP region for the cluster."
type = string
}
variable "cluster_name" {
description = "The name of the GKE cluster."
type = string
}
variable "network" {
description = "The VPC network to host the GKE cluster in."
type = string
}
variable "subnetwork" {
description = "The subnetwork to host the GKE cluster in."
type = string
}
variable "node_count" {
description = "Initial number of nodes in the default node pool."
type = number
default = 1
}
variable "machine_type" {
description = "The machine type for the GKE nodes."
type = string
default = "e2-medium"
}
variable "disk_size_gb" {
description = "The disk size for the GKE nodes."
type = number
default = 100
}
variable "enable_autoscaling" {
description = "Whether to enable node autoscaling."
type = bool
default = false
}
variable "min_node_count" {
description = "Minimum number of nodes for autoscaling."
type = number
default = 1
}
variable "max_node_count" {
description = "Maximum number of nodes for autoscaling."
type = number
default = 5
}
variable "release_channel" {
description = "The release channel for GKE cluster upgrades."
type = string
default = "REGULAR"
validation {
condition = contains(["RAPID", "REGULAR", "STABLE"], var.release_channel)
error_message = "Valid release channels are RAPID, REGULAR, or STABLE."
}
}
variable "private_cluster" {
description = "Whether to create a private GKE cluster."
type = bool
default = false
}
variable "master_ipv4_cidr" {
description = "The IP range for the GKE master private endpoint."
type = string
default = "10.0.0.0/28" # Example, ensure this is not overlapping
}
variable "ip_allocation_policy" {
description = "IP allocation policy for pods and services."
type = object({
cluster_ipv4_cidr_block = string
services_ipv4_cidr_block = string
})
default = {
cluster_ipv4_cidr_block = "10.0.16.0/20"
services_ipv4_cidr_block = "10.0.32.0/24"
}
}
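The cluster enables Workload Identity, but pods only gain access to GCP services once a Kubernetes service account is bound to a Google service account. A minimal sketch of that binding follows, of the kind that could live in the `gcp-iam` module. The account ID `shopify-app` and the Kubernetes namespace and service account `shopify/shopify-app` are hypothetical names chosen for illustration.

```hcl
# Illustrative sketch: bind a Kubernetes service account to a Google service account.
variable "project_id" {
  description = "The GCP project ID."
  type        = string
}

resource "google_service_account" "app" {
  project      = var.project_id
  account_id   = "shopify-app" # Hypothetical name
  display_name = "Application workload identity"
}

# Allow the Kubernetes service account "shopify-app" in namespace "shopify"
# (both hypothetical) to impersonate the Google service account above.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[shopify/shopify-app]"
}
```

On the Kubernetes side, the service account also needs the annotation `iam.gke.io/gcp-service-account` set to the Google service account's email for the binding to take effect.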
Module: `modules/gcp-sql/main.tf`
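This module provisions a Google Cloud SQL instance, configured here for PostgreSQL. Security is a key consideration: the public IP is disabled and the instance is reachable only over private IP from the VPC. Access is therefore governed by the VPC's private connection and network-level controls rather than by authorized networks, which apply only to public IP connections.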
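A Cloud SQL instance can only receive a private IP if the VPC has a private services access connection. A rough sketch of that prerequisite is shown below. It assumes the connection is created alongside the instance, and the address name `shopify-prod-sql-range` and the `/16` prefix length are illustrative choices rather than values from the original configuration.

```hcl
# Illustrative sketch: private services access for Cloud SQL private IP.
variable "project_id" {
  description = "The GCP project ID."
  type        = string
}

variable "network" {
  description = "Self link of the VPC network the instance should be reachable from."
  type        = string
}

# Reserve an internal range that Google-managed services may allocate from.
resource "google_compute_global_address" "private_services_range" {
  name          = "shopify-prod-sql-range" # Hypothetical name
  project       = var.project_id
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16 # Assumed size; must not overlap existing subnets
  network       = var.network
}

# Peer the VPC with the service producer network using the reserved range.
resource "google_service_networking_connection" "private_vpc_connection" {
  network                 = var.network
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_services_range.name]
}
```

If this lives in the same module as the instance, adding `depends_on = [google_service_networking_connection.private_vpc_connection]` to the `google_sql_database_instance` resource ensures the connection exists before the instance is created.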
# modules/gcp-sql/main.tf
resource "google_sql_database_instance" "main" {
name = var.instance_name
project = var.project_id
region = var.region
database_version = var.database_version
settings {
tier = var.tier
ip_configuration {
ipv4_enabled = false # Disable public IP for security
private_network = var.network
# authorized_networks block is used for public IPs, not private.
# For private IP, access is controlled by VPC network peering or firewall rules.
}
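backup_configuration {
enabled = true
# Point-in-time recovery relies on binary logs for MySQL and on write-ahead log archiving for PostgreSQL.
# startswith() requires Terraform 1.3 or later.
binary_log_enabled = startswith(var.database_version, "MYSQL")
point_in_time_recovery_enabled = startswith(var.database_version, "POSTGRES")
}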
# Optional PostgreSQL tuning. Each setting below maps to one database_flags block of the form:
# database_flags {
# name = "max_connections"
# value = "200"
# }
# Cloud SQL accepts only the flags on its supported-flags list, and some settings may expect
# a plain integer in the flag's base unit rather than a string such as "16MB", so verify each
# name and value format before enabling it. Example starting points:
#
# Connections and memory (size these to the instance tier):
# max_connections = 200 (based on expected load)
# shared_buffers = 512MB
# maintenance_work_mem = 128MB
# effective_cache_size = 1GB (typically 50-75% of instance RAM)
# work_mem = 16MB (depends on query complexity and RAM)
# temp_file_limit = 1GB
#
# WAL and checkpoints:
# wal_buffers = 16MB
# wal_writer_delay = 200ms
# checkpoint_timeout = 15min
# checkpoint_completion_target = 0.9
#
# Planner cost settings:
# effective_io_concurrency = 2 (depends on disk type and number of vCPUs)
# random_page_cost = 1.1 (for SSDs)
# seq_page_cost = 1.0
#
# Query logging:
# log_min_duration_statement = 1000 (log statements longer than 1 second)
# log_statement = all (logs every statement; use with caution in production)
# log_transaction_sample_rate = 0.1 (sample 10% of transactions)
# log_lock_waits = on
# log_temp_files = 0 (log all temp files)
# log_autovacuum_min_duration = 0 (log all autovacuum activity)
#
# Autovacuum:
# autovacuum_vacuum_threshold = 50
# autovacuum_analyze_threshold = 50
# autovacuum_vacuum_scale_factor = 0.1
# autovacuum_analyze_scale_factor = 0.05
# autovacuum_vacuum_cost_delay = 20ms
# autovacuum_vacuum_cost_limit = 1000
# autovacuum_max_workers = 3
# autovacuum_naptime = 1min
# Replication, WAL archiving, the listener, and TLS are managed by Cloud SQL rather than
# through server configuration files. Parameters such as synchronous_standby_names, wal_level,
# wal_sender_timeout, wal_receiver_status_interval, wal_receiver_timeout,
# max_standby_streaming_delay, hot_standby, archive_mode, archive_command, archive_timeout,
# listen_addresses, and the ssl_* certificate, key, and cipher settings are therefore generally
# not settable as database flags. Use the instance-level equivalents instead:
# - Read replicas: a separate google_sql_database_instance with master_instance_name set.
# - Point-in-time recovery: point_in_time_recovery_enabled in backup_configuration.
# - TLS enforcement: require_ssl in ip_configuration (google provider 4.x).
# - Query statistics: the pg_stat_statements extension, enabled with CREATE EXTENSION.
# Note that max_standby_streaming, ssl_renegotiation_limit, and ssl_passphrase are not
# parameters in current PostgreSQL releases and should not be set.
# Flags from this group that may still be worth checking against the supported-flags list:
# synchronous_commit = on (or "local", "remote_write", "remote_apply")
# hot_standby_feedback = on
# Log destination and rotation are likewise managed by Cloud SQL, which forwards PostgreSQL
# logs to Cloud Logging. Settings such as log_destination, log_file_mode, log_directory,
# log_filename, log_rotation_age, log_rotation_size, and log_truncate_on_rotation are
# therefore not expected to apply.
# Logging flags that may still be useful (verify against the supported-flags list):
# log_line_prefix = [%t] %u@%d %p
# log_checkpoints = on
# For PostgreSQL