Infrastructure as Code: Provisioning Secure Python Clusters on Google Cloud Using Terraform

Terraform Provider Configuration for Google Cloud

To provision resources on Google Cloud Platform (GCP) using Terraform, we first need to configure the Google Cloud provider. This involves specifying your GCP project ID, region, and potentially authentication credentials. For production environments, it’s highly recommended to use a service account with granular permissions rather than user credentials.

Create a file named main.tf and define the provider block as follows:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

variable "gcp_project_id" {
  description = "The GCP project ID to deploy resources into."
  type        = string
}

variable "gcp_region" {
  description = "The GCP region to deploy resources into."
  type        = string
  default     = "us-central1"
}

variable "gcp_zone" {
  description = "The GCP zone to deploy resources into."
  type        = string
  default     = "us-central1-a"
}

You can then create a terraform.tfvars file to set your project ID, region, and zone. Alternatively, pass the values through environment variables (for example, TF_VAR_gcp_project_id) or with -var flags on terraform plan and terraform apply.

gcp_project_id = "your-gcp-project-id"
gcp_region     = "us-west1"
gcp_zone       = "us-west1-b"
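One way to follow the service-account recommendation without distributing key files is service account impersonation. The sketch below is an alternative form of the provider block from main.tf, not an addition to it. The variable name terraform_service_account is an assumption introduced for illustration, and the account it names would need to exist already and allow your identity to create tokens for it (roles/iam.serviceAccountTokenCreator).

```hcl
# Hypothetical variable: email of a pre-existing, least-privilege service account
variable "terraform_service_account" {
  description = "Service account that Terraform impersonates when calling GCP APIs."
  type        = string
}

# Alternative to the provider block shown earlier (use one or the other, not both)
provider "google" {
  project                     = var.gcp_project_id
  region                      = var.gcp_region
  impersonate_service_account = var.terraform_service_account
}
```

With this form, no long-lived key file is needed on the machine that runs Terraform.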

VPC Network and Subnet Creation

A secure Python cluster requires a dedicated Virtual Private Cloud (VPC) network and subnet. This provides network isolation and allows for fine-grained control over ingress and egress traffic. We’ll define a custom VPC network and a subnet within it.

resource "google_compute_network" "vpc_network" {
  name                    = "python-cluster-vpc"
  auto_create_subnetworks = false
  routing_mode            = "REGIONAL"
}

resource "google_compute_subnetwork" "subnet" {
  name          = "python-cluster-subnet"
  ip_cidr_range = "10.0.1.0/24"
  region        = var.gcp_region
  network       = google_compute_network.vpc_network.id
}

Setting auto_create_subnetworks = false creates a custom-mode network, so GCP does not automatically add a subnet in every region. We assign the subnet an explicit CIDR range; choose one that does not overlap with other networks you plan to peer or connect with this VPC.
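The instance template later in this article omits an external IP, so the nodes have no route to the internet for apt-get or pip unless something provides outbound access. A minimal sketch of one option, Cloud NAT on a Cloud Router, follows. The resource names are assumptions chosen for illustration, and the settings shown are the simplest ones rather than a tuned configuration.

```hcl
# Cloud Router that hosts the NAT configuration (name is illustrative)
resource "google_compute_router" "nat_router" {
  name    = "python-cluster-router"
  region  = var.gcp_region
  network = google_compute_network.vpc_network.id
}

# Outbound-only NAT for every subnet range in the region
resource "google_compute_router_nat" "nat" {
  name                               = "python-cluster-nat"
  router                             = google_compute_router.nat_router.name
  region                             = var.gcp_region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
```

Cloud NAT only handles connections initiated from inside the VPC, so it does not expose the instances to inbound traffic.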

Firewall Rules for Secure Access

Network security is paramount. We’ll implement firewall rules to restrict access to the cluster. This example allows SSH access from a specific IP range (e.g., your office or bastion host) and allows internal communication within the subnet. All other ingress traffic will be denied by default.

resource "google_compute_firewall" "allow_ssh" {
  name    = "allow-ssh-to-cluster"
  network = google_compute_network.vpc_network.name
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
  source_ranges = ["YOUR_TRUSTED_IP_RANGE/32"] # Replace the whole string with a CIDR, e.g., "203.0.113.10/32" or "203.0.113.0/24"
  target_tags   = ["python-cluster-node"]
}

resource "google_compute_firewall" "allow_internal" {
  name    = "allow-internal-cluster-traffic"
  network = google_compute_network.vpc_network.name
  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "icmp"
  }
  source_ranges = [google_compute_subnetwork.subnet.ip_cidr_range]
  target_tags   = ["python-cluster-node"]
}

# All other ingress traffic is denied by GCP's implied deny-ingress rule (lowest priority, 65535).
# For an explicit deny, add a rule with a deny block and a high priority number (e.g., 65534) so the allow rules above are evaluated first.

Replace the entire YOUR_TRUSTED_IP_RANGE/32 placeholder with the IP address or CIDR block from which you need SSH access, for example 203.0.113.10/32 for a single host or 203.0.113.0/24 for a range. The target_tags attribute ensures these rules apply only to instances that carry the specified network tag.
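Two further rules are worth sketching here. The first admits Google's health-check and load-balancer proxy ranges (130.211.0.0/22 and 35.191.0.0/16) on port 80; without it, the health check and load balancer defined later in this article cannot reach the instances. The second makes the default deny explicit. The resource names and the priority value are assumptions chosen for illustration.

```hcl
# Allow Google Cloud health checks and load balancer proxies to reach port 80
resource "google_compute_firewall" "allow_lb_health_checks" {
  name    = "allow-lb-health-checks"
  network = google_compute_network.vpc_network.name
  allow {
    protocol = "tcp"
    ports    = ["80"]
  }
  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
  target_tags   = ["python-cluster-node"]
}

# Explicit catch-all deny; priority 65534 is evaluated after the allow rules (default priority 1000)
resource "google_compute_firewall" "deny_all_ingress" {
  name     = "deny-all-other-ingress"
  network  = google_compute_network.vpc_network.name
  priority = 65534
  deny {
    protocol = "all"
  }
  source_ranges = ["0.0.0.0/0"]
}
```

The explicit deny behaves the same as the implied rule, but it shows up in the configuration and can have firewall logging enabled on it.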

Managed Instance Group (MIG) for Python Application Deployment

A Managed Instance Group (MIG) is ideal for deploying and managing a fleet of identical virtual machines. It handles auto-scaling, auto-healing, and rolling updates. We’ll configure a MIG to run our Python application.

First, define an instance template:

resource "google_compute_instance_template" "python_app_template" {
  name_prefix  = "python-app-template-"
  machine_type = "e2-medium"
  tags         = ["python-cluster-node", "http-server"] # Add http-server for potential load balancing

  disk {
    source_image = "debian-cloud/debian-11" # Or your preferred Python-friendly OS image
    auto_delete  = true
    boot         = true
  }

  network_interface {
    subnetwork = google_compute_subnetwork.subnet.id
    # No access_config needed if instances are only accessed internally or via NAT/proxy
  }

  metadata = {
    # The startup-script key is what the GCE guest agent executes at boot; user-data is only read by cloud-init
    startup-script = file("${path.module}/startup-script.sh")
  }

  service_account {
    scopes = ["cloud-platform"] # Adjust scopes as needed for your application
  }

  lifecycle {
    create_before_destroy = true
  }
}

The startup-script.sh file will contain the logic to install Python, dependencies, and start your application. Here’s a basic example:

#!/bin/bash
# Stop on the first failed command so a broken install is visible in the serial console log
set -eo pipefail

apt-get update -y
apt-get install -y python3 python3-pip python3-venv nginx

# Copy your application code first (e.g., using gsutil if stored in GCS)
mkdir -p /opt/myapp
# gsutil cp gs://your-app-bucket/app.tar.gz /opt/myapp/
# tar -xzf /opt/myapp/app.tar.gz -C /opt/myapp/

# Create a virtual environment and install dependencies
python3 -m venv /opt/myapp/venv
source /opt/myapp/venv/bin/activate
if [ -f /opt/myapp/requirements.txt ]; then
  pip install -r /opt/myapp/requirements.txt # Present once the application archive above has been unpacked
fi

# Example: Simple Flask app setup (flask, and uwsgi if used, must be listed in requirements.txt)
# cat <<'EOF' > /opt/myapp/app.py
# from flask import Flask
# app = Flask(__name__)
#
# @app.route("/")
# def hello():
#     return "Hello from Python Cluster!"
#
# if __name__ == "__main__":
#     app.run(host="0.0.0.0", port=8080)
# EOF
# echo '[uwsgi]' > /opt/myapp/uwsgi.ini
# echo 'module = app:app' >> /opt/myapp/uwsgi.ini
# echo 'callable = app' >> /opt/myapp/uwsgi.ini
# echo 'master = true' >> /opt/myapp/uwsgi.ini
# echo 'processes = 4' >> /opt/myapp/uwsgi.ini
# echo 'socket = /tmp/uwsgi.sock' >> /opt/myapp/uwsgi.ini
# echo 'chmod-socket = 660' >> /opt/myapp/uwsgi.ini
# echo 'vacuum = true' >> /opt/myapp/uwsgi.ini

# Configure Nginx as a reverse proxy (optional, but recommended)
# cat <<EOF > /etc/nginx/sites-available/myapp
# server {
#     listen 80;
#     server_name _;
#     location / {
#         proxy_pass http://127.0.0.1:8080; # Or uwsgi_pass unix:/tmp/uwsgi.sock;
#         proxy_set_header Host \$host;
#         proxy_set_header X-Real-IP \$remote_addr;
#     }
# }
# EOF
# ln -sf /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/
# rm -f /etc/nginx/sites-enabled/default
# systemctl restart nginx

# Start your Python application (e.g., using systemd service or supervisor)
# For simplicity, this example assumes a basic Flask app running on port 8080
# In production, use a proper WSGI server like Gunicorn or uWSGI and a systemd service.
# systemctl enable nginx
# systemctl start nginx

# Example for Gunicorn:
# pip install gunicorn
# gunicorn --workers 3 --bind 0.0.0.0:8080 app:app & # Run in background for demo, use systemd for production

Now, define the MIG using this instance template:

resource "google_compute_instance_group_manager" "python_mig" {
  name               = "python-app-mig"
  base_instance_name = "python-app"
  zone               = var.gcp_zone
  target_size        = 3 # Initial number of instances

  version {
    instance_template = google_compute_instance_template.python_app_template.id
    name              = "v1"
  }

  # Auto-healing: recreate instances that fail the health check
  auto_healing_policies {
    health_check      = google_compute_health_check.http_health_check.id
    initial_delay_sec = 300
  }

  update_policy {
    type           = "PROACTIVE"
    minimal_action = "REPLACE"
  }

  # Named port referenced by the backend service's port_name
  named_port {
    name = "http"
    port = 80
  }
}

resource "google_compute_health_check" "http_health_check" {
  name                = "python-app-health-check"
  check_interval_sec  = 5
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 2

  http_health_check {
    port         = 80
    request_path = "/"
  }
}

The auto_healing_policies section integrates with a health check to automatically replace unhealthy instances. The target_size defines the desired number of instances. For production, you’d likely want to configure auto-scaling based on CPU utilization or custom metrics.
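The CPU-based auto-scaling mentioned above could be sketched with a zonal autoscaler attached to the MIG. The resource name, replica bounds, cooldown, and the 60% CPU target are illustrative assumptions, not recommendations from the original configuration.

```hcl
# Zonal autoscaler targeting the MIG; all numeric values are placeholders to tune
resource "google_compute_autoscaler" "python_autoscaler" {
  name   = "python-app-autoscaler"
  zone   = var.gcp_zone
  target = google_compute_instance_group_manager.python_mig.id

  autoscaling_policy {
    min_replicas    = 3
    max_replicas    = 10
    cooldown_period = 300 # Seconds to wait after boot before trusting an instance's metrics

    cpu_utilization {
      target = 0.6 # Scale to keep average CPU utilization near 60%
    }
  }
}
```

Once an autoscaler manages the group, it is usually better to drop target_size from the MIG (or ignore changes to it), since otherwise Terraform and the autoscaler will each try to set the instance count.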

Load Balancing for High Availability

To distribute traffic across your Python cluster instances and ensure high availability, a Google Cloud Load Balancer is essential. We’ll set up an external HTTP(S) load balancer.

resource "google_compute_backend_service" "app_backend_service" {
  name                  = "python-app-backend-service"
  port_name             = "http"
  protocol              = "HTTP"
  timeout_sec           = 10
  enable_cdn            = false
  load_balancing_scheme = "EXTERNAL"

  backend {
    group = google_compute_instance_group_manager.python_mig.instance_group
  }

  health_checks = [google_compute_health_check.http_health_check.id]
}

resource "google_compute_url_map" "app_url_map" {
  name            = "python-app-url-map"
  default_service = google_compute_backend_service.app_backend_service.id
}

resource "google_compute_target_http_proxy" "app_target_http_proxy" {
  name    = "python-app-target-http-proxy"
  url_map = google_compute_url_map.app_url_map.id
}

# Reserve a global static IP address (a global forwarding rule cannot use a regional google_compute_address)
resource "google_compute_global_address" "static_ip" {
  name = "python-cluster-static-ip"
}

resource "google_compute_global_forwarding_rule" "app_forwarding_rule" {
  name                  = "python-app-forwarding-rule"
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "80"
  target                = google_compute_target_http_proxy.app_target_http_proxy.id
  ip_address            = google_compute_global_address.static_ip.address
}

The forwarding rule references a static IP address reserved with google_compute_global_address, so the load balancer keeps the same address across re-deployments. Each resource address may be declared only once in a configuration, so the forwarding rule is defined a single time with the reserved address already in place. For HTTPS, you would add a google_compute_target_https_proxy and a corresponding forwarding rule, along with a Google-managed SSL certificate.
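A minimal sketch of the HTTPS variant follows. The domain app.example.com and the resource names are placeholders, and a Google-managed certificate only becomes active once that domain's DNS record points at the load balancer's IP address.

```hcl
# Google-managed certificate; the domain is a placeholder
resource "google_compute_managed_ssl_certificate" "app_cert" {
  name = "python-app-cert"
  managed {
    domains = ["app.example.com"]
  }
}

# HTTPS proxy reusing the existing URL map
resource "google_compute_target_https_proxy" "app_target_https_proxy" {
  name             = "python-app-target-https-proxy"
  url_map          = google_compute_url_map.app_url_map.id
  ssl_certificates = [google_compute_managed_ssl_certificate.app_cert.id]
}

# Second forwarding rule listening on 443
resource "google_compute_global_forwarding_rule" "app_https_forwarding_rule" {
  name                  = "python-app-https-forwarding-rule"
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "443"
  target                = google_compute_target_https_proxy.app_target_https_proxy.id
  # ip_address: point this at the same reserved static IP as the HTTP rule
}
```

Sharing one IP address between the port 80 and port 443 rules lets a single DNS record serve both.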

Secrets Management with Google Secret Manager

Storing sensitive information like API keys or database credentials directly in your Terraform code or startup scripts is a security anti-pattern. Google Secret Manager is the recommended approach.

resource "google_secret_manager_secret" "db_password" {
  secret_id = "db-password"
  replication {
    automatic = true # Provider 4.x syntax; 5.x and later use an auto {} block instead
  }
}

resource "google_secret_manager_secret_version" "db_password_v1" {
  secret = google_secret_manager_secret.db_password.id
  secret_data = "my-super-secret-db-password" # In production, use a more secure method to provide this data
}

# In your startup-script.sh, you would fetch this secret:
# export DB_PASSWORD=$(gcloud secrets versions access latest --secret="db-password" --project="your-gcp-project-id")
# Then use this environment variable in your Python application.
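To keep the password out of the .tf files, one option is a variable marked sensitive and supplied at apply time, for example through the TF_VAR_db_password environment variable. The variable name db_password is an assumption introduced for this sketch.

```hcl
# Hypothetical input variable; sensitive = true redacts it from plan and apply output
variable "db_password" {
  description = "Database password stored in Secret Manager."
  type        = string
  sensitive   = true
}

# In google_secret_manager_secret_version.db_password_v1, the literal could then become:
#   secret_data = var.db_password
```

The value is still written to the Terraform state, so the state backend needs the same access controls as the secret itself.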

Ensure the service account used by your instances has the necessary permissions to access Secret Manager (e.g., roles/secretmanager.secretAccessor).
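A sketch of that permission, scoped to the single secret rather than the whole project, might look like the following. The dedicated service account and its name are assumptions introduced here; the configuration above runs the instances under the default Compute Engine service account.

```hcl
# Hypothetical dedicated identity for the cluster instances
resource "google_service_account" "python_app" {
  account_id   = "python-app-sa"
  display_name = "Python cluster instances"
}

# Grant read access to this one secret only
resource "google_secret_manager_secret_iam_member" "db_password_accessor" {
  secret_id = google_secret_manager_secret.db_password.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.python_app.email}"
}

# In the instance template's service_account block, add:
#   email = google_service_account.python_app.email
```

Binding the role on the secret instead of the project keeps the instances from reading unrelated secrets.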

Deployment Workflow

With the Terraform configuration in place, the deployment workflow is straightforward:

Initialize Terraform: Run terraform init in the directory containing your .tf files. This downloads the necessary provider plugins.
Plan the Deployment: Execute terraform plan to review the resources that will be created, modified, or destroyed. This is a crucial step for verifying your configuration.
Apply the Configuration: Run terraform apply to provision the infrastructure on GCP. Terraform will prompt for confirmation before making any changes.
Destroy Resources: When the cluster is no longer needed, run terraform destroy to tear down all provisioned resources and avoid incurring unnecessary costs.

For managing secrets securely during the apply phase, consider using tools like HashiCorp Vault or GCP’s Secret Manager integration with Terraform, where sensitive values are injected at runtime rather than being hardcoded or stored in plain text variables.
