Infrastructure as Code: Provisioning Secure Python Clusters on Google Cloud Using Terraform
Terraform Provider Configuration for Google Cloud
To provision resources on Google Cloud Platform (GCP) using Terraform, we first need to configure the Google Cloud provider. This involves specifying your GCP project ID, region, and potentially authentication credentials. For production environments, it’s highly recommended to use a service account with granular permissions rather than user credentials.
Create a file named main.tf and define the provider block as follows:
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

variable "gcp_project_id" {
  description = "The GCP project ID to deploy resources into."
  type        = string
}

variable "gcp_region" {
  description = "The GCP region to deploy resources into."
  type        = string
  default     = "us-central1"
}

variable "gcp_zone" {
  description = "The GCP zone to deploy resources into."
  type        = string
  default     = "us-central1-a"
}
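You can then create a terraform.tfvars file, which Terraform loads automatically, to specify your project ID, region, and zone. Alternatively, pass these values as environment variables named TF_VAR_<variable_name> or with -var flags on the terraform plan and terraform apply command line.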
gcp_project_id = "your-gcp-project-id"
gcp_region     = "us-west1"
gcp_zone       = "us-west1-b"
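If you drive Terraform from a Python tool or CI script, the environment-variable route comes down to exporting each variable with a TF_VAR_ prefix. The sketch below builds such an environment; the helper name tf_var_environment is our own, not part of any library. Keep precedence in mind: values in terraform.tfvars override TF_VAR_ variables, and -var or -var-file flags override both.

```python
import os


def tf_var_environment(values, base=None):
    """Return a copy of `base` (default: os.environ) with each value exported as TF_VAR_<name>."""
    env = dict(os.environ if base is None else base)
    for name, value in values.items():
        env[f"TF_VAR_{name}"] = str(value)
    return env


values = {
    "gcp_project_id": "your-gcp-project-id",
    "gcp_region": "us-west1",
    "gcp_zone": "us-west1-b",
}
env = tf_var_environment(values, base={})
print(sorted(env))  # ['TF_VAR_gcp_project_id', 'TF_VAR_gcp_region', 'TF_VAR_gcp_zone']

# Hand the full environment to Terraform, for example:
# subprocess.run(["terraform", "plan"], env=tf_var_environment(values), check=True)
```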
VPC Network and Subnet Creation
A secure Python cluster requires a dedicated Virtual Private Cloud (VPC) network and subnet. This provides network isolation and allows for fine-grained control over ingress and egress traffic. We’ll define a custom VPC network and a subnet within it.
resource "google_compute_network" "vpc_network" {
  name                    = "python-cluster-vpc"
  auto_create_subnetworks = false
  routing_mode            = "REGIONAL"
}

resource "google_compute_subnetwork" "subnet" {
  name          = "python-cluster-subnet"
  ip_cidr_range = "10.0.1.0/24"
  region        = var.gcp_region
  network       = google_compute_network.vpc_network.id
  # Lets VMs without external IPs reach Google APIs such as Secret Manager
  private_ip_google_access = true
}
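Setting auto_create_subnetworks = false creates a custom-mode VPC; otherwise GCP would automatically create a subnet in every region. We then define our own subnet with an explicit CIDR range. Choose that range so it doesn't overlap with other subnets in the VPC or with any network you plan to peer or connect to it.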
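Overlaps are easy to check before you apply. This sketch uses Python's standard ipaddress module; the list of existing ranges is a hypothetical stand-in for whatever your VPC, peered networks, or on-premises links already use. It also accounts for the four addresses GCP reserves in every subnet's primary range (network, default gateway, second-to-last, and broadcast), so a /24 leaves 252 addresses for VMs.

```python
import ipaddress

# GCP reserves four addresses in each subnet's primary IPv4 range.
GCP_RESERVED_PER_SUBNET = 4


def overlapping(candidate, existing):
    """Return the existing CIDR ranges that overlap the candidate range."""
    net = ipaddress.ip_network(candidate)
    return [cidr for cidr in existing if net.overlaps(ipaddress.ip_network(cidr))]


def usable_addresses(cidr):
    """Addresses left for VM interfaces after GCP's reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - GCP_RESERVED_PER_SUBNET


existing_ranges = ["10.0.0.0/16", "10.1.0.0/24"]  # hypothetical ranges already in use
print(overlapping("10.0.1.0/24", existing_ranges))  # ['10.0.0.0/16']
print(usable_addresses("10.0.1.0/24"))  # 252
```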
Firewall Rules for Secure Access
Network security is paramount. We’ll implement firewall rules to restrict access to the cluster. This example allows SSH access from a specific IP range (e.g., your office or bastion host) and allows internal communication within the subnet. All other ingress traffic will be denied by default.
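resource "google_compute_firewall" "allow_ssh" {
  name    = "allow-ssh-to-cluster"
  network = google_compute_network.vpc_network.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["YOUR_TRUSTED_IP_RANGE/32"] # e.g., "203.0.113.10/32" for one host or "203.0.113.0/24" for a range
  target_tags   = ["python-cluster-node"]
}

resource "google_compute_firewall" "allow_internal" {
  name    = "allow-internal-cluster-traffic"
  network = google_compute_network.vpc_network.name

  allow {
    protocol = "tcp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "udp"
    ports    = ["0-65535"]
  }
  allow {
    protocol = "icmp"
  }

  source_ranges = [google_compute_subnetwork.subnet.ip_cidr_range]
  target_tags   = ["python-cluster-node"]
}

# Google's health checkers and the external HTTP(S) load balancer's proxies connect
# from these ranges. Without this rule, health checks fail and autohealing keeps
# recreating the instances.
resource "google_compute_firewall" "allow_health_checks" {
  name    = "allow-lb-health-checks"
  network = google_compute_network.vpc_network.name

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }

  source_ranges = ["35.191.0.0/16", "130.211.0.0/22"]
  target_tags   = ["python-cluster-node"]
}

# All other ingress is blocked by the VPC's implied deny-ingress rule (priority 65535).
# For an explicit denial, add a rule with a deny { protocol = "all" } block and a
# priority number above the default 1000 (e.g., 65000) so the allow rules still win.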
Replace YOUR_TRUSTED_IP_RANGE/32 with the actual IP address or CIDR block from which you need to access the cluster via SSH. The target_tags attribute ensures these rules only apply to instances with the specified network tag.
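To reason about which rule admits a given connection, it can help to model GCP's ingress evaluation: the matching rule with the lowest priority number wins, a deny beats an allow at equal priority, and anything unmatched falls through to the implied deny-ingress rule. This is a simplified sketch, not a full emulation of the firewall (it ignores egress, service-account targets, and allows one protocol per rule). FirewallRule and evaluate_ingress are illustrative names, and 203.0.113.0/24 is a hypothetical trusted range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    name: str
    action: str                # "allow" or "deny"
    protocol: str              # "tcp", "udp", "icmp", or "all"
    ports: tuple = ()          # e.g. ("22",) or ("0-65535",); empty means all ports
    source_ranges: tuple = ()  # CIDR strings
    target_tags: tuple = ()    # empty means every instance in the network
    priority: int = 1000       # Terraform's default priority


def _port_matches(port_specs, port):
    if port is None or not port_specs:
        return True
    for spec in port_specs:
        low, _, high = spec.partition("-")
        if int(low) <= port <= int(high or low):
            return True
    return False


def evaluate_ingress(rules, src_ip, instance_tags, protocol, port=None):
    """Return (deciding rule name, allowed?) for one inbound connection."""
    ip = ipaddress.ip_address(src_ip)
    # Lowest priority number first; at equal priority, deny rules come first.
    for rule in sorted(rules, key=lambda r: (r.priority, r.action != "deny")):
        if rule.target_tags and not set(rule.target_tags) & set(instance_tags):
            continue
        if not any(ip in ipaddress.ip_network(cidr) for cidr in rule.source_ranges):
            continue
        if rule.protocol not in ("all", protocol):
            continue
        if not _port_matches(rule.ports, port):
            continue
        return rule.name, rule.action == "allow"
    return "implied-deny-ingress", False


# Mirrors the SSH rule and the TCP part of the internal rule above.
TAGS = ("python-cluster-node",)
rules = [
    FirewallRule("allow-ssh-to-cluster", "allow", "tcp", ("22",), ("203.0.113.0/24",), TAGS),
    FirewallRule("allow-internal-cluster-traffic", "allow", "tcp", ("0-65535",), ("10.0.1.0/24",), TAGS),
]

print(evaluate_ingress(rules, "203.0.113.7", TAGS, "tcp", 22))   # ('allow-ssh-to-cluster', True)
print(evaluate_ingress(rules, "198.51.100.9", TAGS, "tcp", 22))  # ('implied-deny-ingress', False)
```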
Managed Instance Group (MIG) for Python Application Deployment
A Managed Instance Group (MIG) is ideal for deploying and managing a fleet of identical virtual machines. It provides autohealing and rolling updates, and it can be autoscaled by attaching an autoscaler. We'll configure a MIG to run our Python application.
First, define an instance template:
resource "google_compute_instance_template" "python_app_template" {
  name_prefix  = "python-app-template-"
  machine_type = "e2-medium"
  # "http-server" only has an effect if a firewall rule targets that tag
  tags = ["python-cluster-node", "http-server"]

  disk {
    source_image = "debian-cloud/debian-11" # Or your preferred Python-friendly OS image
    auto_delete  = true
    boot         = true
  }

  network_interface {
    subnetwork = google_compute_subnetwork.subnet.id
    # No access_config block means no external IP. The startup script's apt-get and
    # pip calls then need outbound internet access through Cloud NAT.
  }

  metadata = {
    # The Compute Engine guest environment runs the "startup-script" key on every boot.
    # ("user-data" is only read by images that ship cloud-init.)
    "startup-script" = file("${path.module}/startup-script.sh")
  }

  service_account {
    # Omitting email uses the Compute Engine default service account;
    # prefer a dedicated, least-privilege account.
    scopes = ["cloud-platform"] # Access is then governed by the account's IAM roles
  }

  lifecycle {
    create_before_destroy = true
  }
}
The startup-script.sh file will contain the logic to install Python, dependencies, and start your application. Here’s a basic example:
#!/bin/bash
# Runs as root on every boot via the instance's "startup-script" metadata.
set -euo pipefail

APP_DIR=/opt/myapp

apt-get update -y
apt-get install -y python3 python3-pip python3-venv nginx
mkdir -p "$APP_DIR"

# Copy your application code first (e.g., from GCS; Google's Debian images ship the Cloud CLI)
# gsutil cp gs://your-app-bucket/app.tar.gz "$APP_DIR"/
# tar -xzf "$APP_DIR"/app.tar.gz -C "$APP_DIR"/

# Example: fall back to a simple Flask app if no code was deployed
if [ ! -f "$APP_DIR/app.py" ]; then
  cat <<'EOF' > "$APP_DIR/app.py"
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello from Python Cluster!"
EOF
  echo "flask" > "$APP_DIR/requirements.txt"
fi

# Create a virtual environment and install dependencies plus Gunicorn (a production WSGI server)
# Assumes requirements.txt ships with your application code
python3 -m venv "$APP_DIR/venv"
"$APP_DIR/venv/bin/pip" install -r "$APP_DIR/requirements.txt" gunicorn

# Run the app under systemd so it restarts on failure and starts on boot
cat <<EOF > /etc/systemd/system/myapp.service
[Unit]
Description=Python app served by Gunicorn
After=network.target

[Service]
User=www-data
WorkingDirectory=$APP_DIR
ExecStart=$APP_DIR/venv/bin/gunicorn --workers 3 --bind 127.0.0.1:8080 app:app
Restart=always

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable myapp
systemctl restart myapp

# Configure Nginx as a reverse proxy on port 80, which the health check probes
cat <<'EOF' > /etc/nginx/sites-available/myapp
server {
    listen 80;
    server_name _;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
ln -sf /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
rm -f /etc/nginx/sites-enabled/default
systemctl enable nginx
systemctl restart nginx
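If you would rather not install Flask, Gunicorn can serve any WSGI callable. Here is a dependency-free sketch of an app.py that exposes `app` (so `gunicorn app:app` still works), answers the health check's GET / with 200, and includes the VM's hostname in the response, which makes it easy to see the load balancer spreading requests across instances. The greeting text and the 404 handling are illustrative choices.

```python
import socket


def app(environ, start_response):
    """Minimal WSGI app: "/" returns 200 so the HTTP health check passes."""
    if environ.get("PATH_INFO", "/") == "/":
        body = f"Hello from Python Cluster! (served by {socket.gethostname()})\n".encode()
        status = "200 OK"
    else:
        body = b"Not Found\n"
        status = "404 Not Found"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, run `gunicorn --bind 127.0.0.1:8080 app:app` and request `http://127.0.0.1:8080/`.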
Now, define the MIG using this instance template:
resource "google_compute_instance_group_manager" "python_mig" {
  name               = "python-app-mig"
  base_instance_name = "python-app"
  zone               = var.gcp_zone
  target_size        = 3 # Initial number of instances; omit if an autoscaler manages the group

  version {
    instance_template = google_compute_instance_template.python_app_template.id
    name              = "v1"
  }

  # Required by the backend service, which references port_name = "http"
  named_port {
    name = "http"
    port = 80
  }

  # Autohealing: recreate instances that keep failing the health check
  auto_healing_policies {
    health_check      = google_compute_health_check.http_health_check.id
    initial_delay_sec = 300 # Grace period for boot and the startup script
  }

  update_policy {
    type           = "PROACTIVE"
    minimal_action = "REPLACE"
  }
}
resource "google_compute_health_check" "http_health_check" {
  name                = "python-app-health-check"
  check_interval_sec  = 5
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 2

  http_health_check {
    port         = 80
    request_path = "/"
  }
}
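The auto_healing_policies block ties the MIG to the health check, so instances that keep failing it are recreated automatically; initial_delay_sec gives a new VM time to boot and run its startup script before failures count. The target_size defines the desired number of instances. For production, consider attaching a google_compute_autoscaler that scales on CPU utilization or custom metrics; in that case, leave target_size unset and let the autoscaler manage the group size.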
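As a rough worked example with the health check above: an instance is marked unhealthy after unhealthy_threshold consecutive failed probes spaced check_interval_sec apart, so detection takes about 5 × 2 = 10 seconds, and a VM that is broken from boot is not recreated before initial_delay_sec has passed. The sketch ignores probe timeouts, jitter, and the time the MIG itself needs to act, so treat the results as lower bounds; the function names are illustrative.

```python
def seconds_to_mark_unhealthy(check_interval_sec, unhealthy_threshold):
    """Approximate time for consecutive failed probes to reach the threshold."""
    return check_interval_sec * unhealthy_threshold


def earliest_autoheal_sec(initial_delay_sec, check_interval_sec, unhealthy_threshold):
    """Rough lower bound before a VM that never becomes healthy is recreated."""
    return initial_delay_sec + seconds_to_mark_unhealthy(check_interval_sec, unhealthy_threshold)


# Values from google_compute_health_check.http_health_check and auto_healing_policies
print(seconds_to_mark_unhealthy(5, 2))   # 10
print(earliest_autoheal_sec(300, 5, 2))  # 310
```

The same arithmetic suggests a design check: initial_delay_sec should comfortably exceed how long the startup script takes (apt-get plus pip installs), or autohealing may recreate VMs before they ever become healthy.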
Load Balancing for High Availability
To distribute traffic across your Python cluster instances and ensure high availability, a Google Cloud Load Balancer is essential. We’ll set up an external HTTP(S) load balancer.
resource "google_compute_backend_service" "app_backend_service" {
  name                  = "python-app-backend-service"
  port_name             = "http" # Must match a named_port on the instance group
  protocol              = "HTTP"
  timeout_sec           = 10
  enable_cdn            = false
  load_balancing_scheme = "EXTERNAL"

  backend {
    group = google_compute_instance_group_manager.python_mig.instance_group
  }

  health_checks = [google_compute_health_check.http_health_check.id]
}

resource "google_compute_url_map" "app_url_map" {
  name            = "python-app-url-map"
  default_service = google_compute_backend_service.app_backend_service.id
}

resource "google_compute_target_http_proxy" "app_target_http_proxy" {
  name    = "python-app-target-http-proxy"
  url_map = google_compute_url_map.app_url_map.id
}
# Reserve a global static IP address (a global forwarding rule cannot use a regional one)
resource "google_compute_global_address" "static_ip" {
  name = "python-cluster-static-ip"
}

resource "google_compute_global_forwarding_rule" "app_forwarding_rule" {
  name                  = "python-app-forwarding-rule"
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "80"
  target                = google_compute_target_http_proxy.app_target_http_proxy.id
  ip_address            = google_compute_global_address.static_ip.address
}
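Reserve the static IP address with google_compute_global_address, since a global forwarding rule cannot use a regional google_compute_address, and reference it in the google_compute_global_forwarding_rule. Each forwarding rule resource must be declared only once; Terraform rejects duplicate resource names. For HTTPS, you would add a google_compute_target_https_proxy, a second forwarding rule on port 443, and a Google-managed SSL certificate (google_compute_managed_ssl_certificate).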
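A newly created global load balancer can take several minutes before it starts returning 200s, and it answers with errors until at least one backend passes its health check. A small standard-library polling helper, sketched below, is handy for a post-apply smoke test. The URL would be built from the reserved static IP (for example, exposed through a Terraform output you define); the function names and default timings are our own assumptions.

```python
import time
import urllib.request


def http_status(url, timeout_s=5.0):
    """Return the HTTP status for a GET, or None on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except OSError:  # URLError, HTTPError (e.g. 502 during warm-up), timeouts
        return None


def wait_for_http_200(url, timeout_s=600.0, interval_s=10.0, fetch=http_status):
    """Poll `url` until it returns 200 (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


# Usage (STATIC_IP is a placeholder for the reserved address):
# wait_for_http_200("http://STATIC_IP/")
```

The injectable `fetch` parameter keeps the polling logic testable without a network.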
Secrets Management with Google Secret Manager
Storing sensitive information like API keys or database credentials directly in your Terraform code or startup scripts is a security anti-pattern. Google Secret Manager is the recommended approach.
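resource "google_secret_manager_secret" "db_password" {
  secret_id = "db-password"

  replication {
    automatic = true # Provider 4.x syntax; provider 5.x replaces this with an auto {} block
  }
}

resource "google_secret_manager_secret_version" "db_password_v1" {
  secret = google_secret_manager_secret.db_password.id
  # secret_data is stored in plain text in the Terraform state. In production, add the
  # version outside Terraform instead (e.g., gcloud secrets versions add).
  secret_data = "my-super-secret-db-password"
}

# In your startup-script.sh, you would fetch this secret:
# export DB_PASSWORD=$(gcloud secrets versions access latest --secret="db-password" --project="your-gcp-project-id")
# An exported variable is only visible to processes started from that shell, so hand it to
# the app explicitly (e.g., a root-only file referenced by EnvironmentFile= in the app's
# systemd unit), or have the Python application read the secret from Secret Manager itself.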
Ensure the service account used by your instances has the necessary permissions to access Secret Manager (e.g., roles/secretmanager.secretAccessor).
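On the application side, one option is to prefer an injected DB_PASSWORD and otherwise read the secret directly. The sketch below assumes the google-cloud-secret-manager package (imported lazily, so the rest of the module loads without it) and an instance service account holding roles/secretmanager.secretAccessor; load_db_password and its fallback order are our own choices.

```python
import os


def secret_version_name(project_id, secret_id, version="latest"):
    """Full resource name of a secret version, as Secret Manager's access call expects."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def fetch_secret(project_id, secret_id, version="latest"):
    """Read a secret payload via the client library (pip install google-cloud-secret-manager)."""
    from google.cloud import secretmanager  # imported lazily so the module loads without it

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")


def load_db_password(project_id=None):
    """Prefer an injected DB_PASSWORD; otherwise fetch 'db-password' from Secret Manager."""
    value = os.environ.get("DB_PASSWORD")
    if value:
        return value
    if project_id is None:
        raise RuntimeError("DB_PASSWORD is not set and no project_id was given")
    return fetch_secret(project_id, "db-password")
```

Avoid logging the returned value; the error message above deliberately names only the variable.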
Deployment Workflow
With the Terraform configuration in place, the deployment workflow is straightforward:
- Initialize Terraform: Run terraform init in the directory containing your .tf files. This downloads the necessary provider plugins.
- Plan the Deployment: Execute terraform plan to review the resources that will be created, modified, or destroyed. This is a crucial step for verifying your configuration.
- Apply the Configuration: Run terraform apply to provision the infrastructure on GCP. Terraform will prompt for confirmation before making any changes.
- Destroy Resources: When the cluster is no longer needed, run terraform destroy to tear down all provisioned resources and avoid incurring unnecessary costs.
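The same sequence is easy to script. This sketch saves the plan to a file and applies exactly that file, so the reviewed plan is what gets executed and Terraform skips the interactive confirmation. The helper names, the plan file name tfplan, and prod.tfvars are hypothetical (terraform.tfvars itself is loaded automatically and needs no flag). Destroy is deliberately left out.

```python
import shlex
import subprocess


def terraform_commands(var_file=None, plan_file="tfplan"):
    """Build init -> plan -> apply, where apply executes exactly the saved plan."""
    plan = ["terraform", "plan", "-input=false", f"-out={plan_file}"]
    if var_file:
        plan.append(f"-var-file={var_file}")
    return [
        ["terraform", "init", "-input=false"],
        plan,
        # Applying a saved plan file does not prompt for confirmation.
        ["terraform", "apply", "-input=false", plan_file],
    ]


def run(commands, workdir=".", dry_run=True):
    """Print each command; execute only when dry_run is False, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)


run(terraform_commands(var_file="prod.tfvars"))  # dry run: prints the commands only
```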
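For sensitive inputs, declare the Terraform variable with sensitive = true so its value is redacted from plan and apply output, and supply it at runtime (for example, through a TF_VAR_ environment variable populated from HashiCorp Vault or GCP's Secret Manager) rather than hardcoding it or storing it in a plain-text .tfvars file. Sensitive values are still written to the Terraform state, so keep state in a remote backend with restricted access, such as a GCS bucket.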