Infrastructure as Code: Provisioning Secure Python Clusters on DigitalOcean Using Terraform
Terraform Provider Configuration for DigitalOcean
To begin provisioning infrastructure on DigitalOcean using Terraform, we first need to configure the DigitalOcean provider. The provider itself only needs your API token; the region is declared as a variable and passed to each resource. It’s crucial to manage your API token securely, ideally using environment variables rather than hardcoding it directly into your Terraform configuration files.
Create a file named main.tf and add the following provider configuration. The token is read from the do_token input variable, so the real value never appears in the file. For production environments, supplying that value through an environment variable is highly recommended.
terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

provider "digitalocean" {
  token = var.do_token
}

variable "do_token" {
  description = "DigitalOcean API Token"
  type        = string
  sensitive   = true
}

variable "region" {
  description = "DigitalOcean region"
  type        = string
  default     = "nyc3"
}
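You can set the do_token variable by exporting it in your shell. Terraform maps any environment variable named TF_VAR_<name> onto the input variable <name>, so the variable to export is TF_VAR_do_token. (The provider reads DIGITALOCEAN_TOKEN directly only when the token argument is omitted from the provider block.)
export TF_VAR_do_token="YOUR_DIGITALOCEAN_TOKEN"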
Alternatively, you can create a terraform.tfvars file (ensure this file is excluded from version control if it contains sensitive information):
do_token = "YOUR_DIGITALOCEAN_TOKEN"
region   = "nyc3"
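If the token is supplied through the environment, a forgotten export only shows up when Terraform stops to prompt for the value, which is awkward in automation. The following is a minimal sketch of a preflight check, assuming the TF_VAR_do_token convention for the do_token variable defined in main.tf; the function name token_is_configured is hypothetical and not part of any Terraform tooling.

```python
import os
from typing import Mapping, Optional

# Terraform maps an environment variable named TF_VAR_<name> onto the input
# variable <name>, so TF_VAR_do_token feeds var.do_token.
TOKEN_ENV_VAR = "TF_VAR_do_token"


def token_is_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the token variable is present and not blank.

    The environment is injectable so the check can be exercised without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    return bool(env.get(TOKEN_ENV_VAR, "").strip())
```

A wrapper script could call token_is_configured() before running Terraform and exit early with a clear message when it returns False. The check only confirms that a value is present; it does not validate the token against the DigitalOcean API.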
Provisioning a Secure Python Application Cluster
Our cluster will consist of a load balancer and multiple Droplets running a Python application. We’ll use a simple Flask application as an example. The Droplets will be configured with a user data script to install Python, pip, and clone our application repository.
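First, let’s define a simple Flask application. Save this as app.py in a directory named app_code. This directory becomes the root of the Git repository that the Droplets clone, because the service later runs Gunicorn against app:app from the repository root: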
from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def hello():
    hostname = os.uname().nodename
    return f"Hello from {hostname}!"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
We’ll also need a requirements.txt file:
Flask==2.3.2
gunicorn==20.1.0
Now, let’s define the Terraform resources. We’ll create a DigitalOcean Load Balancer and a set of Droplets. The user data script will be responsible for setting up the environment on each Droplet.
User Data Script for Droplet Initialization
This script will be executed when each Droplet boots for the first time. It installs necessary packages, sets up a firewall, clones the application, and starts the application using Gunicorn.
#!/bin/bash
apt-get update -y
apt-get install -y python3 python3-pip git ufw

# Configure firewall
ufw allow ssh
ufw allow 80/tcp
ufw allow 5000/tcp
ufw --force enable

# Clone application code
git clone https://github.com/your-username/your-repo.git /opt/app
cd /opt/app
pip3 install -r requirements.txt

# Create systemd service for the application
cat > /etc/systemd/system/my-python-app.service <<EOF
[Unit]
Description=My Python Flask App
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/opt/app
ExecStart=/usr/bin/python3 /usr/local/bin/gunicorn --bind 0.0.0.0:5000 app:app
Restart=always

# Required so that "systemctl enable" can register the service to start at boot
[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl start my-python-app
systemctl enable my-python-app
Note: Replace https://github.com/your-username/your-repo.git with the actual URL of your Git repository. The repository must be publicly accessible; for a private repository, give the Droplets read access first, for example with an SSH deploy key. The service runs as the www-data user so that the application has minimal privileges rather than root access.
Terraform Configuration for Cluster Resources
Now, let’s integrate this into our main.tf. We’ll define a data source for the Ubuntu image, a network, a load balancer, and multiple Droplets. The user_data argument in the digitalocean_droplet resource will embed our initialization script.
# Define the Ubuntu image to use
data "digitalocean_image" "ubuntu" {
  slug = "ubuntu-22-04-x64"
}

# Define the VPC network
resource "digitalocean_vpc" "app_vpc" {
  region   = var.region
  name     = "python-app-vpc"
  ip_range = "10.10.0.0/16"
}

# Define the Load Balancer
resource "digitalocean_loadbalancer" "app_lb" {
  region   = var.region
  name     = "python-app-lb"
  vpc_uuid = digitalocean_vpc.app_vpc.id

  # Accept HTTP on port 80 and forward it to Gunicorn on port 5000
  forwarding_rule {
    entry_protocol  = "http"
    entry_port      = 80
    target_protocol = "http"
    target_port     = 5000
  }

  # Probe the port the application actually listens on
  healthcheck {
    port                     = 5000
    protocol                 = "http"
    path                     = "/"
    check_interval_seconds   = 5
    response_timeout_seconds = 5
    healthy_threshold        = 3
    unhealthy_threshold      = 3
  }

  droplet_ids = digitalocean_droplet.app_droplet[*].id
}

# Define the Droplets for the application
resource "digitalocean_droplet" "app_droplet" {
  count     = 3                                   # Number of application servers
  name      = "python-app-${count.index + 1}"     # Required; also becomes the hostname
  region    = var.region
  size      = "s-1vcpu-1gb"                       # Adjust size as needed
  image     = data.digitalocean_image.ubuntu.id
  vpc_uuid  = digitalocean_vpc.app_vpc.id
  ssh_keys  = ["YOUR_SSH_KEY_FINGERPRINT"]        # Replace with your SSH key fingerprint
  user_data = file("user_data.sh")                # Path to your user_data script
  tags      = ["python-app", "webserver"]

  lifecycle {
    create_before_destroy = true
  }
}

# Output the Load Balancer IP address
output "load_balancer_ip" {
  description = "The public IP address of the DigitalOcean Load Balancer."
  value       = digitalocean_loadbalancer.app_lb.ip
}
Important:
- Replace "YOUR_SSH_KEY_FINGERPRINT" with the actual fingerprint of an SSH public key that you have added to your DigitalOcean account. The ssh_keys argument installs that key on each Droplet, which is what makes key-based SSH access possible.
- Save the user data script content into a file named user_data.sh in the same directory as your main.tf.
- The count parameter in digitalocean_droplet.app_droplet determines the number of application servers.
- The vpc_uuid argument places all Droplets and the load balancer in the same private network, so they can communicate over private addresses. Each Droplet still has a public IPv4 address, so inbound traffic from the internet must be restricted with a firewall.
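To make the private addressing concrete, the sketch below uses Python's standard ipaddress module to test whether an address belongs to the 10.10.0.0/16 range given to the VPC. The helper name in_vpc is hypothetical; it is only an illustration of how the CIDR range translates into address membership, for instance when checking that the address you are about to SSH to is a private one.

```python
import ipaddress

# The ip_range assigned to the digitalocean_vpc resource in main.tf.
VPC_RANGE = ipaddress.ip_network("10.10.0.0/16")


def in_vpc(address: str) -> bool:
    """Return True when the address falls inside the VPC's private range."""
    return ipaddress.ip_address(address) in VPC_RANGE
```

A /16 prefix leaves 16 bits for hosts, so the range spans 65,536 addresses, from 10.10.0.0 to 10.10.255.255. An address such as 10.10.0.5 is inside it, while 10.11.0.5 or any public address is not.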
Deployment and Verification
With the Terraform configuration in place, you can now deploy your infrastructure.
Initialization and Planning
First, initialize your Terraform working directory. This downloads the necessary provider plugins.
terraform init
Next, review the execution plan to see what Terraform will create, modify, or destroy. This is a critical step to ensure you understand the impact of your changes.
terraform plan
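When the plan step runs in a script or CI job rather than interactively, the -detailed-exitcode flag of terraform plan makes the outcome machine-readable: exit code 0 means the plan succeeded with no changes, 1 means an error, and 2 means the plan succeeded and changes are pending. The following is a minimal sketch of a wrapper around that flag; the function names and labels are hypothetical, and it assumes the terraform binary is on PATH and that terraform init has already been run in the working directory.

```python
import subprocess

# Exit codes produced by `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {
    0: "no changes",
    1: "error",
    2: "changes pending",
}


def describe_plan_exit(code: int) -> str:
    """Translate a plan exit code into a short label."""
    return PLAN_OUTCOMES.get(code, f"unexpected exit code {code}")


def run_plan(workdir: str = ".") -> str:
    """Run the plan in workdir and return the outcome label.

    -input=false stops Terraform from prompting for missing variables,
    so an unset token surfaces as an error instead of a hang.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        check=False,  # exit code 2 is a normal outcome, not a failure
    )
    return describe_plan_exit(result.returncode)
```

A pipeline could call run_plan() and require manual approval only when the result is "changes pending".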
Applying the Configuration
If the plan looks correct, apply the configuration to provision the resources on DigitalOcean.
terraform apply
Terraform will prompt you to confirm the action. Type yes and press Enter.
Verification
Once the apply process is complete, Terraform will output the public IP address of your load balancer. You can access your Python application by navigating to this IP address in your web browser. Each request should be routed to one of your Droplets, and the response will show the hostname of the server that handled the request, demonstrating the load balancing in action.
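Refreshing a browser by hand gives only a rough impression of the distribution. As a hedged illustration, the sketch below requests the load balancer repeatedly and counts which hostname answered, relying on the "Hello from <hostname>!" response format of the Flask route. The function names are hypothetical, and the fetch function is injectable so the counting logic can be tried without a live cluster.

```python
import re
import urllib.request
from collections import Counter
from typing import Callable

# Matches the body returned by the Flask route: "Hello from <hostname>!"
GREETING = re.compile(r"^Hello from (?P<host>.+)!$")


def fetch_body(url: str) -> str:
    """Fetch a URL and return the decoded response body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def tally_hostnames(url: str, attempts: int = 30,
                    fetch: Callable[[str], str] = fetch_body) -> Counter:
    """Request the URL repeatedly and count which hostname answered."""
    counts: Counter = Counter()
    for _ in range(attempts):
        match = GREETING.match(fetch(url).strip())
        # Anything that is not the expected greeting is counted separately.
        counts[match.group("host") if match else "<unrecognized>"] += 1
    return counts
```

Calling tally_hostnames("http://<LOAD_BALANCER_IP>/") with the address from the Terraform output should report more than one hostname once all Droplets pass their health checks. How evenly the requests are spread depends on the load balancer's algorithm and on connection reuse, so the counts need not be equal.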
You can also SSH into the Droplets (using their private IPs if you are within the VPC, or their public IPs if configured) to verify that the application is running correctly and that the firewall rules are in place.
# Example of checking the service status on a Droplet
ssh root@<DROPLET_PRIVATE_IP> "systemctl status my-python-app"
Security Considerations and Enhancements
While this setup provides a basic secure cluster, several enhancements can be made:
- HTTPS: Implement SSL termination at the load balancer. DigitalOcean Load Balancers support SSL certificates, which can be managed directly or via Let’s Encrypt.
- Private Networking: Ensure all inter-Droplet communication happens over the VPC’s private IP addresses. The load balancer handles public traffic and forwards it to Droplets via their private IPs.
- Firewall Rules: The user data script configures UFW on each Droplet. For more granular control, consider using DigitalOcean Cloud Firewalls, which can be managed via Terraform as well.
- Secrets Management: For sensitive application configurations (database credentials, API keys), use a dedicated secrets management solution such as HashiCorp Vault, rather than embedding them directly in code, in the user data script, or in environment variables on Droplets.
- Immutable Infrastructure: For production, consider building Docker images of your application and deploying them to Droplets. This promotes immutability and simplifies updates. Terraform can then provision a container orchestration platform or simply roll out Droplets that run the updated image.
- SSH Key Management: Ensure your SSH keys are strong and managed securely. Rotate them periodically.
By leveraging Infrastructure as Code with Terraform, you can consistently and securely provision complex application environments on DigitalOcean, enabling rapid scaling and reliable deployments.