Upgrading a high-traffic production PostgreSQL database cluster from version 15 to 16 using pg_upgrade link mode on Debian 12
Pre-Upgrade Assessment and Preparation
Before embarking on a major PostgreSQL version upgrade, a thorough assessment of the existing cluster is paramount. This involves understanding the current resource utilization, identifying any deprecated features or extensions that might not be compatible with PostgreSQL 16, and establishing a robust rollback strategy. For a high-traffic production environment, minimizing downtime is a critical objective, which makes pg_upgrade in link mode an attractive option.
Our target environment is a Debian 12 (Bookworm) system running PostgreSQL 15.x. The cluster consists of a primary and at least one replica, managed via Patroni for high availability. The upgrade process will focus on the primary instance first, followed by the replicas.
Installing PostgreSQL 16 on Debian 12
First, we need to install the PostgreSQL 16 packages alongside the existing PostgreSQL 15 installation. This is crucial for pg_upgrade to operate correctly, as it requires both versions to be present on the system.
Add the PostgreSQL APT repository for version 16:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' sudo wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - sudo apt update sudo apt install postgresql-16 postgresql-client-16
Verify the installation:
pg_lsclusters
You should see both PostgreSQL 15 and 16 listed, with PostgreSQL 15 still being the default and running.
Preparing for pg_upgrade in Link Mode
pg_upgrade in link mode is significantly faster than copy mode because it creates hard links to the existing data files instead of copying them. This requires that the new PostgreSQL binaries and the old data directory reside on the same filesystem. It also necessitates that the data directories for both versions are not on a network filesystem (like NFS) unless it supports hard links correctly.
Crucially, the PostgreSQL 15 cluster must be shut down cleanly before running pg_upgrade.
Identifying Data Directories and Binaries
We need to know the exact paths to the data directories and binaries for both PostgreSQL versions. You can typically find this information using pg_config or by inspecting the output of pg_lsclusters.
For PostgreSQL 15 (assuming default cluster name ‘main’):
PG_VERSION_OLD=15
PG_CLUSTER_OLD=main
PG_DATA_OLD=$(sudo pg_lsclusters -h | awk -v ver="$PG_VERSION_OLD" -v cluster="$PG_CLUSTER_OLD" '$1 == ver && $2 == cluster {print $5}')
PG_BIN_OLD=$(sudo -u postgres pg_config --bindir --pgdata="$PG_DATA_OLD")
For PostgreSQL 16:
PG_VERSION_NEW=16 PG_CLUSTER_NEW=main # Or a new name if you prefer, but 'main' is common for upgrades PG_BIN_NEW=$(sudo -u postgres pg_config --bindir --version 16) # Assumes pg_config for 16 is in PATH
Verify these paths:
echo "Old Data: $PG_DATA_OLD" echo "Old Binaries: $PG_BIN_OLD" echo "New Binaries: $PG_BIN_NEW"
Stopping the PostgreSQL 15 Cluster
This is a critical step. Ensure all applications and clients are disconnected from the database. If using Patroni, you’ll need to gracefully shut down the primary and then all replicas.
For Patroni, you might scale down the HA cluster to a single instance (the primary) and then stop it:
# Example: Scale down Patroni to 1 replica (the primary itself) # This depends on your Patroni configuration. You might edit the config or use API. # Assuming you have a way to tell Patroni to stop managing the cluster. # Stop the PostgreSQL 15 service directly if not managed by Patroni for this step sudo systemctl stop [email protected] # Or if Patroni is managing it, ensure Patroni stops the PostgreSQL process. # Example: sudo patronictl stop --cluster-name my_cluster --wait
Verify that no PostgreSQL 15 processes are running:
sudo systemctl status [email protected] ps aux | grep postgres
Executing pg_upgrade
Now, we can execute pg_upgrade. We’ll use the --link option for efficiency. It’s highly recommended to run this as the postgres user.
Create a script for clarity and repeatability:
#!/bin/bash
# --- Configuration ---
PG_VERSION_OLD=15
PG_CLUSTER_OLD=main
PG_VERSION_NEW=16
PG_CLUSTER_NEW=main # Can be a new name if desired, but requires more steps later
# --- Find Paths ---
PG_DATA_OLD=$(sudo pg_lsclusters -h | awk -v ver="$PG_VERSION_OLD" -v cluster="$PG_CLUSTER_OLD" '$1 == ver && $2 == cluster {print $5}')
PG_BIN_OLD=$(sudo -u postgres pg_config --bindir --pgdata="$PG_DATA_OLD")
PG_BIN_NEW=$(sudo -u postgres pg_config --bindir --version "$PG_VERSION_NEW") # Assumes pg_config for 16 is in PATH
# --- Validation ---
if [ -z "$PG_DATA_OLD" ]; then
echo "Error: Could not find data directory for PostgreSQL $PG_VERSION_OLD cluster '$PG_CLUSTER_OLD'."
exit 1
fi
if [ ! -d "$PG_DATA_OLD" ]; then
echo "Error: Data directory '$PG_DATA_OLD' does not exist."
exit 1
fi
if [ -z "$PG_BIN_OLD" ]; then
echo "Error: Could not find binaries for PostgreSQL $PG_VERSION_OLD."
exit 1
fi
if [ -z "$PG_BIN_NEW" ]; then
echo "Error: Could not find binaries for PostgreSQL $PG_VERSION_NEW."
exit 1
fi
echo "Old Data Directory: $PG_DATA_OLD"
echo "Old Binaries Path: $PG_BIN_OLD"
echo "New Binaries Path: $PG_BIN_NEW"
# --- Pre-checks ---
echo "Running pre-upgrade checks..."
sudo -u postgres "$PG_BIN_NEW/pg_upgrade" \
--check \
--link \
--old-bindir="$PG_BIN_OLD" \
--new-bindir="$PG_BIN_NEW" \
--old-datadir="$PG_DATA_OLD" \
--new-datadir="/var/lib/postgresql/$PG_VERSION_NEW/main" # Default data dir for new install
if [ $? -ne 0 ]; then
echo "Pre-upgrade checks failed. Please review the output above."
exit 1
fi
echo "Pre-upgrade checks passed."
# --- Execution ---
echo "Starting pg_upgrade in link mode..."
sudo -u postgres "$PG_BIN_NEW/pg_upgrade" \
--link \
--old-bindir="$PG_BIN_OLD" \
--new-bindir="$PG_BIN_NEW" \
--old-datadir="$PG_DATA_OLD" \
--new-datadir="/var/lib/postgresql/$PG_VERSION_NEW/main" # Default data dir for new install
if [ $? -ne 0 ]; then
echo "pg_upgrade failed. Please review the output and logs."
exit 1
fi
echo "pg_upgrade completed successfully."
# --- Post-upgrade Steps ---
echo "Post-upgrade steps: Cleaning up old cluster..."
# The old data directory is now linked. It will be cleaned up by pg_upgrade.
# We need to ensure the new cluster is configured correctly.
# If you used a new cluster name for PG16, you'd need to adjust config.
# For 'main' cluster name, the new data dir is typically /var/lib/postgresql/16/main
# Update pg_hba.conf and postgresql.conf for PostgreSQL 16 if necessary.
# These files are usually copied by the package manager.
# You might want to back up the old ones before the upgrade.
# sudo cp "$PG_DATA_OLD/pg_hba.conf" "$PG_DATA_OLD/pg_hba.conf.bak_pg15"
# sudo cp "$PG_DATA_OLD/postgresql.conf" "$PG_DATA_OLD/postgresql.conf.bak_pg15"
# The new cluster's config files are in /etc/postgresql/16/main/
# Ensure these are correctly configured for your environment.
# Example: sudo nano /etc/postgresql/16/main/postgresql.conf
echo "Upgrade complete. Please start the new PostgreSQL 16 cluster and verify."
Important Notes on the script:
- The
--new-datadirpath should match where PostgreSQL 16 expects its data directory for the ‘main’ cluster (typically/var/lib/postgresql/16/mainon Debian). - The
--checkflag is highly recommended to run first. It performs a dry run and identifies potential issues without making changes. - The
--linkflag is the core of the link mode. - The script assumes you are running it as root and then using
sudo -u postgresfor PostgreSQL commands. Adjust as necessary for your user setup. - Ensure the
postgresuser has appropriate permissions to access both data directories.
Post-Upgrade Configuration and Verification
After pg_upgrade completes, the old data directory’s contents are hard-linked to the new PostgreSQL 16 data directory. The old binaries are no longer needed for the old cluster, and the old data directory itself can be removed (pg_upgrade usually handles this by renaming the old directory, e.g., to PG_UPGRADE_OLD).
Starting the PostgreSQL 16 Cluster
Now, start the new PostgreSQL 16 cluster. If you are using systemd services:
# Ensure the old cluster is stopped and its data directory is handled by pg_upgrade sudo systemctl start [email protected]
Verify the status:
sudo systemctl status [email protected] pg_lsclusters
You should see PostgreSQL 16 running and listed as the active cluster. PostgreSQL 15 should no longer be running.
Running ANALYZE and REINDEX (Optional but Recommended)
While pg_upgrade is generally safe, it’s good practice to run ANALYZE on all databases to update statistics for the new PostgreSQL version. For critical tables or if you encounter performance issues, a targeted REINDEX might be necessary, though it’s often not required immediately after an upgrade.
sudo -u postgres psql -c "ANALYZE VERBOSE;"
Updating Patroni Configuration
If your cluster is managed by Patroni, you need to update its configuration to point to the PostgreSQL 16 binaries and data directories. This typically involves:
- Updating the
pg_bin_dirin Patroni’s configuration to point to the PostgreSQL 16 binary directory (e.g.,/usr/lib/postgresql/16/bin). - Ensuring Patroni is configured to manage the PostgreSQL 16 service (e.g., via systemd unit files).
- Restarting Patroni and allowing it to reinitialize the cluster with PostgreSQL 16.
Example Patroni configuration snippet (YAML):
# ... other settings ... postgresql: bin_dir: /usr/lib/postgresql/16/bin # ... other postgresql settings ... # ... other settings ...
After updating the Patroni configuration, restart Patroni and its associated PostgreSQL service. Monitor Patroni logs closely during this process.
Upgrading Replicas
Once the primary is successfully upgraded and running PostgreSQL 16, you can upgrade the replicas. The process for replicas is similar but involves stopping the old PostgreSQL 15 service, running pg_upgrade (again, in link mode if possible, or copy mode if data directories are on different filesystems), and then starting the PostgreSQL 16 service.
For replicas, you might need to re-establish replication from the new primary. If the replica’s data directory is on the same filesystem as the old PostgreSQL 15 data directory, link mode is feasible. Otherwise, you’ll need to use copy mode or a base backup from the new primary.
Important: Ensure the replica’s PostgreSQL 16 configuration (postgresql.conf, pg_hba.conf) is correctly set up to connect to the new primary.
Final Verification and Rollback Considerations
Perform comprehensive checks on your application’s connectivity and functionality against the upgraded PostgreSQL 16 cluster. Monitor logs for any errors or warnings. Check database performance metrics.
Rollback Strategy: If a critical issue is discovered post-upgrade:
- Stop the PostgreSQL 16 cluster.
- If you have a recent, verified backup of the PostgreSQL 15 cluster, restore it.
- Alternatively, if you performed the upgrade in place and haven’t yet removed the old PostgreSQL 15 data directory (
pg_upgraderenames it, e.g., toPG_UPGRADE_OLD), you can potentially revert by: stopping PostgreSQL 16, renaming the PostgreSQL 16 data directory, renaming the old PostgreSQL 15 data directory back to its original location, and then starting PostgreSQL 15. This is a more complex rollback and depends heavily on the state afterpg_upgrade. A clean backup restore is always preferred.
Thorough testing in a staging environment that mirrors production is indispensable before attempting this on a live, high-traffic cluster.
Leave a Reply
You must be logged in to post a comment.