Source Database: Connection details for your source database
Telemetry: Enable the Prometheus metrics endpoint for connection-based auto-scaling (used in the Auto Scaling section):
```yaml
telemetry:
  prometheus_port: 9090
```
Bucket Storage: Connection details for your bucket storage database. PowerSync supports MongoDB or Postgres as bucket storage databases. In this guide, we focus on MongoDB.
MongoDB Atlas
Self-Hosted MongoDB on EC2
For bucket storage, we recommend configuring AWS PrivateLink to establish a secure, private connection between your ECS tasks and MongoDB Atlas that doesn't traverse the public internet. Follow the AWS PrivateLink guide for MongoDB Atlas to configure the VPC endpoint, then update your MongoDB connection string to use the private endpoint. As seen in the Secrets Manager setup, use the updated connection string in your PS_MONGO_URI secret.
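As a rough sketch of the AWS side of that setup (the endpoint service name is shown by Atlas when you create the Private Endpoint there; the subnet and security group variables below are placeholders for your own values):

```bash
# Sketch only: create an interface VPC endpoint for the Atlas PrivateLink service.
# ATLAS_ENDPOINT_SERVICE_NAME comes from the Atlas Private Endpoint dialog;
# the subnet and security group IDs are placeholders.
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name "$ATLAS_ENDPOINT_SERVICE_NAME" \
  --subnet-ids $PRIVATE_SUBNET_1 $PRIVATE_SUBNET_2 \
  --security-group-ids $POWERSYNC_SG
```

After creating the endpoint on the AWS side, confirm it in Atlas so the private connection string becomes available.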
For self-hosted MongoDB bucket storage on an EC2 instance, refer to AWS's guides (these cover Amazon DocumentDB, but the installation steps also apply to MongoDB):
This guide uses bash variables throughout for easy copy-paste execution.
```bash
# Set your AWS region and account ID
AWS_REGION="us-east-1" # Change to your region
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Set your VPC ID (or create a new VPC)
VPC_ID="vpc-xxxxx"

# Set PowerSync version (check Docker Hub for latest: https://hub.docker.com/r/journeyapps/powersync-service/tags)
PS_VERSION="1.20.1"
```
```bash
# List all subnets in your VPC
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query 'Subnets[*].[SubnetId,CidrBlock,MapPublicIpOnLaunch,AvailabilityZone]' \
  --output table
```
If MapPublicIpOnLaunch is True, those are public subnets. Save the public subnet IDs:
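For example (the subnet IDs here are placeholders; substitute the SubnetIds reported with MapPublicIpOnLaunch = True):

```bash
# Placeholder IDs -- replace with the public subnet IDs from the output above
PUBLIC_SUBNET_1="subnet-aaaa1111"
PUBLIC_SUBNET_2="subnet-bbbb2222"
echo "Public subnets: $PUBLIC_SUBNET_1, $PUBLIC_SUBNET_2"
```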
Create two private subnets in different availability zones for high availability:
```bash
# Get available zones in your region
AZ1=$(aws ec2 describe-availability-zones --region $AWS_REGION --query 'AvailabilityZones[0].ZoneName' --output text)
AZ2=$(aws ec2 describe-availability-zones --region $AWS_REGION --query 'AvailabilityZones[1].ZoneName' --output text)
echo "Availability Zone 1: $AZ1"
echo "Availability Zone 2: $AZ2"

# Get VPC CIDR to determine available address space
VPC_CIDR=$(aws ec2 describe-vpcs --vpc-ids $VPC_ID --query 'Vpcs[0].CidrBlock' --output text)
echo "VPC CIDR: $VPC_CIDR"

# Create first private subnet (adjust CIDR if conflicts exist)
PRIVATE_SUBNET_1=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID \
  --cidr-block 172.31.96.0/20 \
  --availability-zone $AZ1 \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=powersync-private-1}]' \
  --query 'Subnet.SubnetId' \
  --output text)
echo "Private Subnet 1: $PRIVATE_SUBNET_1"

# Create second private subnet (adjust CIDR if conflicts exist)
PRIVATE_SUBNET_2=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID \
  --cidr-block 172.31.112.0/20 \
  --availability-zone $AZ2 \
  --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=powersync-private-2}]' \
  --query 'Subnet.SubnetId' \
  --output text)
echo "Private Subnet 2: $PRIVATE_SUBNET_2"
```
CIDR Block Configuration: The example uses 172.31.96.0/20 and 172.31.112.0/20, which work for the default VPC (172.31.0.0/16). If you get a CIDR conflict error, adjust these blocks to match unused address space in your VPC. Each /20 block provides 4,096 IP addresses.
Store your PowerSync configuration and connection strings securely in AWS Secrets Manager. This allows you to reference them in your ECS task definition without hardcoding sensitive information.
```bash
# Store config (base64-encoded, as required by the POWERSYNC_CONFIG_B64 env variable)
aws secretsmanager create-secret \
  --name powersync/config \
  --secret-string "$(base64 -i service.yaml)"

# Store connection strings
# Set your source database connection string (e.g., PostgreSQL, MongoDB, MySQL, or SQL Server)
aws secretsmanager create-secret \
  --name powersync/data-source-uri \
  --secret-string "postgresql://user:pass@host:5432/db"

# Set your bucket storage connection string (e.g., MongoDB or Postgres)
aws secretsmanager create-secret \
  --name powersync/storage-uri \
  --secret-string "mongodb://user:pass@host:27017/?replicaSet=rs0"

# Store the JWKS URL used to verify client JWTs
aws secretsmanager create-secret \
  --name powersync/jwks-url \
  --secret-string "https://your-auth-provider.com/.well-known/jwks.json"
```
AWS Secrets Manager automatically appends a 6-character suffix to secret ARNs (e.g., powersync/config-AbCdEf). ECS task definitions support prefix matching, allowing you to reference secrets using just the base name:
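A hypothetical container-definition fragment illustrating this prefix form (the account ID is a placeholder; the POWERSYNC_CONFIG_B64 and PS_MONGO_URI variable names follow this guide, but check the PowerSync documentation for the full set of environment variables):

```bash
# Each valueFrom uses the secret's base name with no -AbCdEf suffix;
# ECS resolves the full ARN by prefix match. Account ID 123456789012 is a placeholder.
cat > secrets-fragment.json <<'EOF'
[
  {"name": "POWERSYNC_CONFIG_B64", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:powersync/config"},
  {"name": "PS_MONGO_URI", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:powersync/storage-uri"}
]
EOF
python3 -m json.tool secrets-fragment.json > /dev/null && echo "valid JSON"
```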
The task definitions below allocate 2 vCPU and 4GB memory per container. You can adjust resources based on your workload — see Deployment Architecture for scaling guidance (recommended baseline: 1 vCPU, 2GB memory). Note that AWS Fargate enforces specific CPU/memory combinations — for example, 2 vCPU (2048 CPU units) requires at least 4GB (4096 MiB) memory.
High Availability Setup
Basic Setup (Single Instance)
For production deployments, run separate replication and API processes to enable zero-downtime rolling updates. This allows independent scaling of API containers.

Create Replication Task Definition
Create API Task Definition

The API task definition includes a CloudWatch Agent sidecar that scrapes Prometheus metrics from the PowerSync container and publishes them to CloudWatch. This enables connection-based auto-scaling.
The CloudWatch Agent sidecar adds ~256MB memory overhead. The task definition below allocates 4096MB total (shared between both containers). If you need more headroom, increase the task memory to 5120MB or 6144MB.
First, create the CloudWatch Agent configuration. This tells the agent to scrape the PowerSync Prometheus endpoint on localhost:9090 and publish the powersync_concurrent_connections metric to CloudWatch:
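One possible shape for that configuration, written as two files (the EMF processor fields and the env:PROMETHEUS_CONFIG_CONTENT convention follow the CloudWatch Agent's ECS Prometheus setup, but treat the exact values as assumptions to validate against the agent's documentation):

```bash
# Agent config: scrape Prometheus targets and publish the concurrent-connections
# metric under the PowerSync namespace via the EMF processor.
cat > cwagent-config.json <<'EOF'
{
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "prometheus_config_path": "env:PROMETHEUS_CONFIG_CONTENT",
        "emf_processor": {
          "metric_namespace": "PowerSync",
          "metric_declaration": [
            {
              "source_labels": ["job"],
              "label_matcher": "^powersync$",
              "dimensions": [["ClusterName", "TaskDefinitionFamily"]],
              "metric_selectors": ["^powersync_concurrent_connections$"]
            }
          ]
        }
      }
    }
  }
}
EOF

# Matching scrape config: poll the PowerSync metrics endpoint on localhost:9090
cat > prometheus.yaml <<'EOF'
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: powersync
    static_configs:
      - targets: ["localhost:9090"]
EOF

# Store the agent config where the sidecar can read it, e.g.:
# aws ssm put-parameter --name /ecs/powersync/cwagent-config \
#   --type String --value file://cwagent-config.json --overwrite
python3 -m json.tool cwagent-config.json > /dev/null && echo "agent config is valid JSON"
```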
The Prometheus port (9090) is not exposed through the ALB — it is only accessible within the task via localhost (ECS awsvpc networking). The CloudWatch Agent sidecar scrapes metrics locally every 30 seconds and publishes them to CloudWatch.
This basic setup runs both replication and API processes in the same container. This is not recommended for production.

Generate the task definition using your environment variables:
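A minimal sketch of what that generated task definition might contain (the role name, log group, and container port are illustrative assumptions; the secret ARN uses the base-name prefix form from the Secrets Manager step):

```bash
# Sketch of a combined task definition: one container running both processes.
# Execution role, log group, and port 8080 are illustrative -- adjust to your setup.
cat > powersync-task.json <<EOF
{
  "family": "powersync",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "4096",
  "executionRoleArn": "arn:aws:iam::${AWS_ACCOUNT_ID}:role/powersync-execution-role",
  "containerDefinitions": [
    {
      "name": "powersync",
      "image": "journeyapps/powersync-service:${PS_VERSION}",
      "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
      "secrets": [
        {"name": "POWERSYNC_CONFIG_B64", "valueFrom": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:powersync/config"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/powersync",
          "awslogs-region": "${AWS_REGION}",
          "awslogs-stream-prefix": "powersync"
        }
      }
    }
  ]
}
EOF

# Register it once you've reviewed the JSON:
# aws ecs register-task-definition --cli-input-json file://powersync-task.json
python3 -m json.tool powersync-task.json > /dev/null && echo "task definition is valid JSON"
```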
For production deployments, run separate replication and API processes to enable zero-downtime rolling updates. This allows independent scaling of API containers.

Deploy Replication Service (1 Instance)
```bash
# Check replication service status
aws ecs describe-services \
  --cluster powersync-cluster \
  --services powersync-replication \
  --query 'services[0].[serviceName,status,runningCount,desiredCount]' \
  --output table

# Check API service status
aws ecs describe-services \
  --cluster powersync-cluster \
  --services powersync-api \
  --query 'services[0].[serviceName,status,runningCount,desiredCount]' \
  --output table

# Wait for tasks to be running (takes 2-3 minutes)
echo "Waiting for tasks to start..."
sleep 60

# Test endpoint (replace with your domain)
curl https://$POWERSYNC_DOMAIN/probes/liveness

# View API logs
aws logs tail /ecs/powersync-api --follow

# View replication logs
aws logs tail /ecs/powersync-replication --follow
```
This basic setup runs both replication and API processes in the same container. Running multiple instances (desired-count > 1) will cause Sync Rule lock errors during rolling updates when deploying new task definitions. A single-instance setup is not recommended for production.
PowerSync API containers are limited to 200 concurrent connections each, with a recommended target of 100 connections or less per container (see Deployment Architecture). Because PowerSync sync connections are long-lived (hours or days), CPU utilization alone may not reflect the actual connection load — a container can be near its connection limit while CPU remains relatively low. For this reason, we recommend scaling on both CPU utilization and concurrent connections.
ALB metrics are not suitable for PowerSync scaling. Metrics like ALBRequestCountPerTarget track request rate (requests per second), but PowerSync sync connections are long-lived HTTP streams or WebSockets — a single request stays open for hours or days. Similarly, ActiveConnectionCount tracks total connections across the entire ALB, not per target. Use the powersync_concurrent_connections Prometheus metric instead.
Prometheus metrics enabled in your service.yaml (see Step 1):
```yaml
telemetry:
  prometheus_port: 9090
```
CloudWatch Agent sidecar deployed in the API task definition (configured in Step 6). The sidecar scrapes the powersync_concurrent_connections metric from the PowerSync Prometheus endpoint and publishes it to CloudWatch under the PowerSync namespace.
IAM permissions for the task role to publish CloudWatch metrics (configured in Step 6).
min-capacity: We recommend at least 2 for high availability, ensuring your service stays available if one task fails. Auto-scaling handles load increases from there.
max-capacity: Set this to the upper bound of tasks you want auto-scaling to provision.
Choosing your minimum capacity: A minimum of 2 works well for most workloads, letting auto-scaling adjust capacity as needed. However, if your traffic is very spiky (e.g., many users connecting simultaneously at a predictable time), you may want a higher min-capacity to avoid waiting for new tasks to start. New Fargate tasks take 1-3 minutes to launch and pass health checks, so a larger baseline reduces the risk of connection overload during sudden spikes. As a guideline, each API task handles up to 200 concurrent connections (target ~100 for headroom).
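Registering the scalable target with these bounds might look like the following sketch (cluster and service names follow the earlier examples; a max of 10 tasks is an arbitrary upper bound to adjust for your workload):

```bash
# Register the API service as a scalable target with min/max task bounds
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/powersync-cluster/powersync-api \
  --min-capacity 2 \
  --max-capacity 10
```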
This policy scales based on the average number of concurrent sync connections per task, using the custom metric published by the CloudWatch Agent sidecar:
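A sketch of such a target-tracking policy (the metric dimensions depend on how your CloudWatch Agent configuration publishes the metric; verify them with aws cloudwatch list-metrics --namespace PowerSync before applying):

```bash
# Target-tracking policy on the custom concurrent-connections metric.
# Dimension names/values are assumptions -- confirm against list-metrics output.
cat > connection-scaling-policy.json <<'EOF'
{
  "TargetValue": 80,
  "ScaleOutCooldown": 120,
  "ScaleInCooldown": 300,
  "CustomizedMetricSpecification": {
    "MetricName": "powersync_concurrent_connections",
    "Namespace": "PowerSync",
    "Dimensions": [
      {"Name": "ClusterName", "Value": "powersync-cluster"},
      {"Name": "TaskDefinitionFamily", "Value": "powersync-api"}
    ],
    "Statistic": "Average"
  }
}
EOF

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/powersync-cluster/powersync-api \
  --policy-name powersync-connection-scaling \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://connection-scaling-policy.json
```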
How dual policies work: Both policies operate independently — ECS scales to whichever policy demands the higher number of tasks. For example, if CPU-based scaling wants 3 tasks but connection-based scaling wants 5, ECS runs 5 tasks.
Key configuration values:
| Parameter | Value | Rationale |
| --- | --- | --- |
| TargetValue (connections) | 80 | 40% of the 200 max connection limit per container. This matches PowerSync Cloud's scaling strategy and provides headroom before the hard limit. |
| TargetValue (CPU) | 70.0 | Scale before CPU saturation impacts sync stream performance. |
| ScaleOutCooldown | 120s | New Fargate tasks take 1-3 minutes to start, pass health checks, and begin accepting connections. A shorter cooldown risks triggering multiple scale-out events before the first new task is ready. |
| ScaleInCooldown | 300s | Prevents rapid scale-in oscillations. When a task is removed, its clients reconnect to remaining tasks, causing a temporary connection spike. The cooldown allows this spike to settle. |
Scaling in (removing tasks) terminates active sync connections on the affected tasks. PowerSync client SDKs handle reconnection automatically, but there will be a brief interruption for affected clients.

What happens during scale-in:

1. ECS deregisters the task from the ALB target group; new connections are routed to other tasks
2. The ALB deregistration delay allows existing connections to drain (default: 300s). Since sync streams never complete naturally, connections are forcefully closed after this timeout.
3. ECS sends SIGTERM to the container; PowerSync closes all active sync streams gracefully
4. After the stopTimeout period (configured to 120s in the task definition), ECS sends SIGKILL
5. Disconnected clients automatically reconnect to remaining healthy tasks
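Since sync streams never drain on their own, you may want a shorter drain window than the 300s default so scale-in completes faster; a sketch (the target group ARN variable is a placeholder for your own value):

```bash
# TARGET_GROUP_ARN is a placeholder for your ALB target group's ARN
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --attributes Key=deregistration_delay.timeout_seconds,Value=120
```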
This approach is simpler but less responsive to connection spikes — CPU may not increase proportionally with new sync connections. Without connection-aware scaling, consider increasing min-capacity if your traffic is spiky, to provide a larger baseline while auto-scaling reacts.
CIDR conflict error when creating subnets

- Adjust CIDR blocks in Step 2 to match available VPC address space

Certificate validation fails

- Verify DNS nameservers are updated and propagated
- Check that the validation CNAME record exists in Route 53

CloudWatch metric not appearing

- Verify telemetry.prometheus_port: 9090 is set in service.yaml
- Check CW Agent logs: aws logs tail /ecs/powersync-api/cwagent --follow
- Confirm the SSM parameter exists: aws ssm get-parameter --name /ecs/powersync/cwagent-config

Connection-based scaling not triggering

- Verify the metric exists in CloudWatch: aws cloudwatch list-metrics --namespace PowerSync
- Check the scaling policy: aws application-autoscaling describe-scaling-policies --service-namespace ecs
- The metric may take 2-3 minutes to appear after task startup

Clients disconnecting during scale-in

- This is expected behavior: sync connections on terminated tasks are closed and clients reconnect automatically
- Increase deregistration_delay.timeout_seconds on the target group for a longer drain period