When we say “PowerSync instance” we are referring to an instance of the PowerSync Service, which is the server-side component of the sync engine responsible for the read path from the source database to client-side SQLite databases. The primary purposes of the PowerSync Service are (1) replicating data from your source database (Postgres, MongoDB, MySQL, SQL Server), and (2) streaming data to clients. Both of these happen based on your Sync Streams (or legacy Sync Rules).

Bucket System

The concept of buckets is core to PowerSync and its scalability. Buckets are basically partitions of data that allow the PowerSync Service to efficiently query the correct data that a specific client needs to sync.
With Sync Streams, buckets are created implicitly based on your stream definitions, their queries, and subqueries. You don’t need to understand or manage buckets directly; the PowerSync Service handles this automatically. For example, if you define a stream like:
streams:
  user_lists:
    auto_subscribe: true
    query: SELECT * FROM lists WHERE owner_id = auth.user_id()
PowerSync automatically creates the appropriate buckets internally based on the query parameters.

How Buckets Work

To understand how buckets enable efficient syncing, consider this example: Let’s say you have data scoped to users — the to-do lists for each user. Based on the data that exists in your source database, PowerSync will create individual buckets for each user. If users with IDs 1, 2, and 3 exist in your source database, PowerSync will create buckets with IDs user_todo_lists["1"], user_todo_lists["2"], and user_todo_lists["3"]. When a user with user_id=1 in their JWT connects to the PowerSync Service, PowerSync can very efficiently look up the appropriate bucket to sync, i.e. user_todo_lists["1"].
With legacy Sync Rules, a bucket ID is formed from the bucket definition name and its parameter values, for example user_todo_lists["1"]. With Sync Streams, the bucket IDs are generated automatically based on your stream queries — you don’t need to define and name buckets explicitly.
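The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the PowerSync Service's implementation: the helper names `bucket_id` and `bucket_for_jwt` are hypothetical, and encoding the parameter values as a compact JSON array is an assumption chosen to reproduce the user_todo_lists["1"] form shown above.

```python
import json

def bucket_id(definition_name: str, parameter_values: list) -> str:
    """Join a bucket definition name with its parameter values (assumed JSON array form)."""
    return definition_name + json.dumps(parameter_values, separators=(",", ":"))

# Hypothetical source data: users 1, 2 and 3 exist, so one bucket exists per user.
user_ids = ["1", "2", "3"]
buckets = {bucket_id("user_todo_lists", [uid]): [] for uid in user_ids}

def bucket_for_jwt(jwt_claims: dict) -> str:
    """Resolve the bucket to sync from the user_id claim of an already-verified JWT."""
    return bucket_id("user_todo_lists", [str(jwt_claims["user_id"])])

# A client whose JWT carries user_id=1 maps directly to a single bucket key.
print(bucket_for_jwt({"user_id": 1}))
```

Because the bucket ID is derived directly from the JWT parameter, finding the data for a client is a key lookup rather than a query over all rows.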

Deduplication for Scalability

The bucket system also allows for high scalability because it deduplicates data that is shared between different users. For example, suppose that instead of user_todo_lists we have org_todo_lists buckets, each containing the to-do lists for an organization, and we use an organization_id parameter from the JWT for this bucket. Now suppose that the users with IDs 1 and 2 both belong to an organization with an ID of 1. In this scenario, users 1 and 2 will both sync from a bucket with a bucket ID of org_todo_lists["1"]. This also means that the PowerSync Service has to keep track of less state per user, and therefore server-side resource requirements don’t scale linearly with the number of users/clients.
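As a rough illustration of the deduplication described above, the following Python sketch groups users by the organization bucket they sync from. The membership data and the `org_bucket` helper are hypothetical; the point is only that the number of buckets follows the number of organizations, not the number of users.

```python
from collections import defaultdict

# Hypothetical membership: users 1 and 2 belong to organization 1, user 3 to organization 2.
user_org = {"1": "1", "2": "1", "3": "2"}

def org_bucket(organization_id: str) -> str:
    """Bucket ID for an organization, following the org_todo_lists["1"] form used above."""
    return 'org_todo_lists["%s"]' % organization_id

# Map each bucket to the set of users that sync from it.
subscribers = defaultdict(set)
for user_id, organization_id in user_org.items():
    subscribers[org_bucket(organization_id)].add(user_id)

# Three users share two buckets; the shared data is stored once per bucket.
print(len(user_org), "users,", len(subscribers), "buckets")
```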

Operation History

Each bucket stores the recent history of operations on each row, not just the latest state of the row. This is another core part of the PowerSync architecture: the PowerSync Service can efficiently query the operations that each client needs to receive in order to be up to date. Tracking of operation history is also key to the data integrity and consistency properties of PowerSync. When a change occurs in the source database that affects a certain bucket (based on your Sync Streams, or legacy Sync Rules), that change is appended to the operation history in that bucket. Buckets are therefore treated as “append-only” data structures. That said, to avoid an ever-growing operation history, buckets can be compacted (this is done automatically on PowerSync Cloud).
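The append-only history can be pictured with a small Python model. Everything here is an assumption made for illustration: the `Operation` and `Bucket` classes, the PUT/REMOVE operation names, and especially `compact`, which simply keeps the latest operation per row and is not a description of how the PowerSync Service actually compacts buckets.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Operation:
    op_id: int                   # position in the bucket's history, always increasing
    op: str                      # "PUT" or "REMOVE" (names assumed for this sketch)
    row_id: str
    data: Optional[dict] = None

@dataclass
class Bucket:
    history: List[Operation] = field(default_factory=list)
    next_op_id: int = 1

    def append(self, op: str, row_id: str, data: Optional[dict] = None) -> Operation:
        """Record a change at the end of the history; earlier entries are never edited."""
        operation = Operation(self.next_op_id, op, row_id, data)
        self.history.append(operation)
        self.next_op_id += 1
        return operation

    def operations_after(self, last_op_id: int) -> List[Operation]:
        """Operations a client still needs if it has already applied everything up to last_op_id."""
        return [o for o in self.history if o.op_id > last_op_id]

    def compact(self) -> None:
        """Simplified compaction: keep only the most recent operation for each row."""
        latest = {}
        for operation in self.history:
            latest[operation.row_id] = operation
        self.history = sorted(latest.values(), key=lambda o: o.op_id)
```

In this model, a client that last saw operation 1 would call `operations_after(1)` to catch up, and compaction drops superseded operations without renumbering the ones that remain.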

Bucket Storage

The PowerSync Service persists the bucket state in durable storage: there is a pluggable storage layer for bucket data, and MongoDB and Postgres are currently supported as bucket storage databases. The bucket storage database is separate from the connection to your source database (Postgres, MongoDB, MySQL or SQL Server). Our cloud-hosting offering (PowerSync Cloud) uses MongoDB Atlas as the bucket storage database. Persisting the bucket state in a database is also part of how PowerSync achieves high scalability: it means that the PowerSync Service can have a low memory footprint even as you scale to very large volumes of synced data and users/clients.

Replication From the Source Database

As mentioned above, one of the primary purposes of the PowerSync Service is replicating data from the source database, based on your Sync Streams (or legacy Sync Rules).
When the PowerSync Service replicates data from the source database, it:
  1. Pre-processes the data according to your Sync Streams (or Sync Rules), splitting data into buckets (as explained above) and transforming the data if required.
  2. Persists each operation into the relevant buckets, ready to be streamed to clients.
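The two steps above can be sketched as a small Python pipeline. This is a simplified model under stated assumptions: `buckets_for_row` stands in for the evaluation of a sync definition, `transform` is an arbitrary example transformation, `internal_notes` is a made-up column, and a dictionary of lists stands in for durable bucket storage.

```python
from collections import defaultdict

def buckets_for_row(table: str, row: dict) -> list:
    """Hypothetical stand-in for a sync definition: decide which buckets a row belongs to."""
    if table == "lists":
        return ['user_todo_lists["%s"]' % row["owner_id"]]
    return []

def transform(row: dict) -> dict:
    """Example transformation: drop a column that should not reach clients."""
    return {key: value for key, value in row.items() if key != "internal_notes"}

# In-memory stand-in for bucket storage: bucket ID -> list of persisted operations.
bucket_storage = defaultdict(list)

def replicate_change(table: str, op: str, row: dict) -> None:
    """Step 1: split and transform the change. Step 2: persist it into each relevant bucket."""
    for bucket in buckets_for_row(table, row):
        bucket_storage[bucket].append({
            "op": op,
            "table": table,
            "row_id": row["id"],
            "data": transform(row) if op == "PUT" else None,
        })

replicate_change("lists", "PUT", {"id": "l1", "owner_id": "1", "name": "Groceries", "internal_notes": "x"})
```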

Initial Replication vs. Incremental Replication

Whenever a new version of Sync Streams (or legacy Sync Rules) is deployed, initial replication takes place by means of taking a snapshot of all tables/collections they reference. After that, data is incrementally replicated using a change data capture stream (the specific mechanism depends on the source database type: Postgres logical replication, MongoDB change streams, the MySQL binlog, or SQL Server Change Data Capture).

Streaming Sync

As mentioned above, the other primary purpose of the PowerSync Service is streaming data to clients. The PowerSync Service authenticates clients/users using JWTs. Once a client/user is authenticated:
  1. The PowerSync Service calculates a list of buckets for the user to sync based on their Sync Stream subscriptions (or Parameter Queries in legacy Sync Rules).
  2. The Service streams any operations added to those buckets since the last time the client/user connected.
The Service then continuously monitors for buckets that are added or removed, as well as for new operations within those buckets, and streams those changes. Only the internal bucket storage of the PowerSync Service is used — the source database is not queried directly during streaming. For more details on exactly how streaming sync works, see PowerSync Protocol.
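A minimal Python sketch of the two streaming steps follows. It assumes the JWT has already been verified, uses a plain dictionary as a stand-in for bucket storage, and tracks a single last-seen operation ID per client; `buckets_for_user` and `operations_to_stream` are hypothetical names, and the real protocol is more involved.

```python
# In-memory stand-in for the Service's bucket storage (sample data is made up).
bucket_storage = {
    'user_todo_lists["1"]': [
        {"op_id": 1, "op": "PUT", "row_id": "l1", "data": {"name": "Groceries"}},
        {"op_id": 2, "op": "PUT", "row_id": "l2", "data": {"name": "Chores"}},
    ],
    'user_todo_lists["2"]': [
        {"op_id": 3, "op": "PUT", "row_id": "l3", "data": {"name": "Work"}},
    ],
}

def buckets_for_user(jwt_claims: dict) -> list:
    """Step 1: calculate the list of buckets for the user from the JWT claims."""
    return ['user_todo_lists["%s"]' % jwt_claims["user_id"]]

def operations_to_stream(jwt_claims: dict, last_seen_op_id: int) -> list:
    """Step 2: collect operations added to those buckets since the client last connected."""
    pending = []
    for bucket in buckets_for_user(jwt_claims):
        for operation in bucket_storage.get(bucket, []):
            if operation["op_id"] > last_seen_op_id:
                pending.append((bucket, operation))
    return pending

# A client for user 1 that already has operation 1 only receives what came after it.
for bucket, operation in operations_to_stream({"user_id": "1"}, last_seen_op_id=1):
    print(bucket, operation["op_id"], operation["row_id"])
```

Note that the sketch reads only from `bucket_storage`, mirroring the point above that the source database is not queried during streaming.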

Source Code Repo

The repo for the PowerSync Service can be found on GitHub: powersync-service