Bucket System
The concept of buckets is core to PowerSync and its scalability. Buckets are partitions of data that allow the PowerSync Service to efficiently query the data that a specific client needs to sync. When you define Sync Rules, you define the buckets, and you define which parameters are used for each bucket.
Sync Streams: Implicit Buckets. In our new Sync Streams system, which is in early alpha, buckets and parameters are not explicitly defined. They are instead implicit, based on the streams, their queries and subqueries.
For example, consider a bucket named user_todo_lists that contains the to-do lists for a user, and that uses a user_id parameter (which will be embedded in the JWT) to scope those to-do lists.
Now let’s say users with IDs 1, 2 and 3 exist in the source database. PowerSync will then replicate data from the source database and create individual buckets with bucket IDs of user_todo_lists["1"], user_todo_lists["2"] and user_todo_lists["3"].
If a user with user_id=1 in its JWT connects to the PowerSync Service and syncs data, PowerSync can very efficiently look up the appropriate bucket to sync, i.e. user_todo_lists["1"].
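The lookup described here can be sketched in a few lines of Python. This is only an illustration of the partitioning idea, not the PowerSync Service's implementation: the rows, the `bucket_id` and `buckets_for` helpers, and the plain dictionary standing in for verified JWT claims are all assumptions made for the example.

```python
import json
from collections import defaultdict

def bucket_id(definition, params):
    # Definition name followed by the parameter values as a compact JSON array,
    # which reproduces IDs of the form user_todo_lists["1"].
    return definition + json.dumps(params, separators=(",", ":"))

# Hypothetical rows replicated from the source database.
todo_lists = [
    {"id": "a", "user_id": "1", "name": "Groceries"},
    {"id": "b", "user_id": "2", "name": "Work"},
    {"id": "c", "user_id": "1", "name": "Travel"},
    {"id": "d", "user_id": "3", "name": "Books"},
]

# Partition the rows into buckets keyed by bucket ID.
buckets = defaultdict(list)
for row in todo_lists:
    buckets[bucket_id("user_todo_lists", [row["user_id"]])].append(row)

def buckets_for(jwt_claims):
    # A connecting client only needs the bucket that matches its user_id claim.
    return [bucket_id("user_todo_lists", [jwt_claims["user_id"]])]

# Simulate user 1 connecting: one dictionary lookup finds the data to sync.
ids = buckets_for({"user_id": "1"})
rows = [row["name"] for b in ids for row in buckets[b]]
print(ids, rows)
```

Because the data is already grouped by bucket ID, serving a client does not require scanning the rows of other users.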
As you can see above, a bucket’s definition name and set of parameter values together form its bucket ID, for example user_todo_lists["1"]. If a bucket makes use of multiple parameters, they are comma-separated in the bucket ID, for example user_todos["user1","admin"].
Deduplication for Scalability
The bucket system also allows for high scalability because it deduplicates data that is shared between different users. For example, let’s pretend that instead of user_todo_lists, we have org_todo_lists buckets, each containing the to-do lists for an organization, and we use an organization_id parameter from the JWT for this bucket. Now let’s pretend that users with IDs 1 and 2 both belong to an organization with an ID of 1. In this scenario, both users 1 and 2 will sync from a bucket with a bucket ID of org_todo_lists["1"].
This also means that the PowerSync Service has to keep track of less state per user, and therefore server-side resource requirements don’t scale linearly with the number of users/clients.
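A small Python sketch can make the deduplication effect concrete. It assumes hypothetical JWT payloads carrying an organization_id claim. Users 1 and 2 in organization 1 follow the scenario above, while user 3 and organization 2 are added purely for illustration, and `bucket_id` is a helper invented for this example.

```python
import json

def bucket_id(definition, params):
    # Definition name plus the parameter values as a compact JSON array.
    return definition + json.dumps(params, separators=(",", ":"))

# Hypothetical JWT payloads. User 3 and organization 2 are assumptions.
users = [
    {"user_id": "1", "organization_id": "1"},
    {"user_id": "2", "organization_id": "1"},
    {"user_id": "3", "organization_id": "2"},
]

# Each user resolves to a bucket based on the organization, not the user.
user_to_bucket = {
    u["user_id"]: bucket_id("org_todo_lists", [u["organization_id"]])
    for u in users
}

# Shared data is stored once per bucket, however many users sync from it.
distinct_buckets = set(user_to_bucket.values())
print(len(users), "users share", len(distinct_buckets), "buckets")
```

In this sketch the number of buckets grows with the number of organizations rather than the number of users, which is the property the paragraph above relies on.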
Operation History
Each bucket stores the recent history of operations on each row, not just the latest state of the row. This is another core part of the PowerSync architecture: the PowerSync Service can efficiently query the operations that each client needs to receive in order to be up to date. Tracking of operation history is also key to the data integrity and consistency properties of PowerSync.
When a change occurs in the source database that affects a certain bucket (based on the Sync Rules or Sync Streams configuration), that change is appended to the operation history in that bucket. Buckets are therefore treated as “append-only” data structures. That said, to avoid an ever-growing operation history, buckets can be compacted (this is done automatically on PowerSync Cloud).
Bucket Storage
The PowerSync Service persists the bucket state in durable storage: there is a pluggable storage layer for bucket data, and MongoDB and Postgres are currently supported. We refer to this as the bucket storage database, and it is separate from the connection to your source database (Postgres, MongoDB, MySQL or SQL Server). Our cloud-hosted offering (PowerSync Cloud) uses MongoDB Atlas as the bucket storage database.
Persisting the bucket state in a database is also part of how PowerSync achieves high scalability: it means that the PowerSync Service can have a low memory footprint even as you scale to very large volumes of synced data and users/clients.
Replication From the Source Database
As mentioned above, one of the primary purposes of the PowerSync Service is replicating data from the source database, based on the Sync Rules or Sync Streams configuration. When replicating data, the PowerSync Service:
- Pre-processes the data according to the Sync Rules or Sync Streams, splitting data into buckets (as explained above) and transforming the data if required.
- Persists each operation into the relevant buckets, ready to be streamed to clients.
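The two steps above can be sketched as a small in-memory model in Python. Everything here is an assumption made for illustration: `buckets_for_row` stands in for evaluating the Sync Rules or Sync Streams, `transform` is a made-up transformation that drops a hypothetical internal_notes column, the "PUT" and "REMOVE" labels are illustrative operation names, and a dictionary of lists replaces the durable bucket storage database.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass(frozen=True)
class Operation:
    op_id: int            # increasing ID, so clients can resume from a known point
    op: str               # "PUT" or "REMOVE" (illustrative labels)
    table: str
    row_id: str
    data: Optional[dict]  # row contents for PUT, None for REMOVE

_next_op_id = count(1)
bucket_history = defaultdict(list)  # bucket ID -> append-only operation history

def buckets_for_row(table, row):
    # Stand-in for evaluating the Sync Rules: to-do lists are scoped by user_id.
    if table == "todo_lists":
        return [f'user_todo_lists["{row["user_id"]}"]']
    return []

def transform(row):
    # Hypothetical transformation: drop a column that clients should not receive.
    return {k: v for k, v in row.items() if k != "internal_notes"}

def replicate_change(kind, table, row):
    # Append the change to the history of every bucket it affects.
    for b in buckets_for_row(table, row):
        data = transform(row) if kind == "PUT" else None
        bucket_history[b].append(
            Operation(next(_next_op_id), kind, table, row["id"], data)
        )

replicate_change("PUT", "todo_lists",
                 {"id": "a", "user_id": "1", "name": "Groceries", "internal_notes": "x"})
replicate_change("PUT", "todo_lists",
                 {"id": "a", "user_id": "1", "name": "Groceries and more", "internal_notes": "x"})
replicate_change("REMOVE", "todo_lists", {"id": "a", "user_id": "1"})

history = bucket_history['user_todo_lists["1"]']
print([(o.op_id, o.op) for o in history])
```

Note that the history keeps both PUT operations and the REMOVE for the same row. Nothing is updated in place, which is what makes the bucket append-only in this sketch.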
Initial Replication vs. Incremental Replication
Whenever a new version of Sync Rules or Sync Streams is deployed, initial replication takes place by taking a snapshot of all tables/collections referenced in the Sync Rules or Sync Streams. After that, data is incrementally replicated using a change data capture stream (the specific mechanism depends on the source database type: Postgres logical replication, MongoDB change streams, the MySQL binlog, or SQL Server Change Data Capture).
Streaming Sync
As mentioned above, the other primary purpose of the PowerSync Service is streaming data to clients. The PowerSync Service authenticates clients/users using JWTs. Once a client/user is authenticated:
- The PowerSync Service calculates a list of buckets for the user to sync using Parameter Queries.
- The Service streams any operations added to those buckets since the last time the client/user connected.
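The two steps above can be sketched as follows. This is a simplified Python model, not the actual sync protocol: the bucket contents are made-up data, `buckets_for` is a hypothetical stand-in for evaluating Parameter Queries against already-verified JWT claims, and a single `last_op_id` number is assumed to represent what the client has already received.

```python
# Made-up bucket histories: bucket ID -> append-only list of (op_id, operation).
bucket_history = {
    'user_todo_lists["1"]': [(1, "PUT a"), (4, "PUT c"), (6, "REMOVE a")],
    'user_todo_lists["2"]': [(2, "PUT b"), (5, "PUT b")],
}

def buckets_for(claims):
    # Stand-in for Parameter Queries: one bucket per user, chosen by user_id.
    return [f'user_todo_lists["{claims["user_id"]}"]']

def operations_since(claims, last_op_id):
    # Yield only the operations the client has not received yet.
    for b in buckets_for(claims):
        for op_id, op in bucket_history.get(b, []):
            if op_id > last_op_id:
                yield b, op_id, op

# A client that previously synced up to op 4 only receives what came after it.
catch_up = list(operations_since({"user_id": "1"}, last_op_id=4))
# A client connecting for the first time receives the full bucket history.
first_sync = list(operations_since({"user_id": "1"}, last_op_id=0))
print(catch_up)
print(len(first_sync))
```

Because each bucket's history is ordered and append-only in this model, resuming a client reduces to filtering on the last operation ID it has seen.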