# PowerSync Protocol

> Technical overview of the sync protocol between PowerSync clients and the Service.

This page gives a broad overview of the sync protocol used between PowerSync clients and the [PowerSync Service](/architecture/powersync-service).
For details, see the implementation in the various PowerSync Client SDKs.

## Design

The PowerSync protocol is designed to efficiently sync changes to clients, while maintaining [consistency](/architecture/consistency) and integrity of data.

The same process is used for:

* Downloading the initial set of data
* Bulk downloading changes after being offline for a while
* Incrementally streaming changes while connected

## Concepts

### Buckets

All synced data is grouped into [buckets](/architecture/powersync-service#bucket-system). A bucket represents a collection of synced rows, synced to any number of users.

[Buckets](/architecture/powersync-service#bucket-system) are a core concept that allows PowerSync to efficiently scale to tens of thousands of concurrent clients per PowerSync Service instance, and to incrementally sync changes to hundreds of thousands of rows (or even [a million or more](/resources/performance-and-limits#sync-powersync-service-→-client)) to each client.

Each bucket keeps an ordered list of changes to rows within the bucket (operation history) — generally as `PUT` or `REMOVE` operations.

* `PUT` is the equivalent of `INSERT OR REPLACE`
* `REMOVE` is slightly different from `DELETE`: a row is only deleted from the client if it has been removed from <Tooltip tip="It is possible for different buckets to include overlapping data (for example, if multiple buckets contain data from the same table). So the same row may be present in more than one bucket.">*all* buckets</Tooltip> synced to the client.
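The difference between `REMOVE` and `DELETE` can be illustrated with a minimal sketch. The `LocalStore` class and its fields are hypothetical and only model the rule described above; they are not the actual client SDK implementation.

```python
from collections import defaultdict


class LocalStore:
    """Hypothetical client-side model: tracks which buckets contain each row."""

    def __init__(self):
        self.rows = {}                      # row_id -> latest row data
        self.membership = defaultdict(set)  # row_id -> buckets that contain the row

    def apply(self, bucket, op, row_id, data=None):
        if op == "PUT":
            # Equivalent to INSERT OR REPLACE: the latest data wins.
            self.rows[row_id] = data
            self.membership[row_id].add(bucket)
        elif op == "REMOVE":
            self.membership[row_id].discard(bucket)
            # Delete the row only once no synced bucket contains it any more.
            if not self.membership[row_id]:
                self.membership.pop(row_id, None)
                self.rows.pop(row_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")


store = LocalStore()
store.apply("lists_a", "PUT", "row1", {"name": "Groceries"})
store.apply("shared", "PUT", "row1", {"name": "Groceries"})

store.apply("lists_a", "REMOVE", "row1")
row_still_present = "row1" in store.rows   # the row is still in the "shared" bucket

store.apply("shared", "REMOVE", "row1")
row_deleted = "row1" not in store.rows     # removed from all buckets, so deleted
```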

<Note>
  As a practical example of how buckets manifest themselves, let's say you have a bucket named `user_todo_lists` that contains the to-do lists for a user, and that bucket utilizes a `user_id` parameter (which will be obtained from the JWT). Now let's say users with IDs `A` and `B` exist in the source database. PowerSync will then replicate data from the source database and create individual buckets with bucket IDs `user_todo_lists["A"]` and `user_todo_lists["B"]`.

  As you can see, buckets are essentially scoped by their parameters (`A` and `B` in this example), so they are always synced as a whole. For user `A` to receive only their relevant to-do lists, they would sync the entire contents of the bucket `user_todo_lists["A"]`.
</Note>
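The bucket IDs in the note above can be reproduced with a short sketch. It assumes that a bucket ID is the bucket name followed by the parameter values rendered as a compact JSON array; the `bucket_id` helper is hypothetical, and the exact formatting rules used by the PowerSync Service may differ.

```python
import json


def bucket_id(bucket_name, parameters):
    """Build an ID such as user_todo_lists["A"] from a name and parameter values."""
    # Assumption: parameter values are serialized as a compact JSON array.
    return bucket_name + json.dumps(list(parameters), separators=(",", ":"))


user_ids = ["A", "B"]  # users present in the source database
bucket_ids = [bucket_id("user_todo_lists", [user_id]) for user_id in user_ids]
```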

### Checkpoints

A checkpoint is a sequential ID that represents a single point in time for consistency purposes. This is further explained in [Consistency](/architecture/consistency).

### Checksums for Verifying Data Integrity

For any checkpoint, the client and <Tooltip tip="PowerSync Service">server</Tooltip> compute a per-bucket checksum. This is essentially the sum of the checksums of the individual operations within the bucket, with each individual checksum being a hash of the operation data.

The checksum helps to ensure that the client has all the correct data. In the hypothetical scenario where the bucket data becomes corrupted on the PowerSync Service, the checksums will stop matching, and the client will re-download the entire bucket.

<Note>Checksums are not a cryptographically secure method to verify data integrity. Rather, they are designed to detect simple data mismatches, whether due to bugs, bucket data tampering, or other corruption issues.</Note>
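As a rough illustration of an additive per-bucket checksum, the sketch below hashes each operation and sums the results. CRC32 and the 32-bit wrap-around are stand-in assumptions, since this page does not specify the hash function or checksum width; the function names are hypothetical.

```python
import json
import zlib

MASK_32 = 0xFFFFFFFF  # assumed checksum width


def op_checksum(op):
    """Hash one operation's data. CRC32 is a stand-in for the unspecified real hash."""
    payload = json.dumps(op, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return zlib.crc32(payload) & MASK_32


def bucket_checksum(ops):
    """Sum the per-operation checksums, wrapping at 32 bits."""
    total = 0
    for op in ops:
        total = (total + op_checksum(op)) & MASK_32
    return total


def extend_checksum(previous, new_ops):
    """Update a known checksum with newly received operations only."""
    return (previous + bucket_checksum(new_ops)) & MASK_32


ops = [
    {"op": "PUT", "row_id": "t1", "data": {"name": "a"}},
    {"op": "PUT", "row_id": "t2", "data": {"name": "b"}},
    {"op": "REMOVE", "row_id": "t1"},
]
full = bucket_checksum(ops)
incremental = extend_checksum(bucket_checksum(ops[:1]), ops[1:])
```

Because the checksum is a sum, a client that already holds part of a bucket can extend its checksum with the new operations alone instead of rehashing the whole bucket.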

### Compacting

To prevent buckets from growing indefinitely, the operation history of a bucket can be [compacted](/maintenance-ops/compacting-buckets). Stale updates are replaced with marker entries, which can be merged together, while keeping the same checksums.
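The idea of replacing stale updates with mergeable markers while preserving the checksum can be sketched as follows. This is a simplified model under stated assumptions: the `MARKER` operation name, the dictionary layout, and the 32-bit wrap are hypothetical, and the real compaction logic is more involved.

```python
MASK_32 = 0xFFFFFFFF  # assumed checksum width


def compact(ops):
    """Replace superseded operations with merged marker entries.

    Each op is a dict with "op_id", "op", "row_id" and "checksum" keys.
    """
    # The last operation for each row is the only one a client still needs.
    latest = {op["row_id"]: op["op_id"] for op in ops}

    compacted = []
    for op in ops:
        if latest[op["row_id"]] == op["op_id"]:
            compacted.append(op)
            continue
        # Stale operation: keep only its checksum contribution in a marker.
        marker = {"op_id": op["op_id"], "op": "MARKER", "checksum": op["checksum"]}
        if compacted and compacted[-1]["op"] == "MARKER":
            # Merge adjacent markers by adding their checksums.
            previous = compacted.pop()
            marker["checksum"] = (previous["checksum"] + op["checksum"]) & MASK_32
        compacted.append(marker)
    return compacted


def total_checksum(ops):
    return sum(op["checksum"] for op in ops) & MASK_32


history = [
    {"op_id": 1, "op": "PUT", "row_id": "r1", "checksum": 10},
    {"op_id": 2, "op": "PUT", "row_id": "r2", "checksum": 20},
    {"op_id": 3, "op": "PUT", "row_id": "r1", "checksum": 30},
    {"op_id": 4, "op": "REMOVE", "row_id": "r2", "checksum": 40},
    {"op_id": 5, "op": "PUT", "row_id": "r3", "checksum": 50},
]
compacted_history = compact(history)
```

In this example, operations 1 and 2 are superseded and collapse into a single marker, so the history shrinks from five entries to four while the bucket checksum stays the same.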

## Protocol

A client initiates a sync session using:

1. A JWT that typically contains the `user_id`, and optionally additional parameters.
2. A list of current buckets that the client has, and the latest operation ID in each.

The <Tooltip tip="PowerSync Service">server</Tooltip> then responds with a stream of:

1. **Checkpoint available**: A new checkpoint ID, with a checksum for each bucket in the checkpoint.
2. **Data**: New operations for the above checkpoint for each relevant bucket, starting from the last operation ID as sent by the client.
3. **Checkpoint complete**: Sent once all data for the checkpoint has been sent.

The server then waits until a new checkpoint is available, then repeats the above sequence.

The stream can be interrupted at any time, at which point the client will initiate a new session, resuming from the last point.

If a checksum validation fails on the client, the client will delete the bucket and start a new sync session.
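The message sequence above can be modeled with a small client-side sketch. The message shapes (`type`, `checkpoint_id`, `checksums`, `ops`), the `SyncClient` class, and the integer checksums are hypothetical simplifications for illustration, not the wire format used by the PowerSync SDKs.

```python
MASK_32 = 0xFFFFFFFF  # assumed checksum width


def bucket_checksum(ops):
    """Additive checksum over the per-operation checksums."""
    return sum(op["checksum"] for op in ops) & MASK_32


class SyncClient:
    def __init__(self):
        self.buckets = {}    # bucket ID -> list of received operations
        self.pending = None  # checkpoint currently being downloaded
        self.last_synced_checkpoint = None

    def request_state(self):
        """Sent when opening a session: each bucket and its latest operation ID."""
        return {name: (ops[-1]["op_id"] if ops else 0)
                for name, ops in self.buckets.items()}

    def handle(self, message):
        kind = message["type"]
        if kind == "checkpoint":
            self.pending = message
        elif kind == "data":
            self.buckets.setdefault(message["bucket"], []).extend(message["ops"])
        elif kind == "checkpoint_complete":
            if self.pending is None:
                raise RuntimeError("checkpoint_complete without a checkpoint")
            failed = [name for name, expected in self.pending["checksums"].items()
                      if bucket_checksum(self.buckets.get(name, [])) != expected]
            self.pending, checkpoint = None, self.pending
            if failed:
                # Checksum mismatch: delete the bucket and start a new session.
                for name in failed:
                    self.buckets.pop(name, None)
                return "restart"
            self.last_synced_checkpoint = checkpoint["checkpoint_id"]
            return "consistent"
        return "continue"


client = SyncClient()
session = [
    {"type": "checkpoint", "checkpoint_id": 7,
     "checksums": {'user_todo_lists["A"]': 30}},
    {"type": "data", "bucket": 'user_todo_lists["A"]', "ops": [
        {"op_id": 1, "op": "PUT", "row_id": "t1", "checksum": 10},
        {"op_id": 2, "op": "PUT", "row_id": "t2", "checksum": 20},
    ]},
    {"type": "checkpoint_complete"},
]
results = [client.handle(message) for message in session]
```

After this session, `client.request_state()` reports operation ID 2 for the bucket, which is what a resumed session would send so that the server can continue from that point.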

Data for individual rows is represented [using JSON](/architecture/client-architecture#client-side-schema-and-sqlite-database-structure). The protocol itself is schemaless: the client is expected to use its own copy of the <Tooltip tip="The PowerSync Client SDK requires a client-side schema to be provided when instantiating the client-side managed SQLite database.">schema</Tooltip>, and to gracefully handle schema differences.

#### Write Checkpoints

Write checkpoints are used to ensure clients have synced their own mutations back before applying downloaded data locally.

Creating a write checkpoint is a separate operation, which is performed by the client after all mutations have been uploaded (i.e. the client's [upload queue](/architecture/client-architecture#writing-data-via-sqlite-database-and-upload-queue) has been fully processed and is empty). It is [important](/handling-writes/writing-client-changes#why-must-my-write-endpoint-be-synchronous) that this happens after the data has been written to the backend source database.

The server then keeps track of the current <Tooltip tip="Change data capture - the generic term for tracking deltas on a database">CDC</Tooltip> stream position on the database (LSN in Postgres and SQL Server, resume token in MongoDB, and GTID + binlog position in MySQL), and notifies the client when the data has been replicated, as part of the checkpoint data in the normal data stream.
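The gating role of write checkpoints can be expressed as a small decision function. This is a sketch under assumptions: write checkpoints are modeled as increasing integers, and the function and parameter names are hypothetical rather than part of any PowerSync SDK.

```python
def can_apply_checkpoint(pending_uploads, requested_write_checkpoint,
                         acknowledged_write_checkpoint):
    """Decide whether downloaded checkpoint data may be applied locally.

    pending_uploads: number of mutations still in the upload queue.
    requested_write_checkpoint: ID created by the client after its last upload,
        or None if the client has not written anything.
    acknowledged_write_checkpoint: latest write checkpoint reported by the
        server in the data stream, or None if none has been reported yet.
    """
    if pending_uploads > 0:
        return False  # local mutations have not been uploaded yet
    if requested_write_checkpoint is None:
        return True   # nothing was written locally, so nothing to wait for
    if acknowledged_write_checkpoint is None:
        return False  # the server has not replicated our writes yet
    # Assumption: IDs increase over time, so ">=" means "replicated".
    return acknowledged_write_checkpoint >= requested_write_checkpoint


still_uploading = can_apply_checkpoint(2, None, None)
waiting_for_replication = can_apply_checkpoint(0, 5, 4)
writes_synced_back = can_apply_checkpoint(0, 5, 5)
```

Holding back downloaded data until the acknowledged write checkpoint catches up keeps the client from briefly reverting its own mutations to older server state.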
