Core Data with CloudKit: Synchronizing Public Database


This article will introduce how to synchronize a public database to the local environment, creating a local Core Data database mirror using Core Data with CloudKit.

Three Types of CloudKit Databases

Let’s explore the three types of databases in CloudKit:

Public Database

The public database contains data that developers want accessible to everyone. It’s not possible to add custom Zones in the public database; all data is saved in the default zone. The data is accessible through the app or CloudKit Web services regardless of whether the user has an iCloud account. The contents of the public database are visible on the CloudKit dashboard.

The data capacity of the public database is counted against the application’s CloudKit storage quota.

Private Database

This is where iCloud users store their personal data, which they don’t want to be publicly visible. Users can access this data only when logged into their iCloud account. By default, only the user themselves can access the content in their private database (although some content can be shared with other iCloud users). Users have full control over their data (create, view, change, delete). Data in the private database is invisible on the CloudKit dashboard and completely confidential to developers.

Developers can create custom zones in the private database for better organization and management of data.

The data capacity of the private database counts towards the user’s iCloud storage quota.

Shared Database

In the shared database, iCloud users see data projections shared with them by other iCloud users. These data still reside in others’ private databases. You don’t own this data and can view and modify it only if you have the necessary permissions. This database is only available if you are logged into your iCloud account.

For example, if you share a piece of data with someone, it remains in your private database, but the shared user can see this record in their shared database due to your authorization and can only operate according to the permissions you set.

Custom zones cannot be created in the shared database. Data in the shared database is not visible on the CloudKit dashboard.

Because shared records still reside in the owner’s private database, their storage counts towards the owner’s iCloud storage quota.

Same Terms, Different Meanings

In Syncing Local Database to iCloud Private Database, we discussed syncing the local database to the iCloud private database. In this article, we talk about syncing the public database to local. Although both articles discuss syncing, the inherent meaning and logic of these two syncs are different.

Syncing local data to the private database is essentially a standard Core Data project. From model design to code development, there’s no difference from developing a project that only supports local persistence. CloudKit merely acts as a bridge for syncing data to the user’s other devices. In most cases, developers can completely ignore the existence of the private database and CKRecord when using managed objects.

Syncing the public database to local is entirely different. The public database is a concept of a network database. The standard logic is for developers to create Record Type on the CloudKit dashboard, add CKRecord records to the public database via the dashboard or client, and access network data records through the server. Core Data with CloudKit makes it convenient to use our existing Core Data knowledge for this process. The data synced locally is a mirror of the server-side public database, and local manipulation of managed object data indirectly performs operations on server-side CKRecord records.

The authentication checks discussed below are called on managed objects or local persistent stores, but what they actually evaluate are the network-side records or databases.

Public Database vs Private Database

Let’s compare public and private databases across several dimensions.

Authentication

Without considering data sharing, only the user themselves (logged into their iCloud account) can access data in the private database. The user, as the data creator, has all operational permissions. The authentication rules for private databases are very simple:

[Image: authentication rules for the private database]

In the article iCloud Dashboard, we introduced the concept of security roles. The system creates three preset roles for the public database: World, Authenticated, and Creator. In the public database, authentication considers various factors like whether the user is logged into their iCloud account and whether they are the creator of the data record.

[Image: default permissions of the World, Authenticated, and Creator roles in the public database]

  • Any user can read records (regardless of whether they are logged in)
  • Any logged-in user can create records
  • Logged-in users can only modify or delete records they created

Using the standard CloudKit API to determine permissions involves extensive code and takes longer (each check requires a round trip to the server). Core Data with CloudKit avoids this cost by keeping a local copy of the CKRecord metadata and offering convenient APIs for developers.

We can use code like the following to determine whether a user has permission to modify or delete a given managed object (ManagedObject):

Swift
let container = PersistenceController.shared.container

if container.canUpdateRecord(forManagedObjectWith:item.objectID) {
    // Modify or delete item
}

In recent years, Apple has significantly enhanced NSPersistentCloudKitContainer, adding many important methods. These methods are not only applicable to the public database or its managed objects but also to other types of databases or data (private databases, local databases, shared data, etc.).

  • canUpdateRecord and canDeleteRecord

    Determine if you have permission to modify data. The following situations will return true:

    1. objectID is a temporary object identifier (meaning it has not yet been persisted).
    2. The persistent store containing the managed object does not use CloudKit (for local databases not used for syncing).
    3. The persistent store manages a private database (users have full permissions for private databases).
    4. The persistent store manages a public database, and the user is the creator of the record, or Core Data has not yet updated the managed object to iCloud.
    5. The persistent store manages a shared database, and the user has permission to change the data.

    In actual use, canDeleteRecord does not return accurate results; it is currently recommended to only use canUpdateRecord

    canUpdateRecord returning false does not mean you cannot delete data from local storage; it just means you do not have permission to modify the corresponding network record of that managed object.

  • canModifyManagedObjects(in:NSPersistentStore)

    Indicates whether you can modify a specific persistent store.

    Use this method to determine if users can write records to the CloudKit database. For example, when a user is not logged into their iCloud account, they cannot write to a persistent store that manages the public database.

    Similarly, canModifyManagedObjects returning false does not mean you cannot write data in the local sqlite file; it only means you do not have permission to modify the corresponding network storage of that persistent store.

Since there is no concept of permissions for local data and persistent storage, developers may write code that incorrectly operates locally despite lacking network-side permissions. This is particularly risky in projects syncing public or shared databases. If you modify or delete a data record without network-side permission, the network will reject your request, and Core Data with CloudKit will stop all subsequent syncing work. Thus, when writing projects syncing public or shared databases, you must ensure you have the corresponding permissions before operating on the data.
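The two checks above can be combined into a single guard that runs before every write. The following is a minimal sketch, not an official pattern: WritePermission and both writePermission functions are hypothetical names introduced here, and the helper is meant for objects that have already been saved (an object not yet assigned to a store is conservatively reported as not writable).

```swift
import CoreData

/// Outcome of the two permission checks described above (hypothetical type).
enum WritePermission: Equatable {
    case allowed
    case storeNotWritable   // e.g. the user is not signed in to iCloud
    case recordNotEditable  // e.g. the record was created by another user
}

/// Pure decision logic, kept separate so it can be tested without a store.
func writePermission(storeWritable: Bool, recordEditable: Bool) -> WritePermission {
    guard storeWritable else { return .storeNotWritable }
    return recordEditable ? .allowed : .recordNotEditable
}

/// Asks the container about the network-side permissions for `object`.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *)
func writePermission(for object: NSManagedObject,
                     in container: NSPersistentCloudKitContainer) -> WritePermission {
    // objectID.persistentStore is nil when the object has no store yet;
    // this sketch treats that case as "not writable".
    let storeWritable = object.objectID.persistentStore
        .map { container.canModifyManagedObjects(in: $0) } ?? false
    let recordEditable = container.canUpdateRecord(forManagedObjectWith: object.objectID)
    return writePermission(storeWritable: storeWritable, recordEditable: recordEditable)
}
```

A caller would then change or "delete" an object only when writePermission(for:in:) returns .allowed, and otherwise disable the corresponding UI.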

Synchronization Mechanism

From the perspective of export (syncing local data changes to the server), syncing either private or public databases behaves the same. Core Data with CloudKit will sync changes to the server immediately after local data changes. This is an instantaneous, one-way action.

From the import perspective (syncing server data changes to local), the mechanisms for private and public databases are completely different.

In the articles Basics and CloudKit Dashboard, we already introduced the syncing mechanism for private databases:

  • The client subscribes to CKDatabaseSubscription on the server.
  • The server sends silent remote notifications to the client after changes occur in the custom Zone of the private database.
  • The client requests change data from the server with CKFetchRecordZoneChangesOperation upon receiving the notification.
  • The server syncs the updated change data to the client after comparing tokens.

This process involves cooperation between both parties.

Due to some technical limitations of public databases, the above mechanism cannot apply to public database syncing.

  • Public databases cannot customize Zones.
  • Without custom Zones, you cannot subscribe to CKDatabaseSubscription.
  • CKFetchRecordZoneChangesOperation utilizes private database-exclusive technology; public databases can only use CKQueryOperation.
  • Public databases lack a tombstone mechanism and cannot record all user operations (deletions).

Because of these reasons, Core Data with CloudKit can only use a polling method (poll for changes) to obtain change data from public databases.

NSPersistentCloudKitContainer queries the public database for changes when the application launches and then every 30 minutes while it is running, retrieving the data through CKQuery. The import process is initiated by the client and answered by the server.

This sync mechanism limits applicable scenarios: only data not requiring high immediacy is suitable for storage in public databases.
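To make the polling idea concrete, here is a sketch written against the plain CloudKit API. It only illustrates what a poll-for-changes query can look like; it is not the query NSPersistentCloudKitContainer issues internally. makeChangesQuery and its parameters are hypothetical names.

```swift
import CloudKit

/// Builds a query for all records of `recordType` modified after `lastPoll`.
/// The client is responsible for remembering when it last polled.
func makeChangesQuery(recordType: String, since lastPoll: Date) -> CKQuery {
    // "modificationDate" is the system field CloudKit maintains on every record.
    let predicate = NSPredicate(format: "modificationDate > %@", lastPoll as NSDate)
    let query = CKQuery(recordType: recordType, predicate: predicate)
    // Oldest changes first, so they can be applied in order.
    query.sortDescriptors = [NSSortDescriptor(key: "modificationDate", ascending: true)]
    return query
}
```

Such a query would be run with a CKQueryOperation against the container's publicCloudDatabase. Note that it can only return records that still exist, which is why deletions never show up in the result.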

Data Model

Due to different sync mechanisms, consider the following when designing data models for public databases:

  • Complexity

    Public databases use CKQueryOperation to query server-side changes since the last query. Its efficiency is much lower than that of CKFetchRecordZoneChangesOperation. The fewer entities and attributes the Managed Object Model contains, the fewer requests are needed and the faster the import runs. Unless necessary, the complexity of the model for the public database should be minimized.

  • Tombstones

    Private databases immediately delete server-side records upon receiving client-sent record deletion operations and save a tombstone marker for the deletion. Other client devices receive change data (including tombstones) through CKFetchRecordZoneChangesOperation. The client deletes corresponding local data records based on tombstone instructions, ensuring data consistency.

    Public databases also delete server-side records immediately upon receiving record deletion operations. However, since public databases lack a tombstone mechanism, when other clients query for data changes, the public database can only inform client devices of new or changed records, unable to notify of deletions. This means we cannot transfer deletion operations from one device to another, causing discrepancies in local mirrors of public databases across devices.

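    When designing data models for public databases, we add an attribute that acts like a tombstone (e.g., isDelete) to avoid such discrepancies as much as possible. Avoid naming the attribute isDeleted, because NSManagedObject already defines a property with that name.

Swift
// When "deleting", set isDelete to true instead of removing the object
if container.canUpdateRecord(forManagedObjectWith: item.objectID) {
    item.isDelete = true
    do {
        try viewContext.save()
    } catch {
        print("Failed to save the soft delete: \(error)")
    }
}

When fetching data, only request records where isDelete is false.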

Swift
@FetchRequest(
        sortDescriptors: [NSSortDescriptor(keyPath: \Item.timestamp, ascending: true)],
        predicate: NSPredicate(format: "%K = false", #keyPath(Item.isDelete)),
        animation: .default
)
private var items: FetchedResults<Item>

Records are not truly deleted but merely hidden. Public databases can transfer record modification operations between devices. While ensuring data consistency across devices, this also “deletes” the data. However, “deleted” data still occupies space on both local and server sides, so choose when to clear this space judiciously.
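The difference between a real deletion and a soft deletion can be simulated without CloudKit at all. The sketch below is a toy in-memory model, not CloudKit code: ServerRecord, PublicDatabaseStub, and Mirror are hypothetical types that only mimic the "query changes since the last poll, no tombstones" behavior described above.

```swift
import Foundation

struct ServerRecord {
    let id: String
    var isDelete: Bool
    var modifiedAt: Date
}

/// Stand-in for a public database: it can answer "what changed since X?"
/// but keeps no tombstone for a record once it has been removed.
struct PublicDatabaseStub {
    private(set) var records: [String: ServerRecord] = [:]

    mutating func save(_ record: ServerRecord) { records[record.id] = record }
    mutating func hardDelete(id: String) { records[id] = nil }

    func changes(since date: Date) -> [ServerRecord] {
        records.values.filter { $0.modifiedAt > date }
    }
}

/// Stand-in for a local mirror that imports by polling.
struct Mirror {
    private(set) var records: [String: ServerRecord] = [:]
    private var lastPoll = Date.distantPast

    mutating func poll(_ server: PublicDatabaseStub, now: Date) {
        // Only new or modified records arrive; removals are invisible.
        for record in server.changes(since: lastPoll) {
            records[record.id] = record
        }
        lastPoll = now
    }

    /// IDs the app would show: everything not flagged as soft-deleted.
    var visibleIDs: [String] {
        records.values.filter { !$0.isDelete }.map(\.id).sorted()
    }
}
```

If a mirror has imported records "a" and "b", and the server then hard-deletes "a" while soft-deleting "b" (setting isDelete and bumping modifiedAt), the next poll hides "b" but leaves a stale "a" in the mirror. Only the soft deletion travels between devices.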

Storage Quota

Data in the private database is stored in the user’s personal iCloud space, consuming their personal space quota. If the user’s iCloud space is full, data will not be able to continue syncing across devices via the network. Users can resolve this by cleaning up their personal space or choosing a larger space plan.

Data in the public database consumes the space quota of your application. Apple provides a basic storage capacity for each app supporting CloudKit, with the following limits: 10 GB of Asset storage, 100 MB of database storage, 2 GB of data transfer per month, and 40 query requests per second. Space, bandwidth, and request limits increase with the number of active users of your app (users who have used the app within the last 16 months), up to a maximum of 10 PB, 10 TB, and 200 TB per day, respectively.

Although most apps won’t exceed these limits, developers should still aim to minimize space usage and improve data response efficiency.

Core Data with CloudKit syncs the entire public database locally, creating a mirror. Thus, if data volume is not well controlled, the app can consume a significant amount of space on the user’s device. The “deletion” method discussed above will further encroach on network and device space.

Developers should consider when to clear pseudo-“deleted” data right from the start of the project.

We cannot guarantee that cleaning will occur after all clients have synced the “deleted” state. Allowing some data inconsistency between devices, without affecting the app’s business logic, is acceptable.

Developers can plan to clear “deleted” data that was marked a certain time ago, based on the average usage frequency of the app. Although Core Data with CloudKit saves the metadata of CKRecord corresponding to managed objects locally, it does not provide an API for developers. To facilitate deletion, we can add a “deletion” time attribute in the model, aiding the query work during cleaning.
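A cleanup pass along these lines might look like the following sketch. It assumes the model has the isDelete flag used earlier plus a Date attribute named deletedAt; that attribute name, the retention period, and all function names are hypothetical. Only objects the current user is allowed to change are removed, since the server would reject the others.

```swift
import CoreData

/// Records soft-deleted before this date are considered old enough to purge.
func purgeCutoff(now: Date = Date(), retentionDays: Int) -> Date {
    now.addingTimeInterval(-Double(retentionDays) * 86_400)
}

/// Matches soft-deleted objects whose deletion time is older than `cutoff`.
/// Assumes attributes named `isDelete` (Bool) and `deletedAt` (Date).
func purgePredicate(cutoff: Date) -> NSPredicate {
    NSPredicate(format: "isDelete == YES AND deletedAt < %@", cutoff as NSDate)
}

/// Really deletes old soft-deleted objects and returns how many were removed.
/// Call this on the context's own queue (for example inside `perform`).
@available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *)
func purgeSoftDeleted(entityName: String,
                      retentionDays: Int,
                      container: NSPersistentCloudKitContainer,
                      context: NSManagedObjectContext) throws -> Int {
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    request.predicate = purgePredicate(cutoff: purgeCutoff(retentionDays: retentionDays))

    var purged = 0
    let candidates = try context.fetch(request)
    // Skip records without network-side permission.
    for object in candidates
    where container.canUpdateRecord(forManagedObjectWith: object.objectID) {
        context.delete(object)
        purged += 1
    }
    if context.hasChanges { try context.save() }
    return purged
}
```

Setting deletedAt at the same moment as isDelete keeps the cleanup query simple, because both values then travel to other devices in one record modification.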

Suitable Scenarios for Public Database

Using public databases via CloudKit and syncing them through Core Data with CloudKit have different technical characteristics and considerations.

I personally recommend the following scenarios for syncing public databases with Core Data with CloudKit:

  • Read-Only

    For example, providing templates, initial data, news alerts, etc.

    The creation, modification, and deletion of public database data are all done by the developer through the dashboard or specific app operations. The user’s app only reads from the public database and does not create or change data.

  • Handling a Single Record

    The app creates only one record associated with the user or device and only updates the content of that record.

    Typically used in recording the state of a device or user (can be associated), such as a game’s high score leaderboard (only saving the user’s highest score).

  • Create Only, No Modification

    Scenarios like logging. Users are responsible for creating data and do not particularly rely on the data itself. The app regularly clears expired local data. Public database records are queried or backed up through CloudKit Web services or other specific apps and cleared regularly.

Developers should carefully consider the pros and cons when deciding to use Core Data with CloudKit to sync public database data, choosing the appropriate application scenarios.

Syncing the Public Database

This section heavily references knowledge from Syncing Local Database to iCloud Private Database and Exploring the CloudKit Dashboard. Please read these articles before proceeding.

Project Configuration

Configuring a public database in a project is almost identical to configuring a private database.

  • In the project’s Target under Signing & Capabilities, add iCloud.
  • Select CloudKit and add a Container.

If only using a public database in the project, there is no need to add Remote notifications under Background Modes.

Creating a Local Mirror with NSPersistentCloudKitContainer

  • In Xcode Data Model Editor, create a new Configuration and add the entities you wish to make public to this new configuration.
  • In your Core Data Stack (like the template project’s Persistence.swift), add the following code:
Swift
let publicURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!.appendingPathComponent("public.sqlite")
let publicDesc = NSPersistentStoreDescription(url: publicURL)
publicDesc.configuration = "public" // Configuration name
publicDesc.cloudKitContainerOptions = NSPersistentCloudKitContainerOptions(containerIdentifier: "your.public.containerID")
publicDesc.cloudKitContainerOptions?.databaseScope = .public

Does this code look familiar? That’s right. In fact, syncing the public database only requires one more line of code than syncing the private database:

Swift
publicDesc.cloudKitContainerOptions?.databaseScope = .public

databaseScope is a property that Apple added to NSPersistentCloudKitContainerOptions in 2020. Its default value is .private, so there is no need to set it when syncing a private database.

Is that all?

Yes, that’s it. All other configurations are the same as syncing a private database. Add Description to persistentStoreDescriptions, configure the context, and if needed, set up Persistent History Tracking.
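For completeness, the remaining steps could be assembled as in the sketch below. It is one possible arrangement rather than the only correct one: the function names, the model name passed in, and the container identifier are placeholders, and the merge policy is just a common choice.

```swift
import CoreData
import CloudKit  // for CKDatabase.Scope

/// Builds the store description for the local mirror of the public database.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *)
func makePublicStoreDescription(url: URL,
                                containerIdentifier: String) -> NSPersistentStoreDescription {
    let description = NSPersistentStoreDescription(url: url)
    description.configuration = "public" // Configuration name in the model editor

    let options = NSPersistentCloudKitContainerOptions(containerIdentifier: containerIdentifier)
    options.databaseScope = .public
    description.cloudKitContainerOptions = options

    // Optional: Persistent History Tracking and remote change notifications.
    description.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
    return description
}

/// Creates the container, registers the description, and configures the context.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *)
func makeContainer(modelName: String,
                   publicStoreURL: URL,
                   containerIdentifier: String) -> NSPersistentCloudKitContainer {
    let container = NSPersistentCloudKitContainer(name: modelName)
    container.persistentStoreDescriptions = [
        makePublicStoreDescription(url: publicStoreURL,
                                   containerIdentifier: containerIdentifier)
    ]
    container.loadPersistentStores { _, error in
        if let error = error {
            fatalError("Failed to load persistent store: \(error)")
        }
    }
    container.viewContext.automaticallyMergesChangesFromParent = true
    container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
    return container
}
```

Keeping the description in its own function makes it easy to add a second description for a private store later without touching the rest of the stack.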

Configuring the Dashboard

Since NSPersistentCloudKitContainer uses different methods to fetch public (CKQuery) and private (CKFetchRecordZoneChangesOperation) data, we need to make some modifications in the CloudKit dashboard to ensure the program runs correctly.

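In the CloudKit dashboard, select Indexes, and for each Record Type used in the public database, add two Queryable indexes, one on recordName and one on modifiedAt:

[Image: the recordName and modifiedAt Queryable indexes in the CloudKit dashboard]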

At the time of writing, I found that one more index was needed to sync the public database properly when using Xcode 13 beta5. If you are using Xcode 13, please add one more index, of type Sortable, in the dashboard.

[Image: the additional Sortable index in the CloudKit dashboard]

Other Considerations

Initializing Schema

Following the above steps, when adding indexes in the CloudKit dashboard, you’ll find no Record Type to add indexes to. This is because the Schema has not been initialized on the network database side.

There are two methods to initialize the Schema on the network:

  • Create a managed object and sync it to the server.

    The server will automatically create the corresponding Record Type upon receiving data if it doesn’t exist.

  • Use initializeCloudKitSchema.

    initializeCloudKitSchema allows us to initialize the Schema on the server side without creating data. Add the following code to your Core Data Stack:

Swift
try! container.initializeCloudKitSchema(options: .printSchema)

After running the project, we can see the corresponding Record Type in the dashboard.

This code needs to be executed only once. Remove or comment it out after initialization.

Additionally, initializeCloudKitSchema can be used in unit tests to verify whether the Model meets the compatibility requirements for syncing.

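Swift
// Throws if the Model is incompatible (for example, call it from a throwing XCTest method)
try container.initializeCloudKitSchema(options: .dryRun)

In Swift, initializeCloudKitSchema does not return a value; it throws an error when the Model is incompatible, so a call that completes without throwing means the check passed. .dryRun means it only checks locally and does not actually initialize on the server.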

Multiple Containers, Multiple Configurations

As mentioned in previous articles, you can link multiple CloudKit containers in a single project, and one container can correspond to multiple applications.

If your project uses both private and public databases, and the containers are different, you need to link both containers in the project and set the correct container identifier in the code for each NSPersistentStoreDescription.

Swift
let publicDesc = NSPersistentStoreDescription(url: publicURL)
publicDesc.configuration = "public"
publicDesc.cloudKitContainerOptions = NSPersistentCloudKitContainerOptions(containerIdentifier: "public.container")
publicDesc.cloudKitContainerOptions?.databaseScope = .public

let privateDesc = NSPersistentStoreDescription(url: privateURL)
privateDesc.configuration = "private"
privateDesc.cloudKitContainerOptions = NSPersistentCloudKitContainerOptions(containerIdentifier: "private.container")

The URL for the NSPersistentStoreDescription of the public database must be different from that of the private database (meaning two different sqlite files should be created). The coordinator cannot load the same URL multiple times.

Swift
let publicURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!.appendingPathComponent("public.sqlite")

let privateURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!.appendingPathComponent("private.sqlite")

Xcode 13 Beta

Xcode 13 beta seems to have made undisclosed adjustments to the CloudKit module. Using Core Data with CloudKit under Xcode 13 beta5 produces many strange warnings. At this stage, it is better to use Xcode 12 for testing this article.

Conclusion

While the code implementation for syncing local data to a private database and syncing a public database is very similar, developers should not be misled by this similarity. It’s crucial to understand the essence of the syncing mechanism to better design data models and plan business logic.

I will continue with the next article in the series — syncing the shared database — after Xcode 13 stabilizes.
