Sunday, October 26, 2008

Synchronization

I have spent the past two weeks procrastinating (to some extent) on working on my top secret project. The problem I've been struggling with is just a really hard one to solve and I've gotten quite used to instant gratification with my code. Much of what I've been working on has ended up with results by the end of a hacking session. Not so much with my current task -- synchronization.

One of the key features of the top secret project is that it is always available. Whether by a browser or a native app like an android client, the user is expected to be able to interact with the app no matter if they have internet connectivity or not. This means I have to spend a lot of time working on offline access as a requirement for letting anyone use the app. I thought I could get away with dogfooding my app while I was in Toronto, but I quickly realized that without 3G data on the phone the top secret project would just not function correctly.

One of the challenges I've faced so far is a temporal one. My first thought when deciding to do offline access was that the client would do the synchronization and call back to the server to push a canonical dataset into the datastore. After several nights of hacking I was fed up. I couldn't get the synchronzation to work at all. I found other things (like Statusinator) to work on instead.

On the flight to Chicago I had a "breakthrough" that really should have been my first thought. Do synchronization on the server side! My idea is as such:

  1. Client creates local data, assigns hypothetically-unique UUID to record, tags it as existing locally only.


    1. Stores:
      {"key": "possibly-unique-key", "value": "some-value", updated: "2008-10-27 04:41:01", "created": "2008-10-27 04:41:01", "is_pending": true}



  2. Client requests sync session with server, gets data for min/max records to sync. **All further RPCs have a sync_uuid.


    1. Client Sends:

      {"device_uuid": "some-possibly-id"}
    2. Client Recieves:

      {"sync_uuid":
      "some-unique-session-based-on-device-uuid-and-user", "last_sync": null,
      "max_checkin": "2008-10-27 04:41:01"}

  3. Client pushes record to server, is_pending to denote that the server is receiving a note with an unrecognizable key.


    1. Client sends:

      [{"key": "possibly-unique-key", "value": "some-value", updated: "2008-10-27 04:41:01", 
      "created": "2008-10-27 04:41:01", "is_pending": true}]



  4. Server processes record, changes the key to a valid server key, stores the original as an attribute on the record for future book keeping: local_id.


    1. Server Stores:


      {"key": "some-real-key", "local_id": "possibly-unique-key", "value": "some-value", 
      updated: "2008-10-27 04:41:01", "created": "2008-10-27 04:41:01", "is_pending": false}



  5. Client requests updates from the server, Server responds with all new or updated records modified serverside, along with the newly added records from step 3.


    1. Client requests, asking for all records modified after last_sync (or all records, if None) but before max_checkin.
    2. Server responds:


      [{"key": "some-real-key", "local_id":
      "possibly-unique-key", "value": "some-value", updated: "2008-10-27
      04:41:01", "created": "2008-10-27 04:41:01"}]



  6. Client parses record for "local_id" attribute, and replaces the record in the local datastore with the copy from the server, stripping the local_id attribute and removing the pending bit.


    1. Client stores:


      {"key": "some-real-key", "value": "some-value", updated: "2008-10-27 04:41:01", 
      "created": "2008-10-27 04:41:01", "is_pending": false}



  7. Client tells server sync sesison is complete, using the newest record received from the "push/pull" to specifiy the end date of the session. (just as last_sync is the start). To save a write during the server-to-client record update, the client is the one noting the end date for the session here, instead of the server.


    1. Client sends: last_update.
    2. Server stores: stores last_update as last_sync.

I haven't thought this through all the way yet, but I think this will work just fine for a google-gears based browser client just as it will work for my android client. There is something that still bothers me and I'm having a hard time scoping it out in my head: What happens when clock scews occur?

Remaining questions:
  • Do I leave around "local_id?" When is an appropriate time to strip those records? I don't want the server modifying the records withouth confirmation from the client that the info is no longer needed.
  • What are the error condtions when the sync fails at each of the above steps, how does this pattern resolve conflicts that occur when records are modified after a failed sync?