Monday, June 22, 2009

Dumpcatcher: Design Doc

As I do with any project at work, I want to put together a short doc describing the scope and scale at which I will write this app.

Summary

Dumpcatcher is a simple web service that takes authorized requests from remote clients and logs key-value pairs in its datastore for future analysis. These pairs are typically an arbitrary identifier and an exception/stack trace.

Features

  • Crash stack message storage: clients must be able to submit stack traces (well, arbitrary strings) along with various bits of metadata, such as version, app name, etc.
  • Client libraries for Python and Java: I am targeting my Foursquared Android Client (http://joelapenna.com/git/foursquared.git) as well as any other pet projects I may use in the future.
  • Data aggregation: I plan on allowing data aggregation by exception type, custom label and line.
  • Authenticated client requests: all requests must be sent by authorized clients to prevent the service from becoming a black hole for spam.
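Here is a rough sketch of the planned aggregation by custom label, exception type and line. It assumes the submitted strings are standard Python tracebacks. The regexes and the `classify`/`aggregate` helpers are hypothetical illustrations and are not part of Dumpcatcher.

```python
import re
from collections import Counter

# Hypothetical parsing rules for standard Python tracebacks only.
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+)')   # each stack frame
EXC_RE = re.compile(r"^(\w+(?:\.\w+)*)(?::|$)")        # final "ExcType: msg" line


def classify(trace):
    """Return (exception_type, "file:line" of the innermost frame)."""
    lines = [line.strip() for line in trace.strip().splitlines() if line.strip()]
    frames = FRAME_RE.findall(trace)
    where = "%s:%s" % frames[-1] if frames else "unknown"
    match = EXC_RE.match(lines[-1]) if lines else None
    exc_type = match.group(1) if match else "unknown"
    return exc_type, where


def aggregate(crashes):
    """Count (label, trace) pairs per (label, exception type, innermost line)."""
    return Counter((label,) + classify(trace) for label, trace in crashes)
```

Grouping on the innermost frame means the same bug reported from many devices collapses into one bucket. A Java client would need its own frame pattern.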

Design

App Engine has a very simple datastore and webapp framework that I intend to utilize for the basic functionality of the app.

Users

Each user represents a single Google ID, that is, a particular developer using the system.

Products

A product is an application that uses the dumpcatcher to log crashes. A user may have multiple products.

Each registered product will have two values associated with it: a productKey, which will be passed as a parameter in all HTTP requests to the server, and a secret, which will be used to HMAC-sign a request.

Product secrets will be randomly generated UUIDs.
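A minimal sketch of generating credentials at product registration follows. The secret is a random UUID as described above. The design does not say how the productKey is produced, so making it a random UUID as well is an assumption, and `new_product_credentials` is a hypothetical helper name.

```python
import uuid


def new_product_credentials():
    """Generate a (productKey, secret) pair for a newly registered product."""
    # Assumption: the productKey is also random; the design only specifies the secret.
    product_key = uuid.uuid4().hex
    # Product secrets are randomly generated (version 4) UUIDs.
    secret = str(uuid.uuid4())
    return product_key, secret
```

The productKey travels in every request, so it is effectively public. Only the secret needs to stay private between the client and the server.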

HTTP Request

All requests to the Dumpcatcher service will be secured with an HMAC hash, keyed by a unique secret provided to the client.

productKey

Each client -> server request will include a productKey, an identifier used to differentiate between different products using the service.

HMAC

All requests must be submitted with an HMAC-SHA1 hex digest of the request query parameters as well as an increasing "request" identifier. The message consists of the standard HTTP query string, sorted by key name and URL-quoted, including the request identifier.

For http://localhost/add?product_key=12345&some=pair&other=pair, we would construct the digest like so:

TODO(jlapenna): Splitting on & is probably unsafe if the contents of the request might contain one, though they should already be URL-encoded. Something like that...

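import hashlib
import hmac

# Canonicalize: split the raw query (request is the webapp request object) into key=value pairs and sort them.
sorted_query = '&'.join(sorted(request.query_string.split('&')))
# HMAC-SHA1 keyed with the product secret; the hex digest becomes the "hmac" parameter.
hash = hmac.new(b'SOME KEY', sorted_query.encode('utf-8'), hashlib.sha1).hexdigest()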

And, as such, the actual request made to the server will be:

'http://localhost/add?product_key=12345&some=pair&other=pair&hmac=%(hash)s'
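The client-side signing step can be sketched as follows, under a few assumptions. Parameters arrive as a dict, the secret is a string, and `canonical_query`/`signed_url` are hypothetical names. The sketch URL-encodes each value before sorting by key, which sidesteps the TODO about a literal & inside a value.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def canonical_query(params):
    """URL-encode the key=value pairs, sorted by key name."""
    return urlencode(sorted(params.items()))


def signed_url(base_url, params, secret):
    """Append an HMAC-SHA1 hex digest of the canonical query as "hmac"."""
    query = canonical_query(params)
    digest = hmac.new(secret.encode("utf-8"), query.encode("utf-8"),
                      hashlib.sha1).hexdigest()
    return "%s?%s&hmac=%s" % (base_url, query, digest)
```

For example, `signed_url("http://localhost/add", {"product_key": "12345", "some": "pair", "other": "pair"}, "SOME KEY")` produces a URL whose query begins `other=pair&product_key=12345&some=pair&hmac=`. The server re-sorts on its side, so the order the client sends does not matter.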

On the backend, the server will take the reverse steps. Using the secret associated with the provided productKey, it will encode the query parameters the same way the client does and compute the HMAC keyed by that secret. It then compares the result against the submitted digest to verify the authenticity of the request.
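A sketch of that server-side check follows. It assumes the client canonicalized by URL-encoding the non-hmac pairs sorted by key and passed the digest as `hmac`. The `verify_query` name is hypothetical, and looking up the secret for the productKey is left out.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode


def verify_query(query_string, secret):
    """Return True if the "hmac" parameter matches the other parameters."""
    params = parse_qsl(query_string, keep_blank_values=True)
    submitted = [value for key, value in params if key == "hmac"]
    if len(submitted) != 1:
        return False  # missing or duplicated signature
    # Rebuild the canonical message: every pair except "hmac", sorted, URL-encoded.
    message = urlencode(sorted((k, v) for k, v in params if k != "hmac"))
    expected = hmac.new(secret.encode("utf-8"), message.encode("utf-8"),
                        hashlib.sha1).hexdigest()
    # Constant-time comparison so timing does not reveal matching prefixes.
    return hmac.compare_digest(expected.encode("utf-8"),
                               submitted[0].encode("utf-8"))
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking, through response timing, how much of a forged digest was correct.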

Datastore

Initially there will be three models, one corresponding to "crashes," another to "users" and the third to "products."

Each user will be associated with a specific Google ID, and a single Google ID can own many products.
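Roughly, the three models might carry the fields below. These are plain dataclasses standing in for App Engine `db.Model` classes so the sketch runs anywhere. Every field name is a hypothetical choice, not the actual Dumpcatcher schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Crash:
    product_key: str   # which product reported it
    label: str         # arbitrary identifier supplied by the client
    trace: str         # exception / stack trace text
    metadata: Dict[str, str] = field(default_factory=dict)  # version, app name, ...


@dataclass
class Product:
    product_key: str   # sent with every request
    secret: str        # HMAC key, never sent over the wire
    name: str


@dataclass
class User:
    google_id: str     # one developer per Google ID
    products: List[Product] = field(default_factory=list)  # a user may own many
```

On App Engine, the user/product link would more naturally be a reference property on the product than an embedded list. The list is used here only to keep the sketch self-contained.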

Security

Client -> server requests will be secured by using HTTPS for the communication channel and HMAC to verify the authenticity of each client request.

Replay Attack

An attacker with access to the HTTP stream over which a client -> server request is sent could execute a replay attack by capturing the client's HTTP POST and resubmitting it as his own, at any rate he desires.

The solution, then, is to only allow requests over HTTPS. This has the added advantage of preventing private data from leaking to a network observer sniffing packets.
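The increasing "request" identifier from the HMAC section offers a second line of defense against replays. The server can reject any request whose identifier is not greater than the last one it accepted. This only helps if the identifier is covered by the HMAC, so an attacker cannot bump it without the secret.

Many installs of one app share a productKey, so this sketch tracks identifiers per (productKey, client) pair. The `client_id` parameter and the `ReplayGuard` class are hypothetical, and the state lives in memory rather than the datastore.

```python
class ReplayGuard:
    """Reject requests whose identifier does not increase per client."""

    def __init__(self):
        # Last accepted request identifier, keyed by (product_key, client_id).
        self._last_seen = {}

    def accept(self, product_key, client_id, request_id):
        """Return True and record request_id if it is newer than the last one."""
        key = (product_key, client_id)
        last = self._last_seen.get(key)
        if last is not None and request_id <= last:
            return False  # replayed or out-of-order request
        self._last_seen[key] = request_id
        return True
```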

Caveats

It is likely, and highly reasonable, that an app like this already exists in a far more polished and featureful form. I chose this project because I felt it would be a good way to explore some new technologies and have a fun time, not because it is in any way "new" or "exciting."


Friday, June 19, 2009

Dumpcatcher

I took the day off in order to spend it writing some code. My goal for today is to finish with a fully-functional exception catcher for my projects so that I don't have to require users to send me tracebacks or exceptions when they occur.

I started by registering http://dumpcatcher.appspot.com. I will be pushing frequently to this site as I add features. I chose App Engine because I like its hassle-free application deployment: launch the app, run it, walk away and it should just work!

Second, I registered http://dumpcatcher.googlecode.com. I will be pushing the code here pretty much just as frequently as I push the site. I chose Mercurial as the SCM because I am sick and tired of git.

So... Let me begin...

Addendum: I actually ended up going into work instead of taking the day off and coded this on a flight to Cambodia.


Sunday, August 31, 2008

Android/App Engine prototyping problems

Almost a year ago, Google unleashed its Android SDK in a preview capacity. One of the pain points I experienced with it was the tedium of creating ContentProviders. Today, I'm feeling the same tedium. A ContentProvider is the sole mechanism for an Android application to share information with external packages. ContentProviders can be backed by any kind of data: a file, a database, or a live server with a custom backend. The popular method is to create a ContentProvider backed by a database. Creating such a beast is time-consuming, and building it on a SQL database is quite contrary to the BigTable-based datastore API in Google App Engine (GAE).

In my top-secret project, I'm using the HttpClientService API that I've defined in my Missing SVN repository (the top-secret project is indeed different from Missing) to interface with my server. Currently, the client does no caching of data: all the information it needs is pulled from the interwebs at user request. When testing on my local machine, this is not a problem. I have a fast workstation and there isn't much packet loss or latency on a loopback interface.

Unfortunately, this won't work in the "real world." I'm now at the point where I have to implement an SQLite-backed ContentProvider that tries to model a table-based GAE datastore API. Not only does this mean I have to keep track of two different schemas, one for GAE and one for Android, but it also means I have to implement the same "get my data" API twice, with a layer of abstraction between the Android UI and the ContentProvider I'm resenting having to create.

What I'm considering doing is using something like Google's Protocol Buffers or Facebook's Thrift to define my data model and generate stub interfaces for both GAE and Android. This seems like overkill for the current state of my project, but even in the not-yet-ready-to-show-anyone stage I'm having to consider these very-high-time-cost coding exercises. This is going to consume time when I should be iterating on features and trying to get the app to a state where I can finally start using it. I think I've prototyped the Android app as far as I can go, but I don't want to start spending hordes of time on this when I have several core concepts not yet implemented.

I'm also feeling the pain of knowing that the stuff I prototyped at the beginning of the project is going to have to be fully rewritten if I don't provide a more featureful content provider.

One way to provide an abstracted interface to the ContentProvider is to wrap it in a Service, which of course requires yet another interface declaration, this one with Java primitives and simple classes, using the AIDL interface spec provided by Android. I figure I'll have to cross that bridge some day; I'm just glad I don't have to do it right now.


The views and opinions expressed in this blog are those of Joe LaPenna. Google has nothing to do with these pages.
For information about Google please visit: Google Press Center