Objectify-Appengine, Part 1

Java development 2.0: Twitter mining with Objectify-Appengine, Part 1
It’s no secret to readers of this series that NoSQL datastores have inspired an explosion of innovation in the Java™ world over the past couple of years. In addition to the datastores themselves (like CouchDB, MongoDB, and Bigtable), we have begun to see tools that extend their usefulness. ORM-like mapping libraries lead this pack by addressing one of the pernicious challenges of NoSQL: how to efficiently map plain old Java objects (the common currency of schemaless datastores) and make those objects useful, much like what Hibernate does for relational datastores.

About this series
The Java development landscape has changed radically since Java technology first emerged. Thanks to mature open source frameworks and reliable for-rent deployment infrastructures, it’s now possible to assemble, test, run, and maintain Java applications quickly and inexpensively. In this series, Andrew Glover explores the spectrum of technologies and tools that make this new Java development paradigm possible.

SimpleJPA is one example in this category: a persistence library that lets JPA annotated objects work almost seamlessly with Amazon’s SimpleDB. I introduced SimpleJPA a few columns back, but noted that even though it’s based on JPA, it doesn’t implement the full JPA specification. This is due to the fact that JPA is intended to work with relational databases, which SimpleDB (and thus its little helper, SimpleJPA) avoid. Other projects don’t even try to mimic the full JPA specification: they just borrow what they want from it. One such project — Objectify-Appengine — is the subject of this month’s column.

Objectify-Appengine: An object-non-relational mapping library

Objectify-Appengine, or Objectify, is an ORM-like library that simplifies data persistence in Bigtable, and thus GAE. As a mapping layer, Objectify inserts itself, by way of an elegant API, between your POJOs and Google’s heavy equipment. You use a familiar subset of JPA annotations (although Objectify doesn’t implement the full specification), along with a handful of life-cycle annotations, to persist and retrieve data in the form of Java objects. In essence, Objectify is a lighter weight Hibernate expressly designed for Google’s Bigtable.

ORM-like?
Object-relational mapping is the most common way to overcome the so-called impedance mismatch between object-oriented data models and relational databases (see Resources). In the non-relational world there is no impedance mismatch, so Objectify isn’t really an ORM library; it’s more like an ONRM (object-non-relational mapping) library. “ORM-like” is convenient shorthand for those of us with acronym fatigue.

Objectify is similar to Hibernate in that it allows you to map and leverage POJOs against Bigtable, which you view as an abstraction in GAE. In addition to a subset of JPA annotations, Objectify employs annotations of its own, which address the unique features of the GAE datastore. Objectify also permits relationships and exposes a query interface that supports the GAE notions of filtering and sorting.

In the next sections, we’ll develop an example application that lets you try your hand at mapping and data persistence with Objectify, using Google’s Bigtable to store application data. In the second half of this article, we’ll leverage our data in a GAE web application.

Big picture, Bigtable

I’m giving the “races and runners” domain a break, and we can skip the parking tickets, too. Instead, we’ll be mining Twitter — another familiar application domain for those who read last month’s introduction to MongoDB. This time we’ll investigate not just who has retweeted us (or me, or you) on Twitter, but which of our top retweeters are the most influential.

For this application, we’ll need to create two domain classes: Retweet and User. The Retweet object obviously represents a retweet from a Twitter account. The User object represents the Twitter user whose account data we’re mining. (Note that this User object is different from the GAE User object.) Every Retweet has a relationship to a User.

About Bigtable
Bigtable is a column-oriented NoSQL datastore that is accessible via Google App Engine. Rather than relying on the schemas you’d find in a relational database, Bigtable is basically a massively distributed persistence map, one that permits queries on keys and attributes of the underlying data values. Bigtable is to GAE much as SimpleDB is to Amazon Web Services.

Objectify leverages Google’s low-level Entity API to intuitively map domain objects to the GAE datastore. I introduced the Entity API in a previous article (see Resources), so I won’t discuss it much here. The main thing you need to know is that in the Entity API domain names become the kind type — that is, User will logically map to a User kind — which is similar to a table in relational terms. (For a closer analogy, think of a kind as a map holding keys and values.) Domain attributes are then essentially column names in relational terms, and attribute values are column values. Unlike Amazon’s SimpleDB, the GAE datastore supports a rich set of data types including blobs (see Resources) and all manner of numbers, dates, and lists.
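To make the “kind as a map holding keys and values” analogy concrete, here is a minimal in-memory sketch. It is not the GAE Entity API: the KindSketch class, its put and get methods, and the sample token values are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of datastore kinds; this is not the GAE Entity API.
final class KindSketch {
    // kind name -> (entity key -> (property name -> property value))
    private final Map<String, Map<String, Map<String, Object>>> kinds = new LinkedHashMap<>();

    // Stores a copy of the properties under the given kind and key.
    void put(String kind, String key, Map<String, Object> properties) {
        kinds.computeIfAbsent(kind, k -> new LinkedHashMap<>())
             .put(key, new LinkedHashMap<>(properties));
    }

    // Returns the entity's properties, or null if the kind or key is unknown.
    Map<String, Object> get(String kind, String key) {
        Map<String, Map<String, Object>> entities = kinds.get(kind);
        return entities == null ? null : entities.get(key);
    }

    public static void main(String[] args) {
        KindSketch store = new KindSketch();

        // Property names play the role of column names; values play the role of column values.
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("token", "example-token");
        properties.put("tokenSecret", "example-secret");

        // "User" is the kind; "alice" is the entity key.
        store.put("User", "alice", properties);
        System.out.println(store.get("User", "alice"));
    }
}
```

Nothing here enforces a schema: two entities of the same kind could carry entirely different property names, which is the main way a kind differs from a relational table.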

Class definition in Objectify

The User object will be pretty basic: just a name and two attributes related to Twitter’s OAuth implementation, which we’ll leverage for its intuitive approach to authorization. Rather than storing a user’s password, users in an OAuth paradigm store tokens, which represent the user’s permission to act on their behalf. OAuth operates much like a credit card does, but with authorization data as the currency. Instead of giving every website your user name and password, you give sites permission to access that information. (OAuth is similar to OpenID — but different; see Resources to learn more.)
Listing 1. The beginnings of a User object

import javax.persistence.Id;

public class User {
    // The Twitter account name doubles as the datastore key.
    @Id
    private String name;
    private String token;
    private String tokenSecret;

    // No-argument constructor, used when entities are loaded from the datastore.
    public User() {
        super();
    }

    public User(String name, String token, String tokenSecret) {
        super();
        this.name = name;
        this.token = token;
        this.tokenSecret = tokenSecret;
    }

    public String getName() {
        return name;
    }

    //…
}
As you can see in Listing 1, the only persistence-specific code associated with the User class is the @Id annotation. @Id is a standard JPA annotation, as the javax.persistence import shows. The GAE datastore allows identifiers or keys to be either Strings or Longs (or primitive longs). In Listing 1, I’ve specified the Twitter account’s name as the key. I’ve also created a constructor that takes all three properties, which will facilitate creating new instances. Note that I do not actually have to define getters and setters for this object to be utilized in Objectify (though I’ll need them if I want to access or set properties programmatically!).

When the User object is persisted to the underlying datastore, it’ll be a User kind. This entity will have a key dubbed name and two other properties, token and tokenSecret, all of which are Strings. Pretty easy, eh?

The powers of User

Next, I’ll add a tiny bit of behavior to my User domain class. I’m going to make a class method that enables User objects to find themselves by name.
Listing 2. Finding Users by name

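//inside User.java…
// (requires imports of com.googlecode.objectify.Objectify
//  and com.googlecode.objectify.ObjectifyService)
private static Objectify getService() {
    return ObjectifyService.begin();
}

// Looks up a User by its key, which is the Twitter account name.
public static User findByName(String name){
    Objectify service = getService();
    return service.get(User.class, name);
}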
A few things are going on in the newly minted User in Listing 2. In order to leverage Objectify, I need to fire it up, so to speak. So I grab an instance of Objectify, which handles all CRUD-like operations. You can think of the Objectify class as roughly analogous to Hibernate’s SessionFactory class.

Objectify has a simple API. To find an individual entity by its key, you simply invoke the get method, which takes a class type and the key. Thus, in Listing 2, I issue a call to get with the underlying User class and the desired name. Also note that Objectify’s exceptions are unchecked, which means I don’t have to worry about catching a bunch of Exception types. That’s not to say exceptions don’t occur; they just don’t have to be handled at compile time, per se. For instance, the get method will throw a NotFoundException if no User entity with the given key can be located. (Objectify also provides a find method, which instead returns null.)
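The difference between a throwing lookup and a null-returning lookup can be sketched without Objectify at all. The following is a plain in-memory stand-in, not Objectify’s implementation: LookupSketch, EntityNotFoundException, and the stored token value are hypothetical, and only the get/find naming is borrowed from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in that contrasts the two lookup styles.
final class LookupSketch {
    // Unchecked, so callers are not forced to catch it at compile time.
    static final class EntityNotFoundException extends RuntimeException {
        EntityNotFoundException(String key) {
            super("No entity with key: " + key);
        }
    }

    private final Map<String, String> tokensByName = new HashMap<>();

    void put(String name, String token) {
        tokensByName.put(name, token);
    }

    // find-style lookup: a miss is reported as null.
    String find(String name) {
        return tokensByName.get(name);
    }

    // get-style lookup: a miss is reported with an unchecked exception.
    String get(String name) {
        String token = tokensByName.get(name);
        if (token == null) {
            throw new EntityNotFoundException(name);
        }
        return token;
    }

    public static void main(String[] args) {
        LookupSketch store = new LookupSketch();
        store.put("alice", "example-token");

        System.out.println(store.get("alice"));
        System.out.println(store.find("bob"));

        try {
            store.get("bob");
        } catch (EntityNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Which style fits depends on whether a missing entity is an expected case (find and a null check) or a genuine error (get and let the exception propagate).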

Next up is instance behavior: I want my User instances to support the ability to list all retweets in order of influence, which means I need to add another method. But first I’m going to model my Retweet object.

How many Retweets?

Retweet, as you can guess, represents a Twitter retweet. This object will hold a number of attributes, including a relationship back to the owning User object.

I’ve mentioned already that an identifier or key in the GAE datastore must either be a String or a Long/long. Keys in the GAE datastore are also unique, just as they would be in a traditional database. That’s why the User object’s key is the name of a Twitter account, which is inherently unique. The key on the Retweet object in Listing 3 will be a combination of the tweet id and the user who retweeted it. (Twitter doesn’t allow tweeting the same text twice, so for now this key makes sense.)
Listing 3. Defining Retweet

import java.util.Date;
import javax.persistence.Id;
import com.googlecode.objectify.Key;

public class Retweet {
    // Composite key: tweet id followed by the retweeting user's name.
    @Id
    private String id;
    private String userName;
    private Long tweetId;
    private Date date;
    private String tweet;
    private Long influence;
    // A typed key pointing at the owning User, not an object reference.
    private Key<User> owner;

    public Retweet() {
        super();
    }

    public Retweet(String userName, Long tweetId, Date date, String tweet,
            Long influence) {
        super();
        this.id = tweetId.toString() + userName;
        this.userName = userName;
        this.tweetId = tweetId;
        this.date = date;
        this.tweet = tweet;
        this.influence = influence;
    }

    public void setOwner(User owner) {
        this.owner = new Key<User>(User.class, owner.getName());
    }
    //…
}
Note that the key in Listing 3, id, is a String; it combines the tweetId and the userName. The setOwner method shown in Listing 3 will make more sense once I explain relationships.
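One thing to keep in mind with a concatenated key: if a user name may begin with a digit, joining the tweet id and the name with no separator can be ambiguous. The sketch below illustrates the concern. The RetweetKeySketch class, the delimitedKey variant, and the choice of ':' as a separator are assumptions for illustration (they rely on ':' never appearing in a user name) and are not part of the original application.

```java
// Hypothetical helper that compares two ways of building the composite Retweet key.
final class RetweetKeySketch {
    // Mirrors the approach in Listing 3: tweet id immediately followed by the user name.
    static String plainKey(long tweetId, String userName) {
        return Long.toString(tweetId) + userName;
    }

    // Assumed variant: a ':' separator keeps the two parts distinguishable.
    static String delimitedKey(long tweetId, String userName) {
        return tweetId + ":" + userName;
    }

    public static void main(String[] args) {
        // Both plain keys are "123abc", so two different retweets would share a key.
        System.out.println(plainKey(12L, "3abc").equals(plainKey(123L, "abc")));

        // "12:3abc" and "123:abc" remain distinct.
        System.out.println(delimitedKey(12L, "3abc").equals(delimitedKey(123L, "abc")));
    }
}
```

For a small mining application the plain form may be perfectly adequate; the separator only matters if such collisions are possible in the data being collected.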

Modeling relationships

Retweets and Users in this application have a relationship; that is, every User holds a logical collection of Retweets, and every Retweet holds a direct link back to its User. Look back to Listing 3 and you might notice something unusual: A Retweet object has a Key object of type User.

Objectify’s use of Keys, rather than object references, reflects GAE’s non-traditional datastore, which among other things lacks referential integrity.

The relationship between the two objects really only needs a hard connection on the Retweet object. That’s why an instance of Retweet holds a direct Key to a User instance. Consequently, a User instance doesn’t actually have to persist Retweet Keys on its side; a User instance can simply query retweets for those that link back to itself.
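A small in-memory sketch can show what a one-sided, key-based relationship implies. This is not Objectify code: KeyRelationshipSketch and its methods are hypothetical, and a plain String stands in for the owner key. It shows that the owner’s retweets are found by querying on the key, and that nothing stops a retweet from holding a key that resolves to no user (the missing referential integrity mentioned above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a relationship held only on the Retweet side.
final class KeyRelationshipSketch {
    static final class Retweet {
        final String id;
        final String ownerKey; // the owning user's name, stored as a key rather than a reference

        Retweet(String id, String ownerKey) {
            this.id = id;
            this.ownerKey = ownerKey;
        }
    }

    private final Map<String, String> users = new HashMap<>(); // user name -> token
    private final List<Retweet> retweets = new ArrayList<>();

    void putUser(String name, String token) {
        users.put(name, token);
    }

    // Stored as-is: nothing verifies that ownerKey points at an existing user.
    void putRetweet(Retweet retweet) {
        retweets.add(retweet);
    }

    // The user side stores nothing; it simply asks for retweets that point back to it.
    List<Retweet> retweetsOwnedBy(String userName) {
        List<Retweet> owned = new ArrayList<>();
        for (Retweet retweet : retweets) {
            if (retweet.ownerKey.equals(userName)) {
                owned.add(retweet);
            }
        }
        return owned;
    }

    boolean ownerExists(Retweet retweet) {
        return users.containsKey(retweet.ownerKey);
    }

    public static void main(String[] args) {
        KeyRelationshipSketch store = new KeyRelationshipSketch();
        store.putUser("alice", "example-token");
        store.putRetweet(new Retweet("1", "alice"));
        store.putRetweet(new Retweet("2", "alice"));

        Retweet orphan = new Retweet("3", "ghost"); // key to a user that was never saved
        store.putRetweet(orphan);

        System.out.println(store.retweetsOwnedBy("alice").size());
        System.out.println(store.ownerExists(orphan));
    }
}
```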

Still, in order to make interaction between the objects more intuitive, in Listing 4 I’ve added to User a few methods that accept Retweet. These methods cement the relationship between the two objects: User now directly sets its ownership of a Retweet.
Listing 4. Adding Retweets to a User

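// inside User.java (the list variant requires an import of java.util.List)
public void addRetweet(Retweet retweet){
    // Attach the retweet to this user before saving it.
    retweet.setOwner(this);
    Objectify service = getService();
    service.put(retweet);
}

public void addRetweets(List<Retweet> retweets){
    for(Retweet retweet: retweets){
        retweet.setOwner(this);
    }

    // put is overloaded, so the whole list is saved in one call.
    Objectify service = getService();
    service.put(retweets);
}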
In Listing 4, I’ve added two new methods to the User domain object. One works with a collection of Retweets, while the other works on just one instance. You’ll note that the getService helper was defined in Listing 2, and that the service’s put method is overloaded to work with both single instances and Lists. The relationship in this case is also handled by the owning object: the User instance adds itself to the Retweet. Thus Retweets are created separately, but once they are added to an instance of a User, they are formally attached.

Twitter mining

My next step is to add a finder-like method on the User object. This method will allow me to list all owning Retweets in order of influence — that is, from an initial owning account to accounts that have retweeted it. I’ll track from the account with the most followers to the one with the least.
Listing 5. Retweets by influence

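// inside User.java
public List<Retweet> listAllRetweetsByInfluence(){
    Objectify service = getService();
    // Filter on the owner property, then sort by influence, highest first.
    return service.query(Retweet.class).filter("owner", this).order("-influence").list();
}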
The code in Listing 5 resides in the User object. It returns a List of Retweets ordered by their influence property, which is an integer. The “-” in this case indicates that I want Retweets in descending order, from highest to lowest. Notice Objectify’s query code: the service instance supports filtering by property (in this case owner) and even ordering the results. Also note the continuing pattern of unchecked exceptions, which keeps the code remarkably concise.
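The filter-then-order behavior described here can be mimicked over an ordinary list, which may help when reasoning about what the query should return. The sketch below is an in-memory analogue only, not how Objectify or the datastore evaluates queries. InfluenceQuerySketch, its nested Retweet class, and the order and query helpers are hypothetical, and only the influence property is supported as a sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of "filter by owner, order by -influence".
final class InfluenceQuerySketch {
    static final class Retweet {
        final String owner;
        final String userName;
        final long influence;

        Retweet(String owner, String userName, long influence) {
            this.owner = owner;
            this.userName = userName;
            this.influence = influence;
        }
    }

    // Interprets "influence" as ascending and "-influence" as descending.
    static Comparator<Retweet> order(String spec) {
        boolean descending = spec.startsWith("-");
        String property = descending ? spec.substring(1) : spec;
        if (!"influence".equals(property)) {
            throw new IllegalArgumentException("Unsupported sort property: " + property);
        }
        Comparator<Retweet> byInfluence = Comparator.comparingLong(r -> r.influence);
        return descending ? byInfluence.reversed() : byInfluence;
    }

    // Keeps only the given owner's retweets and sorts them per the order spec.
    static List<Retweet> query(List<Retweet> all, String owner, String orderSpec) {
        return all.stream()
                  .filter(r -> r.owner.equals(owner))
                  .sorted(order(orderSpec))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Retweet> all = new ArrayList<>();
        all.add(new Retweet("alice", "carol", 10L));
        all.add(new Retweet("alice", "dave", 500L));
        all.add(new Retweet("bob", "erin", 999L));
        all.add(new Retweet("alice", "frank", 50L));

        for (Retweet retweet : query(all, "alice", "-influence")) {
            System.out.println(retweet.userName + " " + retweet.influence);
        }
    }
}
```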

Querying multiple properties

The GAE datastore leverages an index for any query issued. This makes for fast reads because single properties in an entity are automatically indexed. But if you end up querying by multiple properties (like I did in Listing 5, querying by owner and then by influence), you must provide a datastore-indexes.xml file for GAE. This gives GAE advance warning of an incoming query. Listing 6 is the custom index that makes querying multiple properties possible:
Listing 6. Defining a custom index for the GAE datastore
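The body of this listing appears to have been lost from this copy. As a hedged reconstruction, and not the author’s original, a composite index for the query in Listing 5 would plausibly look like the fragment below. It assumes the kind is named Retweet, that the file is placed at WEB-INF/datastore-indexes.xml, and that owner is matched by equality while influence is sorted in descending order.

```xml
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
  <!-- Assumed composite index for: filter on owner, order by influence descending -->
  <datastore-index kind="Retweet" ancestor="false">
    <property name="owner" direction="asc"/>
    <property name="influence" direction="desc"/>
  </datastore-index>
</datastore-indexes>
```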

Persistence

Last but not least, I need to add some ability to persist my domain objects. You might have noticed that there’s an implicit workflow to the relationship between the User and Retweet objects. Namely, I need to have a User instance created (and saved into the GAE datastore) before I can logically add related Retweets.

In Listing 7, I add a save method on the User object, but note that I don’t need one on the Retweet object. Retweets are automatically saved when I add them to a User instance, which I do via the addRetweet and addRetweets methods (notice the calls to service.put in Listing 4).
Listing 7. Saving Users

public void save(){
    Objectify service = getService();
    service.put(this);
}
See how terse that code is? That’s the Objectify API at work.

Registering domain classes

I’m about ready to pull my Twitter mining application together, which involves a bit of wiring with the Servlets API. I’ll use servlets to handle logging into Twitter, pulling retweet data, and finally displaying a nifty report. I’m going to leave that to your imagination for now, though, and focus on one last requirement of working with Objectify: manually registering domain classes.

Objectify doesn’t auto-load domain classes; that is, it doesn’t scan your classpath for entities. You must tell Objectify up-front which classes are entities, so that later you’ll be able to access and use them via the Objectify API. The ObjectifyService object allows you to register domain classes, which of course you need to do before attempting to invoke their CRUD-like behavior. Fortunately, because I’m writing a simple web application to be deployed on GAE, I can use the Servlet API to register my two classes in a ServletContextListener instance.

ServletContextListeners have two methods, one invoked when a context is created, the other when one is destroyed. Contexts are created when you first fire up a web application, so this will work nicely.
Listing 8. Registering domain objects

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import com.googlecode.objectify.ObjectifyService;

public class ContextInitializer implements ServletContextListener {

    // Nothing to clean up when the context shuts down.
    public void contextDestroyed(ServletContextEvent arg) {}

    // Runs once at application startup, before any servlet touches the domain classes.
    public void contextInitialized(ServletContextEvent arg) {
        ObjectifyService.register(Retweet.class);
        ObjectifyService.register(User.class);
    }
}
Listing 8 shows a simple implementation of a ServletContextListener, in which I register my two Objectify domain classes, User and Retweet. As per the Servlet API, ServletContextListener instances are registered in a web.xml file. When my application starts up on Google’s servers, the code in Listing 8 will be invoked. All future servlets that use my domain objects will work just fine, and with no further ado.
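For reference, registering a listener in web.xml generally takes a listener element like the one sketched below. The fragment assumes that ContextInitializer from Listing 8 lives in the default package, since the listing shows no package declaration; if it is in a package, the fully qualified class name goes in its place.

```xml
<!-- Inside the <web-app> element of WEB-INF/web.xml -->
<listener>
  <!-- Assumes the default package; otherwise use the fully qualified class name -->
  <listener-class>ContextInitializer</listener-class>
</listener>
```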

Conclusion to Part 1

At this point, we’ve written up a couple of classes and defined their relationships and CRUD-like abilities, all using Objectify-Appengine. You might have noticed a few things about the Objectify API as we worked through the sample application — like the fact that it takes a lot of the verbosity out of normal Java code. It also leverages a few standard JPA annotations, thus smoothing the path for developers accustomed to working with JPA-enhanced frameworks like Hibernate. On the whole, the Objectify API makes domain modeling for GAE easier and more intuitive, which is a boost to developer productivity.

In the second half of this article, we’ll take our domain application to the next level, wiring it together with OAuth, the Twitter API (via Twitter4J), and Ajax-plus-JSON. All of this will be slightly complicated by the fact that we’re deploying on Google App Engine, which places some restrictions on implementation. But on the upside, we’ll end up with a truly scalable, cloud-based web application. We’ll explore those trade-offs further next month, when we start preparing the sample application for deployment on GAE.

Resources

Learn

Java development 2.0: This developerWorks series explores technologies that are redefining the Java development landscape; recent topics include MongoDB (September 2010), SimpleDB with SimpleJPA (August 2010), and Google App Engine (August 2009).
Max Ross on GAE: Google Software Engineer Max Ross talks GAE, including the datastore (his passion) and Objectify-Appengine in this podcast with Andrew Glover.
“Java development 2.0: NoSQL” (Andrew Glover, developerWorks, May 2010): Find out why NoSQL datastores like Bigtable and CouchDB are moving from margin to center.
“Java Object Persistence: State of the Union” (InfoQ panel, March 2008): Step into the Internet time machine for a reminder of what the future of Java persistence looked like before NoSQL (just a few years ago). Includes a high-level discussion about the object-relational impedance mismatch.
“OAuth-ing Twitter with Twitter4J” (Andrew Glover, The Disco Blog, September 2010): Twitter and Twitter4J no longer allow basic authorization — which makes it a good idea to learn about OAuth.
“OAuth-OpenID: You’re Barking Up the Wrong Tree if you Think They’re the Same Thing” (Michael Mahemoff, Software As She’s Developed, November 2007): Find out what OAuth does and does not have in common with OpenID.
Browse the Java technology bookstore for books on these and other technical topics.
developerWorks Java technology zone: Find hundreds of articles about every aspect of Java programming.
Get products and technologies

Download Objectify: The simplest convenient interface to the Google App Engine datastore.
