Wednesday, January 5, 2011

Baby Steps into Datanucleus HBase JPA

I have been playing around with Datanucleus JPA on HBase for the last few days. Datanucleus also supports JDO, but I chose JPA because Sun has standardized on it. There are a couple of good tutorials already available, but my personal favorite is Matzew’s. Apart from a few jar dependency issues, it is a quite straightforward Maven script that runs a Jetty server and a web app.

But if you are crazy about IDEs like Eclipse, it is a bit tricky to get things in place and working. So here is a tutorial that guides you through running the same application in Eclipse and Tomcat.

Prerequisites:

  • Hadoop and HBase up and running.
  • Basic knowledge of HBase concepts (Schema, Columns, Column Families etc).
  • Datanucleus Eclipse plug-in installed in Eclipse.
  • High-level understanding of Datanucleus and JPA.

For Whom:
This tutorial is for you if you are facing any of the following issues:

  • You want to integrate HBase into your application using JPA.
  • You want to port Matzew’s Maven-based example to an Eclipse and Tomcat environment.
  • You are suffering from a “javax.persistence.* class not found” exception.
  • The “No persistence providers available for storename” exception is frustrating you in spite of the fixes suggested on forums.
  • You are confused about where to put persistence.xml in your web application.
  • You need some way to control column family names in your HBase schema.
  • @Column(name="familyname:columnname") has no effect on your HBase column families.
  • You cannot find the Datanucleus menu item after successfully installing the Datanucleus Eclipse plug-in.


Let’s Start:

Here I am using the same code that Matzew committed on GitHub, and I will show you the steps to create a simple web application.

Step 1: Create a simple Servlet.
I have created a simple servlet that persists a Contact entity into the database.

import java.io.IOException;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import net.wessendorf.addressbook.Contact;
import net.wessendorf.addressbook.dao.HBaseJPAImpl;

public class Index extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // The unit name must match <persistence-unit name="..."> in persistence.xml.
        // Creating the factory per request keeps the example short; real
        // applications usually create it once and reuse it.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("hbase-addressbook");
        EntityManager em = emf.createEntityManager();
        try {
            // Build a sample Contact entity.
            Contact contact = new Contact();
            contact.setId("id");
            contact.setFirstname("name");
            contact.setSecondname("second name");

            // Delegate persistence to the DAO from Matzew's example.
            HBaseJPAImpl hbase = new HBaseJPAImpl(em);
            hbase.save(contact);
        } finally {
            // Release resources even if saving fails.
            if (em.isOpen()) {
                em.close();
            }
            emf.close();
        }
    }
}

Step 2: Web application Structure
Your Eclipse dynamic web project should look like this.

Make sure to add all the required jars (pretty obvious), and remember a few critical points:

  • Make sure you have “persistence-api-XXX.jar” in WEB-INF\lib, because Tomcat does not ship with its own copy of the JPA API jar.
  • Add META-INF\persistence.xml under the source (src) folder, so that it is deployed to WEB-INF\classes\META-INF.
  • Place orm.xml alongside the entity classes.
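
If you are unsure whether these files ended up where the class loader can see them, a small diagnostic can help. The sketch below is a hypothetical helper (ClasspathCheck is my own name, not part of Datanucleus or of Matzew's project) that uses only the JDK. It reports where META-INF/persistence.xml is visible on the classpath and whether the JPA API class javax.persistence.Persistence can be loaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Datanucleus or the sample project. */
final class ClasspathCheck {

    /** Returns every location from which the resource is visible to the class loader. */
    static List<URL> locate(ClassLoader loader, String resource) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isLoadable(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // An empty list here usually means persistence.xml is in the wrong folder.
        System.out.println("persistence.xml: " + locate(loader, "META-INF/persistence.xml"));
        // false here usually means the persistence-api jar is missing from WEB-INF\lib.
        System.out.println("JPA API present: " + isLoadable(loader, "javax.persistence.Persistence"));
    }
}
```

Inside Tomcat you would call the two methods from the servlet with the thread context class loader, since that is the loader that sees WEB-INF\classes and WEB-INF\lib.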


Step 3: Important Configuration XMLs

The persistence.xml file

When Datanucleus starts persisting entities to the database, it needs to know how to connect to that database, where the database is, and which classes it is responsible for managing. All of that information goes into the persistence.xml file.

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd"
             version="1.0">

    <persistence-unit name="hbase-addressbook" transaction-type="RESOURCE_LOCAL">
        <provider>org.datanucleus.jpa.PersistenceProviderImpl</provider>
        <class>net.wessendorf.addressbook.Contact</class>
        <mapping-file>net/wessendorf/addressbook/orm.xml</mapping-file>
        <properties>
            <property name="datanucleus.ConnectionURL" value="hbase"/>
            <property name="datanucleus.ConnectionUserName" value=""/>
            <property name="datanucleus.ConnectionPassword" value=""/>

            <property name="datanucleus.autoCreateSchema" value="true"/>
            <property name="datanucleus.validateTables" value="false"/>
            <property name="datanucleus.Optimistic" value="false"/>
            <property name="datanucleus.validateConstraints" value="false"/>
        </properties>
    </persistence-unit>
</persistence>
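
One thing worth double-checking is that the name passed to Persistence.createEntityManagerFactory(...) matches a persistence-unit name in this file exactly. As a rough sketch, the hypothetical helper below (PersistenceUnitNames is my own name, and it uses only the JDK's DOM parser, not any Datanucleus API) lists the unit names declared in a persistence.xml stream so you can compare them by eye.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that lists the persistence-unit names in a persistence.xml. */
final class PersistenceUnitNames {

    static List<String> read(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace-based lookup below
            Document doc = factory.newDocumentBuilder().parse(xml);
            // "*" matches the element in any namespace.
            NodeList units = doc.getElementsByTagNameNS("*", "persistence-unit");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < units.getLength(); i++) {
                names.add(((Element) units.item(i)).getAttribute("name"));
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse persistence.xml", e);
        }
    }

    public static void main(String[] args) {
        // A trimmed-down stand-in for the real file, used only for this demo.
        String sample =
            "<persistence xmlns=\"http://java.sun.com/xml/ns/persistence\" version=\"1.0\">"
            + "<persistence-unit name=\"hbase-addressbook\"/>"
            + "</persistence>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(read(in)); // prints [hbase-addressbook]
    }
}
```

In the web application you could feed it the stream returned by getResourceAsStream("META-INF/persistence.xml") on the context class loader.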

Fine-tuning with the orm.xml file

If you need fine control over the database schema, such as column family names and constraints, orm.xml is the file for you.

The idea of this XML file is to map your entity class to the corresponding table in the database. You can also map member variables of the entity class to columns of the table and declare constraints on them.


<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm http://java.sun.com/xml/ns/persistence/orm_1_0.xsd"
                 version="1.0">
    <entity class="fully qualified entity class name here" name="Login">
        <table name="Login" />
        <attributes>
            <id name="userId">
                <column name="Login_data:userId" />
            </id>
            <basic name="pwd">
                <column name="Login_data:pwd" />
            </basic>
        </attributes>
    </entity>
    <!-- …. further entity elements go here -->
</entity-mappings>

The structure of this XML is quite self-explanatory, but there are still a few points to remember:
  • The <id>…</id> tag declares and maps the row key of the HBase table.
  • The <basic>…</basic> tag maps a field to a column of the table.
  • Most important: the column name is given in “family-name:column-name” format. If you don’t specify the column name in this format, Datanucleus will take the class name as the column family name of the HBase table.
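
To make the naming rule concrete, here is a minimal sketch of how a mapped column name could be split into an HBase family and qualifier. It only illustrates the rule as described in this post. It is not Datanucleus code, and the class name HBaseColumnName and the defaultFamily parameter are my own inventions.

```java
/** Illustration of the "family-name:column-name" rule; not Datanucleus code. */
final class HBaseColumnName {
    final String family;
    final String qualifier;

    private HBaseColumnName(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /**
     * Splits a mapped column name at the first colon. When there is no family
     * prefix, defaultFamily is used instead (the post says Datanucleus falls
     * back to the class name).
     */
    static HBaseColumnName parse(String mapped, String defaultFamily) {
        int colon = mapped.indexOf(':');
        String family = colon > 0 ? mapped.substring(0, colon) : defaultFamily;
        String qualifier = colon >= 0 ? mapped.substring(colon + 1) : mapped;
        return new HBaseColumnName(family, qualifier);
    }

    @Override
    public String toString() {
        return family + ":" + qualifier;
    }

    public static void main(String[] args) {
        System.out.println(parse("Login_data:pwd", "Login")); // prints Login_data:pwd
        System.out.println(parse("pwd", "Login"));            // prints Login:pwd
    }
}
```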


Step 4: Enhance your Entity classes

Make sure to enhance your classes with the Datanucleus enhancer. You can find it under the Datanucleus entry of the right-click menu. Please note that the Datanucleus menu is visible only in the Java perspective. If you are in the J2EE perspective, switch to the Java perspective first.
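
If you want to confirm that the enhancer actually ran, one rough check is to look for the marker interface it adds to each entity class. The sketch below rests on an assumption about Datanucleus internals: as far as I know, older releases make enhanced classes implement javax.jdo.spi.PersistenceCapable and newer ones use org.datanucleus.enhancement.Persistable, so verify this against your version. EnhancementCheck is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check for enhancer marker interfaces; interface names are assumptions. */
final class EnhancementCheck {

    // Assumed marker interfaces added by the enhancer (depends on the Datanucleus version).
    private static final Set<String> MARKERS = new HashSet<>(Arrays.asList(
            "javax.jdo.spi.PersistenceCapable",
            "org.datanucleus.enhancement.Persistable"));

    /** Returns true if the class or a superclass directly implements one of the named interfaces. */
    static boolean implementsAny(Class<?> type, Set<String> interfaceNames) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (interfaceNames.contains(iface.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    static boolean looksEnhanced(Class<?> type) {
        return implementsAny(type, MARKERS);
    }

    public static void main(String[] args) {
        // String is obviously not an entity, so this reports false.
        System.out.println("Enhanced: " + looksEnhanced(String.class));
    }
}
```

In the sample project you would pass Contact.class instead of String.class, for example from the servlet before creating the EntityManagerFactory.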


Step 5: Deploy and Run
Go ahead. Deploy your first JPA app on the server and run it.

If everything goes well, you will have a Contact table created in HBase with a single row.

I have uploaded the zip of this Eclipse project here.
That’s it. Enjoy!!

Additional Resources: