Tuesday, 23 August 2011

Summer and HBase!

I spent most of my summer learning and working with HBase, Hadoop’s column database. Why this database? To put it in Ian Boston’s words: “The reason for HBase is its the DB of choice for many systems that want to do large scale data analysis on Hadoop.” Ian declared HBase the database of choice on June 27, and since then I must have clicked more Google links for ‘HBase’ than there are HBase deployments! Then began my quest: to rework the database driver in light of this new database, and to write the tests that would judge its performance. After a plethora of idea exchanges, the code was written and the tests were implemented. This post sheds some light on deploying HBase, the way I did it during the summer.

World, HBase. HBase, world.

If you come from an RDBMS background, just shake your head vigorously and forget every schema you have ever implemented in your life! To absorb the concepts of HBase, vacuum-clean your head and make room for new ideas to flow in.

Why HBase?

To answer that question I’ll have to walk you through the problems of traditional databases. They face basically two types of problems: the first is scaling, and the second, well, can be called ‘sparse-ity’. A traditional RDBMS may be reliable, widely used, developer friendly, blah blah blah, but when you ask it “How do I scale you?”, it replies “Put more money into me and buy more hardware”. The second problem goes like this: imagine you are trying to stuff an intricate object graph, with many interdependent objects and relations, into an RDBMS schema. You are bound to end up with a schema in which an object has several attributes that are seldom used. Your RDBMS is surely going to charge you for all those extra ‘NULL’ references out there! So how does HBase deal with this? Let’s see!

HBase and scaling:

HBase is a specialized column DB (more on that soon) built for scaling. It partitions tables horizontally and distributes the data over a large number of commodity servers. HBase is built on Hadoop, which implements functionality similar to Google's GFS and MapReduce systems. It provides the means to efficiently organize and serve huge amounts of data. If you want to know more, read Google’s BigTable paper and about the MapReduce concepts.
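
To make horizontal partitioning a little more concrete, here is a minimal Python sketch (not HBase code) that assigns row keys to regions by key range and regions to servers. The split points and server names are made up for the example; a real cluster chooses and moves them on its own.

```python
from bisect import bisect_right

# Hypothetical split points: each region holds a contiguous range of row keys.
# Four regions result: [..g), [g..n), [n..t), [t..)
SPLIT_KEYS = ["g", "n", "t"]

# Made-up server names, one per region for this illustration.
SERVERS = ["server-a", "server-b", "server-c", "server-d"]

def region_for(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(SPLIT_KEYS, row_key)

def server_for(row_key: str) -> str:
    """Return the (hypothetical) server that hosts the region of row_key."""
    return SERVERS[region_for(row_key) % len(SERVERS)]

for key in ["apple", "mango", "zebra"]:
    print(key, "-> region", region_for(key), "on", server_for(key))
```

Because rows are kept sorted by key, adding capacity in this picture means adding split points and servers rather than buying a bigger machine.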

HBase and ‘sparse-ity’:

HBase is a column-oriented database. This means that it stores content by column rather than by row, which removes the need to store attributes that an object does not have. Where a row DB would store nineteen NULLs and one attribute, HBase (or rather any column DB) saves only that one attribute. So it needs less storage and can perform faster!
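
The nineteen-NULLs example can be sketched in a few lines of Python. This is only an illustration of the idea, not of how HBase lays out its files; the column names attr0..attr19 and the row key are invented for the example.

```python
# Twenty declared columns, as in a fixed relational schema (names are made up).
COLUMNS = [f"attr{i}" for i in range(20)]

def as_row(obj: dict) -> list:
    """Row style: one slot per declared column, None where the attribute is missing."""
    return [obj.get(column) for column in COLUMNS]

def as_sparse_cells(row_key: str, obj: dict) -> dict:
    """Sparse style: one (row key, column) -> value cell per attribute actually present."""
    return {(row_key, column): value for column, value in obj.items()}

obj = {"attr7": "hello"}          # an object with a single attribute set
row = as_row(obj)
cells = as_sparse_cells("row1", obj)

print("row slots:", len(row), "of which empty:", row.count(None))
print("sparse cells:", len(cells))
```

The row form carries a placeholder for every absent attribute, while the sparse form simply has no entry for it.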

HBase datamodel:

An excellent read for this would be : http://wiki.apache.org/hadoop/Hbase/DataModel

How I went about setting up HBase for SparseMapContent:

Configuring Apache HBase database on Windows:
This section describes how to configure Apache HBase in standalone mode on Windows using Cygwin.

On Windows, three technologies are required: Java, SSH and Cygwin.

Installing JAVA:
Download the Java Standard Edition platform from here and follow the simple GUI wizard to install it.

Installing Cygwin
Cygwin provides *nix like environment in Windows. Steps for installation are as follows:
1. Make sure you have Administrator privileges on the target system.

2. Create Root and Local Package directories. A good suggestion is to use C:\cygwin\root and C:\cygwin\setup folders.

3. Download the setup.exe utility from here and save it to the Local Package directory.

4. Run the setup.exe utility:

   1. Choose the Install from Internet option.
   2. Choose your Root and Local Package folders.
   3. Select an appropriate mirror.
   4. Don't select any additional packages yet, as we only want to install Cygwin for now.
   5. Wait for the download and install to complete.
   6. Finish the installation.

5. Add CYGWIN_HOME system-wide environment variable that points to your Root directory.
6. Add %CYGWIN_HOME%\bin to the end of your PATH environment variable.
7. Reboot the system after making changes to the environment variables, otherwise the OS will not be able to find the Cygwin utilities.
8. Test your installation by running your freshly created shortcuts or the Cygwin.bat command in the Root folder. You should end up in a terminal window that is running a Bash shell. Test the shell by issuing the following commands:

1. cd / should take you to the Root directory in Cygwin;
2. ls should list all files and folders in the current directory;
3. exit ends the terminal.

9. When needed, to uninstall Cygwin you can simply delete the Root and Local Package directory, and the shortcuts that were created during installation.

Installing SSH:
HBase (and Hadoop) rely on SSH for interprocess/-node communication and launching remote commands.

1. Rerun the setup.exe utility.
2. Leave all parameters as is, skipping through the wizard using the Next button until the Select Packages panel is shown.
3. Maximize the window and click the View button to toggle to the list view, which is ordered alphabetically on Package, making it easier to find the packages we'll need.
4. Select the following packages by clicking the status word (normally Skip) so it's marked for installation. Use the Next button to download and install the packages.

   1. OpenSSH
   2. tcp_wrappers
   3. diffutils
   4. zlib

5. Wait for the install to complete and finish the installation.

Installing HBase
Download HBase from here, unzip it and place it under the directory C:\cygwin\usr\local\ so that it gets installed in Cygwin (C:\cygwin\usr\local\hbase-).

Configuring JAVA
1. Create a symbolic link in /usr/local to the Java home directory by using the following command and substituting the name of your chosen Java environment:
ln -s /cygdrive/c/Program\ Files/Java/ /usr/local/

2. Test your Java installation by changing directories to your Java folder with cd /usr/local/ and issuing the command ./bin/java -version. This should output the version of your chosen JRE.

Configuring SSH
1. On Windows Vista and above, make sure you run the Cygwin shell with elevated privileges, by right-clicking on the shortcut and using Run as Administrator.

2. First of all, make sure that the rights on some crucial files are correct. Use the commands underneath; you can verify all rights by using the ls -l command on the different files. Also, notice that the shell's auto-completion feature is extremely handy in these situations.

1. chmod +r /etc/passwd to make the passwords file readable for all
2. chmod u+w /etc/passwd to make the passwords file writable for the owner
3. chmod +r /etc/group to make the groups file readable for all
4. chmod u+w /etc/group to make the groups file writable for the owner
5. chmod 755 /var to make the var folder writable to owner and readable and executable to all

3. Edit the /etc/hosts.allow file using your favorite editor (why not VI in the shell!) and make sure the following two lines are in there before the PARANOID line:

1. ALL : localhost : allow
2. ALL : [::1]/128 : allow

4. Next we have to configure SSH by using the script ssh-host-config. The following may be asked in random order but don’t worry about that.

1. If this script asks to overwrite an existing /etc/ssh_config, answer yes.
2. If this script asks to overwrite an existing /etc/sshd_config, answer yes.
3. If this script asks to use privilege separation, answer yes.
4. If this script asks to install sshd as a service, answer yes. Make sure you started your shell as Administrator!
5. If this script asks for the CYGWIN value, just accept the default, which is ntsec.
6. If this script asks to create the sshd account, answer yes.
7. If this script asks to use a different user name as service account, answer no as the default will suffice.
8. If this script asks to create the cyg_server account, answer yes. Enter a password for the account.

5. Start the SSH service using net start sshd or cygrunsrv --start sshd. Notice that cygrunsrv is the utility that makes the process run as a Windows service. Confirm that you see a message stating that the CYGWIN sshd service was started successfully.

6. Harmonize the Windows and Cygwin user accounts by using the commands:

1. mkpasswd -cl > /etc/passwd
2. mkgroup --local > /etc/group

7. Test the installation of SSH:

1. Open a new Cygwin terminal.
2. Use the command whoami to verify your userID.
3. Issue an ssh localhost to connect to the system itself:
   1. Answer yes when presented with the server's fingerprint.
   2. Issue your password when prompted.
   3. Test a few commands in the remote session.
   4. The exit command should take you back to your first shell in Cygwin.
4. A second exit should terminate the Cygwin shell.

8. If you get stuck with some password problem, you can change it using the command passwd.

Configuring HBase

(2nd and 3rd steps are optional.)
1. HBase uses the ./conf/hbase-env.sh to configure its dependencies on the runtime environment. Copy and uncomment the following lines just underneath their originals, and change them to fit your environment. They should read something like:

1. export JAVA_HOME=/usr/local/
2. export HBASE_IDENT_STRING=$HOSTNAME as this most likely does not include spaces.

2. HBase uses the ./conf/hbase-default.xml file for configuration. Some properties do not resolve to existing directories because the JVM runs on Windows. This is the major issue to keep in mind when working with Cygwin: within the shell all paths are *nix-alike, hence relative to the root /. However, every parameter that is to be consumed within the Windows processes themselves needs to be a Windows setting, hence C:\-alike. Change the following properties in the configuration file, adjusting paths where necessary to conform with your own installation:

1. hbase.rootdir must read e.g. file:///C:/cygwin/root/tmp/hbase/data
2. hbase.tmp.dir must read C:/cygwin/root/tmp/hbase/tmp
3. hbase.zookeeper.quorum must read because for some reason localhost doesn't seem to resolve properly on Cygwin.

3. Make sure the configured hbase.rootdir and hbase.tmp.dir directories exist and have the proper rights set up, e.g. by issuing a chmod 777 on them.
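
The *nix-path versus Windows-path distinction is easy to get wrong, so here is a small Python sketch of the mapping. It assumes the Root directory C:/cygwin/root suggested in the Cygwin install steps; the helper names to_windows_path and to_file_uri are my own, and on a real system Cygwin's cygpath utility does this conversion for you.

```python
from pathlib import PurePosixPath

# Assumption: the Cygwin Root directory suggested in the install steps.
CYGWIN_ROOT = "C:/cygwin/root"

def to_windows_path(posix_path: str, root: str = CYGWIN_ROOT) -> str:
    """Map an absolute Cygwin path to a Windows-style path with forward slashes."""
    parts = PurePosixPath(posix_path).parts
    if not parts or parts[0] != "/":
        raise ValueError("expected an absolute path such as /tmp/hbase")
    # /cygdrive/c/... refers directly to drive C:
    if len(parts) >= 3 and parts[1] == "cygdrive":
        return "/".join([parts[2].upper() + ":"] + list(parts[3:]))
    # Everything else lives under the Cygwin Root directory.
    return "/".join([root.rstrip("/")] + list(parts[1:]))

def to_file_uri(posix_path: str, root: str = CYGWIN_ROOT) -> str:
    """Build a file:/// URI of the kind used for hbase.rootdir (no escaping is done)."""
    return "file:///" + to_windows_path(posix_path, root)

print(to_file_uri("/tmp/hbase/data"))     # value in the style of hbase.rootdir
print(to_windows_path("/tmp/hbase/tmp"))  # value in the style of hbase.tmp.dir
```

With the assumed Root directory, the two printed values match the example settings listed in step 2.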

Testing the installation and configuration of HBase on Windows using Cygwin.
1. Start a Cygwin terminal.

2. Change directory to the HBase installation using cd /usr/local/hbase-, preferably using auto-completion.

3. Start HBase using the command ./bin/start-hbase.sh

1. When prompted to accept the SSH fingerprint, answer yes.
2. When prompted, provide your password. Maybe multiple times.
3. When the command completes, the HBase server should have started.
4. However, to be absolutely certain, check the logs in the ./logs directory for any exceptions.

4. Next we start the HBase shell using the command ./bin/hbase shell

5. You can run some simple test commands

6. Leave the shell with exit.

7. To stop the HBase server, issue the ./bin/stop-hbase.sh command and wait for it to complete. Killing the process might corrupt your data on disk.

8. In case of problems,

1. Verify the HBase logs in the ./logs directory.
2. Try to fix the problem.
3. Get help on the forums or IRC (#hbase@freenode.net). People are very active and keen to help out!
4. Stop, restart and retest the server.

Getting the code using git
Open Git Bash or a command prompt and run the following commands:

$ cd
$ mkdir sparsemapcontent
$ cd sparsemapcontent
$ git clone
$ cd sparsemapcontent/
$ mvn clean install
$ exit

For developing the code in eclipse

1. Import sparsemapcontent folder as existing maven project.
2. Add the following jar files from the /usr/local/hbase- folder to the project in case they are not already there.
· Hbase-.jar
· Hbase--test.jar
3. Start the HBase server as stated before.
4. Create the tables au, an, cn and smcindex.

So that was how I got my hands dirty with HBase. It’s a great DB for understanding column DB concepts. I hope this was helpful.

Please feel free to mail me at kotwal.aadish@gmail.com. Hopefully your mail would put me in a tizzy!

Interested readers may also want to look into these sites:

1. Configuration of HBase :


2. HBase data model :


3. HBase book :


4. HBase on Windows OS :


5. Place to start learning about Hadoop :


6. HBase debugging and troubleshooting :

i. http://hbase.apache.org/book/trouble.html

ii. http://old.nabble.com/HBase-User-f34655.html

Sunday, 14 August 2011

Sakai OAE Native mobile app: almost the end.

It is almost the end of the Summer of Code (time flies!), so this post will be just a quick summary of how the project is going, because I want to use all the remaining time to finish some code, add new features and write documentation.

So this is just a sneak peek, to keep the community updated. Again, I must begin this post by thanking my mentor, Carl Hall, for his hard work. He has answered every question that has arisen, and he has also given me some freedom to make some decisions. So, thanks Carl! I am really proud to work with you :)

OK, let’s focus on what matters. If you read my previous post (Why should I develop native mobile application (sometimes)?), you know that one of the bad things about writing native applications is the maintenance. When I wrote that post I had already complained about the problems of implementing, on one platform, functions that work completely differently on the other. I have been stuck on a lot of device-specific problems. I tried my best to take advantage of the iOS and Android user interfaces, so the layout is slightly different in the two apps.

I am going to quickly enumerate things that are already implemented:

  • Skeleton application for Android: It has grown a lot since the last time I talked about it. It has 3 tabs for the main features in Sakai OAE, matching the 3 main tabs of the Sakai web version. It also has the navigation menu inside the “You” tab. All the strings that you can see are internationalized.
  • Skeleton application for iOS: the same on iOS. It took me a lot of time to make it work. As you can see in the demo, the tabs are in the lower part of the screen, to use the platform-specific tab UI and make the application easier to use for both iOS and Android users.
  • Authenticate users in Android and iPhone: the last time I wrote, users could authenticate themselves by sending their credentials in Android. Right now they can do the same in the iOS application.
  • Store and manage Sakai URL in Android: Carl and I thought that typing the Sakai URL every time was really tedious. So we added a new first view where the user writes the URL; the app checks that it’s a valid URL and stores it on the device. In Android it is stored in the SharedPreferences of the app, so we can get the URL every time we need it.
  • Store and manage Sakai URL in iOS: same for iOS. Here the URL is stored in NSUserDefaults, through a singleton class.
  • Calling a web service and showing data in Android: Although the authentication was already a web service, we have managed to call the Me Service with the user credentials, get back the Basic Information as a JSON object, parse it, populate Java classes and show the information inside the Android app. Apache has really useful libraries to do this.
  • Calling a web service and showing data in iOS: again the result for iOS is the same, but the implementation has nothing in common. Here I used JSONFramework to parse the data and Objective-C classes to manage the connection. This web service call runs inside a thread, so it runs in the background.
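
The "get JSON, parse it, populate classes" step from the list above looks roughly like this, sketched in Python rather than the Java and Objective-C the apps actually use. The field names (basic, firstName, lastName, email) and the sample payload are hypothetical; the real Me Service response may be shaped differently.

```python
import json
from dataclasses import dataclass

@dataclass
class BasicInfo:
    """Plain data class holding the fields the app wants to display."""
    first_name: str
    last_name: str
    email: str

def parse_basic_info(body: str) -> BasicInfo:
    """Parse a JSON response body and populate a BasicInfo object."""
    data = json.loads(body)
    basic = data.get("basic", {})  # hypothetical section name
    return BasicInfo(
        first_name=basic.get("firstName", ""),
        last_name=basic.get("lastName", ""),
        email=basic.get("email", ""),
    )

# Made-up response body standing in for what the web service would return.
sample = '{"basic": {"firstName": "Ada", "lastName": "Lovelace", "email": "ada@example.org"}}'
info = parse_basic_info(sample)
print(info.first_name, info.last_name, info.email)
```

Missing fields fall back to empty strings here, so the view code does not have to deal with absent keys.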

And since a picture is worth a thousand words, I have uploaded a small demo video to YouTube.

I hope you find this project interesting. Thank you for your time, and I’ll be waiting for your comments.

Saturday, 13 August 2011

Sakai CLE Mobile Application using phoneGap and jQuery mobile.

After a long time, I think I should write something here, as this will be the last or second-to-last opportunity to write in the Sakai Google Summer of Code projects blog.

The last few weeks were a really hard time, because the things I had to handle were almost all new to me and almost every case turned into an issue. My mentor was getting so many mails with the title “* issues” (this is a regex); examples would be “Localhost issues” and “Cross domain issues” :-).

I think discussing the solutions is really worthwhile, because while googling I noticed things like “21 users have this question” (Stack Overflow) in some places, so I was the (n+1)th person to have the same question.

The first issue came up just after the last post. As I discussed there, I was using JSONP feeds from the server to render the application on mobile, but for security reasons the JSONP feeds were stopped and I was in trouble. When I searched, I found that the usual solutions are JSONP or a proxy server, and neither was going to work for the mobile application, since JSONP support was no longer there. But all these restrictions apply to the http:// and https:// protocols, as I realized later with help from my mentor. Because we are developing the application with phoneGap, which uses the file:/// protocol, there is no such restriction on getting JSON feeds from a remote server, so the problem was finally solved. For completeness here is a snippet.

Next, the issue with localhost. When we test the application in an emulator (in my case Android) and it sees localhost, it looks for a localhost inside the emulator (device) and eventually fails to find the server. So I had to use my friend's laptop as my dedicated Sakai server. For Mac users I saw a solution here [1], but I couldn't try it out. Though the phoneGap wiki has something like this [2], I could not get it working; please correct me if I am wrong by adding a comment.

The next issue was that ajax requests in different pages were not working. I had put all the ajax requests in the document.ready event of the different HTML files, but jQuery mobile uses ajax to load the contents of each page into the DOM as we navigate, and the DOM ready handler only executes for the first page. [3] So instead of DOM ready, in jQuery mobile we have to bind the pagecreate event in order to execute the code when the page is loaded and created by ajax. Note that the above snippet is using pagecreate. One more thing I noticed in most places is the use of data-role=page to create new pages without adding a new HTML file; this will increase the loading time of each page as well. Use separate HTML files only if you really need them, and if you are not happy with pagecreate you can just use the rel=”external” flag with your link like this,

but this will stop you from using page transition effects like slide, and currently there is no way to get the effect with this.
And finally, I would like to add some screens that we can see in the application.

"This will be the view of a profile for a user in Sakai CLE. User details are categorized into sections, as we can see, and they are collapsible too. Moreover, users can update their status on the go via @sakaimobile :-)"

"This is how a user will see the new alerts from different tools in a selected site. We will, most probably, be supporting the Announcements, Assignments, Forums and Roster tools." Note that unsupported tools are grayed out, and users can see how many new alerts there are from each tool.

As I said at the beginning, this will be the last or second-to-last post in this blog, but one day I might be writing here again as a proud *mentor* for a student of the Sakai Foundation, who knows? :-)
Add your valuable comments and correct me if I have gotten anything wrong. Thank you all for giving me this opportunity to write in this blog, and thanks to my mentor for guiding me and helping me with all the issues I had.
Next week will be for documentation, if there is any, and for correcting the application where necessary to work with different devices.