Posts tagged ‘Programming’

Reverting a file in ‘git’

I have just begun to learn ‘git’ and to understand the motivation behind distributed SCMs. One of the great powers of git, in addition to the fact that it is super fast, is its excellent merging and branching capabilities. I use perforce at work and SVN at home, and I have used CVS as well, so wrapping my head around a distributed SCM is challenging and requires some reorientation.

We use perforce at work and created and merged 104 branches in the last year (all experimentation and execution is done in branches, with the trunk or mainline being stable). I think the merging in perforce is quite good but too slow, plus it costs money. SVN and CVS are different stories, however: merging is downright painful and requires endless hours of manual merges and testing. So I was interested in seeing what git has to offer.

Installation and creation of a repository were quite straightforward. I checked in a project and was up and running.

Next I tried to edit a file and tried to revert it.

pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ vim prototype.js
pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ git status
# On branch master
# Changed but not updated:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#	modified:   prototype.js
#
no changes added to commit (use "git add" and/or "git commit -a")
pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ git revert prototype.js
fatal: Cannot find 'prototype.js'
pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ 

Not what I really expected. A little research taught me that “revert” in git is actually a “rollback” of a checked in change: it creates a new commit that undoes an earlier one. This was a little counter intuitive - I would have preferred it being called “rollback” instead.

The real way to revert a modified file is to check it out again, exactly as the hint in the ‘git status’ output suggests.

pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ git checkout prototype.js
pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$ git status
# On branch master
nothing to commit (working directory clean)
pranay@pranaydesktop:~/dev/workspace/grails/testApp/web-app/js/prototype$

Hopefully I will never forget this.

High speed file splitting/integration in java using NIO

I have many times been faced with a situation where I am trying to move very large files (ISOs or zips of 1-4 GB in size) but I don’t have a USB drive of that capacity and for some reason I can’t do it over the network. Splitting also helps if you want to do a P2P broadcast of huge files (think updating 200 machines simultaneously), especially if you want to replicate a managed bit-torrent like environment. I have found some commercial file splitters out there but they are too slow and clunky. There is no conceivable reason why they have to be so slow or why I should live without options.
So I just decided to write one from scratch, plus it gave me a reason to refresh my NIO knowledge. With some tweaking and proper usage of buffers and channels I have managed to get comparable or better throughput in java than even the native operating system tools. I tested the integrity of the file and everything was OK.

The amount of code needed is minuscule and quite straightforward. First the splitter:

package net.ahlawat.file;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Program that splits the file
 * User: Pranay Ahlawat
 * Date: Jan 18, 2010
 * Time: 8:14:03 PM
 */
public class Splitter {
    static long BYTE_TO_MB  = 1024 * 1024;
    static long BUFER_SIZE = 128 * 1024;

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("splitter [fileName] [split size in MB] [out dir]");
            System.exit(1);
        }

        //create the local variables to be used in the rest of the application
        File inFile = new File(args[0]);
        long partitionSize = Long.parseLong(args[1]) * BYTE_TO_MB;
        File outDir = new File(args[2]);

        //create initial counters
        final long totalFileSize = inFile.length();

        //create the out dirs if they dont exist
        if (!outDir.exists()) {
            System.out.println("Creating directory : " + outDir.getName());
            outDir.mkdirs();
        }

        FileChannel inChannel =  new FileInputStream(inFile).getChannel();

        long currentPosition = 0;
        int ctr = 0;
        ByteBuffer buff = ByteBuffer.allocate((int)BUFER_SIZE);
        long start = System.currentTimeMillis();

        while(currentPosition < totalFileSize) {
            //get the out channel for the file - roughly is the "originalFileName.ext.n" where 'n' is the partition number
            FileChannel outChannel = getChannel(inFile, outDir, ++ctr); //init the out channel
            //the size of the nth partition
            long size = currentPosition + partitionSize < totalFileSize? partitionSize : totalFileSize - currentPosition;
            //sout
            System.out.print(String.format("Creating part %s of size %s MB", ctr, size/BYTE_TO_MB));
            long start2 = System.currentTimeMillis();

            //the end position of the nth partition w.r.t the entire file
            long endPosition = currentPosition + size;

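            //write partition in BUFER_SIZE chunks
            while(currentPosition < endPosition) {
                //cap the buffer so a read never runs past the end of this partition
                long subSize = (currentPosition + BUFER_SIZE) < endPosition ? BUFER_SIZE : endPosition - currentPosition;
                buff.limit((int) subSize);
                //read the chunk into the buffer - a read may return fewer bytes than requested
                int bytesRead = inChannel.read(buff, currentPosition);
                if (bytesRead < 0) {
                    throw new EOFException("Unexpected end of " + inFile.getName());
                }
                //prepare for writing
                buff.flip();
                //write until the buffer is drained
                while (buff.hasRemaining()) {
                    outChannel.write(buff);
                }
                currentPosition += bytesRead;
                //clear the buffer - so we can read again
                buff.clear();
            }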

            outChannel.close(); //close

            //print throughput for this file partition
            double delta = (double)(System.currentTimeMillis() - start2)/1000;
            System.out.println(String.format(" -> Transferred in %.2f s @ %.2f MB/s", delta,
                    (double) size/BYTE_TO_MB/delta));
        }

        //calculate time
        double delta =  (double)(System.currentTimeMillis() - start)/1000;

        //print out the total throughput
        System.out.println(String.format("Copied %.2f MB in %.2f s @ %.2f MB/s", (double)totalFileSize/BYTE_TO_MB, delta, (double)totalFileSize/BYTE_TO_MB/delta));

        //finally close the channel
        inChannel.close();
    }

    private static FileChannel getChannel(File inFile, File outDir, int ctr) throws FileNotFoundException {
        return new FileOutputStream(new File(outDir, (inFile.getName() + "." + ctr))).getChannel();
    }
}

There are a couple of things I would like to mention about this code. First, I tried a variety of things: I tried a MappedByteBuffer, which was not giving me good performance, so I went back to using vanilla byte buffers. Next I tried a variety of buffer sizes. Unsurprisingly, too small a buffer meant too many reads and too large a buffer meant very slow buffer manipulation; vanilla byte buffers of 128K seemed to be just right and gave me great speed and memory numbers.
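The buffer handling can be isolated into one small helper. What follows is a minimal sketch and not the code behind the timings in this post; the class name ChunkCopy and the method copyRange are made up for illustration. It caps the buffer's limit so a read never runs past the requested range, and it advances by the byte count that read() actually returns instead of assuming a full buffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only - not part of the Splitter class
public class ChunkCopy {

    // Copies 'count' bytes starting at 'position' in 'in' to the current position of 'out'
    static long copyRange(FileChannel in, long position, long count,
                          FileChannel out, ByteBuffer buff) throws IOException {
        long copied = 0;
        while (copied < count) {
            buff.clear();
            // never read past the end of the requested range
            buff.limit((int) Math.min(buff.capacity(), count - copied));
            int n = in.read(buff, position + copied); // may return fewer bytes than asked for
            if (n < 0) {
                throw new EOFException("File ended at byte " + (position + copied));
            }
            buff.flip();
            while (buff.hasRemaining()) { // a single write may not drain the buffer
                out.write(buff);
            }
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("chunk", ".src");
        Path dst = Files.createTempFile("chunk", ".dst");
        Files.write(src, new byte[300 * 1024]); // 300 KB of zeros as test data
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            // copy the middle 100 KB using the same 128K buffer size as the splitter
            long n = copyRange(in, 100 * 1024, 100 * 1024, out, ByteBuffer.allocate(128 * 1024));
            System.out.println("Copied " + n + " bytes");
        }
        Files.delete(src);
        Files.delete(dst);
    }
}
```

Passing the buffer in as a parameter makes it easy to try different sizes, which is how the 128K figure above would be found on a given machine.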

The file under experiment was the open solaris ISO - about 700 MB in size. Here is the output:

Creating part 1 of size 100 MB -> Transferred in 0.27 s @ 366.30 MB/s
Creating part 2 of size 100 MB -> Transferred in 0.25 s @ 403.23 MB/s
Creating part 3 of size 100 MB -> Transferred in 0.24 s @ 413.22 MB/s
Creating part 4 of size 100 MB -> Transferred in 0.25 s @ 406.50 MB/s
Creating part 5 of size 100 MB -> Transferred in 1.19 s @ 84.32 MB/s
Creating part 6 of size 100 MB -> Transferred in 2.16 s @ 46.38 MB/s
Creating part 7 of size 76 MB -> Transferred in 2.21 s @ 34.85 MB/s
Copied 676.99 MB in 6.69 s @ 101.21 MB/s

Not bad. I could split the file up in under 7 seconds, which is better throughput than what the native tool gives me. The result of this code was that the big file was split into 100 MB chunks (and change).

Next the integrator -

package net.ahlawat.file;

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.ByteBuffer;
import static net.ahlawat.file.Splitter.*;

/**
 * Integrator - integrate files
 * User: Pranay Ahlawat
 * Date: Jan 18, 2010
 * Time: 10:51:43 PM
 */
public class Integrator {

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("integrator [fileName] [dir] [out file name]");
            System.exit(1);
        }

        //create core variables
        File dir = new File(args[1]);
        String baseFileName = args[0];
        File outFile = new File(args[2]);

        //create the out channel - to which the data will be written
        FileChannel outChannel = new FileOutputStream(outFile).getChannel();

        //core buffer
        ByteBuffer buff = ByteBuffer.allocate((int)BUFER_SIZE);

        int ctr = 0;
        long start = System.currentTimeMillis();
        while(true) {
            //some profiling
            long start2 = System.currentTimeMillis();
            //create the file and test to see if it's there
            File file = new File(dir, String.format("%s.%s", baseFileName, ++ctr));
            if (!file.exists()) { //no the file 'n' does not exist - integration complete
                break;
            }

            System.out.print(String.format("Integrating %s", file.getName()));

            //create the in channel for the partitioned file 'n'
            FileChannel inChannel = new FileInputStream(file).getChannel();

            long currentPosition = 0;
            long fileSize = file.length();

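            //read the file in chunks of BUFER_SIZE
            while(currentPosition < fileSize) {
                long chunkSize = (currentPosition + BUFER_SIZE) < fileSize? BUFER_SIZE : fileSize - currentPosition;
                buff.limit((int) chunkSize); //never read past the end of this part
                int bytesRead = inChannel.read(buff, currentPosition); //may be fewer bytes than requested
                if (bytesRead < 0) {
                    throw new java.io.EOFException("Unexpected end of " + file.getName());
                }
                currentPosition += bytesRead;
                buff.flip(); //flip the buffer we are ready to write
                while (buff.hasRemaining()) { //a single write may not drain the buffer
                    outChannel.write(buff);
                }
                buff.clear(); //clear
            }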

            //close/flush the information
            inChannel.close();

            //print profiling information
            double delta = (double) (System.currentTimeMillis() - start2)/1000;
            System.out.println(String.format(" -> Integration complete in %.2f s @ %.2f MB/s",
                    delta, file.length()/BYTE_TO_MB/delta));
        }

        outChannel.close();
        double delta = (double) (System.currentTimeMillis() - start)/1000;
        System.out.println(String.format("Integration complete in %.2f @ %.2f MB/s", delta, outFile.length()/BYTE_TO_MB/delta));
    }
}

Again, I tried outChannel.transferFrom() but it just blew up - the performance was horrible. The best results were when I used vanilla buffers and manipulated them myself.
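For reference, the channel-to-channel approach looks roughly like the following. This is a sketch under my own assumptions, not the code that was actually timed; the class name TransferJoin and the method append are made up for illustration. transferFrom() may move fewer bytes than requested, so it is called in a loop. How it performs depends on the operating system and JVM, so measure on your own machine before picking one approach.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch for illustration - not the code behind the numbers in this post
public class TransferJoin {

    // Appends the whole of 'part' to 'out' starting at 'outPosition'; returns the new end position
    static long append(Path part, FileChannel out, long outPosition) throws IOException {
        try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
            long size = in.size();
            long done = 0;
            while (done < size) {
                // reads from in's current position; may move fewer bytes than requested
                long n = out.transferFrom(in, outPosition + done, size - done);
                if (n <= 0) {
                    throw new IOException("No progress while transferring " + part);
                }
                done += n;
            }
            return outPosition + size;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p1 = Files.createTempFile("part", ".1");
        Path p2 = Files.createTempFile("part", ".2");
        Path joined = Files.createTempFile("joined", ".bin");
        Files.write(p1, "hello ".getBytes());
        Files.write(p2, "world".getBytes());
        try (FileChannel out = FileChannel.open(joined, StandardOpenOption.WRITE)) {
            long pos = append(p1, out, 0);
            append(p2, out, pos);
        }
        System.out.println(new String(Files.readAllBytes(joined))); // prints "hello world"
    }
}
```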

Here are the results:

Integrating osol.iso.1 -> Integration complete in 0.35 s @ 283.29 MB/s
Integrating osol.iso.2 -> Integration complete in 0.26 s @ 378.79 MB/s
Integrating osol.iso.3 -> Integration complete in 0.25 s @ 393.70 MB/s
Integrating osol.iso.4 -> Integration complete in 0.28 s @ 361.01 MB/s
Integrating osol.iso.5 -> Integration complete in 1.68 s @ 59.56 MB/s
Integrating osol.iso.6 -> Integration complete in 2.10 s @ 47.55 MB/s
Integrating osol.iso.7 -> Integration complete in 1.39 s @ 54.87 MB/s
Integration complete in 6.44 @ 104.97 MB/s

Not bad at all. Just to put how fast this is in perspective: using cygwin, just copying about 700 MB takes about 16 seconds.

deepti@aanyalaptop /cygdrive/c/test
$ time cp osol.iso cp_of_osol.iso

real    0m16.014s
user    0m0.031s
sys     0m1.825s

deepti@aanyalaptop /cygdrive/c/test
$

And I wrote this little bat script to measure the throughput of native windows command line.

prompt $d $t $_$P$G
copy osol.iso another_cp.iso
prompt $d $t $_$P$G

Here is the output.

C:\test>prompt $d $t $_$P$G

Tue 01/19/2010  1:07:48.86
C:\test>copy osol.iso another_cp.iso
        1 file(s) copied.

Tue 01/19/2010  1:08:00.32
C:\test>prompt $d $t $_$P$G

Tue 01/19/2010  1:08:00.32
C:\test>

Which is approximately 12 seconds… :) - java NIO rocks.

I will package this up with a UI and make it available as a tool on ahlawat.net soon for all interested.
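On the integrity check mentioned at the start: one simple way to confirm that a rebuilt file matches the original is to compare digests. The sketch below is only an assumption about how such a check could be done, since I have not shown the check I actually ran; the class name DigestCheck and its methods are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration - one possible integrity check, not the one used above
public class DigestCheck {

    // Streams the file through SHA-256 so large files never have to fit in memory
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[128 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // True when both files hash to the same SHA-256 digest
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("digestcheck [original file] [rebuilt file]");
            return;
        }
        boolean same = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "Files match" : "Files differ");
    }
}
```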

A flawless Publisher-Subscriber using BlockingQueue

java.util.concurrent simply rocks. I can’t believe how simple it has made everyday programming tasks.

What is the first thing you learn when you do multi-threading? A producer-consumer. It’s a great example for learning notify and wait and for understanding the locking semantics of java threading. It’s a pity I started earlier, because those starting out with java 1.5/6 will have their lives too easy. The BlockingQueue is a fantastic addition to the standard library, and using it one can implement a synchronized multi publisher-multi subscriber system using semantics and constructs no different from java collections.

Here is an example with 5 publishers and 2 subscribers:

package net.ahlawat;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.Date;

/**
 * @author Pranay Ahlawat
 */
public class PubSubTest {
    static class Publisher implements Runnable {
        BlockingQueue<String> queue;
        String name;
        public Publisher(BlockingQueue<String> queue, String name) {
            this.queue = queue;
            this.name = name;
        }

        public void run() {
            while(true) {
                try {
                    //put() blocks while the queue is full; add() would throw IllegalStateException instead
                    queue.put(String.format("Msg from %s: %s [on %s]", name, Math.random(), new Date()));
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    break;
                }
            }
        }
    }

    static class Subscriber implements Runnable {
        BlockingQueue<String> queue;
        String name;
        public Subscriber(BlockingQueue<String> queue, String name) {
            this.queue = queue;
            this.name = name;
        }

        public void run() {
            while(true) {
                try {
                    String in = queue.take();
                    System.out.println(String.format("[%s GOT MESSAGE] %s",name,in));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    break;
                }

            }
        }
    }

    public static void main(String[] args) {
        final int numberOfPublishers = 5;
        BlockingQueue<String> blockingQueue = new ArrayBlockingQueue<String>(10);
        for (int x=1; x<=numberOfPublishers; x++) {
            Publisher publisher = new Publisher(blockingQueue, x+"");
            new Thread(publisher).start();
        }

        Subscriber subscriber1 = new Subscriber(blockingQueue, "Subscriber 1");
        new Thread(subscriber1).start();
        Subscriber subscriber2 = new Subscriber(blockingQueue, "Subscriber 2");
        new Thread(subscriber2).start();
    }
}

It’s quite straightforward - there are a total of 7 threads interacting with the queue. 5 publishers are putting stuff on the queue and 2 subscribers are picking up stuff from it and creatively printing it to standard out. Note that each message goes to exactly one subscriber, whichever takes it first. What I want you to see is the number of times I have used ‘synchronized’ in the code - 0.

The output is not surprising:

[Subscriber 1 GOT MESSAGE] Msg from 4: 0.6466875854315378 [on Fri Dec 11 02:11:27 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 1: 0.33362845358296433 [on Fri Dec 11 02:11:27 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 2: 0.11207796566244055 [on Fri Dec 11 02:11:27 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 3: 0.6810655758824113 [on Fri Dec 11 02:11:27 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 5: 0.5679631128460616 [on Fri Dec 11 02:11:27 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 1: 0.6304440131162121 [on Fri Dec 11 02:11:28 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 2: 0.021117766277559014 [on Fri Dec 11 02:11:28 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 3: 0.1955294791717468 [on Fri Dec 11 02:11:28 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 4: 0.884529348835637 [on Fri Dec 11 02:11:28 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 5: 0.034690283475101946 [on Fri Dec 11 02:11:28 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 1: 0.5764439934861816 [on Fri Dec 11 02:11:29 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 2: 0.3629499102212388 [on Fri Dec 11 02:11:29 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 4: 0.3770428828123388 [on Fri Dec 11 02:11:29 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 3: 0.9450938944637225 [on Fri Dec 11 02:11:29 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 5: 0.8910317407643176 [on Fri Dec 11 02:11:29 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 1: 0.5785955008786261 [on Fri Dec 11 02:11:30 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 4: 0.9442550853581151 [on Fri Dec 11 02:11:30 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 3: 0.3308239883343358 [on Fri Dec 11 02:11:30 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 5: 0.5450057593023042 [on Fri Dec 11 02:11:30 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 2: 0.13504231409694423 [on Fri Dec 11 02:11:30 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 1: 0.1018850869879191 [on Fri Dec 11 02:11:31 EST 2009]
[Subscriber 1 GOT MESSAGE] Msg from 2: 0.7325884278324815 [on Fri Dec 11 02:11:31 EST 2009]
[Subscriber 2 GOT MESSAGE] Msg from 4: 0.8804538983093999 [on Fri Dec 11 02:11:31 EST 2009]
...

Such an elegant solution to a classic problem - the wonderful BlockingQueue …
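One detail worth knowing is what happens when a bounded queue fills up, since the three ways of inserting behave differently. The following is a small sketch to illustrate that; the class name QueueFullDemo and the method describe are made up for this example, and the capacity of 1 is chosen only to fill the queue quickly.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo for illustration - shows add(), offer() and put() on a full queue
public class QueueFullDemo {

    // Returns a short description of what each insertion method did
    static String describe() throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1);
        queue.put("first"); // the queue is now full

        String addResult;
        try {
            queue.add("second");
            addResult = "added";
        } catch (IllegalStateException e) {
            addResult = "threw"; // add() refuses loudly when there is no room
        }

        boolean offered = queue.offer("second"); // returns false immediately
        boolean timed = queue.offer("second", 50, TimeUnit.MILLISECONDS); // waits, then gives up

        // put() would block here until a consumer calls take(), so free a slot first
        queue.take();
        queue.put("second");

        return String.format("add=%s offer=%s timedOffer=%s size=%d",
                addResult, offered, timed, queue.size());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(describe()); // prints "add=threw offer=false timedOffer=false size=1"
    }
}
```

For a publisher that should simply wait when subscribers fall behind, put() is usually the one you want.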

Visual C# .net - some interesting language features

So after being a member of the anti Microsoft club for a long time, I finally found myself wanting to write some COM code, and decided to pick up a 1350 page C# book and do something that no self respecting java developer would ever do: learn C# and the .net platform.
The first thing that struck me about C# is the syntactical similarity to Java. All Algol family languages (C, C++ etc.) share the same heritage, so the similarity was not so shocking - but everything from constructor chaining to the single class inheritance model and signature based methods was strangely similar. In fact I can safely say that if you are a java developer with any experience, you should be up and running with C# in a matter of days if not hours. Similarities apart - here are a few things that I love about C#:

  • Operator overloading - this is such an awesome tool when you are writing a DSL or an engineering based application. Groovy allows this for the JVM but C# natively supports it for the .net CLR.
  • Implicit and explicit casts - this is an extension of operator overloading. How many times have we found ourselves writing a utility class that has a single argument constructor that takes a String or a numeric type? What an ingenious idea - such a feature is common in dynamic languages. I have found myself using the “as” keyword in groovy all the time. With C# you can customize how implicit casting is done.
  • Delegates - with ruby on rails and grails radically changing the way web applications are done, it’s quite amazing to imagine a language without delegates and closures. (Yes, I know all java lovers will say they never needed them.) The fact of the matter is that most powerful frameworks do their magic using template patterns, byte code manipulation or AOP. The concept of closures or delegates is missing from Java (thank god they are finally working on it). C#, although not a full fledged functional language, has delegates and there is a lot of really cool stuff you can do with them. What I absolutely loved is delegate chaining and multicast delegates. What an absolutely awesome concept built into the language. The listener pattern can be very elegantly implemented with this feature.
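To make the comparison concrete from the Java side, here is roughly the plumbing a multicast delegate saves you from writing by hand. This is a minimal sketch that assumes Java 8 or later for java.util.function.Consumer; the class EventSource and its methods subscribe, unsubscribe and fire are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical listener registry for illustration - the hand-written Java counterpart
// of what a multicast delegate provides in C#
public class EventSource<T> {
    // copy-on-write so listeners can be added or removed while an event is being fired
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();

    public void subscribe(Consumer<T> listener) {
        listeners.add(listener);
    }

    public void unsubscribe(Consumer<T> listener) {
        listeners.remove(listener);
    }

    // invokes every registered listener in registration order
    public void fire(T event) {
        for (Consumer<T> listener : listeners) {
            listener.accept(event);
        }
    }

    public static void main(String[] args) {
        EventSource<String> source = new EventSource<>();
        List<String> log = new ArrayList<>();
        Consumer<String> logger = log::add;

        source.subscribe(logger);
        source.subscribe(msg -> System.out.println("got " + msg));
        source.fire("saved");

        source.unsubscribe(logger);
        source.fire("closed");

        System.out.println(log); // the logger only saw the first event
    }
}
```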

I have just begun with C# - as time goes on I am sure I will be adding more stuff to the list. The features are unfortunately not compelling enough for me to pick up C# for any client or live projects that I am doing. I can achieve the same things and much, much more by using a poly-language groovy/java mix. Not to mention the millions of free lines of code and the flexibility that comes with the java platform. But I am very happy with C# so far - definitely not a total waste of time.

Eclipse vs IntelliJ - a comparison for voracious JEE developers

I hate writing or reading topics like editor and IDE wars. I think the answer to this question will vary with developer needs, experience and requirements.
I have been using eclipse for the better part of five years now, not only for Java but also for Ruby, Python, Perl and PHP - although I must admit I have spent about ninety percent of that time doing JEE only. For the last three months or so I have been doing a lot of groovy and grails development and I have found the groovy plugin for eclipse quite inadequate for serious development. The debugger just does not work very well and the editor is bloated, buggy and non-functional. On my 64 bit linux machine it keeps crashing eclipse 3.4 (although this could possibly be fixed by modifying some JVM parameters - I just haven’t fixed it). A vanilla download of eclipse PDT would not even start on 64 bit linux - it’s quite a shame that they don’t fix these rough edges.

For pure java development I think there is yet to be a better open source platform; netbeans is a hog and is quite bloated to work with. I have always believed that open source software will grow faster, perform better and be a better bet in the longer term in terms of productivity and tool support, not to mention cost, and eclipse has certainly served me well. There is one tool though that has defied this theory - intelliJ has kept chugging along and has just released its 8.0 M1 release - congratulations to the team. We are primarily an intelliJ shop and I have tried hard to make developers move to eclipse; I have successfully moved quite a few over to the dark side.

My visit to the New England Java Symposium had a surprise: a one year personal license of intelliJ 7. So in the last 3 weeks or so I tried it and, to be quite honest, I love what I see. It does not have a huge set of plugins, but all the ones present are quite easy to work with and certainly don’t compromise the stability of the IDE platform. I installed the Jet Groovy plugin and imported the Grails project and was happily coding in less than an hour.

The groovy class drill down, code completion and other features, to my pleasant surprise, work much faster and better than in eclipse. Here are some interesting observations I would like to share:

  • Eclipse has an overall better editor - invoking templates (textmate style tab completions) is trivial, the editor has better and more intuitive code completion, and I think the shortcuts are more sensible - hitting tab makes more sense to me than a ctrl+shift+enter, and templates are invoked with a ctrl+space just like standard code completion, unlike the ctrl+J in intelliJ.
  • IntelliJ has awesome grails and groovy support: refactoring, drill down and code completion work without a hitch. (I prefer eclipse JEE for GSP/JSP editing, although this could just be the eclipse DNA speaking.)
  • Eclipse has much more sophisticated XML support. The ant editor and support is better as well, in my opinion.
  • I like the automatic facet recognition in intelliJ. This makes the use of frameworks like spring quite simple, although the overall support for spring is much better in eclipse. Spring IDE also gives you tools like web flow integration. You would expect a little better support in eclipse since the tools are actually written by the spring team and the project is well funded.
  • IntelliJ supports JSF out of the box, although support for frameworks like A4J and facelets is missing. Eclipse does support them via plugins.
  • I love the fact that intelliJ is designed from the ground up to support projects with various structures - it supports importing eclipse, maven and JBuilder projects out of the box. This means you can have frameworks spit out .classpath and .project files for eclipse and import them into intelliJ. I am sure there are open source tools or plugins for eclipse, but no such support is built in.
  • Both IDEs have a plugin architecture. There are without a doubt more eclipse plugins, but a lot of them are quite flaky and quite frankly not production ready. The eclipse versioning and dependency management simply sucks. Installation of plugins has, on more than one occasion, destabilized my install. I have had my JVM crash and the plugins freeze. My experience is that it pays to be a little conservative with the eclipse releases for best results. I still use Europa and most plugins work quite well with it. IntelliJ so far has been rock solid stable. Most of the plugins can be accessed and installed from within the IDE. Both IDEs require a restart, which is OK - but I still don’t understand why this cannot be done programmatically.
  • The source control integrations are mixed. I think eclipse is better with CVS and perforce (via the P4WSAD plugin), while intelliJ has better subversion support. I think both subclipse and subversive are great on windows - but I have had a torrid time getting them to work on 64 bit Linux. The subversive plugin does not even install an SVN provider by default on x86-64 linux. I just don’t have the time to muck around with it any more. IntelliJ’s SVN provider works out of the box, is stable and I love the inbuilt diff.

So are you still wondering which IDE to use? My answer is simple, use what works for you. IntelliJ is an awesome commercial tool for java development and spending $250 for a personal license might not be a bad deal if it makes you productive. After using the jet groovy plugin, I will certainly not mind spending the $250 - since it will make me more productive and just the time you save might be worth it. Eclipse can be a great tool with the right plugins and certainly will remain alive and kicking for a long time, in software terms, to come. So continue coding yourselves crazy.