Python difflib - minimum fuss and impressive results

So I wanted to write a little library that does a little website monitoring for me and sends me an email every time the page changes.

I really did not want to keep wasting time looking at the page, so the ideal would be for the email to contain a diff instead.

Long story short, I discovered Python's difflib, which is ridiculously easy to use:

First I have my two files:

Pranay-Ahlawats-MacBook-Pro:test pranay$ more filea.txt
one
two
three
four
Pranay-Ahlawats-MacBook-Pro:test pranay$ more fileb.txt
one
two
four
five
Pranay-Ahlawats-MacBook-Pro:test pranay$

And here is the python script that generates the diff output:

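import difflib
import sys

# read both files, closing each one as soon as it has been read
with open("filea.txt") as file_a:
    lines_a = file_a.readlines()
with open("fileb.txt") as file_b:
    lines_b = file_b.readlines()

# ndiff yields lines that already end in a newline, so write them out as-is
sys.stdout.writelines(difflib.ndiff(lines_a, lines_b))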

Here is the output:

  one
  two
- three
  four
+ five
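
For an email body, a unified diff may be more compact than the ndiff listing. Here is a minimal sketch using the same four lines as the two files above, held in memory so that it runs on its own (the variable names are only illustrative):

```python
import difflib

# Same content as filea.txt and fileb.txt, kept in memory for the example
old_lines = ["one\n", "two\n", "three\n", "four\n"]
new_lines = ["one\n", "two\n", "four\n", "five\n"]

# unified_diff yields header lines followed by the changed hunks
diff_text = "".join(
    difflib.unified_diff(old_lines, new_lines,
                         fromfile="filea.txt", tofile="fileb.txt")
)
print(diff_text)
```

The result marks "three" with a leading minus and "five" with a leading plus, much like the output of the command line diff -u.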

What I really liked about difflib is a little class called HtmlDiff. With HtmlDiff you can output an HTML file that shows a color-coded diff.

So changing the script to:

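import difflib
import sys

# read both files, closing each one as soon as it has been read
with open("filea.txt") as file_a:
    lines_a = file_a.readlines()
with open("fileb.txt") as file_b:
    lines_b = file_b.readlines()

# make_file returns a complete HTML document as a single string
html_diff = difflib.HtmlDiff()
sys.stdout.write(html_diff.make_file(lines_a, lines_b))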

Generates:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html>

<head>
    <meta http-equiv="Content-Type"
          content="text/html; charset=ISO-8859-1" />
    <title></title>
    <style type="text/css">
        table.diff {font-family:Courier; border:medium;}
        .diff_header {background-color:#e0e0e0}
        td.diff_header {text-align:right}
        .diff_next {background-color:#c0c0c0}
        .diff_add {background-color:#aaffaa}
        .diff_chg {background-color:#ffff77}
        .diff_sub {background-color:#ffaaaa}
    </style>
</head>

<body>

    <table class="diff" id="difflib_chg_to0__top"
           cellspacing="0" cellpadding="0" rules="groups" >
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>

        <tbody>
            <tr><td class="diff_next" id="difflib_chg_to0__1"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="from0_1">1</td><td nowrap="nowrap">one</td><td class="diff_next"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="to0_1">1</td><td nowrap="nowrap">one</td></tr>
            <tr><td class="diff_next"></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">two</td><td class="diff_next"></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">two</td></tr>
            <tr><td class="diff_next"><a href="#difflib_chg_to0__1">n</a></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap"><span class="diff_sub">three</span></td><td class="diff_next"><a href="#difflib_chg_to0__1">n</a></td><td class="diff_header"></td><td nowrap="nowrap"></td></tr>
            <tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap">four</td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap">four</td></tr>
            <tr><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap"><span class="diff_add">five</span></td></tr>
        </tbody>
    </table>
    <table class="diff" summary="Legends">
        <tr> <th colspan="2"> Legends </th> </tr>
        <tr> <td> <table border="" summary="Colors">
                      <tr><th> Colors </th> </tr>
                      <tr><td class="diff_add">&nbsp;Added&nbsp;</td></tr>
                      <tr><td class="diff_chg">Changed</td> </tr>
                      <tr><td class="diff_sub">Deleted</td> </tr>
                  </table></td>
             <td> <table border="" summary="Links">
                      <tr><th colspan="2"> Links </th> </tr>
                      <tr><td>(f)irst change</td> </tr>
                      <tr><td>(n)ext change</td> </tr>
                      <tr><td>(t)op</td> </tr>
                  </table></td> </tr>
    </table>
</body>

</html>

Which looks quite pretty when opened in a browser.

Great attachment for an email.
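
Attaching the HTML diff to a message could look roughly like this. It is only a sketch, assuming Python 3.6 or later for email.message.EmailMessage; the function name, the addresses and the subject line are placeholders I made up, and actually sending the message (for example with smtplib) is left out:

```python
import difflib
from email.message import EmailMessage


def build_diff_email(old_lines, new_lines, sender, recipient):
    """Return an email with the HTML diff attached (hypothetical helper)."""
    html = difflib.HtmlDiff().make_file(old_lines, new_lines, "before", "after")

    msg = EmailMessage()
    msg["Subject"] = "Page changed"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The monitored page changed; the diff is attached.")
    # A str payload with subtype="html" becomes a text/html attachment
    msg.add_attachment(html, subtype="html", filename="diff.html")
    return msg


message = build_diff_email(
    ["one\n", "three\n"], ["one\n", "five\n"],
    "monitor@example.com", "me@example.com",  # placeholder addresses
)
```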

Setting up amazon EC2 tools on a mac

I wanted to set up an Amazon VPC and had to use the Amazon EC2 tools for it.

It was not completely clear which environment variables to set. Although the tools themselves are fairly easy to use, setting them up so that you can talk to the Amazon web services securely is not completely straightforward, so I thought I would put down the steps.

Step 1. Download the tools
You can download the Amazon EC2 tools from the Amazon developer website.

Step 2. Unzip the tools to a directory

Step 3. Generate the x.509 certificate and download the certificate and private key.

This was not totally clear, but you can go to https://aws.amazon.com/account and click on security credentials. Hit the X.509 tab and create a new certificate for the account. You will be asked to download the private key - download it and store it in a safe location on your computer. You can also click on the link there to download the actual cert file - store it in a safe location as well.

Step 4. Set up the environment variables.

You will have to set up four environment variables for this to work: EC2_HOME, EC2_PRIVATE_KEY, EC2_CERT and JAVA_HOME.

Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$ export EC2_HOME=`pwd`
Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$ export EC2_PRIVATE_KEY=/path/to/pk....pem
Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$ export EC2_CERT=/path/to/cert-...pem
Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$ export JAVA_HOME=`/usr/libexec/java_home`
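
If you drive the tools from a script, a quick sanity check that the four variables are set can save some head scratching. A small sketch (the function and constant names are my own, not part of the EC2 tools):

```python
import os

# The four variables exported above
REQUIRED_VARS = ("EC2_HOME", "EC2_PRIVATE_KEY", "EC2_CERT", "JAVA_HOME")


def missing_ec2_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_ec2_vars()
if missing:
    print("Not set: " + ", ".join(missing))
else:
    print("All four variables are set")
```

This only checks that the variables exist; it does not verify that the key and certificate paths point at valid files.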

After that you should be able to execute commands just fine.

Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$ ./bin/ec2-describe-instances
RESERVATION	r-7720491c	922634713555	desktoneSecurityGroup
INSTANCE	i-3e4cafff54	ami-c3e40ddfaa	ec2-18adsf4-ada73-116-67.compute-1.amazonaws.com	domU-12-31-39-02-D5-57.compute-1.internal	running	desktone	0		m1.small	2010-08-04T14:44:46+0000	us-east-1b			windows	monitoring-disabled	18.73.116.67	10.248.218.165			ebs					hvm
BLOCKDEVICE	/dev/sda1	vol-22fb894b	2010-08-01T00:04:56.000Z
RESERVATION	r-1777037c	922634713555
INSTANCE	i-36a1195dfc	ami-5b77fdf9c32			running	desktone	0		m1.small	2010-08-04T14:44:46+0000	us-east-1b			windows	monitoring-disabled		17.19.10.41 	vpc-0ae3ceb67	subnet-023cefb6b	ebs					hvm
BLOCKDEVICE	/dev/sda1	vol-6478060d	2010-08-03T21:05:41.000Z
Pranay-Ahlawats-MacBook-Pro:ec2-api-tools-1.3-53907 pranay$

Regenerating open ssh host keys for ubuntu/debian

When you clone a Linux VM that has the OpenSSH server installed, the host keys generated when the server was installed are copied along with it. The clone then presents the same host keys as the original, and you might have problems sshing to that machine.

To regenerate the host keys:

# rm /etc/ssh/ssh_host_*
# dpkg-reconfigure openssh-server
Creating SSH2 RSA key; this may take some time ...
Creating SSH2 DSA key; this may take some time ...

That is it.

Mounting NTFS partitions as read-write on Ubuntu Linux

By default, when you install Ubuntu server and try to mount an NTFS file system, you will only be able to mount it read-only.

For example - I have attached a hard disk with a standard windows installation on device 1 (sdb) on my computer.

root@server:/dev$ ls
block            disk      input  loop3   mem                 pktcdvd  ram1   ram2  ram9    sda2  shm       tty    tty14  tty20  tty27  tty33  tty4   tty46  tty52  tty59  tty8     usbdev1.1_ep00  vcs1  vcsa1     zero
bus              ecryptfs  kmem   loop4   net                 port     ram10  ram3  random  sda5  snapshot  tty0   tty15  tty21  tty28  tty34  tty40  tty47  tty53  tty6   tty9     usbdev1.1_ep81  vcs2  vcsa2
cdrom            fd        kmsg   loop5   network_latency     ppp      ram11  ram4  rtc     sdb   sndstat   tty1   tty16  tty22  tty29  tty35  tty41  tty48  tty54  tty60  ttyS0    usbdev1.2_ep00  vcs3  vcsa3
char             full      log    loop6   network_throughput  psaux    ram12  ram5  rtc0    sdb1  sr0       tty10  tty17  tty23  tty3   tty36  tty42  tty49  tty55  tty61  ttyS1    usbdev1.2_ep81  vcs4  vcsa4
console          fuse      loop0  loop7   null                ptmx     ram13  ram6  scd0    sg0   stderr    tty11  tty18  tty24  tty30  tty37  tty43  tty5   tty56  tty62  ttyS2    usbmon0         vcs5  vcsa5
core             hpet      loop1  lp0     oldmem              pts      ram14  ram7  sda     sg1   stdin     tty12  tty19  tty25  tty31  tty38  tty44  tty50  tty57  tty63  ttyS3    usbmon1         vcs6  vcsa6
cpu_dma_latency  initctl   loop2  mapper  parport0            ram0     ram15  ram8  sda1    sg2   stdout    tty13  tty2   tty26  tty32  tty39  tty45  tty51  tty58  tty7   urandom  vcs             vcsa  xconsole
root@server:/dev$ mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,nosuid,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)

root@server:~# mkdir /mnt/ntfs
root@server:~# mount /dev/sdb1 /mnt/ntfs/
root@server:~# cd /mnt/ntfs
root@server:/mnt/ntfs# ls
AUTOEXEC.BAT  boot.ini  CONFIG.SYS  Documents and Settings  hiberfil.sys  IO.SYS  MSDOS.SYS  NTDETECT.COM  ntldr  pagefile.sys  Program Files  RECYCLER  sysprep_dat  System Volume Information  WINDOWS
root@server:/mnt/ntfs# touch hello.txt
touch: cannot touch `hello.txt': Read-only file system

To fix this problem all you have to do is install the ntfs-3g driver on your machine. On Ubuntu you can do that with apt-get install; after that the file system mounts read-write.

root@server:/mnt# sudo apt-get install ntfs-3g
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  fuse-utils libfuse2 libntfs-3g49
The following NEW packages will be installed:
...
root@server:/mnt# mount /dev/sdb1 /mnt/ntfs/
The disk contains an unclean file system (0, 0).
The file system wasn't safely closed on Windows. Fixing.
root@server:/mnt# cd /mnt/ntfs
root@server:/mnt/ntfs# ls
AUTOEXEC.BAT  boot.ini  CONFIG.SYS  Documents and Settings  hiberfil.sys  IO.SYS  MSDOS.SYS  NTDETECT.COM  ntldr  pagefile.sys  Program Files  RECYCLER  sysprep_dat  System Volume Information  WINDOWS
root@server:/mnt/ntfs# touch hello.txt
root@server:/mnt/ntfs# vim hello.txt
root@server:/mnt/ntfs# exit
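
If you want a script to confirm whether a file system came up read-write, one option is to parse lines in the format that the mount command printed above. A rough sketch (the function names are hypothetical, and the regular expression only covers the "device on path type fstype (options)" layout shown in this post):

```python
import re

# Matches lines such as: /dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mount_point>.+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)$"
)


def parse_mount_line(line):
    """Split one line of mount output into its fields, or return None."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["options"] = fields["options"].split(",")
    return fields


def is_read_write(line):
    """True if the option list of the mount line contains 'rw'."""
    parsed = parse_mount_line(line)
    return parsed is not None and "rw" in parsed["options"]


sample = "/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)"
print(parse_mount_line(sample)["fstype"], is_read_write(sample))
```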

Showing hidden files and unix folders in mac finder

By default the Mac Finder will not show you hidden files like “~/.bashrc” or “~/.m2”. To show these files you have to:

1. Open terminal and type the following command:

defaults write com.apple.Finder AppleShowAllFiles YES

2. Relaunch finder - by clicking down on finder in the dock while keeping the “Alt/option” key pressed and then selecting relaunch.

This time you will see all your hidden files and folders, and also core unix folders like /usr and /var.

Useful git resources

In my attempt to learn git I have come across many good online resources. Here is a list of all of them; hopefully it will help you learn git faster, and I will never have to Google for them again:

1. Pro Git Book - good reference.

2. Git magic book - a good condensed book with lots of examples.

3. Git for Designers - an introduction to SCM and how Git fits in.

4. Git cheat sheet - full command reference - very useful.

5. Git Man Pages

6. Git from the bottom up - a pdf explaining all concepts of git from the bottom up

7. Visual Git Cheat sheet

Manual transactional demarcation in spring and hibernate

Often you are forced to write code where Spring's @Transactional simply does not cut it: you want to execute certain pieces of code in different transactions, each with its own propagation and isolation level. Manual transactional demarcation in Spring is stupidly simple after some upfront configuration, and it works uniformly by enrolling in whatever the enclosing transaction manager is. If you are running in a JEE server it will tie in with the JTA transaction manager; otherwise it uses the Hibernate transaction manager.

In case you want to integrate with the JTA transaction manager and want the Spring @Transactional to work, simply put this in the application context:

<!-- configure JTA transaction manager -->
    <tx:annotation-driven transaction-manager="transactionManager" proxy-target-class="true"/>
    <bean id="transactionManager" class="org.springframework.transaction.jta.JtaTransactionManager">
        <property name="allowCustomIsolationLevels" value="true" />
    </bean>

If you are running inside a plain servlet container like Tomcat, you can configure Hibernate transactions, vanilla JDBC transactions etc. to use the Hibernate transaction manager like this:

<!-- use the hibernate transaction manager -->
    <bean id="transactionManager" class="org.springframework.orm.hibernate3.HibernateTransactionManager">
        <property name="sessionFactory" ref="sessionFactory" />
        <property name="dataSource" ref="dataSource" />
    </bean>

Notice that the session factory and the data source properties are both set. This then allows jdbcTemplate etc. to all participate in the same transactions as a hibernate call to save() and load().

Once configured, it's great: any Spring-managed bean with the following annotation on its methods will work flawlessly.

@Transactional(isolation = Isolation.READ_COMMITTED, propagation = Propagation.REQUIRED)

This is where the problem starts. Even though you can get very granular with the transactions here, calls from one method to another within the same service implementation will not be transactionally aware. The demarcation, flushing, commits etc. happen in a proxy generated around the class, and internal calls simply bypass that proxy.

In this case you can make use of the Spring TransactionTemplate class and the PlatformTransactionManager interface.
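
To see why internal calls slip past the proxy, here is a toy illustration. It is plain Python rather than Spring code, and TransactionalProxy and Service are made-up names; the proxy only logs "begin" and "end" where a real one would open and commit a transaction:

```python
class TransactionalProxy:
    """Wraps a target object and logs around every call made THROUGH the proxy."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        method = getattr(self._target, name)

        def wrapper(*args, **kwargs):
            self._log.append("begin " + name)  # a real proxy would open a transaction
            try:
                return method(*args, **kwargs)
            finally:
                self._log.append("end " + name)  # ...and commit or roll back here

        return wrapper


class Service:
    def outer(self):
        # 'self' is the raw target, not the proxy, so inner() is not intercepted
        return self.inner()

    def inner(self):
        return "done"


calls = []
service = TransactionalProxy(Service(), calls)

service.outer()            # only outer is wrapped; the nested inner() call is invisible
after_outer = list(calls)

service.inner()            # called through the proxy, so this one is wrapped
after_inner = list(calls)
print(after_outer, after_inner)
```

After the first call the log holds entries for outer only, even though inner ran as well. Calling inner through the proxy is what makes it show up.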

Using it is quite simple - inject the platformTransactionManager into a bean:

<!-- inject the transaction manager into the service -->
    <bean id="xyzService" class="XYZServiceImpl">
        <property name="platformTransactionManager" ref="transactionManager" />
    </bean>

and then directly use transaction templates by specifying custom isolation and propagation levels like this:

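import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallback;
import org.springframework.transaction.support.TransactionTemplate;

public class XYZServiceImpl implements XyzService {
    private PlatformTransactionManager platformTransactionManager;

    public void doService() {
        TransactionTemplate template = new TransactionTemplate(platformTransactionManager);
        template.setIsolationLevel(TransactionDefinition.ISOLATION_READ_COMMITTED);
        template.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRES_NEW);
        //anonymous inner class - everything inside doInTransaction runs in the new transaction
        template.execute(new TransactionCallback() {
            public Object doInTransaction(TransactionStatus status) {
                //your business logic here
                return null; //no result to hand back to the caller
            }
        });
    }

    public void setPlatformTransactionManager(PlatformTransactionManager platformTransactionManager) {
        this.platformTransactionManager = platformTransactionManager;
    }
}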

The cool thing is that you can have parts of the same method execute in different transactions, each with a different isolation level. No more complex XA/JTA code: Hibernate sessions will flush and transactions will commit or roll back at the demarcations you expect - Spring makes it too easy.

High speed file splitting/integration in java using NIO

Many times I have been faced with a situation where I am trying to move very large files (ISOs or zips of 1-4 GB) but I don’t have a USB drive of that capacity and for some reason I can’t do it over the network. Splitting files up also helps if you want to P2P broadcast huge files (think updating 200 machines simultaneously), especially if you want to replicate a managed BitTorrent-like environment. I have found some commercial file splitters out there but they are too slow and clunky. There is no conceivable reason why they have to be so slow or why I should live without options.
So I decided to write one from scratch; it also gave me a reason to refresh my NIO knowledge. With some tweaking and proper usage of buffers and channels I have managed to get comparable or better throughput in Java than even the native operating system tools. I tested the integrity of the file and everything was OK.

The amount of code needed is minuscule and quite straightforward. First the splitter:

package net.ahlawat.file;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Program that splits the file
 * User: Pranay Ahlawat
 * Date: Jan 18, 2010
 * Time: 8:14:03 PM
 */
public class Splitter {
    static long BYTE_TO_MB  = 1024 * 1024;
    static long BUFER_SIZE = 128 * 1024;

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("splitter [fileName] [split size in MB] [out dir]");
            System.exit(1);
        }

        //create the local variables to be used in the rest of the application
        File inFile = new File(args[0]);
        long partitionSize = Long.parseLong(args[1]) * BYTE_TO_MB;
        File outDir = new File(args[2]);

        //create initial counters
        final long totalFileSize = inFile.length();

        //create the out dirs if they don't exist
        if (!outDir.exists()) {
            System.out.println("Creating directory : " + outDir.getName());
            outDir.mkdirs();
        }

        FileChannel inChannel =  new FileInputStream(inFile).getChannel();

        long currentPosition = 0;
        int ctr = 0;
        ByteBuffer buff = ByteBuffer.allocate((int)BUFER_SIZE);
        long start = System.currentTimeMillis();

        while(currentPosition < totalFileSize) {
            //get the out channel for the file - roughly is the "originalFileName.ext.n" where 'n' is the partition number
            FileChannel outChannel = getChannel(inFile, outDir, ++ctr); //init the out channel
            //the size of the nth partition
            long size = currentPosition + partitionSize < totalFileSize? partitionSize : totalFileSize - currentPosition;
            //sout
            System.out.print(String.format("Creating part %s of size %s MB", ctr, size/BYTE_TO_MB));
            long start2 = System.currentTimeMillis();

            //the end position of the nth partition w.r.t the entire file
            long endPosition = currentPosition + size;

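            //write partition in BUFFER_SIZE chunks
            while(currentPosition < endPosition) {
                //never read past the end of this partition
                long subSize = Math.min(BUFER_SIZE, endPosition - currentPosition);
                buff.limit((int) subSize);
                //read the chunk into the buffer - a read may return fewer bytes than requested
                int bytesRead = inChannel.read(buff, currentPosition);
                if (bytesRead < 0) {
                    throw new EOFException("unexpected end of " + inFile.getName());
                }
                //prepare for writing
                buff.flip();
                //write until the buffer is drained
                while (buff.hasRemaining()) {
                    outChannel.write(buff);
                }
                //advance by what was actually read
                currentPosition += bytesRead;
                //clear the buffer - so we can read into it again
                buff.clear();
            }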

            outChannel.close(); //close

            //print throughput for this file partition
            double delta = (double)(System.currentTimeMillis() - start2)/1000;
            System.out.println(String.format(" -> Transferred in %.2f s @ %.2f MB/s", delta,
                    (double) size/BYTE_TO_MB/delta));
        }

        //calculate time
        double delta =  (double)(System.currentTimeMillis() - start)/1000;

        //print out the total throughput
        System.out.println(String.format("Copied %.2f MB in %.2f s @ %.2f MB/s", (double)totalFileSize/BYTE_TO_MB, delta, (double)totalFileSize/BYTE_TO_MB/delta));

        //finally close the channel
        inChannel.close();
    }

    private static FileChannel getChannel(File inFile, File outDir, int ctr) throws FileNotFoundException {
        return new FileOutputStream(new File(outDir, (inFile.getName() + "." + ctr))).getChannel();
    }
}

There are a couple of things I would like to mention about this code. First, I tried a variety of things. I tried memory-mapped buffers (MappedByteBuffer), which did not give me good performance, so I reverted to vanilla byte buffers. Next I tried a variety of buffer sizes. Unsurprisingly, too small a buffer meant too many reads and too large a buffer meant very slow buffer manipulation - vanilla byte buffers of 128K seemed to be just right and gave me great speed and memory numbers.

The file under experiment was the OpenSolaris ISO - about 700 MB in size. Here is the output:

Creating part 1 of size 100 MB -> Transferred in 0.27 s @ 366.30 MB/s
Creating part 2 of size 100 MB -> Transferred in 0.25 s @ 403.23 MB/s
Creating part 3 of size 100 MB -> Transferred in 0.24 s @ 413.22 MB/s
Creating part 4 of size 100 MB -> Transferred in 0.25 s @ 406.50 MB/s
Creating part 5 of size 100 MB -> Transferred in 1.19 s @ 84.32 MB/s
Creating part 6 of size 100 MB -> Transferred in 2.16 s @ 46.38 MB/s
Creating part 7 of size 76 MB -> Transferred in 2.21 s @ 34.85 MB/s
Copied 676.99 MB in 6.69 s @ 101.21 MB/s

Not bad: I could split the file up in under 7 seconds, which is better throughput than the native tool gives me. The result of this code was that the big file was split into 100 MB chunks (and change).

Next, the integrator:
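
One way to check the integrity of the parts without reassembling them is to hash them in order and compare the result with the hash of the original file. The sketch below is in Python, not part of the Java tool, and all the function names are mine. It relies on the "originalFileName.ext.n" naming that the splitter produces and uses SHA-256 as an arbitrary choice of hash:

```python
import hashlib
import os


def sha256_of_files(paths, chunk_size=128 * 1024):
    """Hash the concatenated contents of the given files, in order."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as handle:
            # read in fixed-size chunks so large files never sit in memory
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


def part_paths(directory, base_name):
    """Collect base_name.1, base_name.2, ... until a number is missing."""
    paths = []
    index = 1
    while True:
        candidate = os.path.join(directory, "%s.%d" % (base_name, index))
        if not os.path.exists(candidate):
            return paths
        paths.append(candidate)
        index += 1


def parts_match_original(original, directory):
    """True if the parts in directory, read in order, equal the original."""
    parts = part_paths(directory, os.path.basename(original))
    return sha256_of_files([original]) == sha256_of_files(parts)

# Example call (paths are placeholders):
# parts_match_original("osol.iso", "parts_dir")
```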

package net.ahlawat.file;

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.ByteBuffer;
import static net.ahlawat.file.Splitter.*;

/**
 * Integrator - integrate files
 * User: Pranay Ahlawat
 * Date: Jan 18, 2010
 * Time: 10:51:43 PM
 */
public class Integrator {

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("integrator [fileName] [dir] [out file name]");
            System.exit(1);
        }

        //create core variables
        File dir = new File(args[1]);
        String baseFileName = args[0];
        File outFile = new File(args[2]);

        //create the out channel - to which the data will be written
        FileChannel outChannel = new FileOutputStream(outFile).getChannel();

        //core buffer
        ByteBuffer buff = ByteBuffer.allocate((int)BUFER_SIZE);

        int ctr = 0;
        long start = System.currentTimeMillis();
        while(true) {
            //some profiling
            long start2 = System.currentTimeMillis();
            //create the file and test to see if it's there
            File file = new File(dir, String.format("%s.%s", baseFileName, ++ctr));
            if (!file.exists()) { //no the file 'n' does not exist - integration complete
                break;
            }

            System.out.print(String.format("Integrating %s", file.getName()));

            //create the in channel for the partitioned file 'n'
            FileChannel inChannel = new FileInputStream(file).getChannel();

            long currentPosition = 0;
            long fileSize = file.length();

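            //read the file in chunks of BUFFER_SIZE
            while(currentPosition < fileSize) {
                long chunkSize = Math.min(BUFER_SIZE, fileSize - currentPosition);
                buff.limit((int) chunkSize);
                //a read may return fewer bytes than requested, so advance by the actual count
                int bytesRead = inChannel.read(buff, currentPosition);
                if (bytesRead < 0) {
                    throw new java.io.EOFException("unexpected end of " + file.getName());
                }
                currentPosition += bytesRead;
                buff.flip(); //flip the buffer we are ready to write
                while (buff.hasRemaining()) { //write until the buffer is drained
                    outChannel.write(buff);
                }
                buff.clear(); //clear
            }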

            //close/flush the information
            inChannel.close();

            //print profiling information
            double delta = (double) (System.currentTimeMillis() - start2)/1000;
            System.out.println(String.format(" -> Integration complete in %.2f s @ %.2f MB/s",
                    delta, file.length()/BYTE_TO_MB/delta));
        }

        outChannel.close();
        double delta = (double) (System.currentTimeMillis() - start)/1000;
        System.out.println(String.format("Integration complete in %.2f @ %.2f MB/s", delta, outFile.length()/BYTE_TO_MB/delta));
    }
}

Again I tried outChannel.transferFrom() but it just blew up - the performance was horrible. The best results were when I used vanilla buffers and manipulated them myself.

Here are the results:

Integrating osol.iso.1 -> Integration complete in 0.35 s @ 283.29 MB/s
Integrating osol.iso.2 -> Integration complete in 0.26 s @ 378.79 MB/s
Integrating osol.iso.3 -> Integration complete in 0.25 s @ 393.70 MB/s
Integrating osol.iso.4 -> Integration complete in 0.28 s @ 361.01 MB/s
Integrating osol.iso.5 -> Integration complete in 1.68 s @ 59.56 MB/s
Integrating osol.iso.6 -> Integration complete in 2.10 s @ 47.55 MB/s
Integrating osol.iso.7 -> Integration complete in 1.39 s @ 54.87 MB/s
Integration complete in 6.44 @ 104.97 MB/s

Not bad at all. Just to put how fast this is in perspective - using cygwin, just copying about 700 MB takes about 16 seconds.

deepti@aanyalaptop /cygdrive/c/test
$ time cp osol.iso cp_of_osol.iso

real    0m16.014s
user    0m0.031s
sys     0m1.825s

deepti@aanyalaptop /cygdrive/c/test
$

And I wrote this little bat script to measure the throughput of the native Windows command line.

prompt $d $t $_$P$G
copy osol.iso another_cp.iso
prompt $d $t $_$P$G

Here is the output.

C:\test>prompt $d $t $_$P$G

Tue 01/19/2010  1:07:48.86
C:\test>copy osol.iso another_cp.iso
        1 file(s) copied.

Tue 01/19/2010  1:08:00.32
C:\test>prompt $d $t $_$P$G

Tue 01/19/2010  1:08:00.32
C:\test>

Which is approximately 12 seconds… :) - java NIO rocks.

I will package this up with a UI and make it available as a tool on ahlawat.net soon for all interested.

OpenSolaris - my first experience

During the last two weeks or so I have been trying to carefully evaluate what the best platform is to run an enterprise Java app. I am not even going to consider Windows. During my research on the internet I came across this article.

I have worked with Solaris off and on, both at work and at school (at Cornell our entire packet switched networks class lab was in C on Solaris), but I have never really liked it that much - I just prefer Linux. However, the differences in file IO and numeric computation reported in the article were considerable (possibly because of the differences between ext3 and ZFS), and that intrigued me enough to try OpenSolaris. So I downloaded the ISO from the OpenSolaris website and installed it as a VirtualBox VM. The default version, like Ubuntu, starts as a live CD from which the user has the option to install it. The installation was quick and easy - no issues at all. I logged in and was very quickly up and running with javac, ant, mvn, groovy etc. Then I went to the IDEA website to get a version of the IDE and guess what - they don’t support Solaris. Out of hope I downloaded the Linux version, and it did not work.

Of course Eclipse has recently been ported over to OpenSolaris, and there is NetBeans.

The package manager sucks though - it downloads stuff one at a time? I liked that there was a default AMP package that installed PHP, Apache and MySQL so you had the basics of a web server in place. Installing your own stuff as a service is radically different from Linux. The default init.d scripts have been deprecated in favor of the Service Management Facility. It seems to be very well thought out - moving services between run levels, auto restart of services gone bad etc. are awesome features, but there is a lot of admin to do here. Linux is just a heck of a lot easier - you can easily find init.d scripts, and if you are using CentOS, managing run levels and services with the chkconfig command is too simple. Sample scripts for svcadm are hard to find, and frankly administering Solaris is not what interested me most, so I never bothered with it too much.

All in all, would I ever develop on OpenSolaris? Probably not, because the tooling support sucks and some of my favorite Python libraries might not install on Solaris. Would I use it as a production server? POSSIBLY .. if I find that my language performs faster on Solaris, which the benchmark seems to suggest, and once I get around the initial bits of installing as a service and monitoring. Other people seem to think that OpenSolaris is too slow. I never benchmarked heavy network or file IO, but even in a VM it seemed not that bad - unzipping, moving and copying stuff around seemed reasonable for a VM. I have mixed feelings about it - hopefully when I start using it more I will have more to say. For now Linux it is.

Groovy Adaptable Packaging Engine (GRAPE)

I think I am falling in love with Groovy all over again. GRAPE is a fantastic addition to Groovy 1.6.x. Transitive dependency management is becoming more and more common because of the JAR hell that Java/Groovy application developers face. With the number of libraries growing exponentially and complex interdependence between them, starting a Java project of any complexity means maintaining a huge stash of jars, and the overhead required to maintain them is unbelievable. One reason why projects like Grails and AppFuse have been successful is that they reduce the ramp-up time to develop something interesting. For me a major part of the ramp-up time is setting up all the libraries - Spring, Hibernate, Ehcache (and all the dependent libraries). Open source build tools like Ivy and Maven have taken transitive dependency management to the next level, and GRAPE builds the capability into the language itself.

By placing a couple of really simple annotations, one can forget about the dependent jars altogether. The packages will be downloaded automatically and used before executing the script.

So let's say I have a simple script that depends on commons-logging.

I can use it inside my script like this:

import org.apache.commons.logging.*;

@Grab(group = 'commons-logging', module = 'commons-logging', version='1.1.1')
public class GrapeTest {
	static Log logger = LogFactory.getLog(GrapeTest.class);
	public static void main(String[] args) {
		//log
		logger.info("Hello world from Grape!")

	}
}

Here the group is the maven GroupId and the module is the maven ArtifactId. On the command line all I do is groovy <fileName> and the jar download and classpath resolution etc. are handled transparently:

deepti@aanyalaptop /cygdrive/c/Users/deepti/Desktop
$ groovy test.groovy
Dec 16, 2009 12:24:49 AM sun.reflect.NativeMethodAccessorImpl invoke0
INFO: Hello world from Grape!

deepti@aanyalaptop /cygdrive/c/Users/deepti/Desktop
$

If one takes a look, the downloaded files are present in ~/.groovy/grapes/…

deepti@aanyalaptop /cygdrive/c/Users/deepti
$ find . -name commons*.jar -print
./.groovy/grapes/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar

deepti@aanyalaptop /cygdrive/c/Users/deepti
$

Incredibly simple - what is cool is that Grape handles transitive dependencies transparently, so all the libraries on which commons-logging depends will be downloaded and installed automatically.

So now one can ship a groovy script (piece of code) without any dependencies. Very interesting and very useful.