Myth about InputStream.read(byte [], int, int) method

I know it may sound stupid, but only yesterday I discovered that the
InputStream.read(byte[] data, int offset, int length)
method can end up reading fewer bytes than length.

What that means is that if you are doing something like this,
InputStream in = getInputStream(); // getting it from somewhere..
byte[] data = new byte[10];
int read = in.read(data, 0, data.length);
System.out.println("Bytes read: " + read);
then you cannot guarantee that the in.read() call will read 10 bytes, even when the stream is connected and not broken (this applies particularly to sockets). My understanding had been that if it ever returned fewer bytes than length, then the channel was broken and we should assume that the connection had been dropped by the counterparty.
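
If the caller really needs all the bytes, the usual remedy is to keep calling read until the requested count has arrived. Below is a minimal sketch of that loop. The readFully helper and the trickle test stream are names made up for this illustration; the test stream simply hands out at most 2 bytes per call to imitate a slow socket.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyExample {

    // Hypothetical helper: keeps calling read() until 'length' bytes have
    // arrived, and throws if the stream ends before that.
    static void readFully(InputStream in, byte[] data, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int n = in.read(data, offset + total, length - total);
            if (n < 0) {
                throw new EOFException(
                        "Stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
    }

    // Hypothetical test stream: returns at most 2 bytes per read() call,
    // imitating a socket on which data trickles in.
    static InputStream trickle(byte[] source) {
        return new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 2));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10];
        byte[] data = new byte[10];

        // A single call may come back short even though the stream is healthy.
        int first = trickle(source).read(data, 0, data.length);
        System.out.println("Single read returned: " + first);

        // The loop keeps going until the whole array is filled.
        readFully(trickle(source), data, 0, data.length);
        System.out.println("readFully filled all " + data.length + " bytes");
    }
}
```

The JDK already ships the same idea as DataInputStream.readFully, so wrapping the stream in a DataInputStream is an alternative to writing the loop by hand.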

There is a good explanation of this method in the javadocs. The default implementation in the InputStream class does this:
  1. It tries to read the first byte from the stream. If that read fails for any reason other than end of stream, an IOException is thrown. If end of stream is detected, -1 is returned.
  2. After the first read, if any subsequent read throws an IOException, it is swallowed, end of stream is assumed, and the number of bytes read up to that point is returned. If any subsequent read detects end of stream, again whatever bytes have been read so far are returned.
The javadoc also encourages subclasses of InputStream to provide a more efficient implementation of this method. In SocketInputStream (not public in the Java API), which is what I was dealing with, the read(byte[] data, int offset, int length) call delegates to a native method.
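
To make the two steps of the default implementation concrete, here is a rough paraphrase of them written as a static helper over the single-byte read(). It is a sketch of the documented behaviour and not the JDK source; argument checking is left out, and the failingAfter test stream is a made-up name for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class DefaultReadSketch {

    // Paraphrase of the documented default behaviour (not the JDK source).
    static int read(InputStream in, byte[] data, int offset, int length)
            throws IOException {
        if (length == 0) {
            return 0;
        }
        // Step 1: the first byte. An IOException here reaches the caller.
        int first = in.read();
        if (first == -1) {
            return -1;
        }
        data[offset] = (byte) first;
        int count = 1;
        // Step 2: remaining bytes. An IOException here is swallowed and
        // treated like end of stream.
        try {
            while (count < length) {
                int next = in.read();
                if (next == -1) {
                    break;
                }
                data[offset + count] = (byte) next;
                count++;
            }
        } catch (IOException swallowed) {
            // fall through and return what has been read so far
        }
        return count;
    }

    // Hypothetical test stream: delivers the given bytes, then fails
    // with an IOException instead of signalling end of stream.
    static InputStream failingAfter(byte[] good) {
        ByteArrayInputStream source = new ByteArrayInputStream(good);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                int b = source.read();
                if (b == -1) {
                    throw new IOException("connection reset (simulated)");
                }
                return b;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        int n = read(failingAfter(new byte[] {1, 2, 3}), data, 0, data.length);
        System.out.println("Bytes read before the swallowed exception: " + n);
    }
}
```

The point to notice is that the caller sees a short count and no exception, so a failure in the middle of a read looks exactly like a slow stream.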

So, out of curiosity, I wrote a sample in which a ServerSocket acted as a producer, writing 2 bytes and then waiting for 1 second, and a consumer using a Socket attempted to read 4 bytes at a time without waiting. The problem was easily reproduced: every read call in the consumer read only 2 bytes at a time.
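
My original sample is not reproduced here, but an experiment along those lines could look roughly like the sketch below. The class and method names, the number of rounds and the use of the loopback address are all assumptions made for this illustration. How many bytes each read returns depends on timing, so the printed sizes may differ from run to run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class PartialReadDemo {

    // Producer: accepts one connection, then writes 2 bytes per round
    // and pauses between rounds.
    static Thread startProducer(ServerSocket server, int rounds, long pauseMillis) {
        Thread t = new Thread(() -> {
            try (Socket client = server.accept();
                 OutputStream out = client.getOutputStream()) {
                for (int i = 0; i < rounds; i++) {
                    out.write(new byte[] {1, 2});
                    out.flush();
                    Thread.sleep(pauseMillis);
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        });
        t.start();
        return t;
    }

    // Consumer: asks for 4 bytes per call and records how many bytes
    // each call actually returned, until end of stream.
    static List<Integer> consume(int port) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             InputStream in = socket.getInputStream()) {
            byte[] data = new byte[4];
            int n;
            while ((n = in.read(data, 0, data.length)) != -1) {
                sizes.add(n);
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread producer = startProducer(server, 3, 1000);
            System.out.println("Bytes per read: " + consume(server.getLocalPort()));
            producer.join();
        }
    }
}
```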

BufferedInputStream provides a more convenient implementation of the read method. It repeatedly invokes the multibyte read method on the underlying stream until one of the following happens:
  1. length bytes have been read
  2. End of stream is reached
  3. A subsequent call to read would block. This is detected by calling the available method on the underlying stream.
But note point 3: it still does not guarantee that a call will return length bytes, even when end of stream has not been reached.
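
The effect of point 3 can be seen without a socket. The sketch below wraps a made-up Trickle stream (a name invented for this illustration) that returns at most 2 bytes per read call and reports either its real remaining count or 0 from available(). With the standard BufferedInputStream I would expect the first case to return only 2 bytes and the second to return all 4.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadDemo {

    // Hypothetical test stream: at most 2 bytes per read() call, and
    // available() reports either the real remaining count or 0.
    static class Trickle extends FilterInputStream {
        private final boolean reportAvailable;

        Trickle(InputStream in, boolean reportAvailable) {
            super(in);
            this.reportAvailable = reportAvailable;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 2));
        }

        @Override
        public int available() throws IOException {
            return reportAvailable ? super.available() : 0;
        }
    }

    // Asks a BufferedInputStream for 4 bytes and returns how many it delivered.
    static int readFour(boolean reportAvailable) throws IOException {
        InputStream source =
                new Trickle(new ByteArrayInputStream(new byte[8]), reportAvailable);
        BufferedInputStream in = new BufferedInputStream(source);
        return in.read(new byte[4], 0, 4);
    }

    public static void main(String[] args) throws IOException {
        // available() == 0: the buffered stream stops after the first short read.
        System.out.println("available() reports 0: read " + readFour(false));
        // available() > 0: the buffered stream keeps reading until it has 4 bytes.
        System.out.println("available() reports data: read " + readFour(true));
    }
}
```

On a socket, available() reports 0 whenever the next chunk has not arrived yet, which is why the wrapper hides the short reads most of the time but not always.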

In my case the SocketInputStream was wrapped in a BufferedInputStream, which partly shielded us from this problem but also made it worse in a way, because we saw it only very rarely. We identified the issue only after a code review.

Now, a few questions may come to mind: why call the multibyte read at all? Why not just keep calling the single-byte read and track how many bytes we have read? One of the grey hairs in my company answered this question, saying,
When we used to do it in C, the multibyte read meant fewer stack pushes and pops, as we would end up making fewer read calls.
I haven't personally measured the performance gain from using the multibyte read, but I think it never hurts to use it if you understand how it works.

So when does the multibyte read call wait, then? Since it is allowed to return fewer bytes than requested, it can always just return with less data. My guess is that it blocks only when there are 0 bytes available to read. In a way, the multibyte read call never returns 0 (unless length itself is 0), as it attempts to read at least one byte.

Good comparison of OpenESB and ServiceMix

This link gives a very good comparison between the OpenESB and ServiceMix ESBs.

Quick way to untar and bunzip files in java

Have you ever felt a need to deal with bzip2 compressed tarballs in Java? Recently I compressed a lot of test resources in our source tree using bzip2. The resources were static and were expected to change very rarely. bzip2 was the best choice in terms of compression ratio. It also meant that my SVN download of the source tree would take much less time. I planned to extract the resources and make them available at runtime while running tests.

As you may already know, there is no way to handle bzip2 compression in the Java core API. But at the back of my mind I knew that you can create bzipped tarballs using Ant. So I looked at the Ant tasks and found the untar task, which can be instructed to run the archive through bzip2 decompression as well.

Here is the code snippet that can untar and bunzip the file using Java code.
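import java.io.File;
import org.apache.tools.ant.taskdefs.Untar;
import org.apache.tools.ant.taskdefs.Untar.UntarCompressionMethod;

Untar untar = new Untar();
untar.setSrc(new File("./src/test/resources/files.tar.bz2")); // archive to extract
untar.setDest(new File("./target"));                          // directory to extract into
// Tell the task that the tarball is bzip2 compressed
UntarCompressionMethod compression = new UntarCompressionMethod();
compression.setValue("bzip2");
untar.setCompression(compression);
untar.setOverwrite(true); // replace files that already exist in the destination
untar.execute();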

Also make sure that you put the Ant jar on the classpath.

Maven users can simply add the following dependency:

<dependency>
   <groupId>org.apache.ant</groupId>
   <artifactId>ant</artifactId>
   <version>1.7.0</version>
   <scope>test</scope>
</dependency>

Disabling test under JUnit 4.4

Recently I had to disable a unit test in our test system. We use Maven as our build tool and JUnit 4.4 for unit testing. I had a few options:

  1. Exclude that particular class from the tests in the Surefire plugin's configuration in my pom.xml
  2. Remove the @Test annotation from the test method in the test class. This fails, as JUnit finds a test class but no test methods in it
  3. Rename the class from, say, MyTest to something that does not end with "Test", say "Tezt"
I remembered that in TestNG you can disable a test just by saying @Test(enabled = false), and I was desperately trying to find out how to do this in JUnit 4.4. But to my disappointment, the Test annotation only allows a couple of attributes, timeout and expected, neither of which did what I wanted.

After looking at the JUnit javadocs, I stumbled upon the @Ignore annotation, which is indeed the annotation to use if you want to skip a test.

So if you want to disable a test case in JUnit 4.4, just annotate your test method with both @Ignore and @Test.

Another effect was that Maven started reporting one test as "Skipped", which is nice, as it keeps reminding you that you need to look after the test you have disabled.