Running on Linux, Building with Java: Myth about InputStream.read(byte [], int, int) method

I know it may sound stupid but only yesterday I discovered that

InputStream.read(byte [] data, int offset, int length)

method can end up reading fewer bytes than length.

What that means is if you are doing something like,

InputStream in = getInputStream(); // getting it from somewhere, e.g. a socket
byte[] data = new byte[10];
int read = in.read(data, 0, data.length); // may be fewer than 10, or -1 at end of stream
System.out.println("Bytes read: " + read);

Then you cannot guarantee that the in.read() call will read 10 bytes, even when the stream is still connected and not broken (this applies particularly to sockets). My understanding had been that if it ever returned fewer bytes than length, the channel was broken and we should assume that the counterparty had dropped the connection.
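If the caller really needs exactly length bytes, the usual remedy is to loop until they have all arrived. Here is a minimal sketch of that idea. The readFully helper and the trickle test stream (which hands out at most 2 bytes per call to imitate a slow socket) are names I made up for illustration, not part of any API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyDemo {

    // Hypothetical helper: keeps calling read until exactly 'length' bytes
    // have arrived, and throws EOFException if the stream ends first.
    static void readFully(InputStream in, byte[] data, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int n = in.read(data, offset + total, length - total);
            if (n < 0) {
                throw new EOFException(
                        "Stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
    }

    // Hypothetical test stream: returns at most 2 bytes per read call,
    // imitating a socket on which data trickles in.
    static InputStream trickle(byte[] source) {
        return new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 2));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] data = new byte[10];

        // A single call asks for 10 bytes but gets only 2.
        int first = trickle(source).read(data, 0, data.length);
        System.out.println("Single read returned: " + first);

        // The loop keeps going until all 10 bytes are in the buffer.
        readFully(trickle(source), data, 0, data.length);
        System.out.println("readFully filled all " + data.length + " bytes");
    }
}
```

The JDK already ships equivalents, so in real code you may not need your own loop: DataInputStream.readFully(byte[], int, int) behaves much like the helper above, and since Java 9 InputStream.readNBytes(byte[], int, int) loops the same way but returns the count instead of throwing at end of stream.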

There is a good explanation of this method in the Javadoc. The default implementation at the level of the InputStream class does the following:

It tries to read the first byte from the stream. If that read fails for any reason other than end of stream, an IOException is thrown. If end of stream is detected, -1 is returned.
After the first read, if any subsequent read throws an IOException, it is swallowed, end of stream is assumed, and the number of bytes read up to that point is returned. If any subsequent read detects end of stream, again whatever bytes have been read so far are returned.
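To make those two rules concrete, here is a simplified paraphrase of that behaviour written as a standalone method. It is my own sketch of what the Javadoc describes, not the JDK source, and it leaves out argument checking. The defaultRead and failingAfter names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

class DefaultReadSketch {

    // Paraphrase of the documented default behaviour, built on single-byte read().
    static int defaultRead(InputStream in, byte[] b, int off, int len)
            throws IOException {
        if (len == 0) {
            return 0;
        }
        int c = in.read();          // first byte: an IOException here propagates
        if (c == -1) {
            return -1;              // end of stream before anything was read
        }
        b[off] = (byte) c;
        int i = 1;
        try {
            for (; i < len; i++) {
                c = in.read();
                if (c == -1) {
                    break;          // end of stream: return what we have
                }
                b[off + i] = (byte) c;
            }
        } catch (IOException swallowed) {
            // A failure after the first byte is treated like end of stream.
        }
        return i;
    }

    // Hypothetical stream: serves 'good' bytes, then fails on every later read.
    static InputStream failingAfter(int good) {
        return new InputStream() {
            private int served = 0;

            @Override
            public int read() throws IOException {
                if (served >= good) {
                    throw new IOException("simulated failure");
                }
                return 'a' + served++;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        // Two bytes arrive, the third read fails, and the caller just sees 2.
        System.out.println("Bytes read: " + defaultRead(failingAfter(2), buf, 0, 4));
    }
}
```

Note how the caller of the second case cannot tell a swallowed failure from a slow stream: both simply show up as a short read.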

The Javadoc also encourages subclasses of InputStream to provide a more efficient implementation of this method. In SocketInputStream (not public in the Java API), which is what I was dealing with, the read(byte[] data, int offset, int length) call delegates to a native method.

So out of curiosity, I wrote a sample in which a ServerSocket acted as a producer, writing 2 bytes and then waiting for 1 second, and a consumer using a Socket attempted to read 4 bytes at a time without waiting. The problem was easily reproduced: every read call in the consumer read only 2 bytes at a time.
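I no longer have that sample, but a rough reconstruction of the experiment could look like the sketch below. The class name, the run method and its parameters are made up for illustration, and it assumes loopback sockets are available on the machine. The exact size of each read depends on timing and on the OS, so treat the printed sizes as an observation, not a guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class ShortReadExperiment {

    // Producer writes 2 bytes per round and pauses; consumer asks for 4 bytes
    // per read. Returns the size of every read the consumer performed.
    static List<Integer> run(int rounds, long pauseMillis)
            throws IOException, InterruptedException {
        List<Integer> readSizes = new ArrayList<>();
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread producer = new Thread(() -> {
                try (Socket s = server.accept();
                     OutputStream out = s.getOutputStream()) {
                    for (int i = 0; i < rounds; i++) {
                        out.write(new byte[] {1, 2});
                        out.flush();
                        Thread.sleep(pauseMillis);
                    }
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            producer.start();

            try (Socket client =
                         new Socket(server.getInetAddress(), server.getLocalPort());
                 InputStream in = client.getInputStream()) {
                byte[] buf = new byte[4];
                int n;
                // Read until the producer closes its end of the connection.
                while ((n = in.read(buf, 0, buf.length)) != -1) {
                    readSizes.add(n);
                }
            }
            producer.join();
        }
        return readSizes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Read sizes: " + run(5, 1000));
    }
}
```

The connection stays healthy throughout, yet the consumer may never see a full 4-byte read. The only things that are certain are that each read returns at least 1 byte and that the sizes add up to what the producer sent.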

BufferedInputStream provides a more convenient implementation of the read method. It repeatedly invokes the multibyte read method on the underlying stream until one of the following happens:

Length number of bytes are read
End of stream is reached
A subsequent call to read would block. This is detected by calling the available method on the underlying stream.

But note the third condition: BufferedInputStream still does not guarantee that it will return length bytes, even when end of stream has not been reached.
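A small sketch makes the role of available() visible. It wraps a made-up trickle stream (at most 2 bytes per underlying read) in a BufferedInputStream and asks for 4 bytes, once with available() reporting the real remaining count and once with it always reporting 0, as a socket with nothing pending would. The class, the trickle helper and the firstRead method are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadDemo {

    // Hypothetical source: at most 2 bytes per read call. When
    // 'reportAvailable' is false, available() always claims 0 bytes.
    static InputStream trickle(byte[] source, boolean reportAvailable) {
        return new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 2));
            }

            @Override
            public synchronized int available() {
                return reportAvailable ? super.available() : 0;
            }
        };
    }

    // Size of the first 4-byte read through a BufferedInputStream.
    static int firstRead(boolean reportAvailable) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8};
        InputStream in = new BufferedInputStream(trickle(source, reportAvailable));
        return in.read(new byte[4], 0, 4);
    }

    public static void main(String[] args) throws IOException {
        // More data is reported, so the buffered stream reads again: 4 bytes.
        System.out.println("available() > 0:  " + firstRead(true));
        // Nothing is reported, so it returns what it has: 2 bytes.
        System.out.println("available() == 0: " + firstRead(false));
    }
}
```

With a real socket, available() usually does report pending data when the sender is fast, which is why the wrapper hides the short reads most of the time and exposes them only occasionally.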

In my case our SocketInputStream was wrapped in a BufferedInputStream, which largely shielded us from this problem. In a way that made things worse, because we saw the problem only very rarely. We identified the issue only after a code review.

Now a few questions may come to mind: why call the multibyte read at all? Why not just keep calling the single-byte read and track how many bytes we have read? One of the grey-haired veterans at my company answered this question by saying:

When we used to do it in C, the multibyte read made sure that there were fewer stack pushes and pops, as we would end up making fewer read calls.

I haven't personally measured the performance gain from using the multibyte read, but I think it never hurts to use it if you understand how it works.

So when does the multibyte read call block, then? Since it is capable of returning fewer bytes than requested, it can always just return with less data. My guess is that it will only block when there are 0 bytes available to read, and the Javadoc agrees: the method blocks until input data is available, end of stream is detected, or an exception is thrown. In a way, the multibyte read call will never return 0 (unless length itself is 0), as it will always attempt to read at least one byte.
