My realization about synchronization

Recently I had to do a major refactoring of a component to improve its concurrency and its performance. The component is a persistence layer that can persist and query data. It was already thread safe, but its synchronization was coarse-grained: a query would block if the component was already executing another query. So despite being multithreaded, queries issued from multiple threads executed one after the other. What we wanted now was to execute queries simultaneously, regardless of how many run in parallel. Two parallel queries should block each other only at a much finer level (for example, the record level).

This component makes extensive use of Java Collections, and in most places (while querying) it needs to copy from these Collections. A single Collection can sometimes contain more than a million objects, so copying naturally takes a lot of time. The bad part is that you have to synchronize on the Collection for the whole time you are copying it, which stops any inserts and updates into the database (add and remove calls on the Collection) for that duration.

When access to an object needs to be synchronized, every access to it must be synchronized, and the same holds for every operation on a Collection. To make this simpler, you can use the synchronized versions of the Collections, obtained with the Collections.synchronizedXxx() methods (synchronizedCollection, synchronizedList, synchronizedMap, and so on). These wrappers spare you from explicitly synchronizing on every single access. But there is one thing to remember with synchronized Collections: wherever you obtain any sort of Iterator, you must synchronize on the Collection externally. Whether you iterate over the keySet() of a synchronized Map or over a synchronized Collection itself, you must hold the lock on the synchronized wrapper (for a Map, the map itself) for the whole iteration. (More on why later.)
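Below is a minimal sketch of the external synchronization that the Collections.synchronizedMap Javadoc asks for when traversing one of the map's collection views. The class and method names (SyncMapIteration, copyKeys) are illustrative and not taken from the component described here. Note that the lock is taken on the map itself, not on the keySet() view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SyncMapIteration {
    // Copies the keys of a synchronized map. Each single Map call is already
    // locked by the wrapper, but iterating keySet() spans many calls, so we
    // hold the wrapper's lock (the map object itself) for the whole traversal.
    static List<String> copyKeys(Map<String, Integer> syncMap) {
        List<String> keys = new ArrayList<>();
        synchronized (syncMap) { // lock the map, not the keySet() view
            for (String key : syncMap.keySet()) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> records = Collections.synchronizedMap(new HashMap<>());
        records.put("a", 1); // single calls need no external lock
        records.put("b", 2);
        System.out.println(copyKeys(records).size()); // prints 2
    }
}
```

Individual put() and get() calls are safe without the outer block; only the traversal needs it.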

So in our case, copying from a Collection actually means iterating over it and adding each item to another Collection. This iteration is synchronized, which blocks every other access to the underlying Collection. As a result, only one thread can iterate over a Collection at a time. (There can be multiple iterators over a Collection at the same time, but only if we can assume nobody modifies it.) To avoid all this blocking iteration, I came up with a rather stupid idea: use a synchronized List instead. The idea was to avoid the Iterator altogether, since a List supports index-based access. So,

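import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;

Collection<Object> src = Collections.synchronizedCollection(new ArrayList<>());
Collection<Object> dest = new ArrayList<>();
// The loop spans many calls, so hold the wrapper's lock for all of it
synchronized (src) {
    Iterator<Object> itr = src.iterator();
    while (itr.hasNext()) {
        dest.add(itr.next());
    }
}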


will be replaced with

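import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Object> src = Collections.synchronizedList(new ArrayList<>());
List<Object> dest = new ArrayList<>();
// Each size() and get() call locks on its own; the loop as a whole does not
for (int i = 0; i < src.size(); i++) {
    dest.add(src.get(i));
}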



With this approach, the synchronization is even finer. We don't lock the Collection for the whole time we spend iterating over it; we lock it only for the duration of each src.size() check and each src.get(i) call. This means individual List.get() calls block each other, instead of one whole iteration blocking another. For a while I considered this a fabulous idea. But as you may have noticed, we are breaking synchronization here, and this approach can get us into all sorts of problems.

For example:

Thread 1 : is iterating the List and checks i < src.size() with i = 9 and size 10, so it continues
Thread 2 : removes an item from the List (the size is now 9)
Thread 1 : calls src.get(9), which throws an IndexOutOfBoundsException.

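Because Thread 2 removed an object between Thread 1's size check and its get() call, those two calls were not atomic together, and we have broken synchronized access to the List. Even when no exception is thrown, a removal at a lower index shifts the remaining elements left, so Thread 1 silently skips an element.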
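The interleaving above can be replayed deterministically on a single thread, which makes the failure visible without relying on timing. This is a hypothetical illustration (InterleavingReplay is not from the original code); the statements marked "Thread 1" and "Thread 2" stand in for the two threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InterleavingReplay {
    // Each call on the synchronized list is atomic on its own, but the size
    // check and the get() are two separate lock acquisitions, so a remove
    // can land between them.
    static String replay() {
        List<Integer> src = Collections.synchronizedList(new ArrayList<>());
        for (int n = 0; n < 10; n++) {
            src.add(n);
        }
        int i = 9;
        boolean inBounds = i < src.size(); // "Thread 1": 9 < 10, so it proceeds
        src.remove(0);                     // "Thread 2": size drops to 9
        try {
            if (inBounds) {
                src.get(i);                // "Thread 1": index 9 no longer exists
            }
            return "no exception";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints IndexOutOfBoundsException
    }
}
```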

This led me to an obvious conclusion: iteration is one logical operation on the Collection, and it should block all add, remove, and get operations on that Collection.
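One way to keep the copy a single logical operation while shortening the time the lock is held is to replace the element-by-element loop with one bulk call on the synchronized wrapper. This is a sketch under assumptions: SnapshotCopy is a hypothetical name, and it relies on each method of a Collections.synchronizedList wrapper (including toArray) running under the wrapper's lock. With an ArrayList underneath, the JDK implements toArray as an array copy, so the lock is typically held for less time than an iterator loop would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SnapshotCopy {
    // The whole copy is one logical operation: toArray() on the synchronized
    // wrapper runs under the wrapper's lock, so no add/remove can interleave.
    static List<String> snapshot(List<String> syncList) {
        String[] items = syncList.toArray(new String[0]);
        return new ArrayList<>(Arrays.asList(items)); // private, unsynchronized copy
    }

    public static void main(String[] args) {
        List<String> src = Collections.synchronizedList(new ArrayList<>());
        src.add("r1");
        src.add("r2");
        List<String> dest = snapshot(src);
        src.add("r3");            // later writes do not affect the snapshot
        System.out.println(dest); // prints [r1, r2]
    }
}
```

Writers are still blocked while toArray runs, so this reduces the contention rather than removing it.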

Well, that's not all! This rule can be generalized to all objects. If an object is made fully synchronized internally, the same rule applies: any logical operation on it should block other logical operations (maybe not always, but most of the time). I ran into the same problem with a custom object that serves as a database index. I tried to synchronize at the level of individual index entries. The entries are stored in a Map, with each IndexEntry mapped by its key. This did not help, because between the moment I get an IndexEntry out of the Map (Map.get(key)) and the moment I operate on it, another thread can remove that entry from the Map entirely (Map.remove(key)). All operations by the original thread on that IndexEntry are then invalid. The same principle of logical operations applies here: each logical operation on the index, such as read, add, or update, should be synchronized as a whole.
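For the index problem, one hedged alternative to a single lock on the whole index is ConcurrentHashMap.compute, whose Javadoc states that the entire method invocation is performed atomically. If the lookup and the modification of the entry both happen inside that one call, a concurrent removal of the same key can no longer land between them. In the JDK implementation the lock covers one hash bin, which is roughly the entry-level granularity described above. IndexSketch, IndexEntry, and the record-id set are hypothetical stand-ins for the real index.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IndexSketch {
    // Hypothetical index entry: the ids of the records stored under one key.
    static final class IndexEntry {
        final Set<Long> recordIds = new HashSet<>();
    }

    private final ConcurrentHashMap<String, IndexEntry> entries = new ConcurrentHashMap<>();

    // "add": lookup, create-if-missing and mutation run inside one compute()
    // call, so a concurrent remove of the same key happens wholly before or after.
    void add(String key, long recordId) {
        entries.compute(key, (k, entry) -> {
            IndexEntry e = (entry == null) ? new IndexEntry() : entry;
            e.recordIds.add(recordId);
            return e;
        });
    }

    // "remove": returning null from the function removes the mapping,
    // so an emptied entry disappears in the same atomic step.
    void remove(String key, long recordId) {
        entries.computeIfPresent(key, (k, entry) -> {
            entry.recordIds.remove(recordId);
            return entry.recordIds.isEmpty() ? null : entry;
        });
    }

    // "read": copy the ids while the key is locked; never hand out the live set.
    Set<Long> lookup(String key) {
        Set<Long> copy = new HashSet<>();
        entries.computeIfPresent(key, (k, entry) -> {
            copy.addAll(entry.recordIds);
            return entry; // mapping left unchanged
        });
        return copy;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.add("k1", 1L);
        index.add("k1", 2L);
        index.remove("k1", 1L);
        System.out.println(index.lookup("k1")); // prints [2]
    }
}
```

The Javadoc also asks that these functions be short and not touch other mappings of the same map, which holds here.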

The other catch with synchronization is that anything that escapes the scope of an internally synchronized object needs to be synchronized externally, as we saw with the iterators of synchronized Collections. Even if the object is internally synchronized, returning a reference to part of it (an internal structure or object) takes that part outside the object's synchronization, and its users then need explicit external synchronization. Most of the time we can avoid this by making defensive copies of the internal structures before returning them, but sometimes this can hurt performance.
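A minimal sketch of a defensive copy, assuming a hypothetical RecordStore that owns a List. It uses a ReentrantReadWriteLock (a swap-in, not something from the original component) so that several readers can take snapshots at the same time, which matches the earlier observation that multiple iterators are fine as long as nobody modifies the Collection. Callers get a read-only copy, never the internal list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RecordStore {
    private final List<String> records = new ArrayList<>();
    // Readers (snapshots) may run in parallel; writers get exclusive access.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String record) {
        lock.writeLock().lock();
        try {
            records.add(record);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Never return `records` itself: the caller would be outside our lock.
    // Return a defensive copy taken under the read lock, wrapped read-only.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return Collections.unmodifiableList(new ArrayList<>(records));
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        store.add("r1");
        List<String> view = store.snapshot();
        store.add("r2");
        System.out.println(view + " " + store.snapshot()); // prints [r1] [r1, r2]
    }
}
```

The cost is the one the paragraph above mentions: every snapshot copies the whole list.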

Quite a long one! What do you guys say?