HashSet.contains(): does your busket contain something?

October 1, 2006

When you put some good into your supermarket basket, you really suppose it will stay there, don’t you? Well, we’re living inside materialistic world. All of us, except Java programmers.

Consider the following Entity class:

       public static class Entity { 
       private String _name; 
       private int _count; 
        public Entity(String name, int count) { 
            _name = name; 
            _count = count; 
        public void setName(String name) { 
            _name = name; 
        public boolean equals(Object o) { 
            if (this == o) return true; 
            if (o == null || getClass() != o.getClass()) return false; 
            final Entity entity = (Entity) o; 
            if (_count != entity._count) return false; 
            if (_name != null ? !_name.equals(entity._name) : entity._name != null) return false; 
            return true; 
        public int hashCode() { 
            int result; 
            result = (_name != null ? _name.hashCode() : 0); 
            result = 29 * result + _count; 
            return result; 

Let’s store our entity into various Java collections instances, both implementing Set interface:

        Set<Entity> hashSet = new HashSet<Entity>(); 
        Set<Entity> arraySet = new CopyOnWriteArraySet<Entity>(); 
        Entity e = new Entity("OldName", 5); 

Now, let’s see whenever the previously stored entity is still inside the collections:

        System.out.println(arraySet.contains(e));  // returns true 
        System.out.println(hashSet.contains(e));   // returns true

Now, a simple mutation, changing an entity name:


And, voila — now the entity is still in ArraySet, but not in TreeSet:

        System.out.println(arraySet.contains(e)); // returns true 
        System.out.println(hashSet.contains(e));  // returns false      

This looks like a quite stupid bug from the first time, but googling aroung brings an answer: TreeSet implementation uses hashCode() and not equals() for storing and retrieving entities, thus TreeSet requires hashCode() of contained entity to be immutable!

Presonally I think it’s too smart to be good, but, let’s say “the big brother knows beter”. The really annoying thing is that the request for proper documentation of this feature was initially issued for Java 1.3.1 at 16 May, 2001 and nothing was done since that.

Shame on you, Sun Microsystems…

6 Responses to “HashSet.contains(): does your busket contain something?”

  1. Rentar Says:

    Disclaimer: I’m doing professional Java development for some time now, so I’m probably biased.

    How else would you like it to compute a hash value? The only alternative I can think of would be to use some kind of identity-based hash code (such as produced by System.identityHashCode() or the default implementation of Object.hashCode()). But then you could never check wether the HashSet contains an object that equal()s another object you’ve got (or you could only check that by calling equal() on each object in the set, which kind of defeats the purpose of a HashSet).

    You’ve also got the same problem with the keys in a HashMap: they must not change or the behaviour is unspecified. But I’ve never found that to be a problem …

    How’s that solved in .NET?

    (I agree that the documentation could be a bit clearer, but as the Bugreport mentions the documentation is in Object.hashCode() and .equals() and clearly specified in their contract).

  2. Jonathan Pryor Says:

    “How’s that solved in .NET?”

    It isn’t. Or it’s “solved” by documenting the limitations:

    From: http://www.go-mono.com/docs/index.aspx?tlink=8@ecma%3a147%23Hashtable%2f

    “Objects used as keys in a Hashtable is required to either implement both System.Object.GetHashCode and System.Object.Equals(System.Object) or neither. Furthermore, for a particular key, these methods are required to produce the same results when called with the same parameters while that key exists in a particular Hashtable . Keys cannot be mutated while they are used in the table.”

    If you think about it, it’s really the only sane way to go (for precisely the reasons Rentar specifies): if your hash value changes when members change (as it should!), if you change the hash value of an existing key within a hashtable it will no longer be in the right “bucket” of the hashtable, or in the correct position for RBL trees, etc. Meaning that you can never find that entry again.

  3. TheMuuj Says:

    A single guideline will keep this from being a problem: Only override hashCode/equals (or GetHashCode/Equals in .NET) on immutable classes.

    If an object is mutable, then their identity needs to stay with the object instance, not with the data. In .NET, I tend to only override GetHashCode and Equals on structs and sealed classes.

    If you need a way to compare the values of two instances, you can always add a “Matches” method.

    If you need to look mutable objects up in a hashtable, consider a custom hashcode/equality provider (I assume this is possible in Java).

  4. […] 29th, 2007 My «HashSet.contains(): does your basket contain something?» post got too expected responses: «There is no way to avoid this behavior, why should you […]

  5. […] «HashSet.contains(): does your basket contain something?» post got too expected responses: «There is no way to avoid this behavior, why should you […]

  6. […] by extension, equals) on immutable values. More details regarding this can be found in blog posts HashSet.contains(): does your busket contain something? and Back to hashCode […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: