
I was debugging a legacy C# LINQ query that contained a Distinct operator, and the query behaved strangely. It took me a few minutes to realize that Distinct is a deferred operation: the query does not run until its result is enumerated. [1]
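To make the deferred behavior concrete, here is a minimal sketch. The class and method names are made up for illustration and are not from the legacy query. It shows that a change made to the source after calling Distinct still shows up in the result, unless the result is materialized first with ToList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
public static class DeferredDistinctDemo
{
    // Distinct() is called before the source changes, but enumerated after.
    public static int CountAfterLateAdd()
    {
        var numbers = new List<int> { 1, 2, 2, 3 };

        // No work happens here; Distinct only builds a query object.
        IEnumerable<int> distinct = numbers.Distinct();

        // The query has not run yet, so this change is still picked up.
        numbers.Add(4);

        return distinct.Count();   // enumerates now: 1, 2, 3, 4
    }

    // ToList() forces the query to run immediately and stores the result.
    public static int CountWithSnapshot()
    {
        var numbers = new List<int> { 1, 2, 2, 3 };

        List<int> snapshot = numbers.Distinct().ToList();

        // Too late to affect the snapshot.
        numbers.Add(4);

        return snapshot.Count;     // 1, 2, 3
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterLateAdd());   // prints 4
        Console.WriteLine(CountWithSnapshot());   // prints 3
    }
}
```

If a query seems to return results that do not match the data at the point where Distinct was called, checking where it is actually enumerated is a good first step.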

A more interesting and important question is what happens when we use Distinct without supplying an IEqualityComparer<T> comparer [1]. Let’s use an example to illustrate.

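// The overload of Enumerable.Distinct that takes no comparer [1]:
public static IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source
)

// Example that calls it on an array of Product objects:
void Main()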
{
   Product[] products = {            
           new Product { Name = "apple", Code = 9 }, 
           new Product { Name = "orange", Code = 4 }, 
           new Product { Name = "apple", Code = 9 }, 
           new Product { Name = "lemon", Code = 12 } };

   IEnumerable<Product> noduplicates = products.Distinct();
   
   foreach (var product in noduplicates)
      Console.WriteLine(product.Name + " " + product.Code 
         + ", hash code: " + product.GetHashCode());
}
public class Product 
{
   public string Name { get; set; }
   public int Code { get; set; }
}

The code produces output like the following (the hash code values vary from run to run):

apple 9, hash code: 4143056
orange 4, hash code: 52219803
apple 9, hash code: 1080906
lemon 12, hash code: 48640813

Why does “apple 9” appear twice? Why are the hash codes for “apple 9” different?

When no comparer is supplied, Distinct uses the default equality comparer, EqualityComparer<T>.Default [3]. Because our Product class neither implements IEquatable<T> nor overrides Equals and GetHashCode, the default comparer relies first on Object.GetHashCode [2] and then on Object.Equals [4] to decide whether two objects are equal.

For reference types, if GetHashCode is not overridden, hash codes are computed by the Object.GetHashCode method of the base class, which computes a hash code based on the object’s reference. In our case, the two “apple 9” Product instances are different objects with different references, so they get different hash codes, and “apple 9” appears twice.
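One common way to get value-based behavior is to give the class its own notion of equality. The sketch below is an assumption about how Product could be changed, not code from the original query: it implements IEquatable<Product> and overrides Equals and GetHashCode so that two products with the same Name and Code compare as equal. The hash formula (multiply by 397, then XOR) is just one conventional choice.

```csharp
using System;
using System.Linq;

// Modified version of Product with value-based equality (illustrative).
public class Product : IEquatable<Product>
{
    public string Name { get; set; }
    public int Code { get; set; }

    // Two products are equal when both Name and Code match.
    public bool Equals(Product other)
    {
        return other != null && Name == other.Name && Code == other.Code;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Product);
    }

    // Equal products must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int nameHash = Name != null ? Name.GetHashCode() : 0;
            return (nameHash * 397) ^ Code;
        }
    }
}

// Hypothetical demo class, for illustration only.
public static class DistinctByValueDemo
{
    public static int DistinctCount()
    {
        Product[] products = {
            new Product { Name = "apple", Code = 9 },
            new Product { Name = "orange", Code = 4 },
            new Product { Name = "apple", Code = 9 },
            new Product { Name = "lemon", Code = 12 } };

        // The default comparer now picks up IEquatable<Product>.
        return products.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctCount());   // prints 3
    }
}
```

With this change, the two “apple 9” instances have the same hash code and compare as equal, so Distinct keeps only one of them. When the class cannot be modified, passing a separate IEqualityComparer<Product> to the other Distinct overload is an alternative.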

If value types do not override GetHashCode, the ValueType.GetHashCode method of the base class uses reflection to compute the hash code based on the values of the type’s fields. In other words, value types whose fields have equal values have equal hash codes.
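As a rough illustration of the value-type case, the sketch below declares the product as a struct with no overrides. The names ProductValue and ValueTypeDistinctDemo are hypothetical. Because the inherited ValueType.Equals and ValueType.GetHashCode work from field values, Distinct should treat the two “apple 9” entries as the same item.

```csharp
using System;
using System.Linq;

// Hypothetical value-type counterpart of Product; nothing is overridden.
public struct ProductValue
{
    public string Name;
    public int Code;
}

// Hypothetical demo class, for illustration only.
public static class ValueTypeDistinctDemo
{
    public static int DistinctCount()
    {
        ProductValue[] products = {
            new ProductValue { Name = "apple", Code = 9 },
            new ProductValue { Name = "orange", Code = 4 },
            new ProductValue { Name = "apple", Code = 9 },
            new ProductValue { Name = "lemon", Code = 12 } };

        // Equality and hashing come from ValueType, based on field values.
        return products.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctCount());   // prints 3
    }
}
```

The inherited implementations are convenient but can be slow because of reflection, so structs used heavily in Distinct, dictionaries, or hash sets usually benefit from their own Equals and GetHashCode.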

References:

  1. Enumerable.Distinct
  2. Object.GetHashCode
  3. EqualityComparer.Default Property
  4. Object.Equals
  5. GetHashCode and LINQ