Word Swatter: GIST NOTES 7 - Java Generics and Collections

GIST NOTES 7 - Java Generics and Collections

[DISCLAIMER: This is solely for non-commercial use. I don't claim ownership of this content. This is a crux of all my readings studies and analysis. Some of them are excerpts from famous books on the subject. Some of them are my contemplation upon experiments with direct hand coded code samples using IDE or notepad.

I've created this mainly to reduce an entire book into few pages of critical content that we should never forget. Even after years, you don't need to read the entire book again to get back its philosophy. I hope these notes will help you to replay the entire book in your mind once again.]

[JDK 7]

> Object.equals() uses == operator to compare two objects. It just compares the two references refer to the same object or not

public boolean equals(Object obj) {
return (this == obj);
}

> trying to narrow the access level while overriding a method causes compilation error (be wary when overriding equals() method)

equals() contract
-----------------
Reflexive: x.equals(x) returns true
Symmetric: if x.equals(y) true, then y.equals(x) should be true
Transitive: if x.equals(y) and y.equals(z) are true, then x.equals(z) also true
Consistent: equals() method should be consistent always within a JVM for a given pair of objects no matter how many times equals() method is called

null equality: for any non-null x, x.equals(null) should be false

hashCode() compatibility: honor the hashCode() method while implementing equals()

hashcode
--------
public int hashCode() is a native method in Object class.

Hashcodes are typically used to increase the performance of large collections of data.

Hashcode need not be unique to a particular object.

If two objects are considered equal, then their hashCode also should be equal in order for java Collections framework to work consistently with your class objects. Java Collections framework uses hashcode of the object to distribute keys across different buckets. If hashcode is not consistent, your class objects may be lost forever in Collections classes(e.g. HashMap).

hashCode() contract
-------------------
>whenever hashCode() is invoked on the same object more than once during an execution of a java application it must return the same value unless otherwise the information used in equals() comaparison has changed; during another execution of the same application it can return a different value

>if two objects are equal according to equals() method, then hashCode() method should produce same integer for both objects

>if two objects are different according to equals(), then it is not required for hashCode() to return different hash codes; however producing different hash code for unequal objects would improve the performance of the hashtables

>if hash codes are different for two objects, then they cannot and should not be equal; equals() method should uphold this

>transient variables should not be used in hashCode() implementation; though compiler will not complain, it might cause problems while trying to locate your object in collections; since transient variables are not saved while serializing, the restored object may not be the same and might break hashCode() implementation; same goes for equals() method; say NO to transient variables while implementing these two methods

The core collections interfaces
-------------------------------
Collection
List
Queue

Set
Map

NavigableSet
NavigableMap
SortedSet
SortedMap

The core concrete Classes
-------------------------
Maps:: HashMap, Hashtable, TreeMap, LinkedHashMap
Sets:: HashSet, LinkedHashSet, TreeSet
Lists:: ArrayList, Vector, LinkedList
Queues:: PriorityQueue
Utilities:: Collections, Arrays

>not all collections in collection framework implement "Collection" interface
>none of the map related interfaces and classes extend from "Collection" interface, though they are thought of as collections

1.collection - represents general data structures
2.Collection - represents java.util.Collection interface (Set, List and Queue extend from this);
there are no direct implementation of Collection interface;
3.Collections - represents java.util.Collections concrete class which has loads of static utility methods

Class Hierarchy (refer the image)
---------------

I:Collection <- I:Set <- HashSet <- LinkedHashSet
I:Collection <- I:Set <- I:SortedSet <- I:NavigableSet <- TreeSet**

I:Collection <- I:List <- ArrayList/Vector/LinkedList*
I:Collection <- I:Queue <- PrioriotyQueue/LinkedList

I:Map <- Hashtable
I:Map <- HashMap <- LinkedHashMap
I:Map <- I:SortedMap <- I:NavigableMap <- TreeMap

[Interfaces start with 'I:' and parents come on the left hand side, children on the right]

* LinkedList is both a Queue as well as a List
** Tree things(which has tree in the name) are both navigable and sorted
*** Those that are unordered by nature(map and set), given Tree status

Collection Flavors
------------------
List - list of things
Set - bunch of unique things
Map - things with unique ID
Queue - things arranged in order

Sub-Flavors
------------
>A class can be 'unsorted and unordered'
>A class can be 'ordered but unsorted'
>A class can be 'ordered and sorted'
>A class can NEVER be 'sorted but unordered', because sorting is a specific type of ordering inherently

Sorting:: order in the collection is based on the natural order of elements, based on some rules known as sort order

HashSet - unordered and unsorted
LinkedHashSet - ordered but not sorted(ordered by addition)
Hashtable - unordered and unsorted (when iterating over, the order is random)
ArrayList - ordered but not sorted (ordered by position index, can insert anywhere)
LinkedHashSet - ordered but not sorted(ordered by addition)

List
====
>cares about index
>index related methods like get(int index) are available only in List based classes
>lists are ordered by index position
>can add elements at a particular index or at the end of the list

ArrayList
----------
>growable array
>gives fast iteration and fast random access
>ordered but not sorted
>implements RandomAccess interface(marker interface), hence ArrayList supports fast(generally constant time) random access
>choose this over LinkedList when you need fast iteration but aren't as likely to be doing a lot of insertion and deletion
>methods are not synchronized

Vector
------
>holdover from earliest days of java; Vector and Hashtable were the original collections (rest were added at 1.2 and 1.4)
>same as ArrayList but methods are synchronized for thread safety
>due to synchronized, Vector is slow in performance
>thread safety can be obtained from Collections utility methods as well instead of using Vector
>Vector is the only class other than ArrayList to implement RandomAccess

LinkedList
----------
>is ordered by index position(like ArrayList), except that elements are doubly-linked to one another
>methods to add/remove from beginning or end
>easy choice to implement stack or queue
>iterates more slowly than ArrayList
>but a good choice for fast insertion and deletion
>as of java 5, it was enhanced to implement Queue interface as well; hence supports peek(), poll(), offer() and such methods

Set
===
>cares about uniqueness
>doesn't allow duplicates
>bases itself on equals() method implementation

HashSet
-------
>unsorted and unordered set
>uses hashcode of the object being inserted; the more efficient hashCode() implementation, the better access performance
>use this when you want no duplicates and don't care about order when you iterate through the elements

LinkedHashSet
-------------
>an ordered version of HashSet
>maintains doubly-linked list across all elements
>if you care about iteration order, use this
>it lets you iterate through in the order in which elements were inserted(or added)

# when using HashSet or LinkedHashSet, you must override hashCode(), otherwise uniqueness of objects is not guaranteed

TreeSet
-------
>is one of the two sorted collections (the other being TreeMap)
>uses Red-Black tree structure(special binary search tree with minimum possible height)
>guarantees that the elements will be in ascending order according to natural order
>lets you specify your own ordering rules by using Comparable or Comparator
>as of java 6, TreeSet also implements NavigableSet

Map
===
>cares about unique identifiers for each value
>both key and value are objects
>relies on equals() to determine two keys are same or not

HashMap
-------
>unsorted unordered map
>access performance depends on the hashCode() implementation for your keys
>allows one null key and multiple null values

Hashtable
---------
>legacy class
>synchronized counter part to HashMap
>doesn't allow null keys or values(throws NullPointerException in such cases)

LinkedHashMap
-------------
>maintains insertion order(optionally access order, thru a flag to the constructor)
>slower than HashMap for adding removing elements
>faster iteration than HashMap

TreeMap
-------
>sorted Map and hence ordered in the natural order
>lets you define custom sort order through Comparable or Comparator interfaces
>as of java 6, implements NavigableMap

Queue
=====
>list of TO-DOs
>typically thought of as FIFO(first-in first-out)
>supports standard Collection methods + methods to add/subtract/review queue elements

PriorityQueue
-------------
>new with Java 5
>basic queues can be implemented with LinkedList
>purpose of PriorityQueue is to create "priority-in priority-out" queue as opposed to "first-in first-out)
>elements are ordered according to natural order or according to the Comparator set
>elements that are sorted first are accessed first; ordering represents the relative priority between elements

ArrayList over arrays
---------------------
>ArrayList can grow dynamically
>it provides more powerful insertion and search mechanisms than arrays
>if there are duplicate elements in ArrayList, and when you try to remove an element, it removes the first encountered element that satisfies equals() method
>be vary of remove(Object o) and remove(int index) overloaded methods getting confused when you use autoboxing for an array list of Integer type (ArrayList); e.g. if you try to remove element 54 with remove(54), it tries to remove element at index 54 rather than an Integer object with value 54; if you are lucky you will get index out of bounds exception if index 54 is not present; remove(new Integer(54)) works fine

>when you override equals() method you always take 'Object type' argument because the method is inherited from Object class
>but when you implement compareTo() method from Comparable interface, you should take 'argument of the type of the object you are comparing' because Comparable interface supports generics and it lets you specify the type of the thing being compared before hand

equals(Object o){}
compareTo(MyDvd dvd){}

>to sort a class objects in multiple different ways without getting stuck with one implementation of Comparable interface, go for Comparator where the rule of comparing is stored externally in your own Comparator class and you can make any number of different Comparator classes and use them wherever appropriate

>you can even sort classes which you can't modify using this external Comparator

Comparable
----------
>java.lang.Comparable
>int obj1.compareTo(obj2)
>returns negative int when obj2 is greater, positive int when obj1 is greater, zero when both are equal
>you must modify the class whose instances you want to sort
>only one sort sequence can be done
>implemented frequently by APIs such as String, Wrapper classes, Date, Calendar, etc

Comparator
----------
>java.util.Comparator
>int compare(obj1, obj2)
>return value is same as Comparable
>you build a class separate from the class whose instances you want to sort
>many sort sequences can be created
>meant to be implemented by third party classes

*both Comparable and Comparator are interfaces; they are appropriately put in different packages based on their purpose

# Arrays.sort() provides sorting capability for arrays in the same way Collections.sort() provides

Sorting arrays
--------------
>Arrays.sort(arrayToSort[])
>Arrays.sort(arrayToSort[], Comparator)
>loads of sort() overloaded methods to sort primitive data arrays in their natural order without the use of Comparator

#both Arrays.sort() and Collections.sort() are static methods; both alter the same object they are sorting and do not return result as a separate object;

#if two objects that are not comparable present in the array or collection, the sort will fail

Searching arrays and collections
--------------------------------
>searches are performed using the binarySearch() method
>successful search returns the int index of the element searched
>unsuccessful search returns insertion point(index) as return value; insertion point is where the searched element can be added without affecting the sorted order of other elements
>0 and positive return values indicate successful search
>negative return value indicates insertion point on unsuccessful search
>since 0 is not available for negative range, -1 indicates 0th index, -2 indicates 1st index and so on; so insertion point is calculated by [-n -1] where n is the signed insertion index returned.

>the collection/array being searched must be sorted before you can search it
>attempting to search an unsorted array or collection will result in unpredictable results(not compiler error :))
>the collection/array sorted in natural order can only be searched in natural order using binarySearch() method; hence no Comparator is allowed
>comparators cannot be used while searching arrays of primitives(nobody wants your own way of arranging integers, strings, etc)
>the same comparator used to sort has to be used while searching; otherwise the consequences are dire
>even trying to search an array or collection that has been reverse-sorted(descending) will return incorrect result

SEARCH RULE--> SORT : SAME_COMPARATOR : ASCENDING

Collection to Arrays
--------------------
List and Set have toArray() methods to convert the collection into an array(returns a copy of the internal array)

Arrays to Collection
--------------------
Arrays.asList() copies an array into a List; this list is backed by the original array; changes to the returned list write through to the array; that is changing either the array or the list, updates the other automatically; array and list both become joined at the hip;(makes the given array as the internal array for the new list)

> list.toArray() can return a new array or you can pass your own destination array list.toArray(dest[])

Lists and Iterators
--------------------
list.iterator() returns an Iterator object;

Iterator methods: boolean hasNext() and Object next()

Sets
----
>no duplicates are allowed
>adding duplicate object, add() method returns false without adding the object to the set
>HashSet is fast because it uses hashcodes
>TreeSet keeps the unique objects sorted; adding uncomparable objects to a TreeSet throws ClassCastException; e.g. adding String to an integer TreeSet

Maps
-----
>Class of objects used as keys in map, should override equals() and hashCode() methods for the map to work properly at runtime
>enums can be used as keys in maps; because enum overrides equals() and hashCode() methods
>if the object that is used as key is modified after using it in the map, then retrieval of the corresponding value for that key will fail because the attributes used by equals() method could have been modified in that object

Navigating TreeSet and TreeMap
------------------------------
>Since TreeSet and TreeMap are sorted in nature, Java 6 added NavigableMap and NavigableSet interfaces to their behavior in order to make arbitrary searching or navigation a lot easier than before
>before Java 6, headSet(), tailSet(), first() and last() methods were used to navigate TreeSet and methods like headMap(), tailMap(), firstKey() and lastKey() were used to navigate TreeMap
>now, more modern and easy to use methods are provided for navigation by NavigableSet and NavigableMap interfaces

Methods from SortedSet and SortedMap interfaces
------------------------------------------------
headSet(item)/headMap(item) -> returns the portion of the set/map which is strictly less than 'item'; the sub set/map is tied to the original one; modifying one will update the other automatically; item is key in case of Map

tailSet(item)/tailMap(item) -> returns the portion strictly higher than 'item'

first()/firstKey() -> returns the first element(lowest) in the current set/map

last()/lastKey() -> returns the last element(highest) in the current set/map

subSet(fromVal,toVal)/subMap(fromVal,toVal) -> returns the portion of the set/map from value 'fromVal' (inclusive) to value 'toVal' (exclusive)

Methods from NavigableSet and NavigableMap interfaces
-----------------------------------------------------
From NavigableSet:-

TreeSet.ceiling(e) - returns lowest possible element >= e (nothing but returning the next value to e or e itself)
TreeSet.higher(e) - returns lowest possible element > e (returns next value to e)
TreeSet.floor(e) - returns highest possible element <= e (returns previous value to e or e itself)
TreeSet.lower(e) - returns highest possible element < e (returns previous value to e)
TreeSet.pollFirst() - removes and returns the first element
TreeSet.pollLast() - removes and returns the last element
TreeSet.descendingSet() - returns a NavigableSet in reverse order

From NavigableMap:-

TreeMap.ceilingKey(k) - returns lowest possible key >= k (same as TreeSet)
TreeMap.higherKey(k) - returns lowest possible key > k (-do-)
TreeMap.floorKey(k) - returns highest possible key <= k
TreeMap.lowerKey(k) - returns highest possible key < k
TreeMap.pollFirstEntry() - returns first key-value pair
TreeMap.pollLastEntry() - returns last key-value pair
TreeMap.descendingMap() - returns a NavigableMap in reverse order

* all the subsets or submaps returned in above methods are backed by the original set or map; when you add a new element to the original set/map, the subset/submap also gets updated as well(as long as the new element satisfies the condition to be part of the submap), not just the one way

* such behavior is called "backed collections"

* any attempt to add out of range element to a subset or submap, it throws exception

Inclusivity of subsets or submaps
---------------------------------
headSet(element, boolean isInclusive)
headSet(element) -----behaves like-----> headSet(element, false)

tailSet(element, boolean isInclusive)
tailSet(element) ------behaves like------> tailSet(element, true)

subSet(fromEle, boolean isFromInclusive, toEle, boolean isToInclusive)
subSet(fromEle, toEle) ------behaves like------> subSet(fromEle, true, toEle, false)

*overloaded methods with isInclusive boolean argument are from NavigableSet interface; their normal counter parts are from SortedSet
*methods without isInclusive bool argument include starting point element as a rule
*the above inclusivity discussion applies to headMap(), tailMap() and subMap() methods from TreeMap in the similar manner
*methods without bool argument return SortedSet or SortedMap(old interfaces - before Java 6)
*methods with bool argument return NavigableSet or NavigableMap(interfaces that are new to Java 6)
*pollFirst/pollLast/pollFirstEntry and such polling methods are available only in Navigable[Set/Map]
*while creating sub sets or maps(using methods with bool arg), be sure to assign them to Navigable[Set/Map] variable; can't be assigned to Sorted[Set/Map] variable or Tree[Set/Map] variables; using methods without bool arg returns Sorted[Set/Map] and they don't have polling methods; so if you want to do some polling, you have to use methods with bool arg only
*polling either on parent set/map or sub set/map removes that element from both parent as well as the sub set/map if that element is present in both; so polling on sub set always affects the parent, but polling on parent may not affect the sub set because that element may not present in the sub set/map so no updates there

PriorityQueue
-------------
>orders elements using a user-defined priority
>can be a natural ordering (1 is higher priority than 2)
>priority can be ordered through a Comparator also that lets you define your own ordering
>Queue specific methods are peek(), poll() and offer()
>PriorityQueue uses natural ordering or a given Comparator to prioritize

peek() - returns top element without removing it from the queue
poll() - returns top element and removes it from the queue
offer() - puts the element into the queue according to its priority; top priority item goes to the top of the queue, least priority item goes to the end of the queue

java.util.Arrays
----------------
static List asList(T[])

static int binarySearch(Object[], key) - search or detect insertion point
static int binarySearch(primitive[], key)
static int binarySearch(T[], key, Comparator)

static boolean equals(Object[], Object[])
static boolean equals(primitive[], primitive[])

public static void sort(Object[])
public static void sort(primitive[])
public static void sort(T[], Comparator)

public static String toString(Object[])
public static String toString(primitive[])

java.util.Collections
---------------------
static int binarySearch(List, key)
static int binarySearch(List, key, Comparator)

static void reverse(List)
static Comparator reverseOrder() - reverse comparator for all Comparable objects
static Comparator reverseOrder(Comparator) - reverse comparator for the given Comparator

static void sort(List)
static void sort(List, Comparator)

List, Set and Map methods
-------------------------
boolean add(element) -> List/Set
boolean add(index, element) -> List

boolean contains(Object) -> List/Set
boolean containsKey(Object key) -> Map
boolean containsValue(Object value) -> Map

Object get(index) -> List
Object get(key) -> Map

int indexOf(Object) -> List

Iterator iterator() -> List/Set

Set keySet() -> Map

put(key,value) -> Map

remove(index) -> List
remove(Object) -> List/Set
remove(key) -> Map

int size() -> List/Set/Map

Object[] toArray() -> List/Set
T[] toArray(T[]) -> List/Set

String natural order
--------------------
>space comes before letters
>uppercase letters come before lowercase letters
>e.g. " f", "FF", "f ", "ff" are alphabetically sorted

Iterator/Enumeration/ConcurrentModificationException
----------------------------------------------------
Iterator - provided by all collections; helps to navigate the collection; Iterator also provides method to remove an element; more modern than Enumeration; throws ConcurrentModificationException when somebody else directly modified the collection while we are trying to modify the same collection through Iterator; has shorter method names than Enumeration[hasNext(), next() and remove()]

Enumeration - Some of the collections provide Enumeration to iterate over the collection; older version; does not provide facility to remove elements from the collection; since it doesn't modify the collection underneath, there is no need for ConcurrentModificationException

ConcurrentModificationException - it is a failfast feature on a best effort basis; rather than continuing to function under erraneous conditions it is better to fail quickly; though this behavior is not guaranteed, but provided on best effort basis; never rely on this behavior for decision making in your program or for correctness of the logic.

Hashtable provides both Enumerations and Iterators on certain views of the map; only Iterators throw ConcurrentModificationException for obvious reasons as stated above.

HashMap/Capacity/Load/Rehashing
-------------------------------
Initial Capacity - can be specified in the constructor; denotes no.of buckets used for distributing the keys; specifying large initial capacity for maps that would hold large no.of items, is beneficial because auto reallocation and rehashing is avoided and hence good performance;

Load Factor - can be specified in the constructor; denotes how much of current capacity has to be reached before reallocation is initiated; high load value, makes searching slow; low load value doesn't utilize the space but faster performance; trade off between time and space; default value 0.75;

Rehashing - whenever capacity of the map is increased, all the keys that were distributed across buckets have to be redistributed once again across the newer no.of buckets; quite performance intensive

*Values of Initial Capacity and Load Factor are just indicators; otherwise their actual behavior is implementation dependent

Big O Notation or order of complexity or computation time
---------------------------------------------------------
Big O notation - describes a simplified maximum possible value of a function in terms of a factor variable the function depends on; this simplified function tells about what level of scale of the function value was caused by that factor variable

Let n be the factor variable of a function say time taken for performing an algorithm.

>Constant time O(1) - takes constant time to execute the algorithm for any n
>Linear time - O(n)
>Logarithmic - O(log n)
>Polynomial or algebraic - O(n^c) where c > 1
>Exponential - O(c^n) where c > 1

Generic Types
==============
*arrays had always been type safe, but collections had not
*a non-generic collection can hold any object or mix of different varieties(types) of objects together
*things in generic syntax angle bracket are called "parameterized type" or "type parameter"
*lets say the generic version of variables as 'type safe' variables
*method input arguments can be declared as type safe
*method return value can be declared as type safe
*compiler will stop you from using any container(object) other than the one that holds the declared type (illegal attempts especially inside those, those whose signature boasts the generic type safety)

e.g. List getDogList(){}
void process(ArrayList nameList){}

*generics way of allowing anything:

List list = new ArrayList();

[is identical to]

List

Word Swatter

Friday, May 25, 2012

GIST NOTES 7 - Java Generics and Collections

2 comments: