
With a HashMap you get the convenience of using the student ID numbers as indices, while keeping the storage requirements reasonable. The idea is that the indices that the user provides are transformed/mapped internally by the data structure to indices that fit within the desired capacity.
Consider a HashMap with capacity m = 2700. Here is one way to ensure that student IDs map to valid cells in the map:
HashMap<Integer, Student> map(2700); // create a map of capacity 2700
//      Key       Value
map.put(76544632, mickey); // internally mickey is stored at index 76544632 % 2700 = 2332
map.put(67587658, minnie); // internally minnie is stored at index 67587658 % 2700 = 1258
map.put(14742300, donald); // internally donald is stored at index 14742300 % 2700 = 300
map.put(87648487, calvin); // internally calvin is stored at index 87648487 % 2700 = 1087
...
map.put(14742300, garfield); // donald is replaced by garfield (same key 14742300, so same internal index 300)
Essentially, each key, k, supplied by the user is transformed to an integer hash code, h, which is then mapped to a bucket index, i, into the HashMap as i = h % capacity. This ensures that the key given by the user maps to a valid index internally in the HashMap data structure. (This is only the simplest choice of hash function/mapping; there are many other possibilities.)
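The key-to-index mapping described above can be sketched in Java as follows. This is only an illustration, not the internals of any particular library: the class BucketIndexDemo and the method bucketIndex are hypothetical names, and Math.floorMod is used instead of % only so that a negative hash code would still land in the range 0..capacity-1.

```java
// Hypothetical helper illustrating i = h % capacity; not a library API.
class BucketIndexDemo {
    // Maps a key to a bucket index in the range [0, capacity).
    static int bucketIndex(Integer key, int capacity) {
        int h = key.hashCode();             // for an Integer, the hash code is the int value itself
        return Math.floorMod(h, capacity);  // same result as h % capacity when h >= 0
    }

    public static void main(String[] args) {
        int capacity = 2700;
        int[] ids = {76544632, 67587658, 14742300, 87648487};
        for (int id : ids) {
            // Prints each student ID next to the internal cell it maps to.
            System.out.println(id + " -> " + bucketIndex(id, capacity));
        }
    }
}
```

For the four IDs used above, the computed indices are 2332, 1258, 300, and 1087, matching the comments in the example.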
Note that from the user's point of view mickey is stored at index 76544632, minnie is stored at index 67587658, etc. However, internally the map only has 2700 cells, so mickey, minnie, etc. are stored at completely different locations.
Unfortunately, it is possible for the hash function to map two different keys to the same cell within the map, in which case we say that a collision occurs. For example:
//      Key       Value
map.put(76544632, mickey); // internally mickey is stored at index 76544632 % 2700 = 2332
map.put(67587658, minnie); // internally minnie is stored at index 67587658 % 2700 = 1258
...
map.put(34049332, jerry); // internally jerry is stored at index 34049332 % 2700 = 2332 <- same as mickey
map.put(22571632, tom); // internally tom is stored at index 22571632 % 2700 = 2332 <- same as mickey, jerry
In the example above mickey, jerry, and tom have different keys/indices from the user's point of view, but are forced to share cell 2332 inside the hash map.
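One way to see the collision concretely is to group the keys by the cell they map to. The sketch below assumes the same i = h % capacity mapping with capacity 2700; CollisionDemo and groupByBucket are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: groups keys by bucket index to reveal collisions.
class CollisionDemo {
    // Returns a map from bucket index to the list of keys that land in that bucket.
    static Map<Integer, List<Integer>> groupByBucket(int[] keys, int capacity) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>();
        for (int key : keys) {
            int i = Math.floorMod(key, capacity);
            buckets.computeIfAbsent(i, unused -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {76544632, 67587658, 34049332, 22571632};
        // Bucket 1258 receives one key; bucket 2332 receives three keys (a collision).
        System.out.println(groupByBucket(keys, 2700));
    }
}
```

Any bucket whose list holds more than one key is a collision; a repeated key would appear twice in the same list, which is the replacement case rather than a collision.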
Note that the following is not a collision:
map.put(67587658, donald); // used exactly the same key (67587658) as minnie
// so donald replaces minnie at index 67587658
Here are possible ways to handle collisions.
One option is to let each cell hold a linked list of entries (commonly called chaining), so that mickey, jerry, and tom will all be stored in the same cell:
2330 |*-|-->//
2331 |*-|-->//
2332 |*-|-->|76544632:mickey|-->|34049332:jerry|-->|22571632:tom|-->//
2333 |*-|-->//
2334 |*-|-->//
Note that we need to keep a whole entry with both the original key and the corresponding value, i.e. |76544632:mickey|, |34049332:jerry|, etc. The user of the hash map has no knowledge of index 2332 -- this is internal to the hash map.
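The linked-list layout above (often called separate chaining) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the implementation of any real HashMap: the class ChainedMap, its methods, and the restriction to int keys are all hypothetical choices made to keep the example short.

```java
import java.util.LinkedList;

// A minimal sketch of a chained hash map with int keys; all names are hypothetical.
class ChainedMap<V> {
    // Each entry keeps the original key together with its value.
    private static final class Entry<V> {
        final int key;
        V value;
        Entry(int key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<V>>[] cells;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        cells = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            cells[i] = new LinkedList<>();
        }
    }

    // Internal cell index for a key: i = h % capacity.
    private int indexOf(int key) {
        return Math.floorMod(key, cells.length);
    }

    // Replaces the value if the key is already present; otherwise appends to the chain.
    void put(int key, V value) {
        for (Entry<V> e : cells[indexOf(key)]) {
            if (e.key == key) { e.value = value; return; }
        }
        cells[indexOf(key)].add(new Entry<>(key, value));
    }

    // Returns the value stored for the key, or null if the key is absent.
    V get(int key) {
        for (Entry<V> e : cells[indexOf(key)]) {
            if (e.key == key) return e.value;
        }
        return null;
    }

    // Number of entries sharing the cell that this key maps to (for illustration).
    int chainLength(int key) {
        return cells[indexOf(key)].size();
    }
}
```

Storing the key inside each entry is what lets get tell mickey, jerry, and tom apart even though they share cell 2332, and it is also what lets put distinguish a replacement (same key) from a collision (different key, same cell).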
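Another option is to store each entry directly in a cell: if cell h is already occupied, try the cells i = h, h+1, h+2, ..., h+capacity-1 in turn (with % capacity for wrap-around) and use the first free one:
2330 | -empty- |
2331 | -empty- |
2332 | 76544632:mickey |
2333 | 34049332:jerry |
2334 | 22571632:tom |
2335 | -empty- |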
Each unused cell holds a special marker, EMPTY, to indicate that the cell is available.
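The probing sequence above (often called linear probing) can be sketched in Java as follows. This is a minimal illustration, not a definitive implementation: the class ProbingMap and its methods are hypothetical, null plays the role of the EMPTY marker, keys are restricted to int, and removal is deliberately left out because it needs extra care with this scheme.

```java
// A minimal sketch of linear probing with int keys; all names are hypothetical.
class ProbingMap<V> {
    private final Integer[] keys;   // null plays the role of the EMPTY marker
    private final Object[] values;

    ProbingMap(int capacity) {
        keys = new Integer[capacity];
        values = new Object[capacity];
    }

    // Returns the cell holding the key, or the first empty cell on its probe path,
    // or -1 if every cell was probed without finding either.
    private int findCell(int key) {
        int h = Math.floorMod(key, keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (h + step) % keys.length;   // wrap around at the end of the array
            if (keys[i] == null || keys[i] == key) return i;
        }
        return -1;
    }

    // Stores the entry in the key's own cell, or in the first free cell after it.
    void put(int key, V value) {
        int i = findCell(key);
        if (i < 0) throw new IllegalStateException("map is full");
        keys[i] = key;
        values[i] = value;
    }

    // Returns the value stored for the key, or null if the key is absent.
    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = findCell(key);
        return (i < 0 || keys[i] == null) ? null : (V) values[i];
    }

    // Exposes the internal cell of a key (for illustration only), or -1 if absent.
    int cellOf(int key) {
        int i = findCell(key);
        return (i < 0 || keys[i] == null) ? -1 : i;
    }
}
```

With capacity 2700, inserting mickey, jerry, and tom in that order places them in cells 2332, 2333, and 2334, as in the table above.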