Programming Algorithm Hash Table | Indexing | Hashing Algorithm | Python...

Hash Table | Indexing | Hashing Algorithm | Python Implementation

-

This article will talk about a high-level view of the Hash Table. As a programmer, this technique should always keep at the top of our mind when solving any difficult problem. At a high level, a hash table is a key-value lookup. Essentially this is a data structure consisting of a key with associating value that allows very fast retrieval of data. Suppose we have some dataset that associates with some names. In this situation, the hash table would be a perfect solution. Because we can just map a dataset with someone’s name and then say give me the data associating with this specific name, and immediately it returns the information.

Hash Table Concept | Hash Function – Indexing

Hash table is a convenient way to store data no matter how big the data is. That’s why it is widely used in database indexing, caching, program compilation, error checking, and much more. Information in such a table has two basic components, the key, and the value. For example. a piece of information in this table can be people’s phone numbers. Here the name of a people is the key and other info including phone number is the value. In a very simple case, we can consider the hash table is an implementation of an associative array.

The way hash table works that it takes a key as an input and runs it through a hash function. The hash function is just a mapping key to some number. The Keyes are input strings and the numbers correspond to indices of the array.

Hash Table | Indexing | Hashing Algorithm| Python Implementation

How the Table Works | Hash Function | Hashing

consider a simple one-dimensional array variable. To find an item (key) in that list (array) we can employ a brute force approach such as linear search. In this case, if we know the index number of that item in the array, we will be able to retrieve that item very quickly. So if we can calculate the index number using the item itself then the index number will be somehow related to data.

let an item in the array is ‘Name-1.’ We can take the ASCII value of each character and add up them together. Now divide the sum by the total number of items in the array. The remainder of the division will be the index number for the item, ‘Name-1’ in the array.

So, the Index Number = (Sum of ASCIIs) mod array_length

Similarly to retrieve an item (key) such as ‘Name-1,’ we will follow the same procedure to calculate index number and then jump to that index number to grab the value very quickly.

The above method of indexing of Hash-Function is called the ‘Division Method.’ This method has its own restriction and limitation. So another more effective method for Hash-Function or Hashing is Multiplication Method.’ In this method, we take a key and multiply it by an integer a, and then mod the multiple by $2^{w}$, finally Shift it right >> (w-r). So $Hash (key) = (a.k) mod (2^{w}) >> (w-r)$. w – here is a w-bit machine or word size of the machine is w-bit, and a is a random integer. Remember that for a w-bit machine, the length of our key k is also w-bit.

There are also many other Hashing methods. Look for web references for the details on Hashing Methods.

Storing key-value Pair

A hash table generally stores a key-value pair. For example, a name would be just a key by which the hash function generates an index number and the phone number will be its corresponding value. This is why a hash table of key-value pairs is sometimes also referred to as Hashmap. So if we take an object-oriented approach then each person would be an instance of a class, and the key, in this case, would be any of its many different properties. Hence by the populating array with objects, we can effectively store as much related data as we want for each key.

Collision | Open Addressing and Chaining

A collision is an incident that occurs when the hash function generates the same index number for two different input keys.

For a single one dimensional array, if the hash function generates the memory address for key-2 where the value of key-1 is, then we search the array for the closest following free location and put the key-2 (the item) there. This way resolving collision by placing an item somewhere other than it’s calculated address is called open addressing. Because every location here is open to any item.

Linear Probing

Open addressing can use a variety of techniques to decide where to place an item that doesn’t go where it should. So this particular open addressing technique above is the linear probing. If linear probing gets to the ends of the array and still can’t find a free space then it might cycle around to the beginning of the array and continue searching from there.

Hash Table | Hash Function | Collision | Linear Probing | CS School
Open Addressing Technique: Linear Probing

To look up an item in hash table hash function takes a key and generates an index number. If there is a collision or some item is not in a calculated position then finding that item also involves linear probing. Which is in this case is called linear search.

The more item in the hash table there is more possibilities of collision when inserting data. One way to deal with it is to make the hash table bigger than the total amount of data. The ratio between number of items and the size of the array is the load factor. So $Load_Factor = \frac{num_of_itens}{size_of_array}$. If the hash table is a resizable dynamic array then it will increase its size automatically according to a certain threshold of load factor. As long as load factor is reasonably low, open addressing with linear probing will work reasonably well.

Chaining

Another way to deal with a collision is chaining, sometimes also refers to closed addressing. In this case, we can resolve collisions using a linked list.

Let the hash function generates an address location 04 for key1. So this case, the address location 04 holds a pointer that points to the first node of a linked list which contains the item1 for key1. Now if another key ends up in the same address location 04 then we put that item to the second node of the LinkedList and the first-node points to that second-node and so on.

To search an item in such a table the hash function generates the index number for the given key and follows the standard Linked List traversal for that exact item. With the chaining method, there is a greater proportion of items in the correct place. So the lookup is quicker than linear probing.

Hash Table | Indexing | Collision | Chaining | CS School
Avoiding Collision: Chaining

Although traversing a LinkedList also comes at some cost. So if the load factor is low then it may be efficient to use open addressing.

Primary Clustering and Rehashing

When resolving a collision when a calculated memory location is occupied linear probing involves trying the next location and the next until an empty location is eventually found. But this can result in what is known as Primary Clustering. This means keys might bunch together inside an array while a large proportion of it might unoccupied.

The alternative to linear probing that can help to avoid clustering is Rehashing. Rather than simply scanning for next available location conflict resolution may involve looking at every $3^{rd}$ location until it finds a free space. This technique is called +3 rehashing. Double Hashing on the other hand applies a second hash function to the key when a collision occurs. The result of the second hash function gives the number of positions along from the point of the original collision to try next.

Python Implementation

HashTable or Hashmap in Python is a built-in Data Structure called Dictionary. It is an Abstract Data Type (ADT). It can store items, insert items, delete items, and search for items. Each item in a Dictionary has a unique key. If we search a key in the dictionary and it will return the item corresponding to that key. In Python Dictionary, inserting an item with any existing key will completely overwrite the item.

Dictionary is a dict data type object in python. An item in a dictionary is a key-value pair. We can perform a search using my_dic(key) function. The procedure of insertion is my_dic(key) = value. Also, Del my_dic(key) will perform a delete operation on my_dic dictionary.

There are two ways to create a dictionary in python. We create a dictionary from scratch using curly braces or we can use python’s built-in dict() function.

From Scratch using curly braces

my_dic = {"key-1":111, "key-2":"value-2", "key-3":"value-3"}

here every key-value pair in my_dic separated by a comma.

Using dict() function

Python also provide dict constructor to create a dictionary.

my_dic2 = dict(key1=999, key2="value2")

To add new data in my_dic2 we should use following syntax:

my_dic2["key3"] = 5509
my_dic2["key4"] = "value4"

A python dictionary can contain another dictionary as well, which is a concept of the Nested-Dictionary. Look for web references on python dictionaries. Check here or you can read our article on python dictionary as well. We are here just giving a proper intuition over fundamentals.

Latest Articles

Dictionaries | HashMap in Python | Working with Key-Values

Dictionaries in Python is similar to Hashmap comparing to other languages. It stores data as a key-value...

Hash Table | Indexing | Hashing Algorithm | Python Implementation

This article will talk about a high-level view of the Hash Table. As a programmer, this technique...

Eigenvector Eigenvalue | Linear Algebra Fundamentals

Eigenvector ($bar{v}$) in linear algebra is a non-zero vector (matrix) that doesn't change its direction during linear...

Pivot Table | Microsoft Excel | Create Data Insight Easily

Pivot table in microsoft Excel is an useful function that gives us a way to create insight...

Macro Function in Microsoft Excel | Automate Repetitive Task

This article we will talk about the Macro. It is a function in microsoft excel which basically...

SVD | Singular Value Decomposition | Machine Learning Fundamentals

Singular Value Decomposition or SVD is the general purpose useful tool in Numerical Linear Algebra for data...

Must read

Dictionaries | HashMap in Python | Working with Key-Values

Dictionaries in Python is similar to Hashmap...

Eigenvector Eigenvalue | Linear Algebra Fundamentals

Eigenvector ($bar{v}$) in linear algebra is a...

You might also likeRELATED
Recommended to you