Wednesday, 29 November 2017

Hashing In Data Structure

Hashing involves less key comparison and searching can be performed in constant time. Suppose we have keys which are in the range 0 to n-1 and all of them are unique. We can take an array of size n and store the records in that array based on the condition that key and array index are same. The searching time required is directly proportional to the number of records in the file. We assume a function f and if this function is applied on key K it returns i, an index so that i=f(K).then entry in the access table gives the location of record with key value K.

Hash Functions:
The main idea behind any hash function is to find a one to one correspondence between a key value and index in the hash table where the key value can be placed. There are two principal criteria deciding a hash function H: K->I are as follows:
a. The function H should be very easy and quick to compute.
b. The function H should achieve an even distribution of keys that actually occur across the range of indices.
Some of the commonly used hash functions applied in various applications are:

1. Division Method:
It is obtained by using the modulo operator. First convert the key to an integer then divide it by the size of the index range and take the remainder as the result. H(k)=k%i if indices start from 0, H(k)=(k%i)+1 if indices start from 1

2. Mid –Square Method:
The hash function H is defined by H(k)=x where x is obtained by selecting an appropriate number of bits or digits from the middle of the square of the key value k.it is criticized as it is time consuming.

3. Folding Method:
The key is partitioned into a number of parts and then the parts are added together. There are many variation in this method one is fold shifting method where the even number parts are each reversed before the addition. Another is the fold boundary method here the two boundary parts are reversed and then are added with all other parts.

Collision Resolution Techniques:
If for a given set of key values the hash functions does not distribute then uniformly over the hash table, some entries are there which are empty and in some entries more than one key value are to be stored. Allotment of more than one key values in one location in the hash table is called collision.

1. Closed Hashing:
The simplest method to resolve a collision is closed hashing. Here the hash table is considered as circular so that when the last location is reached the search proceeds to the first location of the table. That is why this is called closed hashing.
The search will continue until any one case occurs:
a. The key value is found.
b. Unoccupied location is encountered.
c. It reaches to the location where the search was started.

Drawback of Closed Hashing: 
As the half of the hash table is filled there is a tendency towards clustering. The key values are clustered in large groups and as a result sequential search becomes slower and slower.
Some solutions to avoid this are:
a. Random probing
b. Double hashing or rehashing
c. Quadratic probing

Random Probing:
This method uses a pseudo random number generator to generate a random sequence of locations, rather than an ordered sequence as was the case in linear probing method. The random sequence generated by the pseudo random number generator contains all positions between 1 and h, the highest location of the hash table.
I=(i+m)%h+1, i is the number in the sequence. m and h are integers that are relatively prime to each other.

Double Hashing:
When two hash functions are used to avoid secondary clustering then it is called double hashing. The second function should be selected in such a way that hash address generated by two hash functions are distinct and the second function generates a value m for the key k so that m and h are relatively prime.
(k)=(k%h)+1
(k)=(k%(h-4))+1. 

Quadratic Probing:
It is a collision resolution method that eliminates the primary clustering problem of linear probing. For quadratic probing the next location after i will be i+,i++...... etc.
H(k)+%h for i=1,2,3.....

2. Open Hashing:
In closed hashing two situations occurs 1. If there is a table overflow situation 2. If the key values are haphazardly intermixed. To solve this problem another hashing is used open chaining.

Advantages and Disadvantages of  Chaining:
a. Overflow situation never arises. Hash table maintains lists which can contain any number of key values.
b. Collision resolution can be achieved very efficiently if the lists maintain an ordering of keys so that keys can be searched quickly.
d. Open hashing is best suitable in applications where number of key values varies drastically as it uses dynamic storage management policy.
e. The chaining has one disadvantage of maintaining linked lists and extra storage space for link fields.

No comments:

Post a Comment

Thanks
Mukesh Rajput