Sunday, 1 February 2015

GRAPH THEORY FOR PROGRAMMING CONTEST (POST - 2)

        This tutorial is about graph traversal. We will understand what graph traversal is. In this post, our discussion will be limited to Breadth First Search (BFS) traversal technique on adjacency matrix. In a future tutorial, we will see about Depth First Search (DFS) technique.
              
            First thing that comes to mind is what graph traversal is. It is nothing but going from node to node of a graph to visit all nodes exactly once. The purpose of this is to see whether all nodes are reachable from a source node or not, which nodes are present in the graph, listing the nodes etc. 
            Can't we just travel the nodes in any random order? The answer to this is "Yes" and "No". Of course in visual representation of the graph, we can randomly visit any node. But in programming, it might be difficult and time consuming. A simple approach to traverse in any random order would be :

Let there be n vertices, numbered 0 to n-1.

We define an array  
            int visit[n]={0};
This array will be used to check whether a node has been visited or not. Without it, we might visit same node again and again.

Now, to traverse the nodes randomly, we need to choose a node randomly, till all vertices have been visited. To get a random node, we can use something like,
           int present_node=rand()%n;      //rand() is defined in stdlib.h

To check whether all vertices have been visited or not, we can use another variable "count" which is initially set to 0, and is incremented by 1 whenever we visit a new vertex. When count reaches the value n-1, we know all vertices have been visited.

So the overall code would look something like : 



  //n=number of vertices
   int count=0;
   int visit[n]={0};
   while(count<n)
   {
         int present_node=rand()%n;
         if(visit[present_node]!=0)
         {
                printf("%d\t",present_node);
                visit[present_node]=1;
                count++;
         }
   }



This method is not suitable as it is dependent on chance. Traversing each node doesn't have equal probabilities. So we might visit a node multiple times, while not visiting another one at all. This reduces the time complexity to a greater extent. Also, this method is suitable only if we need to visit all the vertices. This can not be extended to solve any other type of problem.  

So, we need an efficient method and which can be used to solve other problems that are similar to graph traversal but require a little bit modifications



There are two methods of graph traversal that are widely used. These are Breadth First Search and Depth First Search. Lets see about BFS here and about DFS in the next post.

Breadth First Search Traversal : 

In this method, we select a node first. Then we visit all the nodes that are directly related to this node (i.e. there is a link/relation between them). Then we visit the nodes having a link with the newly visited vertices in the same order in which they were visited. This process repeats till all the vertices have been visited. (Read the example below for better understanding).

 Consider the following graph (Green indicates visited, pink indicates adjacent nodes to be visited next) : 

 The first node visited it node-1.
 Then the nodes adjacent to node-1 are selected which are node-2 and node-5 in this case.
 Next, these two are visited.
 After visiting node-2 and node-5, first the nodes related to node-2 are selected which are node-3 and node-4 and node-1.
 But node-1 has been visited already. So the other two are visited.
 After visiting node-3 and node-4, next we visit the nodes adjacent to node-5 (Which was visited just after visiting node 2). The nodes adjacent to node-5 are node-1, node-4 and node-6. Here node-6 is the only not visited node. So we visit it next.
 Next, we visit nodes adjacent to node-3, then node-4 and then node-6. But the nodes adjacent to these are already visited.
At this point, we have visited all the nodes.

Advantage of it over random method :

1) We are never visiting a node more than once.
2) The chances of  a node getting visited is not dependent on any chance event.
3) It gives the shortest distance between the starting node and the node visited.
4) It keeps track of which portion of the graph is visited (in a connected component discussed below).
5) We can use it in finding connected components in a graph, reachability of a node from another node, dijsktras algorithm (discussed in another post) etc.

N.B. Here, we are visiting the nodes adjacent to a node in the same order in which they were visited. I mean, in the above example,we visited the nodes adjacent to node-2 before visiting nodes adjacent to node-5 because node-2 was visited first by node-1 then node-5 was visited. To keep track of which node was visited first, we can use a queue. Here, we put all the nodes adjacent to present node in
Now, lets understand an implementation of bfs in C++ step by step.

 1)First thing first. How to represent a graph as adjacency matrix.

         In previous post, we saw what adjacency matrices are. They are basically a 2D array where cell (i,j) is 1 if node-i has a link to node-j. In programming contests, either an adjacency matrix is given or a list of pair of nodes given that represent which pair of nodes are related.
          Let say the graph has n nodes (n is user input, 1 based indexing). Lets define the adjacency matrix as 
                                          int arr[n][n];
=> when adjacency matrix is given, we can take them as input in the following manner :
                                          for(int i=0;i<n;i++)
                                          for(int j=0;j<n;j++)
                                                        scanf("%d",&arr[i][j]);

=> When list of pairs of nodes are given, the first line usually indicates number of pairs to be input (say x) and next x lines each give a pair of integers. We can use two variables say a and b to take pairs of input and use them to store value in the matrix.
                              int a,b,x;
                              scanf("%d",&x);           //x number of pairs to be input
                              while(x--)
                             {
                                  scanf("%d%d",&a,&b);
                                  arr[a-1][b-1]=1;            //-1 coz array uses 0-based indexing
                              }
                                          
 2) Now we have a graph. But as mentioned above, we need a queue to keep track of the order in which the nodes are visited. A queue is a First In First Out structure. Here, we insert the new nodes to be explored in one end and remove nodes we are exploring right now from the other end. Remember we do not add a node that has already been visited. So, we need to add the first node to the queue and keep exploring till the queue is not empty. As a result, in each turn, we keep adding "not yet visited" nodes to the queue and the queue becomes empty only after all nodes have been visited. (Try to do it using pen and paper if not clear).

       Now, how to create a queue. We can use array as a queue whose size must be equal to the number of nodes in the graph. But this results in wastage as most of the times, we will not be using the full array. We can use a linked list instead. This saves space. Now, writing a code for a linked list takes a little time. But, in C++, we have a "Standard Template Library (STL)", that provides some basic, frequently used data structures. Lets use one.
 
        We will use list data structure defined in header file <list>. The following snippet gives the required features and we will understand them after looking at the code.
                       #include<list>
                       using namespace std;
                      
                       list <int> abc;                    //Creates a list named abc
                       
                       abc.push_back(value);       //Adds value to the list at the end

                       abc.front();                         //Gives the element present in 
                                                                     queue  front
                           
                       abc.pop_front();                 //Removes the first element from the 
                                                                     queue

                       abc.empty();                      //Returns 0 if queue is not empty


           Lets understand now. List data structure is defined under the header list. Then we created a list named abc which is an integer list, i.e. all elements of the list are integer variables. Since queue adds element at one end and removes them from the other end, we have used the above functions (there are more but not required at this moment). push_back() method adds elements in the back of the list while pop_front() removes elements from the front. empty() method is required to check whether the queue is empty or not.

 3) Lets implement bfs in adjacency matrix now. First, we start at a source node (let it be names source). We add it to the queue. Then while queue is not empty, we add all the nodes that are linked with the present node. Then we remove this node and continue to the next node in the queue. To find whether a node has been visited or not, we use another array named visit[] whose size is equal to the number of nodes in the graph.

     To visit all the vertices that are adjacent to the present node we are visiting, we need to visit all the cell of the adjacency matrix where the row number is the 
present node. Take a look at the code for better understanding/

#include<list>
int arr[n][n];
list <int> abc;
int visit[n];
void bfs(int start)
{
     abc.push_back(start);
     while(!abc.empty())
     {
           printf("%d\t",abc.front());   //elements in queue being printed
           for(int i=0;i<n;i++)            //To visit all nodes in adjacency matrix
           {
                 if(arr[abc.front()][i]!=0)   //i.e we can go to that node from front node
                 {
                         if(visit[i]==0)                       //node i hasn't been visited already
                                abc.push_back(i);         //add node i to list
                 }
           } 
           abc.pop_front();                                //So the first element is deleted
     }
}

          In the next post, we will see about connected components (defined in the next post itself). Also I will post a complete code on it soon. Also, if someone finds any error or typo, or has some easier implementations, please share in the comment.



No comments:

Post a Comment