Sunday, 1 February 2015

GRAPH THEORY FOR PROGRAMMING CONTEST (POST - 2)

        This tutorial is about graph traversal. We will understand what graph traversal is. In this post, our discussion will be limited to Breadth First Search (BFS) traversal technique on adjacency matrix. In a future tutorial, we will see about Depth First Search (DFS) technique.
              
            First thing that comes to mind is what graph traversal is. It is nothing but going from node to node of a graph to visit all nodes exactly once. The purpose of this is to see whether all nodes are reachable from a source node or not, which nodes are present in the graph, listing the nodes etc. 
            Can't we just travel the nodes in any random order? The answer to this is "Yes" and "No". Of course in visual representation of the graph, we can randomly visit any node. But in programming, it might be difficult and time consuming. A simple approach to traverse in any random order would be :

Let there be n vertices, numbered 0 to n-1.

We define an array  
            int visit[n]={0};
This array will be used to check whether a node has been visited or not. Without it, we might visit same node again and again.

Now, to traverse the nodes randomly, we need to choose a node randomly, till all vertices have been visited. To get a random node, we can use something like,
           int present_node=rand()%n;      //rand() is defined in stdlib.h

To check whether all vertices have been visited or not, we can use another variable "count" which is initially set to 0, and is incremented by 1 whenever we visit a new vertex. When count reaches the value n-1, we know all vertices have been visited.

So the overall code would look something like : 



  //n=number of vertices
   int count=0;
   int visit[n]={0};
   while(count<n)
   {
         int present_node=rand()%n;
         if(visit[present_node]!=0)
         {
                printf("%d\t",present_node);
                visit[present_node]=1;
                count++;
         }
   }



This method is not suitable as it is dependent on chance. Traversing each node doesn't have equal probabilities. So we might visit a node multiple times, while not visiting another one at all. This reduces the time complexity to a greater extent. Also, this method is suitable only if we need to visit all the vertices. This can not be extended to solve any other type of problem.  

So, we need an efficient method and which can be used to solve other problems that are similar to graph traversal but require a little bit modifications



There are two methods of graph traversal that are widely used. These are Breadth First Search and Depth First Search. Lets see about BFS here and about DFS in the next post.

Breadth First Search Traversal : 

In this method, we select a node first. Then we visit all the nodes that are directly related to this node (i.e. there is a link/relation between them). Then we visit the nodes having a link with the newly visited vertices in the same order in which they were visited. This process repeats till all the vertices have been visited. (Read the example below for better understanding).

 Consider the following graph (Green indicates visited, pink indicates adjacent nodes to be visited next) : 

 The first node visited it node-1.
 Then the nodes adjacent to node-1 are selected which are node-2 and node-5 in this case.
 Next, these two are visited.
 After visiting node-2 and node-5, first the nodes related to node-2 are selected which are node-3 and node-4 and node-1.
 But node-1 has been visited already. So the other two are visited.
 After visiting node-3 and node-4, next we visit the nodes adjacent to node-5 (Which was visited just after visiting node 2). The nodes adjacent to node-5 are node-1, node-4 and node-6. Here node-6 is the only not visited node. So we visit it next.
 Next, we visit nodes adjacent to node-3, then node-4 and then node-6. But the nodes adjacent to these are already visited.
At this point, we have visited all the nodes.

Advantage of it over random method :

1) We are never visiting a node more than once.
2) The chances of  a node getting visited is not dependent on any chance event.
3) It gives the shortest distance between the starting node and the node visited.
4) It keeps track of which portion of the graph is visited (in a connected component discussed below).
5) We can use it in finding connected components in a graph, reachability of a node from another node, dijsktras algorithm (discussed in another post) etc.

N.B. Here, we are visiting the nodes adjacent to a node in the same order in which they were visited. I mean, in the above example,we visited the nodes adjacent to node-2 before visiting nodes adjacent to node-5 because node-2 was visited first by node-1 then node-5 was visited. To keep track of which node was visited first, we can use a queue. Here, we put all the nodes adjacent to present node in
Now, lets understand an implementation of bfs in C++ step by step.

 1)First thing first. How to represent a graph as adjacency matrix.

         In previous post, we saw what adjacency matrices are. They are basically a 2D array where cell (i,j) is 1 if node-i has a link to node-j. In programming contests, either an adjacency matrix is given or a list of pair of nodes given that represent which pair of nodes are related.
          Let say the graph has n nodes (n is user input, 1 based indexing). Lets define the adjacency matrix as 
                                          int arr[n][n];
=> when adjacency matrix is given, we can take them as input in the following manner :
                                          for(int i=0;i<n;i++)
                                          for(int j=0;j<n;j++)
                                                        scanf("%d",&arr[i][j]);

=> When list of pairs of nodes are given, the first line usually indicates number of pairs to be input (say x) and next x lines each give a pair of integers. We can use two variables say a and b to take pairs of input and use them to store value in the matrix.
                              int a,b,x;
                              scanf("%d",&x);           //x number of pairs to be input
                              while(x--)
                             {
                                  scanf("%d%d",&a,&b);
                                  arr[a-1][b-1]=1;            //-1 coz array uses 0-based indexing
                              }
                                          
 2) Now we have a graph. But as mentioned above, we need a queue to keep track of the order in which the nodes are visited. A queue is a First In First Out structure. Here, we insert the new nodes to be explored in one end and remove nodes we are exploring right now from the other end. Remember we do not add a node that has already been visited. So, we need to add the first node to the queue and keep exploring till the queue is not empty. As a result, in each turn, we keep adding "not yet visited" nodes to the queue and the queue becomes empty only after all nodes have been visited. (Try to do it using pen and paper if not clear).

       Now, how to create a queue. We can use array as a queue whose size must be equal to the number of nodes in the graph. But this results in wastage as most of the times, we will not be using the full array. We can use a linked list instead. This saves space. Now, writing a code for a linked list takes a little time. But, in C++, we have a "Standard Template Library (STL)", that provides some basic, frequently used data structures. Lets use one.
 
        We will use list data structure defined in header file <list>. The following snippet gives the required features and we will understand them after looking at the code.
                       #include<list>
                       using namespace std;
                      
                       list <int> abc;                    //Creates a list named abc
                       
                       abc.push_back(value);       //Adds value to the list at the end

                       abc.front();                         //Gives the element present in 
                                                                     queue  front
                           
                       abc.pop_front();                 //Removes the first element from the 
                                                                     queue

                       abc.empty();                      //Returns 0 if queue is not empty


           Lets understand now. List data structure is defined under the header list. Then we created a list named abc which is an integer list, i.e. all elements of the list are integer variables. Since queue adds element at one end and removes them from the other end, we have used the above functions (there are more but not required at this moment). push_back() method adds elements in the back of the list while pop_front() removes elements from the front. empty() method is required to check whether the queue is empty or not.

 3) Lets implement bfs in adjacency matrix now. First, we start at a source node (let it be names source). We add it to the queue. Then while queue is not empty, we add all the nodes that are linked with the present node. Then we remove this node and continue to the next node in the queue. To find whether a node has been visited or not, we use another array named visit[] whose size is equal to the number of nodes in the graph.

     To visit all the vertices that are adjacent to the present node we are visiting, we need to visit all the cell of the adjacency matrix where the row number is the 
present node. Take a look at the code for better understanding/

#include<list>
int arr[n][n];
list <int> abc;
int visit[n];
void bfs(int start)
{
     abc.push_back(start);
     while(!abc.empty())
     {
           printf("%d\t",abc.front());   //elements in queue being printed
           for(int i=0;i<n;i++)            //To visit all nodes in adjacency matrix
           {
                 if(arr[abc.front()][i]!=0)   //i.e we can go to that node from front node
                 {
                         if(visit[i]==0)                       //node i hasn't been visited already
                                abc.push_back(i);         //add node i to list
                 }
           } 
           abc.pop_front();                                //So the first element is deleted
     }
}

          In the next post, we will see about connected components (defined in the next post itself). Also I will post a complete code on it soon. Also, if someone finds any error or typo, or has some easier implementations, please share in the comment.



Saturday, 31 January 2015

GRAPH THEORY FOR PROGRAMMING CONTEST (Post - 1)



        This post will explain graph theory in as simple way as possible. This is not a complete tutorial and lacks a lot of informations. Please provide comments on how I can improve it.

         A graph is a structure comprising of nodes and edges. Nodes are basically objects and edges represent the relations between them.

        Consider the following problem. Suppose there are n islands and there are some roads connecting certain pairs between them. We have to find whether we can use the roads (possibly more than one different roads) to reach from one island to another.

        Let us denote each island as a circle with a number on it (The number is to distinguish each island). The lines joining the circles are the roads. Take a look at the picture for clarification.
        Now the above problem can be solved easily as we can visualize the entire thing. We can go to island 4 from island 5 (5 -> 1 -> 4), but not to island 7. We can go to island 7 from 6 but can not reach there any other way. We can not go to island 3 from any other island.

        In graph theory, the islands are represented by nodes and the roads by edges. Node and edge are graph theory terminologies. We can represent any object as nodes and the relation between them as edge. In programming, obviously we cannot use visual representations. We need some other technique.Two techniques commonly in use are adjacency matrix and adjacency list form. Lets discuss them one by one.


adjacency matrix:

         Consider there are n nodes (i.e. n different objects). In this method, we use a 2D array of size nXn. For each cell (i,j), we put 1 if there is a link between node i+1 and j+1 (Plus 1 because array uses zero based indexing and we have nodes starting from 1).

         The adjacency matrix for the above island problem is

 0 1 0 1 1 0 0 
 1 0 0 1 0 0 0 
 0 0 0 0 0 0 0 
 1 1 0 0 0 0 0 
 1 0 0 0 0 0 0 
 0 0 0 0 0 0 1 
 0 0 0 0 0 1 0 

        Here, we have considered a node is not related to itself. Had it been the case, we would have put 1 in (i,j) cell where i==j.

        Another thing to notice is that, here if node i is related to node j, then node j is also related to node i (i.e. the roads are bi-directional). So the adjacency matrix is symmetrical about the diagonal. This type of graph is called undirected graph. If it had not been the case, and if node i being related to node j didn't imply node j is related to node i, such a graph would be called directed graph. In visual representation, instead of a line between two nodes, an arrow is used that points from node i to node j, if i is related to node j. Adjacency matrix of such a graph is certainly not symmetrical.

        As can be seen from the above adjacency matrix, such a representation takes O(nXn) i.e. O(n^2) size. This method is applicable when number of nodes is small (About 2000 for most programming contests (Give or take a few nodes)). As number of nodes increases, we need to use adjacency list representation.

adjacency list:

        In this method, we create an array of linked list. Each node of the array represents each node of the graph and the linked list in it represents all the nodes that are related to the present node.

        In our example, node 1 is related to node 2, 4 and 5. Node 2 is related to node 1 and 4 while node 3 is not related to any other nodes.

 1) -> 2 -> 4 -> 5  
 2) -> 1 -> 4          
 3)                         
 4) -> 1 -> 2          
 5) -> 1                  
 6) -> 7                  
 7) -> 6                  

        In programming language like c++, an array of vector or list is used for this representations.

       Also, as can be seen above, there is no need to distinguish between directed and undirected graph. Also, a lot of space is saved. Maximum space required is O(n*e) where e represents maximum number of edges.

     This is all the theory needed in introduction. There are lots of more terminologies used in graph theory. We will see them in subsequent posts when we need them. Also, check the next post on graph traversal for information on how to implement adjacency matrix and list.