Ordered Set in C++ : Policy Based Data Structure

 

=> Why we require Ordered Set ?

If you are a C++ competitive programmer, then you must already be taking advantage of C++ STL. If you don't know what is STL, then I would suggest you first understand it pretty well and learn about the containers and algorithms provided in STL. You can check out my detailed C++ STL series.

Now let's assume that you have a decent knowledge about C++ STL. Then you might have used Associative Containers, more precisely set, or multiset. You generally use them when you want your data in sorted order.Therefore they allow you to find a value using binary search which works in O(log n). Set has  inbuilt member functions find(), lower_bound() and upper_bound() which allows you to perform this operation. 

Consider the below two situations :

Situation 1 : What if instead of just finding a particular value, you want to find the position(or index) of that value (in sorted order).

Situation 2 : You want to find the value which is at the kth position in the set (in sorted order) .

 

Ordered Set allows us to solve these problems over set or multiset.  Now let's understand what is Ordered Set first before discussing problems where you would tackle these situations.

=> Ordered Set in C++

Ordered set is a policy based data structure in g++ that keeps elements in sorted order. It performs all the operations as performed by the set data structure in STL(like find(), lower_bound() etc.) in O(log(n)) complexity and performs two additional functions with the same O(log(n)) complexity:

  • order_of_key (k) : number of items which are strictly smaller than k .
  • find_by_order(k) : k-th element in a set (counting from zero).

  • Required header files to implement ordered set and their description

    For implementing ordered_set or other Policy based data structures we need to include :

    1. include <ext/pb_ds/assoc_container.hpp>  

    The tree-based data structures which we will be using below is present in this header file.

    2. include <ext/pb_ds/tree_policy.hpp> 

    used to include the tree_order_statistics_node update which is explained below: 

     

    We also require to use this namespace :

    using namespace __gnu_pbds;

    It is a namespace necessary for the GNU based Policy based data structures.

    The necessary structure required for the ordered set implementation is :

    tree < int , null_type , less , rb_tree_tag , tree_order_statistics_node_update >

    ordered_set is used as a macro(using typedef) for  tree<int, null_type, less, rb_tree_tag, tree_order_statistics_node_update>.

    Let's understand what each purpose each parameter serves and how can you modify it as per your requirement.

    1. int : It is the type of the data that we want to insert. It can be any type like int,pair<int,int> as per your requirement similarly as you would use a type in set.

    2. null_type : It is the mapped policy. It is null here to use it as a set.If we want to use map instead of set, as the second argument type must be used mapped type.

    3. less : It defines comparison criteria.

    4. rb_tree_tag : type of tree used. It is generally Red black trees because it takes log(n) time for insertion and deletion.

    5. tree_order_statistics_node__update : It is included in tree_policy.hpp and contains various operations for updating the node variants of a tree-based container, so we can keep track of metadata like the number of nodes in a subtree.

     

    Ordered set has similar functionality as set. It is just an extended version of set which provides two extra functionalities which we require to solve the two problems discussed above :

    1. find_by_order(k): It returns to an iterator to the kth element (counting from zero) in the set in O(logn) time.
    Let us assume we have a set s : {3, 15, 26, 46, 88}, then :
    *(s.find_by_order(2)) : 3rd element in the set i.e. 26
    *(s.find_by_order(4)) : 5th element in the set i.e. 88 

    2. order_of_key(k) : It returns to the number of items that are strictly smaller than our item k in O(logn) time.
    Let us assume we have a set s : {3, 15, 26, 46, 88}, then :
    s.order_of_key(6) : Count of elements strictly smaller than 6 is 1.
    s.order_of_key(50) : Count of elements strictly smaller than 25 is 4.

     

    Since you have good knowledge of what is an ordered_set now, let's discuss some actual problems for the situations we discussed before and see how ordered_set will help us solve them.

     

    => Problem example for situation 1

    A very popular problem is counting Inversions in an array. Inversion Count for an array indicates – how far (or close) the array is from being sorted. Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j.

    There are multiple ways of solving this, like using merge sort or advanced data structure like segment tree. But we will discuss the solution using ordered set.

    For every number, you need to find the count of numbers coming before it and having value greater than it. If you have a data structure which stores elements in sorted order then while inserting any element in it you can check how many elements already exist in it whose value is greater than this element.

    Consider an array arr = {20,10,5,4,2,9};

    let's say we are currently at index 4 (value - 2). We have already inserted elements upto this index. Then the sorted data structure will currently look like {2,4,5,10,20}.  We want inversions formed with this index.

    We find the number of elements strictly less than 3 (2+1) which is 1 and subtract it from totalsize : 5  - 1 = 4 to get the answer for index 4. Similarly this is done for each index from 0 to n-1.

    So if we can find the position(pos) of an element in the data structure, we can easily solve this problem.

    We know that set and multiset store elements in sorted order. But they dont provide any function to find to find position of an element, They can just provide the value of element. Therefore we can't use them for such situation. But Ordered Set comes to our rescue. 

     

    We will be using the function order_of_key(K) which returns number of elements strictly smaller than K in log N time.

    Algorithm :

     Insert the first element of the array in the Ordered_Set. For all the remaining elements(i=1 to n-1) in arr[] do the following :

    1. Insert the current element in the Ordered_Set.

    2. Find the number of element strictly less than current element+1 in Ordered_Set using function order_of_key(arr[i]+1).

    3. The difference between size of Ordered_Set and order_of_key(current_element + 1) will give us the inversion count for the current element.

     


    #include <bits/stdc++.h> 
    #include <ext/pb_ds/assoc_container.hpp> 
    #include <ext/pb_ds/tree_policy.hpp> 
    using namespace __gnu_pbds; 
    using namespace std; 
    
    // Ordered Set Tree 
    typedef tree<int, null_type, less_equal<int>, 
    			rb_tree_tag, 
    			tree_order_statistics_node_update> 
    	ordered_set; 
     
    int getInvCount(int arr[], int n) 
    { 
    	int key; 
    	                
    	ordered_set oset;             
    
    	// Insert the first 
    	// element in set 
    	oset.insert(arr[0]); 
    
    	// Intialise inversion count to zero 
    	int invcount = 0; 
    
    	
    	for (int i = 1; i < n; i++) { 
    		oset.insert(arr[i]); 
    
    		// Number of elements strictly 
    		// less than arr[i]+1 
    		key = oset.order_of_key(arr[i] + 1); 
    
    		// Difference between ordered_set size  and key will give the inversion count 
    		invcount += oset.size() - key; 
    	} 
    	return invcount; 
    } 
    
    int main() 
    { 
    	int arr[] = { 20, 44, 32, 15 }; 
    	int n = sizeof(arr) / sizeof(int); 
    
    	// Function call to count inversions in array 
    	cout << getInvCount(arr, n); 
    	return 0; 
    } 
    

    Time Complexity : O(nlog(n))

     Note  : To handle duplicate elements in the above code perform changes as discussed in the section

    Using ordered_set as multiset  below .

     

    => Problem example for situation 2

    => Consider we are provided two types of queries randomly.

    Type 1 :  insert(k)                     :  Insert an element 

    Type 2 :  findKthSmallest(k)   : Return kth smallest element  

    Total no of queries q <= 10^6

    Ordered_set allows us to very easily solve this problem. 

    Algorithm

    Declare an ordered_set. Iterate on all queries one by one and do the following :

    1. If query is of type 1, insert the given element in the ordered_set. Time complexity : (O(log (n))

    2. If query is of type 2, use find_by_order(k-1) to get kth smallest element.(We use k-1 because indexing starts from 0 in ordered_set so kth smallest element has k-1 index). Time complexity : (O(log (n))


    #include <bits/stdc++.h> 
    #include <ext/pb_ds/assoc_container.hpp> 
    #include <ext/pb_ds/tree_policy.hpp> 
    using namespace __gnu_pbds; 
    using namespace std; 
    
    // Ordered Set Tree 
    typedef tree<int, null_type, less_equal<int>, 
    			rb_tree_tag, 
    			tree_order_statistics_node_update> 
    	ordered_set; 
     
    
    
    int main() 
    { 
    	int arr[] = { 20, 44, 32, 15 }; 
    	int n = sizeof(arr) / sizeof(int); 
    
        ordered_set oset;
        
        int k;      // value to be inserted or found
        int q;      // number of queries
        int qtype;  // query type
        cin>>q;
        
        while(q--)
        {
            cin>>qtype>>k;
            if(qtype==1)        // insert
                oset.insert(k);
            else                // find kth smallest element
                cout<<*(oset.find_by_order(k-1))<<endl;
        }
        
    	return 0; 
    } 
    

     Time Complexity : O(nlog(n)) 

     Note  : To handle duplicate elements in the above code perform changes as discussed below.

     

    => Using ordered_set for multiset 

    Everything we discussed about ordered_set, it is an extension of set and therefore contains only the UNIQUE elements. But we may not always have all unique elements. Therefore we want same functionality with multiset also. ordered_set by default does not provide multiset implementation but we can still use it as multiset in the following way :

    Use pair<int,int> as data type instead of int  .Now use second value in pair as unique for every value(take as index in original array). The first value in pair is the actual value of the element. Now each pair element will be unique and we are also able to store duplicate elements by storing their indexes.


    me.order_of_key({x, 0}) : gives no of elements strictly smaller than x since {x,0} is smaller than each pair {x,y} as y>=0 for any element with value x.

    size of multiset - me.order_of_key({x, INT_MAX}) : gives no of elements strictly greater than x since {x,INT_MAX} is greater than each pair {x,y} as y<INT_MAX for any element with value x.

    For multiset, declare ordered_set as : 

    typedef tree< pair<int,int>, null_type, less<pair<int,int>>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;  


    Now you must be ready to utilize ordered_set to its full potential. I am also providing some questions where you can apply ordered_set and understand it's use cases better.

    1. Codeforces Problem D. Nested Segments

    2. SPOJ : INVCNT - Inversion Count

    3. SPOJ : RATING - Coder Ratings

     

    This brings us to the end of this article.

    If you have any doubts regarding ordered_set please mention in the comments section.

    Thank you for your patience reading. If you enjoyed this post, I’d be very grateful if you’d help it spread by emailing it to a friend, or sharing it on Whatsapp or Facebook. 

    😇Happy Learning!!