The Algo Blog: sort

Showing posts with label sort. Show all posts

Sorting when all the input elements are unknown

While reading Knuth's TAOCP vol.3 Sorting and Searching, p.139, another characteristic of sorting algorithms to me : Few sorting techniques require the entire elements to be present before-hand before sorting, for ex: Selection sort. Other sorting techniques do not need this, for ex: insertion sort (the sort of card-players). To quote Knuth:
A selection method requires all of the input items to be present before sorting may proceed, and it generates the final outputs one by one in a sequence.

This is essentially the opposite of insertion, where the inputs are received sequentially but we don not know any of the final outputs until sorting is complete.
(Knuth(1998) ''TAOCP'', vol.3 p.139)

Segregating the positive and negative numbers

[Problem from Knuth,TAOCP, vol.3, p.136]
Design an algorithm that rearranges all the numbers in a given table so that all negative values precede all nonnegative one. (The items need not be sorted completely, just separated between negative and nonnegative.) Your algorithm should use the minimum possible number of exchanges.

Knuth's solution:
Knuth suggests a modified radix exchange sort which compares the MSB (bit corresponding to sign )

Another way to do this is to write a specialized comparison routine which is passed to a sorting routine. The routine can be something like:

compare(a,b)
  if a<0 and b<0: return 0 // same
  if a>0 and b>0: return 0 // same
  return a>b // one -ve, one +ve

Other solution is to include counting sort technique.
The counting sort technique is implemented below in python and is stable.

import random 

N = 50
# generate an array of +ve and -ve numbers.
array = random.sample( range(-N, N), N )

# counting techinque : 2 pass algorithm
def counting_seperate(array):
    num_pos = 0
    num_neg = 0
    for i in array:
        if i<0: num_neg += 1
        else: num_pos += 1
    # destination array
    dest = [0] * len(array)

    #the final index of a +ve number, can simply be len of array
    num_pos += num_neg

    # Count from n-1 to 0 : stable 
    i = len(array)-1
    while i>=0:
        if array[i]< 0: 
            dest[num_neg-1]=array[i]
            num_neg -= 1 # decrease the index 
        else:
            dest[num_pos-1] = array[i]
            num_pos -= 1 # decrease the index
        i -= 1

    print( "segregated array:", dest )

if __name__ == "__main__":
    print( "Original array: ", array )
    counting_seperate(array)

Finding the mth smallest of a given set on n elements.

[Problem posed by C.A.R.Hoare] Suppose that, instead of sorting an entire file, we only want to determine the mth smallest of a given set of n elements. Show that quicksort can be adapted to this purpose avoiding many of the computations required to do a complete sort. ( Knuth(1998)The Art of Computer Programming Vol.3:Sorting and Searching, p.136 // Bentley (2011) Programming Pearls Column 11 )

In the normal partitioning process of quicksort, if partition position s=m, stop;
if s < m, use the same techinque to find (m-s)th position;
If s > m, find the mth smallest element in the left hand.

Searching a sorted 2D array

A 2D Array is sorted on its row and columns, find a key.

"""
Program to search an element in a 2-d array
The 2D array has sorted rows and sorted columns
"""
from random import randrange

N=10 # we will generate a NxN array

# initializes a 2-d array sorted on rows and columns
ARRAY = [ [i for i in range(j,N+j)] for j in range(N) ]

KEY = randrange(0,ARRAY[N-1][N-1])

# We will skip the brute force method of O(n^2) 

# Attempt 1 : Use binary search
def binary_search(lst, l, u, key):
    if l > u : return -1;
    m = (l+u)/2
    if lst[m] == key: return m
    if lst[m] < key: return binary_search(lst, m+1,u, key)
    else : return binary_search(lst, l, m-1,key)


def binsearch_method():
    "Use binary search on each row : takes loglinear time: O( n log(n) )"
    for i in range(N):
        idx = binary_search(ARRAY[i],0,len(ARRAY[i])-1,KEY)
        if idx != -1:
            print("found %d at %d,%d" % (KEY, i,idx))
            return

def linear_method():
    "search in linear time"
    currow = 0
    curcol = N-1
    steps = 0
    while curcol > 0 and currow < N:
        steps += 1
        #print(currow, curcol)
        # If the elem in a col is greater than key, we can skip the entire col
        if ARRAY[currow][curcol] > KEY: curcol -= 1 
        # If the first element in a row is > than key or if the last elem is < key, we skip the entire row.
        if ARRAY[currow][0] > KEY or ARRAY[currow][curcol] < KEY: currow += 1
        if ARRAY[currow][curcol] == KEY:
            print("found %d at %d,%d in %d steps" % (KEY, currow,curcol, steps) )
            return

if __name__ == "__main__":
    print("Key is %d" % KEY )
    binsearch_method()
    linear_method()

merging sorted streams

Given k sorted streams where each stream could possibly be infinite in length, describe an efficient algorithm to merge the k streams into a new stream (also in sorted order).

Answer:

Solution I (Divide and Conquer) :


                             list1
                      listA /
                     /  .   \
                    /   .    list2
                   /    .      .
         Merged   /     .      .
         List     \     .      .
                   \
                    \   .    listk-1
                      listC /
                            \
                             listk

This solution is not efficient, Complexity is 2^k-1 - 1

Solution II (Using buckets)

Use technique similar to Bucket sort, maximum complexity = O(N²)

Solution III (Using Heap)

(credit : The Great Khali)

Step1: Take first element of all k streams and build a min heap of them. => O(k)
Step2: Remove the min element (elem at top of heap) from the heap and put in the new stream. => O(log k)
Step3: Put new element in heap from the stream to which the prev elem belongs (which was at heap min). => O(log k)
Step4: continue above steps till we exhaust all the streams. If all streams in combination have n elements then order is O(n log(k))

Find the non duplicate element

There is an array in which except one element, all other elements have duplicates, find that non duplicate element.

Answer:


/**
 * An array in which all elements except one are duplicate.,
 * find that element
 * */
#include "vector"
#include "algorithm"
#include "stdio.h"
using namespace std;

/** SOLUTION I 
 * Sort the array - O(n log(n))
 * Find the duplicate - O(n) (single pass)
 * */
void SortAndFind(vector &vec)
{
    // O(n log(n))
    sort(vec.begin(), vec.end() );
    int i = 0;
    int j = i + 1;
    int n = vec.size();
    // O(n)
    for ( i = 0, j = i+1; j < n ; ++j)
    {
        if ( vec[j] != vec[i] )
        {
            if ( (j-i) == 1 )
            {
                /* non duplicate found at j */
                printf("\n Non Duplicate (%s) %d \n", __func__, vec[i]);
                return;
            }
            else
            {
                i=j;
            }
        }
    }
    // the last element is the duplicate
    printf("\n Non Duplicate (%s) %d \n", __func__, vec[n-1]);
}

/** SOLUTION II , better than first solution.
 * Heapify the array - O(n)
 * Remove one element at a time and find the duplicate - O(log(n))
 * This approach is better because there is no need to sort the entire array
 * */
void HeapifyAndFind(vector &vec)
{
    // Create the heap O(n)
    make_heap(vec.begin(), vec.end() );
    // Find the duplicate, by removing one element at a
    // time.
    //
    int i = 0;
    int n = vec.size();
    for ( int j = i+1; j < n; ++j)
    {
        // vec[j-1] is the min element
        make_heap( vec.begin() + i , vec.end() );
        if ( vec[i] != vec[j] )
        {
            if ( (j-i) == 1 )
            {
                /* non duplicate found at j */
                printf("\n Non Duplicate (%s) %d \n", __func__, vec[i]);
                return;
            }
            else
            {
                i=j;
            }
        }

    }
    // the last element is the duplicate
    printf("\n Non Duplicate (%s) %d \n", __func__, vec[n-1]);

}

int main(int argc, char *argv[])
{
    if ( argc > 1 )
    {
        vector vec1(argc-1);
        vector vec2(argc-1);

        for (int i = 1; i <= argc-1; ++i)
        {
            vec1[i-1] = atoi(argv[i]);
            vec2[i-1] = atoi(argv[i]);
        }

        SortAndFind( vec1 );
        HeapifyAndFind( vec2 );
    }
    return 0;
}

Sort an Array of 0s and 1s

How would you sort an array that consists of only zeros and ones in only one pass?
Answer:


Use Quick Sort :

0 1 1 1 0 0 1 0 0 1
^       ^         ^
up     pivot      down

Quick sort will do this in one pass.

Count Sort will do this in two passes.

Min element in a sorted array

Given an array, which has been sorted in ascending order and rotated, how will you go about
finding the min element in it.

Answer:


Let original array be 10 20 30 40 50 , so on rotation it become 40 50 10 20 30
40 50 10 20 30
 ^  ^
Up Down

up = array.begin();
down = Up+1;

while(down != array.end() )
{
  if(*down < *up ) then *down is the smallest element
  ++up;
  ++down;
}

Merge unsorted linked lists

Write the algorithm to merge two unsorted linked lists.

Answer:^[1]



The bucket sort will give the best performance.

Add both the linked lists to a bucket data structure and then its simple to perform the bucket sort.

The complexity depends on the input values and how well they can be uniformly distributed among the buckets.

K smallest numbers

Given N numbers in an unsorted array, find the K smallest numbers.

Merging two arrays

Given two sorted integer arrays and the larger array has room for the second. Merge the smaller array into the larger one, in O(n) time and O(1) space.

Answer:


l_begin     l_end         free
 |            |            |
[4][6][8][10][22][ ][ ][ ][ ]

sml_begin small_end
 |        |
[1][2][3][20]

void InplaceMerge(int * l_up, int * l_down, int* s_up, int* s_down)
{
    int* free = l_down + (s_down-s_up);
    while(s_end <= s_begin)
    {
        while(*l_end >= *s_end)
        {
            *free = *l_end;
            --free; --l_end;
        }
        while (*s_end >= *l_end)
        {
            *free = *s_end;
            --free; --s_end;
        }
    }
}