Experience is wasted if history does not repeat itself.
Sometime in 2009, during my Student Industrial Training days, I was working on a sorting algorithm. I got it working to a certain extent, but had to set it aside for a while.
It had to do with something that occurred to me then, which I called sorted clusters.
Later, some online research showed that much work had been done in the field, and it belonged to a class of sorting algorithms called distribution sorts.
Earlier this year, I was the lead developer on a software development team that needed to work on an algorithm. We settled on sorting algorithms, and I remembered the work I had done earlier and decided to rework it and see what could come of it.
That is where COBSort came from.
It employs a three-stage cache optimization and a formula that suggests sort indices for the elements of an unsorted array, producing a high-speed distribution sort.
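The post does not give the index-suggestion formula itself. A common choice in distribution sorts is linear interpolation over the value range, so purely as an illustration, a hypothetical `guessIndex` might look like this (the real COBSort formula may differ):

```java
public class GuessIndexSketch {
    // Hypothetical index-suggestion formula: maps a value proportionally
    // into [0, n-1] using the array's minimum and maximum. This is a
    // standard distribution-sort choice, shown only for illustration;
    // COBSort's actual formula is not published in this post.
    static int guessIndex(int value, int min, int max, int n) {
        if (max == min) return 0;               // all values equal: one bucket
        return (int) ((long) (value - min) * (n - 1) / (max - min));
    }

    public static void main(String[] args) {
        // With min=0, max=100, n=11: value 50 maps to index 5.
        System.out.println(guessIndex(50, 0, 100, 11)); // prints 5
    }
}
```

Because the mapping is monotonic in the value, elements placed by suggested index end up in roughly sorted order, leaving only small local runs to clean up.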
There are 2 flavors of this algorithm.
1. Speed Optimized COBSort
2. Memory Optimized COBSort
Only the first is discussed here. The difference between the two is in the extra memory needed to hold data while the sorting is being carried out.
The first needs 2 extra arrays to do this, whereas the second needs only one, making it one of the few distribution sorts to achieve this. It does so at a certain cost, however: its speed is usually a bit behind that of the first algorithm. A great benefit of using only 1 extra array is that the second algorithm can handle larger input arrays than the first on limited-memory systems.
Both handle floating-point and integer data arrays. Their speeds rival those of the quicksorts, mergesorts, shellsorts and heapsorts of this world, and outperform some of these algorithms in some cases.
Java implementations of both algorithms are ready, but here I give a generic algorithm for the Speed Optimized COBSort. I will post more later on.
The time complexity of these algorithms is similar to that of the bucket sort algorithm:
Best case = Average case = O(k*n); Worst case = O(n^2).
Here is the algorithm for the Speed Optimized COBSort:
Pseudocode
CobSort(inputArray, bucketSizesArray, bucketLocationsArray)
 1. n ← length[inputArray]
 2. MAX_INSERTION_SORT ← 25
 3. IF n > 1000000
 4.     FOR pass = 1 TO 3 DO
 5.         partition(inputArray)    // cache-optimization passes
        // END FOR
    // Stage 1: count how many elements each suggested index receives
 6. FOR j = 0 TO n-1 DO
 7.     ++bucketSizesArray[guessIndex(inputArray[j])]
    // END FOR
    // Stage 2: prefix-sum the counts into starting offsets
 8. x ← 0, bucketSizesAccumulator ← 0
 9. FOR k = 0 TO n-1 DO
10.     IF (x ← bucketSizesArray[k]) > 0
11.         bucketLocationsArray[k] ← bucketSizesAccumulator
12.         bucketSizesAccumulator ← bucketSizesAccumulator + x
    // END FOR
    // Stage 3: scatter elements by suggested index;
    // bucketSizesArray is reused as the output buffer
13. FOR i = 0 TO n-1 DO
14.     bucketSizesArray[bucketLocationsArray[guessIndex(inputArray[i])]++] ← inputArray[i]
    // END FOR
15. FOR i = 0 TO n-1 DO
16.     inputArray[i] ← bucketSizesArray[i]
    // END FOR
    // Stage 4: finish each run of elements sharing a suggested index
17. FOR index = 0 TO n-1 DO
18.     start ← index
19.     suggestedIndex ← guessIndex(inputArray[index])
20.     WHILE index < n DO
21.         IF suggestedIndex EQUALS guessIndex(inputArray[index])
22.             index ← index + 1
23.         ELSE
24.             break
        // END WHILE
25.     IF index GREATER THAN OR EQUALS n
26.         index ← n - 1
27.     bucketSize ← index - start + 1
28.     IF bucketSize EQUALS 1 AND index EQUALS n-1
29.         break
30.     ELSE IF bucketSize GREATER THAN OR EQUALS 2 AND bucketSize LESS THAN OR EQUALS MAX_INSERTION_SORT
31.         insertionSort(inputArray, start, index)
32.     ELSE
33.         bucketSort(inputArray, start, index)
    // END FOR
34. END CobSort
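To make the flow above concrete, here is a minimal Java sketch of the distribution phases. It is not the published implementation: the `guessIndex` formula is assumed to be linear interpolation over the value range (the post does not give the real formula), and the cache-optimization partition passes and the `bucketSort` fallback for large runs are omitted for brevity, with insertion sort finishing every run instead.

```java
import java.util.Arrays;

public class CobSortSketch {

    // Assumed index-suggestion formula: proportional mapping into [0, n-1].
    static int guessIndex(int value, int min, int max, int n) {
        if (max == min) return 0;
        return (int) ((long) (value - min) * (n - 1) / (max - min));
    }

    static void cobSort(int[] a) {
        int n = a.length;
        if (n < 2) return;
        int min = a[0], max = a[0];
        for (int v : a) { if (v < min) min = v; if (v > max) max = v; }

        int[] bucketSizes = new int[n];      // extra array 1 (counts, then output buffer)
        int[] bucketLocations = new int[n];  // extra array 2 (starting offsets)

        // Stage 1: count how many elements each suggested index receives.
        for (int v : a) bucketSizes[guessIndex(v, min, max, n)]++;

        // Stage 2: prefix-sum the counts into starting offsets.
        int acc = 0;
        for (int k = 0; k < n; k++) {
            bucketLocations[k] = acc;
            acc += bucketSizes[k];
        }

        // Stage 3: scatter elements into a buffer by suggested index, copy back.
        int[] out = new int[n];
        for (int v : a) out[bucketLocations[guessIndex(v, min, max, n)]++] = v;
        System.arraycopy(out, 0, a, 0, n);

        // Stage 4: elements sharing a suggested index form small unsorted
        // runs; finish each run with insertion sort.
        for (int start = 0; start < n; ) {
            int g = guessIndex(a[start], min, max, n);
            int end = start + 1;
            while (end < n && guessIndex(a[end], min, max, n) == g) end++;
            insertionSort(a, start, end - 1);
            start = end;
        }
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i], j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 19, 1, 56};
        cobSort(data);
        System.out.println(Arrays.toString(data));
        // prints [1, 3, 7, 19, 19, 42, 56, 88]
    }
}
```

Because the assumed `guessIndex` is monotonic in the value, scattering by suggested index leaves the array globally ordered between runs, so the final insertion sorts only ever work on short, nearly local stretches.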
Thanks. Comments are welcome.