Sorting
- Total time required to sort element sequence proportional to plus # of inversions → nearly sorted inputs can be sorted in nearly linear time
- Variation of insertion sort → use insertion sort repeatedly on evenly spaced subsequences of elements, decreasing spacing each time until ending up w/ ordinary insertion sort (diminishing increment sort)
- Starting by sorting widely spaced elements → can reduce # of inversions much more quickly than ordinary insertion sort (each subsequent insertion sorts go faster)
- Sorting w/ increment of = sort items → → , up to subsequence starting w/ item
- Any sequence that ends w/ increment of 1 will produce results
- Certain sequences better than others
- Sequences where results in overall running time of
- Maximum partition size limits # of inversions in sequences → limits execution time of insertion sort
- # of inversions in sequences partitioned by quicksort equal to sum of # of inversions in individual partitions
- Once max partition size reduced to constant → total # of inversions linear in size of entire sequence
- Insertion sort has better constant factors than quicksort → common modification to quicksort = apply insertion sort once max partition size reduced below predetermined threshold
- Each partitioning operation requires linear time in size of subsequence being partitioned
- In worst case, maximum partition size reduced by one on each step → runtime
- Various strategies for choosing pivots, including taking small random samples of subsequence
- Select successively smaller/larger items from list & add to output sequence
- Looks at all unprocessed data on each pass → performs no better for nearly sorted inputs than random or reverse-sorted inputs (unlike insertion sort)
- Possible optimization = remember order of other items on each pass through (turns into heapsort?)
- Complete binary tree: Each level of tree completely full, w/ possible exception of lowest level → nodes on lowest level located as far to left of level as possible
- Many variations, all of which follow divide-and-conquer scheme
- Divide data into subsequences → recursively sort subsequences → merge sorted subsequences into increasingly large sorted sequences
- For merging large numbers of sequences → use variety of priority queue (or heap) to keep track of which sequence currently has smallest item
- B/c deals w/ data sequentially & requires only enough memory to buffer input & output (finite amount) → merge sorting ideally suited to external sorting (amount of data to be sorted exceeds capacity of primary memory by arbitrarily large amounts)
- Size of sorted data limited only by size of secondary memory
- Sort stream of data?
- B/c of divide & conquer character → merge sorting eminently parallelizable
- Comparison based sorts used binary comparisons to control algorithms
- Theoretical lower bound on # of comparisons (& thus overall operations) needed to sort items →
- By exploiting more properties of data being sorted than ordering relation, can arrive at different bounds
- Radix sorting: Uses digits/bytes constituting data to make multi-way decisions → able to sort bytes of data in time
- Comparing to comparisons to sort (multi-byte) records tricky
- If assume in worst case, comparisons take time proportional to # of bytes of data being compared → radix sorting should win
- For practical purposes, must consider constant factors
- Idea behind radix sorting = view keys to be sorted sequences of numerals consisting of digits in finite radix ()
- ASCII character strings = base-256 numerals
- Pad all keys to same # of digits (on left/right, depending on sorting order)
- Right for strings, left for #s
- If common length of keys is now digits, sort keys (up to) times (once on first digit, once on second, etc.)
- Precise logistics differ depending on whether processing digits left to right (MSD radix sort) or right to left (LSD radix sort)
- MSD radix sort → each sort rearranges dat into sections that can be sorted independently, sections already ordered correctly w/ respect to each other
- Further refine sections having more than one key by later character positions until all sections contain just one item
- LSD radix sort → sort all data together on each pass, carefully maintaining previously established order for keys w/ same digit in position currently being sorted
- Can accomplish each one-character sorts in time using stable counting sort
- First count # of keys containing each possible digit (0 to ) at current character position () → easy to compute # of keys whose is , & thus # of keys left of those whose is → allows another pass through keys, moving to proper positions