Sorting

Insertion Sorting

  • Total time required to sort element sequence proportional to plus # of inversions → nearly sorted inputs can be sorted in nearly linear time

Shellsort

  • Variation of insertion sort → use insertion sort repeatedly on evenly spaced subsequences of elements, decreasing spacing each time until ending up w/ ordinary insertion sort (diminishing increment sort)
  • Starting by sorting widely spaced elements → can reduce # of inversions much more quickly than ordinary insertion sort (each subsequent insertion sorts go faster)
  • Sorting w/ increment of = sort items , up to subsequence starting w/ item
  • Any sequence that ends w/ increment of 1 will produce results
  • Certain sequences better than others
    • Sequences where results in overall running time of

Quicksort

  • Maximum partition size limits # of inversions in sequences → limits execution time of insertion sort
  • # of inversions in sequences partitioned by quicksort equal to sum of # of inversions in individual partitions
  • Once max partition size reduced to constant → total # of inversions linear in size of entire sequence
  • Insertion sort has better constant factors than quicksort → common modification to quicksort = apply insertion sort once max partition size reduced below predetermined threshold
  • Each partitioning operation requires linear time in size of subsequence being partitioned
  • In worst case, maximum partition size reduced by one on each step → runtime
  • Various strategies for choosing pivots, including taking small random samples of subsequence

Straight Selection Sorting

  • Select successively smaller/larger items from list & add to output sequence
  • Looks at all unprocessed data on each pass → performs no better for nearly sorted inputs than random or reverse-sorted inputs (unlike insertion sort)
  • Possible optimization = remember order of other items on each pass through (turns into heapsort?)

Heapsort

  • Complete binary tree: Each level of tree completely full, w/ possible exception of lowest level → nodes on lowest level located as far to left of level as possible

Merge Sorting

  • Many variations, all of which follow divide-and-conquer scheme
    • Divide data into subsequences → recursively sort subsequences → merge sorted subsequences into increasingly large sorted sequences
  • For merging large numbers of sequences → use variety of priority queue (or heap) to keep track of which sequence currently has smallest item
  • B/c deals w/ data sequentially & requires only enough memory to buffer input & output (finite amount) → merge sorting ideally suited to external sorting (amount of data to be sorted exceeds capacity of primary memory by arbitrarily large amounts)
    • Size of sorted data limited only by size of secondary memory
    • Sort stream of data?
  • B/c of divide & conquer character → merge sorting eminently parallelizable

Beyond Comparisons: Radix Sort & Counting Sort

  • Comparison based sorts used binary comparisons to control algorithms
    • Theoretical lower bound on # of comparisons (& thus overall operations) needed to sort items →
  • By exploiting more properties of data being sorted than ordering relation, can arrive at different bounds
  • Radix sorting: Uses digits/bytes constituting data to make multi-way decisions → able to sort bytes of data in time
    • Comparing to comparisons to sort (multi-byte) records tricky
      • If assume in worst case, comparisons take time proportional to # of bytes of data being compared → radix sorting should win
      • For practical purposes, must consider constant factors
  • Idea behind radix sorting = view keys to be sorted sequences of numerals consisting of digits in finite radix ()
    • ASCII character strings = base-256 numerals
    • Pad all keys to same # of digits (on left/right, depending on sorting order)
      • Right for strings, left for #s
    • If common length of keys is now digits, sort keys (up to) times (once on first digit, once on second, etc.)
    • Precise logistics differ depending on whether processing digits left to right (MSD radix sort) or right to left (LSD radix sort)
  • MSD radix sort → each sort rearranges dat into sections that can be sorted independently, sections already ordered correctly w/ respect to each other
    • Further refine sections having more than one key by later character positions until all sections contain just one item
  • LSD radix sort → sort all data together on each pass, carefully maintaining previously established order for keys w/ same digit in position currently being sorted
  • Can accomplish each one-character sorts in time using stable counting sort
    • First count # of keys containing each possible digit (0 to ) at current character position () → easy to compute # of keys whose is , & thus # of keys left of those whose is → allows another pass through keys, moving to proper positions

results matching ""

    No results matching ""