An algorithm is a set of instructions used to solve a problem. It is important to note that a given problem can be solved by many different algorithms, which raises the question: which algorithm should we use? The answer, broadly speaking, is the most efficient one. In computing, efficiency is usually evaluated in terms of an algorithm's processing time and memory usage.
In this article, we will be comparing the time complexities of different sorting algorithms in Python. A great number of sorting algorithms have been developed, some of which are simply variations or optimizations of existing ones. That being said, only a select few of the most commonly discussed algorithms are covered here: the slow O(n²) insertion sort and bubble sort; the fast O(n log n) merge sort and quick sort; and the supposedly even faster O(n log n) tim sort. As a cherry on top, the Python built-in sort is also included.
from random import randint
import time

data = [randint(-1000, 1000) for _ in range(10000)]

def function_timer(fxn, *args):
    print(fxn.__name__)
    start = time.time()
    fxn(*args)
    end = time.time()
    print(f"processing time: {end - start} seconds\n")
The efficiency of an algorithm is oftentimes influenced by the data itself. Thus, there is a need to randomize our values to avoid bias, hence the importation of the random module. If you wish to clone the above code, feel free to mess around with the data values.
Of course, since we're experimenting with time complexities, we need some sort of timer to time the different algorithms, so we utilized the time module in the creation of our function_timer. Basically, what function_timer does is: print the name of the function passed in, record the start time, run the function with the given arguments, record the end time, and print the elapsed processing time.
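As a side note, time.perf_counter() is generally preferable to time.time() for benchmarking, since it uses a high-resolution monotonic clock. A variant of our timer using it, under the hypothetical name function_timer_v2, would be a drop-in replacement:

def function_timer_v2(fxn, *args):
    print(fxn.__name__)
    # perf_counter: high-resolution clock unaffected by system clock changes
    start = time.perf_counter()
    fxn(*args)
    end = time.perf_counter()
    print(f"processing time: {end - start} seconds\n")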
def insertion_sort(nums):
    for i in range(1, len(nums)):
        j = i
        # check j > 0 first so nums[j-1] never wraps around to the last element
        while j > 0 and nums[j] < nums[j-1]:
            nums[j], nums[j-1] = nums[j-1], nums[j]
            j -= 1

def bubble_sort(nums):
    for i in range(len(nums)):
        # after each pass, the largest remaining value settles at the end
        for j in range(len(nums) - 1 - i):
            if nums[j] > nums[j+1]:
                nums[j], nums[j+1] = nums[j+1], nums[j]
Insertion sort and bubble sort are among the most direct approaches to sorting data. They are what humans usually adopt when dealing with a fairly small dataset, such as arranging playing cards. There's also selection sort, which, just for the sake of this experiment, is excluded. However, as was said, these algorithms are slow, with theoretical time complexities of O(n²).
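For reference, selection sort follows the same direct spirit; a minimal sketch (not part of the timings below) could look like this:

def selection_sort(nums):
    for i in range(len(nums)):
        # find the smallest remaining value and swap it into position i
        smallest = i
        for j in range(i + 1, len(nums)):
            if nums[j] < nums[smallest]:
                smallest = j
        nums[i], nums[smallest] = nums[smallest], nums[i]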
def merge(nums, left, mid, right):
    # merge the two sorted halves nums[left:mid] and nums[mid:right]
    temp1 = nums[left:mid]
    temp2 = nums[mid:right]
    i = j = 0
    k = left
    while i < len(temp1) and j < len(temp2):
        if temp1[i] < temp2[j]:
            nums[k] = temp1[i]
            i += 1
        else:
            nums[k] = temp2[j]
            j += 1
        k += 1
    # copy over whatever remains of either half
    while i < len(temp1):
        nums[k] = temp1[i]
        i += 1
        k += 1
    while j < len(temp2):
        nums[k] = temp2[j]
        j += 1
        k += 1
def merge_sort(nums, left, right):
    # sort the half-open interval nums[left:right]
    if right - left <= 1:
        return
    mid = (right + left) // 2
    merge_sort(nums, left, mid)
    merge_sort(nums, mid, right)
    merge(nums, left, mid, right)
def quick_sort(nums, left, right):
    # sort the closed interval nums[left..right], using the last
    # element as the pivot (Lomuto partition scheme)
    if left >= right:
        return
    pivot = right
    i = left
    for j in range(left, right):
        if nums[j] < nums[pivot]:
            nums[i], nums[j] = nums[j], nums[i]
            i += 1
    nums[pivot], nums[i] = nums[i], nums[pivot]
    quick_sort(nums, left, i-1)
    quick_sort(nums, i+1, right)
def tim_sort(nums):
    size = 64
    if len(nums) <= size:
        insertion_sort(nums)
        return
    part = nums[:size]
    insertion_sort(part)
    nums[:size] = part
    # slicing copies the list, so sort the copy and write it back in place;
    # recursing on nums[size:] directly would sort a copy that gets thrown away
    rest = nums[size:]
    tim_sort(rest)
    nums[size:] = rest
    merge(nums, 0, size, len(nums))
The algorithms merge sort, quick sort, and tim sort are defined by the functions merge_sort, quick_sort, and tim_sort, respectively, where merge is just a helper function used by both merge_sort and tim_sort. These algorithms may be overkill for a human sorting a small dataset by hand; however, they prove to be significantly faster in theory over large datasets with the use of computers.
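Before timing anything, it's worth a quick sanity check that each function actually sorts. Comparing against Python's sorted() on a small random list catches subtle pitfalls like the one flagged in tim_sort's comment, where sorting a slice copy does nothing unless it is written back (check is just an illustrative name):

check = [randint(-1000, 1000) for _ in range(500)]

sample = check.copy()
merge_sort(sample, 0, len(sample))
assert sample == sorted(check)

sample = check.copy()
tim_sort(sample)
assert sample == sorted(check)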
def python_sort(nums):
    nums.sort()
In Python, we can sort a dataset with the built-in sorted() function or the list.sort() method. The latter is used here because it sorts the list in place, just like our other functions.
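The practical difference between the two is that sorted() returns a new sorted list and leaves the original untouched, while .sort() sorts the list in place and returns None:

nums = [3, 1, 2]
print(sorted(nums))  # [1, 2, 3] -- a new list
print(nums)          # [3, 1, 2] -- original unchanged
print(nums.sort())   # None -- sorted in place instead
print(nums)          # [1, 2, 3]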
sample = data.copy()
function_timer(insertion_sort, sample)

sample = data.copy()
function_timer(bubble_sort, sample)

sample = data.copy()
function_timer(merge_sort, sample, 0, len(sample))

sample = data.copy()
function_timer(quick_sort, sample, 0, len(sample)-1)

sample = data.copy()
function_timer(tim_sort, sample)

sample = data.copy()
function_timer(python_sort, sample)
To objectively compare the time efficiency of our sorting algorithms, every algorithm must be run on the same data. At the same time, we must avoid handing any algorithm an already sorted dataset, so each one is applied to a temporary variable sample, which is a fresh copy of our constant data before every run.
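As an aside, the standard library's timeit module is built for exactly this kind of measurement; a rough equivalent of one run, copying the data inside the timed call, might look like this:

from timeit import timeit

# time one bubble_sort run on a fresh copy of data;
# the copy happens inside the timed lambda, so it adds slight overhead
elapsed = timeit(lambda: bubble_sort(data.copy()), number=1)
print(f"bubble_sort: {elapsed} seconds")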
insertion_sort
processing time: 17.423025131225586 seconds
bubble_sort
processing time: 18.802334308624268 seconds
merge_sort
processing time: 0.08085751533508301 seconds
quick_sort
processing time: 0.05469846725463867 seconds
tim_sort
processing time: 0.38039231300354004 seconds
python_sort
processing time: 0.0 seconds
The results we got were more or less what theory predicted.
Insertion sort came out a bit faster than bubble sort, though it may be otherwise with a different dataset. Moreover, this difference in time is insignificant if we dramatically increase the size of our dataset.
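One concrete source of that data dependence: on an already sorted list, insertion sort finishes in a single pass (its O(n) best case) because its inner loop never runs, while our bubble_sort has no early-exit flag and always completes its full quadratic sweep. You can observe this by reusing the timer on presorted data (presorted is just an illustrative name):

presorted = sorted(data)

sample = presorted.copy()
function_timer(insertion_sort, sample)  # near-instant: inner while never runs

sample = presorted.copy()
function_timer(bubble_sort, sample)     # still slow: every pass completes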
Merge sort, quick sort, and tim sort are significantly faster than insertion sort and bubble sort, as predicted. If anything, it was tim sort that didn't behave as expected.
Tim sort is a combination of merge sort and insertion sort, built on the idea that merge sort breaks a large dataset into smaller pieces on which insertion sort operates efficiently. Yet our tim sort ran slower than merge sort and quick sort. Why is that? My guesses as to where this deviation comes from are that (1) the dataset may not be large enough for tim sort, in the sense that the merge sort and insertion sort combination isn't fully exploited; (2) the size variable of tim sort deserves more careful tuning; or (3) the tim_sort function itself needs recoding. Guess (3) carries real weight here: our version merges one 64-element run with the entire remainder of the list at every level of recursion, so the total merging work grows roughly quadratically in n/size, whereas the real Timsort merges runs of comparable length.
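As a rough illustration of guess (3), here is a sketch of a more conventional bottom-up structure, reusing the insertion_sort and merge functions from above: sort every run of 64 elements, then merge neighboring runs pairwise, doubling the run width each pass. This is only a simplified skeleton of Timsort (the real algorithm detects natural runs and uses galloping merges), and tim_sort_v2 is just an illustrative name:

def tim_sort_v2(nums):
    size = 64
    n = len(nums)
    # sort each run of `size` elements with insertion sort
    for start in range(0, n, size):
        run = nums[start:start + size]
        insertion_sort(run)
        nums[start:start + size] = run
    # merge neighboring runs, doubling the run width each pass
    width = size
    while width < n:
        for left in range(0, n, 2 * width):
            mid = min(left + width, n)
            right = min(left + 2 * width, n)
            if mid < right:
                merge(nums, left, mid, right)
        width *= 2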
Lastly, we have the insanely fast Python sort. Why is it so fast? Does it run on a better algorithm? Python's built-in sort does in fact use the Timsort algorithm; it is where Timsort originated. What makes it significantly faster is that it is implemented in C, a low-level language, rather than in pure Python.
If there is one thing to learn from this experiment, it is that Python's built-in functions and methods are reliable and efficient. Instead of coding everything from scratch, try to explore and use the appropriate built-ins first.
P.S.: Should there be any corrections, feel free to message me. They shall be incorporated and credited accordingly.