Algorithm and Data Structure Theory Study

In this post, I summarize some important concepts about algorithms and data structures. The content can be a bit theoretical and requires some memorization.

Complexity

  • The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken in any instance of size n.

  • The best-case complexity of the algorithm is the function defined by the minimum number of steps taken in any instance of size n.

  • The average-case complexity of the algorithm is the function defined by the average number of steps over all instances of size n.


Big O Notation

The big O notation simplifies analysis by ignoring levels of detail that do not impact the comparison of algorithms.
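As an illustration of these definitions, consider linear search: its best case is O(1) (the key is at the front) and its worst case is O(n) (the key is absent). Big O keeps only this growth rate and drops constant factors. A minimal sketch:

```java
public class LinearSearch {
    // Returns the index of key in arr, or -1 if absent.
    // Best case: key is at index 0, one comparison  -> O(1).
    // Worst case: key is absent, n comparisons      -> O(n).
    // Average case (key present, uniform position)  -> O(n).
    static int search(int[] arr, int key) {
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] == key) return i;
        }
        return -1;
    }
}
```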


Two Types of Data Structures

  • Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps and hash tables.

    • Advantages: Constant-time access | Space efficiency | Memory locality
  • Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.

    • Advantages: More freedom | Simpler insertions and deletions | Moving pointers is easier and faster than moving the items themselves

Arrays

Operation           Unsorted Array   Sorted Array
Search(L, k)        O(n)             O(log n)
Insert(L, x)        O(1)             O(n)
Delete(L, x)        O(1)             O(n)
Successor(L, x)     O(n)             O(1)
Predecessor(L, x)   O(n)             O(1)
Minimum(L)          O(n)             O(1)
Maximum(L)          O(n)             O(1)
  • When deleting an element x from an unsorted array of size n, we can overwrite x's slot with the last element of the array and decrement n; since order need not be preserved, this takes O(1).
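That constant-time deletion trick can be sketched as follows (deleteAt is a hypothetical helper; the array is treated as having a logical size n):

```java
public class UnsortedArray {
    // Delete the element at index i from the first n slots of arr by
    // overwriting it with the last element and shrinking the logical size.
    // Runs in O(1); element order is not preserved.
    static int deleteAt(int[] arr, int n, int i) {
        arr[i] = arr[n - 1];
        return n - 1; // new logical size
    }
}
```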

Linked List

Operation           Singly Unsorted   Doubly Unsorted   Singly Sorted   Doubly Sorted
Search(L, k)        O(n)              O(n)              O(n)            O(n)
Insert(L, x)        O(1)              O(1)              O(n)            O(n)
Delete(L, x)        O(n)              O(1)              O(n)            O(1)
Successor(L, x)     O(n)              O(n)              O(1)            O(1)
Predecessor(L, x)   O(n)              O(n)              O(n)            O(1)
Minimum(L)          O(n)              O(n)              O(1)            O(1)
Maximum(L)          O(n)              O(n)              O(1)            O(1)
  • For deletion, assume we are given a pointer x to the item to be deleted. In a singly linked list we still need a pointer to the element pointing to x (the node whose next pointer must be patched), which takes O(n) to find. In a doubly linked list that predecessor is reachable through x.prev, so deletion is always O(1).

  • To find the maximum element in a singly sorted list, we can maintain a separate pointer to the list tail. Updating this pointer when the tail is deleted requires finding the predecessor, but deletion in a singly sorted list already costs O(n), so the asymptotic cost of deletion does not change.


Implementation of singly linked list

class Node {
    int val;
    Node next;
    Node(int val) {
        this.val = val;
    }
}

class MyLinkedList {
    int size;
    Node dummyHead;

    public MyLinkedList() {
        size = 0;
        dummyHead = new Node(0);
    }
    
    public int get(int index) {
        if (index < 0 || index >= size) return -1;
        Node currentNode = dummyHead;
        for (int i = 0; i <= index; i++) {
            currentNode = currentNode.next;
        }
        return currentNode.val;
    }
    
    public void addAtHead(int val) {
        Node currentNode = dummyHead;
        Node newNode = new Node(val);
        newNode.next = currentNode.next;
        currentNode.next = newNode;
        size++;
    }
    
    public void addAtTail(int val) {
        Node currentNode = dummyHead;
        for (int i = 0; i < size; i++) {
            currentNode = currentNode.next;
        }
        Node newNode = new Node(val);
        currentNode.next = newNode;
        size++;
    }
    
    public void addAtIndex(int index, int val) {
        if (index < 0 || index > size) return;
        Node currentNode = dummyHead;
        for (int i = 0; i < index; i++) {
            currentNode = currentNode.next;
        }
        Node newNode = new Node(val);
        newNode.next = currentNode.next;
        currentNode.next = newNode;
        size++;
    }
    
    public void deleteAtIndex(int index) {
        if (index < 0 || index >= size) return;
        Node currentNode = dummyHead;
        for (int i = 0; i < index; i++) {
            currentNode = currentNode.next;
        }
        currentNode.next = currentNode.next.next;
        size--;
    }
}

Implementation of doubly linked list

class Node {
    int val;
    Node prev, next;
    Node(int val) {
        this.val = val;
    }
}

class MyLinkedList {
    int size;
    Node dummyHead;
    Node dummyTail;

    public MyLinkedList() {
        size = 0;
        dummyHead = new Node(0);
        dummyTail = new Node(0);
        dummyHead.next = dummyTail;
        dummyTail.prev = dummyHead;
    }
    
    public int get(int index) {
        if (index < 0 || index >= size) return -1;
        Node currentNode;
        if (index >= size / 2) {
            currentNode = dummyTail;
            for (int i = 0; i < size - index; i++) {
                currentNode = currentNode.prev;
            } 
        } else {
            currentNode = dummyHead;
            for (int i = 0; i <= index; i++) {
                currentNode = currentNode.next;
            }
        }
        return currentNode.val;
    }
    
    public void addAtHead(int val) {
        Node currentNode = dummyHead;
        Node newNode = new Node(val);
        newNode.next = currentNode.next;
        newNode.next.prev = newNode;
        currentNode.next = newNode;
        newNode.prev = currentNode;
        size++;
    }
    
    public void addAtTail(int val) {
        Node currentNode = dummyTail;
        Node newNode = new Node(val);
        newNode.prev = currentNode.prev;
        newNode.prev.next = newNode;
        currentNode.prev = newNode;
        newNode.next = currentNode;
        size++;
    }
    
    public void addAtIndex(int index, int val) {
        if (index < 0 || index > size) return;
        Node currentNode;
        if (index >= size / 2) {
            currentNode = dummyTail;
            for (int i = 0; i < size - index + 1; i++) {
                currentNode = currentNode.prev;
            }
        } else {
            currentNode = dummyHead;
            for (int i = 0; i < index; i++) {
                currentNode = currentNode.next;
            }
        }
        Node newNode = new Node(val);
        newNode.next = currentNode.next;
        newNode.next.prev = newNode;
        currentNode.next = newNode;
        newNode.prev = currentNode;
        size++;
    }
    
    public void deleteAtIndex(int index) {
        if (index < 0 || index >= size) return;
        Node currentNode;
        if (index >= size / 2) {
            currentNode = dummyTail;
            for (int i = 0; i < size - index; i++) {
                currentNode = currentNode.prev;
            }
        } else {
            currentNode = dummyHead;
            for (int i = 0; i <= index; i++) {
                currentNode = currentNode.next;
            }
        }
        currentNode.next.prev = currentNode.prev;
        currentNode.prev.next = currentNode.next;
        size--;
    }
}

Implementation of Binary Search Tree

class Node {
    int val;
    Node left, right;

    public Node(int val) {
        this.val = val;
        left = right = null;
    }
}

class BST {
    Node root;

    // Insert elements
    public Node insert(Node current, int val) {
        if (current == null) return new Node(val);
        if (val < current.val) {
            current.left = insert(current.left, val);
        } else if (val > current.val) {
            current.right = insert(current.right, val);
        }
        return current;
    }

    // Insert Helper Function
    public void insertHelper(int val) {
        root = insert(root, val);
    }

    // Search Function
    public boolean search(Node current, int val) {
        if (current == null) return false;
        if (current.val == val) return true;
        if (val < current.val) {
            return search(current.left, val);
        } else {
            return search(current.right, val);
        }
    }

    // Search Helper Function
    public void searchHelper(int val) {
        System.out.println(search(root, val));
    }

    // FindMax function
    public Node findMax(Node current) {
        while (current.right != null) {
            current = current.right;
        }
        return current;
    }

    // Delete Function
    public Node delete(Node current, int val) {
        if (current == null) return null;
        if (val < current.val) {
            current.left = delete(current.left, val);
        } else if (val > current.val) {
            current.right = delete(current.right, val);
        } else {
            if (current.left == null && current.right == null) {
                current = null;
            } else if (current.left == null) {
                current = current.right;
            } else if (current.right == null) {
                current = current.left;
            } else {
                Node temp = findMax(current.left);
                current.val = temp.val;
                current.left = delete(current.left, temp.val);
            }
        }
        return current;
    }

    // Delete Helper Function
    public void deleteHelper(int val) {
        root = delete(root, val);
    }

    // In-order Traversal
    public void inOrder(Node current) {
        if (current == null) return;
        inOrder(current.left);
        System.out.print(current.val + " ");
        inOrder(current.right);
    }
}

Hash Tables

  • Closed addressing hashing (open hashing) handles collisions by storing all elements with the same hashed key in one table entry, e.g., in a linked list (chaining).

  • Open addressing hashing (closed hashing) handles collisions by storing subsequent elements with the same hashed key in different table entries (probing).


Operation           Hash Table (expected)   Hash Table (worst case)
Search(L, k)        O(n/m)                  O(n)
Insert(L, x)        O(1)                    O(1)
Delete(L, x)        O(1)                    O(1)
Successor(L, x)     O(n + m)                O(n + m)
Predecessor(L, x)   O(n + m)                O(n + m)
Minimum(L)          O(n + m)                O(n + m)
Maximum(L)          O(n + m)                O(n + m)
  • The bounds above assume chaining with doubly linked lists in a hash table with m buckets storing n items; the expected chain length is n/m.
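A minimal sketch of closed addressing with chaining (the bucket count m and the modular hash function are illustrative; buckets hold java.util.LinkedList chains):

```java
import java.util.LinkedList;

// Chaining: each bucket holds a linked list of all keys that hash to it.
public class ChainedHashTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) buckets[i] = new LinkedList<>();
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.length);
    }

    void insert(int key) {
        buckets[hash(key)].addFirst(key); // O(1)
    }

    boolean search(int key) {
        return buckets[hash(key)].contains(key); // expected O(n/m)
    }
}
```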

Heap

  • A heap is a simple and elegant data structure for efficiently supporting the priority queue operations insert and extract-min/max. Heaps work by maintaining a partial order on the set of elements that is weaker than the sorted order.

  • A heap-labeled tree is defined to be a binary tree such that the key labeling of each node dominates the key labeling of each of its children.

  • In a min-heap, a node dominates its children by containing a smaller key than they do.

  • In a max-heap, parent nodes dominate by being bigger.

  • Heaps are mostly used to implement Priority Queues.


Heap Implementation

import java.util.Arrays;

public class MinHeap {
    int size;
    int maxSize;
    int[] heap;

    MinHeap(int maxSize) {
        if (maxSize <= 0) return;
        this.maxSize = maxSize;
        this.size = 0;
        this.heap = new int[maxSize + 1];
        Arrays.fill(heap, Integer.MAX_VALUE);
        this.heap[0] = Integer.MIN_VALUE;
    }

    MinHeap(int maxSize,int[] arr) {
        if (arr.length > maxSize) return;
        this.maxSize = maxSize;
        this.size = arr.length;
        this.heap = new int[maxSize + 1];
        Arrays.fill(heap, Integer.MAX_VALUE);
        this.heap[0] = Integer.MIN_VALUE;
        for (int i = 0; i < arr.length; i++) {
            this.heap[i + 1] = arr[i];
        }
        buildHeap();
    }

    int parent(int pos) {
        return pos / 2;
    }

    int leftChild(int pos) {
        return pos * 2;
    }

    int rightChild(int pos) {
        return pos * 2 + 1;
    }

    boolean isLeaf(int pos) {
        return pos > size / 2 && pos <= size;
    }

    void swap(int pos1, int pos2) {
        int temp = heap[pos1];
        heap[pos1] = heap[pos2];
        heap[pos2] = temp;
    }

    void minHeapify(int pos) {
        if (isLeaf(pos)) return;
        int smallest = pos;
        // Compare only against children that actually exist: when the heap
        // is full and size is even, the right child index can fall outside
        // the array, so bounds checks are needed.
        if (leftChild(pos) <= size && heap[leftChild(pos)] < heap[smallest]) {
            smallest = leftChild(pos);
        }
        if (rightChild(pos) <= size && heap[rightChild(pos)] < heap[smallest]) {
            smallest = rightChild(pos);
        }
        if (smallest != pos) {
            swap(pos, smallest);
            minHeapify(smallest);
        }
    }

    void insert(int element) {
        if (size >= maxSize) return;
        heap[++size] = element;
        int current = size;
        while (heap[parent(current)] > heap[current]) {
            swap(parent(current), current);
            current = parent(current);
        }
    }

    void buildHeap() {
        for (int i = size / 2; i >= 1; i--) {
            minHeapify(i);
        }
    }

    void removeMin() {
        if (size == 0) return;
        heap[1] = heap[size];
        heap[size--] = Integer.MAX_VALUE;
        minHeapify(1);
    }

    // Function to print the contents of the heap
    void display() {
        if (size == 0) {
            System.out.println("Empty Heap!");
            return;
        }
        System.out.println("PARENT" + "\t" + "LEFT" + "\t" + "RIGHT");
        if (size == 1) {
            System.out.println(" " + heap[1] + "\t\t");
            return;
        }
        for (int i = 1; i <= size / 2; i++) {
            // The right child may not exist for the last internal node.
            System.out.print(" " + heap[i] + "\t\t" + heap[2 * i] + "\t\t"
                    + (2 * i + 1 <= size ? String.valueOf(heap[2 * i + 1]) : ""));
            System.out.println();
        }
    }
}
  • Note that swap(int pos1, int pos2) takes heap indices as its parameters, not the elements themselves.

Runtime of Heap Operations

Operation                    Runtime
Find the minimum element     O(1)
Delete the minimum element   O(log n)
Insert an element            O(log n)
Build a heap                 O(n)

Heap Sort

  • Build the heap for n elements in time O(n)

  • Each step extracts and deletes the minimum element in time O(log n).

  • Iterate until the heap is empty.

  • In total, n iterations imply a total runtime of O(n log n).
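The steps above can be sketched with java.util.PriorityQueue, a binary min-heap (building from a collection heapifies in O(n), and each poll costs O(log n)):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class HeapSort {
    // Heap sort via a min-heap: build the heap in O(n), then extract the
    // minimum n times at O(log n) each, for O(n log n) overall.
    static List<Integer> sort(List<Integer> input) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(input); // O(n) build
        List<Integer> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            sorted.add(heap.poll()); // O(log n) per extraction
        }
        return sorted;
    }
}
```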


Selection Sort

private static void selection(int[] array) {
    for (int i = 0; i < array.length - 1; i++) {
        int min = i;
        for (int j = i + 1; j < array.length; j++) {
            if (array[j] < array[min]) min = j;
        }
        int temp = array[i];
        array[i] = array[min];
        array[min] = temp;
    }
}

Insertion Sort

private static void insertionSort(int[] array) {
    for (int i = 1; i < array.length; i++) {
        int j = i;
        while (j > 0 && array[j - 1] > array[j]) {
            int temp = array[j];
            array[j] = array[j - 1];
            array[j - 1] = temp;
            j--;
        }
    }
}

Merge Sort

Merge sort involves partitioning the elements into two groups, sorting each of the smaller problems recursively, and then interleaving the two sorted lists to totally order the elements. It is a classic divide-and-conquer algorithm but its primary disadvantage is the need for an auxiliary array when sorting arrays.

import java.util.ArrayList;

public class MergeSort {
    public ArrayList<Integer> mergeSort(ArrayList<Integer> myArr) {
        if (myArr.size() <= 1) return myArr;
        int mid = myArr.size() / 2;
        ArrayList<Integer> left = new ArrayList<Integer>();
        ArrayList<Integer> right = new ArrayList<Integer>();
        for (int i = 0; i < mid; i++) {
            left.add(myArr.get(i));
        }
        for (int i = mid; i < myArr.size(); i++) {
            right.add(myArr.get(i));
        }
        ArrayList<Integer> lSplit = mergeSort(left);
        ArrayList<Integer> rSplit = mergeSort(right);
        return merge(lSplit, rSplit);
    }

    public ArrayList<Integer> merge(ArrayList<Integer> left, ArrayList<Integer> right) {
        int i = 0;
        int j = 0;
        ArrayList<Integer> outcome = new ArrayList<Integer>();
        while (i < left.size() && j < right.size()) {
            if (left.get(i) < right.get(j)) {
                outcome.add(left.get(i++));
            } else {
                outcome.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            outcome.add(left.get(i++));
        }
        while (j < right.size()) {
            outcome.add(right.get(j++));
        }
        return outcome;
    }
}

Quick Sort

import java.util.ArrayList;
import java.util.Collections;

public class QuickSort {
    int partition(ArrayList<Integer> arr, int start, int end) {
        int pivot = arr.get(end);
        int index = start;
        for (int i = start; i < end; i++) {
            if (arr.get(i) < pivot) {
                Collections.swap(arr, i, index);
                index++;
            }
        }
        Collections.swap(arr, index, end);
        return index;
    }

    void quickSort(ArrayList<Integer> arr, int start, int end) {
        if (start < end) {
            int partition = partition(arr, start, end);
            quickSort(arr, start, partition - 1);
            quickSort(arr, partition + 1, end);
        }
    }
}

Topological Sort

import java.util.HashSet;
import java.util.LinkedList;

public class TopologicalSort {
    HashSet<Integer> visited;
    LinkedList<Integer>[] adjacencyList;
    LinkedList<Integer> myStack;
    int nodeNum;

    public TopologicalSort(int nodeNum) {
        visited = new HashSet<>();
        adjacencyList = new LinkedList[nodeNum];
        for (int i = 0; i < nodeNum; i++) {
            adjacencyList[i] = new LinkedList<>();
        }
        myStack = new LinkedList<>();
        this.nodeNum = nodeNum;
    }

    public void sort() {
        for (int i = 0; i < nodeNum; i++) {
            if (!visited.contains(i)) {
                sortUntil(i);
            }
        }
    }

    public void sortUntil(int node) {
        visited.add(node);
        for (int n : adjacencyList[node]) {
            if (!visited.contains(n)) {
                sortUntil(n);
            }
        }
        myStack.push(node);
    }

    public void addEdge(int parent, int child) {
        adjacencyList[parent].add(child);
    }
}

Backtracking

  • Backtracking is a systematic way to iterate through all the possible configurations of a search space. These configurations may represent all possible arrangements of objects. Backtracking ensures correctness by enumerating all possibilities, and efficiency by never visiting a state more than once.
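As a sketch, backtracking can enumerate every subset of a set by choosing or skipping each element and then undoing the choice (the "backtrack" step) before trying the alternative:

```java
import java.util.ArrayList;
import java.util.List;

public class Subsets {
    // Enumerate all 2^n subsets of nums by backtracking.
    static List<List<Integer>> subsets(int[] nums) {
        List<List<Integer>> result = new ArrayList<>();
        backtrack(nums, 0, new ArrayList<>(), result);
        return result;
    }

    private static void backtrack(int[] nums, int start,
                                  List<Integer> current,
                                  List<List<Integer>> result) {
        result.add(new ArrayList<>(current)); // record this configuration
        for (int i = start; i < nums.length; i++) {
            current.add(nums[i]);               // choose
            backtrack(nums, i + 1, current, result);
            current.remove(current.size() - 1); // un-choose (backtrack)
        }
    }
}
```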

Dynamic Programming

  • Dynamic programming gives us a way to design custom algorithms that systematically search all possibilities while storing results to avoid recomputing them. By storing the consequences of all possible decisions and using this information in a systematic way, the total amount of work is minimized. Essentially, dynamic programming trades space for time.
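A minimal top-down dynamic programming (memoization) sketch, using Fibonacci numbers as the example: each subproblem is computed once and cached, so the run time drops from exponential to O(n).

```java
import java.util.HashMap;
import java.util.Map;

public class Fibonacci {
    // Memoization: store each fib(i) the first time it is computed.
    private static final Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;
        Long cached = memo.get(n);
        if (cached != null) return cached; // subproblem already solved
        long result = fib(n - 1) + fib(n - 2);
        memo.put(n, result);
        return result;
    }
}
```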

Trie

  • A trie is a tree data structure used for locating specific keys within a set. It is also known as a prefix tree or digital tree.

  • Tries can be used to build auto-complete and spell-checkers.

  • It allows efficient storage of words with common prefixes. Each node in a trie represents a character, and a flag on the node indicates whether a string terminates there.

  • There are three main functions: insert(word), search(word), startsWith(prefix).

  • Main advantage of trie: startsWith() is very efficient.

  • The time complexity of insert, search, and startsWith is O(k), where k is the length of the string.

Implementation

import java.util.HashMap;

class Node {
    HashMap<Character, Node> children;
    boolean endOfWord;
    Node() {
        this.children = new HashMap<Character, Node>();
        this.endOfWord = false;
    }
}

class Trie {
    Node root;

    public Trie() {
        root = new Node();
    }

    public void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!cur.children.containsKey(c)) {
                cur.children.put(c, new Node());
            }
            cur = cur.children.get(c);
        }
        cur.endOfWord = true;
    }

    public boolean search(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!cur.children.containsKey(c)) return false;
            cur = cur.children.get(c);
        }
        return cur.endOfWord;
    }

    public boolean startsWith(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (!cur.children.containsKey(c)) return false;
            cur = cur.children.get(c);
        }
        return true;
    }
}

Amortized Analysis

  • Amortized analysis is used for algorithms where an occasional operation is very slow, but most of the other operations are faster. In amortized analysis, we analyze a sequence of operations and guarantee a worst-case average time that is lower than the worst case of a particular expensive operation. (GeeksforGeeks)

  • A good example is the dynamic array. A dynamic array doubles its size whenever it becomes full: we allocate a new array, typically twice the size of the old one, copy the contents over, and free the old array. All of these steps can take O(n). A simple analysis would conclude that the worst-case cost of n insertions is n * O(n) = O(n^2). This is a valid upper bound, but not a tight one, since not every insertion takes Θ(n) time.

  • With amortized analysis, we instead sum the cost of all n insertions and divide by the number of insertions; this amortized cost shows that dynamic array insertion is actually O(1) per operation.
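The doubling scheme above can be sketched as follows (a hypothetical minimal dynamic array, not java.util.ArrayList):

```java
import java.util.Arrays;

// Append is O(n) when the backing array is full (copy into a doubled
// array) but O(1) otherwise, so n appends cost O(n) in total --
// O(1) amortized per append.
public class DynamicArray {
    private int[] data = new int[1];
    private int size = 0;

    void append(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // occasional O(n)
        }
        data[size++] = value; // usual O(1)
    }

    int get(int i) { return data[i]; }
    int size() { return size; }
}
```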


Bit Manipulation

  • Shifting
    • For positive numbers:

      • Left Shift: Multiplies the number by 2.
      • Right Shift: Divides the number by 2, rounding down (3 -> 1).
    • For negative numbers:

      • Logical Right Shift: Fills the vacated bits with 0 (yields a meaningless number).
      • Arithmetic Right Shift: Fills the vacated bits with the original sign bit, rounding down (-5 -> -3).
  • Masking
    • Get the c-th bit: ((1 << c) & x) != 0
    • Set the c-th bit to 1: (1 << c) | x
    • Set the c-th bit to 0: (~(1 << c)) & x
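These masking operations can be sketched in Java as follows (bit positions are 0-indexed; in Java, >> is the arithmetic right shift and >>> the logical one):

```java
public class BitOps {
    // Test whether the c-th bit of x is 1.
    static boolean getBit(int x, int c) { return ((1 << c) & x) != 0; }

    // Return x with the c-th bit set to 1.
    static int setBit(int x, int c)     { return (1 << c) | x; }

    // Return x with the c-th bit cleared to 0.
    static int clearBit(int x, int c)   { return ~(1 << c) & x; }
}
```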

Singleton Design Pattern

  • The singleton design pattern is used when we need to ensure that only one object of a particular class is ever created. All further references to this object refer to the same underlying instance.

  • Singleton classes are used for logging, driver objects, caching, thread pools, and database connections.

  • Issues: Coupling issue / Concurrency issue
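A minimal lazily initialized singleton sketch, using a hypothetical Logger class; getInstance is synchronized to sidestep the concurrency issue mentioned above (two threads racing to create two instances):

```java
public class Logger {
    private static Logger instance;

    // Private constructor: no outside instantiation.
    private Logger() { }

    // Synchronized lazy initialization: the first caller creates the
    // instance; every later call returns the same object.
    public static synchronized Logger getInstance() {
        if (instance == null) {
            instance = new Logger();
        }
        return instance;
    }
}
```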


Factory Design Pattern

  • Factory design pattern deals with the problem of creating objects without having to specify the exact class of the object that will be created. This is done by creating objects by calling a factory method.

  • We use it when a method returns one of several possible classes that share a common parent class.

  • All potential classes are in the same subclass hierarchy.
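A minimal factory method sketch, with hypothetical Shape subclasses; the caller names only the common parent type, never the concrete class:

```java
interface Shape { double area(); }

class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Square implements Shape {
    private final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

// The factory method decides which concrete subclass to instantiate;
// both candidates live in the same subclass hierarchy under Shape.
class ShapeFactory {
    static Shape create(String type, double dim) {
        switch (type) {
            case "circle": return new Circle(dim);
            case "square": return new Square(dim);
            default: throw new IllegalArgumentException(type);
        }
    }
}
```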


Union Find

  • A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint subsets.

  • A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

    • Find: Determine which subset a particular element is in. This can also be used for determining if two elements are in the same subset.
    • Union: Join two subsets into a single subset.
  • For naïve linking, a Union or Find operation can take O(n) time in the worst case, where n is the number of elements.

    • Proof: Find takes proportional time to the height of the tree. In the worst case, the tree can be degenerated to a list. Union(1, 2), Union(2, 3), Union(3, 4)…..
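A naïve-linking union-find sketch matching the description above; without union by rank or path compression, the chain of unions in the proof degenerates the tree into a list, making find O(n):

```java
public class UnionFind {
    // Parent pointers form trees; a root points to itself.
    private final int[] parent;

    UnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    // Walk up to the root: proportional to the tree height, O(n) worst case.
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Naive linking: attach one root directly under the other.
    void union(int a, int b) {
        parent[find(a)] = find(b);
    }

    boolean connected(int a, int b) {
        return find(a) == find(b);
    }
}
```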

QuickSelect

  • Quickselect can be used to find the k-th largest element in an array in expected O(n) time.

  • The average time complexity is O(n)

    • Assume we keep on executing quick select on half of the array: n + (n / 2) + (n / 4) + … = 2n
  • The worst case time complexity is O(n^2)

    • Assume each partition eliminates only one element; then the algorithm must run n times.

public int findKthLargest(int[] nums, int k) {
    return quickSelect(nums, 0, nums.length - 1, nums.length - k);
}

public void swap(int[] nums, int left, int right) {
    int temp = nums[left];
    nums[left] = nums[right];
    nums[right] = temp;
}

public int quickSelect(int[] nums, int left, int right, int k) {
    int ptr = left;
    int pivot = nums[right];
    for (int i = left; i < right; i++) {
        if (nums[i] < pivot) {
            swap(nums, i, ptr);
            ptr++;
        }
    }
    swap(nums, ptr, right);
    if (k < ptr) return quickSelect(nums, left, ptr - 1, k);
    if (k > ptr) return quickSelect(nums, ptr + 1, right, k);
    return nums[k];
}

Balanced Binary Search Tree

  • AVL Tree (Adelson-Velsky and Landis Tree)

    • It is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. If at any time they differ by more than one, re-balancing is done to restore this property.
    • Search / Delete / Insert: O(log n)
  • Red Black Tree

    • A node is either red or black.
    • The root and leaves (NIL) are black.
    • A red node cannot have a red child.
    • All paths from a node to its NIL descendants contain the same number of black nodes.
    • As a consequence, the longest path is no more than twice the length of the shortest path.
    • Search / Delete / Insert: O(log n)
  • Red Black Trees provide faster insertion and removal operations than AVL trees as fewer rotations are done due to relatively relaxed balancing.


Reference

  1. GeeksforGeeks 2022, Introduction to Amortized Analysis, GeeksforGeeks, viewed 11 November 2022, https://www.geeksforgeeks.org/introduction-to-amortized-analysis/.

  2. Skiena, S 2012, The Algorithm Design Manual, 2nd edn, Springer Publishing, New York, USA.

  3. SoftwareTestingHelp 2022, What Is A Heap Data Structure In Java, SoftwareTestingHelp, viewed 27 November 2022, https://www.softwaretestinghelp.com/heap-data-structure-in-java/.