Week 12: Constant-Time Access

Most of this week was review, but there was still some new content to be learned: constant-time access.  Somewhat surprisingly, Python lists have constant-time access when the index is known.  This is a huge revelation, because access time then does not depend on the size of the list.
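Out of curiosity, here is a small experiment I could run to see this for myself (the list sizes and the timeit approach are my own choices, not something from lecture):

    # A rough check that indexing a Python list takes about the same time
    # regardless of the list's size (my own sketch, not lecture code).
    from timeit import timeit

    small = list(range(100))
    large = list(range(1_000_000))

    t_small = timeit(lambda: small[50], number=1_000_000)
    t_large = timeit(lambda: large[500_000], number=1_000_000)
    print(t_small, t_large)   # the two times come out very close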


It would make sense, then, for every other data type, such as dictionaries, to be built on top of lists.  The biggest advantage of dictionaries - which is also a disadvantage of lists - is the presence of immutable keys that don't have to be integers, making access more 'meaningful'.  In a list of my courses, for instance, if I wanted to access my favourite course, I would want to use the term 'favourite' to access that course instead of memorizing the index value associated with it.
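Something like the following (the course names and keys here are made up for illustration):

    # A made-up example: accessing a course by a meaningful key
    # instead of remembering its position in a list.
    courses_list = ['CSC148', 'MAT137', 'PHL100']
    favourite = courses_list[0]    # have to remember that index 0 is the one I like

    courses_dict = {'favourite': 'CSC148', 'hardest': 'MAT137', 'breadth': 'PHL100'}
    favourite = courses_dict['favourite']    # the key says exactly what I mean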

Hash tables are useful for this reason.  They turn objects such as strings into integers that can be used as indices into a list.  This also explains the long-standing mystery of why we can't use mutable objects as keys in dictionaries.  When we create a key, it is hashed into an integer, which is a concrete value.  If we could later change the key so that it resembled another key instead, we would have two equal-looking objects with different hash values, and the dictionary would no longer know where to find them.
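A quick sketch of what hashing looks like in the interpreter (the exact integer printed will differ from run to run):

    # Strings (immutable) hash to integers; lists (mutable) refuse to.
    print(hash('favourite'))       # some large integer (it varies between runs)

    try:
        hash(['favourite'])
    except TypeError as error:
        print(error)               # "unhashable type: 'list'"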

Week 10/11: Sorting and Efficiency

To date, I had only been familiar with three types of sorting: bubble sort, selection sort, and insertion sort, all of which run in O(n^2) time.  That is to say, ignoring the speed of the computer on which the program is run, all of these sorting functions perform worse than a function that runs in O(n) time.  Another way to think of it: if you doubled the size of the list that needed to be sorted, the time needed to sort it would quadruple (in the worst case).

This week, I learned of two other types of sorting: quicksort and merge sort.  Both are big-O of n log n, which is much better than the previously known sorting functions, which were all big-O of n^2.  For especially large lists, quicksort and merge sort run noticeably faster.  This can largely be attributed to their recursive structure.  While all our past sorting methods involved passing over sections of the list numerous times, quicksort and merge sort reduce that work by partitioning the list and shrinking the number of items we need to deal with at a single time.  For instance, merge sort works by splitting the list in half, merge sorting each half, and then merging the sorted halves.  The list is split about log n times, and the merging at each level takes about n comparisons in total, giving us the n log n performance that we need.
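Here is a minimal merge sort sketch based on the description above; it's my own version rather than the code from lecture:

    def merge_sort(lst):
        """Return a new sorted list containing the items of lst."""
        if len(lst) <= 1:
            return lst[:]                    # base case: already sorted
        mid = len(lst) // 2
        left = merge_sort(lst[:mid])         # sort each half...
        right = merge_sort(lst[mid:])
        return merge(left, right)            # ...then merge the sorted halves

    def merge(left, right):
        """Merge two sorted lists into one sorted list."""
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        return result + left[i:] + right[j:]  # one side may have items left over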

In a world where data analysis involves working with unfathomably large lists, the performance of the sorting method we choose can save us a significant amount of time.  For instance, if I were using my code for scientific computation - say decoding the human genome - then there would be a huge difference between an n log n run time and an n^2 run time.  How large?  Depending on the volume of data I was working with, this could in fact be a matter of days.
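Some back-of-the-envelope arithmetic, with entirely made-up figures (say 10 million items and roughly 10^8 comparisons per second), gives a feel for the gap:

    # Made-up numbers: 10 million items, ~10**8 comparisons per second.
    import math

    n = 10_000_000
    per_second = 10 ** 8

    quadratic = n ** 2 / per_second               # about 1,000,000 seconds
    linearithmic = n * math.log2(n) / per_second  # about 2.3 seconds

    print(quadratic / 86400)   # roughly 11.5 days
    print(linearithmic)        # a couple of seconds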

Though today's post focused on quicksort and merge sort, it's worth pointing out that there are other sorting methods as well.  A notable example is timsort, Python's built-in sorting method, which adapts its behaviour to the list it is given (taking advantage, for instance, of sections that are already in order).


Week 9: Binary Search Trees and Performance

The types of list searching that I am most familiar with, from my time in CSC108, are linear search and binary search.  The first is big-O of n, meaning that its performance is proportional to the size of the list, making the search rather tedious.  Binary search is much less tedious, with log n performance.  The only stipulation with binary search is that the list must already be sorted.
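A sketch of binary search as I remember it (my own code, not the CSC108 version):

    def binary_search(lst, target):
        """Return an index of target in sorted list lst, or -1 if absent."""
        low, high = 0, len(lst) - 1
        while low <= high:
            mid = (low + high) // 2
            if lst[mid] == target:
                return mid
            elif lst[mid] < target:
                low = mid + 1    # target can only be in the right half
            else:
                high = mid - 1   # target can only be in the left half
        return -1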

The past few weeks have been all about presenting known concepts in the form of a tree, and this week that was done for binary search.  Aptly named, binary search trees are built from two classes: a node class and a wrapper class.  Each node contains a value and references to the nodes on its left and right - smaller values going to the left, larger values to the right - while the wrapper class keeps track of the root node.
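Roughly how I picture the two classes (the names and details are my own, not the exact lecture code):

    class BSTNode:
        """A node holding a value and references to its left and right children."""
        def __init__(self, value):
            self.value = value
            self.left = None     # values less than self.value go here
            self.right = None    # values greater than self.value go here

    class BST:
        """A wrapper class that keeps track of the root node."""
        def __init__(self):
            self.root = None

        def insert(self, value):
            """Insert value into the tree, preserving the ordering property."""
            if self.root is None:
                self.root = BSTNode(value)
                return
            node = self.root
            while True:
                if value < node.value:
                    if node.left is None:
                        node.left = BSTNode(value)
                        return
                    node = node.left
                else:
                    if node.right is None:
                        node.right = BSTNode(value)
                        return
                    node = node.right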

Binary search trees, I found, make searching for a particular value or set of values remarkably efficient.  Knowing that there are two paths from each node, one leading to greater values and one leading to lesser values, dramatically cuts down on search time.  However, there are also many nuances to using binary search trees.  When deleting a value, several cases have to be considered, from nodes that have no children, to nodes that have one child, to nodes that are the root of the entire tree.   Though working with them requires more extensive coding, I certainly find them more efficient than my previous methods.
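For instance, a search only ever follows one of the two paths at each node; here is a sketch that builds on the node class above:

    def contains(node, value):
        """Return True if value is somewhere in the subtree rooted at node."""
        if node is None:
            return False                        # fell off the tree: value isn't here
        if value == node.value:
            return True
        elif value < node.value:
            return contains(node.left, value)   # only the lesser path
        else:
            return contains(node.right, value)  # only the greater path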

Week 8: Linked Lists and More Trees

This week focused on the dualism of linked lists, which can be regarded both as a nested set of lists and as a chain of nodes, each referring to the next.

The former type of linked list is of the sort I was already familiar with - it is not too different, after all, from a list with only two elements.  The latter type was a novel concept.  In this view, a linked list is a chain of nodes, each of which contains two pieces of information - a value and a reference to the next node in the chain.  This way of looking at linked lists is useful, as it allows the nodes to be treated as separate elements that can be added or removed as needed.  But all good things come with caveats, it seems, because there is no way of telling where the linked list begins - without a wrapper class, that is.  A wrapper class keeps track of the starting node of the linked list, as well as some other details such as its size.
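A sketch of the node-and-wrapper view, with my own naming:

    class Node:
        """One link in the chain: a value plus a reference to the next node."""
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node

    class LinkedList:
        """A wrapper that remembers where the chain starts and how long it is."""
        def __init__(self):
            self.front = None    # the first node, or None if the list is empty
            self.size = 0

        def prepend(self, value):
            """Add a new node to the front of the chain."""
            self.front = Node(value, self.front)
            self.size += 1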

During the lab this week, I realized, upon being prompted by a question, that linked lists are a remarkably efficient way to emulate queues. Keeping three references - one to the whole chain of items, one to the front of the queue, and one to the back - allows the queue's prepend and decapitate operations to run in constant time.  The trick to working with linked list methods, I realized, involves using recursion and decreasing the index you are working with each time (as the passed argument shrinks in size).
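Here is roughly how I picture such a queue; the class and method names are my own, standing in for the lab's prepend and decapitate:

    class _Node:
        def __init__(self, value):
            self.value = value
            self.next = None

    class LinkedQueue:
        """A queue built from linked nodes, with references to both ends."""
        def __init__(self):
            self.front = None    # node that leaves next
            self.back = None     # node that was added most recently

        def enqueue(self, value):
            """Add value at the back in constant time."""
            new = _Node(value)
            if self.back is None:          # queue was empty
                self.front = self.back = new
            else:
                self.back.next = new
                self.back = new

        def dequeue(self):
            """Remove and return the front value in constant time (queue must be non-empty)."""
            node = self.front
            self.front = node.next
            if self.front is None:         # queue became empty
                self.back = None
            return node.value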

Week 7: Recursion, a Summary

After many weeks spent trying to understand recursion, brows furrowed in befuddlement, today is the day we summarize recursion.

Recursion, in short, is the strategy of breaking a problem down into smaller instances of itself, all handled by a single, self-referential function. A convoluted problem can sometimes be reduced to a solution only a line or two long.

A recursive function has a base case and a general case; the base case is first employed at the deepest point in the input, that is, the point at which we can no longer recurse any further.  The first value is returned there, which allows every recursive call before it to return a value built from the return values beneath it.  Aside from that deepest point, all the other recursive calls proceed according to the general case.
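A small example of the two cases at work (my own, summing a nested list):

    def nested_sum(obj):
        """Return the sum of all numbers in a possibly nested list."""
        if not isinstance(obj, list):
            return obj                                  # base case: a plain number
        return sum(nested_sum(item) for item in obj)    # general case: recurse on each item

    print(nested_sum([1, [2, 3], [[4], 5]]))            # 15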

The result of this approach, which is both remarkably simple in appearance and difficult to grasp on an intellectual level, is a function with emergent properties, accomplishing in a single line of code what, in our naivete, we expect would require several lines of code.

Despite its simplicity, there are also certain nuances to recursion: for one, an explicit base case is not strictly necessary when the general case is structured in a way that will not exceed the maximum recursion depth.  Many also err by failing to consider the most basic input (e.g. an empty list) and the importance of accommodating it (e.g. by appending [0] to the empty list before taking its max).
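A sketch of that empty-list pitfall and the [0] trick:

    def depth(lst):
        """Return the nesting depth of a list ([] counts as depth 1)."""
        if not isinstance(lst, list):
            return 0
        return 1 + max([depth(item) for item in lst] + [0])   # + [0] guards against max([])

    print(depth([]))             # 1, instead of a ValueError from max([])
    print(depth([1, [2, [3]]]))  # 3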


Week 6: Trees

No, not the kind you see outside. 

Trees, in programming vernacular, refer to fractal-like structures with a root at the top, from which come branchlike connections called 'edges' that lead to further 'nodes'.  These nodes are anthropomorphized as 'parents' and 'children'; a parent node is one from which other child nodes stem.  Though in the midst of lectures I caught myself ruminating on the thoughts and feelings of childless nodes and the parentless root node, I still wasn't quite sure what purpose these trees serve, other than being fodder for daydreams.

As it later became apparent, trees are useful structures for representing multidimensional lists in a comprehensible form.  They can be traversed in various ways, so that the nodes are visited in different sequences.
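A small sketch of a tree node and one such traversal (preorder, which visits a node before its children); the names here are my own:

    class TreeNode:
        def __init__(self, value, children=None):
            self.value = value
            self.children = children or []   # an empty list means a leaf

    def preorder(node):
        """Return the values in the tree, visiting each node before its children."""
        result = [node.value]
        for child in node.children:
            result.extend(preorder(child))
        return result

    root = TreeNode('root', [TreeNode('left', [TreeNode('leaf')]), TreeNode('right')])
    print(preorder(root))   # ['root', 'left', 'leaf', 'right']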

Though we have yet to be taught the applications of trees, their mere structure suggests that they will be useful in list searching and list indexing.  For instance, finding the nth element by walking along a flat list requires passing over the n - 1 elements before it, a number that would be significantly lower were the desired element located in a branched tree.

Week 5: Towers of Hanoi

With our first assignment looming on the horizon (and haunting me in my dreams), I've resolved to dedicate this week's post to the Towers of Hanoi.

Though this puzzle bears resemblance to some child's toy one would see in the purgatory we know as the doctor's waiting room, it finds its origins in a legend set in ancient India.  Indeed, the original story tells the tale of monks who endeavour to solve the puzzle by moving a tower of discs from the first of its three pegs to the last.  Our assignment is a modified version in which Anne Hoi - whose name sounds like Hanoi, curiously enough - wants to move cheeses from one stool to another.

The recursive insight at the heart of this problem, for three stools at least, was that moving a stack of cheeses can always be broken down into moving the top n - 1 cheeses to the spare stool, then the bottom cheese to the target, then the n - 1 cheeses back on top of it.  The four-stool variation in our assignment is, however, a slightly more daunting challenge; the availability of an additional stool means the optimal move sequence cannot always be broken down into the same steps, or even the same number of steps.  I suspect that the crux of the issue is finding the optimal split of cheeses to move, possibly using recursion.
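The three-stool insight, written out as a sketch (the four-stool assignment needs more than this):

    def move_cheeses(n, source, spare, target):
        """Print the moves needed to shift n cheeses from source to target."""
        if n == 1:
            print(f'Move top cheese from {source} to {target}')
        else:
            move_cheeses(n - 1, source, target, spare)   # n - 1 cheeses out of the way
            move_cheeses(1, source, spare, target)       # bottom cheese to its destination
            move_cheeses(n - 1, spare, source, target)   # n - 1 cheeses back on top

    move_cheeses(3, 'first stool', 'middle stool', 'last stool')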