Unit 2 Overview

Unit 2 begins with hash functions, which are useful for mapping large data sets. We will continue with a broad introduction to object-oriented programming languages (Python is an example), covering objects, classes, subclasses, abstract data types, exceptions, and inheritance. Other algorithmic concepts covered are "Big O notation," divide and conquer, merge sort, orders of growth, and amortized analysis.

The next several lectures introduce effective problem-solving methods which rely on probability, statistical thinking, and simulations to solve both random and non-random problems. A background in probability is not assumed, and we will briefly cover basic concepts such as probability distributions, standard deviation, coefficient of variation, confidence intervals, linear regression, standard error, and plotting techniques. This will include an introduction to curve fitting, and we introduce the Python libraries numpy and pylab to add tools to create simulations, graphs, and predictive models.

We will spend some time on random walks and Monte Carlo simulations, a very powerful class of algorithms which invoke random sampling to model and compute mathematical or physical systems. The Monty Hall problem is used as an example of how to use simulations, and the knapsack problem introduces our discussion of optimization. Finally, we will begin looking at supervised and unsupervised machine learning, and then turn to data clustering.

At the end of Unit 2 there will be an exam covering all material (lectures, recitations, and problem sets) from the beginning of the course through More Optimization and Clustering.