library(ggplot2)
library(tidyr)
# plot log(n) vs n vs nlog(n) vs 2^n vs n^2
n <- seq(1, 10, 0.1)
df <- data.frame(
log_n = log(n),
n = n,
n_times_log_n = n * log(n),
n_pow_2 = n^2,
two_pow_n = 2^n
)
# reshape to long format: one row per (n, function) pair
df <- df %>% gather(fun, time, -n)
ggplot(df, aes(x = n, y = time, color = fun)) +
geom_line() +
theme_minimal() +
theme(legend.position = "top")
Lecture 25 - Nov 2, 2023
Summary
In this lecture, we start discussing how to evaluate the performance of algorithms based on time complexity.
Last lecture
Backtracking.
Today
Complexity analysis - time.
Algorithm
An algorithm is the set of steps used to solve a problem, for example finding a number in a list.
- Linear search: We compare the target with each element in turn. In the worst case, the number we are searching for is the last one, so we need to make \(n\) comparisons.
- Binary search: We assume the array is sorted. We start looking in the middle. If the number we are searching for is smaller than the middle element, we look left. Otherwise, we look right. Every comparison discards half of the remaining array. In the worst case, we keep halving until only one element remains. If that takes \(x\) halvings, then \(\frac{n}{2^x} = 1 \implies n = 2^x \implies x = \log_2(n)\), so the number of comparisons is about \(\log_2(n)\).
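The two searches above can be sketched in R to count worst-case comparisons. This is a minimal sketch, not code from the lecture: the helper names `linear_search_count` and `binary_search_count` and the test vector `1:1024` are assumptions made for illustration.

```r
# Count the comparisons linear search makes (hypothetical helper).
linear_search_count <- function(x, target) {
  comparisons <- 0
  for (value in x) {
    comparisons <- comparisons + 1
    if (value == target) break
  }
  comparisons
}

# Count the probes binary search makes on a sorted vector (hypothetical helper).
binary_search_count <- function(x, target) {
  lo <- 1
  hi <- length(x)
  steps <- 0
  while (lo <= hi) {
    steps <- steps + 1
    mid <- (lo + hi) %/% 2      # middle index of the remaining range
    if (x[mid] == target) {
      break
    } else if (target < x[mid]) {
      hi <- mid - 1             # discard the right half
    } else {
      lo <- mid + 1             # discard the left half
    }
  }
  steps
}

x <- 1:1024                                     # sorted input, n = 1024
linear_worst <- linear_search_count(x, 1024)    # target is the last element
binary_worst <- binary_search_count(x, 1024)
```

With this input, linear search makes 1024 comparisons while binary search needs 11 probes, that is \(\lfloor \log_2(n) \rfloor + 1\), which is on the order of \(\log_2(n)\).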
The time it takes to run a program depends on your computer's power, the nature of the data, the compiler, the programming language, and the input size.
However, your choice of algorithm matters! If you get a faster computer to run a slow algorithm, then as the input size grows, the computation will eventually take more time than a quicker algorithm would have.
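A rough numerical illustration of this point, using the worst-case counts for linear and binary search. The machine speeds below are assumed round numbers chosen for the example, not measurements, and the names `fast_ops_per_sec`, `slow_ops_per_sec`, `sizes`, and `timing` are hypothetical.

```r
# Assumed, illustrative speeds (not from the lecture):
fast_ops_per_sec <- 1e9   # fast computer, running linear search
slow_ops_per_sec <- 1e6   # computer 1000x slower, running binary search

sizes <- 10^(3:9)         # input sizes from one thousand to one billion
timing <- data.frame(
  n = sizes,
  linear_on_fast = sizes / fast_ops_per_sec,        # worst case: n comparisons
  binary_on_slow = log2(sizes) / slow_ops_per_sec   # worst case: about log2(n) comparisons
)

# TRUE where the better algorithm on the slower machine finishes first.
timing$binary_wins <- timing$binary_on_slow < timing$linear_on_fast
print(timing)
```

Under these assumptions the faster machine wins for the smallest inputs, but binary search on the slower machine is already ahead at \(n = 10^5\) and the gap keeps widening.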
We are interested in predicting performance as input size grows.
We will use (time) complexity analysis to evaluate how fast an algorithm is with respect to input size.
\(T(n)\): run time estimate as a function of input size \(n\). For one operation, \(T(n) = 1\).
What is a better \(T(n)\)? Is it \(3n + 2\) or \(2n + 2\)?
- We do not care about constants or coefficients. A faster computer can make up for this difference.
- What truly makes a difference is the highest order term, as it dominates when \(n\) is large.
- We use Big-O notation to represent \(T(n)\).
- \(T(n) = O(g(n))\) if \(T(n)\)’s highest order term is \(g(n)\), disregarding coefficients and constants. For example, \(3n + 2\) and \(2n + 2\) are both \(O(n)\).
- \(O(g(n))\) is an upper bound on \(T(n)\).
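As a worked example, take the two candidates \(3n + 2\) and \(2n + 2\) from the question above. The lecture states Big-O informally; the check below assumes the usual formal definition, namely that \(T(n) = O(g(n))\) if there are constants \(c > 0\) and \(n_0\) such that \(T(n) \le c \, g(n)\) for all \(n \ge n_0\).
\[ 3n + 2 \le 3n + 2n = 5n \quad \text{for all } n \ge 1 \implies 3n + 2 = O(n) \]
\[ 2n + 2 \le 2n + 2n = 4n \quad \text{for all } n \ge 1 \implies 2n + 2 = O(n) \]
Both functions are bounded by a constant multiple of \(n\), so in Big-O terms neither is better than the other.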
Which term is more dominant?
\[ \log(n) < n < n \log(n) < n^2 < n^3 < 2^n \]
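The ordering above can be checked numerically with a short R sketch. The input sizes and the names `sizes`, `growth`, and `ordered` are chosen here for illustration. Note that the ordering describes large \(n\): at \(n = 2\), for instance, \(n^3 = 8\) still exceeds \(2^n = 4\).

```r
# Evaluate each growth function at a few input sizes (sizes chosen for illustration).
sizes <- c(10, 20, 50)
growth <- cbind(
  log_n = log(sizes),
  n = sizes,
  n_log_n = sizes * log(sizes),
  n_pow_2 = sizes^2,
  n_pow_3 = sizes^3,
  two_pow_n = 2^sizes
)
rownames(growth) <- paste0("n=", sizes)
print(growth)

# TRUE for a row when its values strictly increase from left to right.
ordered <- apply(growth, 1, function(row) all(diff(row) > 0))
print(ordered)
```

For these three sizes every row is strictly increasing, matching the chain of inequalities.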