Course Homepage: Advanced Algorithms
Min-Cut and Max-Cut
Karger's Min-Cut Algorithm
pick an edge uniformly at random and contract it
Theorem:
\(\Pr[\textrm{a min-cut is returned}]\geq\frac{2}{n(n-1)}\)
Proof:
- a min-cut \(C\) survives as long as none of its edges is contracted
- \(C\) is a min-cut \(\Rightarrow|C|\leq\min_v\deg(v)\leq\frac{2|E|}{|V|}\), so each contraction avoids \(C\) with probability \(\geq 1-\frac{2}{|V|}\)
Corollary:
The number of distinct min-cuts \(\leq n(n-1)/2\)
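A minimal Python sketch of one contraction run (a union-find over an edge list; all names are illustrative). Repeating it \(\binom{n}{2}\ln(1/\delta)\) times and returning the smallest cut found succeeds with probability at least \(1-\delta\).

```python
import random

def karger_min_cut_once(n, edges):
    """One run of Karger's contraction on a connected multigraph.

    n: number of vertices labeled 0..n-1; edges: list of (u, v) pairs.
    Returns the size of the cut found; it is a min cut with probability
    at least 2/(n(n-1)), so the run is repeated many times in practice.
    """
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        # keep only edges between distinct super-vertices
        edges = [(u, v) for (u, v) in edges if find(u) != find(v)]
        u, v = random.choice(edges)   # pick an edge uniformly at random
        parent[find(u)] = find(v)     # contract it
        remaining -= 1

    edges = [(u, v) for (u, v) in edges if find(u) != find(v)]
    return len(edges)                 # edges crossing the final 2-way cut
```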
Faster Min-Cut
contract to \(t=n/\sqrt{2}+1\) vertices and branch
runs in \(O(n^2\log n)\) time and succeeds with probability \(\Omega(\frac{1}{\log n})\)
Greedy Cut
each vertex \(v\) in turn joins whichever of \(S,T\) maximizes the current \(|E(S,T)|\)
1/2-approximation
Random Cut
\(\mathbf{E}[SOL]=|E|/2\)
de-randomization by conditional expectation
mutual independence, k-wise independence
de-randomization by pairwise independence
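A minimal sketch of de-randomization by conditional expectation for the random cut (helper names are illustrative): each vertex is placed so that the conditional expectation of the cut never drops below \(|E|/2\); note this coincides with the greedy cut above.

```python
def derandomized_cut(n, edges):
    """Place each vertex so the conditional expectation of the cut size
    never decreases; the final cut has at least |E|/2 edges.

    n: number of vertices 0..n-1; edges: list of (u, v) pairs (simple graph).
    """
    side = {}                          # vertex -> 'S' or 'T'
    for v in range(n):
        # count edges from v to already-placed neighbors on each side
        to_S = sum(1 for (a, b) in edges
                   if (a == v and side.get(b) == 'S') or (b == v and side.get(a) == 'S'))
        to_T = sum(1 for (a, b) in edges
                   if (a == v and side.get(b) == 'T') or (b == v and side.get(a) == 'T'))
        # undecided edges contribute 1/2 either way, so it suffices to cut
        # the majority of the already-decided edges incident to v
        side[v] = 'T' if to_S >= to_T else 'S'
    cut = sum(1 for (a, b) in edges if side[a] != side[b])
    return side, cut
```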
Fingerprinting
Checking Matrix Multiplication
Freivalds' Algorithm:
pick a uniform random \(r\in\{0,1\}^n\) and check whether \(ABr=Cr\), computing \(A(Br)\) and \(Cr\) in \(O(n^2)\) time
if \(AB\neq C\) then \(\Pr[ABr=Cr]\leq \frac{1}{2}\)
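A sketch of Freivalds' check (the `rounds` parameter is illustrative); repeating the test drives the one-sided error below \(2^{-\mathrm{rounds}}\).

```python
import random

def freivalds_check(A, B, C, rounds=20):
    """Check AB = C probabilistically in O(rounds * n^2) time.

    A, B, C: n x n matrices as lists of lists. If AB != C, each round
    detects it with probability >= 1/2, so error prob <= 2**(-rounds).
    """
    n = len(A)

    def matvec(M, x):
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]   # uniform r in {0,1}^n
        if matvec(A, matvec(B, r)) != matvec(C, r):    # compare A(Br) with Cr
            return False                               # certainly AB != C
    return True                                        # probably AB = C
```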
Polynomial Identity Testing
Input: a polynomial \(f\in\mathbb{F}[x]\) of degree \(d\).
Output: \(f\equiv 0\) ?
Fundamental Theorem of Algebra
Any non-zero \(d\)-degree polynomial \(f\in\mathbb{F}[x]\) has at most \(d\) roots
Randomized algorithm (fingerprinting):
pick a uniform random \(r\) from a finite set \(S\subseteq\mathbb{F}\) and check whether \(f(r)=0\)
if \(f\not\equiv 0\) then \(\Pr[f(r)=0]\leq\frac{d}{|S|}\)
Input: a polynomial \(f\in\mathbb{F}[x_1,...,x_n]\) of degree \(d\).
Output: \(f\equiv 0\) ?
Randomized algorithm (fingerprinting):
pick \(r_1,...,r_n\in S\) uniformly and independently at random (for a finite \(S\subseteq\mathbb{F}\)), and check whether \(f(r_1,...,r_n)=0\)
Schwartz-Zippel Theorem
\(f\not\equiv 0\Longrightarrow\Pr[f(r_1,...,r_n)=0]\leq\frac{d}{|S|}\)
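A sketch of the fingerprinting test, assuming the polynomial is given as a black box and evaluations are reduced modulo a large prime (the prime \(2^{61}-1\) and the trial count are illustrative).

```python
import random

def is_identically_zero(f, n, d, trials=30, field_size=(1 << 61) - 1):
    """Schwartz-Zippel test for an n-variate polynomial of total degree <= d.

    f: a black box evaluating the polynomial (exactly, or modulo field_size,
    which is a prime). Each trial errs with probability <= d / field_size
    when f is not identically zero.
    """
    p = field_size
    for _ in range(trials):
        r = [random.randrange(p) for _ in range(n)]   # uniform point in Z_p^n
        if f(*r) % p != 0:
            return False        # found a witness: f is certainly nonzero
    return True                 # probably f == 0

# hypothetical usage: the polynomial x*y - y*x is identically zero
# is_identically_zero(lambda x, y: x * y - y * x, n=2, d=2)
```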
Checking Distinctness
Input: two multisets \(A=\{a_1,...,a_n\}\) and \(B=\{b_1,...,b_n\}\) whose elements are in \([n]\)
Output: \(A=B\) ?
\(f_A(x)=\prod_{i=1}^{n}(x-a_i)\)
\(\textrm{FING}(A)=f_A(r)\) for a uniform random \(r\in\mathbb{Z}_p\), with \(f_A\) evaluated over \(\mathbb{Z}_p\)
if \(A\neq B\), the fingerprints can still collide in two ways:
- \(f_A\equiv f_B\) over \(\mathbb{Z}_p\) (the polynomials coincide modulo \(p\))
- \(f_A\not\equiv f_B\) over \(\mathbb{Z}_p\) but \(f_A(r)=f_B(r)\) (an unlucky \(r\); probability \(\leq n/p\) by the root bound above)
Hashing and Sketching
Count Distinct Elements
Input: a sequence \(x_1,x_2,...,x_s\in U=[N]\)
Output: an estimation of \(n=|\{x_1,x_2,...,x_s\}|\)
\((\epsilon,\delta)\)-estimator: a random variable \(\hat{Z}\) s.t. \[ \Pr[(1-\epsilon)n\leq\hat{Z}\leq(1+\epsilon)n]\geq1-\delta \]
Markov's Inequality
For nonnegative random variable \(X\), for any \(t>0\), \[ \Pr[X\geq t]\leq\frac{\mathbb{E}[X]}{t} \]
Chebyshev's Inequality
For random variable \(X\), for any \(t>0\), \[ \Pr[|X-\mathbb{E}[X]|\geq t]\leq\frac{\mathbf{Var}[X]}{t^2} \]
Min Sketch:
with \(k\) independent (idealized) uniform hash functions \(h_1,...,h_k:U\rightarrow[0,1]\), for each \(1\leq j\leq k\), let \(Y_j=\min_{1\leq i\leq s}h_j(x_i)\)
return \(\hat{Z}=1/\bar{Y}-1\) where \(\bar{Y}=\frac{1}{k}\sum_{j=1}^{k}Y_j\)
\[ \begin{align*} &\mathbb{E}[\bar{Y}]=\frac{1}{n+1},\mathbf{Var}[\bar{Y}]\leq\frac{1}{k(n+1)^2}\\ &\Pr\big[\hat{Z}<(1-\epsilon)n\textrm{ or }\hat{Z}>(1+\epsilon)n\big] \leq\Pr\left[|\bar{Y}-\mathbb{E}[\bar{Y}]|>\frac{\epsilon/2}{n+1}\right] \leq\frac{4}{k\epsilon^2} \leq\delta \end{align*} \]
Space cost: \(k=\lceil4/(\epsilon^2\delta)\rceil\) real numbers in \([0,1]\)
Storing \(k\) idealized hash functions
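A sketch of the min sketch, with the idealized hash functions simulated by seeded PRNGs (the seeding scheme is illustrative, not a real substitute for ideal hashing).

```python
import random

def min_sketch_estimate(stream, k, seed=0):
    """Idealized min sketch for counting distinct elements.

    Simulates k hash functions h_j: U -> [0,1] by seeding a PRNG with
    (seed, j, x); returns Z_hat = 1/Ybar - 1, where Ybar averages the
    per-function minima.  k = ceil(4/(eps^2 * delta)) gives an
    (eps, delta)-estimator.
    """
    minima = [1.0] * k
    for x in stream:
        for j in range(k):
            h = random.Random(hash((seed, j, x))).random()  # h_j(x) in [0,1]
            if h < minima[j]:
                minima[j] = h
    ybar = sum(minima) / k
    return 1.0 / ybar - 1.0
```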
Universal Hash Family:
A family \(\mathcal{H}\) of hash functions in \(U\rightarrow[m]\) is \(k\)-universal if for any distinct \(x_1,...,x_k\in U\), \[ \Pr_{h\in\mathcal{H}}[h(x_1)=\dots=h(x_k)]\leq\frac{1}{m^{k-1}} \] Moreover, \(\mathcal{H}\) is strongly \(k\)-universal (\(k\)-wise independent) if for any distinct \(x_1,...,x_k\in U\) and any \(y_1,...,y_k\in[m]\), \[ \Pr_{h\in\mathcal{H}}\left[\bigwedge_{i=1}^kh(x_i)=y_i\right]=\frac{1}{m^k} \]
Linear Congruential Hashing
Represent \(U\subseteq\mathbb{Z}_p\) for sufficiently large prime \(p\)
\(h_{a,b}(x)=((ax+b)\mod p)\mod m\)
\(\mathcal{H}=\{h_{a,b}|a\in\mathbb{Z}_p\backslash\{0\},b\in\mathbb{Z}_p\}\)
Theorem:
The linear congruential family \(\mathcal{H}\) is 2-universal
to achieve \(k\)-wise independence: evaluate a degree-\((k-1)\) polynomial with uniform random coefficients over a finite field
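A sketch of both constructions (parameter names are illustrative; note the final mod-\(m\) reduction of the polynomial construction is only approximately \(k\)-wise independent).

```python
import random

def make_linear_congruential_hash(p, m, rng=random):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod m from the 2-universal family.

    p: a prime larger than every key; a is uniform in {1,...,p-1},
    b is uniform in {0,...,p-1}.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

def make_k_wise_hash(k, p, m, rng=random):
    """Random degree-(k-1) polynomial over Z_p, then reduced mod m.

    Over Z_p the evaluation map is k-wise independent; the mod-m step
    makes it only approximately so.
    """
    coeffs = [rng.randrange(0, p) for _ in range(k)]
    def h(x):
        v = 0
        for c in reversed(coeffs):    # Horner evaluation of the polynomial
            v = (v * x + c) % p
        return v % m
    return h
```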
Flajolet-Martin Algorithm
BJKST Algorithm
Frequency Estimation
Bloom Filters
Data: a set \(S\) of \(n\) items \(x_1,x_2,...,x_n\in U=[N]\)
Query: an item \(x\in U\)
Determine whether \(x\in S\)
uniform & independent hash functions \(h_1,...,h_k:U\rightarrow[m]\)
Data Structure:
- bit vector \(v\in\{0,1\}^m\), initially all 0's
- for each \(x_i\): set \(v[h_j(x_i)]=1\) for all \(1\leq j\leq k\)
- Query x: "yes" iff \(v[h_j(x)]=1\) for all \(1\leq j\leq k\)
choose \(k=c\ln 2\) and \(m=cn\)
space cost: \(m=cn\) bits
time cost: \(k=c\ln 2\)
false positive \(\leq(1-(1-1/m)^{kn})^k\approx(1-e^{-kn/m})^k=2^{-c\ln 2}\leq(0.6185)^c\)
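A sketch of a Bloom filter, with the \(k\) hash functions simulated by seeded PRNGs (an idealization, not a production hash scheme).

```python
import random

class BloomFilter:
    """Bloom filter with m bits and k hash functions.

    With k = c*ln 2 and m = c*n, the false-positive rate is about 0.6185^c.
    """
    def __init__(self, m, k, seed=0):
        self.m, self.k, self.seed = m, k, seed
        self.bits = [0] * m

    def _positions(self, x):
        # k "independent" positions in [m], derived from a PRNG seeded by x
        rng = random.Random(hash((self.seed, x)))
        return [rng.randrange(self.m) for _ in range(self.k)]

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def __contains__(self, x):
        # "yes" iff all k positions are set; may give false positives
        return all(self.bits[pos] == 1 for pos in self._positions(x))
```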
Count-Min Sketch
Data: a sequence \(x_1,x_2,...,x_n\in U=[N]\)
Query: an item \(x\in U\)
Estimate the frequency \(f_x=|\{i:x_i=x\}|\) of \(x\)
\(\hat{f}_x\) : estimation of \(f_x\) within additive error
\(\Pr\big[|\hat{f}_x-f_x|\geq\epsilon n\big]\leq\delta\)
\(k\) independent 2-universal hash functions \(h_1,...,h_k:[N]\rightarrow [m]\)
- \(\mathrm{CMS}[k][m]\), initialized to all 0's
- Upon each \(x_i\): \(\mathrm{CMS}[j][h_j(x_i)]++\) for all \(1\leq j\leq k\)
- Query \(x\): return \(\hat{f}_x=\min_{1\leq j\leq k}\mathrm{CMS}[j][h_j(x)]\)
\(\mathbb{E}\big[\mathrm{CMS}[j][h_j(x)]\big]=f_x+\sum_{y\neq x}f_y\Pr[h_j(y)=h_j(x)]\leq f_x+\frac{n}{m}\)
\(\Pr\big[|\hat{f}_x-f_x|\geq\epsilon n\big]=\Pr\big[\forall 1\leq j\leq k:\mathrm{CMS}[j][h_j(x)]-f_x\geq\epsilon n\big]\leq \left(\frac{1}{\epsilon m}\right)^k\)
choose \(m=\lceil e/\epsilon\rceil\) and \(k=\lceil\ln(1/\delta)\rceil\)
space cost: \(O(\epsilon^{-1}\log (1/\delta)\log n)\) bits
time cost: \(O(\log(1/\delta))\)
\(O(\log(1/\delta)\log N)\) bits for hash functions
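A sketch of the Count-Min sketch with the stated parameter choices; seeded-PRNG hashes stand in for 2-universal families.

```python
import math
import random

class CountMinSketch:
    """Count-Min sketch: frequency estimates with additive error.

    m = ceil(e/eps) columns and k = ceil(ln(1/delta)) rows give
    Pr[f_hat_x - f_x >= eps*n] <= delta.
    """
    def __init__(self, eps, delta, seed=0):
        self.m = math.ceil(math.e / eps)
        self.k = math.ceil(math.log(1.0 / delta))
        self.seed = seed
        self.table = [[0] * self.m for _ in range(self.k)]

    def _h(self, j, x):
        return random.Random(hash((self.seed, j, x))).randrange(self.m)

    def update(self, x):
        for j in range(self.k):
            self.table[j][self._h(j, x)] += 1

    def query(self, x):
        # each row only overestimates f_x, so take the minimum over rows
        return min(self.table[j][self._h(j, x)] for j in range(self.k))
```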
Balls into Bins
| property of a random function | corresponding problem |
|---|---|
| 1-1 | birthday paradox |
| onto | coupon collector |
| pre-image size | occupancy problem |
Birthday Paradox
\(n\) balls are thrown into \(m\) bins
event \(\mathcal{E}\): each bin receives at most one ball
\(\Pr[\mathcal{E}]=\prod_{i=0}^{n-1}(1-\frac{i}{m})\approx e^{-n^2/2m}\)
\(n=\sqrt{2m\ln(1/p)}\Longrightarrow\Pr[\mathcal{E}]=(1\pm o(1))p\)
In a class of \(n>57\) students, with \(>99\%\) probability, there are two students with the same birthday.
Perfect Hashing
Data: a set \(S\) of \(n\) items \(x_1,x_2,...,x_n\in U=[N]\)
Query: an item \(x\in U\)
Determine whether \(x\in S\)
Birthday Paradox by 2-universal hashing:
\(\mathbb{E}[Y]=\mathbb{E}[\mathrm{\#\ of\ collisions}]=\sum_{i<j}\Pr[h(x_i)=h(x_j)]\leq\binom{n}{2}\frac{1}{m}\)
\(n\leq\sqrt{2m\epsilon}\Longrightarrow\Pr[\neg\mathcal{E}]=\Pr[Y\geq 1]\leq\mathbb{E}[Y]\leq\epsilon\)
FKS Perfect Hashing
Query(x):
- retrieve primary hash \(h\)
- go to bucket \(i=h(x)\)
- retrieve secondary hash \(h_i\)
- check whether \(T_i[h_i(x)]=x\)
Size of the \(i\)-th bin: \(|B_i|\)
\(\mathrm{\#\ of\ collisions}=\frac{1}{2}\sum|B_i|(|B_i|-1)\Longrightarrow\mathbb{E}\bigg[\sum|B_i|^2\bigg]\leq\frac{n(n-1)}{m}+n\)
\(\exists h\) from a 2-universal family s.t. the total space cost is \(O(n)\)
\(O(n\log N)\) space, \(O(1)\) time
Coupon Collector
\(X\): number of balls thrown to make all the \(n\) bins nonempty
\(\mathbb{E}[X]=\sum_{i=1}^{n}\frac{n}{n-i+1}=n\sum_{i=1}^{n}\frac{1}{i}=nH(n)\)
expected \(n\ln n+O(n)\) balls
Theorem:
For \(c>0\), \(\Pr[X\geq n\ln n+cn]\leq e^{-c}\)
Proof:
For a fixed bin, the probability that it receives none of the \(n\ln n+cn\) balls is:
\(\left(1-\frac{1}{n}\right)^{n\ln n+cn}<e^{-(\ln n+c)}=\frac{1}{ne^c}\)
A union bound over the \(n\) bins then gives the theorem.
Occupancy Problem
\(m\) balls are thrown into \(n\) bins
\(X_i:\) number of balls in the \(i\)-th bin
\(\forall i:\mathbb{E}[X_i]=\frac{m}{n}\)
Theorem:
With high probability, the maximum load is
- \(O\left(\frac{\log n}{\log \log n}\right)\) when \(m=n\)
- \(O(\frac{m}{n})\) when \(m\geq n\ln n\)
Proof:
\(\Pr[X_i\geq L]\leq \binom{m}{L}\frac{1}{n^L}\leq\frac{m^L}{L!n^L}\leq\left(\frac{em}{n}\right)^L\frac{1}{L^L}\)
when \(m=n\), this probability \(\leq\left(\frac{e}{L}\right)^L\leq\frac{1}{n^2}\) for \(L=\frac{3\ln n}{\ln\ln n}\)
when \(m\geq n\ln n\), this probability \(\leq e^{-L}\leq\frac{1}{n^2}\) for \(L=\frac{e^2m}{n}\geq e^2\ln n\)
Theorem:
With high probability, the maximum load is
- \(\Omega\left(\frac{\log n}{\log \log n}\right)\) when \(m=n\)
Proof:
Exactly: each \(X_i\sim\mathrm{Bin}(m,1/n)\) marginally, but the \(X_i\) are not independent, being subject to \(\sum_{i=1}^{n}X_i=m\)
Heuristically, treat \(X_1,...,X_n\) as i.i.d. \(Y_1,...,Y_n\sim\mathrm{Pois}(m/n)\)
- Theorem: \(\forall m_1,...,m_n\in\mathbb{N}\) s.t. \(\sum_{i=1}^{n}m_i=m\)
\[ \Pr\left[\bigwedge_{i=1}^nX_i=m_i\right]=\Pr\left[\bigwedge_{i=1}^nY_i=m_i\bigg|\sum_{i=1}^nY_i=m\right] \]
- Theorem (Poisson Approximation): \(\forall\) nonnegative function \(f\)
\[ \mathbb{E}[f(X_1,...,X_n)]\leq e\sqrt{m}\cdot\mathbb{E}[f(Y_1,...,Y_n)] \]
Power of Two Choices
When \(n\) balls are thrown into \(n\) bins using the two-choice paradigm, the maximum load is \(\ln\ln n/\ln 2+\Theta(1)\) w.h.p.
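A small simulation sketch contrasting one choice with two choices (the function name is illustrative).

```python
import random

def max_load(n, choices=2, rng=random):
    """Throw n balls into n bins; each ball goes to the least loaded of
    `choices` uniformly random bins and the maximum load is returned.

    choices=1 gives Theta(log n / log log n) w.h.p.;
    choices=2 gives ln ln n / ln 2 + Theta(1) w.h.p.
    """
    load = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda b: load[b])
        load[best] += 1
    return max(load)

# e.g. compare max_load(10**5, choices=1) with max_load(10**5, choices=2)
```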
Cuckoo Hashing
Concentration of Measure
Chernoff-Hoeffding Bounds
Chernoff Bound
For independent (or negatively associated) \(X_1,...,X_n\in\{0,1\}\) with \[ X=\sum_{i=1}^{n}X_i\mathrm{\ and\ }\mu=\mathbb{E}[X] \]
For any \(\delta>0\), \[ \Pr[X\geq(1+\delta)\mu]\leq\left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^\mu\leq\exp\left(-\frac{\mu\delta^2}{3}\right) \]
For any \(0<\delta<1\), \[ \Pr[X\leq(1-\delta)\mu]\leq\left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^\mu\leq\exp\left(-\frac{\mu\delta^2}{2}\right) \]
For \(t\geq 2e\mu\): \[ \Pr[X\geq t]\leq 2^{-t} \]
For any \(t>0\): \[ \begin{align*} \Pr[X\geq\mathbb{E}[X]+t]\leq\exp\left(-\frac{2t^2}{n}\right)\\ \Pr[X\leq\mathbb{E}[X]-t]\leq\exp\left(-\frac{2t^2}{n}\right) \end{align*} \]
Example: in a vote by \(n\) independent unbiased voters, the margin is \(O(\sqrt{n\log n})\) w.h.p. by the bounds above, so a party of \(O(\sqrt{n\log n})\) colluding voters can manipulate the outcome w.h.p.
Hoeffding Bound
For \(X=\sum_{i=1}^nX_i\), where \(X_i\in[a_i,b_i],1\leq i\leq n\), are independent (or negatively associated),
for any \(t>0\): \[ \begin{align*} \Pr[X\geq\mathbb{E}[X]+t]\leq\exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right)\\ \Pr[X\leq\mathbb{E}[X]-t]\leq\exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right) \end{align*} \]
Sub-Gaussian Random Variables
A centered (\(\mathbb{E}[Y]=0\)) random variable \(Y\) is said to be sub-Gaussian with variance factor \(\nu\) (denoted \(Y\in\mathcal{G}(\nu)\)) if for all \(\lambda\in\mathbb{R}\), \[ \mathbb{E}[e^{\lambda Y}]\leq\exp\left(\frac{\lambda^2\nu}{2}\right) \]
Chernoff-Hoeffding Bound:
For \(Y=\sum_{i=1}^nY_i\), where \(Y_i\in\mathcal{G}(\nu_i)\), \(1\leq i\leq n\), are independent (or negatively associated) and centered (i.e. \(\mathbb{E}[Y_i]=0\)),
for any \(t>0\): \[ \begin{align*} \Pr[Y\geq t]\leq\exp\left(-\frac{t^2}{2\sum_{i=1}^{n}\nu_i}\right)\\ \Pr[Y\leq-t]\leq\exp\left(-\frac{t^2}{2\sum_{i=1}^{n}\nu_i}\right) \end{align*} \]
Poisson Tails
For \(Y\sim\mathrm{Pois}(\mu)\), \[ \begin{align*} k>\mu\Longrightarrow\Pr[Y>k]<e^{-\mu}\left(\frac{e\mu}{k}\right)^k\\ k<\mu\Longrightarrow\Pr[Y<k]<e^{-\mu}\left(\frac{e\mu}{k}\right)^k \end{align*} \]
The Method of Bounded Differences
McDiarmid’s Inequality:
For independent \(X_1,X_2,...,X_n\), if \(n\)-variate function \(f\) satisfies the Lipschitz condition: for every \(1\leq i\leq n\), \[ |f(x_1,...,x_n)-f(x_1,...,x_{i-1},y_i,x_{i+1},...,x_n)|\leq c_i \] for any possible \(x_1,...,x_n\) and \(y_i\),
then for any \(t>0\): \[ \Pr\bigg[\big|f(X_1,...,X_n)-\mathbb{E}[f(X_1,...,X_n)]\big|\geq t\bigg]\leq 2\exp\left(-\frac{t^2}{2\sum_{i=1}^nc_i^2}\right) \]
Chernoff: sum of Boolean variables, 1-Lipschitz
Hoeffding: sum of \([a_i,b_i]\)-bounded variables, \((b_i-a_i)\)-Lipschitz
Balls into Bins:
- \(Y:\) number of empty bins
- \(Y\) is 1-Lipschitz with respect to \(m\) variables, where \(m\) is the number of balls
Pattern Matching
- \(Y:\) number of substrings of \(X\in\Sigma^n\) matching the pattern \(\pi\in\Sigma^k\)
- \(Y\) is \(k\)-Lipschitz with respect to \(n\) variables \(X_1,...,X_n\)
Sprinkling Points on Hypercube
- \(Y:\) shortest Hamming distance from \(X\in\{0,1\}^n\) to \(S\subseteq\{0,1\}^n\)
- \(Y\) is 1-Lipschitz with respect to \(n\) variables \(X_1,...,X_n\)
Martingale
A sequence of random variables \(Y_0,Y_1,...\) is a martingale with respect to \(X_0,X_1,...\) if for all \(t\geq 0\),
- \(Y_t\) is a function of \(X_0,...,X_t\)
- \(\mathbb{E}[Y_{t+1}|X_0,X_1,...,X_t]=Y_t\)
A fair gambling game:
- \(X_i:\) outcome of the \(i\)-th betting
- \(Y_i:\) capital after the \(i\)-th betting
Azuma's Inequality:
For martingale \(Y_0,Y_1,...\) (with respect to \(X_0,X_1,...\)) satisfying: \[ \forall i\geq 1,\quad |Y_i-Y_{i-1}|\leq c_i \] for any \(n\geq 1\) and \(t>0\): \[ \Pr\bigg[\big|Y_n-Y_0\big|\geq t\bigg]\leq 2\exp\left(-\frac{t^2}{2\sum_{i=1}^nc_i^2}\right) \]
Your capital does not change too fast if:
- the game is fair (martingale)
- payoff for each gambling is bounded
Doob Martingale
A Doob sequence \(Y_0,Y_1,...,Y_n\) of an \(n\)-variate function \(f\) with respect to a random vector \((X_1,...,X_n)\) is: \[ \forall 0\leq i\leq n,\quad Y_i=\mathbb{E}\big[f(X_1,...,X_n)|X_1,...,X_i\big] \]
Theorem:
The Doob sequence \(Y_0,Y_1,...,Y_n\) is a martingale with respect to \(X_1,...,X_n\).
McDiarmid’s Inequality then follows by applying Azuma's Inequality to the Doob martingale of \(f\) with respect to the independent random variables \(X_1,...,X_n\).
Dimension Reduction
Johnson-Lindenstrauss Theorem
Theorem:
\(\forall 0<\epsilon<1\), for any set \(S\) of \(n\) points from \(\mathbb{R}^d\), there is a \(\phi:\mathbb{R}^d\rightarrow\mathbb{R}^k\) with \(k=O(\epsilon^{-2}\log n)\), such that \(\forall x,y\in S:\) \[ (1-\epsilon)||x-y||_2^2\leq||\phi(x)-\phi(y)||_2^2\leq(1+\epsilon)||x-y||_2^2 \]
Actually, \(\phi(x)=Ax\)
The probabilistic method: for random \(A\in\mathbb{R}^{k\times d}\) \[ \Pr\big[\forall x,y\in S:(1-\epsilon)||x-y||_2^2\leq||\phi(x)-\phi(y)||_2^2\leq(1+\epsilon)||x-y||_2^2\big]=1-O\bigg(\frac{1}{n}\bigg) \]
Efficient construction of random \(A\in\mathbb{R}^{k\times d}\):
- projection onto uniform random \(k\)-dimensional subspace
- independent Gaussian entries
- i.i.d. -1/+1 entries
JLT (i.i.d. Gaussian entries)
entries of \(A\) are chosen i.i.d. from \(\mathcal{N}(0,1/k)\)
JLT (uniform k-dim subspace)
the \(k\) rows \(A_1,...,A_k\) of \(A\in\mathbb{R}^{k\times d}\) are orthogonal unit vectors \(\in\mathbb{R}^d\) chosen uniformly at random and let \(y_i=\sqrt{\frac{d}{k}}Ax_i\)
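A sketch of the Gaussian-entry construction using NumPy (the constant 8 in the choice of \(k\) is illustrative).

```python
import numpy as np

def jl_transform(points, eps, rng=None):
    """Johnson-Lindenstrauss projection with i.i.d. Gaussian entries.

    points: an (n, d) array; returns an (n, k) array with
    k = O(eps^-2 log n). Each entry of A is N(0, 1/k), so squared
    distances are preserved up to a (1 +- eps) factor w.h.p.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = points.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))   # the constant 8 is illustrative
    A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return points @ A.T                          # phi(x) = A x, row by row
```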
Nearest Neighbor Search
Data: \(n\) points \(y_1,y_2,...,y_n\in X\)
Query: a point \(x\in X\)
Find the data-point \(y_i\) that is closest to \(x\).
\(c\)-ANN (Approximate Nearest Neighbor):
Find a \(y_i\) such that \(\mathrm{dist}(x,y_i)\leq c\cdot\min_{1\leq j\leq n}\mathrm{dist}(x,y_j)\)
\((c,r)\)-ANN (Approximate Near Neighbor):
return a \(y_i\) that \(\mathrm{dist}(x,y_i)\leq c\cdot r\) if \(\exists y_j\) s.t. \(\mathrm{dist}(x,y_j)\leq r\)
"no" if \(\forall y_i, \mathrm{dist}(x,y_i)>c\cdot r\)
arbitrary if otherwise
if \((\sqrt{c},r)\)-ANN can be solved for any \(r\), then \(c\)-ANN can be solved by using it for \(\log_\sqrt{c}(D_\max/D_\min)\) values of \(r\), where \(D_\max\) and \(D_\min\) are the largest and smallest pairwise distances
\((c,r)\)-ANN in Hamming space (by Dimension Reduction)
sample \(k\times d\) Boolean matrix \(A\) with i.i.d. entries \(\in\mathrm{Bernoulli}(p)\)
let \(z_i=Ay_i\in\{0,1\}^k\), with arithmetic over the finite field \(\mathrm{GF}(2)\)
store all \(s\)-balls \(B_s(u)=\{y_i\mid\mathrm{dist}(u,z_i)\leq s\}\) for all \(u\in\{0,1\}^k\)
To answer query \(x\in\{0,1\}^d:\)
- retrieve \(B_s(Ax)\)
- if \(B_s(Ax)=\emptyset\) return "no"
- else return any \(y_i\in B_s(Ax)\)
space: \(O(n2^k)\) query time: \(O(kd)\)
for suitable \(k=O(\log n)\), \(p\) and \(s\): \[ \begin{align*} \forall x,y\in\{0,1\}^d:\quad \mathrm{dist}(x,y)\leq r\quad\quad\ \Rightarrow\Pr[\mathrm{dist}(Ax,Ay)>s]<1/n^2\\ \quad\quad\quad\quad\quad\quad\quad\quad\ \mathrm{dist}(x,y)> c\cdot r\quad\Rightarrow\Pr[\mathrm{dist}(Ax,Ay)\leq s]<1/n^2 \end{align*} \]
Locality Sensitive Hashing
A random \(h:X\rightarrow U\) is an \((r,cr,p,q)\)-LSH if \(\forall x,y\in X:\) \[ \begin{align*} \mathrm{dist}(x,y)\leq r\Longrightarrow\Pr[h(x)=h(y)]\geq p\\ \mathrm{dist}(x,y)>c\cdot r\Longrightarrow\Pr[h(x)=h(y)]\leq q \end{align*} \]
an \((r,cr,p,q)\)-LSH \(h:X\rightarrow U\) \(\Longrightarrow\) concatenating \(k\) independent copies gives an \((r,cr,p^k,q^k)\)-LSH \(g:X\rightarrow U^k\)
\((c,r)\)-ANN by LSH
suppose we have \((r,cr,p^*,1/n)\)-LSH \(g:X\rightarrow U\)
Hash functions:
- i.i.d. instances \(g_1,\ldots,g_l\) of \(g\), where \(l=1/p^*\)
Data structure:
- \(l\) sorted tables
- in table-\(j\): sort \(y_1,y_2,\ldots,y_n\) according to their \(g_j\) values
Upon query \(x\in X\):
- find up to \(10l\) points \(y_i\) such that \(\exists j,g_j(x)=g_j(y_i)\), by binary search in the sorted tables
- if encounter a \(y_i\) that \(\mathrm{dist}(x,y_i)\leq c\cdot r\) return this \(y_i\)
- else return "no"
space: \(O(n/p^*)\) time: \(O((\log n)/p^*)\) one-sided error <0.5
suppose we have \((r,cr,p,q)\)-LSH \(h:X\rightarrow U\),
then we have \((r,cr,n^{-\rho},1/n)\)-LSH \(g:X\rightarrow U^k\),
where \(\rho=\frac{\log p}{\log q}\)
then we can solve \((c,r)\)-ANN with
space: \(O(n^{1+\rho})\) time: \(O(n^\rho\cdot\log n)\) one-sided error <0.5
\((c,r)\)-ANN in Hamming space (by LSH)
\(\forall x\in\{0,1\}^d:h(x)=x_i\) for uniform random \(i\in[d]\)
\(h:\{0,1\}^d\rightarrow \{0,1\}\) is an \((r,cr,1-r/d,1-cr/d)\)-LSH
\(\rho=\frac{\log(1-r/d)}{\log(1-cr/d)}\leq\frac{1}{c}\)
space: \(O(n^{1+1/c})\) time: \(O(n^{1/c}\cdot\log n)\) one-sided error <0.5
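A sketch of the bit-sampling LSH index for Hamming \((c,r)\)-ANN, assuming \(cr<d\); \(k\) and \(l\) are chosen as above, hash tables stand in for sorted tables, and the \(10l\) candidate budget follows the outline (all names are illustrative).

```python
import math
import random

def build_lsh_index(points, d, r, c, seed=0):
    """Bit-sampling LSH index for (c, r)-ANN in Hamming space.

    points: list of 0/1 tuples of length d. Each g_j samples
    k = ceil(log(n)/log(1/q)) coordinates with q = 1 - c*r/d, so q^k <= 1/n;
    the number of tables is l = ceil(1/p^k) ~ n^rho with p = 1 - r/d.
    """
    n = len(points)
    q = 1.0 - c * r / d
    p = 1.0 - r / d
    k = max(1, math.ceil(math.log(n) / math.log(1.0 / q)))
    l = max(1, math.ceil(1.0 / (p ** k)))
    rng = random.Random(seed)
    coords = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
    tables = []
    for j in range(l):
        table = {}
        for i, y in enumerate(points):
            key = tuple(y[t] for t in coords[j])
            table.setdefault(key, []).append(i)
        tables.append(table)
    return coords, tables

def query_lsh(x, points, coords, tables, r, c):
    """Return an index i with Hamming dist(x, y_i) <= c*r, or None ("no")."""
    budget = 10 * len(tables)              # inspect at most ~10*l candidates
    for cs, table in zip(coords, tables):
        for i in table.get(tuple(x[t] for t in cs), []):
            if sum(a != b for a, b in zip(x, points[i])) <= c * r:
                return i
            budget -= 1
            if budget <= 0:
                return None
    return None
```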
Greedy and Local Search
Max-Cut
Greedy Algorithm for Max-Cut
Local Search Algorithm for Max-Cut
- initially, \((S,T)\) is an arbitrary cut
- if \(\exists v\) switching side increases cut, then switch side
in a local optimum (fixed point): \[ \begin{align*} \forall &v\in S:|E(v,S)|\leq|E(v,T)|\Longrightarrow2|E(S,S)|\leq|E(S,T)|\\ \forall &v\in T:|E(v,T)|\leq|E(v,S)|\Longrightarrow2|E(T,T)|\leq|E(S,T)|\\ &\Longrightarrow |E(S,S)|+|E(T,T)|\leq|E(S,T)|\\ &\Longrightarrow |E(S,T)|\geq\frac{1}{2}|E|\geq\frac{1}{2}OPT \end{align*} \]
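A minimal sketch of this local search (every successful flip increases the cut, so it terminates after at most \(|E|\) flips).

```python
def local_search_max_cut(n, edges):
    """Local search for Max-Cut: flip a vertex while that increases the cut.

    Returns a cut (S, T) with |E(S,T)| >= |E|/2 >= OPT/2.
    n: number of vertices 0..n-1; edges: list of (u, v) pairs.
    """
    in_S = [v % 2 == 0 for v in range(n)]        # arbitrary starting cut

    def gain(v):
        # (edges v would newly cut) - (edges v would stop cutting) if flipped
        same = diff = 0
        for (a, b) in edges:
            if a == v or b == v:
                u = b if a == v else a
                if in_S[u] == in_S[v]:
                    same += 1
                else:
                    diff += 1
        return same - diff

    improved = True
    while improved:
        improved = False
        for v in range(n):
            if gain(v) > 0:
                in_S[v] = not in_S[v]
                improved = True
    S = {v for v in range(n) if in_S[v]}
    return S, set(range(n)) - S
```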
Scheduling
Instance: \(n\) jobs \(j=1,\ldots,n\) with processing times \(p_j\in\mathbb{R}^+\)
Solution: An assignment of \(n\) jobs to \(m\) identical machines that minimizes the makespan \(C_\max\)
Graham's List Algorithm
assign job to the current least heavily loaded machine
\(OPT\geq\max_{1\leq j\leq n} p_j\)
\(OPT\geq\frac{1}{m}\sum_{j=1}^n p_j\)
let \(i^*\) be the machine attaining the makespan and \(l\) the last job assigned to it; then \(C_{i^*}-p_l\leq \frac{1}{m}\sum_{j\neq l}p_j\)
\(p_l\leq\max_{1\leq j\leq n}p_j\)
\(C_\max=C_{i^*}\leq(1-\frac{1}{m})p_l+\frac{1}{m}\sum_{1\leq j\leq n}p_j\leq(2-\frac{1}{m})OPT\)
tight
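A sketch of the List algorithm using a heap of machine loads; sorting the jobs in decreasing order first gives LPT.

```python
import heapq

def list_schedule(jobs, m):
    """Graham's List algorithm: assign each job to the least loaded machine.

    jobs: processing times in arrival order; m: number of machines.
    Returns the makespan; a (2 - 1/m)-approximation.
    """
    loads = [(0.0, i) for i in range(m)]      # min-heap of (load, machine)
    heapq.heapify(loads)
    for p in jobs:
        load, i = heapq.heappop(loads)        # current least loaded machine
        heapq.heappush(loads, (load + p, i))
    return max(load for load, _ in loads)

# LPT variant: list_schedule(sorted(jobs, reverse=True), m)
```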
Longest Processing Time (LPT)
\(p_1\geq p_2\geq\dots\geq p_n\)
assign job to the current least heavily loaded machine
WLOG: \(C_{i^*}>\max_{1\leq j\leq n}p_j\Longrightarrow p_l\leq p_{m+1}\)
\(OPT\geq p_m+p_{m+1}\geq 2p_{m+1}\)
\(p_l\leq\frac{1}{2}OPT\)
\(C_\max=C_{i^*}\leq p_l+\frac{1}{m}\sum_{1\leq j\leq n}p_j\leq \frac{3}{2}OPT\)
can be improved to \(4/3\)-approx. with a more careful analysis
has a PTAS
PTAS (Polynomial-Time Approximation Scheme):
\(\forall\epsilon>0\), a \((1+\epsilon)\)-approx. solution can be returned in time \(f(\epsilon)\cdot\mathrm{poly(n)}\)
competitive ratio of an online algorithm
Set Cover
Instance: A sequence of subsets \(S_1,\dots,S_m\subseteq U\).
Find the smallest \(C\subseteq\{1,\dots,m\}\) s.t. \(\bigcup_{i\in C}S_i=U\).
Hitting Set is exactly Set Cover
Greedy Cover:
initially \(C=\emptyset\)
while \(U\neq\emptyset\) do:
add \(i\) with largest \(|S_i\cap U|\) to \(C\)
\(U=U\backslash S_i\)
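A sketch of Greedy Cover (the input representation is illustrative).

```python
def greedy_set_cover(universe, subsets):
    """Greedy Cover: repeatedly pick the set covering the most uncovered elements.

    universe: a set; subsets: list of sets S_1,...,S_m (0-indexed).
    Returns the chosen indices; the cover size is within an H_n factor of optimal.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # index i with the largest |S_i ∩ U| among the remaining elements
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```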
Rounding Dynamic Programs
(to be done)