# Hyperplane Separation Theorem

I’m reading Real Analysis books again as part of my studies. I used to visit Kim C. Border’s site from time to time to read his excellent materials, and I recently read that he passed away. I never audited one of his courses nor studied at Caltech, but we exchanged several emails from 2012 to 2019, mostly about Linear Algebra, and later also about Arrow’s Impossibility Theorem and social debates at a moment when Chile entered a political crisis that we still haven’t solved.

The proof of this theorem, heavily inspired by his style, is my way of paying tribute to him as a very positive influence during my economics studies. The theorem, interesting in itself, provides a powerful result used in the different versions and proofs of the finite-dimensional supporting hyperplane theorem, Farkas’ lemma, and the Karush-Kuhn-Tucker theorem. These results are also the basis for the support vector machines algorithm, duality in linear optimization, and many portfolio-related results in finance.

**Theorem** (Hyperplane Separation Theorem). Let \(\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\adh}[1]{\text{adh}({#1})} \newcommand{\inte}[1]{\text{int}({#1})} A\) and \(B\) be two convex, disjoint, non-empty subsets of \(\R^n\). If \(A\) is closed and \(B\) is compact, then there exists \(\vec{p}\in \R^n\setminus \{\vec{0}\}\) such that \(\vec{p}\cdot \vec{a} < \vec{p}\cdot \vec{b} \, \, \forall (\vec{a},\vec{b})\in A\times B\).

A consequence of this result is that any closed and convex set \(C\) can be strictly separated from any vector \(\vec{v} \notin C\): take \(B=\{\vec{v}\}\), which is compact.
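Before the proof, a quick numerical sanity check may help fix ideas. The instance below is hypothetical (two disjoint disks chosen for illustration): the closest points \(\vec{y}=(1,0)\) and \(\vec{b}_0=(3,0)\) give \(\vec{p}=(1,0)\), and every sampled pair satisfies the strict inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: A = closed unit disk at the origin (closed, convex),
# B = unit disk centered at (4, 0) (compact, convex), disjoint from A.
# The closest points are y = (1, 0) and b0 = (3, 0), giving p = (1, 0).
y = np.array([1.0, 0.0])
b0 = np.array([3.0, 0.0])
p = (b0 - y) / np.linalg.norm(b0 - y)

def sample_disk(center, radius, n):
    """Draw n points inside a closed disk."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    return center + np.column_stack((r * np.cos(theta), r * np.sin(theta)))

A = sample_disk(np.array([0.0, 0.0]), 1.0, 1000)
B = sample_disk(np.array([4.0, 0.0]), 1.0, 1000)

# Every sampled pair satisfies the strict separation p·a < p·b:
# sup over A of p·a is at most 1, while inf over B of p·b is at least 3.
print((A @ p).max(), (B @ p).min())
```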

*Proof*. The proof proceeds by a “divide and conquer” approach.

Since \(A\) is closed, the minimum below is attained, so we may define the function \[\begin{align*} f: B &\to \R \cr \vec{b} &\mapsto \min_{\vec{a}\in A} \|\vec{b}-\vec{a}\|. \end{align*}\] This function returns the distance from \(\vec{b}\in B\) to the set \(A\), and it is continuous.

Since \(B\) is compact, by the Weierstrass extreme value theorem there exists \(\vec{b}_0 \in B\) such that \(f(\vec{b}_0)\leq f(\vec{b}) \:\forall \vec{b}\in B\).

Let \(\vec{y}\in A\) be such that \(f(\vec{b}_0)=\|\vec{b}_0-\vec{y}\|\). Since \(A\) and \(B\) are disjoint, \(\vec{b}_0\neq \vec{y}\), so the vector \[\vec{p}=\frac{\vec{b}_0-\vec{y}}{\|\vec{b}_0-\vec{y}\|}\] is well defined and satisfies \(\|\vec{p}\|=1\).
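The closest pair \((\vec{y},\vec{b}_0)\) need not be found analytically. For sets whose Euclidean projections are easy to compute, the classical alternating-projections method (not part of this proof) converges to a closest pair when the sets are closed, convex, and one is bounded. A minimal sketch, with an illustrative box \(A\) and disk \(B\) chosen so the answer is known in closed form:

```python
import numpy as np

def proj_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clip)."""
    return np.clip(x, lo, hi)

def proj_disk(x, center, radius):
    """Euclidean projection onto a closed disk."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

# Illustrative instance: A = [-1, 1]^2 (closed convex),
# B = disk of radius 1 centered at (3, 3) (compact convex), disjoint from A.
center, radius = np.array([3.0, 3.0]), 1.0
b = center.copy()            # start anywhere in B
for _ in range(200):         # alternate projections until the pair stabilizes
    a = proj_box(b, -1.0, 1.0)          # closest point of A to the current b
    b = proj_disk(a, center, radius)    # closest point of B to the current a

y, b0 = a, b                 # the limits approximate the closest pair
p = (b0 - y) / np.linalg.norm(b0 - y)
print(y, b0, p)
```

Here the limit is \(\vec{y}=(1,1)\), \(\vec{b}_0=(3-\tfrac{1}{\sqrt{2}},\,3-\tfrac{1}{\sqrt{2}})\), and \(\vec{p}=(\tfrac{1}{\sqrt{2}},\tfrac{1}{\sqrt{2}})\).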

From \(\vec{p}\cdot \vec{p} > 0\), it follows that \[\vec{p}\cdot \frac{\vec{b}_0-\vec{y}}{\|\vec{b}_0-\vec{y}\|} > 0,\] from which we conclude that \[\begin{gather}\label{separacion-1} \vec{p}\cdot \vec{y} < \vec{p}\cdot \vec{b}_0 \tag{*}. \end{gather}\]

Given \(\eqref{separacion-1}\), it remains to prove that \(\vec{p}\cdot \vec{b}_0 \leq \vec{p}\cdot \vec{b}\) for every \(\vec{b}\in B\) and \(\vec{p}\cdot \vec{a} \leq \vec{p}\cdot \vec{y}\) for every \(\vec{a}\in A\).

Fix \(\vec{a}\in A\) and define the function \[\begin{align*} g: [0,1] &\to \R \cr \lambda &\mapsto \|\vec{b}_0-\vec{y} - \lambda(\vec{a}-\vec{y})\|^2. \end{align*}\] Since \(A\) is convex, \(\vec{y}+\lambda(\vec{a}-\vec{y})\in A\) for every \(\lambda\in[0,1]\), so \(g\) reaches a minimum at \(\lambda=0\). Expanding, \[\begin{align*} g(\lambda) &= \langle \vec{b}_0-\vec{y} - \lambda(\vec{a}-\vec{y}) , \vec{b}_0-\vec{y} - \lambda(\vec{a}-\vec{y}) \rangle \cr &= \langle \vec{b}_0-\vec{y} , \vec{b}_0-\vec{y} \rangle - 2\lambda \langle \vec{b}_0-\vec{y} , \vec{a}-\vec{y}\rangle + \lambda^2 \langle \vec{a}-\vec{y} , \vec{a}-\vec{y} \rangle \cr \frac{\partial g}{\partial \lambda} &= -2\langle \vec{b}_0-\vec{y} , \vec{a}-\vec{y}\rangle + 2\lambda \langle \vec{a}-\vec{y} , \vec{a}-\vec{y} \rangle. \end{align*}\] Hence, \[\left.\frac{\partial g}{\partial \lambda}\right|_{\lambda=0} = 2(\vec{b}_0-\vec{y})\cdot (\vec{y}-\vec{a}) \geq 0.\] Dividing by \(2\|\vec{b}_0-\vec{y}\|\) we conclude that \[\begin{gather}\label{separacion-2} \vec{p}\cdot \vec{a} \leq \vec{p}\cdot \vec{y}. \tag{**} \end{gather}\]
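The first-order condition above can be checked numerically. Using the illustrative disk instance from before (with closest points \(\vec{y}=(1,0)\) and \(\vec{b}_0=(3,0)\), chosen for this example only), sampled points of \(A\) always give a non-negative derivative at \(\lambda=0\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative instance: A = closed unit disk at the origin,
# with closest points y = (1, 0) in A and b0 = (3, 0) in B.
y, b0 = np.array([1.0, 0.0]), np.array([3.0, 0.0])
p = (b0 - y) / np.linalg.norm(b0 - y)

# Sample points a in A and evaluate the first-order condition
# g'(0) = 2 (b0 - y)·(y - a) >= 0, equivalently p·a <= p·y.
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
r = np.sqrt(rng.uniform(0.0, 1.0, 1000))
A = np.column_stack((r * np.cos(theta), r * np.sin(theta)))

deriv = 2.0 * (y - A) @ (b0 - y)
print(deriv.min())  # non-negative for every sampled a
```

For this instance the derivative reduces to \(4(1-a_x)\), which is non-negative exactly because no point of the unit disk has first coordinate above \(1\).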

Fix \(\vec{b}\in B\) and define the function \[\begin{align*} h: [0,1] &\to \R \cr \lambda &\mapsto \|\vec{b}_0-\vec{y} + \lambda(\vec{b}-\vec{b}_0)\|^2. \end{align*}\] Since \(B\) is convex, \(\vec{b}_0+\lambda(\vec{b}-\vec{b}_0)\in B\), so \(h(\lambda) \geq f(\vec{b}_0+\lambda(\vec{b}-\vec{b}_0))^2 \geq f(\vec{b}_0)^2 = h(0)\) and \(h\) reaches a minimum at \(\lambda=0\). By the same argument that leads to \(\eqref{separacion-2}\), we conclude that \[\begin{gather}\label{separacion-3} \vec{p}\cdot \vec{b}_0 \leq \vec{p}\cdot \vec{b}. \tag{***} \end{gather}\]

Finally, chaining \(\eqref{separacion-2}\), \(\eqref{separacion-1}\) and \(\eqref{separacion-3}\) gives \(\vec{p}\cdot \vec{a} \leq \vec{p}\cdot \vec{y} < \vec{p}\cdot \vec{b}_0 \leq \vec{p}\cdot \vec{b}\), hence \(\vec{p}\cdot \vec{a} < \vec{p}\cdot \vec{b}\).

**Corollary**. Let \(A\) and \(B\) be two convex, disjoint and non-empty subsets of \(\R^n\). Then \(C=A-B=\{\vec{a}-\vec{b} : \vec{a}\in A, \vec{b}\in B\}\) is convex and non-empty, and \(\vec{0}\notin C\) since \(A\) and \(B\) are disjoint. There are two cases for this result:

- \(\vec{0} \in \adh{C}\), in which case there exists \(\vec{p}\in \R^n\setminus \{\vec{0}\}\) such that \(\vec{p}\cdot \vec{a} \leq \vec{p}\cdot \vec{b} \:\forall (\vec{a},\vec{b})\in A\times B\).
- \(\vec{0} \notin \adh{C}\), in which case there exists \(\vec{p}\in \R^n\setminus \{\vec{0}\}\) such that \(\vec{p}\cdot \vec{a} < \vec{p}\cdot \vec{b} \:\forall (\vec{a},\vec{b})\in A\times B\).

*Proof*. Again, we divide into cases.

\(\vec{0} \in \adh{C}\): Since \(C\) is convex and \(\vec{0}\notin C\), we have \(\vec{0}\notin \inte{\adh{C}}\) (see the note below), so \(\vec{0}\) is a boundary point of \(\adh{C}\). Therefore, there exists a sequence \(\{\vec{x}_n\}_{n\in \N}\) such that \(\vec{x}_n\notin \adh{C}\) and \(\vec{x}_n\to \vec{0}\) as \(n\to \infty\). Applying the hyperplane separation theorem to the closed convex set \(\adh{C}\) and the compact set \(\{\vec{x}_n\}\), for every \(n\in \N\) there exists a vector \(\vec{p}_n\) such that \(\|\vec{p}_n\|=1\) and \(\vec{p}_n\cdot \vec{c} < \vec{p}_n\cdot \vec{x}_n \:\forall \vec{c}\in \adh{C}\). Since \(\{\vec{p}_n\}_{n\in \N}\) lies in the unit sphere, which is compact, it has a convergent subsequence with limit \(\vec{p}\) satisfying \(\|\vec{p}\|=1\), hence \(\vec{p}\neq \vec{0}\). Passing to the limit, \(\vec{p}\cdot \vec{c} \leq 0 \:\forall \vec{c}\in C\), and writing \(\vec{c}=\vec{a}-\vec{b}\) gives \(\vec{p}\cdot \vec{a} \leq \vec{p}\cdot \vec{b} \:\forall (\vec{a},\vec{b})\in A\times B\).

\(\vec{0} \notin \adh{C}\): Applying the hyperplane separation theorem to the closed convex set \(\adh{C}\) and the compact set \(\{\vec{0}\}\), we conclude that there exists \(\vec{p}\in \R^n\setminus \{\vec{0}\}\) such that \(\vec{p}\cdot \vec{c} < 0 \:\forall \vec{c}\in \adh{C}\). In particular, writing \(\vec{c}=\vec{a}-\vec{b}\) with \(\vec{a}\in A\) and \(\vec{b}\in B\), we obtain \(\vec{p}\cdot \vec{a} < \vec{p}\cdot \vec{b} \:\forall (\vec{a},\vec{b})\in A\times B\).
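This second case can also be checked on a concrete instance. With the two illustrative disks used earlier (an assumption for this example, not part of the corollary), the difference set \(C=A-B\) is itself a disk away from the origin, and the theorem’s construction recovers \(\vec{p}\):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative instance: for A the unit disk at the origin and B the unit
# disk at (4, 0), the difference C = A - B is the disk of radius 2 centered
# at (-4, 0), so the origin is not in adh(C).
center, radius = np.array([-4.0, 0.0]), 2.0

# Nearest point of adh(C) to the origin, and p as in the theorem applied
# to the closed convex set adh(C) and the compact set {0}.
c_star = center + radius * (-center) / np.linalg.norm(center)
p = -c_star / np.linalg.norm(c_star)

# Check the strict inequality p·c < 0 on sampled points of C.
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
r = radius * np.sqrt(rng.uniform(0.0, 1.0, 1000))
C = center + np.column_stack((r * np.cos(theta), r * np.sin(theta)))
print((C @ p).max())
```

Here the nearest point is \((-2,0)\) and \(\vec{p}=(1,0)\), so \(\vec{p}\cdot \vec{c} \leq -2 < 0\) on all of \(C\).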

*Note*. The first case of the latter proof uses the fact that the interior of the adherence of a set is contained within the set. The property is intuitively clear, but it holds only for convex sets. For instance, if \(C=(0,1)\), then \(\adh{C}=[0,1]\) and \(\inte{\adh{C}}=(0,1)\subseteq C\). But if \(C=(0,\frac{1}{2})\cup (\frac{1}{2},1)\), then \(\adh{C}=[0,1]\) and \(\inte{\adh{C}} = (0,1)\), therefore \(\inte{\adh{C}} \nsubseteq C\).