A simple approach to proving the triangle inequality for a given *norm-like* function using convexity.

We walk through a neat, and perhaps lesser-known, result connecting convexity and
the triangle inequality for norms. Using this result, typical proofs of the
triangle inequality for a proposed norm function are significantly simplified.
This exposition is based on (Chapter 3, Robinson 2020)
^{1}.

Normed linear spaces are a natural setting for much of applied mathematics and statistics. These are vector spaces, \(V\), endowed with a norm function, \(\lVert \cdot \rVert_{V}\). Intuitively, a norm gives us a "yardstick" to measure the "lengths" of individual vectors in the given vector space. A standard definition of a norm is as follows:

**Definition 1 (Norms in vector spaces).** For a given vector space \(V\), a norm \(\lVert \cdot \rVert_{V}: V \to \mathbb{R}\)
is a function satisfying the following three properties.

- **Positive definiteness:** For all \(\mathbf{x} \in V\), if \(\lVert \mathbf{x} \rVert = 0\) then \(\mathbf{x} = \mathbf{0}_{V}\).
- **Absolute homogeneity:** \(\lVert \lambda \mathbf{x} \rVert = | \lambda | \lVert \mathbf{x} \rVert\), for all \(\mathbf{x} \in V, \lambda \in \mathbb{R}\).
- **Triangle inequality:** \(\lVert \mathbf{x} + \mathbf{y} \rVert \leq \lVert \mathbf{x} \rVert + \lVert \mathbf{y} \rVert\), for all \(\mathbf{x}, \mathbf{y} \in V\).
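As a quick sanity check (not part of the formal development), the three properties in Definition 1 can be spot-checked numerically for a concrete norm. The sketch below uses the \(\ell^{1}\) norm on \(\mathbb{R}^{3}\); the function name `l1_norm` and the sampling ranges are our own illustrative choices.

```python
import random

def l1_norm(x):
    """The l1 norm ||x||_1 = sum_j |x_j| on R^n."""
    return sum(abs(t) for t in x)

random.seed(0)
for _ in range(500):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    lam = random.uniform(-5, 5)
    # Absolute homogeneity: ||lam * x|| = |lam| * ||x|| (up to float error).
    assert abs(l1_norm([lam * t for t in x]) - abs(lam) * l1_norm(x)) < 1e-9
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||.
    assert l1_norm([a + b for a, b in zip(x, y)]) <= l1_norm(x) + l1_norm(y) + 1e-9

# Positive definiteness holds at the only zero of the norm: ||0_V|| = 0.
assert l1_norm([0.0, 0.0, 0.0]) == 0.0
```

Random sampling is of course only evidence, never a proof, but such checks catch typos in candidate norm functions early.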

*Remark* (Derived properties from Definition 1). We note that a norm, per Definition 1, in fact **implies** the
following properties:

- In Definition 1, we can always replace **positive definiteness** with the stronger claim, namely that \[\begin{equation} \text{For all $\mathbf{x} \in V$, } \lVert \mathbf{x} \rVert = 0 \iff \mathbf{x} = \mathbf{0}_{V}. \end{equation}\] In short, we want to show that the reverse implication to **positive definiteness** always holds, i.e., \(\mathbf{x} = \mathbf{0}_{V} \implies \lVert \mathbf{x} \rVert = 0\). To prove this, suppose \(\mathbf{x} = \mathbf{0}_{V}\), and observe that using **absolute homogeneity** in Definition 1, we have: \[\begin{equation} \lVert \mathbf{x} \rVert = \lVert \mathbf{0}_{V} \rVert = \lVert 0 (\mathbf{0}_{V}) \rVert = |0| \lVert \mathbf{0}_{V} \rVert = 0, \end{equation}\] as required.
- We also have that \(\lVert \mathbf{x} \rVert \geq 0\), for all \(\mathbf{x} \in V\). To see this, observe that for all \(\mathbf{x} \in V\): \[\begin{equation} \begin{split} 0 & = \lVert \mathbf{0}_{V} \rVert \quad\text{(by the previous part of this remark)}\\ & = \lVert \mathbf{x} + (-\mathbf{x}) \rVert \\ &\leq \lVert \mathbf{x} \rVert + \lVert -\mathbf{x} \rVert \quad\text{(by the triangle inequality)} \\ &= \lVert \mathbf{x} \rVert + \lvert -1 \rvert \lVert \mathbf{x} \rVert \quad\text{(by absolute homogeneity)} \\ &= 2 \lVert \mathbf{x} \rVert \\ \implies \lVert \mathbf{x} \rVert & \geq 0. \end{split} \end{equation}\] In effect, this means the co-domain can *always* be changed from \(\lVert \cdot \rVert_{V}: V \to \mathbb{R}\) to \(\lVert \cdot \rVert_{V}: V \to \mathbb{R}_{\geq 0}\).

Since these properties can always be derived directly from Definition 1, as shown, we can keep Definition 1 in its minimal form as stated above.

These ideas work for *seminorms* as well; see here for more details.

**Theorem 1 (Characterization of the norm triangle inequality).** Let \(N: V \to \mathbb{R}_{\geq 0}\) be a function satisfying the following two
properties^{2}.

- **Positive definiteness:** For all \(\mathbf{x} \in V\), if \(N(\mathbf{x}) = 0\) then \(\mathbf{x} = \mathbf{0}_{V}\).
- **Absolute homogeneity:** \(N(\lambda \mathbf{x}) = | \lambda | N(\mathbf{x})\), for all \(\mathbf{x} \in V, \lambda \in \mathbb{R}\).

We then have that: \[\begin{equation} N(\mathbf{x} + \mathbf{y}) \leq N(\mathbf{x}) + N(\mathbf{y}) \text{, for each } \mathbf{x}, \mathbf{y} \in V \iff \mathbb{B} := \{\mathbf{z} \in V \:|\: N(\mathbf{z}) \leq 1 \} \text{ is convex} \tag{1} \end{equation}\]

In simple terms, the importance of Theorem 1 (as captured by Equation (1)) can be summarized as follows:

Let \(N : V \to [0, \infty)\) be a function satisfying positive definiteness and absolute homogeneity. Then \(N\) satisfies the triangle inequality if and only if the unit ball induced by \(N\), i.e., \(\mathbb{B} := \{\mathbf{z} \in V \:|\: N(\mathbf{z}) \leq 1 \}\), is a convex set.

*Remark*. In Theorem 1, we note the following:

- The function \(N : V \to \mathbb{R}_{\geq 0}\) is a *norm-like* function, and only becomes a valid norm per Definition 1 *once* we establish the triangle inequality, i.e., \(N(\mathbf{x} + \mathbf{y}) \leq N(\mathbf{x}) + N(\mathbf{y})\).
- To prove the triangle inequality for \(N : V \to \mathbb{R}_{\geq 0}\), by Theorem 1 it suffices to establish: \[\begin{equation} \mathbb{B} := \{\mathbf{z} \in V \:|\: N(\mathbf{z}) \leq 1 \} \text{ is convex,} \tag{2} \end{equation}\] which then implies the triangle inequality for \(N\) - huzzah!
- The nice thing is, the convexity of \(\mathbb{B}\) can be *much easier* to show than proving the triangle inequality for \(N\) directly, as we will soon see.
- **Subtle point:** note that here we had to *assume* that the co-domain of \(N\) is non-negative (not \(\mathbb{R}\)), i.e., \(N : V \to \mathbb{R}_{\geq 0}\). This is because a typical norm, which satisfies the triangle inequality, can always be shown to be non-negative (see the remark below Definition 1 for more details). Here we *impose* non-negativity of \(N\) as an additional constraint in order to *establish* the triangle inequality for \(N\). This is not an issue, since one would always first check the non-negativity of a candidate *norm-like* function \(N\).
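To build intuition for the equivalence in Theorem 1 before seeing the proof, here is a small numerical sketch (our own, not from Robinson 2020). For \(N_{\ell^{p}}\) on \(\mathbb{R}^{2}\), both sides of Equation (1) hold when \(p = 2\), and both fail when \(p = 1/2\), consistent with the characterization; random sampling is evidence, not a proof.

```python
import random

def N_p(x, p):
    """Candidate norm-like function N(x) = (sum_j |x_j|^p)^(1/p)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def triangle_holds(p, trials=2000, tol=1e-9):
    """Empirically check N(x + y) <= N(x) + N(y) on random pairs."""
    rng = random.Random(1)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(2)]
        y = [rng.uniform(-1, 1) for _ in range(2)]
        s = [a + b for a, b in zip(x, y)]
        if N_p(s, p) > N_p(x, p) + N_p(y, p) + tol:
            return False
    return True

def ball_convex(p, trials=20000, tol=1e-9):
    """Empirically check convexity of B = {z : N(z) <= 1}."""
    rng = random.Random(2)
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(2)]
        y = [rng.uniform(-1, 1) for _ in range(2)]
        if N_p(x, p) <= 1 and N_p(y, p) <= 1:
            lam = rng.random()
            z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
            if N_p(z, p) > 1 + tol:
                return False
    return True

assert triangle_holds(2) and ball_convex(2)              # p = 2: both sides of (1) hold
assert not triangle_holds(0.5) and not ball_convex(0.5)  # p = 1/2: both sides fail
```

The \(p = 1/2\) case is exactly why the restriction \(p \geq 1\) appears in the examples below: its unit "ball" is star-shaped but not convex.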

Before getting into the details of the proof, let's just see what Theorem 1 can do! We'll consider two related applications, taken from (Lemma 3.6 and Example 3.13, Robinson 2020), respectively.

**Example 1 (Minkowski inequality in finite dimensions).** Let us consider \((\mathbb{F}^{n}, \mathbb{F})\), where \(\mathbb{F} = \mathbb{R} \text{ or } \mathbb{C}\).
We then define the *norm-like* function \(N_{\ell^{p}}: \mathbb{F}^{n} \to \mathbb{R}_{\geq 0}\):
\[\begin{equation}
N_{\ell^{p}}(\mathbf{x})
:=\left(\sum_{j=1}^{n}\left|x_{j}\right|^{p}\right)^{1 / p}
, \quad 1 \leq p<\infty
\tag{3}
\end{equation}\]

One can show that \(N_{\ell^{p}}\) satisfies positive definiteness and absolute homogeneity. To show that \(N_{\ell^{p}}\) is a norm we need to prove the triangle inequality, and we will use Theorem 1 to do so. Let us define \(\mathbb{B} := \{\mathbf{z} \in \mathbb{F}^{n} \:|\: N_{\ell^{p}}(\mathbf{z}) \leq 1 \} = \{\mathbf{z} \in \mathbb{F}^{n} \:|\: N_{\ell^{p}}^{p}(\mathbf{z}) \leq 1 \}\). We now need to show that \(\mathbb{B}\) is convex. We will use the fact that the map \(t \mapsto |t|^{p}\) is convex on \(\mathbb{R}\) for \(p \geq 1\). Let \(\mathbf{x}, \mathbf{y} \in \mathbb{B}\); we then have that for \(\lambda \in [0, 1]\):

\[\begin{equation} \begin{split} N_{\ell^{p}}^{p}(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y}) & = \sum_{j=1}^{n}\left|\lambda x_{j}+(1-\lambda) y_{j}\right|^{p} \quad\text{(by definition)} \\ & \leq \sum_{j=1}^{n}\left(\lambda\left|x_{j}\right|+(1-\lambda)\left|y_{j}\right|\right)^{p} \quad\text{(by the triangle inequality for $\lvert \cdot \rvert$ in $\mathbb{F}$)} \\ & \leq \sum_{j=1}^{n} \lambda\left|x_{j}\right|^{p}+(1-\lambda)\left|y_{j}\right|^{p} \quad\text{(since $t \mapsto |t|^{p}$ is convex on $\mathbb{R}$)} \\ & = \lambda \sum_{j=1}^{n} \left|x_{j}\right|^{p} + (1 - \lambda) \sum_{j=1}^{n} \left|y_{j}\right|^{p} \\ & \leq 1 \quad\text{(since $\mathbf{x}, \mathbf{y} \in \mathbb{B}$)} \end{split} \end{equation}\]

It follows that \(N_{\ell^{p}}(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y}) \leq 1\), and so \(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y} \in \mathbb{B}\), as required \(\blacksquare\).

In fact, since \(N_{\ell^{p}}\) satisfies the three conditions for a norm per Definition 1, we can now denote it using the conventional \(\ell^{p}\)-norm notation, i.e., \(\| \mathbf{x} \|_{\ell^{p}} := N_{\ell^{p}}(\mathbf{x})\).
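As a numerical spot-check of Example 1 (ours, illustrative only), Minkowski's inequality can be verified on random complex vectors for several values of \(p\). Python's built-in `abs` returns the modulus of a complex number, so the same code covers both \(\mathbb{F} = \mathbb{R}\) and \(\mathbb{F} = \mathbb{C}\).

```python
import random

def N_lp(x, p):
    """N_{l^p}(x) = (sum_j |x_j|^p)^(1/p); abs() is the modulus, so complex entries work."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(3)
for p in (1, 1.5, 2, 3):
    for _ in range(500):
        x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
        y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
        s = [a + b for a, b in zip(x, y)]
        # Minkowski: N(x + y) <= N(x) + N(y), with a small floating-point slack.
        assert N_lp(s, p) <= N_lp(x, p) + N_lp(y, p) + 1e-9
```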

We can similarly prove the triangle inequality for norms involving integrals efficiently. This is seen in the next example.

**Example 2 (Minkowski inequality in integral form).** Let us consider \((C([0, 1]), \mathbb{R})\). We then define the *norm-like*
function \(N_{L^{p}}: C([0, 1]) \to \mathbb{R}_{\geq 0}\):
\[\begin{equation}
N_{L^{p}}(f)
:=\left(\int_{0}^{1}\left|f(x)\right|^{p} \, dx\right)^{1 / p}
, \quad 1 \leq p<\infty
\tag{4}
\end{equation}\]

One can again show that \(N_{L^{p}}\) satisfies positive definiteness and absolute homogeneity, so only the triangle inequality remains.

Let us define \(\mathbb{B} := \{h \in C([0, 1]) \:|\: N_{L^{p}}(h) \leq 1 \} = \{h \in C([0, 1]) \:|\: N_{L^{p}}^{p}(h) \leq 1 \}\). We now need to show that \(\mathbb{B}\) is convex. Let \(f, g \in \mathbb{B}\); we then have that for \(\lambda \in [0, 1]\):

\[\begin{equation} \begin{split} N_{L^{p}}^{p}(\lambda f + (1 - \lambda) g) & = \int_{0}^{1}|\lambda f(x) + (1-\lambda) g(x)|^{p} \, dx \quad\text{(by definition)} \\ & \leq \int_{0}^{1}\left(\lambda |f(x)|^{p} + (1-\lambda) |g(x)|^{p}\right) dx \quad\text{(since $t \mapsto |t|^{p}$ is convex on $\mathbb{R}$)} \\ & = \lambda \int_{0}^{1} |f(x)|^{p} \, dx + (1-\lambda) \int_{0}^{1} |g(x)|^{p} \, dx \\ & \leq 1 \quad\text{(since $f, g \in \mathbb{B}$)} \end{split} \end{equation}\]

It follows that \(N_{L^{p}}(\lambda f + (1 - \lambda) g) \leq 1\), and so \(\lambda f + (1 - \lambda) g \in \mathbb{B}\), as required \(\blacksquare\).

Again, we can now denote \(N_{L^{p}}\) using the conventional \(L^{p}\)-norm notation, i.e., \(\| f \|_{L^{p}} := N_{L^{p}}(f)\).
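The integral version can likewise be spot-checked numerically. Below is our own sketch; the midpoint-rule discretization of the integral and the particular test functions are arbitrary illustrative choices.

```python
import math

def N_Lp(f, p, n=2000):
    """Midpoint-rule approximation of (integral_0^1 |f(x)|^p dx)^(1/p)."""
    h = 1.0 / n
    return (h * sum(abs(f((j + 0.5) * h)) ** p for j in range(n))) ** (1.0 / p)

f = lambda x: math.sin(2 * math.pi * x)  # two continuous functions on [0, 1]
g = lambda x: x * x - 0.5

for p in (1, 2, 3):
    # Minkowski in integral form: N(f + g) <= N(f) + N(g).
    lhs = N_Lp(lambda x: f(x) + g(x), p)
    rhs = N_Lp(f, p) + N_Lp(g, p)
    assert lhs <= rhs + 1e-9
```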

We just saw that applying Theorem 1 enabled us to write
**very short proofs** of
**Minkowski's inequality**
in \(\mathbb{F}^{n}\) and \(C([0, 1])\).

To appreciate this approach, note that proving Minkowski's inequality typically requires
one to first prove
**Young's inequality**
and then
**Hölder's inequality**.
Moreover, these need to be done separately in \(\mathbb{F}^{n}\) and
\(C([0, 1])\). Using Theorem 1 allowed us to achieve
both of these goals using near-identical proofs 🎉!

Assuming \(N: V \to \mathbb{R}_{\geq 0}\) satisfies the two properties in Theorem 1, we need to prove both implications in Equation (1).

*Proof*. (\(\implies\)) We proceed directly.

Assume that \(N(\mathbf{x} + \mathbf{y}) \leq N(\mathbf{x}) + N(\mathbf{y})\), for each \(\mathbf{x}, \mathbf{y} \in V\). Let \(\lambda \in [0, 1]\) be arbitrary. We need to show that this implies for each \(\mathbf{x}, \mathbf{y} \in \mathbb{B}\) that the expression \(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y} \in \mathbb{B}\) holds. This implies the convexity of \(\mathbb{B}\).

We observe that for \(\lambda \in \{0, 1\}\) our required expression is equal to either \(\mathbf{x}\) or \(\mathbf{y}\) which are both in \(\mathbb{B}\), by assumption. Now fix \(\lambda \in (0, 1)\). We then note:

\[\begin{equation} \begin{split} N(\lambda \mathbf{x} + (1 - \lambda) \mathbf{y}) & \leq N(\lambda \mathbf{x}) + N((1 - \lambda) \mathbf{y}) \quad\text{($N$ satisfies the triangle inequality)} \\ & = \lvert \lambda \rvert N(\mathbf{x}) + \lvert 1 - \lambda \rvert N(\mathbf{y}) \quad\text{(by absolute homogeneity of $N$)} \\ & = \lambda N(\mathbf{x}) + (1 - \lambda) N(\mathbf{y}) \quad\text{(since $\lambda \in (0, 1)$)} \\ & \leq (\lambda) (1) + (1 - \lambda) (1) \quad\text{(since $N(\mathbf{z}) \leq 1$, for $\mathbf{z} \in \mathbb{B}$)} \\ & = 1 \\ \implies \lambda \mathbf{x} + (1 - \lambda) \mathbf{y} & \in \mathbb{B} \end{split} \end{equation}\]

This implies the convexity of \(\mathbb{B}\), as required. \(\blacksquare\)

*Proof*. (\(\impliedby\)) We proceed directly.

Assume \(\mathbb{B}\) is a convex set. We need to show that this implies that \(N(\mathbf{x} + \mathbf{y}) \leq N(\mathbf{x}) + N(\mathbf{y})\), for each \(\mathbf{x}, \mathbf{y} \in V\).

Let \(\mathbf{x}, \mathbf{y} \in V\). We will consider four cases.

**Case 1:** Let \(\mathbf{x} = \mathbf{y} = \mathbf{0}_{V}\).
Then \(N(\mathbf{x}) = N(\mathbf{y}) = N(\mathbf{0}_{V}) = N(0 \mathbf{0}_{V}) = |0| N(\mathbf{0}_{V})= 0\), by absolute homogeneity of \(N\). Indeed we then
have that \(N(\mathbf{x} + \mathbf{y}) = N(\mathbf{0}_{V}) = 0 = N(\mathbf{x}) + N(\mathbf{y})\),
as required.

**Case 2:** Let \(\mathbf{x} = \mathbf{0}_{V}, \mathbf{y} \in V \setminus \{\mathbf{0}_{V}\}\).
Then \(N(\mathbf{x}) = 0\), and it follows that
\(N(\mathbf{x} + \mathbf{y}) = N(\mathbf{0}_{V} + \mathbf{y}) = N(\mathbf{y}) = 0 + N(\mathbf{y}) = N(\mathbf{x}) + N(\mathbf{y})\), as required.

**Case 3:** Let \(\mathbf{x} \in V \setminus \{\mathbf{0}_{V}\}, \mathbf{y} = \mathbf{0}_{V}\).
Same as **Case 2**, with the roles of \(\mathbf{x}, \mathbf{y}\) reversed.

**Case 4:** Let \(\mathbf{x}, \mathbf{y} \in V \setminus \{\mathbf{0}_{V}\}\). It
then follows that \(N(\mathbf{x}), N(\mathbf{y}) > 0\), since
\(N(\mathbf{z}) \geq 0\), for each \(\mathbf{z} \in V\), and
\(N(\mathbf{z}) = 0 \iff \mathbf{z} = \mathbf{0}_{V}\). Moreover we then have that
\(\lvert N(\mathbf{x}) \rvert = N(\mathbf{x}) > 0\) and
\(\lvert N(\mathbf{y}) \rvert = N(\mathbf{y}) > 0\).
So we can safely divide by these quantities. Let us then define
\(\tilde{\mathbf{x}} := \frac{\mathbf{x}}{N(\mathbf{x})}, \tilde{\mathbf{y}} := \frac{\mathbf{y}}{N(\mathbf{y})}\).
We then have by absolute homogeneity of \(N\) that, \(N(\tilde{\mathbf{x}}) := N\left(\frac{\mathbf{x}}{N(\mathbf{x})}\right) = \left \lvert \frac{1}{N(\mathbf{x})} \right \rvert N(\mathbf{x}) = 1 \implies \tilde{\mathbf{x}} \in \mathbb{B}\). Similarly \(\tilde{\mathbf{y}} \in \mathbb{B}\).
Let us denote \(\lambda := \frac{N(\mathbf{x})}{N(\mathbf{x}) + N(\mathbf{y})} \in (0, 1)\),
and \(\mathbf{z} := \frac{\mathbf{x} + \mathbf{y}}{N(\mathbf{x}) + N(\mathbf{y})}\).
We then have:
\[\begin{equation}
\begin{split}
\mathbf{z}
& := \frac{\mathbf{x} + \mathbf{y}}{N(\mathbf{x}) + N(\mathbf{y})} \\
& = \left(\frac{N(\mathbf{x})}{N(\mathbf{x}) + N(\mathbf{y})}\right)
\left(\frac{\mathbf{x}}{N(\mathbf{x})}\right) +
\left(\frac{N(\mathbf{y})}{N(\mathbf{x}) + N(\mathbf{y})}\right)
\left(\frac{\mathbf{y}}{N(\mathbf{y})}\right) \\
& = \left(\frac{N(\mathbf{x})}{N(\mathbf{x}) + N(\mathbf{y})}\right)
\tilde{\mathbf{x}} +
\left(\frac{N(\mathbf{y})}{N(\mathbf{x}) + N(\mathbf{y})}\right)
\tilde{\mathbf{y}} \\
& = \lambda \tilde{\mathbf{x}} +
(1 - \lambda) \tilde{\mathbf{y}} \\
& \in \mathbb{B}
\end{split}
\end{equation}\]

where the final membership holds by the assumed convexity of \(\mathbb{B}\). We then have that \(\mathbf{z} := \frac{\mathbf{x} + \mathbf{y}}{N(\mathbf{x}) + N(\mathbf{y})} \in \mathbb{B} \implies N(\mathbf{z}) \leq 1\). We then observe: \[\begin{equation} \begin{split} N(\mathbf{z}) & \leq 1 \quad\text{(since $\mathbf{z} \in \mathbb{B}$)} \\ \iff N\left( \frac{\mathbf{x} + \mathbf{y}}{N(\mathbf{x}) + N(\mathbf{y})}\right) & \leq 1 \quad\text{(by definition of $\mathbf{z}$)} \\ \iff \left\lvert \frac{1}{N(\mathbf{x}) + N(\mathbf{y})}\right\rvert N(\mathbf{x} + \mathbf{y}) & \leq 1 \quad\text{(by absolute homogeneity of $N$)} \\ \iff \frac{1}{N(\mathbf{x}) + N(\mathbf{y})} N(\mathbf{x} + \mathbf{y}) & \leq 1 \quad\text{(since $N(\mathbf{x}), N(\mathbf{y}) > 0$)} \\ \iff N(\mathbf{x} + \mathbf{y}) & \leq N(\mathbf{x}) + N(\mathbf{y}) \end{split} \end{equation}\]

As required. \(\blacksquare\)
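The normalization trick in Case 4 is easy to see numerically. Below is our own sketch using the Euclidean norm on \(\mathbb{R}^{2}\) as a concrete \(N\) satisfying the hypotheses of Theorem 1: the rescaled point \(\mathbf{z}\) is exactly the convex combination \(\lambda \tilde{\mathbf{x}} + (1 - \lambda)\tilde{\mathbf{y}}\) of two unit vectors, so its membership in \(\mathbb{B}\) unwinds to the triangle inequality.

```python
import math

def N(x):
    """Euclidean norm on R^2, a concrete N for the demonstration."""
    return math.sqrt(sum(t * t for t in x))

x, y = [3.0, 0.0], [0.0, 4.0]                        # both nonzero (Case 4)
lam = N(x) / (N(x) + N(y))                           # lambda in (0, 1)
x_t = [t / N(x) for t in x]                          # x-tilde, with N(x_t) == 1
y_t = [t / N(y) for t in y]                          # y-tilde, with N(y_t) == 1
z = [(a + b) / (N(x) + N(y)) for a, b in zip(x, y)]

# z coincides with the convex combination lam * x_t + (1 - lam) * y_t ...
combo = [lam * a + (1 - lam) * b for a, b in zip(x_t, y_t)]
assert all(abs(a - b) < 1e-12 for a, b in zip(z, combo))

# ... so convexity of B gives N(z) <= 1, i.e. N(x + y) <= N(x) + N(y).
assert N(z) <= 1 + 1e-12
assert N([a + b for a, b in zip(x, y)]) <= N(x) + N(y) + 1e-12
```

Here \(\lambda = 3/7\), \(\mathbf{z} = (3/7, 4/7)\), and \(N(\mathbf{z}) = 5/7 \leq 1\), matching the algebra in the proof.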

In this article we learned the following about Theorem 1:

- It gives an alternative way to **characterize** the triangle inequality for *norm-like* functions.
- Using this characterization, **we can prove the triangle inequality** for such norm-like functions via the convexity of the unit ball induced by such functions.
- This is usually easier, since we have lots of **tools from convex analysis** to help us prove the convexity of \(\mathbb{B}\).
- We saw this in action, since Theorem 1 enabled us to write **very short proofs** of **Minkowski's inequality** in \(\mathbb{F}^{n}\) and \(C([0, 1])\).

In summary, if you have a norm-like function for which you are trying to establish the triangle inequality, try out Theorem 1 💯!

We thank Prof. James Robinson for providing several technical clarifications on Theorem 1.

We thank Mikhail Popov for providing this helpful guide to setting up Context Cards in `Rmd` files. These Context Cards enable the hover-over preview for Wikipedia articles.
We thank Jewel Johnson for providing this helpful guide to enabling a fixed TOC for this article.
We thank Dr. Joel Nitta for providing these instructions, which enabled us to switch to the giscus comments system.
Many of these distill site improvements were brought to our attention by the excellent distillery site run by Prof. John Paul Helveston.

Robinson, James C. 2020. *An Introduction to Functional Analysis*. Cambridge University Press.

**Note:** The presentation in this post is *intentionally* verbose. The goal is to give lots of intuition for the key result and its usefulness, and to ensure that the proofs are rigorous. It is written with an empathetic mindset toward newcomers, and to myself for future reference.↩︎

We refer to such an \(N: V \to \mathbb{R}_{\geq 0}\) satisfying these properties as a *norm-like* function.↩︎

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY-ND 4.0. Source code is available at https://github.com/shamindras/ss_personal_distill_blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Shrotriya (2022, Feb. 12). Shamindra Shrotriya: Characterizing norm triangle inequalities via convexity. Retrieved from https://www.shamindras.com/posts/2021-12-31-shrotriya2021normtriconvexity/

BibTeX citation

@misc{shrotriya2022normconvexity,
  author = {Shrotriya, Shamindra},
  title = {Shamindra Shrotriya: Characterizing norm triangle inequalities via convexity},
  url = {https://www.shamindras.com/posts/2021-12-31-shrotriya2021normtriconvexity/},
  year = {2022}
}