<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="../feed.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
<title>Susam's Mathematics Pages</title>
<link>https://susam.net/tag/mathematics.html</link>
<atom:link rel="self" type="application/rss+xml" href="https://susam.net/tag/mathematics.xml"/>
<description>Feed for Susam's Mathematics Pages</description>

<item>
<title>Mar '26 Notes</title>
<link>https://susam.net/26c.html</link>
<guid isPermaLink="false">mtsnt</guid>
<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  This is my third set of <a href="tag/notes.html">monthly notes</a>
  for this year.  In these notes, I capture various interesting facts
  and ideas I have stumbled upon during the month.  Like in the last
  two months, I have been learning and exploring algebraic graph
  theory.  The two main books I have been reading are <em>Algebraic
  Graph Theory</em> by Godsil and Royle and <em>Algebraic Graph
  Theory</em>, 2nd ed. by Norman Biggs.  Much of what appears here
  comes from my study of these books as well as my own explorations
  and attempts to distill the ideas.  This post is quite heavy on
  mathematics but there are some non-mathematical, computing-related
  notes towards the end.
</p>
<p>
  The level of exposition is quite uneven throughout these notes.
  After all, they aren't meant to be a polished exposition but rather
  notes I take for myself.  In some places I build concepts from first
  principles, while in others I gloss over details and focus only on
  the main results.
</p>
<p>
  Sometime during the second half of the month, I also
  developed an open-source tool called
  <a href="https://codeberg.org/susam/wander">Wander Console</a> on a
  whim.  It lets anyone with a website host a decentralised web
  console that recommends interesting websites from the 'small web' of
  independent, personal websites.  Check my console
  here: <a href="wander/">wander/</a>.
</p>
<p>
  Although the initial version was ready after just about 1.5 hours of
  development during a break I was taking from studying algebraic
  graph theory, the
  subsequent <a href="https://news.ycombinator.com/item?id=47422759">warm
  reception on Hacker News</a> and a
  <a href="https://codeberg.org/susam/wander/issues/1">growing
  community</a> around it, along with the resulting feature requests
  and bug fixes, ended up taking more time than I had anticipated, at
  the expense of my algebraic graph theory studies.  With a full-time
  job, it becomes difficult to find time for both open source
  development and mathematical studies.  But eventually, I managed to
  return to my studies while making Wander Console improvements only
  occasionally during breaks from my studies.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ol>
  <li><a href="#group-theory">Group Theory</a>
    <ol type="a">
      <li><a href="#permutation">Permutation</a></li>
      <li><a href="#group-homomorphism">Group Homomorphism</a></li>
      <li><a href="#group-homomorphism-preserves-identities">Group Homomorphism Preserves Identity</a></li>
      <li><a href="#group-homomorphism-preserves-inverses">Group Homomorphism Preserves Inverses</a></li>
      <li><a href="#image-of-a-group-homomorphism">Image of a Group Homomorphism</a></li>
      <li><a href="#group-monomorphism">Group Monomorphism</a>
        <ol type="i">
          <li><a href="#standard-proof">Standard Proof</a></li>
          <li><a href="#alternate-arrangement">Alternate Proof</a></li>
        </ol>
      </li>
      <li><a href="#permutation-representation">Permutation Representation</a></li>
      <li><a href="#group-action">Group Action</a>
        <ol type="i">
          <li><a href="#why-right-action">Why Right Action?</a></li>
          <li><a href="#group-action-example-1">Example 1</a></li>
          <li><a href="#group-action-example-2">Example 2</a></li>
        </ol>
      </li>
      <li><a href="#group-actions-induce-permutations">Group Actions Induce Permutations</a></li>
      <li><a href="#group-actions-determine-permutation-representations">Group Actions Determine Permutation Representations</a></li>
      <li><a href="#permutation-representations-determine-group-actions">Permutation Representations Determine Group Actions</a></li>
      <li><a href="#bijection-between-group-actions-and-permutation-representations">Bijection Between Group Actions and Permutation Representations</a></li>
      <li><a href="#orbits">Orbits</a></li>
      <li><a href="#stabilisers">Stabilisers</a></li>
      <li><a href="#orbit-stabiliser-theorem">Orbit-Stabiliser Theorem</a></li>
      <li><a href="#faithful-actions">Faithful Actions</a></li>
      <li><a href="#semiregular-actions">Semiregular Actions</a></li>
      <li><a href="#transitive-actions">Transitive Actions</a></li>
      <li><a href="#conjugacy">Conjugacy</a>
        <ol type="i">
          <li><a href="#conjugation-as-group-action">Conjugation as Group Action</a></li>
          <li><a href="#right-conjugation-vs-left-conjugation">Right Conjugation vs Left Conjugation</a></li>
        </ol>
      </li>
      <li><a href="#conjugate-groups">Conjugate Subgroups</a></li>
      <li><a href="#conjugacy-of-stabilisers">Conjugacy of Stabilisers</a></li>
    </ol>
  </li>
  <li><a href="#algeraic-graph-theory">Algebraic Graph Theory</a>
    <ol type="a">
      <li><a href="#stabiliser-index">Stabiliser Index</a></li>
      <li><a href="#strongly-connected-directed-graph">Strongly Connected Directed Graph</a></li>
      <li><a href="#shunting">Shunting</a></li>
      <li><a href="#automorphisms-preserve-successor-relation">Automorphisms Preserve Successor Relation</a></li>
      <li><a href="#test-of-s-arc-transitivity">Test of \( s \)-arc Transitivity</a></li>
      <li><a href="#moore-graphs">Moore Graphs</a></li>
      <li><a href="#generalised-polygons">Generalised Polygons</a></li>
    </ol>
  </li>
  <li><a href="#computing">Computing</a>
    <ol type="a">
      <li><a href="#select-between-lines-inclusive">Select Between Lines, Inclusive</a></li>
      <li><a href="#select-between-lines-exclusive">Select Between Lines, Exclusive</a></li>
      <li><a href="#signing-and-verification-with-ssh-key">Signing and Verification with SSH Key</a></li>
      <li><a href="#block-ip-address-with-nftables">Block IP Address with nftables</a></li>
      <li><a href="#debian-logrotate-setup">Debian Logrotate Setup</a></li>
    </ol>
  </li>
</ol>
<h2 id="group-theory">Group Theory<a href="#group-theory"></a></h2>
<h3 id="permutation">Permutation<a href="#permutation"></a></h3>
<p>
  A <em>permutation</em> of a set \( X \) is a bijection \( X \to X
 .  \)
</p>
<p>
  For example, take \( X = \{ 1, 2, 3, 4, 5, 6 \} \) and define the
  map

  \[
    \pi : X \to X; \; x \mapsto 1 + ((x + 1) \bmod 6).
  \]

  This maps

  \begin{align*}
    1 &amp;\mapsto 3, \\
    2 &amp;\mapsto 4, \\
    3 &amp;\mapsto 5, \\
    4 &amp;\mapsto 6, \\
    5 &amp;\mapsto 1, \\
    6 &amp;\mapsto 2.
  \end{align*}
</p>
<p>
  We can describe permutations more succinctly using cycle notation.
  The cycle notation of a permutation \( \pi \) consists of one or
  more sequences written next to each other such that the sequences
  are pairwise disjoint and \( \alpha \) maps each element in a
  sequence to the next element on its right.  If the sequence is
  finite, then \( \alpha \) maps the final element back to the first
  one.  Any element that does not appear in any sequence is mapped to
  itself.  For example the cycle notation for the above permutation is
  \( (1 3 5)(2 4 6).  \)
</p>
<h3 id="group-homomorphism">Group Homomorphism<a href="#group-homomorphism"></a></h3>
<p>
  A map \( \phi : G \to H \) from a group \( (G, \ast) \) to a group
  \( (H, \cdot) \) is a <em>group homomorphism</em> if, for all \( x, y
  \in G, \)

  \[
    \phi(x \ast y) = \phi(x) \cdot \phi(y).
  \]

  We say that a group homomorphism is a map between groups that
  <em>preserves</em> the group operation.  In other words, a group
  homomorphism <em>sends</em> products in \( G \) to products in \( H
 .  \)  For example, consider the groups \( (\mathbb{Z}, +) \) and \(
  (\mathbb{Z}_3, +).  \)  Then the map

  \[
    \phi : \mathbb{Z} \to \mathbb{Z}_3; \; n \mapsto n \bmod 3
  \]

  is a group homomorphism because

  \[
    \phi(x + y)
    = (x + y) \bmod 3
    = (x \bmod 3) + (y \bmod 3)
    = \phi(x) + \phi(y)
  \]

   for all \( x, y \in \mathbb{Z}.  \)  As another example, consider
   the groups \( (\mathbb{R}_{\gt 0}, \times) \) and \( (\mathbb{R},
   +).  \)  Then the map

  \[
    \log : \mathbb{R}_{\gt 0} \to \mathbb{R}
  \]

  is a group homomorphism because

  \[
    \log(m \times n) = \log m + \log n.
  \]

  Note that a group homomorphism preserves the identity element.  For
  example, \( 1 \) is the identity element of \( (\mathbb{R}_{\gt 0},
  \times) \) and \( 0 \) is the identity element of \( (\mathbb{R}, +)
  \) and indeed \( \log 1 = 0.  \)  Also, a group homomorphism
  preserves inverses.  Indeed \( \log m^{-1} = -\log m \) for all \( m
  \in \mathbb{R}_{\gt 0}.  \)  These observations are proved in the
  next two sections.
</p>
<h3 id="group-homomorphism-preserves-identities">Group Homomorphism Preserves Identity<a href="#group-homomorphism-preserves-identities"></a></h3>
<p>
  Let \( \phi : G \to H \) be a group homomorphism from \( (G, \ast)
  \) to \( (H, \cdot).  \)  Let \( e_1 \) be the identity in \( G \)
  and let \( e_2 \) be the identity in \( H.  \)  Then \( \phi(e_1) =
  e_2.  \)


  The proof is straightforward.  Note first that

  \[
    \phi(e_1) \cdot \phi(e_1)
    = \phi(e_1 \ast e_1)
    = \phi(e_1)
  \]

  Multiplying both sides on the right by \( \phi(e_1)^{-1}, \) we get

  \[
    (\phi(e_1) \cdot \phi(e_1)) \cdot \phi(e_1)^{-1}
    = \phi(e_1) \cdot \phi(e_1)^{-1}.
  \]

  Using the associative and inverse properties of groups, we can
  simplify both sides to get

  \[
    \phi(e_1) = e_2.
  \]
</p>
<h3 id="group-homomorphism-preserves-inverses">Group Homomorphism Preserves Inverses<a href="#group-homomorphism-preserves-inverses"></a></h3>
<p>
  Let \( \phi : G \to H \) be a group homomorphism from \( (G, \ast)
  \) to \( (H, \cdot).  \)  Let \( e_1 \) be the identity in \( G \)
  and let \( e_2 \) be the identity in \( H.  \)  Then for all \( x \in
  G, \) \(\phi(x^{-1}) = (\phi(x))^{-1}.  \)

  The proof of this is straightforward too.  Note that

  \[
    \phi(x) \cdot \phi(x^{-1})
    = \phi(x \ast x^{-1})
    = \phi(e_1)
    = e_2.
  \]

  Thus \( \phi(x^{-1}) \) is an inverse of \( \phi(x), \) so

  \[
    \phi(x^{-1}) = (\phi(x))^{-1}.
  \]

  The image of the inverse of an element is the inverse of the image
  of that element.
</p>
<h3 id="image-of-a-group-homomorphism">Image of a Group Homomorphism<a href="#image-of-a-group-homomorphism"></a></h3>
<p>
  Let \( \phi : G \to H \) be a group homomorphism.  Then the image of
  the \( \phi, \) denoted

  \[
    \phi(G) = \{ \phi(x) : x \in G \}
  \]

  is a subgroup of \( H.  \)  We will prove this now.
</p>
<p>
  Let \( a, b \in \phi(G).  \)  Then \( a = \phi(x) \) and \( b =
  \phi(y) \) for some \( x, y \in G.  \)  Now \( ab = \phi(x)\phi(y) =
  \phi(xy) \in \phi(G).  \)  Therefore \( \phi(G) \) satisfies the
  closure property.
</p>
<p>
  Let \( e_1 \) and \( e_2 \) be the identities in \( G \) and \( H \)
  respectively.  Since a group homomorphism preserves the identity, \(
  \phi(e_1) = e_2.  \)  Hence the identity of \( H \) lies in \(
  \phi(G).  \)
</p>
<p>
  Finally, let \( a \in \phi(G).  \)  Then \( a = \phi(x) \) for some
  \( x \in G.  \)  Then \( a^{-1} = \phi(x)^{-1} = \phi(x^{-1}) \in
  \phi(G).  \)  Therefore \( \phi(G) \) satisfies the inverse property
  as well.  Therefore \( \phi(G) \) is a subgroup of \( H.  \)
</p>
<h3 id="group-monomorphism">Group Monomorphism<a href="#group-monomorphism"></a></h3>
<p>
  A map \( \phi : G \to H \) from a group \( (G, \ast) \) to a group
  \( (H, \cdot) \) is a <em>group monomorphism</em> if \( \phi \) is a
  homomorphism and is injective.  In other words, a homomorphism \(
  \phi \) is called a monomorphism if, for all \( x, y \in G, \)

  \[
  \phi(x) = \phi(y) \implies x = y.
  \]

  Let \( e_1 \) be the identity element of \( G \) and let \( e_2 \)
  be the identity element of \( H.  \)  A useful result in group theory
  states that a homomorphism \( \phi : G \to H \) is a monomorphism if
  and only if its kernel is trivial, i.e.

  \[
    \ker(\phi) = \{ x \in G : \phi(x) = e_2 \} = \{ e_1 \}.
  \]

  Let us prove this now.
</p>
<h4 id="standard-proof">Standard Proof<a href="#standard-proof"></a></h4>
<p>
  Suppose \( \phi : G \to H \) is a monomorphism.  Since a
  homomorphism preserves the identity element, we have \( \phi(e_1) =
  e_2.  \)  Therefore

  \[
    e_1 \in \ker(\phi).
  \]

  Let \( x \in \ker(\phi).  \)  Then \( \phi(x) = e_2 = \phi(e_1).  \)
  Since \( \phi \) is injective, \( x = e_1.  \)  Therefore

  \[
    \ker(\phi) = \{ e_1 \}.
  \]

  Conversely, suppose \( \ker(\phi) = \{ e_1 \}.  \)  Let \( x, y \in G
  \) such that \( \phi(x) = \phi(y).  \)  Then

  \[
    \phi(x \ast y^{-1})
    = \phi(x) \cdot \phi(y^{-1})
    = \phi(x) \cdot (\phi(y))^{-1}
    = \phi(y) \cdot (\phi(y))^{-1}
    = e_2.
  \]

  Hence

  \[
    x \ast y^{-1} \in \ker(\phi) = \{ e_1 \},
  \]

  so

  \[
    x \ast y^{-1} = e_1.
  \]

  Multiplying both sides on the right by \( y, \) we obtain

  \[
    x = y.
  \]

  This completes the proof.
</p>
<h4 id="alternate-arrangement">Alternate Proof<a href="#alternate-arrangement"></a></h4>
<p>
  Here I briefly discuss an alternate way to think about the above
  proof.  The above proof is how most texts usually present these
  arguments.  In particular, the proof of injectivity typically
  proceeds by showing that equal images imply equal preimages.  It's a
  standard proof technique.  When I think about these proofs, however,
  the contrapositive argument feels more intuitive to me.  I prefer to
  think about how unequal preimages must have unequal images.
  Mathematically, there is no difference at all but the contrapositive
  argument has always felt the most natural to me.  Let me briefly
  describe how this proof runs in my mind when I think about it more
  intuitively.
</p>
<p>
  Suppose \( \phi \) is a monomophorism.  Since a homomorphism
  preserves the identity element, clearly \( \phi(e_1) = e_2.  \)
  Since \( \phi \) is injective, it cannot map two distinct elements
  of \( G \) to \( e_2.  \)  Thus \( e_1 \) is the only element of \( G
  \) that \( \phi \) maps to \( e_2 \) which means \( \ker(\phi) = \{
  e_1 \}.  \)
</p>
<p>
  To prove the converse, suppose \( \ker(\phi) = \{ e_1 \}.  \)
  Consider distinct elements \( x, y \in G.  \)  Since \( x \ne y, \)
  we have \( x \ast y^{-1} \ne e_1.  \)  Therefore \( x \ast y^{-1}
  \notin \ker(\phi).  \)  Thus \( \phi(x \ast y^{-1}) \ne e_2.  \)
  Since \( \phi \) is a homomorphism,

  \[
    \phi(x \ast y^{-1})
    = \phi(x) \cdot \phi(y^{-1})
    = \phi(x) \cdot \phi(y)^{-1}.
  \]

  Therefore \( \phi(x) \cdot \phi(y)^{-1} \ne e_2 \) which implies

  \[
    \phi(x) \ne \phi(y).
  \]

  This proves that \( \ker(\phi) = \{ e_1 \} \) implies that \( \phi
  \) is injective and thus a monomorphism.
</p>
<h3 id="permutation-representation">Permutation Representation<a href="#permutation-representation"></a></h3>
<p>
  Let \( G \) be a group and \( X \) a set.  Then a homomorphism

  \[
    \phi : G \to \operatorname{Sym}(X)
  \]

  is called a <em>permutation representation</em> of \( G \) on \( X
 .  \)  The homomorphism \( \phi \) maps each \( g \in G \) to a
  permutation of \( X.  \)  We say that each \( g \in G \)
  <em>induces</em> a permutation of \( X.  \)
</p>
<p>
  For example, let \( G = (\mathbb{Z}_3, +) \) and \( X = \{ 0, 1, 2,
  3, 4, 5 \}.  \)  Define the map \( \phi : G \to \operatorname{Sym}(X)
  \) by

  \begin{align*}
    \phi(0) &amp;= (), \\
    \phi(1) &amp;= (024)(135), \\
    \phi(2) &amp;= (042)(153).
  \end{align*}

  It is easy to verify that this is a homomorphism.  Here is one way
  to verify it:

  \begin{align*}
    \phi(0)\phi(1) &amp;= ()(024)(135) = (024)(135) = \phi(0 + 1), \\
    \phi(0)\phi(2) &amp;= ()(042)(153) = (042)(153) = \phi(0 + 2), \\
    &amp;\;\,\vdots \\
    \phi(2)\phi(1) &amp;= (042)(153)(024)(135) = () = \phi(0) = \phi(2 + 1), \\
    \phi(2)\phi(2) &amp;= (042)(153)(042)(153) = (024)(135) = \phi(1) = \phi(2 + 2).
  \end{align*}

  We will meet this homomorphism again in the form of group action \(
  \alpha \) in the next section.
</p>
<h3 id="group-action">Group Action<a href="#group-action"></a></h3>
<p>
  Let \( G \) be a group with identity element \( e.  \)  Let \( X \)
  be a set.  A right action of \( G \) on \( X \) is a map

  \[
    \alpha : X \times G \to X
  \]

  such that

  \begin{align*}
    \alpha(x, e)            &amp;= x, \\
    \alpha(\alpha(x, g), h) &amp;= \alpha(x, gh)
  \end{align*}

  for all \( x \in X \) and all \( g, h \in G.  \)  The two conditions
  above are called the identity and compatibility properties of the
  group action respectively.  Note that in a right action, the product
  \( gh \) is applied left to right: \( g \) acts first and then \( h
  \) acts.  If we denote \( \alpha(x, g) \) as \( x^g, \) then the
  notation for the two conditions can be simplified to \( x^e = x \)
  and \( (x^g)^h = x^{gh} \) for all \( g, h \in G.  \)
</p>
<h4 id="why-right-action">Why Right Action?<a href="#why-right-action"></a></h4>
<p>
  We discuss right group actions here instead of left group actions
  because we want to use the notation \( \alpha(x, g) = x^g, \) which
  is quite convenient while studying permutations and graph
  automorphisms.  It is perfectly possible to use left group actions
  to study permutations as well.  However, we lose the benefit of the
  convenient \( x^g \) notation.  In a left group action, the
  compatibility property is \( \alpha(g, \alpha(h, x)) = \alpha(gh, x)
 , \) so if we were to use the notation \( \alpha(g, x) = x^g, \) the
  compatibility property would look like \( (x^h)^g = x^{gh}.  \)  This
  reverses the order of exponents which can be confusing.  Right group
  actions avoid this notational inconvenience.
</p>
<h4 id="group-action-example-1">Example 1<a href="#group-action-example-1"></a></h4>
<p>
  Let \( G = \mathbb{Z}_3 \) be the group under addition modulo \( 3
 .  \)  Let \( X = \{ 0, 1, 2, 3, 4, 5 \}.  \)  Define an action \(
  \alpha \) of \( G \) on \( X \) by

  \[
    \alpha(x, g) = x^g = (x + 2g) \bmod 6.
  \]

  Each \( g \in G \) acts as a permutation of \( X.  \)  For example,
  the element \( 0 \in \mathbb{Z}_3 \) acts as the identity
  permutation.  The element \( 1 \in \mathbb{Z}_3 \) acts as the
  permutation \( (0 2 4)(1 3 5).  \)  The element \( 2 \in \mathbb{Z}_3
  \) acts as the permutation \( (0 4 2)(1 5 3).  \)  The following
  table shows how each \( g \in G \) permutes \( X.  \)

  \[
    \begin{array}{c|ccc}
      x_{\downarrow} \backslash g_{\rightarrow} &amp; 0 &amp; 1 &amp; 2 \\
      \hline
      0 &amp; 0 &amp; 2 &amp; 4 \\
      1 &amp; 1 &amp; 3 &amp; 5 \\
      2 &amp; 2 &amp; 4 &amp; 0 \\
      3 &amp; 3 &amp; 5 &amp; 1 \\
      4 &amp; 4 &amp; 0 &amp; 2 \\
      5 &amp; 5 &amp; 1 &amp; 3 \\
    \end{array}
  \]

  From the table we see that each \( g \in G \) permutes the elements
  of \( \{ 0, 2, 4 \} \) among themselves.  Similarly, the elements of
  \( \{ 1, 3, 5 \} \) are permuted among themselves.  These sets \(
  \{0, 2, 4 \} \) and \( \{ 1, 3, 5 \} \) are called the
  <em>orbits</em> of the action.  The concept of orbits is formally
  introduced in its <a href="#orbits">own section further below</a>.
</p>
<h4 id="group-action-example-2">Example 2<a href="#group-action-example-2"></a></h4>
<p>
  Now let \( G = \mathbb{Z}_6 \) be the group under addition modulo \(
  6.  \)  Let \( X = \{ 0, 1, \dots, 8 \}.  \)  Define an action \(
  \beta \) of \( G \) on \( X \) by

  \[
    \beta(x, g) = x^g = (x + 3g) \bmod 9.
  \]

  Now the table for the action looks like this:

  \[
    \begin{array}{c|cccccc}
      x_{\downarrow} \backslash g_{\rightarrow} &amp; 0 &amp; 1 &amp; 2 &amp; 3 &amp; 4 &amp; 5 \\
      \hline
      0 &amp; 0 &amp; 3 &amp; 6 &amp; 0 &amp; 3 &amp; 6 \\
      1 &amp; 1 &amp; 4 &amp; 7 &amp; 1 &amp; 4 &amp; 7 \\
      2 &amp; 2 &amp; 5 &amp; 8 &amp; 2 &amp; 5 &amp; 8 \\
      3 &amp; 3 &amp; 6 &amp; 0 &amp; 3 &amp; 6 &amp; 0 \\
      4 &amp; 4 &amp; 7 &amp; 1 &amp; 4 &amp; 7 &amp; 1 \\
      5 &amp; 5 &amp; 8 &amp; 2 &amp; 5 &amp; 8 &amp; 2 \\
      6 &amp; 6 &amp; 0 &amp; 3 &amp; 6 &amp; 0 &amp; 3 \\
      7 &amp; 7 &amp; 1 &amp; 4 &amp; 7 &amp; 1 &amp; 4 \\
      8 &amp; 8 &amp; 2 &amp; 5 &amp; 8 &amp; 2 &amp; 5
    \end{array}
  \]

  This action splits \( X \) into three orbits \( \{ 0, 3, 6 \}, \) \(
  \{ 1, 4, 7 \} \) and \( \{ 2, 5, 8 \}.  \)
</p>
<h3 id="group-actions-induce-permutations">Group Actions Induce Permutations<a href="#group-actions-induce-permutations"></a></h3>
<p>
  Earlier, we saw an example of a group action and observed that each
  element of the group acts as a permutation.  That was not merely a
  coincidence.  It is indeed a general property of group actions.
  Whenever a group \( G \) acts on a set \( X, \) each element \( g
  \in G \) determines a bijection \( X \to X.  \)  In other words,
  every element of \( G \) acts as a permutation of \( X.  \)  Let us
  see why this must be the case.
</p>
<p>
  Consider the group action \( \alpha : X \times G \to X.  \)  Fix \( g
  \in G \) and let \( x \) vary over \( X \) to obtain the map

  \[
    \alpha_g : X \to X; \; x \mapsto \alpha(x, g).
  \]

  We show that \( \alpha_g \) is a bijection.  First we prove
  injectivity.  Let \( e \) be the identity element of \( G.  \)
  Let \( x, y \in X.  \)  Then

  \begin{align*}
    \alpha_g(x) = \alpha_g(y)
    &amp; \implies \alpha(x, g) = \alpha(y, g) \\
    &amp; \implies \alpha(\alpha(x, g), g^{-1}) = \alpha(\alpha(y, g), g^{-1}) \\
    &amp; \implies \alpha(x, gg^{-1}) = \alpha(y, gg^{-1}) \\
    &amp; \implies \alpha(x, e) = \alpha(y, e) \\
    &amp; \implies x = y.
  \end{align*}

  The \( x^g \) notation allows us to write the above proof more
  conveniently as follows:

  \begin{align*}
    \alpha_g(x) = \alpha_g(y)
    &amp; \implies \alpha(x, g) = \alpha(y, g) \\
    &amp; \implies (x^g)^{g^{-1}} = (y^g)^{g^{-1}} \\
    &amp; \implies x^{g g^{-1}} = y^{g g^{-1}} \\
    &amp; \implies x^e = y^e \\
    &amp; \implies x = y.
  \end{align*}

  This completes the proof of injectivity.  Now we prove surjectivity.
  Let \( y \in X.  \)  Take \( x = \alpha(y, g^{-1}).  \)  Then

  \[
    \alpha_g(x)
    = \alpha(x, g)
    = \alpha(\alpha(y, g^{-1}), g)
    = \alpha(y, g^{-1} g)
    = \alpha(y, e)
    = y.
  \]

  Again, if we write \( x = y^{g^{-1}}, \) the above step can be
  written more succinctly as

  \[
    \alpha_g(x) = x^g = (y^{g^{-1}})^g = y^{(g^{-1} g)} = y^e = y.
  \]

  Thus every element \( y \in X \) has a preimage in \( X \) under \(
  \alpha_g.  \)  Hence \( \alpha_g \) is surjective.  Since we have
  already shown that \( \alpha_g \) is injective, we now conclude that
  \( \alpha_g \) is bijective.  Therefore \( \alpha_g \) is a
  permutation of \( X.  \)  Stated symbolically,

  \[
    \alpha_g \in \operatorname{Sym}(X).
  \]

  Note that

  \[
    \alpha_g(x) = \alpha(x, g) = x^g.
  \]

  Thus both \( \alpha_g(x) \) and \( x^g \) serve as convenient
  shorthands for \( \alpha(x, g).  \)
</p>
<h3 id="group-actions-determine-permutation-representations">Group Actions Determine Permutation Representations<a href="#group-actions-determine-permutation-representations"></a></h3>
<p>
  We have seen that each group element \( g \in G \) induces (acts as)
  a permutation of \( X.  \)  Precisely speaking, each \( g \in G \)
  determines a permutation \( \alpha_g \) of \( X.  \)  Now define a
  map

  \[
    \phi: G \to \operatorname{Sym}(X); \; g \mapsto \alpha_g.
  \]

  We now show that this map is a homomorphism.  This means that we
  want to show that \( \phi(gh) = \phi(g) \phi(h).  \)  Since \(
  \phi(g), \phi(h) \in \operatorname{Sym}(X), \) the right-hand side
  is a product of permutations of \( X.  \)  We first define the
  product of two permutations \( \pi, \rho : X \to X \) by

  \[
    \pi \rho : X \to X; \; x \mapsto \rho(\pi(x)).
  \]

  In other words, \( \pi \rho = \rho \circ \pi.  \)  Now

  \begin{align*}
    \phi(gh)(x)
    &amp; = \alpha_{gh}(x) \\
    &amp; = \alpha(x, gh) \\
    &amp; = \alpha(\alpha(x, g), h) \\
    &amp; = \alpha_h(\alpha_g(x)) \\
    &amp; = (\alpha_h \circ \alpha_g)(x) \\
    &amp; = (\alpha_g \alpha_h)(x) \\
    &amp; = (\phi(g) \phi(h))(x).
  \end{align*}

  Since the above equality holds for all \( x \in X, \) we conclude
  that

  \[
    \phi(gh) = \phi(g) \phi(h).
  \]

  Hence \( \phi \) is a group homomorphism from \( G \) to \(
  \operatorname{Sym}(X).  \)  Therefore \( \phi \) is a permutation
  representation of \( G \) on \( X.  \)  It maps each group element \(
  g \in G \) to a permutation \( \alpha_g \in \operatorname{Sym}(X).  \)
</p>
<p>
  Note the multiple levels of abstraction here.  The group action \(
  \alpha : X \times G \to X \) determines a permutation representation
  \( \phi : G \to \operatorname{Sym}(X).  \)  Each element \( g \in G
  \) together with the group action \( \alpha \) determines a
  permutation \( \alpha_g : X \to X.  \)
</p>
<p>
  Also note that \( \phi(g)(x) = \alpha_g(x) = \alpha(g, x) = x^g.  \)
  In fact, \( \phi(g) = \alpha_g.  \)
</p>
<h3 id="permutation-representations-determine-group-actions">Permutation Representations Determine Group Actions<a href="#permutation-representations-determine-group-actions"></a></h3>
<p>
  Consider a permutation representation \( \phi : G \to
  \operatorname{Sym}(X).  \)  Define a map

  \[
    \alpha : X \times G \to X; \; (x, g) \mapsto \phi(g)(x).
  \]

  First we verify the identity property of group actions.  Since \(
  \phi \) is a homomorphism, it preserves the identity element.
  Therefore \( \phi(e) \) is the identity permutation.  Hence

  \[
    \alpha(x, e) = \phi(e)(x) = x
  \]

  Now we verify the compatibility property of the action.  For all \(
  g, h \in G \) and \( x \in X, \) we have

  \begin{align*}
    \alpha(\alpha(x, g), h)
    &amp; = \alpha(\phi(g)(x), h) \\
    &amp; = \phi(h)(\phi(g)(x)) \\
    &amp; = (\phi(h) \circ \phi(g))(x) \\
    &amp; = (\phi(g)\phi(h))(x) \\
    &amp; = \phi(gh)(x) \\
    &amp; = \alpha(x, gh).
  \end{align*}

  This completes the proof of the fact that every permutation
  representation determines a group action.
</p>
<h3 id="bijection-between-group-actions-and-permutation-representations">Bijection Between Group Actions and Permutation Representations<a href="#bijection-between-group-actions-and-permutation-representations"></a></h3>
<p>
  There is a bijection between the group actions \( \alpha : X \times
  G \to X \) and permutation representations \( \phi : G \to
  \operatorname{Sym}(X).  \)  We now show that these two constructions
  are inverses of each other.
</p>
<p>
  Given a right action \( \alpha : X \times G \to X, \) define

  \[
    \phi_{\alpha} : G \to \operatorname{Sym}(X)
    \quad \text{by} \quad
    \phi_{\alpha}(g)(x) = \alpha(x, g).
  \]

  Given a permutation representation \( \phi : G \to
  \operatorname{Sym}(X), \) define

  \[
    \alpha_{\phi} : X \times G \to X
    \quad \text{by} \quad
    \alpha_{\phi}(x, g) = \phi(g)(x).
  \]

  We now show that these two constructions undo each other.  Take an
  arbitrary group action \( \alpha : X \times G \to X \) and construct
  the corresponding permutation representation \( \phi_{\alpha}.  \)
  Then take this permutation representation and construct the group
  action \( \alpha_{\phi_{\alpha}}.  \)  But

  \[
    \alpha_{\phi_{\alpha}}(x, g)
    = \phi_{\alpha}(g)(x)
    = \alpha(x, g).
  \]

  Therefore \( \alpha_{\phi_{\alpha}} = \alpha.  \)  Similarly,
  starting with the permutation representation \( \phi, \) we get

  \[
    \phi_{\alpha_{\phi}}(g)(x)
    = \alpha_{\phi}(x, g)
    = \phi(g)(x).
  \]

  Therefore \( \phi_{\alpha_{\phi}} = \phi.  \)  Hence there is a
  bijection between group actions \( \alpha : X \times G \to X \) and
  permutation representations: \( \phi : G \to \operatorname{Sym}(X)
 .  \)  In fact, a group action and the corresponding permutation
  representation contain the same information, namely how the elements
  \( g \in G \) acts as permutations of \( X.  \)  For this reason,
  many advanced texts do not make any distinction between the group
  action and its permutation representation.  They often use them
  interchangeably even though technically they have different domains.
</p>
<h3 id="orbits">Orbits<a href="#orbits"></a></h3>
<p>
  Let \( G \) act on a set \( X.  \)  For an element \( x \in X, \) the
  <em>orbit</em> of \( x \) under the action of \( G \) is the set of
  all elements of \( X \) that can be reached from \( x \) by the
  action of elements of \( G.  \)  Symbolically, the orbit of \( x \)
  is the set

  \[
    x^G = \{ x^g : g \in G \}.
  \]

  In other words, the orbit of \( x \) contains every element of \( X
  \) that \( x \) can be moved to by the group action.  If \( y \in
  x^G, \) then there exists some \( g \in G \) such that \( y = x^g
 .  \)
</p>
<p>
  The orbits of a group action partition the set \( X.  \)  That is,
  every element of \( X \) lies in exactly one orbit and two orbits
  are either identical or disjoint.  Thus the group action decomposes
  the set \( X \) into disjoint subsets (the orbits), each consisting
  of elements that can be transformed into one another by the action
  of \( G.  \)
</p>
<h3 id="stabilisers">Stabilisers<a href="#stabilisers"></a></h3>
<p>
  Let \( G \) be a group acting on a set \( X.  \)  For an element
  \( x \in X, \) the <em>stabiliser</em> of \( x \) is the set

  \[
    G_x = \{ g \in G : x^g = x \}.
  \]

  The stabiliser \( G_x \) consists of all elements of \( G \) that
  fix the element \( x.  \)  The stabiliser \( G_x \) is a subgroup of
  \( G.  \)  Indeed, the identity element \( e \in G \) satisfies \(
  x^e = x, \) so \( e \in G_x.  \)  If \( g, h \in G_x, \) then \(
  x^{gh} = (x^g)^h = x.  \)  If \( g \in G_x, \) then \( x^{g^{-1}} =
  (x^g)^{g^{-1}} = x.  \)
</p>
<p>
  Intuitively, the stabiliser measures how much symmetry of the group
  action leaves the element \( x \) unchanged.  The larger the
  stabiliser, the more elements of \( G \) fix \( x.  \)
</p>
<h3 id="orbit-stabiliser-theorem">Orbit-Stabiliser Theorem<a href="#orbit-stabiliser-theorem"></a></h3>
<p>
  Let \( G \) be a group acting on a set \( X.  \)  The
  orbit-stabiliser theorem states that for any \( x \in X, \)

  \[
    \lvert G_x \rvert \cdot \lvert x^G \rvert = \lvert G \rvert.
  \]

  Stated differently, the index of the stabiliser \( G_x \) in the
  group \( G \) is given by

  \[
    [ G : G_x ]
    = \lvert G_x \backslash G \rvert
    = \lvert G \rvert / \lvert G_x \rvert
    = \lvert x^G \rvert.
  \]

  There is a bijection between the right cosets of \( G_x \) and the
  elements of \( x^G.  \)  Demonstrating this bijection proves the
  above equation.  We will work with right cosets of \( G_x.  \)
  Define

  \[
    \phi : G_x \backslash G \to x^G; \; G_x g \mapsto x^g.
  \]

  We want to show that \( \phi \) is a bijection.  But first we need
  to show that \( \phi \) is well defined.  A coset \( G_x g \in G_x
  \backslash G \) can also be written as

  \[
    G_x g = G_x h
  \]

  for some \( h \in G.  \)  If \( x^g \ne x^h, \) then \( \phi \) would
  not be well defined, since \( \phi \) must assign each coset in \(
  G_x \backslash G \) to exactly one element in the orbit \( x^G \) in
  order to be a function.  This can be shown using the following
  equivalences:

  \begin{align*}
    G_x g = G_x h
    &amp; \iff hg^{-1} \in G_x \\
    &amp; \iff x = x^{h g^{-1}} \\
    &amp; \iff x^g = x^h \\
  \end{align*}

  This proves two things at once.  The fact that

  \[
    G_x g = G_x h \implies x^g = x^h
  \]

  proves that when the same coset is written using two different
  representatives, the image does not change.  Therefore \( \phi \) is
  well defined.  Further

  \[
    x^g = x^h \implies G_x g = G_x h
  \]

  proves that \( \phi \) is injective.  To show that \( \phi \) is
  surjective, let \( y \in x^G.  \)  Then \( y = x^g \) for some \( g
  \in G.  \)  Since \( \phi(G_x g) = x^g, \) we get

  \[
    \phi(G_x g) = y.
  \]

  Thus every element of \( x^G \) is the image of some right coset \(
  G_x g \) under \( \phi.  \)  This completes the proof of a bijection
  between the right cosets of \( G_x \) and the elements of \( x^G.  \)
  Therefore \( \lvert G_x \backslash G \rvert = \lvert x^G \rvert \)
  and hence \( \lvert G \rvert / \lvert G_x \rvert = \lvert x^G \rvert
 , \) which establishes the orbit-stabiliser theorem.
</p>
<h3 id="faithful-actions">Faithful Actions<a href="#faithful-actions"></a></h3>
<p>
  Let \( G \) act on a set \( X.  \)  The action is called
  <em>faithful</em> if distinct elements of \( G \) induce distinct
  permutations of \( X.  \)  In other words, the only element of \( G
  \) that acts as the identity permutation of \( X \) is the identity
  element \( e \in G.  \)  Symbolically, the action is faithful if

  \[
    g \ne e \implies \exists x \in X, \; x^g \ne x.
  \]

  Equivalently,

  \[
    \forall x \in X, \; x^g = x \implies g = e.
  \]

  The action is faithful if the only element of \( G \) that fixes
  every element of \( X \) is the identity, i.e.

  \[
    \bigcap_{x \in X} G_x = \{ e \}.
  \]

  Recall that every group action determines a permutation
  representation \( \phi : G \to \operatorname{Sym}(X).  \)  From this
  point of view, the action is faithful precisely when the permutation
  representation is faithful, that is, when the homomorphism \( \phi
  \) is injective (or equivalently when \( \ker(\phi) = \{ e \} \)).
  In other words, the action is faithful if and only if the associated
  homomorphism \( \phi \) is a monomorphism.
</p>
<h3 id="semiregular-actions">Semiregular Actions<a href="#semiregular-actions"></a></h3>
<p>
  A group action of \( G \) on \( X \) is called <em>semiregular</em>
  if no non-identity element of \( G \) fixes any element of \( X.  \)
  In other words, whenever \( g \ne e, \) the permutation of \( X \)
  induced by \( g \) moves every element of \( X.  \)  Symbolically,

  \[
    g \ne e \implies \forall x \in X, \; x^g \ne x.
  \]

  Equivalently,

  \[
    \exists x \in X, \; x^g = x \implies g = e.
  \]

  The action is semiregular if

  \[
    \forall x \in X, \; G_x = \{ e \}.
  \]

  This is a stronger property than faithfulness.  Faithfulness only
  guarantees that when \( g \ne e, \) the element \( g \) moves at
  least one element of \( X.  \)  But semiregularity guarantees that
  when \( g \ne e, \) the element \( g \) moves every element of \( X
 .  \)  Therefore every semiregular action is faithful, but not every
  faithful action is semiregular.
</p>
<h3 id="transitive-actions">Transitive Actions<a href="#transitive-actions"></a></h3>
<p>
  Let \( G \) act on a set \( X.  \)  The action is called
  <em>transitive</em> if there is only one orbit.  In other words, the
  action is transitive if every element of \( X \) can be reached from
  any other element by the action of some element of \( G.  \)
  Symbolically, the action is transitive if

  \[
    \forall x, y \in X \; \exists g \in G, \; x^g = y.
  \]

  Equivalently, the action is transitive if

  \[
    x^G = X
  \]

  for some (and hence every) \( x \in X.  \)
</p>
<h3 id="conjugacy">Conjugacy<a href="#conjugacy"></a></h3>
<p>
  Let \( G \) be a group.  Let \( x, g \in G.  \)  The element

  \[
    g^{-1} x g
  \]

  is called a <em>conjugate</em> of \( x \) by \( g.  \)  Any element
  \( y \in G \) that can be written as \( g^{-1} x g \) for some \( g
  \in G \) is said to be a conjugate of \( x.  \)  The conjugacy class
  of \( x \) in \( G \) is the set

  \[
    x^G = \{ g^{-1} x g : g \in G \}.
  \]

  In other words, the conjugacy class of \( x \) is the set of all
  elements of \( G \) that are conjugate to \( x.  \)  At first,
  reusing the orbit notation \( x^G \) for the conjugacy class may
  seem like an abuse of notation.  However, we will see in the next
  section that the conjugacy class is precisely the orbit of \( x \)
  under the action of \( G \) on itself by conjugation.  Thus \( x^G
  \) is in fact a natural and accurate notation for the conjugacy
  class.
</p>
<h4 id="conjugation-as-group-action">Conjugation as Group Action<a href="#conjugation-as-group-action"></a></h4>
<p>
  Conjugation can be seen as an action of a group on itself.  Define
  the map

  \[
    \alpha : G \times G \to G; \; (x, g) \mapsto g^{-1} x g.
  \]

  Note that

  \[
    \alpha(x, e) = e^{-1} x e = x
  \]

  and

  \[
    \alpha(\alpha(x, g), h)
    = h^{-1} (g^{-1} x g) h
    = (gh)^{-1} x (gh)
    = \alpha(x, gh).
  \]

  Therefore \( \alpha \) satisfies the two defining properties of a
  right group action.  The conjugacy class \( x^G \) is precisely the
  orbit of \( x \) under the conjugation action.  Therefore the orbits
  of the conjugation action of \( G \) on itself are the conjugacy
  classes of \( G.  \)
</p>
<h4 id="right-conjugation-vs-left-conjugation">Right Conjugation vs Left Conjugation<a href="#right-conjugation-vs-left-conjugation"></a></h4>
<p>
  We observed above that the conjugation action is a right action of a
  group on itself.  Let \( x, g \in G \) and let

  \[
    y = g^{-1} x g.
  \]

  Now let \( h = g^{-1}.  \)  Then we can write the above equation as

  \[
    y = h x h^{-1}.
  \]

  According to the previous section, \( y \) is the conjugate of \( x
  \) by \( g.  \)  However, many texts call \( y \) the conjugate of \(
  x \) by \( h.  \)  Both are valid perspectives.  In both
  perspectives, \( x \) and \( y \) are conjugates of each other.
  Precisely,
</p>
<ul>
  <li>
    In the first perspective, we have \( y = g^{-1} x g \) and we say
    that \( y \) is a conjugate of \( x \) by \( g.  \)  A corollary is
    that \( x \) is a conjugate of \( y \) by \( g^{-1}.  \)
  </li>
  <li>
    In the second perspective, we have \( y = h x h^{-1} \) and we say
    that \( y \) is a conjugate of \( x \) by \( h.  \)  A corollary is
    that \( x \) is a conjugate of \( y \) by \( h^{-1}.  \)
  </li>
</ul>
<p>
  Although in both perspectives, \( x \) and \( y \) are conjugates of
  each other, the group element by which one is conjugated to the
  other is different.  This leads to different group actions as well.
</p>
<p>
  When we say that \( y = g^{-1} x g \) is a conjugate of \( x \) by
  \( g, \) the group action

  \[
    \alpha : G \times G \to G; \; (x, g) \mapsto g^{-1} x g.
  \]

  is a right group action as demonstrated in the previous section.
  But when we say that \( y = h x h^{-1} \) is a conjugate of \( x \)
  by \( h, \) the conjugation action is no longer a right group action
  because the compatibility property is violated:

  \[
    \alpha(\alpha(x, g), h)
    = h ( g x g^{-1} ) h^{-1}
    = (hg) x (hg)^{-1}
    = \alpha(x, hg).
  \]

  We get \( \alpha(x, hg) \) instead of the required \( \alpha(x, gh)
 .  \)  So with the second perspective, the group action is no longer a
  right action.  Instead it is a left action since

  \[
    \alpha(g, \alpha(h, x))
    = g (h x h^{-1}) g^{-1}
    = (gh) x (gh)^{-1}
    = \alpha(gh, x).
  \]

  In this post we will work only with the first perspective because we
  will use right actions throughout.
</p>
<h3 id="conjugate-groups">Conjugate Subgroups<a href="#conjugate-groups"></a></h3>
<p>
  Let \( G \) be a group.  Let \( H \le G.  \)  Define

  \[
    g^{-1} H g = \{ g^{-1} h g : h \in H \}.
  \]

  We say that \( g^{-1} H g \) is a conjugate of \( H \) by \( g.  \)
</p>
<h3 id="conjugacy-of-stabilisers">Conjugacy of Stabilisers<a href="#conjugacy-of-stabilisers"></a></h3>
<p>
  Let \( G \) be a group acting on a set \( X.  \)  Let \( x \in X \)
  and \( g \in G.  \)  Then

  \[
    g^{-1} G_x g = G_{x^g}.
  \]

  That is, \( G_{x^g} \) is a conjugate of \( G_x \) by \( g.  \)  This
  result can be summarised as follows: stabilisers of elements in the
  same orbit are conjugate.  Or more explicitly: the stabiliser of \(
  x^g \) is a conjugate of the stabiliser of \( x \) by \( g.  \)  The
  proof is straightforward.  Let \( h \in G.  \)  Then

  \begin{align*}
    h \in g^{-1} G_x g
    &amp; \iff g^{-1} (g h g^{-1}) g \in g^{-1} G_x g \\
    &amp; \iff g h g^{-1} \in G_x \\
    &amp; \iff x^{g h g^{-1}} = x \\
    &amp; \iff (x^g)^h = x^g \\
    &amp; \iff h \in G_{x^g}.
  \end{align*}

  Therefore \( g^{-1} G_x g = G_{x^g}.  \)
</p>
<h2 id="algeraic-graph-theory">Algebraic Graph Theory<a href="#algeraic-graph-theory"></a></h2>
<h3 id="stabiliser-index">Stabiliser Index<a href="#stabiliser-index"></a></h3>
<p>
  In a vertex-transitive graph \( \Gamma, \) for any \( x \in
  V(\Gamma) \) and all \( y \in V(\Gamma), \) there exists \( g \in G
  \) such that \( x^g = y.  \)  Therefore \( x^G = V(\Gamma).  \)  Thus
  by the <a href="#orbit-stabiliser-theorem">orbit-stabiliser
  theorem</a>,

  \[
    [ G : G_x ]
    = \lvert G_x \backslash G \rvert
    = \lvert x^G \rvert
    = \lvert V(\Gamma) \rvert.
  \]
</p>
<h3 id="strongly-connected-directed-graph">Strongly Connected Directed Graph<a href="#strongly-connected-directed-graph"></a></h3>
<p>
  A <em>path</em> in a directed graph \( \Gamma \) is a sequence of
  vertices \( v_0, \dots, v_r \) of distinct vertices such that \(
  (v_{i - 1}, v_i) \) is an arc of \( \Gamma \) for \( i = 1, \dots, r
 .  \)
</p>
<p>
  A directed graph is <em>strongly connected</em> if for every ordered
  pair of vertices \( (u, v) \) there is a path from \( u \) to \( v
 .  \)
</p>
<h3 id="shunting">Shunting<a href="#shunting"></a></h3>
<p>
  Let \( \alpha = ( \alpha_0, \dots, \alpha_s ) \) and \( \beta = (
  \beta_0, \dots, \beta_s ) \) be two \( s \)-arcs in a graph \(
  \Gamma.  \)  We say that \( \beta \) is a successor of \( \alpha \)
  if \( \beta_i = \alpha_{i + 1} \) for \( 0 \le i \le s - 1.  \)  We
  also say that \( \alpha \) can be <em>shunted</em> onto \( \beta.  \)
</p>
<p>
  In section 4.2 of Godsil and Royle, there is a rather technical
  setup which first defines \( X^{(s)} \) as the directed graph with
  the \( s \)-arcs of a graph \( X \) as its vertices such that \(
  (\alpha, \beta) \) is an arc of \( X^{(s)} \) if and only if \(
  \alpha \) can be shunted onto \( \beta \) in \( X.  \)  Then it goes
  on to show that if \( X \) is a connected graph with a minimum
  degree two and \( X \) is not a cycle, then \( X^{(s)} \) is
  strongly connected for all \( s \ge 0.  \)
</p>
<p>
  That is a very technical way of saying that in a connected graph \(
  X \) that is not a cycle and has a minimum degree two, any \( s
  \)-arc \( \alpha \) can be sent to any \( s \)-arc \( \beta \) by
  repeated shunting.  The proof is quite technical too and pretty
  long, so I'll omit it here.
</p>
<h3 id="automorphisms-preserve-successor-relation">Automorphisms Preserve Successor Relation<a href="#automorphisms-preserve-successor-relation"></a></h3>
<p>
  We will obtain a nifty result here that will prove to be very useful
  in the next section.  Let \( S(\gamma) \) denote the set of all
  successors of the \( s \)-arc \( \gamma \) of a graph.  Let \( g \)
  be an automorphism of the graph.  Then

  \[
    \delta \in S(\gamma) \iff \delta^g \in S(\gamma^g).
  \]

  This follows directly from the fact that automorphisms preserve
  adjacency, so they must preserve successor relation as well.  A
  corollary of this is that for an automorphism \( h, \) we have

  \[
    \delta^{h^{-1}} \in S(\gamma) \iff \delta \in S(\gamma^h).
  \]

  This is the form that will be useful soon.
</p>
<h3 id="test-of-s-arc-transitivity">Test of \( s \)-arc Transitivity<a href="#test-of-s-arc-transitivity"></a></h3>
<p>
  The results in the previous two sections lead to a remarkably simple
  proof of the fact that the Petersen graph is \( 3 \)-arc transitive.
  Let us see how.
</p>
<p>
  Let \( P \) be the Petersen graph whose vertices are the \( 2
  \)-subsets of \( \{ 1, 2, 3, 4, 5 \} \) with adjacency given by
  disjointness of the \( 2 \)-subsets.  Then \( \operatorname{Aut}(P)
  \cong S_5 \) since any permutation of \( \{ 1, 2, 3, 4, 5 \} \)
  induces a permutation of the vertices that preserves disjointness
  and hence adjacency.  We will use the shorthand \( ab \) to
  represent each vertex \( \{ a, b \} \) of \( P.  \)  Consider the \(
  3 \)-arc

  \[
    \alpha = (12, 34, 15, 23).
  \]

  It has exactly two successors, namely

  \[
    \beta_1 = (34, 15, 23, 14), \quad \beta_2 = (34, 15, 23, 45).
  \]

  Let \( g_1 = (13)(245) \) and \( g_2 = (13524).  \)  Then

  \begin{align*}
    \alpha^{g_1}
    &amp; = (12, 34, 15, 23)^{(13)(245)} = (34, 15, 23, 14) = \beta_1, \\
    \alpha^{g_2}
    &amp; = (12, 34, 15, 23)^{(13524)} = (34, 51, 23, 45) = \beta_2.
  \end{align*}

  Let \( H = \langle g_1, g_2 \rangle \le \operatorname{Aut}(P).  \)
  Consider an \( s \)-arc \( \alpha^h \) for some \( h \in H.  \)  Let
  \( \delta \in S(\alpha^h).  \)  Then by the result in the previous
  section, we get

  \[
    \delta^{h^{-1}} \in S(\alpha)
    = \{ \beta_1, \beta_2 \}
    = \{ \alpha^{g_1}, \alpha^{g_2} \}.
  \]

  Therefore

  \[
    \delta \in \{ \alpha^{g_1 h}, \alpha^{g_2 h} \}.
  \]

  Thus

  \[
    \delta \in \alpha^{H}.
  \]

  We started with an \( s \)-arc \( \alpha^h \in \alpha^H \) and
  showed that its successors \( \delta \) also lie in \( \alpha^H.  \)
  Thus the orbit \( \alpha^H \) is closed under taking successors.
</p>
<p>
  Now by the <a href="#shunting">shunting result</a> discussed
  previously, \( \alpha \) can be sent to any \( 3 \)-arc of \( P \)
  by repeated shunting.  Therefore all \( 3 \)-arcs of \( P \) belong
  to \( \alpha^H.  \)  Therefore the automorphisms in \( H \) can send
  any \( 3 \)-arc of \( P \) to any other thus making \( P \) \( 3
  \)-arc transitive.
</p>
<h3 id="moore-graphs">Moore Graphs<a href="#moore-graphs"></a></h3>
<p>
  Graphs with diameter \( d \) and girth \( 2d + 1 \) are known as
  Moore graphs.
</p>
<p>
  There are an infinite number of Moore graphs with diameter \( 1 \)
  since the complete graphs \( K_n, \) where \( n \ge 3, \) have
  diameter \( 1 \) and girth \( 3.  \)
</p>
<p>
  There are three known Moore graphs of diameter \( 2.  \)  They are \(
  C_5, \) \( J(5, 2, 0) \) also known as the Petersen graph and the
  Hoffman-Singleton graph.  They are respectively \( 2 \)-regular, \(
  3 \)-regular and \( 7 \)-regular.  There is a famous result that
  proves that a Moore graph must be \( 2 \)-regular, \( 3 \)-regular,
  \( 7 \)-regular or \( 57 \)-regular.  It is unknown currently
  whether a \( 57 \)-regular Moore graph of diameter \( 2 \) exists.
</p>
<p>
  There are infinitely many Moore graphs of diameter \( d \ge 3 \)
  because the odd cycles \( C_{2d + 1} \) are \( 2 \)-regular graphs
  with diameter \( d \) and girth \( 2d + 1 \) for all \( d \ge 1.  \)
  However, there are no \( k \)-regular Moore graphs for diameter \( d
  \ge 3 \) when \( k \ge 3.  \)
</p>
<h3 id="generalised-polygons">Generalised Polygons<a href="#generalised-polygons"></a></h3>
<p>
  Bipartite graphs with diameter \( d \) and girth \( 2d \) are known
  as generalised polygons.  This is easy to understand.  If we take a
  classical \( d \)-gon and create the incidence graph of its vertices
  and edges, then the incidence graph is the cycle \( C_{2d} \) which
  has diameter \( d \) and girth \( 2d.  \)
</p>
<p>
  The converse is not always true.  For example,
  the <a href="https://en.wikipedia.org/wiki/Heawood_graph">Heawood
  graph</a> which has diameter \( d = 3 \) and girth \( 2d = 6.  \)  It
  is the incidence graph of Fano plane, which is a projective plane
  rather than a classical \( d \)-gon.
</p>
<p>
  Although a generalised polygon is not always the incidence graph of
  a classical polygon, the idea behind the definition comes from a
  simple observation.  If we take a classical \( d \)-gon and form the
  incidence graph of its vertices and edges, we obtain the cycle \(
  C_{2d}.  \)  This graph is bipartite and has diameter \( d \) and
  girth \( 2d.  \)  The definition of a generalised polygon abstracts
  these properties.  Any bipartite graph with diameter \( d \) and
  girth \( 2d \) is called a generalised polygon, even when it is not
  the incidence graph of a classical \( d \)-gon.  In this way the
  definition allows much richer graphs than simple cycles.
</p>
<h2 id="computing">Computing<a href="#computing"></a></h2>
<h3 id="select-between-lines-inclusive">Select Between Lines, Inclusive<a href="#select-between-lines-inclusive"></a></h3>
<p>
  Select text between two lines, including both lines:
</p>
<pre><code>sed '/pattern1/,/pattern2/!d'</code></pre>
<pre><code>sed -n '/pattern1/,/pattern2/p'</code></pre>
<p>
  Here are some examples:
</p>
<pre><samp>$ <kbd>printf 'A\nB\nC\nD\nE\nF\nG\nH\n' | sed '/C/,/F/!d'</kbd>
C
D
E
F
$ <kbd>printf 'A\nB\nC\nD\nE\nF\nG\nH\n' | sed -n '/C/,/F/p'</kbd>
C
D
E
F</samp></pre>
<h3 id="select-between-lines-exclusive">Select Between Lines, Exclusive<a href="#select-between-lines-exclusive"></a></h3>
<p>
  Select text between two lines, excluding both lines:
</p>
<pre><code>sed '/pattern1/,/pattern2/!d; //d'</code></pre>
<p>
  Here is an example usage:
</p>
<pre><samp>$ <kbd>printf 'A\nB\nC\nD\nE\nF\nG\nH\n' | sed '/C/,/F/!d; //d'</kbd>
D
E</samp></pre>
<p>
  The negated command <code>!d</code> deletes everything not matched
  by the 2-address range <code>/C/,/F/</code>, i.e. it deletes
  everything before the line matching <code>/C/</code> as well as
  everything after the line matching <code>/F/</code>.  So we are left
  with only the lines from <code>C</code> to <code>F</code>,
  inclusive.  Finally, <code>//</code> (the empty regular expression)
  reuses the most recently used regular expression.  So
  when <code>/C/,/F/</code> matches <code>C</code>, the
  command <code>//d</code> also matches <code>C</code> and deletes it.
  Similarly, <code>F</code> is deleted too.  That's how we are left
  with the lines between <code>C</code> and <code>F</code>, exclusive.
</p>
<p>
  Here are some excerpts from
  <a href="https://pubs.opengroup.org/onlinepubs/9799919799/utilities/sed.html">POSIX.1-2024</a>
  that help understand the <code>!d</code> and <code>//d</code>
  commands better:
</p>
<blockquote>
  A function can be preceded by a <code>'!'</code> character, in which
  case the function shall be applied if the addresses do not select
  the pattern space.  Zero or more &lt;blank&gt; characters shall be
  accepted before the <code>'!'</code> character.  It is unspecified
  whether &lt;blank&gt; characters can follow the <code>'!'</code>
  character, and conforming applications shall not follow
  the <code>'!'</code> character with &lt;blank&gt; characters.
</blockquote>
<blockquote>
  If an RE is empty (that is, no pattern is specified) <em>sed</em>
  shall behave as if the last RE used in the last command applied
  (either as an address or as part of a substitute command) was
  specified.
</blockquote>
<h3 id="signing-and-verification-with-ssh-key">Signing and Verification with SSH Key<a href="#signing-and-verification-with-ssh-key"></a></h3>
<p>
  Here are some minimal commands to demonstrate how we can sign some
  text using SSH key and then later verify it.
</p>
<pre><code>ssh-keygen -t ed25519 -f key
echo hello &gt; hello.txt
ssh-keygen -Y sign -f key.pub -n file hello.txt
echo "jdoe $(cat key.pub)" &gt; allowed.txt
ssh-keygen -Y verify -f allowed.txt -I jdoe -n file -s hello.txt.sig &lt; hello.txt</code></pre>
<p>
  Here are some examples that demonstrate what the outputs and
  signature file look like:
</p>
<pre><samp>$ <kbd>ssh-keygen -Y sign -f key.pub -n file hello.txt</kbd>
Signing file hello.txt
Write signature to hello.txt.sig</samp></pre>
<pre><samp>$ <kbd>cat hello.txt.sig</kbd>
-----BEGIN SSH SIGNATURE-----
U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgAwP6RnmFVrZO0m/nRIHyvr2S19
itsKegj9p/BZKqP1sAAAAEZmlsZQAAAAAAAAAGc2hhNTEyAAAAUwAAAAtzc2gtZWQyNTUx
OQAAAEB8ylqjCLgInF8DvROnLSm1UUWd0VuLPesI+1NhMrV9BjH5lf0w20kHunJW3qRIjw
Jfs9+q/e47KdlR8wBQaHYD
-----END SSH SIGNATURE-----</samp></pre>
<pre><samp>$ <kbd>ssh-keygen -Y verify -f allowed.txt -I jdoe -n file -s hello.txt.sig &lt; hello.txt</kbd>
Good "file" signature for jdoe with ED25519 key SHA256:9ZJuUJNMy1UXo3AlQy8L7baD3LOfEbgQ30ELIt+8wWc</samp></pre>
<h3 id="block-ip-address-with-nftables">Block IP Address with nftables<a href="#block-ip-address-with-nftables"></a></h3>
<p>
  Here is a sequence of commands to create an nftables rule from
  scratch to block an IP address:
</p>
<pre><samp>$ <kbd>sudo nft list ruleset</kbd>
$ <kbd>sudo nft add table inet filter</kbd>
$ <kbd>sudo nft list ruleset</kbd>
table inet filter {
}
$ <kbd>sudo nft add chain inet filter input { type filter hook input priority 0 \; }</kbd>
$ <kbd>sudo nft list ruleset</kbd>
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }
}
$ <kbd>sudo nft add rule inet filter input ip saddr 172.236.0.216 drop</kbd>
$ <kbd>sudo nft list ruleset</kbd>
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
                ip saddr 172.236.0.216 drop
        }
}</samp></pre>
<p>
  Here is how to undo the above setup step by step:
</p>
<pre><samp>$ <kbd>sudo nft -a list ruleset</kbd>
table inet filter { # handle 1
        chain input { # handle 1
                type filter hook input priority filter; policy accept;
                ip saddr 172.236.0.216 drop # handle 2
        }
}
$ <kbd>sudo nft delete rule inet filter input handle 2</kbd>
$ <kbd>sudo nft list ruleset</kbd>
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
        }
}
$ <kbd>sudo nft delete chain inet filter input</kbd>
$ <kbd>sudo nft list ruleset</kbd>
table inet filter {
}
$ <kbd>sudo nft delete table inet filter</kbd>
$ <kbd>sudo nft list ruleset</kbd>
$</samp></pre>
<p>
  Finally, the following command deletes all rules, chains and tables.
  It wipes the entire ruleset, so use it with care.
</p>
<pre><samp>$ <kbd>sudo nft flush ruleset</kbd>
$ <kbd>sudo nft list ruleset</kbd>
$</samp></pre>
<p>
  All outputs above were obtained using nftables v1.1.3 on Debian 13.2
  (Trixie).
</p>
<h3 id="debian-logrotate-setup">Debian Logrotate Setup<a href="#debian-logrotate-setup"></a></h3>
<p>
  Observed on Debian 11.5 (Bullseye) that <code>logrotate</code> is
  set up on it via <code>systemd</code>.  Here are some outputs that
  show what the setup is like:
</p>
<pre><samp>$ <kbd>sudo systemctl status logrotate.service</kbd>
● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: inactive (dead) since Mon 2026-03-30 00:00:17 UTC; 19h ago
TriggeredBy: <span class="c2">●</span> logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2148235 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
   Main PID: 2148235 (code=exited, status=0/SUCCESS)
        CPU: 574ms

Mar 30 00:00:16 spweb systemd[1]: Starting Rotate log files...
Mar 30 00:00:17 spweb systemd[1]: logrotate.service: Succeeded.
Mar 30 00:00:17 spweb systemd[1]: Finished Rotate log files.
$ <kbd>sudo systemctl status logrotate.timer</kbd>
● logrotate.timer - Daily rotation of log files
     Loaded: loaded (/lib/systemd/system/logrotate.timer; enabled; vendor preset: enabled)
     Active: active (waiting) since Mon 2026-01-19 19:19:34 UTC; 2 months 9 days ago
    Trigger: Tue 2026-03-31 00:00:00 UTC; 4h 7min left
   Triggers: <span class="c2">●</span> logrotate.service
       Docs: man:logrotate(8)
             man:logrotate.conf(5)

Warning: journal has been rotated since unit was started, output may be incomplete.
$ <kbd>sudo systemctl list-timers logrotate</kbd>
NEXT                        LEFT         LAST                        PASSED  UNIT            ACTIVATES
Tue 2026-03-31 00:00:00 UTC 4h 7min left Mon 2026-03-30 00:00:16 UTC 19h ago logrotate.timer logrotate.service

1 timers listed.
Pass --all to see loaded but inactive timers, too.
$ <kbd>head /lib/systemd/system/logrotate.service</kbd>
[Unit]
Description=Rotate log files
Documentation=man:logrotate(8) man:logrotate.conf(5)
RequiresMountsFor=/var/log
ConditionACPower=true

[Service]
Type=oneshot
ExecStart=/usr/sbin/logrotate /etc/logrotate.conf

$ <kbd>cat /lib/systemd/system/logrotate.timer</kbd>
[Unit]
Description=Daily rotation of log files
Documentation=man:logrotate(8) man:logrotate.conf(5)

[Timer]
OnCalendar=daily
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target
$ <kbd>grep -vE '^#|^$' /etc/logrotate.conf</kbd>
weekly
rotate 4
create
include /etc/logrotate.d
$ <kbd>ls -l /etc/logrotate.d/</kbd>
total 40
-rw-r--r-- 1 root root 120 Aug 21  2022 alternatives
-rw-r--r-- 1 root root 173 Jun 10  2021 apt
-rw-r--r-- 1 root root 130 Oct 14  2019 btmp
-rw-r--r-- 1 root root  82 May 26  2018 certbot
-rw-r--r-- 1 root root 112 Aug 21  2022 dpkg
-rw-r--r-- 1 root root 128 May  4  2021 exim4-base
-rw-r--r-- 1 root root 108 May  4  2021 exim4-paniclog
-rw-r--r-- 1 root root 329 May 29  2021 nginx
-rw-r--r-- 1 root root 374 May 20  2022 rsyslog
lrwxrwxrwx 1 root root  28 Mar 17 01:52 <span class="c3">susam</span> -&gt; /opt/susam.net/etc/logrotate
-rw-r--r-- 1 root root 145 Oct 14  2019 wtmp</samp></pre>
<p>
  To force log rotation right now, execute:
</p>
<pre><code>sudo systemctl start logrotate.service</code></pre>
<!-- ### -->
<p>
  <a href="https://susam.net/26c.html">Read on website</a> |
  <a href="https://susam.net/tag/notes.html">#notes</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/linux.html">#linux</a> |
  <a href="https://susam.net/tag/technology.html">#technology</a>
</p>
]]>
</description>
</item>
<item>
<title>Feb '26 Notes</title>
<link>https://susam.net/26b.html</link>
<guid isPermaLink="false">ntfts</guid>
<pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  Since last month, I have been collecting brief notes on ideas and
  references that caught my attention during each month but did not
  make it into full articles.  Some of these fragments may eventually
  grow into standalone posts, though most will probably remain as they
  are.  At the very least, this approach allows me to keep a record of
  them.
</p>
<p>
  Most of <a href="26a.html">last month's notes</a> grew out of my
  reading of <em>Algebraic Graph Theory</em> by Godsil and Royle.  I
  am still exploring and learning this subject.  This month, however,
  I dove into another book with the same title but this book is
  written by Norman Biggs.  As a result, many of the notes that follow
  are drawn from Biggs's treatment of the topic.
</p>
<p>
  Since I already had a good understanding of the subject from the
  earlier book, I decided to skip the first fourteen chapters of the
  new book.  I began with Chapter 15, which discusses automorphisms of
  graphs and then moved on to the following chapters on graph
  symmetries.  My main reason for picking up Biggs's book was to
  understand Tutte's well known result that any \( s \)-arc-transitive
  finite cubic graph must satisfy \( s \le 5.  \)  While I did not
  reach that chapter this month, I made substantial progress with the
  book.  I hope to work through the proof of Tutte's theorem next
  month.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ol>
  <li><a href="#degree-of-vertices-in-an-orbit">Degree of Vertices in an Orbit</a></li>
  <li><a href="#regular-non-vertex-transitive-graphs">Regular Non-Vertex-Transitive Graphs</a></li>
  <li><a href="#vertex-transitive-but-not-edge-transitive">Vertex-Transitive But Not Edge-Transitive</a></li>
  <li><a href="#edge-transitive-but-not-vertex-transitive">Edge-Transitivex But Not Vertex-Transitive</a></li>
  <li><a href="#bipartiteness-as-a-necessary-condition">Bipartiteness as a Necessary Condition</a></li>
  <li><a href="#graph-with-an-automorphism-group">Graph with an Automorphism Group</a></li>
  <li><a href="#permutation-groups-need-not-be-automorphism-groups">Permutation Groups Need Not Be Automorphism Groups</a></li>
  <li><a href="#symmetric-graphs">Symmetric Graphs</a></li>
</ol>
<h2 id="degree-of-vertices-in-an-orbit">Degree of Vertices in an Orbit<a href="#degree-of-vertices-in-an-orbit"></a></h2>
<p>
  If two vertices of a graph belong to the same orbit, then they have
  the same degree.  In other words, for a graph \( X, \) if \( x, y
  \in V(X) \) and there is an automorphism \( \alpha \) such that \(
  \alpha(x) = y, \) then \( \deg(x) = \deg(y).  \)
</p>
<p>
  The proof is quite straightforward.  Let

  \begin{align*}
    N(x) &amp;= \{ v_1, \dots, v_r \}, \\
    N(y) &amp;= \{ w_1, \dots, w_s \}
  \end{align*}

  represent the neighbours of \( x \) and \( y \) respectively.
  Therefore we have

  \[
    x \sim v_1, \; \dots, \; x \sim v_r.
  \]

  Since an automorphism preserves adjacency, we get

  \[
    \alpha(x) \sim \alpha(v_1), \; \dots, \;
    \alpha(x) \sim \alpha(v_r).
  \]

  Substituting \( \alpha(x) = y, \) we get

  \[
    y \sim \alpha(v_1), \; \dots, \; y \sim \alpha(v_r).
  \]

  Thus

  \[
    \alpha(N(x))
    = \{ \alpha(v_1), \; \dots, \; \alpha(v_r) \}
    \subseteq N(y).
  \]

  A similar argument works in reverse as well.  By the definition of
  automorphism, if \( \alpha \) is an automorphism, so is \(
  \alpha^{-1}.  \)  From the definition of \( N(y) \) above, we have

  \[
    y \sim w_1, \; \dots, \; y \sim w_s.
  \]

  Therefore

  \[
    \alpha^{-1}(y) \sim \alpha^{-1}(w_1), \; \dots, \;
    \alpha^{-1}(y) \sim \alpha^{-1}(w_s).
  \]

  This is equivalent to

  \[
    x \sim \alpha^{-1}(w_1), \; \dots, \; x \sim \alpha^{-1}(w_s).
  \]

  Thus

  \[
    \alpha^{-1}(N(y))
    = \{ \alpha^{-1}(w_1), \; \dots, \; \alpha^{-1}(w_s) \}
    \subseteq N(x)
  \]

  This can be rewritten as

  \[
    \{ \alpha^{-1}(w_1), \; \dots, \; \alpha^{-1}(w_s) \}
    \subseteq \{ v_1, \dots, v_r \}.
  \]

  Therefore

  \[
    N(y)
    = \{ w_1, \dots, w_s \}
    \subseteq \{ \alpha(v_1), \dots, \alpha(v_r) \}
    = \alpha(N(x)).
  \]

  We have shown that \( \alpha(N(x)) \subseteq N(y) \) and \( N(y)
  \subseteq \alpha(N(x)).  \)  Thus

  \[
    \alpha(N(x)) = N(y).
  \]

  Thus

  \[
    \lvert N(y) \rvert = \lvert \alpha(N(x)) \rvert = r.
  \]

  Therefore both \( x \) and \( y \) have \( r \) neighbours each.
  Hence \( \deg(x) = \deg(y).  \)
</p>
<h2 id="regular-non-vertex-transitive-graphs">Regular Non-Vertex-Transitive Graphs<a href="#regular-non-vertex-transitive-graphs"></a></h2>
<p>
  The <a href="https://en.wikipedia.org/wiki/Frucht_graph">Frucht graph</a>
  and the
  <a href="https://en.wikipedia.org/wiki/Folkman_graph">Folkman
  graph</a> are examples of graphs that are \( k \)-regular but not
  vertex-transitive.  In fact, the Folkman graph is a semi-symmetric
  graph, i.e. it is regular and edge-transitive but not
  vertex-transitive.
</p>
<h2 id="vertex-transitive-but-not-edge-transitive">Vertex-Transitive But Not Edge-Transitive<a href="#vertex-transitive-but-not-edge-transitive"></a></h2>
<p>
  The circular ladder graph \( CL_3, \) i.e. the triangular prism
  graph, is vertex-transitive but not edge-transitive.
</p>
<p>
  Every vertex has the same local structure.  Every vertex has degree
  \( 3 \) and it lies on exactly one of the two triangles and it has
  exactly one 'vertical' edge connecting it to the corresponding edge
  on the other triangle.  Any vertex can be sent to any other by an
  automorphism.
</p>
<p>
  Since triangle edges are in a triangle and vertical edges are in no
  triangle, no automorphism can send a triangle edge to a vertical
  edge or vice versa.  Therefore the graph is not edge-transitive.
</p>
<h2 id="edge-transitive-but-not-vertex-transitive">Edge-Transitivex But Not Vertex-Transitive<a href="#edge-transitive-but-not-vertex-transitive"></a></h2>
<p>
  The complete bipartite graphs \( K_{m,n} \) with \( m \ne n \) are
  edge-transitive but not vertex-transitive.
</p>
<p>
  Every edge connects one vertex from the \( m \)-part to one vertex
  from the \( n \)-part.  Any permutation of vertices inside the \( m
  \)-part preserves adjacency.  Similarly, any permutation of vertices
  inside the \( n \)-part preserves adjacency.
</p>
<p>
  Take two arbitrary edges

  \[
    uv, \; u'v' \in E(K_{m,n})
  \]

  where \( u, u' \) are vertices that lie in the \( m \)-part and \(
  v, v' \) are vertices that lie in the \( n \)-part.  Permute
  vertices within the \( m \)-part to send \( u \) to \( u'.  \)
  Similarly, permute vertices within the \( n \)-part to send \( v \)
  to \( v'.  \)  This gives an automorphism that sends the edge \( uv
  \) to \( u'v'.  \)  In this manner we can find an automorphism that
  sends any edge to any other.  Therefore, \( K_{m,n} \) is
  edge-transitive.
</p>
<p>
  However, \( K_{m,n} \) is not vertex-transitive since no
  automorphism can send a vertex in the \( m \)-part to a vertex in
  the \( n \)-part since the vertices in the \( m \)-part have degree
  \( n \) and the vertices in the \( n \)-part have degree \( m.  \)
</p>
<h2 id="bipartiteness-as-a-necessary-condition">Bipartiteness as a Necessary Condition<a href="#bipartiteness-as-a-necessary-condition"></a></h2>
<p>
  If a connected graph is edge-transitive but not vertex-transitive,
  then it must be bipartite.
</p>
<h2 id="graph-with-an-automorphism-group">Graph with an Automorphism Group<a href="#graph-with-an-automorphism-group"></a></h2>
<p>
  In 1938, Frucht proved that for every finite abstract group \( G, \)
  there exists a graph whose automorphism group is isomorphic to \( G
 .  \)
</p>
<p>
  Remarkably, this result remains valid even when we restrict our
  attention to cubic graphs.  That is, for every finite abstract group
  \( G, \) there exists a cubic graph whose automorphism group is
  isomorphic to \( G.  \)  Moreover, the result has been extended to
  graphs satisfying various additional graph-theoretical properties,
  such as \( k \)-connectivity, \( k \)-regularity and prescribed
  chromatic number.
</p>
<h2 id="permutation-groups-need-not-be-automorphism-groups">Permutation Groups Need Not Be Automorphism Groups<a href="#permutation-groups-need-not-be-automorphism-groups"></a></h2>
<p>
  Consider the following specialised version of the problem discussed
  in the previous section: Given a permutation group on a set \( X, \)
  must there exist a graph with vertex set \( X \) whose automorphism
  group is precisely that permutation group?
</p>
<p>
  The answer is no.  Consider the cyclic group \( C_3 \) acting on \(
  X = \{ a, b, c \}.  \)  There is no graph \( \Gamma \) with \(
  V(\Gamma) = X \) and \( \operatorname{Aut}(\Gamma) \cong C_3.  \)  If
  we take \( \Gamma = K_3, \) then \( C_3 \subset S_3 =
  \operatorname{Aut}(K_3) \) but \( C_3 \ne \operatorname{Aut}(K_3)
 .  \)
</p>
<h2 id="symmetric-graphs">Symmetric Graphs<a href="#symmetric-graphs"></a></h2>
<p>
  It is interesting that while we study graph symmetry through
  concepts such as graph automorphisms, vertex-transitivity,
  edge-transitivity, etc. the name <em>symmetric graph</em> is
  reserved for graphs that are \( 1 \)-arc-transitive.  A
  vertex-transitive graph or an edge-transitive graph need not be
  \(1\)-arc-transitive and therefore need not be symmetric.
</p>
<p>
  However, every \( s \)-arc-transitive graph is \(1 \)-arc-transitive
  for \( s \ge 1.  \)  Consequently, every \( s \)-arc-transitive graph
  is symmetric.  Moreover, every distance-transitive graph is also \(
  1 \)-arc-transitive and hence symmetric.
</p>
<p>
  Formally, we say that a graph \( \Gamma \) is \( 1 \)-arc-transitive
  (or equivalently, symmetric) if for all \( 1 \)-arcs \( uv \) and \(
  u'v' \) of \( \Gamma, \) there is an automorphism \( \alpha \in
  \operatorname{Aut}(\Gamma) \) such that \( \alpha(uv) = u'v'.  \)
</p>
<p>
  Stated in more basic terms, we can say that \( \Gamma \) is
  symmetric if for all \( u, v, u', v' \in V(\Gamma) \) satisfying \(
  u \sim v \) and \( u' \sim v', \) there exists \( \alpha \in
  \operatorname{Aut}(\Gamma) \) such that \( \alpha(u) = u' \) and \(
  \alpha(v) = v'.  \)
</p>
<p>
  Switching gears now, we say that \( \Gamma \) is distance-transitive
  if for all \( u, v, u', v' \in V(\Gamma) \) satisfying \( d(u, v) =
  d(u', v'), \) there exists \( \alpha \in \operatorname{Aut}(\Gamma)
  \) such that \( \alpha(u) = u' \) and \( \alpha(v) = v'.  \)  Since
  all \( 1 \)-arcs \( uv \) and \( u'v' \) satisfy \( d(u, v) = d(u',
  v') = 1, \) distance-transitivity implies that there is an
  automorphism that sends \( uv \) to \( u'v'.  \)  Therefore a
  distance-transitive graph is also \( 1 \)-arc-transitive.
</p>
<p>
  To summarise, a graph must possess a certain degree of symmetry in
  order to be called symmetric.  It turns out that merely having a
  non-trivial automorphism group is not sufficient.  Even being
  vertex-transitive or edge-transitive is not enough for a graph to be
  called symmetric.  The graph needs to be at least \( 1
  \)-arc-transitive to be called symmetric.
</p>
<p>
  Another interesting aspect of this terminology is that the property
  of being asymmetric is not the exact opposite of being symmetric.
  For example, a vertex-transitive graph need not be symmetric.
  However, that does not make it asymmetric.  A graph is called
  asymmetric if it has no non-trivial automorphisms, i.e. its
  automorphism group contains only the identity permutation.  Thus, if
  a graph has at least two vertices and is vertex-transitive, it must
  admit a non-trivial automorphism that maps one vertex to another.
  So while such a vertex-transitive may not be symmetric, it isn't
  asymmetric either.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/26b.html">Read on website</a> |
  <a href="https://susam.net/tag/notes.html">#notes</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Jan '26 Notes</title>
<link>https://susam.net/26a.html</link>
<guid isPermaLink="false">ntjts</guid>
<pubDate>Thu, 29 Jan 2026 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  In these monthly notes, I jot down ideas and references I
  encountered during the month that I did not have time to expand into
  their own posts.  A few of these may later develop into independent
  posts but most of them will likely not.  In any case, this format
  ensures that I record them here.  I spent a significant part of this
  month studying the book <em>Algebraic Graph Theory</em> by Godsil
  and Royle, so many of the notes here are about it.  There are a few
  non-mathematical, technical notes towards the end.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ol>
  <li><a href="#cayley-graphs">Cayley Graphs</a></li>
  <li><a href="#vertex-transitive-graphs">Vertex-Transitive Graphs</a></li>
  <li><a href="#arc-transitive-graphs">Arc-Transitive Graphs</a></li>
  <li><a href="#bipartite-graphs-and-cycle-parity">Bipartite Graphs and Cycle Parity</a></li>
  <li><a href="#tutte-theorem">Tutte's Theorem</a></li>
  <li><a href="#tutte-8-cage">Tutte's 8-Cage</a></li>
  <li><a href="#lcg">Linear Congruential Generator</a></li>
  <li><a href="#cat-n">Numbering Lines</a></li>
</ol>
<h2 id="cayley-graphs">Cayley Graphs<a href="#cayley-graphs"></a></h2>
<p>
  Let \( G \) be a group and let \( C \subseteq G \) such that \( C \)
  is closed under taking inverses and does not contain the identity,
  i.e.

  \[
    \forall x \in C, \; x^{-1} \in C, \qquad e \notin C.
  \]

  Then the Cayley graph \( X(G, C) \) is the graph with the vertex set
  \( V(X(G, C)) \) and edge set \( E(X(G, C)) \) defined by

  \begin{align*}
    V(X(G, C)) &amp;= G, \\
    E(X(G, C)) &amp;= \{ gh : hg^{-1} \in C \}.
  \end{align*}

  The set \( C \) is known as the connection set.
</p>
<h2 id="vertex-transitive-graphs">Vertex-Transitive Graphs<a href="#vertex-transitive-graphs"></a></h2>
<p>
  A graph \( X \) is <em>vertex-transitive</em> if its automorphism
  group acts transitively on its set of vertices \( V(X).  \)
  Intuitively, this means that no vertex has a special role.  We can
  'move' the graph around so that any chosen vertex becomes any other
  vertex.  In other words, all vertices are indistinguishable.  The
  graph looks the same from each vertex.
</p>
<p>
  The \( k \)-cube \( Q_k \) is vertex-transitive.  So are the Cayley
  graphs \( X(G, C).  \)  However the path graph \( P_3 \) is not
  vertex-transitive since no automorphism can send the middle vertex
  of valency \( 2 \) to an end vertex of valency \( 1.  \)
</p>
<h2 id="arc-transitive-graphs">Arc-Transitive Graphs<a href="#arc-transitive-graphs"></a></h2>
<p>
  The cube \( Q_3 \) is \( 2 \)-arc-transitive but not \( 3
  \)-arc-transitive.  In \( Q_3, \) a \( 3 \)-arc belonging to a \( 4
  \)-cycle cannot be sent to a \( 3 \)-arc that does not belong to a
  \( 4 \)-cycle.  This is easy to explain.  The end vertices of a \( 3
  \)-arc belonging to a \( 4 \)-cycle are adjacent but the end
  vertices of a \( 3 \)-arc not belonging to a \( 4 \)-cycle are not
  adjacent.  Therefore, no automorphism can map the end vertices of
  the first \( 3 \)-arc to those of the second \( 3 \)-arc.
</p>
<p>
  For intuition, imagine that a traveller stands on a vertex and
  chooses an edge to move along.  They do this \( s \) times thereby
  walking along an arc of length \( s, \) also known as an \( s
  \)-arc.  By the definition of \( s \)-arcs, the traveller is not
  allowed to backtrack from one vertex to the previous one
  immediately.  In an \( s \)-arc-transitive graph, these arcs look
  the same no matter which vertex they start from or which edges they
  choose.  In the cube, this is indeed true for \( s = 2.  \)  All arcs
  of length \( 2 \) are indistinguishable.  No matter which arc of
  length \( 2 \) the traveller has walked along, the graph would look
  the same from their perspective at each vertex along the arc.
  However, this no longer holds good for arcs of length \( 3 \) since
  there are two distinct kinds of arcs of length \( 3.  \)  The first
  kind ends at a distance of \( 1 \) from the starting vertex of the
  arc (when the arc belongs to a \( 4 \)-cycle).  The second kind ends
  at a distance \( 3 \) from the starting vertex of the arc (when the
  arc does not belong to a \( 4 \)-cycle).  Therefore the cube is not
  \( 3 \)-arc-transitive.
</p>
<h2 id="bipartite-graphs-and-cycle-parity">Bipartite Graphs and Cycle Parity<a href="#bipartite-graphs-and-cycle-parity"></a></h2>
<p>
  A graph is bipartite if and only if it contains no cycles of odd
  length.  Equivalently, every cycle in a bipartite graph has even
  length.  Conversely, if every cycle in a graph has even length, then
  the graph is bipartite.
</p>
<h2 id="tutte-theorem">Tutte's Theorem<a href="#tutte-theorem"></a></h2>
<p>
  For any \( s \)-arc-transitive cubic graph, \( s \le 5.  \)  This was
  demonstrated by W. T. Tutte in 1947.  A proof can be found in
  Chapter 18 of <em>Algebraic Graph Theory</em> by Norman Biggs.
</p>
<p>
  In 1973, Richward Weiss established a more general theorem that
  proves that for any \( s \)-arc-transitive graph, \( s \le 7.  \)
  The bound is weaker but it applies to all graphs rather than only to
  cubic ones.
</p>
<h2 id="tutte-8-cage">Tutte's 8-Cage<a href="#tutte-8-cage"></a></h2>
<p>
  The book <em>Algebraic Graph Theory</em> by Godsil and Royle offers
  the following two descriptions of Tutte's 8-cage on 30 vertices:
</p>
<blockquote>
  Take the cube and an additional vertex \( \infty.  \)  In each set of
  four parallel edges, join the midpoint of each pair of opposite
  edges by an edge, then join the midpoint of the two new edges by an
  edge, and finally join the midpoint of this edge to \( \infty.  \)
</blockquote>
<blockquote>
  Construct a bipartite graph \( T \) with the fifteen edges as one
  colour class and the fifteen \( 1 \)-factors as the other, where
  each edge is adjacent to the three \( 1 \)-factors that contain it.
</blockquote>
<p>
  It can be shown that both descriptions construct a cubic bipartite
  graph on \( 30 \) vertices of girth \( 8.  \)  It can be further
  shown that there is a unique cubic bipartite graph on \( 30 \)
  vertices with girth \( 8.  \)  As a result both descriptions above
  construct the same graph.
</p>
<h2 id="lcg">Linear Congruential Generator<a href="#lcg"></a></h2>
<p>
  Here is a simple linear congruential generator (LCG) implementation
  in JavaScript:
</p>
<pre><code>function srand (seed) {
  let x = seed
  return function () {
    x = (1664525 * x + 1013904223) % 4294967296
    return x
  }
}</code></pre>
<p>
  Here is an example usage:
</p>
<pre><samp>&gt; <kbd>const rand = srand(0)</kbd>
undefined
&gt; <kbd>rand()</kbd>
1013904223
&gt; <kbd>rand()</kbd>
1196435762
&gt; <kbd>rand()</kbd>
3519870697</samp></pre>
<h2 id="cat-n">Numbering Lines<a href="#cat-n"></a></h2>
<p>
  Both BSD and GNU <code>cat</code> can number output lines with
  the <code>-n</code> option.  For example:
</p>
<pre><samp>$ <kbd>printf 'foo\nbar\nbaz\n' | cat -n</kbd>
     1  foo
     2  bar
     3  baz</samp></pre>
<p>
  However I have always used <code>nl</code> for this.  For example:
</p>
<pre><samp>$ <kbd>printf 'foo\nbar\nbaz\n' | nl</kbd>
     1  foo
     2  bar
     3  baz</samp></pre>
<p>
  While <code>nl</code> is
  <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nl.html">specified
  in POSIX</a>, the <code>cat -n</code> option
  <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/cat.html">is
  not</a>.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/26a.html">Read on website</a> |
  <a href="https://susam.net/tag/notes.html">#notes</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/programming.html">#programming</a> |
  <a href="https://susam.net/tag/javascript.html">#javascript</a> |
  <a href="https://susam.net/tag/shell.html">#shell</a>
</p>
]]>
</description>
</item>
<item>
<title>A4 Paper Stories</title>
<link>https://susam.net/a4-paper-stories.html</link>
<guid isPermaLink="false">a4pps</guid>
<pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  I sometimes resort to a rather common measuring technique that is
  neither fast, nor accurate, nor recommended by any standards body
  and yet it hasn't failed me whenever I have had to use it.  I will
  describe it here, though calling it a technique might be overselling
  it.  Please do not use it for installing kitchen cabinets or
  anything that will stare back at you every day for the next ten
  years.  It involves one tool: a sheet of A4 paper.
</p>
<p>
  Like most sensible people with a reasonable sense of priorities, I
  do not carry a ruler with me wherever I go.  Nevertheless, I often
  find myself needing to measure something at short notice, usually in
  situations where a certain amount of inaccuracy is entirely
  forgivable.  When I cannot easily fetch a ruler, I end up doing what
  many people do and reach for the next best thing, which for me is a
  sheet of A4 paper, available in abundant supply where I live.
</p>
<p>
  From photocopying night-sky charts to serving as a scratch pad for
  working through mathematical proofs, A4 paper has been a trusted
  companion since my childhood days.  I use it often.  If I am
  carrying a bag, there is almost always some A4 paper inside: perhaps
  a printed research paper or a mathematical problem I have worked on
  recently and need to chew on a bit more during my next train ride.
</p>
<h2 id="dimensions">Dimensions<a href="#dimensions"></a></h2>
<p>
  The dimensions of A4 paper are the solution to a simple, elegant
  problem.  Imagine designing a sheet of paper such that, when you cut
  it in half parallel to its shorter side, both halves have exactly
  the same aspect ratio as the original.  In other words, if the
  shorter side has length \( x \) and the longer side has length \( y
 , \) then

  \[
    \frac{y}{x} = \frac{x}{y / 2}
  \]

  which gives us

  \[
    \frac{y}{x} = \sqrt{2}.
  \]

  Test it out.  Suppose we have \( y/x = \sqrt{2}.  \)  We cut the
  paper in half parallel to the shorter side to get two halves, each
  with shorter side \( x' = y / 2 = x \sqrt{2} / 2 = x / \sqrt{2} \)
  and longer side \( y' = x.  \)  Then indeed

  \[
    \frac{y'}{x'}
    = \frac{x}{x / \sqrt{2}}
    = \sqrt{2}.
  \]

  In fact, we can keep cutting the halves like this and we'll keep
  getting even smaller sheets with the aspect ratio \( \sqrt{2} \)
  intact.  To summarise, when a sheet of paper has the aspect ratio \(
  \sqrt{2}, \) bisecting it parallel to the shorter side leaves us
  with two halves that preserve the aspect ratio.  A4 paper has this
  property.
</p>
<p>
  But what are the exact dimensions of A4 and why is it called A4?
  What does 4 mean here?  Like most good answers, this one too begins
  by considering the numbers \( 0 \) and \( 1.  \)  Let me elaborate.
</p>
<p>
  Let us say we want to make a sheet of paper that is \( 1 \,
  \mathrm{m}^2 \) in area and has the aspect-ratio-preserving property
  that we just discussed.  What should its dimensions be?  We want

  \[
    xy = 1 \, \mathrm{m}^2
  \]

  subject to the condition

  \[
    \frac{y}{x} = \sqrt{2}.
  \]

  Solving these two equations gives us

  \[
    x^2 = \frac{1}{\sqrt{2}} \, \mathrm{m}^2
  \]

  from which we obtain

  \[
    x = \frac{1}{\sqrt[4]{2}} \, \mathrm{m}, \quad
    y = \sqrt[4]{2} \, \mathrm{m}.
  \]

  Up to three decimal places, this amounts to

  \[
    x = 0.841 \, \mathrm{m}, \quad
    y = 1.189 \, \mathrm{m}.
  \]

  These are the dimensions of A0 paper.  They are precisely the
  dimensions specified by the ISO standard for it.  It is quite large
  to scribble mathematical solutions on, unless your goal is to make a
  spectacle of yourself and cause your friends and family to reassess
  your sanity.  So we need something smaller that allows us to work in
  peace, without inviting commentary or concerns from passersby.  We
  take the A0 paper of size

  \[
    84.1 \, \mathrm{cm} \times 118.9 \, \mathrm{cm}
  \]

  and bisect it to get A1 paper of size

  \[
    59.4 \, \mathrm{cm} \times 84.1 \, \mathrm{cm}.
  \]

  Then we bisect it again to get A2 paper with dimensions

  \[
    42.0 \, \mathrm{cm} \times 59.4 \, \mathrm{cm}.
  \]

  And once again to get A3 paper with dimensions

  \[
    29.7 \, \mathrm{cm} \times 42.0 \, \mathrm{cm}.
  \]

  And then once again to get A4 paper with dimensions

  \[
    21.0 \, \mathrm{cm} \times 29.7 \, \mathrm{cm}.
  \]

  There we have it.  The dimensions of A4 paper.  These numbers are
  etched in my memory like the multiplication table of \( 1.  \)  We
  can keep going further to get A5, A6, etc.  We could, in theory, go
  all the way up to A\( \infty.  \)  Hold on, I think I hear someone
  heckle.  What's that?  Oh, we can't go all the way to A\( \infty?  \)
  Something about atoms, was it?  Hmm.  Security!  Where's security?
  Ah yes, thank you, sir.  Please show this gentleman out, would you?
</p>
<p>
  Sorry for the interruption, ladies and gentlemen.  Phew!  That
  fellow!  Atoms?  Honestly.  We, the mathematically inclined, are not
  particularly concerned with such trivial limitations.  We drink our
  tea from doughnuts.  We are not going to let the size of atoms
  dictate matters, now are we?
</p>
<p>
  So I was saying that we can bisect our paper like this and go all
  the way to A\( \infty.  \)  That reminds me.  Last night I was at a
  bar in Hoxton and I saw an infinite number of mathematicians walk
  in.  The first one asked, "Sorry to bother you, but would it be
  possible to have a sheet of A0 paper?  I just need something to
  scribble a few equations on."  The second one asked, "If you happen
  to have one spare, could I please have an A1 sheet?"  The third one
  said, "An A2 would be perfectly fine for me, thank you."  Before the
  fourth one could ask, the bartender disappeared into the back for a
  moment and emerged with two sheets of A0 paper and said, "Right.
  That should do it.  Do know your limits and split these between
  yourselves."
</p>
<p>
  In general, a sheet of A\( n \) paper has the dimensions

  \[
    2^{-(2n + 1)/4} \, \mathrm{m} \times
    2^{-(2n - 1)/4} \, \mathrm{m}.
  \]

  If we plug in \( n = 4, \) we indeed get the dimensions of A4 paper:

  \[
    0.210 \, \mathrm{m} \times 0.297 \, \mathrm{m}.
  \]
</p>
<h2 id="measuring-stuff">Measuring Stuff<a href="#measuring-stuff"></a></h2>
<p>
  Let us now return to the business of measuring things.  As I
  mentioned earlier, the dimensions of A4 are lodged firmly into my
  memory.  Getting hold of a sheet of A4 paper is rarely a challenge
  where I live.  I have accumulated a number of A4 paper stories over
  the years.  Let me share a recent one.  I was hanging out with a few
  folks of the nerd variety one afternoon when the conversation
  drifted, as it sometimes does, to a nearby computer monitor that
  happened to be turned off.  At some point, someone confidently
  declared that the screen in front of us was 27 inches.  That sounded
  plausible but we wanted to confirm it.  So I reached for my trusted
  measuring instrument: an A4 sheet of paper.  What followed was
  neither fast, nor especially precise, but it was more than adequate
  for settling the matter at hand.
</p>
<p>
  I lined up the longer edge of the A4 sheet with the width of the
  monitor.  One length.  Then I repositioned it and measured a second
  length.  The screen was still sticking out slightly at the end.  By
  eye, drawing on an entirely unjustified confidence built from years
  of measuring things that never needed measuring, I estimated the
  remaining bit at about \( 1 \, \mathrm{cm}.  \)  That gives us a
  width of

  \[
    29.7 \, \mathrm{cm} +
    29.7 \, \mathrm{cm} +
     1.0 \, \mathrm{cm}
    =
    60.4 \, \mathrm{cm}.
  \]

  Let us round that down to \( 60 \, \mathrm{cm}.  \)  For the height,
  I switched to the shorter edge.  One full \( 21 \, \mathrm{cm} \)
  fit easily.  For the remainder, I folded the paper parallel to the
  shorter side, producing an A5-sized rectangle with dimensions \(
  14.8 \, \mathrm{cm} \times 21.0 \, \mathrm{cm}.  \)  Using the \(
  14.8 \, \mathrm{cm} \) edge, I discovered that it overshot the top
  of the screen slightly.  Again, by eye, I estimated the excess at
  around \( 2 \, \mathrm{cm}.  \)  That gives us

  \[
    21.0 \, \mathrm{cm} +
    14.8 \, \mathrm{cm}
    -2.0 \, \mathrm{cm}
    =
    33.8 \, \mathrm{cm}.
  \]

  Let us round this up to \( 34 \, \mathrm{cm}.  \)  The ratio \( 60 /
  34 \approx 1.76 \) is quite close to \( 16/9, \) a popular aspect
  ratio of modern displays.  At this point the measurements were
  looking good.  So far, the paper had not embarrassed itself.
  Invoking the wisdom of the Pythagoreans, we can now estimate the
  diagonal as

  \[
    \sqrt{(60 \, \mathrm{cm})^2 + (34 \, \mathrm{cm})^2}
    \approx 68.9 \,\mathrm{cm}.
  \]

  Finally, there is the small matter of units.  One inch is \( 2.54 \,
  \mathrm{cm}, \) another figure that has embedded itself in my head.
  Dividing \( 68.9 \) by \( 2.54 \) gives us roughly \( 27.2 \,
  \mathrm{in}.  \)  So yes.  It was indeed a \( 27 \)-inch display.  My
  elaborate exercise in showing off my A4 paper skills was now
  complete.  Nobody said anything.  A few people looked away in
  silence.  I assumed they were reflecting.  I am sure they were
  impressed deep down.  Or perhaps... no, no.  They were definitely
  impressed.  I am sure.
</p>
<p>
  Hold on.  I think I hear another heckle.  What is that?  There are
  mobile phone apps that can measure things now?  Really?  Right.
  Security.  Where's security?
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/a4-paper-stories.html">Read on website</a> |
  <a href="https://susam.net/tag/absurd.html">#absurd</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Triangle-Free Cayley Graph</title>
<link>https://susam.net/triangle-free-cayley-graph.html</link>
<guid isPermaLink="false">cgwnt</guid>
<pubDate>Wed, 03 Dec 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  In this note I elaborate the proof of a claim regarding Cayley
  graphs of symmetric groups with transpositions as generators that I
  found in the book <em>Algebraic Graph Theory</em> by Chris Godsil
  and Gordon Royle.  This claim appears as commentary in Section 3.10
  about <em>Transpositions</em>.  Here I present it in the form of a
  theorem along with a complete proof.
</p>
<p>
  <strong>Theorem.</strong>
  <em>
    If \( \mathcal{T} \) is a set of transpositions, then the Cayley
    graph \( X(\operatorname{Sym}(n), \mathcal{T}) \) has no triangles.
  </em>
</p>
<p>
  <em>Proof.</em>  Suppose the vertices \( a, b, c \in
  \operatorname{Sym}(n) \) form a triangle in the Cayley graph \(
  X(\operatorname{Sym}(n), \mathcal{T}).  \)  Since multiplication by
  \( a^{-1} \) is an automorphism of the Cayley graph (by the proof of
  Theorem 3.1.2 that comes earlier), the vertices \( e, ba^{-1},
  ca^{-1} \) form a triangle too.  Let us label them as \( e, b', c'
  \) respectively.
</p>
<p>
  Now by the definition of a Cayley graph, for any two vertices \( a,
  b \in \operatorname{Sym}(n), \) we have

  \begin{align*}
    a \sim b
    &amp; \iff ba^{-1} \in \mathcal{T} \\
    &amp; \iff ba^{-1} = g \\
    &amp; \iff b = ga
  \end{align*}

  for some \( g \in \mathcal{T}.  \)  Therefore

  \begin{align*}
    e \sim b'  &amp; \iff b' = ge = g, \\
    e \sim c'  &amp; \iff c' = he = h, \\
    b' \sim c' &amp; \iff c' = lb'
  \end{align*}

  for some \( g, h, l \in \mathcal{T}.  \)  Note that the last equality gives

  \[
    l =c'b'^{-1} = hg^{-1} = hg \in \mathcal{T}
  \]

  Therefore \( g, h, hg \in \mathcal{T}.  \)  However, this is
  impossible since the product of two transpositions is \( e, \) a \(
  3 \)-cycle or a product of two disjoint transpositions.  For
  example, \( (12)(12) = e, \) \( (12)(13) = (123) \) and \( (12)(24)
  = (12)(24).  \)  Therefore \( hg \) cannot be a transposition,
  i.e. \( hg \notin \mathcal{T}.  \)  This is a contradiction.
  Therefore the vertices \( a, b, c \) cannot form a triangle.  We
  conclude that the Cayley graph \( X(\operatorname{Sym}(n),
  \mathcal{T}) \) has no triangles.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/triangle-free-cayley-graph.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Fizz Buzz with Cosines</title>
<link>https://susam.net/fizz-buzz-with-cosines.html</link>
<guid isPermaLink="false">fzbzz</guid>
<pubDate>Thu, 20 Nov 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  Fizz Buzz is a counting game that has become oddly popular in the
  world of computer programming as a simple test of basic programming
  skills.  The rules of the game are straightforward.  Players say the
  numbers aloud in order beginning with one.  Whenever a number is
  divisible by 3, they say 'Fizz' instead.  If it is divisible by 5,
  they say 'Buzz'.  If it is divisible by both 3 and 5, the player
  says both 'Fizz' and 'Buzz'.  Here is a typical Python program that
  prints this sequence:
</p>
<pre><code>for n in range(1, 101):
    if n % 15 == 0:
        print('FizzBuzz')
    elif n % 3 == 0:
        print('Fizz')
    elif n % 5 == 0:
        print('Buzz')
    else:
        print(n)</code></pre>
<p>
  Here is the output:
  <a href="files/blog/fizz-buzz.txt">fizz-buzz.txt</a>.  Can we make
  the program more complicated?  The words 'Fizz', 'Buzz' and
  'FizzBuzz' repeat in a periodic manner throughout the sequence.
  What else is periodic?  Trigonometric functions!  Perhaps we can use
  trigonometric functions to encode all four rules of the sequence in
  a single closed-form expression.  That is what we are going to
  explore in this article, for fun and no profit.
</p>
<p>
  By the end, we will obtain a discrete Fourier series that can take
  any integer \( n \) and select the corresponding text to be printed.
  In fact, we will derive it using two different methods.  First, we
  will follow a long-winded but hopefully enjoyable approach that
  relies on a basic understanding of complex exponentiation, geometric
  series and trigonometric functions.  Then, we will obtain the same
  result through a direct application of the discrete Fourier
  transform.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#definitions">Definitions</a>
    <ul>
      <li><a href="#symbol-functions">Symbol Functions</a></li>
      <li><a href="#index-function">Index Function</a></li>
      <li><a href="#fizz-buzz-sequence">Fizz Buzz Sequence</a></li>
    </ul>
  </li>
  <li><a href="#from-indicator-functions-to-cosines">From Indicator Functions to Cosines</a>
    <ul>
      <li><a href="#indicator-functions">Indicator Functions</a></li>
      <li><a href="#complex-exponentials">Complex Exponentials</a></li>
      <li><a href="#cosines">Cosines</a></li>
    </ul>
  </li>
  <li><a href="#dft">Discrete Fourier Transform</a>
    <ul>
      <li><a href="#one-period-of-fizz-buzz">One Period of Fizz Buzz</a></li>
      <li><a href="#fourier-coefficients">Fourier Coefficients</a></li>
      <li><a href="#inverse-transform">Inverse Transform</a></li>
    </ul>
  </li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>
<h2 id="definitions">Definitions<a href="#definitions"></a></h2>
<p>
  Before going any further, we establish a precise mathematical
  definition for the Fizz Buzz sequence.  We begin by introducing a
  few functions that will help us define the Fizz Buzz sequence later.
</p>
<h3 id="symbol-functions">Symbol Functions<a href="#symbol-functions"></a></h3>
<p>
  We define a set of four functions \( \{ s_0, s_1, s_2, s_3 \} \) for
  integers \( n \) by:

  \begin{align*}
    s_0(n) &amp;= n, \\
    s_1(n) &amp;= \mathtt{Fizz}, \\
    s_2(n) &amp;= \mathtt{Buzz}, \\
    s_3(n) &amp;= \mathtt{FizzBuzz}.
  \end{align*}

  We call these the symbol functions because they produce every term
  that appears in the Fizz Buzz sequence.  The symbol function \( s_0
  \) returns \( n \) itself.  The functions \( s_1, \) \( s_2 \) and
  \( s_3 \) are constant functions that always return the literal
  words \( \mathtt{Fizz}, \) \( \mathtt{Buzz} \) and \(
  \mathtt{FizzBuzz} \) respectively, no matter what the value of \( n
  \) is.
</p>
<h3 id="index-function">Index Function<a href="#index-function"></a></h3>
<p>
  We define a function \( f(n) \) for integer \( n \) by

  \[
    f(n) = \begin{cases}
      1 &amp; \text{if } 3 \mid n \text{ and } 5 \nmid n, \\
      2 &amp; \text{if } 3 \nmid n \text{ and } 5 \mid n, \\
      3 &amp; \text{if } 3 \mid n \text{ and } 5 \mid n, \\
      0 &amp; \text{otherwise}.
    \end{cases}
  \]

  The notation \( m \mid n \) means that the integer \( m \) divides
  the integer \( n, \) i.e. \( n \) is a multiple of \( m.  \)
  Equivalently, there exists an integer \( c \) such that \( n = cm
 .  \)  Similarly, \( m \nmid n \) means that \( m \) does not divide
  \( n, \) i.e. \( n \) is not a multiple of \( m.  \)
</p>
<p>
  This function covers all four conditions involved in choosing the \(
  n \)th item of the Fizz Buzz sequence.  As we will soon see, this
  function tells us which of the four symbol functions produces the \(
  n \)th item of the Fizz Buzz sequence.  For this reason, we call \(
  f(n) \) the index function.
</p>
<h3 id="fizz-buzz-sequence">Fizz Buzz Sequence<a href="#fizz-buzz-sequence"></a></h3>
<p>
  We now define the Fizz Buzz sequence as the sequence

  \[
    (s_{f(n)}(n))_{n = 1}^{\infty}
  \]

  We can expand the first few terms of the sequence explicitly as
  follows:

  \begin{align*}
    (s_{f(n)}(n))_{n = 1}^{\infty}
    &amp;= (s_{f(1)}(1), \; s_{f(2)}(2), \; s_{f(3)}(3), \; s_{f(4)}(4), \;
            s_{f(5)}(5), \; s_{f(6)}(6), \; s_{f(7)}(7), \; \dots) \\
    &amp;= (s_0(1), \; s_0(2), \; s_1(3), \; s_0(4),
            s_2(5), \; s_1(6), \; s_0(7), \; \dots) \\
    &amp;= (1, \; 2, \; \mathtt{Fizz}, \; 4, \;
            \mathtt{Buzz}, \; \mathtt{Fizz}, \; 7, \; \dots).
  \end{align*}

  Note how the function \( f(n) \) produces an index \( i \) which we
  then use to select the symbol function \( s_i(n) \) to produce the
  \( n \)th term of the sequence.  This is precisely why we decided to
  call \( f(n) \) the index function while defining it in the previous
  section.
</p>
<h2 id="from-indicator-functions-to-cosines">From Indicator Functions to Cosines<a href="#from-indicator-functions-to-cosines"></a></h2>
<p>
  Here we discuss the first method of deriving our closed form
  expression, starting with indicator functions and rewriting them
  using complex exponentials and cosines.
</p>
<h3 id="indicator-functions">Indicator Functions<a href="#indicator-functions"></a></h3>
<p>
  Here is the index function \( f(n) \) from the previous section with
  its cases and conditions rearranged to make it easier to spot
  interesting patterns:

  \[
    f(n) = \begin{cases}
      0 &amp; \text{if } 5 \nmid n \text{ and } 3 \nmid n, \\
      1 &amp; \text{if } 5 \nmid n \text{ and } 3 \mid n, \\
      2 &amp; \text{if } 5 \mid n \text{ and } 3 \nmid n, \\
      3 &amp; \text{if } 5 \mid n \text{ and } 3 \mid n.
    \end{cases}
  \]

  This function helps us select another function \( s_{f(n)}(n) \)
  which in turn determines the \( n \)th term of the Fizz Buzz
  sequence.  Our goal now is to replace this piecewise formula with a
  single closed-form expression.  To do so, we first define indicator
  functions \( I_m(n) \) as follows:

  \[
    I_m(n) = \begin{cases}
      1 &amp; \text{if } m \mid n, \\
      0 &amp; \text{if } m \nmid n.
    \end{cases}
  \]

  The formula for \( f(n) \) can now be written as:

  \[
    f(n) = \begin{cases}
      0 &amp; \text{if } I_5(n) = 0 \text{ and } I_3(n) = 0, \\
      1 &amp; \text{if } I_5(n) = 0 \text{ and } I_3(n) = 1, \\
      2 &amp; \text{if } I_5(n) = 1 \text{ and } I_3(n) = 0, \\
      3 &amp; \text{if } I_5(n) = 1 \text{ and } I_3(n) = 1.
    \end{cases}
  \]

  Do you see a pattern?  Here is the same function written as a table:
</p>
<table class="grid center textcenter">
  <tr>
    <th>\( I_5(n) \)</th>
    <th>\( I_3(n) \)</th>
    <th>\( f(n) \)</th>
  </tr>
  <tr>
    <td>\( 0 \)</td>
    <td>\( 0 \)</td>
    <td>\( 0 \)</td>
  </tr>
  <tr>
    <td>\( 0 \)</td>
    <td>\( 1 \)</td>
    <td>\( 1 \)</td>
  </tr>
  <tr>
    <td>\( 1 \)</td>
    <td>\( 0 \)</td>
    <td>\( 2 \)</td>
  </tr>
  <tr>
    <td>\( 1 \)</td>
    <td>\( 1 \)</td>
    <td>\( 3 \)</td>
  </tr>
</table>
<p>
  Do you see it now?  If we treat the values in the first two columns
  as binary digits and the values in the third column as decimal
  numbers, then in each row the first two columns give the binary
  representation of the number in the third column.  For example, \(
  3_{10} = 11_2 \) and indeed in the last row of the table, we see the
  bits \( 1 \) and \( 1 \) in the first two columns and the number \(
  3 \) in the last column.  In other words, writing the binary digits
  \( I_5(n) \) and \( I_3(n) \) side by side gives us the binary
  representation of \( f(n).  \)  Therefore

  \[
    f(n) = 2 \, I_5(n) + I_3(n).
  \]

  We can now write a small program to demonstrate this formula:
</p>
<pre><code>for n in range(1, 101):
    s = [n, 'Fizz', 'Buzz', 'FizzBuzz']
    i = (n % 3 == 0) + 2 * (n % 5 == 0)
    print(s[i])</code></pre>
<p>
  We can make it even shorter at the cost of some clarity:
</p>
<pre><code>for n in range(1, 101):
    print([n, 'Fizz', 'Buzz', 'FizzBuzz'][(n % 3 == 0) + 2 * (n % 5 == 0)])</code></pre>
<p>
  What we have obtained so far is pretty good.  While there is no
  universal definition of a closed-form expression, I think most
  people would agree that the indicator functions as defined above are
  simple enough to be permitted in a closed-form expression.
</p>
<h3 id="complex-exponentials">Complex Exponentials<a href="#complex-exponentials"></a></h3>
<p>
  In the previous section, we obtained the formula

  \[
    f(n) = I_3(n) + 2 \, I_5(n)
  \]

  which we then used as an index to look up the text to be printed.
  We also argued that this is a pretty good closed-form expression
  already.
</p>
<p>
  However, in the interest of making things more complicated, we must
  ask ourselves: What if we are not allowed to use the indicator
  functions?  What if we must adhere to the commonly accepted meaning
  of a closed-form expression which allows only finite combinations of
  basic operations such as addition, subtraction, multiplication,
  division, integer exponents and roots with integer index as well as
  functions such as exponentials, logarithms and trigonometric
  functions.  It turns out that the above formula can be rewritten
  using only addition, multiplication, division and the cosine
  function.  Let us begin the translation.  Consider the sum

  \[
    S_m(n) = \sum_{k = 0}^{m - 1} e^{i 2 \pi k n / m},
  \]

  where \( i \) is the imaginary unit and \( n \) and \( m \) are
  integers.  This is a geometric series in the complex plane with
  ratio \( r = e^{i 2 \pi n / m}.  \)  If \( n \) is a multiple of \( m
 , \) then \( n = cm \) for some integer \( c \) and we get

  \[
    r
    = e^{i 2 \pi n / m}
    = e^{i 2 \pi c}
    = 1.
  \]

  Therefore, when \( n \) is a multiple of \( m, \) we get

  \[
    S_m(n)
    = \sum_{k = 0}^{m - 1} e^{i 2 \pi k n / m}
    = \sum_{k = 0}^{m - 1} 1^k
    = m.
  \]

  If \( n \) is not a multiple of \( m, \) then \( r \ne 1 \) and the
  geometric series becomes

  \[
    S_m(n)
    = \frac{r^m - 1}{r - 1}
    = \frac{e^{i 2 \pi n} - 1}{e^{i 2 \pi n / m} - 1}
    = 0.
  \]

  Therefore,

  \[
    S_m(n) = \begin{cases}
      m &amp; \text{if } m \mid n, \\
      0 &amp; \text{if } m \nmid n.
    \end{cases}
  \]

  Dividing both sides by \( m, \) we get

  \[
    \frac{S_m(n)}{m} = \begin{cases}
      1 &amp; \text{if } m \mid n, \\
      0 &amp; \text{if } m \nmid n.
    \end{cases}
  \]

  But the right-hand side is \( I_m(n).  \)  Therefore

  \[
    I_m(n)
    = \frac{S_m(n)}{m}
    = \frac{1}{m} \sum_{k = 0}^{m - 1} e^{i 2 \pi k n / m}.
  \]
</p>
<h3 id="cosines">Cosines<a href="#cosines"></a></h3>
<p>
  We begin with Euler's formula

  \[
    e^{i x} = \cos x + i \sin x
  \]

  where \( x \) is a real number.  From this formula, we get

  \[
    e^{i x} + e^{-i x} = 2 \cos x.
  \]

  Therefore

  \begin{align*}
    I_3(n)
    &amp;= \frac{1}{3} \sum_{k = 0}^2 e^{i 2 \pi k n / 3} \\
    &amp;= \frac{1}{3} \left( 1 + e^{i 2 \pi n / 3} +
                                  e^{i 4 \pi n / 3} \right) \\
    &amp;= \frac{1}{3} \left( 1 + e^{i 2 \pi n / 3} +
                                  e^{-i 2 \pi n / 3} \right) \\
    &amp;= \frac{1}{3} + \frac{2}{3} \cos \left( \frac{2 \pi n}{3} \right).
  \end{align*}

  The third equality above follows from the fact that \( e^{i 4 \pi n
  / 3} = e^{i 6 \pi n / 3} e^{-i 2 \pi n / 3} = e^{i 2 \pi n} e^{-i 2
  \pi n/3} = e^{-i 2 \pi n / 3} \) when \( n \) is an integer.
</p>
<p>
  The function above is defined for integer values of \( n \) but we
  can extend its formula to real \( x \) and plot it to observe its
  shape between integers.  As expected, the function takes the value
  \( 1 \) whenever \( x \) is an integer multiple of \( 3 \) and \( 0
  \) whenever \( x \) is an integer not divisible by \( 3.  \)
</p>
<figure class="soft">
  <img src="files/blog/fizz-buzz-i3.png" alt="Graph">
  <figcaption>
    Graph of \( \frac{1}{3} + \frac{2}{3} \cos \left( \frac{2 \pi x}{3} \right) \)
  </figcaption>
</figure>
<p>
  Similarly,

  \begin{align*}
    I_5(n)
    &amp;= \frac{1}{5} \sum_{k = 0}^4 e^{i 2 \pi k n / 5} \\
    &amp;= \frac{1}{5} \left( 1 + e^{i 2 \pi n / 5}
                                + e^{i 4 \pi n / 5}
                                + e^{i 6 \pi n / 5}
                                + e^{i 8 \pi n / 5} \right) \\
    &amp;= \frac{1}{5} \left( 1 + e^{i 2 \pi n / 5}
                                + e^{i 4 \pi n / 5}
                                + e^{-i 4 \pi n / 5}
                                + e^{-i 2 \pi n / 5} \right) \\
    &amp;= \frac{1}{5} + \frac{2}{5} \cos \left( \frac{2 \pi n}{5} \right)
                       + \frac{2}{5} \cos \left( \frac{4 \pi n}{5} \right).
  \end{align*}

  Extending this expression to real values of \( x \) allows us to
  plot its shape as well.  Once again, the function takes the value \(
  1 \) at integer multiples of \( 5 \) and \( 0 \) at integers not
  divisible by \( 5.  \)
</p>
<figure class="soft">
  <img src="files/blog/fizz-buzz-i5.png" alt="Graph">
  <figcaption>
    Graph of \(
      \frac{1}{5}
      + \frac{2}{5} \cos \left( \frac{2 \pi x}{5} \right)
      + \frac{2}{5} \cos \left( \frac{4 \pi x}{5} \right)
    \)
  </figcaption>
</figure>
<p>
  Recall that we expressed \( f(n) \) as

  \[
    f(n) = I_3(n) + 2 \, I_5(n).
  \]

  Substituting these trigonometric expressions yields

  \[
    f(n)
    = \frac{1}{3}
      + \frac{2}{3} \cos \left( \frac{2 \pi n}{3} \right)
      + 2 \cdot \left(
        \frac{1}{5}
        + \frac{2}{5} \cos \left( \frac{2 \pi n}{5} \right)
        + \frac{2}{5} \cos \left( \frac{4 \pi n}{5} \right)
      \right).
  \]

  A straightforward simplification gives

  \[
    f(n)
    = \frac{11}{15}
      + \frac{2}{3} \cos \left( \frac{2 \pi n}{3} \right)
      + \frac{4}{5} \cos \left( \frac{2 \pi n}{5} \right)
      + \frac{4}{5} \cos \left( \frac{4 \pi n}{5} \right).
  \]

  We can extend this expression to real \( x \) and plot it as well.
  The resulting curve takes the values \( 0, 1, 2 \) and \( 3 \) at
  integer points, as desired.
</p>
<figure class="soft">
  <img src="files/blog/fizz-buzz-f.png" alt="Graph">
  <figcaption>
    Graph of \(
      \frac{11}{15} +
      \frac{2}{3} \cos \left( \frac{2 \pi x}{3} \right) +
      \frac{4}{5} \cos \left( \frac{2 \pi x}{5} \right) +
      \frac{4}{5} \cos \left( \frac{4 \pi x}{5} \right)
    \)
  </figcaption>
</figure>
<p>
  Now we can write our Python program as follows:
</p>
<pre><code>from math import cos, pi
for n in range(1, 101):
    s = [n, 'Fizz', 'Buzz', 'FizzBuzz']
    i = round(11 / 15 + (2 / 3) * cos(2 * pi * n / 3)
                      + (4 / 5) * cos(2 * pi * n / 5)
                      + (4 / 5) * cos(4 * pi * n / 5))
    print(s[i])</code></pre>
<h2 id="dft">Discrete Fourier Transform<a href="#dft"></a></h2>
<p>
  The keen-eyed might notice that the expression we obtained for \(
  f(n) \) is a discrete Fourier series.  This is not surprising, since
  the output of a Fizz Buzz program depends only on \( n \bmod 15.  \)
  Any function on a finite cyclic group can be written exactly as a
  finite Fourier expansion.  In this section, we obtain \( f(n) \)
  using the discrete Fourier transform.  It is worth mentioning that
  the calculations presented here are quite tedious to do by hand.
  Nevertheless, this section offers a glimpse of how such calculations
  are performed.  By the end, we will arrive at exactly the same \(
  f(n) \) as before.  There is nothing new to discover here.  We
  simply obtain the same result by a more direct but more laborious
  method.  If this doesn't sound interesting, you may safely skip the
  subsections that follow.
</p>
<h3 id="one-period-of-fizz-buzz">One Period of Fizz Buzz<a href="#one-period-of-fizz-buzz"></a></h3>
<div style="display: none">\( \gdef\arraystretch{1.2} \)</div>
<p>
  We know that \( f(n) \) is a periodic function with period \( 15.  \)
  To apply the discrete Fourier transform, we look at one complete
  period of the function using the values \( n = 0, 1, \dots, 14.  \)
  Over this period, we have:

  \begin{array}{c|ccccccccccccccc}
      n &amp;  0 &amp;  1 &amp;  2 &amp;  3 &amp;  4
        &amp;  5 &amp;  6 &amp;  7 &amp;  8 &amp;  9
        &amp; 10 &amp; 11 &amp; 12 &amp; 13 &amp; 14 \\
    \hline
    f(n) &amp; 3 &amp;  0 &amp;  0 &amp;  1 &amp;  0
         &amp; 2 &amp;  1 &amp;  0 &amp;  0 &amp;  1
         &amp; 2 &amp;  0 &amp;  1 &amp;  0 &amp;  0
  \end{array}

  The discrete Fourier series of \( f(n) \) is

  \[
    f(n) = \sum_{k = 0}^{14} c_k \, e^{i 2 \pi k n / 15}
  \]

  where the Fourier coefficients \( c_k \) are given by the discrete
  Fourier transform

  \[
    c_k = \frac{1}{15} \sum_{n = 0}^{14} f(n) e^{-i 2 \pi k n / 15}.
  \]

  for \( k = 0, 1, \dots, 14.  \)  The formula for \( c_k \) is called
  the discrete Fourier transform (DFT).  The formula for \( f(n) \) is
  called the inverse discrete Fourier transform (IDFT).
</p>
<h3 id="fourier-coefficients">Fourier Coefficients<a href="#fourier-coefficients"></a></h3>
<p>
  Let \( \omega = e^{-i 2 \pi / 15}.  \)  Then using the values of \(
  f(n) \) from the table above, the DFT becomes:

  \[
    c_k = \frac{3 + \omega^{3k} + 2 \omega^{5k} + \omega^{6k}
                  + \omega^{9k} + 2 \omega^{10k} + \omega^{12k}}{15}.
  \]

  Substituting \( k = 0, 1, 2, \dots, 14 \) into the above equation
  gives us the following Fourier coefficients:

  \begin{align*}
    c_{0}  &amp;= \frac{11}{15}, \\
    c_{3}  &amp;= c_{6} = c_{9} = c_{12} = \frac{2}{5}, \\
    c_{5}  &amp;= c_{10} = \frac{1}{3}, \\
    c_{1}  &amp;= c_{2} = c_{4} = c_{7} = c_{8} = c_{11} = c_{13} = c_{14} = 0.
  \end{align*}

  Calculating these Fourier coefficients by hand can be rather
  tedious.  In practice they are almost always calculated using
  numerical software, computer algebra systems or even simple code
  such as the example here:
  <a href="code/fizz-buzz-fourier/fizz-buzz-fourier.py">fizz-buzz-fourier.py</a>.
</p>
<h3 id="inverse-transform">Inverse Transform<a href="#inverse-transform"></a></h3>
<p>
  Once the coefficients are known, we can substitute them into the
  inverse transform introduced earlier to obtain

  \begin{align*}
    f(n)
    &amp;= \sum_{k = 0}^{14} c_k \, e^{i 2 \pi k n / 15} \\[1.5em]
    &amp;= \frac{11}{15}
           + \frac{2}{5} \left(
             e^{i 2 \pi \cdot 3n / 15}
             + e^{i 2 \pi \cdot 6n / 15}
             + e^{i 2 \pi \cdot 9n / 15}
             + e^{i 2 \pi \cdot 12n / 15}
           \right) \\
           &amp; \phantom{=\frac{11}{15}}
           + \frac{1}{3} \left(
             e^{i 2 \pi \cdot 5n / 15}
             + e^{i 2 \pi \cdot 10n / 15}
           \right) \\[1em]
    &amp;= \frac{11}{15}
           + \frac{2}{5} \left(
             e^{i 2 \pi \cdot 3n / 15}
             + e^{i 2 \pi \cdot 6n / 15}
             + e^{-i 2 \pi \cdot 6n / 15}
             + e^{-i 2 \pi \cdot 3n / 15}
           \right) \\
           &amp; \phantom{=\frac{11}{15}}
           + \frac{1}{3} \left(
             e^{i 2 \pi \cdot 5n / 15}
             + e^{-i 2 \pi \cdot 5n / 15}
           \right) \\[1em]
    &amp;= \frac{11}{15}
       + \frac{2}{5} \left(
         2 \cos \left( \frac{2 \pi n}{5} \right)
         + 2 \cos \left( \frac{4 \pi n}{5} \right)
       \right) \\
       &amp; \phantom{=\frac{11}{15}}
       + \frac{1}{3} \left(
         2 \cos \left( \frac{2 \pi n}{3} \right)
       \right) \\[1em]
    &amp;= \frac{11}{15} +
       \frac{4}{5} \cos \left( \frac{2 \pi n}{5} \right) +
       \frac{4}{5} \cos \left( \frac{4 \pi n}{5} \right) +
       \frac{2}{3} \cos \left( \frac{2 \pi n}{3} \right).
  \end{align*}

  This is exactly the same expression for \( f(n) \) we obtained in
  the previous section.  We see that the Fizz Buzz index function \(
  f(n) \) can be expressed precisely using the machinery of Fourier
  analysis.
</p>
<h2 id="conclusion">Conclusion<a href="#conclusion"></a></h2>
<p>
  To summarise, we have defined the Fizz Buzz sequence as

  \[
    (s_{f(n)}(n))_{n = 1}^{\infty}
  \]

  where

  \[
    f(n)
    = \frac{11}{15} +
      \frac{2}{3} \cos \left( \frac{2 \pi n}{3} \right) +
      \frac{4}{5} \cos \left( \frac{2 \pi n}{5} \right) +
      \frac{4}{5} \cos \left( \frac{4 \pi n}{5} \right).
  \]

  and \( s_0(n) = n, \) \( s_1(n) = \mathtt{Fizz}, \) \( s_2(n) =
  \mathtt{Buzz} \) and \( s_3(n) = \mathtt{FizzBuzz}.  \)  A Python
  program to print the Fizz Buzz sequence based on this definition was
  presented earlier.  That program can be written more succinctly as
  follows:
</p>
<pre><code>from math import cos, pi
for n in range(1, 101):
    print([n, 'Fizz', 'Buzz', 'FizzBuzz'][round(11 / 15 + (2 / 3) * cos(2 * pi * n / 3) + (4 / 5) * (cos(2 * pi * n / 5) + cos(4 * pi * n / 5)))])</code></pre>
<p>
  We can also wrap this up nicely in a shell one-liner, in case you
  want to share it with your friends and family and surprise them:
</p>
<pre><code>python3 -c 'from math import cos, pi; [print([n, "Fizz", "Buzz", "FizzBuzz"][round(11/15 + (2/3) * cos(2*pi*n/3) + (4/5) * (cos(2*pi*n/5) + cos(4*pi*n/5)))]) for n in range(1, 101)]'</code></pre>
<p>
  We have taken a simple counting game and turned it into a
  trigonometric construction consisting of a discrete Fourier series
  with three cosine terms and four coefficients.  None of this makes
  Fizz Buzz any easier.  Quite the contrary.  But it does show that
  every \( \mathtt{Fizz} \) and \( \mathtt{Buzz} \) now owes its
  existence to a particular set of Fourier coefficients.  We began
  with the modest goal of making this simple problem more complicated.
  I think it is safe to say that we did not fall short.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/fizz-buzz-with-cosines.html">Read on website</a> |
  <a href="https://susam.net/tag/absurd.html">#absurd</a> |
  <a href="https://susam.net/tag/python.html">#python</a> |
  <a href="https://susam.net/tag/programming.html">#programming</a> |
  <a href="https://susam.net/tag/technology.html">#technology</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/puzzle.html">#puzzle</a>
</p>
]]>
</description>
</item>
<item>
<title>My Lobsters Interview</title>
<link>https://susam.net/my-lobsters-interview.html</link>
<guid isPermaLink="false">lbstr</guid>
<pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  I recently had an engaging conversation with Alex
  (<a href="https://lobste.rs/~veqq">@veqq</a>) from the
  <a href="https://lobste.rs/">Lobsters</a> community about computing,
  mathematics and a range of related topics.  Our conversation was
  later published on the community website as
  <a href="https://lobste.rs/s/kltoas">Lobsters Interview with
  Susam</a>.
</p>
<p>
  I should mention the sections presented in that post are not in the
  same order in which we originally discussed them.  The sections were
  edited and rearranged by Alex to improve the flow and avoid
  repetition of similar topics too close to each other.
</p>
<p>
  This page preserves a copy of our discussion as edited by Alex, so I
  can keep an archived version on my website.  In my copy, I have
  added a table of contents to make it easier to navigate to specific
  sections.  The interview itself follows the table of contents.  I
  hope you enjoy reading it.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ol>
  <li><a href="#lisp-and-other-things">Lisp and Other Things</a></li>
  <li><a href="#lisp-emacs-and-mathematics">Lisp, Emacs and Mathematics</a></li>
  <li><a href="#interests-and-exploration">Interests and Exploration</a></li>
  <li><a href="#computing-for-fun">Computing for Fun</a></li>
  <li><a href="#computing-activities">Computing Activities</a></li>
  <li><a href="#programming-vs-domains">Programming vs Domains</a></li>
  <li><a href="#old-functionality-and-new-problems">Old Functionality and New Problems</a></li>
  <li><a href="#designing-for-composability">Designing for Composability</a></li>
  <li><a href="#small-vs-large-functions">Small vs Large Functions</a></li>
  <li><a href="#domains-and-projects">Domains and Projects</a></li>
  <li><a href="#double-spacing-and-touch-typing">Double Spacing and Touch Typing</a></li>
  <li><a href="#approach-to-learning">Approach to Learning</a></li>
  <li><a href="#managing-time-and-distractions">Managing Time and Distractions</a></li>
  <li><a href="#blogging">Blogging</a></li>
  <li><a href="#forums">Forums</a></li>
  <li><a href="#mathb-moderation-problems">MathB Moderation Problems</a></li>
  <li><a href="#favourite-mathematics-textbooks">Favourite Mathematics Textbooks</a></li>
  <li><a href="#mathematics-and-computing">Mathematics and Computing</a></li>
</ol>
<h2 id="conversation">Our Conversation<a href="#conversation"></a></h2>
<!-- Lisp and other things -->
<p class="question" id="lisp-and-other-things">
  Hi <a href="https://lobste.rs/~susam">@susam</a>, I primarily know
  you as a Lisper, what other things do you use?
</p>
<p>
  Yes, I use Lisp extensively for my personal projects and much of
  what I do in my leisure is built on it.  I ran
  a <a href="https://github.com/susam/mathb">mathematics pastebin</a>
  for close to thirteen years.  It was quite popular on some IRC
  channels.  The pastebin was written in Common Lisp.
  My <a href="https://susam.net/">personal website</a> and blog are
  generated using a tiny static site generator written in Common Lisp.
  Over the years I have built several other personal tools in it as
  well.
</p>
<p>
  I am an active Emacs Lisp programmer too.  Many of my software tools
  are in fact Emacs Lisp functions that I invoke with convenient key
  sequences.  They help me automate repetitive tasks as well as
  improve my text editing and task management experience.
</p>
<p>
  I use plenty of other tools as well.  In my early adulthood, I spent
  many years working with C, C++, Java and PHP.  My
  <a href="https://issues.apache.org/jira/browse/NUTCH-559">first
  substantial open source contribution</a> was to the Apache Nutch
  project which was in Java and one of my early original open source
  projects was <a href="https://github.com/susam/uncap">Uncap</a>, a C
  program to remap keys on Windows.
</p>
<p>
  These days I use a lot of Python, along with some Go and Rust, but
  Lisp remains important to my personal work.  I also enjoy writing
  small standalone tools directly in HTML and JavaScript, often with
  all the code in a single file in a readable, unminified form.
</p>
<!-- Lisp, Emacs and mathematics -->
<p class="question" id="lisp-emacs-and-mathematics">
  How did you first discover computing, then end up with Lisp, Emacs
  and mathematics?
</p>
<p>
  I got introduced to computers through the Logo programming language
  as a kid.  Using simple arithmetic, geometry, logic and code to
  manipulate a two-dimensional world had a lasting effect on me.
</p>
<p>
  I still vividly remember how I ended up with Lisp.  It was at an
  airport during a long layover in 2007.  I wanted to use the time to
  learn something, so I booted my laptop
  running <a href="https://www.debian.org/">Debian</a> GNU/Linux 4.0
  (Etch) and then started
  <a href="https://www.gnu.org/software/clisp/">GNU CLISP</a> 2.41.
  In those days, Wi-Fi in airports was uncommon.  Smartphones and
  mobile data were also uncommon.  So it was fortunate that I had
  CLISP already installed on my system and my laptop was ready for
  learning Common Lisp.  I had it installed because I had wanted to
  learn Common Lisp for some time.  I was especially attracted by its
  simplicity, by the fact that the entire language can be built up
  from a very small set of special forms.  I
  use <a href="https://www.sbcl.org/">SBCL</a> these days, by the way.
</p>
<p>
  I discovered Emacs through Common Lisp.  Several sources recommended
  using the <a href="https://slime.common-lisp.dev/">Superior Lisp
  Interaction Mode for Emacs (SLIME)</a> for Common Lisp programming,
  so that's where I began.  For many years I continued to use Vim as
  my primary editor, while relying on Emacs and SLIME for Lisp
  development.  Over time, as I learnt more about Emacs itself, I grew
  fond of Emacs Lisp and eventually made Emacs my primary editor and
  computing environment.
</p>
<p>
  I have loved mathematics since my childhood days.  What has always
  fascinated me is how we can prove deep and complex facts using first
  principles and clear logical steps.  That feeling of certainty and
  rigour is unlike anything else.
</p>
<p>
  Over the years, my love for the subject has been rekindled many
  times.  As a specific example, let me share how I got into number
  theory.  One day I decided to learn the RSA cryptosystem.  As I was
  working through the
  <a href="https://people.csail.mit.edu/rivest/Rsapaper.pdf">RSA
  paper</a>, I stumbled upon the Euler totient function
  \( \varphi(n) \) which gives the number of positive integers not
  exceeding n that are relatively prime to n.  The paper first states
  that

  \[
    \varphi(p) = p - 1
  \]

  for prime numbers \( p.  \)  That was obvious since \( p \) has no
  factors other than \( 1 \) and itself, so every integer from \( 1 \)
  up to \( p - 1 \) must be relatively prime to it.  But then it
  presents

  \[
    \varphi(pq) = \varphi(p) \cdot \varphi(q) = (p - 1)(q - 1)
  \]

  for primes \( p \) and \( q.  \)  That was not immediately obvious to
  me back then.  After a few minutes of thinking, I managed to prove
  it from scratch.  By the inclusion-exclusion principle, we count how
  many integers from \( 1 \) up to \( pq \) are not divisible by
  \(p \) or \( q.  \)  There are \( pq \) integers in total.  Among
  them, there are \( q \) integers divisible by \( p \) and \( p \)
  integers divisible by \( q.  \)  So we need to subtract \( p + q \)
  from \(pq.  \)  But since one integer (\( pq \) itself) is counted in
  both groups, we add \( 1 \) back.  Therefore

  \[
    \varphi(pq) = pq - (p + q) + 1 = (p - 1)(q - 1).
  \]

  Next I could also obtain the general formula for \( \varphi(n) \)
  for an arbitrary positive integer \( n \) using the same idea.
  There are several other proofs too, but that is how I derived the
  general formula for \( \varphi(n) \) when I first encountered it.
  And just like that, I had begun to learn number theory!
</p>
<!-- Computing for fun -->
<p class="question" id="computing-for-fun">
  You've said you prefer computing for fun.  What is fun to you?  Do
  you have an idea of what makes something fun or not?
</p>
<p>
  For me, fun in computing began when I first learnt IBM/LCSI PC Logo
  when I was nine years old.  I had very limited access to computers
  back then, perhaps only about two hours per <em>month</em> in the
  computer laboratory at my primary school.  Most of my Logo
  programming happened with pen and paper at home.  I would 'test' my
  programs by tracing the results on graph paper.  Eventually I would
  get about thirty minutes of actual computer time in the lab to run
  them for real.
</p>
<p>
  So back then, most of my computing happened without an actual
  computer.  But even with that limited access to computers, a whole
  new world opened up for me: one that showed me the joy of computing
  and more importantly, the joy of sharing my little programs with my
  friends and teachers.  One particular Logo program I still remember
  very well drew a house with animated dashed lines, where the dashes
  moved around the outline of the house.  Everyone around me loved it,
  copied it and tweaked it to change the colours, alter the details
  and add their own little touches.
</p>
<p>
  For me, fun in computing comes from such exploration and sharing.  I
  enjoy asking 'what happens if' and then seeing where it leads me.
  My Emacs package
  <a href="https://elpa.nongnu.org/nongnu/devil.html">devil-mode</a>
  comes from such exploration.  It came from asking, 'What happens if
  we avoid using the <kbd>ctrl</kbd> and <kbd>meta</kbd> modifier keys
  and use <kbd>,</kbd> (the comma key) or another suitable key as a
  leader key instead?  And can we still have a non-modal editing
  experience?'
</p>
<p>
  Sometimes computing for fun may mean crafting a minimal esoteric
  drawing language, making a small game or building a tool that solves
  an interesting problem elegantly.  It is a bonus if the exploration
  results in something working well enough that I can share with
  others on the World Wide Web and others find it fun too.
</p>
<!-- Pursuits -->
<p class="question" id="interests-and-exploration">
  How do you choose what to investigate?  Which most interest you,
  with what commonalities?
</p>
<p>
  For me, it has always been one exploration leading to another.
</p>
<p>
  For example, I originally built
  <a href="https://github.com/susam/mathb">MathB</a> for my friends
  and myself who were going through a phase in our lives when we used
  to challenge each other with mathematical puzzles.  This tool became
  a nice way to share solutions with each other.  Its use spread from
  my friends to their friends and colleagues, then to schools and
  universities and eventually to IRC channels.
</p>
<p>
  Similarly, I built <a href="https://github.com/susam/texme">TeXMe</a>
  when I was learning neural networks and taking a lot of notes on the
  subject.  I was not ready to share the notes online, but I did want
  to share them with my friends and colleagues who were also learning
  the same topic.  Normally I would write my notes in LaTeX, compile
  them to PDF and share the PDF, but in this case, I wondered, what if
  I took some of the code from MathB and created a tool that would let
  me write plain Markdown
  (<a href="https://github.github.com/gfm/">GFM</a>) + LaTeX
  (<a href="https://www.mathjax.org/">MathJax</a>) in
  a <code>.html</code> file and have the tool render the file as soon
  as it was opened in a web browser?  That resulted in TeXMe, which
  has surprisingly become one of my most popular projects, receiving
  <a href="files/blog/texme-may-2025.png">millions of hits</a> in some
  months according to the CDN statistics.
</p>
<p>
  Another example is <a href="https://susam.github.io/muboard/">Muboard</a>,
  which is a bit like an interactive mathematics chalkboard.  I built
  this when I was hosting an
  <a href="journey-to-prime-number-theorem.html">analytic number
  theory book club</a> and I needed a way to type LaTeX snippets live
  on screen and see them immediately rendered.  That made me wonder:
  what if I took TeXMe, made it interactive and gave it a chalkboard
  look-and-feel?  That led to Muboard.
</p>
<p>
  So we can see that sharing mathematical notes and snippets has been
  a recurring theme in several of my projects.  But that is only a
  small fraction of my interests.  I have a wide variety of interests
  in computing.  I also engage in random explorations, like writing
  IRC clients
  (<a href="https://github.com/susam/nimb">NIMB</a>,
  <a href="https://github.com/susam/tzero">Tzero</a>),
  ray tracing
  (<a href="https://github.com/susam/pov25">POV-Ray</a>,
  <a href="https://github.com/spxy/java-ray-tracing">Java ray tracer</a>),
  writing Emacs guides
  (<a href="https://github.com/susam/emacs4cl">Emacs4CL</a>,
  <a href="https://github.com/susam/emfy">Emfy</a>),
  developing small single-file HTML games
  (<a href="invaders.html">Andromeda Invaders</a>,
  <a href="myrgb.html">Guess My RGB</a>),
  purely recreational programming
  (<a href="fxyt.html">FXYT</a>,
  <a href="https://github.com/susam/may4">may4.fs</a>,
  <a href="self-printing-machine-code.html">self-printing machine code</a>,
  <a href="primegrid.html">prime number grid explorer</a>)
  and so on.  The list goes on.  When it comes to hobby computing, I
  don't think I can pick just one domain and say it interests me the
  most.  I have a lot of interests.
</p>
<!-- What is computing?  -->
<p class="question" id="computing-activities">
  What is computing, to you?
</p>
<p>
  Computing, to me, covers a wide range of activities: programming a
  computer, using a computer, understanding how it works, even
  building one.  For example, I once built a tiny 16-bit CPU along
  with a small main memory that could hold only eight 16-bit
  instructions, using VHDL and a Xilinx CPLD kit.  The design was
  based on the Mano CPU introduced in the book <em>Computer System
  Architecture</em> (3rd ed.) by M. Morris Mano.  It was incredibly
  fun to enter instructions into the main memory, one at a time, by
  pushing DIP switches up and down and then watch the CPU I had built
  execute an entire program.  For someone like me, who usually works
  with software at higher levels of abstraction, that was a thrilling
  experience!
</p>
<p>
  Beyond such experiments, computing also includes more practical and
  concrete activities, such as installing and using my favourite Linux
  distribution (Debian), writing software tools in languages like
  Common Lisp, Emacs Lisp, Python and the shell command language or
  customising my Emacs environment to automate repetitive tasks.
</p>
<p>
  To me, computing also includes the abstract stuff like spending time
  with abstract algebra and number theory and getting a deeper
  understanding of the results pertaining to groups, rings and fields,
  as well as numerous number-theoretic results.  Browsing the
  <a href="https://oeis.org/">On-Line Encyclopedia of Integer
  Sequences</a> (OEIS), writing small programs to explore interesting
  sequences or just thinking about them is computing too.  I think
  many of the interesting results in computer science have deep
  mathematical foundations.  I believe much of computer science is
  really discrete mathematics in action.
</p>
<p>
  And if we dive all the way down from the CPU to the level of
  transistors, we encounter continuous mathematics as well, with
  non-linear voltage-current relationships and analogue behaviour that
  make digital computing possible.  It is fascinating how, as a
  relatively new species on this planet, we have managed to take sand
  and find a way to use continuous voltages and currents in electronic
  circuits built with silicon and convert them into the discrete
  operations of digital logic.  We have machines that can simulate
  themselves!
</p>
<p>
  To me, all of this is fun.  To study and learn about these things,
  to think about them, to understand them better and to accomplish
  useful or amusing results with this knowledge is all part of the
  fun.
</p>
<!-- Programming vs domains -->
<p class="question" id="programming-vs-domains">
  How do you view programming vs. domains?
</p>
<p>
  I focus more on the domain than the tool.  Most of the time it is a
  problem that catches my attention and then I explore it to
  understand the domain and arrive at a solution.  The problem itself
  usually points me to one of the tools I already know.
</p>
<p>
  For example, if it is about working with text files, I might write
  an Emacs Lisp function.  If it involves checking large sets of
  numbers rapidly for patterns, I might choose C++ or Rust.  But if I
  want to share interactive visualisations of those patterns with
  others, I might rewrite the solution in HTML and JavaScript,
  possibly with the use of the Canvas API, so that I can share the
  work as a self-contained file that others can execute easily within
  their web browsers.  When I do that, I prefer to keep the HTML neat
  and readable, rather than bundled or minified, so that people who
  like to 'View Source' can copy, edit and customise the code
  themselves to immediately see their changes take effect.
</p>
<p>
  Let me share a specific example.  While working on a web-based game, I first
  used <code>CanvasRenderingContext2D</code>'s <code>fillText()</code>
  to display text on the game canvas.  However, dissatisfied with the
  text rendering quality, I began looking for IBM PC OEM fonts and
  similar retro fonts online.  After downloading a few font packs, I
  wrote a little Python script to convert them to bitmaps (arrays of
  integers) and then used the bitmaps to draw text on the canvas using
  JavaScript, one cell at a time, to get pixel-perfect results!  These
  tiny Python and JavaScript tools were good enough that I felt
  comfortable sharing them together as a tiny toolkit called
  <a href="https://susam.github.io/pcface/src/demo.html">PCFace</a>.
  This toolkit offers JavaScript bitmap arrays and tiny JavaScript
  rendering functions, so that someone else who wants to display text
  on their game canvas using PC fonts and nothing but plain HTML and
  JavaScript can do so without having to solve the problem from
  scratch!
</p>
<!-- Applicability of old functionality for new problems -->
<p class="question" id="old-functionality-and-new-problems">
  Has the rate of your making new Emacs functions has diminished over
  time (as if everything's covered) or do the widening domains lead to
  more?  I'm curious how applicable old functionality is for new
  problems and how that impacts the APIs!
</p>
<p>
  My rate of making new Emacs functions has definitely decreased.
  There are two reasons.  One is that over the years my computing
  environment has converged into a comfortable, stable setup I am very
  happy with.  The other is that at this stage of life I simply cannot
  afford the time to endlessly tinker with Emacs as I did in my
  younger days.
</p>
<p>
  More generally, when it comes to APIs, I find that well-designed
  functionality tends to remain useful even when new problems appear.
  In Emacs, for example, many of my older functions continue to serve
  me well because they were written in a composable way.  New problems
  can often be solved with small wrappers or combinations of existing
  functions.  I think APIs that consist of functions that are simple,
  orthogonal and flexible age well.  If each function in an API does
  one thing and does it well (the Unix philosophy), it will have
  long-lasting utility.
</p>
<p>
  Of course, new domains and problems do require new functions and
  extensions to an API, but I think it is very important to not give
  in to the temptation of enhancing the existing functions by making
  them more complicated with optional parameters, keyword arguments,
  nested branches and so on.  Personally, I have found that it is much
  better to implement new functions that are small, orthogonal and
  flexible, each doing one thing and doing it well.
</p>
<p class="question" id="designing-for-composability">
  What design methods or tips do you have, to increase composability?
</p>
<p>
  For me, good design starts with good vocabulary.  Clear vocabulary
  makes abstract notions concrete and gives collaborators a shared
  language to work with.  For example, while working on a network
  events database many years ago, we collected data minute by minute
  from network devices.  We decided to call each minute of data from a
  single device a 'nugget'.  So if we had 15 minutes of data from 10
  devices, that meant 150 nuggets.
</p>
<p>
  Why 'nugget'?  Because it was shorter and more convenient than
  repeatedly saying 'a minute of data from one device'.  Why not
  something less fancy like 'chunk'?  Because we reserved 'chunk' for
  subdivisions within a nugget.  Perhaps there were better choices,
  but 'nugget' was the term we settled on and it quickly became shared
  terminology between the collaborators.  Good terminology naturally
  carries over into code.  With this vocabulary in place, function
  names like <code>collect_nugget()</code>,
  <code>open_nugget()</code>, <code>parse_chunk()</code>,
  <code>index_chunk()</code>, <code>skip_chunk()</code>,
  etc. immediately become meaningful to everyone involved.
</p>
<p>
  Thinking about the vocabulary also ensures that we are thinking
  about the data, concepts and notions we are working with in a
  deliberate manner and that kind of thinking also helps when we
  design the architecture of software.
</p>
<p>
  Too often I see collaborators on software projects jump straight
  into writing functions that take some input and produce some desired
  effect, with variable names and function names decided on the fly.
  To me, this feels backwards.  I prefer the opposite approach.
  Define the terms first and let the code follow from them.
</p>
<p>
  I also prefer developing software in a layered manner, where complex
  functionality is built from simpler, well-named building blocks.  It
  is especially important to avoid <em>layer violations</em>, where
  one complex function invokes another complex function.  That creates
  tight coupling between two complex functions.  If one function
  changes in the future, we have to reason carefully about how it
  affects the other.  Since both are already complex, the cognitive
  burden is high.  A better approach, I think, is to identify the
  common functionality they share and factor that out into smaller,
  simpler functions.
</p>
<p>
  To summarise, I like to develop software with a clear vocabulary,
  consistent use of that vocabulary, a layered design where complex
  functions are built from simpler ones and by avoiding layer
  violations.  I am sure none of this is new to the Lobsters
  community.  Some of these ideas also occur
  in <a href="https://en.wikipedia.org/wiki/Domain-driven_design">domain-driven
  design</a> (DDD).  DDD defines the term <em>ubiquitous language</em>
  to mean, 'A language structured around the domain model and used by
  all team members within a bounded context to connect all the
  activities of the team with the software.'  If I could call this
  approach of software development something, I would simply call it
  'vocabulary-driven development' (VDD), though of course DDD is the
  more comprehensive concept.
</p>
<p>
  Like I said, none of this is likely new to the Lobsters community.
  In particular, I suspect Forth programmers would find it too
  obvious.  In Forth, it is very difficult to begin with a long,
  poorly thought-out monolithic word and then break it down into
  smaller ones later.  The stack effects quickly become too hard to
  track mentally with that approach.  The only viable way to develop
  software in Forth is to start with a small set of words that
  represent the important notions of the problem domain, test them
  immediately and then compose higher-level words from the lower-level
  ones.  Forth naturally encourages a layered style of development,
  where the programmer thinks carefully about the domain, invents
  vocabulary and expresses complex ideas in terms of simpler ones,
  almost in a mathematical fashion.  In my experience, this kind of
  deliberate design produces software that remains easy to understand
  and reason about even years after it was written.
</p>
<!-- Small vs large functions -->
<p class="question" id="small-vs-large-functions">
  Not enhancing existing functions but adding new small ones seems
  quite lovely, but how do you come back to such a codebase later with
  many tiny functions?  At points, I've advocated for very large
  functions, particularly traumatized by Java-esque 1000 functions in
  1000 files approaches.  When you had time, would you often
  rearchitecture the conceptual space of all of those functions?
</p>
<p>
  The famous quote from Alan J. Perlis comes to mind:
</p>
<blockquote>
  <p>
    It is better to have 100 functions operate on one data structure
    than 10 functions on 10 data structures.
  </p>
</blockquote>
<p>
  Personally, I enjoy working with a codebase that has thousands of
  functions, provided most of them are small, well-scoped and do one
  thing well.  That said, I am not dogmatically opposed to large
  functions.  It is always a matter of taste and judgement.  Sometimes
  one large, cohesive function is clearer than a pile of tiny ones.
</p>
<p>
  For example, when I worked on parser generators, I often found that
  lexers and finite state machines benefited from a single top-level
  function containing the full tokenisation logic or the full state
  transition logic in one place.  That function could call smaller
  helpers for specific tasks, but we still need the overall
  <code>switch</code>-<code>case</code> or
  <code>if</code>-<code>else</code> or <code>cond</code> ladder
  somewhere.  I think trying to split that ladder into smaller
  functions would only make the code harder to follow.
</p>
<p>
  So while I lean towards small, composable functions, the real goal
  is to strike a balance that keeps code maintainable in the long run.
  Each function should be as small as it can reasonably be and no
  smaller.
</p>
<!-- Domains -->
<p class="question" id="domains-and-projects">
  Like you, I program as a tool to explore domains.  Which do you know
  the most about?
</p>
<p>
  For me too, the appeal of computer programming lies especially in
  how it lets me explore different domains.  There are two kinds of
  domains in which I think I have gained good expertise.  The first
  comes from years of developing software for businesses, which has
  included solving problems such as network events parsing, indexing
  and querying, packet decoding, developing parser generators,
  database session management and TLS certificate lifecycle
  management.  The second comes from areas I pursue purely out of
  curiosity or for hobby computing.  This is the kind I am going to
  focus on in our conversation.
</p>
<p>
  Although computing and software are serious business today, for me,
  as for many others, computing is also a hobby.
</p>
<p>
  Personal hobby projects often lead me down various rabbit holes and
  I end up learning new domains along the way.  For example, although
  I am not a web developer, I learnt to build small, interactive
  single-page tools in plain HTML, CSS and JavaScript simply because I
  needed them for my hobby projects over and over again.  An early
  example is <a href="quickqwerty.html">QuickQWERTY</a>, which I built
  to teach myself and my friends touch-typing on QWERTY keyboards.
  Another example is <a href="cfrs.html">CFRS[]</a>, which I created
  because I wanted to make a total (non-Turing complete) drawing
  language that has turtle graphics like Logo but is absolutely
  minimal like P&prime;&prime;.
</p>
<!-- Double spacing -->
<p class="question" id="double-spacing-and-touch-typing">
  You use double spaces after periods which I'd only experienced from
  people who learned touch typing on typewriters, unexpected!
</p>
<p>
  Yes, I do separate sentences by double spaces.  It is interesting
  that you noticed this.
</p>
<p>
  I once briefly learnt touch typing on typewriters as a kid, but
  those lessons did not stick with me.  It was much later, when I used
  a Java applet-based touch typing tutor that I found online about two
  decades ago, that the lessons really stayed with me.  Surprisingly,
  that application taught me to type with a single space between
  sentences.  By the way, I disliked installing Java plugins into the
  web browser, so I wrote <a href="quickqwerty.html">QuickQWERTY</a>
  as a similar touch typing tutor in plain HTML and JavaScript for
  myself and my friends.
</p>
<p>
  I learnt to use double spaces between sentences first with Vim and
  then later again with Emacs.  For example, in Vim,
  the <code>joinspaces</code> option is on by default, so when we join
  sentences with the normal mode command <code>J</code> or format
  paragraphs with <code>gqap</code>, Vim inserts two spaces after full
  stops.  We need to disable that behaviour with <code>:set
  nojoinspaces</code> if we want single spacing.
</p>
<p>
  It is similar in Emacs.  In Emacs, the
  <code>delete-indentation</code> command (<code>M-^</code>) and
  the <code>fill-paragraph</code> command (<code>M-q</code>) both
  insert two spaces between sentences by default.  Single spacing can
  be enabled with <code>(setq sentence-end-double-space nil)</code>.
</p>
<p>
  Incidentally, I spend a good portion of the README for my Emacs
  quick-start DIY kit named
  <a href="https://github.com/susam/emfy">Emfy</a> discussing sentence
  spacing conventions under the section
  <a href="https://github.com/susam/emfy#single-space-for-sentence-spacing">Single
  Space for Sentence Spacing</a>.  There I explain how to configure
  Emacs to use single spaces, although I use double spaces myself.
  That's because many new Emacs users prefer single spacing.
</p>
<p>
  The defaults in Vim and Emacs made me adopt double spacing.  The
  double spacing convention is also widespread across open source
  software.  If we look at the Vim help pages, Emacs built-in
  documentation or the Unix and Linux man pages, double spacing is the
  norm.  Even inline comments in traditional open source projects
  often use it.  For example, see Vim's
  <a href="https://github.com/vim/vim/blob/v9.1.1752/runtime/doc/usr_01.txt">:h usr_01.txt</a>,
  Emacs's
  <a href="https://cgit.git.savannah.gnu.org/cgit/emacs.git/tree/doc/emacs/emacs.texi?h=emacs-30.2#n1556">(info "(emacs) Intro")</a>
  or the comments in the <a href="https://gcc.gnu.org/git/?p=gcc.git;f=gcc/cfg.cc;hb=releases/gcc-15.2.0">GCC source code</a>.
</p>
<!-- Learning -->
<p class="question" id="approach-to-learning">
  How do you approach learning a new domain?
</p>
<p>
  When I take on a new domain, there is of course a lot of reading
  involved from articles, books and documentation.  But as I read, I
  constantly try to test what I learn.  Whenever I see a claim, I ask
  myself, 'If this claim were wrong, how could I demonstrate it?'
  Then I design a little experiment, perhaps write a snippet of code
  or run a command or work through a concrete example, with the goal
  of checking the claim in practice.
</p>
<p>
  Now I am not genuinely hoping to prove a claim wrong.  It is just a
  way to engage with the material.  To illustrate, let me share an
  extremely simple and generic example without going into any
  particular domain.  Suppose I learn that Boolean operations in
  Python short-circuit.  I might write out several experimental
  snippets like the following:
</p>
<pre><code>def t(): print('t'); return True
def f(): print('f'); return False
f() or t() or f()
</code></pre>
<p>
  And then confirm that the results do indeed confirm short-circuit
  evaluation (<code>f</code> followed by <code>t</code> in this case).
</p>
<p>
  At this point, one could say, 'Well, you just confirmed what the
  documentation already told you.'  And that's true.  But for me, the
  value lies in trying to test it for myself.  Even if the claim
  holds, the act of checking forces me to see the idea in action.
  That not only reinforces the concept but also helps me build a much
  deeper intuition for it.
</p>
<p>
  Sometimes these experiments also expose gaps in my own
  understanding.  Suppose I didn't properly know what 'short-circuit'
  means.  Then the results might contradict my expectations.  That
  contradiction would push me to correct my misconception and that's
  where the real learning happens.
</p>
<p>
  Occasionally, this process even uncovers subtleties I didn't expect.
  For example, while learning socket programming, I discovered that a
  client can successfully receive data using <code>recv()</code> even
  after calling <code>shutdown()</code>, contrary to what I had first
  inferred from the specifications.  See my Stack Overflow post
  <a href="https://stackoverflow.com/q/39698037/303363">Why can recv()
  receive messages after the client has invoked shutdown()?</a> for
  more details if you are curious.
</p>
<p>
  Now this method cannot always be applied, especially if it is very
  expensive or unwieldy to do so.  For example, if I am learning
  something in the finance domain, it is not always possible to
  perform an actual transaction.  One can sometimes use simulation
  software, mock environments or sandbox systems to explore ideas
  safely.  Still, it is worth noting that this method has its
  limitations.
</p>
<p>
  In mathematics, though, I find this method highly effective.  When I
  study a new branch of mathematics, I try to come up with examples
  and counterexamples to test what I am learning.  Often, failing to
  find a counterexample helps me appreciate more deeply why a claim
  holds and why no counterexamples exist.
</p>
<!-- Distraction -->
<p class="question" id="managing-time-and-distractions">
  Do you have trouble not getting distracted with so much on your
  plate?  I'm curious how you balance the time commitments of
  everything!
</p>
<p>
  Indeed, it is very easy to get distracted.  One thing that has
  helped over the years is the increase in responsibilities in other
  areas of my life.  These days I also spend some of my free time
  studying mathematics textbooks.  With growing responsibilities and
  the time I devote to mathematics, I now get at most a few hours each
  week for hobby computing.  This automatically narrows down my
  options.  I can explore perhaps one or at most two ideas in a month
  and that constraint makes me very deliberate about choosing my
  pursuits.
</p>
<p>
  Many of the explorations do not evolve into something solid that I
  can share.  They remain as little experimental code snippets or
  notes archived in a private repository.  But once in a while, an
  exploration grows into something concrete and feels worth sharing on
  the Web.  That becomes a short-term hobby project.  I might work on
  it over a weekend if it is small or for a few weeks if it is more
  complex.  When that happens, the goal of sharing the project helps
  me focus.
</p>
<p>
  I try not to worry too much about making time.  After all, this is
  just a hobby.  Other areas of my life have higher priority.  I also
  want to devote a good portion of my free time to learning more
  mathematics, which is another hobby I am passionate about.  Whatever
  little spare time remains after attending to the higher-priority
  aspects of my life goes into my computing projects, usually a couple
  of hours a week, most of it on weekends.
</p>
<!-- Blogging -->
<p class="question" id="blogging">
  How does blogging mix in?  What's the development like of a single
  piece of curiosity through wrestling with the domain, learning and
  sharing it etc.?
</p>
<p>
  Maintaining my personal website is another aspect of computing that
  I find very enjoyable.  My website began as a loose collection of
  pages on a LAN site during my university days.  Since then I have
  been adding pages to it to write about various topics that I find
  interesting.  It acquired its blog shape and form much later when
  blogging became fashionable.
</p>
<p>
  I usually write a new blog post when I feel like there is some piece
  of knowledge or some exploration that I want to archive in a
  persistent format.  Now what the development of a post looks like
  depends very much on the post.  So let me share two opposite
  examples to describe what the development of a single piece looks
  like.
</p>
<p>
  One of my most frequently visited posts
  is <a href="lisp-in-vim.html">Lisp in Vim</a>.  It started when I
  was hosting a Common Lisp programming club for beginners.  Although
  I have always used Emacs and SLIME for Common Lisp programming
  myself, many in the club used Vim, so I decided to write a short
  guide on setting up something SLIME-like there.  As a former
  long-time Vim user myself, I wanted to make the Lisp journey easier
  for Vim users too.  I thought it would be a 30-minute exercise where
  I write up a README that explains how to install
  <a href="https://github.com/kovisoft/slimv">Slimv</a> and how to set
  it up in Vim.  But then I discovered a newer plugin called
  <a href="https://github.com/vlime/vlime">Vlime</a> that also offered
  SLIME-like features in Vim!  That detail sent me down a very deep
  rabbit hole.  Now I needed to know how the two packages were
  different, what their strengths and weaknesses were, how routine
  operations were performed in both and so on.  What was meant to be a
  short note turned into a nearly 10,000-word article.  As I was
  comparing the two SLIME-like packages for Vim, I also found a few
  bugs in Slimv and contributed fixes for them
  (<a href="https://github.com/kovisoft/slimv/pull/87">#87</a>,
  <a href="https://github.com/kovisoft/slimv/pull/88">#88</a>,
  <a href="https://github.com/kovisoft/slimv/pull/89">#89</a>,
  <a href="https://github.com/kovisoft/slimv/pull/90">#90</a>).
  Writing this blog post turned into a month-long project!
</p>
<p>
  At the opposite extreme is a post like
  <a href="elliptical-python-programming.html">Elliptical
  Python Programming</a>.  I stumbled upon Python's
  <a href="https://docs.python.org/3/library/constants.html#Ellipsis">Ellipsis</a>
  while reviewing someone's code.  It immediately caught my attention.
  I wondered if, combined with some standard obfuscation techniques,
  one could write arbitrary Python programs that looked almost like
  Morse code.  A few minutes of experimentation showed that a
  genuinely Morse code-like appearance was not possible, but something
  close could be achieved.  So I wrote what I hope is a humorous post
  demonstrating that arbitrary Python programs can be written using a
  very restricted set of symbols, one of which is the ellipsis.  It
  took me less than an hour to write this post.  The final result
  doesn't look quite like Morse code as I had imagined, but it is
  quite amusing nevertheless!
</p>
<!-- Forums -->
<p class="question" id="forums">
  What draws you to post and read online forums?  How do you balance
  or allot time for reading technical articles, blogs etc.?
</p>
<p>
  The exchange of ideas!  Just as I enjoy sharing my own
  computing-related thoughts, ideas and projects, I also find joy in
  reading what others have to share.
</p>
<p>
  Other areas of my life take precedence over hobby projects and hobby
  projects take precedence over technical forums.
</p>
<p>
  After I've given time to the higher-priority parts of my life and to
  my own technical explorations, I use whatever spare time remains to
  read articles, follow technical discussions and occasionally add
  comments.
</p>
<!-- MathB.in -->
<p class="question" id="mathb-moderation-problems">
  When you decided to stop with MathB due to moderation burdens, I
  offered to take over/help and you mentioned others had too.  Did
  anyone end up forking it, to your knowledge?
</p>
<p>
  I first thought of shutting down the
  <a href="https://github.com/susam/mathb">MathB</a>-based pastebin
  website in November 2019.  The website had been running for seven
  years at that time.  When I announced my thoughts to the IRC
  communities that would be affected, I received a lot of support and
  encouragement.  A few members even volunteered to help me out with
  moderation.  That support and encouragement kept me going for
  another six years.  However, the volunteers eventually became busy
  with their own lives and moved on.  After all, moderating user
  content for an open pastebin that anyone in the world can post to is
  a thankless and tiring activity.  So most of the moderation activity
  fell back on me.  Finally, in February 2025, I realised that I no
  longer want to spend time on this kind of work.
</p>
<p>
  I developed MathB with a lot of passion for myself and my friends.
  I had no idea at the time that this little project would keep a
  corner of my mind occupied even during weekends and holidays.  There
  was always a nagging worry.  What if someone posted content that
  triggered compliance concerns and my server was taken offline while
  I was away?  I no longer wanted that kind of burden in my life.  So
  I finally decided to shut it down.  I've written more about this
  in <a href="mathbin-is-shutting-down.html">MathB.in Is Shutting
  Down</a>.
</p>
<p>
  To my knowledge, no one has forked it, but others have developed
  alternatives.  Further, the
  <a href="https://wiki.archiveteam.org/">Archive Team</a> has
  <a href="https://web.archive.org/web/*/https://mathb.in/">archived</a>
  all posts from the now-defunct MathB-based website.  A member of the
  Archive Team reached out to me over IRC and we worked together for
  about a week to get everything successfully archived.
</p>
<!-- Textbooks -->
<p class="question" id="favourite-mathematics-textbooks">
  What're your favorite math textbooks?
</p>
<p>
  I have several favourite mathematics books, but let me share three I
  remember especially fondly.
</p>
<p>
  The first is <em>Advanced Engineering Mathematics</em> by Erwin
  Kreyszig.  I don't often see this book recommended online, but for
  me it played a major role in broadening my horizons.  I think I
  studied the 8th edition back in the early 2000s.  It is a hefty book
  with over a thousand pages and I remember reading it cover to cover,
  solving every exercise problem along the way.  It gave me a solid
  foundation in routine areas like differential equations, linear
  algebra, vector calculus and complex analysis.  It also introduced
  me to Fourier transforms and Laplace transforms, which I found
  fascinating.
</p>
<p>
  Of course, the Fourier transform has a wide range of applications in
  signal processing, communications, spectroscopy and more.  But I
  want to focus on the fun and playful part.  In the early 2000s, I
  was also learning to play the piano as a hobby.  I used to record my
  amateur music compositions with
  <a href="https://github.com/audacity/audacity">Audacity</a> by
  connecting my digital piano to my laptop with a line-in cable.  It
  was great fun to plot the spectrum of my music on Audacity, apply
  high-pass and low-pass filters and observe how the Fourier transform
  of the audio changed and then hear the effect on the music.  That
  kind of hands-on tinkering made Fourier analysis intuitive for me
  and I highly recommend it to anyone who enjoys both music and
  mathematics.
</p>
<p>
  The second book is <em>Introduction to Analytic Number Theory</em>
  by Tom M.  Apostol.  As a child I was intrigued by the prime number
  theorem but lacked the mathematical maturity to understand its
  proof.  Years later, as an adult, I finally taught myself the proof
  from Apostol's book.  It was a fantastic journey that began with
  simple concepts like the Möbius function and Dirichlet products and
  ended with quite clever contour integrals that proved the theorem.
  The complex analysis I had learnt from Kreyszig turned out to be
  crucial for understanding those integrals.  Along the way I gained a
  deeper understanding of the Riemann zeta function \( \zeta(s).  \)
  The book discusses zero-free regions where \( \zeta(s) \) does not
  vanish, which I found especially fascinating.  Results like \(
  \zeta(-1) = -1/12, \) which once seemed mysterious, became obvious
  after studying this book.
</p>
<p>
  The third is <em>Galois Theory</em> by Ian Stewart.  It introduced
  me to field extensions, field homomorphisms and solubility by
  radicals.  I had long known that not all quintic equations are
  soluble by radicals, but I didn't know why.  Stewart's book taught
  me exactly why.  In particular, it demonstrated that the polynomial
  \( t^5 - 6t + 3 \) over the field of rational numbers is not soluble
  by radicals.  This particular result, although fascinating, is just
  a small part of a much larger body of work, which is even more
  remarkable.  To arrive at this result, the book takes us through a
  wonderful journey that includes the theory of polynomial rings,
  algebraic and transcendental field extensions, impossibility proofs
  for ruler-and-compass constructions, the Galois correspondence and
  much more.
</p>
<p>
  One of the most rewarding aspects of reading books like these is how
  they open doors to new knowledge, including things I didn't even
  know that I didn't know.
</p>
<!-- Mathematics and computing -->
<p class="question" id="mathematics-and-computing">
  How does the newer math jell with or inform past or present
  computing, compared to much older stuff?
</p>
<p>
  I don't always think explicitly about how mathematics informs
  computing, past or present.  Often the textbooks I pick feel very
  challenging to me, so much so that all my energy goes into simply
  mastering the material.  It is arduous but enjoyable.  I do it
  purely for the fun of learning without worrying about applications.
</p>
<p>
  Of course, a good portion of pure mathematics probably has no
  real-world applications.  As G. H. Hardy famously wrote in <em>A
  Mathematician's Apology</em>:
</p>
<blockquote>
  <p>
    I have never done anything 'useful'.  No discovery of mine has
    made or is likely to make, directly or indirectly, for good or
    ill, the least difference to the amenity of the world.
  </p>
</blockquote>
<p>
  But there is no denying that some of it does find applications.
  Were Hardy alive today, he might be disappointed that number theory,
  his favourite field of 'useless' mathematics, is now a crucial part
  of modern cryptography.  Electronic commerce wouldn't likely exist
  without it.
</p>
<p>
  Similarly, it is amusing how something as abstract as abstract
  algebra finds very concrete applications in coding theory.  Concepts
  such as polynomial rings, finite fields and cosets of subspaces in
  vector spaces over finite fields play a crucial role in
  error-correcting codes, without which modern data transmission and
  storage would not be possible.
</p>
<p>
  On a more personal note, some simpler areas of mathematics have been
  directly useful in my own work.  While solving problems for
  businesses, information entropy, combinatorics and probability
  theory were crucial when I worked on gesture-based authentication
  about one and a half decades ago.
</p>
<p>
  Similarly, when I was developing Bloom filter-based indexing and
  querying for a network events database, again, probability theory
  was crucial in determining the parameters of the Bloom filters (such
  as the number of hash functions, bits per filter and elements per
  filter) to ensure that the false positive rate remained below a
  certain threshold.  Subsequent testing with randomly sampled network
  events confirmed that the observed false positive rate matched the
  theoretical estimate quite well.  It was very satisfying to see
  probability theory and the real world agreeing so closely.
</p>
<p>
  Beyond these specific examples, studying mathematics also influences
  the way I think about problems.  Embarking on journeys like analytic
  number theory or Galois theory is humbling.  There are times when I
  struggle to understand a small paragraph of the book and it takes me
  several hours (or even days) to work out the arguments in detail
  with pen and paper (lots of it) before I really grok them.  That
  experience of grappling with dense reasoning teaches humility and
  also makes me sceptical of complex, hand-wavy logic in day-to-day
  programming.
</p>
<p>
  Several times I have seen code that bundles too many decisions into
  one block of logic, where it is not obvious whether it would behave
  correctly in all circumstances.  Explanations may sometimes be
  offered about why it works for reasonable inputs, but the reasoning
  is often not watertight.  The experience of working through
  mathematical proofs, writing my own, making mistakes and then
  correcting them has taught me that if the reasoning for correctness
  is not clear and rigorous, something could be wrong.  In my
  experience, once such code sees real-world usage, a bug is nearly
  always found.
</p>
<p>
  That's why I usually insist either on simplifying the logic or on
  demonstrating correctness in a clear, rigorous way.  Sometimes this
  means doing a case-by-case analysis for different types of inputs or
  conditions and showing that the code behaves correctly in each case.
  There is also a bit of an art to reducing what seem like numerous or
  even infinitely many cases to a small, manageable set of cases by
  spotting structure, such as symmetries, invariants or natural
  partitions of the input space.  Alternatively, one can look for a
  simpler argument that covers all cases.  These are techniques we
  employ routinely in mathematics and I think that kind of thinking
  and reasoning is quite valuable in software development too.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/my-lobsters-interview.html">Read on website</a> |
  <a href="https://susam.net/tag/programming.html">#programming</a> |
  <a href="https://susam.net/tag/technology.html">#technology</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Prime Number Grid Explorer</title>
<link>https://susam.net/primegrid.html</link>
<guid isPermaLink="false">pghtm</guid>
<pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  A simple single-page HTML application to explore the distribution of
  prime numbers in a grid.  Choose a starting number along with the
  number of rows and columns and the page generates the corresponding
  grid.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/primegrid.html">Read on website</a> |
  <a href="https://susam.net/tag/web.html">#web</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/technology.html">#technology</a>
</p>
]]>
</description>
</item>
<item>
<title>Miller-Rabin Speed Test</title>
<link>https://susam.net/code/web/miller-rabin-speed-test.html</link>
<guid isPermaLink="false">mrpst</guid>
<pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  A demo page that implements the Miller-Rabin primality test to
  accurately detect primes for all numbers less than
  318665857834031151167461 and compare its speed against a simple
  division based primality test algorithm.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/code/web/miller-rabin-speed-test.html">Read on website</a> |
  <a href="https://susam.net/tag/web.html">#web</a> |
  <a href="https://susam.net/tag/programming.html">#programming</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/technology.html">#technology</a>
</p>
]]>
</description>
</item>
<item>
<title>Mutually Attacking Knights</title>
<link>https://susam.net/mutually-attacking-knights.html</link>
<guid isPermaLink="false">makcf</guid>
<pubDate>Mon, 11 Aug 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  How many different ways can we place two identical knights on an \(
  n \times n \) chessboard so that they attack each other?  Can we
  find a closed-form expression that gives this number?  This is the
  problem we explore in this article.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#counting-placements-as-the-board-grows">Counting Placements as the Board Grows</a>
    <ul>
      <li><a href="#type-a-squares">Type A Squares</a></li>
      <li><a href="#type-b-squares">Type B Squares</a></li>
      <li><a href="#type-c-squares">Type C Squares</a></li>
      <li><a href="#type-d-squares">Type D Squares</a></li>
      <li><a href="#closed-form-expression-1">Closed Form Expression</a></li>
    </ul>
  </li>
  <li><a href="#counting-placements-for-each-square">Counting Placements for Each Square</a>
    <ul>
      <li><a href="#attacking-degrees-of-squares">Attacking Degrees of Squares</a></li>
      <li><a href="#from-attacking-degrees-to-counting-placements">From Attacking Degrees to Counting Placements</a></li>
      <li><a href="#closed-form-expression-2">Closed Form Expression</a></li>
    </ul>
  </li>
  <li><a href="#counting-placements-from-minimal-attack-sections">Counting Placements From Minimal Attack Sections</a>
    <ul>
      <li><a href="#minimal-attack-sections">Minimal Attack Sections</a></li>
      <li><a href="#closed-form-expression-3">Closed Form Expression</a></li>
    </ul>
  </li>
  <li><a href="#reference">References</a></li>
</ul>
<h2 id="introduction">Introduction<a href="#introduction"></a></h2>
<p>
  A knight moves two squares in one direction, then one square
  perpendicular to it, forming an L-shaped path.  If a piece occupies
  the destination square, the knight captures it.  If two knights are
  placed such that each can capture the other in a single move, then
  we say the knights attack each other.  We want to determine the
  number of ways to place two identical knights on an \( n \times n \)
  chessboard so that they attack each other.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td class="black knight"></td>
      <td></td>
    </tr>
  </table>
  <figcaption>
    Two knights attacking each other
  </figcaption>
</figure>
<p>
  The above illustration shows just one of several ways two knights
  can attack each other on a \( 3 \times 3 \) board.  There are, in
  fact, a total of eight such placements, shown below.
</p>
<figure style="text-align: center">
  <!-- 1 -->
  <table class="chess odd inline">
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td class="black knight"></td>
      <td></td>
    </tr>
  </table>
  <!-- 2 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td class="black knight"></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <!-- 3 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td class="black knight"></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
  </table>
  <!-- 4 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td class="black knight"></td>
      <td></td>
    </tr>
  </table>
  <!-- 5 -->
  <table class="chess odd inline">
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <!-- 6 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <!-- 7 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
  </table>
  <!-- 8 -->
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <figcaption>
    All \( 8 \) ways two identical knights can attack each other on a
    \( 3 \times 3 \) board.
  </figcaption>
</figure>
<p>
  Let \( f(n) \) denote the number of ways we can place two identical
  knights on an \( n \times n \) chessboard such that they attack each
  other, where \( n \ge 1.  \)
</p>
<p>
  A \( 1 \times 1 \) board has room for only one knight, so we define
  \( f(1) = 0.  \)  On a \( 2 \times 2 \) board, a knight cannot move
  two squares in any direction and therefore cannot attack.
  Therefore, \( f(2) = 0.  \)  To summarise,

  \[
    f(1) = f(2) = 0.
  \]

  From the illustration above, we see that \( f(3) = 8.  \)  We want to
  find a closed-form expression for \( f(n).  \)
</p>
<p>
  We will analyse this problem from various perspectives.  We begin
  with a couple of needlessly complicated approaches, followed by a
  simple and elegant solution.  While I personally enjoy these
  long-winded explorations, if you prefer a more direct solution,
  please skip ahead
  to <a href="#counting-placements-from-minimal-attack-sections">Counting
  Placements From Minimal Attack Sections</a>.
</p>
<p>
  Before we proceed, let us introduce the term <em>mutually attacking
  knight placement</em> to mean a placement of two knights on the
  chessboard such that they attack each other.  Unless stated
  otherwise, the two knights are identical.  This term will serve as a
  convenient shorthand for referring to such placements.
</p>
<h2 id="counting-placements-as-the-board-grows">Counting Placements as the Board Grows<a href="#counting-placements-as-the-board-grows"></a></h2>
<p>
  We now turn to the needlessly complicated solution promised in the
  previous section.  We analyse the <em>new</em> mutually attacking
  knight placements introduced when an existing board is enlarged by
  adding a row and a column.
</p>
<p>
  Let us define

  \[
    \Delta f(n) = f(n) - f(n - 1)
  \]

  for \( n \ge 2, \) so that \( \Delta f(n) \) denotes the new
  mutually attacking knight placements introduced when an \( (n - 1)
  \times (n - 1) \) board is expanded to size \( n \times n \) by
  adding one row and one column.
</p>
<p>
  For brevity, we will avoid restating the process of enlarging an \(
  (n - 1) \times (n - 1) \) board to an \( n \times n \) board by
  adding one row and one column whenever we refer to new placements.
  Instead, we use the term <em>new placements</em> on an
  \( n \times n \) board to refer to \( \Delta f(n).  \)  It is to be
  understood that these new placements are the mutually attacking
  knight placements introduced by enlarging the board from size \( (n
  - 1) \times (n - 1) \) to \( n \times n.  \)
</p>
<p>
  Without loss of generality, suppose the new row and column are added
  to the bottom and right respectively.  We already know that

  \begin{align*}
    \Delta f(2) &amp; = f(2) - f(1) = 0 - 0 = 0, \\
    \Delta f(3) &amp; = f(3) - f(2) = 8 - 0 = 8.  \\
  \end{align*}

  We will now find \( \Delta f(n) \) for \( n \ge 4.  \)  To do this,
  we first categorise the newly added squares due to board expansion,
  into four types, as illustrated below.
</p>
<figure>
  <table class="chess">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">A</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">B</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">C</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">C</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">D</td>
    </tr>
    <tr>
      <td class="em">A</td>
      <td class="em">B</td>
      <td class="em">C</td>
      <td class="em">C</td>
      <td class="em">D</td>
      <td class="em">A</td>
    </tr>
  </table>
  <figcaption>
    New squares, labelled by type, as the board size increases from \(
    5 \times 5 \) to \( 6 \times 6 \)
  </figcaption>
</figure>
<p>
  Here is a brief description of each square type:
</p>
<ul>
  <li>
    Type A squares are the three new corner squares.
  </li>
  <li>
    Type B squares are the two new squares adjacent to type A squares
    at the top and left edges.
  </li>
  <li>
    Type C squares are the new squares that are <em>not</em> adjacent
    to any type A square.  If the new board has dimensions \( n \times
    n, \) where \( n \ge 4, \) then there are exactly \( 2n - 8 \)
    squares of type C.
  </li>
  <li>
    Type D squares are the two new squares adjacent to the
    bottom-right type A square.
  </li>
</ul>
<p>
  We now calculate how many new mutually attacking knight placements
  are introduced by these additional squares as the board expands.  We
  proceed with a case-by-case analysis for each square type.
</p>
<h3 id="type-a-squares">Type A Squares<a href="#type-a-squares"></a></h3>
<p>
  There are three squares of type A.  If we place one knight on a type
  A square, there are two positions for the second knight such that
  the two knights attack each other.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td class="black knight em"></td>
      <td class="em"></td>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
    </tr>
  </table>
  <figcaption>
    Knights on type A squares, with squares attacked by the top knight
    marked with crosses
  </figcaption>
</figure>
<p>
  Since there are three such squares, we get a total of \( 3 \times 2
  = 6 \) new mutually attacking knight placements.
</p>
<h3 id="type-b-squares">Type B Squares<a href="#type-b-squares"></a></h3>
<p>
  There are two squares of type B.  If we place one knight on a type B
  square, there are three positions for the second knight such that
  the two knights attack each other.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td class="em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
      <td class="em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Knights on type B squares, with squares attacked by the top knight
    marked with crosses
  </figcaption>
</figure>
<p>
  Since there are two such squares, we get a total of \( 2 \times 3 =
  6 \) new mutually attacking knight placements.
</p>
<h3 id="type-c-squares">Type C Squares<a href="#type-c-squares"></a></h3>
<p>
  The number of type C squares depends on the board size.  When we
  increase the size of a board from \( (n - 1) \times (n - 1) \) to
  \(n \times n, \) where \( n \ge 4, \) we add \( n^2 - (n - 1)^2 = 2n
  - 1 \) new squares.  Among these, \( 3 \) are of type A, \( 2 \) are
  of type B and \( 2 \) are of type D.  That gives us a total of \(
  7 \) squares of type A, B or D.  The remaining \( 2n - 1 - 7 = 2n -
  8 \) squares are therefore of type C.  Note that when the board size
  increases from \( 3 \times 3 \) to \( 4 \times 4, \) there are \( 2
  \times 4 - 8 = 0 \) squares of type C.
</p>
<figure>
  <table class="chess">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">A</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">B</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">D</td>
    </tr>
    <tr>
      <td class="em">A</td>
      <td class="em">B</td>
      <td class="em">D</td>
      <td class="em">A</td>
    </tr>
  </table>
  <figcaption>
    A \( 4 \times 4 \) board has no type C squares.
  </figcaption>
</figure>
<p>
  However, for a board of size \( 5 \times 5 \) or greater, there is a
  positive number of type C squares since \( 2n - 8 \gt 0 \) if and
  only if \( n \gt 4.  \)
</p>
<figure>
  <table class="chess">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">A</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">B</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">C</td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em">D</td>
    </tr>
    <tr>
      <td class="em">A</td>
      <td class="em">B</td>
      <td class="em">C</td>
      <td class="em">D</td>
      <td class="em">A</td>
    </tr>
  </table>
  <figcaption>
    A \( 5 \times 5 \) board has one type C square.
  </figcaption>
</figure>
<p>
  If we place one knight on a type C square, there are four positions
  for the second knight such that the two knights attack each other.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em">&cross;</td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Knights on type C squares, with squares attacked by the top knight
    marked with crosses
  </figcaption>
</figure>
<p>
  Since there are \( 2n - 8 \) such squares, we get a total of \( 4(2n
  - 8) = 8(n - 4) \) new mutually attacking knight placements.
</p>
<h3 id="type-d-squares">Type D Squares<a href="#type-d-squares"></a></h3>
<p>
  There are two squares of type D.  As with type B squares, placing
  one knight on a type D square yields three positions for the second
  knight such that the two knights attack each other.  This gives \( 2
  \times 3 = 6 \) <em>potentially</em> new placements.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="em">&cross;</td>
      <td class="black knight em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Knights on type D squares, with squares attacked by the top knight
    marked with crosses
  </figcaption>
</figure>
<p>
  However, unlike type B squares, not all of these placements are
  <em>new</em>.  The two placements where one knight is on the right
  edge and the other on the bottom edge were already counted in a
  previous subsection.
</p>
<figure>
  <table class="chess inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <table class="chess inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Placements already counted while analysing placements involving a
    knight on a type B square of the \( 4 \times 4 \) board
  </figcaption>
</figure>
<p>
  For example, when we increase the board size from \( 3 \times 3 \)
  to \( 4 \times 4, \) both the placements described in the previous
  paragraph appear while analysing the placements with a knight on a
  type B square.  More generally, for any board of size
  \( n \times n \) with \( n \ge 5, \) these placements occur while
  analysing the placements with a knight on a type C square.
  Therefore the total number of new mutually attacking knight
  placements is \( 2 \times 3 - 2 = 4.  \)
</p>
<figure>
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Placements already counted while analysing placements involving a
    knight on a type C square of an \( n \times n \) board, where \( n
    \ge 5 \)
  </figcaption>
</figure>
<p>
  Another way to describe this result is to observe that when one
  knight is placed on a type D square, only two positions for the
  second knight yield <em>new</em> mutually attacking knight
  placements.  Since there are two type \( D \) squares, we get a
  total of \( 2 \times 2 = 4 \) new mutually attacking knight
  placements.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td class="em"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
      <td class="black knight em"></td>
    </tr>
    <tr>
      <td class="em"></td>
      <td class="em"></td>
      <td class="em"></td>
      <td class="black knight em"></td>
      <td class="em"></td>
    </tr>
  </table>
  <figcaption>
    Knights on type D squares, with squares attacked by the top knight
    that yield <em>new</em> mutually attacking knight placements
    marked with crosses
  </figcaption>
</figure>
<h3 id="closed-form-expression-1">Closed Form Expression<a href="#closed-form-expression-1"></a></h3>
<p>
  If we add the number of new mutually attacking knight placements
  found in each of the cases above, we get

  \[
    \Delta f(n) = 6 + 6 + 8(n - 4) + 4 = 8(n - 2)
  \]

  new mutually attacking knight placements as the board size increases
  from \( (n - 1) \times (n - 1) \) to \( n \times n, \) where \( n
  \ge 4.  \)  We already know that \( \Delta f(2) = 0 \) and \( \Delta
  f(3) = 8.  \)  Surprisingly, the above formula produces the correct
  values for those cases as well.  Therefore, we can generalise this
  result as

  \[
    \Delta f(n) = 8(n - 2)
  \]

  for all \( n \ge 2.  \)  We can now calculate \( f(n) \) for \( n \ge
  1 \) as follows:

  \begin{align*}
    f(n)
    &amp; = \sum_{k = 1}^n f(k) - \sum_{k = 1}^{n - 1} f(k) \\
    &amp; = \sum_{k = 1}^n f(k) - \sum_{k = 2}^n f(k - 1) \\
    &amp; = f(1) + \sum_{k = 2}^n (f(k) - f(k - 1)) \\
    &amp; = f(1) + \sum_{k = 2}^n \Delta f(k) \\
    &amp; = 0 + \sum_{k = 2}^n 8(k - 2) \\
    &amp; = 8 \sum_{k = 0}^{n - 2} k \\
    &amp; = \frac{8(n - 2)(n - 1)}{2} \\
    &amp; = 4(n - 1)(n - 2).
  \end{align*}

  To summarise, we now have a closed form expression for \( f(n).  \)
  For all \( n \ge 1, \) we have

  \[
    f(n) = 4(n - 1)(n - 2).
  \]
</p>
<h2 id="counting-placements-for-each-square">Counting Placements for Each Square<a href="#counting-placements-for-each-square"></a></h2>
<p>
  The previous section took a long-winded path to arrive at a closed
  form expression for \( f(n).  \)  In this section, we will reach the
  same result that is still a bit drawn out, but not quite as much as
  before.
</p>
<p>
  This time, instead of looking only at the new squares created when
  the board grows, we consider <em>every</em> square on the board.  To
  make the counting easier, we no longer treat the knights as
  identical.  We first work with two distinct knights, count the
  mutually attacking knight placements and then divide the total by \(
  2 \) to get the result for identical knights.
</p>
<h3 id="attacking-degrees-of-squares">Attacking Degrees of Squares<a href="#attacking-degrees-of-squares"></a></h3>
<p>
  Here, we introduce the term <em>attacking degree of a square</em> to
  mean the number of squares a knight can move to from that square in
  a single move.  In other words, the attacking degree of a square is
  the number of squares that would be attacked if a knight were placed
  on it.  For example, the corner squares have an attacking degree of
  \( 2.  \)
</p>
<figure>
  <table class="chess">
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td>&cross;</td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td>&cross;</td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <figcaption>
    The attacking degree of a corner square is \( 2 \) since a knight
    can attack two squares from it
  </figcaption>
</figure>
<p>
  Let us now label each square with its attacking degree.  A \( 1
  \times 1 \) board has only one square of attacking degree \( 0 \)
  since a knight placed on it has nothing to attack.  Similarly, each
  square of a \( 2 \times 2 \) board has attacking degree \( 0 \) too.
</p>
<figure>
  <table class="chess odd inline">
    <tr>
      <td>0</td>
    </tr>
  </table>
  <table class="chess inline">
    <tr>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <td>0</td>
      <td>0</td>
    </tr>
  </table>
  <figcaption>
    Attacking degrees of all squares are zero on \( 1 \times 1 \) and
    \( 2 \times 2 \) boards
  </figcaption>
</figure>
<p>
  On a \( 3 \times 3 \) board, all squares have attacking degree
  \( 2 \) except the centre square, whose attacking degree is \( 0.  \)
  In other words, placing a knight on any square other than the middle
  one gives exactly two possible positions for the other knight so
  that they attack each other.
</p>
<figure>
  <table class="chess odd">
    <tr>
      <td>2</td>
      <td>2</td>
      <td>2</td>
    </tr>
    <tr>
      <td>2</td>
      <td>0</td>
      <td>2</td>
    </tr>
    <tr>
      <td>2</td>
      <td>2</td>
      <td>2</td>
    </tr>
  </table>
  <figcaption>
    Attacking degrees of all squares on a \( 3 \times 3 \) board
  </figcaption>
</figure>
<p>
  With eight such squares, we get \( 8 \times 2 = 16 \) mutually
  attacking knight placements when the two knights are distinct.  If
  we divide this number by \( 2, \) we get \( 8 \) which is indeed the
  number of mutually attacking knight placements on a \( 3 \times 3 \)
  board when the two knights are identical.  This matches the earlier
  result \( f(3) = 8.  \)
</p>
<h3 id="from-attacking-degrees-to-counting-placements">From Attacking Degrees to Counting Placements<a href="#from-attacking-degrees-to-counting-placements"></a></h3>
<p>
  Let \( g(n) \) be the number of mutually attacking knight placements
  on an \( n \times n \) board when the knights are distinct.  Then \(
  g(n) \) is simply the sum of the attacking degrees of all squares on
  the board.  As before, let \( f(n) \) denote the number of mutually
  attacking knight placements on an \( n \times n \) board when the
  two knights are identical.  We will now show that
  \( f(n) = g(n)/2.  \)
</p>
<p>
  Label all squares of the \( n \times n \) board as \( S_1, S_2,
  \dots, S_{n^2} \) in any fixed order.  Label the two distinct
  knights as \( N_1 \) and \( N_2.  \)  We represent each mutually
  attacking knight placement as an ordered pair \( (S_i, S_j) \) if \(
  N_1 \) is on \( S_i \) and \( N_2 \) is on \( S_j, \) with the two
  knights attacking each other.  Here \( 1 \le i, j \le n^2 \) and \(
  i \ne j.  \)
</p>
<p>
  Let \( M \) be the set of all mutually attacking knight placements
  for distinct knights on an \( n \times n \) board.  Then

  \[
    g(n) = \lvert M \rvert.
  \]

  If \( (S_i, S_j) \) is a mutually attacking knight placement of the
  distinct knights \( N_1 \) and \( N_2 \) for some \( i \) and
  \( j \) with \( 1 \le i, j \le n^2 \) and \( i \ne j, \) then \(
  (S_j, S_i) \) is also a mutually attacking knight placement, since
  swapping the positions of the two mutually attacking knights still
  yields a valid mutually attacking placement.  Therefore

  \[
    (S_i, S_j) \in M \iff (S_j, S_i) \in M.
  \]

  Each ordered placement \( (S_i, S_j) \) in \( M \) is thus paired
  with the ordered placement \( (S_j, S_i).  \)  When the knights are
  identical, the two arrangements are indistinguishable and count as
  one placement.  Hence, the number of mutually attacking placements
  for identical knights is exactly half of the number for distinct
  knights, i.e.

  \[
    f(n) = \frac{g(n)}{2}.
  \]

  The next subsection focuses on calculating \( g(n), \) from which \(
  f(n) \) follows immediately by the above formula.
</p>
<h3 id="closed-form-expression-2">Closed Form Expression<a href="#closed-form-expression-2"></a></h3>
<p>
  As noted in the previous section, the number of mutually attacking
  knight placements for two distinct knights on an \( n \times n \)
  board is simply the sum of attacking degrees of all squares on the
  board.  If we label each square as discussed in the previous section
  and use the notation \( \deg(S_i) \) for the attacking degree of the
  square labelled \( S_i, \) where \( 1 \le i \le n^2, \) then

  \[
    g(n) = \sum_{i=1}^{n^2} \deg(S_i).
  \]

  Recall that the attacking degree of a square is the number of
  squares a knight could attack if it were placed there.  Earlier, we
  saw that on a \( 3 \times 3 \) board, all squares except the centre
  one have attacking degree \( 2, \) which gives \( g(3) = 8 \times 2
  = 16 \) and \( f(3) = g(3)/2 = 8.  \)  Let us now write down the
  attacking degrees of all squares on a \( 4 \times 4 \) board.
</p>
<figure>
  <table class="chess">
    <tr>
      <td>2</td>
      <td>3</td>
      <td>3</td>
      <td>2</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4</td>
      <td>4</td>
      <td>3</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4</td>
      <td>4</td>
      <td>3</td>
    </tr>
    <tr>
      <td>2</td>
      <td>3</td>
      <td>3</td>
      <td>2</td>
    </tr>
  </table>
  <figcaption>
    Attacking degrees of all squares on a \( 4 \times 4 \) board
  </figcaption>
</figure>
<p>
  From the above illustration we get

  \begin{align*}
    g(4) &amp; = 4 \times 2 + 8 \times 3 + 4 \times 4 = 48, \\
    f(4) &amp; = g(4)/2 = 24.
  \end{align*}

  A more general pattern emerges if we consider a larger board, such
  as a \( 6 \times 6 \) board.
</p>
<figure>
  <table class="chess">
    <tr>
      <td>2</td>
      <td>3</td>
      <td>4</td>
      <td>4</td>
      <td>3</td>
      <td>2</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4</td>
      <td>6</td>
      <td>6</td>
      <td>4</td>
      <td>3</td>
    </tr>
    <tr>
      <td>4</td>
      <td>6</td>
      <td>8</td>
      <td>8</td>
      <td>6</td>
      <td>4</td>
    </tr>
    <tr>
      <td>4</td>
      <td>6</td>
      <td>8</td>
      <td>8</td>
      <td>6</td>
      <td>4</td>
    </tr>
    <tr>
      <td>3</td>
      <td>4</td>
      <td>6</td>
      <td>6</td>
      <td>4</td>
      <td>3</td>
    </tr>
    <tr>
      <td>2</td>
      <td>3</td>
      <td>4</td>
      <td>4</td>
      <td>3</td>
      <td>2</td>
    </tr>
  </table>
  <figcaption>
    Attacking degrees of all squares on a \( 6 \times 6 \) board
  </figcaption>
</figure>
<p>
  From this illustration, we get

  \begin{align*}
    g(6) &amp; = 4 \times 2 + 8 \times 3 + 12 \times 4 + 8 \times 6 + 4 \times 8 = 160.  \\
    f(6) &amp; = g(6)/2 = 80.
  \end{align*}

  Let us find a general formula now for \( n \ge 4.  \)  We introduce
  one more notation.  Let \( D_k(n) \) denote the sum of the attacking
  degrees of all squares of attacking degree \( k \) on an \( n \times
  n \) board, i.e.

  \[
    D_k(n) = \sum_{\mathclap{\deg(S_i) = k}} \deg(S_i).
  \]

  Since the only attacking degrees the squares can have are \( 2, 3,
  4, 6 \) and \( 8, \) the sum of the attacking degrees of all squares
  can be written as

  \[
    g(n) = D_2(n) + D_3(n) + D_4(n) + D_6(n) + D_8(n).
  \]

  There are exactly four squares of attacking degree \( 2.  \)  These
  are the corner ones.  Therefore,

  \[
    D_2(n) = 4 \times 2 = 8.
  \]

  The eight squares adjacent to the corner squares have attacking
  degree \( 3.  \)  Therefore,

  \[
    D_3(n) = 8 \times 3 = 24.
  \]

  Let us define an <em>inner corner square</em> as one that shares a
  corner with a corner square but not an edge with it.  There are four
  inner corner squares and each has attacking degree \( 4.  \)
  Further, each row and column on the outer edge contains \( n - 4 \)
  additional squares with attacking degree \( 4.  \)  Therefore,

  \[
    D_4(n) = (4 + 4(n - 4))(4) = 16(n - 3).
  \]

  Consider a row or column that contains two inner corner squares of
  attacking degree \( 4.  \)  All \( n - 4 \) squares between the inner
  corner squares have attacking degree \( 6.  \)  There are two such
  rows and two such columns.  Therefore,

  \[
    D_6(n) = 4(n - 4)(6) = 24(n - 4).
  \]

  We have counted the attacking degrees of all squares in the first
  two columns and rows as well as the last two columns and rows.  We
  are left with \( (n - 4)^2 \) squares in the middle and they all
  have attacking degree \( 8.  \)  Therefore,

  \[
    D_8(n) = 8(n - 4)^2.
  \]

  Therefore,

  \begin{align*}
    g(n)
    &amp; = D_2(n) + D_3(n) + D_4(n) + D_6(n) + D_8(n) \\
    &amp; = 8 + 24 + 16(n - 3) + 24(n - 4) + 8(n - 4)^2 \\
    &amp; = 8(n - 1)(n - 2).
  \end{align*}

  Even though we assumed \( n \ge 4 \) while obtaining the above
  formula, remarkably, it gives us the correct values for \( n = 1,
  2 \) and \( 3.  \)  The number of mutually attacking knight
  placements for distinct knights on an \( n \times n \) board is \(
  0 \) if \( n = 1 \) or \( 2.  \)  It is \( 16 \) if \( n = 3.  \)
  Indeed the above formula gives us

  \[
    g(1) = g(2) = 0, \quad g(3) = 16.
  \]

  Therefore, we can now generalise the above result as

  \[
    g(n) = 8(n - 1)(n - 2)
  \]

  for all \( n \ge 1.  \)  Therefore, for all \( n \ge 1, \)

  \[
    f(n) = \frac{g(n)}{2} = 4(n - 1)(n - 2).
  \]
</p>
<h2 id="counting-placements-from-minimal-attack-sections">Counting Placements From Minimal Attack Sections<a href="#counting-placements-from-minimal-attack-sections"></a></h2>
<p>
  Finally, in this section, we take a look at a simple and elegant
  solution that arrives at the closed-form solution in a more direct
  manner.  The analysis begins by looking at the smallest section of
  the board where two knights can attack each other.
</p>
<h3 id="minimal-attack-sections">Minimal Attack Sections<a href="#minimal-attack-sections"></a></h3>
<p>
  Consider a \( 2 \times 3 \) section of a board of size
  \( 3 \times 3 \) or larger.  Such a section has exactly two mutually
  attacking knight placements.
</p>
<figure>
  <table class="chess inline">
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
  </table>
  <table class="chess inline">
    <tr>
      <td></td>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
      <td></td>
    </tr>
  </table>
  <figcaption>
    Two mutually attacking knight placements on a \( 2 \times 3 \)
    section of a board
  </figcaption>
</figure>
<p>
  Similarly, a \( 3 \times 2 \) section of a board also has exactly
  two mutually attacking knight placements.
</p>
<figure>
  <table class="chess odd inline">
    <tr>
      <td class="black knight"></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td></td>
      <td class="black knight"></td>
    </tr>
  </table>
  <table class="chess odd inline">
    <tr>
      <td></td>
      <td class="black knight"></td>
    </tr>
    <tr>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td class="black knight"></td>
      <td></td>
    </tr>
  </table>
  <figcaption>
    Two mutually attacking knight placements on a \( 3 \times 2 \)
    section of a board
  </figcaption>
</figure>
<p>
  We call these \( 2 \times 3 \) and \( 3 \times 2 \) sections
  the <em>minimal attack sections</em> of a board, since no smaller
  section can contain a mutually attacking knight placement.
</p>
<p>
  Two distinct \( 2 \times 3 \) sections can share at most a \( 1
  \times 3 \) section, which is smaller than a minimal attack section.
  Consequently, no mutually attacking knight placement can be common
  to two distinct \( 2 \times 3 \) sections of a board.
</p>
<p>
  Similarly, two distinct \( 3 \times 2 \) sections can share at most
  a \( 3 \times 1 \) section, again too small to contain a minimal
  attack section.  Therefore, they share no mutually attacking knight
  placement.
</p>
<p>
  A \( 2 \times 3 \) section and a \( 3 \times 2 \) section can share
  at most a \( 2 \times 2 \) section, which is still smaller than a
  minimal attack section, so they share no mutually attacking knight
  placement either.
</p>
<p>
  To summarise, any two minimal attack sections of the board yield
  distinct pairs of mutually attacking knight placements.  The total
  number of such placements is therefore exactly twice the number of
  minimal attack sections on the board.
</p>
<h3 id="closed-form-expression-3">Closed Form Expression<a href="#closed-form-expression-3"></a></h3>
<p>
  In an \( n \times n \) board where \( n \ge 3, \) the left edge of a
  \( 2 \times 3 \) section can be placed in any one of the first \( n
  - 2 \) columns of the board.  Similarly, the top edge of such a
  section can be placed in any one of the first \( n - 1 \) rows of
  the board.  Therefore, the total number of distinct \( 2 \times 3 \)
  sections on the board is \( (n - 2)(n - 1).  \)
</p>
<p>
  By similar reasoning, the number of distinct \( 3 \times 2 \)
  sections on an \( n \times n \) board, where \( n \ge 3, \) is also
  \( (n - 1)(n - 2).  \)
</p>
<p>
  Let \( h(n) \) be the total number of minimal attack sections we can
  find on an \( n \times n \) board where \( n \ge 1.  \)  From the
  discussion in the previous two paragraphs, we know that \( h(n) =
  2(n - 1)(n - 2) \) for \( n \ge 3.  \)  Further, this formula for \(
  h(n) \) works for \( n = 1 \) and \( n = 2 \) as well since \( h(1)
  = h(2) = 0 \) and indeed a \( 1 \times 1 \) board or a
  \( 2 \times 2 \) board is too small to contain any minimal attack
  sections.  Therefore, for all \( n \ge 1, \) we get

  \[
    h(n) = 2(n - 1)(n - 2).
  \]

  Since each minimal attack section yields two mutually attacking
  knight placements, the total number of mutually attacking knight
  placements on an \( n \times n \) board is

  \[
    f(n) = 2h(n) = 4(n - 1)(n - 2)
  \]

  for all \( n \ge 1.  \)
</p>
<h2 id="reference">References<a href="#reference"></a></h2>
<ul>
  <li>
    <a href="https://cses.fi/problemset/task/1072">Two Knights</a>
    from the CSES Problem Set
  </li>
  <li>
    <a href="https://mathworld.wolfram.com/KnightGraph.html">Knight Graph</a>
    by Eric W. Weisstein
  </li>
  <li>
    <a href="https://oeis.org/A033996">OEIS Entry A033996</a>
    by N. J. A. Sloane
  </li>
  <li>
    <a href="https://oeis.org/A172132">OEIS Entry A172132</a>
    by Vaclav Kotesovec
  </li>
</ul>
<!-- ### -->
<p>
  <a href="https://susam.net/mutually-attacking-knights.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/puzzle.html">#puzzle</a>
</p>
]]>
</description>
</item>
<item>
<title>Zigzag Number Spiral</title>
<link>https://susam.net/zigzag-number-spiral.html</link>
<guid isPermaLink="false">znscf</guid>
<pubDate>Sun, 27 Jul 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<div style="display: none">
  \[
    \gdef\lf{\hspace{-5mm}\leftarrow\hspace{-5mm}}
    \gdef\rt{\hspace{-5mm}\rightarrow\hspace{-5mm}}
    \gdef\up{\uparrow}
    \gdef\dn{\downarrow}
    \gdef\sp{}
    \gdef\cd{\cdots}
    \gdef\vd{\vdots}
    \gdef\dd{\ddots}
    \gdef\arraystretch{1.2}
    \gdef\hl{\small\blacktriangleright}
  \]
</div>
<p>
  Consider the following infinite grid of numbers, where the numbers
  are arranged in a spiral-like manner, but the spiral reverses
  direction each time it reaches the edge of the grid:

  \begin{array}{rcrcrcrcrl}
      1 &amp; \rt &amp;   2 &amp; \sp &amp;   9 &amp; \rt &amp;  10 &amp; \sp &amp;  25 &amp; \cd \\
    \sp &amp; \sp &amp; \dn &amp; \sp &amp; \up &amp; \sp &amp; \dn &amp; \sp &amp; \up &amp; \sp \\
      4 &amp; \lf &amp;   3 &amp; \sp &amp;   8 &amp; \sp &amp;  11 &amp; \sp &amp;  24 &amp; \cd \\
    \dn &amp; \sp &amp; \sp &amp; \sp &amp; \up &amp; \sp &amp; \dn &amp; \sp &amp; \up &amp; \sp \\
      5 &amp; \rt &amp;   6 &amp; \rt &amp;   7 &amp; \sp &amp;  12 &amp; \sp &amp;  23 &amp; \cd \\
    \sp &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \dn &amp; \sp &amp; \up &amp; \sp \\
     16 &amp; \lf &amp;  15 &amp; \lf &amp;  14 &amp; \lf &amp;  13 &amp; \sp &amp;  22 &amp; \cd \\
    \dn &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \sp &amp; \up &amp; \sp \\
     17 &amp; \rt &amp;  18 &amp; \rt &amp;  19 &amp; \rt &amp;  20 &amp; \rt &amp;  21 &amp; \cd \\
    \vd &amp; \sp &amp; \vd &amp; \sp &amp; \vd &amp; \sp &amp; \vd &amp; \sp &amp; \vd &amp; \dd
  \end{array}

  Can we find a closed-form expression that tells us the number at the
  \( m \)th row and \( n \)th column?
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#patterns-on-the-edges">Patterns on the Edges</a>
    <ul>
      <li><a href="#computing-edge-numbers">Computing Edge Numbers</a></li>
      <li><a href="#computing-all-grid-numbers-1">Computing All Grid Numbers</a></li>
      <li><a href="#closed-form-expression-1">Closed Form Expression</a></li>
    </ul>
  </li>
  <li><a href="#patterns-on-the-diagonal">Patterns on the Diagonal</a>
    <ul>
      <li><a href="#computing-diagonal-numbers">Computing Diagonal Numbers</a></li>
      <li><a href="#computing-all-grid-numbers-2">Computing All Grid Numbers</a></li>
      <li><a href="#closed-form-expression-2">Closed Form Expression</a></li>
    </ul>
  </li>
  <li><a href="#references">References</a></li>
</ul>
<h2 id="introduction">Introduction<a href="#introduction"></a></h2>
<p>
  Before we explore this problem further, let us rewrite the zigzag
  number spiral grid in a cleaner form, omitting the arrows:

  \begin{array}{rrrrrl}
      1 &amp;   2 &amp;   9  &amp;  10 &amp;  25 &amp; \cd \\
      4 &amp;   3 &amp;   8  &amp;  11 &amp;  24 &amp; \cd \\
      5 &amp;   6 &amp;   7  &amp;  12 &amp;  23 &amp; \cd \\
     16 &amp;  15 &amp;  14  &amp;  13 &amp;  22 &amp; \cd \\
     17 &amp;  18 &amp;  19  &amp;  20 &amp;  21 &amp; \cd \\
    \vd &amp; \vd &amp; \vd  &amp; \vd &amp; \vd &amp; \dd
  \end{array}

  Let \( f(m, n) \) denote the number at the \( m \)th row and
  \( n \)th column.  For example, \( f(1, 1) = 1 \) and \( f(2, 5) =
  24.  \)  We want to find a closed-form expression for \( f(m, n).  \)
</p>
<p>
  Let us first clarify what we mean by a <em>closed-form
  expression</em>.  There is no universal definition of a closed-form
  expression, but the term typically refers to a mathematical
  expression involving variables and constants, built using a finite
  combination of basic operations: addition, subtraction,
  multiplication, division, integer exponents, roots with integer
  index and functions such as exponentials, logarithms and
  trigonometric functions.
</p>
<p>
  In this article, however, we need only addition, subtraction,
  division, squares and square roots.  This may be a bit of a spoiler,
  but I must mention that the \( \max \) function appears in the
  closed-form expressions we are about to see.  If you are concerned
  about whether functions like \( \max \) and \( \min \) are permitted
  in such expressions, note that

  \begin{align*}
    \max(m, n) &amp; = \frac{m + n + \sqrt{(m - n)^2}}{2}, \\
    \min(m, n) &amp; = \frac{m + n - \sqrt{(m - n)^2}}{2}.
  \end{align*}

  So \( \max \) and \( \min \) are simply shorthand for expressions
  involving addition, subtraction, division, squares and square roots.
  In the discussion that follows, we will use only the \( \max \)
  function.
</p>
<h2 id="patterns-on-the-edges">Patterns on the Edges<a href="#patterns-on-the-edges"></a></h2>
<p>
  Let us begin by analysing the edge numbers.  Number the rows as \(
  1, 2, 3 \dots \) and the columns likewise.  Observe where the spiral
  touches the left edge and changes direction.  This happens only on
  even-numbered rows.  Similarly, each time the spiral touches the top
  edge and changes direction, it does so on odd-numbered columns.  In
  the following subsections, we take a closer look at this behaviour
  of the spiral.
</p>
<p>
  I should mention that this section takes a rather long path to
  arrive at the closed-form solution.  Personally, I enjoy such long
  tours.  If you prefer a more direct approach, feel free to skip
  ahead to
  <a href="#patterns-on-the-diagonal">Patterns on the Diagonal</a> for
  a shorter discussion that reaches the same result.
</p>
<h3 id="computing-edge-numbers">Computing Edge Numbers<a href="#computing-edge-numbers"></a></h3>
<p>
  Each time the spiral reaches the left edge of the grid, it does so
  at some \( m \)th row where \( m \) is even.  The \( m \times m \)
  subgrid formed by the first \( m \) rows and the first \( m \)
  columns contains \( m^2 \) consecutive numbers.  Since the numbers
  strictly increase as the spiral grows, the largest of these
  \( m^2 \) numbers must appear at the position where the spiral
  touches the left edge.  This is illustrated in the figure below.
</p>
<figure>
  \begin{array}{rrrr:rl}
     1     &amp;   2 &amp;   9  &amp;  10 &amp;  25 &amp; \cd \\
     4     &amp;   3 &amp;   8  &amp;  11 &amp;  24 &amp; \cd \\
     5     &amp;   6 &amp;   7  &amp;  12 &amp;  23 &amp; \cd \\
    \hl 16 &amp;  15 &amp;  14  &amp;  13 &amp;  22 &amp; \cd \\
    \hdashline
    17     &amp;  18 &amp;  19  &amp;  20 &amp;  21 &amp; \cd \\
    \vd    &amp; \vd &amp; \vd  &amp; \vd &amp; \vd &amp; \dd
  \end{array}
  <figcaption>
    The spiral touches the left edge on the \( 4 \)th row where the
    number is \( 4^2 \)
  </figcaption>
</figure>
<p>
  Whenever the spiral touches the left edge at the \( m \)th row
  (where \( m \) is even), the number in the first column of that row
  is \( m^2.  \)  Hence, we conclude that \( f(m, 1) = m^2 \) when \( m
  \) is even.  Immediately after touching the left edge, the spiral
  turns downwards into the first column of the next row.  Thus, in the
  next row, i.e. in the \( (m + 1) \)th row, we have \( f(m + 1, 1) =
  m^2 + 1, \) where \( m + 1 \) is odd.  This can be restated as \(
  f(m, 1) = (m - 1)^2 + 1 \) when \( m \) is odd.  Since \( f(1, 1) =
  1, \) we can summarise the two formulas we have found here as:

  \[
    f(m, 1) =
      \begin{cases}
        m^2           &amp; \text{if } m \equiv 0 \pmod{2}, \\
        (m - 1)^2 + 1 &amp; \text{if } m \equiv 1 \pmod{2}.
      \end{cases}
  \]
</p>
<p>
  We can perform a similar analysis for the numbers at the top edge
  and note that whenever the spiral touches the top edge at the
  \( n \)th column (where \( n \) is odd), the number in the first row
  of that column is \( n^2.  \)  This is illustrated below.
</p>
<figure>
  \begin{array}{rrr:rrl}
     1 &amp;   2 &amp; \hl 9 &amp;  10 &amp;  25 &amp; \cd \\
     4 &amp;   3 &amp;     8 &amp;  11 &amp;  24 &amp; \cd \\
     5 &amp;   6 &amp;     7 &amp;  12 &amp;  23 &amp; \cd \\
    \hdashline
    16 &amp;  15 &amp;    14 &amp;  13 &amp;  22 &amp; \cd \\
    17 &amp;  18 &amp;    19 &amp;  20 &amp;  21 &amp; \cd \\
    \vd &amp; \vd &amp;  \vd &amp; \vd &amp; \vd &amp; \dd
  \end{array}
  <figcaption>
    The spiral touches the top edge on the \( 3 \)rd column where the
    number is \( 3^2 \)
  </figcaption>
</figure>
<p>
  Immediately after touching the top edge, the spiral turns right into
  the next column.  These observations give us the following formula
  for the numbers at the top edge:

  \[
    f(1, n) =
      \begin{cases}
        n^2           &amp; \text{if } n \equiv 1 \pmod{2}, \\
        (n - 1)^2 + 1 &amp; \text{if } n \equiv 0 \pmod{2}.
      \end{cases}
  \]

  Next we will find a formula for any arbitrary number anywhere in the
  grid.
</p>
<h3 id="computing-all-grid-numbers-1">Computing All Grid Numbers<a href="#computing-all-grid-numbers-1"></a></h3>
<p>
 Since the spiral touches the left edge on even-numbered rows, then
 turns downwards into the next (odd-numbered) row and then starts
 moving right until the diagonal (where it changes direction again),
 the following two rules hold:
</p>
<ul>
  <li>
    On every odd-numbered row, as we go from left to right, the
    numbers increase until we reach the diagonal.
  </li>
  <li>
    On every even-numbered row, as we go from left to right, the
    numbers decrease until we reach the diagonal.
  </li>
</ul>
<p>
  Note that all the numbers we considered in the above two points lie
  on or below the diagonal (or equivalently, on or to the left of the
  diagonal).  Therefore, on an odd-numbered row, we can find the
  numbers on or below the diagonal using the formula \( f(m, n) = f(m,
  1) + (n - 1), \) where \( m \) is odd.  Similarly, on even-numbered
  rows, we can find the numbers on or below the diagonal using the
  formula \( f(m, n) = f(m, 1) - (n - 1), \) where \( m \) is even.
</p>
<p>
  By a similar analysis, the following rules hold when we consider the
  numbers in a column:
</p>
<ul>
  <li>
    On every even-numbered column, as we go from top to bottom, the
    numbers increase until we reach the diagonal.
  </li>
  <li>
    On every odd-numbered column, as we go from top to bottom, the
    numbers decrease until we reach the diagonal.
  </li>
</ul>
<p>
  Now the numbers on or above the diagonal can be found using the
  formula \( f(m, n) = f(1, n) - (m - 1) \) when \( n \) is odd and \(
  f(m, n) = f(1, n) + (m - 1), \) when \( n \) is even.
</p>
<p>
  Can we determine from the values of \( m \) and \( n \) if the
  number \( f(m, n) \) is above the diagonal or below it?  Yes, if \(
  m \le n, \) then \( f(m, n) \) lies on or above the diagonal.
  However, if \( m \ge n, \) then \( f(m, n) \) lies on or below the
  diagonal.
</p>
<p>
  We now have everything we need to write a general formula for
  finding the numbers anywhere in the grid.  Using the four formulas
  and the two inequalities obtained in this section, we get

  \[
    f(m, n) =
      \begin{cases}
        f(1, n) + (m - 1)
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        f(1, n) - (m - 1)
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        f(m, 1) - (n - 1)
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        f(m, 1) + (n - 1)
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  Using the equations for \( f(1, n) \) and \( f(m, 1) \) from the
  previous section, the above formulas can be rewritten as

  \[
    f(m, n) =
      \begin{cases}
        (n - 1)^2 + 1 + (m - 1)
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        n^2 - (m - 1)
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        m^2 - (n - 1)
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        (m - 1)^2 + 1 + (n - 1)
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  Simplifying the expressions on the right-hand side, we get

  \[
    f(m, n) =
      \begin{cases}
        (n - 1)^2 + m
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        n^2 - m + 1
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        m^2 - n + 1
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        (m - 1)^2 + n
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  This is pretty good.  We now have a piecewise formula that works for
  any position in the grid.  Let us now explore whether we can express
  it as a single closed-form expression.
</p>
<h3 id="closed-form-expression-1">Closed Form Expression<a href="#closed-form-expression-1"></a></h3>
<p>
  First, we will rewrite the piecewise formula from the previous
  section in the following form:

  \[
    f(m, n) =
      \begin{cases}
        (n^2 - n + 1) + (m - n)
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        (n^2 - n + 1) - (m - n)
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        (m^2 - m + 1) + (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        (m^2 - m + 1) - (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  This is the same formula, rewritten to reveal common patterns
  between the four expressions on the right-hand side.  In each
  expression, one variable plays the dominant role, occurring several
  times, while the other appears only once.  For example, in the first
  two expressions, \( n \) plays the dominant role whereas \( m \)
  occurs only once.  If we look closely, we realise that it is the
  variable that is greater than or equal to the other that plays the
  dominant role.  Therefore the first and third expressions may be
  written as

  \[
    \left( (\max(m, n))^2 - \max(m, n) + 1 \right) + (m - n).
  \]

  Similarly, the second and fourth expressions may be written as

  \[
    \left( (\max(m, n))^2 - \max(m, n) + 1 \right) - (m - n).
  \]

  We have made some progress towards a closed-form expression.  We
  have collapsed the four expressions in the piecewise formula to just
  two.  The only difference between them lies in the sign of the
  second term: it is positive when the dominant variable is even and
  negative when it is odd.  This observation allows us to unify both
  cases into a single expression:

  \[
    f(m, n) = (\max(m, n))^2 - \max(m, n) + 1 + (-1)^{\max(m, n)} (m - n).
  \]

  Now we have a closed-form expression for \( f(m, n) \) that gives
  the number at any position in the grid.
</p>
<h2 id="patterns-on-the-diagonal">Patterns on the Diagonal<a href="#patterns-on-the-diagonal"></a></h2>
<p>
  As mentioned earlier, there is a shorter route to the same
  closed-form expression.  This alternative approach is based on
  analysing the numbers along the diagonal of the grid.  We still need
  to examine the edge numbers, but not all of them as we did in the
  previous section.  Some of the reasoning about edge values will be
  repeated here to ensure this section is self-contained.
</p>
<h3 id="computing-diagonal-numbers">Computing Diagonal Numbers<a href="#computing-diagonal-numbers"></a></h3>
<p>
  A number on the diagonal has the same row number and column number.
  In other words, a diagonal number has the value \( f(n, n) \) for
  some positive integer \( n.  \)  Consider the case when \( n \) is
  even.  In this case, the diagonal number is on a segment of the
  spiral that is moving to the left.  The \( n \times n \) subgrid
  formed by the first \( n \) rows and the first \( n \) columns
  contains exactly \( n^2 \) consecutive numbers.  Since the diagonal
  number is on the last row of this subgrid and the numbers in this
  row increase as we move from right to left, the largest number in
  the subgrid must be on the left edge of this row.  Therefore the
  number at the left edge is \( f(n, 1) = n^2, \) where \( n \) is
  even.  This is illustrated below.
</p>
<figure>
  \begin{array}{rrrr:rl}
     1     &amp;   2 &amp;   9  &amp;     10 &amp;  25 &amp; \cd \\
     4     &amp;   3 &amp;   8  &amp;     11 &amp;  24 &amp; \cd \\
     5     &amp;   6 &amp;   7  &amp;     12 &amp;  23 &amp; \cd \\
    \hl 16 &amp;  15 &amp;  14  &amp; \hl 13 &amp;  22 &amp; \cd \\
    \hdashline
    17     &amp;  18 &amp;  19  &amp;     20 &amp;  21 &amp; \cd \\
    \vd    &amp; \vd &amp; \vd  &amp;    \vd &amp; \vd &amp; \dd
  \end{array}
  <figcaption>
    The spiral touches the left edge on the \( 4 \)th row where the
    number is \( 4^2 \)
  </figcaption>
</figure>
<p>
  From the diagonal to the edge of the subgrid, there are \( n \)
  consecutive numbers.  In a sequence of \( n \) consecutive numbers,
  the difference between the maximum number and the minimum number is
  \( n - 1.  \)  Therefore, \( n^2 - f(n, n) = n - 1.  \)  This gives us

  \[
    f(n, n) = n^2 - n + 1 \quad \text{if } n \equiv 0 \pmod{2}.
  \]
</p>
<p>
  Now consider the case when \( n \) is odd.
</p>
<figure>
  \begin{array}{rrr:rrl}
     1 &amp;   2 &amp; \hl 9 &amp;  10 &amp;  25 &amp; \cd \\
     4 &amp;   3 &amp;     8 &amp;  11 &amp;  24 &amp; \cd \\
     5 &amp;   6 &amp; \hl 7 &amp;  12 &amp;  23 &amp; \cd \\
    \hdashline
    16 &amp;  15 &amp;    14 &amp;  13 &amp;  22 &amp; \cd \\
    17 &amp;  18 &amp;    19 &amp;  20 &amp;  21 &amp; \cd \\
    \vd &amp; \vd &amp;  \vd &amp; \vd &amp; \vd &amp; \dd
  \end{array}
  <figcaption>
    The spiral touches the top edge on the \( 3 \)rd column where the
    number is \( 3^2 \)
  </figcaption>
</figure>
<p>
  By a similar reasoning, for odd \( n, \) the \( n \)th column has
  numbers that increase as we move up from the diagonal number towards
  the top edge.  Therefore \( f(1, n) = n^2 \) and since \( n^2 - f(n,
  n) = n - 1, \) we again obtain

  \[
    f(n, n) = n^2 - n + 1 \quad \text{if } n \equiv 1 \pmod{2}.
  \]

  Since \( f(n, n) \) takes the same form for both odd and even
  \( n, \) we can write

  \[
    f(n, n) = n^2 - n + 1
  \]

  for all positive integers \( n.  \)
</p>
<h3 id="computing-all-grid-numbers-2">Computing All Grid Numbers<a href="#computing-all-grid-numbers-2"></a></h3>
<p>
  If \( m \le n, \) then the number \( f(m, n) \) lies on or above the
  diagonal number \( f(n, n).  \)  If \( n \) is even, then the numbers
  decrease as we go from the diagonal up to the top edge.  Therefore
  \( f(m, n) \le f(n, n) \) and \( f(m, n) = f(n, n) - (n - m).  \)  If
  \( n \) is odd, then the numbers increase as we go from the diagonal
  up to the top edge and therefore \( f(m, n) \ge f(n, n) \) and \(
  f(m, n) = f(n, n) + (n - m).  \)
</p>
<p>
  If \( m \ge n, \) then the number \( f(m, n) \) lies on or below the
  diagonal number \( f(m, m).  \)  By a similar analysis, we find that
  \( f(m, n) = f(m, m) + (m - n) \) if \( n \) is even and \( f(m, n)
  = f(m, m) - (m - n) \) if \( n \) is odd.  We summarise these
  results as follows:

  \[
    f(m, n) =
      \begin{cases}
        f(n, n) - (n - m)
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        f(n, n) + (n - m)
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        f(m, m) + (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        f(m, m) - (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  Note that the above formula can be rewritten as

  \[
    f(m, n) =
      \begin{cases}
        f(n, n) + (m - n)
        &amp; \text{if } m \le n \text{ and } n \equiv 0 \pmod{2}, \\
        f(n, n) - (m - n)
        &amp; \text{if } m \le n \text{ and } n \equiv 1 \pmod{2}, \\
        f(m, m) + (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 0 \pmod{2}, \\
        f(m, m) - (m - n)
        &amp; \text{if } m \ge n \text{ and } m \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]
</p>
<h3 id="closed-form-expression-2">Closed Form Expression<a href="#closed-form-expression-2"></a></h3>
<p>
  If we take a close look at the last formula in the previous section,
  we find that in each expression, one variable plays a dominant role,
  i.e. it occurs more frequently in the expression than the other.  In
  the first two expressions \( n \) plays the dominant role whereas in
  the last two expressions \( m \) plays the dominant role.  In fact,
  in each expression, the dominant variable is the one that is greater
  than or equal to the other.  With this in mind, we can rewrite the
  above formula as

  \[
    f(m, n) =
      \begin{cases}
        f(\max(m, n), \max(m, n)) + (m - n)
        &amp; \text{if } \max(m, n) \equiv 0 \pmod{2}, \\
        f(\max(m, n), \max(m, n)) - (m - n)
        &amp; \text{if } \max(m, n) \equiv 1 \pmod{2}.  \\
      \end{cases}
  \]

  The only difference between the expressions is the sign of the
  second term: it is positive when \( \max(m, n) \) is even and
  negative when \( \max(m, n) \) is odd.  As a result, we can rewrite
  the above formula as a single expression like this:

  \[
    f(m, n) = f(\max(m, n), \max(m, n)) + (-1)^{\max(m, n)} (m - n).
  \]

  Using the formula \( f(n, n) = n^2 - n + 1 \) from the previous
  section, we get

  \[
    f(m, n) = (\max(m, n))^2 - \max(m, n) + 1 + (-1)^{\max(m, n)} (m - n).
  \]

  We arrive again at the same closed-form expression, this time by
  focusing on the diagonal of the grid.
</p>
<h2 id="references">References<a href="#references"></a></h2>
<ul>
  <li>
    <a href="https://cses.fi/problemset/task/1071">Number Spiral</a>
    from the CSES Problem Set
  </li>
  <li>
    <a href="https://mathworld.wolfram.com/Closed-FormSolution.html">Closed-Form Solution</a>
    by Christopher Stover and Eric W. Weisstein
  </li>
  <li>
    <a href="https://mathworld.wolfram.com/PiecewiseFunction.html">Piecewise Function</a>
    by Eric W. Weisstein
  </li>
</ul>
<!-- ### -->
<p>
  <a href="https://susam.net/zigzag-number-spiral.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/puzzle.html">#puzzle</a>
</p>
]]>
</description>
</item>
<item>
<title>Product of Additive Inverses</title>
<link>https://susam.net/product-of-additive-inverses.html</link>
<guid isPermaLink="false">rxpnz</guid>
<pubDate>Thu, 29 May 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  A negative number multiplied by another negative number results in a
  positive number.  Most of us learnt this rule during our primary or
  secondary school years.  'Negative times negative equals positive'
  was a phrase drummed into us during mathematics lessons.  In this
  article, we will prove this rule, not just for numbers but for any
  algebraic structure that, in a general sense, behaves somewhat like
  numbers.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#illustration">Illustration</a></li>
  <li><a href="#ring-axioms">Ring Axioms</a></li>
  <li><a href="#closure-properties">Closure Properties</a></li>
  <li><a href="#inverse-of-inverse">Inverse of Inverse</a></li>
  <li><a href="#multiplication-by-zero">Multiplication by Zero</a></li>
  <li><a href="#multiplication-by-additive-inverse">Multiplication by Additive Inverse</a></li>
  <li><a href="#product-of-additive-inverses">Product of Additive Inverses</a></li>
  <li><a href="#alternate-proof">Alternate Proof</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>
<h2 id="illustration">Illustration<a href="#illustration"></a></h2>
<p>
  Let us begin with a quick illustration that shows why the product of
  two negative numbers must be positive for arithmetic to make sense.
  Consider

  \[
    7 \times 8 = 56.
  \]

  The above equation can also be written as

  \[
    (10 - 3) \times (10 - 2) = 56.
  \]

  Using the distributive property of multiplication over subtraction,
  we get

  \[
    (10 - 3) \times 10 + (10 - 3) \times (-2) = 56.
  \]

  Using the distributive property again, we have

  \[
    10 \times 10 + (-3) \times 10 + 10 \times (-2) + (-3) \times (-2) = 56.
  \]

  Now, we will take it for granted that a positive times a negative is
  negative.  We will prove all of this rigorously later, but for now,
  we are just working through an illustration, so we will accept that
  rule and see where it leads.  The equation becomes:

  \[
    100 + (-30) + (-20) + (-3) \times (-2) = 56.
  \]

  Adding the first three terms gives

  \[
    50 + (-3) \times (-2) = 56.
  \]

  Subtracting \( 50 \) from both sides, we get

  \[
    (-3) \times (-2) = 6.
  \]

  What we have seen here is that if we accept \( 7 \times 8 = 56 \)
  and that positive times negative gives a negative result, then we
  must also accept that \( (-3) \times (-2) = 6.  \)
</p>
<h2 id="ring-axioms">Ring Axioms<a href="#ring-axioms"></a></h2>
<p>
  From this section onwards, we take a rigorous approach.  We want to
  show that the rule 'negative times negative equals positive' holds,
  in a general sense, for any set of elements that share certain
  properties with numbers.  As it turns out, these elements do not
  need to possess all the properties of complex numbers, real numbers
  or even rational numbers.  In fact, if they satisfy a small and
  specific set of properties held by the integers, then the rule still
  holds.  These properties are known as the <em>ring axioms</em>.
</p>
<p>
  A ring is an algebraic structure consisting of a set \( R \) with
  two binary operations \( + \) and \( \cdot, \) called addition and
  multiplication respectively, satisfying the following axioms:
</p>
<ol>
  <li>
    <p>
      <strong>Associativity of addition:</strong> For all \( a, b, c
      \in R, \) we have \( a + (b + c) = (a + b) + c.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Commutativity of addition:</strong> For all \( a, b \in
      R, \) we have \( a + b = b + a.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Additive identity:</strong> There exists an element \( 0
      \in R \) such that for all \( a \in R, \) we have \( a + 0 = a =
      0 + a.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Additive inverse:</strong> For each \( a \in R, \) there
      exists an element \( -a \in R \) such that \( a + (-a) = 0 =
      (-a) + a.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Associativity of multiplication:</strong> For all \( a,
      b, c \in R, \) we have \( a \cdot (b \cdot c) = (a \cdot b)
      \cdot c.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Left distributivity of multiplication over
      addition:</strong> For all \( a, b, c \in R, \) we have \( a
      \cdot (b + c) = (a \cdot b) + (a \cdot c).  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Right distributivity of multiplication over
      addition:</strong> For all \( a, b, c \in R, \) we have \( (b +
      c) \cdot a = (b \cdot a) + (c \cdot a).  \)
    </p>
  </li>
</ol>
<p>
  Note that we do not assume that the ring contains multiplicative
  identity, nor do we assume that multiplication is commutative.  Many
  familiar types of numbers form rings.  For example, the set of
  integers forms a ring with the usual addition and multiplication
  operations.  The sets of rational numbers, real numbers and complex
  numbers satisfy the ring axioms too.
</p>
<p>
  Rings need not consist of numbers; they may contain elements
  of <em>any</em> type.  As long as a set of elements, together with
  suitable addition and multiplication operations, satisfies the seven
  axioms above, it forms a ring.  For example, the set of all
  polynomials in the indeterminate \( t \) with coefficients in some
  ring \( R \) forms a ring under the usual addition and
  multiplication of polynomials.  Such a ring is called
  a <em>polynomial ring</em> and it is denoted \( R[t].  \)
</p>
<h2 id="closure-properties">Closure Properties<a href="#closure-properties"></a></h2>
<p>
  Some texts include the following additional axioms for the closure
  properties of a ring:
</p>
<ol>
  <li>
    <p>
      <strong>Closure under addition:</strong> For all
      \( a, b \in R, \) we have \( a + b \in R.  \)
    </p>
  </li>
  <li>
    <p>
      <strong>Closure under multiplication:</strong> For all \( a, b
      \in R, \) we have \( a \cdot b \in R.  \)
    </p>
  </li>
</ol>
<p>
  However, stating these axioms explicitly is usually considered
  redundant because a binary operation is closed by definition.  A
  binary operation \( \circ \) on a set \( M \) is defined to be a
  function

  \[
    \circ : M \times M \to M; \quad (a, b) \mapsto a \circ b.
  \]

  This definition automatically implies the closure property, since
  the domain and codomain are the same.  The addition and
  multiplication operations on a ring \( R \) may be defined as

  \begin{align*}
        + &amp;: R \times R \to R; \quad (a, b) \mapsto a + b, \\
    \cdot &amp;: R \times R \to R; \quad (a, b) \mapsto a \cdot b.
  \end{align*}

  These definitions imply that a ring is closed under addition and
  multiplication.  In practice, while deciding if some set \( R \)
  forms a ring, we should always verify that the addition and
  multiplication operations indeed have \( R \) as the codomain to
  confirm that the closure property holds.
</p>
<h2 id="inverse-of-inverse">Inverse of Inverse<a href="#inverse-of-inverse"></a></h2>
<p id="theorem-1">
  <strong>Theorem 1.</strong>
  <em>
    Let \( R \) be a ring with \( + \) and \( \cdot \) operations.
    Then for all \( a \in R, \) we have

    \[
      -(-a) = a.
    \]
  </em>
</p>
<p>
  <em>Proof.</em> This result follows directly from the additive
  inverse axiom.  First, observe that

  \[
    a + (-a) = 0.
  \]

  Therefore \( a \) is an additive inverse of \( -a, \) i.e.

  \[
    -(-a) = a.
  \]

  This completes the proof.
</p>
<p>
  Notice that this proof does not involve the multiplication operation
  of a ring at all.  In fact, it holds true in a more general
  algebraic structure known as a <em>group</em>, which requires only a
  binary operation with associativity, an identity element and
  inverses.  A ring, under addition, is also a group.  Since the proof
  relies solely on these additive group properties, this theorem holds
  for all groups.  However, for brevity and to avoid introducing group
  axioms separately, I have stated and proved this theorem in the
  context of rings.
</p>
<p>
  It is also worth noting that the additive identity is unique in a
  ring (as well as in any group), but since this fact is not needed
  for later results, its proof has been omitted.  Even if,
  hypothetically, there were two distinct additive identities, \( 0 \)
  and \( 0', \) in a ring (there are not, of course), the arguments
  below would still hold if we simply focus on \( 0.  \)
</p>
<h2 id="multiplication-by-zero">Multiplication by Zero<a href="#multiplication-by-zero"></a></h2>
<p id="theorem-2">
  <strong>Theorem 2.</strong>
  <em>
    Let \( R \) be a ring with \( + \) and \( \cdot \) operations.  Then
    for all \( a \in R, \) we have

    \[
      a \cdot 0 = 0 \cdot a = 0.
    \]
  </em>
</p>
<p>
  <em>Proof.</em> Using the additive identity axiom, we get

  \[
    0 + 0 = 0.
  \]

  Multiplying both sides on the left by \( a, \) we get

  \[
    a \cdot (0 + 0) = a \cdot 0.
  \]

  Using the left distributivity axiom, we get

  \[
    a \cdot 0 + a \cdot 0 = a \cdot 0.
  \]

  Let \( b = a \cdot 0.  \)  Then

  \[
    b + b = b.
  \]

  Since a ring is closed under multiplication, \( b \in R.  \)  By the
  additive inverse axiom, there exists \( -b \in R \) such that \( b +
  (-b) = 0.  \)  Adding \( -b \) to both sides of the above equation,
  we get

  \[
    (b + b) + (-b) = b + (-b).
  \]

  By associativity of addition in a ring, we get

  \[
    b + (b + (-b)) = b + (-b).
  \]

  Since \( b + (-b) = 0, \) the above equation becomes

  \[
    b + 0 = 0.
  \]

  By the additive identity axiom, we get

  \[
    b = 0.
  \]

  Since \( b = a \cdot 0, \) the above equation may be written as

  \[
    a \cdot 0 = 0.
  \]

  A similar argument shows that

  \[
    0 \cdot a = 0.
  \]

  This completes the proof.
</p>
<h2 id="multiplication-by-additive-inverse">Multiplication by Additive Inverse<a href="#multiplication-by-additive-inverse"></a></h2>
<p id="theorem-3">
  <strong>Theorem 3.</strong>
  <em>
    Let \( R \) be a ring with \( + \) and \( \cdot \) operations.
    Then for all \( a, b \in R, \) we have

    \begin{align*}
      a \cdot (-b) &amp;= -(a \cdot b), \\
      (-a) \cdot b &amp;= -(a \cdot b).
    \end{align*}
    </em>
</p>
<p>
  <em>Proof.</em> Using the left distributivity and additive inverse
  properties of a ring along with <a href="#theorem-2">Theorem 2</a>,
  we get

  \[
    a \cdot b + a \cdot (-b)
    = a \cdot (b + (-b))
    = a \cdot 0
    = 0.
  \]

  Therefore \( a \cdot (-b) \) is an additive inverse of \( a \cdot b
 , \) i.e.

  \[
    -(a \cdot b) = a \cdot (-b).
  \]

  Similarly

  \[
    a \cdot b + (-a) \cdot b
    = (a + (-a)) \cdot b
    = 0 \cdot b
    = 0
  \]

  and thus

  \[
    -(a \cdot b) = (-a) \cdot b.
  \]

  This completes the proof.
</p>
<h2 id="product-of-additive-inverses">Product of Additive Inverses<a href="#product-of-additive-inverses"></a></h2>
<p>
  <strong>Theorem 4.</strong>
  <em>
    Let \( R \) be a ring with \( + \) and \( \cdot \) operations.  Then
    for all \( a, b \in R, \) we have

    \[
      (-a) \cdot (-b) = a \cdot b.
    \]
  </em>
</p>
<p>
  <em>Proof.</em>
  From <a href="#theorem-3">Theorem 3</a>, we know that

  \[
    a \cdot (-b) = -(a \cdot b).
  \]

  Substituting \( a \) with \( -a, \) we get

  \[
    (-a) \cdot (-b) = -((-a) \cdot b).
  \]

  Again by <a href="#theorem-3">Theorem 3</a>, we have \( (-a) \cdot b
  = -(a \cdot b).  \)  Substituting this in the above equation, we
  obtain

  \[
    (-a) \cdot (-b) = -(-(a \cdot b)).
  \]

  Now using <a href="#theorem-1">Theorem 1</a>, the right-hand side
  becomes \( a \cdot b, \) so we get

  \[
    (-a) \cdot (-b) = a \cdot b.
  \]

  This completes the proof.
</p>
<h2 id="alternate-proof">Alternate Proof<a href="#alternate-proof"></a></h2>
<p>
  The above sequence of theorems is not the only way to arrive
  at <a href="#theorem-4">Theorem 4</a>.  There are other ways to
  reach this result as well.  Let us briefly discuss another such
  proof.  From <a href="#theorem-3">Theorem 3</a> we know that \( a
  \cdot b \) is the additive inverse of \( a \cdot (-b).  \)  In a very
  similar way, we can show that \( (-a) \cdot (-b) \) is also the
  additive inverse of \( a \cdot (-b).  \)  The proof goes as follows:

  \[
    (-a) \cdot (-b) + a \cdot (-b)
    = (-a + a) \cdot (-b)
    = 0 \cdot (-b)
    = 0.
  \]

  Note that we used <a href="#theorem-2">Theorem 2</a> again for the
  last equality.  So now we know that both \( (-a) \cdot (-b) \) and
  \( a \cdot b \) are additive inverses of \( a \cdot (-b).  \)  Does
  this mean that \( (-a) \cdot (-b) = a \cdot b?  \)  Yes, since the
  additive inverse of an element is unique in a ring.  Let us prove
  this now.

  Let \( b \) and \( c \) be additive inverses of \( a.  \)  Then \( a
  + b = b + a = 0 \) and \( a + c = c + a = 0.  \)  Using this, we get

  \[
    b = b + 0 = b + (a + c) = (b + a) + c = 0 + c = c.
  \]

  Since \( (-a) \cdot (-b) \) and \( a \cdot b \) are additive
  inverses of the same element \(a \cdot (-b) \) and since the
  additive inverse of an element is unique in a ring, it follows that

  \[
    (-a) \cdot (-b) = a \cdot b.
  \]

  Note that we do not need <a href="#theorem-1">Theorem 1</a> in this
  alternate proof but we introduced a new theorem about the uniqueness
  of the additive inverse to complete this proof.
</p>
<h2 id="conclusion">Conclusion<a href="#conclusion"></a></h2>
<p>
  Theorems 1 to 4 establish certain algebraic properties that hold in
  any ring.  Although these results were proven abstractly for rings,
  they reflect properties we are already familiar with from our
  experience with numbers.  For example, in the ring of integers, we
  observe \( -(-2) = 2 \) which is a specific case
  of <a href="#theorem-1">Theorem 1</a>.
</p>
<p>
  Similarly, <a href="#theorem-2">Theorem 2</a> confirms the
  well-known fact that multiplying any integer by \( 0 \) yields
  \( 0.  \)  For example, \( 2 \cdot 0 = 0.  \)
</p>
<p>
  Then <a href="#theorem-3">Theorem 3</a> implies the rule that
  multiplying a positive number by a negative number yields a negative
  result.  For example, \( 2 \cdot (-3) = -(2 \cdot 3) = -6.  \)
</p>
<p>
  Finally, <a href="#theorem-4">Theorem 4</a> implies that the product
  of two negative numbers is positive.  For example, \( (-2) \cdot
  (-3) = 2 \cdot 3 = 6.  \)
</p>
<p>
  These familiar results are not limited to the ring of integers.  The
  results hold in any ring, including polynomial rings, rings of
  integers modulo a fixed positive integer and many other algebraic
  systems.  These results demonstrate how the ring axioms formalise
  familiar arithmetic rules within a more general algebraic framework.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/product-of-additive-inverses.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Two Ideals of Fields</title>
<link>https://susam.net/two-ideals-of-fields.html</link>
<guid isPermaLink="false">xsuzd</guid>
<pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  A field has exactly two ideals: the zero ideal, which contains only
  the additive identity and the whole field itself.  These are known
  as trivial ideals.  Further if a commutative ring, with distinct
  additive and multiplicative identities, has no ideals other than the
  trivial ones, then it must be a field.  These two facts are elegant
  in their symmetry and simplicity.  In this article, we will explore
  why these facts are true.  Familiarity with algebraic structures
  such as groups, rings and fields is assumed.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#definition-of-ideals">Definition of Ideals</a></li>
  <li><a href="#examples-of-ideals">Examples of Ideals</a></li>
  <li><a href="#known-results">Known Results</a></li>
  <li><a href="#ideals-of-fields">Ideals of Fields</a></li>
  <li><a href="#rings-with-trivial-ideals">Rings With Trivial Ideals</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>
<h2 id="definition-of-ideals">Definition of Ideals<a href="#definition-of-ideals"></a></h2>
<p>
  A left ideal of a ring \( R \) is a subset \( I \subseteq R \) such
  that \( I \) is an additive subgroup of \( R \) and for all \( a \in
  I \) and \( r \in R, \) we have \( r \cdot a \in I.  \)  We say that
  a left ideal absorbs multiplication from the left by any ring
  element; or equivalently, that it is closed under left
  multiplication by any ring element.
</p>
<p>
  Similarly, a right ideal of a ring \( R \) is a subset \( I
  \subseteq R \) such that \( I \) is an additive subgroup of \( R \)
  and for all \( a \in I \) and \( r \in R, \) we have \( a \cdot r
  \in I.  \)  We say that a right ideal absorbs multiplication from the
  right by any ring element; or equivalently, that it is closed under
  right multiplication by any ring element.
</p>
<p>
  In a commutative ring \( R, \) every left ideal is also a right
  ideal and vice versa.  This is because for all \( a \in I \) and \(
  r \in R, \) we have \( r \cdot a = a \cdot r.  \)  Therefore, when
  working with commutative rings, we do not need to distinguish
  between left and right ideals and we simply refer to them as ideals.
  In this case, the ideal is said to absorb multiplication by any ring
  element; or equivalently, it is said to be closed under
  multiplication by any ring element.
</p>
<h2 id="examples-of-ideals">Examples of Ideals<a href="#examples-of-ideals"></a></h2>
<p>
  Consider the set of even integers

  \[
    \langle 2 \rangle = \{ 2n : n \in \mathbb{Z} \}.
  \]

  This is an ideal of \( \mathbb{Z}.  \)  Indeed, if we multiply any
  even integer by any integer, the result is an even integer.  In
  other words, the set of even integers absorbs multiplication by any
  integer.  Equivalently, the set of even integers is closed under
  multiplication by any integer.
</p>
<p>
  Let us see another example.  Consider the ring of polynomials in the
  indeterminate \( t \) with integer coefficients, denoted \(
  \mathbb{Z}[t].  \)  The set

  \[
    \langle 2, t \rangle = \{ 2f + tg : f, g \in \mathbb{Z}[t] \}
  \]

  is an ideal of \( \mathbb{Z}[t].  \)  Every element of this ideal is
  a linear combination of \( 2 \) and \( t \) with polynomial
  coefficients.  If we take any element \( 2f + tg \in \langle 2, t
  \rangle \) where \( f, g \in \mathbb{Z}[t] \) and multiply it by any
  polynomial \( h \in \mathbb{Z}[t], \) we obtain \( 2fh + tgh, \)
  which is again an element of \( \langle 2, t \rangle.  \)  Hence \(
  \langle 2, t \rangle \) absorbs multiplication by any element of \(
  \mathbb{Z}[t], \) i.e. it is closed under multiplication by elements
  of \( \mathbb{Z}[t].  \)
</p>
<h2 id="known-results">Known Results<a href="#known-results"></a></h2>
<p>
  For the sake of brevity, we assume the following standard results.
</p>
<p id="zero-multiplication">
  <strong>Proposition 1.</strong>
  <em>
    Let \( R \) be a ring.  Then, for all \( a \in R, \) we have

    \[
      a \cdot 0 = 0 \cdot a = 0.
    \]
  </em>
</p>
<p id="principal-ideal">
  <strong>Proposition 2.</strong>
  <em>
    Let \( R \) be a ring and let \( a \in R.  \)  Then

    \begin{align*}
      I_L &amp;= \{ r \cdot a : r \in R \}, \\
      I_R &amp;= \{ a \cdot r : r \in R \}
    \end{align*}

    are respectively a left ideal and a right ideal of \( R.  \)  If \(
    R \) is commutative, then \( I_L = I_R \) and we write

    \[
      \langle a \rangle = \{ a \cdot r : r \in R \}
    \]

    and say that \( \langle a \rangle \) is an ideal of \( R \)
    generated by \( a.  \)
  </em>
</p>
<h2 id="ideals-of-fields">Ideals of Fields<a href="#ideals-of-fields"></a></h2>
<p>
  In this section, we show that a field \( K \) has only two ideals:
  \( \{ 0 \} \) and \( K \) itself.
</p>
<p>
  Clearly \( \{ 0 \} \) is an ideal of \( K, \) as it satisfies the
  definition of an ideal.  It is the trivial additive subgroup of \(
  K \) and by <a href="#zero-multiplication">Proposition 1</a>, for
  all \( r \in K, \) we have \( r \cdot 0 = 0 \in \{ 0 \}.  \)
</p>
<p>
  Now \( K \) is also an ideal of itself.  Since \( K \) is an additive
  group by the definition of a field, it is an additive subgroup of
  itself.  Moreover, as a field, \( K \) is closed under
  multiplication, so for all \( a, r \in K \) we have \( a \cdot r \in
  K.  \)
</p>
<p>
  We will now show that \( \{ 0 \} \) and \( K \) are
  the <em>only</em> ideals of \( K.  \)  Let \( I \) be an ideal of \(
  K.  \)  There are two cases to consider: \( I = \{ 0 \} \) and \( I
  \ne \{ 0 \}.  \)  Suppose \( I \ne \{ 0 \}.  \)  Then there exists a
  non-zero element \( b \in I.  \)  Since \( b \ne 0 \) and \( K \) is
  a field, \( b \) has a multiplicative inverse \( b^{-1} \in K.  \)
  Since \( b \in I, \) \( b^{-1} \in K \) and \( I \) is closed under
  multiplication by any element of \( K, \) we have

  \[
    1 = b \cdot b^{-1} \in I.
  \]

  Now, let \( c \in K.  \)  Since \( 1 \in I, \) \( c \in K \) and \(
  I \) is an ideal of \( K, \) we get

  \[
    c = 1 \cdot c \in I.
  \]

  Thus \( K \subseteq I \) and since \( I \subseteq K \) by
  definition, we conclude \( I = K.  \)  Therefore the only ideals of
  \( K \) are \( \{ 0 \} \) and \( K \) itself.
</p>
<h2 id="rings-with-trivial-ideals">Rings With Trivial Ideals<a href="#rings-with-trivial-ideals"></a></h2>
<p>
  We now show that if \( R \) is a commutative ring with \( 1 \ne 0 \)
  and the only ideals of \( R \) are \( \{ 0 \} \) and \( R \) itself,
  then \( R \) must be a field.  To do this, we first show that every
  non-zero element of \( R \) has a multiplicative inverse in \( R.  \)
  Let \( a \in R \) with \( a \ne 0.  \)  We now show that there exists
  a multiplicative inverse \( a^{-1} \in R.  \)
  By <a href="#principal-ideal">Proposition 2</a>, the set

  \[
    \langle a \rangle = \{ a \cdot r : r \in R \}.
  \]

  is an ideal of \( R.  \)  Since \( a = a \cdot 1 \in \langle a
  \rangle, \) we have \( \langle a \rangle \ne \{ 0 \}.  \)  By
  assumption, the only ideals of \( R \) are \( \{ 0 \} \) and
  \( R, \) so it must be that \( \langle a \rangle = R.  \)  Therefore
  \( 1 \in \langle a \rangle \) and

  \[
    1 = a \cdot s
  \]

  for some \( s \in R.  \)  Thus \( a \) has a multiplicative inverse
  \( s \in R \) and this holds for every non-zero \( a \in R.  \)
</p>
<p>
  The remaining properties of fields, namely, associativity and
  commutativity of addition and multiplication, the existence of
  distinct additive and multiplicative identities, the existence of
  additive inverses and the distributivity of multiplication over
  addition, are inherited from the ring \( R.  \)  Therefore \( R \) is
  a field.
</p>
<h2 id="conclusion">Conclusion<a href="#conclusion"></a></h2>
<p>
  To summarise, any commutative ring with distinct additive and
  multiplicative identities that has only trivial ideals is a field
  and every field has only trivial ideals.
</p>
<p>
  Note that every field is also a commutative ring with distinct
  additive and multiplicative identities.  Therefore, we can say that
  every field is a commutative ring with distinct additive and
  multiplicative identities and only trivial ideals and vice versa.
  It is neat how the two facts align so nicely.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/two-ideals-of-fields.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>From Finite Integral Domains to Finite Fields</title>
<link>https://susam.net/from-finite-integral-domains-to-finite-fields.html</link>
<guid isPermaLink="false">ojxkk</guid>
<pubDate>Sun, 25 May 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  In this article, we explore a few well-known results from abstract
  algebra pertaining to fields and integral domains.  We ask ourselves
  whether every field is an integral domain and whether every integral
  domain is a field.  We begin with the definition of an integral
  domain, discuss a few established results and then proceed to answer
  these questions.  Familiarity with algebraic structures such as
  rings and fields is assumed.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#definition-of-integral-domain">Definition of Integral Domain</a></li>
  <li><a href="#examples-of-integral-domain">Examples of Integral Domains</a></li>
  <li><a href="#known-results">Known Results</a></li>
  <li><a href="#on-distinct-identities">On Distinct Identities</a></li>
  <li><a href="#every-field-is-an-integral-domain">Every Field Is an Integral Domain</a></li>
  <li><a href="#infinite-integral-domains">Infinite Integral Domains</a></li>
  <li><a href="#every-finite-integral-domain-is-a-field">Every Finite Integral Domain Is a Field</a>
    <ul>
      <li><a href="#alternate-proof">Alternate Proof</a></li>
    </ul>
  </li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>
<h2 id="definition-of-integral-domain">Definition of Integral Domain<a href="#definition-of-integral-domain"></a></h2>
<p>
  An <em>integral domain</em> is a commutative ring, with distinct
  additive and multiplicative identities, in which the product of any
  two non-zero elements is also non-zero.
</p>
<p>
  Equivalently, an integral domain is a commutative ring, with
  distinct additive and multiplicative identities, such that if the
  product of two elements is zero, then one of the elements must be
  zero.
</p>
<p>
  Using standard notation, we can write that a commutative ring
  \( R \) is an integral domain if \( 0 \ne 1 \) and for
  \( a, b \in R, \)

  \[
    a \ne 0 \text{ and } b \ne 0 \implies a \cdot b \ne 0
  \]

  or equivalently,

  \[
    a \cdot b = 0 \implies a = 0 \text{ or } b = 0.
  \]

  There are many other alternative ways to define an integral domain
  which are all equivalent.  In a ring \( R, \) a <em>zero
  divisor</em> is a non-zero element \( a \in R \) such that there
  exists a non-zero element \( b \in R \) with \( ab = 0.  \)  With
  this definition of a zero divisor, we can define an integral domain
  to be a unital commutative ring, with \( 0 \ne 1, \) that has no
  zero divisors.
</p>
<h2 id="examples-of-integral-domain">Examples of Integral Domains<a href="#examples-of-integral-domain"></a></h2>
<p>
  The ring of integers \( \mathbb{Z} \) is an integral domain since
  the product of two non-zero integers is non-zero.  The field of
  rational numbers \( \mathbb{Q} \) is also an integral domain.  The
  ring of polynomials in the indeterminate \( t \) with coefficients
  in an integral domain \( R, \) denoted \( R[t], \) is an integral
  domain as well.
</p>
<p>
  The ring of integers modulo 5, denoted \( \mathbb{Z}_5 \) is an
  integral domain.  However, the ring of integers modulo 6, denoted \(
  \mathbb{Z}_6, \) is not an integral domain since \( 2 \cdot 3 = 0 \)
  in \( \mathbb{Z}_6.  \)  In other words, \( \mathbb{Z}_6 \) has zero
  divisors, namely \( 2 \) and \( 3, \) so it is not an integral
  domain.  In fact, the ring of integers modulo \( n, \) denoted \(
  \mathbb{Z}_n \) is an integral domain if and only if \( n \) is
  prime.
</p>
<h2 id="known-results">Known Results<a href="#known-results"></a></h2>
<p>
  For the sake of brevity, we assume the following known results.
</p>
<p id="zero-multiplication">
  <strong>Proposition 1.</strong>
  <em>
    Let \( R \) be a ring.  Then, for all \( a \in R, \) we have

    \[
      a \cdot 0 = 0 \cdot a = 0.
    \]
  </em>
</p>
<p id="cancellation-property">
  <strong>Proposition 2.</strong>
  <em>
    Let \( D \) be an integral domain.  Then, for all \( a, b, c \in D \) such
    that \( a \ne 0, \) we have

    \[
      a \cdot b = a \cdot c \implies b = c.
    \]
  </em>
</p>
<p>
  The second result is also known as the <em>cancellation property of
  integral domains</em>.
</p>
<h2 id="on-distinct-identities">On Distinct Identities<a href="#on-distinct-identities"></a></h2>
<p>
  We have been mentioning the distinctiveness of the additive and
  multiplicative identities as a property of an integral domain.  Some
  texts express this more concisely by saying that an integral domain
  is a <em>non-zero</em> unital commutative ring without zero
  divisors, i.e. the zero ring \( \{ 0 \} \) is excluded from the
  definition.
</p>
<p>
  This follows directly from
  <a href="#zero-multiplication">Proposition 1</a>.  If
  \( 0 = 1 \in R, \) then for all \( r \in R, \) we get

  \[
    r = r \cdot 1 = r \cdot 0 = 0
  \]

  which means that every element of \( R \) is zero, i.e. \( R = \{ 0
  \}.  \)  To summarise

  \[
    0 = 1 \in R \implies R = \{ 0 \}
  \]

  or equivalently, for a ring \( R \) with unity,

  \[
    R \ne \{ 0 \} \implies 0 \ne 1.
  \]

  Further, if \( 0 \) and \( 1 \) are two distinct elements of
  \( R, \) then \( R \) has at least two elements, so for a ring
  \( R \) with unity,

  \[
    0 \ne 1 \implies R \ne \{ 0 \}
  \]

  Therefore, a ring with unity has distinct additive and
  multiplicative identities if and only if it is a non-zero ring.  This
  is why an integral domain can also be defined as a non-zero unital
  commutative ring without zero divisors.
</p>
<h2 id="every-field-is-an-integral-domain">Every Field Is an Integral Domain<a href="#every-field-is-an-integral-domain"></a></h2>
<p>
  We now show that every field is indeed an integral domain.  Let
  \(F \) be a field and let \( a, b \in F \) such that \( ab = 0.  \)
  There are two cases to consider: \( a = 0 \) and \( a \ne 0.  \)  If
  \( a = 0, \) we are done.
</p>
<p>
  Now suppose \( a \ne 0.  \)  Then by the properties of fields, there exists a
  multiplicative inverse \( a^{-1} \in F \) such that \( a \cdot
  a^{-1} = 1.  \)  Then using the properties of fields, we get

  \[
    b = b \cdot 1 = b \cdot (a \cdot a^{-1}) = (a \cdot b) \cdot
    a^{-1} = 0 \cdot a^{-1} = 0.
  \]

  The last equality follows from
  <a href="#zero-multiplication">Proposition 1</a>.  We have shown
  that if \( ab = 0, \) then either \( a = 0 \) or \( b = 0.  \)
  Therefore, if both \( a \ne 0 \) and \( b \ne 0, \) then it must be
  that \( ab \ne 0.  \)  Therefore \( F \) is an integral domain.
</p>
<h2 id="infinite-integral-domains">Infinite Integral Domains<a href="#infinite-integral-domains"></a></h2>
<p>
  Now we arrive at the next natural question.  Is every integral
  domain a field?
</p>
<p>
  The ring of integers \( \mathbb{Z} \) is an integral domain but it
  is not a field since \( 2 \in \mathbb{Z}, \) but \( 2^{-1} \notin
  \mathbb{Z}.  \)  Therefore, \( \mathbb{Z} \) is an example of an
  infinite integral domain that is not a field.
</p>
<p>
  Next we ask ourselves: Is every infinite integral domain not a
  field?  Not quite!  Some infinite integral domains are, in fact,
  fields.  This follows directly from the result in the previous
  section.  Every field is an integral domain and there are plenty of
  infinite fields, so they must all be integral domains too.  Consider
  the field of rational numbers \( \mathbb{Q} \) or the field of
  complex numbers \( \mathbb{C}.  \)  Since these are fields, they are
  also integral domains.  So, clearly, there are infinite integral
  domains that are also fields.
</p>
<h2 id="every-finite-integral-domain-is-a-field">Every Finite Integral Domain Is a Field<a href="#every-finite-integral-domain-is-a-field"></a></h2>
<p>
  We will now turn our attention to finite integral domains.  Is every
  finite integral domain a field?  Yes!  This can be shown as follows.
</p>
<p>
  Let \( D \) be a finite integral domain.  Let \( a \in D \) with \(
  a \ne 0.  \)  Consider the set

  \[
    A = \{ a, a^2, a^3, \dots \}.
  \]

  Since a ring is closed under multiplication, every element of
  \( A \) belongs to \( D, \) so \( A \subseteq D.  \)  Since \( D \)
  is finite, \( A \) is finite too.  Therefore, by the pigeonhole
  principle, there exist integers \( m \gt n \ge 0 \) such that

  \[
    a^m = a^n
  \]

  This equation can be rewritten as

  \[
    a \cdot a^{m - n - 1} \cdot a^n = 1 \cdot a^n.
  \]

  Since \( a \) is a non-zero element of an integral domain, it
  follows that \( a^n \ne 0.  \)  Therefore we can use
  <a href="#cancellation-property">Proposition 2</a> (the cancellation
  property of integral domains) to get

  \[
    a \cdot a^{m - n - 1} = 1.
  \]

  Since a ring is closed under multiplication and since \( m - n - 1
  \ge 0, \) it follows that \( a^{m - n - 1} \in D.  \)  Thus every
  non-zero element \( a \in D \) has a multiplicative inverse in
  \( D.  \)  This establishes the multiplicative inverse property of a
  field.
</p>
<p>
  Since an integral domain has distinct additive and multiplicative
  identities, it satisfies two additional field properties: the
  existence of an additive identity and a distinct multiplicative
  identity.
</p>
<p>
  Finally, the remaining field properties are inherited from the ring
  structure, i.e. associativity and commutativity of addition and
  multiplication, the existence of additive inverses and the
  distributivity of multiplication over addition all hold in \( D, \)
  since they hold in any ring.  Thus, \( D \) satisfies all the field
  properties.  Therefore \( D \) is a field.
</p>
<h3 id="alternate-proof">Alternate Proof<a href="#alternate-proof"></a></h3>
<p>
  The proof in the previous section presents what I initially came up
  with while working through these concepts and proving these results
  for myself.  However, I later found that there is another proof that
  is quite popular in the literature.  This alternate proof differs in
  one key aspect: it does not invoke the cancellation property of
  integral domains stated in
  <a href="#cancellation-property">Proposition 2</a>.  Let us examine
  this alternate proof.
</p>
<p>
  As before, we consider the set \( A = \{ a, a^2, a^3, \dots \}
  \subseteq D, \) where \( a \in D \) and \( a \ne 0 \) and we obtain
  the equation

  \[
    a^m = a^n
  \]

  for some integers \( m \gt n \ge 0.  \)  As before, we use the fact
  that \( a \) is an element of an integral domain to conclude that \(
  a^n \ne 0.  \)  Now adding the additive inverse of \( a^n \) to both
  sides we get

  \[
    a^m - a^n = 0.
  \]

  Using the distributivity property of rings, we get

  \[
    a^n (a^{m - n} - 1) = 0
  \]

  Since a ring is closed under addition and multiplication, both \(
  a^n \) and \( a^{m - n} - 1 \) belong to \( D.  \)  As \( D \) is an
  integral domain and \( a^n \ne 0, \) we conclude that \( a^{m - n} -
  1 = 0.  \)  Therefore \( a^{m - n} = 1.  \)  Since \( m - n \ge 1, \)
  we can write:

  \[
    a \cdot a^{m - n - 1} = 1.
  \]

  Therefore every non-zero element \( a \in D \) has a multiplicative
  inverse in \( D.  \)  The remaining properties of a field are
  established in the same manner as in the previous section.  Hence,
  if \( D \) is a finite integral domain, then it is also a field.
</p>
<h2 id="conclusion">Conclusion<a href="#conclusion"></a></h2>
<p>
  We now summarise all the results here before concluding the article:
</p>
<ul>
  <li>
    Every field is an integral domain.
  </li>
  <li>
    Every <em>finite</em> integral domain is a field.
  </li>
  <li>
    Some infinite integral domains are not fields.  A convenient
    example is the set of integers \( \mathbb{Z}.  \)
  </li>
  <li>
    Some infinite integral domains are fields.  Every infinite field,
    such as \( \mathbb{Q}, \) \( \mathbb{R} \) or \( \mathbb{C} \) is
    an example.
  </li>
</ul>
<p>
  It is worth reiterating here that the fourth result in the summary
  above follows from the fact that every field is an integral domain.
  These results reveal how structure and size interact in algebraic
  systems.  It is interesting how simply being finite guarantees that
  an integral domain is a field.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/from-finite-integral-domains-to-finite-fields.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Lemma for FTGT</title>
<link>https://susam.net/lemma-for-ftgt.html</link>
<guid isPermaLink="false">udjib</guid>
<pubDate>Sun, 09 Mar 2025 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<h2 id="introduction">Introduction<a href="#introduction"></a></h2>
<p>
  This post illustrates a key lemma that is used in proving
  the <em>fundamental theorem of Galois theory</em> (FTGT).  Note that
  FTGT is not covered in this post.  The focus of this post is on
  understanding and proving this lemma only.  Here is the lemma from
  the book <em>Galois Theory</em>, 5th ed. by Stewart (2023):
</p>
<div class="highlight">
  <p>
    <strong>Lemma 12.1.</strong>
    <em>
      Suppose that \( L/K \) is a field extension, \( M \) is an
      intermediate field, and \( \tau \) is a \( K \)-automorphism of \(
      L.  \)  Then \( \tau M^* \tau^{-1} = \tau(M)^{*}.  \)
    </em>
  </p>
</div>
<p>
  The notation \( M^* \) denotes the group of all
  \( M \)-automorphisms of \( L \) with composition as the group
  operation.  Note that Stewart writes \( \tau(M)^{*} = \tau M^*
  \tau^{-1} \) while stating the lemma but I have reversed the LHS and
  RHS to maintain consistency with the equations that appear in the
  discussion below.
</p>
<p>
  To build intuition for this lemma, I'll first present an
  illustration, followed by a proof.  The discussion below assumes
  familiarity with field extensions and field automorphisms, as
  several notations and results from these areas will be used
  implicitly without detailed justification.  This post is meant to
  serve as a set of notes on the lemma, not a comprehensive tutorial.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#illustration">Illustration</a>
    <ul>
      <li><a href="#concrete-example">Concrete Example</a></li>
      <li><a href="#lhs-subset-of-rhs">LHS &sube; RHS</a></li>
      <li><a href="#lhs-superset-of-rhs">LHS &supe; RHS</a></li>
      <li><a href="#lhs-equals-rhs">LHS = RHS</a></li>
    </ul>
  </li>
  <li><a href="#proof">Proof</a></li>
</ul>
<h2 id="illustration">Illustration<a href="#illustration"></a></h2>
<h3 id="concrete-example">Concrete Example<a href="#concrete-example"></a></h3>
<p>
  Let \( L = \mathbb{Q}(\sqrt{2}, \sqrt{3}), \) \( K = \mathbb{Q} \)
  and \( M = \mathbb{Q}(\sqrt{2}).  \)  Note that

  \begin{align*}
    L &amp;= \{ a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6} : a, b, c, d \in \mathbb{Q} \}, \\
    M &amp;= \{ k + l \sqrt{2} : k, l \in \mathbb{Q} \}.
  \end{align*}

  Now the group of \( K \)-automorphisms of \( L \) is

  \[
    K^* = \{\phi_1, \phi_2, \phi_3, \phi_4 \}
  \]

  where each \( \phi_i \) is given by

  \begin{align*}
    \phi_1 &amp;:
    a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6} \mapsto
    a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6}, \\

    \phi_2 &amp;:
    a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6} \mapsto
    a - b \sqrt{2} + c \sqrt{3} - d \sqrt{6}, \\

    \phi_3 &amp;:
    a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6} \mapsto
    a + b \sqrt{2} - c \sqrt{3} - d \sqrt{6}, \\

    \phi_4 &amp;:
    a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6} \mapsto
    a - b \sqrt{2} - c \sqrt{3} + d \sqrt{6}.
  \end{align*}

  Then \( M^* = \{ \phi_1, \phi_3 \}.  \)  Let \( \tau = \phi_2.  \)
  Then

  \begin{align*}
  \tau(M)
    &amp;= \{ \tau(x) : x \in \mathbb{M} \} \\
    &amp;= \{ \tau(k + l \sqrt{2}) : k, l \in \mathbb{Q} \} \\
    &amp;= \{ k - l \sqrt{2} : k, l \in \mathbb{Q} \}.
  \end{align*}

  Note that in this case we ended up with \( \tau(M) = M \) but we
  will be careful not to utilise this fact.  We will ensure that the
  steps below work without assuming \( \tau(M) = M.  \)  Next we find

  \begin{equation}
    \tau(M)^* = \{ \phi_1, \phi_3 \}.
    \label{eq-tau-m-ast}
  \end{equation}

  Now

  \begin{align*}
    \tau M^* \tau^{-1}
    &amp;= \{ \tau \gamma \tau^{-1} : \gamma \in {M^*} \} \\
    &amp;= \{ \tau \phi_1 \tau^{-1}, \tau \phi_3 \tau^{-1} \}.
  \end{align*}

  Let us now find out how each element of \( \tau M^* \tau^{-1} \)
  transforms the elements of \( L.  \)  For all \( a + b \sqrt{2} + c
  \sqrt{3} + d \sqrt{6} \in L, \) we get

  \begin{align*}
    (\tau \phi_1 \tau^{-1})(a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6})
    &amp;= (\tau \phi_1)(a - b\sqrt{2} + c\sqrt{3} - d\sqrt{6}) \\
    &amp;= \tau (a - b\sqrt{2} + c\sqrt{3} - d\sqrt{6}) \\
    &amp;= a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}).
  \end{align*}

  Therefore

  \[
    \tau \phi_1 \tau^{-1} = \phi_1.
  \]

  Similarly,

  \begin{align*}
    (\tau \phi_3 \tau^{-1})(a + b \sqrt{2} + c \sqrt{3} + d \sqrt{6})
    &amp;= (\tau \phi_3)(a - b\sqrt{2} + c\sqrt{3} - d\sqrt{6}) \\
    &amp;= \tau (a - b\sqrt{2} - c\sqrt{3} + d\sqrt{6}) \\
    &amp;= a + b\sqrt{2} - c\sqrt{3} - d\sqrt{6}.
  \end{align*}

  Therefore

  \[
    \tau \phi_3 \tau^{-1} = \phi_3.
  \]

  We have shown that

  \begin{equation}
    \tau M^* \tau^{-1} = \{ \phi_1, \phi_3 \}.
    \label{eq-tau-coset}
  \end{equation}

  From \( \eqnref{eq-tau-m-ast}{1} \) and \( \eqnref{eq-tau-coset}{2} \)
  we see that

  \[
     \tau M^* \tau^{-1} = \tau(M)^*.
  \]

  Since we are working with a concrete example of \( \tau \) here, we
  know exactly how it behaves, so we succeeded in demonstrating the
  above equality.  However, in a general proof, \( \tau \) is going to
  be an arbitrary \( K \)-automorphism of \( L, \) so we cannot know
  exactly how it behaves and as a result, we cannot obtain the above
  equation directly.  Therefore, in a general proof, we we will first
  show that \( \tau M^* \tau^{-1} \subseteq \tau(M)^* \) and then we
  will show that \( \tau M^* \tau^{-1} \supseteq \tau(M)^* \) in order
  to prove the above equation.
</p>
<h3 id="lhs-subset-of-rhs">LHS &sube; RHS<a href="#lhs-subset-of-rhs"></a></h3>
<p>
  Once again, let us see how each element of \( \tau M^* \tau^{-1} \)
  transforms the elements of \( \tau(M).  \)  Note that this time we
  are not going to examine how they transform arbitrary elements of \(
  L.  \)  We are only going to see how they transform the elements of
  \( \tau(M).  \)  For all \( k - l \sqrt{2} \in \tau(M), \) we get

  \begin{align*}
    (\tau \phi_1 \tau^{-1})(k - l \sqrt{2})
    &amp;= (\tau \phi_1)(k + l \sqrt{2}) \\
    &amp;= \tau(k + l \sqrt{2}) \\
    &amp;= k - l \sqrt{2}.
  \end{align*}

  Similarly, for all \( k - l \sqrt{2} \in \tau(M), \) we get

  \begin{align*}
    (\tau \phi_3 \tau^{-1})(k - l \sqrt{2})
    &amp;= (\tau \phi_3)(k + l \sqrt{2}) \\
    &amp;= \tau(k + l \sqrt{2}) \\
    &amp;= k - l \sqrt{2}.
  \end{align*}

  Note above that both \( \phi_1 \) and \( \phi_3 \) fix \( k + l
  \sqrt{2} \in M \) because \( \phi_1, \phi_2 \in M^*, \) the set of
  \( M \)-automorphisms of \( L.  \)  This detail will be used in the
  general proof.
</p>
<p>
  Since both \( \tau \phi_1 \tau^{-1} \) and
  \( \tau \phi_3 \tau^{-1} \) fix the elements of \( \tau(M), \) they
  are both \( \tau(M) \)-automorphisms of \( L.  \)  Therefore \( \tau
  M^* \tau^{-1} \subseteq \tau(M)^{*}.  \)
</p>
<h3 id="lhs-superset-of-rhs">LHS &supe; RHS<a href="#lhs-superset-of-rhs"></a></h3>
<p>
  Consider the set \( \tau^{-1} \tau(M)^* \tau \) and examine how its
  elements transform the elements of \( M.  \)  For all \( k + l
  \sqrt{2} \in M, \) we get

  \begin{align*}
    (\tau^{-1} \phi_1 \tau)(k + l \sqrt{2})
    &amp;= (\tau^{-1} \phi_1)(k - \sqrt{2}) \\
    &amp;= \tau^{-1}(k - \sqrt{2}) \\
    &amp;= k + l \sqrt{2}.
  \end{align*}

  Similarly, for all \( k + l \sqrt{2} \in M, \) we get

  \begin{align*}
    (\tau^{-1} \phi_3 \tau)(k + l \sqrt{2})
    &amp;= (\tau^{-1} \phi_3)(k - \sqrt{2}) \\
    &amp;= \tau^{-1}(k - \sqrt{2}) \\
    &amp;= k + l \sqrt{2}.
  \end{align*}

  Here both \( \phi_1 \) and \( \phi_3 \) fix \( k - l \sqrt{2} \in
  \tau(M) \) because \( \phi_1, \phi_2 \in \tau(M)^*, \) the set of \(
  \tau(M) \)-automorphisms of \( L.  \)
</p>
<p>
  Since both \( \tau^{-1} \phi_1 \tau \) and
  \( \tau^{-1} \phi_3 \tau \) fix the elements of \( M, \) they are
  both \( M \)-automorphisms of \( L.  \)  Therefore \( \tau^{-1}
  \tau(M)^* \tau \subseteq M^* \) which implies \( \tau M^* \tau^{-1}
  \supseteq \tau(M)^*.  \)
</p>
<h3 id="lhs-equals-rhs">LHS = RHS<a href="#lhs-equals-rhs"></a></h3>
<p>
  The previous two sections complete the illustration of the lemma
  with the chosen example.  We have shown that \( \tau M^* \tau^{-1}
  \subseteq \tau(M)^{*} \) and \( \tau M^* \tau^{-1} \supseteq
  \tau(M)^*.  \)  Therefore \( \tau M^* \tau^{-1} = \tau(M)^*.  \)
</p>
<h2 id="proof">Proof<a href="#proof"></a></h2>
<p>
  The ideas presented in the previous sections will now be extended to
  formulate a general proof.  For clarity, the lemma is stated once
  again below before proceeding with the proof.
</p>
<p>
  <strong>Lemma 12.1.</strong>
  <em>
    Suppose that \( L/K \) is a field extension, \( M \) is an
    intermediate field, and \( \tau \) is a \( K \)-automorphism of \(
    L.  \)  Then \( \tau M^* \tau^{-1} = \tau(M)^{*}.  \)
  </em>
</p>
<p>
  <em>Proof.</em>

  For all \( \gamma \in M^*, \) \( x' \in \tau(M), \) we use the
  notation \( x = \tau^{-1}(x') \in M \) and get

  \[
    (\tau \gamma \tau^{-1})(x') = (\tau \gamma)(x) = \tau(x) = x'.
  \]

  In the second equality above, we have used the fact that \( \gamma
  \in M^* \) which implies that \( \gamma \) is an \( M \)-automorphism
  of \( L \) which allows us to conclude that \( \gamma(x) = x \) for
  \( x \in M.  \)  Since every \( \tau \gamma \tau^{-1} \in \tau M^*
  \tau^{-1} \) fixes all elements \( x' \in \tau(M), \) each \( \tau
  \gamma \tau^{-1} \) must be a \( \tau(M) \)-automorphism of \( L.  \)
  Thus \( \tau M^* \tau^{-1} \subseteq \tau(M)^*.  \)
</p>
<p>
  Similarly, for all \( \gamma' \in \tau(M)^*, \) \( x \in M, \) we
  use the notation \( x' = \tau(x) \in \tau(M) \) and get

  \[
  (\tau^{-1} \gamma' \tau)(x) = (\tau^{-1} \gamma')(x') = \tau^{-1}(x') = x.
  \]

  In the second equality above, we have used the fact that \( \gamma'
  \in \tau(M)^* \) which implies that \( \gamma' \) is an
  \( \tau(M) \)-automorphism of \( L \) which allows us to conclude
  that \( \gamma'(x') = x' \) for \( x' \in \tau(M).  \)  Since every
  \( \tau^{-1} \gamma' \tau \in \tau^{-1} \tau(M)^* \tau \) fixes all
  elements \( x \in M, \) each \( \tau^{-1} \gamma' \tau \) must be an
  \( M \)-automorphism of \( L.  \)  Thus \( \tau^{-1} \tau(M)^* \tau
  \subseteq M^*.  \)  This implies \( \tau M^* \tau^{-1} \supseteq
  \tau(M)^*.  \)
</p>
<p>
  We have shown that \( \tau M^* \tau^{-1} \subseteq \tau(M)^* \) and
  \( \tau M^* \tau^{-1} \supseteq \tau(M)^*.  \)  Therefore \( \tau M^*
  \tau^{-1} = \tau(M)^*.  \)
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/lemma-for-ftgt.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Function</title>
<link>https://susam.net/function.html</link>
<guid isPermaLink="false">talpc</guid>
<pubDate>Sun, 20 Oct 2024 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  In mathematics, a function \( f \) from a set \( X \) to a set \(
  Y \) is a relation that associates each element of \( X \) with
  exactly one element of \( Y.  \)  This page describes the commonly
  used notation, terminology and concepts pertaining to functions.
</p>
<h2 id="contents">Contents<a href="#contents"></a></h2>
<ul>
  <li><a href="#definition">Definition</a></li>
  <li><a href="#notation">Notation</a></li>
  <li><a href="#domain-codomain-and-image">Domain, Codomain and Image</a></li>
  <li><a href="#injection-surjection-and-bijection">Injection, Surjection and Bijection</a></li>
</ul>
<h2 id="definition">Definition<a href="#definition"></a></h2>
<p>
  A function \( f \) from a set \( X \) to a set \( Y \) is a binary
  relation \( R \) that satisfies the following conditions:
</p>
<ul>
  <li>
    \( R \subseteq \{ (x, y) \mid x \in X, y \in Y \} = X \times Y.  \)
  </li>
  <li>
    For every \( x \in X, \) there exists \( y \in Y \) such that \(
    (x, y) \in R.  \)
  </li>
  <li>
    If \( (x, y) \in R \) and \( (x, z) \in R, \) then \( y = z.  \)
  </li>
</ul>
<p>
  The set \( X \) is called the domain of \( f \) and the set \( Y \)
  is called the codomain of \( f.  \)  The relation \( R \) is also
  known as the graph of \( f.  \)
</p>
<h2 id="notation">Notation<a href="#notation"></a></h2>
<p>
  Let \( f \) be a function from a set \( X \) to a set \( Y.  \)  Then
  the name \( f \) represents the function and the notation \( f(x) \)
  represents the application of the function to the argument \( x, \)
  i.e. \( f(x) \) represents the value of \( f \) for the element \( x
  \in X.  \)  In other words, for all \( x \in X, \) we have \( (x,
  f(x)) \in R.  \)
</p>
<p>
  A function \( f \) with domain \( X \) and codomain \( Y \) is also
  written as \( f : X \to Y.  \)  The function \( f \) may also be
  written as \( x \mapsto f(x).  \)  This notation specifies a function
  that maps \( x \) to \( f(x).  \)
</p>
<p>
  Formally, \( f(x) \) denotes the application of the function \( f \)
  to the argument \( x.  \)  However, in practice, it is common to use
  the expression \( f(x) \) to refer to both the function itself and
  its output for a given \( x, \) which is a slight deviation from
  strict notation.  Similarly, the function \( x \mapsto g(x) \) is
  often written as \( f(x) = g(x).  \)  For example, the function \( x
  \mapsto x^2 - 1 \) may also be written as \( f(x) = x^2 - 1.  \)
</p>
<p>
  Consider a function \( f \) that returns the square of a real
  number.  The following are common notations used to define this
  function, roughly ordered from the most formal form to the least
  formal one:
</p>
<ul>
  <li>
    \( f : \mathbb{R} \to \mathbb{R} ; \; x \mapsto x^2, \)
  </li>
  <li>
    \( f : \mathbb{R} \to \mathbb{R} : x \mapsto x^2, \)
  </li>
  <li>
    \( f : x \mapsto x^2, \)
  </li>
  <li>
    \( f: \mathbb{R} \to \mathbb{R} \) where \( f(x) = x^2, \)
  </li>
  <li>
    \( f(x) = x^2.  \)
  </li>
</ul>
<h2 id="domain-codomain-and-image">Domain, Codomain and Image<a href="#domain-codomain-and-image"></a></h2>
<p>
  The <em>domain</em> of a function is the set of all values for which
  the function is defined.
</p>
<p>
  A <em>codomain</em> of a function \( f \) is a set within which the
  values \( f(x) \) for all \( x \in X \) must lie, where \( X \) is
  the domain of \( f.  \)
</p>
<p>
  The <em>image</em> of a function \( f \) is the set \( \{ f(x) \mid
  x \in X \} \) where \( X \) is the domain of \( f.  \)
</p>
<p>
  The term <em>range</em> is often used as a synonym of image.
  However the use of this term is inconsistent across literature.
  Some old books use the term range to mean codomain while other books
  use this term to mean the image.  Therefore it is best to use the
  term image because it is free from such ambiguity.
</p>
<h2 id="injection-surjection-and-bijection">Injection, Surjection and Bijection<a href="#injection-surjection-and-bijection"></a></h2>
<p>
  A function \( f : X \to Y \) is <em>injective</em> if \( \forall a,
  b \in X, a \neq b \implies f(a) \neq f(b).  \)  A function is
  injective, if each element of the codomain is mapped to by <em>at
  most</em> one element of the domain.  An injective function is also
  known as a <em>one-to-one</em> function or an injection.
</p>
<p>
  A function \( f : X \to Y \) is <em>surjective</em> if \( \forall y
  \in Y, \exists x \in X \) such that \( y = f(x).  \)  A function is
  surjective, if each element of the codomain is mapped to by <em>at
  least</em> one element of the domain.  A surjective function is also
  known as an <em>onto</em> function or a surjection.
</p>
<p>
  A function \( f : X \to Y \) is <em>bijective</em> if \( \forall y
  \in Y, \) there exists exactly one \( x \in X, \) such that \( y =
  f(x).  \)  A function is bijective, if each element of the codomain
  is mapped to by <em>exactly</em> one element of the domain.  A
  bijective function is also known as a <em>one-to-one
  correspondence</em> or bijection.  A bijection is both injective and
  surjective.  In other words, a bijection is both <em>one-to-one and
  onto</em>.
</p>
<p>
  The function \( f : \mathbb{R} \to \mathbb{R}; \; x \mapsto e^x \)
  is injective but not surjective.  It is injective because distinct
  values of \( x \) produce distinct values of \( e^x.  \)  However, it
  is not surjective as no value in the domain maps to negative numbers
  in the codomain, leaving some elements in the codomain unmapped.
</p>
<p>
  The function \( f : \mathbb{R} \to \mathbb{R}; \; x \mapsto x^3 - x \)
  is surjective but not injective.  It is surjective because every
  value in the codomain is mapped to by at least one value in the
  domain.  However, it is not injective, as distinct values in the
  domain can map to the same value in the codomain.  For example, \(
  f(-1) = f(0) = f(1) = 0.  \)
</p>
<p>
  The function \( f : \mathbb{R} \to \mathbb{R}; \; x \mapsto x + 1 \)
  is bijective.  It is both injective and surjective.  This function
  is invertible with the inverse given by the function \( f^{-1} :
  \mathbb{R} \to \mathbb{R}; \; x \mapsto x - 1.  \)
</p>
<p>
  The function \( f : \mathbb{R} \to \mathbb{R}; \; x \mapsto x^2 \)
  is neither injective nor surjective.  First, the function is not
  injective because distinct values in the domain can map to the same
  value in the codomain.  For example, \( f(-2) = f(2) = 4.  \)
  Additionally, the function is not surjective because no value in the
  domain maps to the negative numbers in the codomain.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/function.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/definition.html">#definition</a>
</p>
]]>
</description>
</item>
<item>
<title>Perron's Paradox</title>
<link>https://susam.net/perrons-paradox.html</link>
<guid isPermaLink="false">skrsn</guid>
<pubDate>Wed, 10 Apr 2024 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  Oskar Perron, a German mathematician, introduced Perron's paradox to
  illustrate the danger of assuming the existence of a solution to an
  optimisation problem.  The paradox works like this:
</p>
<div class="highlight">
  Let \( n \) be the largest positive integer.  Then either \( n =
  1 \) or \( n \gt 1.  \)  If \( n \gt 1, \) then \( n^2 \gt n, \)
  contradicting the definition of \( n.  \)  Hence \( n = 1.  \)
</div>
<p>
  We get this absurd result because of the incorrect assumption that
  there exists an integer that is the largest of all the integers.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/perrons-paradox.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Logarithm Notation</title>
<link>https://susam.net/logarithm-notation.html</link>
<guid isPermaLink="false">ipgnh</guid>
<pubDate>Fri, 05 Apr 2024 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  We know that the natural logarithm of a number \( x, \) i.e. the
  logarithm of \( x \) to the base \( e, \) is sometimes denoted as \(
  \ln x.  \)  It has other notations too.  For example, many
  mathematics textbooks just use the notation \( \log x \) after
  establishing once that this notation denotes the natural logarithm.
  The most descriptive notation is perhaps \( \log_e x \) but this is
  most definitely an overkill.  I have never seen any serious textbook
  use this notation.
</p>
<p>
  Let us focus on \( \ln x \) again.  Is it not peculiar?  What does
  \( \ln \) stand for really?  Logarithm natural?  Sounds very unnatural.
</p>
<p>
  Well, as a kid I learnt that \( \ln \) here stands for the Latin
  phrase "logarithmus naturalis".  It is only recently that I bothered
  to verify if this expansion of \( \ln x \) that I learnt as a kid is
  really true.  The most credible discussion of this that I could find
  online is this thread on Mathematics Stack Exchange:
  <a href="https://math.stackexchange.com/q/1694">math.stackexchange.com/q/1694</a>.
  The answer by Dan Velleman points us to page 277 of an 1875
  book <em>Lehrbuch der Mathematik</em> by Anton Steinhauser.  Quoting
  the relevant portion from the page:
</p>
<blockquote>
  Man pflegt nun, um Verwechslungen dieser beiden Systeme vorzubeugen,
  mit log.nat. a (gesprochen: logarithmus naturalis a) oder ln. a,
  oder am einfachsten mit la den natürlichen, mit log.brigg. a
  (gesprochen: Logarithmus briggus a) oder log.a, oder am einfachsten
  mit lg. a den gemeinen Logarithmus (von a) zu bezeichnen.
</blockquote>
<p>
  Translated to English, it says:
</p>
<blockquote>
  One is accustomed now, in order to prevent confusion between these
  two systems, to use log.nat. a (pronounced: logarithmus naturalis a)
  or ln. a, or most simply la for the natural, and log.brigg. a
  (pronounced: logarithmus briggus a) or log. a, or most simply lg. a
  to denote the common logarithm (of a).
</blockquote>
<p>
  So it does look like what I learnt as a kid is correct and the
  earliest possible reference of this the Internet is able to find for
  us is the 1875 book quoted above.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/logarithm-notation.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>
<item>
<title>Thurston's Paean</title>
<link>https://susam.net/thurstons-paean.html</link>
<guid isPermaLink="false">iipnj</guid>
<pubDate>Tue, 18 Jul 2023 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<p>
  I recently came across a beautiful and thoughtful answer on
  MathOverflow by the late mathematician William Thurston.  A brief
  background about him from
  the <a href="https://en.wikipedia.org/wiki/William_Thurston">Wikipedia
  article</a> about him:
</p>
<blockquote>
  <p>
    William Paul Thurston (October 30, 1946 &ndash; August 21, 2012)
    was an American mathematician.  He was a pioneer in the field of
    low-dimensional topology and was awarded the Fields Medal in 1982
    for his contributions to the study of 3-manifolds.
  </p>
  <p>
    Thurston was a professor of mathematics at Princeton University,
    University of California at Davis and Cornell University.  He was
    also a director of the Mathematical Sciences Research Institute.
  </p>
</blockquote>
<p>
  MathOverflow makes all answers posted to the website available under
  a Creative Commons licence.  In particular, all answers posted
  before 08 Apr 2011 (UTC) are available under the terms of the
  Creative Commons Attribution-ShareAlike 2.5 Generic (CC BY-SA 2.5)
  licence.  Thurston wrote the answer I am about to share on 30 Oct
  2010.  Due to the licence terms, this post too is available under
  the terms of the same licence.
</p>
<p>
  Thurston posted his answer while replying to a MathOverflow
  question:
  <a href="https://mathoverflow.net/q/43690"><em>What's a
  mathematician to do?</em></a>.  The question enquires about how an
  ordinary mathematician can contribute to mathematics.  Thurston's
  answer
  from <a href="https://mathoverflow.net/a/44213">mathoverflow.net/a/44213</a>
  is reproduced below:
</p>
<blockquote>
  <p>
    It's not <em>mathematics</em> that you need to contribute to.
    It's deeper than that: how might you contribute to humanity, and
    even deeper, to the well-being of the world, by pursuing
    mathematics?  Such a question is not possible to answer in a
    purely intellectual way, because the effects of our actions go far
    beyond our understanding.  We are deeply social and deeply
    instinctual animals, so much that our well-being depends on many
    things we do that are hard to explain in an intellectual way.
    That is why you do well to follow your heart and your passion.
    Bare reason is likely to lead you astray.  None of us are smart
    and wise enough to figure it out intellectually.
  </p>
  <p>
    The product of mathematics is clarity and understanding.  Not
    theorems, by themselves.  Is there, for example any real reason
    that even such famous results as Fermat's Last Theorem, or the
    Poincar&eacute; conjecture, really matter?  Their real importance
    is not in their specific statements, but their role in challenging
    our understanding, presenting challenges that led to mathematical
    developments that increased our understanding.
  </p>
  <p>
    The world does not suffer from an oversupply of clarity and
    understanding (to put it mildly).  How and whether specific
    mathematics might lead to improving the world (whatever that
    means) is usually impossible to tease out, but mathematics
    collectively is extremely important.
  </p>
  <p>
    I think of mathematics as having a large component of psychology,
    because of its strong dependence on human minds.  Dehumanized
    mathematics would be more like computer code, which is very
    different.  Mathematical ideas, even simple ideas, are often hard
    to transplant from mind to mind.  There are many ideas in
    mathematics that may be hard to get, but are easy once you get
    them.  Because of this, mathematical understanding does not expand
    in a monotone direction.  Our understanding frequently
    deteriorates as well.  There are several obvious mechanisms of
    decay.  The experts in a subject retire and die, or simply move on
    to other subjects and forget.  Mathematics is commonly explained
    and recorded in symbolic and concrete forms that are easy to
    communicate, rather than in conceptual forms that are easy to
    understand once communicated.  Translation in the direction
    conceptual -&gt; concrete and symbolic is much easier than
    translation in the reverse direction, and symbolic forms often
    replaces the conceptual forms of understanding.  And mathematical
    conventions and taken-for-granted knowledge change, so older texts
    may become hard to understand.
  </p>
  <p>
    In short, mathematics only exists in a living community of
    mathematicians that spreads understanding and breaths life into
    ideas both old and new.  The real satisfaction from mathematics is
    in learning from others and sharing with others.  All of us have
    clear understanding of a few things and murky concepts of many
    more.  There is no way to run out of ideas in need of
    clarification.  The question of who is the first person to ever
    set foot on some square meter of land is really secondary.
    Revolutionary change does matter, but revolutions are few, and
    they are not self-sustaining --- they depend very heavily on the
    community of mathematicians.
  </p>
</blockquote>
<p>
  In the comments to the answer, one of the commenters
  was <a href="https://users.cs.utah.edu/~suresh/">Suresh
  Venkatasubramanian</a> who was a professor in the School of
  Computing at the University of Utah back then.  He
  is <a href="https://vivo.brown.edu/display/suresh">now</a> a
  professor of Computer Science and Data Science at Brown University.
  In his <a href="https://mathoverflow.net/questions/43690/whats-a-mathematician-to-do/44213#comment271029_44213">comment</a>,
  Suresh proposed that this answer be called <em>Thurston's
  Paean</em>.  Here is his complete comment:
</p>
<blockquote>
  <p>
    This seems like an ideal counterpoint to Hardy's Lament.  I'm
    calling it Thurston's Paean :).  Seems poignant now that he has
    passed.
  </p>
</blockquote>
<p>
  Thurston's answer does appear to be a perfect complement to Hardy's
  lament in the 1940 essay <em>A Mathematician's Apology</em>.  While
  Hardy's lament is remarkably beautiful and introspective, it may
  also feel a little depressing at places.  Thurston's post on the
  other hand is full of hope and purpose that goes beyond the actual
  work of doing mathematics.  Indeed <em>Thurston's Paean</em> is a
  befitting title for his answer.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/thurstons-paean.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a> |
  <a href="https://susam.net/tag/miscellaneous.html">#miscellaneous</a> |
  <a href="https://susam.net/tag/quote.html">#quote</a>
</p>
]]>
</description>
</item>
<item>
<title>Integrating Factor</title>
<link>https://susam.net/integrating-factor.html</link>
<guid isPermaLink="false">cczvm</guid>
<pubDate>Wed, 10 Nov 2021 00:00:00 +0000</pubDate>
<description>
<![CDATA[
<h2 id="introduction">Introduction<a href="#introduction"></a></h2>
<p>
  One of the many techniques for solving ordinary differential
  equations involves using an <em>integrating factor</em>.  An
  integrating factor is a function that a differential equation is
  multiplied by to simplify it and make it integrable.  It almost
  appears to work like magic!
</p>
<h2 id="method">The Method<a href="#method"></a></h2>
<p>
  Let us first see how the integrating factor method works.  In this
  post, we will work with linear first-order ordinary differential
  equations of type

  \[
    \frac{dy}{dx} + y P(x) = Q(x)
  \]

  to discuss, reason about and illustrate this method.  We will also
  often use the Leibniz's notation \( dy/dx \) and the Lagrange's
  notation \( y'(x) \) or simply \( y' \) interchangeably as is
  typical in calculus.  They all mean the same thing: the derivative
  of the function \( y \) with respect to \( x.  \)  Thus the above
  differential equation may also be written as

  \[
    y' + y P(x) = Q(x).
  \]

  Given a differential equation of this form, we first find an
  integrating factor \( M(x) \) using the formula

  \[
    M(x) = e^{\int P(x) \, dx}.
  \]

  Then we multiply both sides of the differential equation by this
  integrating factor.  Now remarkably, the left-hand side (LHS)
  reduces to a single term consisting only of a derivative.  As a
  result, we can get rid of that derivative by integrating both sides
  of the equation and we then proceed to obtain a solution.
</p>
<h2 id="example">An Example<a href="#example"></a></h2>
<p>
  Here is an example that demonstrates the method of using an
  integrating factor.  Let us say we want to solve the differential
  equation

  \[
    y' + y \left( \frac{x + 1}{x} \right) = \frac{1}{x}.
  \]

  Indeed this is in the form \( y' + y P(x) = Q(x) \) with \( P(x) =
  (x + 1)/x \) and \( Q(x) = 1/x.  \)  We first obtain the integrating
  factor

  \[
    M(x)
    = e^{\int P(x) \, dx}
    = e^{\int (x + 1)/x \, dx}
    = e^{\int (1 + 1/x) \, dx}
    = e^{x + \ln x}
    = x e^x.
  \]

  Now we multiply both sides of the differential equation by this
  integrating factor and get

  \[
    y' x e^x + y (x + 1) e^x = e^x.
  \]

  The LHS can now be simplified to \( \frac{d}{dx} (y x e^x).  \)  This
  can be verified using the product rule for derivatives.  This
  simplification of the LHS is the remarkable feature of this method.
  Therefore the above equation can be written as

  \[
    \frac{d}{dx} (y x e^x) = e^x.
  \]

  Note that the expression on the LHS is a product of the function \(
  y \) and the integrating factor \( x e^x.  \)  We will discuss this
  observation in more detail a little later.  Let us first complete
  solving this differential equation.  Since the LHS is now a single
  term that consists of a derivative, obtaining a solution now simply
  involves integrating both sides with respect to \( x.  \)
  Integrating both sides we get

  \[
    y x e^x = e^x + C
  \]

  where \( C \) is the constant of integration.  Finally, we divide
  both sides by the integrating factor \( x e^x \) to get

  \[
    y = \frac{1}{x} + \frac{C}{x e^x}.
  \]

  We have now obtained a solution for the differential equation.  If
  we review the steps above, we will find that after multiplying both
  sides of the given differential equation by the integrating factor,
  the differential equation becomes significantly simpler and
  integrable.  In fact, after multiplying both sides of the given
  differential equation by the integrating factor, the LHS always
  becomes the derivative of the product of the function \( y \) and
  the integrating factor.  We will now see why this is so.
</p>
<h2 id="interesting-relationship">An Interesting Relationship<a href="#interesting-relationship"></a></h2>
<p>
  Consider once again the linear first-order differential equation

  \begin{equation}
    \label{eq-if-diff}
    y' + yP(x) = Q(x).
  \end{equation}

  We first find the integrating factor

  \begin{equation}
    \label{eq-if-integrating-factor}
    M(x) = e^{\int P(x)\, dx}.
  \end{equation}

  The integrating factor obtained like this satisfies an interesting
  relationship:

  \begin{equation}
    \label{eq-if-property}
    M'(x) = M(x) P(x).
  \end{equation}

  We can prove this relationship easily by differentiating both sides
  of \( \eqnref{eq-if-integrating-factor}{2} \) as follows:

  \[
    M'(x)
    = \frac{d}{dx} \left( e^{\int P(x)\, dx} \right)
    = e^{\int P(x)\, dx} \frac{d}{dx} \left( \int P(x)\, dx \right)
    = M(x) P(x).
  \]

  Note that we use the chain rule to work out the derivative above.
  This beautiful result is due to how the derivative of the
  exponential function works.  When we apply the chain rule to obtain
  the derivative of \( e^{f(x)} \) we get

  \[
    \frac{d}{dx} e^{f(x)} = e^{f(x)} f'(x).
  \]

  This nice property of the exponential function leads to the
  interesting relationship in \( \eqnref{eq-if-property}{3}.  \)
</p>
<h2 id="simplification-of-lhs">Simplification of LHS<a href="#simplification-of-lhs"></a></h2>
<p>
  Now let us multiply both sides of the differential equation \(
  \eqnref{eq-if-diff}{1} \) by the integrating factor \( M(x) \) By
  doing so, we get

  \[
    y' M(x) + y P(x) M(x) = Q(x) M(x).
  \]

  But from \( \eqnref{eq-if-property}{3} \) we know that \( P(x) M(x)
  = M'(x), \) so the above equation can be written as

  \[
    y' M(x) + y M'(x) = Q(x) M(x).
  \]

  Look what we have got on the LHS!  We have the expansion of \(
  \frac{d}{dx}(yM(x)) \) on the LHS.  By product rule of
  differentiation, we have
  \( \frac{d}{dx}(yM(x)) = y' M(x) + y M'(x).  \)  Therefore the above
  equation can be written as

  \[
    \frac{d}{dx}(yM(x)) = Q(x) M(x).
  \]

  The "magic" has occurred here!  Multiplying both sides of the
  differential equation by the integrating factor has led us to an
  equation that has got a single derivative only on the LHS.  As a
  result, finding the solution is now a simple matter of integrating
  both sides, i.e.

  \[
    y M(x) = \int Q(x) M(x) \, dx.
  \]

  Thus

  \[
    y = \frac{1}{M(x)} \int Q(x) M(x) \, dx.
  \]

  Note that the result of indefinite integral on the RHS will contain
  the constant of integration, which we will denote as \( C, \) so the
  final solution looks like

  \begin{equation}
    \label{eq-if-general-solution}
    y = \frac{1}{M(x)} \int Q(x) M(x) \, dx + \frac{C}{M(x)}.
  \end{equation}
</p>
<h2 id="illustration">Illustration<a href="#illustration"></a></h2>
<p>
  Let us illustrate the method and its magic with a very simple
  differential equation:

  \[
    y' + \frac{y}{x} = x.
  \]

  First we note that this equation is in the form
  \( y' + yP(x) = Q(x) \) with \( P(x) = 1/x \) and \( Q(x) = x.  \)
  We then find the integrating factor

  \[
  M(x)
  = e^{\int P(x) \, dx}
  = e^{\int \frac{1}{x} \, dx}
  = e^{\ln x}
  = x.
  \]

  Then we multiply both sides of the differential equation by the
  integrating factor to get

  \[
    y'x + y = x^2.
  \]

  Now indeed the LHS can be written down as a single derivative as
  shown below:

  \[
    \frac{d}{dx} yx = x^2.
  \]

  Note that the LHS is the derivative of the product of \( y \) and
  the integrating factor \( x.  \)  This is exactly what we discussed
  in the previous section.  We integrate both sides of the above
  equation to get

  \[
    yx = \frac{x^3}{3} + C.
  \]

  Finally we divide both sides by the integrating factor \( x \) to
  get

  \[
    y = \frac{x^2}{3} + \frac{C}{x}.
  \]

  We have arrived at the solution \( y(x) \) for the differential
  equation.
</p>
<h2 id="conclusion">Conclusion<a href="#conclusion"></a></h2>
<p>
  In this post, we used very simple and convenient differential
  equations that led to nice closed-form solutions.  In practice,
  differential equations can be quite complicated and may not always
  lead to closed-form solutions.  In such cases, we leave the result
  in the form of an expression that contains an unsolved integral.
  Such solutions may resemble the form shown in
  \( \eqnref{eq-if-general-solution}{4}.  \)
</p>
<p>
  The method of using integrating factors to solve differential
  equations can also be extended to linear higher-order differential
  equations.  That is something we did not discuss in this post.
  However, I hope that the intuition gained from understanding how and
  why this method works for linear first-order differential equations
  will be useful while studying such extensions of this method.
</p>
<!-- ### -->
<p>
  <a href="https://susam.net/integrating-factor.html">Read on website</a> |
  <a href="https://susam.net/tag/mathematics.html">#mathematics</a>
</p>
]]>
</description>
</item>


</channel>
</rss>
