Sorting in Emacs

By Susam Pal on 09 Aug 2023

In this article, we will perform a series of hands-on experiments that demonstrate the various Emacs commands that can be used to sort text in different ways. The Emacs and Elisp manuals document these commands well. Here, however, we will work through some concrete examples to illustrate how they behave.

Sorting Lines

Our first set of experiments demonstrates different ways to sort lines. Follow the steps below to perform these experiments.

  1. First create a buffer that has the following text:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    

    Let us pretend that each line is a record that represents some details about different persons. From left to right, we have each person's name, some sort of numerical ID, their current location, and their upcoming travel plan. For example, the first line says that Carol from London is planning to travel from London Heathrow (LHR) to San Francisco (SFO).

  2. Type C-x h to mark the whole buffer and type M-x sort-lines RET to sort lines alphabetically. The buffer looks like this now:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    
  3. Type C-x h followed by C-u M-x sort-lines RET to reverse sort lines alphabetically. The key sequence C-u specifies a prefix argument that indicates that a reverse sort must be performed. The buffer looks like this now:

    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
    
  4. Type C-x h followed by M-x sort-fields RET to sort the lines by the first field only. Fields are separated by whitespace. Note that the result is now slightly different from the result of M-x sort-lines RET presented in point 2 earlier. Here Bob from Paris comes before Bob from London because the sorting was performed by the first field only. The sorting algorithm ignored the rest of each line, so the two Bob lines kept the relative order they had before the sort. However, in point 2 earlier, Bob from London came before Bob from Paris because the sorting was performed by entire lines.

    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    
  5. Type C-x h followed by M-2 M-x sort-fields RET to sort the lines alphabetically by the second field. The key sequence M-2 here specifies a numeric argument that identifies the field we want to sort by. Note that 100 comes before 20 because we performed an alphabetical sort, not a numerical sort. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    
  6. Type C-x h followed by M-2 M-x sort-numeric-fields RET to sort the lines numerically by the second field. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    
  7. Type C-x h followed by M-3 M-x sort-fields RET to sort the lines alphabetically by the third field containing city names. The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    

    Note that we cannot supply the prefix argument C-u to this command to perform a reverse sort by a specific field because the prefix argument here is used to identify the field we need to sort by. If we do specify the prefix argument C-u, it would be treated as the numeric argument 4 which would sort the lines by the fourth field. However, there is a little trick to reverse sort lines by a specific field. The next point shows this.

  8. Type C-x h followed by M-x reverse-region RET. This reverses the order of lines in the region. Combined with the previous command, this effectively reverse sorts the lines by city names. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    
  9. Type C-x h followed by M-- M-2 M-x sort-fields RET to sort the lines alphabetically by the second field from the right (third from the left). Note that the first two key combinations are meta+- and meta+2. They specify the negative argument -2 to sort the lines by the second field from the right. The result looks like this:

    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
    
  10. Type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

    Then type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

    Finally type M-x sort-columns RET to sort the columns bounded by the column positions of mark and point (i.e., the last two columns). The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    
  11. Like before, type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

    Again, like before, type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

    Now type C-u M-x sort-columns RET to reverse sort the last two columns.

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    
  12. Warning: This step shows how not to use the sort-regexp-fields command. In most cases you probably do not want to do this. The next point shows a typical usage of this command that is correct in most cases.

    Type C-x h followed by M-x sort-regexp-fields RET [A-Z]*->\(.*\) RET \1 RET to sort by the destination airport. This command first matches the destination airport in each line in a regular expression capturing group (\(.*\)). Then we ask this command to sort the lines by the field matched by this capturing group (\1). The result looks like this:

    Dan    20   Tokyo   LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   HND->LHR
    Carol  200  London  CDG->LHR
    Bob    100  London  LHR->SFO
    

    Observe how all our travel records are messed up in this result. Now Dan from Tokyo is travelling from LCY to CDG instead of travelling from HND to LHR. Compare the result in this point with that of the previous point. This command has sorted the destination fields correctly, and it has also kept each source airport together with its destination airport. But the association between the first three columns and the last column (source and destination airports) is broken. This happened because the regular expression matches only the last column and we sorted by only the destination part of it. Only the part of each line that is matched by the regular expression moves around while the sorting is performed; everything else remains unchanged. This behaviour may be useful in some limited situations, but in most cases we want to keep the association between all the fields intact. The next point shows how to do this.

    Now type C-/ (or C-x u) to undo this change and revert the buffer to the previous good state. After doing this, the buffer should look like the result presented in the previous point.

  13. Assuming the state of the buffer is the same as that of the result in point 11, we will now see how to alter the previous step such that when we sort the lines by the destination field, entire lines move along with the destination fields. The trick is to ensure that the regular expression matches entire lines. To do so, we make a minor change in the regular expression. Type C-x h followed by M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET. The result looks like this:

    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    

    Now the lines are sorted by the destination field and Dan from Tokyo is travelling from HND to LHR.

  14. Type C-x h followed by M-- M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET to reverse sort the lines by the destination airport. Note that the first key combination is meta+- here. This key combination specifies a negative argument that results in a reverse sort. The result looks like this:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    
  15. Finally, note that we can always invoke shell commands on a region and replace the region with the output of the shell command. To see this in action, first prepare the buffer by typing M-< followed by C-k C-k C-y C-y to duplicate the first line of the buffer.

    Then type C-x h followed by C-u M-| sort -u RET to sort the lines but remove duplicate lines during the sort operation. The M-| key sequence invokes the command shell-command-on-region which prompts for a shell command, executes it, and usually displays the output in the echo area. If the output cannot fit in the echo area, then it displays the output in a separate buffer. However, if a prefix argument is supplied, say with C-u, then it replaces the region with the output. As a result, the buffer now looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    

    This particular problem of removing duplicates while sorting can also be accomplished by typing C-x h followed by M-x sort-lines RET and then C-x h followed by M-x delete-duplicate-lines RET. Nevertheless, it is useful to know that we can execute arbitrary shell commands on a region.
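
The difference between sorting a field as text and sorting it as a number (points 5 and 6), and the use of a negative field number followed by a reversal (points 8 and 9), can be modelled outside Emacs too. The following is only a rough Python sketch of that behaviour, not how Emacs implements it. The helper names field, sort_fields and sort_numeric_fields are made up for this illustration, and the sketch assumes every line has the same four whitespace-separated fields.

```python
# Rough Python model of field-based sorting (an illustration, not Emacs code).
RECORDS = """\
Carol  200  London  LHR->SFO
Dan    20   Tokyo   HND->LHR
Bob    100  London  LCY->CDG
Alice  10   Paris   CDG->LHR
Bob    30   Paris   ORY->HND
""".splitlines()


def field(line, n):
    """Return the nth whitespace-separated field of line; n is 1-based.

    A negative n counts from the right, like M-- M-2 in point 9.
    """
    parts = line.split()
    return parts[n - 1] if n > 0 else parts[n]


def sort_fields(lines, n):
    """Sort lines by field n, comparing the field as text."""
    return sorted(lines, key=lambda line: field(line, n))


def sort_numeric_fields(lines, n):
    """Sort lines by field n, comparing the field as an integer."""
    return sorted(lines, key=lambda line: int(field(line, n)))


by_text = sort_fields(RECORDS, 2)
by_number = sort_numeric_fields(RECORDS, 2)
# Sort by city, then reverse the lines, as in points 7 and 8.
by_city_desc = list(reversed(sort_fields(RECORDS, -2)))

print([field(line, 2) for line in by_text])       # ['10', '100', '20', '200', '30']
print([field(line, 2) for line in by_number])     # ['10', '20', '30', '100', '200']
print([field(line, 3) for line in by_city_desc])  # ['Tokyo', 'Paris', 'Paris', 'London', 'London']
```

Python's sorted is stable, so lines with equal keys keep their relative order. The results in point 4 suggest the same holds for the Emacs commands.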

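The contrast between points 12 and 13 comes down to which part of each line the regular expression matches, because only the matched text is moved. The following Python sketch imitates that behaviour under some assumptions: the function sort_regexp_fields below is a hypothetical stand-in written for this illustration, it treats the first match on each line as the record, and it expects every line to match. Python writes capturing groups as (...) where Emacs writes \(...\).

```python
import re

# Buffer state from point 11 above.
LINES = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]


def sort_regexp_fields(lines, record_regexp, key_group=1, reverse=False):
    """Reorder only the text matched by record_regexp, keyed on key_group.

    Illustrative helper, not the Emacs implementation. Text outside
    the match stays on the line where it was.
    """
    pattern = re.compile(record_regexp)
    matches = [pattern.search(line) for line in lines]
    ordered = sorted(matches, key=lambda m: m.group(key_group), reverse=reverse)
    # Put the i-th sorted record into the slot the i-th original record occupied.
    return [
        line[: old.start()] + new.group(0) + line[old.end():]
        for line, old, new in zip(lines, matches, ordered)
    ]


# The record is only the last column, so only that column moves (point 12).
partial = sort_regexp_fields(LINES, r"[A-Z]*->(.*)")
# The record is the whole line, so whole lines move (point 13).
whole = sort_regexp_fields(LINES, r".*->(.*)")

print("\n".join(partial))
print()
print("\n".join(whole))
```

With the first pattern, the first output line pairs Dan from Tokyo with LCY->CDG, which is the broken association described in point 12. With the second pattern, each line stays intact.
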
Sorting Paragraphs and Pages

We have covered most of the sorting commands mentioned in the Emacs manual in the previous section. Now we will switch gears and look at a few of the remaining ones. We will no longer sort individual lines but paragraphs and pages instead.

  1. First create a buffer with the content provided below. Note that the text below contains three form feed characters. In Emacs, they are displayed as ^L. Many web browsers do not display them. The ^L symbols that we see in the text below have been overlaid with CSS. But there are actual form feed characters next to those overlays. If you are viewing this post with any decent web browser, you can copy the text below into your Emacs and you should be able to see the form feed characters in Emacs. In case you do not, insert them yourself by typing C-q C-l.

    Emacs is an advanced, extensible, customisable,
    self-documenting editor.
    
    Emacs editing commands operate in terms of
    characters, words, lines, sentences, paragraphs,
    pages, expressions, comments, etc.
    
    We will use the term frame to mean a graphical
    window or terminal screen occupied by Emacs.
    
    At the very bottom of the frame is an echo area.
    The main area of the frame, above the echo area,
    is called the window.
    
    The cursor in the selected window shows the
    location where most editing commands take effect,
    which is called point.
    
    If you are editing several files in Emacs, each in
    its own buffer, each buffer has its own value of
    point.
    
    
  2. Our text has six paragraphs spread across three pages. Each form feed character represents a page break. Type C-x h followed by M-x sort-pages RET to sort the pages alphabetically. Note how the second page moves to the bottom because it begins with the letter "W". The buffer looks like this now:

    Emacs is an advanced, extensible, customisable,
    self-documenting editor.
    
    Emacs editing commands operate in terms of
    characters, words, lines, sentences, paragraphs,
    pages, expressions, comments, etc.
    
    The cursor in the selected window shows the
    location where most editing commands take effect,
    which is called point.
    
    If you are editing several files in Emacs, each in
    its own buffer, each buffer has its own value of
    point.
    
    We will use the term frame to mean a graphical
    window or terminal screen occupied by Emacs.
    
    At the very bottom of the frame is an echo area.
    The main area of the frame, above the echo area,
    is called the window.
    
    
  3. Finally, type C-x h followed by M-x sort-paragraphs RET to sort the paragraphs alphabetically. The buffer looks like this now:

    At the very bottom of the frame is an echo area.
    The main area of the frame, above the echo area,
    is called the window.
    
    Emacs editing commands operate in terms of
    characters, words, lines, sentences, paragraphs,
    pages, expressions, comments, etc.
    
    Emacs is an advanced, extensible, customisable,
    self-documenting editor.
    
    If you are editing several files in Emacs, each in
    its own buffer, each buffer has its own value of
    point.
    
    The cursor in the selected window shows the
    location where most editing commands take effect,
    which is called point.
    
    We will use the term frame to mean a graphical
    window or terminal screen occupied by Emacs.
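
The two steps above differ only in what counts as a record: a page ends at a form feed, while a paragraph ends at a blank line. The following Python sketch models that idea with shortened, made-up one-line versions of the six paragraphs. The functions sort_pages and sort_paragraphs are hypothetical helpers for this illustration. As a simplification, the paragraph sort here simply discards the form feeds, which is an assumption of the sketch and not a statement about where Emacs leaves them.

```python
FORM_FEED = "\f"  # displayed as ^L in Emacs

# Three pages of two paragraphs each, abbreviated for the example.
PAGES = [
    "Emacs is an advanced editor.\n\nEmacs editing commands operate on text units.\n",
    "We will use the term frame.\n\nAt the very bottom of the frame is an echo area.\n",
    "The cursor shows point.\n\nIf you are editing several files, each buffer has its own point.\n",
]
TEXT = "".join(page + FORM_FEED + "\n" for page in PAGES)


def sort_pages(text):
    """Sort the chunks that end at a form feed line, comparing them as text."""
    pages = [p for p in text.split(FORM_FEED + "\n") if p]
    return "".join(p + FORM_FEED + "\n" for p in sorted(pages))


def sort_paragraphs(text):
    """Sort the chunks separated by blank lines (form feeds are dropped here)."""
    plain = text.replace(FORM_FEED + "\n", "\n")
    paragraphs = [p.strip() for p in plain.split("\n\n") if p.strip()]
    return "\n\n".join(sorted(paragraphs)) + "\n"


page_starts = [p.split()[0] for p in sort_pages(TEXT).split(FORM_FEED + "\n") if p]
paragraph_starts = [p.split()[0] for p in sort_paragraphs(TEXT).split("\n\n")]

print(page_starts)       # ['Emacs', 'The', 'We']
print(paragraph_starts)  # ['At', 'Emacs', 'Emacs', 'If', 'The', 'We']
```

The page that begins with "We" ends up last when sorting pages, and the paragraph that begins with "At" comes first when sorting paragraphs, in line with the two results above.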
    
    

References

To read and learn more about the sorting commands described above, refer to the Sorting nodes of the Emacs manual and the Elisp manual.

Within Emacs, type M-: (info "(emacs) Sorting") RET or M-: (info "(elisp) Sorting") RET to read these manuals.

Further, the documentation strings for these commands have useful information too. Use the key sequence C-h f to look up the documentation strings. For example, type C-h f sort-regexp-fields RET to look up the documentation string for the sort-regexp-fields command.
