Wednesday, April 20, 2016

Terminal Multiplexers: Simplified and Unified

I have a love-hate relationship with terminal multiplexers. I've been using screen for years, but mostly on remote servers, and mostly just to keep something running on logout and later get back to it again. But on my home machine or my laptop, I've avoided terminal multiplexers like the plague, mostly because of their strange (or is "horrid" more appropriate?) user interfaces.

For a really long time now, I've simply used LXTerminal and its tabs as a crutch, but I've recently grown rather tired of that approach. When you're writing (more or less complicated) client/server software, it really pays off to have both ends running side-by-side: switching tabs, even with a quick key combination, gets old fast. Also LXTerminal lacks quite a few features, true-color among them. What I really wanted to use was st, but that lean beast doesn't even have a scrollback buffer (so forget about tabs or a contextual menu).

Wait, so why don't I just use a tiling window manager like dwm and open several terminals? Sadly I've been a spoiled Openbox person since 2008 and a spoiled "overlapping windows" person since 1987 or so (thank the Amiga for that). I like the idea of a tiling window manager exactly for a bunch of terminals and not much else. In the long run I may actually become a dwm nut, but not just yet.

So I had to face it, the time was right to actually learn a terminal multiplexer for real. But which one? For text editors there's an easy (if controversial) answer: just learn vi or emacs, both of those you're likely to find on any UNIX system you may ever have to work with. (Heck even busybox has a vi clone.) That seems to suggest that I should spend time on learning screen for real: It's the oldest terminal multiplexer out there, so it's most likely to be available just about everywhere.

The only problem is that it sucks. If you want to see why it sucks, just start htop in screen. I don't care if that's a terminfo problem or not, in screen the htop interface looks messed up. It's still usable, but you know, the eyes want to be pleased as well (yes, even in the terminal). But it sucks even more: Just split the terminal and then run cat </dev/urandom in one of the splits. Chances are you'll get random garbage spewed all over the other split as well. Doesn't inspire much confidence in how well-isolated those splits are, does it?

So of course I tried tmux next, a much more recent project and maybe "better" because it doesn't have as much buggy legacy code. Sadly it immediately fails the htop test as well, but at least it does a little better when hit with the random hammer: No more spewing into the other split, but still the status line gets trashed and once you stop hammering some of the UI elements are just a little out of it. A little groggy most likely?

One more attempt, let's try dvtm instead. If you don't know, that's a dwm clone inside the terminal. And wow, it passes both the htop test and the random hammer with flying colors, only a few of the border elements get trashed. On the downside, it's least likely to be installed on a random UNIX system you need to work on. That, and it has a bunch of opinionated layout assumptions that may not be to your liking. I, however, like them just fine, at least for the most part.

At this point I started asking myself what I actually need in my terminal multiplexer, and I arrived at a rather short list of features:

  • create a new terminal window with a shell running in it (splitting the terminal area horizontally or vertically or at least sanely)
  • switch from one terminal window to another quickly and reliably
  • destroy an existing terminal window and whatever is running in it

That's really it as far as interactive use on my home machine or laptop is concerned. Sure, being able to detach and reattach sessions later is great, especially when working remotely. But for that use-case I already have screen wired into my brain and fingers. (Also it turns out that dvtm doesn't persist sessions in any which way, so I'd need to use another tool like dtach or (more likely) abduco with dvtm.)

So what am I to do? Which one of these am I to learn and put deep into my muscle memory over the next few weeks? That's when it struck me: I can probably learn none of them and all of them at the same time! After all, I have very few requirements (see above) and as luck would have it, all of the tools are highly configurable. Especially in regards to their keybindings! Can it be done?

Can I configure all three tools in such a way that one set of keybindings will get me all the features I actually need?

Let's see what we find in each program for each use-case, then try to generalize from there. Let's start with screen:

  • CTRL-a-c create a new full-sized window (and start a shell)
  • CTRL-a-n switch to the next full-sized window
  • CTRL-a-p switch to the previous full-sized window
  • CTRL-a-S split current region horizontally (no shell started)
  • CTRL-a-| split current region vertically (no shell started)
  • CTRL-a-TAB switch to the next region
  • CTRL-a-X remove current region (shell turns into full-sized window)
  • CTRL-a-k kill current window/region (including the shell)
  • CTRL-a-d detach screen

So obviously the concepts of window and region are a bit strange, but for better or worse that's what we get. I am pretty sure that regions (splitting one terminal into several "subterminals") was added later? In terms of my uses cases it's a bit sad that creating a new region does not also immediately launch a shell in it, instead I have to move to the new region with CTRL-a-TAB and then hit CTRL-a-c to do that manually. It's also strange that removing a region doesn't kill the shell running in it but turns that shell into a "window" instead. And of course navigating windows is different from navigating regions. Let's see how tmux does it:

  • CTRL-b-c create new full-sized window (and start a shell)
  • CTRL-b-n switch to next full-sized window
  • CTRL-b-p switch to previous full-sized window
  • CTRL-b-" split current pane horizontally (and start a shell)
  • CTRL-b-% split current pane vertically (and start a shell)
  • CTRL-b-o switch to next pane
  • CTRL-b-UP/DOWN/LEFT/RIGHT switch to pane in that direction
  • CTRL-b-x kill current pane (as well as shell running in it)
  • CTRL-b-& kill current window (as well as shell running in it)
  • CTRL-b-d detach tmux

So we can pretty much do the same things, but not quite. That's annoying. Note how creating a new "pane" launches a shell and how killing a "pane" kills the shell running in it? That's exactly what screen doesn't do. Also note that tmux has a notion of "direction" with regards to panes, something completely lacking in screen. And note that "windows" are still different from "panes" in terms of navigation. Finally, what's dvtm like?

  • CTRL-g-c create a new window (and start a shell, splitting automatically)
  • CTRL-g-x close current window (and kill the shell running in it)
  • CTRL-g-TAB switch to previously selected window
  • CTRL-g-j switch to next window
  • CTRL-g-k switch to previous window
  • CTRL-g-SPACE switch between defined layouts
  • CTRL-\ detach (using abduco, no native session support)

Note that dvtm's notion of "window" unifies what the other two tools call "window" (full-sized) and "region" or "pane" (split-view). But the biggest difference is that dvtm actually has an opinion about what the layout should be. You can choose between a few predefined layouts (with CTRL-g-SPACE) but that's it. One of these layouts shows one "window" at a time, but the navigation between "windows" stays consistent regardless of how many you can see in your terminal.

So what's the outcome here? Personally, I very much prefer what dvtm does over what tmux does over what screen does. Obviously your mileage may vary, but what I'll try to do here is make all the programs behave (more or less, and within reason) like dvtm.

Of course there are plenty of issues to resolve. For starters, two of the programs care about whether a split is vertical (top/bottom) or horizontal (left/right) while one doesn't. The simplest solution I can think of is to have separate key bindings for each but to map them to the same underlying command in dvtm. This way I'll learn the key bindings for the worst case while enjoying the features of the best case.

Next screen doesn't automatically start a shell in a new split while the other two do. Luckily that's easy to resolve by writing new and slightly more complex key bindings. There are a few additional issues we'll get into below when we talk about how to configure each program, but first we need to "switch gears" as it were: It's time to do some very basic user interface design.

Let's face the most important decision first, namely which "leader key" should I use? We have CTRL-a, CTRL-b, and CTRL-g as precedents, but we don't necessarily have to follow those.

I played with various keys for a while to see what's easiest for me to trigger. My favorite "leader keys" would have to be CTRL-c and CTRL-d. Sadly, those already have "deep meaning" for shells, so I don't want to use either of them. The next-best keys (for my hands anyway) would be CTRL-e and CTRL-f. Sadly, CTRL-e is used for "move to end of line" in bash and I tend to use that quite a bit; screen's CTRL-a sort of shares that problem. In terms of "finger distance" I also find CTRL-a "too close and squishy" and CTRL-b/CTRL-g "too far and stretchy" for myself.

Based on this very unscientific analysis, I ended up picking CTRL-f as the "unified leader key" for all three multiplexers. The drawback of this is that I'll train my muscle memory for "the wrong key" but between three programs, any key would have been "the right key" only for one of them anyway. Big deal. (Yes, in bash CTRL-f triggers "move one character forward" but that's not really a big loss either, is it?)

With the leader settled, how shall we map each use-case to actual key strokes? The guiding principle I'll follow is to rank things by how often I am likely to do them. (Of course that's just a guess for now, if it turns out to be wrong I'll adapt later.) I believe that the ranking is as follows:

  1. Navigate between splits.
  2. Create a new split.
  3. Remove the current split.

If that's true, then moving to the next split should be on the same key as the leader, so the sequence would be CTRL-f-f. That even works mnemonically if we interpret "f" as "forward" or something. Now let's find keys to create splits. At first I thought "something around f" would be good, the idea being that I'd hit the key with the same finger I used to hit "f" a moment before. But that's actually slower than hitting a key with another finger on the same hand. The way I type, I hit "f" with my index finger and my middle finger naturally hovers over "w" in the process. So CTRL-f-w it is, after all that's again a mnemonic, for "window" this time. But what kind of split should it be, horizontal or vertical? I settled on horizontal because in dvtm's default layout, the first split will also be horizontal (left/right). So I'll get some consistency after all. The other key that's quick for me to hit is "e" so CTRL-f-e shall be the vertical (top/bottom) split. That leaves removing the current split, and for that CTRL-f-r is good enough, again with some mnemonic goodness. Here's the summary:

  • CTRL-f-f switch to next split
  • CTRL-f-w split horizontally (and start a new shell)
  • CTRL-f-e split vertically (and start a new shell)
  • CTRL-f-r remove current split (and terminate the shell)

Sounds workable to me. All that remains is actually making the three programs behave "similarly enough" for those keystrokes. As per usual, we'll start with screen. The file to create/edit is ~/.screenrc and here's what we'll do:

hardstatus ignore
startup_message off
escape ^Ff
bind f eval "focus"
bind ^f eval "focus"
bind e eval "split" "focus" "screen"
bind ^e eval "split" "focus" "screen"
bind w eval "split -v" "focus" "screen"
bind ^w eval "split -v" "focus" "screen"
bind r eval "kill" "remove"
bind ^r eval "kill" "remove"

I'll admit right away that my understanding of "hardstatus" is lacking. What I am hoping this command does is turn off any status line that would cost us terminal real estate: I want to focus on the applications I am using, not on the terminal multiplexer and what it thinks of the world. The "startup_message" bit just makes sure that there's no such thing; many distros disable it by default anyway, but for some strange reason Ubuntu leaves it on. (I am all for giving people credit for their work, but that message requires a key press to go away and therefore it's annoying as heck.)

The remaining lines simply establish the key bindings described above, albeit in a somewhat repetitive manner: I define each key twice, once with CTRL and once without, that way it doesn't matter how quickly I release the CTRL key during the sequence of key presses. (If you have a more concise way of doing the same thing, please let me know!)

We should briefly look at what we lose compared to the default key bindings. Our use of "f" overrides "flow control" and luckily I cannot think of many reasons why there should be "flow control" these days. Our use of "w" overrides "list of windows" but since I intend to mostly use splits that's not a big problem. Our use of "e" comes for free because it's not used in the default configuration. Finally, our use of "r" overrides line-wrapping, something I don't imagine caring about a lot. So we really don't lose too much of the basic functionality here, do we?

Next we need to configure tmux. The file to create/edit is ~/.tmux.conf and here's how that one works:

set-option -g status off
set-option -g prefix C-f
unbind-key C-b
bind-key C-f send-prefix
bind-key f select-pane -t :.+
bind-key C-f select-pane -t :.+
bind-key w split-window -h
bind-key C-w split-window -h
bind-key e split-window
bind-key C-e split-window
bind-key r kill-pane
bind-key C-r kill-pane

Different syntax, same story. First we switch off the status bar to get one more line of terminal real estate back. Then we change the leader key to CTRL-f and define (repetitively, I know) the key bindings we settled on above. Easy. (Well, except for figuring out the select-pane arguments, I need to credit Josh Clayton for that. Yes, it's in the man page, but it's hard to grok at first.)

What do we lose? By using "f" we lose the "find text in windows" functionality, something I don't foresee having much use for. By using "w" we lose "choose window interactively" which seems equally useless. Luckily "e" is once again a freebie, a key not used by the default configuration. Finally, by using "r" we lose "force redraw" which is hopefully not something I'll need very often. Seems alright by me!

Last but not least, let's configure dvtm. In true suckless style there is of course no configuration file because that would "attract too many users with stupid questions" and the like. (Arrogance really is bliss, isn't it? Let me just say for the record that I really appreciate the suckless ideals and their software. But that doesn't change the fact that configuration files are convenient for all users (not just idiots!) who don't want to manually compile every last bit of code on their machines. But I digress...) So we have to edit config.h and recompile the application, which in Gentoo amounts to (a) setting the savedconfig USE flag, (b) editing the file

/etc/portage/savedconfig/app-misc/dvtm-0.14

and then (c) re-emerging the application. Here's the (grisly?) gist of it:

#define BAR_POS BAR_OFF
#define MOD CTRL('f')
...
{{MOD, 'f',}, {focusnext, {NULL}}},
{{MOD, CTRL('f'),}, {focusnext, {NULL}}},
{{MOD, 'w',}, {create, {NULL}}},
{{MOD, CTRL('w'),}, {create, {NULL}}},
{{MOD, 'e',}, {create, {NULL}}},
{{MOD, CTRL('e'),}, {create, {NULL}}},
{{MOD, 'r',}, {killclient, {NULL}}},
{{MOD, CTRL('r'),}, {killclient, {NULL}}},

There are a few more modifications I didn't show, but just to remove existing key bindings that conflict with ours. Which brings us to the question what we lose. Our use of "f" costs us the ability to choose one specific layout, something I can live without for sure. Our use of "w" is a freebie, it's unused by default. Our use of "e" costs us "copymode" which, so far, I didn't need; eventually I may have to revisit this decision and maybe remap the functionality elsewhere. Finally, our use of "r" costs us being able to "redraw" the screen, something I hope I won't need too much.

Wow, what a project. Had I known I'd spend about 10 hours on learning all the relevant things and writing this blog post, maybe I would not ever have started. But now that it's all done, I am actually enjoying the fruits of my labor: Splitting my terminal regardless of what ancient UNIX system I am on? Using fancier tools on newer machines, including my own? Debugging client/server stuff with a lot less need to grab the mouse and click something? It's paradise! Well, close to it anyway...

Update 2016/04/23: My "unified" configuration files are now available on github.com if you want to grab them directly.

2 comments: