How X Window Managers Work, And How To Write One (Part II)

In Part I of this series, we examined the role of X window managers in a modern Linux/BSD desktop environment, and how they interact with the X server and applications. In Part II, we will dig into the dirty details and walk through the code of an example reparenting non-compositing window manager, basic_wm.

Introduction

Before we start with the code, let’s go over a couple of basic implementation choices such as language and API.

Language

You can write a window manager in Haskell, Python, Lisp, Go, Java, or any other language that has X bindings, i.e. a library for communicating with X servers.

I chose C++ for basic_wm, our example window manager, mainly because the C libraries for X11 are the best documented. In addition to books such as the Xlib Programming Manual, documentation can be found in the form of widely available man pages (e.g., try man XOpenDisplay at a terminal). Example usage and common patterns abound in the source code of many great window managers written in the past three decades.

We will use C++11 and C++14 features where convenient, so you will need a compatible compiler (GCC 4.9 or higher, or Clang 3.4 or higher) if you want to play with the example source code.

A Tale of Two X Libraries

There are two official C libraries for X: Xlib and XCB. Xlib, hailing from 1985, was the original X client library, and was the only official X client library until the introduction of XCB in 2001. The two libraries have very different philosophies: whereas Xlib tries to hide the X protocol behind a friendly C API with lots of bells and whistles, XCB directly exposes the plumbing beneath.

In practice, this different manifests itself most prominently in how the two libraries handle the fundamental asynchronous nature of X’s client-server architecture. Xlib attempts to hide the asynchronous X protocol behind a mixed synchronous and asynchronous API, whereas XCB exposes a fully asynchronous API.

For example, to lookup the attributes (e.g., size and position) of a window, you would write the following code using Xlib:

XWindowAttributes attrs;
XGetWindowAttributes(display, window, &attrs);
// Do stuff.

Under the hood, XGetWindowAttributes() sends a request to the X server and blocks until it receives a response; in other words, it is synchronous. On the other hand, using XCB, you would write this instead:

xcb_get_window_attributes_cookie_t cookie =
    xcb_get_window_attributes(
        connection, window);
// Do other stuff while waiting for reply.
xcb_get_window_attributes_reply_t* reply =
    xcb_get_window_attributes_reply(
        connection, cookie, nullptr);
// Do stuff.
free(reply);

The function xcb_get_window_attributes merely sends the request to the X server, and returns immediately without waiting for the reply; in other words, it is asynchronous. The client program must subsequently call xcb_get_window_attributes_reply to block on the response.

The advantage of the asynchronous approach is obvious if we consider an example where we need to retrieve the attributes of, say, 5 windows at once. Using XCB, we can immediately fire off all 5 requests to the X server, and then wait for all of them to return. With Xlib, we have send one request at a time and wait for its response to come back before we can send the next request. Therefore, we’d expect to only block for the duration of one round-trip to the X server using XCB, compared to 5 with Xlib.

The downside of XCB’s fully asynchronous approach is verbosity and a less programmer-friendly interface. The Xlib code above looks like your average C library call; the XCB code above is significantly more involved.

However, it is important to note that Xlib isn’t fully synchronous. Rather, Xlib has a mixture of synchronous and asynchronous APIs. In general, functions that do not return values (e.g., XResizeWindow, which changes the size of a window) are asynchronous, while functions that return values (e.g., XGetGeometry, which return the size and position of a window) are synchronous:

Xlib saves up requests instead of sending them to the server immediately, so that the client program can continue running instead of waiting to gain access to the network after every Xlib call. This is possible because most Xlib calls do not require immediate action by the server. This grouping of requests by the client before sending them over the network also increases the performance of most networks, because it makes the network transactions longer and less numerous, reducing the total overhead involved.

Xlib sends the buffer full of requests to the server under three conditions.

The most common is when an application calls an Xlib routine to wait for an event but no matching event is currently available on Xlib’s queue. Since, in this case, the application must wait for an appropriate event anyway, it makes sense to flush the request buffer.

Second, Xlib calls that get information from the server require a reply before the program can continue, and therefore, the request buffer is sent and all the requests acted on before the information is returned.

Third, the client would like to be able to flush the request buffer manually in situations where no user events and no calls to query the server are expected. One good example of this third case is an animated game, where the display changes even when there is no user input.

— Xlib Programming Manual §2.1.2

This is the most confusing aspect of Xlib, and a source of endless frustration for those new to X programming. One of the major motivations for the creation of XCB was to eliminate this complexity.

Many popular window managers have already been ported to XCB from Xlib for the performance benefits. If you are interested, you can read up on how the Awesome and KWin window managers were ported to XCB.

I chose to use Xlib for basic_wm, however, because as a pedagogical example, readability and simplicity is much more important than performance. In fact, I would recommend starting with Xlib first for any project and worry about porting to XCB later, as Xlib is much easier to learn and prototype with.

While an in-depth discussion of the merits of Xlib and XCB is beyond the scope of this discussion, I do recommend you check out the official article on Xlib vs. XCB as it presents a fascinating case study of API design.

Dependencies and Building

Firstly, you will need Xlib development headers in order to compile against Xlib. They are available on Debian/Ubuntu as libx11-dev, on Fedora as libX11-devel, and on Arch Linux as part of libx11.

The only additional library used by the example basic_wm code is google-glog, Google’s open source C++ logging library. It is available on Debian/Ubuntu as libgoogle-glog-dev, on Fedora as glog-devel, and on Arch Linux as google-glog.

The recommended way to build the source code is with GNU Make: just run make in the source directory. Alternatively, g++ *.cpp will also do the trick if you supply all the libraries correctly.

To test the window manager, you will likely need Xephyr along with a couple of simple X programs such as xeyes or xterm.

Step 1: Setup and Teardown

Let’s start off with a skeleton implementation of the WindowManager class, which will encapsulate all the window management logic in our example. All it will do for now is set up a connection to the X server on construction, and close that connection on destruction.

extern "C" {
#include <X11/Xlib.h>
}
#include <memory>

class WindowManager {
 public:
  // Factory method for establishing a connection to an X server and creating a
  // WindowManager instance.
  static ::std::unique_ptr<WindowManager> Create();
  // Disconnects from the X server.
  ~WindowManager();
  // The entry point to this class. Enters the main event loop.
  void Run();

 private:
  // Invoked internally by Create().
  WindowManager(Display* display);

  // Handle to the underlying Xlib Display struct.
  Display* display_;
  // Handle to root window.
  const Window root_;
};
#include "window_manager.hpp"
#include <glog/logging.h>

using ::std::unique_ptr;

unique_ptr<WindowManager> WindowManager::Create() {
  // 1. Open X display.
  Display* display = XOpenDisplay(nullptr);
  if (display == nullptr) {
    LOG(ERROR) << "Failed to open X display " << XDisplayName(nullptr);
    return nullptr;
  }
  // 2. Construct WindowManager instance.
  return unique_ptr<WindowManager>(new WindowManager(display));
}

WindowManager::WindowManager(Display* display)
    : display_(CHECK_NOTNULL(display)),
      root_(DefaultRootWindow(display_)) {
}

WindowManager::~WindowManager() {
  XCloseDisplay(display_);
}

void WindowManager::Run() { /* TODO */ }

The main function in main.cpp:

#include <cstdlib>
#include <glog/logging.h>
#include "window_manager.hpp"

using ::std::unique_ptr;

int main(int argc, char** argv) {
  ::google::InitGoogleLogging(argv[0]);

  unique_ptr<WindowManager> window_manager(WindowManager::Create());
  if (!window_manager) {
    LOG(ERROR) << "Failed to initialize window manager.";
    return EXIT_FAILURE;
  }

  window_manager->Run();

  return EXIT_SUCCESS;
}

Even if you have never programmed Xlib before, this should not be hard to understand. WindowManager::Create() is a static factory method that sets up a connection to an X server via XOpenDisplay(); we will let XOpenDisplay() figure out which X server to connect to from the DISPLAY environment variable. The connection is represented by the opaque Display structure. We call XCloseDisplay() on the saved Display* in the destructor to close the connection.

The other function of note is DefaultRootWindow(), which returns the default root window for a given X server. Technically, an X server may have several root windows in some rare multihead setups, but let’s not worry about that here.

If you run this program now, it should connect to the X server, close the connection, and exit. Hooray!

Step 2: Initialization

Now, let’s dig into the mysterious Run() function above. We’ll start with the initialization steps required after opening an X server connection. In window_manager.hpp:

class WindowManager {
  ...
  // Xlib error handler. It must be static as its address is passed to Xlib.
  static int OnXError(Display* display, XErrorEvent* e);
  // Xlib error handler used to determine whether another window manager is
  // running. It is set as the error handler right before selecting substructure
  // redirection mask on the root window, so it is invoked if and only if
  // another window manager is running. It must be static as its address is
  // passed to Xlib.
  static int OnWMDetected(Display* display, XErrorEvent* e);
  // Whether an existing window manager has been detected. Set by OnWMDetected,
  // and hence must be static.
  static bool wm_detected_;
};
void WindowManager::Run() {
  // 1. Initialization.
  //   a. Select events on root window. Use a special error handler so we can
  //   exit gracefully if another window manager is already running.
  wm_detected_ = false;
  XSetErrorHandler(&WindowManager::OnWMDetected);
  XSelectInput(
      display_,
      root_,
      SubstructureRedirectMask | SubstructureNotifyMask);
  XSync(display_, false);
  if (wm_detected_) {
    LOG(ERROR) << "Detected another window manager on display "
               << XDisplayString(display_);
    return;
  }
  //   b. Set error handler.
  XSetErrorHandler(&WindowManager::OnXError);

  // 2. Main event loop.
  ...
}

int WindowManager::OnWMDetected(Display* display, XErrorEvent* e) {
  // In the case of an already running window manager, the error code from
  // XSelectInput is BadAccess. We don't expect this handler to receive any
  // other errors.
  CHECK_EQ(static_cast<int>(e->error_code), BadAccess);
  // Set flag.
  wm_detected_ = true;
  // The return value is ignored.
  return 0;
}

int WindowManager::OnXError(Display* display, XErrorEvent* e) { /* Print e */ }

We first select substructure redirection and substructure notify events on the root window. This is discussed in more detail in the Substructure Redirection section in Part I; to recap, this allows the window manager to intercept requests from top level windows, and subscribe to events concerning the same. Only one X client can select substructure redirection on the root window at any given time; the second client to attempt to do so will get a BadAccess error.

Catching this error is somewhat tricky, however. XSelectInput, like all asynchronous Xlib functions, does not actually send a request to the X server, but instead only queues the request and returns. Hence, we have to explicitly flush the request queue with XSync (see our discussion above in A Tale of Two X Libraries). We set up a temporary error handler, OnWMDetected, to catch errors during this XSync invocation.

Next, we set up our regular error handler which will be invoked for any future errors. Our implementation, which logs the error and continues, will be an important debugging aid as we implement and test our window manager. I will not show it here for the sake of brevity; for reference, check it out in window_manager.cpp.

Step 3: The Event Loop

Now let’s add to Run() method above the signature construct of every modern GUI program - the event loop. In window_manager.cpp:

void WindowManager::Run() {
  // 1. Initialization.
  ...

  // 2. Main event loop.
  for (;;) {
    // 1. Get next event.
    XEvent e;
    XNextEvent(display_, &e);
    LOG(INFO) << "Received event: " << ToString(e);

    // 2. Dispatch event.
    switch (e.type) {
      case CreateNotify:
        OnCreateNotify(e.xcreatewindow);
        break;
      case DestroyNotify:
        OnDestroyNotify(e.xdestroywindow);
        break;
      case ReparentNotify:
        OnReparentNotify(e.xreparent);
        break;
      ...
      // etc. etc.
      ...
      default:
        LOG(WARNING) << "Ignored event";
    }
  }
}

If you have done low-level GUI programming before, this should look very familiar. We sit in an event loop and repeatedly fetch the next event with XNextEvent() and dispatch it to the appropriate handlers.

The structure of the XEvent type is typical of a polymorphic C structure. Each type of event carries different attributes and corresponds to an event struct, such as XKeyEvent, XButtonEvent, and XConfigureEvent. The first field of each struct is always int type. The XEvent type is a C union of all the event structs plus int type:

typedef struct _XKeyEvent {
  int type;
  // Fields specific to XKeyEvent.
  ...
} XKeyEvent;

typedef struct _XButtonEvent {
  int type;
  // Fields specific to XButtonEvent.
  ...
} XButtonEvent;

// etc.
...

typedef union _XEvent {
  int type;
  XKeyEvent xkey;
  XButtonEvent xbutton;
  // etc.
  ...
} XEvent;

This way, the type is always available regardless of the type of event and requires no additional storage. The same pattern can be observed in GTK+/GLib, Python’s C API, and many other object-oriented C APIs.

In basic_wm, the event handlers follow the naming convention of OnFoo(), where Foo is the type of the event, so it should be straightforward to figure out who does what.

What’s Next

We now have a basic skeleton for our window manager, and we can start filling in the meat - the event handlers. The million-dollar question is, what events does a window manager handle, and what should it do with them?

In the next installment in this series, we’ll answer that question by diving into the complex ways window managers, clients and the user interact with each other via X events. In the meantime, you’re more than welcome to check out the code for basic_wm on GitHub.