How X Window Managers Work, And How To Write One (Part III)

In Part II of this series, we discussed X libraries and implementation choices, and examined the basic structure of a window manager. In Part III, we will start interacting with client windows and the user through events. We will review the fundamentals of window manager implementation, using the implementation in our example non-compositing reparenting window manager, basic_wm, for reference.

Step 4: Interaction with Application Windows

Following the steps in Part II of this series, we now have a basic skeleton for our window manager. Our next step is to start talking to clients and the user via events.

The interaction between clients, X, and the window manager is fairly complex. To facilitate our discussion, I’ve created a diagram that illustrates the flow of events throughout the lifetime of a client window, and how a window manager might respond to each of them. We’ll be referring to this cheat sheet for window manager event handling throughout this series. You can click through for the full-sized diagram.

In general, a window manager must handle two kinds of actions: those initiated by client applications (such as creating new windows), and those initiated by users (such as moving or minimizing windows). In this diagram, actions initiated by client applications are shown in the yellow box on the left hand side, and actions initiated by users are shown in blue on the right hand side. A window manager communicates with client applications via events, which are represented as parallelograms in red.

You may have noticed that some of the events in this diagram have the suffix Request, while others have the suffix Notify. This distinction is crucial to our discussion.

Recalling our discussion in Part I on substructure redirection, when a client application wants to do something with a window (such as moving, resizing, showing, or hiding), its request is redirected to the window manager, which can grant, modify, or deny the request. Such requests are delivered to a window manager as events with the Request suffix. It is important to understand that when a window manager receives such an event, the action it represents has not actually occurred, and it is the responsibility of the window manager to decide what to do with it. If the window manager does nothing, the request is implicitly denied.

On the other hand, events with the Notify suffix represent actions that have already been executed by the X server. The window manager can respond to such events, but of course cannot change the fact that they have already happened.

With that in mind, let’s dive into the implementation by looking at how our example window manager will handle the life cycle of a client window from creation to destruction.

Creating a Window

When an X client application creates a top-level window (XCreateWindow()), our window manager will receive a CreateNotify event. However, a newly created window is always invisible, so there’s nothing for our window manager to do. In window_manager.cpp:

void WindowManager::Run() {
  ...
  // 2. Main event loop.
  for (;;) {
    // 1. Get next event.
    ...
    // 2. Dispatch event.
    switch (e.type) {
      ...
      case CreateNotify:
        OnCreateNotify(e.xcreatewindow);
        break;
      ...
    }
  }
}

void WindowManager::OnCreateNotify(const XCreateWindowEvent& e) {}

Configuring a Newly Created Window

At this stage, the application can configure the window to set its initial size, position, or other attributes. To do so, the application would invoke XConfigureWindow(), which would send a ConfigureRequest event to the window manager. However, since the window is still invisible, the window manager doesn’t need to care and can grant such requests without modification by invoking XConfigureWindow() itself with the same parameters.

void WindowManager::Run() {
      ...
      case ConfigureRequest:
        OnConfigureRequest(e.xconfigurerequest);
        break;
      ...
}

void WindowManager::OnConfigureRequest(const XConfigureRequestEvent& e) {
  XWindowChanges changes;
  // Copy fields from e to changes.
  changes.x = e.x;
  changes.y = e.y;
  changes.width = e.width;
  changes.height = e.height;
  changes.border_width = e.border_width;
  changes.sibling = e.above;
  changes.stack_mode = e.detail;
  // Grant request by calling XConfigureWindow().
  XConfigureWindow(display_, e.window, e.value_mask, &changes);
  LOG(INFO) << "Resize " << e.window << " to " << Size<int>(e.width, e.height);
}

Mapping a Window

To make the window finally visible on screen, the client application will call XMapWindow() to map it. This sends a MapRequest event to the window manager. As noted earlier, at this point, the window is still not yet visible, as it’s up to the window manager to actually make it so. This is probably the most important event in our discussion, as this is where a window manager would usually start really managing a window.

A reparenting window manager would typically respond to a MapRequest for a client application window w with the following actions:

  1. Create a frame window f, perhaps with borders and window decoration (e.g. title, minimize / maximize / close buttons).

  2. Register for substructure redirect on f with XSelectInput(). Recall that substructure redirect only applies to direct child windows, so after reparenting, the substructure redirect previously registered on the root window would no longer apply to w, hence this step.

  3. Make w a child of f with XReparentWindow().

  4. Render f and w with XMapWindow().

  5. Register for mouse or keyboard shortcuts on w and/or f.

The example implementation in basic_wm will create a very simple frame window that has the same size as the client window, but with a 3px red border:

void WindowManager::Run() {
      ...
      case MapRequest:
        OnMapRequest(e.xmaprequest);
        break;
      ...
}

void WindowManager::OnMapRequest(const XMapRequestEvent& e) {
  // 1. Frame or re-frame window.
  Frame(e.window);
  // 2. Actually map window.
  XMapWindow(display_, e.window);
}

void WindowManager::Frame(Window w) {
  // Visual properties of the frame to create.
  const unsigned int BORDER_WIDTH = 3;
  const unsigned long BORDER_COLOR = 0xff0000;
  const unsigned long BG_COLOR = 0x0000ff;

  // 1. Retrieve attributes of window to frame.
  XWindowAttributes x_window_attrs;
  CHECK(XGetWindowAttributes(display_, w, &x_window_attrs));

  // 2. TODO - see Framing Existing Top-Level Windows section below.

  // 3. Create frame.
  const Window frame = XCreateSimpleWindow(
      display_,
      root_,
      x_window_attrs.x,
      x_window_attrs.y,
      x_window_attrs.width,
      x_window_attrs.height,
      BORDER_WIDTH,
      BORDER_COLOR,
      BG_COLOR);
  // 3. Select events on frame.
  XSelectInput(
      display_,
      frame,
      SubstructureRedirectMask | SubstructureNotifyMask);
  // 4. Add client to save set, so that it will be restored and kept alive if we
  // crash.
  XAddToSaveSet(display_, w);
  // 5. Reparent client window.
  XReparentWindow(
      display_,
      w,
      frame,
      0, 0);  // Offset of client window within frame.
  // 6. Map frame.
  XMapWindow(display_, frame);
  // 7. Save frame handle.
  clients_[w] = frame;
  // 8. Grab events for window management actions on client window.
  //   a. Move windows with alt + left button.
  XGrabButton(...);
  //   b. Resize windows with alt + right button.
  XGrabButton(...);
  //   c. Kill windows with alt + f4.
  XGrabKey(...);
  //   d. Switch windows with alt + tab.
  XGrabKey(...);

  LOG(INFO) << "Framed window " << w << " [" << frame << "]";
}

The outline of the code should be fairly clear following our discussion. A few additional points to note:

  • Regarding the save-set and XAddToSaveSet():

    The save-set is a list of windows, usually maintained by the window manager, but including only windows created by other clients. If the window manager dies, all windows listed in the save-set will be reparented back to their closest living ancestor if they were reparented in the first place and mapped if the window manager has unmapped them so that it could map an icon.

    The save-set is necessary because the window manager might not exit normally. The user might kill it with CTRL-C if it is running in the foreground, or more likely, the user might get the process number and kill it. Actually, the actions of the save-set are performed even if the window manager exits normally, so less code is needed since the save-set does the cleaning up.

    Window managers almost always place in the save-set all the windows they reparent or iconify, using XAddToSaveSet().

    Windows are automatically removed from the save-set when they are destroyed.

    — Xlib Programming Manual §16.4
  • When our window manager creates a frame window (step 2 in the example code), it will also trigger a CreateNotify event for the frame window. It will ignore it just like it ignores other CreateNotify events as discussed earlier.

  • When our window manager calls XReparentWindow() in step 5 in the example code, it will trigger a ReparentNotify event, which it will ignore:

    void WindowManager::Run() {
          ...
          case ReparentNotify:
            OnReparentNotify(e.xreparent);
            break;
          ...
    }
    
    void WindowManager::OnReparentNotify(const XReparentEvent& e) {}
    
  • When our window manager calls XMapWindow() to map the frame window (step 6 in the example code), the X server knows that the action originates from the current window manager, and will execute it directly instead of redirecting it back as a MapRequest event. Our window manager will later receive a MapNotify event, which it can ignore:

    void WindowManager::Run() {
          ...
          case MapNotify:
            OnMapNotify(e.xmap);
            break;
          ...
    }
    
    void WindowManager::OnMapNotify(const XMapEvent& e) {}
    

Configuring a Mapped Window

A client application can configure a window that is currently visible, again with the XConfigureWindow() function. For example, an application may want to resize a window to better accomodate its contents. When a reparenting window manager receives the resulting ConfigureRequest and decides to grant the request, it additionally needs to resize / reposition the corresponding frame window and any window decorations.

void WindowManager::OnConfigureRequest(const XConfigureRequestEvent& e) {
  XWindowChanges changes;
  // Copy fields from e to changes.
  ...
  if (clients_.count(e.window)) {
    const Window frame = clients_[e.window];
    XConfigureWindow(display_, frame, e.value_mask, &changes);
    LOG(INFO) << "Resize [" << frame << "] to " << Size<int>(e.width, e.height);
  }
  // Grant request by calling XConfigureWindow().
  ...
}

When our window manager re-configures the frame window with the XConfigureWindow() call above, the X server knows that the action originates from the current window manager, and will execute it directly instead of redirecting it back as a ConfigureRequest event. Our window manager will then receive a ConfigureNotify event, which it will ignore:

void WindowManager::Run() {
      ...
      case ConfigureNotify:
        OnConfigureNotify(e.xconfigure);
        break;
      ...
}

void WindowManager::OnConfigureNotify(const XConfigureEvent& e) {}

Unmapping a Window

When a client application unmaps (i.e. hides) a window with XUnmapWindow(), for example in response to the user exiting or minimizing the application, the window manager will receive a UnmapNotify event. Unlike the MapRequest event, the UnmapNotify event is delivered to the window manager after the fact, and the window manager can only respond to it, not intercept it.

A reparenting window manager will typically want to reverse the actions it performed in response to MapRequest. In other words, it would reparent the client window back to the root window, and destroy the corresponding frame window.

void WindowManager::Run() {
      ...
      case UnmapNotify:
        OnUnmapNotify(e.xunmap);
        break;
      ...
}

void WindowManager::OnUnmapNotify(const XUnmapEvent& e) {
  // If the window is a client window we manage, unframe it upon UnmapNotify. We
  // need the check because we will receive an UnmapNotify event for a frame
  // window we just destroyed ourselves.
  if (!clients_.count(e.window)) {
    LOG(INFO) << "Ignore UnmapNotify for non-client window " << e.window;
    return;
  }

  Unframe(e.window);
}

void WindowManager::Unframe(Window w) {
  // We reverse the steps taken in Frame().
  const Window frame = clients_[w];
  // 1. Unmap frame.
  XUnmapWindow(display_, frame);
  // 2. Reparent client window back to root window.
  XReparentWindow(
      display_,
      w,
      root_,
      0, 0);  // Offset of client window within root.
  // 3. Remove client window from save set, as it is now unrelated to us.
  XRemoveFromSaveSet(display_, w);
  // 4. Destroy frame.
  XDestroyWindow(display_, frame);
  // 5. Drop reference to frame handle.
  clients_.erase(w);

  LOG(INFO) << "Unframed window " << w << " [" << frame << "]";
}

A few additional points to note:

  • When our window manager unmaps the frame window with XUnmapWindow() in step 1 in the example code above, it will again receive a corresponding UnmapNotify event. This is the reason why the UnmapNotify event handler needs to check that the unmapped window is an actual client window.

  • When our window manager makes the client window a direct child of the root window with XReparentWindow() in step 2 above, it will receive a ReparentNotify event. As discussed in the Mapping a Window section above, this ReparentNotify event will be ignored.

  • When our window manager destroys the frame window with XDestroyWindow() in step 4, it will trigger a DestroyNotify event. This event will also be ignored, as shown in the next section.

At this point, the client window has become invisible, but not yet destroyed. It can be displayed again with a call to XMapWindow(), which would take us back to the Mapping a Window step. It could also be reconfigured in this state, which would take us back to the Configuring a Newly Created Window step.

Destroying a Window

When a client application exits or no longer needs a window, it will call XDestroyWindow() to dispose of the window. This triggers a DestroyNotify event. In our case, there’s nothing we need to do in response.

void WindowManager::Run() {
      ...
      case DestroyNotify:
        OnDestroyNotify(e.xdestroywindow);
        break;
      ...
}

void WindowManager::OnDestroyNotify(const XDestroyWindowEvent& e) {}

Framing Existing Top-Level Windows

Now that we’ve walked through the life cycle of a client window, from creation to destruction, let’s turn our attention to the problem of existing top-level windows.

You may recall from Part I that X applications in general run just fine without a window manager. Depending on how an X session is started (e.g. xinitrc), by the time a window manager starts, any number of windows may have already been created by other applications. Additionally, the user can kill a running window manager and replace it with a different window manager, without affecting windows from other applications.

Therefore, when our window manager starts up, it needs to handle any existing top-level windows that are already mapped. As a reparenting window manager, it will invoke the same Frame() function on such windows as if these windows are being mapped for the first time:

void WindowManager::Run() {
  // 1. Initialization.
  //   a. Select events on root window. Use a special error handler so we can
  //   exit gracefully if another window manager is already running.
  ...
  //   b. Set error handler.
  ...
  //   c. Grab X server to prevent windows from changing under us while we
  //   frame them.
  XGrabServer(display_);
  //   d. Frame existing top-level windows.
  //     i. Query existing top-level windows.
  Window returned_root, returned_parent;
  Window* top_level_windows;
  unsigned int num_top_level_windows;
  CHECK(XQueryTree(
      display_,
      root_,
      &returned_root,
      &returned_parent,
      &top_level_windows,
      &num_top_level_windows));
  CHECK_EQ(returned_root, root_);
  //     ii. Frame each top-level window.
  for (unsigned int i = 0; i < num_top_level_windows; ++i) {
    Frame(top_level_windows[i], true /* was_created_before_window_manager */);
  }
  //     iii. Free top-level window array.
  XFree(top_level_windows);
  //   e. Ungrab X server.
  XUngrabServer(display_);

  // 2. Main event loop.
  ...
}

void WindowManager::OnMapRequest(const XMapRequestEvent& e) {
  // 1. Frame or re-frame window.
  Frame(e.window, false /* was_created_before_window_manager */);
  ...
}

void WindowManager::Frame(Window w, bool was_created_before_window_manager) {
  ...
  // 1. Retrieve attributes of window to frame.
  ...
  // 2. If window was created before window manager started, we should frame
  // it only if it is visible and doesn't set override_redirect.
  if (was_created_before_window_manager) {
    if (x_window_attrs.override_redirect ||
        x_window_attrs.map_state != IsViewable) {
      return;
    }
  }
  // 3. Create frame.
  ...
}

void WindowManager::OnUnmapNotify(const XUnmapEvent& e) {
  ...

  // Ignore event if it is triggered by reparenting a window that was mapped
  // before the window manager started.
  //
  // Since we receive UnmapNotify events from the SubstructureNotify mask, the
  // event attribute specifies the parent window of the window that was
  // unmapped. This means that an UnmapNotify event from a normal client window
  // should have this attribute set to a frame window we maintain. Only an
  // UnmapNotify event triggered by reparenting a pre-existing window will have
  // this attribute set to the root window.
  if (e.event == root_) {
    LOG(INFO) << "Ignore UnmapNotify for reparented pre-existing window "
              << e.window;
    return;
  }

  Unframe(e.window);
}

Some additional things to note:

  • You may notice that the process of framing existing top-level windows is guarded by XGrabServer() and XUngrabServer(). From the Xlib Programming Manual:

    These functions can be used to control processing of output on other connections by the window system server. While the server is grabbed, no processing of requests or close downs on any other connection will occur.

    — Xlib Programming Manual §9.5

    By grabbing the X server, our window manager ensures that, between the time when it fetches the list of existing top-level windows and when it finishes framing them, no other application can interfere and mess up our state: no new windows can be created, and no existing windows can be modified or destroyed.

  • The override_redirect attribute, if set to true, indicates that a window should not be managed by window managers. From the Xlib Programming Manual:

    To control window placement or to add decoration, a window manager often needs to intercept (redirect) any map or configure request. Pop-up windows, however, often need to be mapped without a window manager getting in the way. […]

    The override-redirect flag specifies whether map and configure requests on this window should override a SubstructureRedirectMask on the parent. You can set the override-redirect flag to True or False (default). Window managers use this information to avoid tampering with pop-up windows […].

    — Xlib Programming Manual §3.2.8

    The reason our window manager doesn’t need to check for this attribute except at start up is that the X server knows not to redirect events from such windows:

    The window manager […] will normally ignore windows that are mapped with their override_redirect attribute set, since no *Request events will be generated for them.

    — Xlib Programming Manual §16.3
  • The map_state attribute indicates whether a window is currently visible (mapped). When Frame() is invoked for pre-existing windows during start up, we want to ignore windows that are currently unmapped. However, when Frame() is invoked during the event loop as part of the MapRequest handler, we know that the client window to be framed is necessarily still unmapped, as our window manager wouldn’t have granted the request yet.

  • You might be wondering why an additional check for e.event == root_ is needed in the UnmapNotify handler. It turns out that reparenting an already mapped window (XReparentWindow()) will trigger a pair of UnmapNotify and MapNotify events in addition to ReparentNotify. Therefore, when we enter into the event loop, we will receive an UnmapNotify event for every pre-existing top-level window we reparented. We can distinguish these events by their event attribute, which in this case represents the parent of the client window. Normally, when a client window we already framed is unmapped, the event attribute would be its frame window. But when a pre-existing window is reparented at start up, the event attribute in the resulting UnmapNotify event will be its original parent - i.e., the root window.

What’s Next

At this point, we have a basic but functional reparenting window manager that will correctly handle the life cycle of windows. If you strip out window decorations, shortcuts and fancy UI, the core structure of every X window manager will quite closely resemble what we have here.

In our next installment, we will improve the user-facing functionality of our window manager by adding ways to move, resize and close windows. In the meantime, you’re more than welcome to check out the code for basic_wm on GitHub.

How X Window Managers Work, And How To Write One (Part II)

In Part I of this series, we examined the role of X window managers in a modern Linux/BSD desktop environment, and how they interact with the X server and applications. In Part II, we will dig into the dirty details and walk through the code of an example reparenting non-compositing window manager, basic_wm.

Introduction

Before we start with the code, let’s go over a couple of basic implementation choices such as language and API.

Language

You can write a window manager in Haskell, Python, Lisp, Go, Java, or any other language that has X bindings, i.e. a library for communicating with X servers.

I chose C++ for basic_wm, our example window manager, mainly because the C libraries for X11 are the best documented. In addition to books such as the Xlib Programming Manual, documentation can be found in the form of widely available man pages (e.g., try man XOpenDisplay at a terminal). Example usage and common patterns abound in the source code of many great window managers written in the past three decades.

We will use C++11 and C++14 features where convenient, so you will need a compatible compiler (GCC 4.9 or higher, or Clang 3.4 or higher) if you want to play with the example source code.

A Tale of Two X Libraries

There are two official C libraries for X: Xlib and XCB. Xlib, hailing from 1985, was the original X client library, and was the only official X client library until the introduction of XCB in 2001. The two libraries have very different philosophies: whereas Xlib tries to hide the X protocol behind a friendly C API with lots of bells and whistles, XCB directly exposes the plumbing beneath.

In practice, this different manifests itself most prominently in how the two libraries handle the fundamental asynchronous nature of X’s client-server architecture. Xlib attempts to hide the asynchronous X protocol behind a mixed synchronous and asynchronous API, whereas XCB exposes a fully asynchronous API.

For example, to lookup the attributes (e.g., size and position) of a window, you would write the following code using Xlib:

XWindowAttributes attrs;
XGetWindowAttributes(display, window, &attrs);
// Do stuff.

Under the hood, XGetWindowAttributes() sends a request to the X server and blocks until it receives a response; in other words, it is synchronous. On the other hand, using XCB, you would write this instead:

xcb_get_window_attributes_cookie_t cookie =
    xcb_get_window_attributes(
        connection, window);
// Do other stuff while waiting for reply.
xcb_get_window_attributes_reply_t* reply =
    xcb_get_window_attributes_reply(
        connection, cookie, nullptr);
// Do stuff.
free(reply);

The function xcb_get_window_attributes merely sends the request to the X server, and returns immediately without waiting for the reply; in other words, it is asynchronous. The client program must subsequently call xcb_get_window_attributes_reply to block on the response.

The advantage of the asynchronous approach is obvious if we consider an example where we need to retrieve the attributes of, say, 5 windows at once. Using XCB, we can immediately fire off all 5 requests to the X server, and then wait for all of them to return. With Xlib, we have send one request at a time and wait for its response to come back before we can send the next request. Therefore, we’d expect to only block for the duration of one round-trip to the X server using XCB, compared to 5 with Xlib.

The downside of XCB’s fully asynchronous approach is verbosity and a less programmer-friendly interface. The Xlib code above looks like your average C library call; the XCB code above is significantly more involved.

However, it is important to note that Xlib isn’t fully synchronous. Rather, Xlib has a mixture of synchronous and asynchronous APIs. In general, functions that do not return values (e.g., XResizeWindow, which changes the size of a window) are asynchronous, while functions that return values (e.g., XGetGeometry, which return the size and position of a window) are synchronous:

Xlib saves up requests instead of sending them to the server immediately, so that the client program can continue running instead of waiting to gain access to the network after every Xlib call. This is possible because most Xlib calls do not require immediate action by the server. This grouping of requests by the client before sending them over the network also increases the performance of most networks, because it makes the network transactions longer and less numerous, reducing the total overhead involved.

Xlib sends the buffer full of requests to the server under three conditions.

The most common is when an application calls an Xlib routine to wait for an event but no matching event is currently available on Xlib’s queue. Since, in this case, the application must wait for an appropriate event anyway, it makes sense to flush the request buffer.

Second, Xlib calls that get information from the server require a reply before the program can continue, and therefore, the request buffer is sent and all the requests acted on before the information is returned.

Third, the client would like to be able to flush the request buffer manually in situations where no user events and no calls to query the server are expected. One good example of this third case is an animated game, where the display changes even when there is no user input.

— Xlib Programming Manual §2.1.2

This is the most confusing aspect of Xlib, and a source of endless frustration for those new to X programming. One of the major motivations for the creation of XCB was to eliminate this complexity.

Many popular window managers have already been ported to XCB from Xlib for the performance benefits. If you are interested, you can read up on how the Awesome and KWin window managers were ported to XCB.

I chose to use Xlib for basic_wm, however, because as a pedagogical example, readability and simplicity is much more important than performance. In fact, I would recommend starting with Xlib first for any project and worry about porting to XCB later, as Xlib is much easier to learn and prototype with.

While an in-depth discussion of the merits of Xlib and XCB is beyond the scope of this discussion, I do recommend you check out the official article on Xlib vs. XCB as it presents a fascinating case study of API design.

Dependencies and Building

Firstly, you will need Xlib development headers in order to compile against Xlib. They are available on Debian/Ubuntu as libx11-dev, on Fedora as libX11-devel, and on Arch Linux as part of libx11.

The only additional library used by the example basic_wm code is google-glog, Google’s open source C++ logging library. It is available on Debian/Ubuntu as libgoogle-glog-dev, on Fedora as glog-devel, and on Arch Linux as google-glog.

The recommended way to build the source code is with GNU Make: just run make in the source directory. Alternatively, g++ *.cpp will also do the trick if you supply all the libraries correctly.

To test the window manager, you will likely need Xephyr along with a couple of simple X programs such as xeyes or xterm.

Step 1: Setup and Teardown

Let’s start off with a skeleton implementation of the WindowManager class, which will encapsulate all the window management logic in our example. All it will do for now is set up a connection to the X server on construction, and close that connection on destruction.

extern "C" {
#include <X11/Xlib.h>
}
#include <memory>

class WindowManager {
 public:
  // Factory method for establishing a connection to an X server and creating a
  // WindowManager instance.
  static ::std::unique_ptr<WindowManager> Create();
  // Disconnects from the X server.
  ~WindowManager();
  // The entry point to this class. Enters the main event loop.
  void Run();

 private:
  // Invoked internally by Create().
  WindowManager(Display* display);

  // Handle to the underlying Xlib Display struct.
  Display* display_;
  // Handle to root window.
  const Window root_;
};
#include "window_manager.hpp"
#include <glog/logging.h>

using ::std::unique_ptr;

unique_ptr<WindowManager> WindowManager::Create() {
  // 1. Open X display.
  Display* display = XOpenDisplay(nullptr);
  if (display == nullptr) {
    LOG(ERROR) << "Failed to open X display " << XDisplayName(nullptr);
    return nullptr;
  }
  // 2. Construct WindowManager instance.
  return unique_ptr<WindowManager>(new WindowManager(display));
}

WindowManager::WindowManager(Display* display)
    : display_(CHECK_NOTNULL(display)),
      root_(DefaultRootWindow(display_)) {
}

WindowManager::~WindowManager() {
  XCloseDisplay(display_);
}

void WindowManager::Run() { /* TODO */ }

The main function in main.cpp:

#include <cstdlib>
#include <glog/logging.h>
#include "window_manager.hpp"

using ::std::unique_ptr;

int main(int argc, char** argv) {
  ::google::InitGoogleLogging(argv[0]);

  unique_ptr<WindowManager> window_manager(WindowManager::Create());
  if (!window_manager) {
    LOG(ERROR) << "Failed to initialize window manager.";
    return EXIT_FAILURE;
  }

  window_manager->Run();

  return EXIT_SUCCESS;
}

Even if you have never programmed Xlib before, this should not be hard to understand. WindowManager::Create() is a static factory method that sets up a connection to an X server via XOpenDisplay(); we will let XOpenDisplay() figure out which X server to connect to from the DISPLAY environment variable. The connection is represented by the opaque Display structure. We call XCloseDisplay() on the saved Display* in the destructor to close the connection.

The other function of note is DefaultRootWindow(), which returns the default root window for a given X server. Technically, an X server may have several root windows in some rare multihead setups, but let’s not worry about that here.

If you run this program now, it should connect to the X server, close the connection, and exit. Hooray!

Step 2: Initialization

Now, let’s dig into the mysterious Run() function above. We’ll start with the initialization steps required after opening an X server connection. In window_manager.hpp:

class WindowManager {
  ...
  // Xlib error handler. It must be static as its address is passed to Xlib.
  static int OnXError(Display* display, XErrorEvent* e);
  // Xlib error handler used to determine whether another window manager is
  // running. It is set as the error handler right before selecting substructure
  // redirection mask on the root window, so it is invoked if and only if
  // another window manager is running. It must be static as its address is
  // passed to Xlib.
  static int OnWMDetected(Display* display, XErrorEvent* e);
  // Whether an existing window manager has been detected. Set by OnWMDetected,
  // and hence must be static.
  static bool wm_detected_;
};
void WindowManager::Run() {
  // 1. Initialization.
  //   a. Select events on root window. Use a special error handler so we can
  //   exit gracefully if another window manager is already running.
  wm_detected_ = false;
  XSetErrorHandler(&WindowManager::OnWMDetected);
  XSelectInput(
      display_,
      root_,
      SubstructureRedirectMask | SubstructureNotifyMask);
  XSync(display_, false);
  if (wm_detected_) {
    LOG(ERROR) << "Detected another window manager on display "
               << XDisplayString(display_);
    return;
  }
  //   b. Set error handler.
  XSetErrorHandler(&WindowManager::OnXError);

  // 2. Main event loop.
  ...
}

int WindowManager::OnWMDetected(Display* display, XErrorEvent* e) {
  // In the case of an already running window manager, the error code from
  // XSelectInput is BadAccess. We don't expect this handler to receive any
  // other errors.
  CHECK_EQ(static_cast<int>(e->error_code), BadAccess);
  // Set flag.
  wm_detected_ = true;
  // The return value is ignored.
  return 0;
}

int WindowManager::OnXError(Display* display, XErrorEvent* e) { /* Print e */ }

We first select substructure redirection and substructure notify events on the root window. This is discussed in more detail in the Substructure Redirection section in Part I; to recap, this allows the window manager to intercept requests from top level windows, and subscribe to events concerning the same. Only one X client can select substructure redirection on the root window at any given time; the second client to attempt to do so will get a BadAccess error.

Catching this error is somewhat tricky, however. XSelectInput, like all asynchronous Xlib functions, does not actually send a request to the X server, but instead only queues the request and returns. Hence, we have to explicitly flush the request queue with XSync (see our discussion above in A Tale of Two X Libraries). We set up a temporary error handler, OnWMDetected, to catch errors during this XSync invocation.

Next, we set up our regular error handler which will be invoked for any future errors. Our implementation, which logs the error and continues, will be an important debugging aid as we implement and test our window manager. I will not show it here for the sake of brevity; for reference, check it out in window_manager.cpp.

Step 3: The Event Loop

Now let’s add to Run() method above the signature construct of every modern GUI program - the event loop. In window_manager.cpp:

void WindowManager::Run() {
  // 1. Initialization.
  ...

  // 2. Main event loop.
  for (;;) {
    // 1. Get next event.
    XEvent e;
    XNextEvent(display_, &e);
    LOG(INFO) << "Received event: " << ToString(e);

    // 2. Dispatch event.
    switch (e.type) {
      case CreateNotify:
        OnCreateNotify(e.xcreatewindow);
        break;
      case DestroyNotify:
        OnDestroyNotify(e.xdestroywindow);
        break;
      case ReparentNotify:
        OnReparentNotify(e.xreparent);
        break;
      ...
      // etc. etc.
      ...
      default:
        LOG(WARNING) << "Ignored event";
    }
  }
}

If you have done low-level GUI programming before, this should look very familiar. We sit in an event loop and repeatedly fetch the next event with XNextEvent() and dispatch it to the appropriate handlers.

The structure of the XEvent type is typical of a polymorphic C structure. Each type of event carries different attributes and corresponds to an event struct, such as XKeyEvent, XButtonEvent, and XConfigureEvent. The first field of each struct is always int type. The XEvent type is a C union of all the event structs plus int type:

typedef struct _XKeyEvent {
  int type;
  // Fields specific to XKeyEvent.
  ...
} XKeyEvent;

typedef struct _XButtonEvent {
  int type;
  // Fields specific to XButtonEvent.
  ...
} XButtonEvent;

// etc.
...

typedef union _XEvent {
  int type;
  XKeyEvent xkey;
  XButtonEvent xbutton;
  // etc.
  ...
} XEvent;

This way, the type is always available regardless of the type of event and requires no additional storage. The same pattern can be observed in GTK+/GLib, Python’s C API, and many other object-oriented C APIs.

In basic_wm, the event handlers follow the naming convention of OnFoo(), where Foo is the type of the event, so it should be straightforward to figure out who does what.

What’s Next

We now have a basic skeleton for our window manager, and we can start filling in the meat - the event handlers. The million-dollar question is, what events does a window manager handle, and what should it do with them?

In the next installment in this series, we’ll answer that question by diving into the complex ways window managers, clients and the user interact with each other via X events. In the meantime, you’re more than welcome to check out the code for basic_wm on GitHub.

How X Window Managers Work, And How To Write One (Part I)

Window managers are one of the core components of the modern Linux/BSD desktop. It is not an exaggeration to say that they define to a large degree our day-to-day user experience, as they are responsible for deciding how individual windows look, move around, react to input, and organize themselves. Hence, almost 30 years since the first X window manager, we still argue over the merits of different window managers, and new window managers continue to reinvent how we interact with our digital world.

In this series of posts, I hope to demystify how window managers work, and how you might go about writing one yourself.

I will be quoting quite heavily from the seminal Xlib Programming Manual (3rd Ed, 1994) by Adrian Nye and published by O’Reilly. Despite its age, it remains amazingly relevant and is the best available introductory text to the internals of X, which has not changed over the past two decades as much as you’d think. Since you could buy the book plus shipping for less than the price of a cup of coffee, I strongly recommend it to anyone interested in learning more about X. In addition, its chapter 16 also covers the basics of window management.

The Role of an X Window Manager

Let’s start with an examination of the role of the window manager in a modern Linux/BSD desktop environment.

The Rights of X Window Managers

Unlike other windowing systems such as Microsoft Windows or Mac OS X, X does not dictate a window manager or how a window manager should behave. This decision is to thank for the wild diversity of X window managers we see today.

X is somewhat unusual in that it does not mandate a particular type of window manager. Its developers have tried to make X itself as free of window management or user interface policy as possible.
— Xlib Programming Manual §1.2.3

In fact, it does not even require a window manager to be present at all:

Unlike citizens, the window manager has rights but not responsibilities. Programs must be prepared to cooperate with any type of window manager or with none at all […].
— Xlib Programming Manual §1.2.3

This is in stark contrast to the integrative approach of other GUI systems. On Mac OS X and Unity, for example, an application could not possibly function without the window manager, as the latter is responsible for rendering a part of the application’s interface (e.g., menus).

The Responsibilities of X Window Managers

As you probably already know, X operates in a server-client model. An X server controls one or more physical display devices as well as input devices (mouse, keyboard, etc.). An application that wants to interact with these devices assumes the role of an X client. An X server and its clients may run on the same computer, in which case they communicate via domain sockets, or on different computers, in which case they communicate through TCP/IP.

A window manager is a regular X client. It doesn’t have any superuser privileges or keys to kernel backdoors; it is a normal user process that is allowed by the X server to call a set of special APIs. X ensures that no more than one window manager is running at any given point by denying a client access to these APIs if another client currently has access. The first client to attempt to access these APIs always succeeds.

A window manager communicates with the windows it manages through two X mechanisms: properties and events. We will discuss these in detail in later sections, but the takeaway is that the communication happens through the X server, not directly between the window manager and other applications.

This is illustrated by the following diagram:

Role of a Window Manager

How an X Window Manager Manages Windows

Let’s now dive into the details of how a window manager does its job.

The Window Hierarchy

When we think about modern GUIs, we usually use the term widgets or controls to refer to UI elements such as buttons, scrollbars, or text boxes, and the term windows to refer to a container for such widgets that has its own name and can be independently moved around, closed, resized, etc..

X, however, was designed to be as low-level as possible. The fundamental UI model that X provides, upon which UI frameworks such as GTK+ and Qt are built, is that of an hierarchy of rectangles. In X terminology, all top level windows and all UI elements within are windows. In other words, a window, is any rectangular area that is an unit of user interaction and/or graphic display.

Windows are organized into a tree hierarchy. At the root of the hierarchy is the root window, a virtual, invisible window that has the same size as the screen, and is always present. Top level windows are direct children of the root window. UI elements within a top level window are descendants of that window.

An
example X window

For example, consider the dialog box above from the Xfce desktop environment. The entire dialog is an X window. All UI elements in the dialog box - the magnifying glass icon, the text box, the green down arrow, the Close and Launch buttons, and the icons inside those buttons - are also X _window_s.

The whole dialog window is a child of the root window. The magnifying glass icon, the text box, and the Close and Launch buttons are children of the dialog window. The green down arrow is a child of the text box window, and the icons in the Close and Launch buttons are children of those buttons respectively.

An important thing to note about X windows is that a child window is clipped to the boundaries of its parent:

A child may be positioned partially or completely outside its parent window, but output to the child is displayed and input received only in the area where the child overlaps with the parent.
— Xlib Programming Manual §2.2.2

For example, if we increase the width of the text box in the dialog above by 2x without changing the size of the dialog box, the portion of the text box that extends outside of the dialog box will become invisible, and clicking on it will not send an event to the text box.

A window manager manages top level windows - that is, direct children of the root window.

Substructure Redirection

In the absence of a window manager, when an application wants to do something with a window - move it, resize it, show/hide it, etc. - its request is directly processed by the X server, and that’s the end of that. A window manager, however, needs to intercept these requests. For example, a window manager may need to know that a new top level window has been created and displayed, in order to draw window decorations (e.g. minimize / maximize / close buttons) around it. It may also need to know that an existing top level window has been resized, in order to redraw the window decorations to reflect the change.

The mechanism that allows a window manager to intercept such requests is called substructure redirection.

This is how substructure redirection works. Suppose we have a window W. If a program M registers for substructure redirection on W, a matching request to modify any direct child window of W will not be executed by the X server. Instead, the X server redirects this request to the program M, which can do whatever it wants with the request, including denying the request outright or granting the request with modifications. More formally,

The structure, as the term is used here, is the location, size, stacking order, border width, and mapping status of a window. The substructure is all these statistics about the children of a particular window. This is the complete set of information about screen layout that the window manager might need in order to implement its policy. Redirection means that an event is sent to the client selecting redirection (usually the window manager), and the original structure−changing request is not executed.
— Xlib Programming Manual §16.2

Note that only direct children of a window W is affected by substructure redirection on W, not any windows further down the hierarchy.

This gets interesting when we consider substructure redirection on the root window:

When the window manager selects SubstructureRedirectMask on the root window, an attempt by any other client to change the configuration of any child of the root window will fail. Instead an event describing the layout change request will be sent to the window manager. The window manager then reads the event and determines whether to honor the request, modify it, or deny it completely. If it decides to honor the request, it calls the routine that the client called that triggered the event with the same arguments. If it decides to modify the request, it calls the same routine but with modified arguments.
— Xlib Programming Manual §16.2

In other words, a window manager must register for substructure redirection on the root window, which causes all creation, destruction, reconfiguration etc. of top level windows - which are direct children of the root window - to be routed to the window manager. This is the magic hook into the X server that window managers rely on to do their job.

This relationship is shown in the following diagram:

Role of a Window Manager

Finally, the X server only allows one running program to register for substructure redirection on any given window at any given time. An attempt to register for substructure redirection on a window will fail if another X client has already done the same on the same window, and has not unregistered, disconnected from the X server, or crashed. Since all window managers must register for substructure redirection on the root window, this latter acts as a locking mechanism that prevents two or more window managers from running simultaneously on the same screen.

Reparenting

In the example dialog box above, we see a title bar with, for example, little buttons for minimizing, maximizing, and closing the window. These UI elements are not created by the application, but by the window manager, via a process known as reparenting or framing:

A window manager can decorate [top level] windows on the screen with titlebars and place little boxes on the titlebar with which the window can be moved or resized. This is only one possibility […].

To do this, the window manager creates a child of the root somewhat larger than the top level window of the application. Then it calls XReparentWindow(), specifying the top level window of the application as win and the new parent [window it just created] as parent. win and all its descendants will then be descendants of parent.

— Xlib Programming Manual §16.3

In other words, if we were to run an X application without a window manager present, the top level window of the application would be a direct child of the root window. With a window manager running, however, the top level window of the application may be reparented by the window manager; it becomes a child of a frame window which is created by the window manager, and which is itself a direct child of the root window. The window manager can add other UI elements inside this frame window alongside the application’s top level window as it sees fit.

Therefore, I’ve kind of lied to you several paragraphs ago: the dialog box shown earlier is really a child window within a frame window created by Xfce's window manager, Xfwm, along with other UI elements for window management:

Reparenting is what allows different window managers to draw different window decorations, and thereby achieve a consistent look-and-feel across windows. However, there are also window managers that do not reparent at all: these are called non-reparenting window managers. There are two reasons why a window manager would not want to reparent:

  1. If a window manager does not draw window decorations around top level windows , it obviously has no need to reparent them. Examples: xmonad, dwm.

  2. Compositing window managers do not always need to reparent windows; we will discuss why below. Example: Compiz. This is not true for all compositing window managers, however; for example, GNOME’s default window manager, Mutter, is a reparenting comopositing window manager.

Let’s now consider substructure redirection in the context of reparenting. When a top level window W is first shown (map'ped in X jargon), the window manager is notified because it has registered for substructure redirection on the root window, and a top level window is a direct child of the root window. It then creates a frame F and reparents W, so that W becomes a child of F, which itself is a child of the root window. But since now W is no longer a direct child of the root window, the window manager will no longer be able to intercept changes to W!

Therefore, a reparenting window manager must also subsequently register for substructure redirection on each frame window it creates.

Compositing

Compositing window managers are a relatively new development. Compositing support in X was added in late 2004, a full decade after the last edition of Xlib Programming Manual. The first compositing window managers, Xfwm and Compiz, launched in early 2005.

So, what exactly does a compositing window manager do?

In our discussion above on substructure redirection and reparenting, we saw how a window manager can respond to various requests for a top level window - to display/hide it (map/unmap in X jargon), to resize it, to move it, etc.. But we didn’t talk about how to deal with what’s inside the top level windows.

Indeed, from the perspective of the window manager, top level windows are black boxes; they each manage their own descendant windows (UI elements), perhaps through a framework such as GTK+ or Qt, and the window manager has no right to interfere there. The application that creates a top level window is responsible for rendering and handling events for any descendant windows (UI elements), and does so directly through X. This is shown in the first diagram above.

As the computing power of graphics hardware grew, so did people’s expectations from their window managers. With hardware acceleration, it became possible to build much more computationally intensive user interfaces, such as the (in)famous Desktop Cube in Compiz:

or the Shift Switcher:
Let’s take a moment to think about how we can implement an interface such as the Shift Switcher above. When the user triggers this interface, we need to:

  1. Render each top level window and all its descendant windows (UI elements) to an off-screen, in-memory buffer, instead of directly to the hardware.

  2. Transform (rotate, contort, etc.) each buffer according to our design.

  3. Composite the transformed buffers into a final buffer along with a background and any other floating UI elements else we need to display.

  4. Create an overlay window that covers the entire screen and hides all other windows.

  5. Render the final buffer into the overlay window.

There are a number of challenges:

  • We must be able to retrieve the displayed contents of top level windows. However, as we described earlier, top level windows render their contents directly through X, without going through the window manager.

  • We need to update our interface in real time as the contents of the top level windows change. However, top level windows do not notify window managers when their contents change, again because they render their contents directly through X.

  • A top level window A may overlap with another top level window B below, which means a portion of B isn’t currently displayed. Our interface, however, must capture the full rendering of A and B, regardless of overlapping regions.

  • All this complex compositing process is computationally intensive and requires hardware acceleration to function adequately.

It is clear that none of this would be possible without some heavy cooperation from the X server. Enter the Composite extension:

Many user interface operations would benefit from having pixel contents of window hierarchies available without respect to sibling and antecedent clipping. In addition, placing control over the composition of these pixel contents into a final screen image in an external application will enable a flexible system for dynamic application content presentation.
— X Composite Extension

The Composite extension provides a mechanism to request the X server not to render a specific window and its descendants directly to hardware, but to a special buffer maintained by the X server, and do so without the normal clipping and overlap computations. This buffer can then be read and used by the client that made the request.

That’s exactly what a compositing window manager does: it will ask X to render each top level window to an off-screen, in-memory buffer and composite the results into an overlay window itself. And it needs to do this not just for fancy task switcher interfaces as in our example, but also to achieve effects like translucency, animations, soft shadows, and the like.

This is illustrated in the following diagram:

Let’s end this section by considering whether a compositing window manager should reparent top level windows.

Since a compositing window manager already knows the size and position of all top level windows, it’s easy for it to just draw window decorations during compositing into the overlay window using graphics operations (e.g. OpenGL), without ever creating an actual X frame window and reparenting. Some compositing window managers do operate this way.

On the other hand, a window manager may need to support both a compositing and a non-compositing mode, for compatibility with older or unsupported graphics hardware. In this case, it needs to implement reparenting and frame windows for non-compositing mode anyway, so additionally implementing drawing window decorations using graphics operations becomes redundant. This is why may other compositing window managers still choose to reparent.

Ready For Some Code?

If you’ve read everything up to this point, you’re probably holding back the urge to cry out "Enough talk - show me some code!" I don’t blame you.

In the next installment in this series, I will walk you through a basic implementation of a reparenting, non-compositing window manager. Impatient? Check out the code on GitHub!