2014-07-05

Introducing Plasma Fence

Two weeks ago I got the delightful news that XCOM: Enemy Unknown was made available on Linux. I enjoyed playing it last year on Windows and was ready to play it again on Linux.

XCOM

Unfortunately, after I downloaded it and started playing, the game crashed. I started again, trying to avoid what I thought had led to the crash, but a different situation led to another crash. That was a disappointment. Later, when I reported my problem, I learned that the free driver for my graphics card, a Radeon HD 4870, is not supported by the game developers. OK, let's try installing the closed source driver from AMD's web site. After running into more problems, I learned that the Radeon HD 4870 is not supported anymore by AMD! So much time wasted, and so much frustration...

Anyway, when the game is not crashing with the open source drivers, it runs very well, fast enough and without major graphics glitches, so I wanted to know more about the crash. Maybe it was something not that hard to fix?

The first step was to get a call stack. The crash occurred inside the libc, but the bug was not actually in the libc: the metadata of its heap allocator (used by malloc, free...) had been damaged. So the question was: who is damaging the heap?

#0  malloc_consolidate (av=av@entry=0x7fbf78000020) at malloc.c:4151
#1  0x00007fbf920863ad in _int_free (av=0x7fbf78000020, p=<optimized out>, have_lock=0) at malloc.c:4057
#2  0x0000000002accbd7 in nv::mem::free(void const*) ()
#3  0x0000000002760ed6 in operator delete(void*) ()
#4  0x000000000166b743 in FSynchronizedActorVisibilityHistory::SetStates(FActorVisibilityHistoryInterface*) ()
#5  0x0000000001a8ddcc in FSceneRenderer::RenderFinish(unsigned int) ()
#6  0x0000000001aa3685 in FSceneRenderer::Render() ()
#7 ...

Chasing a bug in a big application like a AAA game is challenging, even more so when you don't have the source, but I thought I could at least try some of the debugging tools available on Linux.

I first tried Valgrind, which helps find memory leaks and illegal accesses. Unfortunately it made the game run so slowly that I could not get it to start at all.

Then I thought that if I could protect some part of the heap metadata, I might find the offending code more quickly. For that, I tried eFence, a library that puts fences around memory blocks allocated with malloc and similar functions. It uses a technique I had already experimented with on Windows, which consists of allocating an extra page next to each allocation and giving that page no permissions at all: no thread can read or write it. That way, if a stray pointer starts reading or writing to it, you are immediately notified with a segmentation fault signal/exception.
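
To make the guard page idea concrete, here is a minimal sketch for Linux. It is not the code of eFence or of Plasma Fence: the names fenced_alloc and fenced_free are made up for illustration, and the sketch assumes the caller remembers the block size and does not care about alignment. The block is placed so that its last byte sits right before a page that has no permissions.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): returns `size` writable bytes
// whose end touches a page that nobody can read or write.
void* fenced_alloc(std::size_t size) {
    if (size == 0) size = 1;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;  // round up
    const std::size_t total = (data_pages + 1) * page;        // +1 guard page

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;

    // The last page of the mapping becomes the fence.
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }
    // Push the block against the fence so an overrun of even one byte faults.
    return guard - size;
}

// Hypothetical counterpart: needs the same `size` that was passed to
// fenced_alloc in order to recompute where the mapping starts.
void fenced_free(void* p, std::size_t size) {
    if (p == nullptr) return;
    if (size == 0) size = 1;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;
    char* guard = static_cast<char*>(p) + size;
    munmap(guard - data_pages * page, (data_pages + 1) * page);
}
```

With this layout, writing to p[size] lands in the fence and raises SIGSEGV at the exact faulty instruction. The price is at least two pages per allocation, and a pointer that is not aligned the way malloc would align it.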

Unfortunately I discovered that the eFence source code I had been able to find was missing several newer allocation functions introduced by the libc, like posix_memalign(), which allocates memory with a specific alignment.
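
As an illustration of why an aligned allocation function needs its own handling in a fence library, here is a rough sketch of a fenced allocator with the same calling convention as posix_memalign(). The name fenced_memalign is hypothetical, and this is only one possible approach, not what eFence, Duma or Plasma Fence actually do. It assumes the alignment is at most one page, and it rounds the size up to a multiple of the alignment so that the block can still end exactly at the fence.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Hypothetical function (illustration only). Returns 0 on success or an
// errno value on failure, like posix_memalign does.
int fenced_memalign(void** out, std::size_t alignment, std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // posix_memalign wants a power of two that is a multiple of sizeof(void*).
    // Limiting the alignment to one page is a simplification of this sketch.
    if (alignment < sizeof(void*) || (alignment & (alignment - 1)) != 0 ||
        alignment > page) {
        return EINVAL;
    }
    if (size == 0) size = 1;

    // The fence is page aligned, so (fence - rounded) is a multiple of
    // `alignment` as long as `rounded` is one too.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    const std::size_t data_pages = (rounded + page - 1) / page;
    const std::size_t total = (data_pages + 1) * page;

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return ENOMEM;

    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return ENOMEM;
    }
    *out = guard - rounded;
    return 0;
}
```

The trade-off is visible in the rounding: up to alignment - 1 bytes after the requested size are still writable, so a small overrun into that slack goes unnoticed.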

I went on a Web search journey and finally found a fork of eFence called Duma, sporting the missing functions. I started using it, but after spending a day decompiling a piece of code I thought was faulty, I found that the actual problem came from a function that Duma did not override: malloc_usable_size().
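
Why does that one function matter? If a library replaces malloc but not malloc_usable_size(), the libc version ends up inspecting allocator headers that the replacement never wrote. The sketch below shows one way a fence library could answer the question itself, by storing the requested size just before the user block. The names sized_fenced_alloc and fenced_usable_size are hypothetical, and this is not how Duma or Plasma Fence implement it.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical allocator (illustration only): like a plain fenced block,
// but with the requested size recorded right before the user pointer.
void* sized_fenced_alloc(std::size_t size) {
    if (size == 0) size = 1;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t needed = size + sizeof(std::size_t);  // header + data
    const std::size_t data_pages = (needed + page - 1) / page;
    const std::size_t total = (data_pages + 1) * page;      // +1 guard page

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;

    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }
    char* user = guard - size;
    // memcpy because the header address may not be suitably aligned.
    std::memcpy(user - sizeof(size), &size, sizeof(size));
    return user;
}

// Hypothetical replacement for malloc_usable_size: reads the header back.
// Like the real function, it returns 0 for a null pointer.
std::size_t fenced_usable_size(const void* p) {
    if (p == nullptr) return 0;
    std::size_t size = 0;
    std::memcpy(&size, static_cast<const char*>(p) - sizeof(size), sizeof(size));
    return size;
}
```

Note that the header is not protected: a write just before the block would corrupt it, which is exactly the kind of damage a fence on that side would catch.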

I eventually implemented the missing function and restarted XCOM, but after 10 minutes of waiting with no window opened, I gave up and decided to write my own fence library from scratch.

Because the "e" in eFence means "electric", I could not use that word to name my library. I thought about "fire", but it reminds me too much of the concept of a "firewall". The only thing left was "plasma", so I decided to name that library "Plasma Fence" :)

It is written in C++ and uses procedural programming, which makes it very close to C. I also experimented with a different case for naming functions and variables. I am used to PascalCase for classes and functions, and lower camelCase for variables. On Linux, I found that snake_case is very popular. Because Plasma Fence is a Linux-only library, I decided to try that naming style and see how I felt about it.

I decided to publish that library hoping it will help people in the same situation as me. Plasma Fence offers all the needed malloc functions I could think of, and I also wanted to make it fast enough to debug a huge program like XCOM. Most of the time a program's memory allocations are very small, so I optimized that case by keeping a dedicated list of already fenced buffers that can be handed out quickly when the application requests one. With that strategy, I was able to run big programs like XCOM of course but also Firefox, the Gimp and several medium sized applications (VLC, Audacious...) without any visible performance penalty.
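
The idea of a list of already fenced buffers can be sketched as follows. This is a simplified illustration under my own assumptions, not the actual Plasma Fence code: the class name small_block_pool and its methods are hypothetical, blocks are limited to one page, the caller must pass the size back on release, and the sketch is neither thread safe nor wired into malloc (it uses std::vector, which a real malloc replacement could not do).

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Hypothetical pool (illustration only): fences many small slots with one
// mmap call, then serves requests from a list without further mmap calls.
class small_block_pool {
public:
    explicit small_block_pool(std::size_t batch = 64)
        : page_(static_cast<std::size_t>(sysconf(_SC_PAGESIZE))), batch_(batch) {}

    // Returns `size` bytes (at most one page) that end right at a fence.
    void* acquire(std::size_t size) {
        if (size == 0 || size > page_) return nullptr;
        if (free_guards_.empty() && !refill()) return nullptr;
        char* guard = free_guards_.back();
        free_guards_.pop_back();
        return guard - size;
    }

    // Puts the slot back in the list; `size` must match the acquire call.
    void release(void* p, std::size_t size) {
        if (p != nullptr) free_guards_.push_back(static_cast<char*>(p) + size);
    }

private:
    // Maps `batch_` slots at once; each slot is a data page plus a guard page.
    bool refill() {
        const std::size_t slot = 2 * page_;
        void* base = mmap(nullptr, batch_ * slot, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) return false;
        const std::size_t before = free_guards_.size();
        for (std::size_t i = 0; i < batch_; ++i) {
            char* guard = static_cast<char*>(base) + i * slot + page_;
            if (mprotect(guard, page_, PROT_NONE) != 0) {
                free_guards_.resize(before);  // forget the partial batch
                munmap(base, batch_ * slot);
                return false;
            }
            free_guards_.push_back(guard);
        }
        return true;
    }

    std::size_t page_;
    std::size_t batch_;
    std::vector<char*> free_guards_;  // fences of the slots that are free
};
```

In this sketch the expensive system calls happen only when the list runs empty, so most small requests cost a vector pop. The mappings are never returned to the system, which a real library would have to weigh against memory usage.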

With XCOM, however, I cannot reach the main menu because Plasma Fence detects an illegal access very soon after the game starts. I opened a bug on the Mesa bug tracker and I hope an expert will find out what is happening. Meanwhile I would like to do more, but I still need to understand how to build a debug version of an official library on Ubuntu and find out how to activate it for the whole system.