Installing a Xeon Phi coprocessor on Debian Squeeze

Written by erik - 29 july 2013

Today I installed a Xeon Phi coprocessor in a Debian box. The software is only supported for Red Hat systems so I was a little bit afraid. Fortunately it was easy. I took most of my information from the readme files of mpssd and from that guy. Thanks to him because he did most of the job.

Ok, let's get started. Hardware wise, you need a 16x PCI-express 3.0 slot and you need the BIOS to be configured to be compatible with devices with more than 4GB of ressource. There was a NVIDIA K20 with 6GB of memory plugged on the machine I used; but still the BIOS complained. A simple BIOS configuration allowed the machine boot. (Check the BIOS configuration under PCI-express.)

From here I did everything as root. Get the MPSS archive from Intel's website. (Mine was called mpss_gold_update_3-2.1.6720-15-rhel-6.3.tar). Untar it with tar xvf mpss_gold_update_*.tar. You'll get a bunch of RPM files. Install alien if you do not already have it with aptitude install alien. Convert all the RPM files except the -kmod- one into DEB files with for file in `ls *.rpm | grep -v kmod` ; do alien --scripts $file; done. Install them all with for file in `ls *.deb | grep -v kmod`;do dpkg --install $file; done.

Up to now, we installed all the files you need beside the kernel module. The kernel module that is provided is only for Red Hat system (with a kernel 2.6.32). My machine had a kernel 2.6.39, so I needed to do a little bit more work. Fortunately, the source for the kernel are available in the src/ directory. Convert that RPM file into a tgz with alien: alien -t intel-mic-kmod-*.src.rpm. When you uncompress it with tar zxvf intel-mic-kmod-*tgz you get a .spec file you do not care about and a tar.bz2 file that contain the kernel module source code. Uncompress it with tar jxvf intel-mic-kmod-*.tar.bz2. Compile it with make and install it with make install. You should now have a kernel module in /lib/modules/`uname -a`/extra/mic.ko. Somehow, the installation script does not run depmod to declare the module so you need to run it manually with depmod.

To use a Xeon Phi you also need a service running on the host. The files are already installed, but there is no startup script for the service. Go to /etc/init.d. We will use the skeleton script as a base. cp skeleton mpssd. You need to edit that file and perform the following modifications. Change the 3rd line to Provides: mpssd. Change line 22 to DESC="MIC service deamon". Change line 23 to NAME=mpssd. Comment line 25, you don't need arguments. At the beginning of function do_start, add the following line modprobe mic; [ -d "/var/lock/subsys" ] || mkdir /var/lock/subsys. The first part makes sure the mic kernel module is loaded. That should not be necessary because a udev rule was added earlier, but somehow I needed it. The second part ensures that the directory used by mpssd to store its lock file is properly created. Save and quit. Add the proper permission to the startup script with chmod +x mpssd. And at that point the service can be started with service mpssd start. Starting and stopping the service takes time (about a minute).

Now you need to be able to use the card. Typically one logs on the card using ssh. The card automatically takes the private IP address but it is not routable automatically from linux. So you need to add a network interface in /etc/network/interfaces:

iface mic0 inet static

Load the interface with ifup mic0. To log on the card, you will need to ssh to the card using key-based autentification. If you have local users to the machine (I do not think that NIS based ones are supported with that technique), you should make sure that you have an ssh key on the ones that want to use the MIC card. If not, just get them to run ssh-keygen. After that, you need to load the ssh keys on the MIC card but mpssd need to be off to do that. Just running service stop mpssd; micctrl --resetconfig; service start mpssd should do the trick. I am not sure at which point, but an entry to /etc/hosts was made. So ssh mic0 should just work; otherwise ssh

From there you are pretty much all set as you have a working installation of your Xeon Phi. You will certainly need ICC to do anything useful. Some people mount the host filesystem on the device, which can be pretty useful. I won't cover these because they are not really Debian specific and other people will certainly explain these better than me. NFS mounting is explained in MPSS readme file but I haven't tested it.

Happy hacking

EDIT on 2014/01/28: it seems that to use the offload mode, one need to have proper permission on /dev/mic/*. You can edit the udev rule in /etc/udev/rules.d/50-udev-scif.rules.

Classified in : Homepage, geek, programming, en - Tags : xeonphi, linux - 7 comments

Security questions

Written by erik - 30 august 2012

Today I got a call because I applied for a credit card a few days ago. A part of the discussion went like this:

  • In order to verify your identity, I am going to ask you questions ONLY YOU knows the answer of. Are you ready?
  • Well, how are you going to know whether I lie or not?
  • I will know! I have the answers in your file.
  • So you know the answers too, I am not the only one knowing them.
  • long silence

These so called security questions makes no sense at all. Anybody that knows me would be able to answer them (my brother, my boss). Anybody that has access to my file will be able to answer them (my banker, my insurer). Anybody that asks me security questions will be able to answer them (phone companies, thousands of websites). Why are we still relying on such obvious questions?

Classified in : Homepage, 3617mylife, en - Tags : none - 2 comments

The best AI for cassis

Written by erik - 16 august 2012

Hi folks,

The AI I had written for Cassis were pretty bad. They were basically either playing randomly or just trying not to die. I was debatting on how to build the best AI for that game. I gave myself the goal of putting that AI running on a phone. There are only 6 points and 15 possible move possible at the first round. Here I present the simple techniques I used to reach this goal. I wrote everything with on my laptop first and then ported it to my android phone. I published version 0.91 of cassis for android.

The basics.

The AI is working on a brute force basis. Basically, it considers all the possible moves up to the end of the game and play the best. To see if this is feasible, I started by trying to decide whether player 1 wins or loses. So I wrote the simplest code which recursively expand all moves:

bool player1wins(Game g)
  if (g.gameover()) return g.winner() == PLAYER1;
  if (g.whoseturn() == PLAYER1) {
    for (edge e = 0; e != 15; ++e) {
      if (g.edge(e) != NOT_PLAYED_YET) continue;
      Game g2 = g;;
      if (player1wins(g2)) return true;
    return false; //could not find a winning move. That means player 1 lost.
    //pretty much the same thing with player2

The code is pretty straight forward, if it is player 1's turn, it tries all the possible move and stops if it finds a winning one. The code for player 2 is the same as for player 1 except it stops if it find a losing move for player 1. I let it run for about 15 minutes on my laptop and it did not complete. Clearly I was doing something wrong. Let's do a little bit of maths. At the first iteration there are 15 possible moves, at the second one 14 possible move, then 13, ... Overall, there are 15! case to examine. That's about 1.3 x 10^12 combination. If the code can examine 1 combination per clock cycle (which it does not) and the processor runs at 1GHz, it will still take 1300 seconds (20 minutes). Provided it definitely need more that 1 clock cycle per combination, that is definitely not a working approach. It will never run fine on a phone.

Hashing based.

The main problem with that approach is that it processes the same combination multiple times. If the first moves are Player1/Edge1, Player2/Edge2, Player1/Edge3, the code will compute these combination even if it processed Player1/Edge3, Player2/Edge2, Player1/Edge1 before, despite they are actually the very same board. If the former is winning then the latter is winning too and vice versa (same goes with losing). If we can remember every board we processed so far, we will process less of them. Let's do some math first here, for each edge of the graph, it can be either played by 1, by 2 or not played yet. So each edge can have 3 different states, and there are 15 edges in the game. So there are at most 3^15 possible boards, that about 14 million boards. (Actually there are less possible board since some board are not reachable because a player lost first and the game stopped. but let's use that number as a worst case scenario.) That is not too many board so that seems like a reasonnable thing to try.

To be able to leverage that number, I need a simple way of naming a board. Otherwise, I would need to go through the list of all the board seen so far to decide whether I need to process it of not. That would give a quadratic algorithm and 14 million squared does not seem reasonnable anymore. Instead I can just store the boards in a table and annotate them winning or losing. If I have a simple way of building an offset from a board, then, I can just use that. Fortunately, the state of a board can be encoded with a number of 15 digit in base 3. Each digit represents an edge and is worth 0 for unplayed, 1 for played by player 1, and 2 for played by player 2. Actually, I would rather not allocate a table of size 14 millions (that won't work well on a phone), so I choose to use an associative data structure instead such as a hash map (actually, the code use std::map to be precise, so that's a red-black tree. but a hashmap would work the same (probably better actually)). And since I am going with hashing, I can directly make one hash map for each round since if two boards are in a different round they are clearly different. The code now looks like that:

bool player1wins(Game g)
  if (g.gameover()) return g.winner() == PLAYER1;

  if (HASH[g.round()].exist(g.hash())) return HASH[g.round()][g.hash()];
  if (g.whoseturn() == PLAYER1) {
    for (edge e = 0; e != 15; ++e) {
      if (g.edge(e) != NOT_PLAYED_YET) continue;
      Game g2 = g;;
      if (player1wins(g2)) {
        HASH[g.round()][g.hash()] = true;
        return true;
    HASH[g.round()][g.hash()] = false;
    return false; //could not find a winning move. That means player 1 lost.
    //pretty much the same thing with player2

With this approach, it takes about 1.65 seconds on my laptop to decide whether player 1 wins or not. And the size of the hashes are distributed like this:

winning[0].size() == 1
winning[1].size() == 15
winning[2].size() == 15
winning[3].size() == 104
winning[4].size() == 217
winning[5].size() == 969
winning[6].size() == 2693
winning[7].size() == 6896
winning[8].size() == 14560
winning[9].size() == 21724
winning[10].size() == 27754
winning[11].size() == 21757
winning[12].size() == 12826
winning[13].size() == 2433
winning[14].size() == 132

That represents 112096 state to store. Note that it is much lower than the 3^15 that were predicted, because some of the exploration stops early once player find a winning move. Yet 112K states are a lot to store, and generating them still takes 1.65 seconds on a laptop. A phone will be slower.


So what is the problem with that version? Well, we are not exploiting any kind of symetry. Look at the number of state evaluated at round 1: "winning[1].size() == 15". There are 15 of them. It means that the algorithm is trying all the first move possible. But all these moves are actually the same move up to a permutation of the vertices. In that case the board are said to be isomorph. The idea is then to store only one permutation for each board. Reaching one permutation per board might be a little complicated to do, so I will settle for storing less permutation. One thing that is constant for all the permutation of a board is the "degree" of the vertices. What I mean by degree is the number of edges connected to a given vertex. I introduce the notion of normal form of a board, which is a permutation of the board where the vertices of high degree are first. For the purpose of computing the hash value of a board, I first normalize the board and then compute its hash value. That way all the move at level 1 (and many more) are detected as equivalent. When running that variant of the code now takes 0.084939 seconds, with a total of 3173 different boards traversed and the distribution of values is now:

winning[0].size() == 1
winning[1].size() == 1
winning[2].size() == 1
winning[3].size() == 5
winning[4].size() == 7
winning[5].size() == 39
winning[6].size() == 70
winning[7].size() == 194
winning[8].size() == 347
winning[9].size() == 557
winning[10].size() == 703
winning[11].size() == 660
winning[12].size() == 442
winning[13].size() == 136
winning[14].size() == 10

Actually, we can prune a little more by differenciating for each vertex the number of incident edges from player 1 and from player 2. It reduces the time to 0.052072 seconds, 1568 different boards and the distribution is:

winning[0].size() == 1
winning[1].size() == 1
winning[2].size() == 1
winning[3].size() == 5
winning[4].size() == 7
winning[5].size() == 28
winning[6].size() == 48
winning[7].size() == 119
winning[8].size() == 207
winning[9].size() == 313
winning[10].size() == 353
winning[11].size() == 278
winning[12].size() == 168
winning[13].size() == 37
winning[14].size() == 2

Reconstructing the move.

Alright, that is enough for me, less than 2K states and about 50 milliseconds in total on my laptop. Though, it only gives us which board is winning and which one is losing. We need to know what to play. So when a move of player 1 leads to a win (for player 1), or when a move of player 2 leads to a lose (for player 1), we need to record the move. One of the proble is that we no longer evaluate every single reachable state. Only the ones that were not considered similar. So we need to compute the permutation to the normal form, and permute the move properly. However, this is not enough because two state that have the same normal form might not have derived state that have the same normal form. (It is not easy to find an example, but it has something to do with the sorting operation and the fact that two isomorph board can have the same representative.)

What else?

I guess there are more things to do, but it is fast on my phone right now. The code on the phone is actually using a stupid data structure in place of a std::map or an hash map, just a simple array. (There is no STL that works on android and supports exception.) Most of the performance to gain probably come from language details now and that's not really fun to do. So I'll stop my optimization there. There are more things to do on cassis though, probably adding a simpler difficulty mode, adding a proper icon for the game and adding a 2 player mode.

In the mean time, can you beat my new IA? Here is a link to the apk file for android. The code has been updated on github.

Classified in : Homepage, geek, programming, en - Tags : none - no comments

Releasing Cassis v0.9

Written by erik - 16 july 2012

Hello everybody,

My first post in english will be dedicated to my first release of Cassis. Cassis is a two-player game that is based on graph theory. Six points are displayed on the screen. Each player at his/her turn adds an edge between two of the points. The first player who makes a triangle with his/her own color loses.

I am releasing Cassis for the Android platform (Android 2.3+ required). You will find here a link to the apk file to install. Cassis is released under the GPLv3 license. The source code (C++) is available on github. In the source code, you will also find a version for Linux that I used as a prototype. I can post a binary for debian squeeze if needed.

The Android version is fully written in C++. There is no java code involved. This is made possible by the use of the NDK and the NativeActivity interface. I actually developped Cassis as a first look into C++ programming for Android. You will most likely not be able to compile the apk by yourself since it depends on two external libraries that are not included: android-cairo and libpng-android. I will post later some more development informations.

Cassis is in version 0.9 because there are many things I still want to do:

  • Separate cairo and libpng as a shared library (does any body knows how to do that?)
  • A two-player mode
  • Two more level of difficulty
  • A proper icon

Have fun.

Screenshot: screenshot.

Classified in : Homepage, geek, programming, en - Tags : none - no comments

Rss feed of the category




Last articles

Last comments