VMPI - Part 1
Published: 2011-06-19
Author: Santamon
VMPI
What is VMPI and why should you care?
VMPI stands for ‘Valve Message Passing Interface’ and is effectively a way of spreading the compile load of a map across several computers, a form of distributed computing (‘cloud computing’, so to speak). You get a ‘master’ computer, which allocates and handles the work, whilst the ‘worker’ computers do the actual processing. So basically, it means I can use your spare CPU cycles to compile my map no matter where you are. Great! Sounds like it will solve all our (my) problems!..
Why should you care? Basically, because you’ll get RP faster and a more polished map. If it takes 10+ hours to compile my map each time, then the development cycle is going to crawl along like a snail, so faster compile times mean a better, more polished map!
The RP map is something that has never been done (as far as I’m aware) in Source before: an open city, instead of the normal RP maps, which are just a few streets with a big ‘city’ skybox around you. Getting this to work requires some ungodly visibility optimisation.
Will it solve your problem of long compile times?
Short answer: probably not.
Long answer:
It seems VMPI is very temperamental, but let’s start from the beginning.
It was a stormy Saturday afternoon. Rain poured down outside, and a strange man was seen in the flickering light of a lamppost… Sorry… I’ll get on with it. Teddi and I were discussing the RP map on Saturday afternoon, following the previous night’s test on it. We noticed that performance had started to drop, which is to be expected as the map expands. So I started to play around with optimisation techniques, and I came up with one that I thought would work. Great! Problem solved!.. Hmm
So I went to compile the map. 2 hours passed and VVis was still running and only on… 6 (for info, with VVis that does NOT mean it is 60% done; normal VVis progress doesn’t scale like that). Right, I wasn’t keen on waiting about 10 hours, but if need be, I will. A solution I’d heard about a while back was VMPI, so Teddi and I gave it a shot, using my computer as the ‘master compiler’, with his computer and the Maruuk server dedicating 4 threads each (a single quad-core processor). So this would mean 12 threads compiling the map, so it should compile in 1/3 of the time, in theory!
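To spell out the "1/3 of the time" reasoning, here is a rough sketch of the best-case arithmetic. It assumes the work divides perfectly across threads with zero networking cost (which, as the rest of this post shows, it doesn't), and the 10-hour baseline is only my guess, not a measured figure. The function name is made up for illustration.

```python
def ideal_compile_hours(baseline_hours: float,
                        baseline_threads: int,
                        total_threads: int) -> float:
    """Best-case compile time if the work scaled perfectly with thread count.

    Assumption: no network overhead and perfectly divisible work.
    """
    if baseline_threads <= 0 or total_threads <= 0:
        raise ValueError("thread counts must be positive")
    return baseline_hours * baseline_threads / total_threads


# Hypothetical 10-hour compile on my 4 threads, spread over 12 threads.
estimate = ideal_compile_hours(10.0, 4, 12)
print(f"Best case: {estimate:.2f} hours")
```

Treat the result as a lower bound on compile time rather than a prediction.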
Unfortunately, we couldn’t get it to work. After a hell of a lot of playing around on my local LAN (thank god for Wireshark), I managed to get my i5 machine and the quad-core computer downstairs working and compiling the map in tandem. In theory, if it can work across a LAN, it can work over the internet, surely?
So Teddi and I tested it, perfect, it worked now! We had my computer, my downstairs computer, Maruuk server and Teddi’s computer all donating CPU cycles, 16 threads of processing power! It should compile in no time surely!
After 2 hours, there was very little to no progress. We even had people from BB donating their CPU cycles. We had about 26 threads processing visibility calculations, and yet it was still going slowly. After just over 2 hours, we abandoned it, suspecting that my slow upstream (curse you, ADSL!) was the cause.
Sunday broke, and Teddi and I set up the Maruuk server, with its blistering 1 Gbit connection (1000BASE-SX, I assume), to be the master and rigged our computers up to process the map. Again though, after 2 hours there was little progress, even with people from BB again kindly donating their spare CPU cycles. No luck! Okay, right, we need a benchmark. After all, the map we were compiling had never been fully compiled, so we didn’t know how long to expect a single processor to take. You know, how long is a piece of string… etc.
So we compiled an older version of the map, knowing that VVis on this version took 1 hour, 48 minutes. We kicked it off and at first progress was very slow, but once more threads joined, progress skyrocketed. It still took nearly 3 hours to compile, though! Way too long again. Was it because we didn’t have many people at the start? Because of the networking latency involved? Because of the way VMPI allocates work? I suspect all three are factors.
The first issue is probably down to timing: as Teddi spams more and more event notifications, more people take the time to download the batch file, move it to a folder and double-click it, so the worker count only builds up gradually.
The second is that pretty much everyone (except the server) will be running on some variant of ADSL. ADSL is by nature asymmetric (Asymmetric Digital Subscriber Line), so your upstream (normally about 448 Kbps, or roughly 50–60 KB/s, off the top of my head) is much slower than your downstream. This means you can’t get data back to the ‘master’ very fast, so you get spells where your computer sits there just uploading data and not crunching numbers!
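For anyone who wants to check the upstream figures, here is a quick sketch of the conversion. The 448 Kbps value is my off-the-top-of-my-head number from above, and the 5600 KB result size is purely hypothetical; I don't know how much data VMPI actually sends back per chunk of work. Protocol overhead is ignored, so real uploads will be slower.

```python
def kbps_to_kilobytes_per_sec(kbps: float) -> float:
    """Convert kilobits per second to kilobytes per second (8 bits per byte)."""
    return kbps / 8.0


def upload_seconds(payload_kilobytes: float, upstream_kbps: float) -> float:
    """Time to upload a payload at a given upstream rate, ignoring overhead."""
    return payload_kilobytes / kbps_to_kilobytes_per_sec(upstream_kbps)


upstream_kbps = 448  # typical ADSL upstream (assumed figure)
print(kbps_to_kilobytes_per_sec(upstream_kbps))   # 56.0 KB/s

# Hypothetical 5600 KB of results to send back to the master.
print(upload_seconds(5600, upstream_kbps))        # 100.0 seconds
```

So even a few megabytes of results would tie up a worker's upstream for well over a minute.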
This leads on to my third point: VMPI seems to allocate you work, let you crunch it, and then have you spend ages uploading the results to the master. That means you get massive periods where everyone is idle, uploading their work, whereas when compiling locally you don’t get this.
It seems, then, that we get very inefficient use of everyone’s CPU power, mostly as a result of networking issues. So basically, to get over the network inefficiencies, we need a lot (and I mean a lot) of CPUs.
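To put a rough number on "a lot", here is a toy model of the behaviour I described. It assumes each worker strictly alternates between crunching a chunk and uploading the result with no overlap, and that the master never becomes a bottleneck. That is only my impression of what VMPI does, not documented behaviour, and the 60 s and 180 s timings are invented for illustration.

```python
import math


def worker_utilisation(compute_s: float, upload_s: float) -> float:
    """Fraction of the time a worker is actually computing.

    Assumption: compute and upload never overlap.
    """
    if compute_s <= 0 or upload_s < 0:
        raise ValueError("compute_s must be > 0 and upload_s must be >= 0")
    return compute_s / (compute_s + upload_s)


def remote_threads_needed(local_threads: int,
                          compute_s: float,
                          upload_s: float) -> int:
    """Remote threads needed to match the throughput of local_threads."""
    return math.ceil(local_threads / worker_utilisation(compute_s, upload_s))


# Hypothetical chunk: 60 s of crunching, then 180 s of uploading.
print(worker_utilisation(60, 180))        # 0.25
print(remote_threads_needed(4, 60, 180))  # 16
```

In that made-up case, 16 remote threads would only just match one local quad core, which at least fits the feel of what we saw.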
The final test will be to see whether linking two computers on my LAN speeds up the compile time and avoids these network issues, but this won’t be done until tomorrow.
In this blog post, I’ve only scratched the surface of what we’ve done in the past few days, but hopefully it will give you a vague idea as to why we’ve been asking you to take part in these little ‘experiments’ and use your CPU cycles. And for that, everyone, I thank you for putting up with it!
You may remember the map we spent 3 hours compiling? Well, as thanks for all your help, here is a little screenshot of what we got at the end (after I ran Vrad on it as well):
Disclaimer: I wrote this quickly, rather late at night, so there are probably a lot of spelling and grammatical mistakes.