Development Blog

Where we pretend to know how to code.


Replay: It's here, but what took so long?

Published: 2024-09-04

Author: Teddi

After many years of complaints, begging, prodding, and poking, we finally have replays on [BB]! 🎉

However, the question remains: what took so long?

But wait a minute, other servers (including those in CS:S and CS:GO!) have had replays for years!

If we look at how most other servers handle replay systems (often called a WR Bot), they usually record runs similarly to how we do, but only keep the most recent data available. On top of that, they don’t offer much customization with the data they collect. For games like CS:S or CS:GO this approach works fine. But for Garry’s Mod, where we have a lot more control and flexibility, why not aim for something better?

This became my wishlist:

  • Store as many replays as possible. I want every All-Time on the leaderboard logged and recorded. If it isn’t, how is a run valid?
  • I want to be able to store historical records of replays. It would be neat if we could show players a history of their runs, or at the very least how they’ve improved to become an All-Time great.
  • I want to offer more control over the camera. If players want to analyse something frame-by-frame, this should be possible.
  • I want to offer more control over the playback. If players want to slow down the replay, this should be possible. Speed it up? Let’s go Sonic.
  • It should be possible to jump to any part of the run with an instant click. None of this having to wait.
  • Players should be able to independently watch a replay of a run. No sharing bots, no needing bots. Bots shouldn’t even be part of the equation. We have control of the player camera.
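
To make the playback items on that wishlist a bit more concrete, here is a minimal Python sketch of seeking, speed control, and frame stepping. It is not the actual [BB] implementation: the `Frame` and `ReplayPlayer` names, the idea that a replay is a list of frames recorded at a fixed tick rate, and the tick rates used are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One recorded tick; only a position here, purely for illustration."""
    x: float
    y: float
    z: float


class ReplayPlayer:
    """Hypothetical playback cursor over a list of fixed-rate frames."""

    def __init__(self, frames: list[Frame], tick_rate: float = 66.0) -> None:
        if not frames:
            raise ValueError("a replay needs at least one frame")
        self.frames = frames
        self.tick_rate = tick_rate  # assumed recording rate, frames per second
        self.speed = 1.0            # 0.5 = slow motion, 2.0 = double speed
        self.cursor = 0.0           # fractional frame index

    def _clamp(self, index: float) -> float:
        """Keep the cursor inside the recorded run."""
        return max(0.0, min(index, float(len(self.frames) - 1)))

    def seek(self, fraction: float) -> None:
        """Jump straight to a point in the run, given as 0.0..1.0."""
        self.cursor = self._clamp(fraction * (len(self.frames) - 1))

    def advance(self, dt: float) -> None:
        """Move the cursor by dt seconds of real time, scaled by speed."""
        self.cursor = self._clamp(self.cursor + dt * self.tick_rate * self.speed)

    def step(self, frames: int) -> None:
        """Frame-by-frame stepping; negative values step backwards."""
        self.cursor = self._clamp(float(int(self.cursor) + frames))

    def current(self) -> Frame:
        """Return the frame the cursor is currently on."""
        return self.frames[int(self.cursor)]


# A made-up 101-frame run recorded at 10 frames per second.
player = ReplayPlayer([Frame(float(i), 0.0, 0.0) for i in range(101)], tick_rate=10.0)
player.seek(0.5)      # jump to the middle of the run (frame 50)
player.speed = 2.0
player.advance(1.0)   # one real second at double speed = 20 frames
player.step(-1)       # back up a single frame
```

Because the whole run is an indexable list, jumping to any point is just arithmetic on the cursor, which is why no waiting is needed. The catch is that the viewer needs the full replay data up front, so the cost moves to storage and delivery.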

That being said, when you try to create a more expansive system, you’re bound to run into the same challenges others have faced. There’s a reason why many servers only keep the latest record, limit how many people can record, or use opt-in timers. It all comes down to resource usage - it can become a major issue if not managed carefully.

Storage Woes

The biggest issue for years was our disk drives. Back when we were in Texas, we were working with 2x 2TB HDDs - yep, not SSDs, just old-fashioned hard drives. If I even tried to open a file larger than 50MB while players were online, I’d instantly hear complaints like, “Omg, lag?!” So while we technically had the storage space, we didn’t have the speed to match. Writing multiple files to disk at the same time could’ve easily caused an I/O bottleneck, especially with the OS and other services running on the server.

Since then, we’ve upgraded to a new server chassis with SSDs! The speed boost is great, but we’ve traded storage space for it - now we’re down to just 250GB. Sure, we can write files without causing lockups, but we’ll burn through that storage quickly, making it a premium resource. So what other options do we have?

Cloud?

For years, [BB] has experimented with different cloud providers in various ways. At one point, we even hosted our FastDL on AWS, which was super fast but also a quick way to rack up unmanageable costs. Given the volume of replays we want to store and distribute, relying on a cloud solution like AWS just isn’t sustainable for the long haul.

These days, we take advantage of the Bandwidth Alliance between Cloudflare and Backblaze to create a CDN-like setup for FastDL. I’ve looked into using this system to store our replay data, and overall, it works pretty well. However, there can be a slight ‘lag’ when loading a file that’s not cached yet. Another issue is that Backblaze sometimes goes into prolonged maintenance, where you can read data but can’t write new data. This is fine for FastDL but not ideal for constantly uploading replay files. Cost-wise, though, it’s great: $6 per TB and free egress bandwidth when routed through Cloudflare! The only drawback is the reliability.

A Competitor Emerges

Back in 2021, Cloudflare announced R2, their answer to Amazon’s S3, which promises $0 egress fees (!!!) with 99.999999999% (eleven 9’s) durability at a cost of $0.015 per GB, or around $15 a month for a TB of storage. The only issue? R2 wasn’t widely available yet. It wouldn’t be until the back end of 2022 that it became generally available, although it was still missing some useful features.
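
For anyone who wants to check the maths, here is a small Python sketch of the storage pricing mentioned in this post. The per-GB and per-TB prices are the ones quoted above; the replay count and average replay size at the bottom are made-up numbers for illustration, not real [BB] figures.

```python
GB_PER_TB = 1000  # decimal terabyte, matching "around $15 a month for a TB"

R2_USD_PER_GB_MONTH = 0.015             # Cloudflare R2 price quoted in this post
B2_USD_PER_GB_MONTH = 6.0 / GB_PER_TB   # Backblaze "$6 per TB" quoted in this post


def monthly_storage_cost(size_gb: float, usd_per_gb_month: float) -> float:
    """Storage-only cost per month; egress is assumed free, as described above."""
    return size_gb * usd_per_gb_month


r2_per_tb = monthly_storage_cost(GB_PER_TB, R2_USD_PER_GB_MONTH)
b2_per_tb = monthly_storage_cost(GB_PER_TB, B2_USD_PER_GB_MONTH)
print(f"R2: ${r2_per_tb:.2f} per TB per month")
print(f"Backblaze: ${b2_per_tb:.2f} per TB per month")

# Hypothetical workload: 200,000 stored replays averaging 2 MB each.
replay_count = 200_000
average_replay_mb = 2.0
total_gb = replay_count * average_replay_mb / 1000
r2_workload_cost = monthly_storage_cost(total_gb, R2_USD_PER_GB_MONTH)
print(f"{total_gb:.0f} GB of replays on R2: ${r2_workload_cost:.2f} per month")
```

R2 costs more per TB than Backblaze on storage alone, so the appeal here is the promised durability and free egress rather than the lowest sticker price.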

To recap so far, our options for developing a replay system were:

  • Use our own server storage, but we’d run out of space quickly.
  • Use a reliable cloud storage provider (AWS), but we’d run out of money quickly.
  • Copy existing systems, which would probably never advance beyond “good enough.”
  • Wait for Cloudflare R2 to become available and hope it delivers on its promises.

So, I sat on my hands with option #4. If R2 didn’t turn out to be good enough, then option #3 would be the fallback. The original plan was to start testing R2 and see how it performed during Q3 2023, but other things got in the way and that work got pushed back to Summer 2024.

The Final Push

About two weeks before Replays were set to launch, I scrapped the entire Replay system I’d been working on for years. It had become fragmented over the past 3-5 years with different ideas and goals. I wasn’t happy with it and so I decided to start fresh. If we were going to make this work, we’d do it right with as few preconceived notions as possible. After about an hour of work, I had a prototype that was already better than the old system. A bit more effort and I had something I was actually happy with, even if it was just the recording side of things.

Secondary concerns

Another concern I had during this time was how to get the data to the player. When Replay development first began, the current [BB] API didn’t exist. While we likely would have built something to handle this, we would have been limited by srcds’ internal network speeds, which are around 20kbit/s. Additionally, we can only send 64KB of data through the net system in one go, meaning we’d have to be careful about how much data we send at once. We’d need to split it up and stream it properly to avoid issues like buffer overflows or net stream lockups.
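
To give a feel for what that would have meant, here is a rough Python sketch of the splitting step and the transfer time it implies. The 64KB limit and the roughly 20kbit/s rate come from the paragraph above; the function names and the 1 MB replay size are hypothetical, and a real version would also need to leave room in each message for a header and pace the sends across ticks.

```python
MAX_MESSAGE_BYTES = 64 * 1024   # the 64KB per-message limit mentioned above
LINK_BITS_PER_SECOND = 20_000   # the "around 20kbit/s" figure mentioned above


def split_into_chunks(payload: bytes, chunk_size: int = MAX_MESSAGE_BYTES) -> list[bytes]:
    """Cut a payload into pieces that each fit inside one message."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def estimated_transfer_seconds(size_bytes: int,
                               bits_per_second: int = LINK_BITS_PER_SECOND) -> float:
    """Best-case transfer time, ignoring protocol overhead and pacing."""
    return size_bytes * 8 / bits_per_second


replay = bytes(1_000_000)  # hypothetical 1 MB replay file
chunks = split_into_chunks(replay)
seconds = estimated_transfer_seconds(len(replay))

print(f"{len(chunks)} messages, last one {len(chunks[-1])} bytes")
print(f"about {seconds:.0f} seconds to send")
```

Under these assumptions a single 1 MB replay takes 16 messages and several minutes to arrive, which is a long way from jumping to any part of a run with an instant click.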

Having the web API really solves this issue. You’re no longer capped by the internal network speeds - just the API speed and your local internet speed. From the game server’s perspective, that data doesn’t even exist! Since it completely bypasses srcds, stability and performance shouldn’t be affected any more than they already are with replay recording.

Ultimately

The rest is history, but I hope this gives you some insight into why it took so long to get Replays out. It wasn’t for lack of effort, but primarily a lack of a solid storage solution. I hope you enjoy the new system and I can’t wait to see all the replays set with new All-Times!

