Replay: It's here, but what took so long?
Published: 2024-09-04
Author: Teddi
After many years of complaints, begging, prodding, and poking, we finally have replays on [BB]!
However, the question remains: what took so long?
But wait a minute, other servers (including in CS:S and CS:GO!) have had replays for years!
If we look at how most other servers handle replay systems (often called a WR Bot), they usually just record runs similarly to how we do, but only keep the most recent data available. On top of that, they don't offer much customization with the data they collect. For games like CS:S or CS:GO this approach works fine. But for Garry's Mod, where we have a lot more control and flexibility, why not aim for something better?
This became my wishlist:
- Store as many replays as possible. I want every All-Time on the leaderboard logged and recorded. If it isn't, how is a run valid?
- I want to be able to store historical records of replays. It would be neat if we could show players a history of their runs, or at the very least how they've improved to become an All-Time great.
- I want to offer more control over the camera. If players want to analyse something frame-by-frame, this should be possible.
- I want to offer more control over the playback. If players want to slow down the replay, this should be possible. Speed it up? Let's go Sonic.
- It should be possible to jump to any part of the run with an instant click. None of this having to wait.
- Players should be able to independently watch a replay of a run. No sharing bots, no needing bots. Bots shouldn't even be part of the equation. We have control of the player camera.
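As a rough illustration of the playback items on that list, a replay can be treated as an indexed list of recorded frames with a cursor that moves through it. The sketch below is in Python rather than anything from the actual [BB] code, and every name in it (ReplayCursor, tick_rate, seek, advance) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplayCursor:
    """Hypothetical playback cursor over a run stored as numbered frames."""
    frame_count: int        # number of recorded frames in the run
    tick_rate: float        # assumed recording rate, in frames per second
    speed: float = 1.0      # 1.0 = real time, 0.5 = slow motion, 2.0 = double speed
    position: float = 0.0   # fractional frame index

    def _clamp(self) -> int:
        # Keep the cursor inside the recording and return the frame to display.
        self.position = min(max(self.position, 0.0), self.frame_count - 1)
        return int(self.position)

    def advance(self, dt: float) -> int:
        """Move playback forward by dt real seconds at the current speed."""
        self.position += dt * self.tick_rate * self.speed
        return self._clamp()

    def seek(self, fraction: float) -> int:
        """Jump straight to a point in the run, given as 0.0 (start) to 1.0 (end)."""
        fraction = min(max(fraction, 0.0), 1.0)
        self.position = fraction * (self.frame_count - 1)
        return self._clamp()
```

Because frames are addressed by index, jumping to the middle of a run is one multiplication rather than a wait, and slow motion or fast forward is just a different multiplier on the same step.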
That being said, when you try to create a more expansive system you're bound to run into the same challenges others have faced. There's a reason why many servers only keep the latest record or limit how many people can record or why they use opt-in timers. It all comes down to resource usage - it can become a major issue if not managed carefully.
Storage Woes
The biggest issue for years was our disk drives. Back when we were in Texas, we were working with 2x 2TB HDDs - yep, not SSDs, just old-fashioned hard drives. If I even tried to open a file larger than 50MB while players were online, I'd instantly hear complaints like, "Omg, lag?!". So while we technically had the storage space, we didn't have the speed to match. Writing multiple files to disk at the same time could've easily caused an I/O bottleneck, especially with the OS and other services running on the server.
Since then, we've upgraded to a new server chassis with SSDs! The speed boost is great, but we've traded storage space for it - now we're down to just 250GB. Sure, we can write files without causing lockups, but we'll burn through that storage quickly, making it a premium resource. So what other options do we have?
Cloud…?
For years, [BB] has experimented with different cloud providers in various ways. At one point, we even hosted our FastDL on AWS, which was super fast but also a quick way to rack up unmanageable costs. Given the volume of replays we want to store and distribute, relying on a cloud solution like AWS just isn't sustainable for the long haul.
These days, we take advantage of the Bandwidth Alliance between Cloudflare and Backblaze to create a CDN-like setup for FastDL. I've looked into using this system to store our replay data, and overall, it works pretty well. However, there can be a slight "lag" when loading a file that's not cached yet. Another issue is that Backblaze sometimes goes into prolonged maintenance, where you can read data but can't write new data. This is fine for FastDL but not ideal for constantly uploading replay files. Cost-wise, though, it's great: $6 per TB per month and free egress bandwidth when routed through Cloudflare! The only drawback is the reliability.
A Competitor Emerges
Back in 2021, Cloudflare announced R2, their answer to Amazon's S3, which promises $0 egress fees (!!!) with 99.999999999% (eleven 9's) durability at a cost of $0.015 per GB per month, or around $15 a month for a TB of storage. The only issue? R2 wasn't widely available yet. It wouldn't be until the back end of 2022 that it became generally available, although it was still missing some useful features.
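To put the storage prices mentioned so far side by side, here is a small back-of-the-envelope calculation in Python. It only uses the per-GB and per-TB figures quoted in this post; it assumes decimal terabytes (1 TB = 1000 GB) and ignores request fees, free tiers, and anything else a real bill would include. The function names are made up for the example.

```python
# Prices quoted in this post, in USD. Request fees and free tiers are ignored.
R2_PER_GB_MONTH = 0.015   # Cloudflare R2 storage
B2_PER_TB_MONTH = 6.00    # Backblaze storage
GB_PER_TB = 1000          # assumption: decimal terabytes


def monthly_cost(stored_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage bill in USD for a given amount of stored data."""
    return round(stored_gb * price_per_gb_month, 2)


def r2_monthly_cost(stored_gb: float) -> float:
    return monthly_cost(stored_gb, R2_PER_GB_MONTH)


def b2_monthly_cost(stored_gb: float) -> float:
    return monthly_cost(stored_gb, B2_PER_TB_MONTH / GB_PER_TB)


# 250 GB is the size of our SSD storage; the larger sizes are arbitrary examples.
for stored_gb in (250, 1000, 5000):
    print(f"{stored_gb:>5} GB  R2: ${r2_monthly_cost(stored_gb):.2f}  "
          f"Backblaze: ${b2_monthly_cost(stored_gb):.2f}")
```

On storage price alone Backblaze comes out cheaper, so the case for R2 rests on availability for writes rather than on the monthly bill.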
To recap so far, our options for developing a replay system were:
- Use our own server storage, but we'd run out of space quickly.
- Use a reliable cloud storage provider (AWS), but we'd run out of money quickly.
- Copy existing systems, which would probably never advance beyond "good enough."
- Wait for Cloudflare R2 to become available and hope it delivers on its promises.
So, I sat on my hands with option #4. If R2 didn't turn out to be good enough, then option #3 would be the fallback. The original plan was to start testing R2 and see how it performed during Q3 2023, but other things got in the way and that work got pushed back to Summer 2024.
The Final Push
About two weeks before Replays were set to launch, I scrapped the entire Replay system I'd been working on for years. It had become fragmented over the past 3-5 years with different ideas and goals. I wasn't happy with it and so I decided to start fresh. If we were going to make this work, we'd do it right with as few preconceived notions as possible. After about an hour of work, I had a prototype that was already better than the old system. A bit more effort and I had something I was actually happy with, even if it was just the recording side of things.
Secondary Concerns
Another concern I had during this time was how to get the data to the player. When Replay development first began, the current [BB] API didn't exist. While we likely would have built something to handle this, we would have been limited by srcds' internal network speeds, which are around 20kbit/s. Additionally, we can only send 64KB of data through the net system in one go, meaning we'd have to be careful about how much data we send at once. We'd need to split it up and stream it properly to avoid issues like buffer overflows or net stream lockups.
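To give a feel for what that splitting would have involved, here is a minimal Python sketch (not the Lua that would actually run on the server). It cuts a payload into pieces no larger than the 64KB limit and estimates how long the whole thing would take at roughly 20kbit/s. The 1 MB replay size is a made-up example, the function names are hypothetical, kbit is taken as 1000 bits, and real net messages would lose a few bytes per chunk to headers.

```python
NET_MESSAGE_LIMIT = 64 * 1024   # bytes per message, the 64KB cap mentioned above
SRCDS_RATE_BITS = 20_000        # roughly 20kbit/s, the figure quoted above


def split_into_chunks(payload: bytes, limit: int = NET_MESSAGE_LIMIT) -> list[bytes]:
    """Cut the payload into consecutive pieces of at most `limit` bytes."""
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def transfer_seconds(size_bytes: int, rate_bits_per_s: int = SRCDS_RATE_BITS) -> float:
    """Time to push size_bytes through a link of the given speed."""
    return size_bytes * 8 / rate_bits_per_s


replay = bytes(1_000_000)               # hypothetical 1 MB replay file
chunks = split_into_chunks(replay)

print(len(chunks))                      # number of messages needed
print(len(chunks[-1]))                  # size of the final, partial chunk
print(transfer_seconds(len(replay)))    # seconds to send everything
```

Under these assumptions a single 1 MB replay needs 16 messages and around 400 seconds to arrive, which is why streaming replays through the game server was never an attractive option.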
Having the web API really solves this issue. You're no longer capped by the internal network speeds - just the API speed and your local internet speed. From the game server's perspective, that data doesn't even exist! Since it completely bypasses srcds, stability and performance shouldn't be affected any more than they already are with replay recording.
Ultimately
The rest is history, but I hope this gives you some insight into why it took so long to get Replays out. It wasn't for lack of effort, but primarily a lack of a solid storage solution. I hope you enjoy the new system and I can't wait to see all the replays set with new All-Times!