Mistakes Part Two – It's a Blogging

I can’t believe I’m posting about this again. After a few months of activity, I took Vore-Bot offline. Not quite a year ago, I discovered there was an interest in it, and I’ve been maintaining/improving it ever since. This article is a collection of things I’ve learned throughout the experience of a personal project of mine going “viral” for the first time.

Some background

I designed Vore-bot as a joke quite some time ago, for use on a Discord server I frequented. Sometime after killing the bot, someone on that server sent me [this] Tumblr post. As it turns out, someone liked the concept enough to clone my bot and make it popular: by the time I checked my GitHub, I had multiple people asking why the bot was down, and discovered it had been added to about 200 servers. Since then I’ve had to make a number of changes, and the bot currently runs on somewhere around 3500 servers.

Guess the bot needs to always be on

Originally, I had run this bot on Heroku. While I’m sure that’d work for a lot of bots, I pretty quickly discovered that my bot needed to be able to properly save its state if it was ever going to be useful for any period of time. That seemed pretty incompatible with Heroku’s spinning up and shutting down of Dynos. I ended up rolling out a small EC2 instance, which now handles running the bot and everything it needs. It has been a fun little excursion into learning how to manage those kinds of things.

Databases are cool

By far the most significant (and most recent) change that had to be made was switching to an actual database of some sort, instead of the makeshift pseudo-CSV I had been using. I know very little about this kind of back-end development, so I enlisted the help of a [friend of mine] to set up a basic SQLite database and interface. This managed to fix a whole host of issues that had cropped up, and generally cleaned things up significantly.

One of the nicest parts was no longer needing to load everything during startup and store it all in memory. We now just fetch and save the relevant server’s data as part of parsing the message. This fixed a massive issue caused by the introduction of sharding, which we’ll talk about a little later on.
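As a rough illustration of that fetch-and-save-per-message pattern, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, column names, and helper functions are all assumptions made up for this example, not the bot’s actual schema; the only thing taken from the post is that each server has some stored timestamp-like state.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS server_state ("
        " server_id INTEGER PRIMARY KEY,"
        " last_mention REAL NOT NULL)"
    )
    return conn


def get_last_mention(conn, server_id):
    """Return the stored timestamp for one server, or None if it has no row."""
    row = conn.execute(
        "SELECT last_mention FROM server_state WHERE server_id = ?",
        (server_id,),
    ).fetchone()
    return row[0] if row else None


def set_last_mention(conn, server_id, timestamp):
    """Insert or overwrite the row for this one server only."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO server_state (server_id, last_mention)"
            " VALUES (?, ?)",
            (server_id, timestamp),
        )
```

A message handler would call get_last_mention for the server the message came from, do its thing, and call set_last_mention for that same server. Nothing else is read or written, so no process ever holds (or overwrites) the whole state.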

Supporting this required me to write an upgrade script that parses the old timestamps file and converts it into the new database format. This will hopefully eventually get rolled into the startup script (and only performed if the data hasn’t already been upgraded).
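A one-time upgrade like that could be sketched roughly as follows. The legacy format here (one `server_id,timestamp` pair per line) is purely a guess, as are the table and function names. The run-only-once check uses SQLite’s `user_version` pragma, which is one common way to track this and not necessarily what the real script does.

```python
import sqlite3

SCHEMA_VERSION = 1  # hypothetical version number for the new layout


def migrate(conn, legacy_lines):
    """Import legacy rows once; return how many rows were imported."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current >= SCHEMA_VERSION:
        return 0  # already upgraded, nothing to do

    rows = []
    for line in legacy_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        server_id, timestamp = line.split(",", 1)
        rows.append((int(server_id), float(timestamp)))

    with conn:  # commit everything together, or roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS server_state ("
            " server_id INTEGER PRIMARY KEY,"
            " last_mention REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO server_state VALUES (?, ?)", rows
        )
        # PRAGMA values cannot be bound as parameters, hence the f-string.
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    return len(rows)
```

Because the version check comes first, calling migrate on every startup would be harmless: after the first successful run it returns 0 without touching the data.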

Turns out, people care about this

Like I mentioned, I decided to bring this bot back after discovering it had become popular through some weird coincidence. Since then, I’ve discovered kind of a weird little community around the bot: I’ve accepted pull requests from people wanting to help make it better, fielded a surprising number of bug reports/fixes from people caring enough to contact me, and recently started up a Discord server and got a website (I guess I need to learn how to develop those things now, huh) to let people contact me a little easier.

It absolutely boggles my mind that anyone would want to use it, but it makes me a little giddy and is quite frankly the only reason I still develop what has now become my most popular/well-architected project. It feels a little ridiculous.

Refactoring is my favourite thing

This major change was the opportunity I had been looking forward to for a while: a chance to rewrite most of the bot. Thanks to the pretty slapdash way I tacked features onto this dumb bot, there was tons of duplicated or otherwise bad code throughout it. Taking this time to refactor has left me with a code-base that should make it much simpler to test and add or change other functionality going forward.

More importantly than anything else, it let me bring everything back up to my current standards. Thanks to the benefit of hindsight, I know exactly what use-cases I need to support with each bit of functionality; and thanks to how soon it has been required, the technical debt wasn’t able to sit around long enough that I lost all desire to work on the project.

Sharding my bot

There were a few kind of spooky days where the bot was down and I couldn’t figure out why: my test bot worked fine (having started that test bot a while ago was one of my better decisions), but the production one didn’t work at all. What I ended up discovering is that Discord only allows a single bot connection to handle 2500 servers. Once a bot is actively on more servers than that, you need to launch multiple “shards” of the bot. I ended up handling this via a startup script which checks the endpoint to see how many shards I need to launch, launches them, and then records the PIDs for teardown later. At some point those PIDs need to actually get used by a teardown script, as I currently kill the processes manually.

The downside to this whole change was that it went in before the refactor, and the following issue was discovered: all of the shards needed to load the entire state on startup, and wrote their entire stored state every time they fired off. This meant that on startup, a significant chunk of servers had an un-updated timestamp associated with them. The fix, of course, was reducing how much state was modified at write time; in this case, getting it all the way down to one server’s data at a time.

Upcoming additions

Of course, there are a number of things I’d like to do that I haven’t quite been able to dedicate time to. Near the top of my list is creating an actual website so I don’t have to keep directing non-technical users to my GitHub to let them add the bot. Sometime after that I’d like to set up some kind of donation system to help pay for the infrastructure that goes into running the bot.

The biggest user-facing change I’d like to make, though, is the ability for server admins to specify their own target word to scan for. This is going to be tricky as heck for a few reasons I’ve noticed:

I need to be able to take an arbitrary string and create a reasonably robust RegEx out of it. This will likely involve creating a new library to map characters onto things which can be confused for them (e.g. o and 0, ö, or ᴑ), which is turning out to be a messier process than anticipated.
It means I can no longer compile my single RegEx, as it’s possibly different for every server. This might cause a noticeable speed decrease, but we’ll have to see how it can be worked around.
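Both points can be sketched together in a few lines. This is only a toy: the look-alike table below is a tiny hand-picked sample (a real one would be far bigger and messier), and the function name is made up. It turns each character of the target word into a character class of its look-alikes, escapes everything else, and caches the compiled pattern per word so each distinct word is only compiled once per process.

```python
import re
from functools import lru_cache

# Tiny illustrative sample of look-alike characters, not a complete table.
CONFUSABLES = {
    "o": "o0öᴑ",
    "e": "e3é",
    "a": "a4@á",
    "i": "i1!í",
}


@lru_cache(maxsize=1024)
def pattern_for(word):
    """Build and cache a case-insensitive pattern for one target word."""
    parts = []
    for ch in word.lower():
        variants = CONFUSABLES.get(ch, ch)
        if len(variants) == 1:
            # No known look-alikes: match the character literally.
            parts.append(re.escape(variants))
        else:
            # Match any one of the look-alikes.
            escaped = "".join(re.escape(v) for v in variants)
            parts.append("[" + escaped + "]")
    return re.compile("".join(parts), re.IGNORECASE)
```

With this, `pattern_for("vore").search(message)` would catch spellings like "V0RE" or "vöre", and calling pattern_for again with the same word returns the already-compiled pattern instead of rebuilding it. Whether that is enough to avoid a noticeable slowdown across thousands of servers is something that would still need measuring.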
