
When Your Housemates Are Too Loud While Playing Valorant

thumbnail

Did you know that coffee (and caffeine in general) has no effect on me? Literally none. I only drink it because it tastes nice and making it looks cool. This means that sleep is incredibly important to me, as once I get tired, there's no fixing it until I go to bed and get a proper, full night's rest.

And do you know what makes it impossible to get a proper, full night's rest?

SCREAMING AT THE TOP OF YOUR LUNGS FROM ACROSS THE HOUSE IN THE MIDDLE OF THE NIGHT BECAUSE OF A FUCKING VIDEO GAME.

But good thing that doesn't happen to m-Oh no wait it does.

The soundscape of my bedroom on almost 3 out of every 7 nights (and I have the data to back this up, just wait) is defiled by the ear-piercing screams of people across the house trying (and, I'm assuming, failing, since they scream so much) to play a competitive shooter game.

Look, I have nothing against video games. I'm something of a gamer myself, and I'm no stranger to competitive games. I'm diamond rank in Rocket League, but when I miss an open goal or completely whiff an aerial (I also only play like once every 3 months, so this happens often), I never feel an urge to scream. It's just a video game, like holy shit you know it's all fake right?

I'm trying to fucking sleep; they're pointing and clicking on the virtual heads (or at least it should be the heads if they're any good) of virtual people on a rectangle that displays lights of differently coloured squares, and hoping they've pointed and clicked well enough to make the virtual people disappear for a little while. And they have to do this over and over again until someone "wins" (but having your life go so far downhill that you start playing Valorant means you've already lost).

I think one of these things is more important than the other.

Side note: It has occurred to me recently that I should incorporate more audience participation in these blog posts because that's what all the famous people do.(1)

So here's a question for you: "What do you think I should do about this situation?" Keep in mind that I have social anxiety (which might limit the options here), and send your best answer to realnotfake@email.com.

(1) I don't have an algorithm to appease, nor do I want to be famous, so I don't actually know why I'm doing this.

I don't know what you said, but it's probably wrong, sorry. Better luck next time!

The easiest way I got the screaming to stop was going into the router settings and throttling the bandwidth to something really stupid like 50 kbps (kilobits per second). This would put their latency to the Valorant servers at around 2000ms (as I heard them scream once), which is completely unplayable.

You might consider this to be a bit of a fucked thing to do. But I mean, I'm basically doing them a favour here by making it so they don't feel compelled to play that shitty video game any more and can do something productive with their lives (like letting me sleep).

Anyway, this did work in getting them to stop playing, but there were a few problems.

The time it took between the scream happening and me turning on my phone while practically asleep, logging into the router, and changing the settings was a few minutes. I want the punishment to be almost instantaneous so they can easily make the connection and learn that screaming will take the Internet down, then maybe they'll stop doing it.(2)

(2) Kinda like training a dog.

It also made me feel bad. I don't want to stop people from enjoying themselves (even if it's from an objectively garbage form of entertainment), and especially so when there might be a much better solution to this problem instead (no idea what that could be though).

So, I had to automate the process instead. This would solve both problems: being too slow, and feeling any sense of guilt or remorse (because a robot did it - not me, blame the robot).

The first thing most people would go for is setting up a decibel noise meter and cutting the Internet off whenever it reaches a certain threshold. The problem with this is that screaming isn't the only thing that could set the meter off. Things like passing cars and planes overhead are just as loud, and since the screaming is across the house, it can actually sometimes be a little quieter than they are from where I sleep.

However, I can sleep just fine with normal background noise from cars as they're pretty easy to ignore, and cutting the Internet isn't going to stop it anyway. It's just the screaming that's fucking irritating, so I need something to be set off by screaming specifically.

But what is that something?

Hmm… what technology could I possibly use to detect if a scream has happened? Send your best ideas to /dev/null and I'll implement it in the next episode!(3)

(3) If you say "blockchain" I will report you to the ATO.

I'm kidding, it's machine learning. I mean what else could it have been?(4) So let's create an AI classifier to detect screams and cut the Internet for a few seconds every time it does.

(4) Don't answer that, it won't be as cool as machine learning, so I would never do it.

But since this is the first time I'm going to talk about machine learning/AI in any technical depth on this blog, I'd best explain what it is. Here's my official, proprietary, and standard explanation of machine learning that any idiot who doesn't know how to even open a PDF should be able to understand.

First you get a bunch of data and shove it down a model, so then it learns about the data, and then WHOOSH you have an AI!

OK so, what is a "model"?

A model is the neural network and its parameters (and maybe some other stuff).

What is a "neural network" and what do "parameters" do?

A neural network is a bunch of linear algebra, and parameters are the things that go in the linear algebra.

And with that excellent explanation, we should be ready to start making our AI!

Since we need a "bunch of data" (as I so eloquently explained), that's what we will focus on first. Obviously, this will be recordings of the screams, which my laptop microphone already picks up easily from my room, so I won't need to do anything on that front.

However, we won't just be shoving audio recordings into a model. You see, computer vision is a very well covered field in machine learning, especially when compared to computer "hearing" (which I don't think even exists), so our best bet would be to turn these recordings into pictures and shove those into the model.

From the recordings we will generate spectrograms, which are a visual way of representing the signal strength, or "loudness", of a signal over time at various frequencies present in a particular waveform. Spectrograms of audio are often used to identify spoken words phonetically, and to analyse the various calls of animals.

Thankfully, screams from across the house are actually an incredibly distinct sound. Here are some examples.

scream-1

See that blob looking thing at the bottom? That's a scream that specifically came from across the house.

The lines going vertically across are from mouse clicks and keyboard taps from my laptop, as I was using it while recording this. Hopefully the model will still be able to distinguish the scream from everything else when the image is a little noisy (keyboard taps and general banging of computer hardware can also sometimes be heard when my housemates play Valorant, so it's not an entirely uncommon pattern to have).

scream-2

The pattern should be pretty clear now, but here's another in case you have a peanut brain.

scream-3

I've recorded a bunch of ambient noise to see if anything else looked like these screams, and there really hasn't been anything even close to it, so I think we're good here.

Now that we know spectrograms (or screamtrograms as I'm going to start calling them from now on) should work fine, we need a good way of creating them to shove in the model.

But before all that, let's talk about ethics. Now, I'm a good guy, I'm a nice person.(5) And I'm aware that recording people without their permission is not the most "good" thing to do. Recording private conversations is, in fact, illegal in New South Wales.

(5) Don't say anything.

However:

  1. I'm just recording the ambient noises in my room, and sometimes screaming can be heard in the background (please ignore the fact that I'm then using that screaming to feed a machine learning model).
  2. These screams can be heard from outside the house (I tested this one night by going outside), so they aren't very private.
  3. Screams also aren't really a conversation. Sometimes they do talk really loudly, but it's pretty muffled and unintelligible in the recordings. These are actually the one thing that has a similar pattern to screams in the screamtrogram, but they're just as irritating, so I'll be counting them as screams as well.

But most importantly:

  1. I WOULD LIKE MORE THAN 4 HOURS OF SLEEP PLEASE.

So given these counterarguments, I think I'm in the clear here regarding the law. And since most people's ethical frameworks boil down to "if it's legal, it's OK", I'm in the clear ethically too.

Anyway, let's get back to it. We're going to want to write a script that records audio and automatically turns it into screamtrograms. We can do that with pyaudio and matplotlib (I think it's a given we'll be using Python here).

I'm mostly going to be describing things at a high level instead of going through code (I'll put the important concepts in code to make them easier to understand), because I've learnt throughout the years that a blog post is not a good place to dump a lot of code.

If you want to see the code you can go to the Git repository.

Looking at the create_data.py script, you'll see that I'm using multiprocessing and have two processes, one for recording and one for creating the screamtrograms. The reason for this is that creating the screamtrograms can take a few seconds and I don't want the recording to be blocked by it.

The recording process is very simple. It records audio from the microphone in 60 second intervals, saves it to a .wav file, and adds the file to a queue for the screamtrogram process to use.
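
If you just want a rough idea of the recording side without opening the repo, here's a minimal sketch (the sample rate, buffer size, and channel count are assumptions on my part, not necessarily what create_data.py uses):

import wave
import pyaudio

RATE = 44100    # samples per second (assumed)
CHUNK = 1024    # frames read from the microphone at a time
SECONDS = 60    # length of each recording

def record_clip(path, queue):
    # Record one 60 second clip from the default microphone.
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=2, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
    stream.stop_stream()
    stream.close()

    # Save it as a .wav file and hand the path off to the screamtrogram process.
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(2)
        wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
    pa.terminate()
    queue.put(path)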

Before creating a screamtrogram, I segment the 60 second audio file into 15 second audio files (4 in total). This hopefully makes the screams a bigger pattern in the images, while keeping the segments large enough that the likelihood of a scream being cut off abruptly is minimal. Being cut off did in fact happen a few times, but never often enough for me to feel like the segments should be larger.
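
The splitting itself is just slicing the sample array; something like this sketch would do it (the real script probably differs in the details):

from scipy.io import wavfile

def split_wav(wav_file, segment_seconds=15):
    # Slice one 60 second recording into four 15 second .wav files.
    rate, audio = wavfile.read(wav_file)
    samples_per_segment = rate * segment_seconds
    paths = []
    for i in range(0, len(audio), samples_per_segment):
        path = f'{wav_file}.part{i // samples_per_segment}.wav'
        wavfile.write(path, rate, audio[i:i + samples_per_segment])
        paths.append(path)
    return paths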

The great thing about .wav files is that they're a simple file format and basically just an array of values. So to create the screamtrogram, scipy.io has a function to open the .wav file and matplotlib has another to plot the image.

from scipy.io import wavfile
import matplotlib.pyplot as plt

wav_file = 'myaudio.wav'

# Read the .wav file as an array of samples.
rate, audio = wavfile.read(wav_file)
audio = audio[:,0]  # select left channel only

# Plot the spectrogram with no axes and no margins, so the model only gets the image itself.
fig, ax = plt.subplots(1)
fig.subplots_adjust(left=0, right=1, bottom=0, top=1)
ax.specgram(x=audio, Fs=rate, noverlap=384, NFFT=512)
ax.axis('off')
fig.savefig('screamtrogram.png', dpi=300)

matplotlib usually puts an x/y axis on its plots, which is useless for us and unnecessary noise for the model, so I've turned them off with ax.axis('off') and some adjustments to the image margins.

So if you were to run create_data.py, it would add 4 screamtrograms to the data/spectrograms directory every 60 seconds. Then you will need to manually go through each one, see which ones are screams, and move them to the train/scream directory.

Unfortunately, there is no way around manually labelling training data in this manner; this is just how it goes with machine learning. I recommend getting around 500 samples for training and another 100 for validation (so 600 in total). We'll also need another 600 samples of non-screamtrograms, but those should be easy to get considering non-screams happen most of the time. Move those to train/notscream.
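
So by the end of the labelling slog, the directory layout should look roughly like this:

data/
    spectrograms/   # unlabelled screamtrograms straight out of create_data.py
train/
    scream/         # ~600 hand-labelled screams
    notscream/      # ~600 hand-labelled non-screams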

Next up is creating our classifier. We will be using fastai for this because it's… uh… fast, and they provide a lot of useful functions on top of PyTorch. If you're following along with the code, this part is in the train.ipynb notebook (personally I hate notebooks, but I wanted to see pictures).

Because data scientists don't understand good software engineering practices, we import * from fastai.vision.all and create a DataLoaders.

from fastai.vision.all import *

path = Path('train/')

spectrograms = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=21),
    get_y=parent_label,
    item_tfms=Resize(224))

dls = spectrograms.dataloaders(path)
dls.show_batch(max_n=4, nrows=1)

This gives us a batch that looks like this.

batch

The images have been converted into greyscale, which should hopefully reduce the noise in the image, as the model will be able to ignore colours since the blob is the important part. I have no idea if this helps or not, but I used to do this a lot for my previous non-AI attempts at image recognition, so I just did it out of habit. Either way, I doubt it would make the model worse, so it's probably fine.

Each item is also resized into a 224px square so it's faster for the model to work through (the screamtrograms we generate are 1920 x 1440 pixels in size, which is wayyy too big).

Let's try a few training epochs now.

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(5)

training

Obviously I'm going to need to train this more than 5 times, but a 10% error rate is pretty decent for 5 runs! Let's pretend it's good enough for now and keep going.

I want to see if the model is making more false positives (it thinks "notscreams" are "screams") or false negatives (it thinks "screams" are "notscreams"). To visualise this, we can use a "confusion matrix."

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

confusion

Here we see that there are more "notscreams" that were predicted as "screams" (3) than "screams" that were predicted as "notscreams" (1), so it looks like the model tends towards false positives. This can probably be ironed out with more training (I also hadn't gathered all 600 training images yet when I made this matrix), but I'll be keeping it in mind anyway.

After all the fine tuning is done (in the end I got it to around a 1% error rate after 100 runs and more data), we can export the model into an export.pkl.

learn.export()

When we want to use the exported model we can use load_learner.

# Load pickle
learn_inf = load_learner('export.pkl')
# Predict an image
learn_inf.predict('images/screamtrogram.png')

Now that our machine has learnt about screams, we can look at deciding what it should do when it detects one. It would be pretty easy to have it do what I did - log into the router and set a bandwidth limit, but there is another problem with this that I didn't explain before.

It would be pretty easy for my victims, I mean housemates, to figure out what happened to the Internet when it stops working. All they would need to do is log in to the router, click through a few tabs, remove the bandwidth limit, and then change the router password to stop it from happening again.

For certain people, I would be OK with doing it this way because to be honest, I don't think most people know what a router does.

However, my housemates do, so this won't work and I'll need another method of attack.

I could try to torrent a bunch of Linux distros and hog all the bandwidth, but in my testing this didn't degrade their connection consistently (sometimes it would work, sometimes it wouldn't).

I don't really know what to do, what do you think?


Actually, I do know what to do, and I don't care what you think. I'm done with this joke.

The answer is ARP poisoning, in which I send spoofed ARP packets onto the LAN to associate my MAC address with the IP address of the router. In simpler terms, this means I can make everyone else's computer think I'm the router and send all their Internet traffic to me.

We can use Ettercap for this, as it's a pretty standard toolkit for man-in-the-middle attacks. There are a few possible ways of messing with people's Internet with this, but I'm going to be going through packet filtering.

First we'll need to create a filter source file for ettercap to read.

cd /usr/share/ettercap
sudo vim dos.elt

I deliberated over what the best filter would be. Do I want to just drop all connections? Maybe only specific packets to Valorant servers? In the end, I settled on a compromise between the two: dropping all UDP packets. This means that all normal HTTP traffic should work, but things like most multiplayer video games, some chat applications (like Discord), and video streaming services (like YouTube) will not.

This was done because I didn't want to actually play that trash game to figure out what IPs the Valorant servers use, and I wanted to limit the damage of false positives (I don't care too much about false negatives). For other services that use UDP like YouTube, dropping connections for several seconds is probably not that big a deal since the video gets buffered in advance, but for a real-time competitive game, it could lose you the round.

So in dos.elt, add this to the file.

if (ip.proto == UDP) {
    drop();
}

Now we need to compile the code to a binary, and use it in an argument to ettercap.

sudo etterfilter dos.elt -o dos.ef
sudo ettercap -T -q -F /usr/share/ettercap/dos.ef -M arp

And now, all UDP packets on your LAN will be dropped! You can open Wireshark and see all the packets that come from other computers in the LAN.

It might be a bit of a surprise to see how absurdly easy that was, and it just goes to show how Internet infrastructure is an insecure mess and TLS really is the only good thing we have in this world.

For anyone who doesn't know what ARP poisoning is and can't spot a man-in-the-middle attack when one happens, this is practically undetectable. I don't think my housemates are going to check their ARP tables, or attempt to trace where their packets are going. I also don't think they know the difference between TCP and UDP, so they probably won't be able to make the connection as to why only some things are broken but not others.(6)

(6) They should be able to figure out that it only happens to services that "stream" data, but not why. And if it comes to it, I'll just start dropping all connections. I'll only use this when I want to sleep, so it won't really affect me.

However, there is one last problem with this method in my house. And that is that we have two LANs - one for Ethernet and one for Wi-Fi (which is a separate mesh network to cover the entire house). I've created a simplified network diagram below to illustrate this.

network diagram

My laptop only connects to the mesh network and doesn't have an Ethernet port (blame Framework for not making an expansion card for it yet).

Since most of the computers in the house connect with Ethernet, I'm not going to be able to poison them like this. Thankfully, the Wi-Fi access points are on the Ethernet LAN to connect to the Internet, so if I can connect to the main router and poison that, we should be golden.

Unfortunately, the only computer I have with an Ethernet port is my gaming PC, which I don't exactly want to leave on overnight since it's a bit power hungry. Instead I bought a Raspberry Pi which does come with an Ethernet port.

The idea here is that I can set up an HTTP server on the RPi (I can still see hosts on the main LAN from the mesh network), and whenever the model on the laptop detects a scream, send a request to the RPi and have that run ettercap instead of my laptop.(7)

(7) I could just move everything to the Raspberry Pi, but I doubt it would be able to run the model quickly enough. I mean, it's already pretty slow on this laptop (and I don't want to buy a microphone for it).

It's kinda boring so I won't go through it, but the scream detection script and ettercap server are in wgyhtss.py and wsgi.py respectively.

Something to note is that it requires a secret key for the client and server to talk. I just felt a little weird about having an endpoint that can DoS everyone without any authentication (even if it's limited to the LAN).
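
For the curious, the server side boils down to something like this sketch. I'm using Flask here because it's the least effort; the endpoint name, header, key, and timing are all made up for illustration, and the real thing lives in wsgi.py:

import subprocess
import time
from flask import Flask, request, abort

app = Flask(__name__)
SECRET_KEY = 'definitely-not-the-real-key'  # shared between the laptop and the RPi
POISON_SECONDS = 30                         # how long to keep dropping UDP for

@app.route('/scream', methods=['POST'])
def scream():
    # Refuse anyone who doesn't know the shared secret.
    if request.headers.get('X-Secret-Key') != SECRET_KEY:
        abort(403)

    # ARP poison the LAN with the UDP-dropping filter, then stop after a while.
    proc = subprocess.Popen(['ettercap', '-T', '-q',
                             '-F', '/usr/share/ettercap/dos.ef', '-M', 'arp'])
    time.sleep(POISON_SECONDS)
    proc.terminate()
    return 'shhh'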

And in another attempt to mitigate false positives, if a scream is predicted, the probability must be above 85%. I want the AI to be completely sure (or as completely sure as 85% can be) that it is in fact a scream.
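
On the laptop side, that check is basically this (again a sketch; the URL and header match the server sketch above, not necessarily the real code):

import requests
from fastai.vision.all import load_learner

learn_inf = load_learner('export.pkl')
SCREAM_THRESHOLD = 0.85

def maybe_report_scream(screamtrogram_path):
    # Only bother the RPi if the model is at least 85% sure it's a scream.
    pred, pred_idx, probs = learn_inf.predict(screamtrogram_path)
    if pred == 'scream' and probs[pred_idx] > SCREAM_THRESHOLD:
        requests.post('http://raspberrypi.local/scream',
                      headers={'X-Secret-Key': 'definitely-not-the-real-key'})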

I've also set the poisoning and filter to last for 30 seconds, because that's both how long YouTube will buffer your video in advance and how long the buy phase lasts in Valorant (the start of the round where you buy weapons).

Having your connection dropped only during the buy phase is not a huge deal, because buying weapons doesn't take very long and you can still buy them after the phase ends, so I wanted to make sure the ARP poisoning attack would at least cover that time completely.

And if it happens during the actual round time instead of the buy phase, rounds can last at most 140 seconds (100s round time + 40s spike plant time). But they mostly end up being around 110 seconds for a game between teams of similar skill, so being out of it for a quarter to a third of that is a big deal.

The last thing to keep in mind is disk usage, as the code never deletes anything, so all the recordings and images can add up (I have over 3,500 audio segments, and along with the accompanying 3,500 screamtrograms this adds up to 18GB). You'll obviously want to keep the train/ directory, but you can safely delete data/ every now and then, as both wghytss.py and create_data.py will recreate the directory if it's missing.
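
If you want that space back, a periodic cleanup is as simple as (assuming nothing is mid-recording at the time):

import shutil

# Delete the raw recordings and unlabelled screamtrograms.
# Never touch train/ - that's the hand-labelled dataset.
shutil.rmtree('data/', ignore_errors=True)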

Right, so that's it then. This took around two weeks to research, test different things, and get training data,(8) so it was a decent amount of work, and I'm pretty happy with the outcome, so I wouldn't call it a wasted effort.

(8) Yes, I only needed half a month to get 600 screams, and I wasn't even recording every day so I missed a lot.

I think I did a good job but I still can't shake the feeling that there was an easier way of solving this issue.


21/06/2022 Update

Hello, I'm back with news!

They've stopped screaming! I've heard very little (if anything, I don't really remember) in the two months since I made this post, so I'm going to count it as a success.

I don't actually know if it was because of me or if they got bored of Valorant or something (maybe both), but whatever - who cares, I can sleep now!

Also, I have a correction.

I mentioned that I was only going to drop UDP packets, which meant that normal websites should work perfectly fine. However, I forgot to factor in HTTP/3, the third version of HTTP, which runs over QUIC on top of UDP and is supposed to make HTTP transactions faster.

25% of websites, and pretty much every popular one, use HTTP/3.(9) So, dropping UDP packets did actually make most websites not function.

(9) Including this one.

It doesn't really matter now, but learning new things is fun.