I thought it would be interesting to see whether it was possible to identify players from replay files by the in-game actions they took. One obvious use of this is to associate smurf accounts with the original player. I hacked something together using a combination of bμg's replay parser and the statistics tool I use to create analysis posts about RAGL. Before I go through the results, I want to cover some of the limitations of the approach.
Firstly, bμg's tool can only cope with replays from 2021 onwards. This immediately means that the approach can't shed any light on the identity of Misery from RAGL S08 or Archangel (which happy claimed a long time ago).
Another reason I can't identify Archangel is that I'm limited to replays that I already have or have downloaded from the ladder (and I don't have enough free disk space or patience to download everything from the ladder!).
The classification algorithm I've put together is pretty crude. It computes a set of metrics for each replay, averages these per player account, and then compares accounts by the absolute difference between their averaged metric scores. This could definitely be improved by filtering out metrics that only add noise, or by computing an optimal weighting for the different metrics. There are probably other metrics that could be included to improve the score too. Having said all this, using a 2:1 train/test split, the algorithm does seem to match accounts correctly.
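To make the matching step concrete, here is a rough Python sketch of the idea described above. It is not the actual script: the metric names (`apm`, `first_refinery_sec`), the account names, and the function names are made up for illustration, and it assumes every metric is weighted equally.

```python
from collections import defaultdict
from statistics import mean


def average_metrics(replays):
    """Average each metric over all replays belonging to the same account.

    `replays` is a list of (account, {metric_name: value}) pairs.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for account, metrics in replays:
        for name, value in metrics.items():
            grouped[account][name].append(value)
    return {
        account: {name: mean(values) for name, values in per_metric.items()}
        for account, per_metric in grouped.items()
    }


def distance(a, b):
    """Sum of absolute differences over the metrics both profiles share.

    Unweighted, so metrics with large values dominate (an assumption of
    this sketch, not necessarily how the real script behaves).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")  # nothing to compare, so never the best match
    return sum(abs(a[name] - b[name]) for name in shared)


def closest_account(profile, known_profiles):
    """Return the known account whose averaged metrics are nearest to `profile`."""
    return min(known_profiles, key=lambda acc: distance(profile, known_profiles[acc]))


# Hypothetical data: two known accounts and one unknown account.
known = average_metrics([
    ("alice", {"apm": 120.0, "first_refinery_sec": 45.0}),
    ("alice", {"apm": 130.0, "first_refinery_sec": 55.0}),
    ("bob", {"apm": 60.0, "first_refinery_sec": 90.0}),
])
unknown = average_metrics(
    [("mystery", {"apm": 118.0, "first_refinery_sec": 52.0})]
)["mystery"]
print(closest_account(unknown, known))
```

Note that `closest_account` always returns one of the known accounts, however poor the match is, so a small distance is only a hint and a large one means very little.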
Since I have a limited data set, the script can only guess at players who are in that data set. For example, there are no LorryDriver replays (because he played before 2021), so it will never guess that an account is a Lorry smurf.
There are a number of factors which I deliberately did NOT use:
- Player chat
- Player names
- Skill level
- IP address
- Time of day/day of week
- Game count (some players play lots more games than others)
- List of opponents (since it's hard for a smurf to play against themself)
A final note before we get on to some results in the next post: if you start a witch hunt then you're going to find witches. The script simply points out accounts that play in similar ways, so it will always find similar accounts. This does not mean that the players are smurfs of each other (and in many cases there is plenty of evidence that they are not).
Statistics: Posted by TTTPPP — Tue May 21, 2024 5:47 pm — Replies 1 — Views 22