Code accompanying the paper "The GANfather: Controllable generation of malicious activity to improve defence systems".
The src folder contains:
- the Generator and Discriminator architectures for both use cases;
- the skeleton of the rules proxy network from the anti-money laundering use case (its forward method was removed because the rules' logic is confidential);
- the recommender system.
The experiments folder contains the hyperparameter tuning executables for each use case.
For information regarding the versions of the packages we used, please refer to the environment.yml file.
The data used for the anti-money laundering use case is confidential and therefore cannot be published.
The data used for the recommender system is the MovieLens-1M dataset, which is publicly available from GroupLens.
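For readers who have not worked with this dataset before, the following is a minimal sketch of parsing ratings stored in the MovieLens-1M `ratings.dat` layout (`UserID::MovieID::Rating::Timestamp`). The names `Rating` and `parse_ratings` and the sample lines are assumptions made for illustration; they are not part of this repository, whose own loading code may differ.

```python
from typing import Iterable, List, NamedTuple


class Rating(NamedTuple):
    """One rating record (hypothetical helper type for this sketch)."""
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> List[Rating]:
    """Parse lines in the `UserID::MovieID::Rating::Timestamp` layout."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, timestamp = line.split("::")
        ratings.append(
            Rating(int(user_id), int(movie_id), int(rating), int(timestamp))
        )
    return ratings


# Illustrative sample lines, made up for this sketch (not taken from the dataset).
sample = ["1::10::5::978300000", "1::20::3::978300100", "2::10::4::978300200"]
parsed = parse_ratings(sample)
```

To read the real file, the same function can be given an open file object for `ratings.dat`; the path depends on where the dataset was unpacked.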
@inproceedings{10.1145/3604237.3626882,
  author = {Pereira, Ricardo Ribeiro and Bono, Jacopo and Ascens\~{a}o, Jo\~{a}o Tiago and Apar\'{\i}cio, David and Ribeiro, Pedro and Bizarro, Pedro},
  title = {The GANfather: Controllable generation of malicious activity to improve defence systems},
  year = {2023},
  isbn = {9798400702402},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3604237.3626882},
  doi = {10.1145/3604237.3626882},
  abstract = {Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.},
  booktitle = {Proceedings of the Fourth ACM International Conference on AI in Finance},
  pages = {133--140},
  numpages = {8},
  location = {Brooklyn, NY, USA},
  series = {ICAIF '23}
}