To empirically study how humans perceive AI-generated news, researchers require a reliable and highly controllable source of stimuli. RogueGPT serves as the dedicated stimulus generation engine for the JudgeGPT research project. Its purpose is to create a diverse and methodologically sound dataset of news fragments under specific, reproducible conditions.
This project is one half of a complete research pipeline. While its sister project, JudgeGPT, is the data collection platform where humans evaluate news authenticity, RogueGPT is the tool that creates the very content to be evaluated. This two-part structure is essential for maintaining experimental control and ensuring the integrity of the research findings, as outlined in our foundational paper, "Blessing or curse? A survey on the Impact of Generative AI on Fake News".
RogueGPT is the starting point in our end-to-end experimental workflow. It provides a user interface for researchers to generate news fragments with precise control over numerous variables. This process allows us to systematically investigate how different factors influence human perception of authenticity.
The process flows from controlled generation to human judgment, creating a rich dataset that links specific content characteristics to perception scores:
1. **Controlled Stimulus Generation (`RogueGPT`):** A researcher uses the `RogueGPT` interface to generate news fragments. The generation process is highly controlled, using specific variables defined in a configuration file (`prompt_engine.json`). These variables include parameters such as the news outlet `Style` (e.g., 'NYT', 'BILD'), the `Format` ('tweet', 'short article'), the `Language` ('en', 'de'), and the underlying `GeneratorModel` (e.g., 'openai_gpt-4-turbo_2024-04-09').
2. **Data Storage (MongoDB):** Each generated fragment, along with its full metadata (the parameters used to create it), is stored in a shared MongoDB database. This is handled by the `save_fragment` function within RogueGPT's codebase, which uses the PyMongo library to interact with the database (see the sketch after this list).
3. **Human Data Collection (`JudgeGPT`):** A participant accesses the `JudgeGPT` survey application. The application retrieves a fragment generated by RogueGPT from the MongoDB collection and presents it to the user.
4. **Judgment and Analysis:** The participant reads the news fragment and uses sliders to rate its perceived authenticity and origin. This judgment data is then saved back to the database, creating a comprehensive record that links specific generation parameters to quantitative human perception scores.
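For orientation, here is a minimal sketch of what the storage step could look like. Only the `save_fragment` name, the use of PyMongo, and the generation parameters listed above come from the project description; the database and collection names, field names, and connection handling are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of persisting a generated fragment with PyMongo.
# Database/collection names and document fields are illustrative assumptions;
# only the save_fragment name and the use of PyMongo come from the description above.
import os
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])  # connection string from the environment
collection = client["judgegpt"]["fragments"]     # hypothetical database and collection names


def save_fragment(text: str, style: str, fmt: str, language: str, generator_model: str) -> str:
    """Store a generated news fragment together with its generation metadata."""
    document = {
        "text": text,
        "style": style,                        # e.g. 'NYT', 'BILD'
        "format": fmt,                         # e.g. 'tweet', 'short article'
        "language": language,                  # e.g. 'en', 'de'
        "generator_model": generator_model,    # e.g. 'openai_gpt-4-turbo_2024-04-09'
        "created_at": datetime.now(timezone.utc),
    }
    result = collection.insert_one(document)
    return str(result.inserted_id)
```

Storing the full parameter set alongside the text is what later lets JudgeGPT link each human judgment back to the exact generation conditions.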
RogueGPT is a tool for researchers and developers interested in the generative side of our experimental setup.
| Audience | Primary Goal | Action |
|---|---|---|
| Researchers | Understand the generation methodology or use the tool for your own research. | Read the Paper · ✉️ Contact Us · See Citation |
| Developers | Contribute code, add new models, fix bugs, or suggest features. | Fork the Repo · 🐞 Open an Issue · See Contributing Guide |
| General Public | Evaluate the news generated by this tool by visiting our sister project. | Participate in the JudgeGPT Survey |
Are you an expert in AI, policy, or journalism? We are conducting a follow-up study to gather expert perspectives on the risks and mitigation strategies related to AI-driven disinformation. Your insights are invaluable for this research.
Please consider contributing by participating in our 15-minute survey: ➡️ https://forms.gle/EUdbkEtZpEuPbVVz5
- Purpose: This survey explores expert perceptions of generative-AI–driven disinformation for an academic research project.
- Data Use: All responses will be treated as confidential and reported in an anonymised, aggregated format by default. At the end of the survey, you will have the option to be publicly acknowledged for your contribution. All data will be used for academic purposes only.
- Time: Approximately 15 minutes.
This section provides a comprehensive guide for developers and technical users who wish to run, inspect, or contribute to the RogueGPT project locally.
The project is built with the following components:
- **Frontend/Backend:** A Streamlit application (`app.py`) written in Python provides the user interface for both manual and automated news fragment generation.
- **Configuration:** A `prompt_engine.json` file defines the parameters for automated generation, including prompt templates, styles, languages, and target generative models (a sketch of how such a configuration might be consumed follows this list).
- **Database:** A MongoDB (NoSQL) database stores the generated news fragments and their associated metadata.
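To make the configuration-driven design concrete, the sketch below loads a hypothetical `prompt_engine.json` and enumerates the parameter combinations a generation run could sweep over. The key names (`styles`, `formats`, `languages`, `generator_models`) are assumptions chosen for readability; the actual schema is whatever the file in the repository defines.

```python
# Illustrative sketch of consuming prompt_engine.json to drive automated generation.
# The key names below are assumptions; the real schema is defined by the repository's file.
import itertools
import json

with open("prompt_engine.json", encoding="utf-8") as f:
    config = json.load(f)

styles = config.get("styles", ["NYT", "BILD"])
formats = config.get("formats", ["tweet", "short article"])
languages = config.get("languages", ["en", "de"])
models = config.get("generator_models", ["openai_gpt-4-turbo_2024-04-09"])

# Enumerate every combination of generation parameters for a systematic sweep.
for style, fmt, language, model in itertools.product(styles, formats, languages, models):
    print(f"Would generate a {fmt} in '{language}' in the style of {style} using {model}")
```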
Follow these steps to set up the project on your local machine.
1. **Prerequisites**
   - Python 3.8+
   - pip package manager
   - Git

2. **Clone the Repository**

   ```bash
   git clone https://github.com/aloth/RogueGPT.git
   cd RogueGPT
   ```

3. **Set Up a Virtual Environment (Recommended)**

   ```bash
   # For macOS/Linux
   python3 -m venv venv
   source venv/bin/activate

   # For Windows
   python -m venv venv
   .\venv\Scripts\activate
   ```

4. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

   Key dependencies include `streamlit`, `pymongo`, and `openai`.

5. **Configure Environment Variables**

   The application requires a connection string to a MongoDB database and, potentially, API keys for generative models. It is best practice to manage these secrets using environment variables (see the sketch after this list).
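As a minimal example of that practice, the snippet below reads the secrets from the environment and fails early if they are missing. The variable names `MONGODB_URI` and `OPENAI_API_KEY` are assumptions for illustration; check the repository for the names the code actually expects (Streamlit's `secrets.toml` is another common place to keep such values).

```python
# Sketch of reading the required secrets from environment variables.
# MONGODB_URI and OPENAI_API_KEY are assumed names, used here only for illustration.
import os

mongodb_uri = os.environ.get("MONGODB_URI")
openai_api_key = os.environ.get("OPENAI_API_KEY")

if not mongodb_uri:
    raise RuntimeError("MONGODB_URI is not set; export it before starting the app.")
if not openai_api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the app.")
```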
Once the setup is complete, launch the Streamlit application:
```bash
streamlit run app.py
```
The application will open in your default web browser, presenting two tabs: "Generator" for automated creation and "Manual Data Entry" for direct input.
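For readers unfamiliar with Streamlit, the following is a rough sketch of how such a two-tab layout is typically built; it is not the project's actual `app.py`, only an illustration of the pattern.

```python
# Rough sketch of a two-tab Streamlit layout like the one described above.
# This is not RogueGPT's actual app.py, just an illustration of the pattern.
import streamlit as st

generator_tab, manual_tab = st.tabs(["Generator", "Manual Data Entry"])

with generator_tab:
    st.header("Generator")
    st.write("Automated generation driven by prompt_engine.json would go here.")

with manual_tab:
    st.header("Manual Data Entry")
    fragment_text = st.text_area("Paste or write a news fragment")
    if st.button("Save fragment") and fragment_text:
        st.success("The fragment would be stored in MongoDB via save_fragment().")
```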
The roadmap for RogueGPT is focused on enhancing its capabilities as a state-of-the-art stimulus generation engine.
To support the broader research goals of the JudgeGPT project, a key priority is to extend RogueGPT to generate and incorporate images and other visual media, enabling the study of multimedia and "deepfake" disinformation.
To keep pace with the "technological arms race," the research must test human perception against an ever-wider array of sophisticated models. The roadmap includes the integration of a greater variety of generative models, such as BERT, T5, and other emerging LLMs, allowing for more nuanced and diverse content generation.
Future work includes building a content verification layer and integrating with established fact-checking services. This would allow RogueGPT to not only generate content but also to annotate it with veracity scores, enabling new lines of research into misinformation mitigation and "inoculation" theories.
If you use RogueGPT or its underlying research in your work, please cite our foundational paper:
```bibtex
@misc{loth2024blessing,
      title={Blessing or curse? A survey on the Impact of Generative AI on Fake News},
      author={Alexander Loth and Martin Kappes and Marc-Oliver Pahl},
      year={2024},
      eprint={2404.03021},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
We welcome contributions from the community! To get involved, please follow these steps:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for full details.
This work would not be possible without the foundational technologies and support from:
- OpenAI for their groundbreaking GPT models.
- Streamlit for enabling the rapid development of our web application.
- MongoDB for robust and scalable database solutions.
- The broader open-source community for providing invaluable tools and libraries.
RogueGPT is an independent research project and is not affiliated with, endorsed by, or in any way officially connected to OpenAI. The use of "GPT" within the project name is employed in a pars pro toto manner, where it represents the broader class of Generative Pre-trained Transformer models and Large Language Models (LLMs) that are the subject of this research. The project's explorations and findings are its own and do not reflect the views or positions of OpenAI. We are committed to responsible AI research and adhere to ethical guidelines in all aspects of our work.