RogueGPT: The Stimulus Generation Engine for News Authenticity Research


The Research Mandate: Why RogueGPT Exists

To empirically study how humans perceive AI-generated news, researchers require a reliable and highly controllable source of stimuli. RogueGPT serves as the dedicated stimulus generation engine for the JudgeGPT research project. Its purpose is to create a diverse and methodologically sound dataset of news fragments under specific, reproducible conditions.

This project is one half of a complete research pipeline. While its sister project, JudgeGPT, is the data collection platform where humans evaluate news authenticity, RogueGPT is the tool that creates the very content to be evaluated. This two-part structure is essential for maintaining experimental control and ensuring the integrity of the research findings, as outlined in our foundational paper, "Blessing or curse? A survey on the Impact of Generative AI on Fake News".

The Research Pipeline: From Generation to Judgment

RogueGPT is the starting point in our end-to-end experimental workflow. It provides a user interface for researchers to generate news fragments with precise control over numerous variables. This process allows us to systematically investigate how different factors influence human perception of authenticity.

The Experimental Workflow

The process flows from controlled generation to human judgment, creating a rich dataset that links specific content characteristics to perception scores:

  1. Controlled Stimulus Generation (RogueGPT): A researcher uses the RogueGPT interface to generate news fragments. Generation is controlled by variables defined in a configuration file (prompt_engine.json), including the news outlet Style (e.g., 'NYT', 'BILD'), the Format (e.g., 'tweet', 'short article'), the Language (e.g., 'en', 'de'), and the underlying GeneratorModel (e.g., 'openai_gpt-4-turbo_2024-04-09').

  2. Data Storage (MongoDB): Each generated fragment, along with its full metadata (the parameters used to create it), is stored in a shared MongoDB database. This is handled by the save_fragment function within RogueGPT's codebase, which uses the PyMongo library to interact with the database.

  3. Human Data Collection (JudgeGPT): A participant accesses the JudgeGPT survey application. The application retrieves a fragment generated by RogueGPT from the MongoDB collection to present to the user.

  4. Judgment and Analysis: The participant reads the news fragment and uses sliders to rate its perceived authenticity and origin. This judgment data is then saved back to the database, creating a comprehensive record that links specific generation parameters to quantitative human perception scores.
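The storage and linking steps above can be sketched in a few lines of Python. This is a minimal illustration, not RogueGPT's actual code: the field names, the `Fragment` class, and the functions `save_fragment_sketch` and `save_judgment_sketch` are assumptions made for this example, and the rating scale is hypothetical. The only database call used is `insert_one`, which PyMongo collections provide; a small in-memory stand-in replaces the real collection so the sketch runs without a MongoDB server.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Any, Dict, List
import uuid


@dataclass
class Fragment:
    """One generated news fragment plus the parameters that produced it."""
    content: str
    style: str            # e.g. "NYT" or "BILD"
    format: str           # e.g. "tweet" or "short article"
    language: str         # e.g. "en" or "de"
    generator_model: str  # e.g. "openai_gpt-4-turbo_2024-04-09"


class InMemoryCollection:
    """Stand-in for a PyMongo collection (hypothetical helper for this sketch)."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, doc: Dict[str, Any]) -> None:
        self.docs.append(doc)


def save_fragment_sketch(collection: Any, fragment: Fragment) -> str:
    """Store the fragment with its full metadata and return its ID."""
    doc = asdict(fragment)
    doc["fragment_id"] = uuid.uuid4().hex          # assumed ID scheme
    doc["created_at"] = datetime.now(timezone.utc)
    collection.insert_one(doc)
    return doc["fragment_id"]


def save_judgment_sketch(collection: Any, fragment_id: str,
                         authenticity: float, origin: float) -> None:
    """Store slider ratings keyed by fragment ID so they can be joined later."""
    collection.insert_one({
        "fragment_id": fragment_id,
        "authenticity": authenticity,  # assumed scale: 0.0 (fake) to 1.0 (real)
        "origin": origin,              # assumed scale: 0.0 (human) to 1.0 (machine)
    })


fragments = InMemoryCollection()
judgments = InMemoryCollection()

fid = save_fragment_sketch(fragments, Fragment(
    content="Example headline text.",
    style="NYT",
    format="tweet",
    language="en",
    generator_model="openai_gpt-4-turbo_2024-04-09",
))
save_judgment_sketch(judgments, fid, authenticity=0.7, origin=0.2)
```

Because each judgment carries the fragment's ID, perception scores can later be grouped by any generation parameter (style, format, language, or model). With a real database, `fragments` and `judgments` would be PyMongo collections instead of the stand-in.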

Getting Involved: A Guide for Developers and Researchers

RogueGPT is a tool for researchers and developers interested in the generative side of our experimental setup.

Audience | Primary Goal | Action
Researchers | Understand the generation methodology or use the tool for your own research. | Read the Paper; ✉️ Contact Us; See Citation
Developers | Contribute code, add new models, fix bugs, or suggest features. | Fork the Repo; 🐞 Open an Issue; See Contributing Guide
General Public | To evaluate the news generated by this tool, please visit our sister project. | Participate in the JudgeGPT Survey

📢 Calling All Experts: Share Your Insights!

Are you an expert in AI, policy, or journalism? We are conducting a follow-up study to gather expert perspectives on the risks and mitigation strategies related to AI-driven disinformation. Your insights are invaluable for this research.

Please consider contributing by participating in our 15-minute survey: ➡️ https://forms.gle/EUdbkEtZpEuPbVVz5

  • Purpose: This survey explores expert perceptions of generative-AI–driven disinformation for an academic research project.
  • Data Use: All responses will be treated as confidential and reported in an anonymised, aggregated format by default. At the end of the survey, you will have the option to be publicly acknowledged for your contribution. All data will be used for academic purposes only.
  • Time: Approximately 15 minutes.

Technical Deep Dive

This section provides a comprehensive guide for developers and technical users who wish to run, inspect, or contribute to the RogueGPT project locally.

System Architecture

The project is built with the following components:

  • Frontend/Backend: A Streamlit application (app.py) written in Python provides the user interface for both manual and automated news fragment generation.
  • Configuration: A prompt_engine.json file defines the parameters for automated generation, including prompt templates, styles, languages, and target generative models.
  • Database: A MongoDB (NoSQL) database is used to store the generated news fragments and their associated metadata.
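To make the role of the configuration file more concrete, the sketch below shows one way a file like prompt_engine.json could drive automated generation. The schema shown here is hypothetical (the real file may be organised differently), as are the `PromptTemplate` key and the `expand_conditions` function. It also assumes that every combination of parameter values is wanted, which the project may or may not do.

```python
import itertools
import json

# Hypothetical configuration; the real prompt_engine.json schema may differ.
CONFIG_JSON = """
{
  "PromptTemplate": "Write a {Format} in the style of {Style} in language '{Language}'.",
  "Style": ["NYT", "BILD"],
  "Format": ["tweet", "short article"],
  "Language": ["en", "de"],
  "GeneratorModel": ["openai_gpt-4-turbo_2024-04-09"]
}
"""

PARAMETER_KEYS = ["Style", "Format", "Language", "GeneratorModel"]


def expand_conditions(config):
    """Yield one parameter dict (with a filled-in prompt) per combination."""
    value_lists = [config[key] for key in PARAMETER_KEYS]
    for values in itertools.product(*value_lists):
        params = dict(zip(PARAMETER_KEYS, values))
        # str.format ignores keyword arguments the template does not use.
        params["Prompt"] = config["PromptTemplate"].format(**params)
        yield params


config = json.loads(CONFIG_JSON)
conditions = list(expand_conditions(config))

for condition in conditions[:2]:
    print(condition["GeneratorModel"], "->", condition["Prompt"])
```

Keeping the parameter values in a file rather than in code means an experimental condition can be added or removed without touching the application, and each stored fragment can record exactly which values produced it.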

Local Installation and Setup

Follow these steps to set up the project on your local machine.

  1. Prerequisites

    • Python 3.8+
    • pip package manager
    • Git
  2. Clone the Repository

    git clone https://github.com/aloth/RogueGPT.git
    cd RogueGPT
  3. Set Up a Virtual Environment (Recommended)

    # For macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
    
    # For Windows
    python -m venv venv
    .\venv\Scripts\activate
  4. Install Dependencies

    pip install -r requirements.txt

    Key dependencies include streamlit, pymongo, and openai.

  5. Configure Environment Variables

    The application requires a connection string to a MongoDB database and, depending on the generator models used, API keys for those models. It is best practice to supply these secrets through environment variables rather than hard-coding them.
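As an illustration of reading such secrets from the environment, here is a minimal Python sketch. The variable name `MONGODB_URI` and the `load_settings` function are assumptions made for this example; the repository may use different names or a different mechanism. `OPENAI_API_KEY` is the variable the openai Python library reads by default.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Read connection settings from the environment, failing early if incomplete."""
    mongo_uri = env.get("MONGODB_URI")  # hypothetical variable name
    if not mongo_uri:
        raise RuntimeError("MONGODB_URI is not set; export it before starting the app.")
    return {
        "mongo_uri": mongo_uri,
        # Optional here: only needed when an OpenAI model is used for generation.
        "openai_api_key": env.get("OPENAI_API_KEY"),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"MONGODB_URI": "mongodb://localhost:27017"})
print(settings["mongo_uri"])
```

Passing the environment in as a parameter keeps the function easy to test and means the application reports a missing connection string at startup instead of failing on the first database write.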

Running the Application

Once the setup is complete, launch the Streamlit application:

streamlit run app.py

The application will open in your default web browser, presenting two tabs: "Generator" for automated creation and "Manual Data Entry" for direct input.

Project Roadmap: A Research-Driven Agenda

The roadmap for RogueGPT is focused on enhancing its capabilities as a state-of-the-art stimulus generation engine.

Expanding Modalities: Beyond Text to Deepfakes

To support the broader research goals of the JudgeGPT project, a key priority is to expand RogueGPT's capabilities to include the generation and incorporation of images and other visual media. This will enable the study of multimedia and "deepfake" disinformation.

Enhancing Realism and Deception

To keep pace with the "technological arms race," the research must test human perception against an ever-wider array of sophisticated models. The roadmap includes the integration of a greater variety of generative models, such as BERT, T5, and other emerging LLMs, allowing for more nuanced and diverse content generation.

Building Trust and Mitigation Systems

Future work includes building a content verification layer and integrating with established fact-checking services. This would allow RogueGPT to not only generate content but also to annotate it with veracity scores, enabling new lines of research into misinformation mitigation and "inoculation" theories.

Citation

If you use RogueGPT or its underlying research in your work, please cite our foundational paper:

@misc{loth2024blessing,
      title={Blessing or curse? A survey on the Impact of Generative AI on Fake News}, 
      author={Alexander Loth and Martin Kappes and Marc-Oliver Pahl},
      year={2024},
      eprint={2404.03021},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contributing

We welcome contributions from the community! To get involved, please follow these steps:

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for full details.

Acknowledgments

This work would not be possible without the foundational technologies and support from:

  • OpenAI for their groundbreaking GPT models.
  • Streamlit for enabling the rapid development of our web application.
  • MongoDB for robust and scalable database solutions.
  • The broader open-source community for providing invaluable tools and libraries.

Disclaimer

RogueGPT is an independent research project and is not affiliated with, endorsed by, or in any way officially connected to OpenAI. The use of "GPT" within the project name is employed in a pars pro toto manner, where it represents the broader class of Generative Pre-trained Transformer models and Large Language Models (LLMs) that are the subject of this research. The project's explorations and findings are its own and do not reflect the views or positions of OpenAI. We are committed to responsible AI research and adhere to ethical guidelines in all aspects of our work.
