Initial PR for Synthetic data generation #767

Open: abhinavg4 wants to merge 2 commits into base: ray-api

abhinavg4

Description

Initial PR to bring synthetic data generation to Ray.

Usage

python ray_curator/examples/quick-synthetic.py

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

- Introduced `quick-synthetic.py` example demonstrating the use of `SimpleSyntheticStage` for generating synthetic text data using an LLM.
- Added `LLMClient` interface and `OpenAIClient` implementation for querying OpenAI models.
- Created `SimpleSyntheticStage` for processing empty tasks and generating document batches from prompts.
- Implemented necessary imports and setup for the new classes and example.

This commit enhances the functionality of the Ray Curator by providing a practical example and a structured way to interact with LLMs for synthetic data generation.
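
To make the structure described above concrete, here is a minimal sketch of how a client interface and a prompt-driven stage could fit together. The class names `LLMClient` and `SimpleSyntheticStage` come from the commit message, but every method name, signature, and the `DocumentBatch` and `EchoClient` helpers are assumptions made for illustration; this is not the code in the PR. `EchoClient` stands in for `OpenAIClient` so the sketch runs without network access or an API key.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class LLMClient(ABC):
    """Assumed interface: turn a prompt into generated text."""

    @abstractmethod
    def query(self, prompt: str, model: str) -> str:
        """Return the model's completion for a single prompt."""


class EchoClient(LLMClient):
    """Hypothetical offline stand-in; a real client would call an LLM API."""

    def query(self, prompt: str, model: str) -> str:
        return f"[{model}] {prompt}"


@dataclass
class DocumentBatch:
    """Hypothetical container for the generated documents."""

    texts: list[str] = field(default_factory=list)


class SimpleSyntheticStage:
    """Sketch of a stage that ignores its (empty) input task and
    produces one document per configured prompt."""

    def __init__(self, client: LLMClient, prompts: list[str], model: str):
        self.client = client
        self.prompts = prompts
        self.model = model

    def process(self, _empty_task=None) -> DocumentBatch:
        # One LLM call per prompt; results are collected into one batch.
        texts = [self.client.query(p, self.model) for p in self.prompts]
        return DocumentBatch(texts=texts)


if __name__ == "__main__":
    stage = SimpleSyntheticStage(
        EchoClient(), ["Write a haiku about Ray."], model="demo-model"
    )
    print(stage.process().texts)
```

Keeping the stage dependent only on the abstract client means a different backend can be swapped in without touching the stage itself.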

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
abhinavg4 changed the base branch from main to ray-api on July 2, 2025, 22:05
- Changed the hardcoded API key in `quick-synthetic.py` to a placeholder "<your-nvidia-api-key>" for better security practices.
- Added a comment to guide users on where to obtain their API key.
- The previously committed key has been rotated.

This update enhances the usability and security of the synthetic data generation example.
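
One common way to keep keys out of source files altogether is to read them from the environment and fail early if only the placeholder is present. The sketch below assumes a hypothetical environment variable name, `NVIDIA_API_KEY`, and a hypothetical helper `load_api_key`; neither appears in the PR.

```python
import os


def load_api_key(env_var: str = "NVIDIA_API_KEY") -> str:
    """Read an API key from the environment (variable name is an assumption).

    Raises RuntimeError if the variable is unset, empty, or still holds
    an angle-bracket placeholder such as "<your-nvidia-api-key>".
    """
    key = os.environ.get(env_var, "").strip()
    if not key or (key.startswith("<") and key.endswith(">")):
        raise RuntimeError(
            f"Set {env_var} to a real API key before running the example."
        )
    return key
```

With a helper like this, the example script would never need a key literal in the file, so there is nothing to rotate if the file is committed.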

Signed-off-by: Abhinav Garg <abhinavg@stanford.edu>
abhinavg4 changed the title from "Wip ab/synth init" to "Initial PR for Synthetic data generation" on Jul 2, 2025

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

github-actions bot added the Stale label on Jul 24, 2025