Remove setup.py and switch to src-based project layout #31
base: main
Conversation
Walkthrough

This update primarily consists of minor formatting adjustments, such as adding newline characters to configuration files and reordering import statements across multiple Python modules. Additionally, it removes the legacy setup.py in favor of a pyproject.toml-based build.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ConfigManager
    User->>ConfigManager: Initialize with CLI arguments
    ConfigManager->>ConfigManager: Parse existing arguments
    User->>ConfigManager: Specify --checkpoint.convert_to_hf_on_save
    User->>ConfigManager: Specify --checkpoint.hf_upload_enabled
    User->>ConfigManager: Specify --checkpoint.hf_repo_base_name
    User->>ConfigManager: Specify --checkpoint.hf_upload_format
    ConfigManager->>ConfigManager: Register new checkpoint/HF arguments
    ConfigManager-->>User: Configuration includes new checkpoint/HF options
```
Actionable comments posted: 6
🧹 Nitpick comments (9)
download_checkpoint.py (2)

6-6: Clean up commented code.

Remove the commented imports and code lines to improve code cleanliness.

```diff
-# from huggingface_hub import HfApi
-# api = HfApi()
```

Also applies to: 10-10
1-1: Add module docstring.

The module is missing a docstring as flagged by static analysis.

```diff
+"""Utility script for downloading checkpoints from Hugging Face Hub."""
+
 import argparse
```

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
src/flame/utils/hf_utils.py (1)

7-7: Remove commented code.

Either uncomment this import if it's needed or remove it entirely.

```diff
-# from huggingface_hub import logging as hf_logging
```
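This file's `upload_checkpoint_to_hf` helper (lines 10-85 per the code graph below) is not quoted in the review. As a rough illustration of what such a helper typically does with the real `huggingface_hub` API, here is a minimal sketch — the function signature, the per-step path layout, and the commit message are assumptions for illustration, not the PR's actual code:

```python
import os

from huggingface_hub import HfApi


def upload_checkpoint_to_hf(local_path: str, repo_id: str, step: int) -> None:
    """Upload a local checkpoint directory to the HF Hub (illustrative sketch)."""
    api = HfApi()
    # Create the target repo if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Push the whole checkpoint folder under a per-step prefix.
    api.upload_folder(
        folder_path=local_path,
        repo_id=repo_id,
        path_in_repo=os.path.join("checkpoints", f"step-{step}"),
        commit_message=f"Upload checkpoint at step {step}",
    )
```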
src/flame/train.py (4)

26-26: Remove commented import.

Either uncomment this import if needed or remove it entirely.

```diff
-# from flame.utils.checkpoint import cleanup_local_checkpoints
```
774-775: Simplify nested if statements for better readability.

The nested if statements can be combined using logical operators.

```diff
-if torch.distributed.get_rank() == 0:
-    if job_config.checkpoint.enable_checkpoint:
+if torch.distributed.get_rank() == 0 and job_config.checkpoint.enable_checkpoint:
```

Also applies to: 808-810
🧰 Tools
🪛 Ruff (0.11.9)
774-775: Use a single if statement instead of nested if statements (SIM102)
801-805: Use more specific exception types.

Consider catching specific exceptions for better error diagnosis.

```diff
-except Exception as e:
+except (OSError, ValueError, RuntimeError) as e:
     logger.error(
         f"Failed to convert checkpoint step {train_state.step} to HF format: {e}",
         exc_info=True,
     )
```

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 803-803: Line too long (106/100)
(C0301)
[warning] 801-801: Catching too general exception Exception
(W0718)
812-819: Simplify path selection logic.

The path selection logic can be simplified using a dictionary approach.

```diff
-local_path_to_upload = None
-if upload_format == "hf":
-    if hf_target_path and os.path.isdir(hf_target_path):
-        local_path_to_upload = hf_target_path
-elif upload_format == "dcp":
-    if dcp_save_path and os.path.isdir(dcp_save_path):
-        local_path_to_upload = dcp_save_path
+path_mapping = {
+    "hf": hf_target_path,
+    "dcp": dcp_save_path
+}
+candidate_path = path_mapping.get(upload_format)
+local_path_to_upload = candidate_path if candidate_path and os.path.isdir(candidate_path) else None
```

🧰 Tools
🪛 Ruff (0.11.9)
816-817: Use a single if statement instead of nested if statements (SIM102)
src/flame/utils/checkpoint.py (2)

17-66: Excellent checkpoint cleanup implementation with minor formatting improvements needed.

The function logic is well-structured with proper error handling, logging, and separate handling for DCP and HF formats. The implementation correctly handles edge cases and provides good observability.

Consider breaking the long lines to improve readability:

```diff
-def cleanup_local_checkpoints(checkpoint_dir: str, keep_latest_k: int):
+def cleanup_local_checkpoints(
+    checkpoint_dir: str, keep_latest_k: int
+):
```

```diff
-    logger.info(f"Cleaning up local checkpoints in {checkpoint_dir}, keeping latest {keep_latest_k}")
+    logger.info(
+        f"Cleaning up local checkpoints in {checkpoint_dir}, "
+        f"keeping latest {keep_latest_k}"
+    )
```

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 18-18: Line too long (110/100)
(C0301)
[convention] 23-23: Line too long (101/100)
(C0301)
1-7: Add module docstring for better documentation.

The module is missing a docstring. Consider adding one to describe its purpose and usage.

```diff
+"""
+Checkpoint management utilities for local cleanup and step number extraction.
+
+This module provides functions to manage checkpoint directories by extracting
+step numbers from directory names and cleaning up old checkpoints while
+retaining the latest k checkpoints for both DCP and HF formats.
+"""
+
 import glob
```

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 6-6: Unable to import 'torchtitan.tools.logging'
(E0401)
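For readers without the diff at hand, here is a minimal sketch of the retention pattern this review describes (keep only the newest k step-numbered checkpoint directories). The function name and the `step-<n>` directory convention come from the review; everything else — including collapsing the separate DCP/HF handling into one pass — is a simplifying assumption, not the PR's actual implementation:

```python
import glob
import os
import re
import shutil


def cleanup_local_checkpoints(checkpoint_dir: str, keep_latest_k: int) -> None:
    """Delete all but the newest keep_latest_k step directories (sketch)."""
    if keep_latest_k <= 0:
        return  # Retention disabled/unbounded: nothing to prune.
    steps = []
    for d in glob.glob(os.path.join(checkpoint_dir, "step-*")):
        match = re.search(r"step-(\d+)$", d)
        if match and os.path.isdir(d):
            steps.append((int(match.group(1)), d))
    steps.sort(reverse=True)  # Highest step number (newest) first.
    for _, stale in steps[keep_latest_k:]:
        shutil.rmtree(stale, ignore_errors=True)
```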
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- .github/ISSUE_TEMPLATE/feature-request.yml (1 hunks)
- 3rdparty/flash-linear-attention (1 hunks)
- 3rdparty/lm-evaluation-harness (1 hunks)
- configs/delta_net_1B.json (1 hunks)
- configs/delta_net_340M.json (1 hunks)
- configs/gla_340M.json (1 hunks)
- configs/gla_7B.json (1 hunks)
- configs/gsa_340M.json (1 hunks)
- configs/hgrn2_340M.json (1 hunks)
- configs/transformer_1B.json (1 hunks)
- configs/transformer_340M.json (1 hunks)
- configs/transformer_7B.json (1 hunks)
- download_checkpoint.py (1 hunks)
- pyproject.toml (1 hunks)
- setup.py (0 hunks)
- src/flame/config_manager.py (1 hunks)
- src/flame/models/fla.toml (1 hunks)
- src/flame/tools/utils.py (2 hunks)
- src/flame/train.py (4 hunks)
- src/flame/utils/checkpoint.py (1 hunks)
- src/flame/utils/convert_dcp_to_hf.py (1 hunks)
- src/flame/utils/hf_utils.py (1 hunks)
💤 Files with no reviewable changes (1)
- setup.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/flame/train.py (2)
src/flame/utils/convert_dcp_to_hf.py (1)
- save_pretrained (20-54)

src/flame/utils/hf_utils.py (1)
- upload_checkpoint_to_hf (10-85)
🪛 Pylint (3.3.7)
download_checkpoint.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'huggingface_hub'
(E0401)
[convention] 9-9: Missing function or method docstring
(C0116)
[warning] 9-9: Redefining name 'args' from outer scope (line 39)
(W0621)
src/flame/utils/hf_utils.py
[convention] 25-25: Line too long (127/100)
(C0301)
[convention] 35-35: Line too long (103/100)
(C0301)
[convention] 42-42: Line too long (104/100)
(C0301)
[convention] 54-54: Line too long (108/100)
(C0301)
[convention] 56-56: Line too long (110/100)
(C0301)
[convention] 58-58: Line too long (112/100)
(C0301)
[convention] 85-85: Line too long (109/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'huggingface_hub'
(E0401)
[error] 5-5: Unable to import 'torchtitan.tools.logging'
(E0401)
[warning] 34-34: Catching too general exception Exception
(W0718)
[warning] 53-53: Catching too general exception Exception
(W0718)
[warning] 84-84: Catching too general exception Exception
(W0718)
src/flame/train.py
[convention] 787-787: Line too long (105/100)
(C0301)
[convention] 791-791: Line too long (127/100)
(C0301)
[convention] 799-799: Line too long (104/100)
(C0301)
[convention] 803-803: Line too long (106/100)
(C0301)
[convention] 807-807: Line too long (114/100)
(C0301)
[convention] 830-830: Line too long (124/100)
(C0301)
[warning] 783-783: TODO: Haven't tested this one yet
(W0511)
[warning] 801-801: Catching too general exception Exception
(W0718)
[warning] 829-829: Catching too general exception Exception
(W0718)
src/flame/utils/checkpoint.py
[convention] 18-18: Line too long (110/100)
(C0301)
[convention] 23-23: Line too long (101/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 6-6: Unable to import 'torchtitan.tools.logging'
(E0401)
🪛 Ruff (0.11.9)
src/flame/train.py
774-775: Use a single if statement instead of nested if statements (SIM102)

816-817: Use a single if statement instead of nested if statements (SIM102)
🔇 Additional comments (23)
src/flame/models/fla.toml (1)

67-67: No-op newline change. Removal of the trailing newline is purely formatting; no functional impact on TOML parsing.

configs/transformer_1B.json (1)

22-22: Trailing newline addition only. Added EOF newline for consistency; no semantic changes to the JSON.

configs/gsa_340M.json (1)

29-29: Trailing newline addition only. Added EOF newline for consistency; no semantic changes to the JSON.

configs/delta_net_340M.json (1)

27-27: Trailing newline addition only. Added EOF newline for consistency; no semantic changes to the JSON.

configs/transformer_340M.json (1)

18-18: Trailing newline addition only. Added EOF newline for consistency; no semantic changes to the JSON.

configs/hgrn2_340M.json (1)

20-20: Add trailing newline for consistent JSON formatting. No semantic change; aligns with other config files.

configs/gla_7B.json (1)

25-25: Add trailing newline for consistent JSON formatting. No semantic change; aligns with other config files.

configs/delta_net_1B.json (1)

29-29: Add trailing newline for consistent JSON formatting. No semantic change; aligns with other config files.

configs/gla_340M.json (1)

24-24: Add trailing newline for consistent JSON formatting. No semantic change; aligns with other config files.

configs/transformer_7B.json (1)

21-21: Add trailing newline for consistent JSON formatting. No semantic change; aligns with other config files.

.github/ISSUE_TEMPLATE/feature-request.yml (1)

30-30: Add trailing newline for consistency. Including a final newline ensures POSIX-compliance and matches the formatting of other config templates.
3rdparty/lm-evaluation-harness (1)

1-1: Approve submodule update. The lm-evaluation-harness submodule has been correctly bumped to commit d693dcd2b46c59d5478960d0fba52fdc1174763d.

3rdparty/flash-linear-attention (1)

1-1: Subproject commit reference updated. The submodule SHA has been bumped to ff76ca0d1dec5d196ee6dc41d966d8fbfeb6853e, aligning with the overall dependency refresh.

src/flame/tools/utils.py (2)

8-8: LGTM: Good formatting improvement. Adding a blank line after the import improves readability and follows Python conventions.

19-19: LGTM: Trailing whitespace cleanup. Removing trailing whitespace is a good practice for code cleanliness.
src/flame/utils/convert_dcp_to_hf.py (1)

37-38: LGTM: Improved checkpoint path construction logic. The change to use os.path.dirname(path) as the base directory makes the checkpoint path construction more robust and aligns with the expected directory structure, where the DCP checkpoint resides parallel to the output path.
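As a quick illustration of the layout this comment describes — the DCP checkpoint sitting next to, not inside, the HF output directory — a hypothetical helper might derive the path like this (the function name and the `checkpoint/step-<n>` subpath are assumptions for illustration):

```python
import os


def dcp_dir_for_output(path: str, step: int) -> str:
    # The HF output path and the DCP checkpoint share a parent directory,
    # so the DCP step directory is resolved relative to dirname(path).
    base_dir = os.path.dirname(path)
    return os.path.join(base_dir, "checkpoint", f"step-{step}")
```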
pyproject.toml (2)

41-42: LGTM: Modern build system configuration. The updated setuptools requirements and explicit build backend specification align with modern Python packaging best practices.
31-32: ✅ Verification successful

Verify that the dynamic version attribute exists. The dynamic version configuration looks correct, but ensure that flame.__version__ is properly defined and accessible in the package.

🏁 Script executed:

```bash
#!/bin/bash
# Description: Verify that flame.__version__ is defined for dynamic versioning
# Expected: Find __version__ definition in flame package

# Search for __version__ definition in the flame package
fd "__init__.py" src/flame --exec grep -l "__version__" {} \;

# Show the actual version definition
fd "__init__.py" src/flame --exec grep -n "__version__" {} \;
```

Length of output: 171

Confirmed flame.__version__ definition for dynamic versioning. The version attribute is defined as expected, so the dynamic version setup in pyproject.toml will work: src/flame/__init__.py (line 1): __version__ = "0.1.0"

No further changes needed.
src/flame/config_manager.py (2)

694-722: Well-structured configuration additions! The new command-line arguments for HF Hub integration are properly documented and follow the existing patterns in the codebase.
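The four flag names come from the walkthrough's sequence diagram; the argparse registration below is only a plausible sketch of how such dotted options could be wired up, not the PR's actual code (defaults and help strings are invented):

```python
import argparse

parser = argparse.ArgumentParser()
# Dotted option names keep the checkpoint namespace grouping; argparse
# stores them under the literal dotted name, retrievable via getattr().
parser.add_argument(
    "--checkpoint.convert_to_hf_on_save",
    action="store_true",
    help="Convert each saved DCP checkpoint to HF format.",
)
parser.add_argument(
    "--checkpoint.hf_upload_enabled",
    action="store_true",
    help="Upload checkpoints to the Hugging Face Hub after saving.",
)
parser.add_argument(
    "--checkpoint.hf_repo_base_name",
    type=str,
    default=None,
    help="Base name used when deriving the target HF repo id.",
)
parser.add_argument(
    "--checkpoint.hf_upload_format",
    type=str,
    choices=["hf", "dcp"],
    default="hf",
    help="Which local checkpoint format to upload.",
)

args = parser.parse_args([])
print(getattr(args, "checkpoint.hf_upload_format"))  # -> "hf"
```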
694-722: Inconsistency between PR objectives and code changes. The PR objectives state this is about removing setup.py and transitioning to pyproject.toml, but these changes add Hugging Face Hub integration features. Please clarify if the correct PR objectives or code changes were provided.

Likely an incorrect or invalid review comment.
src/flame/train.py (1)

89-101: Good implementation of repository ID generation. The logic properly handles missing configuration and generates unique repository IDs with timestamps.
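A minimal sketch of timestamp-suffixed repo-id generation of the kind described here — the function name, fallback base name, and timestamp format are assumptions, not the code under review:

```python
from datetime import datetime, timezone


def build_hf_repo_id(base_name: str | None, default: str = "flame-run") -> str:
    # Fall back to a default when no base name is configured, then append
    # a UTC timestamp so repeated runs get unique repository ids.
    base = base_name or default
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{base}-{stamp}"


print(build_hf_repo_id(None))  # e.g. "flame-run-20250101-120000"
```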
src/flame/utils/checkpoint.py (2)

9-14: LGTM! Well-implemented step extraction logic. The function correctly uses regex to extract step numbers and handles edge cases appropriately with the -1 sentinel value.
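The sentinel behavior this comment praises might look roughly like the following (the function name is assumed; only the regex-plus-`-1` pattern is taken from the review):

```python
import re


def extract_step(dirname: str) -> int:
    # Non-matching names return -1 so callers can filter them out
    # instead of raising on stray files in the checkpoint directory.
    match = re.search(r"step-(\d+)", dirname)
    return int(match.group(1)) if match else -1


assert extract_step("step-2500") == 2500
assert extract_step("hf/step-100") == 100
assert extract_step("logs") == -1
```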
6-6: ✅ Verification successful

Verify the torchtitan import availability. Static analysis indicates that torchtitan.tools.logging cannot be imported. Please ensure this dependency is properly configured in the project.

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if torchtitan.tools.logging module exists in the codebase
# Expected: Find the module definition or import statements

# Search for torchtitan logging module definition
fd -t f -e py | xargs grep -l "torchtitan.tools.logging" | head -10

# Search for torchtitan package structure
fd -t d torchtitan | head -5

# Check if there are any other torchtitan imports
rg "from torchtitan" -A 2 | head -20
```

Length of output: 1819

Import path for torchtitan.tools.logging is valid. The vendored 3rdparty/torchtitan directory provides the torchtitan.tools.logging module used elsewhere in the codebase (e.g., src/flame/train.py, src/flame/utils/convert_hf_to_dcp.py), so the import will resolve correctly. No changes are needed.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 6-6: Unable to import 'torchtitan.tools.logging'
(E0401)
Force-pushed from 36af963 to 3356dd5
Realized I based this branch off a fork instead of the upstream main branch (this project); the force push is my rebase.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/flame/utils/convert_dcp_to_hf.py (1)

10-10: Remove unused import. The import fla is not referenced in this module and can be removed to eliminate a lint warning.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 10-10: Unable to import 'fla'
(E0401)
[warning] 10-10: Unused import fla
(W0611)
src/flame/utils/convert_hf_to_dcp.py (1)

7-7: Remove unused import. The import fla is not used in this script; consider removing it to clean up the code.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 7-7: Unable to import 'fla'
(E0401)
[warning] 7-7: Unused import fla
(W0611)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- .github/ISSUE_TEMPLATE/feature-request.yml (1 hunks)
- README.md (1 hunks)
- configs/delta_net_1B.json (1 hunks)
- configs/delta_net_340M.json (1 hunks)
- configs/gated_deltanet_1B.json (1 hunks)
- configs/gated_deltanet_340M.json (1 hunks)
- configs/gla_340M.json (1 hunks)
- configs/gla_7B.json (1 hunks)
- configs/gsa_340M.json (1 hunks)
- configs/hgrn2_340M.json (1 hunks)
- configs/mamba2_1B.json (1 hunks)
- configs/mamba2_340M.json (1 hunks)
- configs/mamba_1B.json (1 hunks)
- configs/mamba_340M.json (1 hunks)
- configs/samba_1B.json (1 hunks)
- configs/transformer_1B.json (1 hunks)
- configs/transformer_340M.json (1 hunks)
- configs/transformer_7B.json (1 hunks)
- pyproject.toml (1 hunks)
- setup.py (0 hunks)
- src/flame/data.py (1 hunks)
- src/flame/models/activation_offloading.py (0 hunks)
- src/flame/models/fla.toml (1 hunks)
- src/flame/models/parallelize_fla.py (1 hunks)
- src/flame/models/pipeline_fla.py (1 hunks)
- src/flame/tools/utils.py (1 hunks)
- src/flame/train.py (2 hunks)
- src/flame/utils/convert_dcp_to_hf.py (1 hunks)
- src/flame/utils/convert_hf_to_dcp.py (1 hunks)
- src/flame/utils/preprocess.py (1 hunks)
💤 Files with no reviewable changes (2)
- src/flame/models/activation_offloading.py
- setup.py
✅ Files skipped from review due to trivial changes (9)
- configs/delta_net_1B.json
- configs/mamba2_340M.json
- configs/samba_1B.json
- configs/mamba2_1B.json
- configs/mamba_1B.json
- configs/mamba_340M.json
- README.md
- configs/gated_deltanet_340M.json
- configs/gated_deltanet_1B.json
🚧 Files skipped from review as they are similar to previous changes (12)
- configs/transformer_1B.json
- configs/delta_net_340M.json
- configs/transformer_340M.json
- configs/transformer_7B.json
- configs/gla_340M.json
- configs/gla_7B.json
- .github/ISSUE_TEMPLATE/feature-request.yml
- configs/gsa_340M.json
- configs/hgrn2_340M.json
- src/flame/models/fla.toml
- src/flame/tools/utils.py
- pyproject.toml
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/flame/models/pipeline_fla.py (1)
src/flame/models/parallelize_fla.py (4)
- parallelize_fla (35-107)
- get_blocks (496-502)
- get_components_name (505-549)
- get_model (488-493)
🪛 Pylint (3.3.7)
src/flame/data.py
[error] 20-20: Unable to import 'transformers'
(E0401)
src/flame/models/parallelize_fla.py
[error] 14-14: Unable to import 'fla.modules.fused_linear_cross_entropy'
(E0401)
[error] 15-15: Unable to import 'fla.modules.mlp'
(E0401)
[error] 16-16: Unable to import 'fla.modules.parallel'
(E0401)
src/flame/models/pipeline_fla.py
[error] 21-21: Unable to import 'transformers'
(E0401)
src/flame/train.py
[error] 13-13: Unable to import 'torch'
(E0401)
[error] 14-14: Unable to import 'fla.modules.fused_linear_cross_entropy'
(E0401)
[error] 15-15: Unable to import 'fla.ops.utils'
(E0401)
[error] 16-16: Unable to import 'torch.distributed.elastic.multiprocessing.errors'
(E0401)
[error] 30-30: Unable to import 'transformers'
(E0401)
src/flame/utils/convert_dcp_to_hf.py
[error] 10-10: Unable to import 'fla'
(E0401)
[error] 11-11: Unable to import 'torch'
(E0401)
[error] 12-12: Unable to import 'torch.serialization'
(E0401)
[error] 13-13: Unable to import 'torch.distributed.checkpoint.format_utils'
(E0401)
[error] 14-14: Unable to import 'torchtitan.tools.logging'
(E0401)
[error] 15-15: Unable to import 'transformers'
(E0401)
[warning] 10-10: Unused import fla
(W0611)
src/flame/utils/convert_hf_to_dcp.py
[error] 7-7: Unable to import 'fla'
(E0401)
[error] 8-8: Unable to import 'torch'
(E0401)
[error] 9-9: Unable to import 'torch.distributed.checkpoint'
(E0401)
[error] 10-10: Unable to import 'torchtitan.tools.logging'
(E0401)
[error] 11-11: Unable to import 'transformers'
(E0401)
[warning] 7-7: Unused import fla
(W0611)
src/flame/utils/preprocess.py
[error] 7-7: Unable to import 'torchtitan.tools.logging'
(E0401)
🔇 Additional comments (8)
src/flame/data.py (1)
20-20: Approve import reordering for consistency. Reordering the transformers import to the end of the block aligns with the project's import style.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 20-20: Unable to import 'transformers'
(E0401)
src/flame/utils/preprocess.py (1)
7-7: Approve import reorder for logging modules first. Moving init_logger and logger ahead of transformers imports improves import grouping consistency.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 7-7: Unable to import 'torchtitan.tools.logging'
(E0401)
src/flame/models/parallelize_fla.py (1)
14-16: Approve grouping of related imports. Consolidating LinearLossParallel, SwiGLULinearParallel, and PrepareModuleWeight with other parallelism imports enhances readability.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 14-14: Unable to import 'fla.modules.fused_linear_cross_entropy'
(E0401)
[error] 15-15: Unable to import 'fla.modules.mlp'
(E0401)
[error] 16-16: Unable to import 'fla.modules.parallel'
(E0401)
src/flame/utils/convert_dcp_to_hf.py (1)
15-15: Approve import order adjustment. Reordering the transformers import after logging imports maintains consistency with other utilities.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 15-15: Unable to import 'transformers'
(E0401)
src/flame/utils/convert_hf_to_dcp.py (1)
11-11: Approve import reordering. Moving the transformers import after the logging imports aligns with project import conventions.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 11-11: Unable to import 'transformers'
(E0401)
src/flame/models/pipeline_fla.py (1)
21-23: Import grouping is consistent. The transformers and local flame.models.parallelize_fla imports have been consolidated into a single block with clear separation, matching the project's import organization across other modules.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 21-21: Unable to import 'transformers'
(E0401)
src/flame/train.py (2)
12-17: Core dependency imports are well-organized. Repositioning import torch and the record import maintains a logical flow from local (fla) to core third-party (torch) and distributed error-handling imports, consistent with adjacent files.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 12-12: Unable to import 'fla'
(E0401)
[error] 13-13: Unable to import 'torch'
(E0401)
[error] 14-14: Unable to import 'fla.modules.fused_linear_cross_entropy'
(E0401)
[error] 15-15: Unable to import 'fla.ops.utils'
(E0401)
[error] 16-16: Unable to import 'torch.distributed.elastic.multiprocessing.errors'
(E0401)
[error] 17-17: Unable to import 'torchtitan.components.checkpoint'
(E0401)
[warning] 12-12: Unused import fla
(W0611)
30-37: Flame component and HF imports consolidated. Grouping all transformers and flame component imports into contiguous blocks improves readability and aligns with the import structure used in companion utility and model files.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 30-30: Unable to import 'transformers'
(E0401)
In this pull request, we completely switch the project from the legacy setup.py to pyproject.toml. In the process, we change the project to a src-based layout, because otherwise things get messy. If switching to the src layout is unacceptable, there are other, less clean solutions I could use instead.

While working on this, I noticed you have pre-commit set up, but there were errors running the configured pre-commit hooks, so I fixed that. If you would like this in a separate pull request, I can do that.
Summary by CodeRabbit
Chores
Style