-
Notifications
You must be signed in to change notification settings - Fork 0
Simplifying tests #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| def url(self): # type: () -> str | ||
| return os.path.join('s3://', self.bucket, self.key) | ||
|
|
||
| def upload(self): # type: () -> UserModule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reuse functionality from either the python SDK or from sagemaker-containers to do this part? I don't like having complex test-only logic if we can avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried before, but it cannot be done without changing the sdk, which would increase too much the scope at this moment.
|
|
||
| DEFAULT_REGION = 'us-west-2' | ||
|
|
||
| DEFAULT_CONFIG = dict(ContentType="application/x-numpy", TrainingInputMode="File", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the preference for dict() rather than {}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I interchange the use of dict and {}, they have different advantages. Dict usually are shorter and easier to read. {} can work with dynamic keys. Both are valid syntax.
I will change all the " for ', since we agreed in our sintax guide
| json.dump(obj, f) | ||
|
|
||
|
|
||
| def prepare(user_module, hyperparameters, channels, current_host='algo-1', hosts=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the functionality in this file seems to be around preparing the filesystem for simulating the EASE environment. Could we use the local mode logic from the python SDK for this instead? Even if it's not usable right out of the box for this, if all of the pieces of logic exist in the Python SDK, it could be straightforward to refactor it to decouple the environment preparation from the docker execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is correct that the functionality is simulating SageMaker Training. The difference is that local mode does that inside a container while we do not have a container here.
My vision here would be: we refactor the python sdk and local mode to have atomic functions so we can use it either inside or outside the container. I did not try to cover that in this PR because the size of the effort, although I agree that we need to review the python sdk design.
I tried to write a clean, isolated design, that can be used later to improve the sdk side.
winstonaws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed - it'd be ideal to consume the logic from the Python SDK instead of duplicating it in test-only code, but that would require more effort than we can spend right now.
* add container support code (#1) * Add missing test dependencies to setup.py (#2) * Add entry.py as script in setup.py so it can be invoked from our images (#3) * Add data_files to setup.py to copy nginx config correctly + fix install_requires (#4) * Move dependencies from extras_require to install_requires * Copy in change from internal repo: support pip install requirements.txt (#5) * Use package_data instead of data_files to package config files, refer to (#6) * Add missing conf files (accidentally left out of previous commit) (aws#8) * pip install requirements in training container (aws#11) * Use container exit code to exit Trainer process. (aws#9) * Add gunicorn and gevent as setup.py dependencies (aws#13) * Specify version ranges for install_requires dependencies * Add Apache license headers (aws#14) * Add changelog (aws#15) * Bump version to 1.0.0 (aws#16) * Fix year in license headers (aws#18) * Add README description (aws#20) * Update version to 1.0.1 (aws#21) * Read new TRAINING_JOB_NAME environment variable during training (aws#34) * Bump version to 1.0.2 and update Changelog (aws#37) * Move processing of requirements file out to the specific container. (aws#39) * Bump version to 1.0.3 and update Changelog (aws#42) * Change module name to string in __all__ (aws#50) * Hardcode special case for tuning job hyperparameter (aws#54) * Use strip() for more readability (aws#55) This change is a small improvement upon aws#54 * Bump version to 1.0.4 (aws#56)
No description provided.