这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@mvsusp
Copy link
Owner

@mvsusp mvsusp commented Apr 30, 2018

No description provided.

@mvsusp mvsusp requested a review from winstonaws April 30, 2018 16:59
def url(self): # type: () -> str
return os.path.join('s3://', self.bucket, self.key)

def upload(self): # type: () -> UserModule
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reuse functionality from either the python SDK or from sagemaker-containers to do this part? I don't like having complex test-only logic if we can avoid it.

Copy link
Owner Author

@mvsusp mvsusp Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried before, but it cannot be done without changing the sdk, which would increase too much the scope at this moment.


DEFAULT_REGION = 'us-west-2'

DEFAULT_CONFIG = dict(ContentType="application/x-numpy", TrainingInputMode="File",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the preference for dict() rather than {}?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interchange the use of dict and {}, they have different advantages. Dict usually are shorter and easier to read. {} can work with dynamic keys. Both are valid syntax.

I will change all the " for ', since we agreed in our sintax guide

json.dump(obj, f)


def prepare(user_module, hyperparameters, channels, current_host='algo-1', hosts=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the functionality in this file seems to be around preparing the filesystem for simulating the EASE environment. Could we use the local mode logic from the python SDK for this instead? Even if it's not usable right out of the box for this, if all of the pieces of logic exist in the Python SDK, it could be straightforward to refactor it to decouple the environment preparation from the docker execution.

Copy link
Owner Author

@mvsusp mvsusp May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct that the functionality is simulating SageMaker Training. The difference is that local mode does that inside a container while we do not have a container here.

My vision here would be: we refactor the python sdk and local mode to have atomic functions so we can use it either inside or outside the container. I did not try to cover that in this PR because the size of the effort, although I agree that we need to review the python sdk design.

I tried to write a clean, isolated design, that can be used later to improve the sdk side.

Copy link
Collaborator

@winstonaws winstonaws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed - it'd be ideal to consume the logic from the Python SDK instead of duplicating it in test-only code, but that would require more effort than we can spend right now.

@mvsusp mvsusp closed this May 2, 2018
mvsusp added a commit that referenced this pull request Jul 26, 2018
* add container support code (#1)
* Add missing test dependencies to setup.py (#2)
* Add entry.py as script in setup.py so it can be invoked from our images (#3)
* Add data_files to setup.py to copy nginx config correctly + fix install_requires (#4)
* Move dependencies from extras_require to install_requires
* Copy in change from internal repo: support pip install requirements.txt (#5)
* Use package_data instead of data_files to package config files, refer to (#6)
* Add missing conf files (accidentally left out of previous commit) (aws#8)
* pip install requirements in training container (aws#11)
* Use container exit code to exit Trainer process. (aws#9)
* Add gunicorn and gevent as setup.py dependencies (aws#13)
* Specify version ranges for install_requires dependencies
* Add Apache license headers (aws#14)
* Add changelog (aws#15)
* Bump version to 1.0.0 (aws#16)
* Fix year in license headers (aws#18)
* Add README description (aws#20)
* Update version to 1.0.1 (aws#21)
* Read new TRAINING_JOB_NAME environment variable during training (aws#34)
* Bump version to 1.0.2 and update Changelog (aws#37)
* Move processing of requirements file out to the specific container. (aws#39)
* Bump version to 1.0.3 and update Changelog (aws#42)
* Change module name to string in __all__ (aws#50)
* Hardcode special case for tuning job hyperparameter (aws#54)
* Use strip() for more readability (aws#55)
This change is a small improvement upon aws#54
* Bump version to 1.0.4 (aws#56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants