Support backward compatibility #6

mvsusp · 2018-05-16T08:38:54Z

Description of changes:

add network_interface_name env parameter
append current_host to output_data_dir
revert parameters to predict_fn to match MXNet
invoking transformation functions with positional arguments instead of named arguments
unicode support
passing dtype in the encoder functions

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jesterhazy · 2018-05-16T15:20:36Z

src/sagemaker_containers/encoders.py

+    return np.load(stream).astype(dtype)
+
+
+def unicode_to_str(string_like):  # type: (str or unicode) -> str


do we really need this? why not limit support to utf-8 only?

if we do need other charsets (and I think we don't):

why 'unicode_to_str' for something that handles non-unicode charsets?

this function is called after we check content is in 'UTF8_TYPES', so if this is needed, we have a mistake in worker.py

why just latin1?

guessing == mojibake. let's use charset from Content-Type header to get it right.

Talked with @jesterhazy offline. We will remove it.

jesterhazy · 2018-05-16T15:22:07Z

src/sagemaker_containers/env.py

        resource_config = read_resource_config()
        current_host = resource_config['current_host']
        hosts = resource_config['hosts']
+        network_interface_name = resource_config.get('network_interface_name', 'ethwe')


is it safe to have a (non-None) default for this?

This isn't actually added to resourceconfig yet, but the network interface name is needed for distributed training. As I see it, the choice is having a default here vs. having a default downstream and changing the value once this value changes.

jesterhazy · 2018-05-16T15:23:32Z

src/sagemaker_containers/env.py

                For example, `["algo-1", "algo-2", "algo-3"]` for a three-node cluster.
        Returns:
-              list[str]: all the hosts in the training network.
+              (list[str]): all the hosts in the training network.


(<type>) is not google doc style for return values (even though it is for args)

andremoeller · 2018-05-16T15:26:12Z

src/sagemaker_containers/encoders.py

    """
    stream = BytesIO(npy_array)
-    return np.load(stream)
+    return np.load(stream).astype(dtype)


I don't think we should do astype(dtype), at least not with np.float32 instead of none. The dtype is serialized with the data -- we should default to that.

I agree. Thanks guys

andremoeller · 2018-05-16T15:28:21Z

src/sagemaker_containers/encoders.py

+        try:
+            return string_like.decode('utf-8')
+        finally:
+            return string_like.decode('latin1')


This will always return string_like.decode('latin1'). I think this should be try: return utf-8, catch some exception, return latin1.

andremoeller · 2018-05-16T15:30:16Z

src/sagemaker_containers/encoders.py

    """Convert an NPY array into numpy.

    Args:
+        dtype (dtype, optional):  Data type of the resulting array.


Should the docstrings for the args be reversed to match the function signature?

Same for other serializers.

Good call. I will do it.

andremoeller · 2018-05-16T15:35:49Z

src/sagemaker_containers/encoders.py

-    data = json.loads(string)
-    return np.array(data)
+    data = json.loads(unicode_to_str(string_like))
+    return np.array(data).astype(dtype=dtype)


You can also pass in the dtype to np.array.

^^ but it works slightly differently. i.e. constructor arg won't downcast, but astype will.

So we will only allow, upcast? Is that the decision here?

Leaving as-is is fine with me

andremoeller · 2018-05-16T15:39:24Z

test/unit/test_encoder.py

-@patch('numpy.load', lambda x: 'loaded %s' % x)
+@patch('numpy.load', lambda x: np.array([42]))
 @patch('sagemaker_containers.encoders.BytesIO', lambda x: 'byte io %s' % x)
 def test_npy_to_numpy():


Could you test deserialization of other dtypes? The dtypes of the serialized numpy array and deserialized numpy array should match.

Support to requirements.txt file

…ility-support

* add container support code (#1) * Add missing test dependencies to setup.py (#2) * Add entry.py as script in setup.py so it can be invoked from our images (#3) * Add data_files to setup.py to copy nginx config correctly + fix install_requires (#4) * Move dependencies from extras_require to install_requires * Copy in change from internal repo: support pip install requirements.txt (#5) * Use package_data instead of data_files to package config files, refer to (#6) * Add missing conf files (accidentally left out of previous commit) (aws#8) * pip install requirements in training container (aws#11) * Use container exit code to exit Trainer process. (aws#9) * Add gunicorn and gevent as setup.py dependencies (aws#13) * Specify version ranges for install_requires dependencies * Add Apache license headers (aws#14) * Add changelog (aws#15) * Bump version to 1.0.0 (aws#16) * Fix year in license headers (aws#18) * Add README description (aws#20) * Update version to 1.0.1 (aws#21) * Read new TRAINING_JOB_NAME environment variable during training (aws#34) * Bump version to 1.0.2 and update Changelog (aws#37) * Move processing of requirements file out to the specific container. (aws#39) * Bump version to 1.0.3 and update Changelog (aws#42) * Change module name to string in __all__ (aws#50) * Hardcode special case for tuning job hyperparameter (aws#54) * Use strip() for more readability (aws#55) This change is a small improvement upon aws#54 * Bump version to 1.0.4 (aws#56)

Refactor to support backward compatibility

724dda9

mvsusp requested a review from andremoeller May 16, 2018 08:39

mvsusp added 2 commits May 16, 2018 03:21

Handle unicode

17b76d8

Add dtype to encoders

33853cb

jesterhazy requested changes May 16, 2018

View reviewed changes

andremoeller requested changes May 16, 2018

View reviewed changes

mvsusp and others added 4 commits May 16, 2018 11:41

Support to requirements.txt file (aws#60)

258f9f7

Support to requirements.txt file

Merge remote-tracking branch 'origin/r2.0' into mvs-backward-compatib…

7d9e631

…ility-support

Handle PR observations

07be4aa

Additional PR comments

555310b

mvsusp closed this May 22, 2018

		return np.load(stream).astype(dtype)


		def unicode_to_str(string_like): # type: (str or unicode) -> str

Support backward compatibility #6

Support backward compatibility #6

Uh oh!

Conversation

mvsusp commented May 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of changes:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mvsusp commented May 16, 2018 •

edited

Loading