KeyError: '_subspace' when using results_as_dataframe and ThompsonSampling+CMAES #40

@williamjshipman

Calling results_as_dataframe works fine if I use a MongoDBConnection and the QuasiRandom sampler. However, changing the sampler to ThompsonSampling causes results_as_dataframe to raise KeyError: '_subspace'. Here is some example code that demonstrates the problem. Uncommenting the line that uses ThompsonSampling and commenting out the line that uses QuasiRandom results in the error.

from chocolate import Space, ThompsonSampling, CMAES, SQLiteConnection, QuasiRandom, log, quantized_uniform

s = Space([
    {
        "algo": "svm",
        "C": log(low=-3, high=5, base=10),
        "kernel": {
            "linear": None,
            "rbf": {
                "gamma": log(low=-2, high=3, base=10)
            }
        }
    },
    {
        "algo": "knn",
        "n_neighbors": quantized_uniform(low=1, high=20, step=1)
    }])

conn = SQLiteConnection(url="sqlite:///db.db")
sampler = QuasiRandom(conn, s)
# sampler = ThompsonSampling(CMAES, conn, s)
token, params = sampler.next()
print(f'Token: {token}')
print(f'Parameters: {params}')

results = conn.results_as_dataframe()
print(results)

The output, exception and stack trace when using ThompsonSampling are:

Token: {'_chocolate_id': 0, '_arm_id': 1}
Parameters: {'C': 80716.84865011052, 'gamma': 3.7193589528638826, 'kernel': 'rbf', 'algo': 'svm'}
Traceback (most recent call last):
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "g:\Research\WS\DeepRL\experimental\test_thompsonsampling_bug.py", line 26, in <module>
    results = conn.results_as_dataframe()
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in results_as_dataframe
    result = s([r[k] for k in s.names()])
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in <listcomp>
    result = s([r[k] for k in s.names()])
KeyError: '_subspace'

Looking at the database that is generated, I can see that the results table is lacking a _subspace column. Note that sampling new parameters works fine, as does storing losses, but I can't extract the results.
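
For anyone who wants to confirm the missing column on their own database, here is a minimal sketch using only the standard library sqlite3 module. The helper name table_columns is hypothetical, and the table name "results" is an assumption based on the table mentioned above; adjust it if your connection was created with a different table name.

```python
import sqlite3


def table_columns(db_path, table):
    """Return the column names of `table` in the SQLite file at `db_path`.

    Hypothetical helper for inspection only; it is not part of chocolate.
    An unknown table yields an empty list.
    """
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA arguments cannot be bound as parameters, so the table name
        # is quoted by hand (embedded double quotes are doubled).
        quoted = '"{}"'.format(table.replace('"', '""'))
        # table_info returns one row per column; index 1 holds the name.
        rows = conn.execute("PRAGMA table_info({})".format(quoted)).fetchall()
    finally:
        conn.close()
    return [row[1] for row in rows]
```

Calling table_columns("db.db", "results") after running the script above and checking whether "_subspace" is in the returned list should show whether the column was ever written.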

When everything works, I expect the output to look something like the following:

Token: {'_chocolate_id': 0}
Parameters: {'n_neighbors': 7, 'algo': 'knn'}
    n_neighbors algo
id
0             7  knn
