diff --git a/README.md b/README.md index fe954ba..225c0bc 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,173 @@ -# floc-simhash -A fast python implementation of the SimHash algorithm. +# FLoC SimHash + +This Python package provides hashing algorithms for computing cohort ids of users based on their browsing history. +As such, it may be used to compute cohort ids of users following Google's **Federated Learning of Cohorts** (FLoC) proposal. + +The FLoC proposal is an important part of [The Privacy Sandbox](https://www.chromium.org/Home/chromium-privacy/privacy-sandbox), which is Google's replacement for third-party cookies. +FLoC will enable interest-based advertising, thus preserving an important source of monetization for today's web. + +The main idea, as outlined in the [FLoC whitepaper](https://raw.githubusercontent.com/google/ads-privacy/master/proposals/FLoC/FLOC-Whitepaper-Google.pdf), is to replace user cookie ids, which enable user-targeting across multiple sites, by _cohort ids_. +A cohort would consist of a set of users sharing similar browsing behaviour. +By targeting a given cohort, advertisers can ensure that relevant ads are shown while user privacy is preserved by a _hiding in the pack_ mechanism. + +The FLoC whitepaper mentions several mechanisms to map users to cohorts, with varying amounts of centralized information. +The algorithms [currently](https://blog.google/products/ads-commerce/2021-01-privacy-sandbox/) being implemented in Google Chrome as a POC are methods based on **SimHash**, which is a type of locality-sensitive hashing initially introduced for detecting near-duplicate documents. + +## Contents + + + +- [FLoC SimHash](#floc-simhash) + - [Contents](#contents) + - [Installation](#installation) + - [Usage](#usage) + - [Individual document-based SimHash](#individual-document-based-simhash) + - [Providing your own tokenizer](#providing-your-own-tokenizer) + - [Using the SimHashTransformer in scikit-learn pipelines](#using-the-simhashtransformer-in-scikit-learn-pipelines) + - [Caveats](#caveats) + - [Development](#development) + + + +## Installation + +The `floc-simhash` package is available at PyPI. Install using `pip` as follows. + +```bash +pip install floc-simhash +``` + +The package requires `python>=3.7` and will install `scikit-learn` as a dependency. + +## Usage + +The package provides two main classes. + +- `SimHash`, applying the SimHash algorithm on the md5 hashes of tokens in the given document. + +- `SimHashTransformer`, applying the SimHash algorithm to a document vectorization as part of a scikit-learn [pipeline](https://scikit-learn.org/stable/modules/compose.html#pipeline) + +Finally, there is a third class available: + +- `SortingSimHash`, which performs the SortingLSH algorithm by first applying `SimHash` and then clipping the resulting hashes to a given precision. + +### Individual document-based SimHash + +The `SimHash` class provides a way to calculate the SimHash of any given document, without using any information coming from other documents. + +In this case, the document hash is computed by looking at md5 hashes of individual tokens. +We use: + +- The implementation of the md5 hashing algorithm available in the `hashlib` module in the [Python standard library](https://docs.python.org/3/library/hashlib.html). + +- Bitwise arithmetic for fast computations of the document hash from the individual hashed tokens. + +The program below, for example, will print the following hexadecimal string: `cf48b038108e698418650807001800c5`. + +```python +from floc_simhash import SimHash + +document = "Lorem ipsum dolor sit amet consectetur adipiscing elit" +hashed_document = SimHash(n_bits=128).hash(document) + +print(hashed_document) +``` + +An example more related to computing cohort ids: +the following program computes the cohort id of a user by applying SimHash to the document formed by the pipe-separated list of domains in the user browsing history. + +```python +from floc_simhash import SimHash + +document = "google.com|hybridtheory.com|youtube.com|reddit.com" +hasher = SimHash(n_bits=128, tokenizer=lambda x: x.split("|")) +hashed_document = hasher.hash(document) + +print(hashed_document) +``` + +The code above will print the hexadecimal string: `14dd1064800880b40025764cd0014715`. + +#### Providing your own tokenizer + +The `SimHash` constructor will split the given document according to white space by default. +However, it is possible to pass any callable that parses a string into a list of strings in the `tokenizer` parameter. +We have provided an example above where we pass `tokenizer=lambda x: x.split("|")`. + +A good example of a more complex tokenization could be passing the [word tokenizer](https://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.word_tokenize) in NLTK. +This would be a nice choice if we wished to compute hashes of text documents. + +### Using the SimHashTransformer in scikit-learn pipelines + +The approach to SimHash outlined in the [FLoC Whitepaper](https://raw.githubusercontent.com/google/ads-privacy/master/proposals/FLoC/FLOC-Whitepaper-Google.pdf) consists of choosing random unit vectors and working on already vectorized data. + +The choice of a random unit vector is equivalent to choosing a random hyperplane in feature space. +Choosing `p` random hyperplanes partitions the feature space into `2^p` regions. +Then, a `p`-bit SimHash of a vector encodes the region to which it belongs. + +It is reasonable to expect _similar_ documents to have the same hash, provided the vectorization respects the given notion of similarity. + +Two vectorizations are discussed in the aforementioned whitepaper: **one-hot** and **tf-idf**; they are available in [scikit-learn](https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction). + +The `SimHashTransformer` supplies a transformer (implementing the `fit` and `transform` methods) that can be used directly on the output of any of these two vectorizers in order to obtain hashes. + +For example, given a 1d-array `X` containing strings, each of them corresponding to a concatenation of the domains visited by a given user and separated by `"|"`, the following code will store in `y` the cohort id of each user, using one-hot encoding and a 32-bit SimHash. + +```python +from sklearn.feature_extraction.text import CountVectorizer +from sklearn.pipeline import Pipeline + +from floc_simhash import SimHashTransformer + + +X = [ + "google.com|hybridtheory.com|youtube.com|reddit.com", + "google.com|youtube.com|reddit.com", + "github.com", + "google.com|github.com", +] + +one_hot_simhash = Pipeline( + [ + ("vect", CountVectorizer(tokenizer=lambda x: x.split("|"), binary=True)), + ("simhash", SimHashTransformer(n_bits=32)), + ] +) + +y = one_hot_simhash.fit_transform(X) +``` + +After running this code, the value of `y` would look similar to the following (expect same lengths; actual hash values depend on the choice of random vectors during `fit`): + +```python +['0xd98c7e93' '0xd10b79b3' '0x1085154d' '0x59cd150d'] +``` + +#### Caveats + +- The implementation works on the sparse matrices output by `CountVectorizer` and `TfidfTransformer`, in order to manage memory efficiently. + +- At the moment, the choice of precision in the numpy arrays results in overflow errors for `p >= 64`. While we are waiting for implementation details of the FLoC POCs, the first indications hint at choices around `p = 50`. + +## Development + +This project uses [poetry](https://python-poetry.org/) for managing dependencies. + +In order to clone the repository and run the unit tests, execute the following steps on an environment with `python>=3.7`. + +```bash +git clone https://github.com/hybridtheory/floc-simhash.git +cd floc-simhash +poetry install +pytest +``` + +The unit tests are property-based, using the [hypothesis](https://hypothesis.readthedocs.io/en/latest/) library. +This allows for algorithm veritication against hundreds or thousands of random generated inputs. + +Since running many examples may lengthen the test suite runtime, we also use `pytest-xdist` in order to parallelize the tests. +For example, the following call will run up to 1000 examples for each test with parallelism 4. + +```bash +pytest -n 4 --hypothesis-profile=ci +``` diff --git a/floc_simhash/__init__.py b/floc_simhash/__init__.py new file mode 100644 index 0000000..30a54ad --- /dev/null +++ b/floc_simhash/__init__.py @@ -0,0 +1,3 @@ +from .text.simhash import SimHash +from .text.sorting import SortingSimHash +from .transformer.geometric import SimHashTransformer diff --git a/floc_simhash/text/__init__.py b/floc_simhash/text/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/floc_simhash/text/simhash.py b/floc_simhash/text/simhash.py new file mode 100644 index 0000000..5e54a54 --- /dev/null +++ b/floc_simhash/text/simhash.py @@ -0,0 +1,62 @@ +import hashlib +from typing import List, Callable + + +class SimHash: + def __init__( + self, + n_bits: int, + tokenizer: Callable[[str], List[str]] = lambda x: x.split(" "), + ): + """Return a class that performs SimHash. + + The algorithm will compute hashes of `n_bits` bits, splitting documents using `tokenizer`. + + :param n_bits: Number of bits of the returned hashes. This is currently limited to a maximum + of 128 bits. + :type n_bits: int + :param tokenizer: rule to split a given document into a list of tokens, defaults to `lambda + x: x.split(" ")`. + :type tokenizer: Callable[[str], List[str]], optional + :raises ValueError: if `n_bits` is greater than 128. + """ + if n_bits > 128: + raise ValueError("Only hashes up to 128 bits are currently supported") + + self.n_bits = n_bits + self.tokenizer = tokenizer + + def _bitwise_compare(self, hashes: List[int], result: int, bit: int) -> int: + + if bit >= self.n_bits: + return result + + hash_bits = [h & 1 for h in hashes] + next_bit = int(sum(hash_bits) / len(hashes) > 0.5) + + return self._bitwise_compare( + [h >> 1 for h in hashes], result + next_bit * 2 ** bit, bit + 1 + ) + + def inthash(self, document: str) -> int: + """Compute the hash of `document` and return it as an integer. + + :type document: str + :rtype: int + """ + tokens: List[bytes] = [w.encode() for w in self.tokenizer(document)] + md5_hashes: List = [hashlib.md5(token) for token in tokens] + token_clipped_hashes: List[int] = [ + int(h.hexdigest(), 16) >> (h.digest_size * 8 - self.n_bits) + for h in md5_hashes + ] + + return self._bitwise_compare(token_clipped_hashes, 0, 0) + + def hash(self, document: str) -> str: + """Compute the hash of `document` as a hexadecimal string. + + :type document: str + :rtype: str + """ + return hex(self.inthash(document))[2:] diff --git a/floc_simhash/text/sorting.py b/floc_simhash/text/sorting.py new file mode 100644 index 0000000..273a2e2 --- /dev/null +++ b/floc_simhash/text/sorting.py @@ -0,0 +1,38 @@ +from typing import Callable, List + +from .simhash import SimHash + + +class SortingSimHash(SimHash): + def __init__( + self, + bits_to_keep: int, + simhash_bits: int, + tokenizer: Callable[[str], List[str]] = lambda x: x.split(" "), + ): + """Return a class that performs SortingLSH. + + First, a SimHash with `simhash_bits` is computed. Then, the first `bits_to_keep` bits are + kept. + + :type bits_to_keep: int + :type simhash_bits: int + :param tokenizer: method to split a given document into a list of tokens, defaults to + `lambda x: x.split(" ")` + :type tokenizer: Callable[[str], List[str]], optional + :raises ValueError: if `bits_to_keep > simhash_bits`. + """ + if bits_to_keep > simhash_bits: + raise ValueError("Cannot keep more bits than those given by Simhash") + + self.bits_to_keep = bits_to_keep + super().__init__(simhash_bits, tokenizer) + + def inthash(self, document: str) -> int: + """Compute the hash of `document` and return it as an integer. + + :type document: str + :rtype: int + """ + simhash = super().inthash(document) + return simhash >> (self.n_bits - self.bits_to_keep) diff --git a/floc_simhash/transformer/geometric.py b/floc_simhash/transformer/geometric.py new file mode 100644 index 0000000..8e13afb --- /dev/null +++ b/floc_simhash/transformer/geometric.py @@ -0,0 +1,61 @@ +from typing import Optional, Union, Any + +import numpy as np +import scipy +from sklearn.base import TransformerMixin + + +class SimHashTransformer(TransformerMixin): + def __init__(self, n_bits: int): + """Initialize a transformer capable of computing the SimHash algorithm on vectorized data. + + :param n_bits: Precision to hash vectors to. + :type n_bits: int + """ + self.n_bits = n_bits + self._vectors: Optional[np.array] = None + + def fit( + self, X: Union[np.array, scipy.sparse.csr_matrix], y: Any = None + ) -> "SimHashTransformer": + """Fit the transformer to the given data. + + This amounts to choosing random unit vectors of dimension `n`, where `X` is of dimension + `(number_of_documents, n)`. + + :param X: vectorized data to be hashed. Can either be a numpy array or a sparse matrix. + :type X: Union[np.array, scipy.sparse.csr_matrix] + :param y: not used, left for matching the scikit-learn transformer pattern. Defaults to + None. + :type y: Any, optional + :return: the fitted instance itself. + :rtype: "SimHashTransformer" + """ + random_vectors = np.random.randn(X.shape[1], self.n_bits) + self._vectors = random_vectors / np.linalg.norm( + random_vectors, axis=0, keepdims=True + ) + return self + + def transform( + self, X: Union[np.array, scipy.sparse.csr_matrix], y: Any = None + ) -> np.array: + """Compute the SimHashes of the vectors in X as an array of hexadecimal strings. + + :param X: 2D array, where each row contains a vector to be hashed. + :type X: Union[np.array, scipy.sparse.csr_matrix] + :param y: not used, left for matching the scikit-learn transformer pattern. Defaults to + None. + :type y: Any, optional + :return: an array of hexadecimal strings + :rtype: np.array + :raises ValueError: if the fit method has not been previously called. + """ + if self._vectors is None: + raise ValueError("The fit method has not been called") + mul = X.dot(self._vectors) + bits = np.where(mul > 0, 1, 0) + powers = 2 ** np.arange(self.n_bits) + int_values = np.dot(bits, powers) + hex_converter = np.vectorize(lambda x: str(hex(x))) + return hex_converter(int_values) diff --git a/lefthook.yml b/lefthook.yml new file mode 100644 index 0000000..f449726 --- /dev/null +++ b/lefthook.yml @@ -0,0 +1,10 @@ +pre-commit: + commands: + black: + tags: python + glob: "*.py" + run: black --check {staged_files} + pylint: + tags: python + glob: "*.py" + run: pylint-fail-under --fail_under 8.0 {staged_files} diff --git a/poetry.lock b/poetry.lock new file mode 100644 index 0000000..fcbac20 --- /dev/null +++ b/poetry.lock @@ -0,0 +1,822 @@ +[[package]] +name = "apipkg" +version = "1.5" +description = "apipkg: namespace control and lazy-import mechanism" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[[package]] +name = "appdirs" +version = "1.4.4" +description = "A small Python module for determining appropriate platform-specific dirs, e.g. a \"user data dir\"." +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "astroid" +version = "2.5.1" +description = "An abstract syntax tree for Python with inference support." +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +lazy-object-proxy = ">=1.4.0" +typed-ast = {version = ">=1.4.0,<1.5", markers = "implementation_name == \"cpython\" and python_version < \"3.8\""} +wrapt = ">=1.11,<1.13" + +[[package]] +name = "atomicwrites" +version = "1.4.0" +description = "Atomic file writes." +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[[package]] +name = "attrs" +version = "20.3.0" +description = "Classes Without Boilerplate" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[package.extras] +dev = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface", "furo", "sphinx", "pre-commit"] +docs = ["furo", "sphinx", "zope.interface"] +tests = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface"] +tests_no_zope = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six"] + +[[package]] +name = "black" +version = "20.8b1" +description = "The uncompromising code formatter." +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +appdirs = "*" +click = ">=7.1.2" +mypy-extensions = ">=0.4.3" +pathspec = ">=0.6,<1" +regex = ">=2020.1.8" +toml = ">=0.10.1" +typed-ast = ">=1.4.0" +typing-extensions = ">=3.7.4" + +[package.extras] +colorama = ["colorama (>=0.4.3)"] +d = ["aiohttp (>=3.3.2)", "aiohttp-cors"] + +[[package]] +name = "click" +version = "7.1.2" +description = "Composable command line interface toolkit" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[[package]] +name = "colorama" +version = "0.4.4" +description = "Cross-platform colored terminal text." +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[[package]] +name = "coverage" +version = "5.5" +description = "Code coverage measurement for Python" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4" + +[package.extras] +toml = ["toml"] + +[[package]] +name = "execnet" +version = "1.8.0" +description = "execnet: rapid multi-Python deployment" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[package.dependencies] +apipkg = ">=1.4" + +[package.extras] +testing = ["pre-commit"] + +[[package]] +name = "hypothesis" +version = "6.3.4" +description = "A library for property-based testing" +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +attrs = ">=19.2.0" +sortedcontainers = ">=2.1.0,<3.0.0" + +[package.extras] +all = ["black (>=19.10b0)", "click (>=7.0)", "django (>=2.2)", "dpcontracts (>=0.4)", "lark-parser (>=0.6.5)", "libcst (>=0.3.16)", "numpy (>=1.9.0)", "pandas (>=0.25)", "pytest (>=4.6)", "python-dateutil (>=1.4)", "pytz (>=2014.1)", "redis (>=3.0.0)", "importlib-resources (>=3.3.0)", "importlib-metadata", "backports.zoneinfo (>=0.2.1)", "tzdata (>=2020.4)"] +cli = ["click (>=7.0)", "black (>=19.10b0)"] +codemods = ["libcst (>=0.3.16)"] +dateutil = ["python-dateutil (>=1.4)"] +django = ["pytz (>=2014.1)", "django (>=2.2)"] +dpcontracts = ["dpcontracts (>=0.4)"] +ghostwriter = ["black (>=19.10b0)"] +lark = ["lark-parser (>=0.6.5)"] +numpy = ["numpy (>=1.9.0)"] +pandas = ["pandas (>=0.25)"] +pytest = ["pytest (>=4.6)"] +pytz = ["pytz (>=2014.1)"] +redis = ["redis (>=3.0.0)"] +zoneinfo = ["importlib-resources (>=3.3.0)", "backports.zoneinfo (>=0.2.1)", "tzdata (>=2020.4)"] + +[[package]] +name = "importlib-metadata" +version = "3.7.0" +description = "Read metadata from Python packages" +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +typing-extensions = {version = ">=3.6.4", markers = "python_version < \"3.8\""} +zipp = ">=0.5" + +[package.extras] +docs = ["sphinx", "jaraco.packaging (>=8.2)", "rst.linker (>=1.9)"] +testing = ["pytest (>=3.5,!=3.7.3)", "pytest-checkdocs (>=1.2.3)", "pytest-flake8", "pytest-cov", "pytest-enabler", "packaging", "pep517", "pyfakefs", "flufl.flake8", "pytest-black (>=0.3.7)", "pytest-mypy", "importlib-resources (>=1.3)"] + +[[package]] +name = "iniconfig" +version = "1.1.1" +description = "iniconfig: brain-dead simple config-ini parsing" +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "isort" +version = "5.7.0" +description = "A Python utility / library to sort Python imports." +category = "dev" +optional = false +python-versions = ">=3.6,<4.0" + +[package.extras] +pipfile_deprecated_finder = ["pipreqs", "requirementslib"] +requirements_deprecated_finder = ["pipreqs", "pip-api"] +colors = ["colorama (>=0.4.3,<0.5.0)"] + +[[package]] +name = "joblib" +version = "1.0.1" +description = "Lightweight pipelining with Python functions" +category = "main" +optional = false +python-versions = ">=3.6" + +[[package]] +name = "lazy-object-proxy" +version = "1.5.2" +description = "A fast and thorough lazy object proxy." +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[[package]] +name = "mccabe" +version = "0.6.1" +description = "McCabe checker, plugin for flake8" +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "mypy-extensions" +version = "0.4.3" +description = "Experimental type system extensions for programs checked with the mypy typechecker." +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "numpy" +version = "1.20.1" +description = "NumPy is the fundamental package for array computing with Python." +category = "main" +optional = false +python-versions = ">=3.7" + +[[package]] +name = "packaging" +version = "20.9" +description = "Core utilities for Python packages" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[package.dependencies] +pyparsing = ">=2.0.2" + +[[package]] +name = "pathspec" +version = "0.8.1" +description = "Utility library for gitignore style pattern matching of file paths." +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[[package]] +name = "pluggy" +version = "0.13.1" +description = "plugin and hook calling mechanisms for python" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[package.dependencies] +importlib-metadata = {version = ">=0.12", markers = "python_version < \"3.8\""} + +[package.extras] +dev = ["pre-commit", "tox"] + +[[package]] +name = "py" +version = "1.10.0" +description = "library with cross-python path, ini-parsing, io, code, log facilities" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" + +[[package]] +name = "pylint" +version = "2.7.2" +description = "python code static checker" +category = "dev" +optional = false +python-versions = "~=3.6" + +[package.dependencies] +astroid = ">=2.5.1,<2.6" +colorama = {version = "*", markers = "sys_platform == \"win32\""} +isort = ">=4.2.5,<6" +mccabe = ">=0.6,<0.7" +toml = ">=0.7.1" + +[package.extras] +docs = ["sphinx (==3.5.1)", "python-docs-theme (==2020.12)"] + +[[package]] +name = "pylint-fail-under" +version = "0.3.0" +description = "Pylint wrapper that verifies code reaches a minimum quality score." +category = "dev" +optional = false +python-versions = "*" + +[package.dependencies] +pylint = "*" + +[[package]] +name = "pyparsing" +version = "2.4.7" +description = "Python parsing module" +category = "dev" +optional = false +python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" + +[[package]] +name = "pytest" +version = "6.2.2" +description = "pytest: simple powerful testing with Python" +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +atomicwrites = {version = ">=1.0", markers = "sys_platform == \"win32\""} +attrs = ">=19.2.0" +colorama = {version = "*", markers = "sys_platform == \"win32\""} +importlib-metadata = {version = ">=0.12", markers = "python_version < \"3.8\""} +iniconfig = "*" +packaging = "*" +pluggy = ">=0.12,<1.0.0a1" +py = ">=1.8.2" +toml = "*" + +[package.extras] +testing = ["argcomplete", "hypothesis (>=3.56)", "mock", "nose", "requests", "xmlschema"] + +[[package]] +name = "pytest-cov" +version = "2.11.1" +description = "Pytest plugin for measuring coverage." +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[package.dependencies] +coverage = ">=5.2.1" +pytest = ">=4.6" + +[package.extras] +testing = ["fields", "hunter", "process-tests (==2.0.2)", "six", "pytest-xdist", "virtualenv"] + +[[package]] +name = "pytest-forked" +version = "1.3.0" +description = "run tests in isolated forked subprocesses" +category = "dev" +optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" + +[package.dependencies] +py = "*" +pytest = ">=3.10" + +[[package]] +name = "pytest-xdist" +version = "2.2.1" +description = "pytest xdist plugin for distributed testing and loop-on-failing modes" +category = "dev" +optional = false +python-versions = ">=3.5" + +[package.dependencies] +execnet = ">=1.1" +pytest = ">=6.0.0" +pytest-forked = "*" + +[package.extras] +psutil = ["psutil (>=3.0)"] +testing = ["filelock"] + +[[package]] +name = "regex" +version = "2020.11.13" +description = "Alternative regular expression module, to replace re." +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "scikit-learn" +version = "0.24.1" +description = "A set of python modules for machine learning and data mining" +category = "main" +optional = false +python-versions = ">=3.6" + +[package.dependencies] +joblib = ">=0.11" +numpy = ">=1.13.3" +scipy = ">=0.19.1" +threadpoolctl = ">=2.0.0" + +[package.extras] +benchmark = ["matplotlib (>=2.1.1)", "pandas (>=0.25.0)", "memory-profiler (>=0.57.0)"] +docs = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "seaborn (>=0.9.0)", "memory-profiler (>=0.57.0)", "sphinx (>=3.2.0)", "sphinx-gallery (>=0.7.0)", "numpydoc (>=1.0.0)", "Pillow (>=7.1.2)", "sphinx-prompt (>=1.3.0)"] +examples = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "seaborn (>=0.9.0)"] +tests = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "pytest (>=5.0.1)", "pytest-cov (>=2.9.0)", "flake8 (>=3.8.2)", "mypy (>=0.770)", "pyamg (>=4.0.0)"] + +[[package]] +name = "scipy" +version = "1.6.1" +description = "SciPy: Scientific Library for Python" +category = "main" +optional = false +python-versions = ">=3.7" + +[package.dependencies] +numpy = ">=1.16.5" + +[[package]] +name = "sortedcontainers" +version = "2.3.0" +description = "Sorted Containers -- Sorted List, Sorted Dict, Sorted Set" +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "threadpoolctl" +version = "2.1.0" +description = "threadpoolctl" +category = "main" +optional = false +python-versions = ">=3.5" + +[[package]] +name = "toml" +version = "0.10.2" +description = "Python Library for Tom's Obvious, Minimal Language" +category = "dev" +optional = false +python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" + +[[package]] +name = "typed-ast" +version = "1.4.2" +description = "a fork of Python 2 and 3 ast modules with type comment support" +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "typing-extensions" +version = "3.7.4.3" +description = "Backported and Experimental Type Hints for Python 3.5+" +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "wrapt" +version = "1.12.1" +description = "Module for decorators, wrappers and monkey patching." +category = "dev" +optional = false +python-versions = "*" + +[[package]] +name = "zipp" +version = "3.4.0" +description = "Backport of pathlib-compatible object wrapper for zip files" +category = "dev" +optional = false +python-versions = ">=3.6" + +[package.extras] +docs = ["sphinx", "jaraco.packaging (>=3.2)", "rst.linker (>=1.9)"] +testing = ["pytest (>=3.5,!=3.7.3)", "pytest-checkdocs (>=1.2.3)", "pytest-flake8", "pytest-cov", "jaraco.test (>=3.2.0)", "jaraco.itertools", "func-timeout", "pytest-black (>=0.3.7)", "pytest-mypy"] + +[metadata] +lock-version = "1.1" +python-versions = ">=3.7,<4" +content-hash = "632a5329956549ceed556c001b7b632f2de39cb372424b682506e441170addc9" + +[metadata.files] +apipkg = [ + {file = "apipkg-1.5-py2.py3-none-any.whl", hash = "sha256:58587dd4dc3daefad0487f6d9ae32b4542b185e1c36db6993290e7c41ca2b47c"}, + {file = "apipkg-1.5.tar.gz", hash = "sha256:37228cda29411948b422fae072f57e31d3396d2ee1c9783775980ee9c9990af6"}, +] +appdirs = [ + {file = "appdirs-1.4.4-py2.py3-none-any.whl", hash = "sha256:a841dacd6b99318a741b166adb07e19ee71a274450e68237b4650ca1055ab128"}, + {file = "appdirs-1.4.4.tar.gz", hash = "sha256:7d5d0167b2b1ba821647616af46a749d1c653740dd0d2415100fe26e27afdf41"}, +] +astroid = [ + {file = "astroid-2.5.1-py3-none-any.whl", hash = "sha256:21d735aab248253531bb0f1e1e6d068f0ee23533e18ae8a6171ff892b98297cf"}, + {file = "astroid-2.5.1.tar.gz", hash = "sha256:cfc35498ee64017be059ceffab0a25bedf7548ab76f2bea691c5565896e7128d"}, +] +atomicwrites = [ + {file = "atomicwrites-1.4.0-py2.py3-none-any.whl", hash = "sha256:6d1784dea7c0c8d4a5172b6c620f40b6e4cbfdf96d783691f2e1302a7b88e197"}, + {file = "atomicwrites-1.4.0.tar.gz", hash = "sha256:ae70396ad1a434f9c7046fd2dd196fc04b12f9e91ffb859164193be8b6168a7a"}, +] +attrs = [ + {file = "attrs-20.3.0-py2.py3-none-any.whl", hash = "sha256:31b2eced602aa8423c2aea9c76a724617ed67cf9513173fd3a4f03e3a929c7e6"}, + {file = "attrs-20.3.0.tar.gz", hash = "sha256:832aa3cde19744e49938b91fea06d69ecb9e649c93ba974535d08ad92164f700"}, +] +black = [ + {file = "black-20.8b1.tar.gz", hash = "sha256:1c02557aa099101b9d21496f8a914e9ed2222ef70336404eeeac8edba836fbea"}, +] +click = [ + {file = "click-7.1.2-py2.py3-none-any.whl", hash = "sha256:dacca89f4bfadd5de3d7489b7c8a566eee0d3676333fbb50030263894c38c0dc"}, + {file = "click-7.1.2.tar.gz", hash = "sha256:d2b5255c7c6349bc1bd1e59e08cd12acbbd63ce649f2588755783aa94dfb6b1a"}, +] +colorama = [ + {file = "colorama-0.4.4-py2.py3-none-any.whl", hash = "sha256:9f47eda37229f68eee03b24b9748937c7dc3868f906e8ba69fbcbdd3bc5dc3e2"}, +] +coverage = [ + {file = "coverage-5.5-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:b6d534e4b2ab35c9f93f46229363e17f63c53ad01330df9f2d6bd1187e5eaacf"}, + {file = "coverage-5.5-cp27-cp27m-manylinux1_i686.whl", hash = "sha256:b7895207b4c843c76a25ab8c1e866261bcfe27bfaa20c192de5190121770672b"}, + {file = "coverage-5.5-cp27-cp27m-manylinux1_x86_64.whl", hash = "sha256:c2723d347ab06e7ddad1a58b2a821218239249a9e4365eaff6649d31180c1669"}, + {file = "coverage-5.5-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:900fbf7759501bc7807fd6638c947d7a831fc9fdf742dc10f02956ff7220fa90"}, + {file = "coverage-5.5-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:004d1880bed2d97151facef49f08e255a20ceb6f9432df75f4eef018fdd5a78c"}, + {file = "coverage-5.5-cp27-cp27m-win32.whl", hash = "sha256:06191eb60f8d8a5bc046f3799f8a07a2d7aefb9504b0209aff0b47298333302a"}, + {file = "coverage-5.5-cp27-cp27m-win_amd64.whl", hash = "sha256:7501140f755b725495941b43347ba8a2777407fc7f250d4f5a7d2a1050ba8e82"}, + {file = "coverage-5.5-cp27-cp27mu-manylinux1_i686.whl", hash = "sha256:372da284cfd642d8e08ef606917846fa2ee350f64994bebfbd3afb0040436905"}, + {file = "coverage-5.5-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:8963a499849a1fc54b35b1c9f162f4108017b2e6db2c46c1bed93a72262ed083"}, + {file = "coverage-5.5-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:869a64f53488f40fa5b5b9dcb9e9b2962a66a87dab37790f3fcfb5144b996ef5"}, + {file = "coverage-5.5-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:4a7697d8cb0f27399b0e393c0b90f0f1e40c82023ea4d45d22bce7032a5d7b81"}, + {file = "coverage-5.5-cp310-cp310-macosx_10_14_x86_64.whl", hash = "sha256:8d0a0725ad7c1a0bcd8d1b437e191107d457e2ec1084b9f190630a4fb1af78e6"}, + {file = "coverage-5.5-cp310-cp310-manylinux1_x86_64.whl", hash = "sha256:51cb9476a3987c8967ebab3f0fe144819781fca264f57f89760037a2ea191cb0"}, + {file = "coverage-5.5-cp310-cp310-win_amd64.whl", hash = "sha256:c0891a6a97b09c1f3e073a890514d5012eb256845c451bd48f7968ef939bf4ae"}, + {file = "coverage-5.5-cp35-cp35m-macosx_10_9_x86_64.whl", hash = "sha256:3487286bc29a5aa4b93a072e9592f22254291ce96a9fbc5251f566b6b7343cdb"}, + {file = "coverage-5.5-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:deee1077aae10d8fa88cb02c845cfba9b62c55e1183f52f6ae6a2df6a2187160"}, + {file = "coverage-5.5-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:f11642dddbb0253cc8853254301b51390ba0081750a8ac03f20ea8103f0c56b6"}, + {file = "coverage-5.5-cp35-cp35m-manylinux2010_i686.whl", hash = "sha256:6c90e11318f0d3c436a42409f2749ee1a115cd8b067d7f14c148f1ce5574d701"}, + {file = "coverage-5.5-cp35-cp35m-manylinux2010_x86_64.whl", hash = "sha256:30c77c1dc9f253283e34c27935fded5015f7d1abe83bc7821680ac444eaf7793"}, + {file = "coverage-5.5-cp35-cp35m-win32.whl", hash = "sha256:9a1ef3b66e38ef8618ce5fdc7bea3d9f45f3624e2a66295eea5e57966c85909e"}, + {file = "coverage-5.5-cp35-cp35m-win_amd64.whl", hash = "sha256:972c85d205b51e30e59525694670de6a8a89691186012535f9d7dbaa230e42c3"}, + {file = "coverage-5.5-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:af0e781009aaf59e25c5a678122391cb0f345ac0ec272c7961dc5455e1c40066"}, + {file = "coverage-5.5-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:74d881fc777ebb11c63736622b60cb9e4aee5cace591ce274fb69e582a12a61a"}, + {file = "coverage-5.5-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:92b017ce34b68a7d67bd6d117e6d443a9bf63a2ecf8567bb3d8c6c7bc5014465"}, + {file = "coverage-5.5-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:d636598c8305e1f90b439dbf4f66437de4a5e3c31fdf47ad29542478c8508bbb"}, + {file = "coverage-5.5-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:41179b8a845742d1eb60449bdb2992196e211341818565abded11cfa90efb821"}, + {file = "coverage-5.5-cp36-cp36m-win32.whl", hash = "sha256:040af6c32813fa3eae5305d53f18875bedd079960822ef8ec067a66dd8afcd45"}, + {file = "coverage-5.5-cp36-cp36m-win_amd64.whl", hash = "sha256:5fec2d43a2cc6965edc0bb9e83e1e4b557f76f843a77a2496cbe719583ce8184"}, + {file = "coverage-5.5-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:18ba8bbede96a2c3dde7b868de9dcbd55670690af0988713f0603f037848418a"}, + {file = "coverage-5.5-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:2910f4d36a6a9b4214bb7038d537f015346f413a975d57ca6b43bf23d6563b53"}, + {file = "coverage-5.5-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:f0b278ce10936db1a37e6954e15a3730bea96a0997c26d7fee88e6c396c2086d"}, + {file = "coverage-5.5-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:796c9c3c79747146ebd278dbe1e5c5c05dd6b10cc3bcb8389dfdf844f3ead638"}, + {file = "coverage-5.5-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:53194af30d5bad77fcba80e23a1441c71abfb3e01192034f8246e0d8f99528f3"}, + {file = "coverage-5.5-cp37-cp37m-win32.whl", hash = "sha256:184a47bbe0aa6400ed2d41d8e9ed868b8205046518c52464fde713ea06e3a74a"}, + {file = "coverage-5.5-cp37-cp37m-win_amd64.whl", hash = "sha256:2949cad1c5208b8298d5686d5a85b66aae46d73eec2c3e08c817dd3513e5848a"}, + {file = "coverage-5.5-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:217658ec7187497e3f3ebd901afdca1af062b42cfe3e0dafea4cced3983739f6"}, + {file = "coverage-5.5-cp38-cp38-manylinux1_i686.whl", hash = "sha256:1aa846f56c3d49205c952d8318e76ccc2ae23303351d9270ab220004c580cfe2"}, + {file = "coverage-5.5-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:24d4a7de75446be83244eabbff746d66b9240ae020ced65d060815fac3423759"}, + {file = "coverage-5.5-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:d1f8bf7b90ba55699b3a5e44930e93ff0189aa27186e96071fac7dd0d06a1873"}, + {file = "coverage-5.5-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:970284a88b99673ccb2e4e334cfb38a10aab7cd44f7457564d11898a74b62d0a"}, + {file = "coverage-5.5-cp38-cp38-win32.whl", hash = "sha256:01d84219b5cdbfc8122223b39a954820929497a1cb1422824bb86b07b74594b6"}, + {file = "coverage-5.5-cp38-cp38-win_amd64.whl", hash = "sha256:2e0d881ad471768bf6e6c2bf905d183543f10098e3b3640fc029509530091502"}, + {file = "coverage-5.5-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:d1f9ce122f83b2305592c11d64f181b87153fc2c2bbd3bb4a3dde8303cfb1a6b"}, + {file = "coverage-5.5-cp39-cp39-manylinux1_i686.whl", hash = "sha256:13c4ee887eca0f4c5a247b75398d4114c37882658300e153113dafb1d76de529"}, + {file = "coverage-5.5-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:52596d3d0e8bdf3af43db3e9ba8dcdaac724ba7b5ca3f6358529d56f7a166f8b"}, + {file = "coverage-5.5-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:2cafbbb3af0733db200c9b5f798d18953b1a304d3f86a938367de1567f4b5bff"}, + {file = "coverage-5.5-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:44d654437b8ddd9eee7d1eaee28b7219bec228520ff809af170488fd2fed3e2b"}, + {file = "coverage-5.5-cp39-cp39-win32.whl", hash = "sha256:d314ed732c25d29775e84a960c3c60808b682c08d86602ec2c3008e1202e3bb6"}, + {file = "coverage-5.5-cp39-cp39-win_amd64.whl", hash = "sha256:13034c4409db851670bc9acd836243aeee299949bd5673e11844befcb0149f03"}, + {file = "coverage-5.5-pp36-none-any.whl", hash = "sha256:f030f8873312a16414c0d8e1a1ddff2d3235655a2174e3648b4fa66b3f2f1079"}, + {file = "coverage-5.5-pp37-none-any.whl", hash = "sha256:2a3859cb82dcbda1cfd3e6f71c27081d18aa251d20a17d87d26d4cd216fb0af4"}, + {file = "coverage-5.5.tar.gz", hash = "sha256:ebe78fe9a0e874362175b02371bdfbee64d8edc42a044253ddf4ee7d3c15212c"}, +] +execnet = [ + {file = "execnet-1.8.0-py2.py3-none-any.whl", hash = "sha256:7a13113028b1e1cc4c6492b28098b3c6576c9dccc7973bfe47b342afadafb2ac"}, + {file = "execnet-1.8.0.tar.gz", hash = "sha256:b73c5565e517f24b62dea8a5ceac178c661c4309d3aa0c3e420856c072c411b4"}, +] +hypothesis = [ + {file = "hypothesis-6.3.4-py3-none-any.whl", hash = "sha256:69f7d32270f52a14f853cb3673eae65fc2f057601f6e7b5502e5c905985d8c69"}, + {file = "hypothesis-6.3.4.tar.gz", hash = "sha256:26771ce7f21c2ed08e93f62bdd63755dab051c163cc9d54544269b5b8b550eac"}, +] +importlib-metadata = [ + {file = "importlib_metadata-3.7.0-py3-none-any.whl", hash = "sha256:c6af5dbf1126cd959c4a8d8efd61d4d3c83bddb0459a17e554284a077574b614"}, + {file = "importlib_metadata-3.7.0.tar.gz", hash = "sha256:24499ffde1b80be08284100393955842be4a59c7c16bbf2738aad0e464a8e0aa"}, +] +iniconfig = [ + {file = "iniconfig-1.1.1.tar.gz", hash = "sha256:bc3af051d7d14b2ee5ef9969666def0cd1a000e121eaea580d4a313df4b37f32"}, +] +isort = [ + {file = "isort-5.7.0-py3-none-any.whl", hash = "sha256:fff4f0c04e1825522ce6949973e83110a6e907750cd92d128b0d14aaaadbffdc"}, + {file = "isort-5.7.0.tar.gz", hash = "sha256:c729845434366216d320e936b8ad6f9d681aab72dc7cbc2d51bedc3582f3ad1e"}, +] +joblib = [ + {file = "joblib-1.0.1-py3-none-any.whl", hash = "sha256:feeb1ec69c4d45129954f1b7034954241eedfd6ba39b5e9e4b6883be3332d5e5"}, + {file = "joblib-1.0.1.tar.gz", hash = "sha256:9c17567692206d2f3fb9ecf5e991084254fe631665c450b443761c4186a613f7"}, +] +lazy-object-proxy = [ + {file = "lazy-object-proxy-1.5.2.tar.gz", hash = "sha256:5944a9b95e97de1980c65f03b79b356f30a43de48682b8bdd90aa5089f0ec1f4"}, + {file = "lazy_object_proxy-1.5.2-cp27-cp27m-macosx_10_14_x86_64.whl", hash = "sha256:e960e8be509e8d6d618300a6c189555c24efde63e85acaf0b14b2cd1ac743315"}, + {file = "lazy_object_proxy-1.5.2-cp27-cp27mu-manylinux1_x86_64.whl", hash = "sha256:522b7c94b524389f4a4094c4bf04c2b02228454ddd17c1a9b2801fac1d754871"}, + {file = "lazy_object_proxy-1.5.2-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:3782931963dc89e0e9a0ae4348b44762e868ea280e4f8c233b537852a8996ab9"}, + {file = "lazy_object_proxy-1.5.2-cp35-cp35m-manylinux2014_aarch64.whl", hash = "sha256:429c4d1862f3fc37cd56304d880f2eae5bd0da83bdef889f3bd66458aac49128"}, + {file = "lazy_object_proxy-1.5.2-cp35-cp35m-win32.whl", hash = "sha256:cd1bdace1a8762534e9a36c073cd54e97d517a17d69a17985961265be6d22847"}, + {file = "lazy_object_proxy-1.5.2-cp35-cp35m-win_amd64.whl", hash = "sha256:ddbdcd10eb999d7ab292677f588b658372aadb9a52790f82484a37127a390108"}, + {file = "lazy_object_proxy-1.5.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:ecb5dd5990cec6e7f5c9c1124a37cb2c710c6d69b0c1a5c4aa4b35eba0ada068"}, + {file = "lazy_object_proxy-1.5.2-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:b6577f15d5516d7d209c1a8cde23062c0f10625f19e8dc9fb59268859778d7d7"}, + {file = "lazy_object_proxy-1.5.2-cp36-cp36m-win32.whl", hash = "sha256:c8fe2d6ff0ff583784039d0255ea7da076efd08507f2be6f68583b0da32e3afb"}, + {file = "lazy_object_proxy-1.5.2-cp36-cp36m-win_amd64.whl", hash = "sha256:fa5b2dee0e231fa4ad117be114251bdfe6afe39213bd629d43deb117b6a6c40a"}, + {file = "lazy_object_proxy-1.5.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:1d33d6f789697f401b75ce08e73b1de567b947740f768376631079290118ad39"}, + {file = "lazy_object_proxy-1.5.2-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:57fb5c5504ddd45ed420b5b6461a78f58cbb0c1b0cbd9cd5a43ad30a4a3ee4d0"}, + {file = "lazy_object_proxy-1.5.2-cp37-cp37m-win32.whl", hash = "sha256:e7273c64bccfd9310e9601b8f4511d84730239516bada26a0c9846c9697617ef"}, + {file = "lazy_object_proxy-1.5.2-cp37-cp37m-win_amd64.whl", hash = "sha256:6f4e5e68b7af950ed7fdb594b3f19a0014a3ace0fedb86acb896e140ffb24302"}, + {file = "lazy_object_proxy-1.5.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:cadfa2c2cf54d35d13dc8d231253b7985b97d629ab9ca6e7d672c35539d38163"}, + {file = "lazy_object_proxy-1.5.2-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:e7428977763150b4cf83255625a80a23dfdc94d43be7791ce90799d446b4e26f"}, + {file = "lazy_object_proxy-1.5.2-cp38-cp38-win32.whl", hash = "sha256:2f2de8f8ac0be3e40d17730e0600619d35c78c13a099ea91ef7fb4ad944ce694"}, + {file = "lazy_object_proxy-1.5.2-cp38-cp38-win_amd64.whl", hash = "sha256:38c3865bd220bd983fcaa9aa11462619e84a71233bafd9c880f7b1cb753ca7fa"}, + {file = "lazy_object_proxy-1.5.2-cp39-cp39-macosx_10_14_x86_64.whl", hash = "sha256:8a44e9901c0555f95ac401377032f6e6af66d8fc1fbfad77a7a8b1a826e0b93c"}, + {file = "lazy_object_proxy-1.5.2-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:fa7fb7973c622b9e725bee1db569d2c2ee64d2f9a089201c5e8185d482c7352d"}, + {file = "lazy_object_proxy-1.5.2-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:71a1ef23f22fa8437974b2d60fedb947c99a957ad625f83f43fd3de70f77f458"}, + {file = "lazy_object_proxy-1.5.2-cp39-cp39-win32.whl", hash = "sha256:ef3f5e288aa57b73b034ce9c1f1ac753d968f9069cd0742d1d69c698a0167166"}, + {file = "lazy_object_proxy-1.5.2-cp39-cp39-win_amd64.whl", hash = "sha256:37d9c34b96cca6787fe014aeb651217944a967a5b165e2cacb6b858d2997ab84"}, +] +mccabe = [ + {file = "mccabe-0.6.1-py2.py3-none-any.whl", hash = "sha256:ab8a6258860da4b6677da4bd2fe5dc2c659cff31b3ee4f7f5d64e79735b80d42"}, + {file = "mccabe-0.6.1.tar.gz", hash = "sha256:dd8d182285a0fe56bace7f45b5e7d1a6ebcbf524e8f3bd87eb0f125271b8831f"}, +] +mypy-extensions = [ + {file = "mypy_extensions-0.4.3-py2.py3-none-any.whl", hash = "sha256:090fedd75945a69ae91ce1303b5824f428daf5a028d2f6ab8a299250a846f15d"}, + {file = "mypy_extensions-0.4.3.tar.gz", hash = "sha256:2d82818f5bb3e369420cb3c4060a7970edba416647068eb4c5343488a6c604a8"}, +] +numpy = [ + {file = "numpy-1.20.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:ae61f02b84a0211abb56462a3b6cd1e7ec39d466d3160eb4e1da8bf6717cdbeb"}, + {file = "numpy-1.20.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:65410c7f4398a0047eea5cca9b74009ea61178efd78d1be9847fac1d6716ec1e"}, + {file = "numpy-1.20.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:2d7e27442599104ee08f4faed56bb87c55f8b10a5494ac2ead5c98a4b289e61f"}, + {file = "numpy-1.20.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:4ed8e96dc146e12c1c5cdd6fb9fd0757f2ba66048bf94c5126b7efebd12d0090"}, + {file = "numpy-1.20.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:ecb5b74c702358cdc21268ff4c37f7466357871f53a30e6f84c686952bef16a9"}, + {file = "numpy-1.20.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:b9410c0b6fed4a22554f072a86c361e417f0258838957b78bd063bde2c7f841f"}, + {file = "numpy-1.20.1-cp37-cp37m-win32.whl", hash = "sha256:3d3087e24e354c18fb35c454026af3ed8997cfd4997765266897c68d724e4845"}, + {file = "numpy-1.20.1-cp37-cp37m-win_amd64.whl", hash = "sha256:89f937b13b8dd17b0099c7c2e22066883c86ca1575a975f754babc8fbf8d69a9"}, + {file = "numpy-1.20.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:a1d7995d1023335e67fb070b2fae6f5968f5be3802b15ad6d79d81ecaa014fe0"}, + {file = "numpy-1.20.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:60759ab15c94dd0e1ed88241fd4fa3312db4e91d2c8f5a2d4cf3863fad83d65b"}, + {file = "numpy-1.20.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:125a0e10ddd99a874fd357bfa1b636cd58deb78ba4a30b5ddb09f645c3512e04"}, + {file = "numpy-1.20.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:c26287dfc888cf1e65181f39ea75e11f42ffc4f4529e5bd19add57ad458996e2"}, + {file = "numpy-1.20.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:7199109fa46277be503393be9250b983f325880766f847885607d9b13848f257"}, + {file = "numpy-1.20.1-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:72251e43ac426ff98ea802a931922c79b8d7596480300eb9f1b1e45e0543571e"}, + {file = "numpy-1.20.1-cp38-cp38-win32.whl", hash = "sha256:c91ec9569facd4757ade0888371eced2ecf49e7982ce5634cc2cf4e7331a4b14"}, + {file = "numpy-1.20.1-cp38-cp38-win_amd64.whl", hash = "sha256:13adf545732bb23a796914fe5f891a12bd74cf3d2986eed7b7eba2941eea1590"}, + {file = "numpy-1.20.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:104f5e90b143dbf298361a99ac1af4cf59131218a045ebf4ee5990b83cff5fab"}, + {file = "numpy-1.20.1-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:89e5336f2bec0c726ac7e7cdae181b325a9c0ee24e604704ed830d241c5e47ff"}, + {file = "numpy-1.20.1-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:032be656d89bbf786d743fee11d01ef318b0781281241997558fa7950028dd29"}, + {file = "numpy-1.20.1-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:66b467adfcf628f66ea4ac6430ded0614f5cc06ba530d09571ea404789064adc"}, + {file = "numpy-1.20.1-cp39-cp39-win32.whl", hash = "sha256:12e4ba5c6420917571f1a5becc9338abbde71dd811ce40b37ba62dec7b39af6d"}, + {file = "numpy-1.20.1-cp39-cp39-win_amd64.whl", hash = "sha256:9c94cab5054bad82a70b2e77741271790304651d584e2cdfe2041488e753863b"}, + {file = "numpy-1.20.1-pp37-pypy37_pp73-manylinux2010_x86_64.whl", hash = "sha256:9eb551d122fadca7774b97db8a112b77231dcccda8e91a5bc99e79890797175e"}, + {file = "numpy-1.20.1.zip", hash = "sha256:3bc63486a870294683980d76ec1e3efc786295ae00128f9ea38e2c6e74d5a60a"}, +] +packaging = [ + {file = "packaging-20.9-py2.py3-none-any.whl", hash = "sha256:67714da7f7bc052e064859c05c595155bd1ee9f69f76557e21f051443c20947a"}, + {file = "packaging-20.9.tar.gz", hash = "sha256:5b327ac1320dc863dca72f4514ecc086f31186744b84a230374cc1fd776feae5"}, +] +pathspec = [ + {file = "pathspec-0.8.1-py2.py3-none-any.whl", hash = "sha256:aa0cb481c4041bf52ffa7b0d8fa6cd3e88a2ca4879c533c9153882ee2556790d"}, + {file = "pathspec-0.8.1.tar.gz", hash = "sha256:86379d6b86d75816baba717e64b1a3a3469deb93bb76d613c9ce79edc5cb68fd"}, +] +pluggy = [ + {file = "pluggy-0.13.1-py2.py3-none-any.whl", hash = "sha256:966c145cd83c96502c3c3868f50408687b38434af77734af1e9ca461a4081d2d"}, + {file = "pluggy-0.13.1.tar.gz", hash = "sha256:15b2acde666561e1298d71b523007ed7364de07029219b604cf808bfa1c765b0"}, +] +py = [ + {file = "py-1.10.0-py2.py3-none-any.whl", hash = "sha256:3b80836aa6d1feeaa108e046da6423ab8f6ceda6468545ae8d02d9d58d18818a"}, + {file = "py-1.10.0.tar.gz", hash = "sha256:21b81bda15b66ef5e1a777a21c4dcd9c20ad3efd0b3f817e7a809035269e1bd3"}, +] +pylint = [ + {file = "pylint-2.7.2-py3-none-any.whl", hash = "sha256:d09b0b07ba06bcdff463958f53f23df25e740ecd81895f7d2699ec04bbd8dc3b"}, + {file = "pylint-2.7.2.tar.gz", hash = "sha256:0e21d3b80b96740909d77206d741aa3ce0b06b41be375d92e1f3244a274c1f8a"}, +] +pylint-fail-under = [ + {file = "pylint-fail-under-0.3.0.tar.gz", hash = "sha256:4584c0eb6bf2c3dcf9402c74734a49d3a7a86fbe3590276b517fbcbdcba1333a"}, + {file = "pylint_fail_under-0.3.0-py3-none-any.whl", hash = "sha256:d372e3a698634b48550c4e4af10dc145a6d2c2d9ee26a40690ae635661c85de2"}, +] +pyparsing = [ + {file = "pyparsing-2.4.7-py2.py3-none-any.whl", hash = "sha256:ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"}, + {file = "pyparsing-2.4.7.tar.gz", hash = "sha256:c203ec8783bf771a155b207279b9bccb8dea02d8f0c9e5f8ead507bc3246ecc1"}, +] +pytest = [ + {file = "pytest-6.2.2-py3-none-any.whl", hash = "sha256:b574b57423e818210672e07ca1fa90aaf194a4f63f3ab909a2c67ebb22913839"}, + {file = "pytest-6.2.2.tar.gz", hash = "sha256:9d1edf9e7d0b84d72ea3dbcdfd22b35fb543a5e8f2a60092dd578936bf63d7f9"}, +] +pytest-cov = [ + {file = "pytest-cov-2.11.1.tar.gz", hash = "sha256:359952d9d39b9f822d9d29324483e7ba04a3a17dd7d05aa6beb7ea01e359e5f7"}, + {file = "pytest_cov-2.11.1-py2.py3-none-any.whl", hash = "sha256:bdb9fdb0b85a7cc825269a4c56b48ccaa5c7e365054b6038772c32ddcdc969da"}, +] +pytest-forked = [ + {file = "pytest-forked-1.3.0.tar.gz", hash = "sha256:6aa9ac7e00ad1a539c41bec6d21011332de671e938c7637378ec9710204e37ca"}, + {file = "pytest_forked-1.3.0-py2.py3-none-any.whl", hash = "sha256:dc4147784048e70ef5d437951728825a131b81714b398d5d52f17c7c144d8815"}, +] +pytest-xdist = [ + {file = "pytest-xdist-2.2.1.tar.gz", hash = "sha256:718887296892f92683f6a51f25a3ae584993b06f7076ce1e1fd482e59a8220a2"}, + {file = "pytest_xdist-2.2.1-py3-none-any.whl", hash = "sha256:2447a1592ab41745955fb870ac7023026f20a5f0bfccf1b52a879bd193d46450"}, +] +regex = [ + {file = "regex-2020.11.13-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:8b882a78c320478b12ff024e81dc7d43c1462aa4a3341c754ee65d857a521f85"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:a63f1a07932c9686d2d416fb295ec2c01ab246e89b4d58e5fa468089cab44b70"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:6e4b08c6f8daca7d8f07c8d24e4331ae7953333dbd09c648ed6ebd24db5a10ee"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:bba349276b126947b014e50ab3316c027cac1495992f10e5682dc677b3dfa0c5"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:56e01daca75eae420bce184edd8bb341c8eebb19dd3bce7266332258f9fb9dd7"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:6a8ce43923c518c24a2579fda49f093f1397dad5d18346211e46f134fc624e31"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux2014_i686.whl", hash = "sha256:1ab79fcb02b930de09c76d024d279686ec5d532eb814fd0ed1e0051eb8bd2daa"}, + {file = "regex-2020.11.13-cp36-cp36m-manylinux2014_x86_64.whl", hash = "sha256:9801c4c1d9ae6a70aeb2128e5b4b68c45d4f0af0d1535500884d644fa9b768c6"}, + {file = "regex-2020.11.13-cp36-cp36m-win32.whl", hash = "sha256:49cae022fa13f09be91b2c880e58e14b6da5d10639ed45ca69b85faf039f7a4e"}, + {file = "regex-2020.11.13-cp36-cp36m-win_amd64.whl", hash = "sha256:749078d1eb89484db5f34b4012092ad14b327944ee7f1c4f74d6279a6e4d1884"}, + {file = "regex-2020.11.13-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:b2f4007bff007c96a173e24dcda236e5e83bde4358a557f9ccf5e014439eae4b"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:38c8fd190db64f513fe4e1baa59fed086ae71fa45083b6936b52d34df8f86a88"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:5862975b45d451b6db51c2e654990c1820523a5b07100fc6903e9c86575202a0"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:262c6825b309e6485ec2493ffc7e62a13cf13fb2a8b6d212f72bd53ad34118f1"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:bafb01b4688833e099d79e7efd23f99172f501a15c44f21ea2118681473fdba0"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:e32f5f3d1b1c663af7f9c4c1e72e6ffe9a78c03a31e149259f531e0fed826512"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux2014_i686.whl", hash = "sha256:3bddc701bdd1efa0d5264d2649588cbfda549b2899dc8d50417e47a82e1387ba"}, + {file = "regex-2020.11.13-cp37-cp37m-manylinux2014_x86_64.whl", hash = "sha256:02951b7dacb123d8ea6da44fe45ddd084aa6777d4b2454fa0da61d569c6fa538"}, + {file = "regex-2020.11.13-cp37-cp37m-win32.whl", hash = "sha256:0d08e71e70c0237883d0bef12cad5145b84c3705e9c6a588b2a9c7080e5af2a4"}, + {file = "regex-2020.11.13-cp37-cp37m-win_amd64.whl", hash = "sha256:1fa7ee9c2a0e30405e21031d07d7ba8617bc590d391adfc2b7f1e8b99f46f444"}, + {file = "regex-2020.11.13-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:baf378ba6151f6e272824b86a774326f692bc2ef4cc5ce8d5bc76e38c813a55f"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux1_i686.whl", hash = "sha256:e3faaf10a0d1e8e23a9b51d1900b72e1635c2d5b0e1bea1c18022486a8e2e52d"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:2a11a3e90bd9901d70a5b31d7dd85114755a581a5da3fc996abfefa48aee78af"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:d1ebb090a426db66dd80df8ca85adc4abfcbad8a7c2e9a5ec7513ede522e0a8f"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:b2b1a5ddae3677d89b686e5c625fc5547c6e492bd755b520de5332773a8af06b"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:2c99e97d388cd0a8d30f7c514d67887d8021541b875baf09791a3baad48bb4f8"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux2014_i686.whl", hash = "sha256:c084582d4215593f2f1d28b65d2a2f3aceff8342aa85afd7be23a9cad74a0de5"}, + {file = "regex-2020.11.13-cp38-cp38-manylinux2014_x86_64.whl", hash = "sha256:a3d748383762e56337c39ab35c6ed4deb88df5326f97a38946ddd19028ecce6b"}, + {file = "regex-2020.11.13-cp38-cp38-win32.whl", hash = "sha256:7913bd25f4ab274ba37bc97ad0e21c31004224ccb02765ad984eef43e04acc6c"}, + {file = "regex-2020.11.13-cp38-cp38-win_amd64.whl", hash = "sha256:6c54ce4b5d61a7129bad5c5dc279e222afd00e721bf92f9ef09e4fae28755683"}, + {file = "regex-2020.11.13-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:1862a9d9194fae76a7aaf0150d5f2a8ec1da89e8b55890b1786b8f88a0f619dc"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux1_i686.whl", hash = "sha256:4902e6aa086cbb224241adbc2f06235927d5cdacffb2425c73e6570e8d862364"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:7a25fcbeae08f96a754b45bdc050e1fb94b95cab046bf56b016c25e9ab127b3e"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:d2d8ce12b7c12c87e41123997ebaf1a5767a5be3ec545f64675388970f415e2e"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:f7d29a6fc4760300f86ae329e3b6ca28ea9c20823df123a2ea8693e967b29917"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:717881211f46de3ab130b58ec0908267961fadc06e44f974466d1887f865bd5b"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux2014_i686.whl", hash = "sha256:3128e30d83f2e70b0bed9b2a34e92707d0877e460b402faca908c6667092ada9"}, + {file = "regex-2020.11.13-cp39-cp39-manylinux2014_x86_64.whl", hash = "sha256:8f6a2229e8ad946e36815f2a03386bb8353d4bde368fdf8ca5f0cb97264d3b5c"}, + {file = "regex-2020.11.13-cp39-cp39-win32.whl", hash = "sha256:f8f295db00ef5f8bae530fc39af0b40486ca6068733fb860b42115052206466f"}, + {file = "regex-2020.11.13-cp39-cp39-win_amd64.whl", hash = "sha256:a15f64ae3a027b64496a71ab1f722355e570c3fac5ba2801cafce846bf5af01d"}, + {file = "regex-2020.11.13.tar.gz", hash = "sha256:83d6b356e116ca119db8e7c6fc2983289d87b27b3fac238cfe5dca529d884562"}, +] +scikit-learn = [ + {file = "scikit-learn-0.24.1.tar.gz", hash = "sha256:a0334a1802e64d656022c3bfab56a73fbd6bf4b1298343f3688af2151810bbdf"}, + {file = "scikit_learn-0.24.1-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:9bed8a1ef133c8e2f13966a542cb8125eac7f4b67dcd234197c827ba9c7dd3e0"}, + {file = "scikit_learn-0.24.1-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:9dfa564ef27e8e674aa1cc74378416d580ac4ede1136c13dd555a87996e13422"}, + {file = "scikit_learn-0.24.1-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:9c6097b6a9b2bafc5e0f31f659e6ab5e131383209c30c9e978c5b8abdac5ed2a"}, + {file = "scikit_learn-0.24.1-cp36-cp36m-win32.whl", hash = "sha256:7b04691eb2f41d2c68dbda8d1bd3cb4ef421bdc43aaa56aeb6c762224552dfb6"}, + {file = "scikit_learn-0.24.1-cp36-cp36m-win_amd64.whl", hash = "sha256:1adf483e91007a87171d7ce58c34b058eb5dab01b5fee6052f15841778a8ecd8"}, + {file = "scikit_learn-0.24.1-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:ddb52d088889f5596bc4d1de981f2eca106b58243b6679e4782f3ba5096fd645"}, + {file = "scikit_learn-0.24.1-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:99349d77f54e11f962d608d94dfda08f0c9e5720d97132233ebdf35be2858b2d"}, + {file = "scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:83b21ff053b1ff1c018a2d24db6dd3ea339b1acfbaa4d9c881731f43748d8b3b"}, + {file = "scikit_learn-0.24.1-cp37-cp37m-win32.whl", hash = "sha256:c3deb3b19dd9806acf00cf0d400e84562c227723013c33abefbbc3cf906596e9"}, + {file = "scikit_learn-0.24.1-cp37-cp37m-win_amd64.whl", hash = "sha256:d54dbaadeb1425b7d6a66bf44bee2bb2b899fe3e8850b8e94cfb9c904dcb46d0"}, + {file = "scikit_learn-0.24.1-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:3c4f07f47c04e81b134424d53c3f5e16dfd7f494e44fd7584ba9ce9de2c5e6c1"}, + {file = "scikit_learn-0.24.1-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:826b92bf45b8ad80444814e5f4ac032156dd481e48d7da33d611f8fe96d5f08b"}, + {file = "scikit_learn-0.24.1-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:259ec35201e82e2db1ae2496f229e63f46d7f1695ae68eef9350b00dc74ba52f"}, + {file = "scikit_learn-0.24.1-cp38-cp38-win32.whl", hash = "sha256:8772b99d683be8f67fcc04789032f1b949022a0e6880ee7b75a7ec97dbbb5d0b"}, + {file = "scikit_learn-0.24.1-cp38-cp38-win_amd64.whl", hash = "sha256:ed9d65594948678827f4ff0e7ae23344e2f2b4cabbca057ccaed3118fdc392ca"}, + {file = "scikit_learn-0.24.1-cp39-cp39-macosx_10_13_x86_64.whl", hash = "sha256:8aa1b3ac46b80eaa552b637eeadbbce3be5931e4b5002b964698e33a1b589e1e"}, + {file = "scikit_learn-0.24.1-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:895dbf2030aa7337649e36a83a007df3c9811396b4e2fa672a851160f36ce90c"}, + {file = "scikit_learn-0.24.1-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:9a24d1ccec2a34d4cd3f2a1f86409f3f5954cc23d4d2270ba0d03cf018aa4780"}, + {file = "scikit_learn-0.24.1-cp39-cp39-win32.whl", hash = "sha256:fab31f48282ebf54dd69f6663cd2d9800096bad1bb67bbc9c9ac84eb77b41972"}, + {file = "scikit_learn-0.24.1-cp39-cp39-win_amd64.whl", hash = "sha256:4562dcf4793e61c5d0f89836d07bc37521c3a1889da8f651e2c326463c4bd697"}, +] +scipy = [ + {file = "scipy-1.6.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:a15a1f3fc0abff33e792d6049161b7795909b40b97c6cc2934ed54384017ab76"}, + {file = "scipy-1.6.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:e79570979ccdc3d165456dd62041d9556fb9733b86b4b6d818af7a0afc15f092"}, + {file = "scipy-1.6.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:a423533c55fec61456dedee7b6ee7dce0bb6bfa395424ea374d25afa262be261"}, + {file = "scipy-1.6.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:33d6b7df40d197bdd3049d64e8e680227151673465e5d85723b3b8f6b15a6ced"}, + {file = "scipy-1.6.1-cp37-cp37m-win32.whl", hash = "sha256:6725e3fbb47da428794f243864f2297462e9ee448297c93ed1dcbc44335feb78"}, + {file = "scipy-1.6.1-cp37-cp37m-win_amd64.whl", hash = "sha256:5fa9c6530b1661f1370bcd332a1e62ca7881785cc0f80c0d559b636567fab63c"}, + {file = "scipy-1.6.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:bd50daf727f7c195e26f27467c85ce653d41df4358a25b32434a50d8870fc519"}, + {file = "scipy-1.6.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:f46dd15335e8a320b0fb4685f58b7471702234cba8bb3442b69a3e1dc329c345"}, + {file = "scipy-1.6.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0e5b0ccf63155d90da576edd2768b66fb276446c371b73841e3503be1d63fb5d"}, + {file = "scipy-1.6.1-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:2481efbb3740977e3c831edfd0bd9867be26387cacf24eb5e366a6a374d3d00d"}, + {file = "scipy-1.6.1-cp38-cp38-win32.whl", hash = "sha256:68cb4c424112cd4be886b4d979c5497fba190714085f46b8ae67a5e4416c32b4"}, + {file = "scipy-1.6.1-cp38-cp38-win_amd64.whl", hash = "sha256:5f331eeed0297232d2e6eea51b54e8278ed8bb10b099f69c44e2558c090d06bf"}, + {file = "scipy-1.6.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:0c8a51d33556bf70367452d4d601d1742c0e806cd0194785914daf19775f0e67"}, + {file = "scipy-1.6.1-cp39-cp39-manylinux1_i686.whl", hash = "sha256:83bf7c16245c15bc58ee76c5418e46ea1811edcc2e2b03041b804e46084ab627"}, + {file = "scipy-1.6.1-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:794e768cc5f779736593046c9714e0f3a5940bc6dcc1dba885ad64cbfb28e9f0"}, + {file = "scipy-1.6.1-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:5da5471aed911fe7e52b86bf9ea32fb55ae93e2f0fac66c32e58897cfb02fa07"}, + {file = "scipy-1.6.1-cp39-cp39-win32.whl", hash = "sha256:8e403a337749ed40af60e537cc4d4c03febddcc56cd26e774c9b1b600a70d3e4"}, + {file = "scipy-1.6.1-cp39-cp39-win_amd64.whl", hash = "sha256:a5193a098ae9f29af283dcf0041f762601faf2e595c0db1da929875b7570353f"}, + {file = "scipy-1.6.1.tar.gz", hash = "sha256:c4fceb864890b6168e79b0e714c585dbe2fd4222768ee90bc1aa0f8218691b11"}, +] +sortedcontainers = [ + {file = "sortedcontainers-2.3.0-py2.py3-none-any.whl", hash = "sha256:37257a32add0a3ee490bb170b599e93095eed89a55da91fa9f48753ea12fd73f"}, + {file = "sortedcontainers-2.3.0.tar.gz", hash = "sha256:59cc937650cf60d677c16775597c89a960658a09cf7c1a668f86e1e4464b10a1"}, +] +threadpoolctl = [ + {file = "threadpoolctl-2.1.0-py3-none-any.whl", hash = "sha256:38b74ca20ff3bb42caca8b00055111d74159ee95c4370882bbff2b93d24da725"}, + {file = "threadpoolctl-2.1.0.tar.gz", hash = "sha256:ddc57c96a38beb63db45d6c159b5ab07b6bced12c45a1f07b2b92f272aebfa6b"}, +] +toml = [ + {file = "toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b"}, + {file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"}, +] +typed-ast = [ + {file = "typed_ast-1.4.2-cp35-cp35m-manylinux1_i686.whl", hash = "sha256:7703620125e4fb79b64aa52427ec192822e9f45d37d4b6625ab37ef403e1df70"}, + {file = "typed_ast-1.4.2-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:c9aadc4924d4b5799112837b226160428524a9a45f830e0d0f184b19e4090487"}, + {file = "typed_ast-1.4.2-cp35-cp35m-manylinux2014_aarch64.whl", hash = "sha256:9ec45db0c766f196ae629e509f059ff05fc3148f9ffd28f3cfe75d4afb485412"}, + {file = "typed_ast-1.4.2-cp35-cp35m-win32.whl", hash = "sha256:85f95aa97a35bdb2f2f7d10ec5bbdac0aeb9dafdaf88e17492da0504de2e6400"}, + {file = "typed_ast-1.4.2-cp35-cp35m-win_amd64.whl", hash = "sha256:9044ef2df88d7f33692ae3f18d3be63dec69c4fb1b5a4a9ac950f9b4ba571606"}, + {file = "typed_ast-1.4.2-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:c1c876fd795b36126f773db9cbb393f19808edd2637e00fd6caba0e25f2c7b64"}, + {file = "typed_ast-1.4.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:5dcfc2e264bd8a1db8b11a892bd1647154ce03eeba94b461effe68790d8b8e07"}, + {file = "typed_ast-1.4.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:8db0e856712f79c45956da0c9a40ca4246abc3485ae0d7ecc86a20f5e4c09abc"}, + {file = "typed_ast-1.4.2-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:d003156bb6a59cda9050e983441b7fa2487f7800d76bdc065566b7d728b4581a"}, + {file = "typed_ast-1.4.2-cp36-cp36m-win32.whl", hash = "sha256:4c790331247081ea7c632a76d5b2a265e6d325ecd3179d06e9cf8d46d90dd151"}, + {file = "typed_ast-1.4.2-cp36-cp36m-win_amd64.whl", hash = "sha256:d175297e9533d8d37437abc14e8a83cbc68af93cc9c1c59c2c292ec59a0697a3"}, + {file = "typed_ast-1.4.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:cf54cfa843f297991b7388c281cb3855d911137223c6b6d2dd82a47ae5125a41"}, + {file = "typed_ast-1.4.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:b4fcdcfa302538f70929eb7b392f536a237cbe2ed9cba88e3bf5027b39f5f77f"}, + {file = "typed_ast-1.4.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:987f15737aba2ab5f3928c617ccf1ce412e2e321c77ab16ca5a293e7bbffd581"}, + {file = "typed_ast-1.4.2-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:37f48d46d733d57cc70fd5f30572d11ab8ed92da6e6b28e024e4a3edfb456e37"}, + {file = "typed_ast-1.4.2-cp37-cp37m-win32.whl", hash = "sha256:36d829b31ab67d6fcb30e185ec996e1f72b892255a745d3a82138c97d21ed1cd"}, + {file = "typed_ast-1.4.2-cp37-cp37m-win_amd64.whl", hash = "sha256:8368f83e93c7156ccd40e49a783a6a6850ca25b556c0fa0240ed0f659d2fe496"}, + {file = "typed_ast-1.4.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:963c80b583b0661918718b095e02303d8078950b26cc00b5e5ea9ababe0de1fc"}, + {file = "typed_ast-1.4.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:e683e409e5c45d5c9082dc1daf13f6374300806240719f95dc783d1fc942af10"}, + {file = "typed_ast-1.4.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:84aa6223d71012c68d577c83f4e7db50d11d6b1399a9c779046d75e24bed74ea"}, + {file = "typed_ast-1.4.2-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:a38878a223bdd37c9709d07cd357bb79f4c760b29210e14ad0fb395294583787"}, + {file = "typed_ast-1.4.2-cp38-cp38-win32.whl", hash = "sha256:a2c927c49f2029291fbabd673d51a2180038f8cd5a5b2f290f78c4516be48be2"}, + {file = "typed_ast-1.4.2-cp38-cp38-win_amd64.whl", hash = "sha256:c0c74e5579af4b977c8b932f40a5464764b2f86681327410aa028a22d2f54937"}, + {file = "typed_ast-1.4.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:07d49388d5bf7e863f7fa2f124b1b1d89d8aa0e2f7812faff0a5658c01c59aa1"}, + {file = "typed_ast-1.4.2-cp39-cp39-manylinux1_i686.whl", hash = "sha256:240296b27397e4e37874abb1df2a608a92df85cf3e2a04d0d4d61055c8305ba6"}, + {file = "typed_ast-1.4.2-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:d746a437cdbca200622385305aedd9aef68e8a645e385cc483bdc5e488f07166"}, + {file = "typed_ast-1.4.2-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:14bf1522cdee369e8f5581238edac09150c765ec1cb33615855889cf33dcb92d"}, + {file = "typed_ast-1.4.2-cp39-cp39-win32.whl", hash = "sha256:cc7b98bf58167b7f2db91a4327da24fb93368838eb84a44c472283778fc2446b"}, + {file = "typed_ast-1.4.2-cp39-cp39-win_amd64.whl", hash = "sha256:7147e2a76c75f0f64c4319886e7639e490fee87c9d25cb1d4faef1d8cf83a440"}, + {file = "typed_ast-1.4.2.tar.gz", hash = "sha256:9fc0b3cb5d1720e7141d103cf4819aea239f7d136acf9ee4a69b047b7986175a"}, +] +typing-extensions = [ + {file = "typing_extensions-3.7.4.3-py2-none-any.whl", hash = "sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f"}, + {file = "typing_extensions-3.7.4.3-py3-none-any.whl", hash = "sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918"}, + {file = "typing_extensions-3.7.4.3.tar.gz", hash = "sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c"}, +] +wrapt = [ + {file = "wrapt-1.12.1.tar.gz", hash = "sha256:b62ffa81fb85f4332a4f609cab4ac40709470da05643a082ec1eb88e6d9b97d7"}, +] +zipp = [ + {file = "zipp-3.4.0-py3-none-any.whl", hash = "sha256:102c24ef8f171fd729d46599845e95c7ab894a4cf45f5de11a44cc7444fb1108"}, + {file = "zipp-3.4.0.tar.gz", hash = "sha256:ed5eee1974372595f9e416cc7bbeeb12335201d8081ca8a0743c954d4446e5cb"}, +] diff --git a/pylintrc b/pylintrc new file mode 100644 index 0000000..7a0554a --- /dev/null +++ b/pylintrc @@ -0,0 +1,3 @@ +[MESSAGES CONTROL] + +disable=invalid-name,missing-docstring,broad-except,logging-fstring-interpolation diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..ea1f200 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,33 @@ +[tool.poetry] +name = "floc-simhash" +version = "0.1.0" +description = "A fast python implementation of the SimHash algorithm" +authors = ["Engineering "] +exclude = ["tests/*"] +license = "MIT" +readme = "README.md" +homepage = "https://github.com/hybridtheory/floc-simhash" +repository = "https://github.com/hybridtheory/floc-simhash" +keywords = ["floc", "simhash"] +include = [ + "LICENSE", +] +classifiers = [ + "Topic :: Software Development :: Libraries :: Python Modules", +] + +[tool.poetry.dependencies] +python = ">=3.7,<4" +scikit-learn = "^0.24.1" + +[tool.poetry.dev-dependencies] +pytest = "^6.2.2" +pytest-cov = "^2.11.1" +hypothesis = "^6.3.4" +pylint-fail-under = "^0.3.0" +black = "^20.8b1" +pytest-xdist = "^2.2.1" + +[build-system] +requires = ["poetry-core>=1.0.0"] +build-backend = "poetry.core.masonry.api" diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/conftest.py b/tests/conftest.py new file mode 100644 index 0000000..0b1769b --- /dev/null +++ b/tests/conftest.py @@ -0,0 +1,5 @@ +from hypothesis import settings, Verbosity + +settings.register_profile("ci", max_examples=1000) +settings.register_profile("dev", max_examples=10) +settings.register_profile("debug", max_examples=1, verbosity=Verbosity.verbose) diff --git a/tests/text/test_simhash.py b/tests/text/test_simhash.py new file mode 100644 index 0000000..420f498 --- /dev/null +++ b/tests/text/test_simhash.py @@ -0,0 +1,51 @@ +from hypothesis import given +import hypothesis.strategies as st +import pytest + +from floc_simhash.text.simhash import SimHash + + +@st.composite +def lists_of_ints_of_bounded_bits(draw, max_value: int): + n_bits = draw(st.integers(min_value=0, max_value=max_value)) + hashes = draw(st.lists(st.integers(min_value=0, max_value=2 ** n_bits), min_size=1)) + return n_bits, hashes + + +@given(st.integers(min_value=129)) +def test_large_number_of_bits_raises(n_bits): + with pytest.raises(ValueError): + SimHash(n_bits) + + +@given(st.integers(min_value=0, max_value=128)) +def test_number_of_bits_in_constructor(n_bits): + assert SimHash(n_bits=n_bits).n_bits == n_bits + + +@given(st.lists(st.integers(min_value=0, max_value=1), min_size=100, max_size=100)) +def test_bitwise_compare_computes_last_bit_accurately(hashes): + expected_next_bit = int(sum(hashes) / 100 > 0.5) + hasher = SimHash(n_bits=1) + assert hasher._bitwise_compare(hashes, 0, 0) == expected_next_bit + + +@given(lists_of_ints_of_bounded_bits(max_value=240)) +def test_bitwise_compare_arbitrary_bytes(bits_hashes): + n_bits, hashes = bits_hashes + hasher = SimHash(1) + hasher.n_bits = n_bits + simhash = hasher._bitwise_compare(hashes, 0, 0) + assert 0 <= simhash < 2 ** n_bits + + +@given(st.integers(min_value=0, max_value=128), st.text()) +def test_document_int_hashing(n_bits, document): + simhash = SimHash(n_bits).inthash(document) + assert 0 <= simhash < 2 ** n_bits + + +@given(st.integers(min_value=1, max_value=128), st.text()) +def test_document_hex_hashing(n_bits, document): + simhash = SimHash(n_bits).hash(document) + assert 0 < len(simhash) <= 2 * (n_bits // 8 + 1) diff --git a/tests/text/test_sorting.py b/tests/text/test_sorting.py new file mode 100644 index 0000000..101da12 --- /dev/null +++ b/tests/text/test_sorting.py @@ -0,0 +1,72 @@ +from typing import Tuple + +import pytest + +from hypothesis import given +import hypothesis.strategies as st + +from floc_simhash.text.sorting import SortingSimHash +from floc_simhash.text.simhash import SimHash + + +@st.composite +def bits_greater_than_given(draw, max_bits: int) -> Tuple[int, int]: + n_bits = draw(st.integers(min_value=0, max_value=max_bits)) + bits = draw(st.integers(min_value=n_bits + 1)) + return bits, n_bits + + +@st.composite +def bits_smaller_than_given(draw, max_bits: int) -> Tuple[int, int]: + n_bits = draw(st.integers(min_value=0, max_value=max_bits)) + bits = draw(st.integers(min_value=0, max_value=n_bits)) + return bits, n_bits + + +@st.composite +def bits_fitting_in_bytes(draw, max_value: int) -> Tuple[int, int]: + n_bytes = draw(st.integers(min_value=1, max_value=max_value)) + bits = draw(st.integers(min_value=1, max_value=n_bytes)) + return bits * 8 * 2, n_bytes + + +@given(bits_greater_than_given(max_bits=128)) +def test_bits_to_keep_is_lower_than_simhash_bits(bits_nbits): + with pytest.raises(ValueError): + SortingSimHash(*bits_nbits) + + +@given(bits_smaller_than_given(max_bits=128), st.text()) +def test_sortinglsh_clips_integer_hashes(bits_nbits, document): + bits, n_bits = bits_nbits + shift = n_bits - bits + sorting_hasher = SortingSimHash(bits, n_bits) + sim_hasher = SimHash(n_bits) + + sort_hash = sorting_hasher.inthash(document) + sim_hash = sim_hasher.inthash(document) + + assert sim_hash >> shift == sort_hash + + +@given(bits_smaller_than_given(max_bits=128), st.text()) +def test_sortinglsh_returns_expected_bits(bits_nbits, document): + bits, n_bits = bits_nbits + hasher = SortingSimHash(bits, n_bits) + + assert 0 <= hasher.inthash(document) < 2 ** bits + + +@given(st.integers(min_value=1, max_value=128), st.text()) +def test_keep_a_single_bit(n_bits, document): + hasher = SortingSimHash(1, n_bits) + assert 0 <= hasher.inthash(document) < 2 + + +@given(bits_fitting_in_bytes(max_value=8), st.text()) +def test_document_hex_hashing(bits_nbytes, document): + bits, n_bytes = bits_nbytes + sortinghash = SortingSimHash(bits, n_bytes * 8 * 2).hash(document) + simhash = SimHash(n_bytes * 8 * 2).hash(document) + + assert simhash.startswith(sortinghash) diff --git a/tests/transformer/test_geometric.py b/tests/transformer/test_geometric.py new file mode 100644 index 0000000..ecfe3a0 --- /dev/null +++ b/tests/transformer/test_geometric.py @@ -0,0 +1,85 @@ +import numpy as np + +from hypothesis import given +import hypothesis.strategies as st +from hypothesis.extra import numpy as st_np + +from sklearn.pipeline import Pipeline +from sklearn.feature_extraction.text import CountVectorizer + +from floc_simhash.transformer.geometric import SimHashTransformer + +bitsizes = st.integers(min_value=1, max_value=63) +dims = st.integers(min_value=1, max_value=100) +large_dims = dims.filter(lambda x: x >= 20) +dtypes = st.sampled_from([np.int64, np.float64]) + + +@given( + bitsizes, + st_np.arrays(dtypes, shape=st.tuples(dims, dims)), +) +def test_random_vectors_shape(n_bits, X): + + vectors = SimHashTransformer(n_bits).fit(X)._vectors + assert vectors.shape == (X.shape[1], n_bits), "Unexpected unit vectors shape" + + +@given( + bitsizes, + st_np.arrays(dtypes, shape=st.tuples(dims, large_dims)), +) +def test_random_vectors_are_unit(n_bits, X): + vectors = SimHashTransformer(n_bits).fit(X)._vectors + + norms = np.linalg.norm(vectors, axis=0) + assert norms.shape[0] == n_bits, "There should be n_bits unit vectors picked" + assert all(np.abs(norms - 1) <= 0.0001), "Random vectors should be unitary" + + +@given( + bitsizes, + st_np.arrays( + dtypes, + shape=st.tuples(large_dims, large_dims), + ), +) +def test_hex_strings(n_bits, X): + cohorts = list(SimHashTransformer(n_bits).fit_transform(X)) + + assert all(h[0:2] == "0x" for h in cohorts), "Cohorts expected as hex strings" + values = [0 <= int(h, 16) < 2 ** n_bits for h in cohorts] + assert all(values), "Hex strings longer than expected" + + +@given( + st.sampled_from([8, 16]), + st.lists( + st.lists( + st.text( + alphabet=st.characters(blacklist_characters=["|"]), + max_size=10, + ), + min_size=1, + max_size=100, + ), + min_size=1, + max_size=100, + ), +) +def test_in_pipeline(n_bits, texts): + documents = ["|".join(d) for d in texts] + pipeline = Pipeline( + [ + ( + "vect", + CountVectorizer(tokenizer=lambda x: x.split("|"), binary=True), + ), + ("simhash", SimHashTransformer(n_bits)), + ] + ) + + cohorts = pipeline.fit_transform(documents) + for cohort in cohorts: + assert cohort[0:2] == "0x" + assert 0 <= int(cohort, 16) < 2 ** n_bits