+
Skip to content

MBKEngineers/collect

Repository files navigation

collect

Web scraping utilties for DWR, USACE, USGS, CNRFC, CVO and SacALERT data repositories.

Setup instructions

Create a virtual environment, specifying Python version 3.11

With pure Python (3+)

Create a virtual environment with Python 3's built-in venv library.

$ python -m venv ~/.virtualenvs/myenv

Activate with $ myenv\Scripts\activate.bat (Windows) or $ source myenv/bin/activate (MacOS).

Other virtualenv managers

  • pyenv
  • virtualenvwrapper

Download the source code for this package.

$ git clone https://github.com/MBKEngineers/collect.git

Install collect as a Python package available for local use.

Use the "editable" flag (-e) flag to make sure changes in your repo are propagated to any use of your virtualenv.

$ cd collect
$ python -m pip install -e .

Install Java

If you plan to use the collect.cvo module which depends on tabula-py, you will need to install Java. Follow the instructions at: https://tabula-py.readthedocs.io/en/latest/getting_started.html

Configure package variables

Add username and password credentials to a .env file to enable downloading data from password-protected sources.

Updating Documentation

The collect module uses Sphinx to generate documentation from doc-strings in the project. To update and access documentation files, make sure that Sphinx is installed:

$ python -m pip install -e ".[docs]"

Namespace

Note, there is one other Python package on PyPi named collect. However, it is not maintained and is dated 2011, so not expecting MBK codebase to use that tool.

Adding new modules

collect now includes a command line interface for starting a new module called collect-start. Initialize a new module from a template with

$ collect-start modulename

About

MBK Python scripts for scraping water data from the web

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 9

Languages

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载