NRCan/etl-toolbox

NRCAN ETL Toolbox

Pour la version française de ce document, consultez README-fr.md.

etl-toolbox is a Python toolkit designed to simplify Extract, Transform, and Load (ETL) data processes. This modular toolkit offers several specialized components for different aspects of ETL workflows.

Components

etl_logging

Specialized logging module for ETL processes, allowing simple configuration and efficient log analysis.

etl_toolbox

Collection of tools for reading data from various sources. It includes readers for different file formats and databases, facilitating data integration in ETL processes:

  • Data Readers: CSV, Excel, GeoPackage, JSON, PostGIS, Shapefile

database

Interfaces and ORM for interacting with different database systems:

  • Database Interfaces: Abstract object handlers for database interactions
  • ORM: Object-relational mappings to simplify data access

Installation

Install the package via Poetry:

poetry install

Or build a distribution and install the resulting wheel with pip:

poetry build
pip install dist/nrcan_etl_toolbox-*.whl

Usage

Logging Module (etl_logging)

from nrcan_etl_toolbox.etl_logging import CustomLogger

logger = CustomLogger(
    level='INFO',
    logger_type='verbose',
    logger_file_name='test_logger.log',
)

# Logging messages
logger.info("Starting ETL process")
logger.debug("Technical details", extra={"data": {"items": 100}})
logger.error("Processing error", exc_info=True)
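
The calls above use the same keyword arguments as Python's standard `logging` module (`extra`, `exc_info`). As a rough illustration of what those arguments do, here is a minimal sketch built only on the standard library. It assumes, without confirmation from this README, that `CustomLogger` follows the standard `logging.Logger` interface; `DataFormatter` and `build_logger` are hypothetical names used only for this example.

```python
import io
import logging


class DataFormatter(logging.Formatter):
    """Hypothetical formatter: appends the optional `data` attribute passed via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        # Keys of the `extra` dict become attributes on the log record.
        data = getattr(record, "data", None)
        return f"{message} | data={data}" if data is not None else message


def build_logger(name: str, stream: io.StringIO) -> logging.Logger:
    """Hypothetical helper: a DEBUG-level logger writing to an in-memory stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the example output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(DataFormatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = build_logger("etl_sketch", stream)

logger.debug("Technical details", extra={"data": {"items": 100}})

try:
    int("not a number")
except ValueError:
    # exc_info=True attaches the traceback of the exception being handled,
    # so it is most useful inside an `except` block.
    logger.error("Processing error", exc_info=True)

output = stream.getvalue()
print(output)
```

With the standard library, `exc_info=True` only yields a traceback while an exception is being handled, which is why the sketch logs the error from inside the `except` block.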

Data Readers (etl_toolbox)

from nrcan_etl_toolbox.etl_toolbox.reader import ReaderFactory

# Creating a CSV reader
csv_reader = ReaderFactory(input_source="data.csv")
data = csv_reader.data

# Creating a Shapefile reader
shp_reader = ReaderFactory(input_source="data.shp")
geo_data = shp_reader.data
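
Both examples pass only a path, which suggests the reader is chosen from the input source itself. The sketch below shows one common way to build such a factory: dispatching on the file extension. It is an assumption for illustration, not the toolbox's actual implementation; `SketchReaderFactory`, `READERS`, `read_csv` and `read_json` are hypothetical names, and only the standard-library `csv` and `json` modules are used.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def read_csv(path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of row dictionaries."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def read_json(path: Path) -> Any:
    """Read a JSON file into Python objects."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical registry mapping a file extension to a reader function.
READERS: dict[str, Callable[[Path], Any]] = {
    ".csv": read_csv,
    ".json": read_json,
}


class SketchReaderFactory:
    """Hypothetical factory: selects a reader from the source's extension."""

    def __init__(self, input_source: str) -> None:
        path = Path(input_source)
        try:
            reader = READERS[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"Unsupported source type: {path.suffix!r}") from None
        # Mirror the `.data` attribute used in the README examples.
        self.data = reader(path)


# Usage: write a small CSV file, then read it back through the factory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("id,name\n1,Ottawa\n2,Halifax\n", encoding="utf-8")
    rows = SketchReaderFactory(input_source=str(csv_path)).data

print(rows)
```

A registry keyed by extension keeps the calling code identical for every format; supporting another format means adding one entry to the mapping.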

Database Interface

# TODO: Complete documentation.
from nrcan_etl_toolbox.database.interface import AbstractDatabaseHandler
# Usage example to be documented
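
Until the official example is documented, the general idea of an abstract database handler can be sketched with the standard library alone. Everything below is hypothetical: `SketchDatabaseHandler`, `SQLiteHandler` and their methods are illustrative names and do not describe the real `AbstractDatabaseHandler` API, which this README does not specify.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Sequence


class SketchDatabaseHandler(ABC):
    """Hypothetical abstract interface; NOT the toolbox's AbstractDatabaseHandler."""

    @abstractmethod
    def execute(self, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
        """Run a statement and return any resulting rows."""

    @abstractmethod
    def close(self) -> None:
        """Release the underlying connection."""


class SQLiteHandler(SketchDatabaseHandler):
    """Concrete handler backed by the standard-library sqlite3 module."""

    def __init__(self, database: str = ":memory:") -> None:
        self._connection = sqlite3.connect(database)

    def execute(self, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
        # The connection context manager commits on success and rolls back on error.
        with self._connection:
            cursor = self._connection.execute(sql, params)
            return cursor.fetchall()

    def close(self) -> None:
        self._connection.close()


handler = SQLiteHandler()
handler.execute("CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT)")
handler.execute("INSERT INTO station (name) VALUES (?)", ("Ottawa",))
stations = handler.execute("SELECT id, name FROM station")
handler.close()

print(stations)
```

Code written against the abstract class does not need to know which database sits behind it, which is the usual motivation for this kind of interface layer.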

Development

To contribute to the project, install development dependencies:

poetry install --with dev

Run tests with:

pytest

Project Structure

nrcan_etl_toolbox/
├── database/               # Database interactions
│   ├── interface/          # Abstract interfaces for databases
│   └── orm/                # Object-relational mappings
├── etl_logging/            # ETL logging module
└── etl_toolbox/            # Main ETL tools
    └── reader/             # Data source readers
        └── source_readers/ # Specific reader implementations

Authors

For questions or suggestions, please use the project's GitHub issues.
