Illustrations by @gaiaparte
Documentation: https://zazza123.github.io/hamana
hamana (Hamster Analysis) is a Python library designed to simplify data analysis by combining the practicality of pandas and SQL in an open-source environment. The library grew out of experience at a large company where tools like SAS were often used merely as "shortcuts" for running SQL queries across different data sources, without exploiting their full potential. With the goal of providing a free and accessible alternative, hamana replicates these functionalities in an open-source context.
- Support for Multiple Data Sources: Connect to various data sources such as relational databases, CSV files, mainframes, and more.
- SQLite Integration: Save data locally in an SQLite database, either as a file or in memory.
- SQL + pandas: Combine the power of SQL with the flexibility of pandas for advanced analysis.
- Open Source: Available to everyone without licensing costs.
- Why "Hamster"?: Because hamsters are awesome!
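The file-versus-memory distinction can be illustrated with a minimal sketch that uses Python's built-in `sqlite3` module rather than hamana itself. hamana's own connection options are not shown here. An in-memory database exists only for the lifetime of its connection, while a file-backed database can be closed and reopened. The file name `hamana_demo.db` and the sample rows are hypothetical placeholders.

```python
import os
import sqlite3
import tempfile

ROWS = [(1, 9.5), (2, 20.0)]  # made-up sample data


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


# In-memory database: the data lives only as long as this connection.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")
mem.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
mem_rows = count_rows(mem)
mem.close()

# A new ":memory:" connection starts empty: the previous table is gone.
fresh = sqlite3.connect(":memory:")
fresh_tables = fresh.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
fresh.close()

# File-backed database: the data survives closing and reopening.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hamana_demo.db")  # hypothetical file name
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    conn.commit()  # persist the inserts before closing
    conn.close()

    reopened = sqlite3.connect(path)
    file_rows = count_rows(reopened)
    reopened.close()

print(mem_rows, fresh_tables, file_rows)  # 2 [] 2
```

In-memory storage suits throwaway analyses. A file is the option to choose when extractions should be reused later.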
Hamana allows you to extract data from a variety of sources:
- Relational databases (SQLite, Oracle, etc.)
- CSV, Excel, JSON, and other common file formats
- Legacy sources like mainframes
Extractions are automatically saved as pandas DataFrames, making data manipulation simple and intuitive.
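The point of a single result type is that downstream code does not need to care where the data came from. In hamana that result type is a pandas DataFrame. The same idea can be sketched with the standard library alone, using two hypothetical helpers, `extract_db` and `extract_csv`. Neither is part of hamana. Both return the same `(columns, rows)` pair, whether the data came from a database query or from CSV text.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# (column names, rows): a stand-in for a DataFrame in this sketch
Table = Tuple[List[str], List[tuple]]


def extract_db(conn: sqlite3.Connection, query: str) -> Table:
    """Hypothetical helper: run a query and return column names plus rows."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def extract_csv(text: str) -> Table:
    """Hypothetical helper: parse CSV text into the same (columns, rows) shape."""
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)  # first line is the header
    return columns, [tuple(row) for row in reader]


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
source.execute("INSERT INTO orders VALUES (1, 10)")

orders = extract_db(source, "SELECT * FROM orders")
customers = extract_csv("customer_id,customer_name\n10,Alice\n")
source.close()

print(orders)     # (['order_id', 'customer_id'], [(1, 10)])
print(customers)  # (['customer_id', 'customer_name'], [('10', 'Alice')])
```

The CSV values come back as strings. Type inference of this kind is one of the things a DataFrame-based reader handles for you.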
Each extraction can be saved in an SQLite database, enabling you to:
- Store data locally for future use
- Perform SQL queries to combine extractions from different sources
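Combining extractions this way amounts to writing each one to a table in the same SQLite database and joining them with SQL. The following minimal sketch uses Python's built-in `sqlite3` module. It assumes made-up `orders` and `customers` data and a hypothetical `store` helper, which is not hamana's `to_sqlite`.

```python
import sqlite3


def store(conn: sqlite3.Connection, table: str, columns: list, rows: list) -> None:
    """Hypothetical helper: write one extraction into a local SQLite table.

    Table and column names come from trusted code here, never from user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


local = sqlite3.connect(":memory:")

# Pretend these two extractions came from different sources.
store(local, "orders", ["order_id", "customer_id", "total"],
      [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 5.0)])
store(local, "customers", ["customer_id", "customer_name"],
      [(10, "Alice"), (11, "Bob")])

# Once both live in the same database, a plain SQL join combines them.
result = local.execute("""
    SELECT c.customer_name, SUM(o.total) AS spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
local.close()

print(result)  # [('Alice', 30.0), ('Bob', 40.0)]
```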
With Hamana, you can:
- Use pandas to quickly and flexibly manipulate data
- Write SQL queries directly on datasets stored in SQLite
- Integrate SQL and pandas into a single workflow for advanced analysis
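The round trip between SQL and pandas can be sketched with pandas' own SQLite support, independent of hamana. `DataFrame.to_sql` writes a table, `pandas.read_sql_query` runs SQL and returns a DataFrame, and the analysis then continues in pandas. The data and the `share` column are made up for illustration. This sketch assumes pandas is installed and does not show hamana's API.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# pandas -> SQLite: write a DataFrame as a table (made-up data)
orders = pd.DataFrame({
    "customer_id": [10, 11, 10],
    "total": [25.0, 40.0, 5.0],
})
orders.to_sql("orders", conn, index=False)

# SQLite -> pandas: let SQL do the aggregation and get a DataFrame back
per_customer = pd.read_sql_query(
    "SELECT customer_id, SUM(total) AS spent "
    "FROM orders GROUP BY customer_id ORDER BY customer_id",
    conn,
)
conn.close()

# Continue in pandas: each customer's share of total spending
per_customer["share"] = per_customer["spent"] / per_customer["spent"].sum()
print(per_customer)
```

In this split, SQL handles grouping and joins, and pandas handles column-wise calculations on the result.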
Hamana is available on PyPI, and you can install it easily with pip:
```shell
pip install hamana
```
Here is an example of how to use Hamana to connect to a data source, extract information, and combine it with another table:
```python
import hamana as hm

# connect to the hamana database
hm.connect()

# connect to an Oracle database
oracle_db = hm.connector.db.Oracle.new(
    host="localhost",
    port=1521,
    user="user",
    password="password"
)

# define, execute and store a query
orders = hm.Query("SELECT * FROM orders")
oracle_db.to_sqlite(orders, table_name="orders")

# load a CSV file and store it in SQLite
customers = hm.connector.file.CSV("customers.csv")
customers.to_sqlite(table_name="customers")

# combine the two tables using SQL
customers_orders = hm.execute("""
    SELECT
        c.customer_name
        , o.order_date
        , o.total
    FROM customers c
    JOIN orders o ON
        c.customer_id = o.customer_id
""")

# use `pandas` for further analysis
print(customers_orders.result.head())

# close connection
hm.disconnect()
```
If you want to contribute to Hamana:
- Fork the repository.
- Create a branch for your changes:
  ```shell
  git checkout -b feature/your-feature-name
  ```
- Submit a pull request describing the changes.
All contributions are welcome!
This project is distributed under the BSD 3-Clause "New" or "Revised" license.
For questions or suggestions, you can open an Issue on GitHub or contact me directly.
Thank you for choosing Hamana!