This repository contains seven CSV datasets for machine learning and data science work, covering sports analytics, environmental monitoring, energy efficiency, and IoT sensor data.
Dataset | Size | Rows | Description | Domain |
---|---|---|---|---|
ARM-Metric-train-TS.csv | 6.3MB | 41,889 | RSS signal strength data for activity recognition | IoT/Sensors |
ARM-Metric-test-TS.csv | 54KB | 351 | Test set for RSS signal strength data | IoT/Sensors |
CPP_base_ampliado.csv | 3.6MB | 57,409 | Power plant energy efficiency data | Energy |
DM-Metric-missing-3.csv | 587KB | 3,805 | Environmental sensor data with missing values | Environmental |
ENB2012_data-Y1.csv | 38KB | 770 | Energy efficiency building data | Building/Energy |
football.train2-r.csv | 3.3MB | 26,469 | Football match statistics and results | Sports |
football.test2-r.csv | 212KB | 2,037 | Test set for football match data | Sports |
Files: ARM-Metric-train-TS.csv, ARM-Metric-test-TS.csv
Description: Time series data containing RSS (Received Signal Strength) measurements from multiple sensors for activity recognition tasks.
Features:
- `avg_rss12`, `var_rss12`: Average and variance of RSS between sensors 1-2
- `avg_rss13`, `var_rss13`: Average and variance of RSS between sensors 1-3
- `avg_rss23`, `var_rss23`: Average and variance of RSS between sensors 2-3
- `ATYPE`: Activity type (e.g., "bending")
- Multiple time windows with suffix `_1`, `_2`, `_3`, `_4`, `_5`
Use Cases: Activity recognition, IoT sensor analysis, time series classification
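As a rough illustration of the activity recognition use case, the sketch below fits a classifier on the numeric RSS columns and scores it on the held-out file. The helper names `fit_activity_classifier` and `evaluate` are hypothetical, and the random forest is only an assumed baseline, not a method prescribed by this repository. It also assumes `ATYPE` is the label column and that the train and test files share the same feature columns.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def fit_activity_classifier(train: pd.DataFrame, target: str = "ATYPE"):
    """Fit a random forest on every numeric column except the target.

    Returns the fitted model and the list of feature columns it used.
    """
    X = train.drop(columns=[target]).select_dtypes("number")
    y = train[target]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model, list(X.columns)


def evaluate(model, columns, test: pd.DataFrame, target: str = "ATYPE") -> float:
    """Accuracy of the fitted model on a held-out frame."""
    return accuracy_score(test[target], model.predict(test[columns]))


# Example usage (assumes the CSV files sit in the working directory):
# train = pd.read_csv("ARM-Metric-train-TS.csv")
# test = pd.read_csv("ARM-Metric-test-TS.csv")
# model, columns = fit_activity_classifier(train)
# print(f"Test accuracy: {evaluate(model, columns, test):.3f}")
```

Fitting on the training file only and scoring on the separate test file keeps the reported accuracy free of training rows.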
File: CPP_base_ampliado.csv
Description: Comprehensive power plant dataset with environmental and operational parameters for energy efficiency prediction.
Features:
- `AT`: Atmospheric Temperature
- `V`: Exhaust Vacuum
- `AP`: Atmospheric Pressure
- `RH`: Relative Humidity
- `PE`: Net Electrical Energy Output (target variable)
- Interaction features: `AT-V`, `AT-AP`, `AT-RH`, `AT-PE`, `AP-RH`, `AP-PE`
Use Cases: Energy efficiency prediction, power plant optimization, regression analysis
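A minimal regression baseline for this dataset might look like the sketch below. The function name `fit_pe_baseline` is hypothetical, and ordinary least squares is an assumed starting point rather than a recommended model. Judging by their names, `AT-PE` and `AP-PE` appear to be derived from the target `PE`; if that is the case, using them as predictors would leak the target, so the sketch restricts itself to the four base measurements.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Base measurements only; interaction columns are left out on purpose
# because some of them may be computed from the target (an assumption).
BASE_FEATURES = ("AT", "V", "AP", "RH")


def fit_pe_baseline(df: pd.DataFrame, features=BASE_FEATURES, target: str = "PE"):
    """Fit a linear model predicting the target from the base features."""
    model = LinearRegression()
    model.fit(df[list(features)], df[target])
    return model


# Example usage (assumes the CSV file sits in the working directory):
# df = pd.read_csv("CPP_base_ampliado.csv")
# model = fit_pe_baseline(df)
# print(model.score(df[list(BASE_FEATURES)], df["PE"]))  # in-sample R^2
```

The in-sample score is only a sanity check; a held-out split or cross-validation gives a fairer estimate.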
File: DM-Metric-missing-3.csv
Description: Multi-sensor environmental monitoring data with intentional missing values for data imputation research.
Features:
- Indoor Sensors: Temperature, CO2, Humidity, Lighting (Comedor & Habitacion)
- Weather Data: Precipitation, Wind, Solar radiation, Exterior temperature/humidity
- Temporal: Date, Time, Day of Week
- Environmental: Crepusculo, Piranometro, Entalpic measurements
Use Cases: Data imputation, environmental monitoring, time series analysis, missing data research
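Before comparing imputation strategies, it can help to measure how much is missing and to set up a simple baseline. The sketch below does both with pandas only. The helper names `missing_report` and `impute_numeric` are hypothetical, and linear interpolation is just one assumed baseline; it presumes the rows are in chronological order and roughly evenly spaced, which should be checked against the Date and Time columns.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def impute_numeric(df: pd.DataFrame, method: str = "linear") -> pd.DataFrame:
    """Interpolate numeric columns in row order on a copy of the frame.

    limit_direction="both" also fills gaps at the start and end of a
    column using the nearest valid value. Non-numeric columns are untouched.
    """
    out = df.copy()
    numeric = out.select_dtypes("number").columns
    out[numeric] = out[numeric].interpolate(method=method, limit_direction="both")
    return out


# Example usage (assumes the CSV file sits in the working directory):
# df = pd.read_csv("DM-Metric-missing-3.csv")
# print(missing_report(df).head(10))
# filled = impute_numeric(df)
```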
File: ENB2012_data-Y1.csv
Description: Building energy efficiency dataset from the ENB2012 study, focusing on heating and cooling load prediction.
Features:
- `X1` to `X8`: Building characteristics (compactness, surface area, wall area, roof area, height, orientation, glazing area, glazing distribution)
- `Y2`: Heating load (target variable)
- `VAL`: Validation indicator
Use Cases: Building energy efficiency, heating load prediction, architectural optimization
Files: football.train2-r.csv, football.test2-r.csv
Description: Comprehensive football match statistics and results data for sports analytics and prediction modeling.
Features:
- Match Info: Division, Date, Home/Away teams
- Scores: Full-time and half-time goals (FTHG, FTAG, HTHG, HTAG)
- Results: Full-time and half-time results (FTR, HTR)
- Statistics: Shots, shots on target, fouls, corners, cards
- Performance Metrics: Recent form indicators (res1, res5, res20)
- Target Variables: HomeWin, ScoreDraw
Use Cases: Sports betting, team performance analysis, match outcome prediction
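When modeling match outcomes, two quick checks are useful: whether the target agrees with the score columns, and what accuracy a trivial predictor already reaches. The sketch below is one way to do this. Both helper names are hypothetical, and it assumes that `HomeWin` is 1 exactly when `FTHG` exceeds `FTAG`; this README does not define the encoding, so verify it against the data.

```python
import pandas as pd


def home_win_from_goals(df: pd.DataFrame) -> pd.Series:
    """1 where the home side scored more full-time goals, else 0."""
    return (df["FTHG"] > df["FTAG"]).astype(int)


def majority_baseline(y: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class."""
    return float(y.value_counts(normalize=True).max())


# Example usage (assumes the CSV file sits in the working directory):
# df = pd.read_csv("football.train2-r.csv")
# derived = home_win_from_goals(df)
# print("Agreement with HomeWin:", (derived == df["HomeWin"]).mean())
# print("Majority-class accuracy:", majority_baseline(df["HomeWin"]))
```

Because the full-time score and result columns determine the outcome, they should be excluded from the predictors of any outcome model.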
- Python 3.7+
- pandas, numpy, matplotlib, seaborn (for data analysis)
- scikit-learn (for machine learning tasks)
```bash
# Clone the repository
git clone <repository-url>
cd datasets

# Install required packages
pip install pandas numpy matplotlib seaborn scikit-learn
```
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a dataset
df = pd.read_csv('ENB2012_data-Y1.csv')
print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"First few rows:\n{df.head()}")

# Basic statistics
print(df.describe())

# Check for missing values
print(df.isnull().sum())

# Visualize distributions
df.hist(figsize=(12, 8))
plt.tight_layout()
plt.show()
```
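To get an overview of every file at once, a small helper along these lines can tabulate row and column counts for comparison with the overview table. The function name `summarize_csvs` is hypothetical, and the sketch assumes each file is comma-delimited with a header row.

```python
from pathlib import Path

import pandas as pd


def summarize_csvs(directory: str = ".") -> pd.DataFrame:
    """Return one row per CSV file with its row and column counts."""
    records = []
    for path in sorted(Path(directory).glob("*.csv")):
        df = pd.read_csv(path)
        records.append(
            {"file": path.name, "rows": len(df), "columns": df.shape[1]}
        )
    return pd.DataFrame(records, columns=["file", "rows", "columns"])


# Example usage from the repository root:
# print(summarize_csvs("."))
```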
Time Series Analysis:
- ARM-Metric datasets for activity recognition
- DM-Metric for environmental monitoring trends
- Football data for performance over time

Regression:
- CPP_base_ampliado for energy efficiency prediction
- ENB2012_data for building heating load prediction

Classification:
- ARM-Metric for activity type classification
- Football data for match outcome prediction

Data Imputation:
- DM-Metric-missing-3 for testing missing value strategies

Feature Engineering:
- Interaction features in CPP_base_ampliado
- Temporal features in football and DM-Metric datasets
- Missing Values: DM-Metric-missing-3 contains intentional missing values for research purposes
- Data Types: Most datasets contain mixed numerical and categorical variables
- Scaling: Some features may require normalization (e.g., RSS values, energy measurements)
- Temporal Aspects: Football and DM-Metric datasets include time-based features
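For the scaling note above, one common approach is to standardize features with statistics taken from the training split only, then reuse those statistics on the test split. The sketch below shows this with pandas. The function name `standardize` is hypothetical, and z-scoring is an assumed choice; other scalers may suit a given model better.

```python
import pandas as pd


def standardize(train: pd.DataFrame, test: pd.DataFrame, columns: list):
    """Z-score the given columns using training-set statistics only.

    Returns scaled copies; the input frames are left unchanged.
    """
    mean = train[columns].mean()
    # Constant columns have zero spread; use 1.0 to avoid division by zero.
    std = train[columns].std(ddof=0).replace(0, 1.0)
    train_scaled, test_scaled = train.copy(), test.copy()
    train_scaled[columns] = (train[columns] - mean) / std
    test_scaled[columns] = (test[columns] - mean) / std
    return train_scaled, test_scaled
```

Computing the mean and standard deviation on the training rows alone avoids carrying information from the test set into the model.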
- ARM-Metric: Research on RSS-based activity recognition
- CPP_base_ampliado: Power plant efficiency optimization studies
- ENB2012: Building energy efficiency standards and research
- Football Data: Sports analytics and betting research
This repository contains datasets for research and educational purposes. Please check the license of each dataset and cite the original sources when using them in publications.