e2its/datasets

Datasets Repository

This repository contains seven CSV datasets for machine learning and data science work, spanning sports analytics, environmental monitoring, energy efficiency, and IoT sensor data.

📊 Dataset Overview

| Dataset | Size | Rows | Description | Domain |
|---|---|---|---|---|
| ARM-Metric-train-TS.csv | 6.3 MB | 41,889 | RSS signal strength data for activity recognition | IoT/Sensors |
| ARM-Metric-test-TS.csv | 54 KB | 351 | Test set for RSS signal strength data | IoT/Sensors |
| CPP_base_ampliado.csv | 3.6 MB | 57,409 | Power plant energy efficiency data | Energy |
| DM-Metric-missing-3.csv | 587 KB | 3,805 | Environmental sensor data with missing values | Environmental |
| ENB2012_data-Y1.csv | 38 KB | 770 | Energy efficiency building data | Building/Energy |
| football.train2-r.csv | 3.3 MB | 26,469 | Football match statistics and results | Sports |
| football.test2-r.csv | 212 KB | 2,037 | Test set for football match data | Sports |

🗂️ Detailed Dataset Descriptions

1. ARM-Metric Datasets (Activity Recognition)

Files: ARM-Metric-train-TS.csv, ARM-Metric-test-TS.csv

Description: Time series data containing RSS (Received Signal Strength) measurements from multiple sensors for activity recognition tasks.

Features:

  • avg_rss12, var_rss12: Average and variance of RSS between sensors 1-2
  • avg_rss13, var_rss13: Average and variance of RSS between sensors 1-3
  • avg_rss23, var_rss23: Average and variance of RSS between sensors 2-3
  • ATYPE: Activity type (e.g., "bending")
  • Multiple time windows with suffix _1, _2, _3, _4, _5

Use Cases: Activity recognition, IoT sensor analysis, time series classification
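
As a minimal sketch of preparing these files for classification, the helper below separates the ATYPE label from the numeric features and groups columns by their time-window suffix. The exact column spelling (for example `avg_rss12_1`) is an assumption based on the feature list above, and the demo frame is synthetic; check `df.columns` on the real file first.

```python
import re
import pandas as pd

def split_features_label(df, label_col="ATYPE"):
    """Return (X, y): every column except the label, and the label column."""
    X = df.drop(columns=[label_col])
    y = df[label_col]
    return X, y

def group_by_window(columns):
    """Group column names by a trailing _<digits> window suffix.

    Columns without such a suffix are collected under the key None.
    """
    groups = {}
    for name in columns:
        match = re.search(r"_(\d+)$", name)
        key = int(match.group(1)) if match else None
        groups.setdefault(key, []).append(name)
    return groups

# Synthetic stand-in for ARM-Metric-train-TS.csv (column names are assumed).
demo = pd.DataFrame({
    "avg_rss12_1": [39.2, 40.0],
    "var_rss12_1": [0.4, 0.7],
    "avg_rss12_2": [39.5, 41.0],
    "var_rss12_2": [0.5, 0.0],
    "ATYPE": ["bending", "bending"],
})
X, y = split_features_label(demo)
windows = group_by_window(X.columns)
```

On the real data, replace `demo` with `pd.read_csv('ARM-Metric-train-TS.csv')`; `windows` then shows which columns belong to each time window.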

2. CPP_base_ampliado (Power Plant Data)

File: CPP_base_ampliado.csv

Description: Comprehensive power plant dataset with environmental and operational parameters for energy efficiency prediction.

Features:

  • AT: Atmospheric Temperature
  • V: Exhaust Vacuum
  • AP: Atmospheric Pressure
  • RH: Relative Humidity
  • PE: Net Electrical Energy Output (target variable)
  • Interaction features: AT-V, AT-AP, AT-RH, AT-PE, AP-RH, AP-PE

Use Cases: Energy efficiency prediction, power plant optimization, regression analysis
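
One caution when modeling PE: the interaction columns whose names include PE (AT-PE, AP-PE) are presumably derived from the target, so using them as predictors would leak it. The sketch below assumes that reading of the column names, which this README does not confirm, and keeps only columns that do not mention the target. The demo values are made up.

```python
import pandas as pd

def non_leaking_features(df, target="PE", sep="-"):
    """Return columns that are neither the target nor built from it.

    A column counts as target-derived when the target name is one of its
    sep-separated parts (e.g. "AT-PE" -> ["AT", "PE"]). This naming rule is
    an assumption, not something the dataset documents.
    """
    return [c for c in df.columns if target not in c.split(sep)]

# One synthetic row using the column names listed above.
demo = pd.DataFrame(
    [[14.9, 41.8, 1024.1, 73.2, 463.3, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]],
    columns=["AT", "V", "AP", "RH", "PE",
             "AT-V", "AT-AP", "AT-RH", "AT-PE", "AP-RH", "AP-PE"],
)
features = non_leaking_features(demo)
X, y = demo[features], demo["PE"]
```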

3. DM-Metric-missing-3 (Environmental Sensors)

File: DM-Metric-missing-3.csv

Description: Multi-sensor environmental monitoring data with intentional missing values for data imputation research.

Features:

  • Indoor Sensors: Temperature, CO2, Humidity, Lighting (Comedor & Habitacion)
  • Weather Data: Precipitation, Wind, Solar radiation, Exterior temperature/humidity
  • Temporal: Date, Time, Day of Week
  • Environmental: Crepusculo, Piranometro, Entalpic measurements

Use Cases: Data imputation, environmental monitoring, time series analysis, missing data research
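
Before choosing an imputation method it helps to see where the gaps are. The following sketch reports missing counts and percentages per column; the column names in the demo frame are illustrative guesses, not the file's actual headers.

```python
import numpy as np
import pandas as pd

def missing_report(df):
    """Return per-column missing counts and percentages, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)

# Synthetic stand-in for DM-Metric-missing-3.csv; real headers may differ.
demo = pd.DataFrame({
    "Temperature_Comedor": [21.0, np.nan, 22.0, np.nan],
    "CO2_Comedor": [400.0, 410.0, np.nan, 430.0],
    "Humidity": [45.0, 46.0, 47.0, 48.0],
})
report = missing_report(demo)
```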

4. ENB2012_data-Y1 (Energy Efficiency Buildings)

File: ENB2012_data-Y1.csv

Description: Building energy efficiency dataset from the ENB2012 study, focusing on heating and cooling load prediction.

Features:

  • X1 to X8: Building characteristics (compactness, surface area, wall area, roof area, height, orientation, glazing area, glazing distribution)
  • Y1: Heating load (target variable)
  • VAL: Validation indicator

Use Cases: Building energy efficiency, heating load prediction, architectural optimization
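
If VAL marks held-out rows, it can drive the train/validation split directly. The sketch below is written under that assumption; the actual encoding of VAL is not documented here, so inspect `df['VAL'].value_counts()` first and adjust `validation_value`. The `target` column name and the demo values are placeholders.

```python
import pandas as pd

def split_by_indicator(df, indicator="VAL", validation_value=1):
    """Split rows into (train, validation) using an indicator column.

    Assumes rows where indicator == validation_value are held out; verify
    the real encoding of the column before relying on this.
    """
    is_val = df[indicator] == validation_value
    train = df[~is_val].drop(columns=[indicator])
    val = df[is_val].drop(columns=[indicator])
    return train, val

# Synthetic rows; "target" stands in for the heating-load column.
demo = pd.DataFrame({
    "X1": [0.98, 0.90, 0.86, 0.82],
    "X5": [7.0, 7.0, 3.5, 3.5],
    "target": [15.6, 20.8, 10.1, 12.3],
    "VAL": [0, 0, 1, 0],
})
train, val = split_by_indicator(demo)
```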

5. Football Datasets (Sports Analytics)

Files: football.train2-r.csv, football.test2-r.csv

Description: Comprehensive football match statistics and results data for sports analytics and prediction modeling.

Features:

  • Match Info: Division, Date, Home/Away teams
  • Scores: Full-time and half-time goals (FTHG, FTAG, HTHG, HTAG)
  • Results: Full-time and half-time results (FTR, HTR)
  • Statistics: Shots, shots on target, fouls, corners, cards
  • Performance Metrics: Recent form indicators (res1, res5, res20)
  • Target Variables: HomeWin, ScoreDraw

Use Cases: Sports betting, team performance analysis, match outcome prediction
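
The files already include the target columns, but recomputing them from the goal counts is a quick way to check how they were defined. The definitions used below are assumptions: HomeWin is 1 when the home side scores more, and ScoreDraw is 1 for a draw other than 0-0.

```python
import pandas as pd

def derive_targets(df):
    """Add HomeWin and ScoreDraw columns computed from full-time goals.

    Assumed definitions: HomeWin is 1 when FTHG > FTAG; ScoreDraw is 1 when
    the match is drawn and each side scored at least once.
    """
    out = df.copy()
    out["HomeWin"] = (out["FTHG"] > out["FTAG"]).astype(int)
    out["ScoreDraw"] = (
        (out["FTHG"] == out["FTAG"]) & (out["FTHG"] > 0)
    ).astype(int)
    return out

# Four synthetic results: 2-1, 1-1, 0-0, 0-3.
demo = pd.DataFrame({"FTHG": [2, 1, 0, 0], "FTAG": [1, 1, 0, 3]})
labelled = derive_targets(demo)
```

Comparing the recomputed columns with the ones shipped in `football.train2-r.csv` shows whether these definitions match the dataset's.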

🚀 Getting Started

Prerequisites

  • Python 3.7+
  • pandas, numpy, matplotlib, seaborn (for data analysis)
  • scikit-learn (for machine learning tasks)

Installation

```bash
# Clone the repository
git clone <repository-url>
cd datasets

# Install required packages
pip install pandas numpy matplotlib seaborn scikit-learn
```

Quick Start Examples

Load and Explore Data

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a dataset
df = pd.read_csv('ENB2012_data-Y1.csv')
print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"First few rows:\n{df.head()}")
```

Basic Data Analysis

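```python
# Continues from the previous example: uses df and plt defined there.

# Basic statistics
print(df.describe())

# Check for missing values
print(df.isnull().sum())

# Visualize distributions of the numeric columns
df.hist(figsize=(12, 8))
plt.tight_layout()
plt.show()
```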
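
To run the same first look across several files, a small helper can collect the shape of each CSV. This is a sketch; `summarize_csvs` is a hypothetical name, and the demo writes a throwaway file so that it runs without the repository checked out.

```python
import os
import tempfile
import pandas as pd

def summarize_csvs(paths):
    """Return {file name: (rows, columns)} for each CSV path that exists."""
    summary = {}
    for path in paths:
        if not os.path.exists(path):
            continue  # skip files that are not present
        summary[os.path.basename(path)] = pd.read_csv(path).shape
    return summary

# Demo on a temporary file; the second path does not exist and is skipped.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).to_csv(sample, index=False)
    shapes = summarize_csvs([sample, os.path.join(tmp, "missing.csv")])
```

In the repository root, pass the file names from the overview table and compare the returned row counts against it.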

📈 Common Use Cases

1. Time Series Analysis

  • ARM-Metric datasets for activity recognition
  • DM-Metric for environmental monitoring trends
  • Football data for performance over time

2. Regression Tasks

  • CPP_base_ampliado for energy efficiency prediction
  • ENB2012_data for building heating load prediction
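
A linear model gives a quick reference score for either regression dataset. The sketch below uses scikit-learn's `LinearRegression` with an 80/20 random split; the demo data is synthetic with a known linear relationship, not real plant readings, and `fit_linear_baseline` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def fit_linear_baseline(df, feature_cols, target_col, seed=0):
    """Fit ordinary least squares on 80% of rows; return (model, test MAE)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[target_col], test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae

# Synthetic, noise-free data: PE = 500 - 2.0 * AT - 0.3 * V.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "AT": rng.uniform(5, 35, 200),
    "V": rng.uniform(25, 80, 200),
})
demo["PE"] = 500 - 2.0 * demo["AT"] - 0.3 * demo["V"]
model, mae = fit_linear_baseline(demo, ["AT", "V"], "PE")
```

Because the demo data is exactly linear, the fitted coefficients recover the generating values; on the real files the MAE serves as a floor that more flexible models should beat.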

3. Classification Tasks

  • ARM-Metric for activity type classification
  • Football data for match outcome prediction
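
For either classification task, the accuracy of always predicting the most frequent training label is a useful reference point before training anything. A minimal sketch follows; the activity labels other than "bending" are invented for the demo.

```python
import pandas as pd

def majority_baseline_accuracy(y_train, y_test):
    """Return (majority label, accuracy of always predicting it on y_test)."""
    majority = y_train.value_counts().idxmax()
    accuracy = float((y_test == majority).mean())
    return majority, accuracy

# Synthetic labels standing in for the ATYPE column.
y_train = pd.Series(["bending", "walking", "walking", "sitting", "walking"])
y_test = pd.Series(["walking", "bending", "walking", "sitting"])
label, acc = majority_baseline_accuracy(y_train, y_test)
```

A trained classifier that does not clearly exceed `acc` has learned little beyond the class distribution.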

4. Data Imputation Research

  • DM-Metric-missing-3 for testing missing value strategies
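
One common way to compare missing-value strategies is to hide values that are actually known, impute them, and measure the error. The sketch below does this for a single numeric column; the helper name is hypothetical, the signal is synthetic, and the first and last known values are never hidden so that interpolation is not asked to extrapolate.

```python
import numpy as np
import pandas as pd

def masked_imputation_mae(series, impute, frac=0.2, seed=0):
    """Hide a fraction of known values, impute them, and return the MAE.

    impute is any function mapping a Series with NaNs to a filled Series.
    Needs at least three known values, since the endpoints stay visible.
    """
    rng = np.random.default_rng(seed)
    candidates = series.dropna().index.to_numpy()[1:-1]
    n_hidden = max(1, int(len(candidates) * frac))
    hidden = rng.choice(candidates, size=n_hidden, replace=False)
    masked = series.copy()
    masked.loc[hidden] = np.nan
    filled = impute(masked)
    return float((filled.loc[hidden] - series.loc[hidden]).abs().mean())

# Synthetic, evenly spaced signal standing in for a sensor column.
signal = pd.Series(np.linspace(20.0, 25.0, 50))
mae_interp = masked_imputation_mae(signal, lambda s: s.interpolate())
mae_mean = masked_imputation_mae(signal, lambda s: s.fillna(s.mean()))
```

With the same seed both strategies are scored on the same hidden positions, so the two errors are directly comparable.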

5. Feature Engineering

  • Interaction features in CPP_base_ampliado
  • Temporal features in football and DM-Metric datasets
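
As an illustration of temporal feature engineering, the sketch below derives day of week, month, and a weekend flag from a date column. The column name `Date` and the `DD/MM/YYYY` format are assumptions; check the real files and adjust `date_format` accordingly.

```python
import pandas as pd

def add_temporal_features(df, date_col="Date", date_format="%d/%m/%Y"):
    """Add day_of_week, month, and is_weekend derived from a date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col], format=date_format)
    out["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = dates.dt.month
    out["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)
    return out

# 13 Aug 2022 was a Saturday, 15 Aug 2022 a Monday.
demo = pd.DataFrame({"Date": ["13/08/2022", "15/08/2022"]})
featured = add_temporal_features(demo)
```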

🔍 Data Quality Notes

  • Missing Values: DM-Metric-missing-3 contains intentional missing values for research purposes
  • Data Types: Most datasets contain mixed numerical and categorical variables
  • Scaling: Some features may require normalization (e.g., RSS values, energy measurements)
  • Temporal Aspects: Football and DM-Metric datasets include time-based features
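
When normalizing, the scaling statistics should come from the training file only and then be applied unchanged to the test file. A minimal z-score sketch, using an assumed column name and synthetic values:

```python
import pandas as pd

def standardize_with_train_stats(train, test, columns):
    """Z-score the given columns using mean and std computed on train only."""
    mean = train[columns].mean()
    # Population std; constant columns get std 1.0 to avoid division by zero.
    std = train[columns].std(ddof=0).replace(0, 1.0)
    train_scaled, test_scaled = train.copy(), test.copy()
    train_scaled[columns] = (train[columns] - mean) / std
    test_scaled[columns] = (test[columns] - mean) / std
    return train_scaled, test_scaled

# Synthetic stand-ins for the ARM-Metric train and test files.
train = pd.DataFrame({"avg_rss12": [38.0, 40.0, 42.0], "ATYPE": ["bending"] * 3})
test = pd.DataFrame({"avg_rss12": [44.0], "ATYPE": ["bending"]})
train_s, test_s = standardize_with_train_stats(train, test, ["avg_rss12"])
```

Non-numeric columns such as ATYPE are left untouched because only the listed columns are scaled.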

📚 Additional Resources

  • ARM-Metric: Research on RSS-based activity recognition
  • CPP_base_ampliado: Power plant efficiency optimization studies
  • ENB2012: Building energy efficiency standards and research
  • Football Data: Sports analytics and betting research

📄 License

This repository contains datasets for research and educational purposes. Please check individual dataset licenses and cite original sources when using in publications.
