Hi,
I have time-series data that I split into train and test sets (the test set is kept unseen) by slicing off the end of the DataFrame. I ran your pipeline on data_train
and tried to forecast as many steps as the length of data_test,
without success, as shown below:
#-----------------------------------------------------------
# Libs
#-----------------------------------------------------------
# for plotting, run: pip install pandas matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline
#-----------------------------------------------------------
# LOAD THE DATASET
#-----------------------------------------------------------
df = pd.read_csv('https://raw.githubusercontent.com/amcs1729/Predicting-cloud-CPU-usage-on-Azure-data/master/azure.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
data = df.rename(columns={
    'min cpu': 'min_cpu',
    'max cpu': 'max_cpu',
    'avg cpu': 'avg_cpu',
})
# Data preparation
# ==============================================================================
sliced_df = data[['timestamp', 'avg_cpu']].copy()  # copy so the new column below is not added to a view
# Convert data from Hz to MHz
# ==============================================================================
sliced_df['avg_cpu_Mhz'] = sliced_df['avg_cpu'] / 1000000
sliced_df
# Configuration
# ==============================================================================
name_columns='avg_cpu_Mhz'
lags=288
steps=288
n_backtest=3
step_size = steps * n_backtest
data_train = sliced_df[:-step_size]
data_test = sliced_df[-step_size:] #unseen
# Pipeline
# ==============================================================================
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
context = torch.tensor(data_train['avg_cpu_Mhz'])
prediction_length = 64 #len(data_test) #12
forecast = pipeline.predict(
    context,
    prediction_length,
    num_samples=288,  # 20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
)  # forecast shape: [num_series, num_samples, prediction_length]
However, the result is as follows:
# visualize the forecast
forecast_index = range(len(data_train), len(data_train) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(data_train['avg_cpu_Mhz'], color="royalblue", label="historical train data")
plt.plot(data_test['avg_cpu_Mhz'] , color="navy", label="historical test data", linestyle='dashed')
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")
plt.title('Chronos forecast result')
plt.ylabel(' CPU usage [MHz]', fontsize=15)
plt.xlabel('Timestamp', fontsize=15)
plt.legend()
plt.grid()
plt.show()
- How can I configure the arguments of predict() so that it forecasts autoregressively over the whole unseen data_test?
- Why is prediction_length recommended to be <= 64?
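One possible way to cover a horizon longer than 64 steps, sketched here under assumptions rather than as the library's own method, is to forecast in chunks and feed each chunk's median back into the context. The helper `rolling_forecast` and the stand-in `naive_predict` are hypothetical names introduced for illustration; `predict_fn` is assumed to return an array of shape [num_samples, steps].

```python
import numpy as np


def rolling_forecast(predict_fn, context, horizon, chunk=64):
    """Forecast `horizon` steps in pieces of at most `chunk` steps.

    After each piece, the per-step median of the samples is appended
    to the context and used as history for the next piece.
    """
    context = np.asarray(context, dtype=float)
    pieces = []
    remaining = horizon
    while remaining > 0:
        step = min(chunk, remaining)
        samples = np.asarray(predict_fn(context, step))  # [num_samples, step]
        median = np.quantile(samples, 0.5, axis=0)       # point forecast for this piece
        pieces.append(median)
        context = np.concatenate([context, median])      # extend history with the forecast
        remaining -= step
    return np.concatenate(pieces)


# Hypothetical stand-in predictor: repeats the last observed value.
def naive_predict(context, steps, num_samples=20):
    return np.full((num_samples, steps), context[-1])


result = rolling_forecast(naive_predict, [1.0, 2.0, 3.0], horizon=150, chunk=64)
print(result.shape)
```

With the code above, `predict_fn` could plausibly wrap the existing call, for example `lambda ctx, n: pipeline.predict(torch.tensor(ctx), n)[0].numpy()`. Note that feeding back only the median discards the sample spread, so the 80% interval is not propagated across chunks, and errors can accumulate over long horizons.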