PJM Energy Consumption Forecast
This repository contains machine learning models trained to forecast energy consumption for the PJM Interconnection, one of the largest grid operators in the United States. The models can predict energy consumption for both 24-hour and 7-day horizons.
Models Overview
We've trained and compared multiple models for energy consumption forecasting:
Available Models
- SARIMA (Statistical approach)
- Random Forest (Ensemble method)
- XGBoost (Gradient boosting)
- LSTM (Deep learning)
Model Characteristics
- SARIMA: Statistical approach, captures temporal dependencies
- Random Forest: Ensemble method, good at capturing non-linear relationships
- XGBoost: Gradient boosting, often a strong performer on structured (tabular) data
- LSTM: Deep learning approach, specialized for sequential data
Performance Metrics
The models were evaluated using multiple metrics:
- MAE (Mean Absolute Error)
- RMSE (Root Mean Square Error)
- MAPE (Mean Absolute Percentage Error)
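For reference, the three metrics can be computed as in the minimal sketch below. It uses only the standard library; the function names and the sample load values are made up for illustration and are not taken from this repository's evaluation code.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when an actual value is zero.
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)

# Made-up load values (MW), for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

MAE and RMSE are in the units of the load itself, while MAPE is scale-free, which makes it easier to compare across horizons.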
Trade-offs
- SARIMA: Simple, interpretable, but less flexible
- ML Models (Random Forest, XGBoost): More complex, often more accurate, require more data
- LSTM: Best for capturing long-term dependencies, but most computationally intensive
Feature Sets
24-hour Prediction Features
The models use various features including:
- Weather data (temperature, wind speed, precipitation)
- Temporal features (hour, day, month, weekday)
- Lag features (24h, 48h, 72h, 96h, 120h, 144h)
- Rolling statistics
- Holiday indicators
- Seasonal components
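The lag and rolling features above could be built roughly as follows. This is a dependency-free sketch, not the repository's feature pipeline: the helper names, the feature names, and the 24-hour rolling window are assumptions. The rolling mean deliberately uses only values before the current hour, which is one way to keep the target out of its own features.

```python
def lag(values, k):
    """Shift a series back by k steps; the first k entries have no history."""
    if k <= 0:
        raise ValueError("k must be positive")
    values = list(values)
    n = len(values)
    return [None] * min(k, n) + values[: max(n - k, 0)]

def rolling_mean_past(values, window):
    """Mean of the `window` values strictly before each position.

    Returns None until enough history is available.
    """
    values = list(values)
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(values[i - window:i]) / window)
    return out

# Lag horizons listed above, in hours.
LAG_HOURS = (24, 48, 72, 96, 120, 144)

# Made-up hourly load series, for illustration only.
load = [float(h) for h in range(200)]

features = {f"lag_{k}h": lag(load, k) for k in LAG_HOURS}
features["rolling_mean_24h"] = rolling_mean_past(load, 24)
```

With pandas, `Series.shift(k)` produces the same lag, and `Series.shift(1).rolling(window).mean()` produces the same past-only rolling mean.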
Data Preparation
- Training set size: 28,780 samples
- Test set size: 7,195 samples
- Total features: 89 (24h prediction)
Usage Requirements
Dependencies
- pandas
- numpy
- scikit-learn
- xgboost
- tensorflow
- statsmodels
Input Features Required
Key feature categories:
Weather Features:
- Average wind speed
- Precipitation
- Temperature (avg, max, min)
- Weather data from multiple cities (Chicago, Washington, Pittsburgh, Columbus)
Temporal Features:
- Year, hour, day, month
- Day of week
- Cyclical encodings (hour_sin, hour_cos, etc.)
- Time of day indicators (morning, afternoon, evening, night)
Historical Load:
- Previous day consumption
- Weekly lags
- Rolling statistics
Calendar Features:
- Holidays
- Seasonal indicators
- Weekend flags
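The cyclical encodings listed under Temporal Features (hour_sin, hour_cos, etc.) map a periodic value onto the unit circle so that, for example, hour 23 and hour 0 end up close together. A minimal sketch, assuming a period of 24 for hours; the helper name `cyclical` is hypothetical:

```python
import math

def cyclical(value, period):
    """Encode a periodic value as a (sin, cos) pair on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

hour_sin, hour_cos = cyclical(23, 24)

midnight = cyclical(0, 24)
late_evening = cyclical(23, 24)
noon = cyclical(12, 24)

# Hour 23 is closer to hour 0 than hour 12 is, unlike with raw hour values.
print(distance(late_evening, midnight) < distance(noon, midnight))  # True
```

The same helper would apply to other periodic fields, for instance a period of 7 for day of week or 12 for month.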
Model Training
The models were trained using a chronological (time series) split, with the test period following the training period:
- 80% training data
- 20% test data
- Careful feature selection to avoid future data leakage
- Multiple evaluation metrics for comprehensive performance assessment
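A chronological 80/20 split can be sketched as below. The function name is hypothetical and the exact procedure used for these models is not documented here; the sketch only assumes that rows are already sorted by timestamp and that no shuffling takes place. The row count of 35,975 is the sum of the training and test set sizes given under Data Preparation.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows so every test row comes after every training row."""
    rows = list(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Stand-in for 35,975 time-ordered samples (28,780 + 7,195).
rows = range(35975)
train, test = chronological_split(rows)

print(len(train), len(test))  # 28780 7195
```

Because the cut is made on position rather than at random, lag and rolling features computed from past values cannot pull information from the test period into training.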
Limitations and Considerations
- Data Requirements: Models need extensive historical data and weather information
- Computational Resources: LSTM models require more computational power
- Feature Availability: Real-time predictions require access to current weather data
- Update Frequency: Models should be periodically retrained with new data