EMIT Model - Environmental Monitoring and Intelligence Tool

Title

EMIT Model - Environmental Monitoring and Intelligence Tool (CatBoost Classifier)


Overview

The EMIT Model (Environmental Monitoring and Intelligence Tool) is a CatBoost classifier that predicts potential mining areas from environmental data. It is part of the EMiTAL (Environmental Monitoring and Intelligence Tool Algorithm) framework, which combines Remote Sensing, RayCasting, and Polygon Gridding to identify viable mining zones at the level of individual grid cells.

Goal

To support decision-making in mining by providing a robust predictive model that identifies areas with high mining potential based on environmental characteristics. This model benefits regulatory bodies, mining companies, and environmental agencies aiming to balance resource extraction with sustainability.


Framework: EMiTAL

The EMiTAL framework integrates several innovative approaches to enhance prediction accuracy:

  • Remote Sensing: Captures large-scale environmental data (e.g., vegetation, soil, and air quality).
  • RayCasting and Polygon Gridding: Segments geographic regions into grids, enabling precise targeting.
  • Environmental Indicators:
    • NDVI (Normalized Difference Vegetation Index): Measures vegetation health.
    • NDWI (Normalized Difference Water Index): Evaluates water content.
    • NDTI (Normalized Difference Tillage Index): Assesses soil disturbance.
    • Land Elevation: Provides terrain insights.
    • Air Quality Metrics: NO2, PM10, and CO to gauge environmental impact.
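
The RayCasting and Polygon Gridding step can be illustrated with a minimal sketch. It assumes a simple (non-self-intersecting) polygon given as (longitude, latitude) pairs and a fixed cell size in degrees. The function names, the sample polygon, and the cell size are hypothetical and are not taken from the EMiTAL code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting test: cast a horizontal ray to the right of `point` and
    count the polygon edges it crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge can only cross the ray if its endpoints straddle y.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def grid_polygon(polygon: List[Point], cell_size: float) -> List[Point]:
    """Return the centres of the square cells (side `cell_size`) that tile the
    polygon's bounding box and whose centre lies inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    n_cols = int((max_x - min_x) / cell_size) + 1
    n_rows = int((max_y - min_y) / cell_size) + 1
    centres = []
    for row in range(n_rows):
        for col in range(n_cols):
            centre = (min_x + (col + 0.5) * cell_size,
                      min_y + (row + 0.5) * cell_size)
            if point_in_polygon(centre, polygon):
                centres.append(centre)
    return centres


# Hypothetical 1 x 1 degree square region split into 0.5 degree cells.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = grid_polygon(square, 0.5)
```

Each returned cell centre could then serve as one record whose environmental indicators are sampled from remote sensing data. Keeping a cell only when its centre lies inside the polygon is one simple rule; an area-overlap rule would treat boundary cells differently.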

Model Pipeline

The model pipeline is built to preprocess and optimize environmental data for classification. Using CatBoost’s native handling of categorical data, the pipeline minimizes preprocessing complexity while ensuring high performance.

  • Model Type: CatBoost Classifier
  • Objective: Binary classification to predict if a region is suitable for mining (True for viable, False for non-viable).
  • Cross-Validation Results:
    • Mean Accuracy: 78.32%
    • Standard Deviation: 4.25%
  • Final Accuracy on Test Data: 90.32% (28 of 31 test records classified correctly)

Dataset and Features

Input Features:

  • Latitude and Longitude: Geospatial coordinates.
  • NDVI, NDWI, NDTI: Environmental indices critical for mining predictions.
  • Land Elevation: Topographic information.
  • Vegetation Index: Encoded categories (Null, Sparse, Moderate, Healthy).
  • Air Quality Metrics: NO2, PM10, and CO levels.
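
NDVI, NDWI, and NDTI all share the same normalized-difference form, which the sketch below makes explicit. The band pairings are the commonly used definitions (NIR/Red for NDVI, Green/NIR for NDWI, SWIR1/SWIR2 for NDTI) and are an assumption here, since this document does not state which sensor or bands were used. The reflectance values are made up for illustration.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when both bands sum to zero."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir: float, red: float) -> float:
    """Vegetation index: high when near-infrared reflectance dominates red."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Water index: high when green reflectance dominates near-infrared."""
    return normalized_difference(green, nir)


def ndti(swir1: float, swir2: float) -> float:
    """Tillage index: contrast between the two shortwave-infrared bands."""
    return normalized_difference(swir1, swir2)


# Hypothetical surface reflectances for a single vegetated grid cell.
sample = {"green": 0.1, "red": 0.1, "nir": 0.5, "swir1": 0.3, "swir2": 0.2}
cell_ndvi = ndvi(sample["nir"], sample["red"])
cell_ndwi = ndwi(sample["green"], sample["nir"])
cell_ndti = ndti(sample["swir1"], sample["swir2"])
```

All three indices fall in the range -1 to 1, so they can be passed to the classifier as plain numerical features without rescaling.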

Initial Dataset:

  • Total Records: 152
  • Data Types: Numerical, categorical, and boolean.
  • Categorical Features: Vegetation Index, handled natively by CatBoost.

Model Performance

Key Metrics:

  • Accuracy: 90.32%

  • Precision, Recall, F1-Score:

    Class   Precision   Recall   F1-Score   Support
    False   0.86        0.75     0.80       8
    True    0.92        0.96     0.94       23
  • Overall Accuracy: 90%

  • Macro Average: Precision = 0.89, Recall = 0.85, F1-Score = 0.87

  • Weighted Average: Precision = 0.90, Recall = 0.90, F1-Score = 0.90

Confusion Matrix:

               Predicted False   Predicted True
Actual False   6                 2
Actual True    1                 22
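
The reported metrics follow directly from the confusion matrix above. The short sketch below recomputes them from the four counts; the helper name `precision_recall_f1` is illustrative only.

```python
# Counts from the confusion matrix (rows: actual class, columns: predicted).
tn, fp = 6, 2    # actual False: predicted False, predicted True
fn, tp = 1, 22   # actual True:  predicted False, predicted True


def precision_recall_f1(correct: int, predicted_total: int, actual_total: int):
    """Per-class precision, recall, and F1 from raw counts."""
    precision = correct / predicted_total
    recall = correct / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


true_p, true_r, true_f1 = precision_recall_f1(tp, tp + fp, tp + fn)
false_p, false_r, false_f1 = precision_recall_f1(tn, tn + fn, tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Accuracy: {accuracy:.4f}")                             # Accuracy: 0.9032
print(f"True:  {true_p:.2f} {true_r:.2f} {true_f1:.2f}")       # True:  0.92 0.96 0.94
print(f"False: {false_p:.2f} {false_r:.2f} {false_f1:.2f}")    # False: 0.86 0.75 0.80
```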

Feature Importance

The table below lists every input feature by its share of total importance. Longitude, NO2, and Latitude together account for about 86%:

Feature                      Importance (%)
Longitude                    40.50
NO2                          25.81
Latitude                     19.43
NDWI                         4.85
NDVI                         4.60
NDTI                         4.41
Vegetation Index (Encoded)   0.30
Land Elevation               0.10
PM10                         0.00
CO                           0.00

Usage Instructions

To use this model:

  1. Prepare your dataset with the specified input features.
  2. Ensure feature names match the training dataset.
  3. Run predictions using the following script:
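    import joblib
    import pandas as pd

    # Load the trained model (the catboost package must be installed so that
    # joblib can restore the saved CatBoost object)
    model = joblib.load("emit_model_catboost.joblib")

    # Load your data; column names must match those used in training
    data = pd.read_csv("path/to/your/data.csv")

    # Predict the mining-suitability class for each row
    predictions = model.predict(data)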
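
Step 2 (matching feature names) can be checked before calling the model. The sketch below is one way to do it with plain Python. The column names in `EXPECTED_COLUMNS` are hypothetical, because the exact names used during training are not listed in this document; replace them with the real ones.

```python
from typing import Iterable, List

# Hypothetical column names (assumption): substitute the names from the
# actual training dataset.
EXPECTED_COLUMNS = [
    "Latitude", "Longitude", "NDVI", "NDWI", "NDTI",
    "Land Elevation", "Vegetation Index", "NO2", "PM10", "CO",
]


def missing_columns(columns: Iterable[str],
                    expected: Iterable[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return the expected columns that are absent, in expected order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Example header that lacks the CO column and carries an extra "id" column.
header = ["id", "Latitude", "Longitude", "NDVI", "NDWI", "NDTI",
          "Land Elevation", "Vegetation Index", "NO2", "PM10"]
missing = missing_columns(header)
print(missing)  # ['CO']
```

With pandas, the same check would be `missing_columns(data.columns)` before `model.predict`. Selecting `data[EXPECTED_COLUMNS]` afterwards also drops extra columns and fixes the column order.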

Authors

  • Joseph Ackon
  • Felix Kudjo Mlagada
  • Aristotle Mbroh
  • Prince Mawuko Dzorkpe
  • Manford Ehuntem

Acknowledgments: Thanks to Takoradi Technical University, Data Hackathon Ghana Statistical Service (2024), and StatsBank for their support.


This version of the EMIT model is optimized with CatBoost for better performance on mixed-type datasets.
