New Jersey Areas Air Quality Prediction Model Description

Training Data

The air quality forecasting model is trained on 'airquality.csv', which is part of a larger collection of atmospheric data. The file contains features relevant to air quality prediction. Before training, the dataset is split into training and test sets: the model learns patterns from the training set, and the test set is used to evaluate its accuracy. The features are normalized to a standard scale so that no single feature dominates because of its units or range. The primary sources of this data are:

  • Environmental Protection Agency (EPA) Air Monitoring Site
  • NOAA's National Centers for Environmental Information (NCEI)
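
As a rough illustration of the split and normalization steps described above, the sketch below uses only the Python standard library and synthetic rows in place of 'airquality.csv'. The 80/20 split ratio, the z-score form of normalization, the choice to compute scaling statistics from the training rows only, and all function names are assumptions made for the example, not details of the model's actual pipeline.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows):
    """Per-column mean and population standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns so we never divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Rescale every value to (value - mean) / std for its column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Synthetic stand-in for the real data: 50 rows of 8 features each.
rng = random.Random(42)
rows = [[rng.uniform(0, 100) for _ in range(8)] for _ in range(50)]

train, test = train_test_split(rows)
means, stds = fit_scaler(train)  # statistics come from the training set only
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)
```

Reusing the training-set statistics on the test set keeps information about the test rows out of the scaler.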

Model Structure

The air quality forecasting tool is built on a neural network defined with TensorFlow's Keras library. The network has three layers:

  • The first layer has 12 neurons and takes in 8 input features related to atmospheric conditions, using a ReLU activation function.
  • The subsequent layer has 8 neurons, again activated by ReLU.
  • The final layer has a single neuron with a sigmoid activation function to predict the air quality index; the sigmoid bounds this output between 0 and 1.

Once the structure is defined, the model is compiled with a mean squared error loss function and the Adam optimizer. With only three small dense layers, the model is lightweight to train and run.
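
The three layers and the compile step described above might look like the following in Keras. This is a minimal sketch, not the project's actual code: the names LAYER_SPECS, N_FEATURES, count_parameters, and build_model are hypothetical, and the accuracy metric is assumed only because accuracy is the figure reported for this model. TensorFlow is imported inside build_model so that the parameter count can be checked without it installed.

```python
# (units, activation) for each dense layer, as listed in the description.
LAYER_SPECS = [(12, "relu"), (8, "relu"), (1, "sigmoid")]
N_FEATURES = 8


def count_parameters(n_inputs, layer_specs):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units, _activation in layer_specs:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total


def build_model():
    """Build and compile the network (requires TensorFlow)."""
    from tensorflow import keras  # deferred import, see the note above

    model = keras.Sequential(
        [keras.Input(shape=(N_FEATURES,))]
        + [keras.layers.Dense(units, activation=act) for units, act in LAYER_SPECS]
    )
    # Assumption: accuracy is tracked as a metric alongside the MSE loss.
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["accuracy"])
    return model


n_params = count_parameters(N_FEATURES, LAYER_SPECS)  # 108 + 104 + 9 = 221
```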

Resulting Statistics

To check the model's reliability, we validate it on the reserved test data. During training, the model's accuracy is monitored against this test data, which shows how well it performs on data it has not seen before. To understand the relationships between atmospheric conditions, we generate a heatmap of the correlation between each pair of features. We also use a RandomForest classifier to estimate the importance of each feature in the forecast. This helps refine the model and shows users which factors contribute most to air quality variations.
As a result, the air quality forecasting model achieves an accuracy of more than 89%, which supports using its forecasts to assess air quality levels.
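
A correlation heatmap like the one mentioned above is a picture of a matrix of pairwise correlation coefficients. The sketch below computes such a matrix in plain Python, assuming the Pearson coefficient is the one used. The feature names and values are hypothetical and are not taken from 'airquality.csv'.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # A constant column has no defined correlation, so return NaN.
    return cov / denom if denom else float("nan")


def correlation_matrix(columns):
    """Nested dict mapping each pair of feature names to their correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical features, five observations each.
columns = {
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "humidity": [80.0, 70.0, 65.0, 50.0, 40.0],
    "wind_speed": [3.0, 5.0, 2.0, 6.0, 4.0],
}
corr = correlation_matrix(columns)
```

Each cell of the heatmap corresponds to one entry of corr. Values near 1 or -1 indicate strongly related features, and values near 0 indicate little linear relationship.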