In an earlier post, we learned how to fetch data from a source (OpenMeteo) and make some decisions based on how hot or cool the source and destination are. Last week, I was assigned another extension of the problem:
Train a simple model that forecasts the weather conditions in a given future. To simplify things, restrict predictions to the Dhaka district. After training the model, the model should be queryable via a simple API. For example, your API should be able to predict the temperature at any future date (beyond the 7 days provided by OpenMeteo).
To predict further ahead than OpenMeteo does (its forecast API covers up to 16 days), we need to train a machine learning model on lots of historical data. Instead of the default forecast endpoint, we need the URL for the historical dataset: https://archive-api.open-meteo.com/v1/archive
One thing: use Jupyter Notebook instead of Colab from the get-go. This is a heavy dataset (starting from 1940), and if you train your model with this dataset, the free-tier GPU cannot keep up with it and crashes.
Here is a flow diagram of the whole process of building a prediction system using Python/Django and an ML model:
1. Fetch data from the source
First, create a virtual environment. One crucial point: your trained model and your backend should run on the same Python version. Now, let's install some dependencies in the environment:
pip install openmeteo_requests
pip install pandas
pip install requests_cache
pip install retry_requests
pip install notebook
pip install matplotlib
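Because the trained model and the backend should run on the same Python version, one option is to record the interpreter version in the notebook and compare it when the backend starts. The sketch below is only an illustration of that idea; python_version_tag, write_version_file, check_version_file, and the file name are hypothetical and not part of the project.

```python
import json
import sys


def python_version_tag() -> str:
    """Return the running interpreter's major.minor version as a string."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def write_version_file(path: str) -> None:
    """Store the interpreter version used for training (hypothetical helper)."""
    with open(path, "w") as fout:
        json.dump({"python": python_version_tag()}, fout)


def check_version_file(path: str) -> bool:
    """Return True if the current interpreter matches the recorded version."""
    with open(path) as fin:
        recorded = json.load(fin)["python"]
    return recorded == python_version_tag()


write_version_file("python_version.json")            # run in the notebook after training
matches = check_version_file("python_version.json")  # run when the backend starts
```

If matches is False on the backend, that is a hint to rebuild the virtual environment with the Python version used for training.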
Launch Jupyter Notebook and create a notebook. The OpenMeteo website provides sample code for fetching and storing the dataset:
import openmeteo_requests
import requests_cache
import pandas as pd
from retry_requests import retry
# Setup the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after = 3600)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)
# Make sure all required weather variables are listed here
# The order of variables in hourly or daily is important to assign them correctly below
url = "https://archive-api.open-meteo.com/v1/archive"  # historical archive endpoint, not the forecast one
params = {
    "latitude": 23.8103,  # Latitude for Dhaka
    "longitude": 90.4125,  # Longitude for Dhaka
    "start_date": "1940-01-01",  # This is the farthest that you can go with the start date
    "end_date": "2023-11-10",  # This is the end of your range, adjust as you wish
    "hourly": "temperature_2m"
}
responses = openmeteo.weather_api(url, params=params)
# Process first location. Add a for-loop for multiple locations or weather models
response = responses[0]
print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")
print(f"Elevation {response.Elevation()} m asl")
print(f"Timezone {response.Timezone()} {response.TimezoneAbbreviation()}")
print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")
# Process hourly data. The order of variables needs to be the same as requested.
hourly = response.Hourly()
hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
hourly_data = {"date": pd.date_range(
    start = pd.to_datetime(hourly.Time(), unit = "s"),
    end = pd.to_datetime(hourly.TimeEnd(), unit = "s"),
    freq = pd.Timedelta(seconds = hourly.Interval()),
    inclusive = "left"
)}
hourly_data["temperature_2m"] = hourly_temperature_2m
hourly_dataframe = pd.DataFrame(data = hourly_data)
print(hourly_dataframe)
If you want to save the data frame as a CSV file, call hourly_dataframe.to_csv('file_name.csv') at the end. The rest of this post refers to this data frame as df (df = hourly_dataframe).
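Here is a small sketch of a CSV round trip, assuming a tiny made-up data frame in place of the real one and a hypothetical file name. It shows two details that are easy to miss: index=False keeps the row numbers out of the file, and parse_dates restores the datetime type when reading the file back.

```python
import pandas as pd

# Stand-in for the hourly data frame built above (values are made up)
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=4, freq=pd.Timedelta(hours=1)),
    "temperature_2m": [18.2, 17.9, 17.5, 17.1],
})

# index=False keeps the row numbers out of the file
df.to_csv("dhaka_hourly.csv", index=False)

# parse_dates turns the text in the 'date' column back into datetimes
restored = pd.read_csv("dhaka_hourly.csv", parse_dates=["date"])
print(restored.dtypes)
```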
2. EDA
EDA or Exploratory Data Analysis is performed over the collected data to get an initial observation. Run the following pandas commands in each cell to get an overview of the dataset:
df.describe() # provides statistical insight into the dataset
df.dtypes # provides the datatype of each column
df.shape # provides the dimensions of the dataset
Check for null values in the dataset:
null_counts = df.isnull().sum()
# Print or use the result as needed
print(null_counts)
This code returns 0 for each column because there are no null values. You are lucky: otherwise, you would have to spend a lot of time structuring and preprocessing the dataset.
3. Preprocess the data (If needed)
Let's say you do need to preprocess and clean up the dataset. You can compute the interquartile range (IQR) of the dataset and replace the outliers with the median value. Look into imputation methods such as the KNN Imputer if your dataset has missing values.
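The IQR idea can be sketched as follows. This is a minimal example on made-up readings, using the common 1.5 × IQR rule as an assumption; replace_outliers_with_median is a hypothetical helper name.

```python
import pandas as pd


def replace_outliers_with_median(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the series median."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (series < lower) | (series > upper)
    # mask() swaps in the median wherever is_outlier is True
    return series.mask(is_outlier, series.median())


temps = pd.Series([24.0, 25.0, 26.0, 25.5, 24.5, 80.0])  # 80.0 is a made-up bad reading
cleaned = replace_outliers_with_median(temps)
print(cleaned.tolist())
```

Here the 80.0 reading falls outside the upper fence and is replaced by the median of the series, while the other values stay untouched.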
4. Finding relationships
Which model you need depends on the kind of answer your question requires. The answer can be discrete (yes/no) or continuous (what is the price of a house?). We turn to *Classification* models for discrete answers, whereas for continuous ones, we apply *Regression* models. Check this amazing 6-part blog series to understand the basics - Data Science for Cats
Let's plot the dataset:
import matplotlib.pyplot as plt
df.set_index('date', inplace=True)
# Plot the time series data
plt.figure(figsize=(15, 6))
plt.plot(df['temperature_2m'])
plt.title('Temperature Over Time')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.show()
# Plot a histogram of temperature values
plt.figure(figsize=(10, 6))
plt.hist(df['temperature_2m'], bins=30, edgecolor='black')
plt.title('Temperature Distribution')
plt.xlabel('Temperature (°C)')
plt.ylabel('Frequency')
plt.show()
The output looks like a rhythmic flow, doesn't it?
This is a special type of regression problem called a time series, where each data point is tied to a timestamp. The following is a better view, where I sampled data from 2013 to the current year:
This graph shows the seasonality and upward trend with some residuals (Refer to 4 and 5 of the cat blog). For making the prediction, you need to do some preprocessing before fitting it to a model.
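A view like the 2013-onwards one can be produced with a date slice on the datetime index. The sketch below uses a short synthetic series instead of the real data, and the daily averaging step is an extra assumption of mine (it shrinks hourly data to one value per day), not something the original workflow requires.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the Dhaka data (values are made up)
index = pd.date_range("2012-12-30", "2013-01-02 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"temperature_2m": np.arange(len(index), dtype=float)}, index=index)

# Keep only rows from 2013 onwards; this works because the index is a DatetimeIndex
recent = df.loc["2013-01-01":]

# Average the 24 hourly readings of each day into one daily value
daily = recent["temperature_2m"].resample("D").mean()
print(daily)
```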
5. Applying Model to the Time-series - Prophet
For time series analysis, we usually use ARIMA and Facebook Prophet models. ARIMA is not beginner-friendly and requires a thorough theoretical understanding. FB Prophet takes care of the preprocessing (decomposing a time series and extracting seasonality, trend, and residuals) we discussed in the previous section. First, download the package:
pip install prophet
Prophet requires just two columns, and they have to be named ds and y. ds is the date-time column, and y is the value you want to predict. Rename the columns of your data frame accordingly, and it will be ready to fit. To forecast, create a data frame that extends over a future period and pass it to the predict function:
import matplotlib.pyplot as plt
from prophet import Prophet
print("Original DataFrame columns:", df.columns)
# The plotting step moved 'date' into the index; turn it back into a column first
df = df.reset_index()
# Rename columns to match the Prophet's requirements
df = df.rename({'date': 'ds', 'temperature_2m': 'y'}, axis=1)
# Check the column names after renaming
print("DataFrame columns after renaming:", df.columns)
# Model fit
m = Prophet()
m.fit(df)
# Predict
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
# Plot results
fig1 = m.plot(forecast)
fig2 = m.plot_components(forecast)
plt.show()
forecast
Watch something while the model trains. It took me 50 minutes to finally get the output of the model.
The forecast is returned as a data frame with various columns. Your predicted value is yhat.
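To show how a single prediction could be read out of that data frame, here is a sketch with a hand-made stand-in for the forecast. The column names ds, yhat, yhat_lower, and yhat_upper are the ones Prophet produces; the numbers are invented, and prediction_for is a hypothetical helper.

```python
import pandas as pd

# Stand-in for Prophet's forecast output; the numbers are invented for illustration
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "yhat": [18.4, 18.9, 19.1],
    "yhat_lower": [15.0, 15.6, 15.8],
    "yhat_upper": [21.7, 22.3, 22.5],
})


def prediction_for(forecast: pd.DataFrame, day: str) -> dict:
    """Return the point forecast and its uncertainty bounds for one date."""
    row = forecast.loc[forecast["ds"] == pd.Timestamp(day)]
    if row.empty:
        raise KeyError(f"{day} is not in the forecast")
    first = row.iloc[0]
    return {
        "yhat": float(first["yhat"]),
        "lower": float(first["yhat_lower"]),
        "upper": float(first["yhat_upper"]),
    }


result = prediction_for(forecast, "2024-01-02")
print(result)
```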
You need the model file for your API. Run the following code in a cell to export the model as a JSON file (Prophet's serialization utilities work with JSON) and read it back to confirm that it loads:
from prophet.serialize import model_to_json, model_from_json
with open('model.json', 'w') as fout:
    fout.write(model_to_json(m))
with open('model.json', 'r') as fin:
    m = model_from_json(fin.read())
To learn more about the Prophet, refer to this link - medium
6. Load the Model in Django
The AI part is done. We will now add logic for building a queryable API. First, add the notebook and the model inside your Django app. Then, load the model in the views.py:
from prophet.serialize import model_from_json
from rest_framework.views import APIView

def load_model():
    with open('./apilist/utils/model.json', 'r') as fin:
        return model_from_json(fin.read())

class WeatherPredictionAPIView(APIView):
    def post(self, request):
        # we will implement it in the following section
        pass
7. Create Views
Let's finish the view. Essentially, your input is parsed as a ds value in 'YYYY-mm-dd' format, and the model.predict() function returns the yhat for it. This is your future prediction.
from datetime import datetime

import pandas as pd
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

class WeatherPredictionAPIView(APIView):
    # A plain (synchronous) handler: DRF's APIView does not await async methods
    def post(self, request):
        try:
            # Load serialized model
            m = load_model()
            # Extract input date from the request data
            input_date_str = request.data.get('date')
            input_date = datetime.strptime(input_date_str, '%Y-%m-%d')
            # Create a dataframe with the input date
            future_df = pd.DataFrame({'ds': [input_date]})
            # Make predictions
            forecast = m.predict(future_df)
            # Extract the predicted temperature for the input date as a plain float
            predicted_temperature = float(forecast.loc[0, 'yhat'])
            # Return the predicted temperature as a JSON response with status code 200 (OK)
            return Response({'predicted_temperature': predicted_temperature}, status=status.HTTP_200_OK)
        except FileNotFoundError:
            # Return an error response if the model file is not found with status code 500 (Internal Server Error)
            return Response({'error': 'Model file not found'}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
        except Exception as e:
            # Return an error response for other exceptions with status code 500 (Internal Server Error)
            return Response({'error': str(e)}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
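The date handling in the view can also be pulled into a small helper, which makes it easier to test and to reject bad input early. The following is a sketch under my own assumptions: build_future_frame is a hypothetical name, and raising ValueError for a missing date is a design choice, not something the original view does.

```python
from datetime import datetime

import pandas as pd


def build_future_frame(date_str):
    """Parse a 'YYYY-mm-dd' string into the one-row frame that predict() expects.

    Raises ValueError for a missing or malformed date.
    """
    if not date_str:
        raise ValueError("The 'date' field is required")
    parsed = datetime.strptime(date_str, "%Y-%m-%d")  # raises ValueError on a bad format
    return pd.DataFrame({"ds": [parsed]})


frame = build_future_frame("2025-06-15")
print(frame)
```

With a helper like this, the view could catch ValueError separately and answer with a 400 status instead of a 500, since a malformed date is the client's mistake rather than a server failure.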
8. URL for the API
You are done! Just add the URL and send a POST request from Postman with a date field in 'YYYY-mm-dd' format in the form body!
from django.urls import path
from . import views
urlpatterns = [
    path('predict-weather/', views.WeatherPredictionAPIView.as_view(), name='predict-weather'),
]
9. Adding Celery and Redis to Pre-load the Model and Caching
Until now, you have applied your knowledge of time series in a practical use case. If you hit the API, you will see that the response takes a few seconds. This bothered me, so I optimized the code further. I introduced Celery for loading the model in the background and Redis for caching the loaded model. You need to keep the Redis server running for this task.
Create a tasks.py file inside the app and add the following code with the decorator:
from django.core.cache import cache
from prophet.serialize import model_from_json
from celery import shared_task

# ignore_result=True: the model lives in the cache, so the task result is not stored
@shared_task(ignore_result=True)
def load_model():
    # Load serialized model; a missing file raises FileNotFoundError, which the API view handles
    with open('./apilist/utils/model.json', 'r') as fin:
        m = model_from_json(fin.read())
    # Cache the model for future requests
    cache.set('serialized_model', m, timeout=None)  # Set timeout=None for indefinite caching
    # Return the model so that a direct call such as m = load_model() gets it back
    return m
10. Loading from the Redis Cache
Modify your API view code for getting the model from the Redis cache:
from datetime import datetime

import pandas as pd
from django.core.cache import cache
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .tasks import load_model

class WeatherPredictionAPIView(APIView):
    def post(self, request):
        try:
            m = cache.get('serialized_model')
            if m is None:
                # Cache miss: calling the task directly runs it in this process and fills the cache
                # (load_model.delay() would hand it to a Celery worker instead)
                load_model()
                m = cache.get('serialized_model')
            input_date_str = request.data.get('date')
            input_date = datetime.strptime(input_date_str, '%Y-%m-%d')
            future_df = pd.DataFrame({'ds': [input_date]})
            forecast = m.predict(future_df)
            predicted_temperature = float(forecast.loc[0, 'yhat'])
            return Response({'predicted_temperature': predicted_temperature}, status=status.HTTP_200_OK)
        except FileNotFoundError:
            return Response({'error': 'Model file not found'}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
        except Exception as e:
            return Response({'error': str(e)}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
At first, I tried the @sync_to_async decorator for loading the Prophet model asynchronously. It turned out that Prophet does not support asynchronous operation. That's why I implemented this optimization using Celery and Redis. Now, the API response time is a few milliseconds.
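The caching idea in this section is essentially a get-or-load pattern: pay the loading cost once, then serve every later request from the cache. Below is a minimal sketch of that pattern, with a plain dictionary standing in for Redis and a fake loader standing in for model_from_json. ModelCache, slow_load_model, and get_model are hypothetical names, not part of Django, Celery, or the repository.

```python
import time


class ModelCache:
    """Tiny in-memory stand-in for the Redis-backed Django cache (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def slow_load_model():
    """Placeholder for reading and deserializing model.json."""
    time.sleep(0.05)  # pretend this is expensive
    return {"name": "prophet-model"}


def get_model(cache, loader, key="serialized_model"):
    """Return the cached model, calling the loader only on a cache miss."""
    model = cache.get(key)
    if model is None:
        model = loader()
        cache.set(key, model)
    return model


cache = ModelCache()
first = get_model(cache, slow_load_model)   # miss: pays the loading cost
second = get_model(cache, slow_load_model)  # hit: returned straight from the cache
```

Only the first call pays the loading cost, which is the same reason the API becomes fast once the model is already in the cache.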
I pushed the code to the master branch; you can check it out - https://github.com/Afroza2/Strativ-AB-Travel-Management/tree/master. Run each of these commands in a terminal while keeping your Redis server running in the background. The basic setup with Celery and Redis in Django is discussed in the previous post:
python manage.py makemigrations
python manage.py migrate
python manage.py migrate django_celery_beat
I enjoyed doing this project. Let me know what you think. I hope to write about many more exciting projects ahead.
I wish you eternal sunshine!