praneeth300/Flight-Price-Deployement-in-Streamlit

Machine-Learning-for-Flight-Ticket-Pricing-DST-1

Project Report

Problem Statement

Can we use Machine Learning to help a customer decide the optimal time to purchase a flight ticket?

ABSTRACT

Airlines employ complex, closely guarded algorithms to vary flight ticket prices over time based on several factors, including seat availability, airline capacity, the price of oil, and seasonality. At any point in time, a customer looking to purchase a flight ticket can either buy or wait in the hope that the price will fall. Because customers lack knowledge of these algorithms, they often default to purchasing a ticket as early as possible rather than trying to optimize their time of purchase. However, vast quantities of flight ticket price data are available on the Internet. Through this project, we hoped to use this data to help customers make their decisions. We created an airline ticket-buying agent that tries to time the purchase of a customer's flight ticket to minimize the price paid. We scraped Indian flight data from the MakeMyTrip website.

Prerequisites

You need to have the following software and libraries installed on your machine before running this project.

Python 3 Anaconda: installs IPython Notebook and most of the required libraries, such as sklearn, pandas, seaborn, matplotlib, numpy, scipy, and streamlit.

For more details, refer to the repo path: Web App Model/Flask/requirements.txt

Sample Data

Below is a small sample of our dataset:

Data Overview

Data Source --> Dataset/

Data points --> 330939 rows

Dataset date range --> April 2021 to May 2021

Dataset Attributes:

Price - flight price

departure_time - flight schedule time

arrival_time - arrival time of flight

Airline Cabin - There are three types:

E - Economy

PE - Premium Economy

B - Business

Dept_city - Departure city

Dept_date - Departure Date

arrival_city - Arrival city

stops - Number of stops

duration - Flight duration in minutes

weekday dept_hours

Dept_flights_time

optimal_hours
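
Because separate models are trained per cabin class, the rows have to be split on the cabin code first. The following is a minimal sketch of that split, assuming each row is a plain dict. The key name `cabin`, the helper `split_by_cabin`, and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Cabin codes as listed in the attribute description above.
CABIN_NAMES = {"E": "Economy", "PE": "Premium Economy", "B": "Business"}


def split_by_cabin(rows, cabin_key="cabin"):
    """Group row dicts by cabin class name; unknown codes raise ValueError."""
    groups = defaultdict(list)
    for row in rows:
        code = row[cabin_key]
        if code not in CABIN_NAMES:
            raise ValueError(f"unknown cabin code: {code!r}")
        groups[CABIN_NAMES[code]].append(row)
    return dict(groups)


# Illustrative rows only; the numbers are made up for the example.
rows = [
    {"cabin": "E", "Price": 4200, "stops": 0, "duration": 130},
    {"cabin": "B", "Price": 18500, "stops": 1, "duration": 245},
    {"cabin": "E", "Price": 3900, "stops": 1, "duration": 210},
    {"cabin": "PE", "Price": 7600, "stops": 0, "duration": 135},
]

by_cabin = split_by_cabin(rows)
for name, group in by_cabin.items():
    print(name, len(group))
```

Each group can then be passed to its own training run, which keeps the price scales of the three classes from mixing.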

BLUEPRINT

The blueprint file structure follows this pattern: Data --> Data Processing --> EDA --> Training Model --> Test Model & Evaluation --> Model Prediction --> Model Deployment

Machine Learning Framework:

Assume a customer decides to purchase a ticket for a particular flight at time = X hours before departure. The optimal time to purchase the ticket, t_opt, is:

  • in the range [X hours before dep., 4 hours before dep.]
  • time at which we achieve minimum flight price until departure
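
The definition above can be sketched as a small function over observed prices. This is only an illustration of how the target could be labelled, not the project's actual labelling code. The function name, the dict layout (hours before departure mapped to price), the tie-breaking rule, and the sample quotes are all assumptions.

```python
def optimal_purchase_time(prices_by_hours_before, x_hours, cutoff_hours=4):
    """Return (hours_before_departure, price) of the cheapest quote
    inside the window [cutoff_hours, x_hours] before departure."""
    window = {
        hours: price
        for hours, price in prices_by_hours_before.items()
        if cutoff_hours <= hours <= x_hours
    }
    if not window:
        raise ValueError("no price observations inside the purchase window")
    # Assumption: on equal prices, prefer the earliest opportunity
    # (the largest hours-before-departure value).
    best_hours = min(window, key=lambda hours: (window[hours], -hours))
    return best_hours, window[best_hours]


# Hypothetical price history: hours before departure -> quoted price.
quotes = {72: 5200, 48: 4900, 24: 4700, 12: 5100, 4: 5600, 2: 4000}

result = optimal_purchase_time(quotes, x_hours=48)
print(result)  # (24, 4700)
```

The quote at 2 hours is cheaper but falls outside the window, and the quote at 72 hours is before the customer started looking, so neither is considered.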

We used the LightGBM algorithm to predict first the optimal time and then the price for each of the cabin classes. The architecture is shown below:

Predict optimal time architecture for Economy Class

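    # Imports needed by this snippet
    import numpy as np
    from lightgbm import LGBMRegressor
    from sklearn.metrics import mean_squared_error, r2_score

    # x_train, y_train, x_test and y_test are assumed to be defined earlier
    ec = LGBMRegressor(n_estimators=1200)
    # ec = RandomForestRegressor()
    ec.fit(x_train, y_train)
    # pred = rfg.predict(x_cv)
    pred = ec.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))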

RMSE: 0.00000

R2: 1.0000
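
For readers unfamiliar with the two metrics, here is a plain-Python sketch of what `np.sqrt(mean_squared_error(...))` and `r2_score(...)` compute. The helper names and the sample numbers are illustrative and do not come from the project's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Undefined (division by zero) when y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print("RMSE : %f" % rmse(y_true, y_pred))  # RMSE : 0.353553
print("R2 : %f" % r2(y_true, y_pred))      # R2 : 0.975000
```

A model whose predictions match the targets exactly gives RMSE 0 and R2 1 under these definitions.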

Predict Price architecture for Economy Class

    ec_price = LGBMRegressor(n_estimators=1200)
    ec_price.fit(x_train, y_train)
    pred = ec_price.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))

RMSE: 0.141890

R2: 0.869850

Predict Optimal Time architecture for Business Class

    bs = LGBMRegressor(n_estimators=1000)
    # bs = RandomForestRegressor(n_estimators=100)
    bs.fit(x_train, y_train)
    # pred = rfg.predict(x_cv)
    pred = bs.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))

RMSE: 0.0000

R2: 1.00000

Predict Price architecture for Business Class

    bs_price = LGBMRegressor(n_estimators=1000)
    # bs = RandomForestRegressor(n_estimators=100)
    bs_price.fit(x_train, y_train)
    # pred = rfg.predict(x_cv)
    pred = bs_price.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))

RMSE: 0.121185

R2: 0.899513

Predict Optimal Time architecture for Premium Economy Class

    pe = LGBMRegressor(n_estimators=1500)
    # pe = CatBoostRegressor()
    # rfg = RandomForestRegressor(n_estimators=100)
    pe.fit(x_train, y_train)
    # pred = rfg.predict(x_cv)
    pred = pe.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))

RMSE: 0.00000

R2: 1.00000

Predict Price architecture for Premium Economy Class

    pe = LGBMRegressor(n_estimators=1500)
    # pe = CatBoostRegressor()
    # rfg = RandomForestRegressor(n_estimators=100)
    pe.fit(x_train, y_train)
    # pred = rfg.predict(x_cv)
    pred = pe.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    r2 = r2_score(y_test, pred)
    print("RMSE : % f" % (rmse))
    print("R2 : % f" % (r2))

RMSE: 0.093673

R2: 0.839008

Final Model output on WebApp

Using Flask Heroku Web App

Team Members : Ms. Deepika Goel, Mr. Praneeth Kumar Pinni

Deployment Steps :

To deploy the model on Heroku there are two options: the Heroku CLI or GitHub. We chose deployment via GitHub.

Step 1 : Create an account on heroku.com

Step 2 : Upload all files on GitHub

Step 3 : Deployment method --> GitHub

Step 4 : App connected to GitHub

Step 5 : Select Manual deploy --> Deploy the current state of a branch to this app; the branch should be master.

Step 6 : Resolve any package errors that occur and test your public URL

Flask Code : Web App Model/Flask/

Public URL : https://prediction-price-for-flight.herokuapp.com/

Streamlit Frontend

https://docs.google.com/presentation/d/15HfriKFJ5acUQJ1qqCTX2-4JDUT5InTOfC6PH3KF9EE/e

The results template is here: "https://docs.google.com/presentation/d/15HfriKFJ5acUQJ1qqCTX2-4JDUT5InTOfC6PH3KF9EE/edit#slide=id.gb69d85bd22_0_12"

Demo:

WhatsApp.Video.2021-06-01.at.00.25.23.mp4

Team Members : Mr. Prasad Pawar, Mr. Makarand Anna Rayate, Mr. Rudra Kumawat

We used the RandomForestRegressor algorithm to predict first the optimal time and then the price. The architecture is shown below:

Predict optimal time architecture

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed: 15.4min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed: 19.7min finished
RandomizedSearchCV(cv=5, error_score=nan,
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   ccp_alpha=0.0,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   max_samples=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators=100,
                                                   n_jobs=None, oob_score=False,
                                                   random_state=None, verbose=0,
                                                   warm_start=False),
                   iid='deprecated', n_iter=10, n_jobs=-1,
                   param_distributions={'max_depth': [5, 10, 15, 20, 50],
                                        'min_samples_split': [2, 3, 5, 10]},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=False, scoring=None, verbose=2)

Selected best_params_ after hyperparameter tuning : {'min_samples_split': 5, 'max_depth': 20}

Accuracy = 0.9017739133612731
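
The log line "Fitting 5 folds for each of 10 candidates, totalling 50 fits" follows from the search settings: the two parameter lists give 5 x 4 = 20 possible combinations, 10 of them are drawn, and each is cross-validated on 5 folds. Below is a plain-Python sketch of that candidate sampling. The helper `sample_candidates` and the fixed seed are hypothetical (the log shows `random_state=None`). The draw is without replacement, which is how scikit-learn samples when every parameter is given as a list.

```python
import itertools
import random

# Search space and settings copied from the RandomizedSearchCV log above.
param_distributions = {
    "max_depth": [5, 10, 15, 20, 50],
    "min_samples_split": [2, 3, 5, 10],
}
n_iter = 10
cv = 5


def sample_candidates(distributions, n_iter, seed=0):
    """Draw n_iter distinct parameter combinations from the full grid."""
    keys = sorted(distributions)
    grid = [
        dict(zip(keys, values))
        for values in itertools.product(*(distributions[k] for k in keys))
    ]
    rng = random.Random(seed)  # seed is an assumption, for repeatability only
    return rng.sample(grid, n_iter)


candidates = sample_candidates(param_distributions, n_iter)
total_fits = len(candidates) * cv
print(len(candidates), total_fits)  # 10 50
```

Only half of the grid is evaluated, so the reported best_params_ is the best of the sampled candidates rather than of all 20 combinations.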

Predict price architecture

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed: 14.5min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed: 19.6min finished
RandomizedSearchCV(cv=5, error_score=nan,
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   ccp_alpha=0.0,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   max_samples=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators=100,
                                                   n_jobs=None, oob_score=False,
                                                   random_state=None, verbose=0,
                                                   warm_start=False),
                   iid='deprecated', n_iter=10, n_jobs=-1,
                   param_distributions={'max_depth': [5, 10, 15, 20, 50],
                                        'min_samples_split': [2, 3, 5, 10]},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=False, scoring=None, verbose=2)

Selected best_params_ after hyperparameter tuning : {'min_samples_split': 5, 'max_depth': 20}

Accuracy = 0.9351213338653643

Final Model output on WebApp

Using Flask Heroku Web App

Deployment Steps :

To deploy the model on Heroku there are two options: the Heroku CLI or GitHub. We chose deployment via GitHub.

Step 1 : Create an account on heroku.com

Step 2 : Upload all files on GitHub

Step 3 : Deployment method --> GitHub

Step 4 : App connected to GitHub

Step 5 : Select Manual deploy --> Deploy the current state of a branch to this app; the branch should be master.

Step 6 : Resolve any package errors that occur and test your public URL

Flask Code : Web App Model/Flask/

Public URL : https://mlflightpred.herokuapp.com/

Demo :

Screen_Recording_20210527-195935_Chrome.mp4

Steps that we performed:

  • Web scraping
  • Data loading
  • Data preprocessing
  • Exploratory data analysis
  • Feature engineering
  • Feature selection
  • Feature transformation
  • Model building
  • Model evaluation
  • Model tuning
  • Predictions
  • Model deployment with Flask & Heroku
  • Publishing the URL
  • Report submission using Tableau

Tools used:

Python
PyCharm
Jupyter Notebook
Google Colab
Databricks
Streamlit
Flask
GitHub
Git Bash
Sublime Text


### Libraries used:
* Pandas
* Numpy
* scipy
* sklearn
* lightgbm
* Boosting
* selenium
* Matplotlib
* Seaborn
* Plotly
* Cufflinks

Commands that we used for deployment:

git init
git add .
git status
git commit -m "First commit"
git status

heroku create
git remote -v
git push origin master

heroku logs --tail

Procfile:

web: sh setup.sh && streamlit run gh.py

Setup.sh:

mkdir -p ~/.streamlit/

echo "\
[server]\n\
headless = true\n\
port = $PORT\n\
enableCORS = false\n\
\n\
" > ~/.streamlit/config.toml

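To make the effect of setup.sh easier to see, here is a rough Python equivalent that creates the `.streamlit` directory and writes the same `[server]` settings. It is a sketch only: the function name is hypothetical, it writes into a temporary directory instead of the real home directory, and the fallback port 8501 (Streamlit's default) is an assumption for running outside Heroku, where `PORT` is not set.

```python
import os
import tempfile
from pathlib import Path


def write_streamlit_config(home, port):
    """Write <home>/.streamlit/config.toml with the settings from setup.sh."""
    config_dir = Path(home) / ".streamlit"
    config_dir.mkdir(parents=True, exist_ok=True)  # same as: mkdir -p ~/.streamlit/
    config_path = config_dir / "config.toml"
    config_path.write_text(
        "[server]\n"
        "headless = true\n"
        f"port = {port}\n"
        "enableCORS = false\n"
    )
    return config_path


# Use a throwaway directory so the example does not touch the real home.
with tempfile.TemporaryDirectory() as tmp_home:
    path = write_streamlit_config(tmp_home, os.environ.get("PORT", "8501"))
    config_text = path.read_text()

print(config_text)
```

Heroku assigns the port at start-up through the `PORT` environment variable, which is why the value is read at run time rather than hard-coded.
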
Author

-Yasin Shah

DECLARATION

A project report on the Machine-Learning-for-Flight-Ticket-Pricing project, successfully submitted by:

Ms. Deepika Goel.

Mr. Prasad Pawar.

Mr. Makarand Anna Rayate.

Mr. Chakradhar Reddy Yerragudi.

Mr. Mervana Prit Jitendrabhai.

Mr. Praneeth Kumar.

Ms. Himadri Chutia.

Mr. Rudra Kumawat.
