After last time's introduction to my website, in this post I am going to turn the focus back to plane ticket prices and how to analyze them with a data science approach.
Last week I met with T. Tom, my mentor for this project, to discuss how I might ultimately reach a conclusion. He suggested that, since there are so many conflicting accounts of whether travel companies game the system, I should set aside my preconceived assumptions and collect some data myself. He advised narrowing the scope to a single route, such as Philadelphia to Beijing, or even a single flight, which would make the data easier to collect and reduce the number of variables. We decided that every day I would track, for the same route, the price of a flight the next day, a flight a week out, a flight a month out, and a flight on a fixed date further in the future.
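Because the next-day, next-week, and next-month dates shift every day while the fixed date does not, it helps to pin down exactly which travel dates to look up on a given collection day. The sketch below is one way to do that, assuming "next month" means the same day of the following month (clamped to the month's last day); the function names and the horizon labels are hypothetical, not part of any tool I am using.

```python
import calendar
from datetime import date, timedelta
from typing import Dict


def add_one_month(d: date) -> date:
    """Return the same day of the following month, clamped to that month's
    last day (for example, January 31 becomes February 28 or 29)."""
    year = d.year + d.month // 12  # December rolls over into the next year
    month = d.month % 12 + 1       # 1..11 -> 2..12, 12 -> 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def target_dates(collected_on: date, fixed_date: date) -> Dict[str, date]:
    """Travel dates to look up on one collection day (hypothetical labels)."""
    return {
        "next_day": collected_on + timedelta(days=1),
        "next_week": collected_on + timedelta(days=7),
        "next_month": add_one_month(collected_on),
        "fixed": fixed_date,  # the same far-off date every day
    }
```

For example, `target_dates(date(2018, 10, 1), fixed)` gives October 2, October 8, and November 1 for the three moving horizons, whatever fixed date is passed in.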
In a second meeting, it occurred to me that I should also compare prices with and without browser cookies, using Chrome's incognito window, to see whether cookies affect pricing. I also thought it would be interesting to see whether Chinese and American travel websites charge different prices for the same tickets, and whether any difference is explained entirely by the currency exchange rate or has some other cause. At this meeting we also drew up a more structured plan for the semester. In October, I would collect data for the whole month, starting manually, that is, typing the prices into a spreadsheet by hand. Meanwhile, I hoped to learn to write a program that scrapes the data from the websites, which would be much faster and smarter. In November, I would begin the analysis by graphing the data and looking for patterns and trends. I would also try to model the price patterns mathematically. In December, I would evaluate the model and the whole process, asking questions such as “What worked, and what could be improved?” In January, I will most likely draft a report of my findings.
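To check whether a Chinese-site price differs from a US-site price only because of the exchange rate, one simple test is to convert the yuan price to dollars and look at what is left over. The sketch below assumes a single CNY-per-USD rate is chosen for each day (which rate to use, such as a published daily quote, is my assumption, not something settled yet); the function names are hypothetical.

```python
def residual_difference_usd(expedia_usd: float, ctrip_cny: float, cny_per_usd: float) -> float:
    """Ctrip price converted to USD, minus the Expedia price.

    Close to zero: the gap is explained by the exchange rate alone.
    Positive: Ctrip is more expensive even after conversion; negative: cheaper.
    """
    if cny_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return ctrip_cny / cny_per_usd - expedia_usd


def residual_percent(expedia_usd: float, ctrip_cny: float, cny_per_usd: float) -> float:
    """The same leftover difference, as a percentage of the Expedia price."""
    return 100.0 * residual_difference_usd(expedia_usd, ctrip_cny, cny_per_usd) / expedia_usd
```

A residual that stays near zero day after day would suggest the two sites simply track the exchange rate; a consistent gap would point to some other reason.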
After this meeting, I officially began collecting data on October 1st. I chose Expedia as the US site and Ctrip, which I mentioned before, as the Chinese site. I picked a daily flight that leaves Philadelphia around 10:15 a.m. and connects in Washington, DC, to a flight departing 1 hour 25 minutes after arrival. The connecting flight to China is always UA 807, which arrives in China around 3:40 p.m. Every day, I recorded prices for 5 travel dates on both websites, in both a regular browser window and an incognito window, for 20 prices per day. So far, from the week’s worth of data I have collected, I can see that next-day prices vary a lot: sometimes they match the price of a flight a month away, and sometimes they are twice as much as a regular ticket. Next-week prices vary too, but less. Next-month prices are the most stable; every day this week, they have stayed the same. What surprised me was that the flight more than six months away was actually extremely expensive, since I had always assumed that the earlier you book, the better the price. Because of this, I added a nearer date, November 1st, to see how its price changes over the course of this month. So far, since it is still within the next-month range, its price has not changed much, but we will wait and see.
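Once the spreadsheet grows, the "varies a lot" versus "most stable" impressions can be put into numbers. The sketch below assumes each spreadsheet row is one observation (collection day, horizon, site, regular or incognito window, price) with Ctrip prices already converted to dollars; the field names and horizon labels are hypothetical. It uses the price range (max minus min) as a simple measure of variation, though a standard deviation would work as well, and it pairs regular and incognito prices to look for a cookie effect.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    collected_on: date  # day the price was recorded
    horizon: str        # hypothetical labels: "next_day", "next_week", "next_month", "fixed", "nov_1"
    site: str           # "expedia" or "ctrip"
    incognito: bool     # True if recorded in a Chrome incognito window
    price_usd: float    # Ctrip prices assumed already converted to USD


def spread_by_horizon(obs: List[Observation]) -> Dict[Tuple[str, str], float]:
    """Price range (max minus min) per (site, horizon); larger means more volatile."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for o in obs:
        groups[(o.site, o.horizon)].append(o.price_usd)
    return {key: max(prices) - min(prices) for key, prices in groups.items()}


def cookie_effects(obs: List[Observation]) -> Dict[Tuple[date, str, str], float]:
    """Regular-window price minus incognito price, for each
    (day, horizon, site) where both prices were recorded."""
    regular: Dict[Tuple[date, str, str], float] = {}
    private: Dict[Tuple[date, str, str], float] = {}
    for o in obs:
        key = (o.collected_on, o.horizon, o.site)
        target = private if o.incognito else regular
        target[key] = o.price_usd
    return {key: regular[key] - private[key] for key in regular if key in private}
```

If cookies had no effect, every value returned by `cookie_effects` would be zero; consistently nonzero values for one site would be worth a closer look.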
As this week of data collection went on, I felt the tedium of collecting data by hand more and more, which is why I have started exploring data scraping and am increasingly motivated to get it working. By my next post, I hope to have figured it out, and I will share my take on data scraping then.
Expedia Logo. Secret Flying, 5 May 2017, http://www.secretflying.com/posts/promo-code-50-off-a-200-hotel-spend-with-expedia/. Accessed 8 Oct. 2018.