Plane Tickets – Data Collection – Alina

[Images: Ctrip logo (new version); Expedia logo]

After introducing my website last time, in this post I return to the topic of plane ticket prices and how to analyze them with a data science approach.

Last week, I met with T. Tom, my mentor for this project, to discuss how I might ultimately reach a conclusion. He suggested that, since there are so many conflicting accounts of whether travel companies game the system, I should set aside my preconceived assumptions and collect some data myself. He advised me to narrow the scope to a particular route, for example Philadelphia to Beijing, or even a particular flight, which would make the data easier to collect and reduce the number of variables. We decided that every day I would track the price of four flights on the same route: one leaving the next day, one the next week, one the next month, and one on a fixed date in the future.
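A minimal sketch of how the four departure dates could be generated on each collection day. The function names are hypothetical, and the rule that "next month" means the same day in the following calendar month (clamped to the month's last day) is an assumption; the plan itself does not define it. The fixed date in the example is a placeholder.

```python
import calendar
from datetime import date, timedelta


def add_one_month(d: date) -> date:
    """Same day in the following calendar month, clamped to that month's end.

    Assumption: this is one possible reading of "a flight next month".
    """
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def departure_dates(today: date, fixed: date) -> dict:
    """Departure dates to look up on a given collection day."""
    return {
        "next_day": today + timedelta(days=1),
        "next_week": today + timedelta(weeks=1),
        "next_month": add_one_month(today),
        "fixed": fixed,
    }


if __name__ == "__main__":
    # Placeholder fixed date, chosen only for illustration.
    for label, d in departure_dates(date(2018, 10, 1), date(2019, 5, 1)).items():
        print(label, d.isoformat())
```

Keeping the labels ("next_day", "next_week", and so on) separate from the actual dates makes it possible to compare, say, all next-day prices across the month even though the departure date changes every day.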

In a second meeting, it occurred to me that I should compare prices with and without browser cookies, using Chrome's incognito window, to see whether cookies affect pricing. I also thought it would be interesting to see whether plane ticket prices differ between Chinese and American travel websites and, if so, whether the difference comes down entirely to the currency exchange rate or has some other cause.

In this meeting we also came up with a more structured plan for the whole semester. In October, I would collect data for the whole month, starting manually by typing prices into a spreadsheet by hand. In the meantime, I would hopefully learn to scrape the data from the websites with a program, which would be much faster and smarter. In November, I would start my analysis by graphing the data and looking for patterns and trends, and I would also attempt to model the price patterns mathematically. In December, I would evaluate the model and the whole process, asking questions like “What worked, and what could be improved?” In January, I would most likely draft a report of my findings.
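One way to check whether a gap between the Chinese and American sites is explained by the exchange rate is to convert both prices to a common currency and look at what is left over. The sketch below assumes the Chinese site quotes prices in CNY and the American site in USD; the function names are hypothetical, and the numbers in the example are placeholders, not collected data.

```python
def price_gap_usd(price_usd: float, price_cny: float, cny_per_usd: float) -> float:
    """US-site price minus the Chinese-site price converted to USD.

    A result near zero means the difference is explained by the exchange rate.
    """
    if cny_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return price_usd - price_cny / cny_per_usd


def relative_gap(price_usd: float, price_cny: float, cny_per_usd: float) -> float:
    """The same gap expressed as a fraction of the US-site price."""
    return price_gap_usd(price_usd, price_cny, cny_per_usd) / price_usd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    gap = price_gap_usd(price_usd=1000.0, price_cny=6210.0, cny_per_usd=6.9)
    share = relative_gap(price_usd=1000.0, price_cny=6210.0, cny_per_usd=6.9)
    print(f"gap: {gap:.2f} USD ({share:.1%} of the US price)")
```

Because exchange rates move daily, the rate used should be the one from the day each price was recorded; otherwise rate changes would show up as apparent price differences.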

After this meeting, I officially started the data collection stage on October 1st. I chose Expedia as the US website and Ctrip, the website I mentioned before, as the Chinese one. I picked a flight that leaves Philadelphia every day around 10:15 in the morning and connects to a flight that leaves Washington DC 1 hour 25 minutes after the first one arrives there. The flight to China is always UA 807, and it arrives in China around 3:40 pm. Every day, I collected prices for 5 departure dates from these two websites, in both the regular browser window and the incognito window.

So far, with a week's worth of data, I can see that the price of next-day flights usually varies a lot: sometimes it matches the price of a flight a month away, and sometimes it is twice the price of a regular ticket. The next-week tickets also vary, but not as much. The next-month tickets are the most stable; their price stayed the same every day this week. What took me by surprise was that the flight more than six months away was actually extremely expensive. I had always assumed that the earlier you booked a flight, the better the price you got. Because of this, I picked another date in the nearer future, November 1st, to see how its price would change over the course of this month. So far, because it still falls within the next-month range, the price for this date has not changed much, but we will wait and see.
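A sketch of how each spreadsheet row could be represented, and how the two comparisons described above could be computed from it: the price spread per booking horizon, and the cases where the regular and incognito windows disagree. The field names, labels, and function names are assumptions made for illustration, and the rows in the example are placeholders, not the prices actually collected.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    collected_on: date  # day the price was looked up
    site: str           # e.g. "expedia" or "ctrip"
    window: str         # "regular" or "incognito"
    horizon: str        # e.g. "next_day", "next_week", "next_month"
    price_usd: float    # price converted to a common currency


def price_spread(observations):
    """Map each horizon to the (lowest, highest) price seen across all rows."""
    prices = defaultdict(list)
    for obs in observations:
        prices[obs.horizon].append(obs.price_usd)
    return {horizon: (min(p), max(p)) for horizon, p in prices.items()}


def window_mismatches(observations):
    """List (day, site, horizon) keys where regular and incognito prices differ."""
    by_key = defaultdict(dict)
    for obs in observations:
        key = (obs.collected_on, obs.site, obs.horizon)
        by_key[key][obs.window] = obs.price_usd
    return [
        key
        for key, seen in by_key.items()
        if "regular" in seen
        and "incognito" in seen
        and seen["regular"] != seen["incognito"]
    ]


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real prices.
    rows = [
        Observation(date(2018, 10, 1), "expedia", "regular", "next_day", 100.0),
        Observation(date(2018, 10, 1), "expedia", "incognito", "next_day", 110.0),
        Observation(date(2018, 10, 2), "expedia", "regular", "next_day", 150.0),
    ]
    print(price_spread(rows))
    print(window_mismatches(rows))
```

A wide (lowest, highest) pair for a horizon corresponds to the kind of variation seen in the next-day tickets, while a pair with equal values corresponds to the stable next-month tickets.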

Over this week of data collection, I increasingly felt how tedious it is to collect data by hand, which is why I have started exploring data scraping and have become more and more motivated to get it working. By my next post, I hope to have figured it out, and I will share my take on data scraping then.

Images:

Expedia Logo. Secret Flying, 5 May 2017, http://www.secretflying.com/posts/promo-code-50-off-a-200-hotel-spend-with-expedia/. Accessed 8 Oct. 2018.

Ctrip Logo. Seeking Alpha, 23 May 2018, seekingalpha.com/article/4176711-ctrip-com-good-growth-pick-earnings. Accessed 8 Oct. 2018.

 

4 thoughts on “Plane Tickets – Data Collection – Alina”

  1. Jason Ono

    I love this project, since I am also one of the international students here who have to purchase (always expensive) intercontinental flight tickets every break just to go back home. I have a comment on the choice of websites, though. From my experience of purchasing flight tickets, prices on third-party websites, such as Expedia and Trivago, seem more volatile than those on the airlines' official websites. I think cross-website comparison, therefore, is another interesting factor to consider.

  2. baitingz

    Great topic Alina! I am already excited, especially as an international student who travels a lot every year. While doing this project, have you thought about looking at the model from two approaches? Maybe you can give suggestions to consumers on how to get the cheapest tickets and suggestions to firms on how to modify their system? Also, do you have a specific math/stats/comp sci model to use in mind? Please tell me more about the model, because I am really interested in that!

  3. ninayichenwei

    I am excited to learn how closely your project is connected to our daily lives! Your blog describes your initial hypotheses about the phenomenon observed and explains your data comparison with clarity. I would like to learn about further experiments in your data collection process and how the methods could be applied to other problems related to our lives. You could also consider adding subtitles to your post to make it more organized 🙂

  4. Nawal N'Garnim

    I like the idea to narrow it down to the one specific flight, because I can tell it will be easier for you to track the price of. I am really interested in seeing your progress the rest of the semester and how you will continue tracking everything!

