Initial Data Exploration – Alina

In this post, I share my initial exploration of the data that I collected last month. I am also experimenting with visualizing the data in R, a statistical programming language, but that is a topic for another time.

This time, I first looked at the data collected from the two sites, US and Chinese, separately, and plotted each as a line graph. Each graph contains data from all five categories of dates: next-day, next-week, next-month, 6/4 (a date far in the future), and 11/1 (a date in the very near future). Each category is drawn as a line of a different color.

Screen Shot 2018-11-27 at 8.28.05 PM


Screen Shot 2018-11-27 at 10.48.33 AM

Because the prices from the Chinese site, Ctrip, were first collected in RMB and then converted to USD, some of the prices shown have decimals.
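To show where those decimals come from, here is a minimal sketch of the conversion step in Python. The function name `rmb_to_usd`, the sample price, and the exchange rate are all placeholders for illustration, not values from my spreadsheet, and the rounding to cents is an assumption about how the converted prices were stored.

```python
def rmb_to_usd(price_rmb, rmb_per_usd):
    """Convert an RMB price to USD at the given rate, rounded to cents."""
    if rmb_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return round(price_rmb / rmb_per_usd, 2)


# Placeholder inputs for illustration only; not taken from the collected data.
sample_price_rmb = 4200
sample_rate = 6.94  # hypothetical RMB per 1 USD on the collection day
print(rmb_to_usd(sample_price_rmb, sample_rate))
```

A round RMB price almost never divides into a round USD price, which is why the Ctrip lines carry cents while the Expedia lines do not.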

Comparing the two graphs, the prices on the US website, Expedia, seemed more stable and usually stayed at the same value (such as $600, $1072, and $439) almost throughout. Even when the prices did spike, they tended to land on a small set of standard values (like $639 or $797, which appeared frequently). The prices on the Chinese website showed the same overall trend: when the prices spiked on Expedia, they generally also did on Ctrip.

Sometimes, though, there are differences between the two. For example, on 10/3, the price on Expedia for a next-week ticket was much higher than the one on Ctrip. If we overlay the two graphs, we see that the Chinese website usually had a slightly lower price, by about $20-$30. In the graph below, the US site is represented by the lighter shades, and the Chinese site by the darker shades. The lighter lines are mostly above their darker counterparts.

Screen Shot 2018-11-27 at 2.08.50 PM
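The gap that the overlay shows by eye can also be measured directly. The sketch below assumes each site's prices for one date category are stored as a dictionary keyed by collection date; the function names and the sample numbers are hypothetical placeholders, not my actual data.

```python
def price_gaps(us_prices, cn_prices):
    """Return US minus Chinese price for every date present in both series."""
    shared_dates = sorted(set(us_prices) & set(cn_prices))
    return {d: round(us_prices[d] - cn_prices[d], 2) for d in shared_dates}


def summarize_gaps(gaps):
    """Return (mean gap in USD, share of days on which the US price was higher)."""
    values = list(gaps.values())
    if not values:
        raise ValueError("no overlapping dates to compare")
    mean_gap = sum(values) / len(values)
    share_us_higher = sum(v > 0 for v in values) / len(values)
    return mean_gap, share_us_higher


# Placeholder prices for illustration only; not taken from the collected data.
us_sample = {"2018-10-01": 600.0, "2018-10-02": 639.0, "2018-10-03": 797.0}
cn_sample = {"2018-10-01": 575.5, "2018-10-02": 610.0, "2018-10-04": 500.0}
print(summarize_gaps(price_gaps(us_sample, cn_sample)))
```

Only dates collected on both sites are compared, so a missing day on one site does not distort the average.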

Another observation I made when comparing the two sets of prices is that the prices on the Chinese site were not as tidy. They were almost never exactly the same from one day to the next and usually fluctuated a little. I wondered if this was because of changes in the currency exchange rate, as I converted the price every day using that day's rate. So I decided to compare the prices on the Chinese site in USD against those in RMB. Having the prices from all the date categories on one graph is a little chaotic and makes comparison hard, so I selected two categories of dates to compare. The two are again distinguished by shade.

Screen Shot 2018-11-27 at 3.28.46 PM

Evidently, the fluctuation was not caused by the exchange rate at all. The prices in RMB actually fluctuate even more than the prices in USD. Why would the prices fluctuate, then? One possible reason is that the website wants people to think they are getting a deal when the price is only slightly lower, or wants to nudge people into buying tickets as soon as possible, thinking that they are securing “a good price”.
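Because RMB and USD prices are on different scales, one way to put a number on "fluctuates more" is to compare day-to-day changes as percentages rather than raw amounts. This is a rough sketch of that idea, not the method behind the graph above; the function name, prices, and exchange rates are made-up placeholders.

```python
def mean_abs_pct_change(prices):
    """Average absolute day-to-day change, as a percent of the earlier day's price."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [
        abs(later - earlier) / earlier * 100
        for earlier, later in zip(prices, prices[1:])
    ]
    return sum(changes) / len(changes)


# Placeholder series for illustration only; not taken from the collected data.
rmb_series = [4200, 4235, 4190, 4228]
daily_rates = [6.94, 6.95, 6.93, 6.94]  # hypothetical RMB per 1 USD
usd_series = [rmb / rate for rmb, rate in zip(rmb_series, daily_rates)]

print("RMB:", mean_abs_pct_change(rmb_series))
print("USD:", mean_abs_pct_change(usd_series))
```

If the RMB figure comes out at least as large as the USD figure, the movement is already present in the original prices and the conversion is not what creates it.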

Seeing this, I was curious to see how the prices were distributed on the Chinese website. I took the data for next-day tickets and graphed it in a way that I thought showed the distribution better. From this graph, it seems that the “clump” of prices decreased a little over time, but just by looking at it, I cannot reach any statistical conclusions.

Screen Shot 2018-11-27 at 8.18.47 PM
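A simple first step from eyeballing the "clump" toward something countable is to group the prices into fixed-width bins and count how many fall in each. The sketch below is only an illustration under assumptions: the bin width of $20 and the sample prices are placeholders, and `bin_prices` is a hypothetical helper name.

```python
from collections import Counter


def bin_prices(prices, width):
    """Count prices per bin; each bin is labeled by its lower edge."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    counts = Counter(int(p // width) * width for p in prices)
    return dict(sorted(counts.items()))


# Placeholder prices for illustration only; not taken from the collected data.
sample_prices = [601.2, 615.0, 639.9, 640.0, 655.5]
for lower_edge, count in bin_prices(sample_prices, 20).items():
    print(f"${lower_edge}-${lower_edge + 20}: {'#' * count}")
```

Running the same count on the early weeks and the late weeks separately would show whether the busiest bin really moved down over time.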

Reaching that statistical conclusion is where I am going next. In the coming posts, with more advanced tools, I will share more developed results from my analysis of the data.

2 thoughts on “Initial Data Exploration – Alina”

  1. Nawal N'Garnim

    Great post! It seems like it was a good move to just compare two categories of dates. It’s interesting how the prices are never just exactly the same and how much they fluctuate. I’m looking forward to seeing everything else you do and how you will conclude it.

  2. nscavalieri

    I really like how detailed and digestible your graphs are. They portray necessary information without throwing in useless data. It also makes the blog post more understandable for people who need to see things visually. Great post!

