In this blog post, I am going to share my initial exploration of the data I collected last month. I am also experimenting with visualizing the data in R, a statistical programming language, but we will talk about that another time.
This time, I first looked at the data collected from the two sites, one US and one Chinese, separately. For each site, I made a line graph containing all five categories of dates: next-day, next-week, next-month, 6/4 (a date far in the future), and 11/1 (a date in the very near future), each plotted as a line series in a different color.
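The graphs themselves were made in R. The Python sketch below only illustrates the reshaping step that such a graph needs: filtering the collected records to one site and grouping them into one line series per date category. The record layout `(site, date_category, collection_day, price)` and the names `CATEGORIES` and `build_series` are hypothetical assumptions, not the actual structure of the data table.

```python
from collections import defaultdict

# Hypothetical category labels, taken from the five date categories in the post.
CATEGORIES = ["next-day", "next-week", "next-month", "6/4", "11/1"]


def build_series(records, site):
    """Group (site, category, day, price) records for one site into line series.

    Returns {category: [(collection_day, price), ...]} with each list sorted by
    collection day, so every list can be drawn as one line on the graph.
    """
    series = defaultdict(list)
    for rec_site, category, day, price in records:
        if rec_site == site:
            series[category].append((day, price))
    # Keep the categories in a fixed order and skip ones with no data.
    return {cat: sorted(series[cat]) for cat in CATEGORIES if cat in series}
```

Calling `build_series(records, "US")` and `build_series(records, "CN")` gives the two separate per-site views, and each value in the returned dict can be passed to a plotting library as one colored line.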
This past week marks the end of my project's data collection period. After figuring out in my last blog post how to scrape data from websites with simple structures, I had been experimenting with pulling data from the Expedia website, which is far more complex, and I ran into some difficulties. To start, I tested the code on data that should be easy to pull, to see whether it would work on this site at all. I picked the flight date shown on the page, which sat in an element with the attribute class="title-date-rtv", and I put this value into the code.
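The post does not show its scraping code or say which library it used. As a rough illustration of the idea of selecting an element by its class, here is a minimal sketch using Python's standard-library `html.parser`; the class name `title-date-rtv` comes from the post, while `ClassTextExtractor`, `extract_by_class`, and the sample HTML are hypothetical. It parses an HTML string that has already been downloaded; if a site builds that element with JavaScript, the raw HTML may not contain it at all, which is one way a scraper that works on simple sites can come back empty on a complex one.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0                 # > 0 while inside a matching element
        self.current: list[str] = []   # text pieces of the element being read
        self.results: list[str] = []   # finished text, one entry per match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            # Nested tag inside a match: count it so we know when the match closes.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> carry no text and never change depth.
        return

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace in the collected text.
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html: str, target_class: str) -> list[str]:
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div class="flight">
  <span class="title-date-rtv">Nov 1</span>
  <span class="airline">Example Air</span>
</div>
"""

print(extract_by_class(SAMPLE_HTML, "title-date-rtv"))
```

Matching on the split class list (rather than the raw attribute string) means an element such as `class="header title-date-rtv"` is still found. The sketch assumes reasonably well-formed HTML; badly mismatched closing tags can throw off the depth count.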
If you remember from my last blog post, my focus in this project has recently been on finding a way to scrape data from the travel websites with code instead of collecting it manually, since doing it by hand is tedious. Of course, while working on the code, I have also kept up the manual collection method, since data collection is the objective of this month's work. So here's an updated version of my data table:
After introducing my website last time, in this blog post I am going to turn the focus back to plane ticket prices and how to analyze them with a data science approach.
Hey guys, welcome back! You might remember that in my last blog post I mentioned a study-hall sign-in website, which is part of the main focus of my project, and I promised to come back with more details, so here I am. Before I dive in, though, I want to give some quick updates on my quest for answers about the manipulated plane ticket prices.
This past summer, I flew back and forth between China and the US a lot, which meant I had to book plane tickets a number of times. For this, I used a Chinese travel website called Ctrip, which my family has always liked and trusted and which is also the largest online travel agency in China. This time, however, my experience was not so pleasant. The price of the tickets I was looking at went up every time I came back from checking similar tickets on other websites. That alone could have been normal, since prices can rise as the departure date approaches. What surprised me, though, was that when I logged in with a different account and looked at the same tickets on the same date, the prices differed. I don't recall the exact price gap, but I remember it was large enough to upset and intrigue me at the same time. After a brief search online, I found news reports already accusing the company of manipulating its customers through the use of “big data.” This discovery deeply interested me, and I could not help but start to wonder: “How exactly are they using the data they collect to achieve their goal? How are other Internet companies like Netflix and Google using their data? What are the ethical implications of this? What impact does this have on our society as a whole?”