In my previous blog post, I briefly mentioned Westtown Resort, a project that is currently under development and is planned to be released later this year. Starting from this blog post, I will be detailing the project’s development until its release.
Westtown Resort originated as a computer science class project. It was made with the objective of creating a revolutionary solution that gives students quick access to their schedules on their mobile devices. We, the seven members of the Westtown Resort team, came up with the idea of parsing the web schedule that is available to every student on MyBackpack, a school record management system, and then writing the data into an iCalendar file (a standard calendar exchange file). After such a file is created, the user is able to import it into a device directly, or sync it to all of his or her devices using cloud services. The result is an experience that is fast and seamless. As shown in the short video clip below, users are able to access their schedule almost instantly.
Since the release of the first-generation Westtown Resort, the feedback from our users has been overwhelmingly positive. Subsequently, we released Westtown Resort 2.0, which brought support for the semester 2 schedule and addressed many issues our users experienced in its predecessor.
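To give a rough sense of what writing schedule data into an iCalendar file can look like, here is a minimal Python sketch. It is not the code Westtown Resort uses; the function name `to_ics`, the course dictionary keys, and the `PRODID` and `UID` values are all made up for illustration.

```python
from datetime import datetime, timezone

def to_ics(courses, stamp=None):
    """Render course dicts as a minimal iCalendar string (hypothetical helper).

    Each course is assumed to have 'name', 'start', 'end' and an optional
    'location'. Text escaping of commas and semicolons is left out for brevity.
    """
    stamp = stamp or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Schedule Sketch//EN",  # placeholder product id
    ]
    for i, course in enumerate(courses):
        lines += [
            "BEGIN:VEVENT",
            f"UID:course-{i}@example.invalid",       # placeholder unique id
            f"DTSTAMP:{stamp:%Y%m%dT%H%M%S}Z",       # creation time in UTC
            f"DTSTART:{course['start']:%Y%m%dT%H%M%S}",  # local "floating" time
            f"DTEND:{course['end']:%Y%m%dT%H%M%S}",
            f"SUMMARY:{course['name']}",
        ]
        # A course without a location simply omits the LOCATION property.
        if course.get("location"):
            lines.append(f"LOCATION:{course['location']}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Saving the returned string to a file ending in .ics should give something most calendar apps can import.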
A New Beginning
While the current version of Westtown Resort is almost perfect, the demand of the student body is driving us to create the next-generation Westtown Resort. We started by asking ourselves the question: How might we make something that is already extraordinary even better? Our answer is to completely redesign the project from the ground up.
Our founding design of Westtown Resort relies entirely on parsing, which involves the analysis and extraction of data from a given piece of text. The user first selects, copies, and pastes the schedule from MyBackpack into a text box on Resort’s webpage. This step essentially supplies the Resort with an unformatted piece of text that contains the user’s schedule. Then, Resort begins the parsing process using a custom-designed algorithm that distinguishes spaces (“ ”), letters (“a”), and numbers (“1”). The parser first scans for letters and numbers, which are treated as the meaningful information. After it identifies the first letter or number, it continues scanning until it reaches a space, indicating the end of that particular piece of information. The parser then stores the information it just scanned and repeats the same process. Ultimately, the parser is able to extract the names, time intervals, and locations of the courses shown in the schedule.
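The scanning loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual parser: the function name `scan_tokens` and the sample schedule line are invented, and the real pasted text from MyBackpack looks different.

```python
def scan_tokens(text):
    """Collect pieces of information that start at a letter or digit
    and end at the next whitespace character."""
    tokens = []
    current = []
    for ch in text:
        if not current:
            # Not inside a token yet: wait for the first letter or number.
            if ch.isalnum():
                current.append(ch)
        elif ch.isspace():
            # A space marks the end of the current piece of information.
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Store the last token if the text does not end with a space.
        tokens.append("".join(current))
    return tokens

# Hypothetical schedule line, for illustration only.
sample_tokens = scan_tokens("Algebra 8:00-8:45 Room12")
```

With this made-up line, `sample_tokens` holds the course name, the time interval, and the location as three separate strings.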
However, it turned out that the parser does not work properly when it encounters exceptions. For example, while most courses display their names, time intervals, and locations on the schedule, a math independent research seminar does not come with a location. When the parser encounters such a course, it inevitably fails as it attempts to extract the course’s time interval from the blank space that is meant to be used for the course’s location.
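One way to picture this kind of failure is a parser that assumes every course contributes exactly three pieces of information. The Python sketch below is a simplified assumption about the problem, not a reproduction of the actual bug; `group_fixed` and the sample text are hypothetical.

```python
def group_fixed(tokens, width=3):
    """Group tokens into (name, time, location) by position alone."""
    return [tuple(tokens[i:i + width]) for i in range(0, len(tokens), width)]

# Hypothetical pasted schedule: "Seminar" has no location after its time.
pasted = "Algebra 8:00-8:45 Room12 Seminar 9:00-9:45 English 10:00-10:45 Room7"
rows = group_fixed(pasted.split())

# rows[1] wrongly takes "English" as the seminar's location,
# and every field after it is shifted by one position.
```

Because plain text carries no marker for an empty field, one missing value is enough to misalign everything that follows it.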
To make the Resort more reliable, we reimagined the algorithm. Recently, we developed a new parser that employs HTML DOM parsing. Essentially, when a student visits MyBackpack’s schedule page, he or she is actually viewing a webpage that the browser renders from an HTML file. An HTML file resembles the skeleton of a webpage. It contains unformatted information, including text, images, links, etc. Since HTML is by nature a markup language, each element (such as a piece of text) is typically enclosed between a pair of tags.
For example, in “<p>Hello!</p>”, the element, “Hello!” in this case, is enclosed within two paragraph tags. “<p>” marks the beginning of the element, and “</p>” marks the end of the element. Since such tags appear in pairs, the algorithm can easily identify and extract the enclosed element. With this approach, a user is asked to save and upload the HTML source code of the MyBackpack webpage instead of copying and pasting the plain text from the webpage.
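To illustrate why paired tags make extraction more robust, here is a small Python sketch built on the standard library’s `html.parser` module. Note that `html.parser` is an event-driven parser rather than a full DOM, so it is a stand-in for the technique, not our implementation. The table markup in `SAMPLE` is invented; MyBackpack’s real HTML is not shown here and is likely structured differently.

```python
from html.parser import HTMLParser

class RowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td> ... </td> pair.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # An empty cell still yields an entry, so columns stay aligned.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def extract_rows(html):
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows

# Hypothetical markup for illustration only.
SAMPLE = """
<table>
  <tr><td>Algebra</td><td>8:00-8:45</td><td>Room 12</td></tr>
  <tr><td>Math Research Seminar</td><td>9:00-9:45</td><td></td></tr>
</table>
"""
schedule = extract_rows(SAMPLE)
```

In this sketch the seminar without a location still produces three cells, the last one being an empty string, so the fields of the following courses are not shifted.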