Category Archives: Entrepreneurship

Moving below the surface (2): Cross-entropy — William

In my previous post, cross-entropy was presented as a cost function that measures the difference between a model’s outputs and the given outputs in supervised learning. It is also an important concept in information theory.

Before we talk about what cross-entropy is, let me introduce John, an imaginary freshman entering Westtown. He loves desserts and talks about them a lot. In fact, he only ever says four words: ice-cream, chocolate, cookie, and yogurt. When he is back home after school, however, John only uses binary code to text his classmates. So the texts his classmates receive look like this:

[Figure: a text message from John, written as a string of binary bits]

To understand what the messages mean, John and his classmates need an established code system, a way to map sequences of binary bits into words. Here is a simple example:

[Figure: a code mapping between John’s four words and binary codewords]

With the code system in hand, John and his classmates can encode and decode text messages simply by replacing each word with its codeword, and vice versa.

[Figure: encoding a text message with the code mapping]
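Here is a minimal Python sketch of a fixed-length code. The actual codewords in John’s mapping are not spelled out in the text, so the assignment below (ice-cream as 00, chocolate as 01, cookie as 10, yogurt as 11) is an assumption; any four distinct 2-bit codewords would work the same way. The function names are my own.

```python
# ASSUMED mapping: the real codewords are only shown in the post's image.
# Four words need at least 2 bits each in a fixed-length code, since 2**2 == 4.
FIXED_CODE = {
    "ice-cream": "00",
    "chocolate": "01",
    "cookie": "10",
    "yogurt": "11",
}

def encode(words, code):
    """Replace each word with its codeword and join the bits together."""
    return "".join(code[w] for w in words)

def decode_fixed(bits, code, width=2):
    """Cut the bit string into width-sized chunks and look each chunk up."""
    reverse = {cw: w for w, cw in code.items()}
    return [reverse[bits[i:i + width]] for i in range(0, len(bits), width)]

message = ["ice-cream", "chocolate", "ice-cream", "yogurt"]
bits = encode(message, FIXED_CODE)
print(bits)                            # 00010011
print(decode_fixed(bits, FIXED_CODE))  # ['ice-cream', 'chocolate', 'ice-cream', 'yogurt']
```

Because every codeword has the same width, decoding is just slicing: the receiver always knows where one word ends and the next begins.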

It turns out that John does not use all four of his words equally often. As an ice-cream fan, John shares his ice-cream-loving moments all the time. He sometimes mentions eating chocolate ice-cream with his family, and he rarely mentions anything else. So his word frequency chart looks like this:

[Figure: John’s word frequency chart]

Now we can plot John’s word frequency against the number of bits in each word’s codeword. The area of the resulting shape is the average number of bits John uses to send a word. When the code is the shortest possible one, on average, for a given word frequency, this average length is formally called the entropy.

[Figure: John’s word frequencies plotted against codeword length]
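The area in the plot is just a weighted sum: each word’s frequency times the length of its codeword. The Python sketch below computes it. The frequencies (1/2 for ice-cream, 1/4 for chocolate, 1/8 each for cookie and yogurt) are an assumption chosen to match the chart description and the 1.75-bit and 2.25-bit figures later in this post, and the lengths assume a fixed-length code of 2 bits per word.

```python
# ASSUMED frequencies: the exact values are only shown in the post's chart.
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}

# ASSUMED fixed-length code: every codeword is 2 bits long.
FIXED_LENGTHS = {"ice-cream": 2, "chocolate": 2, "cookie": 2, "yogurt": 2}

def average_length(freq, lengths):
    """Area of the frequency-vs-length plot: sum of p(word) * length(word)."""
    return sum(p * lengths[w] for w, p in freq.items())

print(average_length(JOHN_FREQ, FIXED_LENGTHS))  # 2.0
```

Because every codeword has the same length, the frequencies make no difference here: a fixed-length code averages 2 bits per word no matter how often John uses each word.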

Since typing and sending text messages takes time, John and his classmates want to reduce the average codeword length so they can spend as little time as possible texting the same number of words. This is where a variable-length code comes into play: it maps commonly used words (like “ice-cream”) to shorter codewords and less frequently used ones (like “cookie” or “yogurt”) to longer codewords. So we have a new mapping between words and codewords:

[Figure: the new variable-length mapping between words and codewords]

Side note: the mapping is not picked at random. The codeword lengths are chosen based on the word frequencies, and the code is uniquely decodable, so there is no confusion when splitting a message back into codewords.
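The post does not say how the new mapping was built. One standard technique for designing a uniquely decodable code from word frequencies is Huffman coding, which produces a prefix code (no codeword is the beginning of another), so a decoder can read bits left to right without ever guessing. The Python sketch below swaps in Huffman coding, assuming frequencies of 1/2 for ice-cream, 1/4 for chocolate, and 1/8 each for cookie and yogurt (these values are not given in the text). Its exact bit patterns may differ from John’s mapping, but the codeword lengths are what determine the average length. `huffman_code` and `decode_prefix` are hypothetical helper names.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a prefix code with Huffman coding from a word -> probability map."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    # Heap entry: (total probability, tiebreaker, {word: partial codeword})
    heap = [(p, next(tiebreak), {w: ""}) for w, p in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codewords with 0 and 1.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

def decode_prefix(bits, code):
    """Read bits left to right and emit a word as soon as the buffer is a codeword."""
    reverse = {c: w for w, c in code.items()}
    words, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in reverse:
            words.append(reverse[buffer])
            buffer = ""
    return words

# ASSUMED frequencies, not stated in the text.
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}
code = huffman_code(JOHN_FREQ)
print({w: len(c) for w, c in code.items()})
# {'ice-cream': 1, 'chocolate': 2, 'cookie': 3, 'yogurt': 3}

message = ["ice-cream", "cookie", "ice-cream", "chocolate"]
bits = "".join(code[w] for w in message)
print(decode_prefix(bits, code))  # ['ice-cream', 'cookie', 'ice-cream', 'chocolate']
```

The frequent word gets the 1-bit codeword and the rare ones get 3 bits, which is exactly the trade-off described above.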

Again, we can plot the word frequency against the number of bits. Here is what it looks like:

[Figure: John’s word frequencies plotted against codeword length under the new code]

The result is amazing: we have reduced the average length to 1.75 bits! This is the optimal code for John’s word frequencies: no other code can encode his words with fewer bits on average, so his text messages will be as concise as possible. This minimum average length, 1.75 bits, is the entropy of John’s word distribution.
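In symbols, the entropy of a distribution p is H(p) = Σ p(x) · log2(1/p(x)), where log2(1/p(x)) is the ideal codeword length for a word with frequency p(x). The Python sketch below checks that, assuming frequencies of 1/2, 1/4, 1/8, and 1/8 and codeword lengths of 1, 2, 3, and 3 bits for the new code (neither set of numbers is stated in the text), the average length and the entropy both come out to 1.75 bits.

```python
import math

# ASSUMED frequencies and codeword lengths; the post only shows them in images.
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}
JOHN_LENGTHS = {"ice-cream": 1, "chocolate": 2, "cookie": 3, "yogurt": 3}

def average_length(freq, lengths):
    """Sum of p(word) * length(word), in bits per word."""
    return sum(p * lengths[w] for w, p in freq.items())

def entropy(freq):
    """H(p) = sum of p(x) * log2(1 / p(x)), in bits."""
    return sum(p * math.log2(1 / p) for p in freq.values() if p > 0)

print(average_length(JOHN_FREQ, JOHN_LENGTHS))  # 1.75
print(entropy(JOHN_FREQ))                       # 1.75
```

The two numbers agree because each assumed length equals log2(1/p) exactly; for frequencies that are not powers of 1/2, the best whole-bit code can only come close to the entropy.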

Soon after school starts, John meets Juliet, another freshman at Westtown. Juliet is a chocolate-lover. She talks about chocolate all day, and her favourite is chocolate chip cookies. Juliet despises ice-cream, though, and mentions it only when necessary. Despite this, the two share an obsession with dessert and, interestingly, the same limited vocabulary.

[Figure: Juliet’s word frequency chart]

When Juliet starts using John’s code, unfortunately, her text messages turn out much longer than John’s, because her word frequencies differ from his. If we plot Juliet’s word frequencies against the codeword lengths of John’s code, we can see that her average message length is as long as 2.25 bits per word! Juliet’s average message length using John’s code system is called the cross-entropy.
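Cross-entropy has a formula that mirrors entropy: H(q, p) = Σ q(x) · log2(1/p(x)), where q is Juliet’s word distribution and p is John’s, whose optimal code gives each word log2(1/p(x)) bits. The Python sketch below assumes John’s frequencies are 1/2, 1/4, 1/8, 1/8 (ice-cream, chocolate, cookie, yogurt) and Juliet’s are 1/2 for chocolate, 1/4 for cookie, and 1/8 each for ice-cream and yogurt. None of these numbers are given in the text, but they match both descriptions and the 2.25-bit result.

```python
import math

# ASSUMED frequencies; the post only shows them in charts.
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}
JULIET_FREQ = {"ice-cream": 0.125, "chocolate": 0.5, "cookie": 0.25, "yogurt": 0.125}
# ASSUMED codeword lengths of John's code: log2(1 / p) under his frequencies.
JOHN_LENGTHS = {"ice-cream": 1, "chocolate": 2, "cookie": 3, "yogurt": 3}

def cross_entropy(q, p):
    """H(q, p): average bits per word when words follow q but the code is built for p."""
    return sum(qx * math.log2(1 / p[w]) for w, qx in q.items() if qx > 0)

# Juliet's average length when she texts with John's code.
juliet_avg = sum(qx * JOHN_LENGTHS[w] for w, qx in JULIET_FREQ.items())
print(juliet_avg)                             # 2.25
print(cross_entropy(JULIET_FREQ, JOHN_FREQ))  # 2.25
```

Cross-entropy is not symmetric: under the same assumptions, `cross_entropy(JOHN_FREQ, JULIET_FREQ)`, which is John texting with a code built for Juliet, comes out to 2.375 bits rather than 2.25.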

So, why do we care about cross-entropy in supervised machine learning? Well, cross-entropy gives us a way to measure the difference between the result our model produces and the provided outcome. Since both the result and the provided outcome can be expressed as frequency charts, or probability distributions, cross-entropy fits its role as the cost function well: it is lowest when the model’s distribution matches the provided one, and it grows as the two drift apart. With the cost calculated, we can adjust the model’s parameters to lower it and arrive at the model that best fits the given outcomes.
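In practice, the provided outcome for one training example is often a one-hot distribution (all probability on the correct class), while the model outputs a probability for each class. The Python sketch below is a hand-written version of that cost, not any particular library’s API, and the example predictions are made up for illustration. It uses the natural log, the usual convention in machine learning, which differs from the bit version above only by a constant factor of ln 2.

```python
import math

def cross_entropy_loss(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one, in nats.
    eps guards against log(0) if the model assigns zero probability somewhere."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# Hypothetical example: four classes, the correct one is index 1 (one-hot target).
target = [0.0, 1.0, 0.0, 0.0]
good_guess = [0.1, 0.7, 0.1, 0.1]
bad_guess = [0.4, 0.2, 0.2, 0.2]

print(cross_entropy_loss(target, good_guess))  # about 0.357, that is, -ln 0.7
print(cross_entropy_loss(target, bad_guess))   # about 1.609, that is, -ln 0.2
```

The guess that puts more probability on the correct class gets the lower cost; training adjusts the model’s parameters to push this number down across all the training examples.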

See you next week!


Quakerism’s Influence on my Activism – KC

My interview about the Keystone Coalition for Advancing Sex Education was published today in the Broadly section of Vice News, titled “This Teen is Paving the Way for LGBTQ Inclusive Sex Ed in Schools”. The following post is a follow-up to the interview, which you can read here.

One of the questions I was asked for my Q&A was “Westtown is a Quaker school. Were you raised Quaker? If so, how has that influenced your path regarding Keystone CASE (if at all)? If not, how has your Quaker schooling influenced your path in any way (if at all)?”

Moving below the surface: Artificial Neural Networks — William

As mentioned before, we will be discussing artificial neural networks in this blog. A significant subject in supervised machine learning, neural networks excel at solving classification problems and, when combined with convolution operations (as in convolutional neural networks), are the most popular models for image classification tasks.

Simple Linear Regression in Machine Learning – Kevin

You might remember linear regression from statistics as a method to produce a linear equation that models the relationship between two variables. Not surprisingly, linear regression is quite similar in machine learning, except that the focus is on prediction rather than on interpreting the data. Regression is a supervised learning algorithm (as covered in my previous blog post) that predicts a real-valued output for a given input. In this blog post, I will discuss the model representation of simple linear regression and introduce its cost function.
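The excerpt stops before the details, so here is a hedged Python sketch of one common formulation, not necessarily the notation used in the full post: the hypothesis h(x) = θ0 + θ1·x and the squared-error cost J(θ0, θ1) = (1/2m) · Σ (h(x_i) - y_i)², where m is the number of training examples. The factor of 1/2 is a convention that simplifies the derivative. The data points and the names `predict` and `cost` are hypothetical.

```python
def predict(theta0, theta1, x):
    """Hypothesis of simple linear regression: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Squared-error cost: J = (1 / 2m) * sum of (h(x_i) - y_i) ** 2."""
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical training data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))  # 0.0, since the line y = 2x fits perfectly
print(cost(0.0, 1.0, xs, ys))  # about 2.333, since a worse line costs more
```

Learning then means searching for the θ0 and θ1 that make this cost as small as possible.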


Wetting the Feet: An Introduction to Machine Learning — William

As we discussed in the previous post, machine learning is one of the main branches of artificial intelligence, the field whose aim is to build rational agents. Machine learning is essential to implementing artificial intelligence, because it allows agents to adapt to different scenarios and to predict changes in the evolving environments around them.

Diving Deep Into Deep Learning: Dipping a Toe in the Water — William

Since its birth in the mid-20th century, artificial intelligence has appeared throughout popular culture, from Isaac Asimov’s Three Laws of Robotics in the Handbook of Robotics and HAL in 2001: A Space Odyssey to The Terminator and The Matrix. In recent years, it has even begun to shape our way of life through systems such as AlphaGo and Siri. But what is artificial intelligence? In other words, what is the ultimate goal of artificial intelligence?

A Brief Introduction to Machine Learning – Kevin

There are two widely accepted definitions of machine learning. The phrase was first coined in 1959 by computer scientist Arthur Lee Samuel, who trained a computer program to play checkers against humans. He later described his work as “the field of study that gives computers the ability to learn without being explicitly programmed.” Decades later, Professor Tom Mitchell proposed a more modern and formal definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Semester Goals – KC

Last semester I reached the large majority of my goals. This year, the Independent Seminar is not built into my schedule; I added it on top of six other courses because of my dedication to advancing sex education in this country. I’ll be the first to admit that I’m tight on time, but I still have a long list of goals. This semester I’m focusing on financial planning, team expansion, and event planning.

A New Beginning – Kevin

Why machine learning? It all started back in August when I was getting Westtown Resort ready for this school year.

As I briefly mentioned in my previous blog posts, Resort utilizes a MySQL data table that resembles this one:

Date          A/B Type
2017-09-05    A
2017-09-06    A
2017-09-07    A
2017-09-08    A
2017-09-11    B
2017-09-12    B
⋯             ⋯


Back on Track – KC

For many students, summer is primarily a time to relax and binge-watch the latest Netflix shows. Don’t get me wrong: I did my fair share of relaxing in St. Croix and plowing through episodes of Shameless. But most of my time was spent working, both on my nonprofit and at my full-time job at Jaco Taco.