Moving below the surface (2): Cross-entropy — William

In my previous post, I presented cross-entropy as a cost function that measures the difference between a model's predictions and the given outputs in supervised learning. It is also an important concept in information theory.

Before we talk about what cross-entropy is, let me introduce John, an imaginary freshman entering Westtown. He loves desserts and talks about them a lot. In fact, he says only four words: ice-cream, chocolate, cookie, and yogurt. When he is back home after school, however, John uses only binary code to text his classmates. So, the texts his classmates receive look like this:


To understand what the messages mean, John and his classmates need an established code system, a way to map sequences of binary bits into words. Here is a simple example:


With the code system in hand, John and his classmates can encode text messages by substituting a codeword for each word, and decode them by substituting the words back.
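To make the substitution concrete, here is a minimal sketch in Python. The original code table is not reproduced here, so the 2-bit assignment below (`FIXED_CODE`) is an assumption for illustration only, as are the function names `encode` and `decode`.

```python
# Hypothetical fixed-length code: every word gets exactly 2 bits.
# The actual assignment in the original table is not shown, so this one is assumed.
FIXED_CODE = {
    "ice-cream": "00",
    "chocolate": "01",
    "cookie": "10",
    "yogurt": "11",
}
# Reverse lookup: codeword -> word.
DECODE_TABLE = {bits: word for word, bits in FIXED_CODE.items()}


def encode(words):
    """Replace each word with its codeword and join the bits together."""
    return "".join(FIXED_CODE[word] for word in words)


def decode(bits, width=2):
    """Cut the bit string into chunks of `width` bits and look each one up."""
    return [DECODE_TABLE[bits[i:i + width]] for i in range(0, len(bits), width)]


message = ["ice-cream", "chocolate", "ice-cream", "yogurt"]
bits = encode(message)
print(bits)          # 00010011
print(decode(bits))  # ['ice-cream', 'chocolate', 'ice-cream', 'yogurt']
```

Because every codeword has the same length, decoding only needs to cut the message every two bits.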


It turns out that John does not use his four words equally often. As an ice-cream fan, John shares his ice-cream-loving moments all the time. He sometimes mentions eating chocolate ice-cream with his family, and he rarely mentions anything else. So, his word frequency chart looks like the following:


Now we can plot John's word frequency against the number of bits in each word's codeword. The area formed represents the average number of bits we use to send each word, in other words the average codeword length. The smallest value this average can take over all possible codes is formally named the entropy.


Since typing and sending text messages takes time, John and his classmates want to reduce the average codeword length so that they spend as little time as possible texting the same number of words. This is where variable-length codes come into play: they map commonly used words (like “ice-cream”) to shorter codewords and less frequently used ones (like “cookie” or “yogurt”) to longer codewords. So, we have a new mapping between words and codewords:


Side note: the mapping is not picked at random. It is designed from the word frequencies, and in a way that keeps it uniquely decodable (for example, by making sure no codeword is the beginning of another), so there is no confusion when splitting a message into codewords.
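Here is a small Python sketch of why that property matters. The variable-length mapping `VAR_CODE` is my own assumption (the original table is not shown); it simply gives “ice-cream” the shortest codeword and “cookie” and “yogurt” the longest, as described above.

```python
# Hypothetical variable-length code (assumed, not the original table).
VAR_CODE = {
    "ice-cream": "0",
    "chocolate": "10",
    "cookie": "110",
    "yogurt": "111",
}


def is_prefix_free(code):
    """Return True if no codeword is the beginning of a different codeword."""
    codewords = list(code.values())
    return not any(
        i != j and longer.startswith(shorter)
        for i, shorter in enumerate(codewords)
        for j, longer in enumerate(codewords)
    )


def decode_prefix(bits, code):
    """Read bits left to right and emit a word as soon as a codeword matches."""
    lookup = {codeword: word for word, codeword in code.items()}
    words, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:
            words.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return words


print(is_prefix_free(VAR_CODE))                 # True
print(decode_prefix("0100110111", VAR_CODE))
# ['ice-cream', 'chocolate', 'ice-cream', 'cookie', 'yogurt']
```

With a code like `{"a": "0", "b": "01"}` the check returns `False`, because a reader seeing `0` cannot tell whether the codeword has ended.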

Again, we can plot the word frequency against the number of bits. Here is what it looks like:


The result is amazing: we have reduced the average length to 1.75 bits! This is the best possible code for John's word frequencies. Encoding his words with it takes the minimum number of bits on average, so his text messages are as concise as they can be.
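The 1.75 figure can be checked with a few lines of Python. The frequency chart is not reproduced here, so the frequencies in `JOHN_FREQ` (1/2, 1/4, 1/8, 1/8) and the codeword lengths in `CODE_LENGTHS` are assumptions, chosen because they match the description of John's habits and give exactly 1.75 bits.

```python
from math import log2

# Assumed word frequencies for John (not taken from the original chart).
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}
# Assumed codeword lengths, in bits, of the variable-length code.
CODE_LENGTHS = {"ice-cream": 1, "chocolate": 2, "cookie": 3, "yogurt": 3}


def average_length(freq, lengths):
    """Average bits per word: sum of frequency * codeword length."""
    return sum(p * lengths[word] for word, p in freq.items())


def entropy(freq):
    """Entropy in bits: -sum p * log2(p), the lower bound on the average length."""
    return -sum(p * log2(p) for p in freq.values() if p > 0)


fixed_lengths = dict.fromkeys(JOHN_FREQ, 2)      # the old 2-bit code
print(average_length(JOHN_FREQ, fixed_lengths))  # 2.0
print(average_length(JOHN_FREQ, CODE_LENGTHS))   # 1.75
print(entropy(JOHN_FREQ))                        # 1.75
```

Under these assumed frequencies the variable-length code reaches the entropy exactly, which is why it cannot be improved further.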

Soon after school starts, John meets Juliet, another freshman at Westtown. Juliet is a chocolate-lover. She talks about chocolate all day, and her favourite is chocolate chip cookies. Juliet despises ice-cream though, and mentions it only when necessary. Despite this, they share an obsession with dessert and, interestingly, the same limited vocabulary.


When Juliet starts to use John's code, unfortunately, the text messages she sends are much longer than John's, since the two of them have different word frequencies. If we plot Juliet's word frequency against John's codeword lengths, we can see that her average length per word is as long as 2.25 bits! We call Juliet's average message length using John's code system the cross-entropy.
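As a rough check of the 2.25 figure, here is a Python sketch. Both frequency tables are assumptions, since the charts are not reproduced here: `JOHN_FREQ` is (1/2, 1/4, 1/8, 1/8) and `JULIET_FREQ` is (1/8, 1/2, 1/4, 1/8), chosen to fit the descriptions above and the numbers quoted in the text.

```python
from math import log2

# Assumed word frequencies (not taken from the original charts).
JOHN_FREQ = {"ice-cream": 0.5, "chocolate": 0.25, "cookie": 0.125, "yogurt": 0.125}
JULIET_FREQ = {"ice-cream": 0.125, "chocolate": 0.5, "cookie": 0.25, "yogurt": 0.125}


def cross_entropy(speaker, code_owner):
    """Average bits per word when words follow `speaker`'s frequencies
    but the code was optimized for `code_owner`'s frequencies."""
    return -sum(
        p * log2(code_owner[word]) for word, p in speaker.items() if p > 0
    )


print(cross_entropy(JULIET_FREQ, JOHN_FREQ))    # 2.25  Juliet using John's code
print(cross_entropy(JULIET_FREQ, JULIET_FREQ))  # 1.75  Juliet using a code built for her
```

The gap between the two numbers is the price Juliet pays for using a code that was designed for someone else's habits.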

So, why do we care about cross-entropy in supervised machine learning? Cross-entropy gives us a way to measure the difference between the result our model produces and the provided outcome. Since both the result and the provided outcome can be expressed as frequency charts or probability charts, cross-entropy fits its role as the cost function well. With the cost calculated, we can adjust the parameters to reduce it and arrive at a best-fit model for the given outcomes.
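Here is a minimal sketch of cross-entropy used as a cost, assuming a classifier that outputs one probability per class. The label and the two predictions are made-up numbers for illustration, and the function name `cross_entropy_loss` is my own. Machine learning code usually uses the natural logarithm instead of log base 2, which only rescales the result.

```python
from math import log


def cross_entropy_loss(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one.

    `eps` guards against log(0) when the model predicts a probability of zero.
    """
    return -sum(t * log(max(q, eps)) for t, q in zip(target, predicted))


# Made-up example: the correct class is the second of four.
label = [0.0, 1.0, 0.0, 0.0]
confident_and_right = [0.05, 0.85, 0.05, 0.05]
confident_and_wrong = [0.70, 0.10, 0.10, 0.10]

print(cross_entropy_loss(label, confident_and_right))  # small cost
print(cross_entropy_loss(label, confident_and_wrong))  # much larger cost
```

The better the predicted chart matches the provided one, the smaller the cost, which is what makes it a useful quantity to minimize.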

See you next week!

The Importance of Press – KC

On October 2nd, 2017, a Q&A about the work I’ve been doing was posted on Vice News. It was later added to their national Snapchat story. It’s hard to say how many people saw the article, but this was national coverage, which means a TON of people saw it all across the country and perhaps the world.

Let’s look at the numbers we do have:

We can use Facebook’s article tracking feature to see how many times it was simply shared on the popular social media site. In the past two weeks it has been shared by nearly 2,000 people and popular pages.


I don’t have any way to quantify shares on other post-based social media websites like Twitter, but this gives an audience range on one.

We can, however, extract a few numbers from Snapchat’s stories. Vice News is one of the most popular Snapchat stories, and while the app does not release official viewership counts, NBC released their own count earlier this year.

According to Variety, the multimedia giant garnered a whopping 29 million views in the first month of its new Snapchat story. While this number is probably inflated by first-month promotion, it lets us see the number of people who tune into a specific story, and a new one at that.

It is safe to say that over a hundred thousand people saw the story on Vice. We don’t have any way of quantifying the number of people who then chose to read the article, but they were all able to see this video:


So why do these numbers matter? It’s simple: good press is one of the most crucial parts of any organization or movement. In the two weeks since my article dropped, my mailbox has been flooded with new people wanting to get involved. Leading activists in my field have begun reaching out to partner.

I’m really excited about working with these people and continuing to build my organization. To those trying to build something new, I suggest you start working on news coverage. Reach out to local reporters or people who frequently write about related topics. Start sending press releases when new things happen inside your organization.

These kinds of articles will help propel your message and build a wider audience.

Moving below the surface: Artificial Neural Networks — William

As mentioned before, we will be discussing artificial neural networks in this blog. Being a significant subject in the field of supervised machine learning, neural networks excel at solving classification problems, and, when combined with convolution integrals, are the most popular model for image classification tasks.

Teaching Spanish 2 and first essay – Peirce

Last week I was tasked with both preparing a lesson for T.Dan’s Spanish 2 classes and writing an argumentative essay.

For my lesson I decided I wanted to teach the class about País Vasco, one of the regions in Spain that I have been studying.

What is Sister County? – Qiaochu

In my previous blogs, I mentioned many times the concept of establishing a Sister County agreement between two places. However, I haven’t had a chance to talk about what a Sister County relationship is, what it represents, and what effect it has. So, my blog this week will focus on “What is a Sister County” and some updates on Commissioner Terrence Farrell’s upcoming trip to China.

History of Culture – Perline

I’ve been reading about the history of the Shimenkan village for the past week, and I also organized my notes from the interviews I conducted over the summer.

The minority Miao came to this isolated village to avoid conflict with other minorities that lived in Southern China. Over time, their culture was passed on generation after generation. Before Christianity came to the village, the Miao minority mostly believed in folk religion.

Simple Linear Regression in Machine Learning – Kevin

You might remember linear regression from statistics as a method to produce a linear equation that models the relationship between two variables. Not surprisingly, linear regression is quite similar in machine learning, except that the focus is on the prediction rather than the interpretation of data. Regression is a supervised learning algorithm (if you remember from my previous blog) that predicts real-valued output when given an input. In this blog post, I will discuss the model representation of simple linear regression and introduce its cost function.


Wetting the Feet: An Introduction to Machine Learning — William

As we discussed in the previous post, machine learning is one of the main branches of artificial intelligence, in which we aim to build a rational agent. Machine learning is essential to the implementation of artificial intelligence, for it allows agents to adapt to different scenarios, as well as predict changes in the evolving environment around them.

Meeting the Congressman! – Qiaochu

My original plan for this week’s blog was to focus on the sister county initiative between Chester County and Yanqing District. However, since that part is mainly logistics and might be boring, I decided to focus this week’s blog on my meeting with the congressman, Ryan Costello.

Diving Deep Into Deep Learning: Dipping a Toe in the Water — William

Since its birth in the mid-20th century, artificial intelligence has been present in various aspects of popular culture, from Isaac Asimov’s Three Laws of Robotics in Handbook of Robotics and HAL in 2001: A Space Odyssey, to The Terminator and The Matrix. In recent years it has even influenced our way of life, through AlphaGo and Siri. But what is artificial intelligence? Or, in other words, what is the ultimate goal of artificial intelligence?