R (and my plan) – Baiting

First, I want to wrap up my project in Abstract Algebra. The second round of competition (Delaware Valley Science Fair) ended on April 4th, so my work with Abstract Algebra has come to a temporary end as well. In the second round, I won 3rd Place in 12th Grade Mathematics. That result was worse than last year's, when I won 2nd place. However, I understood from the start that bringing a theory-based project to a science fair was risky. So I feel lucky that I got to do what I always wanted to do, no matter what recognition I won.

In the week after the science fair, I started to work on R, a programming language. I chose R over other languages because it is used extensively in statistics. Since I will most likely major in Mathematics/Statistics in college, R is one of the most helpful programming languages for me to learn.

Next, I want to address a question T. Margaret asked on my last blog post. T. Margaret asked about my final target for the semester and what I would do as a demonstration of learning. To be honest, I am not completely confident that I can bring an impressive project, for several reasons. First, I spent the first half of the semester on Abstract Algebra, so there are no more than 2 months left for R. Second, I had no background in computer programming. Since R is my first programming language, I have to start from the most basic knowledge. And finally, there is no obvious relationship between Abstract Algebra and R, so combining the two would be hard or not very meaningful. As a result, my current learning plan is to go through the basics of R as quickly as possible. Then, I will continue my work from last year on Alzheimer’s disease.

Last year, I studied Alzheimer’s disease using a data set from Kaggle. I used models including Odds Ratio, Logistic Regression, and ROC Curves to analyze which types of people are more likely to have Alzheimer’s disease. When I handed my paper to a doctor of statistics, she advised me to think about other models. She said Logistic Regression is a popular model in public health. However, since different subjects in my data set had different numbers of tests, there was no guarantee that the data points were independent of each other. For instance, if Subject #1 has 5 tests while Subject #2 has only 3, we cannot treat these 8 tests as 8 independent data points, because tests from the same person are related to each other. Standard Logistic Regression assumes independent observations, so it wouldn’t provide the most accurate conclusions here.
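To make the independence problem concrete, here is a minimal sketch in R. The data frame is made up for illustration (the column names `subject` and `demented` and all the values are my own assumptions, not the Kaggle data). It only shows how to count tests per subject and, as one simple option, how to keep a single row per person. It is not the model the statistician suggested.

```r
# Hypothetical toy data: 8 tests in total, 5 from Subject 1 and 3 from Subject 2.
# Column names and values are invented for illustration only.
visits <- data.frame(
  subject  = c(1, 1, 1, 1, 1, 2, 2, 2),
  demented = c(0, 0, 0, 1, 1, 0, 0, 0)
)

# Count how many tests each subject has.
tests_per_subject <- table(visits$subject)
print(tests_per_subject)

# There are 8 rows, but only 2 different people behind them.
n_rows     <- nrow(visits)
n_subjects <- length(unique(visits$subject))
cat("rows:", n_rows, " subjects:", n_subjects, "\n")

# One simple (and lossy) way to get independent rows:
# keep only the first test of each subject.
first_visit <- visits[!duplicated(visits$subject), ]
print(first_visit)
```

Keeping only the first test throws away information, which is part of why a different model may be a better choice.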

My final target for the semester is to research a new statistical model and run it in R on the same data set. Even though there may not be enough time for me to write a whole report, I will bring interesting conclusions. As my demonstration of learning, I expect to talk about my model, my code, and my conclusions from both years. Specifically, I am interested in comparing the similarities and differences between my conclusions. As a long-term plan, I am curious about why these differences exist and whether there are ways to measure how large they are.

In addition to my plans, I really want to talk about some of the code I learned. However, since this is not a tutorial, talking about each function in R would not be helpful. In general, I learned how to set variables, perform basic calculations, identify data types, create vectors and matrices, and draw data plots or basic functions. I have also taught myself a little about “if” statements. Once I have a better understanding of this code, I will certainly share my experience with you! For now, I want to introduce the software and R in general.
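For readers who are curious what these basics look like, here is a small sketch. The variable names and numbers are arbitrary examples I made up, not code from my project.

```r
# Set variables and do basic calculations.
x <- 7
y <- 2
total    <- x + y
quotient <- x / y

# Identify data types.
print(class(x))        # a number
print(class("hello"))  # text
print(class(TRUE))     # a logical value

# Create a vector and a matrix (the matrix is filled column by column).
v <- c(1, 2, 3, 4)
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

# A basic "if" statement.
if (x > y) {
  msg <- "x is larger"
} else {
  msg <- "x is not larger"
}
print(msg)

# Draw data points, then draw a basic function.
plot(v, v^2, main = "Squares")
curve(sin, from = 0, to = 2 * pi)
```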

R (for Windows): This is R itself, where the most fundamental structure and logic of the language live. If you are using Mac or Linux, simply search for R for Mac or R for Linux. It is free.

Some advantages of R are:

1, It is free.

2, It is open source, and you can easily install packages that other people have written.

3, R is easy to install, and the download is only about 50 MB.

4, R is, overall, fairly easy to learn.
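As a rough illustration of how packages work, here is a short sketch. The package name `ggplot2` is only an example I picked. The install line is commented out because it needs an internet connection and only has to be run once.

```r
# One-time install from CRAN (needs internet), shown commented out:
# install.packages("ggplot2")

# Check whether a package is available before loading it.
# "stats" ships with R, so this check should succeed on any installation.
has_pkg <- requireNamespace("stats", quietly = TRUE)
print(has_pkg)

# Load a package so its functions can be used.
library(stats)
```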

R Studio: This is another coding platform for R. You can’t run R Studio without installing R first. It follows the same logic and process as R; however, it is neater and easier to use. For instance, R Studio makes it easy to write and edit your code in a script window before running it, which I found much less convenient in plain R. I feel R Studio is extremely important for beginners. The most basic version is free, and it is more than enough for users like me!

Works Cited

Boysen, Jacob. “MRI and Alzheimers.” kaggle, www.kaggle.com/jboysen/mri-and-alzheimers. Accessed 15 Apr. 2019.

“Logistic Regression.” Carnegie Mellon University Department of Statistics, www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf. Accessed 15 Apr. 2019.

“Plotting and Intrepretating an ROC Curve.” The Darwin Web Server, gim.unmc.edu/dxtests/roc2.htm. Accessed 15 Apr. 2019.

R Programming. Coursera, www.coursera.org/learn/r-programming. Accessed 15 Apr. 2019.

R Studio. R Studio, www.rstudio.com/. Accessed 15 Apr. 2019.

1 thought on “R (and my plan) – Baiting”

  1. sabrina.schoenborn

    So excited to hear about your new plan and your new statistical model! Additionally, I have loved how you continually write your blogs for all people, not just those who understand complex math. Have you thought of making a video series for this? This might be a cool way to show your progress! Keep up the great work!

