My First Data Science Project

JPMC QUANT CHALLENGE 2023

You may not know this about me, but I studied Finance with an emphasis on Investment Management at university. I competed in stock market challenges, studied portfolio management, and researched the fundamentals of different companies, but I had never dived into machine learning in any of my coursework. So when I began studying data science, one question always lingered in my mind: “Can you predict the price of a stock?” (I know this is not a unique idea.) For my first project, I thought it would be fun to try to answer that very question.

I used the JPMC Quant Challenge dataset from Kaggle. The file has 15,000 rows and 12 columns. Each column contains fundamental data about a particular stock, and the dataset covers 100 stocks in total. For this project, I chose to work with a single stock.

The dataset itself was rather clean, and only a few of the columns had outliers.

Histogram for each column
Boxplot for each column
Price of the stock over time

I winsorized the data to tame the outliers, which caps the extreme values rather than dropping rows. You can see the result below.
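For readers new to the term, here is a minimal sketch of winsorizing a single column, assuming NumPy. The 5th and 95th percentile limits, the `winsorize` helper, and the sample values are all illustrative assumptions, not the settings or data from this project.

```python
import numpy as np


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Cap values below/above the given percentiles (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [lower_pct, upper_pct])
    # Values outside [low, high] are pulled back to the nearest limit
    return np.clip(values, low, high)


# Made-up column with two extreme values (not from the Kaggle dataset)
column = np.array([10.0, 11.0, 12.0, 11.5, 10.5, 250.0, -90.0, 11.2, 10.8, 12.3])
capped = winsorize(column)

print("before:", column.min(), column.max())
print("after: ", capped.min(), capped.max())
```

Note that the number of rows stays the same; only the extreme values change.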

At this point, I standardized the dataset. Then I performed a correlation analysis to look at the different relationships between variables and the target. The following three visualizations are what I found to be most interesting.
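As a rough illustration of these two steps, the sketch below standardizes a few feature columns to z-scores and then computes each feature's Pearson correlation with the target. It assumes NumPy and uses randomly generated stand-in data, so the numbers it prints say nothing about the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 200 rows, 3 feature columns, and a target (all synthetic)
features = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
target = 0.5 * features[:, 0] + rng.normal(size=200)

# Z-score standardization: each column ends up with mean 0 and std 1
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# Pearson correlation between each standardized feature and the target
correlations = np.array(
    [np.corrcoef(standardized[:, j], target)[0, 1] for j in range(standardized.shape[1])]
)

print(correlations.round(2))
```

Pearson correlation only measures linear association, so a value near zero does not rule out a non-linear relationship.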

As you can see, the variables often did not correlate linearly with the target, but some of the variables did have a linear correlation with each other (based on visualization alone). During this stage, I also created a new feature called “Working Capital”, defined as current assets minus current liabilities, because those two variables were highly correlated with each other.
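A feature like this takes one line to build. The sketch below assumes pandas, and the column names `current_assets` and `current_liabilities` and the values are made up for illustration; the real dataset may name its columns differently.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the dataset's actual headers
df = pd.DataFrame(
    {
        "current_assets": [120.0, 95.0, 143.5],
        "current_liabilities": [80.0, 100.0, 60.5],
    }
)

# Working capital = current assets - current liabilities
df["working_capital"] = df["current_assets"] - df["current_liabilities"]

print(df)
```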

I tested two different supervised learning models. The first model was Ordinary Least Squares (OLS). In hindsight, OLS was a poor choice. The variables did not appear to be linearly correlated with the target, and the model reflected that with an R-squared value of 0.049, meaning it explained less than 5% of the variance in the target.
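To show why a linear model can score so poorly on a non-linear relationship, here is a small sketch that fits OLS with NumPy's least-squares solver and computes R-squared by hand. The `ols_r_squared` helper and the toy data are assumptions for illustration only, not the code or data from this project.

```python
import numpy as np


def ols_r_squared(X, y):
    """Fit OLS with an intercept and return R-squared (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model can learn an intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


x = np.linspace(-3.0, 3.0, 201)
r2_curved = ols_r_squared(x, x ** 2)         # strong but non-linear relationship
r2_linear = ols_r_squared(x, 2.0 * x + 1.0)  # perfectly linear relationship

print(f"curved: {r2_curved:.3f}, linear: {r2_linear:.3f}")
```

In the curved case the target depends entirely on `x`, yet a straight line explains almost none of the variance.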

The second supervised learning model that I chose for the project was Support Vector Regression (SVR). SVR can model non-linear relationships more effectively than OLS, and that became apparent when running the model. My first run achieved an R-squared value of 0.56.

With the following hyperparameters, I was able to raise the R-squared score to 0.91: gamma=0.5, C=10, epsilon=0.05.
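For anyone who wants to try these settings, the sketch below shows how they could be passed to an SVR model, assuming scikit-learn. The RBF kernel, the train/test split, and the synthetic sine-wave data are my assumptions for the example, so the score it prints is unrelated to the 0.91 reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic non-linear data (illustrative only, not the Kaggle dataset)
rng = np.random.default_rng(42)
X = np.linspace(-3.0, 3.0, 300).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# gamma, C, and epsilon come from the post; the RBF kernel is an assumption
model = SVR(kernel="rbf", gamma=0.5, C=10, epsilon=0.05)
model.fit(X_train, y_train)

# For regressors, score() returns R-squared on the given data
r2 = model.score(X_test, y_test)
print(f"R-squared on held-out rows: {r2:.2f}")
```

Roughly speaking, `C` controls how heavily errors are penalized, `epsilon` sets the width of the band within which errors are ignored, and `gamma` controls how local the kernel's influence is.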

I truly enjoyed this project and am proud of the results. I also admit that the project has limitations. For example, I only modeled one of the 100 stocks available in the dataset, so the analysis can definitely be scaled up and improved. I would also note that a machine learning model for stock prediction should only be used as one more tool in a full analysis, as many outside factors come into play when predicting a stock’s price.

After completing this project, I also performed a time series forecasting analysis on the stock, which I will discuss in a subsequent post.
