
Hello Canisius Math department! I hope everyone had a good Easter weekend. Today, I get the chance to share with you something I’ve worked very hard on: my undergraduate thesis, which I previously completed, on alternative regression techniques in the presence of multicollinearity.

In many fields of study there exist questions of cause and effect. What causes interest rates to rise? Can real estate prices be forecasted? What affects agriculture positively and negatively, and what can be done? In all of these problems, multicollinearity may be present. By definition, multicollinearity is the presence of a near-linear dependence between two or more explanatory variables in regression analysis. Imagine you’re trying to forecast the sales of your food truck during the summer. You correctly assume that nicer weather will bring more customers, so weather is one explanatory variable. As you ponder what else affects sales, you think about events. If there’s a festival or market going on, that would bring more customers as well, so events becomes a second variable. However, nicer weather also affects how many people attend those events, and even whether the events take place at all! The two explanatory variables move together, so it becomes hard to separate the effect of one from the effect of the other.
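To make the food truck example concrete, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the data are simulated rather than real sales figures, the function names `simulate_food_truck` and `vif` are hypothetical, and the slope of 0.9 and noise level of 0.1 are arbitrary choices that make events depend strongly on weather. It measures the dependence in two common ways, the correlation between the two variables and the variance inflation factor (VIF).

```python
import numpy as np


def simulate_food_truck(n=200, noise=0.1, seed=0):
    """Simulate two explanatory variables that move together (made-up data)."""
    rng = np.random.default_rng(seed)
    weather = rng.normal(size=n)
    # Assumption: event attendance is mostly driven by the weather.
    events = 0.9 * weather + noise * rng.normal(size=n)
    return weather, events


def vif(x, others):
    """Variance inflation factor of x, given the other explanatory variables.

    Regress x on the other variables (with an intercept) and return
    1 / (1 - R^2). Values far above 1 signal multicollinearity.
    """
    design = np.column_stack([np.ones(len(x)), others])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    residuals = x - design @ coef
    r_squared = 1.0 - residuals.var() / x.var()
    return 1.0 / (1.0 - r_squared)


weather, events = simulate_food_truck()
corr = np.corrcoef(weather, events)[0, 1]
vif_weather = vif(weather, events)

print(f"correlation between weather and events: {corr:.3f}")
print(f"VIF for weather: {vif_weather:.1f}")
```

With only two explanatory variables, the VIF equals 1 / (1 - r^2), where r is their correlation, so a correlation close to 1 translates directly into a large VIF. A common rule of thumb treats a VIF above 10 as a warning sign, though that cutoff is a convention rather than a strict rule.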

In the common regression method of Ordinary Least Squares, multicollinearity causes serious problems: the coefficient estimates become unstable, with inflated variances, which makes the results of a study hard to interpret and trust. Two alternative methods for approaching this problem are Ridge and Lasso regression. After explaining the problems with Ordinary Least Squares, we will walk through these remedies and discuss the advantages and drawbacks of using each one. I hope you enjoy!