Handling Multicollinearity in GOTO Stock Regression Using PCA, Ridge, LASSO, and PLS
Abstract
This study aims to address the problem of multicollinearity in a multiple regression model of the daily closing stock price of PT GoTo Gojek Tokopedia Tbk (GOTO) during the period from 2022 to early 2025. Multicollinearity occurs when independent variables are highly correlated, which can lead to inefficient and unreliable parameter estimates. GOTO’s stock price experienced high volatility following its Initial Public Offering (IPO) in April 2022, making it necessary to apply appropriate analytical approaches to identify factors influencing its price movements. The study uses the closing price as the dependent variable, with opening price, high price, low price, and trading volume as independent variables. The methods employed include multiple regression and several approaches to handle multicollinearity, namely variable elimination, Principal Component Analysis (PCA), Ridge Regression, LASSO Regression, and Partial Least Squares (PLS) Regression. The initial multiple regression model achieved an R² of 0.9990 and an RMSE of 2.88, but Variance Inflation Factor (VIF) analysis indicated severe multicollinearity. After applying the alternative methods, PLS Regression demonstrated the best performance, with an R² of 0.9990 and an RMSE of 0.0318. Therefore, it can be concluded that PLS Regression is a more stable and accurate method for addressing multicollinearity and improving the prediction of GOTO’s stock prices.
Abstract (translated from Indonesian)
This study addresses multicollinearity in a multiple regression model of the daily closing share price of PT GoTo Gojek Tokopedia Tbk (GOTO) from 2022 to early 2025. Multicollinearity arises when independent variables are strongly correlated with one another, which makes parameter estimates inefficient and less accurate. GOTO's share price has been highly volatile since its IPO in April 2022, so an appropriate analytical approach is needed to identify the factors that drive its price movements. The data use the variable Terakhir (last, i.e. the closing price) as the dependent variable and Pembukaan (open), Tertinggi (high), Terendah (low), and Volume as independent variables. The methods comprise multiple regression and several remedies for multicollinearity: variable elimination, Principal Component Analysis (PCA), Ridge Regression, LASSO Regression, and Partial Least Squares (PLS) Regression. The initial model yielded an R² of 0.9990 and an RMSE of 2.88, but its VIF values indicated high multicollinearity. After the alternative methods were applied, PLS Regression performed best, with R² = 0.9990 and RMSE = 0.0318. PLS Regression is therefore judged the most stable and accurate of the methods for handling multicollinearity and improving the accuracy of GOTO share price predictions.
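The VIF diagnostic mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The data are synthetic and only mimic the structure of open, high, low, and volume series; they are not the study's GOTO data, and every name in the block (`correlation_matrix`, `invert`, `vif`, `open_`, and so on) is a hypothetical choice for illustration rather than the authors' implementation.

```python
import random
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    means = [statistics.fmean(c) for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Synthetic predictors (assumption: NOT the study's data).
rng = random.Random(0)
open_, high, low, volume = [], [], [], []
price = 100.0
for _ in range(300):
    price = max(price + rng.gauss(0, 3), 1.0)   # random-walk opening price
    open_.append(price)
    high.append(price + abs(rng.gauss(0, 1)))   # high sits just above open
    low.append(price - abs(rng.gauss(0, 1)))    # low sits just below open
    volume.append(rng.uniform(1e6, 5e6))        # volume unrelated to price

vifs = vif([open_, high, low, volume])
for name, value in zip(["open", "high", "low", "volume"], vifs):
    print(f"{name:>6}: VIF = {value:.2f}")
```

Under a common rule of thumb, a VIF above 10 signals problematic multicollinearity. In this synthetic setup the three price series move almost in lockstep and so receive very large VIFs, while the independent volume series stays close to 1, which is the pattern that motivates remedies such as PCA, Ridge, LASSO, and PLS.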






