Title: On the Choice of Linear Regression Algorithms for Biological and Ecological Applications
Authors: Vieira, Vasco
Creed, Joel
Scrosati, Ricardo A.
Santos, Anabela
Dutschke, Georg
Leitão, Francisco
Engelen, Aschwin H.
Huanel, Oscar R.
Guillemin, Marie-Laure
Mateus, Marcus
Neves, Ramiro
Keywords: Model II regression
Principal Components Analysis
Reduced Major Axis
Issue Date: 2016
Publisher: Annual Research & Review in Biology
Citation: Vieira, V.; Creed, J.; Scrosati, R. A.; Santos, A.; Dutschke, G. et al. (2016). On the Choice of Linear Regression Algorithms for Biological and Ecological Applications. Annual Research & Review in Biology, Vol. 10(3), 1-9.
Abstract: Model II regression (i.e. minimizing residuals obliquely) is the adequate alternative to Model I regression by Ordinary Least Squares (i.e. minimizing residuals vertically) when no dependence relationship is well established or when x is measured with error. Yet, it has no perfect solution. Determining the true slope from errors-in-the-variables models requires estimating the errors in x and y from higher-order moments. However, their accurate estimation requires enormous data sets, and thus they are not applicable to most ecological problems. The alternative Reduced Major Axis (RMA) depends on a strict set of assumptions that are hardly ever met with real data, making it prone to bias, whereas Principal Components Analysis (PCA) becomes less reliable as correlations decrease when x and y have similar variances. We used artificial data (allowing for the determination of the true slope) to demonstrate when RMA or PCA should be preferred. Consequently, we propose using PCA whenever r^2 + s_x^2/s_y^2 is higher than 1.5. Otherwise, we suggest generating artificial data manipulated to match the structure of the original, and testing which method provides estimates closer to the true input slope. We provide a user-friendly script to perform this task. We tested the use of RMA and PCA with real data about intraspecific and interspecific biomass-density relations in algae and seagrass, algae frond growth, crustacean and bird morphometry, sardine fisheries and social sciences data, commonly finding widely divergent slope estimates leading to severely biased parameter estimations and model applications. These analyses support the suggested approach for method selection summarized above.
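The selection rule stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script. The function name `model2_slopes` and the returned keys are hypothetical, the criterion is read as r^2 + s_x^2/s_y^2 with sample variances, and "PCA" is assumed to mean the major axis (first principal component) of the covariance matrix of the unstandardized x and y. The fallback step with artificial data is not covered here.

```python
import math


def model2_slopes(x, y):
    """Return RMA and major-axis (PCA) slopes plus the abstract's criterion.

    Hypothetical helper for illustration only; assumes paired numeric
    samples with non-zero variances and non-zero covariance.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be paired samples with n >= 3")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance (n - 1 denominator).
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xx == 0 or s_yy == 0 or s_xy == 0:
        raise ValueError("variances and covariance must be non-zero")

    r = s_xy / math.sqrt(s_xx * s_yy)

    # Reduced Major Axis: ratio of standard deviations, signed by r.
    rma_slope = math.copysign(math.sqrt(s_yy / s_xx), r)

    # Major axis: slope of the first eigenvector of the covariance matrix.
    pca_slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)

    # Criterion as read from the abstract (assumed form): r^2 + s_x^2 / s_y^2.
    criterion = r ** 2 + s_xx / s_yy

    return {
        "r": r,
        "rma_slope": rma_slope,
        "pca_slope": pca_slope,
        "criterion": criterion,
        "prefer_pca": criterion > 1.5,
    }


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    result = model2_slopes([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.4, 3.8, 5.3])
    print(result)
```

When `prefer_pca` is false, the abstract recommends simulating data with a known slope and comparing both estimators rather than defaulting to RMA.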
Appears in Collections: A CE/MKT - Artigos

Files in This Item:
File: document.pdf
Size: 327.6 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.