Issues in building multivariable regression models and the importance of transparent reporting


For many years the quality of research in the health sciences has been criticized, and it is clear that ‘waste in research’ has to be reduced (Ioannidis et al., 2014). Problems in the design, analysis and reporting of studies are among the most important reasons for this disappointing situation. Concerns about deficiencies in statistical methods and their application have been raised consistently over many years (Altman et al 1994, Sauerbrei 2005). Statistical methodology has developed substantially, but many methods are ignored in practice, and insufficient statistical knowledge in the research community is often emphasized. Fishing for significant p-values is known to produce many false positive results (Kyzas et al 2005). The untapped potential of observational research to inform clinical decision making is well known (Visvanathan et al., 2017). The key issues are therefore the use of rigorous and suitable methods for the design and analysis of a study and the transparent reporting of its results.

During the last two decades several initiatives have been started that aim to improve the research process. Transparent and complete reporting is a prerequisite for judging the usefulness of data and for interpreting study results in the appropriate context. Reporting guidelines have been developed for many different types of studies, and the EQUATOR network acts as an “umbrella” for developers of such guidelines (Simera et al 2010, Moher et al 2014, Altman et al 2012, Moons et al 2015).

The development of guidance for the statistical analysis of observational studies is a difficult area that warrants more effort. The STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative was recently founded to address these issues (Sauerbrei et al 2014). Currently there are nine topic groups (TG), each working on specific tasks such as study design, missing data, measurement error and misclassification, causal inference or high-dimensional data.

With an emphasis on topic groups TG2 ‘Selection of variables and their functional forms in multivariable analysis’ and TG6 ‘Evaluating diagnostic tests and prediction models’ we will illustrate the concepts, structure and general approach of the STRATOS initiative. It will become apparent that considerable research is required to gain more insight into the advantages and disadvantages of competing strategies. We will concentrate on strategies for variable selection, the role of shrinkage, and the multivariable fractional polynomial (MFP) approach, which combines variable selection with selection of the functional form for continuous variables.
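To make the idea of selecting a functional form concrete, the following is a minimal sketch of a first-degree fractional polynomial (FP1) search for a single positive continuous covariate in a linear model. It is an illustration under stated assumptions, not the MFP procedure itself: MFP additionally considers second-degree functions and uses a test-based function selection procedure together with backward elimination across several covariates. The function names (fp_transform, fit_fp1) and the simulated data are hypothetical.

```python
import numpy as np

# Conventional FP power set; power 0 denotes the natural logarithm.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Apply a fractional polynomial power transformation (x must be positive)."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x ** p


def fit_fp1(x, y):
    """Fit y = b0 + b1 * x^(p) by least squares for each power p
    and return (power, residual sum of squares, coefficients) of the best fit."""
    best = None
    for p in FP_POWERS:
        design = np.column_stack([np.ones_like(x), fp_transform(x, p)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ beta) ** 2))
        if best is None or rss < best[1]:
            best = (p, rss, beta)
    return best


# Simulated example (assumption): the true relationship is logarithmic.
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 10.0, size=300)
y = 2.0 + 1.5 * np.log(x) + rng.normal(scale=0.3, size=300)

best_power, best_rss, best_beta = fit_fp1(x, y)
print("best FP1 power:", best_power, "RSS:", round(best_rss, 2))
```

Because the power 1 (a straight line) is a member of the candidate set, the best FP1 fit can never have a larger residual sum of squares than the linear fit; whether the improvement justifies the more complex function is a separate question that this sketch does not address.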

In the context of prognostic marker research we will illustrate common weaknesses in the design, analysis and reporting of studies, with an emphasis on continuous variables. Problems caused by categorization will be discussed and we will argue that modelling continuous variables has numerous advantages (Sauerbrei and Royston 2010). The main aims of the PROGnosis RESearch Strategy (PROGRESS) partnership will be outlined (Hemingway et al 2013, Riley et al 2013, Riley et al 2019).

In many statistical analyses, models are fitted to the data but results are presented without noting that small perturbations in the data might lead to major changes in the model (Royston and Sauerbrei, 2008). The assessment of model stability using resampling methods will therefore be presented with a practical example.
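One common way to examine model stability is to repeat the variable selection in bootstrap samples and record how often each variable is selected. The sketch below illustrates this with simulated data; it is not the practical example referred to above. It assumes a linear model, backward elimination with a fixed threshold of 1.96 on the absolute t-statistic, and 200 bootstrap replications, and all function names are hypothetical.

```python
import numpy as np


def ols_t_values(X, y):
    """Return the t-statistics of the columns of X in an OLS fit with intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p - 1)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return (beta / np.sqrt(np.diag(cov)))[1:]  # drop the intercept


def backward_elimination(X, y, t_crit=1.96):
    """Remove the weakest variable until all remaining ones have |t| >= t_crit."""
    active = list(range(X.shape[1]))
    while active:
        t_abs = np.abs(ols_t_values(X[:, active], y))
        worst = int(np.argmin(t_abs))
        if t_abs[worst] >= t_crit:
            break
        active.pop(worst)
    return active


def bootstrap_inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Proportion of bootstrap samples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        selected = backward_elimination(X[idx], y[idx])
        counts[np.array(selected, dtype=int)] += 1
    return counts / n_boot


# Simulated data (assumption): one strong effect, one weak effect, three noise variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 1.0 * X[:, 0] + 0.15 * X[:, 1] + rng.normal(size=200)

freq = bootstrap_inclusion_frequencies(X, y)
print("bootstrap inclusion frequencies:", np.round(freq, 2))
```

Variables with inclusion frequencies close to 1 are selected almost regardless of the sample drawn, whereas intermediate frequencies indicate that the selected model depends strongly on small perturbations of the data.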

