Multivariable Model-building

A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables


Patrick Royston and Willi Sauerbrei, Wiley Series in Probability and Statistics, Wiley, 2008

Additional material including datasets, programs and teaching material:
* in preparation

Last updated 7 September 2009

Contact: Patrick Royston and Willi Sauerbrei



Book Description


Multivariable regression models are widely used in all areas of science in which empirical data are analysed. Using the multivariable fractional polynomials (MFP) approach, this book focuses on the selection of important variables and the determination of functional form for continuous predictors. Despite being relatively simple, the selected models often extract most of the important information from the data. The authors have chosen to concentrate on examples drawn from medical statistics, although the MFP method has applications in many other subject-matter areas as well.

Multivariable Model-Building:



This book provides a readable text giving the rationale of, and practical advice on, a unified approach to multivariable modelling. It aims to make multivariable model building simpler, more transparent and more effective. It is aimed at graduate students studying regression modelling and at professionals in statistics, as well as researchers from the medical, physical, social and many other sciences in which regression models play a central role.
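To give a flavour of what "determination of functional form" means for a single continuous predictor, here is a minimal Python sketch of a first-degree fractional polynomial (FP1) search. It is an illustration only, not the authors' software: it assumes the conventional FP power set (-2, -1, -0.5, 0, 0.5, 1, 2, 3), where power 0 denotes the natural logarithm, and it simply picks the power with the smallest residual sum of squares. The full MFP procedure described in the book also considers second-degree functions and uses significance testing to choose between candidate functions. The function names and the simulated data are hypothetical.

```python
import numpy as np

# Conventional FP power set; power 0 is read as log(x) by convention.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return the FP transformation of x: x**p, or log(x) when p == 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("fractional polynomials require positive x")
    return np.log(x) if p == 0 else x ** p


def best_fp1(x, y):
    """Fit y = b0 + b1 * fp_transform(x, p) by least squares for each power.

    Returns (p, rss) for the power with the smallest residual sum of squares.
    This is a simplification: no significance testing is performed.
    """
    y = np.asarray(y, dtype=float)
    results = []
    for p in FP_POWERS:
        design = np.column_stack([np.ones(len(y)), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        results.append((p, rss))
    return min(results, key=lambda item: item[1])


# Hypothetical simulated data with a logarithmic relationship.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = 2.0 + 3.0 * np.log(x) + rng.normal(scale=0.01, size=200)

best_power, best_rss = best_fp1(x, y)
print("selected FP1 power:", best_power)
```

Choosing by residual sum of squares alone keeps the sketch short; in practice the choice between a linear term, an FP1 and an FP2 function should follow the testing procedure described in the book.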





Table of Contents


1. Introduction
2. Selection of variables
3. Handling categorical and continuous predictors
4. Fractional polynomials for one variable
5. Some issues with univariate FP models
6. MFP: multivariable model-building with fractional polynomials
7. Interactions
8. Model stability
9. Some comparisons of MFP with splines
10. How to work with MFP
11. Special topics involving fractional polynomials
12. Epilogue
Appendix A: Data and software resources 
Appendix B: Glossary of Abbreviations
References
Index

Order

Datasets


For more details about the data, see Appendix A of the book.


Datasets used once in our book:
No.  Name                         Outcome   Obs    Events  Vars
01   Myeloma                      Survival  65     48      16
02   Freiburg DNA breast cancer   Survival  109    56      1
03   Cervix cancer                Binary    899    141     21
04   Nerve conduction             Cont.     406    N/A     1
05   Triceps skinfold thickness   Cont.     892    N/A     1
06   Diabetes                     Cont.     42     N/A     2
07   Advanced prostate cancer     Survival  475    338     13
08   Quit smoking study           Cont.     250    N/A     3
09   Breast cancer diagnosis      Binary    458    133     6
10   Boston housing               Cont.     506    N/A     13
11   Pima Indians                 Binary    768    268     8
12   Rotterdam breast cancer      Survival  2982   1518    11
13   Fetal growth                 Cont.     574    N/A     1
14   Cholesterol (not available)  Cont.     553    N/A     1


Datasets used more than once in our book:
No.  Name                         Outcome   Obs    Events  Vars
15   Research body fat            Cont.     326    N/A     1
16   GBSG breast cancer           Survival  686    299     9
17   Educational body fat         Cont.     252    N/A     13
18   Glioma                       Survival  411    274     15
19   Prostate cancer              Cont.     97     N/A     7
20   Whitehall 1                  Survival  17260  2576    10
     Whitehall 1                  Binary    17260  1670    10
21   PBC                          Survival  418    161     17
22   Oral cancer                  Binary    397    194     1
23   Kidney cancer                Survival  347    322     10


Simulated data set from chapter 10

ART Study Cont. 250 N/A 10

Extended to 10 replicates of 500 observations, 5000 observations altogether.


Dataset references, background or analyses


 1. Myeloma
     Krall, J. M., Uthoff, V. A. and Harley, J. B. (1975). A step-up procedure for selecting variables
     associated with survival, Biometrics 31: 49-57.

 2. Freiburg DNA breast cancer
     Pfisterer, J., Kommoss, F., Sauerbrei, W., Menzel, D., Kiechle, M., Giese, E., Hilgarth, M. and
     Pfleiderer, A. (1995). DNA flow cytometry in node positive breast cancer: Prognostic value
     and correlation to morphological and clinical factors, Analytical and Quantitative Cytology and
     Histology 17: 406-412.


 3. Cervix cancer   
     Collett, D. (2003). Modelling binary data, second edn, Chapman & Hall/CRC, Boca Raton.


 4. Nerve conduction (no reference)
   

 5. Triceps skinfold thickness
     Cole, T. J. and Green, P. J. (1992). Smoothing reference centile curves: the LMS method and penalized
     likelihood, Statistics in Medicine 11: 1305-1319.


 6. Diabetes
     Sockett, E. B., Daneman, D., Clarson, C. and Ehrich, R. M. (1987). Factors affecting and patterns
     of residual insulin secretion during first year of Type I (insulin-dependent) diabetes mellitus in
     children, Diabetologia 30: 453–459.


 7. Advanced prostate cancer
      Byar, D. P. and Green, S. B. (1980). The choice of treatment for cancer patients based on covariate information:
      application to prostate cancer, Bulletin du Cancer 67: 477–490.


 8. Quit smoking study
      Cohen, J., Cohen, P., West, S. G. and Aiken, L. S. (2003). Applied Multiple Regression/Correlation
      Analysis for the Behavioral Sciences, third edn, Lawrence Erlbaum Associates, New Jersey.


 9. Breast cancer diagnosis
      Sauerbrei, W., Madjar, H. and Prömpeler, H. J. (1998). Differentiation of benign and malignant breast
      tumors by logistic regression and a classification tree using Doppler flow signals, Methods of
      Information in Medicine 37: 226–234.


 10. Boston housing
       Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air, Journal
       of Environmental Economics and Management 5: 81-102.


11. Pima Indians
      Royston, P. (2005). Multiple imputation of missing values: update of ICE, Stata Journal 5: 527-536.


12. Rotterdam breast cancer
      Sauerbrei, W., Royston, P. and Look, M. (2007). A new proposal for multivariable modelling
      of time-varying effects in survival data based on fractional polynomial time-transformation,
      Biometrical Journal 49: 453-473.


13. Fetal growth
      Altman, D. G. and Chitty, L. S. (1993). Design and analysis of studies to derive charts of fetal size,
      Ultrasound in Obstetrics and Gynecology 3: 378-384.

14. Cholesterol dataset (not available)
      Mann, J. I., Lewis, B., Shepherd, J., Winder, A. F., Fenster, S., Rose, L. and Morgan, B. (1988). Blood
      lipid concentrations and other cardiovascular risk factors: distribution, prevalence and detection in
      Britain, British Medical Journal 296: 1702–1706.
  

15. Research body fat
      Luke, A., Durazo-Arvizu, R. and others (1997). Relation between body mass index and body fat in
      black population samples from Nigeria, Jamaica, and the United States, American Journal of
      Epidemiology 145: 620-628.


16. GBSG breast cancer
      Sauerbrei, W. and Royston, P. (1999). Building multivariable prognostic and diagnostic models:
      transformation of the predictors using fractional polynomials, Journal of the Royal Statistical
      Society, Series A 162: 71-94.
   

17. Educational body fat
      Johnson, R. W. (1996). Fitting percentage of body fat to simple body measurements, Journal of
      Statistics Education 4(1).
   

18. Glioma
      Sauerbrei, W. and Schumacher, M. (1992). A bootstrap resampling procedure for model building:
      application to the Cox regression model, Statistics in Medicine 11: 2093–2109.


19. Prostate cancer
      Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A. and Yang, N.
      (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the
      prostate. II. Radical prostatectomy treated patients, Journal of Urology 141: 1076–1083.



20. Whitehall 1
     Royston, P., Ambler, G. and Sauerbrei, W. (1999). The use of fractional polynomials to model
     continuous risk variables in epidemiology, International Journal of Epidemiology 28: 964-974.


21. PBC
     Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis, John Wiley &
     Sons, Ltd/Inc., New York.


22. Oral cancer
      Rosenberg, P. S., Katki, H., Swanson, C. A., Brown, L. M., Wacholder, S. and Hoover, R. N. (2003).
      Quantifying epidemiologic risk factors using nonparametric regression: model selection remains the
      greatest challenge, Statistics in Medicine 22: 3369-3381.

23. Kidney cancer
      Royston, P., Sauerbrei, W. and Ritchie, A. W. S. (2004). Is treatment with interferon-α effective in
      all patients with metastatic renal carcinoma? A new approach to the investigation of interactions,
      British Journal of Cancer 90: 794–799.


Programs (only Stata programs are available)
