Massive Parallelization of Serial Inference Algorithms for a Complex Generalized Linear Model

Abstract

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this article we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety.
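For orientation, a cyclic coordinate descent step for maximum a posteriori estimation can be written roughly as below. This is a generic sketch for a smooth prior. The symbols ($L$ for the log-likelihood, $p$ for the prior density, $g_j$ and $h_j$ for the one-dimensional gradient and Hessian) are assumed here for illustration and are not quoted from the article.

```latex
\hat{\boldsymbol{\beta}}
  = \arg\max_{\boldsymbol{\beta}}
    \left\{ L(\boldsymbol{\beta}) + \log p(\boldsymbol{\beta}) \right\},
\qquad
\beta_j \leftarrow \beta_j - \frac{g_j}{h_j},
\quad
g_j = \frac{\partial}{\partial \beta_j}
      \left[ L(\boldsymbol{\beta}) + \log p(\boldsymbol{\beta}) \right],
\quad
h_j = \frac{\partial^2}{\partial \beta_j^2}
      \left[ L(\boldsymbol{\beta}) + \log p(\boldsymbol{\beta}) \right].
```

The update is applied to one coordinate $j$ at a time, cycling through all predictors with the remaining coefficients held fixed, until the changes become negligible.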

References

  1. Baskaran, M. and Bordawekar, R. 2009. Optimizing sparse matrix-vector multiplication on GPUs. IBM Res. rep. RC24704.
  2. Bell, N. and Garland, M. 2009. Efficient sparse matrix-vector multiplication in CUDA. In Proceedings of the ACM/IEEE Conference Supercomputing (SC). ACM, New York.
  3. Chatterjee, A. and Lahiri, S. 2011. Bootstrapping lasso estimators. J. Amer. Statist. Assoc. 106, 608--625.
  4. Coplan, P., Noel, R., Levitan, B., Ferguson, J., and Mussen, F. 2011. Development of a framework for enhancing the transparency, reproducibility and communication of the benefit--risk balance of medicines. Clin. Pharm. Therapeutics 89, 312--315.
  5. Curtis, J., Cheng, H., Delzell, E., Fram, D., Kilgore, M., Saag, K., Yun, H., and DuMouchel, W. 2008. Adaptation of Bayesian data mining algorithms to longitudinal claims data: coxib safety as an example. Medical Care 46, 9, 969--975.
  6. Dennis Jr., J. and Schnabel, R. 1989. A view of unconstrained optimization. Handbooks in Oper. Res. Manage. Sci. 1, 1--72.
  7. d’Esopo, D. 1959. A convex programming procedure. Naval Res. Logi. Quart. 6, 1, 33--42.
  8. Efron, B. and Tibshirani, R. 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1, 54--75.
  9. Farrington, C. 1995. Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 51, 228--235.
  10. Funk, M., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M., and Davidian, M. 2011. Doubly robust estimation of causal effects. Amer. J. Epidemiol. 173, 7, 761--767.
  11. Genkin, A., Lewis, D., and Madigan, D. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 3, 291--304.
  12. Harris, M. 2010. Optimizing parallel reduction in CUDA. nVidia, online.
  13. Jin, H., Chen, J., He, H., Williams, G., Kelman, C., and O’Keefe, C. 2008. Mining unexpected temporal associations: Applications in detecting adverse drug reactions. IEEE Trans. Inf. Tech. Biomed. 12, 4, 488--500.
  14. Kulldorff, M., Davis, R., Kolczak, M., Lewis, E., Lieu, T., and Platt, R. 2011. A maximized sequential probability ratio test for drug and vaccine safety surveillance. Sequent. Anal. 30, 1, 58--78.
  15. Kyung, M., Gill, J., Ghosh, M., and Casella, G. 2010. Penalized regression, standard errors, and Bayesian lassos. Bay. Anal. 5, 2, 369--412.
  16. Lange, K. 1995. A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Stat. Soc. Ser. B 57, 425--437.
  17. Lee, A., Yau, C., Giles, M., Doucet, A., and Holmes, C. 2010. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J. Comput. Graph. Stat. 19, 4, 769--789.
  18. Li, L. 2009. A conditional sequential sampling procedure for drug safety surveillance. Stat. Med. 28, 25, 3124--3138.
  19. Madigan, D., Ryan, P., Simpson, S., and Zorych, I. 2011. Bayesian methods in pharmacovigilance. In Bayesian Statistics 9. Oxford University Press, Oxford, UK, 421--438.
  20. Nelder, J. and Wedderburn, R. 1972. Generalized linear models. J. Roy. Stat. Soc. Ser. A (General) 135, 370--384.
  21. Norén, G., Bate, A., Hopstadius, J., Star, K., and Edwards, I. 2008. Temporal pattern discovery for trends and transient effects: its application to patient records. In Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, 963--971.
  22. Park, M. and Hastie, T. 2007. L1-regularization path algorithm for generalized linear models. J. Roy. Stat. Soc. Ser. B 69, 4, 659.
  23. Ryan, P., Suchard, M., and Madigan, D. 2012. Learning from epidemiology: A framework for interpreting large-scale observational database studies. Under review.
  24. Schneeweiss, S., Rassen, J., Glynn, R., Avorn, J., Mogun, H., and Brookhart, M. 2009. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 20, 4, 512--522.
  25. Silberstein, M., Schuster, A., Geiger, D., Patney, A., and Owens, J. 2008. Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd Annual International Conference on Supercomputing. ACM, New York, 309--318.
  26. Simpson, S. 2011. Self-controlled methods for postmarketing drug safety surveillance in large-scale longitudinal data. Ph.D. thesis, Columbia University.
  27. Stang, P., Ryan, P., Racoosin, J., Overhage, J., Hartzema, A., Reich, C., Welebob, E., Scarnecchia, T., and Woodcock, J. 2010. Advancing the science for active surveillance: rationale and design for the observational medical outcomes partnership. Ann. Internal Med. 153, 9, 600--606.
  28. Suchard, M. and Rambaut, A. 2009. Many-core algorithms for statistical phylogenetics. Bioinformatics 25, 11, 1370--1376.
  29. Suchard, M., Wang, Q., Chan, C., Frelinger, J., Cron, A., and West, M. 2010. Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J. Computat. Graph. Stat. 19, 2, 419--438.
  30. Tibbits, M., Haran, M., and Liechty, J. 2011. Parallel multivariate slice sampling. Stat. Comput. 21, 415--430.
  31. Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58, 1, 267--268.
  32. Veldhuizen, T. 1995. Expression templates. C++ Report 7, 5, 26--31.
  33. Warga, J. 1963. Minimizing certain convex functions. J. Soc. Indust. Appl. Math. 11, 3, 588--593.
  34. Wilkinson, D. 2006. Parallel Bayesian computation. In Handbook of Parallel Computing and Statistics. Chapman & Hall/CRC, New York, 481--512.
  35. Wu, T. and Lange, K. 2008. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 1, 224--244.
  36. Wu, T., Chen, Y., Hastie, T., Sobel, E., and Lange, K. 2009. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25, 6, 714--721.
  37. Zhang, T. and Oles, F. 2001. Text categorization based on regularized linear classification methods. Inf. Ret. 4, 1, 5--31.
  38. Zhou, H., Lange, K., and Suchard, M. 2010. Graphics processing units and high-dimensional optimization. Stat. Sci. 25, 3, 311--324.

    Reviews

    Amos O Olagunju

    Computers with multicore central processing units (CPUs) and multiple graphics processing units (GPUs) exist today to speed up the parallel processing of computationally intensive statistical prediction algorithms. Unfortunately, several of the existing statistical algorithms [1], designed to cope with the instantaneous scrutiny of record-keeping systems in areas such as healthcare, are still sequential and thus computationally deficient. How should efficient statistical algorithms be designed to uncover and use the current and historical trends in medical claim databases to reliably predict the medical products associated with adverse events such as myocardial infarction or severe renal and liver collapse? The authors of this paper critique the limitations of the existing statistical algorithms for coping with regulation and compliance issues in healthcare industries. They recognize the need to explore the parallelization capability of GPUs for solving generalized linear models (GLMs) that involve the solution of computationally intensive log-likelihood functions. Readers who are unfamiliar with computational statistics should browse Kennedy and Gentle's introduction [1] to sequential algorithms for solving unconstrained optimization and nonlinear regression, prior to exploring the insightful parallel algorithms used in this paper to solve GLMs with Bayesian priors or indefinite parameter regularization. The authors present a sequential cyclic coordinate descent algorithm used to fit the familiar Bayesian self-controlled case series. The algorithm targets the time-wasting computation of 1D gradients and Hessian matrices for extensive parallelization. They cleverly show how to represent and manipulate sparse matrices and dense vectors in parallel to derive the gradients and Hessians, and apply the parallel algorithms to compute the maximum a posteriori (MAP) probability estimates for numerous observational healthcare databases.
    Using GPUs to perform the sparse operations significantly increases the speed of the MAP estimation, compared to using CPUs to execute the sparse or dense computation. Exploiting parallel algorithms to fit complex GLMs to huge datasets offers new opportunities for associating adverse events with specific drugs, while controlling for covariates such as patient demographics, coexisting diseases, and coinciding drugs. However, a complete Bayesian analysis of the entire set of unidentified parameters is missing from the proposed model. Clearly, the authors recognize the roles of cross-validation and bootstrapping in estimating the hyperparameters of the model. However, are accurate estimates of the model hyperparameters really computationally infeasible, as the authors claim? I strongly encourage all computational statisticians to read this perceptive paper and weigh in on this question.

    Online Computing Reviews Service
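The review describes cyclic coordinate descent driven by one-dimensional gradients and Hessians computed from sparse matrices and dense vectors. The following is a minimal serial sketch of that idea, under stated assumptions: it uses ordinary logistic regression with independent normal priors as a stand-in for the article's conditioned model, it runs on the CPU in plain Python, and the names `to_sparse_columns`, `ccd_logistic_map`, and `max_step` are hypothetical rather than taken from the authors' software.

```python
import math


def to_sparse_columns(rows, n_cols):
    """Store each column as a list of (row_index, value) pairs for its non-zeros."""
    cols = [[] for _ in range(n_cols)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != 0:
                cols[j].append((i, value))
    return cols


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def ccd_logistic_map(cols, y, prior_variance=1.0, sweeps=100, tol=1e-8, max_step=1.0):
    """MAP fit of logistic regression with independent N(0, prior_variance) priors."""
    beta = [0.0] * len(cols)
    eta = [0.0] * len(y)  # dense linear predictor X @ beta, updated incrementally
    for _ in range(sweeps):
        largest_change = 0.0
        for j, col in enumerate(cols):
            # One-dimensional gradient and Hessian of the negative log posterior.
            # Each term touches one non-zero entry, so on a GPU these sums would
            # be natural candidates for a parallel reduction.
            grad = beta[j] / prior_variance
            hess = 1.0 / prior_variance
            for i, x in col:
                p = sigmoid(eta[i])
                grad += x * (p - y[i])
                hess += x * x * p * (1.0 - p)
            # Newton step, clamped as a crude safeguard (an assumption of this sketch).
            delta = max(-max_step, min(max_step, -grad / hess))
            beta[j] += delta
            for i, x in col:  # only rows where column j is non-zero change
                eta[i] += x * delta
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return beta


if __name__ == "__main__":
    rows = [[1, 0, 2], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
    y = [1, 0, 1, 0, 0, 1]
    print(ccd_logistic_map(to_sparse_columns(rows, 3), y))
```

Keeping the dense vector `eta` up to date means each coordinate update costs time proportional to the number of non-zeros in that column, which is the property that makes sparse storage attractive for data of this kind.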
