Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. To this end, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), the Medulloblastoma (MB) subgroups Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT) and Group SHH (MB_SHH), and Pilocytic Astrocytoma (PiloAstro).
Fit a logistic regression model using Firth's bias reduction method, which is equivalent to penalizing the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression; see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off so that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now also available in the package; see Puhr et al. (2017) <doi:10.1002/sim.7273>.
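As a minimal sketch of how Firth's correction works, the Python code below adds the hat-matrix term to the logistic score and solves X'(y - p + h(0.5 - p)) = 0 by Fisher scoring, where h holds the diagonal of the weighted hat matrix. It is an illustration under simplifying assumptions (standard library only, a per-component step cap, no step-halving on the penalized likelihood, no profile-likelihood intervals), not the package's implementation; `firth_logistic` and `solve` are hypothetical names.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def firth_logistic(X, y, max_iter=100, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression via modified Fisher scoring.

    X: list of rows (include a leading 1.0 for the intercept); y: list of 0/1 outcomes.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        eta = [sum(b * xj for b, xj in zip(beta, row)) for row in X]
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X' W X
        info = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
                for r in range(k)]
        # Diagonal of the hat matrix: h_i = w_i * x_i' I^{-1} x_i
        h = []
        for i in range(n):
            v = solve(info, X[i])
            h.append(w[i] * sum(xa * va for xa, va in zip(X[i], v)))
        # Firth-modified score: X' (y - p + h * (0.5 - p))
        score = [sum((y[i] - p[i] + h[i] * (0.5 - p[i])) * X[i][r] for i in range(n))
                 for r in range(k)]
        step = solve(info, score)
        # Cap each step component to keep early iterations stable
        step = [max(-max_step, min(max_step, s)) for s in step]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For an intercept-only model the fixed point is p = (k + 0.5)/(n + 1), so Firth's correction effectively adds half an observation to each outcome; with completely separated data, where ordinary maximum likelihood estimates diverge, Firth's estimates stay finite.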
Tools for analyzing COVID-19 data in Mexico. Downloads and analyzes COVID-19 data from the Mexican General Directorate of Epidemiology (Dirección General de Epidemiología, DGE) <https://www.gob.mx/salud/documentos/datos-abiertos-152127>, the Network of Severe Acute Respiratory Infections (Red de Infecciones Respiratorias Agudas Graves, Red IRAG) <https://www.gits.igg.unam.mx/red-irag-dashboard/reviewHome>, and the Global Initiative on Sharing All Influenza Data (GISAID) <https://gisaid.org/>.
Time-varying coefficient models for interval censored and right censored survival data including 1) Bayesian Cox model with time-independent, time-varying or dynamic coefficients for right censored and interval censored data studied by Sinha et al. (1999) <doi:10.1111/j.0006-341X.1999.00585.x> and Wang et al. (2013) <doi:10.1007/s10985-013-9246-8>, 2) Spline based time-varying coefficient Cox model for right censored data proposed by Perperoglou et al. (2006) <doi:10.1016/j.cmpb.2005.11.006>, and 3) Transformation model with time-varying coefficients for right censored data using estimating equations proposed by Peng and Huang (2007) <doi:10.1093/biomet/asm058>.
Provide an optimal histogram, in the sense of probability density estimation and feature detection, by means of multiscale variational inference. In other words, the resulting histogram serves as an optimal density estimator and at the same time recovers features such as increases or modes, with control of both false positives and false negatives. Moreover, it provides a parsimonious representation in terms of the number of blocks, which simplifies data interpretation. The only assumption of the method is that the data points are independent and identically distributed, so it applies to fairly general situations, including continuous distributions, discrete distributions, and mixtures of both. For details, see Li, Munk, Sieling and Walther (2016) <arXiv:1612.07216>.
Using mixed effects models to analyse longitudinal gene expression can highlight differences between sample groups over time. The most widely used differential gene expression tools are unable to fit linear mixed effects models and are less suited to analysing longitudinal data. This package provides negative binomial and Gaussian mixed effects models to fit gene expression and other biological data across repeated samples. This is particularly useful for investigating changes in RNA-sequencing gene expression between groups of individuals over time, as described in Rivellese, F., Surace, A. E., Goldmann, K., Sciacca, E., Cubuk, C., Giorli, G., ... Lewis, M. J., & Pitzalis, C. (2022) Nature Medicine <doi:10.1038/s41591-022-01789-0>.
Performance measures and scores for statistical classification, such as accuracy, sensitivity, specificity, recall, similarity coefficients, AUC, Gini index, Brier score and many more. Calculation of optimal cut-offs and decision stumps (Iba and Langley (1991), <doi:10.1016/B978-1-55860-247-2.50035-8>) for all implemented performance measures. Hosmer-Lemeshow goodness-of-fit tests (Lemeshow and Hosmer (1982), <doi:10.1093/oxfordjournals.aje.a113284>; Hosmer et al. (1997), <doi:10.1002/(SICI)1097-0258(19970515)16:9%3C965::AID-SIM509%3E3.0.CO;2-O>). Statistical and epidemiological risk measures such as relative risk, odds ratio and number needed to treat (Porta (2014), <doi:10.1093/acref/9780199976720.001.0001>).
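The sketch below illustrates a few of the listed measures for a binary outcome in plain Python: sensitivity, specificity and accuracy at a cutoff, the Brier score, and an "optimal" cutoff chosen by maximizing Youden's index (sensitivity + specificity - 1) over the observed scores. The function names, the rule "score >= cutoff means positive", and Youden's index as the optimality criterion are assumptions for illustration; the package itself supports optimal cut-offs for many other measures.

```python
def classify(labels, scores, cutoff):
    """Confusion counts (tp, fp, tn, fn), treating score >= cutoff as a positive prediction."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        if s >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def measures(labels, scores, cutoff):
    """Sensitivity, specificity and accuracy at a given cutoff."""
    tp, fp, tn, fn = classify(labels, scores, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


def youden_cutoff(labels, scores):
    """Observed score that maximizes sensitivity + specificity - 1 (first one on ties)."""
    def youden(c):
        m = measures(labels, scores, c)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=youden)


def brier(labels, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
```

For labels `[0, 0, 1, 1, 1]` and scores `[0.2, 0.6, 0.4, 0.7, 0.9]`, the Youden-optimal cutoff is 0.7, with sensitivity 2/3, specificity 1 and accuracy 0.8.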
This package provides functions to perform O2PLS-DA analysis for multi-omics data integration. The algorithm comes from "O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter", published by Johan Trygg and Svante Wold in 2003 <doi:10.1002/cem.775>. O2PLS is a bidirectional multivariate regression method that aims to separate the covariance between two data sets (it was recently extended to multiple data sets) (Löfstedt and Trygg, 2011 <doi:10.1002/cem.1388>; Löfstedt et al., 2012 <doi:10.1016/j.aca.2013.06.026>) from the systematic sources of variance that are specific to each data set.
Offers a range of utilities and functions for everyday programming tasks. 1. Data Manipulation: grouping and merging, column splitting, and character expansion. 2. File Handling: reading and converting files in popular formats. 3. Plotting Assistance: utilities for generating color palettes, validating color formats, and adding transparency. 4. Statistical Analysis: functions for pairwise comparisons and multiple testing corrections, making statistical analyses easy to perform. 5. Graph Plotting: efficient tools for creating doughnut plots and multi-layered doughnut plots; Venn diagrams, including traditional Venn diagrams, upset plots, and flower plots; and simplified functions for creating stacked bar plots or box plots with letter labels marking groups in multiple comparisons.
This package provides tools for translating environmental change into organismal response. Microclimate models vertically scale weather station data to organismal heights. The biophysical modeling tools include both general models for heat flows and specific models to predict body temperatures for a variety of ectothermic taxa. Additional functions model and temporally partition air and soil temperatures and solar radiation. Utility functions estimate the organismal and environmental parameters needed for biophysical ecology. TrenchR focuses on relatively simple and modular functions so users can create transparent and flexible biophysical models. Many functions are derived from Gates (1980) <doi:10.1007/978-1-4612-6024-0> and Campbell and Norman (1988) <isbn:9780387949376>.
Companion R package for the course "Statistical analysis of correlated and repeated measurements for health science researchers" taught by the Section of Biostatistics of the University of Copenhagen. It implements linear mixed models in which the variance-covariance of the residuals is specified via patterns (compound symmetry, Toeplitz, unstructured, ...). Statistical inference for mean, variance, and correlation parameters is performed based on the observed information and a Satterthwaite approximation of the degrees of freedom. Normalized residuals are provided to assess model misspecification. Statistical inference can be performed for arbitrary linear or non-linear combination(s) of model coefficients. Predictions can be computed conditional on covariates only or also on outcome values.
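To make the covariance "patterns" concrete, the following sketch builds the residual variance-covariance matrix for one individual's repeated measurements under the three named structures. The function names and parametrizations (one variance and one correlation for compound symmetry, one covariance per lag for Toeplitz, separate standard deviations plus a full correlation matrix for unstructured) are illustrative assumptions; the package's estimation, Satterthwaite degrees of freedom and normalized residuals are not shown.

```python
def compound_symmetry(n_times, variance, rho):
    """Same variance on the diagonal and the same covariance variance * rho elsewhere."""
    return [[variance if i == j else variance * rho for j in range(n_times)]
            for i in range(n_times)]


def toeplitz(cov_by_lag):
    """cov_by_lag[k] is the covariance between measurements k time points apart."""
    n = len(cov_by_lag)
    return [[cov_by_lag[abs(i - j)] for j in range(n)] for i in range(n)]


def unstructured(sds, corr):
    """Separate SD per time point and a free correlation matrix: Sigma_ij = sd_i * sd_j * corr_ij."""
    n = len(sds)
    return [[sds[i] * sds[j] * corr[i][j] for j in range(n)] for i in range(n)]
```

The three structures differ only in how many free parameters they spend: compound symmetry uses two regardless of the number of time points, Toeplitz one per lag, and unstructured one per variance and per pair of time points.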
Machine learning method specifically designed for pre-miRNA prediction. It takes advantage of unlabeled sequences to improve prediction rates even when there are only a few positive examples, or when the negative examples are unreliable or are not good representatives of their class. Furthermore, the method can automatically search for negative examples if the user is unable to provide them. miRNAss can find a good boundary to divide the pre-miRNAs from other groups of sequences; it automatically optimizes the threshold that defines the class boundaries and is thus robust to high class imbalance. Each step of the method is scalable and can handle large volumes of data.
Transfer learning, a prevailing technique in computer science, aims to improve the performance of a target model by leveraging auxiliary information from heterogeneous source data. We provide novel tools for multi-source transfer learning under statistical models based on model averaging strategies, including linear regression models and partially linear models. Unlike existing transfer learning approaches, this method integrates the auxiliary information through data-driven weight assignments to avoid negative transfer. This is the first package for transfer learning based on optimal model averaging frameworks, providing efficient implementations for practitioners in multi-source data modeling. The details are described in Hu and Zhang (2023) <https://jmlr.org/papers/v24/23-0030.html>.
This package provides a function for estimating the transition probabilities in an illness-death model. The transition probabilities can be estimated using the unsmoothed landmark estimators developed by de Una-Alvarez and Meira-Machado (2015) <doi:10.1111/biom.12288>. Presmoothed estimates can also be obtained through the use of a parametric family of binary regression curves, such as logit, probit or cauchit. The additive logistic regression model and nonparametric regression are also implemented as alternatives. The idea behind the presmoothed landmark estimators is to use the presmoothing techniques developed by Cao et al. (2005) <doi:10.1007/s00180-007-0076-6> in the landmark estimation of the transition probabilities.
Penalized and non-penalized maximum likelihood estimation of smooth transition vector autoregressive models with various types of transition weight functions, conditional distributions, and identification methods. Constrained estimation with various types of constraints is available. The package also provides residual-based model diagnostics, forecasting, simulation, counterfactual analysis, and computation of impulse response functions, generalized impulse response functions, generalized forecast error variance decompositions, and historical decompositions. See Heather Anderson, Farshid Vahid (1998) <doi:10.1016/S0304-4076(97)00076-6>, Helmut Lütkepohl, Aleksei Netšunajev (2017) <doi:10.1016/j.jedc.2017.09.001>, Markku Lanne, Savi Virolainen (2025) <doi:10.1016/j.jedc.2025.105162>, Savi Virolainen (2025) <doi:10.48550/arXiv.2404.19707>.
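For intuition about transition weights, the sketch below uses a common logistic specification for a two-regime model: the weight on the second regime is 1 / (1 + exp(-gamma * (s - c))) for a switching variable s, location c and scale gamma > 0, and the conditional mean of a one-lag model is the weight-averaged mean of the two regime-wise VAR(1) models. The two-regime, one-lag setup and the function names are assumptions for illustration, not the package's API or its full set of weight functions.

```python
import math


def logistic_weights(s, gamma, c):
    """Transition weights (alpha_1, alpha_2) with alpha_2 = 1 / (1 + exp(-gamma * (s - c)))."""
    z = gamma * (s - c)
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        a2 = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        a2 = e / (1.0 + e)
    return 1.0 - a2, a2


def matvec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def stvar_mean(y_prev, s, gamma, c, phi1, A1, phi2, A2):
    """Conditional mean of a two-regime STVAR(1): weighted average of the regime means."""
    a1, a2 = logistic_weights(s, gamma, c)
    mu1 = [p + v for p, v in zip(phi1, matvec(A1, y_prev))]
    mu2 = [p + v for p, v in zip(phi2, matvec(A2, y_prev))]
    return [a1 * u + a2 * v for u, v in zip(mu1, mu2)]
```

At s = c both regimes get weight 0.5, and as gamma grows the transition approaches an abrupt switch between the regimes.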
An introduction to a couple of novel predictive variable selection methods for generalised boosted regression modeling (gbm). They are based on variable influence measures (relative variable influence (RVI) and knowledge-informed RVI (KIRVI and KIRVI2)), which adopt ideas similar to those of AVI, KIAVI and KIAVI2 in the steprf package, and also on predictive accuracy in stepwise algorithms. For details of the variable selection methods, see: Li, J., Siwabessy, J., Huang, Z. and Nichol, S. (2019) <doi:10.3390/geosciences9040180>; Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F., Nichol, S. (2017) <doi:10.13140/RG.2.2.27686.22085>.
This package provides tests for segregation distortion in F1 polyploid populations under different assumptions of meiosis. These tests can account for double reduction, partial preferential pairing, and genotype uncertainty through the use of genotype likelihoods. Parallelization support is provided. Details of these methods are described in Gerard et al. (2025a) <doi:10.1007/s00122-025-04816-z> and Gerard et al. (2025b) <doi:10.1101/2025.06.23.661114>. Part of this material is based upon work supported by the National Science Foundation under Grant No. 2132247. The opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation.
Computation of stopping boundaries for a single-arm trial using a Bayesian criterion; i.e., for each m <= n (n = total number of patients in the trial), the smallest number of observed toxicities that leads to termination of the trial/accrual according to the specified criteria is calculated. The probabilities of stopping the trial/accrual at and up until (respectively) the m-th patient (m <= n) are also calculated. This design is more conservative than the frequentist approach (using Clopper-Pearson CIs), which might make it preferable where safety is concerned. See also Aamot et al. (2010) "Continuous monitoring of toxicity in clinical trials - simulating the risk of stopping prematurely" <doi:10.5414/cpp48476>.
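A minimal sketch of one such Bayesian criterion, assuming a Beta(a0, b0) prior with integer parameters on the toxicity probability p and the rule "stop once P(p > p0 | k toxicities in m patients) exceeds a threshold". The source does not state the package's exact prior or criterion, so p0, the threshold, a0, b0 and the function names are hypothetical. With integer Beta parameters the posterior tail has a closed form through the binomial distribution, P(Beta(a, b) > p0) = P(Binomial(a + b - 1, p0) <= a - 1), so only `math.comb` (Python 3.8+) is needed.

```python
from math import comb


def posterior_tail(k, m, p0, a0=1, b0=1):
    """P(p > p0 | k toxicities among m patients) under a Beta(a0, b0) prior (integer a0, b0)."""
    a, b = a0 + k, b0 + m - k
    trials = a + b - 1
    # For integer a, b: P(Beta(a, b) > p0) = P(Binomial(a + b - 1, p0) <= a - 1)
    return sum(comb(trials, j) * p0 ** j * (1 - p0) ** (trials - j) for j in range(a))


def stopping_boundaries(n, p0, threshold, a0=1, b0=1):
    """For m = 1..n, the smallest toxicity count k with posterior_tail > threshold (None if none)."""
    bounds = []
    for m in range(1, n + 1):
        stop_k = next((k for k in range(m + 1)
                       if posterior_tail(k, m, p0, a0, b0) > threshold), None)
        bounds.append(stop_k)
    return bounds
```

With a uniform prior, p0 = 0.3 and threshold 0.8, one toxicity in the first patient already triggers stopping (posterior tail 0.91), whereas after two patients two toxicities are needed (one toxicity gives 0.784). Because the posterior tail for a fixed k shrinks as m grows, the boundaries never decrease.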
Supports systematic scrutiny, modification, and integration of data. The function status() counts rows that have missing values in grouping columns (returned by na()), have non-unique combinations of grouping columns (returned by dup()), and are not locally sorted (returned by unsorted()). Functions enumerate() and itemize() give sorted unique combinations of columns, with or without occurrence counts, respectively. Function ignore() drops columns in x that are present in y, and informative() drops columns in x that are entirely NA; constant() returns values that are constant, given a key. Data that have defined unique combinations of grouping values behave more predictably during merge operations.
This package performs an analysis of time-to-event clinical trial data using various "win time" methods, including 'ewt', 'ewtr', 'rmt', 'ewtp', 'rewtp', 'ewtpr', 'rewtpr', 'max', 'wtr', 'rwtr', 'pwt', and 'rpwt'. These methods are used to calculate and compare treatment effects on ordered composite endpoints. The package handles event times, event indicators, and treatment arm indicators and supports calculations on observed and resampled data. Detailed explanations of each method and usage examples are provided in "Use of win time for ordered composite endpoints in clinical trials" by Troendle et al. (2024) <https://pubmed.ncbi.nlm.nih.gov/38417455/>. For more information, see the package documentation or the vignette titled "Introduction to wintime".
Intuitive framework for identifying spatially variable genes (SVGs) and differential spatial patterns (DSP) between conditions via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression with a negative binomial (NB) model, via edgeR, using spatial clusters as covariates. SVGs are then identified by testing the significance of the spatial clusters. For multi-sample, multi-condition datasets, we again fit an NB model via edgeR, incorporating spatial clusters, conditions and their interactions as covariates. DSP genes, which represent differences in spatial gene expression patterns across experimental conditions, are identified by testing the interaction between spatial clusters and conditions.
This package provides functions that solve initial value problems of a system of first-order ordinary differential equations (ODE), of partial differential equations (PDE), of differential algebraic equations (DAE), and of delay differential equations. The functions provide an interface to the FORTRAN functions lsoda, lsodar, lsode and lsodes of the ODEPACK collection, to the FORTRAN functions dvode and daspk, and to a C implementation of solvers of the Runge-Kutta family with fixed or variable time steps. The package contains routines designed for solving ODEs resulting from 1-D, 2-D and 3-D partial differential equations that have been converted to ODEs by numerical differencing.
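For the fixed-step members of the Runge-Kutta family, the classical fourth-order scheme fits in a few lines. The Python sketch below is a generic illustration, not the package's C implementation, and it is not suitable for stiff systems, where solvers such as lsoda or lsode are the appropriate choice; `rk4` and its arguments are hypothetical names.

```python
def rk4(f, y0, t0, t1, n_steps):
    """Classical fixed-step 4th-order Runge-Kutta for y' = f(t, y), with y a list of floats."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        # Weighted average of the four slope estimates
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y
```

For example, `rk4(lambda t, y: [-y[0]], [1.0], 0.0, 1.0, 100)` approximates exp(-1) with a global error far below 1e-8, since the error of the method scales with the fourth power of the step size.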
Computes confidence intervals for binomial or Poisson rates and their differences or ratios, including the rate (or risk) difference ('RD') or rate ratio (or relative risk, 'RR') for binomial proportions or Poisson rates, and the odds ratio ('OR', binomial only). Also computes confidence intervals for RD, RR or OR for paired binomial data, and estimates a proportion from clustered binomial data. Includes skewness-corrected asymptotic score ('SCAS') methods, which have been developed in Laud (2017) <doi:10.1002/pst.1813> from Miettinen and Nurminen (1985) <doi:10.1002/sim.4780040211> and Gart and Nam (1988) <doi:10.2307/2531848>, and in Laud (2025, under review) for paired proportions. The same score produces hypothesis tests that are improved versions of the non-inferiority test for binomial RD and RR by Farrington and Manning (1990) <doi:10.1002/sim.4780091208>, or a generalisation of the McNemar test for paired data. The package also includes MOVER methods (Method Of Variance Estimates Recovery) for all contrasts, derived from the Newcombe method but with options to use equal-tailed intervals in place of the Wilson score method, and generalised for Bayesian applications incorporating prior information. So-called exact methods for strictly conservative coverage are approximated using continuity adjustments, and the amount of adjustment can be selected to avoid over-conservative coverage. Also includes methods for stratified calculations (e.g. meta-analysis), either with a fixed effect assumption (matching the CMH test) or incorporating stratum heterogeneity.
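As an illustration of the MOVER idea for the risk difference, the sketch below implements Newcombe's hybrid score interval: Wilson score limits (l_i, u_i) for each proportion are combined as RD - sqrt((p1 - l1)^2 + (u2 - p2)^2) and RD + sqrt((u1 - p1)^2 + (p2 - l2)^2). This is the classical Newcombe method that the package's MOVER options build on, not its SCAS method, and the function names are hypothetical; `statistics.NormalDist` (Python 3.8+) supplies the normal quantile.

```python
from math import sqrt
from statistics import NormalDist


def wilson(x, n, level=0.95):
    """Wilson score interval for a binomial proportion x / n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def newcombe_rd(x1, n1, x2, n2, level=0.95):
    """MOVER (Newcombe hybrid score) interval for p1 - p2 built from Wilson limits."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, level)
    l2, u2 = wilson(x2, n2, level)
    rd = p1 - p2
    lower = rd - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = rd + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper
```

For 56/70 versus 48/80 this gives roughly 0.052 to 0.334 around an observed difference of 0.2; unlike the simple Wald interval, the Wilson limits never leave [0, 1], so the combined interval stays within [-1, 1].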
We provide tools to estimate two prediction accuracy metrics for risk scores: the average positive predictive value (AP) and the well-known AUC (the area under the receiver operating characteristic curve). The outcome of interest is either binary or a censored event time. Note that for censored event times, the AP and AUC estimates are time-dependent for pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at the specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data).
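For a binary outcome, one common empirical form of the AP averages, over the cases (y = 1), the positive predictive value at a cutoff equal to that case's score, and the AUC is the Mann-Whitney probability that a case outscores a non-case, with ties counting one half. The sketch below is an illustration under those assumptions (at least one case and one non-case); the censored-outcome, time-dependent estimators are not covered, and the function names are hypothetical.

```python
def auc(labels, scores):
    """Mann-Whitney estimate of P(case score > non-case score); ties count 1/2."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                wins += 1.0
            elif sc == sn:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def average_ppv(labels, scores):
    """Mean over cases of the PPV among subjects scoring at or above that case's score."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    total = 0.0
    for c in cases:
        flagged = [y for y, s in zip(labels, scores) if s >= c]
        total += sum(flagged) / len(flagged)
    return total / len(cases)
```

A score that ranks every case above every non-case gets AP = AUC = 1; unlike the AUC, the AP also depends on how common the outcome is, because each PPV term is diluted by the non-cases flagged at that cutoff.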