Implementation of commonly used penalized functional linear regression models, including the Smooth and Locally Sparse (SLoS) method by Lin et al. (2016) <doi:10.1080/10618600.2016.1195273>, the Nested Group bridge Regression (NGR) method by Guan et al. (2020) <doi:10.1080/10618600.2020.1713797>, Functional Linear Regression That's Interpretable (FLIRTI) by James et al. (2009) <doi:10.1214/08-AOS641>, and the Penalized B-spline regression method.
This package provides functions to visualise sports data. It converts data into a format suitable for plotting charts and eases the process of turning messy sports data into a more user-friendly format. Football data is accessed through worldfootballR <https://github.com/JaseZiv/worldfootballR>, which gets data from FBref <https://fbref.com/en>, Transfermarkt <https://www.transfermarkt.com/>, Understat <https://understat.com/>, and fotmob <https://www.fotmob.com/>.
This package provides methods for regression with high-dimensional predictors and univariate or multivariate response variables. It considers the decomposition of the coefficient matrix that leads to the best approximation to the signal part in the response for any given rank, and estimates the decomposition by solving a penalized generalized eigenvalue problem followed by a least squares procedure. See Ruiyan Luo and Xin Qi (2017) <doi:10.1016/j.jmva.2016.09.005>.
This package provides a tool to define the rare biosphere. ulrb solves the problem of defining rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). The algorithm works for any type of microbiome data, provided there is an abundance table. For validation of this method on different abundance tables, see Pascoal et al. (2025). The method also works for non-microbiome data.
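A minimal Python sketch of threshold-free rarity classification, assuming one abundance value per taxon, absolute difference as the distance, three groups, and a small from-scratch PAM (greedy BUILD followed by best-improvement SWAP). The function names and the labels "Rare", "Undetermined" and "Abundant" are illustrative choices for this sketch, not the ulrb API.

```python
def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(abs(p - m) for m in medoids) for p in points)


def pam(points, k):
    """Partitioning around medoids on 1-D data: greedy BUILD, then SWAP."""
    candidates = sorted(set(points))
    medoids = []
    # BUILD: add, one at a time, the candidate that most lowers the total cost.
    for _ in range(k):
        best = min(
            (c for c in candidates if c not in medoids),
            key=lambda c: total_cost(points, medoids + [c]),
        )
        medoids.append(best)
    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    while True:
        current = total_cost(points, medoids)
        best_trial = None
        for i in range(k):
            for c in candidates:
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < current:
                    current, best_trial = cost, trial
        if best_trial is None:
            return sorted(medoids)
        medoids = best_trial


def classify(abundances, labels=("Rare", "Undetermined", "Abundant")):
    """Hypothetical helper: label each taxon by the rank of its nearest medoid."""
    medoids = pam(abundances, len(labels))
    return [
        labels[min(range(len(medoids)), key=lambda i: abs(a - medoids[i]))]
        for a in abundances
    ]
```

Because group boundaries come from the medoids rather than a fixed cutoff, the same code adapts to tables with very different abundance ranges.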
Comprehensive set of tools for analyzing and manipulating functional data with non-uniform lengths. This package addresses two common scenarios in functional data analysis: Variable Domain Data, where the observation domain differs across samples, and Partially Observed Data, where observations are incomplete over the domain of interest. VDPO enhances the flexibility and applicability of functional data analysis in R. See Amaro et al. (2024) <doi:10.48550/arXiv.2401.05839>.
This package provides a function to make gene presence/absence calls based on distance from negative strand matching probesets (NSMP) which are derived from Affymetrix annotation. PANP is applied after gene expression values are created, and therefore can be used after any preprocessing method such as MAS5 or GCRMA, or PM-only methods like RMA. NSMP sets have been established for the HGU133A and HGU133-Plus-2.0 chipsets to date.
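The calling rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the PANP code: it uses a plain empirical survivor function of the NSMP expression values as a stand-in for whatever distribution estimate PANP uses, and the function name, the cutoffs 0.01 and 0.02, and the labels "P", "M", "A" are hypothetical.

```python
def presence_calls(gene_values, nsmp_values, tight=0.01, loose=0.02):
    """Label each expression value 'P' (present), 'M' (marginal) or 'A' (absent).

    Assumption: the background distribution is the empirical distribution of
    the NSMP expression values; cutoffs are illustrative, not PANP defaults.
    """
    n = len(nsmp_values)
    calls = []
    for v in gene_values:
        # Empirical p-value: share of NSMP (background) values at least as large as v.
        p = sum(1 for x in nsmp_values if x >= v) / n
        if p < tight:
            calls.append("P")
        elif p < loose:
            calls.append("M")
        else:
            calls.append("A")
    return calls
```

Since the input is already-summarized expression values, the same rule applies whether they came from MAS5, GCRMA or RMA.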
The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define machine learning, statistical and data mining models and to share models between PMML-compliant applications. More information about the PMML industry standard and the Data Mining Group can be found at http://dmg.org/. The generated PMML can be imported into any PMML-consuming application, such as Zementis Predictive Analytics products.
This package provides an approach which is based on the methodology of the Burden of Communicable Diseases in Europe (BCoDE) and can be used for large and small samples such as individual countries. The Burden of Healthcare-Associated Infections (BHAI) is estimated in disability-adjusted life years, number of infections as well as number of deaths per year. Results can be visualized with various plotting functions and exported into tables.
This package provides correlation-based penalty estimators for both linear and logistic regression models by implementing a new regularization method that incorporates correlation structures within the data. This method encourages a grouping effect where strongly correlated predictors tend to be in or out of the model together. See Tutz and Ulbricht (2009) <doi:10.1007/s11222-008-9088-5> and Algamal and Lee (2015) <doi:10.1016/j.eswa.2015.08.016>.
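For orientation, the correlation-based penalty as we read it in Tutz and Ulbricht (2009) is written below for coefficients $\beta_1, \dots, \beta_p$ and pairwise predictor correlations $\rho_{ij}$ with $|\rho_{ij}| < 1$. When $\rho_{ij} \to 1$ the first term forces $\beta_i \approx \beta_j$, and when $\rho_{ij} \to -1$ the second forces $\beta_i \approx -\beta_j$, which is the grouping effect. Expanding the squares shows the penalty is quadratic, so for the linear model with squared-error loss the estimate has a ridge-like closed form (the logistic case has no closed form and needs iterative fitting). The matrix $W$ is our notation, not necessarily the package's.

```latex
\begin{aligned}
P_c(\beta) &= \lambda \sum_{i=1}^{p-1} \sum_{j>i}
  \left[ \frac{(\beta_i - \beta_j)^2}{1 - \rho_{ij}} + \frac{(\beta_i + \beta_j)^2}{1 + \rho_{ij}} \right]
  = \lambda \sum_{i=1}^{p-1} \sum_{j>i} \frac{2\beta_i^2 + 2\beta_j^2 - 4\rho_{ij}\beta_i\beta_j}{1 - \rho_{ij}^2}
  = \lambda\, \beta^\top W \beta, \\
W_{ii} &= 2 \sum_{s \neq i} \frac{1}{1 - \rho_{is}^2}, \qquad
W_{ij} = -\frac{2\rho_{ij}}{1 - \rho_{ij}^2} \quad (i \neq j), \\
\hat\beta &= \arg\min_\beta \, \lVert y - X\beta \rVert^2 + \lambda\, \beta^\top W \beta
  = \left( X^\top X + \lambda W \right)^{-1} X^\top y .
\end{aligned}
```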
This package performs classical age-depth modelling of dated sediment deposits, prior to applying more sophisticated techniques such as Bayesian age-depth modelling. Radiocarbon-dated depths are calibrated. Age-depth models are constructed by sampling repeatedly from the dated levels, each time drawing age-depth curves. Model types include linear interpolation, linear or polynomial regression, and a range of splines. See Blaauw (2010) <doi:10.1016/j.quageo.2010.01.002>.
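The repeated-sampling idea for the linear interpolation model type can be sketched in Python as below. The sketch assumes each dated level already has a calibrated age summarized as a normal distribution (real calibrated radiocarbon ages are usually not normal), that depths increase downward so ages must increase with depth, and that iterations with age reversals are discarded. The function names are hypothetical.

```python
import random
import statistics


def sample_age_depth(depths, ages, sds, target_depth, n_iter=1000, seed=1):
    """Monte Carlo linear interpolation of age at target_depth.

    Assumptions: depths sorted ascending, target_depth within their range,
    ages drawn independently from Normal(ages[i], sds[i]) at each iteration.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_iter):
        drawn = [rng.gauss(mu, sd) for mu, sd in zip(ages, sds)]
        # Discard curves with age reversals (a deeper level younger than a shallower one).
        if any(later < earlier for earlier, later in zip(drawn, drawn[1:])):
            continue
        # Find the bracketing pair of dated levels and interpolate linearly.
        for i in range(len(depths) - 1):
            d0, d1 = depths[i], depths[i + 1]
            if d0 <= target_depth <= d1:
                frac = (target_depth - d0) / (d1 - d0)
                samples.append(drawn[i] + frac * (drawn[i + 1] - drawn[i]))
                break
    return samples


def summarize(samples):
    """Median and an approximate 95% range of the sampled ages."""
    s = sorted(samples)
    n = len(s)
    return statistics.median(s), s[int(0.025 * n)], s[int(0.975 * n) - 1]
```

Repeating this for every depth of interest yields a family of age-depth curves whose spread expresses the dating uncertainty.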
This package implements parametric (direct) regression methods for modeling cumulative incidence functions (CIFs) in the presence of competing risks. Methods include the direct Gompertz-based approach and generalized regression models as described in Jeong and Fine (2006) <doi:10.1111/j.1467-9876.2006.00532.x> and Jeong and Fine (2007) <doi:10.1093/biostatistics/kxj040>. The package supports maximum likelihood estimation and variance computation, with applications to clinical trials and survival analysis.
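For orientation, a Gompertz-type CIF can be written as below, in our own notation $(a, b)$; we present it as the general shape behind a direct Gompertz approach, not as the package's exact parameterization. For $a < 0$ the curve levels off below 1, leaving probability mass for competing events, which is why an improper (plateauing) distribution suits a single cause-specific CIF.

```latex
F_1(t; a, b) = 1 - \exp\left\{ \frac{b}{a}\left(1 - e^{a t}\right) \right\}, \qquad b > 0,
\qquad
\lim_{t \to \infty} F_1(t; a, b) = 1 - e^{b/a} < 1 \quad \text{for } a < 0 .
```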
Training and prediction functions are provided for the Extreme Learning Machine algorithm (ELM). The ELM uses a Single Hidden Layer Feedforward Neural Network (SLFN) with randomly generated weights and no gradient-based backpropagation. The training time is very short, and the online version allows the model to be updated using a small chunk of the training set at each iteration. The only parameters to tune are the hidden layer size and the activation function.
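A minimal batch ELM in Python with NumPy, to show why training is fast: the input weights and biases are drawn at random and never updated, and only the output weights are fitted, in one least-squares solve. The sigmoid activation, the standard-normal weight distribution and the function names are assumptions of this sketch; the online (sequential) variant is not shown.

```python
import numpy as np


def elm_fit(X, y, n_hidden=30, seed=0):
    """Fit a single-hidden-layer ELM for regression."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden-layer outputs
    # Output weights: minimum-norm least-squares solution (Moore-Penrose pseudo-inverse).
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta


def elm_predict(model, X):
    """Propagate inputs through the fixed random layer and the fitted output layer."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For example, `model = elm_fit(X, y)` followed by `elm_predict(model, X_new)`; `n_hidden` and the activation are the only modelling choices, matching the two tuning parameters described above.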
The Forecast Linear Augmented Projection (flap) method reduces forecast variance by adjusting the forecasts of multivariate time series to be consistent with the forecasts of linear combinations (components) of the series by projecting all forecasts onto the space where the linear constraints are satisfied. The forecast variance can be reduced monotonically by including more components. For a given number of components, the flap method achieves maximum forecast variance reduction among linear projections.
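The projection step can be sketched with NumPy, assuming forecasts `y_hat` for the m series, forecasts `c_hat` for p components defined by a known p by m matrix `Phi` (so the true components satisfy c = Phi y), and a weight matrix `W`. In the variance-reduction argument `W` would be the forecast error covariance of the stacked forecasts; the identity default below is a simplifying assumption, and `flap_project` is a hypothetical name rather than the package's function.

```python
import numpy as np


def flap_project(y_hat, c_hat, Phi, W=None):
    """Project stacked forecasts z = (y, c) onto the subspace where c = Phi @ y."""
    m, p = y_hat.shape[0], c_hat.shape[0]
    z_hat = np.concatenate([y_hat, c_hat])
    C = np.hstack([-Phi, np.eye(p)])  # linear constraint: C @ z = 0  <=>  c = Phi @ y
    if W is None:
        W = np.eye(m + p)             # assumption: identity weights (orthogonal projection)
    # M = I - W C' (C W C')^{-1} C maps any z onto {z : C z = 0}.
    M = np.eye(m + p) - W @ C.T @ np.linalg.solve(C @ W @ C.T, C)
    z_tilde = M @ z_hat
    return z_tilde[:m], z_tilde[m:]   # adjusted series and component forecasts
```

Because C M = 0, the adjusted forecasts always satisfy the constraints, and forecasts that already satisfy them are left unchanged.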
This package provides tools to perform fuzzy formal concept analysis, presented in Wille (1982) <doi:10.1007/978-3-642-01815-2_23> and in Ganter and Obiedkov (2016) <doi:10.1007/978-3-662-49291-8>. It provides functions to load and save a formal context, extract its concept lattice and implications. In addition, one can use the implications to compute semantic closures of fuzzy sets and, thus, build recommendation systems.
Fitting and analyzing a Joint Trait Distribution Model. The Joint Trait Distribution Model is implemented in the Bayesian framework using conjugate priors and posteriors, thus guaranteeing fast inference. In particular, the package computes joint probabilities and multivariate confidence intervals, and enables the investigation of how they depend on the environment through partial response curves. The method implemented by the package is described in Poggiato et al. (2023) <doi:10.1111/geb.13706>.
This package provides functions to fit quantile regression models for hierarchical data (2-level nested designs) as described in Geraci and Bottai (2014, Statistics and Computing) <doi:10.1007/s11222-013-9381-9>. A vignette is given in Geraci (2014, Journal of Statistical Software) <doi:10.18637/jss.v057.i13> and included in the package documents. The package also provides functions to fit quantile models for independent data and for count responses.
This package provides a complement to all editions of *Modern Data Science with R* (ISBN: 978-0367191498, publisher URL: <https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498>). It contains data and code to complete exercises and reproduce examples from the text, and it facilitates connections to the SQL database server used in the book.
This package performs maximum likelihood estimation for finite mixture models for families including Normal, Weibull, Gamma and Lognormal using the EM algorithm, together with the Newton-Raphson algorithm or the bisection method when necessary. It also conducts mixture model selection using information criteria or a bootstrap likelihood ratio test. The data used for mixture model fitting can be raw data or binned data. The model fitting process is accelerated using the R package Rcpp.
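For the Normal family on raw data, each EM iteration has closed-form updates, so no Newton-Raphson or bisection step is needed. The Python sketch below fits a two-component Normal mixture under simplifying assumptions: initial means at the data minimum and maximum, a shared initial standard deviation, a fixed iteration count, and a small floor on the standard deviations. It does not cover the other families, binned data, or model selection, and `em_normal_mixture` is a hypothetical name.

```python
import math


def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def em_normal_mixture(data, n_iter=200, min_sigma=1e-3):
    """Two-component Normal mixture by EM; returns (weights, means, sds)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    pi, mu, sigma = [0.5, 0.5], [min(data), max(data)], [sd, sd]
    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed via log-sum-exp.
        resp = []
        for x in data:
            logs = [math.log(pi[k]) + normal_logpdf(x, mu[k], sigma[k]) for k in range(2)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: weighted mixing proportions, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)  # floor avoids degenerate components
    return pi, mu, sigma
```

A production fit would also stop on a log-likelihood tolerance and guard against components whose weight collapses to zero.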
This package provides an interface to the NoSQL database CouchDB (<http://couchdb.apache.org>). Methods are provided for managing databases within CouchDB, including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local CouchDB instance, or a remote CouchDB database such as Cloudant. Documents can be inserted directly from vectors, lists, data.frames, and JSON. Targeted at CouchDB v2 or greater.
This package provides a fast, consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data. Taking advantage of the flexible plotting tools available in R, SDAR uses stratigraphic and sedimentological data to produce detailed graphic logs for outcrop sections and borehole logs. These logs can include multiple features (e.g., bed thickness, lithology, samples, sedimentary structures, colors, fossil content, bioturbation index, gamma ray logs) (Johnson, 1992, <ISSN 0037-0738>).
Returns a data frame with the names of the input data points and hex colors (or CIELab coordinates). Data can be mapped to colors for use in data visualization. It optimally maps data points into a polygon that represents the CIELab color space. Since Euclidean distance approximates relative perceptual differences in CIELab color space, the result is a color encoding that aims to capture much of the structure of the original data.
Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.
This package provides a framework for adjusting for cell type size when performing bulk transcriptomics deconvolution. The main framework function provides a means of reference normalization using cell size scale factors. It allows for marker selection and deconvolution using non-negative least squares (NNLS) by default. The framework is extensible to other marker selection and deconvolution algorithms, and users may reuse the generics, methods, and classes for these when developing new algorithms.
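A NumPy sketch of size-adjusted NNLS deconvolution, assuming the cell size scale factors enter by rescaling each cell type's column of the marker reference matrix Z (genes by cell types), so that bulk expression is modelled as y ≈ (Z · S) p with p ≥ 0. Projected gradient descent stands in for a dedicated NNLS solver, the proportions are returned unnormalized, and `deconvolve_nnls` is a hypothetical name, not part of the package.

```python
import numpy as np


def deconvolve_nnls(Z, y, sizes, n_iter=2000):
    """Estimate non-negative cell type amounts p with y ≈ (Z * sizes) @ p."""
    A = Z * sizes                               # rescale column j of Z by sizes[j]
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = largest eigenvalue of A.T @ A
    p = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ p - y)                # gradient of 0.5 * ||A p - y||^2
        p = np.maximum(p - step * grad, 0.0)    # gradient step, then project onto p >= 0
    return p
```

Dividing the result by its sum gives proportions when they are required to add up to one.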
mist (Methylation Inference for Single-cell along Trajectory) is a hierarchical Bayesian framework for modeling DNA methylation trajectories and performing differential methylation (DM) analysis in single-cell DNA methylation (scDNAm) data. It estimates developmental-stage-specific variations, identifies genomic features with drastic changes along pseudotime, and, for two phenotypic groups, detects features with distinct temporal methylation patterns. mist uses Gibbs sampling to estimate parameters for temporal changes and stage-specific variations.