This wrapper package for mgcv makes it easier to create high-performing Generalized Additive Models (GAMs). Given just a dataset and the name of the outcome column, its central function autogam() tries to automate the configuration of a highly accurate GAM that fits reasonably quickly, even for large datasets.
This package provides a fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the a la carte embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021) <https://github.com/prodriguezsosa/EmbeddingRegression>.
Package to fit diffusion-based IRT models to response and response time data. Models are fit using marginal maximum likelihood. Parameter restrictions (fixed value and equality constraints) are possible. In addition, factor scores (person drift rate and person boundary separation) can be estimated. Model fit assessment tools are also available. The traditional diffusion model can be estimated as well.
The amplitude-dependent autoregressive time series model (EXPAR) proposed by Haggan and Ozaki (1981) <doi:10.2307/2335819> was improved by incorporating the moving average (MA) framework for capturing the variability efficiently. Parameters of the EXPARMA model can be estimated using this package. The user is provided with the best fitted EXPARMA model for the data set under consideration.
This package provides tools to work with the Flexible Dirichlet distribution. The main features are an E-M algorithm for computing the maximum likelihood estimate of the parameter vector and a function based on conditional bootstrap to estimate its asymptotic variance-covariance matrix. It also contains functions to plot graphs, to generate random observations and to handle compositional data.
Automatic open data acquisition from resources of IGN ('Institut National de Information Geographique et forestiere') (<https://www.ign.fr/>). Available datasets include various types of raster and vector data, such as digital elevation models, state borders, spatial databases, cadastral parcels, and more. There is also access to point cloud data ('LIDAR') and specific APIs (<https://apicarto.ign.fr/api/doc/>).
This package provides semiparametric sufficient dimension reduction for central mean subspaces for heterogeneous data defined by combinations of binary factors (such as chronic conditions). Subspaces are estimated to be hierarchically nested to respect the structure of subpopulations with overlapping characteristics. This package is an implementation of the proposed methodology of Huling and Yu (2021) <doi:10.1111/biom.13546>.
Implementation of the KCMeans regression estimator studied by Wiemann (2023) <arXiv:2311.17021> for expectation function estimation conditional on categorical variables. Computation leverages the unconditional KMeans implementation in one dimension using the dynamic programming algorithm of Wang and Song (2011) <doi:10.32614/RJ-2011-015>, allowing for global solutions in time polynomial in the number of observed categories.
This package performs Monte Carlo hypothesis tests, allowing a couple of different sequential stopping boundaries. Examples include a truncated sequential probability ratio test boundary (Fay, Kim and Hachey, 2007 <DOI:10.1198/106186007X257025>) and a boundary proposed by Besag and Clifford (1991) <DOI:10.1093/biomet/78.2.301>. It gives valid p-values and confidence intervals on p-values.
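As an illustration of what a sequential stopping boundary does, here is a minimal Python sketch of a Besag and Clifford style rule: sampling stops as soon as h simulated statistics are at least as large as the observed one, or after a maximum number of draws. The function name, its arguments and the defaults are hypothetical and are not the package's interface; the package's own boundaries and interval calculations are more elaborate.

```python
import random
from typing import Callable, Tuple


def besag_clifford_pvalue(
    observed: float,
    simulate: Callable[[], float],
    h: int = 10,
    n_max: int = 999,
) -> Tuple[float, int]:
    """Sequential Monte Carlo p-value (hypothetical helper, not the package API).

    Draw simulated test statistics under the null until `h` of them are
    >= `observed`, or until `n_max` draws have been made.
    Returns (p_value, number_of_draws).
    """
    exceed = 0
    for draws in range(1, n_max + 1):
        if simulate() >= observed:
            exceed += 1
            if exceed == h:
                # Stopped early: h exceedances among `draws` simulations.
                return h / draws, draws
    # Reached the cap: usual Monte Carlo p-value with the +1 correction.
    return (exceed + 1) / (n_max + 1), n_max


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed toy null distribution: standard normal test statistic.
    p, draws = besag_clifford_pvalue(2.5, lambda: rng.gauss(0.0, 1.0))
    print(f"p = {p:.4f} after {draws} draws")
```

Early stopping saves most of the work when the observed statistic is unremarkable, since h exceedances then arrive after only a few draws.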
This package provides a general framework for clinical trial simulations based on the Clinical Scenario Evaluation (CSE) approach. The package supports a broad class of data models (including clinical trials with continuous, binary, survival-type and count-type endpoints as well as multivariate outcomes that are based on combinations of different endpoints), analysis strategies and commonly used evaluation criteria.
Order, create and store reports from R. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.
This package provides a method for fitting the entire regularization path of the principal components lasso for linear and logistic regression models. The algorithm uses cyclic coordinate descent in a path-wise fashion. See URL below for more information on the algorithm. See Tay, K., Friedman, J., Tibshirani, R. (2018) Principal component-guided sparse regression <arXiv:1810.04651>.
This package provides functions for estimating the potential dispersal of tree species using regeneration densities and dispersal distances to nearest seed trees. A quantile regression is implemented to determine the dispersal potential. Spatial prediction can be used to identify natural regeneration potential for forest restoration as described in Axer et al. (2021) <doi:10.1016/j.foreco.2020.118802>.
Sparse Linear Method (SLIM) predicts ratings and top-n recommendations suited for sparse implicit positive feedback systems. SLIM is decomposed into multiple elastic net optimization problems which are solved in parallel over multiple cores. The package is based on "SLIM: Sparse Linear Methods for Top-N Recommender Systems" by Xia Ning and George Karypis <doi:10.1109/ICDM.2011.134>.
The SAWNUTI algorithm performs sequence comparison for finite sequences of discrete events with non-uniform time intervals. Further description of the algorithm can be found in the paper: A. Murph, A. Flynt, B. R. King (2021). Comparing finite sequences of discrete events with non-uniform time intervals, Sequential Analysis, 40(3), 291-313. <doi:10.1080/07474946.2021.1940491>.
This package implements stacked elastic net regression (Rauschenberger 2021, <doi:10.1093/bioinformatics/btaa535>). The elastic net generalises ridge and lasso regularisation (Zou 2005, <doi:10.1111/j.1467-9868.2005.00503.x>). Instead of fixing or tuning the mixing parameter alpha, we combine multiple alpha by stacked generalisation (Wolpert 1992 <doi:10.1016/S0893-6080(05)80023-1>).
These functions were developed within the SECFISH project (Strengthening regional cooperation in the area of fisheries data collection-Socio-economic data collection for fisheries, aquaculture and the processing industry at EU level). They are aimed at identifying correlations between costs and transversal variables by metier using individual vessel data and at disaggregating variable costs from fleet segment to metier level.
After the clustering step of a single-cell RNAseq experiment, this package aims to suggest labels/cell types for the clusters, on the basis of similarity to a reference dataset. It requires a table of read counts per cell per gene, and a list of the cells belonging to each of the clusters (for both test and reference data).
Messina is a collection of algorithms for constructing optimally robust single-gene classifiers, and for identifying differential expression in the presence of outliers or unknown sample subgroups. The methods have application in identifying lead features to develop into clinical tests (both diagnostic and prognostic), and in identifying differential expression when a fraction of samples show unusual patterns of expression.
The package provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses a negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be assessed separately for each gene's differential expression and splicing.
This package enables automated selection of group-specific signatures, especially for rare populations. It was developed to generate specific lists of signature genes using methods modified from Term Frequency-Inverse Document Frequency (TF-IDF). It can also be used as a new gene-set scoring method or data transformation method. Multiple visualization functions are implemented in this package.
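For readers unfamiliar with TF-IDF, here is a minimal Python sketch of the plain textbook weighting, not the package's modified variant. Treating each group as a "document" and each gene as a "term" is an assumption made for illustration, and the function name and input layout are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def tf_idf(group_counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Textbook TF-IDF over groups (hypothetical helper, not the package API).

    group_counts maps group name -> {gene: count}.
    Returns group name -> {gene: tf-idf score} for genes with nonzero counts.
    """
    n_groups = len(group_counts)

    # Document frequency: in how many groups does each gene appear at all?
    doc_freq: Counter = Counter()
    for counts in group_counts.values():
        for gene, count in counts.items():
            if count > 0:
                doc_freq[gene] += 1

    scores: Dict[str, Dict[str, float]] = {}
    for group, counts in group_counts.items():
        total = sum(counts.values())
        scores[group] = {
            # tf = share of the group's counts; idf = log(N / document frequency)
            gene: (count / total) * math.log(n_groups / doc_freq[gene])
            for gene, count in counts.items()
            if count > 0
        }
    return scores


if __name__ == "__main__":
    toy = {
        "A": {"g1": 3, "g2": 1},
        "B": {"g2": 4},
        "C": {"g2": 2, "g3": 2},
    }
    for group, genes in tf_idf(toy).items():
        print(group, genes)
```

A gene found in every group gets an inverse document frequency of zero, so only genes concentrated in few groups score highly; that is the property that makes this kind of weighting a natural starting point for group-specific signatures.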
This package helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich, where it is used mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.
This package provides various themes, palettes, and other functions that are used to customise ggplots to look like they were made in GraphPad Prism. The Prism-look is achieved with theme_prism() and scale_fill|colour_prism(), axes can be changed with custom guides like guide_prism_minor(), and significance indicators added with add_pvalue().
R is a language and environment for statistical computing and graphics. It provides a variety of statistical techniques, such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification and clustering. It also provides robust support for producing publication-quality data plots. A large number of third-party packages are available, greatly increasing its breadth and scope.