Tensor Gaussian graphical models (GGMs), which characterize conditional independence structures within tensor data, have important applications in numerous areas. Yet the tensor data available in a single study are often limited due to high acquisition costs. Although related studies can provide additional data, it remains an open question how to pool such heterogeneous data. This package implements a transfer learning framework for tensor GGMs that takes full advantage of informative auxiliary domains, even when non-informative auxiliary domains are present, thanks to carefully designed data-adaptive weights. Reference: Ren, M., Zhen, Y., and Wang, J. (2022). "Transfer learning for tensor graphical models" <arXiv:2211.09391>.
The introduction of the broom package has made converting model objects into data frames as simple as a single function call. While broom focuses on providing tidy data frames that can be used in further analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. pixiedust provides this functionality with a programming interface intended to be similar to ggplot2's system of layers, with fine-tuned control over each cell of the table. Output options include printing to the console and the common markup formats (markdown, HTML, and LaTeX). With a little pixiedust (and happy thoughts), tables can really fly.
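A minimal sketch of the layered workflow, mirroring the pattern in the package's vignettes (dust() creates the table object and each sprinkle() adds a layer of formatting):

```r
# Fit a model, tidy it with dust(), and layer sprinkles for formatting.
library(pixiedust)

fit <- lm(mpg ~ qsec + factor(am), data = mtcars)

dust(fit) %>%
  sprinkle(cols = c("estimate", "std.error", "statistic"), round = 2) %>%
  sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>%
  sprinkle_colnames(term = "Term", p.value = "P-value") %>%
  sprinkle_print_method("console")    # or "markdown", "html", "latex"
```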
Several response-adaptive randomization methods commonly found in the literature are included in this package: the randomized play-the-winner rule for binary endpoints (Wei and Durham (1978) <doi:10.2307/2286290>); the doubly adaptive biased coin design for binary endpoints with the minimal variance strategy (Atkinson and Biswas (2013) <doi:10.1201/b16101>, Rosenberger and Lachin (2015) <doi:10.1002/9781118742112>), the maximal power strategy targeting Neyman allocation (Tymofyeyev, Rosenberger, and Hu (2007) <doi:10.1198/016214506000000906>), and RSIHR allocation, named after the initials of those who first proposed the rule (Youngsook and Hu (2010) <doi:10.1198/sbr.2009.0056>, Bello and Sabo (2016) <doi:10.1080/00949655.2015.1114116>); A-optimal and Aa-optimal allocation for continuous endpoints (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>); generalized RSIHR allocation for continuous endpoints (Atkinson and Biswas (2013) <doi:10.1201/b16101>); Bayesian response-adaptive randomization with a control group using the Thall & Wathen method for binary and continuous endpoints (Thall and Wathen (2007) <doi:10.1016/j.ejca.2007.01.006>); and the forward-looking Gittins index rule for binary and continuous endpoints (Villar, Wason, and Bowden (2015) <doi:10.1111/biom.12337>, Williamson and Villar (2019) <doi:10.1111/biom.13119>).
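To illustrate the simplest of these rules, here is a from-scratch sketch of the randomized play-the-winner urn for a two-arm binary-endpoint trial; it shows the mechanism only and is not this package's interface:

```r
set.seed(1)
rpw <- function(n, p_success = c(0.7, 0.4), alpha = 1, beta = 1) {
  urn <- c(alpha, alpha)                 # initial balls for arms 1 and 2
  arm <- integer(n)
  for (i in seq_len(n)) {
    arm[i] <- sample(1:2, 1, prob = urn / sum(urn))
    success <- rbinom(1, 1, p_success[arm[i]]) == 1
    if (success) {
      urn[arm[i]] <- urn[arm[i]] + beta          # success: reward the same arm
    } else {
      urn[3 - arm[i]] <- urn[3 - arm[i]] + beta  # failure: reward the other arm
    }
  }
  table(arm)   # allocation drifts toward the better-performing arm
}
rpw(200)
```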
An R Shiny application for visual and statistical exploration and web communication of archaeological spatial data, either remains or sites. It offers interactive 3D and 2D visualisations (cross sections and maps of remains, timeline of the work carried out at a site) which can be exported in SVG and HTML formats. It performs simple spatial statistics (convex hull, regression surfaces, 2D kernel density estimation) and allows exporting data to other online applications for more complex methods. archeoViz can be used offline locally or deployed on a server, either with interactive input of data or with a static data set. An example is provided at <https://analytics.huma-num.fr/archeoviz/en>.
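A minimal launch sketch, assuming the package's main entry point is the archeoViz() function and that its lang argument selects the interface language:

```r
# install.packages("archeoViz")
library(archeoViz)

# Start the Shiny app locally with interactive data input; a static
# data set can instead be supplied through the function's arguments.
archeoViz(lang = "en")
```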
Fits tractable fully parametric odds-based regression models for survival data, including proportional odds (PO), accelerated failure time (AFT), accelerated odds (AO), and general odds (GO) models in overall survival frameworks. Given R functions specifying the survivor, hazard rate, and cumulative distribution functions, any user-defined parametric distribution can be fitted. At least seventeen baseline distributions that can handle different failure-rate shapes were applied and evaluated for each of the four odds-based regression models. For more information see Bennett (1983) <doi:10.1002/sim.4780020223> and Muse et al. (2022) <doi:10.1016/j.aej.2022.01.033>.
The original idea was presented in the thesis "A statistical analysis tool for agricultural research", submitted for the degree of Master of Science at the National Engineering University (UNI), Lima, Peru. Some experimental data for the examples come from CIP and other research projects. agricolae offers extensive functionality for experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, alpha, cyclic, complete block, Latin square, Graeco-Latin square, augmented block, factorial, split-plot, and strip-plot designs. There are also various facilities for analyzing experimental data, e.g. treatment comparison procedures, several non-parametric comparison tests, biodiversity indices, and consensus clustering.
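A minimal sketch of two of the tasks described above: planning a randomized complete block design and comparing treatments with an LSD test on a dataset shipped with the package.

```r
library(agricolae)

trt  <- c("A", "B", "C", "D")
plan <- design.rcbd(trt, r = 3, seed = 42)   # field book in plan$book
head(plan$book)

data(sweetpotato)                            # example data from agricolae
model <- aov(yield ~ virus, data = sweetpotato)
out <- LSD.test(model, "virus", p.adj = "bonferroni")
out$groups                                   # treatment means and groupings
```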
These are data sets for the hit TV show, RuPaul's Drag Race. The data currently include episode-level, contestant-level, and episode-contestant-level data. This is a work in progress, and a love letter of sorts to RuPaul's Drag Race and the performers that have appeared on the show. This may not be the most productive use of my time, but I have tenure and what are you going to do about it? I think there is at least some value in this package if it helps the show's fandom learn more about the R programming language through its contents.
This package provides a comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It offers seamless data loading, analysis, and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). immunarch implements most of the widely used AIRR analysis methods, such as clonality analysis, estimation of repertoire similarity in the distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases, and clonotype tracking in vaccination and cancer studies. A successor to our previously published tcR immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
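A short sketch following the package's typical workflow, using the small example dataset shipped with immunarch (function and argument names follow its tutorials as I recall them, so verify against the documentation):

```r
library(immunarch)

data(immdata)  # example dataset: list with $data (repertoires) and $meta
repExplore(immdata$data, .method = "volume")          # basic repertoire statistics
div <- repDiversity(immdata$data, .method = "chao1")  # diversity estimation
vis(div, .by = "Status", .meta = immdata$meta)        # plot, grouped by metadata
```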
Conducts multi-locus genome-wide association studies under the framework of the multi-locus random-SNP-effect mixed linear model (mrMLM). First, each marker on the genome is scanned; Bonferroni correction is replaced by a less stringent criterion for significance testing. Then, all markers potentially associated with the trait are included in a multi-locus genetic model, their effects are estimated by empirical Bayes, and all nonzero effects are further identified by a likelihood ratio test for true QTLs. Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wang SB, Dunwell JM, Zhang YM, Wu R (2018) <doi:10.1093/bib/bbw145>.
Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.
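As a from-scratch illustration of how these families are built from an arbitrary cdf G, take the exponentiated G family, whose cdf is F(x) = G(x)^a and whose pdf is therefore f(x) = a g(x) G(x)^(a-1); here G is taken to be the standard normal cdf (this shows the construction, not this package's interface):

```r
pexpg <- function(q, a = 2) pnorm(q)^a                 # cdf: F(x) = G(x)^a
dexpg <- function(x, a = 2) a * dnorm(x) * pnorm(x)^(a - 1)
qexpg <- function(p, a = 2) qnorm(p^(1 / a))           # invert G(x)^a = p
rexpg <- function(n, a = 2) qexpg(runif(n), a)         # inverse-cdf sampling

integrate(dexpg, -Inf, Inf)   # integrates to 1, as a density should
```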
An object is called an "outlier" if it deviates remarkably from the other objects in a data set. Outlier detection is the process of finding outliers using methods based on distance measures, clustering, and spatial relationships (Ben-Gal, 2005 <ISBN 0-387-24435-2>). It is an intensively studied research topic for identifying novelties, frauds, anomalies, deviations, or exceptions, in addition to its use for outlier removal in data processing. This package provides implementations of some novel approaches for detecting outliers based on the typicality degrees obtained with soft partitioning clustering algorithms such as Fuzzy C-means and its variants.
This package provides functions to estimate and interpret the alpha-NOMINATE ideal point model developed in Carroll et al. (2013, <doi:10.1111/ajps.12029>). alpha-NOMINATE extends traditional spatial voting frameworks by allowing for a mixture of Gaussian and quadratic utility functions, providing flexibility in modeling political actors' preferences. The package uses Markov chain Monte Carlo (MCMC) methods for parameter estimation, supporting robust inference about individuals' ideological positions and the shape of their utility functions. It also contains functions to simulate data from the model and to calculate the probability of a vote passing given the ideal points of the legislators/voters and the estimated locations of the choice alternatives.
The Analytic Hierarchy Process (AHP) is a versatile multi-criteria decision-making tool introduced by Saaty (1987) <doi:10.1016/0270-0255(87)90473-8> that allows decision-makers to weigh attributes and evaluate the alternatives presented to them. This package provides a consistent methodology for researchers to reformat data and run the analytic hierarchy process in R on data formatted using the survey data entry mode. It is optimized for performing the AHP with many decision-makers, and provides tools and options for researchers to aggregate individual preferences and test multiple options. It also allows researchers to quantify, visualize, and correct for inconsistency in decision-makers' comparisons.
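The core computation is easy to sketch from first principles: priority weights come from the principal eigenvector of a Saaty pairwise-comparison matrix, and the consistency ratio compares its consistency index to that of a random matrix (this illustrates the method itself, not this package's interface):

```r
# Pairwise comparisons of three attributes on Saaty's 1-9 scale.
A <- matrix(c(1,   3,   5,
              1/3, 1,   3,
              1/5, 1/3, 1), nrow = 3, byrow = TRUE)

ev <- eigen(A)
w  <- Re(ev$vectors[, 1]); w <- w / sum(w)     # priority weights
lambda_max <- Re(ev$values[1])
CI <- (lambda_max - nrow(A)) / (nrow(A) - 1)   # consistency index
RI <- 0.58                                     # Saaty's random index for n = 3
CR <- CI / RI                                  # CR < 0.1 is conventionally acceptable
round(w, 3); round(CR, 3)
```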
This package contains Bayesian implementations of mixed-effects accelerated failure time (MEAFT) models for censored data. The data can be not only right-censored but also interval-censored, doubly-interval-censored, or misclassified interval-censored. The methods implemented in the package have been published in Komárek and Lesaffre (2006, Stat. Modelling) <doi:10.1191/1471082X06st107oa>, Komárek, Lesaffre and Legrand (2007, Stat. in Medicine) <doi:10.1002/sim.3083>, Komárek and Lesaffre (2007, Stat. Sinica) <https://www3.stat.sinica.edu.tw/statistica/oldpdf/A17n27.pdf>, Komárek and Lesaffre (2008, JASA) <doi:10.1198/016214507000000563>, and García-Zattera, Jara and Komárek (2016, Biometrics) <doi:10.1111/biom.12424>.
This package provides a class of Bayesian beta regression models for the analysis of continuous data with support restricted to an unknown finite interval. The response variable is modeled using a four-parameter beta distribution with the mean or mode parameter depending linearly on covariates through a link function. When the response support is known to be (0,1), this class of models reduces to traditional (0,1)-supported beta regression models. Model choice is carried out via the logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC). See Zhou and Huang (2022) <doi:10.1016/j.csda.2021.107345>.
Generic machine learning inference on heterogeneous treatment effects in randomized experiments, as proposed in Chernozhukov, Demirer, Duflo and Fernández-Val (2020) <arXiv:1712.04802>. This package's workhorse is the mlr3 framework of Lang et al. (2019) <doi:10.21105/joss.01903>, which enables the specification of a wide variety of machine learners. The main function, GenericML(), runs Algorithm 1 of Chernozhukov, Demirer, Duflo and Fernández-Val (2020) <arXiv:1712.04802> for a suite of user-specified machine learners. All steps in the algorithm are customizable via setup functions. Methods for printing and plotting are available for objects returned by GenericML(). Parallel computing is supported.
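A hedged sketch of a GenericML() call on simulated data; the learner strings follow mlr3 naming and the argument names mirror the package documentation as I recall it, so treat the exact signature as an assumption to check:

```r
library(GenericML)

set.seed(1)
n <- 500
Z <- matrix(rnorm(n * 5), n, 5)   # covariates
D <- rbinom(n, 1, 0.5)            # randomized binary treatment
Y <- 2 * D * Z[, 1] + rnorm(n)    # outcome with heterogeneous treatment effect

learners <- c("mlr3::lrn('ranger', num.trees = 100)", "mlr3::lrn('cv_glmnet')")
gml <- GenericML(Z = Z, D = D, Y = Y,
                 learners_GenericML = learners,
                 num_splits = 10)  # more splits are advisable in practice
gml                               # prints BLP, GATES and CLAN estimates
```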
Determining potential output and the output gap, two inherently unobservable variables, is a major challenge for macroeconomists. sectorgap features a flexible modeling and estimation framework for a multivariate Bayesian state space model that identifies economic output fluctuations consistent with subsectors of the economy. The proposed model is able to capture various correlations between output and a set of aggregate as well as subsector indicators. Estimation of the latent states and parameters is achieved using a simple Gibbs sampling procedure, and various plotting options facilitate the assessment of the results. For details on the methodology and an illustrative example, see Streicher (2024) <https://www.research-collection.ethz.ch/handle/20.500.11850/653682>.
An interface for creating, registering, and resolving content-based identifiers for data management. Content-based identifiers rely on cryptographic hashes to refer to the files they identify; thus anyone possessing the file can compute the identifier using a well-known standard algorithm such as SHA-256. By registering a URL at which the content is accessible to a public archive (such as Hash Archive) or depositing the data in a scientific repository such as Zenodo, DataONE, or Software Heritage, the content identifier can serve many of the functions typically associated with a Digital Object Identifier (DOI). Unlike location-based identifiers such as DOIs, content-based identifiers permit the same content to be registered in many locations.
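This description matches the contentid package; a minimal sketch assuming its content_id(), store(), resolve(), and register() verbs:

```r
library(contentid)

f  <- system.file("DESCRIPTION", package = "contentid")
id <- content_id(f)   # "hash://sha256/..." computed from the file's bytes
store(f)              # deposit a copy in the local content store
resolve(id)           # resolve the identifier back to a file path

# register("https://example.com/data.csv")  # publish a URL to a public registry
```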
Efficient design-matrix-free lasso penalized estimation in the large-scale 2- and 3-dimensional generalized linear array model (GLAM) framework. The procedure is based on the gdpg algorithm from Lund et al. (2017) <doi:10.1080/10618600.2017.1279548>. Currently, lasso or smoothly clipped absolute deviation (SCAD) penalized estimation is possible for the following models: the Gaussian model with identity link, the binomial model with logit link, the Poisson model with log link, and the Gamma model with log link. It is also possible to include a component in the model with a non-tensor design, e.g. an intercept. Also provided are the functions glamlassoRR() and glamlassoS(), which fit special cases of GLAMs.
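A hedged sketch of a 2-dimensional Gaussian GLAM lasso fit, where the design is the tensor structure of two marginal matrices; the glamlasso() signature shown follows the package documentation as I recall it, so verify before use:

```r
library(glamlasso)

set.seed(1)
n1 <- 30; n2 <- 20; p1 <- 10; p2 <- 8
X1 <- matrix(rnorm(n1 * p1), n1, p1)       # marginal design, dimension 1
X2 <- matrix(rnorm(n2 * p2), n2, p2)       # marginal design, dimension 2
B  <- matrix(0, p1, p2); B[1:2, 1:2] <- 2  # sparse coefficient array
Y  <- X1 %*% B %*% t(X2) + rnorm(n1 * n2)  # array response

fit <- glamlasso(list(X1, X2), Y, family = "gaussian", penalty = "lasso")
```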
This package provides functions for forest object detection, structure metrics computation, model calibration, and mapping with airborne laser scanning: co-registration of field plots (Monnet and Mermin (2014) <doi:10.3390/f5092307>); tree detection (method 1 in Eysn et al. (2015) <doi:10.3390/f6051721>) and segmentation; forest parameter estimation with the area-based approach, including model calibration with ground reference and map export (Aussenac et al. (2023) <doi:10.12688/openreseurope.15373.2>); and extraction of both physical (gaps, edges, trees) and statistical features useful e.g. for habitat suitability modeling (Glad et al. (2020) <doi:10.1002/rse2.117>) and forest maturity mapping (Fuhr et al. (2022) <doi:10.1002/rse2.274>).
The m:Explorer method associates a given list of target genes (e.g. those involved in a biological process) with gene regulators such as transcription factors. Transcription factors that bind DNA near a significant number of target genes, or whose expression correlates with target genes in transcriptional data (microarray or RNA-seq), are selected. Selection of candidate master regulators is carried out using multinomial regression models, likelihood ratio tests, and multiple testing correction. Reference: m:Explorer: multinomial regression models reveal positive and negative regulators of longevity in yeast quiescence. Juri Reimand, Anu Aun, Jaak Vilo, Juan M Vaquerizas, Juhan Sedman and Nicholas M Luscombe. Genome Biology (2012) 13:R55 <doi:10.1186/gb-2012-13-6-r55>.
Grey models are commonly used for time series forecasting when statistical assumptions are violated or only a limited number of data points is available; the minimum number of observations required to fit a grey model is four. This package fits the first-order, one-variable grey model, i.e. GM(1,1), to multivariate time series data and returns the parameters of the model, model evaluation criteria, and h-step-ahead forecast values for each of the time series variables. For method details see Akay, D. and Atak, M. (2007) <doi:10.1016/j.energy.2006.11.014> and Hsu, L. and Wang, C. (2007) <doi:10.1016/j.techfore.2006.02.005>.
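The GM(1,1) recipe is compact enough to sketch from scratch: accumulate the series, estimate the development coefficient a and grey input b by least squares, then apply the exponential time response. This illustrates the method itself rather than this package's interface:

```r
gm11 <- function(x0, h = 3) {
  n  <- length(x0)
  x1 <- cumsum(x0)                                  # accumulated (AGO) series
  z1 <- 0.5 * (x1[-1] + x1[-n])                     # background values
  B  <- cbind(-z1, 1)
  ab <- solve(t(B) %*% B, t(B) %*% x0[-1])          # least squares for (a, b)
  a <- ab[1]; b <- ab[2]
  k <- 0:(n + h - 1)
  x1_hat <- (x0[1] - b / a) * exp(-a * k) + b / a   # time response function
  x0_hat <- c(x1_hat[1], diff(x1_hat))              # back to the original scale
  list(a = a, b = b,
       fitted = x0_hat[1:n], forecast = x0_hat[(n + 1):(n + h)])
}
gm11(c(560, 603, 668, 715, 780), h = 2)   # 2-step-ahead forecast
```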
Although several tests for normality in stationary processes have been proposed in the literature, consistent implementations of these tests in programming languages are limited. Seven normality tests are implemented: the asymptotic Lobato and Velasco test, the asymptotic Epps test, the Psaradakis and Vávra test, sieve bootstrap approximations of the Lobato-Velasco and Epps tests, the El Bouch et al. test, and the random projections test, all for univariate stationary processes. Some other diagnostics are also performed, such as a unit root test for stationarity, seasonal tests for seasonality, and an ARCH effects test for volatility. Additionally, the El Bouch test covers bivariate time series. The package also offers residual diagnostics for linear time series models developed in several other packages.
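This description matches the nortsTest package; a hedged sketch, with function names taken from its documentation as I recall them (verify before use):

```r
library(nortsTest)

set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 300)  # Gaussian AR(1): the tests
                                                 # should not reject normality
lobato.test(y)   # Lobato and Velasco's asymptotic test
epps.test(y)     # Epps' characteristic-function test
vavra.test(y)    # Psaradakis and Vavra's sieve bootstrap test
rp.test(y)       # random projections test
```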
This package provides a random-forest-based implementation of the method described in Chapter 7.1.2 (regression-model-based anomaly detection) of Chandola et al. (2009) <doi:10.1145/1541880.1541882>. It works as follows: each numeric variable is regressed onto all other variables by a random forest. If the scaled absolute difference between the observed value and the out-of-bag prediction of the corresponding random forest is suspiciously large, the value is considered an outlier. The package offers different options to replace such outliers, e.g. by realistic values found via predictive mean matching. Once the method is trained on reference data, it can be applied to new data.
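The mechanism is easy to demonstrate from scratch with ranger's out-of-bag predictions; this is an illustration of the approach described above for a single variable, not this package's interface:

```r
library(ranger)

d <- iris[, 1:4]
d[1, "Sepal.Width"] <- 9                   # plant an obvious outlier

fit <- ranger(Sepal.Width ~ ., data = d)   # OOB predictions are stored in the fit
res <- d$Sepal.Width - fit$predictions     # observed minus OOB prediction
score <- abs(res) / sd(res)                # scaled absolute difference
which(score > 3)                           # flags the planted outlier in row 1
```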