Carries out model-based clustering, classification and discriminant analysis using five different models. The models are all based on the generalized hyperbolic distribution. The first model, MGHD (Browne and McNicholas (2015) <doi:10.1002/cjs.11246>), is the classical mixture of generalized hyperbolic distributions. The MGHFA (Tortora et al. (2016) <doi:10.1007/s11634-015-0204-z>) is the mixture of generalized hyperbolic factor analyzers for high-dimensional data sets. The MSGHD is the mixture of multiple scaled generalized hyperbolic distributions, the cMSGHD is an MSGHD with convex contour plots, and the MCGHD, the mixture of coalesced generalized hyperbolic distributions, is a new, more flexible model (Tortora et al. (2019) <doi:10.1007/s00357-019-09319-3>). The paper related to the software can be found at <doi:10.18637/jss.v098.i03>.
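A hedged usage sketch (MGHD() is the package's main fitting function; the example data, arguments, and the slot used to read cluster labels are assumptions based on typical usage, not verified against the documentation):

    # Hypothetical call: fit a 3-component GH mixture and inspect clusters.
    # library(MixGHD)
    # fit <- MGHD(data = iris[, 1:4], G = 3)
    # table(fit@map, iris$Species)   # '@map' slot assumed to hold labels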
Estimates the correlation between two irregular time series that are not necessarily sampled at identical time points. The program also applies to two evenly spaced time series that are not on the same time grid. BINCOR is based on a novel estimation approach proposed by Mudelsee (2010, 2014) for estimating the correlation between two climate time series with different timescales. The idea is that autocorrelation (an AR1 process) makes it possible to relate values observed at different time points. BINCOR contains four functions: bin_cor() (the main function, which builds the binned time series), plot_ts() (to plot and compare the irregular and binned time series), cor_ts() (to estimate the correlation between the binned time series) and ccf_ts() (to estimate the cross-correlation between the binned time series).
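A minimal base-R sketch of the underlying idea (not the package's code): bin two irregular series onto a common time grid by averaging within bins, then correlate the binned series.

    set.seed(3)
    t1 <- sort(runif(80, 0, 100)); x1 <- sin(t1 / 10) + rnorm(80, sd = 0.3)
    t2 <- sort(runif(60, 0, 100)); x2 <- sin(t2 / 10) + rnorm(60, sd = 0.3)
    breaks <- seq(0, 100, by = 10)             # common bin edges
    b1 <- tapply(x1, cut(t1, breaks), mean)    # binned series 1
    b2 <- tapply(x2, cut(t2, breaks), mean)    # binned series 2
    cor(b1, b2, use = "complete.obs")          # correlation of binned values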
An implementation of common statistical analysis and models with differential privacy guarantees (Dwork et al., 2006a) <doi:10.1007/11681878_14>. The package contains, for example, functions providing differentially private computations of mean, variance, median, histograms, and contingency tables. It also implements some statistical models and machine learning algorithms such as linear regression (Kifer et al., 2012) <https://proceedings.mlr.press/v23/kifer12.html> and SVM (Chaudhuri et al., 2011) <https://jmlr.org/papers/v12/chaudhuri11a.html>. In addition, it implements some popular randomization mechanisms, including the Laplace mechanism (Dwork et al., 2006a) <doi:10.1007/11681878_14>, Gaussian mechanism (Dwork et al., 2006b) <doi:10.1007/11761679_29>, analytic Gaussian mechanism (Balle & Wang, 2018) <https://proceedings.mlr.press/v80/balle18a.html>, and exponential mechanism (McSherry & Talwar, 2007) <doi:10.1109/FOCS.2007.66>.
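As a concrete illustration of the Laplace mechanism (a generic base-R sketch, not this package's API): a differentially private mean adds Laplace noise with scale sensitivity/epsilon.

    # Release an epsilon-DP mean of values assumed to lie in [lower, upper].
    dp_mean <- function(x, epsilon, lower = 0, upper = 1) {
      x <- pmin(pmax(x, lower), upper)            # clamp to the assumed range
      sensitivity <- (upper - lower) / length(x)  # L1 sensitivity of the mean
      b <- sensitivity / epsilon                  # Laplace scale
      noise <- rexp(1, 1 / b) - rexp(1, 1 / b)    # Laplace(0, b) draw
      mean(x) + noise
    }
    set.seed(1)
    dp_mean(runif(100), epsilon = 0.5)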
Post-construction fatality monitoring studies at wind facilities are based on data from searches for bird and bat carcasses in plots beneath turbines. Because carcasses often fall outside of the searched area, area correction (AC) factors are estimated to account for the proportion of fatalities that fall within the searched area versus outside of it. This package provides two likelihood-based methods and one physics-based method (Hull and Muir (2010) <doi:10.1080/14486563.2010.9725253>, Huso and Dalthorp (2014) <doi:10.1002/jwmg.663>) to estimate the carcass fall distribution. There are also functions for calculating the proportion of area searched within one-unit annuli, log-logistic distribution functions, and truncated distribution functions.
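A toy illustration of the area-correction idea (the numbers and the gamma fall-distance model are assumptions for illustration, not the package's methods):

    search_radius <- 60               # assumed search-plot radius (m)
    found <- 12                       # carcasses found inside the plot
    shape <- 2.5; rate <- 0.05        # assumed fitted fall-distance model
    p_in <- pgamma(search_radius, shape = shape, rate = rate)
    found / p_in                      # AC-adjusted total fatality estimate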
The BUSseq R package fits an interpretable Bayesian hierarchical model---Batch Effects Correction with Unknown Subtypes for scRNA-seq Data (BUSseq)---to correct batch effects in the presence of unknown cell types. BUSseq simultaneously corrects batch effects, clusters cell types, and accounts for the count nature, overdispersion, dropout events, and cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected values can be used for downstream analysis as if all cells had been sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allows cell types to be measured in some but not all of the batches, as long as the experimental design fulfills the conditions listed in our manuscript.
This package contains a function that imports data from a CSV file, or uses manually entered data in the format (x, y, weight), and plots the appropriate ACC vs. LOI graph and LMA graph. The main function is plotLMA(sourcefile, header), which takes a data set and plots the appropriate LMA and ACC graphs. If no source file (a string) is passed, a manual data-entry window opens. The header parameter indicates by TRUE/FALSE (FALSE by default) whether the source CSV file has a header row. The data set should contain only one independent variable (x) and one dependent variable (y), and can contain a weight for each observation.
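A hedged usage sketch based on the description above (the file name is hypothetical):

    # plotLMA("experiment.csv", TRUE)  # CSV file with a header row
    # plotLMA()                        # no source file: manual-entry window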
This package covers many important models used in marketing and micro-econometrics applications, including Bayes Regression (univariate or multivariate dependent variable), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture-of-normals prior and covariates, Hierarchical Multinomial Logits with a mixture-of-normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, analysis of multivariate ordinal survey data with scale usage heterogeneity, and Bayesian analysis of aggregate random coefficient logit models.
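A hedged sketch of the univariate Bayes regression, assuming the Data/Mcmc list convention described in the package documentation:

    library(bayesm)
    set.seed(66)
    X <- cbind(1, runif(200))                   # design matrix with intercept
    y <- as.vector(X %*% c(1, 2) + rnorm(200))
    out <- runireg(Data = list(y = y, X = X), Mcmc = list(R = 2000))
    colMeans(out$betadraw)                      # posterior means near c(1, 2)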
This package provides a set of functions to access the ARDECO (Annual Regional Database of the European Commission) data directly from the official ARDECO public repository through the ARDECO APIs. The APIs are completely transparent to the user, and the functions give direct access to the ARDECO data. The ARDECO database is a collection of variables related to demography, employment, the labour market, domestic product, and capital formation. Each variable can be exposed in one or more units of measure, and refers to total values plus additional dimensions such as economic sectors, gender, and age classes. Data can also be aggregated at the country level according to the tercet classes defined by EUROSTAT. A description of the ARDECO database can be found at <https://urban.jrc.ec.europa.eu/ardeco>.
MDS is a statistical tool for dimensionality reduction, taking as input a distance matrix of dimensions n × n. When n is large, classical algorithms suffer from computational problems and the MDS configuration cannot be obtained. This package addresses these problems by means of six algorithms, two of which are original proposals: - Landmark MDS, proposed by De Silva V. and JB. Tenenbaum (2004). - Interpolation MDS, proposed by Delicado P. and C. Pachón-García (2021) <arXiv:2007.11919> (original proposal). - Reduced MDS, proposed by Paradis E. (2018). - Pivot MDS, proposed by Brandes U. and C. Pich (2007). - Divide-and-conquer MDS, proposed by Delicado P. and C. Pachón-García (2021) <arXiv:2007.11919> (original proposal). - Fast MDS, proposed by Yang, T., J. Liu, L. McMillan and W. Wang (2006).
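A hedged usage sketch (the function and argument names are assumptions based on the package documentation, not verified):

    # library(bigmds)
    # x <- matrix(rnorm(10000 * 10), nrow = 10000)
    # fit <- divide_conquer_mds(x = x, l = 200, c_points = 5, r = 2)
    # head(fit$points)   # low-dimensional configuration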
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Due to the small effect sizes of common variants, the power to detect individual risk variants is generally low. Complementary to SNP-level analysis, a variety of gene-based association tests have been proposed. However, the power of existing gene-based tests often depends on the underlying genetic model, and it is not known a priori which test is optimal. The COMBined Association Test (COMBAT) incorporates strengths from multiple existing gene-based tests, including VEGAS, GATES and simpleM. Compared to the individual tests, COMBAT shows higher overall performance and robustness across a wide range of genetic models. The algorithm behind this method is described in Wang et al. (2017) <doi:10.1534/genetics.117.300257>.
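As a toy illustration of combining gene-based tests (a Bonferroni-adjusted minimum-p rule; COMBAT's actual combination additionally accounts for the correlation among the tests):

    p_tests <- c(VEGAS = 0.012, GATES = 0.004, simpleM = 0.020)
    p_combined <- min(1, length(p_tests) * min(p_tests))
    p_combined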
Chaos theory has been hailed as a revolution in thought and has attracted ever-increasing attention from scientists across diverse disciplines. Chaotic systems are nonlinear deterministic dynamic systems that can behave in an erratic, apparently random fashion. A relevant field within chaos theory and nonlinear time series analysis is the detection of chaotic behaviour from empirical time series data. One of the main features of chaos is the well-known sensitivity to initial conditions. Methods for testing the hypothesis of chaos quantify this sensitivity by estimating the Lyapunov exponents. The DChaos package provides useful tools and efficient algorithms that robustly test the hypothesis of chaos based on the Lyapunov exponent, in order to determine whether the data-generating process behind a time series behaves chaotically or not.
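A base-R illustration of the underlying quantity (not the package's estimator): the largest Lyapunov exponent of the logistic map, computed as the average log absolute derivative along an orbit; a positive value indicates sensitivity to initial conditions.

    r <- 4; x <- 0.2; n <- 10000
    lyap <- 0
    for (t in seq_len(n)) {
      lyap <- lyap + log(abs(r * (1 - 2 * x)))  # log |f'(x_t)|
      x <- r * x * (1 - x)                      # iterate the map
    }
    lyap / n   # approximately log(2) for r = 4, a chaotic regime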
Reliability analysis and maintenance optimization using Hidden Markov Models (HMMs). HMMs are used to model the state of a system that is not directly observable; instead, certain indicators (signals) of the true situation are provided via a control system. A hidden model can provide key information about system dependability, such as the reliability of the system and related measures. An estimation procedure based on the Baum-Welch algorithm is implemented. Classical structures such as K-out-of-N systems and shock models are illustrated. Finally, maintenance of the system is considered in the HMM context, and two functions for new preventive maintenance strategies are provided. Maintenance efficiency is measured in terms of expected cost. Methods are described in Gamiz, Limnios, and Segovia-Garcia (2023) <doi:10.1016/j.ejor.2022.05.006>.
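A plain Markov chain illustration of the reliability measure involved (a simplification; the package works with hidden states inferred from signals):

    # States: 1 = good, 2 = degraded, 3 = failed (absorbing).
    P <- matrix(c(0.95, 0.04, 0.01,
                  0.00, 0.90, 0.10,
                  0.00, 0.00, 1.00), nrow = 3, byrow = TRUE)
    p0 <- c(1, 0, 0)                          # start in the good state
    reliability <- function(t) {
      Pt <- diag(3)
      for (i in seq_len(t)) Pt <- Pt %*% P    # t-step transition matrix
      sum((p0 %*% Pt)[1:2])                   # mass still in working states
    }
    sapply(c(1, 5, 10, 20), reliability)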
This package provides a computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. It integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships. It does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). It can be used to:
detect allele-specific copy number variations from single cells;
differentiate tumor versus normal cells in the tumor microenvironment;
infer the clonal architecture and evolutionary history of profiled tumors.
For details on the method see Gao et al. (2022) in Nature Biotechnology.
The method of synthetic controls is a widely adopted tool for evaluating causal effects of policy changes in settings with observational data. In many settings where it is applicable, researchers want to identify causal effects of policy changes on a treated unit at an aggregate level while having access to data at a finer granularity. This package implements a simple extension of the synthetic controls estimator, developed in Gunsilius (2023) <doi:10.3982/ECTA18260>, that takes advantage of this additional structure and provides nonparametric estimates of the heterogeneity within the aggregate unit. The idea is to replicate the quantile function associated with the treated unit by a weighted average of quantile functions of the control units. The package contains tools for aggregating and plotting the resulting distributional estimates, as well as for carrying out inference on them.
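A base-R sketch of the core idea (not the package's estimator): match the treated unit's quantile function with a weighted average of control quantile functions; plain least squares stands in here for the method's simplex-constrained weights.

    set.seed(42)
    treated  <- rnorm(500, mean = 1)
    controls <- cbind(rnorm(500, 0.5), rnorm(500, 2), rnorm(500, -1))
    tau <- seq(0.05, 0.95, by = 0.05)
    Q  <- apply(controls, 2, quantile, probs = tau)  # control quantiles
    q1 <- quantile(treated, probs = tau)             # treated quantiles
    w  <- coef(lm(q1 ~ Q - 1))  # unconstrained stand-in for simplex weights
    round(w, 2)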
The HIGHT (HIGh security and light weigHT) algorithm is a block cipher developed to provide confidentiality in computing environments that demand low power consumption and light weight, such as RFID (Radio-Frequency Identification) and USN (Ubiquitous Sensor Network) devices, or in mobile environments such as smartphones and smart cards. It has a simple structure that can be implemented with basic arithmetic operations, XOR, and circular shifts in 8-bit units. The algorithm was designed to balance security and efficiency in a very simple structure suitable for constrained environments, compared to the earlier 128-bit encryption algorithm SEED. In December 2010, it became an ISO (International Organization for Standardization) standard. The detailed procedure is described in Hong et al. (2006) <doi:10.1007/11894063_4>.
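For illustration, one of those 8-bit building blocks written in base R (a concept sketch, not the package's implementation):

    # 8-bit circular left rotation, keeping values in 0..255.
    rotl8 <- function(x, n) {
      n <- n %% 8
      bitwAnd(bitwOr(bitwShiftL(x, n), bitwShiftR(x, 8 - n)), 255)
    }
    rotl8(0xB5, 3)   # 181 rotated left by 3 gives 173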
Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.
Allows users to calculate pairwise Nei's genetic distances (Nei 1972), pairwise fixation indexes (Fst) (Weir & Cockerham 1984), and genomic relationship matrices following Yang et al. (2010) in mixed and single-ploidy populations. Bootstrapping across loci is implemented during Fst calculation to generate confidence intervals and p-values around pairwise Fst values. StAMPP utilises SNP genotype data of any ploidy level (with the ability to handle missing data) and is coded to use multithreading where available, allowing efficient analysis of large datasets. StAMPP can handle genotype data from genlight objects, allowing integration with other packages such as adegenet. Please refer to LW Pembleton, NOI Cogan & JW Forster (2013), Molecular Ecology Resources, 13(5), 946-952 <doi:10.1111/1755-0998.12129> for the appropriate citation and user manual.
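A hedged usage sketch (argument names are assumptions from the package documentation; 'geno' stands for a genlight object or StAMPP-formatted genotype data):

    # library(StAMPP)
    # fst <- stamppFst(geno, nboots = 100, percent = 95, nclusters = 2)
    # fst$Fsts      # pairwise Fst matrix
    # fst$Pvalues   # bootstrap p-values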
Analyzing Inductively Coupled Plasma - Mass Spectrometry (ICP-MS) measurement data to evaluate isotope ratios (IRs) is a complex process. The IsoCor package facilitates this process and renders it reproducible by providing a function to run a 'Shiny' app locally in any web browser. In this app the user can upload data files of various formats, select ion traces, apply peak detection, and calculate IRs and delta values. Results are provided as figures and tables and can be exported. The app therefore facilitates data processing of ICP-MS experiments and helps users obtain optimal processing parameters quickly, compared to traditional 'Excel' worksheet-based approaches. A more detailed description can be found in the corresponding article <doi:10.1039/D2JA00208F>. The most recent version of IsoCor can be tested online at <https://apps.bam.de/shn00/IsoCor/>.
The rapid screening of effective and optimal therapies from large numbers of candidate combinations, as well as exploring subgroup efficacy, remains challenging, and necessitates innovative, integrated, and efficient trial designs (Yuan, Y., et al. (2016) <doi:10.1002/sim.6971>). The MIDAS-2 package enables quick and continuous screening of promising combination strategies and exploration of their subgroup effects within a unified platform design framework. A regression model is used to characterize the efficacy pattern in subgroups. Information borrowing is applied through a Bayesian hierarchical model to improve trial efficiency, given the limited sample sizes in subgroups (Cunanan, K. M., et al. (2019) <doi:10.1177/1740774518812779>). MIDAS-2 provides an adaptive drug screening and subgroup exploration framework to accelerate immunotherapy development in an efficient, accurate, and integrated fashion (Wathen, J. K., & Thall, P. F. (2017) <doi:10.1177/1740774517692302>).
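A toy illustration of information borrowing across subgroups (generic normal-normal shrinkage, not the MIDAS-2 model; all numbers are made up):

    p_hat <- c(0.40, 0.25, 0.10); n <- c(12, 30, 8)   # subgroup estimates
    se2  <- p_hat * (1 - p_hat) / n                   # sampling variances
    tau2 <- 0.01                         # assumed between-subgroup variance
    mu   <- weighted.mean(p_hat, 1 / (se2 + tau2))    # overall mean
    (p_hat / se2 + mu / tau2) / (1 / se2 + 1 / tau2)  # shrunken estimates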
This package provides a suite of non-parametric, visual tools for assessing differences in data structures for two datasets that contain different observations of the same variables. These tools are all based on Principal Component Analysis (PCA) and thus effectively address differences in the structures of the covariance matrices of the two datasets. The PCASDC tools consist of easy-to-use, intuitive plots that each focus on different aspects of the PCA decompositions. The cumulative eigenvalue (CE) plot describes differences in the variance components (eigenvalues) of the deconstructed covariance matrices. The angle plot presents the information loss when moving from the PCA decomposition of one dataset to the PCA decomposition of the other. The chroma plot describes the loading patterns of the two datasets, thereby presenting the relative weighting and importance of the variables from the original dataset.
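A base-R sketch of the quantity behind a CE-style comparison (not the package's plotting code):

    set.seed(7)
    A <- matrix(rnorm(200 * 5), ncol = 5)
    B <- matrix(rnorm(200 * 5), ncol = 5) %*% diag(c(3, 2, 1, 1, 1))
    ce <- function(X) {
      ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
      cumsum(ev) / sum(ev)              # cumulative share of variance
    }
    round(rbind(A = ce(A), B = ce(B)), 3)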
Tests the homogeneity of intraclass kappa statistics obtained from independent studies or from a stratified study with binary results. It is often desired to compare the kappa statistics obtained in multi-center studies or in a single stratified study in order to give a common or summary kappa using all available information. If the homogeneity test of these kappa statistics is not rejected, then it is possible to make inferences on a single kappa statistic that summarizes all the studies. See Muammer Albayrak, Kemal Turhan, Yasemin Yavuz, Zeliha Aydin Kasap (2019) <doi:10.1080/03610918.2018.1538457>; Jun-mo Nam (2003) <doi:10.1111/j.0006-341X.2003.00118.x>; Jun-mo Nam (2005) <doi:10.1002/sim.2321>; Mousumi Banerjee, Michelle Capozzoli, Laura McSweeney, Debajyoti Sinha (1999) <doi:10.2307/3315487>; Allan Donner, Michael Eliasziw, Neil Klar (1996) <doi:10.2307/2533154>.
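A generic inverse-variance homogeneity test in base R (the standard meta-analytic Q statistic, shown for illustration; not necessarily the exact statistic implemented here):

    kappa_hat <- c(0.55, 0.62, 0.48)             # study-level kappas
    var_hat   <- c(0.010, 0.008, 0.012)          # their estimated variances
    w <- 1 / var_hat
    kappa_common <- sum(w * kappa_hat) / sum(w)  # summary kappa
    Q <- sum(w * (kappa_hat - kappa_common)^2)   # chi-square, df = k - 1
    pchisq(Q, df = length(kappa_hat) - 1, lower.tail = FALSE)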
This package provides an object type and associated tools for storing and wrangling panel data. It implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, it automates the "within-between" (also known as "between-within" and "hybrid") panel regression specification, which combines the desirable aspects of both fixed-effects and random-effects econometric models, and fits these as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>), and Bayesian estimation is (optionally) supported via 'Stan'. The package supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>), as well as a generalized linear model extension thereof using GEE.
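A hedged usage sketch (assuming the WageData example data and the formula convention shown in the package documentation):

    # library(panelr)
    # wages <- panel_data(WageData, id = id, wave = t)
    # model <- wbm(lwage ~ wks + union | blk + fem, data = wages)
    # summary(model)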
An implementation of Quantitative Fatty Acid Signature Analysis (QFASA) in R. QFASA is a method of estimating the diet composition of predators. The fundamental unit of information in QFASA is a fatty acid signature (signature), which is a vector of proportions describing the composition of fatty acids within lipids. Signature data from at least one predator and from samples of all potential prey types are required. Calibration coefficients, which adjust for the differential metabolism of individual fatty acids by predators, are also required. Given those data inputs, a predator signature is modeled as a mixture of prey signatures and its diet estimate is obtained as the mixture that minimizes a measure of distance between the observed and modeled signatures. A variety of estimation options and simulation capabilities are implemented. Please refer to the vignette for additional details and references.
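A base-R sketch of the estimation idea (toy data; the package itself offers several distance measures, calibration coefficients, and a constrained optimizer):

    prey <- cbind(herring   = c(0.5, 0.3, 0.2),  # toy fatty-acid signatures
                  capelin   = c(0.2, 0.5, 0.3),
                  sandlance = c(0.3, 0.2, 0.5))
    pred <- c(0.35, 0.35, 0.30)                  # predator signature
    obj <- function(theta) {
      w <- exp(theta) / sum(exp(theta))          # simplex weights
      sum((pred - prey %*% w)^2)                 # squared distance
    }
    theta_hat <- optim(rep(0, 3), obj)$par
    round(exp(theta_hat) / sum(exp(theta_hat)), 3)  # estimated diet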
This package performs receptor abundance estimation for single cell RNA-sequencing data using a supervised feature selection mechanism and a thresholded gene set scoring procedure. Seurat's normalization method is described in: Hao et al., (2021) <doi:10.1016/j.cell.2021.04.048>, Stuart et al., (2019) <doi:10.1016/j.cell.2019.05.031>, Butler et al., (2018) <doi:10.1038/nbt.4096> and Satija et al., (2015) <doi:10.1038/nbt.3192>. Method for reduced rank reconstruction and rank-k selection is detailed in: Javaid et al., (2022) <doi:10.1101/2022.10.08.511197>. Gene set scoring procedure is described in: Frost et al., (2020) <doi:10.1093/nar/gkaa582>. Clustering method is outlined in: Song et al., (2020) <doi:10.1093/bioinformatics/btaa613> and Wang et al., (2011) <doi:10.32614/RJ-2011-015>.