RocksDB is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database. RocksDB is partially based on LevelDB
.
This is an open-source implementation of the Congruent Matching Profile Segments (CMPS) method (Chen et al. 2019)<doi:10.1016/j.forsciint.2019.109964>. In general, it can be used for objective comparison of striated tool marks, and in our examples, we specifically use it for bullet signatures comparisons. The CMPS score is expected to be large if two signatures are similar. So it can also be considered as a feature that measures the similarity of two bullet signatures.
This package implements the dynamic logistic state space model for binary outcome data proposed by Jiang et al. (2021) <doi:10.1111/biom.13593>. It provides a computationally efficient way to update the prediction whenever new data becomes available. It allows for both time-varying and time-invariant coefficients, and use cubic smoothing splines to model varying coefficients. The smoothing parameters are objectively chosen by maximum likelihood. The model is updated using batch data accumulated at pre-specified time intervals.
This package provides methods to apply decomposition-based relative importance analysis for R functions. This package supports the application of decomposition methods by providing lapply'- or Map'-like meta-functions that compute dominance analysis (Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129>; Grömping, U. (2007) <doi:10.1198/000313007X188252>) an extension of Shapley value regression (Lipovetsky, S., & Conklin, M. (2001) <doi:10.1002/asmb.446>) based on the values returned from other functions.
The goal of GHCNr is to provide a fast and friendly interface with the Global Historical Climatology Network daily (GHCNd) database, which contains daily summaries of weather station data worldwide (<https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily>). GHCNd is accessed through the web API <https://www.ncei.noaa.gov/access/services/data/v1>. GHCNr main functionalities consist of downloading data from GHCNd, filter it, and to aggregate it at monthly and annual scales.
There is an increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. In this package, we present a novel global testing method called aMiSPU
, that is highly adaptive and thus high powered across various scenarios, alleviating the issue with the choice of a phylogenetic distance. Our simulations and real data analysis demonstrated that aMiSPU
test was often more powerful than several competing methods while correctly controlling type I error rates.
This package provides tools for analyzing Pakistan's Population Censuses data via the PakPC2023
and PakPC2017
R packages. Designed for researchers, policymakers, and professionals, the app enables in-depth numerical and graphical analysis, including detailed cross-tabulations and insights. With diverse statistical models and visualization options, it supports informed decision-making in social and economic policy. This tool enhances users ability to explore and interpret census data, providing valuable insights for effective planning and analysis across various fields.
Fit a latent embedding multivariate regression (LEMUR) model to multi-condition single-cell data. The model provides a parametric description of single-cell data measured with treatment vs. control or more complex experimental designs. The parametric model is used to (1) align conditions, (2) predict log fold changes between conditions for all cells, and (3) identify cell neighborhoods with consistent log fold changes. For those neighborhoods, a pseudobulked differential expression test is conducted to assess which genes are significantly changed.
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, vizualisation, imputation etc are available.
spiky implements methods and model generation for cfMeDIP
(cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP
is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP
to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.
AS (alternative splicing) is a common mechanism of post-transcriptional gene regulation in eukaryotic organisms that expands the functional and regulatory diversity of a single gene by generating multiple mRNA isoforms that encode structurally and functionally distinct proteins. ASpli is an integrative pipeline and user-friendly R package that facilitates the analysis of changes in both annotated and novel AS events. ASpli integrates several independent signals in order to deal with the complexity that might arise in splicing patterns.
UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.
Univariate and multivariate methods to analyze randomized response (RR) survey designs (e.g., Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63â 69, <doi:10.2307/2283137>). Besides univariate estimates of true proportions, RR variables can be used for correlations, as dependent variable in a logistic regression (with or without random effects), or as predictors in a linear regression (Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software, 85(2), 1â 29, <doi:10.18637/jss.v085.i02>). For simulations and the estimation of statistical power, RR data can be generated according to several models. The implemented methods also allow to test the link between continuous covariates and dishonesty in cheating paradigms such as the coin-toss or dice-roll task (Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior Research Methods, 49, 724â 732, <doi:10.3758/s13428-016-0729-x>).
Several cubic spline interpolation methods of H. Akima for irregular and regular gridded data are available through this package, both for the bivariate case (irregular data: ACM 761, regular data: ACM 760) and univariate case (ACM 433 and ACM 697). Linear interpolation of irregular gridded data is also covered by reusing D. J. Renkas triangulation code which is part of Akimas Fortran code. A bilinear interpolator for regular grids was also added for comparison with the bicubic interpolator on regular grids.
This package contains Coverage Adjusted Standardized Mutual Information ('CASMI')-based functions. CASMI is a fundamental concept of a series of methods. For more information about CASMI and CASMI'-related methods, please refer to the corresponding publications (e.g., a feature selection method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.3390/e21121179>, and a dataset quality measurement method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.1109/ICHI.2019.8904553>) or contact the package author for the latest updates.
This package implements Dirichlet multinomial modeling of relative abundance data using functionality provided by the Stan software. The purpose of this package is to provide a user friendly way to interface with Stan that is suitable for those new to modeling. For more regarding the modeling mathematics and computational techniques we use see our publication in Molecular Ecology Resources titled Dirichlet multinomial modeling outperforms alternatives for analysis of ecological count data (Harrison et al. 2020 <doi:10.1111/1755-0998.13128>).
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing an homogeneously dispersed population of identical objects. This package implements a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. This can be used for a variety of statistical problems involving absolute quantification under uncertainty. See Comoglio et al. (2013) <doi:10.1371/journal.pone.0074388>.
Weighted frequency and contingency tables of categorical variables and of the comparison of the mean value of a numerical variable by the levels of a factor, and methods to produce xtable objects of the tables and to plot them. There are also functions to facilitate the character encoding conversion of objects, to quickly convert fixed width files into csv ones, and to export a data.frame to a text file with the necessary R and SPSS codes to reread the data.
Primarily devoted to implementing the Univariate Bootstrap (as well as the Traditional Bootstrap). In addition there are multiple functions for DeFries-Fulker
behavioral genetics models. The univariate bootstrapping functions, DeFries-Fulker
functions, regression and traditional bootstrapping functions form the original core. Additional features may come online later, however this software is a work in progress. For more information about univariate bootstrapping see: Lee and Rodgers (1998) and Beasley et al (2007) <doi:10.1037/1082-989X.12.4.414>.
Accurate estimates of the diets of predators are required in many areas of ecology, but for many species current methods are imprecise, limited to the last meal, and often biased. The diversity of fatty acids and their patterns in organisms, coupled with the narrow limitations on their biosynthesis, properties of digestion in monogastric animals, and the prevalence of large storage reservoirs of lipid in many predators, led to the development of quantitative fatty acid signature analysis (QFASA) to study predator diets.
We provide a comprehensive scheme that is capable of simulating Single Cell RNA Sequencing data for various parameters of Biological Coefficient of Variation, busting kinetics, differential expression (DE), cell or sample groups, cell trajectory, batch effect and other experimental designs. SCRIP proposed and compared two frameworks with Gamma-Poisson and Beta-Gamma-Poisson models for simulating Single Cell RNA Sequencing data. Other reference is available in Zappia et al. (2017) <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1305-0>.
Identify statistically significant flow clusters using the local spatial network autocorrelation statistic G_ij* proposed by Berglund and Karlström (1999) <doi:10.1007/s101090050013>. The metric, an extended statistic of Getis/Ord G ('Getis and Ord 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>, detects a group of flows having similar traits in terms of directionality. You provide OD data and the associated polygon to get results with several parameters, some of which are defined by spdep package.
This package provides methods for sensory discrimination methods; duotrio, tetrad, triangle, 2-AFC, 3-AFC, A-not A, same-different, 2-AC and degree-of-difference. This enables the calculation of d-primes, standard errors of d-primes, sample size and power computations, and comparisons of different d-primes. Methods for profile likelihood confidence intervals and plotting are included. Most methods are described in Brockhoff, P.B. and Christensen, R.H.B. (2010) <doi:10.1016/j.foodqual.2009.04.003>.
xcore is an R package for transcription factor activity modeling based on known molecular signatures and user's gene expression data. Accompanying xcoredata package provides a collection of molecular signatures, constructed from publicly available ChiP-seq
experiments. xcore use ridge regression to model changes in expression as a linear combination of molecular signatures and find their unknown activities. Obtained, estimates can be further tested for significance to select molecular signatures with the highest predicted effect on the observed expression changes.