clustSIGNAL: clustering of Spatially Informed Gene expression with Neighbourhood Adapted Learning. A tool for adaptively smoothing and clustering gene expression data. clustSIGNAL uses entropy to measure heterogeneity of cell neighbourhoods and performs a weighted, adaptive smoothing, where homogeneous neighbourhoods are smoothed more and heterogeneous neighbourhoods are smoothed less. This not only overcomes data sparsity but also incorporates spatial context into the gene expression data. The resulting smoothed gene expression data is used for clustering and could be used for other downstream analyses.
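The entropy-weighted smoothing described above can be illustrated with a minimal Python sketch. This is not clustSIGNAL's implementation: the function names, the exponential weight decay, and the `decay_scale` parameter are assumptions chosen only to show how a homogeneous neighbourhood (low entropy) can be smoothed more than a heterogeneous one (high entropy).

```python
import math
from collections import Counter


def neighbourhood_entropy(labels):
    """Shannon entropy (in bits) of the cluster labels in one neighbourhood."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)


def smoothing_weights(n_cells, entropy, max_entropy, decay_scale=1.0):
    """Weights over cells ordered from the central cell outwards.

    Assumption: weights decay exponentially with neighbour rank, and the
    decay rate grows with entropy. Zero entropy gives uniform weights
    (strong smoothing); high entropy concentrates weight on the central cell.
    """
    rate = decay_scale * entropy / max_entropy if max_entropy > 0 else 0.0
    raw = [math.exp(-rate * rank) for rank in range(n_cells)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_expression(values, labels, n_clusters):
    """Weighted average of one gene's expression over a neighbourhood.

    values[0] and labels[0] belong to the central cell; the remaining
    entries are its neighbours sorted by increasing distance.
    """
    entropy = neighbourhood_entropy(labels)
    max_entropy = math.log2(n_clusters)
    weights = smoothing_weights(len(values), entropy, max_entropy)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical data: the central cell expresses the gene, its neighbours do not.
expression = [4.0, 0.0, 0.0, 0.0]
homogeneous = smooth_expression(expression, ["a", "a", "a", "a"], n_clusters=4)
heterogeneous = smooth_expression(expression, ["a", "b", "c", "d"], n_clusters=4)
```

In this toy case the homogeneous neighbourhood is averaged uniformly, while the heterogeneous one keeps the smoothed value closer to the central cell's own expression.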
Provides functions for detecting and removing group-level effects from high-dimensional scientific data which, when combined with additional assumptions, allow for causal conclusions, as described in our manuscripts Bridgeford et al. (2024) <doi:10.1101/2021.09.03.458920> and Bridgeford et al. (2023) <doi:10.48550/arXiv.2307.13868>. Also provides utilities for generating simulations and for balancing covariates across multiple groups or batches of data via matching and propensity trimming for more than two groups.
Diagnose, visualize, and aggregate event report level data to the event level. Users provide an event report level dataset, specify their aggregation rules, and the package produces a dataset aggregated at the event level. Also includes the Modes and Agents of Election-Related Violence in Côte d'Ivoire and Kenya (MAVERICK) dataset, an event report level dataset that records all documented instances of electoral violence from the first multiparty election to 2022 in Côte d'Ivoire (1995-2022) and Kenya (1992-2022).
This package implements methods in Mathur and VanderWeele (in preparation) to characterize global evidence strength across W correlated ordinary least squares (OLS) hypothesis tests. Specifically, uses resampling to estimate a null interval for the total number of rejections in, for example, 95% of samples generated with no associations (the global null), the excess hits (the difference between the observed number of rejections and the upper limit of the null interval), and a test of the global null based on the number of rejections.
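The three summary quantities named above (null interval, excess hits, global test) can be sketched in Python as follows. The sketch assumes the rejection counts from resamples generated under the global null are already available; how those resamples are produced is outside its scope. The function name, the empirical-quantile rule, and the p-value definition (the proportion of null resamples with at least as many rejections as observed) are illustrative assumptions, not the package's actual code.

```python
import math


def summarise_rejections(observed, null_counts, level=0.95):
    """Summarise an observed rejection count against null resamples.

    Returns (upper_limit, excess_hits, p_value), where upper_limit is the
    smallest count that at least `level` of the null resamples do not exceed.
    """
    ordered = sorted(null_counts)
    n = len(ordered)
    # Small tolerance guards against floating-point error in level * n.
    index = min(n - 1, math.ceil(level * n - 1e-9) - 1)
    upper_limit = ordered[index]
    excess_hits = observed - upper_limit
    # Assumed test: share of null resamples with at least `observed` rejections.
    p_value = sum(count >= observed for count in null_counts) / n
    return upper_limit, excess_hits, p_value


# Hypothetical rejection counts from 20 resamples under the global null.
null_counts = [0] * 10 + [1] * 6 + [2] * 3 + [5]
upper_limit, excess_hits, p_value = summarise_rejections(5, null_counts)
```

With these made-up counts, 95% of the null resamples have at most 2 rejections, so an observed count of 5 corresponds to 3 excess hits.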
An R wrapper for the OneMap.Sg API <https://www.onemap.gov.sg/docs/>. Functions help users query data from the API and return raw JSON data in "tidy" formats. Support is also available for users to retrieve data from multiple API calls and integrate results into single dataframes, without needing to clean and merge the data themselves. This package is best suited for users who would like to perform analyses with Singapore's spatial data without having to perform excessive data cleaning.
SKIFTI files contain brain imaging data, representing brain white matter intensity values, in coordinates across the Tract-Based Spatial Statistics (TBSS) skeleton. skiftiTools provides a unified environment for reading, writing, visualizing, and manipulating SKIFTI-format data. It supports subsetting, concatenating, and using the data as a data.frame with R statistical functions. The SKIFTI data is structured for convenient access to the data and metadata, and includes support for visualizations. For more information see Merisaari et al. (2024) <doi:10.57736/87d2-0608>.
Network sparsification with a variety of novel and known network sparsification techniques. All network sparsification techniques reduce the number of edges, not the number of nodes. Network sparsification is sometimes referred to as network dimensionality reduction. This package is based on the work of Spielman, D., Srivastava, N. (2009) <arXiv:0803.0929>; Koutis, I., Levin, A., Peng, R. (2013) <arXiv:1209.5821>; Toivonen, H., Mahler, S., Zhou, F. (2010) <doi:10.1007>; and Foti, N., Hughes, J., Rockmore, D. (2011) <doi:10.1371>.
This package assists you in setting up and retrieving HTTPS and SSH credentials for use with git and other services. For HTTPS remotes, the package interfaces with the git-credential utility, which git uses to store HTTP usernames and passwords. For SSH remotes, this package provides convenient functions to find or generate appropriate SSH keys. The package both helps the user to set up a local git installation and provides a back-end for git/ssh client libraries to authenticate with existing user credentials.
Allows access to data from the Brazilian Public Security Information System (SINESP) by state and municipality. It should be emphasized that the package only extracts the data and facilitates its manipulation in R. Therefore, its sole purpose is to support empirical research. All data credits belong to SINESP, an integrated information platform developed and maintained by the National Secretariat of Public Security (SENASP) of the Ministry of Justice and Public Security. <https://www.gov.br/mj/pt-br/assuntos/sua-seguranca/seguranca-publica/sinesp-1>.
An integrated toolset for the analysis of de novo (sporadic) genetic sequence variants. denovolyzeR implements a mutational model that estimates the probability of a de novo genetic variant arising in each human gene, from which one can infer the expected number of de novo variants in a given population size. Observed variant frequencies can then be compared against expectation in a Poisson framework. denovolyzeR provides a suite of functions to implement these analyses for the interpretation of de novo variation in human disease.
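The comparison of observed against expected counts in a Poisson framework can be illustrated with a short Python sketch. It is not denovolyzeR's code: the function names are hypothetical, and the expected count is assumed to be simply the per-individual probability for a gene multiplied by the number of individuals studied.

```python
import math


def expected_count(prob_per_individual, n_individuals):
    """Expected number of de novo variants in a gene for a cohort.

    Assumption: the mutational model supplies a per-individual probability,
    so the expectation scales linearly with cohort size.
    """
    return prob_per_individual * n_individuals


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    below = 0.0                 # accumulates P(X < observed)
    for k in range(observed):
        below += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - below)


# Hypothetical inputs: a probability of 1e-5 per individual, 1000 individuals,
# and 3 observed de novo variants in the gene.
expected = expected_count(1e-5, 1000)
p_enrichment = poisson_upper_tail(3, expected)
```

A small upper-tail probability indicates more observed variants than the mutational model would predict for a cohort of that size.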
High-performance implementation of various effect plots useful for regression and probabilistic classification tasks. The package includes partial dependence plots (Friedman, 2001, <doi:10.1214/aos/1013203451>), accumulated local effect plots and M-plots (both from Apley and Zhu, 2016, <doi:10.1111/rssb.12377>), as well as plots that describe the statistical associations between model response and features. It supports visualizations with either 'ggplot2' or 'plotly', and is compatible with most models, including 'Tidymodels', models wrapped in 'DALEX' explainers, or models with case weights.
Gaussian processes are flexible distributions to model functional data. Whilst theoretically appealing, they are computationally cumbersome except for small datasets. This package implements two methods for scaling Gaussian process inference in 'Stan'. First, a sparse approximation of the likelihood that is generally applicable and, second, an exact method for regularly spaced data modeled by stationary kernels using fast Fourier methods. Utility functions are provided to compile and fit 'Stan' models using the 'cmdstanr' interface. References: Hoffmann and Onnela (2025) <doi:10.18637/jss.v112.i02>.
Similar to 'rstantools' for 'rstan', the 'instantiate' package builds pre-compiled 'CmdStan' models into CRAN-ready statistical modeling R packages. The models compile once during installation, the executables live inside the file systems of their respective packages, and users have the full power and convenience of 'cmdstanr' without any additional compilation after package installation. This approach saves time and helps R package developers migrate from 'rstan' to the more modern 'cmdstanr'. Packages 'rstantools', 'cmdstanr', 'stannis', and 'stanapi' are similar Stan clients with different objectives.
The package converts R data into input and data for LocalSolver, executes the optimization, and exposes the optimization results as R data. LocalSolver (http://www.localsolver.com/) is an optimization engine developed by Innovation24 (http://www.innovation24.fr/). It is designed to solve large-scale mixed-variable non-convex optimization problems. The localsolver package is developed and maintained by WLOG Solutions (http://www.wlogsolutions.com/en/) in collaboration with the Decision Support and Analysis Division at the Warsaw School of Economics (http://www.sgh.waw.pl/en/).
Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente et al. (2015) <DOI:10.1016/j.socscimed.2015.10.001>, Valente (1995) <ISBN:9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, and Burt (1987) <DOI:10.1086/228667>, among others.
All PubChem compounds are downloaded to a local computer, but for each compound, only partial records are used. The data are organized into small files referenced by PubChem CID. This package also contains functions to parse the biologically relevant compounds from all PubChem compounds, using biological database sources, pathway presence, and taxonomic relationships. Taxonomy is used to generate a lowest common ancestor taxonomy ID (NCBI) for each biological metabolite, which then enables creation of taxonomically specific metabolome databases for any taxon.
Computes Phylogenetic Diversity (PD, Faith 1992), Evolutionary Distinctiveness (ED, Isaac et al. 2007), Phylogenetic Endemism (PE, Rosauer et al. 2009; Laffan et al. 2016), and Weighted Endemism (WE, Laffan et al. 2016) for presence-absence rasters. References: Faith, D. P. (1992) <doi:10.1016/0006-3207(92)91201-3>; Isaac, N. J. et al. (2007) <doi:10.1371/journal.pone.0000296>; Laffan, S. W. et al. (2016) <doi:10.1111/2041-210X.12513>; Rosauer, D. et al. (2009) <doi:10.1111/j.1365-294X.2009.04311.x>.
The main purpose of this R package is the analysis of genetic effects of small RNA in hybrid plants via two methods; it also provides various plots related to data characteristics and expression analysis. The two classification methods are the calculation of the additive (a) and dominant (d) effects, and the evaluation of expression level dominance by comparing the total expression of the small RNA in the progeny with the expression level in the parent species.
Calculate numerical agricultural soil management indicators from a management timeline of an arable field. Currently, indicators for carbon (C) input into the soil system, soil tillage intensity rating (STIR), number of soil cover and living plant cover days, N fertilization and livestock intensity, and plant diversity are implemented. The functions can also be used independently of the management timeline to calculate some indicators. The package contains tables with reference information for the functions, as well as a *.xlsx template to collect the management data.
This package performs two-way tests in independent groups designs. These are two-way ANOVA; two-way ANOVA under heteroscedasticity (a parametric bootstrap based generalized test and a generalized pivotal quantity based generalized test); and two-way ANOVA for medians, trimmed means, and M-estimators. The package also provides descriptive statistics and graphical approaches. Moreover, it assesses variance homogeneity and normality of data in each group via tests and plots. All twowaytests functions are designed for a two-way layout (Dag et al., 2024, <doi:10.1016/j.softx.2024.101862>).
Estimates the predicted 10-year cardiovascular (CVD) risk score (in probability) for civilian women, women military service members and veterans by inputting patient profiles. The proposed women CVD risk score improves the accuracy of the existing American College of Cardiology/American Heart Association CVD risk assessment tool in predicting long-term CVD risk for VA women, particularly in young and racial/ethnic minority women. See the reference: Jeon-Slaughter, H., Chen, X., Tsai, S., Ramanan, B., & Ebrahimi, R. (2021) <doi:10.1161/JAHA.120.019217>.
Perform change point detection on univariate and multivariate time series according to the methods presented by Asael Fabian Martínez and Ramsés H. Mena (2014) <doi:10.1214/14-BA878> and Corradin, Danese and Ongaro (2022) <doi:10.1016/j.ijar.2021.12.019>. It also clusters different types of time-dependent data with common change points; see "Model-based clustering of time-dependent observations with common structural changes" (Corradin, Danese, KhudaBukhsh and Ongaro, 2024) <doi:10.48550/arXiv.2410.09552> for details.
This package provides a collection of widely used univariate data sets from various applied domains for applications of distribution theory. The functions allow researchers and practitioners to quickly, easily, and efficiently access and use these data sets. The data relate to the following applied domains: biomedical science, survival analysis, medicine, reliability analysis, hydrology, actuarial science, operational research, meteorology, extreme values, quality control, engineering, finance, sports, and economics. All 100 data sets are documented along with associated references for further details and uses.
Given independent and identically distributed observations X(1), ..., X(n), computes the maximum likelihood estimator (MLE) of a probability mass function (pmf) under the assumption that it is log-concave; see Weyermann (2007) and Balabdaoui, Jankowski, Rufibach, and Pavlides (2012). The main functions of the package are logConDiscrMLE, which computes the log-concave MLE; logConDiscrCI, which computes pointwise confidence bands for the MLE; and kInflatedLogConDiscr, which computes a mixture of a log-concave pmf and a point mass at k.