Based on the Stata xtsum command, this package computes summary statistics for panel data sets. It generates overall, between-group, and within-group statistics for the specified variables, as presented in S. Porter (2023) <https://stephenporter.org/files/xtsum_handout.pdf> and StataCorp (2023) <https://www.stata.com/manuals/xtxtsum.pdf>.
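The three standard deviations can be sketched in Python as follows. This assumes the decomposition described in the Stata manual: the overall SD uses every observation, the between SD uses one mean per panel unit, and the within SD uses x_it - xbar_i + xbar (the grand mean is added back so the values stay on the original scale). The function name `xtsum` here is a hypothetical illustration, not this package's R interface.

```python
from collections import defaultdict
from statistics import mean, stdev


def xtsum(ids, values):
    """Overall, between and within standard deviations of one panel variable."""
    groups = defaultdict(list)
    for i, x in zip(ids, values):
        groups[i].append(x)
    grand_mean = mean(values)
    group_means = {i: mean(xs) for i, xs in groups.items()}
    # Within-transformed values: deviation from the unit's own mean, re-centred on the grand mean
    within = [x - group_means[i] + grand_mean for i, x in zip(ids, values)]
    return {
        "mean": grand_mean,
        "overall_sd": stdev(values),                    # over all N observations
        "between_sd": stdev(list(group_means.values())),  # over the n unit means
        "within_sd": stdev(within),                     # over all N transformed values
        "N": len(values),
        "n": len(groups),
    }


# Usage: two units observed twice each
print(xtsum([1, 1, 2, 2], [1, 3, 5, 7]))
```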
lpNet aims at inferring biological networks, in particular signaling and gene networks. To that end, it takes perturbation data, either steady-state or time-series, as input and generates a linear programming (LP) model that allows the inference of signaling networks. For parameter identification, either leave-one-out cross-validation or stratified n-fold cross-validation can be used.
GNU Rot[t]log is a program for managing log files. It is used to automatically rotate out log files when they have reached a given size or according to a given schedule. It can also be used to automatically compress and archive such logs. Rot[t]log will mail reports of its activity to the system administrator.
Machine learning models are widely used and have various applications in classification and regression. Models created with boosting, bagging, stacking, or similar techniques are often used because of their high performance, but such black-box models usually lack interpretability. The DALEX package contains various explainers that help to understand the link between input variables and model output.
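One widely used explainer of this kind is permutation-based variable importance: shuffle one input column, and measure how much the model's loss grows. The Python sketch below illustrates that idea with mean squared error as the loss; the names `permutation_importance` and `mse` are hypothetical and do not mirror DALEX's R functions or defaults.

```python
import random


def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(predict, X, y, seed=0):
    """Increase in loss when each column of X (a list of row lists) is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importance = {}
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between column j and the output
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importance[j] = mse(y, [predict(row) for row in X_perm]) - baseline
    return importance
```

A variable the model ignores gets an importance of exactly zero, because shuffling it cannot change any prediction.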
This package provides tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bezier lines with arrows complementing the ones in the grid package, and more.
This package provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L'Ecuyer's combined multiple-recursive generator. It makes it easy to convert standard %dopar% loops into fully reproducible loops, regardless of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
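The key idea is that each iteration's random stream is tied to the iteration index and a master seed, never to the worker that happens to run it. The Python sketch below shows that property with a thread pool. As an explicit stand-in for L'Ecuyer's generator and its jump-ahead substreams, it derives a per-task seed by hashing (master seed, task index) and feeds it to Python's Mersenne Twister; hash-derived seeds do not carry the same independence guarantees, and all function names are hypothetical.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def task_stream(master_seed, task_index):
    """Deterministic per-task RNG derived from the master seed and the task index."""
    digest = hashlib.sha256(f"{master_seed}:{task_index}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def simulate(args):
    """One loop iteration: draws only from its own task-specific stream."""
    master_seed, task_index = args
    rng = task_stream(master_seed, task_index)
    return sum(rng.random() for _ in range(1000))


def reproducible_map(master_seed, n_tasks, workers):
    """Results depend on (master_seed, n_tasks) only, not on the worker count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, [(master_seed, i) for i in range(n_tasks)]))


# Usage: one worker and four workers give identical results
assert reproducible_map(42, 8, workers=1) == reproducible_map(42, 8, workers=4)
```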
Enables simultaneous statistical inference for the accuracy of multiple classifiers in multiple subgroups (strata). For instance, it allows performing multiple comparisons in diagnostic accuracy studies with the co-primary endpoints sensitivity and specificity (Westphal M, Zapf A. Statistical inference for diagnostic test accuracy studies with multiple comparisons. Statistical Methods in Medical Research. 2024;0(0). <doi:10.1177/09622802241236933>).
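With co-primary endpoints, a classifier only succeeds if both its sensitivity and its specificity clear their benchmarks (an intersection-union test), while comparing several classifiers calls for a multiplicity adjustment. The Python sketch below illustrates this with one-sided Wald bounds and a simple Bonferroni correction across classifiers. That combination is an assumed, deliberately simple stand-in; the procedures of Westphal and Zapf implemented in the package are not reproduced here, and the function names are hypothetical.

```python
from statistics import NormalDist


def lower_bound(successes, n, alpha):
    """One-sided Wald lower confidence bound for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - alpha)
    return p - z * (p * (1 - p) / n) ** 0.5


def assess(classifiers, se0, sp0, alpha=0.025):
    """classifiers maps name -> (TP, n_diseased, TN, n_healthy)."""
    local_alpha = alpha / len(classifiers)  # Bonferroni across classifiers
    results = {}
    for name, (tp, n_pos, tn, n_neg) in classifiers.items():
        se_lb = lower_bound(tp, n_pos, local_alpha)
        sp_lb = lower_bound(tn, n_neg, local_alpha)
        # Co-primary endpoints: both bounds must exceed their benchmarks
        results[name] = se_lb > se0 and sp_lb > sp0
    return results


# Usage: classifier B fails on sensitivity, so it fails overall
print(assess({"A": (95, 100, 190, 200), "B": (70, 100, 190, 200)}, se0=0.8, sp0=0.8))
```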
Extension of cmprsk to stratified and clustered data. A goodness-of-fit test for the Fine-Gray model is also provided. Methods are detailed in the following articles: Zhou et al. (2011) <doi:10.1111/j.1541-0420.2010.01493.x>, Zhou et al. (2012) <doi:10.1093/biostatistics/kxr032>, Zhou et al. (2013) <doi:10.1002/sim.5815>.
This package provides functions to facilitate access to the DKAN API (<https://dkan.readthedocs.io/en/latest/apis/index.html>), including the DKAN REST API (metadata) and the DKAN datastore API (data). It includes functions to list, create, retrieve, update, and delete dataset and resource nodes, as well as functions to search and retrieve data from the DKAN datastore.
Piecewise linear segmentation of ordered data by a dynamic programming algorithm. The algorithm was developed for time series data, e.g. growth curves, and for genome-wide read-count data from next-generation sequencing, but is broadly applicable. Generic implementations of the dynamic programming routines make it possible to scan for optimal segmentation parameters and to test custom segmentation criteria ("scoring functions").
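The dynamic programming idea can be sketched in Python: best[j] holds the optimal score for the first j points, and each candidate last segment [i, j) adds its own cost. As an assumed scoring function, the sketch uses the residual sum of squares of a least-squares line plus a fixed penalty per segment; the package's own scoring functions and parameters may differ, and `segment` and `linear_sse` are hypothetical names.

```python
def linear_sse(x, y):
    """Residual sum of squares of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return syy - (sxy * sxy / sxx if sxx > 0 else 0.0)


def segment(x, y, penalty, min_len=3):
    """Optimal segment end indices (exclusive) under SSE + penalty per segment."""
    n = len(x)
    best = [0.0] + [float("inf")] * n  # best[j]: optimal score of the first j points
    back = [0] * (n + 1)               # back[j]: start of the last segment ending at j
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if best[i] == float("inf"):
                continue  # the prefix x[:i] cannot be segmented
            score = best[i] + linear_sse(x[i:j], y[i:j]) + penalty
            if score < best[j]:
                best[j], back[j] = score, i
    ends, j = [], n
    while j > 0:  # trace the optimal path backwards
        ends.append(j)
        j = back[j]
    return ends[::-1]
```

For a series that rises linearly for ten points and then falls linearly for ten, `segment` returns `[10, 20]`: two exact fits cost only two penalties.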
Create list comprehensions (and other types of comprehension) similar to those in Python, Haskell, and other languages. List comprehension in R converts a regular for() loop into a vectorized lapply() function. Support is included for looping with multiple variables, for parallelization, and for iterating across non-standard objects. The package also contains a variety of functions to help with list comprehension.
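Since the package takes Python as a model, the equivalence it relies on can be shown in plain Python: an accumulating for loop, a list comprehension, and a map over the same input compute the same list. This only illustrates the concept; it is not the R package's syntax.

```python
# An explicit for loop that accumulates results...
squares_loop = []
for x in range(1, 6):
    if x % 2 == 1:
        squares_loop.append(x * x)

# ...the equivalent list comprehension...
squares_comp = [x * x for x in range(1, 6) if x % 2 == 1]

# ...and the same computation written as map/filter over the input.
squares_map = list(map(lambda x: x * x, filter(lambda x: x % 2 == 1, range(1, 6))))

# Looping over two variables at once, here paired with zip().
pairs = [f"{name}={value}" for name, value in zip(["a", "b"], [1, 2])]
```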
This package provides functions to implement the Flexible cFDR (Hutchinson et al. (2021) <doi:10.1371/journal.pgen.1009853>) and Binary cFDR (Hutchinson et al. (2021) <doi:10.1101/2021.10.21.465274>) methodologies to leverage auxiliary data from arbitrary distributions, for example functional genomic data, with GWAS p-values to generate re-weighted p-values.
Likelihood-free inference method for stochastic models. It applies a deterministic optimizer to simple simulations of the model whose randomness is drawn once in advance and transformed with the inverse transform method. It is designed to work on its own and also together with the Julia package Jflimo, available on the project's git page: <https://metabarcoding.org/flimo>.
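Drawing the randomness once makes the simulation-based objective a smooth, deterministic function of the parameter, so an ordinary optimizer can be used. The Python toy below assumes an Exponential(rate = theta) model, compares simulated and observed means, and uses golden-section search as the deterministic optimizer. The model, summary statistic, optimizer, and all function names are illustrative assumptions, not flimo's API.

```python
import math
import random


def make_objective(observed, n_sim=1000, seed=1):
    """Objective in theta whose randomness is drawn once and then reused."""
    rng = random.Random(seed)
    uniforms = [rng.random() for _ in range(n_sim)]  # fixed for the whole optimisation
    target = sum(observed) / len(observed)

    def objective(theta):
        # Inverse transform method: Exponential(rate=theta) draws from fixed uniforms
        sims = [-math.log(1.0 - u) / theta for u in uniforms]
        return (sum(sims) / n_sim - target) ** 2

    return objective, uniforms


def golden_section(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function on [lo, hi] without derivatives."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


objective, uniforms = make_objective([0.5, 1.0, 1.5, 2.0])
theta_hat = golden_section(objective, 0.01, 10.0)
```

Because the uniforms are fixed, rerunning the optimisation returns exactly the same estimate.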
This package provides a function to retrieve the system timezone on Unix systems; it has been found to return an answer in cases where Sys.timezone() has failed. It is based on an answer by Duane McCully posted on StackOverflow, adapted to be callable from R. The package also builds on Windows, but there it just returns NULL.
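A common way to recover the timezone name on Unix is to check the TZ environment variable and otherwise resolve /etc/localtime, which is often a symlink into a zoneinfo directory. The Python sketch below shows that approach; it is an assumption for illustration and not necessarily the strategy of the StackOverflow answer or of this package (for example, it does not handle /etc/localtime being a plain copy of a zoneinfo file). `system_timezone` is a hypothetical name.

```python
import os


def system_timezone(localtime="/etc/localtime", env=None):
    """Best-effort Olson timezone name on Unix, or None if it cannot be found."""
    env = os.environ if env is None else env
    tz = env.get("TZ")
    if tz:
        return tz.lstrip(":")  # TZ may be written as ":Area/City"
    # Follow the symlink, e.g. /usr/share/zoneinfo/Europe/Berlin
    target = os.path.realpath(localtime)
    marker = "zoneinfo" + os.sep
    if marker in target:
        return target.split(marker, 1)[1]
    return None
```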
Solves goal programming problems of the weighted and lexicographic type, as well as combinations of the two, as described by Ignizio (1983) <doi:10.1016/0305-0548(83)90003-5>. It accepts simple, human-readable input describing the problem as a series of equations, and relies on the lpSolve package to solve the underlying linear optimisation problem.
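In the standard textbook formulation, each goal f_i(x) = b_i receives a negative deviation n_i and a positive deviation p_i. Weighted goal programming minimizes a weighted sum of the unwanted deviations, while the lexicographic variant minimizes priority levels P_1, ..., P_L one after another, fixing the optimum of each level before moving on. The notation below (weights u_i and v_i, level objectives h_l) is ours and is not taken from the package.

```latex
% Weighted goal programming
\begin{aligned}
\min_{x,\,n,\,p}\quad & \sum_{i=1}^{m} \left( u_i\, n_i + v_i\, p_i \right) \\
\text{s.t.}\quad & f_i(x) + n_i - p_i = b_i, \qquad i = 1, \dots, m, \\
& x \in X, \qquad n_i \ge 0, \qquad p_i \ge 0.
\end{aligned}

% Lexicographic goal programming (same constraints)
\operatorname*{lex\,min}_{x,\,n,\,p}\ \bigl( h_1(n,p),\, h_2(n,p),\, \dots,\, h_L(n,p) \bigr),
\qquad h_\ell(n,p) = \sum_{i \in P_\ell} \left( u_i\, n_i + v_i\, p_i \right).
```

Setting a weight to zero leaves the corresponding deviation unpenalized; for example, a "less than or equal" goal only penalizes p_i.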
Estimates networks of conditional dependencies (Gaussian graphical models) from multiple classes of data (similar but not identical, e.g. measurements on different equipment, in different locations, or for various sub-types). The package also allows generating simulated data and evaluating performance. It implements the method described in Angelini, De Canditiis and Plaksienko (2022) <doi:10.3390/math10213983>.
Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.
Calculation of molecular number and brightness from fluorescence microscopy image series. The software was published in a 2016 paper <doi:10.1093/bioinformatics/btx434>. The seminal paper for the technique is Digman et al. 2008 <doi:10.1529/biophysj.107.114645>. A review of the technique was published in 2017 <doi:10.1016/j.ymeth.2017.12.001>.
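In the number and brightness technique, each pixel's intensity time series gives an apparent brightness B = variance / mean and an apparent number N = mean^2 / variance (so N * B equals the mean). The Python sketch below computes both per pixel from a stack of frames, using the sample variance. It assumes raw, positive intensities and omits the detector-offset corrections and detrending that real analyses typically apply; `number_and_brightness` is a hypothetical name.

```python
from statistics import mean, variance


def number_and_brightness(frames):
    """Per-pixel apparent brightness B = var/mean and number N = mean^2/var.

    frames: list of 2-D images (lists of rows) ordered in time.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    B = [[0.0] * cols for _ in range(rows)]
    N = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            series = [frame[r][c] for frame in frames]  # one pixel over time
            m, v = mean(series), variance(series)
            B[r][c] = v / m if m > 0 else float("nan")
            N[r][c] = m * m / v if v > 0 else float("inf")
    return B, N
```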
This package provides ensemble capabilities for the predictions of supervised and unsupervised learning models without using training labels. It determines the relative weights of the different models' predictions by using the best model's predictions as the response variable and the remaining models' predictions as explanatory variables. Because users decide which model is best, they are free to build ensembles that fit their own design decisions.
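One way to read "best model's predictions as the response" is an ordinary least-squares regression, without intercept, of the best model's predictions on the other models' predictions, whose coefficients then serve as ensemble weights. The Python sketch below implements that reading with a small Gaussian-elimination solver. Both the no-intercept choice and the way the weights are combined are assumptions for illustration, not the package's documented method, and all names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def ensemble_weights(best, others):
    """Least-squares weights (no intercept) mapping other models' predictions onto the best."""
    k = len(others)
    # Normal equations: (X^T X) w = X^T y, with the models' predictions as the columns of X
    A = [[sum(a * b for a, b in zip(others[i], others[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(others[i], best)) for i in range(k)]
    return solve(A, rhs)


def combine(weights, others):
    """Weighted combination of the other models' predictions."""
    return [sum(w * p[i] for w, p in zip(weights, others)) for i in range(len(others[0]))]
```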
Statistical methods for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks are provided. Thong Pham et al. (2015) <doi:10.1371/journal.pone.0137796>. Thong Pham et al. (2016) <doi:10.1038/srep32558>. Thong Pham et al. (2020) <doi:10.18637/jss.v092.i03>. Thong Pham et al. (2021) <doi:10.1093/comnet/cnab024>.
Allows you to make clean, good-looking scatter plots with the option to easily add marginal density or box plots on the axes. It is also available as a module for jamovi (see <https://www.jamovi.org> for more information). Scatr is based on the cowplot package by Claus O. Wilke and the ggplot2 package by Hadley Wickham.
Conducts linear regression using variational Bayesian inference, particularly optimized for genome-wide association mapping and whole-genome prediction, which use a large number of DNA markers as explanatory variables. Provides seven regression models that select the important variables (i.e., the variables related to the response variable) among the given explanatory variables in different ways (i.e., model structures).
Computes inequality measures of a given variable, taking weights into account. Suitable for ratio, interval, and ordinal scales. Includes the Gini, Theil, and Leti indices, the Palma ratio, the 20:20 ratio, and the Allison and Foster, Jenkins, Cowell and Flechaire, Abul Naga and Yalcin, Apouey, and Blair and Lacy indices. Bootstrapping provides the distribution of the inequality measures, enabling significance tests.
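Two of these measures can be sketched in Python for weighted data: the Gini index as 1 minus twice the area under the weighted Lorenz curve (computed with trapezoids), and the Theil T index as the weighted mean of (x/mu) ln(x/mu). These are standard textbook forms; the package's estimators, small-sample corrections, and function names may differ, and the names below are hypothetical. Both functions assume a positive weighted total.

```python
import math


def weighted_gini(x, w):
    """Gini = 1 - 2 * (area under the weighted Lorenz curve), using trapezoids."""
    pairs = sorted(zip(x, w))
    total_w = sum(wi for _, wi in pairs)
    total_x = sum(xi * wi for xi, wi in pairs)
    twice_area, cum_share = 0.0, 0.0
    for xi, wi in pairs:
        prev = cum_share
        cum_share += xi * wi / total_x  # Lorenz ordinate after this unit
        twice_area += (wi / total_w) * (prev + cum_share)
    return 1.0 - twice_area


def weighted_theil(x, w):
    """Theil T index; zero values contribute nothing because x ln x -> 0."""
    total_w = sum(w)
    mu = sum(xi * wi for xi, wi in zip(x, w)) / total_w
    return sum(wi * (xi / mu) * math.log(xi / mu) for xi, wi in zip(x, w) if xi > 0) / total_w
```

With two equally weighted values 1 and 3, `weighted_gini` gives 0.25, and perfectly equal values give 0 for both indices.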
The package is focused on the detection of correlation between expressed genes and selected epigenomic signals (e.g. enhancers obtained from ChIP-seq data) either within topologically associated domains (TADs) or between chromatin contact loop anchors. Various parameters can be controlled to investigate the influence of external factors, and visualization plots are available for each analysis step.
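The TAD-based analysis can be pictured as follows: only gene and enhancer pairs that fall inside the same TAD are tested, and their signals across samples are correlated. The Python sketch below does this with Pearson correlation, assuming one chromosome, one position per feature (for example a TSS), and half-open TAD intervals. It omits the loop-anchor mode and the package's parameters, and its names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def tad_correlations(genes, enhancers, tads):
    """Correlate every gene/enhancer pair lying in the same TAD.

    genes, enhancers: dicts name -> (position, signal across samples)
    tads: list of (start, end) half-open intervals on one chromosome
    """
    def tad_of(pos):
        for k, (start, end) in enumerate(tads):
            if start <= pos < end:
                return k
        return None  # position falls outside every TAD

    results = {}
    for g, (g_pos, g_signal) in genes.items():
        k = tad_of(g_pos)
        if k is None:
            continue
        for e, (e_pos, e_signal) in enhancers.items():
            if tad_of(e_pos) == k:
                results[(g, e)] = pearson(g_signal, e_signal)
    return results
```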