This package implements a fast change-point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang and Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm combines dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than the vanilla Pruned Exact Linear Time (PELT) method. The package includes examples for linear regression, logistic regression, Poisson regression, and penalized linear regression, as well as examples with custom cost functions for users who want to supply their own.
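To make the dynamic-programming-with-pruning idea concrete, here is a minimal sketch of vanilla PELT for a change in mean with a squared-error cost. It is not the sequential gradient descent algorithm from the paper and not this package's API; the function name `pelt_mean` and the choice of cost are assumptions made purely for illustration.

```python
def pelt_mean(data, penalty):
    """Return sorted change-point positions for a piecewise-constant mean.

    Hypothetical helper: vanilla PELT with a squared-error segment cost.
    """
    n = len(data)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        # Sum of squared deviations from the mean over data[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: minimal penalized cost of data[:t]
    best[0] = -penalty       # so the first segment carries no net penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment ending at t
    candidates = [0]         # segment starts that survived pruning
    for t in range(1, n + 1):
        values = [best[s] + cost(s, t) + penalty for s in candidates]
        i_min = min(range(len(values)), key=values.__getitem__)
        best[t] = values[i_min]
        last[t] = candidates[i_min]
        # Pruning step: discard starts s with best[s] + cost(s, t) > best[t],
        # since they can never become optimal for any later t.
        candidates = [s for s, v in zip(candidates, values)
                      if v - penalty <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    change_points = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)


signal = [0.0] * 10 + [5.0] * 10
print(pelt_mean(signal, penalty=1.0))
```

The pruning step is what keeps the candidate set small; the paper's contribution, as described above, is to speed up the cost evaluation itself, which this sketch does not attempt.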
This package provides methods for recursive partitioning based on the Graded Response Model ('GRM'), extending the MOB algorithm from the partykit package. The package allows for fitting GRM trees that partition the population into homogeneous subgroups based on item response patterns and covariates. Includes specialized plotting functions for visualizing GRM trees with different terminal node displays (threshold regions, parameter profiles, and factor score distributions). For more details on the methods, see Samejima (1969) <doi:10.1002/J.2333-8504.1968.TB00153.X>, Komboz et al. (2018) <doi:10.1177/0013164416664394> and Arimoro et al. (2025) <doi:10.1007/s11136-025-04018-6>.
Based on the work of Curi, Converse, Hajewski, and Oliveira (2019) <doi:10.1109/IJCNN.2019.8852333>, this package provides easy-to-use functions that create a variational autoencoder (VAE) for parameter estimation in Item Response Theory (IRT), namely for the Multidimensional Logistic 2-Parameter (ML2P) model. Using a neural network for this purpose requires nontrivial modifications to the architecture, such as restricting the nonzero weights in the decoder according to a binary matrix Q. The functions in this package allow for straightforward construction, training, and evaluation, so that minimal knowledge of tensorflow or keras is required.
When using pooled p-values to adjust for multiple testing, there is an inherent balance that must be struck between rejection based on weak evidence spread among many tests and rejection based on strong evidence in a few, as explored in Salahub and Oldford (2023) <arXiv:2310.16600>. This package provides functionality to compute marginal and central rejection levels and the centrality quotient for p-value pooling functions. It also provides implementations of the chi-squared quantile pooled p-value (described in Salahub and Oldford (2023)) and a proposal from Heard and Rubin-Delanchy (2018) <doi:10.1093/biomet/asx076> to control the quotient's value.
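As a small illustration of what a p-value pooling function does, the sketch below implements Fisher's method, a classical pooling function that, under the usual definition, is the two-degrees-of-freedom case of chi-squared quantile pooling. The function name `fisher_pooled_p` is hypothetical and this is not the package's interface; it relies on the closed-form chi-squared survival function for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_pooled_p(p_values):
    """Pool independent p-values with Fisher's method (hypothetical helper)."""
    m = len(p_values)
    # Fisher's statistic is -2 * sum(log p); we work with half of it.
    half_stat = -sum(math.log(p) for p in p_values)
    # For 2m degrees of freedom the chi-squared survival function at x is
    # exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    term = 1.0
    total = 1.0
    for k in range(1, m):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


# Weak evidence spread over many tests versus strong evidence in one test.
spread = fisher_pooled_p([0.04] * 10)
concentrated = fisher_pooled_p([1e-4] + [0.9] * 9)
print(spread, concentrated)
```

Comparing the two pooled values for different pooling functions is one informal way to see the balance between spread and concentrated evidence that the description refers to.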
This package provides R bindings to OpenSSL libssl and libcrypto, plus custom SSH pubkey parsers. It supports RSA, DSA and NIST curves P-256, P-384 and P-521. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES block cipher is used in CBC mode for symmetric encryption; RSA for asymmetric (public key) encryption. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc), base64 encoder, a secure random number generator, and bignum math methods for manually performing crypto calculations on large multibyte integers.
The MBttest method was developed from the beta t-test method of Baggerly et al. (2003). Compared to baySeq (Hardcastle and Kelly 2010), DESeq (Anders and Huber 2010), the exact test (Robinson and Smyth 2007, 2008), and the GLM of McCarthy et al. (2012), MBttest has high work efficiency; that is, it has high power, high conservativeness of FDR estimation, and high stability. MBttest is suitable for transcriptomic data, tag data, and SAGE data (count data) from small samples or a few replicate libraries. It can be used to identify genes, mRNA isoforms, or tags differentially expressed between two conditions.
In genomics, differential analysis enables the discovery of groups of genes implicated in important biological processes such as cell differentiation and aging. Non-parametric tests of differential gene expression usually detect shifts in centrality (such as mean or median), and therefore suffer from diminished power against alternative hypotheses characterized by shifts in spread (such as variance). This package provides a flexible family of non-parametric two-sample and K-sample tests, based on theoretical work on non-parametric tests, spacing statistics, and local asymptotic normality (Erdmann-Pham et al., 2022+ [arXiv:2008.06664v2]; Erdmann-Pham, 2023+ [arXiv:2209.14235v2]).
This package provides actuarial modeling tools for Monte Carlo loss simulations, loss reserving, and reinsurance layer loss calculations. It enables users to generate stochastic loss datasets with customisable frequency and severity distributions, fit development patterns to claim triangles, and calculate reinsurance losses for occurrence and aggregate layers with user-defined retentions, limits, and reinstatements. For development pattern selection, the package includes a machine learning approach that evaluates multiple reserving models using holdout validation to identify the best-fitting pattern based on predictive accuracy. This approach is based on the algorithm described in Richman, R and Balona, C (2020) <https://www.ssrn.com/abstract=3697256>.
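The basic arithmetic behind a reinsurance layer can be sketched as follows. This is a simplified illustration that ignores reinstatements; the function names are hypothetical and do not come from the package.

```python
def layer_loss(loss, retention, limit):
    """Loss ceded to a layer of `limit` in excess of `retention`."""
    return min(max(loss - retention, 0.0), limit)


def occurrence_layer_losses(losses, retention, limit):
    """Apply the layer to each individual loss (per-occurrence cover)."""
    return [layer_loss(x, retention, limit) for x in losses]


def aggregate_layer_loss(losses, retention, limit):
    """Apply the layer to the total of all losses (aggregate cover)."""
    return layer_loss(sum(losses), retention, limit)


# Three simulated losses against a layer of 200 in excess of 100.
simulated = [50.0, 150.0, 500.0]
print(occurrence_layer_losses(simulated, retention=100.0, limit=200.0))
print(aggregate_layer_loss(simulated, retention=100.0, limit=200.0))
```

In a Monte Carlo setting the same functions would be applied to each simulated year of losses, and the resulting layer losses averaged or otherwise summarised.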
Functions for drawing some special plots: bagplot plots a bagplot; faces plots Chernoff faces; iconplot plots a representation of a frequency table or a data matrix; plothulls plots hulls of a bivariate data set; plotsummary plots a graphical summary of a data set; puticon adds icons to a plot; skyline.hist combines several histograms of a one-dimensional data set in one plot; the slider functions support some interactive graphics; spin3R helps inspect a 3-dimensional point cloud; stem.leaf plots a stem-and-leaf plot; and stem.leaf.backback plots back-to-back versions of a stem-and-leaf plot.
Simulation of segments shared identical-by-descent (IBD) by pedigree members. Using sex specific recombination rates along the human genome (Halldorsson et al. (2019) <doi:10.1126/science.aau1043>), phased chromosomes are simulated for all pedigree members. Applications include calculation of realised relatedness coefficients and IBD segment distributions. ibdsim2 is part of the pedsuite collection of packages for pedigree analysis. A detailed presentation of the pedsuite, including a separate chapter on ibdsim2, is available in the book Pedigree analysis in R (Vigeland, 2021, ISBN:9780128244302). A Shiny app for visualising and comparing IBD distributions is available at <https://magnusdv.shinyapps.io/ibdsim2-shiny/>.
Social Relation Model (SRM) analyses for single or multiple round-robin groups are performed. These analyses are either based on one manifest variable, one latent construct measured by two manifest variables, two manifest variables and their bivariate relations, or two latent constructs each measured by two manifest variables. Within-group t-tests for variance components and covariances are provided for single groups. For multiple groups two types of significance tests are provided: between-groups t-tests (as in SOREMO) and enhanced standard errors based on Lashley and Bond (1997) <DOI:10.1037/1082-989X.2.3.278>. Handling for missing values is provided.
Model data with a suspected clustering structure (either in covariate space, regression space, or both) using a Bayesian product model with a logistic regression likelihood. Observations are represented graphically and clusters are formed through various edge removals or additions. Cluster quality is assessed through the log Bayesian evidence of the overall model, which is estimated using either a Sequential Monte Carlo sampler or a suitable transformation of the Bayesian Information Criterion as a fast approximation of the former. The internal Iterated Batch Importance Sampling scheme (Chopin (2002) <doi:10.1093/biomet/89.3.539>) is made available as a free-standing function.
Facilitates use and analysis of data about the armed conflict in Colombia resulting from the joint project between La Jurisdicción Especial para la Paz (JEP), La Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No repetición (CEV), and the Human Rights Data Analysis Group (HRDAG). The data are 100 replicates from a multiple imputation through chained equations as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. With the replicates the user can examine four human rights violations that occurred in the Colombian conflict accounting for the impact of missing fields and fully missing observations.
Fetches zonal statistics from weather indicators that were calculated for each municipality in Brazil using data from the BR-DWGD and TerraClimate projects. Zonal statistics such as mean, maximum, minimum, standard deviation, and sum were computed by taking into account the data cells that intersect the boundaries of each municipality, and were stored in Parquet files. This procedure was carried out for all Brazilian municipalities, for all available dates, and for every indicator available in the weather products (BR-DWGD and TerraClimate projects). This package queries the precomputed statistics online from the Parquet files and returns easy-to-use data.frames.
This package provides a user-friendly function, crrcbcv, to compute bias-corrected variances for competing risks regression models using proportional subdistribution hazards with small-sample clustered data. Four types of bias correction are included: the MD-type bias correction by Mancl and DeRouen (2001) <doi:10.1111/j.0006-341X.2001.00126.x>, the KC-type bias correction by Kauermann and Carroll (2001) <doi:10.1198/016214501753382309>, the FG-type bias correction by Fay and Graubard (2001) <doi:10.1111/j.0006-341X.2001.01198.x>, and the MBN-type bias correction by Morel, Bokossa, and Neerchal (2003) <doi:10.1002/bimj.200390021>.
Cleans and simplifies person names to assist in database matching when unambiguous unique keys are unavailable. Detects and corrects the most common typos, optionally simplifies terms prone to omission in records, and applies phonetic simplification to the words using a custom variation of the metaphoneBR algorithm. Mation (2025) <doi:10.6082/uchicago.15104>.
Forms queries to submit to the Cleveland Federal Reserve Bank web site's financial stress index data site. Provides query functions for both the composite stress index and the components data. By default the download includes daily time series data starting September 25, 1991. The functions return an object of class easing or cfsi, which contains a list of items related to the query and its graphical presentation. The list includes the time series data as an xts object. The package provides four lattice time series plots to render the time series data in a manner similar to the bank's own presentation.
Partially penalized versions of specific transformation models implemented in package mlt. Available models include a fully parametric version of the Cox model, other parametric survival models (Weibull, etc.), models for binary and ordered categorical variables, normal and transformed-normal (Box-Cox type) linear models, and continuous outcome logistic regression. Hyperparameter tuning is facilitated through model-based optimization functionalities from package mlrMBO. The accompanying vignette describes the methodology used in tramnet in detail. Transformation models and model-based optimization are described in Hothorn et al. (2019) <doi:10.1111/sjos.12291> and Bischl et al. (2016) <doi:10.48550/arXiv.1703.03373>, respectively.
Estimation of the time-dependent ROC curve and the area under the time-dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed. See Blanche et al. (2013) <doi:10.1002/sim.5958> and references therein for the details of the methods implemented in the package.
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and isolate duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster and save their thinking for the fun stuff.
Data type and tools for working with matrices having precision weights and missing data. This package provides a common representation and tools that can be used with many types of high-throughput data. The meaning of the weights is compatible with usage in the base R function "lm" and the package "limma". Calibrate weights to account for known predictors of precision. Find rows with excess variability. Perform differential testing and find rows with the largest confident differences. Find PCA-like components of variation even with many missing values, rotated so that individual components may be meaningfully interpreted. DelayedArray matrices and BiocParallel are supported.
The dependencies of CRAN packages can be analysed in a network fashion. For each package we can obtain the packages that it depends on, imports, suggests, etc. By iterating this procedure over a number of packages, we can build, visualise, and analyse the dependency network, enabling us to have a bird's-eye view of the CRAN ecosystem. One aspect of interest is the number of reverse dependencies of the packages, or equivalently the in-degree distribution of the dependency network. This can be fitted by the power law and/or an extreme value mixture distribution <doi:10.1111/stan.12355>, for which fitting functions are provided.
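The link between reverse dependencies and in-degree can be illustrated with a toy dependency map. The package names and the helper `reverse_dependency_counts` below are made up for this sketch; no real CRAN data or package function is used.

```python
from collections import Counter


def reverse_dependency_counts(deps):
    """Count, for each package, how many packages list it as a dependency.

    `deps` maps a package name to the packages it depends on. The result is
    the in-degree of every node in the directed dependency network.
    """
    counts = Counter({pkg: 0 for pkg in deps})
    for pkg, targets in deps.items():
        for target in set(targets):  # ignore duplicate declarations
            counts[target] += 1
    return counts


# Hypothetical toy network: pkgA depends on pkgB and pkgC, pkgB on pkgC.
toy_deps = {"pkgA": ["pkgB", "pkgC"], "pkgB": ["pkgC"], "pkgC": []}
in_degree = reverse_dependency_counts(toy_deps)
print(dict(in_degree))

# The in-degree distribution: how many packages have each in-degree value.
print(dict(Counter(in_degree.values())))
```

The second tally is the empirical in-degree distribution, which is the quantity one would then try to describe with a power law or a mixture model.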
This package implements Cramer-von Mises statistics for testing fit to (1) fully specified discrete distributions, as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>; (2) discrete distributions with unknown parameters that must be estimated from the sample data, see Spinelli & Stephens (1997) <doi:10.2307/3315735> and Lockhart, Spinelli and Stephens (2007) <doi:10.1002/cjs.5550350111>; and (3) grouped continuous distributions with unknown parameters, see Spinelli (2001) <doi:10.2307/3316040>. Maximum likelihood estimation (MLE) is used to estimate the parameters. The package computes the Cramer-von Mises, Anderson-Darling, and Watson-Stephens statistics and their p-values.
Access and analyze multi-band greenspace seasonality data cubes (available for 1,028 major global cities), global Normalized Difference Vegetation Index / land cover data from the European Space Agency WorldCover 10m Dataset, and Sentinel-2-l2a images. Users can download data by bounding box or city name and filter by year or seasonal time window. The package also supports calculating human exposure to greenspace using a population-weighted greenspace exposure model introduced by Chen et al. (2022) <doi:10.1038/s41467-022-32258-4> based on Global Human Settlement Layer population data, and calculating a set of greenspace morphology metrics at patch and landscape levels.