This package provides actuarial modeling tools for Monte Carlo loss simulations, loss reserving, and reinsurance layer loss calculations. It enables users to generate stochastic loss datasets with customisable frequency and severity distributions, fit development patterns to claim triangles, and calculate reinsurance losses for occurrence and aggregate layers with user-defined retentions, limits, and reinstatements. For development pattern selection, the package includes a machine learning approach that evaluates multiple reserving models using holdout validation to identify the best-fitting pattern based on predictive accuracy. This approach is based on the algorithm described in Richman, R and Balona, C (2020) <https://www.ssrn.com/abstract=3697256>.
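The layer calculation mentioned in the description above can be illustrated with a minimal Python sketch. It is not the package's own code: the function names are hypothetical, and the convention that reinstatements extend the aggregate limit to `limit * (1 + reinstatements)` is an assumption made for illustration.

```python
def occurrence_layer_loss(loss, retention, limit):
    """Loss ceded to a 'limit xs retention' layer for a single event."""
    return min(max(loss - retention, 0.0), limit)


def layer_loss_with_reinstatements(event_losses, retention, limit, reinstatements=0):
    """Sum the per-event ceded losses and cap them at the aggregate limit.

    Assumption: each reinstatement restores the full occurrence limit once,
    so the aggregate limit is limit * (1 + reinstatements).
    """
    ceded = sum(occurrence_layer_loss(x, retention, limit) for x in event_losses)
    return min(ceded, limit * (1 + reinstatements))


# Illustrative event losses for one simulated year (made-up numbers).
losses = [0.5e6, 2.0e6, 7.5e6]
per_event = [occurrence_layer_loss(x, retention=1.0e6, limit=4.0e6) for x in losses]
no_reinstatement = layer_loss_with_reinstatements(losses, 1.0e6, 4.0e6, reinstatements=0)
one_reinstatement = layer_loss_with_reinstatements(losses, 1.0e6, 4.0e6, reinstatements=1)
```

With these inputs, the first event stays below the retention, the second cedes its excess over the retention, and the third is capped at the occurrence limit. The number of reinstatements then decides how much of the summed ceded loss the layer pays.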
Functions for drawing special plots: bagplot plots a bagplot, faces plots Chernoff faces, iconplot plots a representation of a frequency table or a data matrix, plothulls plots hulls of a bivariate data set, plotsummary plots a graphical summary of a data set, puticon adds icons to a plot, skyline.hist combines several histograms of a one-dimensional data set in one plot, the slider functions support interactive graphics, spin3R helps inspect a 3-dimensional point cloud, stem.leaf plots a stem-and-leaf plot, and stem.leaf.backback plots back-to-back versions of a stem-and-leaf plot.
Simulation of segments shared identical-by-descent (IBD) by pedigree members. Using sex-specific recombination rates along the human genome (Halldorsson et al. (2019) <doi:10.1126/science.aau1043>), phased chromosomes are simulated for all pedigree members. Applications include calculation of realised relatedness coefficients and IBD segment distributions. ibdsim2 is part of the pedsuite collection of packages for pedigree analysis. A detailed presentation of the pedsuite, including a separate chapter on ibdsim2, is available in the book Pedigree analysis in R (Vigeland, 2021, ISBN:9780128244302). A Shiny app for visualising and comparing IBD distributions is available at <https://magnusdv.shinyapps.io/ibdsim2-shiny/>.
Social Relation Model (SRM) analyses for single or multiple round-robin groups are performed. These analyses are either based on one manifest variable, one latent construct measured by two manifest variables, two manifest variables and their bivariate relations, or two latent constructs each measured by two manifest variables. Within-group t-tests for variance components and covariances are provided for single groups. For multiple groups two types of significance tests are provided: between-groups t-tests (as in SOREMO) and enhanced standard errors based on Lashley and Bond (1997) <DOI:10.1037/1082-989X.2.3.278>. Handling for missing values is provided.
Model data with a suspected clustering structure (either in covariate space, regression space or both) using a Bayesian product model with a logistic regression likelihood. Observations are represented graphically and clusters are formed through various edge removals or additions. Cluster quality is assessed through the log Bayesian evidence of the overall model, which is estimated using either a Sequential Monte Carlo sampler or a suitable transformation of the Bayesian Information Criterion as a fast approximation of the former. The internal Iterated Batch Importance Sampling scheme (Chopin (2002) <doi:10.1093/biomet/89.3.539>) is made available as a free-standing function.
Facilitates use and analysis of data about the armed conflict in Colombia resulting from the joint project between La Jurisdicción Especial para la Paz (JEP), La Comisión para el Esclarecimiento de la Verdad, la Convivencia y la No repetición (CEV), and the Human Rights Data Analysis Group (HRDAG). The data are 100 replicates from a multiple imputation through chained equations as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. With the replicates the user can examine four human rights violations that occurred in the Colombian conflict accounting for the impact of missing fields and fully missing observations.
Fetches zonal statistics from weather indicators that were calculated for each municipality in Brazil using data from the BR-DWGD and TerraClimate projects. Zonal statistics such as mean, maximum, minimum, standard deviation, and sum were computed by taking into account the data cells that intersect the boundaries of each municipality and stored in Parquet files. This procedure was carried out for all Brazilian municipalities, and for all available dates, for every indicator available in the weather products (BR-DWGD and TerraClimate projects). This package queries on-line the already calculated statistics on the Parquet files and returns easy-to-use data.frames.
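As a rough illustration of the zonal statistics named in the description above, the following Python sketch computes them over the cell values that intersect one municipality. The function name and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the description does not say which variant is stored.

```python
import statistics


def zonal_stats(cell_values):
    """Summarise the raster cell values intersecting one zone (e.g. a municipality)."""
    return {
        "mean": statistics.fmean(cell_values),
        "max": max(cell_values),
        "min": min(cell_values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(cell_values),
        "sum": sum(cell_values),
    }


# Made-up daily maximum temperatures (degrees Celsius) for three intersecting cells.
stats = zonal_stats([20.0, 22.0, 24.0])
```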
This package provides a user-friendly function, crrcbcv, to compute bias-corrected variances for competing risks regression models using proportional subdistribution hazards with small-sample clustered data. Four types of bias correction are included: the MD-type bias correction by Mancl and DeRouen (2001) <doi:10.1111/j.0006-341X.2001.00126.x>, the KC-type bias correction by Kauermann and Carroll (2001) <doi:10.1198/016214501753382309>, the FG-type bias correction by Fay and Graubard (2001) <doi:10.1111/j.0006-341X.2001.01198.x>, and the MBN-type bias correction by Morel, Bokossa, and Neerchal (2003) <doi:10.1002/bimj.200390021>.
Cleans and simplifies person names to assist in database matching when unambiguous unique keys are unavailable. Detects and corrects the most common typos, optionally simplifies terms prone to omission in records, and simplifies words phonetically using a custom variation of the metaphoneBR algorithm. Mation (2025) <doi:10.6082/uchicago.15104>.
Forms queries to submit to the Cleveland Federal Reserve Bank web site's financial stress index data site. Provides query functions for both the composite stress index and the components data. By default the download includes daily time series data starting September 25, 1991. The functions return an object of class easing or cfsi, which contains a list of items related to the query and its graphical presentation. The list includes the time series data as an xts object. The package provides four lattice time series plots to render the time series data in a manner similar to the bank's own presentation.
Partially penalized versions of specific transformation models implemented in package mlt. Available models include a fully parametric version of the Cox model, other parametric survival models (Weibull, etc.), models for binary and ordered categorical variables, normal and transformed-normal (Box-Cox type) linear models, and continuous outcome logistic regression. Hyperparameter tuning is facilitated through model-based optimization functionalities from package mlrMBO. The accompanying vignette describes the methodology used in tramnet in detail. Transformation models and model-based optimization are described in Hothorn et al. (2019) <doi:10.1111/sjos.12291> and Bischl et al. (2016) <doi:10.48550/arXiv.1703.03373>, respectively.
Estimation of the time-dependent ROC curve and the area under the time-dependent ROC curve (AUC) in the presence of censored data, with or without competing risks. Confidence intervals of AUCs and tests for comparing AUCs of two rival markers measured on the same subjects can be computed, using the iid-representation of the AUC estimator. Plot functions for time-dependent ROC curves and AUC curves are provided. Time-dependent Positive Predictive Values (PPV) and Negative Predictive Values (NPV) can also be computed. See Blanche et al. (2013) <doi:10.1002/sim.5958> and references therein for the details of the methods implemented in the package.
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and isolate duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster and save their thinking for the fun stuff.
Data type and tools for working with matrices having precision weights and missing data. This package provides a common representation and tools that can be used with many types of high-throughput data. The meaning of the weights is compatible with usage in the base R function "lm" and the package "limma". Calibrate weights to account for known predictors of precision. Find rows with excess variability. Perform differential testing and find rows with the largest confident differences. Find PCA-like components of variation even with many missing values, rotated so that individual components may be meaningfully interpreted. DelayedArray matrices and BiocParallel are supported.
The dependencies of CRAN packages can be analysed in a network fashion. For each package we can obtain the packages that it depends on, imports, suggests, etc. By iterating this procedure over a number of packages, we can build, visualise, and analyse the dependency network, enabling us to have a bird's-eye view of the CRAN ecosystem. One aspect of interest is the number of reverse dependencies of the packages, or equivalently the in-degree distribution of the dependency network. This can be fitted by the power law and/or an extreme value mixture distribution <doi:10.1111/stan.12355>, for which functions are provided.
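The link between reverse dependencies and the in-degree distribution described above can be sketched in a few lines of Python. The package names and the dependency map are invented for illustration, and fitting a power law or mixture distribution is left out.

```python
from collections import Counter

# Hypothetical dependency map: package -> packages it depends on or imports.
deps = {
    "pkgA": ["pkgB", "pkgC"],
    "pkgB": ["pkgC"],
    "pkgC": [],
    "pkgD": ["pkgC", "pkgB"],
}

# In-degree = number of reverse dependencies; start every package at zero.
in_degree = Counter({name: 0 for name in deps})
for targets in deps.values():
    in_degree.update(targets)

# In-degree distribution: how many packages have each in-degree value.
degree_distribution = Counter(in_degree.values())
```

Here `in_degree` maps each package to its number of reverse dependencies, and `degree_distribution` is the empirical distribution that a heavy-tailed model would then be fitted to.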
This package implements Cramer-von Mises Statistics for testing fit to (1) fully specified discrete distributions, as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>; (2) discrete distributions with unknown parameters that must be estimated from the sample data, see Spinelli & Stephens (1997) <doi:10.2307/3315735> and Lockhart, Spinelli and Stephens (2007) <doi:10.1002/cjs.5550350111>; and (3) grouped continuous distributions with unknown parameters, see Spinelli (2001) <doi:10.2307/3316040>. Maximum likelihood estimation (MLE) is used to estimate the parameters. The package computes the Cramer-von Mises Statistics, Anderson-Darling Statistics and the Watson-Stephens Statistics and their p-values.
Access and analyze multi-band greenspace seasonality data cubes (available for 1,028 major global cities), global Normalized Difference Vegetation Index / land cover data from the European Space Agency WorldCover 10m Dataset, and Sentinel-2-l2a images. Users can download data using bounding boxes, city names, and filter by year or seasonal time window. The package also supports calculating human exposure to greenspace using a population-weighted greenspace exposure model introduced by Chen et al. (2022) <doi:10.1038/s41467-022-32258-4> based on Global Human Settlement Layer population data, and calculating a set of greenspace morphology metrics at patch and landscape levels.
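A population-weighted exposure of the kind mentioned above is commonly written as the population-weighted mean of a greenspace measure over grid cells. The Python sketch below assumes that simple form; the exact formulation in Chen et al. (2022), such as any buffering around cells, is not reproduced, and the function name and numbers are hypothetical.

```python
def population_weighted_exposure(population, greenspace):
    """Population-weighted mean of a per-cell greenspace fraction.

    Assumption: exposure = sum(P_i * G_i) / sum(P_i) over grid cells i.
    """
    if len(population) != len(greenspace):
        raise ValueError("population and greenspace must have the same length")
    total_population = sum(population)
    if total_population == 0:
        raise ValueError("total population must be positive")
    weighted = sum(p * g for p, g in zip(population, greenspace))
    return weighted / total_population


# Made-up cells: most residents live in the least green cell.
exposure = population_weighted_exposure([100, 300, 600], [0.8, 0.4, 0.1])
unweighted_mean = sum([0.8, 0.4, 0.1]) / 3
```

Because most of the population sits in the least green cell, the weighted exposure comes out lower than the unweighted mean of the greenspace fractions.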
The hydReng package provides a set of functions for hydraulic engineering tasks and natural hazard assessments. It includes basic hydraulics (wetted area, wetted perimeter, flow, flow velocity, flow depth, and maximum flow) for open channels with arbitrary geometry under uniform flow conditions. For structures such as circular pipes, weirs, and gates, the package includes calculations for pressure flow, backwater depth, and overflow over a weir crest. Additionally, it provides formulas for calculating bedload transport. The formulas used can be found in standard literature on hydraulics, such as Bollrich (2019, ISBN:978-3-410-29169-5) or Hager (2011, ISBN:978-3-642-77430-0).
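To make the basic open-channel quantities above concrete, here is a minimal Python sketch for a rectangular channel under uniform flow. It assumes the Gauckler-Manning-Strickler formula, a standard textbook relation. Whether hydReng uses this formula is not stated in the description, and the function name and parameter values are hypothetical.

```python
import math


def rectangular_uniform_flow(width, depth, slope, k_strickler):
    """Uniform flow in a rectangular channel (SI units).

    Assumption: velocity v = k_st * R^(2/3) * sqrt(J), with hydraulic
    radius R = A / P and bed slope J.
    """
    area = width * depth                   # wetted area A [m^2]
    perimeter = width + 2.0 * depth        # wetted perimeter P [m]
    radius = area / perimeter              # hydraulic radius R [m]
    velocity = k_strickler * radius ** (2.0 / 3.0) * math.sqrt(slope)  # [m/s]
    return {
        "area": area,
        "perimeter": perimeter,
        "velocity": velocity,
        "discharge": velocity * area,      # flow Q [m^3/s]
    }


flow = rectangular_uniform_flow(width=2.0, depth=1.0, slope=0.01, k_strickler=30.0)
```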
Extensive penalized variable selection methods have been developed in the past two decades for analyzing high dimensional omics data, such as gene expressions, single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and others. However, lipidomics data have been rarely investigated by using high dimensional variable selection methods. This package incorporates our recently developed penalization procedures to conduct interaction analysis for high dimensional lipidomics data with repeated measurements. The core module of this package is developed in C++. The development of this software package and the associated statistical methods have been partially supported by an Innovative Research Award from Johnson Cancer Research Center, Kansas State University.
Pearson and Spearman correlation coefficients are commonly used to quantify the strength of bivariate associations of genomic variables. For example, correlations of gene-level DNA copy number and gene expression measurements may be used to assess the impact of DNA copy number changes on gene expression in tumor tissue. MVisAGe enables users to quickly compute and visualize the correlations in order to assess the effect of regional genomic events such as changes in DNA copy number or DNA methylation level. Please see Walter V, Du Y, Danilova L, Hayward MC, Hayes DN, 2018. Cancer Research <doi:10.1158/0008-5472.CAN-17-3464>.
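The two coefficients named above differ in what they measure: Pearson captures linear association, while Spearman is the Pearson correlation of the ranks and so captures any monotone association. The Python sketch below illustrates this with made-up copy number and expression values; it is not MVisAGe code, and the helper names are hypothetical.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up gene-level values: expression rises monotonically but not linearly.
copy_number = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(copy_number, expression)
r_spearman = spearman(copy_number, expression)
```

Because the relationship in this example is monotone but curved, the Spearman coefficient reaches 1 while the Pearson coefficient stays below it.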
When working with big data sets, RAM conservation is critically important. However, it is not always enough to just monitor the size of the objects created. So-called "copy-on-modify" behavior, characteristic of R, means that some expressions or functions may require an unexpectedly large amount of RAM overhead. For example, replacing a single value in a matrix duplicates that matrix in the back-end, making this task require twice as much RAM as that used by the matrix itself. This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM-hungry code.
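The difference between final and peak memory use described above can be shown with a loose Python analogue. Python does not share R's copy-on-modify semantics, so the duplicate is created here with an explicit copy, and the standard library's tracemalloc stands in for the package's own monitoring functions.

```python
import tracemalloc

tracemalloc.start()

data = list(range(200_000))

# Explicit copy standing in for the hidden duplicate that R would create
# when a single element of a shared object is modified.
modified = list(data)
modified[0] = -1

# Drop the original: only one list container remains alive afterwards.
del data

# current = memory still allocated now; peak = the maximum reached while tracing.
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

The final footprint reflects a single list, while the peak also includes the moment when both the original and the copy were alive. That is the overhead that monitoring object sizes alone would miss.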
This package provides Sensory and Consumer Data mapping and analysis <doi:10.14569/IJACSA.2017.081266>. The mapping visualization offers several options for dimension reduction methods and prediction models, ranging from linear to nonlinear regressions. A smoothed version of the map, computed using a locally weighted regression algorithm, is available. A selection process for map stability is provided. A shiny application is included. It presents an easy GUI for the implemented functions as well as a tool for comparing fitted models using several criteria. Basic analyses such as characterization of products, panelists and sessions, as well as consumer segmentation, are also made available.
Emergent SOM (ESOM) is an enhancement of self-organizing maps (SOMs) that gains the property of emergence through self-organization. The result of the projection by ESOM is a grid of neurons which can be visualised as a three-dimensional landscape in the form of the Umatrix. Further details can be found in the referenced publications (see url). This package offers tools for calculating and visualising the ESOM as well as Umatrix, Pmatrix and UStarMatrix. All the functionality is also available through graphical user interfaces implemented in shiny. Based on the recognized data structures, the method can be used to generate new data.
Generalized Additive Mixed Modeling (GAMM; Lin & Zhang, 1999) as implemented in the R package mgcv is a nonlinear regression analysis which is particularly useful for time course data such as EEG, pupil dilation, gaze data (eye tracking), and articulography recordings, but also for behavioral data such as reaction times and response data. As time course measures are sensitive to autocorrelation problems, GAMMs implement methods to reduce the autocorrelation problems. This package includes functions for the evaluation of GAMM models (e.g., model comparisons, determining regions of significance, inspection of autocorrelational structure in residuals) and the interpretation of GAMMs (e.g., visualization of complex interactions, and contrasts).