Analyzes pedigree data of wild populations. While primarily designed to process outputs from the COLONY (Jones & Wang (2010) <doi:10.1111/j.1755-0998.2009.02787.x>) pedigree reconstruction software, it can also accommodate data from other sources. By linking reconstructed pedigrees with genetic sample metadata, wpeR produces spatial and temporal visualizations as well as tabular summaries that support interpretation of family structures and dynamics. The main goal of the package is to provide a solution for the analysis of complex wild pedigree data and to help the user gain insight into genetic relationships within wild animal populations.
Format dates and times flexibly and to whichever locales make sense. This package parses dates, times, and date-times in various formats (including string-based ISO 8601 constructions). The formatting syntax gives the user many options for formatting the date and time output in a precise manner. Time zones in the input can be expressed in multiple ways and there are many options for formatting time zones in the output as well. Several of the provided helper functions allow for automatic generation of locale-aware formatting patterns based on date/time skeleton formats and standardized date/time formats with varying specificity.
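For illustration, a minimal sketch of locale-aware formatting, assuming fdt() is the package's main formatting function and that it accepts input, format, and locale arguments:

    library(bigD)
    # format an ISO 8601 date-time string with a CLDR-style pattern,
    # rendering weekday and month names in a French locale
    fdt(input = "2018-07-04T22:05:09",
        format = "EEEE, d MMMM y, HH:mm",
        locale = "fr")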
Performs the Adaptable Regularized Hotelling's T^2 test (ARHT) proposed by Li et al. (2016) <arXiv:1609.08725>. Both one-sample and two-sample mean tests are available with various probabilistic alternative prior models. The package contains a function to consistently estimate higher-order moments of the population covariance spectral distribution using the spectrum of the sample covariance matrix (Bai et al. (2010) <doi:10.1111/j.1467-842X.2010.00590.x>). In addition, it contains a function to approximately sample 3-variate chi-squared random vectors with a given correlation matrix when the degrees of freedom are large.
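A minimal two-sample sketch, assuming the package's main test function shares the package name and accepts two data matrices:

    library(ARHT)
    set.seed(1)
    X <- matrix(rnorm(30 * 100), nrow = 30)   # 30 observations, 100 variables
    Y <- matrix(rnorm(30 * 100), nrow = 30)
    # high-dimensional two-sample mean test (interface is an assumption)
    ARHT(X = X, Y = Y)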
Estimates an ecological niche using occurrence data, covariates, and kernel density-based estimation methods. For a single species with presence and absence data, the envi package uses the spatial relative risk function that is estimated using the sparr package. Details about the sparr package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.
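A rough sketch of the intended input, assuming lrren() is the main estimator; the column layout below (id, coordinates, presence mark, covariates) is an assumption, not the documented format:

    library(envi)
    set.seed(1)
    obs <- data.frame(id = 1:200,
                      lon = runif(200), lat = runif(200),
                      presence = rbinom(200, 1, 0.5),   # 1 = presence, 0 = absence
                      covariate1 = rnorm(200),
                      covariate2 = rnorm(200))
    fit <- lrren(obs_locs = obs)   # kernel-based niche estimate (assumed call)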
Expert Algorithm Verbal Autopsy assigns causes of death to 2016 WHO Verbal Autopsy Questionnaire data. odk2EAVA() converts data to a standard input format for cause of death determination, building on the work of Thomas (2021) <https://cran.r-project.org/src/contrib/Archive/CrossVA/>. codEAVA() uses the presence and absence of signs and symptoms reported in the Verbal Autopsy interview to diagnose common causes of death. A deterministic algorithm assigns a single cause of death to each Verbal Autopsy interview record using a hierarchy of all common causes for neonates or children 1 to 59 months of age.
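A minimal sketch of the two-step workflow using the two functions named above; the argument names shown are assumptions, not the documented signatures:

    library(EAVA)
    # step 1: convert a raw 2016 WHO VA questionnaire export to the standard input
    std <- odk2EAVA(odk_export, id_col = "meta_instanceID")   # arguments assumed
    # step 2: assign a single cause of death per record for a given age group
    cod <- codEAVA(std, age_group = "neonate")                # arguments assumed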
Compute house price indexes and series using a variety of methods and models common throughout the real estate literature. Evaluate index goodness based on accuracy, volatility and revision statistics. Background on basic model construction for repeat sales models can be found in Case and Quigley (1991) <https://ideas.repec.org/a/tpr/restat/v73y1991i1p50-58.html> and for hedonic pricing models in Bourassa et al. (2006) <doi:10.1016/j.jhe.2006.03.001>. The package author's working paper on the random forest approach to house price indexes can be found at <http://www.github.com/andykrause/hpi_research>.
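A minimal repeat-sales sketch, assuming the seattle_sales example data and the rtCreateTrans()/rtIndex() pair as used in the package documentation:

    library(hpiR)
    data(seattle_sales)   # example transaction data shipped with hpiR
    rs <- rtCreateTrans(trans_df = seattle_sales, prop_id = "pinx",
                        trans_id = "sale_id", price = "sale_price",
                        date = "sale_date", periodicity = "monthly")
    hpi <- rtIndex(trans_df = rs, estimator = "base", log_dep = TRUE)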
Specific functions are provided for rounding real weights to integers and performing an integer programming algorithm for calibration problems. These functions are useful for census-weights adjustments, survey calibration, or for performing linear regression with integer parameters <https://www.nass.usda.gov/Education_and_Outreach/Reports,_Presentations_and_Conferences/reports/New_Integer_Calibration_%20Procedure_2016.pdf>. This research was supported in part by the U.S. Department of Agriculture, National Agricultural Statistics Service. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or US Government determination or policy.
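To illustrate the underlying rounding problem in base R (this is the simple largest-remainder heuristic, not the package's integer-programming algorithm):

    w <- c(2.3, 1.7, 4.5, 3.5)                  # real-valued survey weights
    target <- round(sum(w))                     # weighted total to preserve (12)
    base <- floor(w)
    deficit <- target - sum(base)               # units still to allocate
    up <- order(w - base, decreasing = TRUE)[seq_len(deficit)]
    w_int <- base
    w_int[up] <- w_int[up] + 1                  # bump the largest remainders
    sum(w_int) == target                        # TRUE: total preserved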
Fits the kernel semi-parametric model and its extensions, allowing multiple kernels and unlimited interactions in the same model. Coefficients are estimated by maximizing a penalized log-likelihood; penalization terms and hyperparameters are estimated by minimizing leave-one-out error. The package includes predictions with confidence/prediction intervals, statistical tests for the significance of each kernel, a procedure for variable selection, and graphical tools for diagnostics and interpretation of covariate effects. Currently it is implemented for continuous dependent variables only. The package is based on the paper of Liu et al. (2007) <doi:10.1111/j.1541-0420.2007.00799.x>.
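A minimal sketch, assuming kspm() and its Kernel() helper; the exact formula interface is an assumption modeled on the package vignette style:

    library(KSPM)
    set.seed(1)
    d <- data.frame(x = rnorm(100), z = rnorm(100))
    d$y <- d$x + sin(d$z) + rnorm(100)
    # linear effect of x plus a Gaussian-kernel effect of z (interface assumed)
    fit <- kspm(response = "y", linear = ~ x,
                kernel = ~ Kernel(~ z, kernel.function = "gaussian"),
                data = d)
    summary(fit)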
This package performs likelihood-based inference for stationary time series extremes. The general approach follows Fawcett and Walshaw (2012) <doi:10.1002/env.2133>. Marginal extreme value inferences are adjusted for cluster dependence in the data using the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>, producing an adjusted log-likelihood for the model parameters. A log-likelihood for the extremal index is produced using the K-gaps model of Suveges and Davison (2010) <doi:10.1214/09-AOAS292>. These log-likelihoods are combined to make inferences about extreme values. Both maximum likelihood and Bayesian approaches are available.
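A minimal sketch, assuming flite() is the frequentist entry point taking raw data, a threshold u, and a K-gaps run parameter k, with the exdex cheeseboro wind-gust data as in the package's examples:

    library(lite)
    cdata <- exdex::cheeseboro        # wind-gust data used in the lite examples
    # adjusted marginal GP inference plus K-gaps extremal index, combined
    fit <- flite(cdata, u = 45, k = 3)
    summary(fit)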
An array of nonparametric and parametric estimation methods for cognitive diagnostic models, including nonparametric classification of examinee attribute profiles, joint maximum likelihood estimation (JMLE) of examinee attribute profiles and item parameters, and nonparametric refinement of the Q-matrix, as well as conditional maximum likelihood estimation (CMLE) of examinee attribute profiles given item parameters and CMLE of item parameters given examinee attribute profiles. Currently the nonparametric methods in the package support both conjunctive and disjunctive models, and the parametric methods in the package support the DINA model, the DINO model, the NIDA model, the G-NIDA model, and the R-RUM model.
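A minimal sketch, assuming AlphaNP() and JMLE() are the nonparametric classification and JMLE entry points (argument names are assumptions):

    library(NPCD)
    set.seed(1)
    Q <- matrix(c(1, 0,
                  0, 1,
                  1, 1), nrow = 3, byrow = TRUE)      # 3 items, 2 attributes
    Y <- matrix(rbinom(50 * 3, 1, 0.5), nrow = 50)    # 50 examinees' responses
    np <- AlphaNP(Y, Q, gate = "AND")   # nonparametric, conjunctive model
    jm <- JMLE(Y, Q, model = "DINA")    # joint MLE of profiles and item parameters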
I tend to repeat the same code chunks over and over again. At first, this was fine for me and I paid little attention to such redundancies. A little later, when I got tired of manually replacing Linux file paths with the corresponding Windows versions, and vice versa, I started to stuff some very frequently used work steps into functions and, even later, into a proper R package. And that's what this package is - a hodgepodge of various R functions meant to simplify (my) everyday coding work without, at the same time, being devoted to a particular scope of application.
Joint frailty models have been widely used to study the associations between recurrent events and a survival outcome. However, existing joint frailty models consider only one or a few recurrent events and cannot deal with high-dimensional recurrent events. This package fits our recently developed penalized joint frailty model, which can handle high-dimensional recurrent events. Specifically, an adaptive lasso penalty is imposed on the parameters for the effects of the recurrent events on the survival outcome, which allows for variable selection. In addition, our algorithm, which is based on the Gaussian variational approximation method, is computationally efficient.
Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/.../N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible. For details, see Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
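For example, generating and evaluating a small D-optimal design with gen_design() and eval_design():

    library(skpr)
    candidates <- expand.grid(temp = c(-1, 0, 1), speed = c(-1, 1))
    design <- gen_design(candidateset = candidates,
                         model = ~ temp + speed, trials = 12)
    eval_design(design = design, model = ~ temp + speed, alpha = 0.05)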
The Tweedie compound Poisson distribution is a mixture of a degenerate distribution at the origin and a continuous distribution on the positive real line. It has been applied in a wide range of fields in which continuous data with exact zeros regularly arise. The cplm package provides likelihood-based and Bayesian procedures for fitting common Tweedie compound Poisson linear models. In particular, models with hierarchical structures or extra zero inflation can be handled. Further, the package implements the Gini index, based on an ordered version of the Lorenz curve, as a robust model comparison tool for zero-inflated and highly skewed distributions.
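For example, a compound Poisson GLM on the FineRoot data shipped with the package, with the model specification following the package examples:

    library(cplm)
    data(FineRoot)
    fit <- cpglm(RLD ~ factor(Zone) * factor(Stock), data = FineRoot)
    summary(fit)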
The rema package implements a permutation-based approach for binary meta-analyses of 2x2 tables, founded on conditional logistic regression, that provides more reliable statistical tests when heterogeneity is observed in rare event data (Zabriskie et al. 2021 <doi:10.1002/sim.9142>). To adjust for the effect of heterogeneity, this method conditions on the sufficient statistic of a proxy for the heterogeneity effect, as opposed to estimating the heterogeneity variance. While this means the model does not strictly fall under the random-effects framework, it is akin to a random-effects approach in that it assumes differences in variability due to treatment. Further, this method does not rely on large-sample approximations or continuity corrections for rare event data. Instead of asymptotic approximations, it uses the permutational distribution of the test statistic for inference. The number of observed events drives the computational complexity of creating this permutational distribution. Accordingly, for this method to be computationally feasible, it should only be applied to meta-analyses with a relatively low number of observed events. To create the permutational distribution, a network algorithm, based on the work of Mehta et al. (1992) <doi:10.2307/1390598> and Corcoran et al. (2001) <doi:10.1111/j.0006-341x.2001.00941.x>, is implemented in C++ and integrated into the package.
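A minimal sketch; the rema() interface below (per-study event counts and sample sizes for each arm) is an assumption, not the documented signature:

    library(rema)
    # three studies with rare events in treatment and control arms (made-up data)
    fit <- rema(events.trt = c(1, 0, 2), n.trt = c(25, 30, 28),
                events.ctrl = c(0, 1, 0), n.ctrl = c(25, 29, 30))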
We introduce a novel ensemble-based explainable machine learning model that combines the Model Confidence Set (MCS) with a two-stage Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The model combines the predictive capabilities of different machine learning models and integrates the interpretability of explainability methods. The package has been developed using the algorithms of Paul et al. (2023) <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
The HMS (Hierarchic Memetic Strategy) is a composite global optimization strategy consisting of a multi-population evolutionary strategy and some auxiliary methods. The HMS makes use of a dynamically-evolving data structure that provides an organization among the component populations. It is a tree with a fixed maximal height and variable internal node degree. Each component population is governed by a particular evolutionary engine. This package provides a simple R implementation with examples of using different genetic algorithms as the population engines. References: J. Sawicki, M. Łoś, M. Smołka, J. Alvarez-Aramberri (2022) <doi:10.1007/s11047-020-09836-w>.
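A minimal sketch, assuming hms() is the entry point and takes a fitness function with box bounds (function and argument names are assumptions):

    library(hmsr)
    # maximize -sum(x^2), i.e. minimize the sphere function on [-5, 5]^2
    # (interface assumed)
    result <- hms(fitness = function(x) -sum(x^2),
                  lower = rep(-5, 2), upper = rep(5, 2))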
This package provides a framework and standardized computational environment for specialist work, via object classes that represent data coded by samples, taxa and segments (i.e. subpopulations, repeated measures). It supports easy processing of the data, along with cross tabulation and relational data tables for samples and taxa. An object of class 'mefa' is a project-specific compendium of the data and can easily be used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of 'mefa' objects. Reports can be generated in plain text or LaTeX format. The vignette contains worked examples.
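A minimal sketch following the package's 'dol' example data; dataset and constructor names follow the package documentation, but the exact details are assumptions:

    library(mefa)
    data(dol.count, dol.samp, dol.taxa)
    # stcs() builds the samples-taxa-counts-segments object,
    # mefa() assembles the project compendium with sample and taxa tables
    m <- mefa(stcs(dol.count), dol.samp, dol.taxa)
    summary(m)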
Features tools for network data analysis and community detection. Provides multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements the following: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138>; Bickel and Sarkar (2015) <doi:10.1111/rssb.12117>; Lei (2016) <doi:10.1214/15-AOS1370>; Wang and Bickel (2017) <doi:10.1214/16-AOS1457>; Zhang and Amini (2020) <arXiv:2012.15047>; Le and Levina (2022) <doi:10.1214/21-EJS1971>.
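A rough sketch, assuming sample_dcsbm() and spec_clust() are among the exported helpers (names and signatures are assumptions):

    library(nett)
    set.seed(1)
    # simulate a sparse degree-corrected SBM and recover communities (assumed API)
    z <- sample(1:2, 500, replace = TRUE)                  # true labels
    B <- matrix(c(0.02, 0.002, 0.002, 0.02), 2, 2)          # connectivity matrix
    A <- sample_dcsbm(z, B)
    zhat <- spec_clust(A, K = 2)   # spectral clustering into K communities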
Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analyses. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This affords the user a more efficient and targeted analysis. qdap is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
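For example, computing word statistics by speaker on the small DATA transcript shipped with qdap, with word_stats() usage following the package examples:

    library(qdap)
    data(DATA)                            # toy transcript: person, state (text), ...
    word_stats(DATA$state, DATA$person)   # descriptive statistics by speaker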
This package provides a terribly-simple data base for numeric time series, written purely in R, so no external database software is needed. Series are stored in plain-text files (the most portable and enduring file type) in CSV format. Timestamps are encoded using R's native numeric representation for 'Date'/'POSIXct', which makes them fast to parse, but keeps them accessible with other software. The package provides tools for saving and updating series in this standardised format, for retrieving and joining data, for summarising files and directories, and for coercing series from and to other data types (such as zoo series).
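The numeric encodings themselves are plain base R, which is what keeps the stored files portable:

    # days since 1970-01-01 for 'Date'
    as.numeric(as.Date("2020-01-01"))                           # 18262
    # seconds since 1970-01-01 00:00:00 UTC for 'POSIXct'
    as.numeric(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"))   # 1577836800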
The aim of XINA is to determine which proteins exhibit similar patterns within and across experimental conditions, since proteins with co-abundance patterns may have common molecular functions. XINA imports multiple datasets, tags each dataset in silico, and combines the data for subsequent subgrouping into multiple clusters. The result is a single output depicting the variation across all conditions. XINA not only extracts co-abundance profiles within and across experiments, but also incorporates protein-protein interaction databases and integrative resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) to infer interactors and molecular functions, respectively, and produces intuitive graphical outputs.
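A rough sketch of the first step of the workflow, assuming xina_clustering() is the combining/clustering entry point (function name, arguments, and file names are all assumptions or placeholders):

    library(XINA)
    # one abundance-profile CSV per experimental condition (placeholder names)
    files <- c("control.csv", "stimulus1.csv", "stimulus2.csv")
    clu <- xina_clustering(files, data_column = paste0("T", 1:7),
                           nClusters = 30)   # assumed signature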
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
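For example, imputing the nhanes data shipped with mice, fitting a model on each completed dataset, and pooling the results:

    library(mice)
    imp <- mice(nhanes, m = 5, seed = 123)   # 5 imputations via FCS
    fit <- with(imp, lm(bmi ~ age + hyp))    # analyze each completed dataset
    summary(pool(fit))                       # pool with Rubin's rules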
This package implements the fast cross-validation via sequential testing (CVST) procedure. CVST is an improved cross-validation procedure which uses non-parametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. In addition to CVST, the package contains an implementation of ordinary k-fold cross-validation with a flexible and powerful set of helper objects and methods to handle the overall model selection process. The implementations of Cochran's Q test with permutations and of Wald's sequential testing framework are generic and can therefore also be used in other contexts.
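For example, running the fast CVST loop with a kernel ridge regression learner, assuming the noisySinc() toy-data helper and constructor functions as used in the package examples:

    library(CVST)
    ns <- noisySinc(100)                     # toy regression data
    krr <- constructKRRLearner()             # kernel ridge regression learner
    params <- constructParams(kernel = "rbfdot",
                              sigma = 10^(-3:3), lambda = c(0.05, 0.1))
    opt <- fastCV(ns, krr, params, constructCVSTModel())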