This package provides methods to generate a design in the input space that sequentially fills the output space of a black-box function. The output space-filling designs are helpful in inverse design or feature-based modeling problems. See Wang, Shangkun, Adam P. Generale, Surya R. Kalidindi, and V. Roshan Joseph (2024), Sequential designs for filling output spaces, Technometrics, 66, 65-76, for details. This work is supported by U.S. National Science Foundation grant CMMI-1921646.
This package implements the Quantile Composite-based Path Modeling approach (Davino and Vinzi, 2016 <doi:10.1007/s11634-015-0231-9>; Dolce et al., 2021 <doi:10.1007/s11634-021-00469-0>). The method complements the traditional PLS Path Modeling approach, analyzing the entire distribution of outcome variables and, therefore, overcoming the classical exploration of only average effects. It exploits quantile regression to investigate changes in the relationships among constructs and between constructs and observed variables.
This package implements the methodological developments found in Hermes, van Heerwaarden, and Behrouzi (2023) <doi:10.48550/arXiv.2308.04325>, and allows for the statistical modeling of asymmetric between-location effects, as well as within-location effects using spatial autoregressive graphical models. The package allows for the generation of spatial weight matrices to capture asymmetric effects for strip-type intercropping designs, although it can handle any type of spatial data commonly found in other sciences.
Analyzes longitudinal data of HIV decline in patients on antiretroviral therapy using the canonical biphasic exponential decay model (pioneered, for example, by work in Perelson et al. (1997) <doi:10.1038/387188a0>; and Wu and Ding (1999) <doi:10.1111/j.0006-341X.1999.00410.x>). Model fitting and parameter estimation are performed, with additional options to calculate the time to viral suppression. Plotting and summary tools are also provided for fast assessment of model results.
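The biphasic model and the time-to-suppression calculation mentioned above can be illustrated with a minimal Python sketch. It assumes the usual two-term form V(t) = A*exp(-delta*t) + B*exp(-gamma*t) with positive decay rates; the function names, parameter names and the bisection solver are illustrative assumptions, not the package's interface or fitting procedure.

```python
import math

def biphasic_viral_load(t, A, delta, B, gamma):
    """Viral load at time t under a two-phase exponential decay (assumed form)."""
    return A * math.exp(-delta * t) + B * math.exp(-gamma * t)

def time_to_suppression(A, delta, B, gamma, threshold, t_max=1000.0, tol=1e-8):
    """First time the modeled viral load falls to `threshold`, found by bisection.

    Assumes delta > 0 and gamma > 0, so the curve decreases monotonically.
    Returns None if the load is still above the threshold at t_max.
    """
    if biphasic_viral_load(0.0, A, delta, B, gamma) <= threshold:
        return 0.0
    if biphasic_viral_load(t_max, A, delta, B, gamma) > threshold:
        return None
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if biphasic_viral_load(mid, A, delta, B, gamma) > threshold:
            lo = mid  # still above the threshold: suppression happens later
        else:
            hi = mid  # at or below the threshold: suppression happens earlier
    return hi
```

With B = 0 the model reduces to a single exponential, so the result can be checked against the closed form log(A / threshold) / delta.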
Estimates the disparity between two groups using an extended model of the Peters-Belson (PB) method. The model is the first to apply the method to longitudinal data, and it allows a varying variable to be set in order to capture complex associations between that variable and the other variables. The Peters-Belson method was originally published in Peters (1941) <doi:10.1080/00220671.1941.10881036> and Belson (1956) <doi:10.2307/2985420>.
Calculates the win statistics (win ratio, net benefit and win odds) for prioritized multiple endpoints, plots the win statistics and win proportions over study time if at least one time-to-event endpoint is analyzed, and simulates datasets with dependent endpoints. The package can handle any type of outcome (continuous, ordinal, binary, time-to-event) and allows users to perform stratified analysis, inverse probability of censoring weighting (IPCW) and inverse probability of treatment weighting (IPTW) analysis.
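As a rough illustration of the three win statistics, the Python sketch below compares every treated/control pair on the endpoints in priority order and moves to the next endpoint only on a tie. It is a simplified sketch that assumes fully observed outcomes: censoring, stratification, IPCW and IPTW are not handled, and all function and argument names are hypothetical rather than taken from the package.

```python
from itertools import product

def compare_pair(treated, control, higher_is_better):
    """Return 1 if the treated subject wins, -1 if it loses, 0 on a tie.

    Endpoints are checked in priority order; a lower-priority endpoint is
    used only when all higher-priority endpoints are tied.
    """
    for t_val, c_val, better_high in zip(treated, control, higher_is_better):
        if t_val == c_val:
            continue
        return 1 if (t_val > c_val) == better_high else -1
    return 0

def win_statistics(treated_group, control_group, higher_is_better):
    """Win ratio, net benefit and win odds over all treated/control pairs."""
    wins = losses = ties = 0
    for t, c in product(treated_group, control_group):
        result = compare_pair(t, c, higher_is_better)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    total = wins + losses + ties
    p_win, p_loss, p_tie = wins / total, losses / total, ties / total
    odds_denominator = p_loss + 0.5 * p_tie
    return {
        "win_ratio": p_win / p_loss if p_loss > 0 else float("inf"),
        "net_benefit": p_win - p_loss,
        "win_odds": (p_win + 0.5 * p_tie) / odds_denominator
        if odds_denominator > 0 else float("inf"),
    }
```

For example, with a binary first endpoint and a continuous second endpoint, `win_statistics([(1, 5.0), (0, 3.0)], [(1, 4.0), (0, 3.0)], (True, True))` yields two wins, one loss and one tie out of four pairs.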
This package provides a convenient data set, a set of helper functions, and a benchmark function for economically (profit) driven wind farm layout optimization. This enables researchers in the field of the NP-hard (non-deterministic polynomial-time hard) problem of wind farm layout optimization to focus on their optimization methodology contribution and also provides a realistic benchmark setting for comparability among contributions. See Croonenbroeck, Carsten & Hennecke, David (2020) <doi:10.1016/j.energy.2020.119244>.
Augmented Regression with General Online data (ARGO) for accurate estimation of influenza epidemics in the United States at the national, regional and state levels. It replicates the method introduced in Yang, S., Santillana, M. and Kou, S.C. (2015) <doi:10.1073/pnas.1515373112>; Ning, S., Yang, S. and Kou, S.C. (2019) <doi:10.1038/s41598-019-41559-6>; Yang, S., Ning, S. and Kou, S.C. (2021) <doi:10.1038/s41598-021-83084-5>.
Bindings to Google's C++ library Compact Language Detector 2 (see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a cld3 package on CRAN which uses a neural network model instead.
This framework enables forecasting and extrapolating measures of conditional risk (e.g. of extreme or unprecedented events), including quantiles and exceedance probabilities, using extreme value statistics and flexible neural network architectures. It allows for capturing complex multivariate dependencies, including dependencies between observations, such as sequential dependence (time-series). The methodology was introduced in Pasche and Engelke (2024) <doi:10.1214/24-AOAS1907> (also available in preprint: Pasche and Engelke (2022) <doi:10.48550/arXiv.2208.07590>).
This package provides measures to characterize the complexity of classification and regression problems based on aspects that quantify the linearity of the data, the presence of informative features, and the sparsity and dimensionality of the datasets. This package provides bug fixes, generalizations and implementations of many state-of-the-art measures. The measures are described in the papers: Lorena et al. (2019) <doi:10.1145/3347711> and Lorena et al. (2018) <doi:10.1007/s10994-017-5681-1>.
This R package can be used to generate artificial data conditionally on pre-specified (simulated or user-defined) relationships between the variables and/or observations. Each observation is drawn from a multivariate Normal distribution where the mean vector and covariance matrix reflect the desired relationships. Outputs can be used to evaluate the performance of variable selection, graphical modelling, or clustering approaches by comparing the true and estimated structures (B Bodinier et al (2021) <arXiv:2106.02521>).
The algorithm assigns a rareness/outlierness score to every sample in voluminous datasets. The algorithm makes multiple estimations of the proximity between a pair of samples in low-dimensional spaces. To compute proximity, FiRE uses Sketching, a variant of locality sensitive hashing. For more details: Jindal, A., Gupta, P., Jayadeva and Sengupta, D., 2018. Discovery of rare cells from voluminous single cell expression data. Nature Communications, 9(1), p.4719. <doi:10.1038/s41467-018-07234-6>.
Fits Weighted Quantile Sum (WQS) regression (Carrico et al. (2014) <doi:10.1007/s13253-014-0180-3>), a random subset implementation of WQS (Curtin et al. (2019) <doi:10.1080/03610918.2019.1577971>), a repeated holdout validation WQS (Tanner et al. (2019) <doi:10.1016/j.mex.2019.11.008>) and a WQS with 2 indices (Renzetti et al. (2023) <doi:10.3389/fpubh.2023.1289579>) for continuous, binomial, multinomial, Poisson, quasi-Poisson and negative binomial outcomes.
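The core quantity in WQS regression is an index that combines quantile-scored exposures with non-negative weights summing to one. The Python sketch below only computes that index for weights that are already given; estimating the weights (for example by bootstrap), the random subset and repeated holdout variants, and the outcome regression are all omitted. The function names and the tie-handling rule for quantile scoring are assumptions made for illustration, not the package's implementation.

```python
from bisect import bisect_left

def quantile_scores(values, n_quantiles=4):
    """Score each value from 0 to n_quantiles - 1.

    The score depends on the share of observations strictly below the value,
    so tied values always receive the same score (an assumed convention).
    """
    n = len(values)
    ordered = sorted(values)
    scores = []
    for v in values:
        below = bisect_left(ordered, v)  # count of values strictly less than v
        scores.append(min(n_quantiles - 1, below * n_quantiles // n))
    return scores

def wqs_index(exposures, weights, n_quantiles=4):
    """Weighted quantile sum per observation.

    `exposures` is a list of columns (one list of values per exposure) and
    `weights` holds one non-negative weight per exposure, summing to 1.
    """
    if len(exposures) != len(weights):
        raise ValueError("one weight per exposure is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scored = [quantile_scores(column, n_quantiles) for column in exposures]
    n_obs = len(exposures[0])
    return [
        sum(w * column[i] for w, column in zip(weights, scored))
        for i in range(n_obs)
    ]
```

The resulting index would then enter a regression for the outcome as a single covariate, which is the step this sketch leaves out.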
Retrieves regional plant checklists, species traits and distributions, and environmental data from the Global Inventory of Floras and Traits (GIFT). More information about the GIFT database can be found at <https://gift.uni-goettingen.de/about> and the map of available floras can be visualized at <https://gift.uni-goettingen.de/map>. The API and associated queries can be accessed according to the following scheme: <https://gift.uni-goettingen.de/api/extended/index2.0.php?query=env_raster>.
Cross-validated eigenvalues are estimated by splitting a graph into two parts, the training and the test graph. The training graph is used to estimate eigenvectors, and the test graph is used to evaluate the correlation between the training eigenvectors and the eigenvectors of the test graph. The correlations follow a simple central limit theorem that can be used to estimate graph dimension via hypothesis testing; see Chen et al. (2021) <arXiv:2108.03336> for details.
When the response variable Y takes one of R > 1 values, the function glsm() computes the maximum likelihood estimates (MLEs) of the parameters under four models: null, complete, saturated, and logistic. It also calculates the log-likelihood values for each model. This method assumes independent, non-identically distributed variables. For grouped data with a multinomial outcome, where observations are divided into J populations, the function glsm() provides estimation for any number K of explanatory variables.
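To make the distinction between the models more concrete, the Python sketch below computes the maximized log-likelihood for two of them, under the assumption that they are defined in the usual way: the null model uses one set of category probabilities for all observations, and the saturated model uses a separate set for each of the J populations. It does not reproduce glsm(); the function names are hypothetical, and the complete and logistic models are omitted.

```python
import math
from collections import Counter

def multinomial_loglik(counts):
    """Maximized multinomial log-likelihood for one set of category counts.

    The MLE of each category probability is its observed proportion, so the
    value is the sum of count * log(count / n) over non-empty categories.
    """
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)

def null_loglik(groups):
    """Null model (assumed): common category probabilities across populations."""
    pooled = Counter()
    for outcomes in groups.values():
        pooled.update(outcomes)
    return multinomial_loglik(pooled)

def saturated_loglik(groups):
    """Saturated model (assumed): separate probabilities for each population."""
    return sum(multinomial_loglik(Counter(outcomes)) for outcomes in groups.values())
```

Here `groups` maps a population label to the list of observed outcome categories in that population. Because the null model is nested in the saturated one, the saturated log-likelihood is never smaller than the null log-likelihood.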
Suite of functions to study animal incubation. At the core of incR lies an algorithm that allows for the scoring of incubation behaviour. Additionally, several functions extract biologically relevant metrics of incubation such as off-bout number and off-bout duration - for a review of avian incubation studies, see Nests, Eggs, and Incubation: New ideas about avian reproduction (2015) edited by D. Charles Deeming and S. James Reynolds <doi:10.1093/acprof:oso/9780198718666.001.0001>.
An implementation of modified maximum contrast methods (Sato et al. (2009) <doi:10.1038/tpj.2008.17>; Nagashima et al. (2011) <doi:10.2202/1544-6115.1560>) and the maximum contrast method (Yoshimura et al. (1997) <doi:10.1177/009286159703100213>). The functions mmcm.mvt() and mcm.mvt() compute P-values using a randomized quasi-Monte Carlo method with the pmvt() function of the 'mvtnorm' package, and mmcm.resamp() computes P-values using a permutation method.
The word 'meme' originated in the book 'The Selfish Gene' by Richard Dawkins (1976). A meme is a unit of culture that is passed from one generation to another, analogous to the gene, the unit of physical heredity. Internet memes are captioned photos that are intended to be funny or ridiculous. Memes behave like infectious viruses and travel from person to person quickly through social media. The meme package allows users to make custom memes.
This package implements the Quantitative Classification-based on Association Rules (QCBA) algorithm (<doi:10.1007/s10489-022-04370-x>). QCBA postprocesses rule classification models, typically making them smaller and in some cases more accurate. Supported are the CBA implementations from the 'rCBA', 'arulesCBA' and 'arc' packages, the 'CPAR', 'CMAR', 'FOIL2' and 'PRM' implementations from the 'arulesCBA' package, and the 'SBRL' implementation from the 'sbrl' package. The result of the post-processing is an ordered CBA-like rule list.
This package provides a coalescent simulator that allows the rapid simulation of biological sequences under neutral models of evolution; see Staab et al. (2015) <doi:10.1093/bioinformatics/btu861>. Unlike other coalescent-based simulators, it has an optional approximation parameter that allows for high accuracy while maintaining a linear run time cost for long sequences. It is optimized for simulating massive data sets, as produced by Next-Generation Sequencing technologies, for up to several thousand sequences.
This package provides comprehensive tools for the implementation of Structural Latent Class Models (SLCM), including Latent Transition Analysis (LTA; Linda M. Collins and Stephanie T. Lanza, 2009) <doi:10.1002/9780470567333>, Latent Class Profile Analysis (LCPA; Hwan Chung et al., 2010) <doi:10.1111/j.1467-985x.2010.00674.x>, and Joint Latent Class Analysis (JLCA; Saebom Jeon et al., 2017) <doi:10.1080/10705511.2017.1340844>, and any other extended models involving multiple latent class variables.
Calculates the robust Taba linear, Taba rank (monotonic), TabWil, and TabWil rank correlations. Test statistics as well as one-sided or two-sided p-values are provided for all correlations. Multiple correlations and p-values can be calculated simultaneously across multiple variables. In addition, users have the option to use partial, semipartial, and generalized partial correlations, where the partial and semipartial correlations use linear, logistic, or Poisson regression to modify the specified variable.