This package implements techniques to estimate the unknown quantities related to two-component admixture models, where the two components can belong to any distribution (note that in the case of multinomial mixtures, the two components must belong to the same family). Estimation methods depend on the assumptions made on the unknown component density; see Bordes and Vandekerkhove (2010) <doi:10.3103/S1066530710010023>, Patra and Sen (2016) <doi:10.1111/rssb.12148>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>. In practice, one can estimate both the mixture weight and the unknown component density in a wide variety of frameworks. In addition, hypothesis tests can be performed in one- and two-sample contexts to test the unknown component density (see Milhaud, Pommeret, Salhi and Vandekerkhove (2022) <doi:10.1016/j.jspi.2021.05.010>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>). Finally, clustering of unknown mixture components is also feasible in a K-sample setting (see Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <https://jmlr.org/papers/v25/23-0914.html>).
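As a minimal illustrative sketch (not this package's estimator), the two-component admixture idea can be seen by fitting the weight p and the location of the unknown component against the empirical CDF; the Gaussian form of the unknown component and the grid-based fit are simplifying assumptions:

    ## f(x) = p * dnorm(x - mu) + (1 - p) * dnorm(x); known component is N(0,1)
    set.seed(42)
    x <- c(rnorm(300, mean = 3), rnorm(700))       # true p = 0.3, mu = 3
    grid <- seq(-4, 7, length.out = 200)
    Fn <- ecdf(x)(grid)
    obj <- function(par) {                         # par = c(p, mu)
      Fmix <- par[1] * pnorm(grid - par[2]) + (1 - par[1]) * pnorm(grid)
      sum((Fn - Fmix)^2)
    }
    fit <- optim(c(0.5, 1), obj, method = "L-BFGS-B",
                 lower = c(0.01, -5), upper = c(0.99, 10))
    fit$par                                        # close to c(0.3, 3)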
This package performs the analysis of completely randomized designs (CRD), randomized block designs (RBD) and Latin square designs (LSD), experiments in double and triple factorial schemes (in CRD and RBD), experiments in the subdivided (split-plot) scheme (in CRD and RBD), subdivided and joint analysis of experiments in CRD and RBD, linear regression analysis, and two-sample tests. The package performs analysis of variance, checks of the ANOVA assumptions, and multiple comparison tests of means or regression according to Pimentel-Gomes (2009, ISBN: 978-85-7133-055-9), nonparametric tests (Conover, 1999, ISBN: 0471160687), joint analysis of experiments according to Ferreira (2018, ISBN: 978-85-7269-566-4), and generalized linear models (GLM) for the binomial and Poisson families in CRD and RBD (Carvalho, FJ (2019) <doi:10.14393/ufu.te.2019.1244>). It can also be used to obtain descriptive measures and graphics, in addition to correlations and creative graphics used in the agricultural sciences (agronomy, zootechnics, food science and related areas). Shimizu, G. D., Marubayashi, R. Y. P., Goncalves, L. S. A. (2025) <doi:10.4025/actasciagron.v47i1.73889>.
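As a hedged base-R sketch of the kind of CRD workflow this package automates (simulated data; the package's own functions add further checks and reporting):

    set.seed(1)
    dat <- data.frame(trt = factor(rep(LETTERS[1:4], each = 6)),
                      y   = rnorm(24, mean = rep(c(10, 12, 12, 15), each = 6)))
    fit <- aov(y ~ trt, data = dat)         # one-way ANOVA for a CRD
    shapiro.test(residuals(fit))            # normality of residuals
    bartlett.test(y ~ trt, data = dat)      # homogeneity of variances
    summary(fit)                            # ANOVA table
    TukeyHSD(fit)                           # multiple comparison of means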
The original idea was presented in the reference paper: Varghese et al. (2020, 74(1):35-42), "Bayesian State-space Implementation of Schaefer Production Model for Assessment of Stock Status for Multi-gear Fishery". Marine fisheries governance and management practices are essential to ensure the sustainability of marine resources. A widely accepted resource management strategy towards this is to derive sustainable fish harvest levels based on the status of the marine fish stock. Various fish stock assessment models that describe the biomass dynamics using time series data on fish catch and fishing effort are generally used for this purpose. In a complex multi-species marine fishery, in which different species are caught by a number of fishing gears and each gear harvests a number of species, it is difficult to obtain the fishing effort corresponding to each fish species. Since the capacity of the gears varies, the effort made to catch a resource cannot be taken as the sum of the efforts expended by the different fishing gears. This necessitates standardisation of fishing effort on a unit basis.
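For intuition, a minimal sketch of the Schaefer surplus-production dynamics underlying the state-space model (hypothetical parameter values and catches; not the Bayesian implementation itself):

    r <- 0.4; K <- 1000                    # intrinsic growth rate, carrying capacity
    B <- numeric(30); B[1] <- 800          # biomass trajectory
    C <- runif(29, 50, 120)                # hypothetical annual catches
    for (t in 1:29)
      B[t + 1] <- max(B[t] + r * B[t] * (1 - B[t] / K) - C[t], 1)
    r * K / 4                              # maximum sustainable yield (MSY)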
An efficient unified nonconvex penalized estimation algorithm for Gaussian (linear), binomial logit (logistic), Poisson, multinomial logit, and Cox proportional hazards regression models. The unified algorithm is implemented based on the convex-concave procedure and can be applied to most of the existing nonconvex penalties. The algorithm also supports the convex penalty least absolute shrinkage and selection operator (LASSO). Supported nonconvex penalties include the smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), truncated LASSO penalty (TLP), clipped LASSO (CLASSO), sparse ridge (SRIDGE), modified bridge (MBRIDGE) and modified log (MLOG). For high-dimensional data (data sets with many variables), the algorithm selects relevant variables, producing a parsimonious regression model. Kim, D., Lee, S. and Kwon, S. (2018) <arXiv:1811.05061>, Lee, S., Kwon, S. and Kim, Y. (2016) <doi:10.1016/j.csda.2015.08.019>, Kwon, S., Lee, S. and Kim, Y. (2015) <doi:10.1016/j.csda.2015.07.001>. (This research is funded by the Julian Virtue Professorship from the Center for Applied Research at Pepperdine Graziadio Business School and the National Research Foundation of Korea.)
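For reference, a small R sketch of one supported nonconvex penalty, SCAD (Fan and Li, 2001), with the conventional default a = 3.7; this is only the penalty function, not the estimation algorithm:

    scad <- function(t, lambda, a = 3.7) {
      at <- abs(t)
      ifelse(at <= lambda, lambda * at,
        ifelse(at <= a * lambda,
          (2 * a * lambda * at - at^2 - lambda^2) / (2 * (a - 1)),
          (a + 1) * lambda^2 / 2))
    }
    curve(scad(x, lambda = 1), -4, 4, ylab = "penalty")  # constant beyond a*lambda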
An implementation of a single-index regression for optimizing individualized dose rules from an observational study. To model interaction effects between baseline covariates and a treatment variable defined on a continuum, we employ two-dimensional penalized spline regression on an index-treatment domain, where the index is defined as a linear combination of the covariates (a single index). An unspecified main effect for the covariates is allowed, which can also be modeled through a parametric model. A unique contribution of this work is the parsimonious single-index parametrization specifically defined for the interaction effect term. We refer to Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1111/biom.13320> (for the case of a discrete treatment) and Park, Petkova, Tarpey, and Ogden (2021), "A single-index model with a surface-link for optimizing individualized dose rules" <arXiv:2006.00267v2>, for details of the method. The model can take a member of the exponential family as a response variable and can also take an ordinal categorical response. The main function of this package is simsl().
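A hedged sketch of the model idea (not the simsl() API): a two-dimensional penalized spline over an index-treatment domain via mgcv, with the index coefficients treated as known for illustration, whereas simsl() estimates them:

    library(mgcv)
    set.seed(2)
    n <- 500; X <- matrix(rnorm(n * 3), n, 3); dose <- runif(n)
    index <- drop(X %*% c(0.6, 0.6, 0.5))          # assumed single index
    y <- sin(2 * index * dose) + rnorm(n, sd = 0.3)
    fit <- gam(y ~ te(index, dose))                # 2-d penalized spline surface
    summary(fit)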
Constructs genotype x environment interaction (GxE) models where G is a weighted sum of genetic variants (genetic score) and E is a weighted sum of environments (environmental score), using the alternating optimization algorithm by Jolicoeur-Martineau et al. (2017) <arXiv:1703.08111>. This approach has greatly enhanced predictive power over traditional GxE models, which include only a single genetic variant and a single environmental exposure. Although this approach was originally developed for GxE modelling, it is flexible and does not require the use of genetic and environmental variables. It can also handle more than two latent variables (rather than just G and E) and three-way interactions or more. The LEGIT model produces highly interpretable results and is very parameter-efficient; thus it can even be used with small sample sizes (n < 250). Tools to determine the type of interaction (vantage sensitivity, diathesis-stress or differential susceptibility), with any number of genetic variants or environments, are available <arXiv:1712.04058>. The software can now produce mixed-effects LEGIT models through the lme4 package.
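A minimal alternating-least-squares sketch of the idea on simulated data (hypothetical variables; the package's functions handle this with proper constraints and mixed effects):

    set.seed(3)
    n <- 400
    d <- data.frame(g1 = rbinom(n, 2, 0.3), g2 = rbinom(n, 2, 0.3),
                    e1 = rnorm(n), e2 = rnorm(n))
    G <- with(d, 0.6 * g1 + 0.4 * g2); E <- with(d, 0.7 * e1 + 0.3 * e2)
    d$y <- 1 + G + E + 2 * G * E + rnorm(n)
    wG <- c(0.5, 0.5); wE <- c(0.5, 0.5)           # initial weights
    for (i in 1:25) {
      d$Es <- drop(as.matrix(d[c("e1", "e2")]) %*% wE)
      cg <- coef(lm(y ~ (g1 + g2) * Es, d))[c("g1:Es", "g2:Es")]
      wG <- as.numeric(cg / sum(abs(cg)))          # update genetic weights
      d$Gs <- drop(as.matrix(d[c("g1", "g2")]) %*% wG)
      ce <- coef(lm(y ~ (e1 + e2) * Gs, d))[c("e1:Gs", "e2:Gs")]
      wE <- as.numeric(ce / sum(abs(ce)))          # update environmental weights
    }
    round(c(wG, wE), 2)                            # approaches 0.6 0.4 0.7 0.3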
Calculate unified measures that quantify the effect of a covariate on a binary dependent variable (e.g., for meta-analyses). This can be particularly important if the estimation results are obtained with different models/estimators (e.g., linear probability model, logit, probit, ...) and/or with different transformations of the explanatory variable of interest (e.g., linear, quadratic, interval-coded, ...). The calculated unified measures are: (a) semi-elasticities of linear, quadratic, or interval-coded covariates and (b) effects of linear, quadratic, interval-coded, or categorical covariates when a linear or quadratic covariate changes between distinct intervals, the reference category of a categorical variable or the reference interval of an interval-coded variable needs to be changed, or some categories of a categorical covariate or some intervals of an interval-coded covariate need to be grouped together. Approximate standard errors of the unified measures are also calculated. All methods that are implemented in this package are described in the vignette "Extracting and Unifying Semi-Elasticities and Effect Sizes from Studies with Binary Dependent Variables" that is included in this package.
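For illustration, a hedged base-R sketch of one such unified measure, the semi-elasticity of a linear covariate in a logit model evaluated at the sample means (simulated data; the approximate standard errors the package computes are omitted here):

    set.seed(4)
    x <- rexp(500, 0.1)
    y <- rbinom(500, 1, plogis(-2 + 0.03 * x))
    fit <- glm(y ~ x, family = binomial)
    b <- coef(fit)
    p <- plogis(b[1] + b[2] * mean(x))       # fitted probability at the means
    b[2] * mean(x) * p * (1 - p)             # semi-elasticity: dP / d ln(x)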
In some situations where researchers would like to demonstrate causal effects, it is hard to obtain a sample size that would allow for a well-powered randomized controlled trial. Single case designs are experimental designs that can be used to demonstrate causal effects with only one participant or with only a few participants. The scdtb package provides a suite of tools for analyzing data from studies that use single case designs. The nap() function can be used to compute the nonoverlap of all pairs as outlined by the What Works Clearinghouse (2022) <https://ies.ed.gov/ncee/wwc/Handbooks>. The package also offers the mixed_model_analysis() and cross_lagged() functions, which implement mixed-effects models and cross-lagged analyses as described in Maric & van der Werff (2020) <doi:10.4324/9780429273872-9>. The randomization_test() function implements randomization tests based on methods presented in Onghena (2020) <doi:10.4324/9780429273872-8>. The scdtb() shiny application can be used to upload single case design data and access various scdtb tools for plotting and analysis.
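As a minimal sketch of the statistic that nap() computes, the nonoverlap of all pairs is the share of (baseline, treatment) pairs in which the treatment observation exceeds the baseline one, with ties counted as one half (toy data):

    nap_sketch <- function(baseline, treatment) {
      cmp <- outer(treatment, baseline, "-")       # all pairwise differences
      (sum(cmp > 0) + 0.5 * sum(cmp == 0)) / length(cmp)
    }
    nap_sketch(baseline = c(2, 3, 4, 3), treatment = c(5, 6, 4, 7))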
A major challenge in estimating treatment decision rules from a randomized clinical trial dataset with covariates measured at baseline lies in detecting relatively small treatment effect modification-related variability (i.e., the treatment-by-covariates interaction effects on treatment outcomes) against a relatively large non-treatment-related variability (i.e., the main effects of covariates on treatment outcomes). The class of Single-Index Models with Multiple-Links is a novel single-index model specifically designed to estimate a single index (a linear combination) of the covariates associated with the treatment effect modification-related variability, while allowing a nonlinear association with the treatment outcomes via flexible link functions. The models provide a flexible regression approach to developing treatment decision rules based on patient data measured at baseline. We refer to Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1016/j.jspi.2019.05.008> and Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1111/biom.13320> (which allows an unspecified X main effect) for details of the method. The main function of this package is simml().
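A hedged sketch of the multiple-links idea (not the simml() API): one smooth link per treatment arm over a single index via mgcv, with the index coefficients treated as known for illustration, whereas simml() estimates them:

    library(mgcv)
    set.seed(5)
    n <- 600; X <- matrix(rnorm(n * 3), n, 3)
    trt <- factor(rbinom(n, 1, 0.5), labels = c("A", "B"))
    index <- drop(X %*% c(0.6, 0.6, 0.5))          # assumed single index
    y <- ifelse(trt == "A", sin(index), -sin(index)) + rnorm(n, sd = 0.3)
    fit <- gam(y ~ trt + s(index, by = trt))       # treatment-specific links
    summary(fit)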
Third-order response surface designs (M. Hemavathi, Shashi Shekhar, Eldho Varghese, Seema Jaggi, Bikas Sinha & Nripes Kumar Mandal (2022) <DOI:10.1080/03610926.2021.1944213>, "Theoretical developments in response surface designs: an informative review and further thoughts") are classified into two types, viz., designs which are suitable for sequential experimentation and designs for non-sequential experimentation (M. Hemavathi, Eldho Varghese, Shashi Shekhar & Seema Jaggi (2022) <DOI:10.1080/02664763.2020.1864817>, "Sequential asymmetric third order rotatable designs (SATORDs)"). The sequential experimentation approach involves conducting the trials step by step, whereas in the non-sequential experimentation approach the entire set of runs is executed in one go. This package contains functions named STORDs() and NSTORDs() for generating sequential/non-sequential TORDs given in Das, M. N., and V. L. Narasimham (1962) <DOI:10.1214/aoms/1177704374>, "Construction of rotatable designs through balanced incomplete block designs", along with the randomized layout. It also contains another function named Pred3.var() for generating the variance of the predicted response as well as the moment matrix based on a third-order response surface model.
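For intuition, a hedged base-R sketch of what such a variance computation involves conceptually: the moment matrix and the variance of the predicted response under a third-order model, shown here for a hypothetical one-factor design (not the package's implementation):

    x <- c(-2, -1, -1, 0, 0, 0, 1, 1, 2)      # hypothetical design points
    X <- cbind(1, x, x^2, x^3)                # third-order model matrix
    crossprod(X) / nrow(X)                    # moment matrix
    predvar <- function(x0) {                 # var(yhat(x0)) in units of sigma^2
      f0 <- c(1, x0, x0^2, x0^3)
      drop(t(f0) %*% solve(crossprod(X)) %*% f0)
    }
    sapply(c(0, 1, 2), predvar)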
Allows users to create and deploy workflows with multiple functions in Function-as-a-Service (FaaS) cloud computing platforms. The FaaSr package makes it simpler for R developers to use FaaS platforms by providing the following functionality: 1) parsing and validating a JSON-based payload compliant with the FaaSr schema, supporting multiple FaaS platforms; 2) invoking user functions written in R in a Docker container (derived from rocker), using a list generated from the parser as argument; 3) downloading/uploading of files from/to S3 buckets using simple primitives; 4) logging to files in S3 buckets; 5) triggering downstream actions, supporting multiple FaaS platforms; 6) generating FaaS-specific API calls to simplify the registering of a user's workflow with a FaaS platform. Supported FaaS platforms: Apache OpenWhisk <https://openwhisk.apache.org/>, GitHub Actions <https://github.com/features/actions>, and Amazon Web Services (AWS) Lambda <https://aws.amazon.com/lambda/>. Supported cloud data storage for persistent storage: Amazon Web Services (AWS) Simple Storage Service (S3) <https://aws.amazon.com/s3/>.
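A hedged sketch of the general shape of a JSON payload and a toy validation step (the field names below are illustrative assumptions, not the exact FaaSr schema):

    library(jsonlite)
    payload <- '{
      "FunctionInvoke": "compute_sum",
      "FunctionList": { "compute_sum": { "FaaSServer": "My_OpenWhisk" } },
      "DataStores":   { "My_S3": { "Bucket": "mybucket", "Region": "us-east-1" } }
    }'
    wf <- fromJSON(payload)
    stopifnot(wf$FunctionInvoke %in% names(wf$FunctionList))  # toy validation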
Conduct multi-locus genome-wide association studies under the framework of the multi-locus random-SNP-effect mixed linear model (mrMLM). First, each marker on the genome is scanned, with Bonferroni correction replaced by a less stringent selection criterion for the significance test. Then, all the markers that are potentially associated with the trait are included in a multi-locus genetic model, their effects are estimated by empirical Bayes, and all the nonzero effects are further identified by a likelihood ratio test for significant QTLs. The program may run on desktop or laptop computers. If marker genotypes in the association mapping population are almost homozygous, the methods in this software are very effective; if there are many heterozygous marker genotypes, the IIIVmrMLM software is recommended. Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wang SB, Dunwell JM, Zhang YM, Wu R (2018) <doi:10.1093/bib/bbw145>, and Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH, Zhou YH, Zuo JF, Zhang HQ, Chen Y, Zhang YM (2022) <doi:10.1016/j.molp.2022.02.012>.
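A hedged sketch of the first step only, scanning each marker with a simple single-marker model and a relaxed (non-Bonferroni) criterion on simulated genotypes; the subsequent empirical Bayes multi-locus step is not shown:

    set.seed(6)
    n <- 200; m <- 1000
    geno <- matrix(rbinom(n * m, 2, 0.4), n, m)    # hypothetical genotypes
    y <- 0.8 * geno[, 5] + rnorm(n)                # marker 5 is causal
    pvals <- apply(geno, 2, function(g)
      summary(lm(y ~ g))$coefficients[2, 4])
    which(pvals < 0.01)    # candidates passed to the multi-locus model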
This package provides a scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). It allows the user to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contributions of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure; see Leibold et al. <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.
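A deliberately simplified sketch of the jSDM intuition (not this package's estimator): fit per-species GLMs on the environment and inspect residual correlations as crude biotic associations, on simulated data:

    set.seed(7)
    n <- 300; env <- rnorm(n)
    Y <- sapply(1:4, function(s) rbinom(n, 1, plogis(0.8 * env)))  # 4 species
    res <- sapply(1:4, function(s)
      residuals(glm(Y[, s] ~ env, family = binomial), type = "pearson"))
    round(cor(res), 2)    # off-diagonal entries: residual co-occurrence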
The Bayesian Markov renewal mixed models take sequentially observed categorical data with continuous duration times, being either state durations or inter-state durations. These models comprehensively analyze the stochastic dynamics of both state transitions and duration times under the influence of multiple exogenous factors and random individual effects. The default setting flexibly models the transition probabilities using Dirichlet mixtures and the duration times using gamma mixtures. It also provides the flexibility of modeling the categorical sequences using Bayesian Markov mixed models alone, either ignoring the duration times altogether or dividing duration time into multiples of an additional category in the sequence by a user-specified unit. The package allows extensive inference on the state transition probabilities and the duration times, as well as relevant plots and graphs. It also includes a synthetic data set to demonstrate the desired format of the input data and the utility of the various functions. Methods for Bayesian Markov renewal mixed models are as described in: Abhra Sarkar et al. (2018) <doi:10.1080/01621459.2018.1423986> and Yutong Wu et al. (2022) <doi:10.1093/biostatistics/kxac050>.
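As a minimal sketch of the quantity being modelled, the empirical state-transition probabilities of a single categorical sequence can be tabulated as follows (the package replaces this raw estimate with Dirichlet-mixture modelling under covariates and random effects):

    set.seed(8)
    s <- sample(c("A", "B", "C"), 200, replace = TRUE)  # toy state sequence
    trans <- table(head(s, -1), tail(s, -1))            # transition counts
    prop.table(trans, margin = 1)                       # P(next state | current)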
We solve nonlinear least squares problems with optional equality and/or inequality constraints. Nonlinear iterations are globalized with a back-tracking method. Linear problems are solved by dense QR decomposition from LAPACK, which can limit the size of treatable problems. On the other hand, we avoid the condition number degradation which happens in the classical quadratic programming approach. Inequality constraint treatment on each nonlinear iteration is based on the NNLS method (by Lawson and Hanson). We provide an original function lsi_ln for solving linear least squares problems with inequality constraints in the least-norm sense. Thus if the Jacobian of the problem is rank deficient, a solution can still be provided; however, truncation errors are probable in this case. Equality constraints are treated by using a basis of the null space. A user-defined function calculating residuals must return a list containing the residual vector (not its squared sum) and the Jacobian. If the Jacobian is not in the returned list, the package numDeriv is used to calculate a finite-difference version of the Jacobian. The NLSIC method was first published in Sokol et al. (2012) <doi:10.1093/bioinformatics/btr716>.
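A hedged sketch of a user-defined residual function in the format described above, for the toy model y = a * exp(-b * t); the argument order and the list field names res and jacobian are assumptions for illustration:

    resfun <- function(par, t, y) {          # par = c(a, b); field names assumed
      f <- par[1] * exp(-par[2] * t)
      list(res = f - y,
           jacobian = cbind(exp(-par[2] * t),                 # d res / d a
                            -par[1] * t * exp(-par[2] * t)))  # d res / d b
    }
    t <- seq(0, 5, 0.5)
    y <- 2 * exp(-0.7 * t) + rnorm(length(t), sd = 0.05)
    str(resfun(c(1, 1), t, y))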
This package provides a framework for specifying spatially, temporally, and spatially-and-temporally varying coefficient models using Generalized Additive Models with smooths. The smooths are parameterised with location, time and predictor variables. The framework supports the investigation of the presence and nature of any space-time dependencies in the data by evaluating multiple model forms (specifications) using a Generalized Cross-Validation score. The workflow sequence is to: i) prepare the data by lengthening it to have single location and time variables for each observation; ii) evaluate all possible spatial and/or temporal models in which each predictor is specified in different ways; iii) evaluate each model and pick the best one; iv) create the final model; v) calculate the varying coefficient estimates to quantify how the relationships between the target and predictor variables vary over space, time or space-time; vi) create maps, time series plots, etc. For more details see: Comber et al (2023) <doi:10.4230/LIPIcs.GIScience.2023.22>, Comber et al (2024) <doi:10.1080/13658816.2023.2270285> and Comber et al (2024) <doi:10.3390/ijgi13120459>.
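For intuition, a hedged mgcv sketch of a spatially varying coefficient of the kind this framework evaluates, on simulated data; the package adds the model-comparison workflow around such smooths:

    library(mgcv)
    set.seed(9)
    n <- 500; lon <- runif(n); lat <- runif(n); x <- rnorm(n)
    beta <- 2 * lon - lat                     # true spatially varying slope
    y <- 1 + beta * x + rnorm(n, sd = 0.3)
    fit <- gam(y ~ s(lon, lat, by = x))       # varying-coefficient smooth
    summary(fit)$s.table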
Compiles functions to trim, bin, visualise, and analyse activity/sleep time-series data collected from the Drosophila Activity Monitor (DAM) system (Trikinetics, USA). The following methods were used to compute periodograms - Chi-square periodogram: Sokolove and Bushell (1978) <doi:10.1016/0022-5193(78)90022-X>, Lomb-Scargle periodogram: Lomb (1976) <doi:10.1007/BF00648343>, Scargle (1982) <doi:10.1086/160554> and Ruf (1999) <doi:10.1076/brhm.30.2.178.1422>, and Autocorrelation: Eijzenbach et al. (1986) <doi:10.1111/j.1440-1681.1986.tb00943.x>. Identification of activity peaks is done after using a Savitzky-Golay filter (Savitzky and Golay (1964) <doi:10.1021/ac60214a047>) to smooth raw activity data. Three methods to estimate anticipation of activity are used based on the following papers - Slope method: Fernandez et al. (2020) <doi:10.1016/j.cub.2020.04.025>, Harrisingh method: Harrisingh et al. (2007) <doi:10.1523/JNEUROSCI.3680-07.2007>, and Stoleru method: Stoleru et al. (2004) <doi:10.1038/nature02926>. Rose plots and circular analysis are based on methods from - Batschelet (1981) <ISBN:0120810506> and Zar (2010) <ISBN:0321656865>.
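As a simplified illustration of the autocorrelation approach to periodicity (not the package's implementation), a 24-hour period can be recovered from simulated activity counts binned every 30 minutes:

    set.seed(10)
    t <- seq(0, 24 * 7, by = 0.5)                       # one week, 30-min bins
    act <- rpois(length(t), 5 + 4 * sin(2 * pi * t / 24))
    a <- acf(act, lag.max = 60, plot = FALSE)
    0.5 * which.max(a$acf[-1])                          # peak lag in hours, ~24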
Hierarchical continuous (and discrete) time state space modelling, for linear and nonlinear systems measured by continuous variables, with limited support for binary data. The subject-specific dynamic system is modelled as a stochastic differential equation (SDE) or difference equation, and measurement models are typically multivariate normal factor models. Linear mixed-effects SDEs estimated via maximum likelihood and optimization are the default. Nonlinearities (state-dependent parameters) and random effects on all parameters are possible, using either maximum likelihood / maximum a posteriori optimization (with optional importance sampling) or Stan's Hamiltonian Monte Carlo sampling. See <https://github.com/cdriveraus/ctsem/raw/master/vignettes/hierarchicalmanual.pdf> for details, and <https://osf.io/preprints/psyarxiv/4q9ex_v2> for a detailed tutorial. Priors may be used. For a conceptual overview of the hierarchical Bayesian linear SDE approach, see <https://www.researchgate.net/publication/324093594_Hierarchical_Bayesian_Continuous_Time_Dynamic_Modeling>. Exogenous inputs may also be included; for an overview of such possibilities, see <https://www.researchgate.net/publication/328221807_Understanding_the_Time_Course_of_Interventions_with_Continuous_Time_Dynamic_Models>. <https://cdriver.netlify.app/> contains some tutorial blog posts.
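For intuition, a minimal sketch of the continuous-time building block: the exact discretization of a univariate Ornstein-Uhlenbeck SDE, dx = a*x dt + s dW, with hypothetical parameter values:

    a <- -0.5; s <- 1; dt <- 0.1; n <- 500
    x <- numeric(n)
    for (t in 2:n)
      x[t] <- x[t - 1] * exp(a * dt) +
              rnorm(1, sd = sqrt(s^2 / (-2 * a) * (1 - exp(2 * a * dt))))
    plot(x, type = "l")     # mean-reverting trajectory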
Obtains lists of files of remote sensing collections for Southern Ocean surface properties. Commonly used data sources of sea surface temperature, sea ice concentration, and altimetry products such as sea surface height and sea surface currents are cached in object storage at the Pawsey Supercomputing Research Centre facility. Patterns of working to retrieve data from these object storage catalogues are described. The catalogues include complete collections of datasets: Reynolds et al. (2008) "NOAA Optimum Interpolation Sea Surface Temperature (OISST) Analysis, Version 2.1" <doi:10.7289/V5SQ8XB5>, and Spreen et al. (2008) "Artist Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) sea ice concentration" <doi:10.1029/2005JC003384>. Future releases will add helpers to identify particular data collections and target specific dates of earth observation data for reading, as well as helpers to retrieve data set citation and provenance details. This work was supported by resources provided by the Pawsey Supercomputing Research Centre with funding from the Australian Government and the Government of Western Australia. This software was developed by the Integrated Digital East Antarctica program of the Australian Antarctic Division.
PBIB designs are an important class of incomplete block designs with wide application, for example in agricultural experiments, plant breeding, and sample surveys. This package constructs various series of PBIB designs and assists in checking all the necessary conditions of PBIB designs and of the association scheme on which these designs are based. It also assists in calculating the efficiencies of PBIB designs with any number of associate classes. The package also constructs Youden-m square designs, which are row-column designs for the two-way elimination of heterogeneity; the incomplete columns of these Youden-m square designs constitute PBIB designs. With the present functionality, the package will help researchers to construct PBIB designs, to check whether their PBIB designs and association schemes satisfy the various necessary conditions for existence, to calculate the efficiencies of PBIB designs based on any association scheme, and to construct Youden-m square designs for the two-way elimination of heterogeneity. R. C. Bose and K. R. Nair (1939) <http://www.jstor.org/stable/40383923>.
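For intuition, a base-R sketch of the efficiency calculation for the fully balanced special case (a BIBD with v = 4, b = 6, k = 2, r = 3), via the canonical efficiency factors of the information matrix; PBIB designs generalize this computation to more associate classes:

    blocks <- combn(4, 2)                      # all pairs of 4 treatments
    N <- matrix(0, 4, 6)
    for (j in 1:6) N[blocks[, j], j] <- 1      # treatment x block incidence
    r <- rowSums(N); k <- 2
    Cmat <- diag(r) - N %*% t(N) / k           # information (C) matrix
    ev <- eigen(Cmat, symmetric = TRUE)$values
    eff <- ev[ev > 1e-8] / r[1]                # canonical efficiency factors
    1 / mean(1 / eff)                          # harmonic-mean efficiency = 2/3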
Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>.
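As a hedged toy sketch of the underlying idea (not the scITD pipeline), the mode factors of a donors x genes x cell-types tensor can be obtained from SVDs of its unfoldings, the HOSVD step of a Tucker decomposition:

    set.seed(11)
    A <- array(rnorm(10 * 8 * 4), c(10, 8, 4))     # toy donors x genes x cell types
    unfold <- function(A, m) t(apply(A, m, c))     # mode-m unfolding
    donor_scores    <- svd(unfold(A, 1))$u[, 1:2]  # donor factor (mode 1)
    gene_loadings   <- svd(unfold(A, 2))$u[, 1:2]  # gene factor (mode 2)
    celltype_factor <- svd(unfold(A, 3))$u[, 1:2]  # cell-type factor (mode 3)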
Additional nonlinear regression functions using self-start (SS) algorithms. One of the functions is the Beta growth function proposed by Yin et al. (2003) <doi:10.1093/aob/mcg029>. There are several other functions with breakpoints (e.g. linear-plateau, plateau-linear, exponential-plateau, plateau-exponential, quadratic-plateau, plateau-quadratic and bilinear), a non-rectangular hyperbola and a bell-shaped curve. There are twenty-eight (28) new self-start (SS) functions in total. This package also supports the publication "Nonlinear Regression Models and Applications in Agricultural Research" by Archontoulis and Miguez (2015) <doi:10.2134/agronj2012.0506>, a book chapter with similar material <doi:10.2134/appliedstatistics.2016.0003.c15> and a publication by Oddi et al. (2019) in Ecology and Evolution <doi:10.1002/ece3.5543>. The function nlsLMList uses nlsLM for fitting but is otherwise almost identical to nlme::nlsList. In addition, this release of the package provides functions for conducting simulations for nlme and gnls objects, as well as bootstrapping. These functions are intended to work with the modeling framework of the nlme package. It also provides four vignettes with extended examples.
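For illustration of the self-start mechanism these functions extend, base R's stats::SSlogis supplies its own starting values to nls(), so no manual initial guesses are needed (simulated data):

    set.seed(12)
    t <- 1:30
    y <- 100 / (1 + exp((15 - t) / 3)) + rnorm(30, sd = 2)
    fit <- nls(y ~ SSlogis(t, Asym, xmid, scal))   # self-starting logistic
    coef(fit)                                      # near Asym 100, xmid 15, scal 3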
An implementation of popular screening methods that are commonly employed for ultra-high and high dimensional data. Through this publicly available package, we provide a unified framework to carry out model-free screening procedures including SIS (Fan and Lv (2008) <doi:10.1111/j.1467-9868.2008.00674.x>), SIRS (Zhu et al. (2011) <doi:10.1198/jasa.2011.tm10563>), DC-SIS (Li et al. (2012) <doi:10.1080/01621459.2012.695654>), MDC-SIS (Shao and Zhang (2014) <doi:10.1080/01621459.2014.887012>), Bcor-SIS (Pan et al. (2019) <doi:10.1080/01621459.2018.1462709>), PC-Screen (Liu et al. (2020) <doi:10.1080/01621459.2020.1783274>), WLS (Zhong et al. (2021) <doi:10.1080/01621459.2021.1918554>), Kfilter (Mai and Zou (2015) <doi:10.1214/14-AOS1303>), MVSIS (Cui et al. (2015) <doi:10.1080/01621459.2014.920256>), PSIS (Pan et al. (2016) <doi:10.1080/01621459.2014.998760>), CAS (Xie et al. (2020) <doi:10.1080/01621459.2019.1573734>), CI-SIS (Cheng and Wang (2023) <doi:10.1016/j.cmpb.2022.107269>) and CSIS (Cheng et al. (2023) <doi:10.1007/s00180-023-01399-5>).
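As a minimal sketch of the simplest of these procedures, SIS (Fan and Lv, 2008) ranks predictors by absolute marginal correlation and keeps the top d = n / log(n) of them (simulated data):

    set.seed(13)
    n <- 100; p <- 1000
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] - 2 * X[, 2] + rnorm(n)          # only variables 1 and 2 matter
    w <- abs(cor(X, y))                          # marginal screening statistic
    keep <- order(w, decreasing = TRUE)[1:floor(n / log(n))]
    head(sort(keep))                             # should include 1 and 2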
The Genetic Algorithm (GA) is an optimization method from the family of Evolutionary Algorithms. It uses biologically inspired operators such as mutation, crossover, selection and replacement. Because of their global search ability and robustness, GAs have been widely utilized in machine learning, expert systems, data science, engineering, the life sciences and many other areas of research and business. However, regular GAs need techniques to improve their efficiency in computing time and their performance in finding the global optimum, using adaptation and hybridization strategies. Adaptive GAs (AGA) increase the convergence speed and success of regular GAs by setting the crossover and mutation probabilities dynamically. Hybrid GAs combine the exploration strength of stochastic GAs with the convergence ability of local search algorithms such as simulated annealing, in addition to other nature-inspired algorithms such as ant colony optimization, particle swarm optimization, etc. The package adana provides a rich working environment whose many functions make it possible to build and run regular GA, adaptive GA, hybrid GA and hybrid adaptive GA for many kinds of optimization problems. Cebeci, Z. (2021, ISBN: 9786254397448).
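As a minimal base-R sketch of a regular GA with selection, crossover, mutation and replacement (a toy real-coded GA maximizing f(x) = sin(x) + sin(2x) on [0, 10]; the package's functions generalize each operator):

    set.seed(14)
    f <- function(x) sin(x) + sin(2 * x)
    pop <- runif(40, 0, 10)                                        # initial population
    for (gen in 1:100) {
      fit <- f(pop)
      parents <- pop[sample(40, 40, TRUE, fit - min(fit) + 1e-6)]  # selection
      mates <- matrix(parents, ncol = 2)
      a <- runif(20)
      kids <- c(a * mates[, 1] + (1 - a) * mates[, 2],             # crossover
                (1 - a) * mates[, 1] + a * mates[, 2])
      mut <- runif(40) < 0.1                                       # mutation
      kids[mut] <- pmin(pmax(kids[mut] + rnorm(sum(mut), 0, 0.5), 0), 10)
      pop <- kids                                                  # replacement
    }
    pop[which.max(f(pop))]    # near a global maximizer (~0.95 or ~7.23)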