This package implements methods for variable selection in linear regression based on the "Sum of Single Effects" (SuSiE) model, as described in Wang et al. (2020) <DOI:10.1101/501114> and Zou et al. (2021) <DOI:10.1101/2021.11.03.467167>. These methods provide simple summaries, called "Credible Sets", for accurately quantifying uncertainty in which variables should be selected. The methods are motivated by genetic fine-mapping applications and are particularly well suited to settings where variables are highly correlated and detectable effects are sparse. The fitting algorithm, a Bayesian analogue of stepwise selection methods called "Iterative Bayesian Stepwise Selection" (IBSS), is simple and fast, allowing the SuSiE model to be fit to large data sets (thousands of samples and hundreds of thousands of variables).
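As a concrete illustration, here is a minimal sketch assuming the susieR package's documented interface, where susie() fits the model and susie_get_cs() extracts the credible sets (toy simulated data):

```r
# Minimal sketch, assuming susieR's susie() and susie_get_cs() interface.
library(susieR)

set.seed(1)
n <- 500; p <- 1000
X <- matrix(rnorm(n * p), n, p)            # many more variables than samples
beta <- rep(0, p); beta[c(10, 300)] <- 1   # sparse true effects
y <- drop(X %*% beta) + rnorm(n)

fit <- susie(X, y, L = 10)                 # L = max number of single effects
susie_get_cs(fit, X = X)                   # credible sets for variable selection
```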
This package creates optimal (D, U and I) designs for accelerated life testing with right censoring or interval censoring. It uses a generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of the regression coefficients. The failure time distribution is assumed to follow a Weibull distribution with a known shape parameter, and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve two or fewer. The ALTopt package also provides several plotting functions, including contour plots, Fraction of Use Space (FUS) plots and Variance Dispersion graphs of Use Space (VDUS) plots. For more details, see Seo and Pan (2015) <doi:10.32614/RJ-2015-029>.
This package provides a statistical framework and computational procedure for identifying the sub-populations within a tumor, determining the mutation profiles of each subpopulation, and inferring the tumor's phylogenetic history. The inputs are variant allele frequencies (VAFs) of somatic single nucleotide alterations (SNAs), along with allele-specific coverage ratios between the tumor and matched normal sample for somatic copy number alterations (CNAs). These quantities can be taken directly from the output of existing software. Canopy provides a general mathematical framework for pooling data across samples and sites to infer the underlying parameters. For SNAs that fall within CNA regions, Canopy infers their temporal ordering and resolves their phase. When there are multiple evolutionary configurations consistent with the data, Canopy outputs all configurations along with their confidence assessments.
Analysis of repeated measurements and time-to-event data via random effects joint models. Fits the joint models proposed by Henderson and colleagues <doi:10.1093/biostatistics/1.4.465> (single event time) and by Williamson and colleagues (2008) <doi:10.1002/sim.3451> (competing-risks event times) to a single continuous repeated measure. The time-to-event data are modelled using a (cause-specific) Cox proportional hazards regression model with time-varying covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by a latent Gaussian process. The model is estimated using an Expectation-Maximization algorithm. Some plotting functions and the variogram are also included. This project is funded by the Medical Research Council (Grant numbers G0400615 and MR/M013227/1).
This package provides a toolkit for simulation studies concerning time-to-event endpoints with non-proportional hazards. SimNPH encompasses functions for simulating time-to-event data in various scenarios and for simulating different trial designs, such as fixed follow-up, event-driven, and group-sequential designs. The package provides functions to calculate the true values of common summary statistics for the implemented scenarios and offers common analysis methods for time-to-event data. Helper functions for running simulations with the SimDesign package and for aggregating and presenting the results are also included. Results of the conducted simulation study are available in the paper: "A Comparison of Statistical Methods for Time-To-Event Analyses in Randomized Controlled Trials Under Non-Proportional Hazards", Klinglmüller et al. (2025) <doi:10.1002/sim.70019>.
Large panel data sets are often subject to common trends. However, it can be difficult to determine the exact number of these common factors and analyse their properties. The package implements the Barigozzi and Trapani (2022) <doi:10.1080/07350015.2021.1901719> test, which not only provides an efficient way of estimating the number of common factors in large nonstationary panel data sets, but also gives further insights into factor classes. The routine identifies (i) whether a factor subject to a linear trend exists, (ii) the number of zero-mean I(1) factors and (iii) the number of zero-mean I(0) factors. Furthermore, the package includes the Integrated Panel Criteria by Bai (2004) <doi:10.1016/j.jeconom.2003.10.022>, which provide a complementary measure of the number of factors.
This package performs copy number variant association analysis with Lasso and Weighted Fusion penalized regression. It creates a "CNV profile curve" to represent an individual's CNV events across a genomic region, so as to capture variations in CNV length and dosage. When evaluating association, the CNV profile curve is used directly as a predictor in the regression model, avoiding the need to predefine CNV loci. CNV profile regression estimates CNV effects at each genome position, making the results comparable across different studies. The penalization encourages sparsity in variable selection with a Lasso penalty and encourages effect smoothness between consecutive CNV events with a weighted fusion penalty, where the weight controls the level of smoothing between adjacent CNVs. For more details, see Si (2024) <doi:10.1101/2024.11.23.624994>.
Package for the analysis of simple experimental designs (CRD, RBD and LSD), experiments in double factorial schemes (in CRD and RBD), experiments in split-plot-in-time schemes (in CRD and RBD), experiments in double factorial schemes with an additional treatment (in CRD and RBD), experiments in triple factorial schemes (in CRD and RBD) and experiments in triple factorial schemes with an additional treatment (in CRD and RBD). It performs the analysis of variance and compares means either by fitting regression models up to the third power (for quantitative treatments) or by a multiple comparison test - Tukey test, Student-Newman-Keuls (SNK) test, Scott-Knott, Duncan test, t test (LSD) or Bonferroni t test (protected LSD) - for qualitative treatments; residual analysis is also provided (Ferreira, Cavalcanti and Nogueira, 2014) <doi:10.4236/am.2014.519280>.
This package traverses graphs, using DFS and BFS to obtain the path from a node to each leaf node. Depth-first search (DFS) is a recursive algorithm for visiting all the vertices of a graph or tree data structure; traversal means visiting all the nodes of a graph. Breadth-first search (BFS) is used to search a tree or graph data structure for a node that meets a set of criteria: it starts at the tree's root (or some node of a graph) and visits all nodes at the current depth level before moving on to the nodes at the next depth level. The package also provides the reachability matrix between the nodes. The implementation follows Baruch Awerbuch (1985) <doi:10.1016/0020-0190(85)90083-3>.
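Both traversals can be sketched in a few lines of base R. The generic adjacency-list version below is illustrative only and does not use this package's API:

```r
# Illustrative BFS/DFS over an adjacency list; not this package's API.
graph <- list(a = c("b", "c"), b = "d", c = "d", d = character(0))

bfs <- function(g, start) {
  queue <- start; visited <- character(0)
  while (length(queue) > 0) {
    node <- queue[1]; queue <- queue[-1]          # dequeue (FIFO)
    if (node %in% visited) next
    visited <- c(visited, node)
    queue <- c(queue, g[[node]])                  # enqueue neighbours
  }
  visited
}

dfs <- function(g, node, visited = character(0)) {
  if (node %in% visited) return(visited)
  visited <- c(visited, node)
  for (nb in g[[node]]) visited <- dfs(g, nb, visited)  # recurse deeper first
  visited
}

bfs(graph, "a")  # "a" "b" "c" "d" - level by level
dfs(graph, "a")  # "a" "b" "d" "c" - deepest path first
```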
Easy implementation of the MABAC multi-criteria decision method, introduced by Pamučar and Ćirović in "The selection of transport and handling resources in logistics centers using Multi-Attributive Border Approximation area Comparison (MABAC)" <doi:10.1016/j.eswa.2014.11.057>, which aimed to choose equipment for logistics centers. This package receives data, preferably from a spreadsheet, reads it and applies the mathematical algorithms inherent to the MABAC method to generate a ranking with the optimal solution according to the established criteria, weights and criteria types. The data are normalized and weighted, the border approximation area is determined, the distances to this border area are calculated, and finally a ranking identifying the optimal option is generated.
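Those steps can be written out directly. The following sketch is an illustrative implementation of the textbook MABAC procedure with made-up data, not this package's interface:

```r
# Illustrative MABAC: toy decision matrix (3 alternatives x 3 criteria).
x <- matrix(c(22600, 3800, 2,
              19500, 4200, 3,
              21700, 4000, 1), nrow = 3, byrow = TRUE)
w    <- c(0.5, 0.3, 0.2)           # criteria weights
type <- c("max", "min", "max")     # benefit ("max") vs. cost ("min") criteria

# 1. Min-max normalization, with direction depending on the criterion type
n <- sapply(seq_len(ncol(x)), function(j) {
  r <- range(x[, j])
  if (type[j] == "max") (x[, j] - r[1]) / diff(r) else (r[2] - x[, j]) / diff(r)
})

v <- sweep(n + 1, 2, w, `*`)                             # 2. weighted matrix
g <- apply(v, 2, function(col) prod(col)^(1 / nrow(v)))  # 3. border approximation area
q <- sweep(v, 2, g, `-`)                                 # 4. distances to the border area
rank(-rowSums(q))                                        # 5. ranking (1 = best)
```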
The network structural equation modeling conducts a network statistical analysis on a data frame of coincident observations of multiple continuous variables [1]. It builds a pathway model by exploring a pool of domain-knowledge-guided candidate statistical relationships between each of the variable pairs, selecting the best fit on the basis of a specific criterion, such as the adjusted R-squared value. This material is based upon work supported by the U.S. National Science Foundation Awards EEC-2052776 and EEC-2052662 for the MDS-Rely IUCRC Center, under NSF Solicitation NSF 20-570, Industry-University Cooperative Research Centers Program. [1] Bruckman, Laura S., Nicholas R. Wheeler, Junheng Ma, Ethan Wang, Carl K. Wang, Ivan Chou, Jiayang Sun, and Roger H. French. (2013) <doi:10.1109/ACCESS.2013.2267611>.
This package provides a set of basic tools for generating, analyzing, summarizing and visualizing finite partially ordered sets. In particular, it implements flexible and very efficient algorithms for the extraction of linear extensions and for the computation of mutual ranking probabilities and other user-defined functionals over them. The package is meant as a computationally efficient "engine" for the implementation of data analysis procedures on systems of multidimensional ordinal indicators and partially ordered data, in the spirit of Fattore, M. (2016) "Partially ordered sets and the measurement of multidimensional ordinal deprivation", Social Indicators Research <DOI:10.1007/s11205-015-1059-6>, and Fattore, M. and Arcagni, A. (2018) "A reduced posetic approach to the measurement of multidimensional ordinal deprivation", Social Indicators Research <DOI:10.1007/s11205-016-1501-4>.
Single-index mixture cure models allow estimating the probability of cure and the latency depending on a vector (or functional) covariate, avoiding the curse of dimensionality. The vector of parameters that defines the model can be estimated by maximum likelihood. A nonparametric estimator for the conditional density of the susceptible population is provided. For more details, see Piñeiro-Lamas (2024) <https://ruc.udc.es/dspace/handle/2183/37035>. Funding: this work, integrated into the framework of PERTE for Vanguard Health, has been co-financed by the Spanish Ministry of Science, Innovation and Universities with funds from the European Union NextGenerationEU, from the Recovery, Transformation and Resilience Plan (PRTR-C17.I1) and from the Autonomous Community of Galicia within the framework of the Biotechnology Plan Applied to Health.
Implementation of evolutionary fuzzy systems for the data mining task called "subgroup discovery". In particular, the algorithms presented in this package are: M. J. del Jesus, P. Gonzalez, F. Herrera, M. Mesonero (2007) <doi:10.1109/TFUZZ.2006.890662>; M. J. del Jesus, P. Gonzalez, F. Herrera (2007) <doi:10.1109/MCDM.2007.369416>; C. J. Carmona, P. Gonzalez, M. J. del Jesus, F. Herrera (2010) <doi:10.1109/TFUZZ.2010.2060200>; and C. J. Carmona, V. Ruiz-Rodado, M. J. del Jesus, A. Weber, M. Grootveld, P. González, D. Elizondo (2015) <doi:10.1016/j.ins.2014.11.030>. It also provides a Shiny app to ease the analysis. The algorithms work with data sets provided in KEEL, ARFF and CSV formats, and also with data.frame objects.
How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times, so how do we know which differences are meaningful? Enter the weighted log odds, which tidylo implements using tidy data principles. In particular, here we use the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.
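A minimal usage sketch, assuming tidylo's exported bind_log_odds() function and toy word counts; it appends a log_odds_weighted column to the tallied data frame:

```r
library(dplyr)
library(tidylo)

# Toy per-document word counts
word_counts <- tibble::tribble(
  ~document, ~word,   ~n,
  "A",       "apple", 10,
  "A",       "pear",   2,
  "B",       "apple",  3,
  "B",       "pear",   9
)

word_counts %>%
  bind_log_odds(set = document, feature = word, n = n) %>%
  arrange(desc(log_odds_weighted))   # most distinctive word-document pairs first
```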
This is the human disease ontology R package HDO.db, which provides the semantic relationships between human diseases. Relying on the DOSE and GOSemSim packages, this package can carry out disease enrichment and semantic similarity analyses. Many biological studies are achieved through mouse models, and a large number of data indicate the association between genotypes and phenotypes or diseases. The study of model organisms can be transformed into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Organism-specific genotype-phenotype associations can be applied to cross-species phenotypic studies to clarify previously unknown phenotypic connections in other species. Applying the same principle to diseases can identify genetic associations and even help to identify disease associations that are not obvious.
This package implements several string comparison algorithms, including calACS (count all common subsequences), lenACS (calculate the lengths of all common subsequences), and lenLCS (calculate the length of the longest common subsequence). Some algorithms distinguish the stricter definition of subsequence, where a common subsequence cannot be separated by any other items, from its looser counterpart, where a common subsequence can be interrupted by other items. This difference is indicated by the suffix of the algorithm name (-Strict vs. -Loose), as in calACSLoose. For example, q-w is a common subsequence of q-w-e-r and q-e-w-r under the looser definition, but not under the stricter one. Algorithm from Wang, H., "All common subsequences" (2007), IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640.
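For intuition, the length of the longest common subsequence under the looser definition can be computed with a standard dynamic program; this sketch is illustrative and not necessarily the package's internal implementation:

```r
# Classic LCS-length dynamic program over "-"-separated items (loose definition).
lcs_len <- function(a, b) {
  a <- strsplit(a, "-")[[1]]; b <- strsplit(b, "-")[[1]]
  d <- matrix(0L, length(a) + 1, length(b) + 1)
  for (i in seq_along(a)) for (j in seq_along(b)) {
    d[i + 1, j + 1] <- if (a[i] == b[j]) d[i, j] + 1L
                       else max(d[i, j + 1], d[i + 1, j])
  }
  d[length(a) + 1, length(b) + 1]
}

lcs_len("q-w-e-r", "q-e-w-r")  # 3, e.g. q-w-r: interruptions by other items are allowed
```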
Implementation of adaptive assessment procedures based on the Knowledge Space Theory (KST, Doignon & Falmagne, 1999 <ISBN:9783540645016>) and Formal Psychological Assessment (FPA, Spoto, Stefanutti & Vidotto, 2010 <doi:10.3758/BRM.42.1.342>) frameworks. An adaptive assessment is a type of evaluation that adjusts the difficulty and nature of subsequent questions based on the test taker's responses to previous ones. The package contains functions to perform and simulate an adaptive assessment. Moreover, it is integrated with two Shiny interfaces, making it both accessible and user-friendly. The package has been partially funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project "RAISE - Robotics and AI for Socio-economic Empowerment" (ECS00000035).
It is often useful to produce short, quasi-unique identifiers (SQUIDs) without the benefit of a central authority to prevent duplication. Although Universally Unique Identifiers (UUIDs) provide for this, they are also unwieldy; for example, the most used UUID, version 4, is 36 characters long. SQUIDs are short (8 characters) at the expense of having more collisions, which can be mitigated by combining them with human-produced suffixes, yielding relatively brief, half human-readable, almost-unique identifiers (see, for example, the identifiers used for Decentralized Construct Taxonomies; Peters & Crutzen, 2024 <doi:10.15626/MP.2022.3638>). A SQUID is the number of centiseconds elapsed since the beginning of 1970, converted to a base 30 system. This package contains functions to produce SQUIDs as well as to convert them back into dates and times.
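The conversion itself is a short base-change loop. The sketch below follows the description above; the 30-character digit alphabet is an assumption for illustration and may differ from the one the package actually uses:

```r
# Illustrative SQUID-style encoding: centiseconds since 1970 in base 30.
digits <- strsplit("0123456789abcdefghijklmnopqrst", "")[[1]]  # assumed alphabet

to_squid <- function(time = Sys.time()) {
  cs <- floor(as.numeric(time) * 100)      # centiseconds since the epoch
  out <- character(0)
  while (cs > 0) {
    out <- c(digits[cs %% 30 + 1], out)    # prepend least significant digit
    cs <- cs %/% 30
  }
  paste(out, collapse = "")
}

to_squid()  # currently 8 characters, since 30^7 < centiseconds-since-1970 < 30^8
```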
This package implements a spatiotemporal boundary detection model with a dissimilarity metric for areal data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). The response variable can be modeled as Gaussian (no nugget), probit or Tobit link, and spatial correlation is introduced at each time point through a conditional autoregressive (CAR) prior. Temporal correlation is introduced through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette and in "Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method", by Berchuck et al. (2018) <arXiv:1805.11636>. The paper is in press at the Journal of the American Statistical Association.
This package provides a collection of methods for both rank-based estimates and least-squares estimates of the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan weight and general weights such as the log-rank weight. Details of the rank-based estimation can be found in Chiou et al. (2014) <doi:10.1007/s11222-013-9388-2> and Chiou et al. (2015) <doi:10.1002/sim.6415>. For least-squares estimation, the estimating equation is solved with generalized estimating equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting. Details on the least-squares estimation can be found in Chiou et al. (2014) <doi:10.1007/s10985-014-9292-x>.
This is a cross-platform linear-model-to-SQL compiler. It generates SQL from linear and generalized linear models. Its interface consists of a single function, modelc(), which takes the output of the lm() or glm() functions (or any object with the same signature) and outputs a SQL character vector representing the predictions on the scale of the response variable, as described in Dunn & Smyth (2018) <doi:10.1007/978-1-4419-0118-7> and originating in Nelder & Wedderburn (1972) <doi:10.2307/2344614>. The resultant SQL can be included in a SELECT statement and returns output similar to that of the predict.glm() or predict.lm() predictions, assuming numeric types are represented in the database with sufficient precision. Currently the log and identity link functions are supported.
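A usage sketch based on the interface described above (the table and column names in the SELECT statement are invented for illustration):

```r
# Fit a model in R, then compile it to SQL with modelc().
model <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian(link = "identity"))

sql <- modelc::modelc(model)                       # SQL expression for the predictions
cat("SELECT", sql, "AS predicted_mpg FROM cars;")  # embed it in a SELECT statement
```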
This package performs causal mediation analysis for count and zero-inflated count data, without or with a post-treatment confounder; calculates power to detect prespecified causal mediation effects, direct effects, and total effects; and performs sensitivity analysis when there is a treatment-induced mediator-outcome confounder, as described by Cheng, J., Cheng, N.F., Guo, Z., Gregorich, S., Ismail, A.I., Gansky, S.A. (2018) <doi:10.1177/0962280216686131>. It implements the Instrumental Variable (IV) method to estimate the controlled (natural) direct and mediation effects and to compute bootstrap confidence intervals, as described by Guo, Z., Small, D.S., Gansky, S.A., Cheng, J. (2018) <doi:10.1111/rssc.12233>. This software was made possible by Grant R03DE028410 from the National Institute of Dental and Craniofacial Research, a component of the National Institutes of Health.
Stop-signal task data for go and stop trials are generated per participant. The simulation process is based on the generally non-independent horse race model with either a fixed stop-signal delay or the tracking method. Each of the go and stop processes is assumed to follow an exponentially modified Gaussian (ExG) or shifted Wald (SW) distribution. The output data can be converted to BEESTS software input data, enabling researchers to test and evaluate various brain stopping processes manifested by the ExG or SW distributional parameters of interest. Methods are described in: Soltanifar M (2020) <https://hdl.handle.net/1807/101208>; Matzke D, Love J, Wiecki TV, Brown SD, Logan GD and Wagenmakers E-J (2013) <doi:10.3389/fpsyg.2013.00918>; Logan GD, Van Zandt T, Verbruggen F, Wagenmakers EJ (2014) <doi:10.1037/a0035230>.
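For intuition, an ExG finishing time is simply a Gaussian plus an independent exponential component. The sketch below simulates a fixed-delay race under the simplifying assumption of independent go and stop processes; the parameter values are arbitrary examples, and this is not the package's interface:

```r
# Illustrative ExG horse race on stop trials (independent-race simplification).
set.seed(1)
n <- 10000
go_rt   <- rnorm(n, mean = 440, sd = 60) + rexp(n, rate = 1 / 80)  # ExG go finish times (ms)
stop_rt <- rnorm(n, mean = 190, sd = 40) + rexp(n, rate = 1 / 40)  # ExG stop finish times (ms)

ssd <- 200                      # fixed stop-signal delay (ms)
mean(go_rt < ssd + stop_rt)     # estimated P(failed inhibition) on stop trials
```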