Single-index mixture cure models allow the probability of cure and the latency to be estimated as functions of a vector (or functional) covariate while avoiding the curse of dimensionality. The vector of parameters that defines the model can be estimated by maximum likelihood. A nonparametric estimator for the conditional density of the susceptible population is also provided. For more details, see Piñeiro-Lamas (2024) (<https://ruc.udc.es/dspace/handle/2183/37035>). Funding: This work, integrated into the framework of PERTE for Vanguard Health, has been co-financed by the Spanish Ministry of Science, Innovation and Universities with funds from the European Union NextGenerationEU, from the Recovery, Transformation and Resilience Plan (PRTR-C17.I1) and from the Autonomous Community of Galicia within the framework of the Biotechnology Plan Applied to Health.
Implementation of evolutionary fuzzy systems for the data mining task known as "subgroup discovery". In particular, the algorithms presented in this package are: M. J. del Jesus, P. Gonzalez, F. Herrera, M. Mesonero (2007) <doi:10.1109/TFUZZ.2006.890662>; M. J. del Jesus, P. Gonzalez, F. Herrera (2007) <doi:10.1109/MCDM.2007.369416>; C. J. Carmona, P. Gonzalez, M. J. del Jesus, F. Herrera (2010) <doi:10.1109/TFUZZ.2010.2060200>; C. J. Carmona, V. Ruiz-Rodado, M. J. del Jesus, A. Weber, M. Grootveld, P. González, D. Elizondo (2015) <doi:10.1016/j.ins.2014.11.030>. It also provides a Shiny app to ease the analysis. The algorithms work with data sets provided in KEEL, ARFF and CSV formats, as well as with data.frame objects.
How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times, so how do we know which differences are meaningful? Enter the weighted log odds, which tidylo provides an implementation of, using tidy data principles. In particular, here we use the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.
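A short usage sketch, assuming the package's bind_log_odds() interface; the data frame and its counts are made up for illustration:
  library(tidylo)
  # hypothetical counts of words per document
  word_counts <- data.frame(
    document = c("a", "a", "b", "b"),
    word     = c("apple", "pear", "apple", "plum"),
    n        = c(10, 3, 2, 8)
  )
  # adds a weighted log odds column, comparing word usage across documents
  bind_log_odds(word_counts, document, word, n)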
This package implements several string comparison algorithms, including calACS (count all common subsequences), lenACS (calculate the lengths of all common subsequences), and lenLCS (calculate the length of the longest common subsequence). Some algorithms differentiate the more strict definition of subsequence, where a common subsequence cannot be separated by any other items, from its looser counterpart, where a common subsequence can be interrupted by other items. This difference is indicated by the suffix of the algorithm (-Strict vs -Loose). For example, q-w is a common subsequence of q-w-e-r and q-e-w-r under the looser definition, but not under the more strict one. The calACSLoose algorithm is from Wang, H. (2007), All Common Subsequences, IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640.
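To make the two definitions concrete, here is a small base-R illustration (independent of this package's functions) applied to the q-e-w-r example above:
  is_loose_subseq <- function(sub, vec) {
    # loose: elements of sub appear in vec in order, gaps allowed
    pos <- 0
    for (s in sub) {
      if (pos >= length(vec)) return(FALSE)
      idx <- which(vec[(pos + 1):length(vec)] == s)
      if (length(idx) == 0) return(FALSE)
      pos <- pos + idx[1]
    }
    TRUE
  }
  is_strict_subseq <- function(sub, vec) {
    # strict: sub must appear as one contiguous run in vec
    n <- length(sub)
    any(vapply(seq_len(length(vec) - n + 1),
               function(i) all(vec[i:(i + n - 1)] == sub), logical(1)))
  }
  is_loose_subseq(c("q", "w"), c("q", "e", "w", "r"))   # TRUE
  is_strict_subseq(c("q", "w"), c("q", "e", "w", "r"))  # FALSE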
Implementation of adaptive assessment procedures based on Knowledge Space Theory (KST, Doignon & Falmagne, 1999 <ISBN:9783540645016>) and Formal Psychological Assessment (FPA, Spoto, Stefanutti & Vidotto, 2010 <doi:10.3758/BRM.42.1.342>) frameworks. An adaptive assessment is a type of evaluation that adjusts the difficulty and nature of subsequent questions based on the test taker's responses to previous ones. The package contains functions to perform and simulate an adaptive assessment. Moreover, it is integrated with two Shiny interfaces, making it both accessible and user-friendly. The package has been partially funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project "RAISE - Robotics and AI for Socio-economic Empowerment" (ECS00000035).
It is often useful to produce short, quasi-unique identifiers (SQUIDs) without the benefit of a central authority to prevent duplication. Although Universally Unique Identifiers (UUIDs) provide for this, they are also unwieldy; for example, the most used UUID, version 4, is 36 characters long. SQUIDs are short (8 characters) at the expense of having more collisions, which can be mitigated by combining them with human-produced suffixes, yielding relatively brief, half human-readable, almost-unique identifiers (see for example the identifiers used for Decentralized Construct Taxonomies; Peters & Crutzen, 2024 <doi:10.15626/MP.2022.3638>). A SQUID is the number of centiseconds elapsed since the beginning of 1970, converted to a base 30 representation. This package contains functions to produce SQUIDs as well as convert them back into dates and times.
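A minimal base-R sketch of that conversion (not the package's implementation; the 30-symbol digit set below is hypothetical):
  # centiseconds since 1970-01-01, written in base 30
  digits <- c(0:9, letters[1:20])          # hypothetical 30-symbol alphabet
  n <- floor(as.numeric(Sys.time()) * 100)
  squid <- ""
  while (n > 0) {
    squid <- paste0(digits[n %% 30 + 1], squid)
    n <- n %/% 30
  }
  squid   # about 8 characters for present-day timestamps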
Provides a workflow to jointly embed chromatin accessibility peaks and expressed genes into a shared low-dimensional space using paired single-cell ATAC-seq (scATAC-seq) and single-cell RNA-seq (scRNA-seq) data. It integrates regulatory relationships among peak-peak interactions (via 'Cicero'), peak-gene interactions (via Lasso, random forest, and XGBoost), and gene-gene interactions (via principal component regression). With the input of paired scATAC-seq and scRNA-seq data matrices, it assigns a low-dimensional feature vector to each gene and peak. Additionally, it supports the reconstruction of gene-gene networks with the low-dimensional projections (via epsilon-NN) and then the comparison of the networks of two conditions through manifold alignment as implemented in 'scTenifoldNet'. See <doi:10.1093/bioinformatics/btaf483> for more details.
Reconstructing gene regulatory networks and transcription factor activity is crucial to understanding biological processes and holds potential for developing personalized treatment. Yet it is still an open problem, as state-of-the-art algorithms are often unable to handle large amounts of data. Furthermore, many of the present methods predict numerous false positives and are unable to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. KBoost can also use a prior network built on previously known transcription factor targets. We have benchmarked KBoost on three different datasets against other high-performing algorithms. The results show that our method compares favourably to other methods across datasets.
This package provides a collection of methods for both rank-based estimation and least-squares estimation of the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan's weight and general weights such as the logrank weight. Details of the rank-based estimation can be found in Chiou et al. (2014) <doi:10.1007/s11222-013-9388-2> and Chiou et al. (2015) <doi:10.1002/sim.6415>. For the least-squares estimation, the estimating equation is solved with generalized estimating equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting. Details on the least-squares estimation can be found in Chiou et al. (2014) <doi:10.1007/s10985-014-9292-x>.
This is a cross-platform linear model to SQL compiler. It generates SQL from linear and generalized linear models. Its interface consists of a single function, modelc(), which takes the output of the lm() or glm() functions (or any object with the same signature) and outputs a SQL character vector representing the predictions on the scale of the response variable, as described in Dunn & Smyth (2018) <doi:10.1007/978-1-4419-0118-7> and originating in Nelder & Wedderburn (1972) <doi:10.2307/2344614>. The resulting SQL can be included in a SELECT statement and returns output similar to the predictions of predict.lm() or predict.glm(), assuming numeric types are represented in the database using sufficient precision. Currently the log and identity link functions are supported.
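A brief usage sketch based on the interface described above; the table name in the final SELECT is hypothetical, and the exact form of the generated SQL may differ:
  library(modelc)
  # fit a model with an identity link, then compile it to SQL
  fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian(link = "identity"))
  sql <- modelc(fit)                       # SQL expression for the model's predictions
  cat(paste0("SELECT ", sql, " AS prediction FROM cars;"))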
This package performs causal mediation analysis for count and zero-inflated count data, with or without a post-treatment confounder; calculates power to detect prespecified causal mediation effects, direct effects, and total effects; and performs sensitivity analysis when there is a treatment-induced mediator-outcome confounder, as described by Cheng, J., Cheng, N.F., Guo, Z., Gregorich, S., Ismail, A.I., Gansky, S.A. (2018) <doi:10.1177/0962280216686131>. It implements an Instrumental Variable (IV) method to estimate the controlled (natural) direct and mediation effects and to compute bootstrap confidence intervals, as described by Guo, Z., Small, D.S., Gansky, S.A., Cheng, J. (2018) <doi:10.1111/rssc.12233>. This software was made possible by Grant R03DE028410 from the National Institute of Dental and Craniofacial Research, a component of the National Institutes of Health.
This package provides functions to construct confidence intervals for the Overlap Coefficient (OVL). The OVL measures the similarity between two distributions through the overlapping area of their density functions. Because the OVL has an intuitive interpretation and is easy to visualize, by directly depicting the amount of overlap between histograms of samples drawn from the two distributions, accurate methods for confidence interval construction are useful for applied researchers. The package implements methods based on the work of Franco-Pereira, A.M., Nakas, C.T., Reiser, B., and Pardo, M.C. (2021) <doi:10.1177/09622802211046386>, as well as extensions for multimodal distributions proposed by Alcaraz-Peñalba, A., Franco-Pereira, A., and Pardo, M.C. (2025) <doi:10.1007/s10182-025-00545-2>.
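As a minimal illustration of the quantity itself (not this package's interval methods), the OVL can be approximated numerically from kernel density estimates of two samples:
  # OVL = integral of min(f, g) over the support, approximated on a grid
  set.seed(1)
  x <- rnorm(200, mean = 0); y <- rnorm(200, mean = 1)
  grid <- seq(min(x, y) - 1, max(x, y) + 1, length.out = 512)
  fx <- approx(density(x), xout = grid)$y
  fy <- approx(density(y), xout = grid)$y
  fx[is.na(fx)] <- 0; fy[is.na(fy)] <- 0
  ovl <- sum(pmin(fx, fy)) * diff(grid[1:2])   # numerical integration of min(f, g)
  ovl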
Stop signal task data for go and stop trials are generated per participant. The simulation process is based on the generally non-independent horse race model with either a fixed stop signal delay or the tracking method. Each of the go and stop processes is assumed to follow an exponentially modified Gaussian (ExG) or Shifted Wald (SW) distribution. The output data can be converted to BEESTS software input data, enabling researchers to test and evaluate various brain stopping processes manifested by the ExG or SW distributional parameters of interest. Methods are described in: Soltanifar M (2020) <https://hdl.handle.net/1807/101208>, Matzke D, Love J, Wiecki TV, Brown SD, Logan GD and Wagenmakers E-J (2013) <doi:10.3389/fpsyg.2013.00918>, Logan GD, Van Zandt T, Verbruggen F, Wagenmakers EJ. (2014) <doi:10.1037/a0035230>.
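A minimal base-R sketch of the ExG building block (not the package's simulation functions; the parameter values are hypothetical), using the fact that an ExG variate is the sum of a Gaussian and an independent exponential component:
  set.seed(123)
  rexg <- function(n, mu, sigma, tau) rnorm(n, mu, sigma) + rexp(n, rate = 1 / tau)
  go_rt <- rexg(100, mu = 440, sigma = 60, tau = 80)   # hypothetical go-process parameters (ms)
  summary(go_rt)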
Enables the user to calculate Value at Risk (VaR) and Expected Shortfall (ES) by means of various parametric and semiparametric GARCH-type models. For the latter, the estimation of the nonparametric scale function is carried out via a data-driven smoothing approach. Model quality, in terms of forecasting VaR and ES, can be assessed by various backtesting methods, such as the traffic light test for VaR and a newly developed traffic light test for ES. The approaches implemented in this package are described in, e.g., Feng Y., Beran J., Letmathe S. and Ghosh S. (2020) <https://ideas.repec.org/p/pdn/ciepap/137.html> as well as Letmathe S., Feng Y. and Uhde A. (2021) <https://ideas.repec.org/p/pdn/ciepap/141.html>.
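Independent of this package's GARCH-based models, a toy base-R illustration of the two risk measures themselves (the simulated losses are purely hypothetical):
  set.seed(42)
  loss <- -rnorm(1000, mean = 0.0005, sd = 0.01)   # hypothetical daily losses (positive = loss)
  var_975 <- quantile(loss, 0.975)                  # historical Value at Risk at the 97.5% level
  es_975  <- mean(loss[loss > var_975])             # Expected Shortfall: average loss beyond the VaR
  c(VaR = unname(var_975), ES = es_975)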
This package introduces a novel PRS model to enhance prediction accuracy by utilising GxE effects. It performs Genome Wide Association Studies (GWAS) and Genome Wide Environment Interaction Studies (GWEIS) using a discovery dataset, obtains polygenic risk scores (PRSs) for a target sample, and finally predicts the risk values of each individual in the target sample. Users have the choice of using existing models (Li et al., 2015) <doi:10.1093/annonc/mdu565>, (Pandis et al., 2013) <doi:10.1093/ejo/cjt054>, (Peyrot et al., 2018) <doi:10.1016/j.biopsych.2017.09.009> and (Song et al., 2022) <doi:10.1038/s41467-022-32407-9>, as well as newly proposed models for genomic risk prediction (refer to the URL for more details).
This resource provides tools to create, compare, and post-process spatial isotope assignment models of animal origin. It generates probability-of-origin maps for individuals from user-provided tissue and environment isotope values (e.g., as generated by IsoMAP; Bowen et al. (2013) <doi:10.1111/2041-210X.12147>) using the framework established in Bowen et al. (2010) <doi:10.1146/annurev-earth-040809-152429>. The isocat package can then quantitatively compare and cluster these maps to group individuals of similar origin. It also includes four approaches (cumulative sum, odds ratio, quantile only, and quantile simulation) with which users can summarize geographic origins and the probable distance traveled by individuals. Several of the functions included in this package are described in Campbell et al. (2020) <doi:10.1515/ami-2020-0004>.
Runs a Shiny web application that merges raw qPCR fluorescence data with related metadata to visualize species presence/absence detection patterns and assess data quality. The application calculates threshold values from raw fluorescence data using an approach based on the second derivative method (Luu-The et al., 2005 <doi:10.2144/05382RR05>) and utilizes the 'chipPCR' package by Rödiger, Burdukiewicz, & Schierack (2015) <doi:10.1093/bioinformatics/btv205> to calculate Cq values. The application can connect to a custom-developed MySQL database to populate the application's interface. It allows users to interact with visualizations, such as a dynamic map, amplification curves and standard curves, that allow for zooming and/or filtering. It also enables the generation of customized, exportable reports based on filtered mapping data.
Intended to create standard human-in-the-loop validity tests for typical automated content analysis methods such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. It provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021, <doi:10.1017/pan.2021.33>) tests, as well as functions for generating gold-standard data that are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
This package covers many important models used in marketing and micro-econometrics applications, including Bayes Regression (univariate or multivariate dependent variable), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity, and Bayesian Analysis of Aggregate Random Coefficient Logit Models.
This package contains a function that imports data from a CSV file, or uses manually entered data in the format (x, y, weight), and plots the appropriate ACC vs LOI graph and LMA graph. The main function is plotLMA(sourcefile, header), which takes a data set and plots the appropriate LMA and ACC graphs. If no source file (a string) is passed, a manual data entry window is opened. The header parameter indicates by TRUE/FALSE (FALSE by default) whether the source CSV file has a header row. The data set should contain only one independent variable (x) and one dependent variable (y), and can contain a weight for each observation.
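A brief usage sketch based on that description; the file name is hypothetical:
  # CSV with a header row and columns x, y, weight
  plotLMA("measurements.csv", TRUE)
  # omitting the source file opens the manual data entry window (exact call may differ)
  plotLMA()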
Carries out model-based clustering, classification and discriminant analysis using five different models. The models are all based on the generalized hyperbolic distribution. The first model, MGHD (Browne and McNicholas (2015) <doi:10.1002/cjs.11246>), is the classical mixture of generalized hyperbolic distributions. The MGHFA (Tortora et al. (2016) <doi:10.1007/s11634-015-0204-z>) is the mixture of generalized hyperbolic factor analyzers for high-dimensional data sets. The MSGHD is the mixture of multiple scaled generalized hyperbolic distributions, the cMSGHD is an MSGHD with convex contour plots, and the MCGHD, mixture of coalesced generalized hyperbolic distributions, is a new, more flexible model (Tortora et al. (2019) <doi:10.1007/s00357-019-09319-3>). The paper related to the software can be found at <doi:10.18637/jss.v098.i03>.
This package provides systematic geometry-adaptive parameter optimization with statistical validation for experimental biological data. Combines ANOVA-based validation with systematic constraint configuration testing (log-scale, positive domain, Euclidean) through T, P, E testing. Only proceeds with parameter optimization when statistically significant biological effects are detected, preventing over-fitting to noise. Uses GALAHAD trust region methods with constraint projection from Conn et al. (2000) <doi:10.1137/S1052623497325107>, ANOVA-based validation following Fisher (1925) <doi:10.1007/978-1-4612-4380-9_6>, and effect size calculations per Cohen (1988, ISBN:0805802835). Designed for structured experimental data including kinetic curves, dose-response studies, and treatment comparisons where appropriate parameter constraints and statistical justification are important for meaningful biological interpretation. Developed at the Minnesota Center for Prion Research and Outreach at the University of Minnesota.
In the recent past, measurement of coverage has been mainly through two-stage cluster-sampled surveys, either as part of a nutrition assessment or through a specific coverage survey known as Centric Systematic Area Sampling (CSAS). However, such methods are resource-intensive and often only used for final programme evaluation, meaning results arrive too late for programme adaptation. SLEAC, which stands for Simplified Lot Quality Assurance Sampling Evaluation of Access and Coverage, is a low-resource method designed specifically to address this limitation and is used regularly for monitoring, planning and, importantly, timely improvement of programme quality, both for agency and Ministry of Health (MoH) led programmes. SLEAC is designed to complement the Semi-quantitative Evaluation of Access and Coverage (SQUEAC) method. This package provides functions for use in conducting a SLEAC assessment.
This package provides a computational method that infers copy number variations (CNVs) in cancer scRNA-seq data and reconstructs the tumor phylogeny. It integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships. It does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). It can be used to:
detect allele-specific copy number variations from single-cells
differentiate tumor versus normal cells in the tumor microenvironment
infer the clonal architecture and evolutionary history of profiled tumors
For details on the method, see Gao et al. (2022) in Nature Biotechnology.