Several functions for working with mixed effects regression models for limited dependent variables. The functions facilitate post-estimation of model predictions or margins, and comparisons between model predictions for assessing or probing moderation. Additional helper functions facilitate model comparisons and implement simulation-based inference for model predictions of alternative-specific outcome models. See also Melamed and Doan (2024, ISBN: 978-1032509518).
CaMutQC filters false positive mutations caused by technical issues and selects candidate cancer mutations through a series of well-structured functions that label mutations with various flags. A detailed filter report is produced after each complete filtration or selection run. CaMutQC also integrates several methods and gene panels for Tumor Mutational Burden (TMB) estimation.
This package provides a first-principle, phylogeny-aware comparative genomics tool for investigating associations between terms used to annotate genomic components (e.g., Pfam IDs, Gene Ontology terms) and quantitative or rank variables such as number of cell types, genome size, or density of specific genomic elements. See the project website for more information, documentation and examples, and <doi:10.1016/j.patter.2023.100728> for the full paper.
This package provides a flexible tool for calculating carbon-equivalent emissions. Mostly using data from the UK Government's Greenhouse Gas Conversion Factors report <https://www.gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2023>, it facilitates transparent emissions calculations for various sectors, including travel, accommodation, and clinical activities. The package is designed for easy integration into R workflows, with additional support for shiny applications and community-driven extensions.
Unifying an inconsistently coded categorical variable between two different time points in accordance with a mapping table. The main rule is to replicate an observation if it could be assigned to more than one category, and then to use frequencies or statistical methods to approximate the probability of it being assigned to each of them. This procedure was invented and implemented in the paper by Nasinski, Majchrowska, and Broniatowska (2020) <doi:10.24425/cejeme.2020.134747>.
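The replication rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `unify`, the toy mapping table, and the use of later-period category counts as the frequency source are all assumptions made here, not the package's interface or the paper's exact procedure.

```python
from collections import Counter

def unify(observations, mapping, target_counts):
    """Replicate each old-coded observation over its candidate new categories.

    observations  : list of old category codes, one per observation
    mapping       : dict, old code -> list of candidate new codes (hypothetical table)
    target_counts : Counter of new codes observed in the later period
    Returns rows of (observation index, old code, new code, probability).
    """
    rows = []
    for i, old in enumerate(observations):
        candidates = mapping[old]
        counts = [target_counts.get(c, 0) for c in candidates]
        total = sum(counts)
        for new, k in zip(candidates, counts):
            # Fall back to equal shares when no candidate was ever observed.
            prob = k / total if total else 1.0 / len(candidates)
            rows.append((i, old, new, prob))
    return rows

# Toy data: old code "2" was split into new codes "B" and "C".
mapping = {"1": ["A"], "2": ["B", "C"]}
later_period = Counter({"A": 10, "B": 30, "C": 10})
rows = unify(["1", "2"], mapping, later_period)
for row in rows:
    print(row)
```

Each ambiguous observation appears once per candidate category, and its probabilities sum to one, so downstream summaries can use the probability column as a weight.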
This package provides methods of computerized adaptive testing for survey researchers. See Montgomery and Rossiter (2020) <doi:10.1093/jssam/smz027>. Includes functionality for data fit with the classic item response methods including the latent trait model, Birnbaum's three parameter model, the graded response, and the generalized partial credit model. Additionally, includes several ability parameter estimation and item selection routines. During item selection, all calculations are done in compiled C++ code.
It is an open source insurance claim simulation engine sponsored by the Casualty Actuarial Society. It generates individual insurance claims including open claims, reopened claims, incurred but not reported claims and future claims. It also includes claim data fitting functions to help set simulation assumptions. It is useful for claim level reserving analysis. Parodi (2013) <https://www.actuaries.org.uk/documents/triangle-free-reserving-non-traditional-framework-estimating-reserves-and-reserve-uncertainty>.
The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina.
The number of bird or bat fatalities from collisions with buildings, towers or wind energy turbines can be estimated based on carcass searches and experimentally assessed carcass persistence times and searcher efficiency. Functions for estimating the probability that a bird or bat that died is found by a searcher are provided. Further functions calculate the posterior distribution of the number of fatalities based on the number of carcasses found and the estimated detection probability.
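As a rough illustration of the last step, the sketch below computes a posterior for the number of fatalities from the number of carcasses found and a single, fixed detection probability. It assumes a flat prior on the number of fatalities up to a chosen maximum and a binomial detection model; the function name and its parameters are hypothetical and do not reproduce the package's interface.

```python
from math import comb

def posterior_fatalities(found, p_detect, n_max):
    """Posterior P(N | found) for N = found..n_max under a flat prior on N.

    Assumes each carcass is found independently with probability p_detect.
    """
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    support = range(found, n_max + 1)
    # Binomial likelihood of finding `found` carcasses out of N fatalities.
    weights = [
        comb(n, found) * p_detect**found * (1.0 - p_detect) ** (n - found)
        for n in support
    ]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}

post = posterior_fatalities(found=3, p_detect=0.4, n_max=60)
mode = max(post, key=post.get)
print("most probable number of fatalities:", mode)
```

In practice the detection probability is itself estimated with uncertainty, so a fuller treatment would average this posterior over draws of `p_detect` rather than fix it at one value.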
Estimate sample sizes needed to capture target levels of genetic diversity from a population (multivariate allele frequencies) for applications like germplasm conservation and breeding efforts. Compares bootstrap samples to a full population using linear regression, employing the R-squared value to represent the proportion of diversity captured. Iteratively increases sample size until a user-defined target R-squared is met. Offers a parallelized R implementation of a previously developed Python method. All ploidy levels are supported. For more details, see Sandercock et al. (2024) <doi:10.1073/pnas.2403505121>.
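The iterative procedure can be sketched as follows. This is a minimal, serial illustration and not the package's code: the function names, the simulated diploid population, sampling individuals with replacement, and averaging R-squared over bootstrap replicates are all assumptions made for the example.

```python
import random

def allele_freqs(genotypes, ploidy):
    """Per-locus allele frequency from per-individual allele dosage lists."""
    n = len(genotypes)
    n_loci = len(genotypes[0])
    return [sum(ind[j] for ind in genotypes) / (ploidy * n) for j in range(n_loci)]

def r_squared(x, y):
    """R-squared of the simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def min_sample_size(genotypes, ploidy, target_r2, n_boot=50, start=2, step=1, seed=0):
    """Smallest sample size whose mean bootstrap R-squared reaches target_r2."""
    rng = random.Random(seed)
    full = allele_freqs(genotypes, ploidy)
    for size in range(start, len(genotypes) + 1, step):
        scores = []
        for _ in range(n_boot):
            sample = rng.choices(genotypes, k=size)  # draw with replacement
            scores.append(r_squared(full, allele_freqs(sample, ploidy)))
        mean_r2 = sum(scores) / n_boot
        if mean_r2 >= target_r2:
            return size, mean_r2
    return None  # target not reached within the population size

# Simulated diploid population: 200 individuals, 40 biallelic loci.
sim = random.Random(1)
locus_freqs = [sim.uniform(0.05, 0.95) for _ in range(40)]
population = [
    [sum(sim.random() < p for _ in range(2)) for p in locus_freqs]
    for _ in range(200)
]
result = min_sample_size(population, ploidy=2, target_r2=0.9)
print(result)
```

Because each sample size is evaluated independently, the inner bootstrap loop is the natural place to parallelize.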
Classifies the type of cancer using routinely collected data commonly found in cancer registries from pathology reports. The package implements the International Classification of Diseases for Oncology, 3rd Edition site (topography), histology (morphology), and behaviour codes of neoplasms to classify cancer type <https://www.who.int/standards/classifications/other-classifications/international-classification-of-diseases-for-oncology>. Classification in children utilizes the International Classification of Childhood Cancer by Steliarova-Foucher et al. (2005) <doi:10.1002/cncr.20910>. Adolescent and young adult cancer classification is based on Barr et al. (2020) <doi:10.1002/cncr.33041>.
CARD is a reference-based deconvolution method that estimates cell type composition in spatial transcriptomics based on cell type specific expression information obtained from a reference scRNA-seq data. A key feature of CARD is its ability to accommodate spatial correlation in the cell type composition across tissue locations, enabling accurate and spatially informed cell type deconvolution as well as refined spatial map construction. CARD relies on an efficient optimization algorithm for constrained maximum likelihood estimation and is scalable to spatial transcriptomics with tens of thousands of spatial locations and tens of thousands of genes.
Copernicus Atmosphere Monitoring Service (CAMS) radiations service provides time series of global, direct, and diffuse irradiations on horizontal surface, and direct irradiation on normal plane for the actual weather conditions as well as for clear-sky conditions. The geographical coverage is the field-of-view of the Meteosat satellite, roughly speaking Europe, Africa, Atlantic Ocean, Middle East. The time coverage of data is from 2004-02-01 up to 2 days ago. Data are available with a time step ranging from 15 min to 1 month. For license terms and to create an account, please see <http://www.soda-pro.com/web-services/radiation/cams-radiation-service>.
Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state, as described in Dick et al. (2020) <doi:10.5194/bg-17-6145-2020>. Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002 <doi:10.1073/pnas.062526999>; Wagner, 2005 <doi:10.1093/molbev/msi126>; Zhang et al., 2018 <doi:10.1038/s41467-018-06461-1>). A database of amino acid compositions of human proteins derived from UniProt is provided.
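To make one of these properties concrete, the sketch below reads amino acid compositions from FASTA-formatted text and computes GRAVY as the mean Kyte-Doolittle hydropathy per residue. The function names and the two demo sequences are hypothetical, and the code is an independent illustration rather than the package's implementation.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def parse_fasta(text):
    """Return {record id: sequence} from FASTA-formatted text."""
    chunks, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else "unnamed"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}

def composition(sequence):
    """Count standard amino acids, ignoring any other symbols."""
    return Counter(aa for aa in sequence if aa in KYTE_DOOLITTLE)

def gravy(comp):
    """Grand average of hydropathy from an amino acid composition."""
    length = sum(comp.values())
    if length == 0:
        raise ValueError("composition contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] * n for aa, n in comp.items()) / length

demo = ">demo1 hypothetical peptide\nAAAA\n>demo2\nAR\n"
values = {name: gravy(composition(seq)) for name, seq in parse_fasta(demo).items()}
print(values)
```

Working from the composition rather than the raw sequence mirrors the description above: once the counts are known, GRAVY and similar per-residue averages need no further access to the sequence.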
This package provides functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way MANOVA design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The candisc package generalizes this to higher-way MANOVA designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an mlm via the plot.candisc and heplot.candisc methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative.
While data from randomized experiments remain the gold standard for causal inference, estimation of causal estimands from observational data is possible through various confounding adjustment methods. However, the challenge of unmeasured confounding remains a concern in causal inference, where failure to account for unmeasured confounders can lead to biased estimates of causal estimands. Sensitivity analysis within the framework of causal inference can help adjust for possible unmeasured confounding. In `causens`, three main methods are implemented: adjustment via sensitivity functions (Brumback, Hernán, Haneuse, and Robins (2004) <doi:10.1002/sim.1657> and Li, Shen, Wu, and Li (2011) <doi:10.1093/aje/kwr096>), Bayesian parametric modelling and Monte Carlo approaches (McCandless, Lawrence C and Gustafson, Paul (2017) <doi:10.1002/sim.7298>).
Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics Chile (Household Survey Standard 2020, <https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf>), (Economics Survey Standard 2024, <https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2>) and by Economic Commission for Latin America and Caribbean (2020, <https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf>), (2024, <https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content>).
This package provides a generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on the R packages rxode2 and mrgsolve. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS has a strong dependency on the R package campsismod, which allows reading and writing a model from and to files and adapting it further on the fly in the R environment. Package campsis allows the user to assemble a dataset in an intuitive manner. Once the user's dataset is ready, the package is in charge of preparing the simulation, calling rxode2 or mrgsolve (at the user's choice) and returning the results, for the given model, dataset and desired simulation settings.
This package provides functions for implementing the novel algorithm CASCORE, which is designed to detect latent community structure in graphs with node covariates. This algorithm can handle models such as the covariate-assisted degree corrected stochastic block model (CADCSBM). CASCORE specifically addresses the disagreement between the community structure inferred from the adjacency information and the community structure inferred from the covariate information. For more detailed information, please refer to the reference paper: Yaofang Hu and Wanjie Wang (2022) <arXiv:2306.15616>. In addition to CASCORE, this package includes several classical community detection algorithms that are compared to CASCORE in our paper. These algorithms are: Spectral Clustering On Ratios-of-Eigenvectors (SCORE), normalized PCA, ordinary PCA, network-based clustering, covariates-based clustering and covariate-assisted spectral clustering (CASC). By providing these additional algorithms, the package enables users to compare their performance with CASCORE in community detection tasks.
We provide a toolbox to fit a continuous-time fractionally integrated ARMA process (CARFIMA) on univariate and irregularly spaced time series data via both frequentist and Bayesian machinery. A general-order CARFIMA(p, H, q) model for p>q is specified in Tsai and Chan (2005) <doi:10.1111/j.1467-9868.2005.00522.x> and it involves p+q+2 unknown model parameters, i.e., p AR parameters, q MA parameters, Hurst parameter H, and process uncertainty (standard deviation) sigma. Also, the model can account for heteroscedastic measurement errors, if the information about measurement error standard deviations is known. The package produces their maximum likelihood estimates and asymptotic uncertainties using a global optimizer called the differential evolution algorithm. It also produces posterior samples of the model parameters via Metropolis-Hastings within a Gibbs sampler equipped with adaptive Markov chain Monte Carlo. These fitting procedures, however, may produce numerical errors if p>2. The toolbox also contains a function to simulate discrete time series data from CARFIMA(p, H, q) process given the model parameters and observation times.
This package provides a collection of tools for performing category analysis.
Formal psychological models of categorization and learning, independently-replicated data sets against which to test them, and simulation archives.
Access public spatial data available under the INSPIRE directive. Tools for downloading references and addresses of properties, as well as map images.
This package implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.