This package provides tools for estimating, testing, and simulating abundance in a two-event (Petersen) mark-recapture experiment. Functions are given to calculate the Petersen, Chapman, and Bailey estimators and associated variances. However, the principal utility is a set of functions to simulate random draws from these estimators and use them to conduct hypothesis tests and power calculations. Additionally, a set of functions is provided for generating confidence intervals via bootstrapping. Functions are also provided to test abundance estimator consistency under complete or partial stratification, to calculate stratified or partially stratified estimators, and to calculate recommended sample sizes. Referenced methods can be found in Arnason et al. (1996) <ISSN:0706-6457>, Bailey (1951) <DOI:10.2307/2332575>, Bailey (1952) <DOI:10.2307/1913>, Chapman (1951) NAID:20001644490, Cohen (1988) ISBN:0-12-179060-6, Darroch (1961) <DOI:10.2307/2332748>, and Robson and Regier (1964) <ISSN:1548-8659>.
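As a rough illustration of the three estimators named above, the following Python sketch uses the standard textbook formulas, with n1 animals marked in the first event, n2 animals examined in the second event, and m2 marked animals among them. The function names are hypothetical and are not the package's own R interface; the variance expressions are the commonly cited ones and are assumed, not confirmed, to match what the package computes.

```python
def petersen(n1, n2, m2):
    """Lincoln-Petersen abundance estimate: n1 * n2 / m2."""
    if m2 == 0:
        raise ValueError("m2 must be positive for the Petersen estimator")
    return n1 * n2 / m2


def chapman(n1, n2, m2):
    """Chapman's bias-corrected estimate and its estimated variance."""
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)
           / ((m2 + 1) ** 2 * (m2 + 2)))
    return n_hat, var


def bailey(n1, n2, m2):
    """Bailey's estimate (binomial sampling) and its estimated variance."""
    n_hat = n1 * (n2 + 1) / (m2 + 1)
    var = n1 ** 2 * (n2 + 1) * (n2 - m2) / ((m2 + 1) ** 2 * (m2 + 2))
    return n_hat, var


if __name__ == "__main__":
    # Illustrative counts only, not data from any real study.
    n1, n2, m2 = 100, 80, 20
    print("Petersen:", petersen(n1, n2, m2))
    print("Chapman (estimate, variance):", chapman(n1, n2, m2))
    print("Bailey (estimate, variance):", bailey(n1, n2, m2))
```

Unlike the plain Petersen ratio, the Chapman and Bailey forms stay finite when no marked animals are recaptured, which is one reason they are often preferred for small samples.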
This package provides Ion Trap positive ionization mode data in mzML file format. It includes a subset from 500-850 m/z and 1190-1310 seconds, including MS2 and MS3, intensity threshold 100.000; extracts from FTICR Apex III, m/z 400-450; a subset of UPLC - Bruker micrOTOFq data, both mzML and mz5; LC-MSMS and MRM files from proteomics experiments; and PSI mzIdentML example files for various search engines.
Network Common Data Form (netCDF) files are widely used for scientific data. Library-level access in R is provided through packages RNetCDF and ncdf4. The package ncdfCF is built on top of RNetCDF and makes the data and its attributes available as a set of R6 classes that are informed by the Climate and Forecasting Metadata Conventions. Access to the data uses standard R subsetting operators and common function forms.
Biological studies often consist of multiple conditions which are examined with different laboratory setups like RNA-sequencing or ChIP-sequencing. To get an overview of the whole resulting data set, Cogito provides an automated, complete, reproducible and clear report about all samples and basic comparisons between all different samples. This report can be used as documentation about the data set or as a starting point for further custom analysis.
This package integrates colocalization probabilities from colocalization analysis with transcriptome-wide association study (TWAS) scan summary statistics to implicate genes that may be biologically relevant to a complex trait. The probabilistic framework implemented in this package constrains the TWAS scan z-score-based likelihood using a gene-level colocalization probability. Given gene set annotations, this package can estimate gene set enrichment using posterior probabilities from the TWAS-colocalization integration step.
This package contains the Summix2 method for estimating and adjusting for substructure in genetic summary allele frequency data. The function summix() estimates reference group proportions using a mixture model. The adjAF() function produces adjusted allele frequencies for an observed group with reference group proportions matching a target individual or sample. The summix_local() function estimates local ancestry mixture proportions and performs selection scans in genetic summary data.
Uniparental disomy (UPD) is a genetic condition where an individual inherits both copies of a chromosome or part of it from one parent, rather than one copy from each parent. This package contains an HMM for detecting UPDs through high-throughput sequencing (HTS) data from trio assays. By analyzing the genotypes in the trio, the model infers a hidden state (normal, father isodisomy, mother isodisomy, father heterodisomy or mother heterodisomy).
Auto-GO is a framework that enables automated, high quality Gene Ontology enrichment analysis visualizations. It also features a handy wrapper for Differential Expression analysis around the DESeq2 package described in Love et al. (2014) <doi:10.1186/s13059-014-0550-8>. The whole framework is structured in different, independent functions, in order to let the user decide which steps of the analysis to perform and which plot to produce.
This package implements several tools that are used in animal social network analysis, as described in Whitehead (2007) Analyzing Animal Societies <University of Chicago Press> and Farine & Whitehead (2015) <doi:10.1111/1365-2656.12418>. In particular, this package provides the tools to infer groups and generate networks from observation data, perform permutation tests on the data, calculate lagged association rates, and perform multiple regression analysis on social network data.
This package provides tools to construct (or add to) cell-type signature matrices using flow sorted or single cell samples and deconvolve bulk gene expression data. Useful for assessing the quality of single cell RNAseq experiments, estimating the accuracy of signature matrices, and determining cell-type spillover. Please cite: Danziger SA et al. (2019) ADAPTS: Automated Deconvolution Augmentation of Profiles for Tissue Specific cells <doi:10.1371/journal.pone.0224693>.
Designed for web usage data analysis, it implements tools to process web sequences and identify web browsing profiles through sequential classification. Sequence clusters are identified using a model-based approach, specifically a mixture of discrete-time first-order Markov models for categorical web sequences. A Bayesian approach is used to estimate model parameters and classify sequences, as proposed by Fruehwirth-Schnatter and Pamminger (2010) <doi:10.1214/10-BA606>.
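To make the building block of that model concrete, here is a minimal Python sketch that estimates the transition matrix of a single discrete-time first-order Markov chain from categorical sequences by simple counting. This is a deliberate simplification: it is a plain frequency estimate for one chain, not the Bayesian mixture estimation the package performs, and the function name and page labels are hypothetical.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate first-order transition probabilities by counting.

    Returns a nested dict: matrix[a][b] is the estimated probability of
    moving from state a to state b. Rows with no observed transitions
    are left as all zeros.
    """
    counts = {s: Counter() for s in states}
    for seq in sequences:
        # Pair each element with its successor to get the transitions.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {
            t: (counts[s][t] / total if total else 0.0) for t in states
        }
    return matrix


if __name__ == "__main__":
    # Made-up browsing sessions over three hypothetical page categories.
    sessions = [
        ["home", "search", "product"],
        ["home", "product", "home"],
    ]
    pages = ["home", "search", "product"]
    for page, row in transition_matrix(sessions, pages).items():
        print(page, row)
```

In a mixture of such chains, each cluster would carry its own transition matrix, and a sequence would be assigned to the cluster under which its transitions are most probable.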
Base DataSHIELD functions for the server side. DataSHIELD is a software package which allows you to do non-disclosive federated analysis on sensitive data. DataSHIELD analytic functions have been designed to share only non-disclosive summary statistics, with built-in automated output checking based on statistical disclosure control; data sites set the threshold values for the automated output checks. For more details, see citation("dsBase").
This package provides an R driver for Apache Drill <https://drill.apache.org>. It can connect to an Apache Drill cluster <https://drill.apache.org/docs/installing-drill-on-the-cluster> or drillbit <https://drill.apache.org/docs/embedded-mode-prerequisites>, return the result of an SQL query as a data frame, and check the current configuration status. See <https://drill.apache.org/docs> for more information about Apache Drill.
For cleaning and analysis of graphs, such as animal closing force measurements. forceR was initially written and optimized to deal with insect bite force measurements, but can be used for any time series. Includes a full workflow to load, plot and crop data, correct amplifier and baseline drifts, identify individual peak shapes (bites), rescale (normalize) peak curves, and find best polynomial fits to describe and analyze force curve shapes.
This package provides an implementation of a kernel-embedding-of-probability test for elliptical distributions. This is an asymptotic test for elliptical distributions under general alternatives, with the location and shape parameters assumed to be unknown. Some side products are also provided, including the transformation between rectangular and polar coordinates and two product-type kernel functions. See Tang and Li (2024) <doi:10.48550/arXiv.2306.10594> for details.
Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines latent Dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM is proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.
Handles neutrosophic data in single-valued form, using score, accuracy and certainty functions to calculate ranks of Single Valued Neutrosophic Sets (SVNS), to compute the Mann-Whitney test, and to run a post-hoc test after the null hypothesis has been rejected by the Neutrosophic Statistics Kruskal-Wallis test. For more information see Miari, Mahmoud; Anan, Mohamad Taher; Zeina, Mohamed Bisher (2022) <https://digitalrepository.unm.edu/nss_journal/vol51/iss1/60/>.
Plot the daily and cumulative number of downloads of your packages. It is designed to be slightly more convenient than the several similar programs. If you want to run this each morning, you do not need to keep typing in the names of your packages. It also combines the daily and cumulative counts in one run, so you do not need to run separate programs to get both types of information.
Easily import the MI-SUVI data sets. The user can import data sets with full metrics, percentiles, Z-scores, or rankings. Data is available at both the County and Zip Code Tabulation Area (ZCTA) levels. This package also includes a function to import shape files for easy mapping and a function to access the full technical documentation. All data is sourced from the Michigan Department of Health and Human Services.
Calculates phenological cycle and anomalies using a non-parametric approach applied to time series of vegetation indices derived from remote sensing data or field measurements. The package implements basic and high-level functions for manipulating vector data (numerical series) and raster data (satellite derived products). Processing of very large raster files is supported. For more information, please check the following paper: Chávez et al. (2023) <doi:10.3390/rs15010073>.
This package implements the copula-based estimator for univariate long-range dependent processes, introduced in Pumi et al. (2023) <doi:10.1007/s00362-023-01418-z>. Notably, this estimator can handle missing data and has been shown to perform exceptionally well even when up to 70% of the data is missing (as reported in <arXiv:2303.04754>). It has also been found to outperform several other commonly applied estimators.
This package provides functions to aid the identification of probable or possible duplicates in Plant Genetic Resources (PGR) collections using passport databases comprising information records of each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of keywords associated with sample records, and identifying nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.
It offers a wide variety of techniques, such as graphics, recoding, or regression models, for a comprehensive analysis of patient-reported outcomes (PRO). Especially novel is the broad range of regression models based on the beta-binomial distribution useful for analyzing binomial data with over-dispersion in cross-sectional, longitudinal, or multidimensional response studies (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2019) <doi:10.1002/bimj.201700251>).
Working efficiently with structured query language ('SQL') scripts often requires creating temporary tables, and there are few clean and simple R SQL execution approaches that let you complete this kind of work within the R environment. This package seeks to give SQL implementations in R a little love by providing functions that allow you to deploy complex SQL scripts within a typical R workflow.