This package implements Bayesian data analyses of balanced repeatability and reproducibility studies with ordinal measurements. Model fitting is based on MCMC posterior sampling with 'rjags'. The function ordinalRR() carries out the model fitting directly and lets the user specify key aspects of the model, e.g., fixed versus random effects. Functions for preprocessing data and for the numerical and graphical display of a fitted model are also provided. There are also functions for displaying the model at fixed (user-specified) parameters and for simulating a hypothetical data set at a fixed (user-specified) set of parameters for a random-effects rater population. For additional technical details, refer to Culp, Ryan, Chen, and Hamada (2018) and cite this Technometrics paper when referencing any aspect of this work. The demo of this package reproduces results from the Technometrics paper.
This package contains functions implementing various models and methods for test equating (Kolen and Brennan, 2014 <doi:10.1007/978-1-4939-0317-7>; Gonzalez and Wiberg, 2017 <doi:10.1007/978-3-319-51824-4>; von Davier et al., 2004 <doi:10.1007/b97446>). It currently implements the traditional mean, linear and equipercentile equating methods. Both IRT observed-score and true-score equating are also supported, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord IRT linking methods. It also supports newer methods such as local equating, kernel equating (using Gaussian, logistic, Epanechnikov, uniform and adaptive kernels) with presmoothing, and IRT parameter linking methods based on asymmetric item characteristic functions. Functions to obtain both the standard error of equating (SEE) and the standard error of equating differences between two equating functions (SEED) are also implemented for the kernel method of equating.
Many complex diseases are known to be affected by the interactions between genetic variants and environmental exposures beyond the main genetic and environmental effects. Existing Bayesian methods for gene-environment (G×E) interaction studies are challenged by the high-dimensional nature of the study and the complexity of environmental influences. We have developed a novel and powerful semi-parametric Bayesian variable selection method that can accommodate linear and nonlinear G×E interactions simultaneously (Ren et al. (2020) <doi:10.1002/sim.8434>). Furthermore, the proposed method can conduct structural identification by distinguishing nonlinear interactions from the main-effects-only case within the Bayesian framework. Spike-and-slab priors are incorporated on both the individual and group levels to shrink coefficients corresponding to irrelevant main and interaction effects to exactly zero. The Markov chain Monte Carlo algorithms of the proposed and alternative methods are efficiently implemented in C++.
We consider the problem of estimating two isotonic regression curves g1* and g2* under the constraint that they are ordered, i.e. g1* <= g2*. Given two sets of n data points y_1, ..., y_n and z_1, ..., z_n that are observed at (the same) deterministic design points x_1, ..., x_n, the estimates are obtained by minimizing the least squares criterion L(a, b) = sum_{i=1}^n (y_i - a_i)^2 w1(x_i) + sum_{i=1}^n (z_i - b_i)^2 w2(x_i) over the class of pairs of vectors (a, b) such that a and b are isotonic and a_i <= b_i for all i = 1, ..., n. We offer two different approaches to compute the estimates: a projected subgradient algorithm, in which the projection is calculated using a pool-adjacent-violators algorithm (PAVA), and Dykstra's cyclical projection algorithm.
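The cyclical projection idea can be illustrated with a minimal Python sketch. It is not the package's implementation: the function names, the fixed iteration count, and the particular split into two constraint sets (both vectors isotonic; a_i <= b_i pointwise) are assumptions made for illustration. Weights are assumed to be strictly positive.

```python
def pava(y, w):
    """Weighted least-squares fit of a non-decreasing sequence (pool adjacent violators)."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def project_order(a, b, w1, w2):
    """Weighted projection onto {(a, b): a_i <= b_i}; violating pairs are averaged."""
    a_new, b_new = list(a), list(b)
    for i in range(len(a)):
        if a[i] > b[i]:
            m = (w1[i] * a[i] + w2[i] * b[i]) / (w1[i] + w2[i])
            a_new[i] = b_new[i] = m
    return a_new, b_new


def ordered_isotonic(y, z, w1, w2, iters=500):
    """Dykstra's alternating projections for two ordered isotonic fits (sketch)."""
    n = len(y)
    a, b = list(y), list(z)
    pa, pb = [0.0] * n, [0.0] * n  # correction terms for the isotonic set
    qa, qb = [0.0] * n, [0.0] * n  # correction terms for the order set
    for _ in range(iters):
        # Project (current point + correction) onto the isotonic set.
        ta = [a[i] + pa[i] for i in range(n)]
        tb = [b[i] + pb[i] for i in range(n)]
        ia, ib = pava(ta, w1), pava(tb, w2)
        pa = [ta[i] - ia[i] for i in range(n)]
        pb = [tb[i] - ib[i] for i in range(n)]
        # Project (isotonic point + correction) onto the order set.
        ua = [ia[i] + qa[i] for i in range(n)]
        ub = [ib[i] + qb[i] for i in range(n)]
        a, b = project_order(ua, ub, w1, w2)
        qa = [ua[i] - a[i] for i in range(n)]
        qb = [ub[i] - b[i] for i in range(n)]
    return a, b
```

A call such as `ordered_isotonic([1, 3, 2], [0, 2.5, 4], [1, 1, 1], [1, 1, 1])` returns the pair of fitted vectors; a convergence check on successive iterates would be a natural replacement for the fixed iteration count.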
NOTE: PARAMLINK HAS BEEN SUPERSEDED BY THE PEDSUITE PACKAGES (<https://magnusdv.github.io/pedsuite/>). PARAMLINK IS MAINTAINED ONLY FOR LEGACY PURPOSES AND SHOULD NOT BE USED IN NEW PROJECTS. A suite of tools for analysing pedigrees with marker data, including parametric linkage analysis, forensic computations, relatedness analysis and marker simulations. The core of the package is an implementation of the Elston-Stewart algorithm for pedigree likelihoods, extended to allow mutations as well as complex inbreeding. Features for linkage analysis include singlepoint LOD scores, power analysis, and multipoint analysis (the latter through a wrapper to the MERLIN software). Forensic applications include exclusion probabilities, genotype distributions and conditional simulations. Data from the Familias software can be imported and analysed in 'paramlink'. Finally, paramlink offers many utility functions for creating, manipulating and plotting pedigrees with or without marker data (the actual plotting is done by the kinship2 package).
pipeFrame is an R package for building a componentized bioinformatics pipeline. Each step in the pipeline is wrapped in the framework, so the connections among steps are created seamlessly and automatically. Users can focus on fine-tuning arguments rather than spending a lot of time on transforming file formats, passing task outputs to task inputs or installing dependencies. Componentized step elements can also be flexibly reused to build new pipelines. Because the pipeline is split into several functional steps, it is much easier for users to understand the arguments of each step than the parameter combinations of the whole pipeline. In addition, a componentized pipeline can restart at the breakpoint and avoid rerunning the whole pipeline, which may save users a lot of time during pipeline tuning or after interruptions such as a power failure or a killed process.
For fitting N-mixture models using either FFT or asymptotic approaches. FFT N-mixture models extend the work of Cowen et al. (2017) <doi:10.1111/biom.12701>. Asymptotic N-mixture models extend the work of Dail and Madsen (2011) <doi:10.1111/j.1541-0420.2010.01465.x> to consider asymptotic solutions to the open population N-mixture models. The FFT models are derived and described in: "Parker, M.R.P., Elliott, L., Cowen, L.L.E. (2022). Computational efficiency and precision for replicated-count and batch-marked hidden population models [Manuscript in preparation]. Department of Statistics and Actuarial Sciences, Simon Fraser University." The asymptotic models are derived and described in: "Parker, M.R.P., Elliott, L., Cowen, L.L.E., Cao, J. (2022). Fast asymptotic solutions for N-mixtures on large populations [Manuscript in preparation]. Department of Statistics and Actuarial Sciences, Simon Fraser University."
In a clinical trial, it frequently occurs that the most credible outcome to evaluate the effectiveness of a new therapy (the true endpoint) is difficult to measure. In such a situation, it can be an effective strategy to replace the true endpoint by a (bio)marker that is easier to measure and that allows for a prediction of the treatment effect on the true endpoint (a surrogate endpoint). The package Surrogate allows for an evaluation of the appropriateness of a candidate surrogate endpoint based on the meta-analytic, information-theoretic, and causal-inference frameworks. Part of this software has been developed using funding provided by the European Union's Seventh Framework Programme for research, technological development and demonstration (Grant Agreement no. 602552), the Special Research Fund (BOF) of Hasselt University (BOF-number: BOF2OCPO3), GlaxoSmithKline Biologicals, Baekeland Mandaat (HBC.2022.0145), and Johnson & Johnson Innovative Medicine.
Allows calculating global scores for characteristics of visual stimuli as assessed by human raters. Stimuli are presented as a sequence of pairwise comparisons ('contests'), during each of which a rater expresses preference for one stimulus over the other (forced choice). The algorithm for calculating global scores is based on Elo rating, which updates individual scores after each single pairwise contest. Elo rating is widely used to rank chess players according to their performance. Its core feature is that dyadic contests with expected outcomes lead to smaller changes in participants' scores than contests with unexpected outcomes. As such, Elo rating is an efficient tool to rate individual stimuli when a large number of such stimuli are paired against each other in the context of experiments where the goal is to rank stimuli according to some characteristic of interest. Clark et al. (2018) <doi:10.1371/journal.pone.0190393> provide details.
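The Elo update described above can be sketched in a few lines of Python. This illustrates the standard Elo formula, not the package's code: the function names, the scale constant of 400 and the default k = 32 are chess conventions assumed here, and the package may use other defaults.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_winner, rating_loser, k=32.0):
    """Return the updated (winner, loser) ratings after one pairwise contest."""
    surprise = 1.0 - expected_score(rating_winner, rating_loser)
    delta = k * surprise  # small when the win was expected, large when it was not
    return rating_winner + delta, rating_loser - delta


def rate_contests(contests, start=1000.0, k=32.0):
    """Process (winner, loser) pairs in order and return the final ratings."""
    ratings = {}
    for winner, loser in contests:
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        ratings[winner], ratings[loser] = elo_update(rw, rl, k)
    return ratings
```

Because the update depends on the current ratings, the final scores depend on the order in which contests are processed; averaging over randomized orders is one common way to reduce that dependence.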
Drafts an epidemiological report in Microsoft Word format for a given disease, similar to the Annual Epidemiological Reports published by the European Centre for Disease Prevention and Control. Through standalone functions, it is specifically designed to generate each disease-specific output presented in these reports, including: (i) a table with the distribution of cases by Member State over the last five years; (ii) a seasonality plot with the distribution of cases at the European Union / European Economic Area level, by month, over the past five years; (iii) a trend plot with the trend and number of cases at the European Union / European Economic Area level, by month, over the past five years; (iv) an age and gender bar graph with the distribution of cases at the European Union / European Economic Area level. Two types of datasets can be used: (i) the default dataset of dengue 2015-2019 data; (ii) any dataset specified as described in the vignette.
The most common exact, asymptotic and resampling-based tests are provided for testing the homogeneity of variances of k normal distributions. These tests are Bartlett, Bhandary & Dai, Brown & Forsythe, Chang et al., Gokpinar & Gokpinar, Levene, Liu & Xu, and Gokpinar. A function for generating data from multiple normal distributions with user-specified parameters is also provided. Bartlett, M. S. (1937) <doi:10.1098/rspa.1937.0109> Bhandary, M., & Dai, H. (2008) <doi:10.1080/03610910802431011> Brown, M. B., & Forsythe, A. B. (1974) <doi:10.1080/01621459.1974.10482955> Chang, C. H., Pal, N., & Lin, J. J. (2017) <doi:10.1080/03610918.2016.1202277> Gokpinar E. & Gokpinar F. (2017) <doi:10.1080/03610918.2014.955110> Liu, X., & Xu, X. (2010) <doi:10.1016/j.spl.2010.05.017> Levene, H. (1960) <https://cir.nii.ac.jp/crid/1573950400526848896> Gökpınar, E. (2020) <doi:10.1080/03610918.2020.1800037>.
Simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina <https://www.illumina.com/> and Pacific Biosciences (PacBio) <https://www.pacb.com/> platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/polymerase chain reaction (PCR) duplicates. Simulating Illumina sequencing is based on ART by Huang et al. (2012) <doi:10.1093/bioinformatics/btr708>. PacBio sequencing simulation is based on SimLoRD by Stöcker et al. (2016) <doi:10.1093/bioinformatics/btw286>. All outputs can be written to standard file formats.
Inference in multi-state models is traditionally performed under a Markov assumption, which states that the past and future of the process are independent given the present state. In this package, we consider tests of the Markov assumption that are applicable to general multi-state models. Three approaches using existing methodology are considered: a simple method based on including covariates depending on the history in Cox models for the transition intensities; methods based on measuring the discrepancy of the non-Markov estimators of the transition probabilities to the Markov Aalen-Johansen estimators; and, finally, methods that were developed by considering summaries from families of log-rank statistics where patients are grouped by the state occupied by the process at a particular time point (see Soutinho G, Meira-Machado L (2021) <doi:10.1007/s00180-021-01139-7> and Titman AC, Putter H (2020) <doi:10.1093/biostatistics/kxaa030>).
This package performs various statistical transformations: Box-Cox and Log (Box and Cox, 1964) <doi:10.1111/j.2517-6161.1964.tb00553.x>, Glog (Durbin et al., 2002) <doi:10.1093/bioinformatics/18.suppl_1.S105>, Neglog (Whittaker et al., 2005) <doi:10.1111/j.1467-9876.2005.00520.x>, Reciprocal (Tukey, 1957), Log Shift (Feng et al., 2016) <doi:10.1002/sta4.104>, Bickel-Doksum (Bickel and Doksum, 1981) <doi:10.1080/01621459.1981.10477649>, Yeo-Johnson (Yeo and Johnson, 2000) <doi:10.1093/biomet/87.4.954>, Square Root (Medina et al., 2019), Manly (Manly, 1976) <doi:10.2307/2988129>, Modulus (John and Draper, 1980) <doi:10.2307/2986305>, Dual (Yang, 2006) <doi:10.1016/j.econlet.2006.01.011>, Gpower (Kelmansky et al., 2013) <doi:10.1515/sagmb-2012-0030>. It also provides graphical approaches and assesses the success of the transformation via tests and plots.
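As an illustration of two of the listed transformations, the following Python sketch writes out the standard Box-Cox and Yeo-Johnson formulas for a single value and a given lambda. The function names are hypothetical and do not correspond to the package's interface; estimating lambda from data is not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lambda -> 0
    return (y ** lam - 1.0) / lam


def yeo_johnson(y, lam):
    """Yeo-Johnson transform, defined for any real y."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    # Negative values use the mirrored power 2 - lambda.
    if lam == 2:
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
```

With lambda = 1 the Yeo-Johnson transform is the identity, which gives a quick sanity check on any implementation.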
This package provides several methods for generating density functions based on binned data. Methods include step function, recursive subdivision, and optimized spline. Data are assumed to be nonnegative, and the top bin is assumed to have no upper bound, but the bin widths need not be equal. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) In practice, an estimate for the mean of the distribution should be supplied as an optional argument. Doing so greatly improves the reliability of statistics computed from the smoothed density functions. Includes methods for estimating the Gini coefficient, the Theil index, percentiles, and random deviates from a smoothed distribution. Among the three methods, the optimized spline (splinebins) is recommended for most purposes. The percentile and random-draw methods should be regarded as experimental, and these methods only support splinebins.
This package provides a bottom-up model to estimate the emission levels of public transport systems based on General Transit Feed Specification (GTFS) data. The package requires two main inputs: i) public transport data in the GTFS standard format; and ii) some basic information on fleet characteristics such as fleet age, technology, fuel and Euro stage. As it stands, the package estimates several pollutants at high spatial and temporal resolutions. Pollution levels can be calculated for specific transport routes, trips, times of the day or for the transport system as a whole. The output with emission estimates can be extracted in different formats, supporting analysis of how emission levels vary across space, time and by fleet characteristics. A full description of the methods used in the gtfs2emis model is presented in Vieira, J. P. B.; Pereira, R. H. M.; Andrade, P. R. (2022) <doi:10.31219/osf.io/8m2cy>.
Simulation of the random evolution of heterogeneous populations using stochastic Individual-Based Models (IBMs) <doi:10.48550/arXiv.2303.06183>. The package enables users to simulate population evolution, in which individuals are characterized by their age and some characteristics, and the population is modified by different types of events, including births/arrivals, death/exit events, or changes of characteristics. The frequency at which an event can occur to an individual can depend on their age and characteristics, but also on the characteristics of other individuals (interactions). Such models have a wide range of applications. For instance, IBMs can be used for simulating the evolution of a heterogeneous insurance portfolio with selection or for validating mortality forecasts. This package overcomes the limitations of time-consuming IBM simulations by implementing new efficient algorithms based on thinning methods, which are compiled using the Rcpp package while providing a user-friendly interface.
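The thinning idea can be illustrated with the classical acceptance-rejection scheme for a time-varying event intensity. The Python sketch below is a generic textbook version for a single event stream, not the package's algorithm; the function name and arguments are hypothetical, and it assumes the caller supplies a constant bound that dominates the intensity on the whole time window.

```python
import math
import random


def simulate_by_thinning(rate, rate_bound, t_end, rng):
    """Event times on [0, t_end] for intensity rate(t), assuming rate(t) <= rate_bound."""
    t = 0.0
    events = []
    while True:
        # Candidate times come from a homogeneous Poisson process at the bound.
        t += rng.expovariate(rate_bound)
        if t > t_end:
            return events
        # Keep the candidate with probability rate(t) / rate_bound.
        if rng.random() * rate_bound < rate(t):
            events.append(t)


if __name__ == "__main__":
    rng = random.Random(42)
    times = simulate_by_thinning(lambda t: 0.5 + 0.5 * math.sin(t) ** 2, 1.0, 50.0, rng)
    print(len(times), "events simulated")
```

The tighter the bound is to the true intensity, the fewer candidates are rejected, which is where most of the efficiency of thinning-based simulation comes from.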
This package provides interactive, configurable graphical visualization of the chromosome regions of any living organism, allowing users to map chromosome elements (like genes, SNPs etc.) on the chromosome plot. It introduces a special plot viz. the "chromosome heatmap" that, in addition to mapping elements, can visualize the data associated with chromosome elements (like gene expression) in the form of heat colors. Users can investigate the detailed information about the mappings (like gene names or total genes mapped on a location) or can view the magnified single- or double-stranded view of the chromosome at a location showing each mapped element in sequential order. The package provides multiple features like visualizing multiple sets, chromosome heatmaps, group annotations, adding hyperlinks, and labelling. The plots can be saved as HTML documents that can be customized and shared easily. In addition, you can include them in R Markdown or in R Shiny applications.
Similarity of dissolution profiles is assessed using the similarity factor f2 according to the EMA guideline (European Medicines Agency 2010) "On the investigation of bioequivalence". Dissolution profiles are regarded as similar if the f2 value is between 50 and 100. For the applicability of the similarity factor f2, the variability between profiles needs to be within certain limits. Often, this constraint is violated. One possibility in this situation is to resample the measured profiles in order to obtain a bootstrap estimate of f2 (Shah et al. (1998) <doi:10.1023/A:1011976615750>). Other alternatives are the model-independent non-parametric multivariate confidence region (MCR) procedure (Tsong et al. (1996) <doi:10.1177/009286159603000427>) or the T2-test for equivalence procedure (Hoffelder (2016) <https://www.ecv.de/suse_item.php?suseId=Z|pi|8430>). Functions for the estimation of f1, f2 and bootstrap f2, and for the MCR and T2-test for equivalence procedures, are implemented.
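For orientation, the usual definitions of the difference factor f1 and the similarity factor f2 can be written out in Python, together with a naive bootstrap over individual profiles. This is a sketch under stated assumptions rather than the package's implementation: function names are hypothetical, mean profiles are given as percent dissolved at common time points, and the bootstrap simply resamples whole profiles with replacement within each group.

```python
import math
import random


def f1(ref_mean, test_mean):
    """Difference factor: summed absolute differences relative to the reference, in percent."""
    num = sum(abs(r - t) for r, t in zip(ref_mean, test_mean))
    return 100.0 * num / sum(ref_mean)


def f2(ref_mean, test_mean):
    """Similarity factor based on the mean squared difference between mean profiles."""
    n = len(ref_mean)
    msd = sum((r - t) ** 2 for r, t in zip(ref_mean, test_mean)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))


def mean_profile(units):
    """Pointwise mean over individual profiles (one list of values per unit)."""
    return [sum(col) / len(col) for col in zip(*units)]


def bootstrap_f2(ref_units, test_units, n_boot=1000, seed=0):
    """Sorted f2 values from resampling whole profiles with replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        ref_sample = [rng.choice(ref_units) for _ in ref_units]
        test_sample = [rng.choice(test_units) for _ in test_units]
        values.append(f2(mean_profile(ref_sample), mean_profile(test_sample)))
    return sorted(values)
```

Identical mean profiles give f2 = 100, and a constant difference of 10 percentage points at every time point gives a value just below 50, which is the origin of the conventional similarity threshold.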
Allows ATA (Automatic Time series analysis using the Ata method) models from the ATAforecasting package to be used in a tidy workflow with the modeling interface of 'fabletools'. This extends ATAforecasting to provide enhanced model specification and management, performance evaluation methods, and model combination tools. The Ata method (Yapar et al. (2019) <doi:10.15672/hujms.461032>), an alternative to exponential smoothing (described in Yapar (2016) <doi:10.15672/HJMS.201614320580>, Yapar et al. (2017) <doi:10.15672/HJMS.2017.493>), is a new univariate time series forecasting method which provides innovative solutions to issues faced during the initialization and optimization stages of existing forecasting methods. The forecasting performance of the Ata method is superior to existing methods in terms of both ease of implementation and forecasting accuracy. It can be applied to non-seasonal or seasonal time series which can be decomposed into four components (remainder, level, trend and seasonal).
Statistical methods to match feature vectors between multiple datasets in a one-to-one fashion. Given a fixed number of classes/distributions, for each unit, exactly one vector of each class is observed without label. The goal is to label the feature vectors using each label exactly once so as to produce the best match across datasets, e.g. by minimizing the variability within classes. Statistical solutions based on empirical loss functions and probabilistic modeling are provided. The Gurobi software and its R interface package are required for one of the package functions (match.2x()) and can be obtained at <https://www.gurobi.com/> (free academic license). For more details, refer to Degras (2022) <doi:10.1080/10618600.2022.2074429> "Scalable feature matching for large data collections" and Bandelt, Maas, and Spieksma (2004) <doi:10.1057/palgrave.jors.2601723> "Local search heuristics for multi-index assignment problems with decomposable costs".
Easily analyze and visualize differences between samples (e.g., benchmark comparisons, nonresponse comparisons in surveys) on three levels. The comparisons can be univariate, bivariate or multivariate. On the univariate level, the variables of interest of a survey and a comparison survey (i.e. benchmark) are compared by calculating one of several difference measures (e.g., relative difference in mean) and an average difference between the surveys. On the bivariate level, a function can calculate significant differences in correlations between the surveys. On the multivariate level, a function can calculate significant differences in model coefficients between the compared surveys. All of those differences can be easily plotted and output as a table. For more detailed information on the methods and example use see Rohr, B., Silber, H., & Felderer, B. (2024). Comparing the Accuracy of Univariate, Bivariate, and Multivariate Estimates across Probability and Nonprobability Surveys with Population Benchmarks. Sociological Methodology <doi:10.1177/00811750241280963>.
Highest averages and largest remainders seat allocation methods and several party system scores. The implemented highest averages methods are D'Hondt, Webster, Danish, Imperiali, Hill-Huntington, Dean, Modified Sainte-Lague, equal proportions and Adams. The implemented largest remainders methods are Hare, Droop, Hagenbach-Bischoff, Imperiali, modified Imperiali and quotas & remainders. The main advantage of this package is that ties are always reported and not incorrectly allocated. Party system scores provided are competitiveness, concentration, effective number of parties, party nationalization score, party system nationalization score and volatility. References: Gallagher (1991) <doi:10.1016/0261-3794(91)90004-C>. Norris (2004, ISBN:0-521-82977-1). Laakso & Taagepera (1979) <https://escholarship.org/uc/item/703827nv>. Jones & Mainwaring (2003) <https://kellogg.nd.edu/sites/default/files/old_files/documents/304_0.pdf>. Pedersen (1979) <https://janda.org/c24/Readings/Pedersen/Pedersen.htm>. Golosov (2010) <doi:10.1177/1354068809339538>. Golosov (2014) <doi:10.1177/1354068814549342>.
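To make the tie-reporting idea concrete, here is a small Python sketch of a highest averages allocation with exact arithmetic. It is not the package's code: the function name, the return format and the way ties are reported are assumptions for illustration, and it assumes integer vote counts, integer divisors and strictly positive vote totals. The default divisor sequence 1, 2, 3, ... gives D'Hondt; 1, 3, 5, ... would give Webster (Sainte-Lague).

```python
from fractions import Fraction


def highest_averages(votes, seats, divisor=lambda k: k):
    """Allocate seats by largest quotients; return (allocation, tied parties, seats left)."""
    quotients = []
    for party, count in votes.items():
        for k in range(1, seats + 1):
            # Fractions keep comparisons exact, so genuine ties are detected reliably.
            quotients.append((Fraction(count, divisor(k)), party))
    quotients.sort(key=lambda q: q[0], reverse=True)
    cutoff = quotients[seats - 1][0]  # value of the last winning quotient

    allocation = {party: 0 for party in votes}
    tied = []
    for value, party in quotients:
        if value > cutoff:
            allocation[party] += 1
        elif value == cutoff:
            tied.append(party)

    remaining = seats - sum(allocation.values())
    if len(tied) == remaining:
        # Every quotient at the cutoff fits, so there is no real tie.
        for party in tied:
            allocation[party] += 1
        return allocation, [], 0
    # More candidates than seats at the cutoff: report instead of guessing.
    return allocation, tied, remaining
```

When the second element of the result is non-empty, the listed parties compete for the remaining seats and the decision is left to the caller, for example by lot or by a legal tie-breaking rule.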
Analysis of pervasiveness of effects in correlational data. The Observed Proportion (or Percentage) of Concordant Pairs (OPCP) is Kendall's Tau expressed on a 0 to 1 metric instead of the traditional -1 to 1 metric to facilitate interpretation. As its name implies, it represents the proportion of concordant pairs in a sample (with an adjustment for ties). Pairs are concordant when a participant who has a larger value on a variable than another participant also has a larger value on a second variable. The OPCP is therefore an easily interpretable indicator of monotonicity. The pervasive functions are essentially wrappers for the arules package by Hahsler et al. (2025) <doi:10.32614/CRAN.package.arules> and serve to count individuals who actually display the pattern(s) suggested by a regression. For more details, see the paper "Considering approaches to pervasiveness in the context of personality psychology" now accepted at the journal Personality Science.
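The relationship between concordant pairs and a 0 to 1 version of Kendall's Tau can be sketched in Python as follows. The function name is hypothetical, and the tie handling shown (a tied pair counts as half concordant, which equals (tau-a + 1) / 2) is an assumption made for illustration; the package's adjustment for ties may differ.

```python
from itertools import combinations


def opcp(x, y):
    """Proportion of concordant pairs, counting each tied pair as one half."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        else:
            tied += 1  # the pair is tied on x, on y, or on both
    total = concordant + discordant + tied
    return (concordant + 0.5 * tied) / total
```

Without ties this is exactly the share of pairs in which the participant with the larger value on the first variable also has the larger value on the second.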