Multidimensional scaling models and methods for the visualization and analysis of asymmetric proximity data <doi:10.1111/j.2044-8317.1996.tb01078.x>. An asymmetric data matrix has the same number of rows and columns, and these rows and columns refer to the same set of objects. At least some elements in the upper triangle differ from the corresponding elements in the lower triangle. An example of an asymmetric matrix is a student migration table, where the rows correspond to the countries of origin of the students and the columns to the destination countries. This package provides algorithms for three multidimensional scaling models: the slide-vector model <doi:10.1007/BF02294474>, a scaling model with unique dimensions, and the asymscal model for asymmetric multidimensional scaling. Furthermore, a heat map for skew-symmetric data and the decomposition of asymmetry are provided for the exploratory analysis of asymmetric tables.
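The decomposition of asymmetry mentioned above can be sketched in a few lines of base R (this is the general matrix identity, not this package's API): any square matrix splits uniquely into a symmetric part and a skew-symmetric part, and the latter is what a skew-symmetric heat map displays.

    # Decompose an asymmetric proximity matrix M into symmetric and
    # skew-symmetric parts: M = S + A with S = (M + t(M))/2, A = (M - t(M))/2.
    M <- matrix(c(0, 12, 3,
                  7,  0, 9,
                  5,  2, 0), nrow = 3, byrow = TRUE)
    S <- (M + t(M)) / 2   # symmetric component (average flows)
    A <- (M - t(M)) / 2   # skew-symmetric component (net directed flows)
    all.equal(M, S + A)   # TRUE: the decomposition is exact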
Similarity of dissolution profiles is assessed using the similarity factor f2 according to the EMA guideline (European Medicines Agency 2010) "On the investigation of bioequivalence". Dissolution profiles are regarded as similar if the f2 value is between 50 and 100. For the similarity factor f2 to be applicable, the variability between profiles needs to be within certain limits; often, this constraint is violated. One possibility in this situation is to resample the measured profiles in order to obtain a bootstrap estimate of f2 (Shah et al. (1998) <doi:10.1023/A:1011976615750>). Other alternatives are the model-independent non-parametric multivariate confidence region (MCR) procedure (Tsong et al. (1996) <doi:10.1177/009286159603000427>) and the T2-test for equivalence procedure (Hoffelder (2016) <https://www.ecv.de/suse_item.php?suseId=Z|pi|8430>). Functions for the estimation of f1, f2, bootstrap f2, and the MCR / T2-test for equivalence procedures are implemented.
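As a point of reference, the f2 similarity factor itself is straightforward to compute from two mean dissolution profiles; the base-R sketch below follows the public EMA formula and is independent of this package's functions.

    # f2 similarity factor for reference and test mean dissolution
    # profiles measured at the same n time points (EMA formula):
    # f2 = 50 * log10( 100 / sqrt(1 + mean((ref - test)^2)) )
    f2 <- function(ref, test) 50 * log10(100 / sqrt(1 + mean((ref - test)^2)))
    ref  <- c(21, 44, 61, 75, 86, 93)  # % dissolved, reference product
    test <- c(17, 39, 56, 70, 82, 90)  # % dissolved, test product
    f2(ref, test)  # ~67; values between 50 and 100 indicate similar profiles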
Allows ATA (Automatic Time series analysis using the Ata method) models from the 'ATAforecasting' package to be used in a tidy workflow with the modeling interface of 'fabletools'. This extends 'ATAforecasting' to provide enhanced model specification and management, performance evaluation methods, and model combination tools. The Ata method (Yapar et al. (2019) <doi:10.15672/hujms.461032>), an alternative to exponential smoothing (described in Yapar (2016) <doi:10.15672/HJMS.201614320580> and Yapar et al. (2017) <doi:10.15672/HJMS.2017.493>), is a univariate time series forecasting method that provides innovative solutions to issues faced during the initialization and optimization stages of existing forecasting methods. The Ata method outperforms existing methods in terms of both ease of implementation and forecasting accuracy. It can be applied to non-seasonal or seasonal time series that can be decomposed into four components (remainder, level, trend and seasonal).
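A sketch of the intended tidy workflow, assuming the package exports a model function named AutoATA() usable inside fabletools::model() (the exact name and arguments may differ; check the package documentation):

    # Hypothetical fabletools workflow with an assumed AutoATA() model.
    library(fable.ata)
    library(fabletools)
    fit <- tsibbledata::aus_production |>
      model(ata = AutoATA(Beer))    # AutoATA() is an assumed export
    fit |> forecast(h = 8)          # 8-step-ahead forecasts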
Statistical methods to match feature vectors between multiple datasets in a one-to-one fashion. Given a fixed number of classes/distributions, for each unit exactly one vector of each class is observed, without label. The goal is to label the feature vectors, using each label exactly once, so as to produce the best match across datasets, e.g. by minimizing the variability within classes. Statistical solutions based on empirical loss functions and probabilistic modeling are provided. The Gurobi software and its R interface package are required for one of the package functions (match.2x()) and can be obtained at <https://www.gurobi.com/> (free academic license). For more details, refer to Degras (2022) <doi:10.1080/10618600.2022.2074429> "Scalable feature matching for large data collections" and Bandelt, Maas, and Spieksma (2004) <doi:10.1057/palgrave.jors.2601723> "Local search heuristics for multi-index assignment problems with decomposable costs".
This package provides interactive, configurable and graphical visualization of the chromosome regions of any living organism, allowing users to map chromosome elements (such as genes or SNPs) onto a chromosome plot. It introduces a special plot, the "chromosome heatmap", which, in addition to mapping elements, can visualize data associated with chromosome elements (such as gene expression) in the form of heat colors. Users can investigate detailed information about the mappings (such as gene names or the total number of genes mapped at a location) or view a magnified single- or double-stranded view of the chromosome at a location, showing each mapped element in sequential order. The package provides multiple features such as visualizing multiple sets, chromosome heat maps, group annotations, hyperlinks, and labelling. The plots can be saved as HTML documents that can be customized and shared easily. In addition, they can be included in R Markdown documents or R Shiny applications.
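A minimal usage sketch, assuming the main plotting function is chromoMap() taking a chromosome-coordinates file and an annotation file, with heatmap options as named below (treat the argument names as assumptions and check the package documentation):

    # Hypothetical call; file formats follow the package's conventions:
    # chromosome file: name, start, end; annotation file: element name,
    # chromosome, start, end, plus an optional data column for heatmaps.
    library(chromoMap)
    chromoMap("chromosome_file.txt", "annotation_file.txt",
              data_based_color_map = TRUE,   # assumed heatmap switch
              data_type = "numeric")         # assumed data-column type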
The first stand-alone R package for computation of latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), comes with an optimized memory footprint, and is computationally efficient, essentially making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent copula Gaussian models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For the ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <arXiv:1809.06255>. For the truncated (zero-inflated) type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For the approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package 'chebpol' <https://cran.r-project.org/package=chebpol>.
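A minimal sketch, assuming the package's main function is latentcor() with a types argument declaring each variable's type and a returned list whose R component holds the estimated latent correlation matrix:

    # Assumed interface: latent correlations for mixed variable types.
    library(latentcor)
    X <- data.frame(con = rnorm(100),                       # continuous
                    bin = rbinom(100, 1, 0.5),              # binary
                    ter = sample(0:2, 100, replace = TRUE)) # ternary
    est <- latentcor(X, types = c("con", "bin", "ter"))
    est$R  # estimated latent correlation matrix (assumed component name)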
Easily analyze and visualize differences between samples (e.g., benchmark comparisons, nonresponse comparisons in surveys) on three levels. The comparisons can be univariate, bivariate or multivariate. At the univariate level, the variables of interest of a survey and a comparison survey (i.e., a benchmark) are compared by calculating one of several difference measures (e.g., the relative difference in mean) and an average difference between the surveys. At the bivariate level, a function can calculate significant differences in correlations between the surveys. At the multivariate level, a function can calculate significant differences in model coefficients between the surveys under comparison. All of these differences can easily be plotted and output as a table. For more detailed information on the methods and example use, see Rohr, B., Silber, H., & Felderer, B. (2024). Comparing the Accuracy of Univariate, Bivariate, and Multivariate Estimates across Probability and Nonprobability Surveys with Population Benchmarks. Sociological Methodology <doi:10.1177/00811750241280963>.
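For instance, the univariate relative difference in mean is simple to express; the base-R sketch below illustrates the kind of per-variable measure that is then averaged across variables (a generic illustration, not the package's API).

    # Relative difference in mean between a survey and a benchmark,
    # computed per variable and then averaged across variables.
    rel_diff <- function(x, y) (mean(x) - mean(y)) / mean(y)
    survey    <- data.frame(age = rnorm(200, 46), income = rnorm(200, 31000))
    benchmark <- data.frame(age = rnorm(500, 44), income = rnorm(500, 33000))
    diffs <- mapply(rel_diff, survey, benchmark)
    c(diffs, average_abs_difference = mean(abs(diffs)))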
Highest averages and largest remainders seat-allocation methods and several party system scores. Implemented highest averages methods are D'Hondt, Webster, Danish, Imperiali, Hill-Huntington, Dean, Modified Sainte-Laguë, equal proportions and Adams. Implemented largest remainders methods are Hare, Droop, Hagenbach-Bischoff, Imperial, modified Imperial and quotas & remainders. The main advantage of this package is that ties are always reported and never incorrectly allocated. Party system scores provided are competitiveness, concentration, effective number of parties, party nationalization score, party system nationalization score and volatility. References: Gallagher (1991) <doi:10.1016/0261-3794(91)90004-C>. Norris (2004, ISBN:0-521-82977-1). Laakso & Taagepera (1979) <https://escholarship.org/uc/item/703827nv>. Jones & Mainwaring (2003) <https://kellogg.nd.edu/sites/default/files/old_files/documents/304_0.pdf>. Pedersen (1979) <https://janda.org/c24/Readings/Pedersen/Pedersen.htm>. Golosov (2010) <doi:10.1177/1354068809339538>. Golosov (2014) <doi:10.1177/1354068814549342>.
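As an illustration of the highest averages family, here is a compact base-R implementation of the D'Hondt method (divisors 1, 2, 3, ...), independent of this package. Note that order() breaks ties arbitrarily, which is exactly the pitfall the package avoids by reporting ties instead.

    # D'Hondt: repeatedly award the next seat to the party with the
    # highest quotient votes / (seats_won + 1).
    dhondt <- function(votes, seats) {
      quotients <- outer(votes, seq_len(seats), function(v, s) v / s)
      ranking <- order(quotients, decreasing = TRUE)[seq_len(seats)]
      # map matrix positions (column-major) back to party indices
      tabulate((ranking - 1) %% length(votes) + 1, nbins = length(votes))
    }
    votes <- c(A = 340000, B = 280000, C = 160000, D = 60000)
    setNames(dhondt(votes, seats = 7), names(votes))  # A 3, B 3, C 1, D 0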
An extensible repository of accurate, up-to-date functions to score commonly used patient-reported outcome (PRO), quality of life (QOL), and other psychometric and psychological measures. 'PROscorer', together with the 'PROscorerTools' package, is a system to facilitate the incorporation of PRO measures into research studies and clinical settings in a scientifically rigorous and reproducible manner. These packages and their vignettes are intended to help establish and promote best practices for scoring PRO and PRO-like measures in research. The PROscorer Instrument Descriptions vignette contains descriptions of each instrument scored by 'PROscorer', complete with references. These instrument descriptions are suitable for inclusion in formal study protocol documents, grant proposals, and manuscript Method sections. Each 'PROscorer' function is composed of helper functions from the 'PROscorerTools' package, and users are encouraged to contribute new functions to 'PROscorer'. More scoring functions are currently in development and will be added in future updates.
The implementation of a forecasting-specific tree-based model that is particularly suitable for global time series forecasting, as proposed in Godahewa et al. (2022) <arXiv:2211.08661v1>. The model uses the concept of Self-Exciting Threshold Autoregressive (SETAR) models to define the node splits and is hence named SETAR-Tree. The SETAR-Tree uses some time-series-specific splitting and stopping procedures. It trains global pooled regression models in the leaves, allowing the models to learn cross-series information. The depth of the tree is controlled by conducting a statistical linearity test as well as measuring the error reduction percentage at each node split. Thus, the SETAR-Tree requires minimal external hyperparameter tuning and provides competitive results under its default configuration. A forest is developed by extending the SETAR-Tree. The SETAR-Forest combines the forecasts provided by a collection of diverse SETAR-Trees during the forecasting process.
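The core SETAR-style split can be sketched in base R: pick a threshold on a lagged value and fit separate linear autoregressions on each side, keeping the threshold that minimizes pooled error. This is a simplified, self-contained illustration of the node-splitting idea, not the package's implementation.

    # Simplified SETAR split: choose the threshold on lag-1 values that
    # minimizes the summed squared error of two regime-wise AR(1) fits.
    set.seed(42)
    y    <- as.numeric(arima.sim(list(ar = 0.6), n = 200))
    lag1 <- head(y, -1); resp <- tail(y, -1)
    sse_at <- function(th) {
      lo <- lag1 <= th
      sum(resid(lm(resp[lo]  ~ lag1[lo]))^2) +
        sum(resid(lm(resp[!lo] ~ lag1[!lo]))^2)
    }
    # candidate thresholds away from the tails so both regimes have data
    grid <- quantile(lag1, probs = seq(0.15, 0.85, by = 0.05))
    grid[which.min(sapply(grid, sse_at))]  # estimated split threshold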
This package provides new tools for analyzing discrete trait data, integrating bio-ontologies and phylogenetics. It expands on the previous work of Tarasov et al. (2019) <doi:10.1093/isd/ixz009>. The PARAMO pipeline allows reconstruction of ancestral phenomes, treating groups of morphological traits as a single complex character. The pipeline incorporates knowledge from ontologies during the amalgamation of individual character stochastic maps. Here we expand the current PARAMO functionality by adding new statistical methods for inferring evolutionary phenome dynamics using a non-homogeneous Poisson process (NHPP). The new functionalities include: (1) reconstruction of evolutionary rate shifts of phenomes across lineages and time; (2) reconstruction of morphospace dynamics through time; and (3) estimation of rates of phenome evolution at different levels of the anatomical hierarchy (e.g., entire body or specific regions only). The package also includes user-friendly tools for visualizing evolutionary rates of different anatomical regions using vector images of the organisms of interest.
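The NHPP machinery can be illustrated generically: the base-R sketch below simulates event times from a non-homogeneous Poisson process by thinning (Lewis-Shedler), the standard construction underlying NHPP-based rate models. It is not this package's API.

    # Simulate an NHPP on [0, T] with rate lambda(t) by thinning: draw
    # candidates from a homogeneous process at rate lambda_max and keep
    # each candidate t with probability lambda(t) / lambda_max.
    lambda <- function(t) 2 + sin(t)       # example time-varying rate
    T_end <- 10; lambda_max <- 3           # lambda(t) <= lambda_max on [0, T]
    n_cand <- rpois(1, lambda_max * T_end)
    cand <- sort(runif(n_cand, 0, T_end))
    events <- cand[runif(n_cand) < lambda(cand) / lambda_max]
    events                                  # accepted event times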
An easy-to-use tool for working with presence/absence tests on pooled or grouped samples. The primary application is estimating the prevalence of a marker in a population based on the results of tests on pooled specimens. This sampling method is often employed in surveillance of rare conditions in humans or animals (e.g. molecular xenomonitoring). The package was initially conceived as an R-based alternative to the molecular xenomonitoring software PoolScreen <https://sites.uab.edu/statgenetics/software/>. However, it goes further, allowing estimates of prevalence to be adjusted for hierarchical sampling frames and supporting flexible mixed-effects regression analyses (McLure et al. Environmental Modelling and Software. <DOI:10.1016/j.envsoft.2021.105158>). The package is currently in its early stages; more features are planned or in the works, e.g. adjustments for imperfect test specificity/sensitivity, functions for helping with optimal experimental design, and functions for spatial modelling.
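The basic estimate behind pooled testing is easy to state: if pools of size s are tested and a fraction of them are positive, the maximum-likelihood estimate of individual-level prevalence follows from the probability that a pool contains no positives. A base-R sketch for the simplest case (equal pool sizes, perfect test), independent of the package:

    # MLE of prevalence p from pooled tests: k positive pools out of n,
    # each pool of size s. P(pool negative) = (1 - p)^s, hence
    # p_hat = 1 - (1 - k/n)^(1/s).
    pool_prev <- function(k, n, s) 1 - (1 - k / n)^(1 / s)
    pool_prev(k = 12, n = 100, s = 10)  # ~0.0127, i.e. ~1.3% prevalence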
Allows clinicians and researchers to compute daily dose (and subsequently days' supply) for prescription refills using the following methods: fixed window, fixed tablet, defined daily dose (DDD), and Random Effects Warfarin Days Supply (REWarDS). Daily dose is the computed dose that the patient takes every day. For medications with fixed dosing (e.g. direct oral anticoagulants) this is known and does not need to be estimated. For medications with varying dose such as warfarin, however, the daily dose should be assumed or estimated to allow measurement of drug exposure. Days' supply is the number of days that patients' supply of medication will last after each prescription fill. Estimating days' supply is necessary to calculate drug exposure. The package computes days' supply and daily dose at both the prescription and patient levels. Results at the prescription level are denoted with '-Rx-' and those at the patient level are denoted with '-Pt-'.
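For the fixed-dosing case the arithmetic is direct; below is a base-R sketch of the DDD method (days' supply = total dispensed amount / defined daily dose). The DDD value used is illustrative only, not a clinical reference.

    # Days' supply under the defined daily dose (DDD) method.
    # Example: 100 tablets of 5 mg dispensed; ddd_mg is an assumed,
    # illustrative value -- look up the actual WHO DDD for the drug.
    quantity <- 100; strength_mg <- 5; ddd_mg <- 7.5
    days_supply <- quantity * strength_mg / ddd_mg
    days_supply  # ~66.7 days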
This package provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300>, together with the relevant references from those listed below. The package is based on research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.
Processes noble gas mass spectrometer data to determine the isotopic composition of argon (comprising Ar36, Ar37, Ar38, Ar39 and Ar40) released from neutron-irradiated potassium-bearing minerals. These compositions are then used to calculate precise and accurate geochronological ages for multiple samples, as well as the covariances between them. Error propagation is done in matrix form, which jointly treats all samples and all isotopes simultaneously at every step of the data reduction process. Includes methods for regression of the time-resolved mass spectrometer signals to t=0 ('time zero') for both single- and multi-collector instruments, blank correction, mass fractionation correction, detector intercalibration, decay corrections, interference corrections, interpolation of the irradiation parameter between neutron fluence monitors, and (weighted mean) age calculation. All operations are performed on the logs of the ratios between the different argon isotopes so as to properly treat them as 'compositional data', sensu Aitchison [1986, The Statistics of Compositional Data, Chapman and Hall].
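The compositional log-ratio treatment can be illustrated generically in base R: isotope abundances are mapped to log-ratios relative to a reference isotope before any statistics are computed. This is the textbook Aitchison transform, not this package's API.

    # Additive log-ratio (alr) transform of argon isotope abundances,
    # using Ar40 as the reference isotope (Aitchison, 1986).
    ar <- c(Ar36 = 0.005, Ar37 = 0.010, Ar38 = 0.002,
            Ar39 = 0.120, Ar40 = 0.863)
    alr <- log(ar[-5] / ar["Ar40"])  # statistics are done on these logs
    alr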
An important environmental impact on running water ecosystems is caused by hydropeaking: the discontinuous release of turbine water due to peaks in energy demand. An event-based algorithm is implemented to detect flow fluctuations, referred to as increase events (IC) and decrease events (DC). For each event, a set of parameters related to the fluctuation intensity is calculated. The framework is introduced in Greimel et al. (2016) "A method to detect and characterize sub-daily flow fluctuations" <doi:10.1002/hyp.10773> and can be used to identify different fluctuation types according to the potential source: e.g., sub-daily flow fluctuations caused by hydropeaking, rainfall, or snow and glacier melt. This is a companion to the package 'hydroroute', which is used to detect and follow hydropower plant-specific hydropeaking waves at the sub-catchment scale and to describe how hydropeaking flow parameters change along the longitudinal flow path, as proposed and validated in Greimel et al. (2022).
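A generic base-R sketch of the event idea: consecutive runs of rising flow form increase events (IC) and runs of falling flow form decrease events (DC), and per-event intensity parameters such as amplitude can then be computed per run. This illustrates the concept only, not the package's algorithm.

    # Label IC/DC events in a flow series as maximal runs of rising or
    # falling values, then compute each event's amplitude.
    q <- c(10, 10, 14, 21, 26, 24, 18, 12, 12, 15)
    dir <- sign(diff(q))              # +1 rising, -1 falling, 0 flat
    runs <- rle(dir[dir != 0])        # flat steps dropped for simplicity
    type <- ifelse(runs$values > 0, "IC", "DC")
    amp <- tapply(abs(diff(q))[dir != 0],
                  rep(seq_along(runs$lengths), runs$lengths), sum)
    data.frame(event = type, amplitude = amp)  # IC 16, DC 14, IC 3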
An index is created using a mathematical model that transforms multi-dimensional variables into a single value. These variables are often correlated, and while PCA-based indices can address the issue of multicollinearity, they typically do not account for survey weights, which can lead to inaccurate rankings of survey units such as households, districts, or states. To resolve this, the current package facilitates the development of a principal component analysis-based composite index by incorporating survey weights for each sample observation. This ensures the generation of a survey-weighted principal component-based normalized composite index. Additionally, the package provides a normalized principal component-based composite index and ranks the sample observations based on the values of the composite indices. For method details, see Skinner, C. J., Holmes, D. J. and Smith, T. M. F. (1986) <DOI:10.1080/01621459.1986.10478336> and Singh, D., Basak, P., Kumar, R. and Ahmad, T. (2023) <DOI:10.3389/fams.2023.1274530>.
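The construction can be sketched in base R: compute a survey-weighted covariance matrix, take its first eigenvector, score each observation, and min-max normalize the scores into a composite index. This is a simplified illustration, not the package's estimator.

    # Survey-weighted PCA composite index (simplified sketch).
    set.seed(7)
    X <- matrix(rnorm(100 * 4), ncol = 4)     # 4 indicator variables
    w <- runif(100, 0.5, 2); w <- w / sum(w)  # normalized survey weights
    mu <- colSums(w * X)                      # weighted means
    Xc <- sweep(X, 2, mu)                     # center the indicators
    S  <- crossprod(Xc * sqrt(w))             # weighted covariance matrix
    pc1 <- eigen(S)$vectors[, 1]              # first principal component
    score <- Xc %*% pc1
    index <- (score - min(score)) / diff(range(score))  # normalized index
    head(rank(-index))                        # ranking of observations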
This package performs non-parametric tests of parametric specifications. Five tests are available. Specific bandwidth and kernel methods can be chosen along with many other options. Allows parallel computing to quickly compute p-values based on the bootstrap. Methods implemented in the package are H.J. Bierens (1982) <doi:10.1016/0304-4076(82)90105-1>, J.C. Escanciano (2006) <doi:10.1017/S0266466606060506>, P.L. Gozalo (1997) <doi:10.1016/S0304-4076(97)86571-2>, P. Lavergne and V. Patilea (2008) <doi:10.1016/j.jeconom.2007.08.014>, P. Lavergne and V. Patilea (2012) <doi:10.1198/jbes.2011.07152>, J.H. Stock and M.W. Watson (2006) <doi:10.1111/j.1538-4616.2007.00014.x>, C.F.J. Wu (1986) <doi:10.1214/aos/1176350142>, J. Yin, Z. Geng, R. Li, H. Wang (2010) <https://www.jstor.org/stable/24309002> and J.X. Zheng (1996) <doi:10.1016/0304-4076(95)01760-7>.
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. The R package SNPRelate provides a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers stored in two bits, since a SNP genotype occupies only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphisms (indels) and structural variation calls in whole-genome and whole-exome variant data.
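A typical PCA run on the example data bundled with the package, following its documented workflow (function names as in the SNPRelate manual):

    # PCA on the example GDS file shipped with SNPRelate.
    library(SNPRelate)
    genofile <- snpgdsOpen(snpgdsExampleFileName())
    pca <- snpgdsPCA(genofile, num.thread = 2)  # parallel PCA
    head(pca$eigenvect[, 1:2])                  # first two components
    snpgdsClose(genofile)                       # release the GDS file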
This package provides a suite of machine learning algorithms written in C++ with an R interface, containing several learning techniques for classification and regression. Predictive models include, e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the 'ExplainPrediction' package. This package is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
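A short example of the two main entry points, model fitting with CoreModel() and feature evaluation with attrEval(), on a built-in dataset (names as in the CORElearn documentation):

    library(CORElearn)
    # Random forest classifier on iris.
    fit <- CoreModel(Species ~ ., data = iris, model = "rf")
    pred <- predict(fit, iris, type = "class")
    mean(pred == iris$Species)       # training accuracy
    destroyModels(fit)               # free the underlying C++ model
    # ReliefF-based feature evaluation.
    attrEval(Species ~ ., data = iris, estimator = "ReliefFequalK")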
This package provides tools for shoreline dating coastal Stone Age sites. The implemented method was developed in Roalkvam (2023) <doi:10.1016/j.quascirev.2022.107880> for the Norwegian Skagerrak coast. Although the method can be extended to other areas, this region also forms the core area for application of the package. Shoreline dating is based on the present-day elevation of a site, a reconstruction of past relative sea-level change, and empirically derived estimates of the likely elevation of sites above the contemporaneous sea level when they were in use. The geographical and temporal coverage of the method thus follows from the availability of local geological reconstructions of shoreline displacement and the degree to which the settlements to be dated were located on or close to the shoreline when they were in use. Methods for numerical treatment and visualisation of the dates are provided, along with basic tools for visualising and evaluating the location of sites.
We present a rank-based Mercer kernel to compute a pair-wise similarity metric corresponding to an informative representation of data. We tailor the development of the kernel to encode our prior knowledge about the data distribution over a probability space. The philosophical concept behind our construction is that objects whose feature values fall in the extremes of a feature's probability mass distribution are more similar to each other than objects whose feature values lie closer to the mean. Semblance emphasizes features whose values lie far away from the mean of their probability distribution. The kernel relies on properties empirically determined from the data and does not assume an underlying distribution. The use of feature ranks on a probability space ensures that Semblance is computationally efficacious, robust to outliers, and statistically stable, making it a widely applicable algorithm for pattern analysis. The output from the kernel is a square, symmetric matrix that gives proximity values between pairs of observations.
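A simplified rank-based kernel in base R conveys the idea: per feature, a pair of observations is compared through how much of the empirical distribution lies between their ranks, and the per-feature similarities are averaged into a symmetric Gram matrix. This is a schematic version, not the published Semblance definition.

    # Schematic rank-based similarity: per feature, similarity of a pair
    # is 1 minus the rank gap as a fraction of n; average over features.
    rank_kernel <- function(X) {
      n <- nrow(X); K <- matrix(0, n, n)
      R <- apply(X, 2, rank)
      for (g in seq_len(ncol(X)))
        K <- K + 1 - abs(outer(R[, g], R[, g], "-")) / n
      K / ncol(X)    # square, symmetric Gram matrix
    }
    set.seed(3)
    K <- rank_kernel(matrix(rnorm(20 * 5), nrow = 20))
    isSymmetric(K)   # TRUE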
The goal of this package is a user-friendly implementation of Gaussian graphical model-based heterogeneity analysis. Recently, several Gaussian graphical model-based heterogeneity analysis techniques have been developed. A common methodological limitation is that the number of subgroups is assumed to be known a priori, which is not realistic. In a recent study (Ren et al., 2022), a novel approach based on the penalized fusion technique was developed to determine the number and structure of subgroups fully data-dependently in Gaussian graphical model-based heterogeneity analysis. It opens the door to utilizing the Gaussian graphical model technique in more practical settings. Beyond Ren et al. (2022), more estimations and functions are added, so that the package is self-contained and more comprehensive, and can provide more direct insights to practitioners (with the visualization function). Reference: Ren, M., Zhang, S., Zhang, Q. and Ma, S. (2022). Gaussian Graphical Model-based Heterogeneity Analysis via Penalized Fusion. Biometrics, 78 (2), 524-535.
Estimation and inference for multiple kink quantile regression for longitudinal data and i.i.d. data. A bootstrap-restarting iterative segmented quantile algorithm is proposed to estimate the multiple kink quantile regression model conditional on a given number of change points. The number of kinks is also allowed to be unknown; in that case, the backward elimination algorithm and the bootstrap-restarting iterative segmented quantile algorithm are combined to select the number of change points based on a quantile BIC. For longitudinal data, we also develop a GEE estimator to incorporate the within-subject correlations. A score-type test statistic is also developed for testing the existence of kink effects. The package is based on the papers Wei Zhong, Chuang Wan and Wenyang Zhang (2022), "Estimation and inference for multikink quantile regression", JBES, and Chuang Wan, Wei Zhong, Wenyang Zhang and Changliang Zou (2022), "Multi-kink quantile regression for longitudinal data with application to progesterone data analysis", Biometrics.