This package provides tools to import, clean, and visualize movement data, particularly from motion capture systems such as Optitrack's Motive', the Straw Lab's Flydra', or from other sources. We provide functions to remove artifacts, standardize tunnel position and tunnel axes, select a region of interest, isolate specific trajectories, fill gaps in trajectory data, and calculate 3D and per-axis velocity. For experiments of visual guidance, we also provide functions that use subject position to estimate perception of visual stimuli.
Propose an area-level, non-parametric regression estimator based on Nadaraya-Watson kernel on small area mean. Adopt a two-stage estimation approach proposed by Prasad and Rao (1990). Mean Squared Error (MSE) estimators are not readily available, so resampling method that called bootstrap is applied. This package are based on the model proposed in Two stage non-parametric approach for small area estimation by Pushpal Mukhopadhyay and Tapabrata Maiti(2004) <http://www.asasrms.org/Proceedings/y2004/files/Jsm2004-000737.pdf>.
This package provides a framework for text cleansing and analysis. Conveniently prepare and process large amounts of text for analysis. Includes various metrics for word counts/frequencies that scale efficiently. Quickly analyze large amounts of text data using a text.table (a data.table created with one word (or unit of text analysis) per row, similar to the tidytext format). Offers flexibility to efficiently work with text data stored in vectors as well as text data formatted as a text.table.
Probability surveys often use auxiliary continuous data from administrative records, but the utility of this data is diminished when it is discretized for confidentiality. We provide a set of survey estimators to make full use of information from the discretized variables. See Williams, S.Z., Zou, J., Liu, Y., Si, Y., Galea, S. and Chen, Q. (2024), Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data. Statistics in Medicine, 43: 5803-5813. <doi:10.1002/sim.10270> for details.
The bestridge package is designed to provide a one-stand service for users to successfully carry out best ridge regression in various complex situations via the primal dual active set algorithm proposed by Wen, C., Zhang, A., Quan, S. and Wang, X. (2020) <doi:10.18637/jss.v094.i04>. This package allows users to perform the regression, classification, count regression and censored regression for (ultra) high dimensional data, and it also supports advanced usages like group variable selection and nuisance variable selection.
Every research team have their own script for data management, statistics and most importantly hemodynamic indices. The purpose is to standardize scripts utilized in clinical research. The hemodynamic indices can be used in a long-format dataframe, and add both periods of interest (trigger-periods), and delete artifacts with deleter-files. Transfer function analysis (Claassen et al. (2016) <doi:10.1177/0271678X15626425>) and Mx (Czosnyka et al. (1996) <doi:10.1161/01.str.27.10.1829>) can be calculated using this package.
This package provides a set of wrapper functions that mainly re-produces most of the sequence plots rendered with TraMineR::seqplot()
. Whereas TraMineR
uses base R to produce the plots this library draws on ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef()
. The package automates the reshaping and plotting of sequence data. Resulting plots are of class ggplot', i.e. components can be added and tweaked using + and regular ggplot2 functions.
This package provides a ggplot2 extension for visualising uncertainty with the goal of signal suppression. Usually, uncertainty visualisation focuses on expressing uncertainty as a distribution or probability, whereas ggdibbler differentiates itself by viewing an uncertainty visualisation as an adjustment to an existing graphic that incorporates the inherent uncertainty in the estimates. You provide the code for an existing plot, but replace one of the variables with a vector of distributions, and it will convert the visualisation into it's signal suppression counterpart.
Estimates the Gini index and computes variances and confidence intervals for finite and infinite populations, using different methods; also computes Gini index for continuous probability distributions, draws samples from continuous probability distributions with Gini indices set by the user; uses Rcpp'. References: Muñoz et al. (2023) <doi:10.1177/00491241231176847>. à lvarez et al. (2021) <doi:10.3390/math9243252>. Giorgi and Gigliarano (2017) <doi:10.1111/joes.12185>. Langel and Tillé (2013) <doi:10.1111/j.1467-985X.2012.01048.x>.
Solver for linear, quadratic, and rational programs with linear, quadratic, and rational constraints. A unified interface to different R packages is provided. Optimization problems are transformed into equivalent formulations and solved by the respective package. For example, quadratic programming problems with linear, quadratic and rational constraints can be solved by augmented Lagrangian minimization using package alabama', or by sequential quadratic programming using solver slsqp'. Alternatively, they can be reformulated as optimization problems with second order cone constraints and solved with package cccp'.
G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. Effect measure modification in this method is a way to assess how the effect of the mixture varies by a binary, categorical or continuous variable. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. OBrien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019) A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.
This package provides a high-level plotting system, compatible with `ggplot2` objects, maps from `sf`, `terra`, `raster`, `sp`. It is built primarily on the grid package. The objective of the package is to provide a plotting system that is built for speed and modularity. This is useful for quick visualizations when testing code and for plotting multiple figures to the same device from independent sources that may be independent of one another (i.e., different function or modules the create the visualizations).
Identifies a bicluster, a submatrix of the data such that the features and observations within the submatrix differ from those not contained in submatrix, using a two-step method. In the first step, observations in the bicluster are identified to maximize the sum of weighted between cluster feature differences. The method is described in Helgeson et al. (2020) <doi:10.1111/biom.13136>. SCBiclust can be used to identify biclusters which differ based on feature means, feature variances, or more general differences.
Estimates the parameter of small area in binary data without auxiliary variable using Empirical Bayes technique, mainly from Rao and Molina (2015,ISBN:9781118735787) with book entitled "Small Area Estimation Second Edition". This package provides another option of direct estimation using weight. This package also features alpha and beta parameter estimation on calculating process of small area. Those methods are Newton-Raphson and Moment which based on Wilcox (1979) <doi:10.1177/001316447903900302> and Kleinman (1973) <doi:10.1080/01621459.1973.10481332>.
This package provides a pipeline that can process single or multiple Single Cell RNAseq samples primarily specializes in Clustering and Dimensionality Reduction. Meanwhile we use common cell type marker genes for T cells, B cells, Myeloid cells, Epithelial cells, and stromal cells (Fiboblast, Endothelial cells, Pericyte, Smooth muscle cells) to visualize the Seurat clusters, to facilitate labeling them by biological names. Once users named each cluster, they can evaluate the quality of them again and find the de novo marker genes also.
This package provides an efficient implementation of the K-Means++ algorithm. For more information see (1) "kmeans++ the advantages of the k-means++ algorithm" by David Arthur and Sergei Vassilvitskii (2007), Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 1027-1035, and (2) "The Effectiveness of Lloyd-Type Methods for the k-Means Problem" by Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman and Chaitanya Swamy <doi:10.1145/2395116.2395117>.
Distance-sampling (<doi:10.1007/978-3-319-19219-2>) estimates density and abundance of survey targets (e.g., animals) when detection probability declines with distance. Distance-sampling is popular in ecology, especially when survey targets are observed from aerial platforms (e.g., airplane or drone), surface vessels (e.g., boat or truck), or along walking transects. Distance-sampling includes line-transect studies that measure observation distances as the closest approach of the sample route (transect) to the target (i.e., perpendicular off-transect distance), and point-transect studies that measure observation distances from stationary observers to the target (i.e., radial distance). The routines included here fit smooth (parametric) curves to histograms of observation distances and use those functions to compute effective sampling distances, density of targets in the surveyed area, and abundance of targets in a surrounding study area. Curve shapes include the half-normal, hazard rate, and negative exponential functions. Physical measurement units are required and used throughout to ensure density is reported correctly. The help files are extensive and have been vetted by multiple authors.
An implementation of sensitivity and robustness methods in Bayesian networks in R. It includes methods to perform parameter variations via a variety of co-variation schemes, to compute sensitivity functions and to quantify the dissimilarity of two Bayesian networks via distances and divergences. It further includes diagnostic methods to assess the goodness of fit of a Bayesian networks to data, including global, node and parent-child monitors. Reference: M. Leonelli, R. Ramanathan, R.L. Wilkerson (2022) <doi:10.1016/j.knosys.2023.110882>.
This package provides methods for powering cluster-randomized trials with two continuous co-primary outcomes using five key design techniques. Includes functions for calculating required sample size and statistical power. For more details on methodology, see Owen et al. (2025) <doi:10.1002/sim.70015>, Yang et al. (2022) <doi:10.1111/biom.13692>, Pocock et al. (1987) <doi:10.2307/2531989>, Vickerstaff et al. (2019) <doi:10.1186/s12874-019-0754-4>, and Li et al. (2020) <doi:10.1111/biom.13212>.
This package provides tools for Delphi's COVIDcast Epidata API: data access, maps and time series plotting, and basic signal processing. The API includes a collection of numerous indicators relevant to the COVID-19 pandemic in the United States, including official reports, de-identified aggregated medical claims data, large-scale surveys of symptoms and public behavior, and mobility data, typically updated daily and at the county level. All data sources are documented at <https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html>.
This package implements methods for calculating disproportionate impact: the percentage point gap, proportionality index, and the 80% index. California Community Colleges Chancellor's Office (2017). Percentage Point Gap Method. <https://www.cccco.edu/-/media/CCCCO-Website/About-Us/Divisions/Digital-Innovation-and-Infrastructure/Research/Files/PercentagePointGapMethod2017.ashx>
. California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans. <https://www.cccco.edu/-/media/CCCCO-Website/Files/DII/guidelines-for-measuring-disproportionate-impact-in-equity-plans-tfa-ada.pdf>.
Analysis and visualization of plant disease progress curve data. Functions for fitting two-parameter population dynamics models (exponential, monomolecular, logistic and Gompertz) to proportion data for single or multiple epidemics using either linear or no-linear regression. Statistical and visual outputs are provided to aid in model selection. Synthetic curves can be simulated for any of the models given the parameters. See Laurence V. Madden, Gareth Hughes, and Frank van den Bosch (2007) <doi:10.1094/9780890545058> for further information on the methods.
Comparisons of floating point numbers are problematic due to errors associated with the binary representation of decimal numbers. Despite being aware of these problems, people still use numerical methods that fail to account for these and other rounding errors (this pitfall is the first to be highlighted in Circle 1 of Burns (2012) The R Inferno <https://www.burns-stat.com/pages/Tutor/R_inferno.pdf>). This package provides new relational operators useful for performing floating point number comparisons with a set tolerance.
This package provides classes and functions to calculate various distance measures and routes in heterogeneous geographic spaces represented as grids. The package implements measures to model dispersal histories first presented by van Etten and Hijmans (2010) <doi:10.1371/journal.pone.0012060>. Least-cost distances as well as more complex distances based on (constrained) random walks can be calculated. The distances implemented in the package are used in geographical genetics, accessibility indicators, and may also have applications in other fields of geospatial analysis.