This package provides a framework for extracting semantic motifs around entities in textual data. It implements an entity-centered semantic grammar that distinguishes six classes of motifs: actions of an entity, treatments of an entity, agents acting upon an entity, patients acted upon by an entity, characterizations of an entity, and possessions of an entity. Motifs are identified by applying a set of extraction rules to a parsed text object that includes part-of-speech tags and dependency annotations, such as those generated by 'spacyr'. For further reference, see Stuhler (2022) <doi:10.1177/00491241221099551>.
Fit Cox non-proportional hazards models with time-varying coefficients. Both unpenalized procedures (Newton and proximal Newton) and penalized procedures (P-splines and smoothing splines) are included, using B-spline basis functions to estimate the time-varying coefficients. For penalized procedures, cross-validation, mAIC, TIC or GIC are implemented to select tuning parameters. Utilities for post-estimation visualization, summarization, point-wise confidence intervals and hypothesis testing are also provided. For more information, see Wu et al. (2022) <doi:10.1007/s10985-021-09544-2> and Luo et al. (2023) <doi:10.1177/09622802231181471>.
This package provides functions for computing moments and coefficients related to the Beta-Wishart and Inverse Beta-Wishart distributions. It includes functions for calculating the expectation of matrix-valued functions of the Beta-Wishart distribution, coefficient matrices C_k and H_k, expectation of matrix-valued functions of the inverse Beta-Wishart distribution, and coefficient matrices \tilde{C}_k and \tilde{H}_k. For more details, refer to Hillier and Kan (2024) <https://www-2.rotman.utoronto.ca/~kan/papers/wishmom.pdf>, "On the Expectations of Equivariant Matrix-valued Functions of Wishart and Inverse Wishart Matrices".
CARD is a reference-based deconvolution method that estimates cell type composition in spatial transcriptomics based on cell type specific expression information obtained from reference scRNA-seq data. A key feature of CARD is its ability to accommodate spatial correlation in the cell type composition across tissue locations, enabling accurate and spatially informed cell type deconvolution as well as refined spatial map construction. CARD relies on an efficient optimization algorithm for constrained maximum likelihood estimation and is scalable to spatial transcriptomics with tens of thousands of spatial locations and tens of thousands of genes.
This package provides statistical methods for analyzing experimental evaluation of the causal impacts of algorithmic recommendations on human decisions developed by Imai, Jiang, Greiner, Halen, and Shin (2023) <doi:10.1093/jrsssa/qnad010> and Ben-Michael, Greiner, Huang, Imai, Jiang, and Shin (2024) <doi:10.48550/arXiv.2403.12108>. The data used for this paper, and made available here, are interim, based on only half of the observations in the study and (for those observations) only half of the study follow-up period. We use them only to illustrate methods, not to draw substantive conclusions.
Bayesian inference under the log-normality assumption must be performed very carefully. In fact, under the common priors for the variance, useful quantities in the original data scale (like the mean and quantiles) do not have finite posterior moments (Fabrizi et al. 2012 <doi:10.1214/12-BA733>). This package makes it easy to carry out a proper Bayesian inferential procedure by fixing a suitable distribution (the generalized inverse Gaussian) as the prior for the variance. Functions to estimate several kinds of means (unconditional, conditional and conditional under a mixed model) and quantiles (unconditional and conditional) are provided.
Cellular cooperation compromises the plating efficiency-based analysis of clonogenic survival data. This tool provides functions that enable a robust analysis of colony formation assay (CFA) data in the presence or absence of cellular cooperation. The implemented method has been described in Brix et al. (2020). (Brix, N., Samaga, D., Hennel, R. et al. "The clonogenic assay: robustness of plating efficiency-based analysis is strongly compromised by cellular cooperation." Radiat Oncol 15, 248 (2020). <doi:10.1186/s13014-020-01697-y>) Power regression for parameter estimation, calculation of survival fractions, uncertainty analysis and plotting functions are provided.
Enables researchers to adjust identification rates using the 1/(lineup size) method, generate the full receiver operating characteristic (ROC) curves, and statistically compare the areas under the curves (AUC). References: Yueran Yang & Andrew Smith. (2020). "fullROC: An R package for generating and analyzing eyewitness-lineup ROC curves". <doi:10.13140/RG.2.2.20415.94885/1>; Andrew Smith, Yueran Yang, & Gary Wells. (2020). "Distinguishing between investigator discriminability and eyewitness discriminability: A method for creating full receiver operating characteristic curves of lineup identification performance". Perspectives on Psychological Science, 15(3), 589-607. <doi:10.1177/1745691620902426>.
We implemented multiple tests based on the restricted mean survival time (RMST) for general factorial designs as described in Munko et al. (2024) <doi:10.1002/sim.10017>. To this end, an asymptotic test, a groupwise bootstrap test, and a permutation test are incorporated with a Wald-type test statistic. The asymptotic and groupwise bootstrap tests take the asymptotically exact dependence structure of the test statistics into account to gain more power. Furthermore, confidence intervals for RMST contrasts can be calculated and plotted, and a stepwise extension that can improve the power of the multiple tests is available.
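To make the restricted mean survival time in the description above concrete, the following is a minimal Python sketch, not the package's implementation. It assumes the RMST is estimated as the area under a Kaplan-Meier curve up to a cutoff `tau`; the function names `kaplan_meier` and `rmst` are hypothetical and chosen only for this illustration.

```python
def kaplan_meier(times, events):
    """Return (time, survival) pairs at each distinct event time.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the observation was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= leaving
    return steps


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last (possibly partial) rectangle up to the cutoff.
    area += prev_s * (tau - prev_t)
    return area


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4))
```

Without censoring, this estimate coincides with the sample mean of min(T, tau), which gives a quick sanity check on the toy data.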
Fast implementations of mathematical operations and performance metrics for multi-objective optimization, including filtering and ranking of dominated vectors according to Pareto optimality, hypervolume metric, C.M. Fonseca, L. Paquete, M. López-Ibáñez (2006) <doi:10.1109/CEC.2006.1688440>, epsilon indicator, inverted generational distance, computation of the empirical attainment function, V.G. da Fonseca, C.M. Fonseca, A.O. Hall (2001) <doi:10.1007/3-540-44719-9_15>, and Vorob'ev threshold, expectation and deviation, M. Binois, D. Ginsbourger, O. Roustant (2015) <doi:10.1016/j.ejor.2014.07.032>, among others.
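As an illustration of the filtering of dominated vectors mentioned above, here is a minimal Python sketch, not the package's implementation. It assumes all objectives are minimized and uses a simple quadratic-time comparison; the names `dominates` and `filter_dominated` are hypothetical.

```python
def dominates(a, b):
    """True if vector a Pareto-dominates vector b under minimization:
    a is no worse in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def filter_dominated(points):
    """Keep only the points that no other point dominates.

    Exact duplicates do not dominate each other, so both copies are kept.
    """
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Toy bi-objective data, invented for illustration only.
    pts = [(1, 5), (2, 3), (3, 4), (4, 1)]
    print(filter_dominated(pts))  # (3, 4) is dominated by (2, 3)
```

This pairwise approach is only meant to show the definition; faster algorithms exist for large point sets.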
An implementation of an S3 class based on a double vector for storing and displaying precision teaching measures, representing a growing or a decaying (multiplicative) change between two frequencies. The main format method allows researchers to display measures (including data.frame) that respect the established conventions in the precision teaching community (i.e., prefixed multiplication or division symbol, displayed number <= 1). Basic multiplication and division methods are allowed and other useful functions are provided for creating, converting or inverting precision teaching measures. For more details, see Pennypacker, Gutierrez and Lindsley (2003, ISBN: 1-881317-13-7).
Like similar profiling tools, the 'proffer' package automatically detects sources of slowness in R code. The distinguishing feature of 'proffer' is its utilization of 'pprof', which supplies interactive visualizations that are efficient and easy to interpret. Behind the scenes, the 'profile' package converts native Rprof() data to a protocol buffer that 'pprof' understands. For the documentation of 'proffer', visit <https://r-prof.github.io/proffer/>. To learn about the implementations and methodologies of 'pprof', protocol buffers, and 'profile', visit <https://github.com/google/pprof>, <https://protobuf.dev>, and <https://github.com/r-prof/profile>, respectively.
This package provides functions for stabilometric signal quantification. The input is a data frame containing the x, y coordinates of the center-of-pressure displacement. References: Jose Magalhaes de Oliveira (2017) <doi:10.3758/s13428-016-0706-4> "Statokinesigram normalization method"; T E Prieto, J B Myklebust, R G Hoffmann, E G Lovett, B M Myklebust (1996) <doi:10.1109/10.532130> "Measures of postural steadiness: Differences between healthy young and elderly adults"; L F Oliveira et al (1996) <doi:10.1088/0967-3334/17/4/008> "Calculation of area of stabilometric signals using principal component analysis".
This package provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class trackeRdata (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi:10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.
Estimates the Vevea and Hedges (1995) weight-function model. By specifying arguments, users can also estimate the modified model described in Vevea and Woods (2005), which may be more practical with small datasets. Users can also specify moderators to estimate a linear model. The package functionality allows users to easily extract the results of these analyses as R objects for other uses. In addition, the package includes a function to launch both models as a Shiny application. Although the Shiny application is also available online, this function allows users to launch it locally if they choose.
Structural equation modeling (SEM) has a long history of representing models graphically as path diagrams. The semPlot package for R fills the gap between advanced, but time-consuming, graphical software and the limited graphics produced automatically by SEM software. In addition, semPlot offers more functionality than drawing path diagrams: it can act as a common ground for importing SEM results into R. Any result usable as input to semPlot can also be represented in any of the three popular SEM frameworks, as well as translated to input syntax for the R packages sem and lavaan.
This package detects significant differentially methylated regions (for both qualitative and quantitative traits), using a scan statistic with underlying Poisson heuristics. The scan statistic depends on a sequence of window sizes (number of CpGs within each window) and on a threshold for each window size. This threshold can be calculated by three different means: i) analytically, using the Siegmund et al. (2012) solution (preferred), ii) importance sampling as suggested by Zhang (2008), and iii) a full MCMC modeling of the data, choosing between a number of different options for modeling the dependency between each CpG.
StabMap performs single cell mosaic data integration by first building a mosaic data topology and then, for each reference dataset, traversing the topology to project and predict data onto a common embedding. Mosaic data should be provided in a list format, with all relevant features included in the data matrices within each list object. The output of stabMap is a joint low-dimensional embedding taking into account all available relevant features. Expression imputation can also be performed using the StabMap embedding and any of the original data matrices for given reference and query cell lists.
The `TrIdent` R package automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from a viral-like particle (VLP)-fraction of a sample to contigs assembled from the metagenome (whole-community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs of origin creating read coverage patterns indicative of ongoing transduction.
Provides functions for communicating with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Elastic Container Service (ECS). The functions have the prefix ecs_ or ec2_ depending on the class of the API. Requests are sent via the REST API, with the parameters given by the function arguments. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and the ECS documentation can be found at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.
This toolkit implements a numerical solution algorithm to invert a quality of life measure from observed data. Unlike the traditional Rosen-Roback measure, this measure accounts for mobility frictions (generated by idiosyncratic tastes and local ties) and trade frictions (generated by trade costs and non-tradable services), thereby reducing non-classical measurement error. The QoL measure is based on Ahlfeldt, Bald, Roth, Seidel (2024) <https://econpapers.repec.org/RePEc:boc:bocode:s459382> "Measuring Quality of Life under Spatial Frictions". When using this programme or the toolkit in your work, please cite the paper.
Concept drift refers to the change in the data distribution or in the relationships between variables over time. drifter calculates distances between variable distributions or variable relations and identifies both types of drift. Key functions are: calculate_covariate_drift(), which checks the distance between corresponding variables in two datasets; calculate_residuals_drift(), which checks the distance between residual distributions for two models; calculate_model_drift(), which checks the distance between partial dependency profiles for two models; and check_drift(), which executes all checks against drift. drifter is a part of the DrWhy.AI universe (Biecek 2018) <arXiv:1806.08915>.
Descriptive analysis is essential for publishing medical articles. This package provides an easy way to conduct the descriptive analysis. 1. Both numeric and factor variables can be handled. For numeric variables, a normality test is applied to choose between the parametric and the nonparametric test. 2. Two or more groups can be handled. For more than two groups, a post hoc test is applied: Tukey for the numeric variables and FDR for the factor variables. 3. The t test, ANOVA or Fisher test can be forced to apply. 4. The mean and standard deviation can be forced to display.
Developed by CDC/ATSDR (Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry), the Social Vulnerability Index (SVI) serves as a tool to assess the resilience of communities by taking into account socioeconomic and demographic factors. Provided with year(s), region(s) and a geographic level of interest, findSVI retrieves the required variables from US census data and calculates SVI for communities in the specified area based on CDC/ATSDR SVI documentation. Reference for the calculation methods: Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B (2011) <doi:10.2202/1547-7355.1792>.