Efficient tools for parsing and standardizing Australian addresses from textual data. It utilizes optimized algorithms to accurately identify and extract components of addresses, such as street names, types, and postcodes, especially for large batched data in contexts where sending addresses to internet services may be slow or inappropriate. The core functionality is built on fast string processing techniques to handle variations in address formats and abbreviations commonly found in Australian address data. Designed for data scientists, urban planners, and logistics analysts, the package facilitates the cleaning and normalization of address information, supporting better data integration and analysis in urban studies, geography, and related fields.
Estimates the relative transmission probabilities between cases in an infectious disease outbreak or cluster using naive Bayes. Included are various functions to use these probabilities to estimate transmission parameters such as the generation/serial interval and reproductive number as well as finding the contribution of covariates to the probabilities and visualizing results. The ideal use is for an infectious disease dataset with metadata on the majority of cases but more informative data such as contact tracing or pathogen whole genome sequencing on only a subset of cases. For a detailed description of the methods see Leavitt et al. (2020) <doi:10.1093/ije/dyaa031>.
The user has the option to utilize the two-dimensional density estimation techniques called smoothed density published by Eilers and Goeman (2004) <doi:10.1093/bioinformatics/btg454>, and pareto density which was evaluated for univariate data by Thrun, Gehlert and Ultsch, 2020 <doi:10.1371/journal.pone.0238835>. Moreover, it provides visualizations of the density estimation in the form of two-dimensional scatter plots in which the points are color-coded based on increasing density. Colors are defined by the one-dimensional clustering technique called 1D distribution cluster algorithm (DDCAL) published by Lux and Rinderle-Ma (2023) <doi:10.1007/s00357-022-09428-6>.
Rust has lots of builtin traits that are implemented for its basic types, such as Add, Not, From or Display. However, when wrapping these types inside your own structs or enums you lose the implementations of these traits and are required to recreate them. This is especially annoying when your own structures are very simple, such as when using the commonly advised newtype pattern (e.g. MyInt(i32)).
This library tries to remove these annoyances and the corresponding boilerplate code. It does this by allowing you to derive lots of commonly used traits for both structs and enums.
Programming vaccine specific Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. Flat model is followed as per Center for Biologics Evaluation and Research (CBER) guidelines for creating vaccine specific domains. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team (2021), <https://www.cdisc.org/standards/foundational/adam/adamig-v1-3-release-package>). The package is an extension package of the admiral package.
In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The crew.aws.batch package extends the mirai-powered crew package with a worker launcher plugin for AWS Batch. Inspiration also comes from packages mirai by Gao (2023) <https://github.com/r-lib/mirai>, future by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, rrq by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, clustermq by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and batchtools by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.
Sampler and post-processing functions for semi-parametric Bayesian infinite factor models, motivated by the Multiplicative Gamma Shrinkage Prior of Bhattacharya and Dunson (2011) <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3419391/>. Contains component C++ functions for building samplers for linear and 2-way interaction factor models using the multiplicative gamma and Dirichlet-Laplace shrinkage priors. The package also contains post-processing functions that take returned matrices from rotational ambiguity to identifiability through successive application of orthogonalization procedures and resolution of column label and sign switching. This package was developed with the support of the National Institute of Environmental Health Sciences grant 1R01ES028804-01.
Ordinal patterns describe the dynamics of a time series by looking at the ranks of subsequent observations. By comparing ordinal patterns of two time series, Schnurr (2014) <doi:10.1007/s00362-013-0536-8> defines a robust and non-parametric dependence measure: the ordinal pattern coefficient. Functions to calculate this and a method to detect a change in the pattern coefficient proposed in Schnurr and Dehling (2017) <doi:10.1080/01621459.2016.1164706> are provided. Furthermore, the package contains a function for calculating the ordinal pattern frequencies. Generalized ordinal patterns as proposed by Schnurr and Fischer (2022) <doi:10.1016/j.csda.2022.107472> are also considered.
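To make the idea of ordinal patterns concrete, here is a minimal Python sketch that extracts rank patterns from sliding windows, tabulates their relative frequencies, and counts how often two series show the same pattern at the same time. The function names (ordinal_pattern, pattern_frequencies, coincidence_rate) are hypothetical and not taken from the package. The last function is a raw coincidence rate only; it is not the standardized ordinal pattern coefficient of Schnurr (2014), and ties are broken by position as a simplifying assumption.

```python
from collections import Counter
from itertools import permutations


def ordinal_pattern(window):
    """Return the indices of the window ordered by value (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_frequencies(series, order=3):
    """Relative frequency of every ordinal pattern of the given order."""
    windows = [series[i:i + order] for i in range(len(series) - order + 1)]
    counts = Counter(ordinal_pattern(w) for w in windows)
    total = len(windows)
    # Report all order! patterns, including those that never occur.
    return {p: counts.get(p, 0) / total for p in permutations(range(order))}


def coincidence_rate(x, y, order=3):
    """Share of windows in which x and y show the same ordinal pattern."""
    n = min(len(x), len(y)) - order + 1
    same = sum(
        ordinal_pattern(x[i:i + order]) == ordinal_pattern(y[i:i + order])
        for i in range(n)
    )
    return same / n


if __name__ == "__main__":
    x = [1, 2, 3, 2, 1]
    y = [2, 4, 6, 5, 3]
    print(pattern_frequencies(x))
    print(coincidence_rate(x, y))
```

A series compared with itself yields a rate of one, while a strictly increasing series compared with a strictly decreasing one yields zero.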
The package provides a consistent way of producing references throughout a project. Enough flexibility is provided to make local changes to a single reference. The user can configure their own setup. The package offers a direct interface to varioref, and name references from the nameref package may be incorporated with ease. For large projects such as a series of books or a multi-volume thesis, written as freestanding documents, a facility is provided to interface to the xr package for external document references.
The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes useful utility functions to plot receiver-operator-characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, and simulate a semi-realistic dataset. Our manuscript can be found at: <doi:10.1371/journal.pdig.0000675>.
Implementation of single-source capture-recapture methods for population size estimation using zero-truncated, zero-one truncated and zero-truncated one-inflated Poisson, Geometric and Negative Binomial regression as well as Zelterman's and Chao's regression. Package includes point and interval estimators for the population size with variances estimated using analytical or bootstrap methods. Details can be found in: van der Heijden et al. (2003) <doi:10.1191/1471082X03st057oa>, Böhning and van der Heijden (2019) <doi:10.1214/18-AOAS1232>, Böhning et al. (2020) Capture-Recapture Methods for the Social and Medical Sciences or Böhning and Friedl (2021) <doi:10.1007/s10260-021-00556-8>.
This package implements functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the spatstat family of packages. Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.
We consider a multiple testing procedure used in many modern applications, the q-value method proposed by Storey and Tibshirani (2003) <doi:10.1073/pnas.1530509100>. The q-value method is based on the false discovery rate (FDR), hence versions of the q-value method can be defined depending on which estimator of the proportion of true null hypotheses, pi0, is plugged into the FDR estimator. We implement the q-value method based on two classical pi0 estimators, and furthermore, we propose and implement three versions of the q-value method for homogeneous discrete uniform P-values based on pi0 estimators which take into account the discrete distribution of the P-values.
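As an illustration of how a pi0 estimate plugs into the q-value computation, the following Python sketch uses the classical fixed-lambda estimator of pi0 and the usual step-up running minimum. It is a sketch for continuous P-values only, not the package's implementation and not one of the discrete versions proposed above; the function names storey_pi0 and qvalues and the default lam=0.5 are assumptions made for this example.

```python
def storey_pi0(pvalues, lam=0.5):
    """Fixed-lambda estimate of the proportion of true null hypotheses."""
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return min(1.0, pi0)  # a proportion cannot exceed one


def qvalues(pvalues, pi0=None, lam=0.5):
    """q-values: pi0 * m * p_(i) / i, made monotone from the largest P-value down."""
    m = len(pvalues)
    if pi0 is None:
        pi0 = storey_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.5]
    print(storey_pi0(p))
    print(qvalues(p, pi0=1.0))
```

With pi0 fixed at one, the result coincides with Benjamini-Hochberg adjusted P-values; a smaller pi0 estimate scales all q-values down proportionally.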
The Readonly module is an effective way to create non-modifiable variables. However, it's relatively slow.
The reason it's slow is that it implements the read-only-ness of variables via tied objects. This mechanism is inherently slow. Perl simply has to do a lot of work under the hood to make tied variables work.
This module corrects the speed problem, at least with respect to scalar variables. When Readonly::XS is installed, Readonly uses it to access the internals of scalar variables. Instead of creating a scalar variable object and tying it, Readonly simply flips the SvREADONLY bit in the scalar's FLAGS structure.
This package provides a two-step Bayesian approach for mode inference following Cross, Hoogerheide, Labonne and van Dijk (2024) <doi:10.1016/j.econlet.2024.111579>. First, a mixture distribution is fitted to the data using a sparse finite mixture (SFM) Markov chain Monte Carlo (MCMC) algorithm. The number of mixture components does not have to be known; the size of the mixture is estimated endogenously through the SFM approach. Second, the modes of the estimated mixture at each MCMC draw are retrieved using algorithms specifically tailored for mode detection. These estimates are then used to construct posterior probabilities for the number of modes, their locations and uncertainties, providing a powerful tool for mode inference.
Sample sizes are often small due to hard-to-reach target populations, rare target events, time constraints, limited budgets, or ethical considerations. Two statistical methods with promising performance in small samples are the nonparametric bootstrap test with pooled resampling method, which is the focus of Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, and informative hypothesis testing, which is implemented in the restriktor package. The npboottprmFBar package uses the nonparametric bootstrap test with pooled resampling method to implement informative hypothesis testing. The bootFbar() function can be used to analyze data with this method and the persimon() function can be used to conduct performance simulations on type-one error and statistical power.
This package contains specialised analyses and visualisation tools for behavior change science. These facilitate conducting determinant studies (for example, using confidence interval-based estimation of relevance, CIBER, or CIBERlite plots, see Crutzen, Noijen & Peters (2017) <doi:10/ghtfz9>), systematically developing, reporting, and analysing interventions (for example, using Acyclic Behavior Change Diagrams), reporting about intervention effectiveness (for example, using the Numbers Needed for Change, see Gruijters & Peters (2017) <doi:10/jzkt>), and computing the required sample size (using the Meaningful Change Definition, see Gruijters & Peters (2020) <doi:10/ghpnx8>). This package is especially useful for researchers in the field of behavior change or health psychology and for behavior change professionals such as intervention developers and prevention workers.
Calculate the colocalization index, NSInC, in two different ways as described in the paper (Liu et al., 2019. Manuscript submitted for publication.) for multiple-species spatial data which contain the precise locations and membership of each spatial point. The two main functions are nsinc.d() and nsinc.z(). They provide Pearson's correlation coefficients of signal proportions in different memberships within a concerned proximity of every signal (or every base signal if single direction colocalization is considered) across all (base) signals using two different ways of normalization. The proximity sizes could be an individual value or a range of values, where the default ranges of values are different for the two functions.
This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. Another feature is the support of a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummies and used with the Bernoulli distribution. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" written by K.M. Schneider (2003) <doi:10.3115/1067807.1067848>. Any issues can be submitted to: <https://github.com/mskogholt/fastNaiveBayes/issues>.
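For readers unfamiliar with the Bernoulli event model, a minimal Python sketch follows. It assumes binary (0/1) features and Laplace smoothing, and it is an illustration of the technique rather than the package's code; the names fit_bernoulli_nb and predict and the laplace parameter are hypothetical.

```python
import math


def fit_bernoulli_nb(X, y, laplace=1.0):
    """Estimate log priors and smoothed per-feature probabilities for each class."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        # P(feature_j = 1 | class c), smoothed so it is never exactly 0 or 1.
        probs = [
            (sum(r[j] for r in rows) + laplace) / (len(rows) + 2.0 * laplace)
            for j in range(n_features)
        ]
        model[c] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest posterior log score for one observation."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p) if v else math.log(1.0 - p) for v, p in zip(x, probs)
        )
    return max(model, key=score)


if __name__ == "__main__":
    X = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y = ["spam", "spam", "ham", "ham"]
    model = fit_bernoulli_nb(X, y)
    print(predict(model, [1, 0]))
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing term keeps unseen feature values from producing a log of zero.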
For a given test market find the best control markets using time series matching and analyze the impact of an intervention. The intervention could be a marketing event or some other local business tactic that is being tested. The workflow implemented in the Market Matching package utilizes dynamic time warping (the dtw package) to do the matching and the CausalImpact package to analyze the causal impact. In fact, this package can be considered a "workflow wrapper" for those two packages. In addition, if you don't have a chosen set of test markets to match, the Market Matching package can provide suggested test/control market pairs and pseudo prospective power analysis (measuring causal impact at fake interventions).
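The matching step can be pictured with a small Python sketch of unconstrained dynamic time warping, followed by ranking candidate control markets by their distance to the test market. This is a plain dynamic-programming illustration, not the dtw package's algorithm or the package's workflow; the names dtw_distance and best_controls, the absolute-difference local cost, and the market labels are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_controls(test_series, candidates, k=1):
    """Return the names of the k candidate markets closest to the test market."""
    ranked = sorted(candidates, key=lambda name: dtw_distance(test_series, candidates[name]))
    return ranked[:k]


if __name__ == "__main__":
    test_market = [1, 2, 3]
    candidates = {"A": [1, 2, 2, 3], "B": [5, 6, 7]}
    print(best_controls(test_market, candidates, k=1))
```

Because warping lets one point align with several points of the other series, a series that merely repeats a value can still match the test market at zero distance.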
Modern native UTF-8 engines such as XeTeX and LuaTeX need hyphenation patterns in UTF-8 format, whereas older systems require hyphenation patterns in the 8-bit encoding of the font in use (such encodings are codified in the LaTeX scheme with names like OT1, T2A, TS1, OML, LY1, etc). The present package offers a collection of conversions of existing patterns to UTF-8 format, together with converters for use with 8-bit fonts in older systems.
This Guix-specific package provides hyphenation patterns for all languages supported in TeX Live. It is a strict superset of the hyphen-base package and should be preferred to it whenever a package would otherwise depend on hyph-utf8.
Extending the base classes and methods of the EnsembleBase package for Penalized-Regression-based (Ridge and Lasso) integration of base learners. Default implementation uses cross-validation error to choose the optimal lambda (shrinkage parameter) for the final predictor. The package takes advantage of the file method provided in the EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in the EnsembleBase package as well as this package.
Selective sweep is a biological phenomenon in which genetic variation between neighboring beneficial mutant alleles is swept away due to the effect of genetic hitchhiking. Detecting selective sweeps is not widely familiar and is laborious. This package is a user-friendly approach for detecting selective sweeps in genomic regions. It uses a Random Forest based machine learning approach to predict selective sweeps from VCF files as input. The inputs of this function, train data and new data, can be computed using the project <https://github.com/AbhikSarkar1999/SweepDiscovery> on GitHub. This package has been developed using the concept of Pavlidis and Alachiotis (2017) <doi:10.1186/s40709-017-0064-0>.