Implementation of single-source capture-recapture methods for population size estimation using zero-truncated, zero-one truncated and zero-truncated one-inflated Poisson, Geometric and Negative Binomial regression as well as Zelterman's and Chao's regression. The package includes point and interval estimators for the population size with variances estimated using analytical or bootstrap methods. Details can be found in: van der Heijden et al. (2003) <doi:10.1191/1471082X03st057oa>, Böhning and van der Heijden (2019) <doi:10.1214/18-AOAS1232>, Böhning et al. (2020) Capture-Recapture Methods for the Social and Medical Sciences or Böhning and Friedl (2021) <doi:10.1007/s10260-021-00556-8>.
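As a rough illustration of the idea behind the Chao and Zelterman estimators named above, the sketch below computes their classical covariate-free forms from capture counts. It is not the package's regression implementation; the function names and the example data are hypothetical, and the sketch assumes at least one unit was seen exactly once and at least one exactly twice.

```python
import math
from collections import Counter


def chao_estimate(captures):
    """Chao's lower-bound estimate: N = n + f1^2 / (2 * f2)."""
    f = Counter(captures)      # f[k] = number of units observed exactly k times
    n = len(captures)          # number of distinct units observed
    return n + f[1] ** 2 / (2 * f[2])


def zelterman_estimate(captures):
    """Zelterman's estimate: N = n / (1 - exp(-lambda)), lambda = 2 * f2 / f1."""
    f = Counter(captures)
    n = len(captures)
    lam = 2 * f[2] / f[1]      # Poisson rate estimated from singletons and doubletons
    return n / (1 - math.exp(-lam))


# Hypothetical data: 60 units seen once, 30 seen twice, 10 seen three times.
counts = [1] * 60 + [2] * 30 + [3] * 10
print(chao_estimate(counts), zelterman_estimate(counts))
```

Both estimators use only the units seen once or twice, which is why they are comparatively robust to misspecification of the count distribution for frequently observed units.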
This package implements functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the spatstat family of packages. Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.
We consider a multiple testing procedure used in many modern applications, namely the q-value method proposed by Storey and Tibshirani (2003) <doi:10.1073/pnas.1530509100>. The q-value method is based on the false discovery rate (FDR), hence versions of the q-value method can be defined depending on which estimator of the proportion of true null hypotheses, pi0, is plugged into the FDR estimator. We implement the q-value method based on two classical pi0 estimators, and furthermore, we propose and implement three versions of the q-value method for homogeneous discrete uniform P-values based on pi0 estimators which take into account the discrete distribution of the P-values.
The Readonly module is an effective way to create non-modifiable variables. However, it's relatively slow.
The reason it's slow is that it implements the read-only-ness of variables via tied objects. This mechanism is inherently slow. Perl simply has to do a lot of work under the hood to make tied variables work.
This module corrects the speed problem, at least with respect to scalar variables. When Readonly::XS is installed, Readonly uses it to access the internals of scalar variables. Instead of creating a scalar variable object and tying it, Readonly simply flips the SvREADONLY bit in the scalar's FLAGS structure.
This package provides a two-step Bayesian approach for mode inference following Cross, Hoogerheide, Labonne and van Dijk (2024) <doi:10.1016/j.econlet.2024.111579>. First, a mixture distribution is fitted on the data using a sparse finite mixture (SFM) Markov chain Monte Carlo (MCMC) algorithm. The number of mixture components does not have to be known; the size of the mixture is estimated endogenously through the SFM approach. Second, the modes of the estimated mixture at each MCMC draw are retrieved using algorithms specifically tailored for mode detection. These estimates are then used to construct posterior probabilities for the number of modes, their locations and uncertainties, providing a powerful tool for mode inference.
Sample sizes are often small due to hard-to-reach target populations, rare target events, time constraints, limited budgets, or ethical considerations. Two statistical methods with promising performance in small samples are the nonparametric bootstrap test with pooled resampling method, which is the focus of Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, and informative hypothesis testing, which is implemented in the restriktor package. The npboottprmFBar package uses the nonparametric bootstrap test with pooled resampling method to implement informative hypothesis testing. The bootFbar() function can be used to analyze data with this method and the persimon() function can be used to conduct performance simulations on type-one error and statistical power.
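To make the pooled resampling idea concrete, here is a minimal sketch of a two-group bootstrap test in which both groups are resampled from the pooled data, so that resamples are drawn under the null hypothesis of no group difference. It uses a Welch t statistic rather than the F-bar statistic of informative hypothesis testing, and the function names and data are hypothetical; it is not the bootFbar() implementation.

```python
import random
import statistics


def welch_t(x, y):
    """Welch t statistic for the difference in means of two samples."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    return (statistics.mean(x) - statistics.mean(y)) / (vx + vy) ** 0.5


def pooled_bootstrap_test(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value, resampling both groups from the pooled data."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    t_obs = welch_t(x, y)
    extreme = valid = 0
    for _ in range(n_boot):
        bx = rng.choices(pooled, k=len(x))   # resample with replacement
        by = rng.choices(pooled, k=len(y))
        try:
            t_star = welch_t(bx, by)
        except ZeroDivisionError:
            continue                         # skip degenerate zero-variance resamples
        valid += 1
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    return t_obs, extreme / valid


# Hypothetical small samples with clearly different means.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0]
group_b = [1.0, 1.2, 0.8, 1.1, 0.9]
t_obs, p_value = pooled_bootstrap_test(group_a, group_b)
print(t_obs, p_value)
```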
This package contains specialised analyses and visualisation tools for behavior change science. These facilitate conducting determinant studies (for example, using confidence interval-based estimation of relevance, CIBER, or CIBERlite plots, see Crutzen, Noijen & Peters (2017) <doi:10/ghtfz9>), systematically developing, reporting, and analysing interventions (for example, using Acyclic Behavior Change Diagrams), reporting about intervention effectiveness (for example, using the Numbers Needed for Change, see Gruijters & Peters (2017) <doi:10/jzkt>), and computing the required sample size (using the Meaningful Change Definition, see Gruijters & Peters (2020) <doi:10/ghpnx8>). This package is especially useful for researchers in the field of behavior change or health psychology and for behavior change professionals such as intervention developers and prevention workers.
Calculate the colocalization index, NSInC, in two different ways as described in the paper (Liu et al., 2019. Manuscript submitted for publication.) for multiple-species spatial data which contain the precise locations and membership of each spatial point. The two main functions are nsinc.d() and nsinc.z(). They provide the Pearson's correlation coefficients of signal proportions in different memberships within a concerned proximity of every signal (or every base signal if single direction colocalization is considered) across all (base) signals using two different ways of normalization. The proximity sizes could be an individual value or a range of values, where the default ranges of values are different for the two functions.
This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features alike. Another feature is the support of a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummies and used with the Bernoulli distribution. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" written by K.M. Schneider (2003) <doi:10.3115/1067807.1067848>. Any issues can be submitted to: <https://github.com/mskogholt/fastNaiveBayes/issues>.
For a given test market, find the best control markets using time series matching and analyze the impact of an intervention. The intervention could be a marketing event or some other local business tactic that is being tested. The workflow implemented in the Market Matching package utilizes dynamic time warping (the dtw package) to do the matching and the CausalImpact package to analyze the causal impact. In fact, this package can be considered a "workflow wrapper" for those two packages. In addition, if you don't have a chosen set of test markets to match, the Market Matching package can provide suggested test/control market pairs and pseudo prospective power analysis (measuring causal impact at fake interventions).
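The matching step can be illustrated with a minimal dynamic time warping sketch. This is the textbook unconstrained recurrence with an absolute-difference cost, not the dtw package's implementation, and the helper names and market data are hypothetical.

```python
def dtw_distance(a, b):
    """Unconstrained dynamic time warping distance between two numeric series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # a[i-1] matched again
                                 cost[i][j - 1],       # b[j-1] matched again
                                 cost[i - 1][j - 1])   # both series advance
    return cost[n][m]


def best_controls(test_series, candidates, k=1):
    """Return the names of the k candidate markets closest to the test market."""
    ranked = sorted(candidates,
                    key=lambda name: dtw_distance(test_series, candidates[name]))
    return ranked[:k]


# Hypothetical weekly sales series for one test market and two candidates.
test_market = [1, 2, 3]
markets = {"A": [1, 2, 2, 3], "B": [10, 10, 10]}
print(best_controls(test_market, markets))
```

Market "A" has the same shape as the test market apart from one repeated value, so warping aligns the two series at zero cost.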
Selective sweep is a biological phenomenon in which genetic variation between neighboring beneficial mutant alleles is swept away due to the effect of genetic hitchhiking. Detection of selective sweeps is not well established and is laborious. This package is a user-friendly approach for detecting selective sweeps in genomic regions. It uses a Random Forest based machine learning approach to predict selective sweeps from VCF files as input. The inputs of this function, the training data and the new data, can be computed using the project <https://github.com/AbhikSarkar1999/SweepDiscovery> on GitHub. This package has been developed by using the concept of Pavlidis and Alachiotis (2017) <doi:10.1186/s40709-017-0064-0>.
This package focuses on the analysis of discrete regions of the genome. It is useful for investigation of one or a few genes using Affymetrix data, since it will extract probe level data using the Affymetrix Power Tools application and wrap these data into a ProbeLevelSet. A ProbeLevelSet directly extends the expressionSet, but includes additional information about the sequence of each probe and the probe set it is derived from. The package includes a number of functions used for plotting these probe level data as a function of location along sequences of mRNA-strands. This can be used for analysis of variable splicing, and is especially well suited for use with exon-array data.
The rtkinenc package is functionally similar to the standard LaTeX package inputenc: both set up active characters so that an input character outside the range of 7-bit visible ASCII is converted into one or more corresponding LaTeX commands. The main difference lies in that rtkinenc allows the user to specify a fallback procedure to use when the text command corresponding to some input character isn't available. Names of commands in rtkinenc have been selected so that it can read inputenc encoding definition files, and the aim is that rtkinenc should be backwards compatible with inputenc. rtkinenc is not a new version of inputenc though, nor is it part of standard LaTeX.
Metadynamics is a state-of-the-art biomolecular simulation technique. The Plumed program (Tribello, G.A. et al. (2014) <doi:10.1016/j.cpc.2013.09.018>) makes it possible to perform metadynamics using various simulation codes. The results of metadynamics done in Plumed can be analyzed by metadynminer. The package metadynminer reads 1D and 2D metadynamics hills files from the Plumed package. As an addendum, metadynminer3d is used to visualize 3D hills. It uses a fast algorithm by Hosek, P. and Spiwok, V. (2016) <doi:10.1016/j.cpc.2015.08.037> to calculate a free energy surface from hills. Minima can be located and plotted on the free energy surface. Free energy surfaces and minima can be plotted to produce publication quality images.
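The relation between hills and the free energy surface can be sketched in one dimension: the bias potential is the sum of the deposited Gaussian hills, and for standard (not well-tempered) metadynamics the free energy estimate is its negative. The code below uses direct summation on a grid, not the fast algorithm of Hosek and Spiwok, and the hill values and function names are hypothetical.

```python
import math


def free_energy_1d(hills, grid):
    """Negative sum of Gaussian hills (center, width, height) on each grid point."""
    fes = []
    for s in grid:
        bias = sum(h * math.exp(-((s - c) ** 2) / (2 * w ** 2))
                   for c, w, h in hills)
        fes.append(-bias)
    return fes


def local_minima(grid, fes):
    """Grid points whose free energy is strictly below both neighbours."""
    return [grid[i] for i in range(1, len(fes) - 1)
            if fes[i] < fes[i - 1] and fes[i] < fes[i + 1]]


# Hypothetical hills: (center, width, height) in arbitrary units.
hills = [(-1.0, 0.3, 1.0), (1.0, 0.3, 2.0)]
grid = [i / 10 for i in range(-30, 31)]
fes = free_energy_1d(hills, grid)
print(local_minima(grid, fes))
```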
This package creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, fullRankMatrix can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>.
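A simplified way to see what "making a matrix full rank" involves is a greedy Gram-Schmidt pass that keeps a column only if it is not in the span of the columns already kept. Note that this sketch simply drops zero, duplicated and dependent columns; it does not merge, replace or rename columns as described above, and the function name and example matrix are hypothetical.

```python
def independent_columns(matrix, tol=1e-9):
    """Indices of a maximal set of linearly independent columns, left to right."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    basis = []   # orthonormal vectors spanning the columns kept so far
    kept = []
    for j in range(n_cols):
        v = [float(matrix[i][j]) for i in range(n_rows)]
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]   # remove component along b
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol:                                 # column adds a new direction
            basis.append([x / norm for x in v])
            kept.append(j)
    return kept


# Hypothetical one-hot design: intercept, level a, level b, all-zero column, copy of a.
design = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
]
kept = independent_columns(design)
print(kept)
```

Here the column for level b equals the intercept minus level a, so only the first two columns survive.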
Skeletal myoblasts undergo a well-characterized sequence of morphological and transcriptional changes during differentiation. In this experiment, primary human skeletal muscle myoblasts (HSMM) were expanded under high mitogen conditions (GM) and then differentiated by switching to low-mitogen media (DM). RNA-Seq libraries were sequenced from each of several hundred cells taken over a time-course of serum-induced differentiation. Between 49 and 77 cells were captured at each of four time points (0, 24, 48, 72 hours) following serum switch using the Fluidigm C1 microfluidic system. RNA from each cell was isolated and used to construct mRNA-Seq libraries, which were then sequenced to a depth of ~4 million reads per library, resulting in a complete gene expression profile for each cell.
Plots a range of information related to the interpretation of Correspondence Analysis results. It provides facilities to plot the contribution of row and column categories to the principal dimensions, the quality of point display on selected dimensions, the correlation of row and column categories to selected dimensions, etc. It also allows assessing which dimensions are important for interpreting the data structure by means of different statistics and tests. The package also offers the facility to plot the permuted distribution of the table total inertia as well as of the inertia accounted for by pairs of selected dimensions. Further facilities produce interpretation-oriented scatterplots. Reference: Alberti 2015 <doi:10.1016/j.softx.2015.07.001>.
This package implements network analysis and graph theory measures used in neuroscience, cognitive science, and psychology. Methods include various filtering methods and approaches such as threshold, dependency (Kenett, Tumminello, Madi, Gur-Gershgoren, Mantegna, & Ben-Jacob, 2010 <doi:10.1371/journal.pone.0015032>), Information Filtering Networks (Barfuss, Massara, Di Matteo, & Aste, 2016 <doi:10.1103/PhysRevE.94.062306>), and Efficiency-Cost Optimization (Fallani, Latora, & Chavez, 2017 <doi:10.1371/journal.pcbi.1005305>). Brain methods include the recently developed Connectome Predictive Modeling (see references in package). Also implements several network measures including local network characteristics (e.g., centrality), community-level network characteristics (e.g., community centrality), global network characteristics (e.g., clustering coefficient), and various other measures associated with the reliability and reproducibility of network analysis.
Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a Chi-square statistic. Implemented are the tests Eliasziw (1991) <doi:10.1002/sim.4780101211>, Obuchowski (1998) <doi:10.1002/(SICI)1097-0258(19980715)17:13%3C1495::AID-SIM863%3E3.0.CO;2-I>, Durkalski (2003) <doi:10.1002/sim.1438>, and Yang (2010) <doi:10.1002/bimj.201000035> with McNemar (1947) <doi:10.1007/BF02295996> included for comparison. The utility functions nested.to.contingency and paired.to.contingency convert data between various useful formats. Thyroids and psychiatry are the canonical datasets from Obuchowski and Petryshen (1989) <doi:10.1016/0165-1781(89)90196-0> respectively.
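For reference, the McNemar comparison test depends only on the two discordant cell counts of the paired 2x2 table. The sketch below computes the uncorrected statistic and its p-value on one degree of freedom; it ignores clustering entirely, which is what the other listed tests adjust for. The function name and the counts are hypothetical and this is not the clust.bin.pair interface.

```python
import math


def mcnemar(b, c):
    """Uncorrected McNemar chi-square statistic and p-value (1 degree of freedom).

    b and c are the discordant pair counts: positive on one test only.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical discordant counts.
chi2, p_value = mcnemar(15, 5)
print(chi2, p_value)
```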
This package contains methods to generate and evaluate semi-artificial data sets. Based on a given data set, different methods learn data properties using machine learning algorithms and generate new data with the same properties. The package currently includes the following data generators: i) an RBF network based generator using rbfDDA() from package RSNNS, ii) a Random Forest based generator for both classification and regression problems, iii) a density forest based generator for unsupervised data. Data evaluation support tools include: a) single attribute based statistical evaluation: mean, median, standard deviation, skewness, kurtosis, medcouple, L/RMC, KS test, Hellinger distance; b) evaluation based on clustering using Adjusted Rand Index (ARI) and FM; c) evaluation based on classification performance with various learning models, e.g., random forests.
Org-remark lets you highlight and annotate text files, websites, EPUB books and Info documentation using Org mode.
Features:
Highlight and annotate any text file. The highlights and notes are kept in an Org file as the plain text database. This lets you easily manage your marginal notes and use the built-in Org facilities on them – e.g. create a sparse tree based on the category of the notes
Create your own highlighter pens with different colors and types (e.g. underline, squiggle, etc.), optionally with Org’s category for search and filter on your highlights and notes
Have the same highlighting and annotating functionality for websites (when browsing with EWW), EPUB books with nov.el, and Info documentation
The Ata method (Yapar et al. (2019) <doi:10.15672/hujms.461032>), an alternative to exponential smoothing (described in Yapar (2016) <doi:10.15672/HJMS.201614320580>, Yapar et al. (2017) <doi:10.15672/HJMS.2017.493>), is a new univariate time series forecasting method which provides innovative solutions to issues faced during the initialization and optimization stages of existing forecasting methods. Forecasting performance of the Ata method is superior to existing methods both in terms of easy implementation and accurate forecasting. It can be applied to non-seasonal or seasonal time series which can be decomposed into four components (remainder, level, trend and seasonal). This methodology performed well on the M3 and M4-competition data. This package was written based on Ali Sabri Taylan's PhD dissertation.
Following Arroyo-Maté-Roque (2006), the function calculates the distance between rows or columns of the dataset using the generalized Minkowski metric as described by Ichino-Yaguchi (1994). The distance measure gives more weight to differences between quartiles than to differences between extremes, making it less sensitive to outliers. Further, the function calculates the silhouette width (Rousseeuw 1987) for different numbers of clusters and selects the number of clusters that maximizes the average silhouette width, unless a specific number of clusters is provided by the user. The approach implemented in this package is based on the following publications: Rousseeuw (1987) <doi:10.1016/0377-0427(87)90125-7>; Ichino-Yaguchi (1994) <doi:10.1109/21.286391>; Arroyo-Maté-Roque (2006) <doi:10.1007/3-540-34416-0_7>.
Analysis of Surface Plasmon Resonance (SPR) and Biolayer Interferometry data, with automations for high-throughput SPR. This version of the package fits the 1:1 binding model, with and without bulkshift. It offers optional local or global Rmax fitting. The user must provide a sample sheet and a Carterra output file in Carterra's current format. There is a utility function to convert from Carterra's old output format. The user may run a custom pipeline or use the provided Runscript, which will produce a pdf file containing fitted Rmax, ka, kd and standard errors, a plot of the sensorgram and fits, and a plot of residuals. The script will also produce a .csv file with all of the relevant parameters for each spot on the SPR chip.
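For orientation, the standard 1:1 (Langmuir) binding model without bulk shift predicts an exponential approach to equilibrium during association and an exponential decay during dissociation. The sketch below evaluates that textbook response curve; it does not perform any fitting, it is not the package's code, and the function name, argument names and parameter values are hypothetical.

```python
import math


def sensorgram_1to1(t, conc, ka, kd, rmax, t_assoc_end):
    """Response of the 1:1 binding model at time t (no bulk shift).

    conc: analyte concentration (M); ka: association rate (1/(M*s));
    kd: dissociation rate (1/s); rmax: maximal response;
    t_assoc_end: time at which analyte injection stops.
    """
    kobs = ka * conc + kd                 # observed association rate
    r_eq = rmax * ka * conc / kobs        # equilibrium response at this concentration
    if t <= t_assoc_end:
        return r_eq * (1.0 - math.exp(-kobs * t))
    r_end = r_eq * (1.0 - math.exp(-kobs * t_assoc_end))
    return r_end * math.exp(-kd * (t - t_assoc_end))


# Hypothetical parameters: KD = kd / ka = 10 nM, analyte injected at 10 nM.
params = dict(conc=1e-8, ka=1e5, kd=1e-3, rmax=100.0, t_assoc_end=1e5)
print(sensorgram_1to1(1e5, **params))
```

With the analyte concentration equal to KD, the equilibrium response is half of Rmax, which gives a quick sanity check on fitted values.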