The Sequence of Physical Processes (SPP) framework is a way of interpreting the transient data derived from oscillatory rheological tests. It is designed to allow both the linear and non-linear deformation regimes to be understood within a single unified framework. This code provides a convenient way to determine the SPP framework metrics for a given sample of oscillatory data. It will produce a text file containing the SPP metrics, which the user can then plot using their software of choice. It can also produce a second text file with additional derived data (components of tangent, normal, and binormal vectors), as well as pre-plotted figures if so desired. It is the R version of the Package SPP by Simon Rogers Group for Soft Matter (Simon A. Rogers, Brian M. Erwin, Dimitris Vlassopoulos, Michel Cloitre (2011) <doi:10.1122/1.3544591>).
This package implements the EM algorithm with a one-step gradient descent method to estimate the parameters of the Block-Basu bivariate Pareto distribution with location and scale. It also provides parametric bootstrap confidence intervals and asymptotic confidence intervals based on the observed Fisher information for the scale and shape parameters, and exact confidence intervals for the location parameters. Details are in Biplab Paul and Arabin Kumar Dey (2023) <doi:10.48550/arXiv.1608.02199> "An EM algorithm for absolutely continuous Marshall-Olkin bivariate Pareto distribution with location and scale"; E L Lehmann and George Casella (1998) <doi:10.1007/b98854> "Theory of Point Estimation"; Bradley Efron and R J Tibshirani (1994) <doi:10.1201/9780429246593> "An Introduction to the Bootstrap"; A P Dempster, N M Laird and D B Rubin (1977) <www.jstor.org/stable/2984875> "Maximum Likelihood from Incomplete Data via the EM Algorithm".
The recovery of visual sensitivity in a dark environment is known as dark adaptation. In a clinical or research setting the recovery is typically measured after a dazzling flash of light and can be described by the Mahroo, Lamb and Pugh (MLP) model of dark adaptation. The functions in this package take dark adaptation data and use nonlinear regression to find the parameters of the model that best describe the data. They do this by first generating rapid initial objective estimates of the dark adaptation parameters and then using a multi-start algorithm to reduce the possibility of converging to a local minimum. There is also a bootstrap method to calculate parameter confidence intervals. The functions rely upon a 'dark' list or object. This object is created as the first step in the workflow, and parts of the object are updated as it is processed.
This package provides a collection of functions developed to support the tutorial on using Exploratory Structural Equation Modeling (ESEM) (Asparouhov & Muthén, 2009) <https://www.statmodel.com/download/EFACFA810.pdf> with the Longitudinal Study of Australian Children (LSAC) dataset (Mohal et al., 2023) <doi:10.26193/QR4L6Q>. The package uses 'tidyverse', 'psych', 'lavaan', and 'semPlot' and provides additional functions to conduct ESEM. These include general functions to complete ESEM, such as esem_c(), creation of the target matrix (if one is used) with make_target(), and generation of the Confirmatory Factor Analysis (CFA) model syntax with esem_cfa_syntax(). The package includes sample data from the Strengths and Difficulties Questionnaire of the Longitudinal Study of Australian Children (SDQ LSAC) in sdq_lsac(). The package vignette presents the tutorial demonstrating the use of ESEM on the SDQ LSAC data.
Inference concerning equilibrium and random mating in autopolyploids. Methods are available to test for equilibrium and random mating at any even ploidy level (>2) in the presence of double reduction at biallelic loci. For autopolyploid populations in equilibrium, methods are available to estimate the degree of double reduction. We also provide functions to calculate genotype frequencies at equilibrium, or after one or several rounds of random mating, given rates of double reduction. The main function is hwefit(). This material is based upon work supported by the National Science Foundation under Grant No. 2132247. The opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation. For details of these methods, see Gerard (2022a) <doi:10.1111/biom.13722> and Gerard (2022b) <doi:10.1101/2022.08.11.503635>.
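To make the random-mating calculation concrete, here is a minimal Python sketch of one round of random mating at a biallelic locus for an even ploidy level. It is not the package's hwefit() code and it assumes no double reduction, so gametes are drawn hypergeometrically from the parental genotype; the function names gamete_freqs and random_mating are hypothetical.

```python
from math import comb

def gamete_freqs(genotype_freqs):
    """Gamete dosage frequencies from genotype dosage frequencies.

    genotype_freqs[l] is the frequency of individuals carrying l copies of
    the reference allele (l = 0..K, with K the even ploidy).
    Assumption: no double reduction, so a gamete samples K/2 of the K
    parental allele copies without replacement (hypergeometric).
    """
    ploidy = len(genotype_freqs) - 1
    half = ploidy // 2
    freqs = [0.0] * (half + 1)
    for dosage, g in enumerate(genotype_freqs):
        for j in range(half + 1):
            weight = comb(dosage, j) * comb(ploidy - dosage, half - j)
            freqs[j] += g * weight / comb(ploidy, half)
    return freqs

def random_mating(genotype_freqs):
    """Offspring genotype frequencies after one round of random mating.

    Two gametes unite at random, so offspring dosage frequencies are the
    discrete convolution of the gamete frequencies with themselves.
    """
    p = gamete_freqs(genotype_freqs)
    offspring = [0.0] * (2 * (len(p) - 1) + 1)
    for i, p_i in enumerate(p):
        for j, p_j in enumerate(p):
            offspring[i + j] += p_i * p_j
    return offspring

# Tetraploid population made up entirely of duplex (dosage 2) individuals.
next_generation = random_mating([0.0, 0.0, 1.0, 0.0, 0.0])
```

Applying random_mating repeatedly gives the frequencies after several rounds; handling double reduction would require replacing the hypergeometric gamete step with the appropriate segregation probabilities.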
This package provides functions to conduct a model-agnostic asymptotic hypothesis test for the identification of interaction effects in black-box machine learning models. The null hypothesis assumes that a given set of covariates does not contribute to interaction effects in the prediction model. The test statistic is based on the difference of variances of partial dependence functions (Friedman (2008) <doi:10.1214/07-AOAS148> and Welchowski (2022) <doi:10.1007/s13253-021-00479-7>) with respect to the original black-box predictions and the predictions under the null hypothesis. The hypothesis test can be applied to any black-box prediction model, and the null hypothesis of the test can be flexibly specified according to the research question of interest. Furthermore, the test is computationally fast to apply as the null distribution does not require resampling or refitting black-box prediction models.
This package provides functions for generating pseudo-random numbers that follow a uniform distribution on [0,1]. Randomness tests were conducted using the National Institute of Standards and Technology test suite <https://csrc.nist.gov/pubs/sp/800/22/r1/upd1/final>, along with additional tests. The sequence generated depends on the initial values and parameters. The package includes a linear congruence map as the decision map and three chaotic maps to generate the pseudo-random sequence, which follows a uniform distribution. Other distributions can be generated from the uniform distribution using the Inversion Principle Method and the Box-Muller transformation. Small perturbations in seed values result in entirely different sequences of numbers due to the sensitive nature of the maps being used. The chaotic nature of the maps helps achieve randomness in the generator. Additionally, the generator is capable of producing random bits.
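The step from uniform numbers to other distributions can be illustrated with a short Python sketch. It is not the package's generator: the uniform source here is a plain linear congruential generator with widely used textbook constants, standing in for the package's chaotic maps, and the function names are hypothetical. Only the inversion and Box-Muller steps correspond to the techniques named above.

```python
import math

def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Yield n values in [0, 1) from a linear congruential generator.

    Assumption: a stand-in uniform source, not the package's chaotic maps.
    """
    state = seed
    for _ in range(n):
        state = (a * state + c) % m
        yield state / m

def exponential_by_inversion(u, rate=1.0):
    """Inversion method: apply the inverse exponential CDF to a uniform u."""
    return -math.log(1.0 - u) / rate

def box_muller(u1, u2):
    """Box-Muller: turn two independent uniforms into two standard normals."""
    radius = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 avoids log(0)
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

uniforms = list(lcg(seed=12345, n=10000))
exponentials = [exponential_by_inversion(u) for u in uniforms]
normals = [z
           for i in range(0, len(uniforms), 2)
           for z in box_muller(uniforms[i], uniforms[i + 1])]
```

Any uniform source on [0,1) can be substituted for lcg() without changing the two transformations.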
The strength of evidence provided by epidemiological and observational studies is inherently limited by the potential for unmeasured confounding. We focus on three key quantities: the observed bound of the confidence interval closest to the null, the relationship between an unmeasured confounder and the outcome, for example a plausible residual effect size for an unmeasured continuous or binary confounder, and the relationship between an unmeasured confounder and the exposure, for example a realistic mean difference or prevalence difference for this hypothetical confounder between exposure groups. Building on the methods put forth by Cornfield et al. (1959), Bross (1966), Schlesselman (1978), Rosenbaum & Rubin (1983), Lin et al. (1998), Lash et al. (2009), Rosenbaum (1986), Cinelli & Hazlett (2020), VanderWeele & Ding (2017), and Ding & VanderWeele (2016), we can use these quantities to assess how an unmeasured confounder may tip our result to insignificance.
Mixture Nested Effects Models (mnem) is an extension of Nested Effects Models and allows for the analysis of single cell perturbation data provided by methods like Perturb-Seq (Dixit et al., 2016) or Crop-Seq (Datlinger et al., 2017). In those experiments each of many cells is perturbed by a knock-down of a specific gene, i.e. several cells are perturbed by a knock-down of gene A, several by a knock-down of gene B, and so forth. The observed read-out has to be multi-trait and, in the case of Perturb-/Crop-Seq, consists of gene expression profiles for each cell. mnem uses a mixture model to simultaneously cluster the cell population into k clusters and infer k networks causally linking the perturbed genes for each cluster. The mixture components are inferred via an expectation maximization algorithm.
This package provides the probability density function (PDF), cumulative distribution function (CDF), the first-order and second-order partial derivatives of the PDF, and a fitting function for the diffusion decision model (DDM; e.g., Ratcliff & McKoon, 2008, <doi:10.1162/neco.2008.12-06-420>) with across-trial variability in the drift rate. Because the PDF, its partial derivatives, and the CDF of the DDM all contain an infinite sum, they need to be approximated. fddm implements all published approximations (Navarro & Fuss, 2009, <doi:10.1016/j.jmp.2009.02.003>; Gondan, Blurton, & Kesselmeier, 2014, <doi:10.1016/j.jmp.2014.05.002>; Blurton, Kesselmeier, & Gondan, 2017, <doi:10.1016/j.jmp.2016.11.003>; Hartmann & Klauer, 2021, <doi:10.1016/j.jmp.2021.102550>) plus new approximations. All approximations are implemented purely in C++, providing faster speed than existing packages.
A variety of latent Markov models, including hidden Markov models, hidden semi-Markov models, state-space models and continuous-time variants, can be formulated and estimated within the same framework by directly maximising the likelihood function using the so-called forward algorithm. Applied researchers often need custom models that standard software does not easily support. Writing tailored R code offers flexibility but suffers from slow estimation speeds. We address these issues by providing easy-to-use functions (written in C++ for speed) for common tasks like the forward algorithm. These functions can be combined into custom models in a Lego-type approach, offering up to 10-20 times faster estimation via standard numerical optimisers. To aid in building fully custom likelihood functions, several vignettes are included that show how to simulate data from and estimate all the above model classes.
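For readers unfamiliar with the forward algorithm, the following Python sketch shows the scaled forward recursion for the log-likelihood of a discrete-time hidden Markov model. It is an illustration only, not the package's C++ implementation; the function name and argument layout are hypothetical.

```python
import math

def hmm_log_likelihood(delta, gamma, emission_probs):
    """Log-likelihood of an HMM via the scaled forward algorithm.

    delta: initial state distribution (length N)
    gamma: N x N transition matrix, rows summing to 1
    emission_probs: one length-N list per time point, holding the density
        of the observation at that time under each state
    """
    n = len(delta)
    # Initialise: forward variables for the first observation.
    alpha = [delta[i] * emission_probs[0][i] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    # Recursion: propagate through gamma, weight by emission, rescale.
    for probs in emission_probs[1:]:
        alpha = [sum(alpha[i] * gamma[i][j] for i in range(n)) * probs[j]
                 for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)  # rescaling prevents numerical underflow
        alpha = [a / scale for a in alpha]
    return log_lik

# Two states, two observations (emission densities already evaluated).
example = hmm_log_likelihood(
    delta=[0.5, 0.5],
    gamma=[[0.9, 0.1], [0.2, 0.8]],
    emission_probs=[[0.5, 0.1], [0.4, 0.3]],
)
```

A function like this can be handed to a numerical optimiser once the parameters are mapped to unconstrained working values, which is the direct-maximisation approach described above.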
This package provides functions to fit point process models using the Palm likelihood. First proposed by Tanaka, Ogata, and Stoyan (2008) <DOI:10.1002/bimj.200610339>, maximisation of the Palm likelihood can provide computationally efficient parameter estimation for point process models in situations where the full likelihood is intractable. This package is chiefly focused on Neyman-Scott point processes, but can also fit the void processes proposed by Jones-Todd et al. (2019) <DOI:10.1002/sim.8046>. The development of this package was motivated by the analysis of capture-recapture surveys on which individuals cannot be identified---the data from which can conceptually be seen as a clustered point process (Stevenson, Borchers, and Fewster, 2019 <DOI:10.1111/biom.12983>). As such, some of the functions in this package are specifically for the estimation of cetacean density from two-camera aerial surveys.
This package contains functions for statistical data analysis based on spatially-clustered techniques. The package allows estimating the spatially-clustered spatial regression models presented in Cerqueti, Maranzano & Mattera (2024), "Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe", arXiv preprint 2407.15874 <doi:10.48550/arXiv.2407.15874>. Specifically, the current release allows the estimation of the spatially-clustered linear regression model (SCLM), the spatially-clustered spatial autoregressive model (SCSAR), the spatially-clustered spatial Durbin model (SCSEM), and the spatially-clustered linear regression model with spatially-lagged exogenous covariates (SCSLX). From release 0.0.2, the library contains functions to estimate spatial clustering based on Adjacent Matrix K-Means (AMKM) as described in Zhou, Liu & Zhu (2019), "Weighted adjacent matrix for K-means clustering", Multimedia Tools and Applications, 78 (23) <doi:10.1007/s11042-019-08009-x>.
Flexibly implements Integral Projection Models using a mathematical(ish) syntax. This package will not help with the vital rate modeling process, but will help convert those regression models into an IPM. ipmr handles density dependence and environmental stochasticity, with a couple of options for implementing the latter. In addition, provides functions to avoid unintentional eviction of individuals from models. Additionally, provides model diagnostic tools, plotting functionality, stochastic/deterministic simulations, and analysis tools. Integral projection models are described in depth by Easterling et al. (2000) <doi:10.1890/0012-9658(2000)081[0694:SSSAAN]2.0.CO;2>, Merow et al. (2013) <doi:10.1111/2041-210X.12146>, Rees et al. (2014) <doi:10.1111/1365-2656.12178>, and Metcalf et al. (2015) <doi:10.1111/2041-210X.12405>. Williams et al. (2012) <doi:10.1890/11-2147.1> discuss the problem of unintentional eviction.
Institutional performance assessment remains a key challenge to a multitude of stakeholders. Existing indicators such as h-type indicators, g-type indicators, and many others do not reflect expertise of institutions that defines their research portfolio. The package offers functionality to compute and visualise two novel indices: the x-index and the xd-index. The x-index evaluates an institution's scholarly expertise within a specific discipline or field, while the xd-index provides a broader assessment of overall scholarly expertise considering an institution's publication pattern and strengths across coarse thematic areas. These indices offer a nuanced understanding of institutional research capabilities, aiding stakeholders in research management and resource allocation decisions. Lathabai, H.H., Nandy, A., and Singh, V.K. (2021) <doi:10.1007/s11192-021-04188-3>. Nandy, A., Lathabai, H.H., and Singh, V.K. (2023) <doi:10.5281/zenodo.8305585>.
Implementation of a transfer learning framework employing distribution mapping based domain transfer. Uses the renowned concept of histogram matching (see Gonzalez and Fittes (1977) <doi:10.1016/0094-114X(77)90062-3>, Gonzalez and Woods (2008) <isbn:9780131687288>) and extends it to include distribution measures like kernel density estimates (KDE; see Wand and Jones (1995) <isbn:978-0-412-55270-0>, Jones et al. (1996) <doi:10.2307/2291420>). In the typical application scenario, one can use the underlying sample distributions (histogram or KDE) to generate a map between two distinct but related domains to transfer the target data to the source domain and utilize the available source data for better predictive modeling design. Suitable for the case where a one-to-one sample matching is not possible, thus one needs to transform the underlying data distribution to utilize the more available data for modeling.
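The core idea of distribution matching can be sketched in a few lines of Python. This is not the package's implementation: it uses plain empirical CDFs rather than histograms or KDEs, and the function name match_distribution is hypothetical. Each target value is sent to the source quantile that sits at the same cumulative probability level.

```python
from bisect import bisect_right

def match_distribution(target_values, source_values):
    """Map target values onto the source distribution by quantile matching.

    For each target value, compute its empirical CDF level within the target
    sample, then return the source order statistic at that same level
    (the inverse empirical CDF of the source sample).
    """
    sorted_target = sorted(target_values)
    sorted_source = sorted(source_values)
    n, m = len(sorted_target), len(sorted_source)
    mapped = []
    for value in target_values:
        rank = bisect_right(sorted_target, value)  # rank / n is the CDF level
        # Smallest source index whose CDF level reaches rank / n,
        # computed with integer arithmetic: ceil(rank * m / n) - 1.
        index = (rank * m + n - 1) // n - 1
        mapped.append(sorted_source[index])
    return mapped

transferred = match_distribution([5, 1, 3], [100, 200, 300, 400, 500, 600])
```

Because only ranks are used, no one-to-one pairing between target and source samples is needed, and the two samples may differ in size.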
This is a small, lightweight package that lets users investigate the distribution of genotypes in genotype-by-sequencing (GBS) data where they expect (by and large) Hardy-Weinberg equilibrium, in order to assess rates of genotyping errors and the dependence of those rates on read depth. It implements a Markov chain Monte Carlo (MCMC) sampler using Rcpp to compute a Bayesian estimate of what we call the heterozygote miscall rate for restriction-associated digest (RAD) sequencing data and other types of reduced representation GBS data. It also provides functions to generate plots of expected and observed genotype frequencies. Some background on these topics can be found in a recent paper "Recent advances in conservation and population genomics data analysis" by Hendricks et al. (2018) <doi:10.1111/eva.12659>, and another paper describing the MCMC approach is in preparation with Gordon Luikart and Thierry Gosselin.
Hypothesis tests and sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance <doi:10.1080/01621459.2018.1543600>, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data in metric spaces, e.g., shape, directional, compositional and symmetric positive definite matrix data. The ball divergence and ball covariance based distribution-free tests are implemented to detect distribution difference and association in metric spaces <doi:10.18637/jss.v097.i06>. Furthermore, several generic non-parametric feature selection procedures based on ball correlation, BCor-SIS and all of its variants, are implemented to tackle the challenge in the context of ultra high dimensional data. A fast implementation for large-scale multiple K-sample testing with ball divergence <doi:10.1002/gepi.22423> is supported, which is particularly helpful for genome-wide association studies.
Multidimensional scaling (MDS) functions for various tasks that are beyond the beta stage and way past the alpha stage. Currently, options are available for weights, restrictions, classical scaling or principal coordinate analysis, transformations (linear, power, Box-Cox, spline, ordinal), outlier mitigation (rdop), out-of-sample estimation (predict), negative dissimilarities, fast and faster executions with low memory footprints, penalized restrictions, cross-validation-based penalty selection, supplementary variable estimation (explain), additive constant estimation, mixed measurement level distance calculation, restricted classical scaling, etc. More will come in the future. References. Busing (2024) "A Simple Population Size Estimator for Local Minima Applied to Multidimensional Scaling". Manuscript submitted for publication. Busing (2025) "Node Localization by Multidimensional Scaling with Iterative Majorization". Manuscript submitted for publication. Busing (2025) "Faster Multidimensional Scaling". Manuscript in preparation. Barroso and Busing (2025) "e-RDOP, Relative Density-Based Outlier Probabilities, Extended to Proximity Mapping". Manuscript submitted for publication.
This package provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://www.stat.washington.edu/people/pdhoff/Code/hoff_raftery_handcock_2002_jasa/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
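As a rough illustration of how latent positions drive tie probabilities, the Python sketch below uses a logistic link on an intercept minus the Euclidean distance between two nodes. This specific form is an assumption in the spirit of the distance model of Hoff, Raftery and Handcock (2002), not necessarily the exact equation used by the package; the function name and the alpha parameter are hypothetical.

```python
import math

def tie_probability(z_i, z_j, alpha=0.0):
    """Probability of a tie between two nodes with latent positions z_i, z_j.

    Assumed form: logit(p) = alpha - ||z_i - z_j||, so the probability
    decreases as the latent distance between the nodes grows.
    """
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))

near = tie_probability((0.0, 0.0), (0.1, 0.0), alpha=1.0)
far = tie_probability((0.0, 0.0), (3.0, 4.0), alpha=1.0)
```

In a full model, the likelihood of the observed graph multiplies such probabilities (or their complements) over all node pairs, and the MCMC sampler explores the positions and the intercept jointly.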
Mapping, spatial analysis, and statistical modeling of microdata from sources such as the Demographic and Health Surveys <https://www.dhsprogram.com/> and Integrated Public Use Microdata Series <https://www.ipums.org/>. It can also be extended to other datasets. The package supports spatial correlation index construction and visualization, along with empirical Bayes approximation of regression coefficients in a multistage setup. The main functionality is repeated regression; for example, if we have to run regression for n groups, the group ID should be vertically composed into the variable for the parameter `location_var`. It can perform various kinds of regression, such as Generalized Regression Models, logit, probit, and more. Additionally, it can incorporate interaction effects. The key benefit of the package is its ability to store the regression results performed repeatedly on a dataset by the group ID, along with respective p-values and map those estimates.
Entropy weighted k-means (ewkm) by Liping Jing, Michael K. Ng and Joshua Zhexue Huang (2007) <doi:10.1109/TKDE.2007.1048> is a weighted subspace clustering algorithm that is well suited to very high dimensional data. Weights are calculated as the importance of a variable with regard to cluster membership. The two-level variable weighting clustering algorithm tw-k-means (twkm) by Xiaojun Chen, Xiaofei Xu, Joshua Zhexue Huang and Yunming Ye (2013) <doi:10.1109/TKDE.2011.262> introduces two types of weights, the weights on individual variables and the weights on variable groups, and they are calculated during the clustering process. The feature group weighted k-means (fgkm) by Xiaojun Chen, Yunming Ye, Xiaofei Xu and Joshua Zhexue Huang (2012) <doi:10.1016/j.patcog.2011.06.004> extends this concept by grouping features and weighting the group in addition to weighting individual features.
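To give a feel for how per-cluster variable weights can be derived, the Python sketch below computes softmax-style weights from within-cluster dispersions, so that variables on which a cluster is tight receive larger weights. The exponential form and the gamma parameter are assumptions about how an entropy-regularised weighting typically looks, not a transcription of the package's code; the function name is hypothetical.

```python
import math

def entropy_weights(dispersions, gamma=1.0):
    """Variable weights for one cluster from per-variable dispersions.

    dispersions[j]: within-cluster sum of squared deviations on variable j.
    Assumed update: weight_j proportional to exp(-dispersion_j / gamma).
    Larger gamma spreads weight more evenly across variables.
    """
    smallest = min(dispersions)
    # Subtracting the minimum leaves the weights unchanged but avoids
    # underflow when dispersions are large.
    scores = [math.exp(-(d - smallest) / gamma) for d in dispersions]
    total = sum(scores)
    return [s / total for s in scores]

weights = entropy_weights([0.5, 2.0, 8.0], gamma=1.0)
```

In a full algorithm this step would alternate with the usual k-means assignment and centroid updates, with distances computed using the current weights.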
This package implements the Single Transferable Vote (STV) electoral system, with clear explanatory graphics. The core function stv() uses Meek's method, the purest expression of the simple principles of STV, but which requires electronic counting. It can handle votes expressing equal preferences for subsets of the candidates. A function stv.wig() implementing the Weighted Inclusive Gregory method, as used in Scottish council elections, is also provided, with the same options, as described in the manual. The required vote data format is an R list: a function pref.data() is provided to transform some commonly used data formats into this format. References for methodology: Hill, Wichmann and Woodall (1987) <doi:10.1093/comjnl/30.3.277>, Hill, David (2006) <https://www.votingmatters.org.uk/ISSUE22/I22P2.pdf>, Mollison, Denis (2023) <arXiv:2303.15310> (see also the package manual pref_pkg_manual.pdf).
Managing postgraduate programmes involves extracting information from Lattes CVs. This information can be used for strategic planning and self-evaluation, as well as for producing reports on the Sucupira Platform. Summary reports are produced for each period and course (specialisation, master's and doctorate), showing bibliographic production with and without student participation, as well as papers at events, technical or technological production, ongoing and completed supervision, research projects, exchanges (visiting professor, postdoctoral or short-term leave), awards and general activity indicators. Based on this information, a detailed report is then drawn up for each lecturer, taking into account their participation in exam boards, their research project contributions, their technical collaborations (e.g. advisory committee, editorial board) and the subjects they teach. For more details see Pagliosa and Nascimento (2021) <https://repositorio.ufsc.br/bitstream/handle/123456789/231602/ManualLattesGeociencias11_2021_versaobeta%20%281%29.pdf?sequence=1&isAllowed=y>.