This package provides a dynamic programming algorithm for optimal clustering of multidimensional data under a sequential constraint. The algorithm minimizes the sum of squares of within-cluster distances. The sequential constraint allows only consecutive items of the input data to form a cluster. Such a constraint is typically required when clustering data streams or items with timestamps, such as video frames, GPS signals of a vehicle, movement data of a person, e-pen data, etc. The algorithm extends Ckmeans.1d.dp to multidimensional spaces. As in the one-dimensional case, the algorithm guarantees optimality and repeatability of the clustering. Method clustering.sc.dp() can find the optimal clustering if the number of clusters is known; otherwise, methods findwithinss.sc.dp() and backtracking.sc.dp() can be used. See Szkaliczki, T. (2016) "clustering.sc.dp: Optimal Clustering with Sequential Constraint by Using Dynamic Programming" <doi:10.32614/RJ-2016-022> for more information.
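To make the idea concrete, here is a minimal base-R sketch of the dynamic-programming recurrence for the one-dimensional case; the package itself generalizes this to multiple dimensions, and the helper name seq_cluster and the toy data below are hypothetical, not the package's internal code.

    # Optimal segmentation of a sequence into k clusters of consecutive items,
    # minimizing within-cluster sum of squares (assumes 2 <= k <= length(x)).
    seq_cluster <- function(x, k) {
      n <- length(x)
      wss <- function(v) sum((v - mean(v))^2)       # within-cluster sum of squares
      D <- matrix(Inf, nrow = k, ncol = n)          # D[m, i]: best cost of x[1..i] in m clusters
      B <- matrix(NA_integer_, nrow = k, ncol = n)  # B[m, i]: start index of the last cluster
      for (i in 1:n) { D[1, i] <- wss(x[1:i]); B[1, i] <- 1L }
      for (m in 2:k) {
        for (i in m:n) {
          for (j in m:i) {                          # last cluster covers x[j..i]
            cost <- D[m - 1, j - 1] + wss(x[j:i])
            if (cost < D[m, i]) { D[m, i] <- cost; B[m, i] <- j }
          }
        }
      }
      starts <- integer(k); i <- n                  # backtrack the cluster boundaries
      for (m in k:1) { starts[m] <- B[m, i]; i <- B[m, i] - 1 }
      list(withinss = D[k, n], starts = starts)
    }
    seq_cluster(c(1, 1.2, 0.9, 5, 5.1, 9.8, 10), k = 3)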
The Percentage of Importance Indice (Percentage_I.I.) is based on the magnitudes, frequencies, and distributions of occurrence of an event (DEMOLIN-LEITE, 2021) <http://cjascience.com/index.php/CJAS/article/view/1009/1350>. This index can detect the key loss sources (L.S.) and solution sources (S.S.), classifying them according to their importance in terms of loss or income gain in the production system. The index for a given source is Percentage_I.I. = [(ks1 x c1 x ds1) / SUM((ks1 x c1 x ds1) + (ks2 x c2 x ds2) + ... + (ksn x cn x dsn))] x 100. The key source (ks) term is obtained using simple regression analysis and magnitude (abundance), constancy (c) is the sum of occurrences of the L.S. or S.S. over the samples (absence = 0, presence = 1), and the distribution source (ds) is obtained using a chi-square test. This index has derivations, i.e., i) loss estimates and solution effectiveness and ii) attention and non-attention levels (DEMOLIN-LEITE, 2024) <DOI: 10.1590/1519-6984.253215>.
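For illustration, a short R sketch of the index computation follows; the ks, constancy, and ds values are made-up numbers, not estimates obtained by the cited regression or chi-square procedures.

    # Hypothetical values for three loss sources: key-source coefficients (ks),
    # constancy (count of samples where the source occurred), and
    # distribution-source values (ds).
    ks        <- c(0.8, 0.5, 0.2)
    constancy <- c(10, 6, 3)
    ds        <- c(0.9, 0.7, 0.4)
    products  <- ks * constancy * ds
    percentage_ii <- products / sum(products) * 100   # one percentage per source
    round(percentage_ii, 1)                           # percentages sum to 100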
Power analysis for regression models that test the interaction of two or three independent variables on a single dependent variable. Includes options for correlated interacting variables and for specifying variable reliability. Two-way interactions can include continuous, binary, or ordinal variables. Power analyses can be done either analytically or via simulation. Includes tools for simulating single data sets and visualizing power analysis results. The primary functions are power_interaction_r2() and power_interaction() for two-way interactions, and power_interaction_3way_r2() for three-way interactions. The function run_pos_power_search() provides a stability analysis for two-way interactions. Please cite as: Baranger DAA, Finsaas MC, Goldstein BL, Vize CE, Lynam DR, Olino TM (2023). "Tutorial: Power analyses for interaction effects in cross-sectional regressions." <doi:10.1177/25152459231187531>. If you use the stability analyses, please cite: Castillo A, Miller JD, Vize C, Baranger DAA, Lynam DR. "When Do Interaction/Moderation Effects Stabilize in Linear Regression?" <doi:10.1177/25152459251407860>.
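As a rough illustration of what a simulation-based power analysis for a two-way interaction involves (not the package's API; the effect sizes, correlation, and sample size below are made up), one can repeatedly simulate data and count how often the interaction term is significant.

    set.seed(1)
    sim_once <- function(N, r12, b1, b2, b3) {
      x1 <- rnorm(N)
      x2 <- r12 * x1 + sqrt(1 - r12^2) * rnorm(N)    # interacting variables with correlation r12
      y  <- b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(N)
      summary(lm(y ~ x1 * x2))$coefficients["x1:x2", "Pr(>|t|)"]
    }
    pvals <- replicate(1000, sim_once(N = 350, r12 = 0.3, b1 = 0.3, b2 = 0.3, b3 = 0.15))
    mean(pvals < 0.05)   # estimated power for the interaction effect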
This simple daemon feeds entropy from the CPU Jitter RNG core to the Linux kernel's entropy estimator. This prevents the /dev/random device from blocking and should benefit users of the preferred /dev/urandom and getrandom() interfaces too.
The CPU Jitter RNG itself is part of the kernel and claims to provide good entropy by collecting and magnifying differences in CPU execution time as measured by the high-resolution timer built into modern CPUs. It requires no additional hardware or external entropy source.
The random bit stream generated by jitterentropy-rngd is not processed by a cryptographically secure whitening function. Nonetheless, its authors believe it to be a suitable source of cryptographically secure key material or other cryptographically sensitive data.
If you agree with them, start this daemon as early as possible to provide properly seeded random numbers to services like SSH or those using TLS during early boot when entropy may be low, especially in virtualised environments.
Cleans and formats language transcripts guided by a series of transformation options (e.g., lemmatize words, omit stopwords, split strings across rows). SemanticDistance computes two distinct metrics of cosine semantic distance (experiential and embedding). These values reflect pairwise cosine distance between different elements or chunks of a language sample. SemanticDistance can process monologues (e.g., stories, ordered text), dialogues (e.g., conversation transcripts), word pairs arrayed in columns, and unordered word lists. Users specify options for how they wish to chunk distance calculations. These options include: rolling ngram-to-word distance (a window of n words compared to each new word), ngram-to-ngram distance (2-word chunk to the next 2-word chunk), pairwise distance between words arrayed in columns, matrix comparisons (i.e., all possible pairwise distances between words in an unordered list), and turn-by-turn distance (talker to talker in a dialogue transcript). SemanticDistance includes visualization options for analyzing distances as time series data and simple semantic network dynamics (e.g., clustering, undirected graph network).
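To illustrate the kind of computation involved (not SemanticDistance's API; the random toy vectors below stand in for real experiential or embedding vectors), a rolling ngram-to-word cosine distance over a short word sequence can be sketched as follows.

    cos_dist <- function(a, b) 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
    # Toy 3-dimensional "embeddings" for a 6-word sample (rows = words).
    set.seed(2)
    vecs <- matrix(rnorm(18), nrow = 6,
                   dimnames = list(paste0("w", 1:6), NULL))
    window <- 2                                   # rolling window of n preceding words
    rolling <- sapply((window + 1):nrow(vecs), function(i) {
      context <- colMeans(vecs[(i - window):(i - 1), , drop = FALSE])  # average of prior n words
      cos_dist(context, vecs[i, ])
    })
    rolling   # one distance per incoming word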
Data collected on movement behavior is often in the form of time-stamped latitude/longitude coordinates sampled from the underlying movement behavior. These data can be compressed into a set of segments via the Top-Down Time Ratio Segmentation method described in Meratnia and de By (2004) <doi:10.1007/978-3-540-24741-8_44>, which, with some loss of information, can both reduce the size of the data and provide corrective smoothing to help reduce the impact of measurement error. The method improves on the well-known Douglas-Peucker segmentation algorithm, which operates on perpendicular distances alone: Top-Down Time Ratio segmentation allows for disparate sampling time intervals by calculating the distance between locations and segments with respect to time. Provided a trajectory with timestamps, tdtr() returns a set of straight-line segments that can represent the full trajectory. McCool, Lugtig, and Schouten (2022) <doi:10.1007/s11116-022-10328-2> describe this method as implemented here in more detail.
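A minimal sketch of the time-ratio distance that drives this segmentation, assuming planar coordinates for simplicity (the package itself works on latitude/longitude, and the helper name time_ratio_dist is hypothetical): each observed point is compared against where the candidate segment predicts it should be at its timestamp.

    # Time-ratio distance of an observed point (p, t) from the straight-line
    # segment joining (p_start, t_start) and (p_end, t_end); p_* are c(x, y).
    time_ratio_dist <- function(p, t, p_start, t_start, p_end, t_end) {
      ratio <- (t - t_start) / (t_end - t_start)        # fraction of the segment's time span elapsed
      p_expected <- p_start + ratio * (p_end - p_start) # interpolated position at time t
      sqrt(sum((p - p_expected)^2))                     # deviation of observed from expected
    }
    time_ratio_dist(p = c(2.0, 1.1), t = 5,
                    p_start = c(0, 0), t_start = 0,
                    p_end   = c(4, 2), t_end   = 10)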
This package provides functions to estimate the optimal threshold of diagnostic markers or treatment selection markers. The optimal threshold is the marker value that maximizes the utility of the marker-based strategy (for diagnosis or treatment selection) in a given population. The utility function depends on the type of marker (diagnostic or treatment selection), but always takes into account the preferences of the patients or the physician in the decision process. For estimating the optimal threshold, one must specify the distributions of the marker in the different groups (defined according to the type of marker, diagnostic or treatment selection) and provide data to estimate the parameters of these distributions. One must also provide some features of the target population (disease prevalence or treatment efficacies) as well as the preferences of patients or physicians. The functions rely on Bayesian inference, which helps produce several indicators derived from the optimal threshold. See Blangero, Y, Rabilloud, M, Ecochard, R, and Subtil, F (2019) <doi:10.1177/0962280218821394> for the original article that describes the estimation method for treatment selection markers and Subtil, F, and Rabilloud, M (2019) <doi:10.1002/bimj.200900242> for diagnostic markers.
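To make the idea concrete, here is a simplified sketch (not the package's Bayesian estimator; the marker distributions, prevalence, and utilities are all made-up numbers) of scanning a grid of thresholds for the value that maximizes expected utility of a diagnostic marker.

    prev <- 0.2                                        # assumed disease prevalence
    se <- function(c) 1 - pnorm(c, mean = 2, sd = 1)   # sensitivity, diseased marker ~ N(2, 1)
    sp <- function(c) pnorm(c, mean = 0, sd = 1)       # specificity, non-diseased marker ~ N(0, 1)
    U <- c(TP = 0.8, FN = 0.0, TN = 1.0, FP = 0.7)     # utilities of the four outcomes
    expected_utility <- function(c) {
      prev * (se(c) * U["TP"] + (1 - se(c)) * U["FN"]) +
        (1 - prev) * (sp(c) * U["TN"] + (1 - sp(c)) * U["FP"])
    }
    grid <- seq(-2, 4, by = 0.01)
    grid[which.max(sapply(grid, expected_utility))]    # utility-maximizing threshold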
An implementation of metaheuristic algorithms for continuous optimization. Currently, the package contains the implementations of 21 algorithms, as follows: particle swarm optimization (Kennedy and Eberhart, 1995), ant lion optimizer (Mirjalili, 2015 <doi:10.1016/j.advengsoft.2015.01.010>), grey wolf optimizer (Mirjalili et al., 2014 <doi:10.1016/j.advengsoft.2013.12.007>), dragonfly algorithm (Mirjalili, 2015 <doi:10.1007/s00521-015-1920-1>), firefly algorithm (Yang, 2009 <doi:10.1007/978-3-642-04944-6_14>), genetic algorithm (Holland, 1992, ISBN:978-0262581110), grasshopper optimisation algorithm (Saremi et al., 2017 <doi:10.1016/j.advengsoft.2017.01.004>), harmony search algorithm (Mahdavi et al., 2007 <doi:10.1016/j.amc.2006.11.033>), moth flame optimizer (Mirjalili, 2015 <doi:10.1016/j.knosys.2015.07.006>), sine cosine algorithm (Mirjalili, 2016 <doi:10.1016/j.knosys.2015.12.022>), whale optimization algorithm (Mirjalili and Lewis, 2016 <doi:10.1016/j.advengsoft.2016.01.008>), clonal selection algorithm (Castro, 2002 <doi:10.1109/TEVC.2002.1011539>), differential evolution (Das & Suganthan, 2011), shuffled frog leaping (Eusuff, Lansey & Pasha, 2006), cat swarm optimization (Chu et al., 2006), artificial bee colony algorithm (Karaboga & Akay, 2009), krill-herd algorithm (Gandomi & Alavi, 2012), cuckoo search (Yang & Deb, 2009), bat algorithm (Yang, 2012), gravitational based search (Rashedi et al., 2009) and black hole optimization (Hatamlou, 2013).
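As one concrete example of this family of methods, here is a bare-bones particle swarm optimization sketch in base R on a toy objective; it is illustrative only and is not the package's implementation or interface.

    # Minimal particle swarm optimization of the sphere function in 2 dimensions.
    set.seed(42)
    f <- function(x) sum(x^2)
    n_particles <- 20; n_iter <- 100; w <- 0.7; c1 <- 1.5; c2 <- 1.5
    pos <- matrix(runif(n_particles * 2, -5, 5), ncol = 2)   # particle positions
    vel <- matrix(0, n_particles, 2)                         # particle velocities
    pbest <- pos; pbest_val <- apply(pos, 1, f)              # personal bests
    gbest <- pbest[which.min(pbest_val), ]                   # global best
    for (it in 1:n_iter) {
      r1 <- runif(n_particles); r2 <- runif(n_particles)
      vel <- w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * sweep(-pos, 2, gbest, "+")
      pos <- pos + vel
      vals <- apply(pos, 1, f)
      improved <- vals < pbest_val
      pbest[improved, ] <- pos[improved, ]; pbest_val[improved] <- vals[improved]
      gbest <- pbest[which.min(pbest_val), ]
    }
    gbest   # should be close to c(0, 0)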
Generate continuous (normal or non-normal), binary, ordinal, and count (Poisson or Negative Binomial) variables with a specified correlation matrix. It can also produce a single continuous variable. This package can be used to simulate data sets that mimic real-world situations (i.e., clinical or genetic data sets, plasmodes). All variables are generated from standard normal variables with an imposed intermediate correlation matrix. Continuous variables are simulated by specifying mean, variance, skewness, standardized kurtosis, and fifth and sixth standardized cumulants using either Fleishman's third-order (<DOI:10.1007/BF02293811>) or Headrick's fifth-order (<DOI:10.1016/S0167-9473(02)00072-5>) polynomial transformation. Binary and ordinal variables are simulated using a modification of the ordsample() function from 'GenOrd'. Count variables are simulated using the inverse CDF method. There are two simulation pathways which differ primarily in how the intermediate correlation matrix is calculated. In Correlation Method 1, the intercorrelations involving count variables are determined using a simulation-based, logarithmic correlation correction (adapting Yahav and Shmueli's 2012 method, <DOI:10.1002/asmb.901>). In Correlation Method 2, the count variables are treated as ordinal (adapting Barbiero and Ferrari's 2015 modification of 'GenOrd', <DOI:10.1002/asmb.2072>). There is an optional error loop that corrects the final correlation matrix to be within a user-specified precision of the target matrix. The package also includes functions to calculate standardized cumulants for theoretical distributions or from real data sets, check whether a target correlation matrix is within the possible correlation bounds (given the distributions of the simulated variables), summarize results (numerically or graphically), verify valid power method pdfs, and calculate lower standardized kurtosis bounds.
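A tiny standalone sketch of the inverse CDF step for count variables (base R only, not the package's internal code): a standard normal variable is mapped through pnorm() to a uniform and then through the target count quantile function; the lambda, size, and mu values are arbitrary.

    set.seed(7)
    z <- rnorm(1000)                      # standard normal draws (possibly with imposed correlations)
    u <- pnorm(z)                         # probability integral transform to Uniform(0, 1)
    pois_counts <- qpois(u, lambda = 3)            # Poisson counts via the inverse CDF
    nb_counts   <- qnbinom(u, size = 5, mu = 3)    # Negative Binomial counts the same way
    table(pois_counts)[1:5]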
Automatically runs 18 individual models and 14 ensembles on numeric data, for a total of 32 models. The package automatically returns complete results on all 32 models, 25 charts, and six tables. The user simply provides the tidy data and answers a few questions (for example, how many times to resample the data). From there the package randomly splits the data into train, test, and validation sets as the user requests (for example, train = 0.60, test = 0.20, validation = 0.20), fits each of the models on the training data, makes predictions on the test and validation sets, measures root mean squared error (RMSE), removes features above a user-set level of Variance Inflation Factor, and has several optional features, including scaling all numeric data and four different ways to handle strings in the data. Perhaps the most significant feature is the package's ability to make predictions using the 32 pre-trained models on totally new (untrained) data if the user selects that feature. This feature alone represents a very effective solution to the issue of reproducibility of models in data science. The package can also randomly resample the data as many times as the user sets, thus giving more accurate results than a single run. The graphs provide many results that are not typically found. For example, the package automatically calculates the Kolmogorov-Smirnov test for each of the 32 models and plots a bar chart of the results, plots a bias bar chart for each of the 32 models, and produces several plots for exploratory data analysis (such as automatic histograms of the numeric data). The package also automatically creates a summary report that can be both sorted and searched for each of the 32 models, including RMSE, bias, train RMSE, test RMSE, validation RMSE, overfitting and duration. The best results on the holdout data typically beat the best results in data science competitions and published results for the same data set.
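For readers unfamiliar with the workflow the description refers to, here is a generic base-R sketch of a random train/test/validation split and an RMSE computation on toy data; nothing in it is the package's code, and the split proportions simply mirror the example above.

    set.seed(123)
    df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
    idx <- sample(c("train", "test", "validation"), nrow(df),
                  replace = TRUE, prob = c(0.60, 0.20, 0.20))
    fit  <- lm(y ~ ., data = df[idx == "train", ])           # fit on the training split only
    rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
    rmse(df$y[idx == "test"],       predict(fit, df[idx == "test", ]))        # test RMSE
    rmse(df$y[idx == "validation"], predict(fit, df[idx == "validation", ]))  # validation RMSE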
Ruby bindings.
Failover for ActiveRecord and Redis
Documentation at https://melpa.org/#/ruby-refactor
Shim library for Module#ruby2_keywords
Documentation at https://melpa.org/#/org-re-reveal
Regression testing for the command line, and library API
Rubygems-task provides Rake tasks for managing and releasing Ruby Gems.
This package provides a modest JavaScript framework for the HTML you already have.
Platform Design Info for Affymetrix RUSGene-1_0-st.
Platform Design Info for Affymetrix RUSGene-1_1-st.
Enhances the R Optimization Infrastructure ('ROI') package with the optimx package.
OpenCL 2.0 compatible language runtime, supporting offline and in-process/in-memory compilation.
Sorbet's runtime type checking component. Sorbet is a powerful type checker for Ruby.
Platform Design Info for Affymetrix RheGene-1_0-st.