In social and educational settings, the use of Artificial Intelligence (AI) is challenging: relevant data is often available only in handwritten form, or its use is restricted by privacy policies, which often leads to small data sets. Furthermore, data in the educational and social sciences is often unbalanced in terms of class frequencies. To support educators as well as educational and social researchers in applying AI to their work, this package provides a unified interface to neural nets in 'PyTorch' for dealing with natural language problems. In addition, the package ships with a 'shiny' app that provides a graphical user interface, allowing people without Python/R scripting skills to use AI. The tools integrate existing mathematical and statistical methods for dealing with small data sets via pseudo-labeling (e.g. Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and with imbalanced data via the creation of synthetic cases (e.g. Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>). Performance evaluation of AI is connected to measures from content analysis with which educational and social researchers are generally more familiar (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>, Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019) <doi:10.4135/9781071878781>). Estimation of energy consumption and CO2 emissions during model training is done with the Python library 'codecarbon'. Finally, all objects created with this package allow trained AI models to be shared with other people.
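The pseudo-labeling idea cited above can be illustrated independently of this package's API. The following is a minimal, generic sketch in base R (the simulated data, the 0.9 confidence threshold, and the single retraining round are illustrative assumptions, not this package's implementation):

    # Generic pseudo-labeling sketch: train on labeled data, adopt confident
    # predictions on unlabeled data as labels, then retrain.
    set.seed(1)
    labeled   <- data.frame(x = rnorm(40), y = rbinom(40, 1, 0.5))
    unlabeled <- data.frame(x = rnorm(200))

    fit <- glm(y ~ x, family = binomial, data = labeled)
    p   <- predict(fit, newdata = unlabeled, type = "response")

    confident <- p > 0.9 | p < 0.1            # keep only confident predictions
    pseudo    <- data.frame(x = unlabeled$x[confident],
                            y = as.integer(p[confident] > 0.5))

    refit <- glm(y ~ x, family = binomial, data = rbind(labeled, pseudo))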
This package provides functions for evaluating and testing asset pricing models, including estimation and testing of factor risk premia, selection of "strong" risk factors (factors having nonzero population correlation with test asset returns), heteroskedasticity and autocorrelation robust covariance matrix estimation, and testing for model misspecification and identification. The functions for estimating and testing factor risk premia implement the Fama-MacBeth (1973) <doi:10.1086/260061> two-pass approach, the misspecification-robust approaches of Kan-Robotti-Shanken (2013) <doi:10.1111/jofi.12035>, and the approaches based on tradable factor risk premia of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683>. The functions for selecting the "strong" risk factors are based on the Oracle estimator of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683> and the factor screening procedure of Gospodinov-Kan-Robotti (2014) <doi:10.2139/ssrn.2579821>. The functions for evaluating model misspecification implement the HJ model misspecification distance of Kan-Robotti (2008) <doi:10.1016/j.jempfin.2008.03.003>, which is a modification of the prominent Hansen-Jagannathan (1997) <doi:10.1111/j.1540-6261.1997.tb04813.x> distance. The functions for testing model identification specialize the Kleibergen-Paap (2006) <doi:10.1016/j.jeconom.2005.02.011> and the Chen-Fang (2019) <doi:10.3982/QE1139> rank test to the regression coefficient matrix of test asset returns on risk factors. Finally, the function for heteroskedasticity and autocorrelation robust covariance estimation implements the Newey-West (1994) <doi:10.2307/2297912> covariance estimator.
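As background for the two-pass approach mentioned above, the textbook Fama-MacBeth estimator can be summarized as follows (a standard summary, not this package's notation; a second-pass intercept is omitted for brevity):

    First pass (time-series regression, one per asset i):
        R_{i,t} = a_i + \beta_i^\top f_t + \varepsilon_{i,t}
    Second pass (cross-sectional regression of average returns on the estimated betas):
        \bar{R}_i = \beta_i^\top \lambda + \alpha_i, \qquad
        \hat{\lambda} = (\hat{\beta}^\top \hat{\beta})^{-1} \hat{\beta}^\top \bar{R}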
This tool is extended from methods in the Python module Bio.SeqUtils.MeltingTemp. The melting temperature of nucleic acid sequences can be calculated by three methods: the Wallace rule (Thein & Wallace (1986) <doi:10.1016/S0140-6736(86)90739-7>); empirical formulas based on G and C content (Marmur J. (1962) <doi:10.1016/S0022-2836(62)80066-7>, Schildkraut C. (2010) <doi:10.1002/bip.360030207>, Wetmur J G (1991) <doi:10.3109/10409239109114069>, Untergasser A. (2012) <doi:10.1093/nar/gks596>, von Ahsen N (2001) <doi:10.1093/clinchem/47.11.1956>); and nearest neighbor thermodynamics (Breslauer K J (1986) <doi:10.1073/pnas.83.11.3746>, Sugimoto N (1996) <doi:10.1093/nar/24.22.4501>, Allawi H (1998) <doi:10.1093/nar/26.11.2694>, SantaLucia J (2004) <doi:10.1146/annurev.biophys.32.110601.141800>, Freier S (1986) <doi:10.1073/pnas.83.24.9373>, Xia T (1998) <doi:10.1021/bi9809425>, Chen JL (2012) <doi:10.1021/bi3002709>, Bommarito S (2000) <doi:10.1093/nar/28.9.1929>, Turner D H (2010) <doi:10.1093/nar/gkp892>, Sugimoto N (1995) <doi:10.1016/S0048-9697(98)00088-6>, Allawi H T (1997) <doi:10.1021/bi962590c>, SantaLucia N (2005) <doi:10.1093/nar/gki918>). The result can also be corrected for salt ions and chemical compounds (SantaLucia J (1996) <doi:10.1021/bi951907q>, SantaLucia J (1998) <doi:10.1073/pnas.95.4.1460>, Owczarzy R (2004) <doi:10.1021/bi034621r>, Owczarzy R (2008) <doi:10.1021/bi702363u>).
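For intuition, the Wallace rule mentioned above estimates Tm = 2 * (#A + #T) + 4 * (#G + #C) degrees Celsius for short oligonucleotides. A minimal base-R sketch, independent of this package's API (the example sequence is an arbitrary assumption):

    # Wallace rule: Tm = 2*(A+T) + 4*(G+C), in degrees Celsius,
    # intended for short oligonucleotides (roughly 14-20 nt).
    tm_wallace <- function(seq) {
      bases  <- strsplit(toupper(seq), "")[[1]]
      counts <- table(factor(bases, levels = c("A", "T", "G", "C")))
      2 * (counts[["A"]] + counts[["T"]]) + 4 * (counts[["G"]] + counts[["C"]])
    }
    tm_wallace("ACGTACGTGGCC")  # returns 40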
Intended to be used by the United States Copyright Office Product Management Division Business Analysts. Includes algorithms for the United States Copyright Office Product Management Division SR Audit Data dataset. The algorithm takes in the SR Audit Data Excel file and reformats the spreadsheet so that the values and variables fit the format of the online database. Support functions in this package include clean_str(), which cleans instances of the variable AUDIT_LOG; clean_data_to_excel(), which cleans and outputs the reorganized SR Audit Data dataset in Excel format; clean_data_to_dataframe(), which cleans and stores the reorganized SR Audit Data dataset in a data frame; format_from_excel(), which reads in the Excel file output by the clean_data_to_excel() function and formats and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys; format_from_dataframe(), which reads in the data frame output by the clean_data_to_dataframe() function and formats and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys; and support_function(), which takes in the dictionary output by either the format_from_dataframe() or format_from_excel() function and returns the data as a data frame formatted according to the original U.S. Copyright Office SR Audit Data online database. The main function of this package is clean_format_all(), which takes in an Excel file and returns the formatted data in a new Excel file and text file according to the format of the U.S. Copyright Office SR Audit Data online database.
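A plausible end-to-end call, based only on the function names above (the input file name and the assumption that each function takes a file path or the previous step's output are illustrative; consult the package manual for the actual signatures):

    # Hypothetical usage sketch of the main entry point described above.
    # "SR_Audit_Data.xlsx" is a placeholder input file name.
    clean_format_all("SR_Audit_Data.xlsx")

    # Or step by step: clean to a data frame, convert to the FIELD-keyed
    # dictionary, then format it like the online database.
    df   <- clean_data_to_dataframe("SR_Audit_Data.xlsx")
    dict <- format_from_dataframe(df)
    out  <- support_function(dict)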
Quickly score raw data output by an Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) <doi:10.1037/0022-3514.74.6.1464>. IAT scores are calculated as specified by Greenwald, Nosek, and Banaji (2003) <doi:10.1037/0022-3514.85.2.197>. The output of this function is a data frame that consists of four rows containing the following information: (1) the overall IAT effect size for the participant's dataset, (2) the effect size calculated for odd trials only, (3) the effect size calculated for even trials only, and (4) the proportion of trials with reaction times under 300 ms (which is important for exclusion purposes). Items (2) and (3) allow for a measure of the internal consistency of the IAT. Specifically, you can use the subsetted IAT effect sizes for odd and even trials to calculate Cronbach's alpha across participants in the sample. The scoring function takes three arguments. First, indicate the name of the dataset to be analyzed. This is the only required input. Second, indicate the number of trials in your entire IAT (the default is set to 220, which is typical for most IATs). Last, indicate whether congruent trials (e.g., flowers and pleasant) or incongruent trials (e.g., guns and pleasant) were presented first for this participant (the default is set to congruent). Data files should consist of six columns organized in order as follows: block (0-6), trial (0-19 for training blocks, 0-39 for test blocks), category (dependent on your IAT), the type of item within that category (dependent on your IAT), a dummy variable indicating whether the participant was correct or incorrect on that trial (0 = correct, 1 = incorrect), and the participant's reaction time (in milliseconds). A sample dataset (titled 'sampledata') is included in this package to practice with.
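The six-column layout described above can be mocked up as follows (a sketch; the column names and the two example trials are invented for illustration, only the column order and coding follow the description):

    # Toy data frame matching the documented column order:
    # block, trial, category, item, error (0 = correct, 1 = incorrect), rt (ms).
    toy <- data.frame(
      block    = c(0, 3),
      trial    = c(0, 12),
      category = c("flowers", "guns"),
      item     = c("rose", "pistol"),
      error    = c(0, 1),
      rt       = c(642, 518)
    )
    str(toy)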
Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
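For example, a minimal hhh4() fit might look as follows (a sketch assuming the 'measlesWeserEms' example data shipped with the package; the seasonal endemic term and the negative binomial family are illustrative choices, see the package documentation for details):

    library(surveillance)
    data("measlesWeserEms")  # an example "sts" time-series object

    # Endemic-epidemic count model with a yearly seasonal endemic term
    # and a negative binomial observation model.
    fit <- hhh4(measlesWeserEms,
                control = list(end = list(f = addSeason2formula(~1, period = 52)),
                               family = "NegBin1"))
    summary(fit)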
Algorithms for checking the accuracy of a clustering result with known classes, computing cluster validity indices, and generating plots for comparing them. The package is compatible with K-means, fuzzy C means, EM clustering, and hierarchical clustering (single, average, and complete linkage). The details of the indices in this package can be found in: J. C. Bezdek, M. Moshtaghi, T. Runkler, C. Leckie (2016) <doi:10.1109/TFUZZ.2016.2540063>, T. Calinski, J. Harabasz (1974) <doi:10.1080/03610927408827101>, C. H. Chou, M. C. Su, E. Lai (2004) <doi:10.1007/s10044-004-0218-1>, D. L. Davies, D. W. Bouldin (1979) <doi:10.1109/TPAMI.1979.4766909>, J. C. Dunn (1973) <doi:10.1080/01969727308546046>, F. Haouas, Z. Ben Dhiaf, A. Hammouda, B. Solaiman (2017) <doi:10.1109/FUZZ-IEEE.2017.8015651>, M. Kim, R. S. Ramakrishna (2005) <doi:10.1016/j.patrec.2005.04.007>, S. H. Kwon (1998) <doi:10.1049/EL:19981523>, S. H. Kwon, J. Kim, S. H. Son (2021) <doi:10.1049/ell2.12249>, G. W. Milligan (1980) <doi:10.1007/BF02293907>, M. K. Pakhira, S. Bandyopadhyay, U. Maulik (2004) <doi:10.1016/j.patcog.2003.06.005>, M. Popescu, J. C. Bezdek, T. C. Havens, J. M. Keller (2013) <doi:10.1109/TSMCB.2012.2205679>, S. Saitta, B. Raphael, I. Smith (2007) <doi:10.1007/978-3-540-73499-4_14>, A. Starczewski (2017) <doi:10.1007/s10044-015-0525-8>, Y. Tang, F. Sun, Z. Sun (2005) <doi:10.1109/ACC.2005.1470111>, N. Wiroonsri (2024) <doi:10.1016/j.patcog.2023.109910>, N. Wiroonsri, O. Preedasawakul (2023) <doi:10.48550/arXiv.2308.14785>, C. H. Wu, C. S. Ouyang, L. W. Chen, L. W. Lu (2015) <doi:10.1109/TFUZZ.2014.2322495>, X. Xie, G. Beni (1991) <doi:10.1109/34.85677>, P. J. Rousseeuw (1987) <doi:10.1016/0377-0427(87)90125-7>, L. Kaufman, P. J. Rousseeuw (2009) <doi:10.1002/9780470316801>, and C. Alok (2010).
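As an illustration of one of the indices listed above, the Calinski-Harabasz (1974) index can be computed from a k-means result with base R alone (a generic sketch on the built-in iris data, not this package's API):

    # Calinski-Harabasz index: (B/(k-1)) / (W/(n-k)), where B and W are the
    # between- and within-cluster sums of squares.
    x  <- as.matrix(iris[, 1:4])
    km <- kmeans(x, centers = 3, nstart = 25)
    n  <- nrow(x); k <- 3
    B  <- km$betweenss
    W  <- km$tot.withinss
    ch <- (B / (k - 1)) / (W / (n - k))
    ch  # larger values indicate compact, well-separated clusters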
The main goal of the R package treeDbalance is to provide functions for the computation of several measurements of 3D node imbalance and their respective 3D tree imbalance indices, as well as to introduce the new phylo3D format for rooted 3D tree objects. Moreover, it encompasses an example dataset of 3D models of 63 beans in phylo3D format. Please note that this R package was developed alongside the project described in the manuscript "Measuring 3D tree imbalance of plant models using graph-theoretical approaches" by M. Fischer, S. Kersting, and L. Kühn (2023) <arXiv:2307.14537>, which provides precise mathematical definitions of the measurements. Furthermore, the package contains several helpful functions, for example, auxiliary functions for computing the ancestors, descendants, and depths of the nodes, which ensure that the computations can be done in linear time. Most functions of treeDbalance require as input a rooted tree in the phylo3D format, an extended phylo format (as introduced in the R package ape 1.9 in November 2006). Such a phylo3D object must have at least two new attributes next to those required by the phylo format: 'node.coord', the coordinates of the nodes, as well as 'edge.weight', the literal weight or volume of the edges. Optional attributes are 'edge.diam', the diameter of the edges, and 'edge.length', the length of the edges. For visualization purposes one can also specify 'edge.type', which ranges from normal cylinder to bud to leaf, as well as 'edge.color' to change the color of the edge depiction. This project was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Western Pomerania, Germany, as well as by the project ArtIGROW, which is a part of the WIR!-Alliance ArtIFARM (Artificial Intelligence in Farming) funded by the German Federal Ministry of Education and Research (FKZ: 03WIR4805).
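Based on the attribute names listed above, a phylo3D object could be assembled from an ape tree roughly like this (a sketch: the toy coordinates, weights, and the class assignment are assumptions for illustration; real use should rely on the package's own constructors and checks):

    library(ape)

    # Start from a small rooted 'phylo' tree with 2 tips (3 nodes, 2 edges).
    tree <- read.tree(text = "(A:1,B:1);")

    # Attach the two mandatory phylo3D attributes described above:
    # one xyz row per node, and one weight (volume) per edge.
    tree$node.coord  <- matrix(c(0,   0, 0,    # node 1 (tip A)
                                 1,   0, 0,    # node 2 (tip B)
                                 0.5, 0, 1),   # node 3 (root)
                               ncol = 3, byrow = TRUE)
    tree$edge.weight <- c(0.8, 0.7)

    # Optional: 'edge.diam', 'edge.length', 'edge.type', 'edge.color'.
    class(tree) <- c("phylo3D", "phylo")  # assumed class tag, for illustration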
Descriptive statistics are essential for publishing articles. This package can perform descriptive statistics according to different data types. If the data is a continuous variable, the mean and standard deviation or the median and quartiles are automatically output; if the data is a categorical variable, the number and percentage are automatically output. In addition, if you enter two variables into this package, the two variables will be described and their relationship will be tested automatically according to their data types. For example, if one of the two input variables is a categorical variable, the other variable will be described hierarchically based on the categorical variable, and the statistical differences between groups will be compared using appropriate statistical methods. For more than two groups, a post hoc test will be applied. For more information on the methods used, please see the following references: Libiseller, C. and Grimvall, A. (2002) <doi:10.1002/env.507>, Patefield, W. M. (1981) <doi:10.2307/2346669>, Hope, A. C. A. (1968) <doi:10.1111/J.2517-6161.1968.TB00759.X>, Mehta, C. R. and Patel, N. R. (1983) <doi:10.1080/01621459.1983.10477989>, Mehta, C. R. and Patel, N. R. (1986) <doi:10.1145/6497.214326>, Clarkson, D. B., Fan, Y. and Joe, H. (1993) <doi:10.1145/168173.168412>, Cochran, W. G. (1954) <doi:10.2307/3001616>, Armitage, P. (1955) <doi:10.2307/3001775>, Szabo, A. (2016) <doi:10.1080/00031305.2017.1407823>, David, F. B. (1972) <doi:10.1080/01621459.1972.10481279>, Joanes, D. N. and Gill, C. A. (1998) <doi:10.1111/1467-9884.00122>, Dunn, O. J. (1964) <doi:10.1080/00401706.1964.10490181>, Copenhaver, M. D. and Holland, B. S. (1988) <doi:10.1080/00949658808811082>, Chambers, J. M., Freeny, A. and Heiberger, R. M. (1992) <doi:10.1201/9780203738535-5>, Shaffer, J. P. (1995) <doi:10.1146/annurev.ps.46.020195.003021>, Myles, H. and Douglas, A. W. (1973) <doi:10.2307/2063815>, Rahman, M. and Tiwari, R. (2012) <doi:10.4236/health.2012.410139>, Thode, H. J. (2002) <doi:10.1201/9780203910894>, Jonckheere, A. R. (1954) <doi:10.2307/2333011>, Terpstra, T. J. (1952) <doi:10.1016/S1385-7258(52)50043-X>.
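The type-dependent dispatch described above amounts to something like the following base-R sketch (a generic illustration, not this package's code; the function name and output layout are assumptions):

    # Generic sketch of type-dependent descriptive statistics.
    describe_var <- function(x) {
      if (is.numeric(x)) {
        # Continuous: mean/SD (median/quartiles would suit skewed data).
        c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
      } else {
        # Categorical: counts and percentages.
        n <- table(x)
        data.frame(level = names(n), n = as.integer(n),
                   percent = round(100 * as.integer(n) / sum(n), 1))
      }
    }
    describe_var(iris$Sepal.Length)  # continuous variable
    describe_var(iris$Species)       # categorical variable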
Data generated from independent and consecutive 'GillespieSSA' runs for a generic biochemical network is formatted as rows, each of which constitutes an observation. The first column of each row is the computed timestep for that run. Subsequent columns hold the number of molecules of each participating molecular species or "metabolite" of the network. In this way 'TemporalGSSA' is a wrapper for the R package 'GillespieSSA'. The number of observations must be at least 30, so that the generated data is statistically significant. 'TemporalGSSA' transforms this raw data into a simulation time-dependent and metabolite-specific trial. Each such trial is defined as a set of linear models (n >= 30) between a timestep and the number of molecules of a metabolite. Each linear model is characterized by coefficients such as the slope and an arbitrary constant. The user must enter an integer from 1-4, which specifies the statistical modality used to compute a representative timestep (mean, median, random, all). These arguments are mandatory and will be checked: the numeric indicator "0" indicates suitability, whilst "1" prompts the user to revise and re-enter their data. An optional logical argument controls the output to the console; the default "TRUE" yields curtailed output, whilst "FALSE" yields verbose output. The coefficients of each linear model are averaged (mean slope, mean constant) and incorporated into a metabolite-specific linear regression model as the dependent variable. The independent variable is the representative timestep chosen previously. The generated data is the imputed molecule number for an in silico experiment with n >= 30 observations. These steps can be replicated with multiple sets of observations. The generated "technical replicates" can be statistically evaluated (mean, standard deviation) and constitute simulation time-dependent molecule numbers for each metabolite. For SSA-generated datasets with varying simulation times, 'TemporalGSSA' will generate a simulation time-dependent trajectory for each metabolite of the biochemical network under study. The relevant publication with the mathematical derivation of the algorithm is (2022, Journal of Bioinformatics and Computational Biology) <doi:10.1142/S0219720022500184>. The algorithm has been deployed in the following publications: (2021, Heliyon) <doi:10.1016/j.heliyon.2021.e07466> and (2016, Journal of Theoretical Biology) <doi:10.1016/j.jtbi.2016.07.002>.
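The core of the algorithm described above (per-run linear models, averaged coefficients, prediction at a representative timestep) can be sketched generically (an illustration only; the simulated runs and the mean-timestep modality are assumptions, not the package's exact implementation):

    # Sketch: n >= 30 runs, each contributing one (timestep, molecules) linear fit.
    set.seed(42)
    n_runs <- 30
    coefs <- t(sapply(seq_len(n_runs), function(i) {
      t <- sort(runif(50, 0, 10))     # timesteps of one simulated SSA run
      y <- 5 + 3 * t + rnorm(50)      # molecule numbers of one metabolite
      coef(lm(y ~ t))                 # (intercept, slope) of this run's model
    }))

    mean_intercept <- mean(coefs[, 1])
    mean_slope     <- mean(coefs[, 2])
    t_rep          <- 5               # representative timestep (modality: mean)

    # Imputed molecule number at the representative timestep.
    mean_intercept + mean_slope * t_rep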
Generates random strings and byte strings matching a regex.
Fast monotone priority queues.
Sampling from random number distributions.
Minimal embedded v8 engine for Ruby.
This package provides safe bindings for gettext.
Electrical properties of resistor networks using matrix methods.
Various databases of microRNA targets.
This package provides core APIs for Rayon.
This package provides an HTTP Range header parser.
Lazy static regular expressions checked at compile time.