Implementations of algorithms for data analysis based on the rough set theory (RST) and the fuzzy rough set theory (FRST). We not only provide implementations for the basic concepts of RST and FRST but also popular algorithms that derive from those theories. The methods included in the package can be divided into several categories based on their functionality: discretization, feature selection, instance selection, rule induction and classification based on nearest neighbors. RST was introduced by Zdzisław Pawlak in 1982 as a sophisticated mathematical tool to model and process imprecise or incomplete information. By using the indiscernibility relation for objects/instances, RST does not require additional parameters to analyze the data. FRST is an extension of RST that combines the concept of vagueness, expressed with fuzzy sets (as proposed by Zadeh in 1965), with the concept of indiscernibility from RST.
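To make the indiscernibility relation concrete, the following minimal Python sketch (not the package's R API) groups objects that share identical values on the chosen attributes into equivalence classes, then derives the standard RST lower and upper approximations of a target set. The toy decision table and the function names are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group object ids by their values on the given attributes.

    Each group is one equivalence class of the indiscernibility relation.
    """
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough-set approximations of `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(objects, attributes):
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy decision table: object id -> attribute values.
table = {
    1: {"headache": "yes", "temp": "high"},
    2: {"headache": "yes", "temp": "high"},
    3: {"headache": "no", "temp": "high"},
    4: {"headache": "no", "temp": "normal"},
}
flu = {1, 3}  # objects known to belong to the concept
lower, upper = approximations(table, ["headache", "temp"], flu)
print(sorted(lower), sorted(upper))  # [3] [1, 2, 3]
```

Objects 1 and 2 are indiscernible on these attributes but only one has flu, so both fall in the boundary region (upper minus lower approximation).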
This package provides tools for fitting possibly high-dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox proportional hazards model. Cross-validation routines allow optimization of the tuning parameters.
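As an illustration of how an L1 penalty, an L2 penalty and a positivity constraint interact, here is a minimal coordinate-descent sketch for linear least squares. Coordinate descent is a common generic solver and is swapped in here for illustration; it is not necessarily the algorithm this package uses, and the function names are hypothetical. The objective is 0.5*||y - Xb||^2 + lambda1*||b||_1 + 0.5*lambda2*||b||^2, optionally with b >= 0.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def penalized_ls(X, y, lambda1, lambda2, positive=False, n_iter=200):
    """Coordinate descent for L1 + L2 penalized least squares.

    X is a list of rows; returns the coefficient list b.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            sq = sum(c * c for c in col)
            # Correlation of x_j with the partial residual y - X b + x_j b_j.
            rho = sum(col[i] * resid[i] for i in range(n)) + sq * b[j]
            new = soft_threshold(rho, lambda1) / (sq + lambda2)
            if positive:
                new = max(new, 0.0)  # project onto b_j >= 0
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= col[i] * delta
            b[j] = new
    return b


print(penalized_ls([[1, 0], [0, 1]], [3, -2], lambda1=1, lambda2=0))
```

With an orthogonal design each coefficient is simply soft-thresholded and then shrunk by the ridge term; the positivity constraint clips negative coefficients to zero.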
This package provides functions to work with date-times and time-spans: fast and user-friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), and algebraic manipulation of date-time and time-span objects. The lubridate package has a consistent and memorable syntax that makes working with dates easy and fun.
This package provides functions to visualise webs and calculate a series of indices commonly used to describe patterns in (ecological) webs. It focuses on webs consisting of only two levels (bipartite), e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web's topology.
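For a sense of what such indices compute, the sketch below (plain Python, not the package's R functions) derives two simple topology measures from a bipartite interaction matrix: connectance, the fraction of possible links that are realised, and links per species, the number of realised links divided by the total number of species in both levels. The example web and the function names are hypothetical.

```python
def _count_links(web):
    """Number of non-zero cells, i.e. realised interactions."""
    return sum(1 for row in web for v in row if v > 0)


def connectance(web):
    """Realised links divided by possible links in an I x J web."""
    rows, cols = len(web), len(web[0])
    return _count_links(web) / (rows * cols)


def links_per_species(web):
    """Realised links divided by the number of species in both levels."""
    rows, cols = len(web), len(web[0])
    return _count_links(web) / (rows + cols)


# Hypothetical pollination web: rows = plants, columns = pollinators,
# entries = observed visit counts.
web = [
    [5, 0, 1],
    [0, 2, 0],
]
print(connectance(web), links_per_species(web))  # 0.5 0.6
```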
This package provides tools to compute and represent gene set enrichment or depletion from your data based on pre-saved maps from the Atlas of Cancer Signalling Networks (ACSN) or user-imported maps. The gene set enrichment can be run with a hypergeometric test or Fisher's exact test, with a choice of multiple-testing corrections. Data can be visualized as either bar plots or heatmaps.
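The statistics behind such an enrichment test can be sketched in a few lines of standard-library Python. This is an illustration, not the package's implementation: it computes the upper-tail hypergeometric p-value for observing at least k genes of a map's gene set among n selected genes (equivalent to a one-sided Fisher's exact test), and applies the Benjamini-Hochberg correction, chosen here as one example of a multiple-testing correction. Function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n selected)."""
    hits = range(k, min(K, n) + 1)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in hits)
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 10 genes in total, 3 in the gene set, 4 selected, all 3 set genes selected.
print(hypergeom_upper_tail(3, 10, 3, 4))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```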
The rcs package facilitates the inclusion of RCS-supplied data in LaTeX documents. In particular, you can easily access the value of every RCS field in your document, put the check-in date on the title page, or put RCS fields in a footline. You can also typeset revision logs, and you can easily configure the rcs package to do special things for any keyword.
DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position-deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina NextSeq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.
Estimates heterogeneous effects in factorial (and conjoint) models. The methodology employs a Bayesian finite mixture of regularized logistic regressions, where moderators can affect each observation's probability of group membership and a sparsity-inducing prior fuses together levels of each factor while respecting ANOVA-style sum-to-zero constraints. Goplerud, Imai, and Pashley (2024) <doi:10.48550/ARXIV.2201.01357> provide further details.
Estimate natural mortality (M) throughout the life history of organisms, mainly fish and invertebrates, based on the gnomonic interval approach proposed by Caddy (1996) <doi:10.1051/alr:1996023> and Martinez-Aguilar et al. (2005) <doi:10.1016/j.fishres.2004.04.008>. It includes estimation of the duration of each gnomonic interval (life stage), the constant probability of death (G), and some basic plots.
Fast scalable Gaussian process approximations, particularly well suited to spatial (aerial, remote-sensed) and environmental data, described in more detail in Katzfuss and Guinness (2017) <arXiv:1708.06302>. The package also contains a fast implementation of the incomplete Cholesky decomposition (IC0), based on Schaefer et al. (2019) <arXiv:1706.02205>, and the MaxMin ordering proposed in Guinness (2018) <arXiv:1609.05372>.
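The idea behind a MaxMin ordering can be shown with a naive O(n^2) Python sketch (illustrative only, not the package's implementation): start from the location nearest the centroid (one common choice of first point), then repeatedly pick the remaining location whose distance to its nearest already-ordered location is largest. Early points in the ordering are therefore spread out across the domain. The function name is hypothetical.

```python
import math


def maxmin_order(points):
    """Naive O(n^2) MaxMin ordering of 2-D locations; returns indices."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # First point: the location closest to the centroid.
    first = min(range(n), key=lambda i: math.dist(points[i], (cx, cy)))
    order = [first]
    # Distance from every point to its nearest already-ordered point.
    min_dist = [math.dist(p, points[first]) for p in points]
    remaining = set(range(n)) - {first}
    while remaining:
        nxt = max(remaining, key=lambda i: min_dist[i])
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return order


print(maxmin_order([(0, 0), (1, 0), (3, 0), (10, 0)]))  # [2, 3, 0, 1]
```

Production implementations avoid the quadratic cost with spatial data structures; the naive loop above only makes the selection rule explicit.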
Two-Step Lasso (TS-Lasso) and compound minimum methods to recover the abundance of missing peaks in mass spectrum analysis. TS-Lasso is an imputation method that handles various types of missing peaks simultaneously. This package provides the procedure to generate missing peaks (or data) for simulation studies, as well as a tool to estimate and visualize the proportion of missing values that are missing at random.
This package implements a one-sector Armington-CES gravity model with general equilibrium (GE) effects. This model is designed to analyze international and domestic trade by capturing the impacts of trade costs and policy changes within a general equilibrium framework. Additionally, it includes a local parameter to run simulations on productivity. The package provides functions for calibration, simulation, and analysis of the model.
An implementation of the corrected sandwich variance (CSV) estimation method for inference on marginal hazard ratios (HR) in inverse probability weighted (IPW) Cox models, with and without clustered data, proposed by Shu, Young, Toh, and Wang (2019) in their paper under revision for Biometrics. Both conventional inverse probability weights and stabilized weights are implemented. A logistic regression model is assumed for the propensity score.
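The difference between conventional and stabilized weights can be sketched as follows, assuming a binary treatment A and propensity scores e(X) = P(A = 1 | X) that have already been estimated (for example by the logistic regression mentioned above; that fitting step is not shown). Conventional weights are 1/e for treated and 1/(1 - e) for untreated subjects; stabilized weights multiply the numerator by the marginal probability of the received treatment. The function name and data are hypothetical, and this is not the package's code.

```python
def ipw_weights(treated, propensity, stabilized=False):
    """Inverse probability of treatment weights.

    treated: list of 0/1 treatment indicators.
    propensity: list of estimated P(A = 1 | X) for each subject.
    """
    p_treat = sum(treated) / len(treated)  # marginal P(A = 1)
    weights = []
    for a, e in zip(treated, propensity):
        if a:
            numerator = p_treat if stabilized else 1.0
            weights.append(numerator / e)
        else:
            numerator = (1 - p_treat) if stabilized else 1.0
            weights.append(numerator / (1 - e))
    return weights


A = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.25, 0.75]
print(ipw_weights(A, ps))                   # [2.0, 2.0, 4.0, 4.0]
print(ipw_weights(A, ps, stabilized=True))  # [1.0, 1.0, 2.0, 2.0]
```

Stabilization keeps the weights closer to 1, which typically reduces their variance without changing the target of inference.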
Computes bilateral and multilateral index numbers. It has support for many standard bilateral indexes as well as multilateral index number methods such as GEKS, GEKS-Tornqvist (or CCDI), Geary-Khamis and the weighted time product dummy (for details on these methods see Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>). It also supports updating of multilateral indexes using several splicing methods.
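As a sketch of how a multilateral method builds on bilateral indexes, the standard-library Python below computes Fisher bilateral price indexes (the geometric mean of Laspeyres and Paasche) and the GEKS index relative to period 0, defined as the geometric mean over all link periods l of F(0, l) * F(l, t). This illustrates the textbook formula, not the package's API; the function names and data are hypothetical. GEKS-Tornqvist would use the Tornqvist index as the bilateral building block instead.

```python
import math


def fisher(p, q, s, t):
    """Bilateral Fisher price index from period s to period t.

    p[t][i], q[t][i]: price and quantity of item i in period t.
    """
    laspeyres = (sum(pt * qs for pt, qs in zip(p[t], q[s]))
                 / sum(ps * qs for ps, qs in zip(p[s], q[s])))
    paasche = (sum(pt * qt for pt, qt in zip(p[t], q[t]))
               / sum(ps * qt for ps, qt in zip(p[s], q[t])))
    return math.sqrt(laspeyres * paasche)


def geks(p, q):
    """Fisher-based GEKS index for every period relative to period 0."""
    T = len(p)

    def geks_pair(s, t):
        # Geometric mean over link periods l of F(s, l) * F(l, t).
        log_sum = sum(math.log(fisher(p, q, s, l) * fisher(p, q, l, t))
                      for l in range(T))
        return math.exp(log_sum / T)

    return [geks_pair(0, t) for t in range(T)]


# Prices proportional across periods: any sensible index returns 1, 2, 3.
prices = [[1, 2], [2, 4], [3, 6]]
quantities = [[1, 1], [2, 1], [1, 3]]
print(geks(prices, quantities))
```

Because the Fisher index satisfies the time reversal test, the GEKS value for period 0 is 1 up to rounding.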
Analysis of DNA copy number in single cells using custom genome-wide targeted DNA sequencing panels for the Mission Bio Tapestri platform. Users can easily parse, manipulate, and visualize datasets produced from the automated 'Tapestri Pipeline', with support for normalization, clustering, and copy number calling. Functions are also available to deconvolute multiplexed samples by genotype and to parse barcoded reads from exogenous lentiviral constructs.
Computation of the multivariate marine recovery index, including functions for data visualization and ecological diagnostics of marine ecosystems. The computational details are described in the original publication. Reference: Chauvel, N., Grall, J., Thiébaut, E., Houbin, C., Pezy, J.P. (in press). "A general-purpose Multivariate Marine Recovery Index for quantifying the influence of human activities on benthic habitat ecological status". Ecological Indicators.
Stanford CoreNLP annotation client. Stanford CoreNLP <https://stanfordnlp.github.io/CoreNLP/index.html> integrates all NLP tools from the Stanford Natural Language Processing Group, including a part-of-speech (POS) tagger, a named entity recognizer (NER), a parser, and a coreference resolution system, and provides model files for the analysis of English. More information can be found in the README.
Splits initial strata into refined strata that optimize covariate balance. For more information, please email the author for a copy of the accompanying manuscript. To solve the linear program, the Gurobi commercial optimization software is recommended, but not required. The gurobi R package can be installed following the instructions at <https://www.gurobi.com/documentation/9.1/refman/ins_the_r_package.html>.
We define a pathway mutation accumulate perturbation score (PMAPscore) that reflects the position and the cumulative effect of genetic mutations at the pathway level. Based on the PMAPscores of pathways, we identify prognosis-related pathways altered by somatic mutations and predict immunotherapy efficacy by constructing a multiple-pathway-based risk model (Tarca et al. (2008) <doi:10.1093/bioinformatics/btn577>).
Fetches the PREDICTS database and relevant metadata from the Data Portal at the Natural History Museum, London <https://data.nhm.ac.uk>. Data were collated from over 400 existing spatial comparisons of local-scale biodiversity exposed to different intensities and types of anthropogenic pressures, from sites around the world. These data are described in Hudson et al. (2013) <doi:10.1002/ece3.2579>.
Generates predicted stage change days for an insect, based on daily temperatures and development rate parameters, as developed by Pollard (2014) <http://mural.maynoothuniversity.ie/view/ethesisauthor/Pollard=3ACiaran_P=2E=3A=3A.html>. A few example datasets for P. vulgatissima, the blue willow beetle, are included, but the approach can readily be applied to other species that display similar behaviour.
This package contains statistical inference tools applied to Partial Linear Regression (PLR) models. Specifically, point estimation, confidence interval estimation, bandwidth selection, goodness-of-fit tests and analysis of covariance are considered. Kernel-based methods, combined with ordinary least squares estimation, are used, and time series errors are allowed. These techniques are also implemented for both parametric (linear) and nonparametric regression models.
Convert text (and text in R objects) to Mocking SpongeBob case <https://knowyourmeme.com/memes/mocking-spongebob> and show them off in fun ways. CoNVErT TexT (AnD TeXt In r ObJeCtS) To MOCkINg SpoNgebOb CAsE <https://knowyourmeme.com/memes/mocking-spongebob> aND shOw tHem OFf IN Fun WayS.
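One simple way to produce Mocking SpongeBob case is to alternate upper and lower case on letters while leaving spaces and punctuation untouched. The Python sketch below does exactly that; it is an illustrative variant with a hypothetical function name, not this package's algorithm (the sample output above appears randomized rather than strictly alternating).

```python
def mock_case(text, start_upper=False):
    """Alternate letter case, skipping non-letters so they don't break the rhythm."""
    out = []
    upper = start_upper
    for ch in text:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper
        else:
            out.append(ch)
    return "".join(out)


print(mock_case("convert text"))  # cOnVeRt TeXt
```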
Construct sketches of data via random subspace embeddings. For more details, see the following papers. Lee, S. and Ng, S. (2022). "Least Squares Estimation Using Sketched Data with Heteroskedastic Errors," Proceedings of the 39th International Conference on Machine Learning (ICML22), 162:12498-12520. Lee, S. and Ng, S. (2020). "An Econometric Perspective on Algorithmic Subsampling," Annual Review of Economics, 12(1): 45-80.