An integrated pipeline to predict the potential synthetic lethality partners (SLPs) of tumour mutations, based on gene expression, mutation profiling and cell line genetic screen data. It has built-in support for data from cBioPortal. The primary SLPs correlating with mutations in WT and compensating for the loss of function of mutations are predicted by random forest based methods (GENIE3) and Rank Products, respectively. Genetic screens are employed to identify consensus SLPs that lead to reduced cell viability when perturbed.
Query the four endpoints of the Air and Water Database (AWDB) REST API maintained by the National Water and Climate Center (NWCC) at the United States Department of Agriculture (USDA). Endpoints include data, forecast, reference-data, and metadata. The package is extremely lightweight, with Rust via extendr doing most of the heavy lifting to deserialize and flatten deeply nested JSON responses. The AWDB can be found at <https://wcc.sc.egov.usda.gov/awdbRestApi/swagger-ui/index.html>.
Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator, probability density function, or data. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>. Some support for GARCH models is provided, as well.
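The "directly from the definitions" computation can be sketched as follows. This is a minimal illustration in Python, not the package's R interface: the function names, the midpoint integration rule and the sign convention (risk measures reported as positive losses of a return distribution, with `alpha` as the tail probability) are all assumptions made for the example.

```python
from statistics import NormalDist


def value_at_risk(qf, alpha):
    """VaR at tail probability alpha: the negated alpha-quantile of returns."""
    return -qf(alpha)


def expected_shortfall(qf, alpha, n=10_000):
    """ES as the average of VaR over tail levels in (0, alpha).

    The integral of the quantile function over (0, alpha) is approximated
    with a midpoint rule, which never evaluates qf at 0.
    """
    h = alpha / n
    total = sum(qf((i + 0.5) * h) for i in range(n))
    return -total * h / alpha


# Standard normal returns, specified only through their quantile function.
qf = NormalDist(mu=0.0, sigma=1.0).inv_cdf
var_5 = value_at_risk(qf, 0.05)
es_5 = expected_shortfall(qf, 0.05)
```

Because only a quantile function is required, any continuous distribution with a computable quantile function can be substituted for the normal one used here.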
This package implements the Improved Expectation Maximisation EM* and the traditional EM algorithm for clustering big data (Gaussian mixture models for both multivariate and univariate datasets). This version implements the faster alternative-EM* that expedites convergence via structure based data segregation. The implementation supports both random and K-means++ based initialization. References: Parichit Sharma, Hasan Kurban, Mehmet Dalkilic (2022) <doi:10.1016/j.softx.2021.100944>. Hasan Kurban, Mark Jenne, Mehmet Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>.
Cross-validated eigenvalues are estimated by splitting a graph into two parts, the training and the test graph. The training graph is used to estimate eigenvectors, and the test graph is used to evaluate the correlation between the training eigenvectors and the eigenvectors of the test graph. The correlations follow a simple central limit theorem that can be used to estimate graph dimension via hypothesis testing, see Chen et al. (2021) <doi:10.48550/arXiv.2108.03336> for details.
Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) <doi:10.1007/s00477-013-0817-8> or Castillo-Paez et al. (2019) <doi:10.1016/j.csda.2019.01.017>.
Calculate the ratio of iron oxides, hematite and goethite, in soil using the diffuse reflectance technique. The Kubelka-Munk theory, second derivative analysis, and spectral region amplitudes related to hematite and goethite content are used for quantification (Torrent, J., & Barron, V. (2008) <doi:10.2136/sssabookser5.5.c13>). Additionally, the package calculates soil color in the visible spectrum using Munsell and RGB color spaces, based on color theory (Viscarra et al. (2006) <doi:10.1016/j.geoderma.2005.07.017>).
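As a rough illustration of the first two steps named above, the sketch below applies the Kubelka-Munk remission function F(R) = (1 - R)^2 / (2R) to reflectance fractions and then takes a finite-difference second derivative. It is written in Python with hypothetical function names and made-up reflectance values; it does not reproduce the package's interface or its band-amplitude quantification.

```python
def kubelka_munk(reflectance):
    """Kubelka-Munk remission function F(R) = (1 - R)^2 / (2 R), 0 < R <= 1."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must be a fraction in (0, 1]")
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


def second_derivative(values, step):
    """Central finite-difference second derivative at the interior points
    of an evenly spaced series."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / step ** 2
        for i in range(1, len(values) - 1)
    ]


# Hypothetical reflectance fractions sampled every 10 nm (illustrative only).
reflectance = [0.20, 0.22, 0.25, 0.31, 0.40]
km = [kubelka_munk(r) for r in reflectance]
d2 = second_derivative(km, step=10.0)
```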
Computes the optimal sample size for various 2-group designs (e.g., when comparing the means of two groups assuming equal variances, unequal variances, or comparing proportions) when the aim is to maximize the rewards over the full decision procedure of a) running a trial (with the computed sample size), and b) subsequently administering the winning treatment to the remaining N-n units in the population. Sample sizes and expected rewards for standard t- and z- tests are also provided.
Stagewise techniques implemented with Generalized Estimating Equations to handle individual, group, bi-level, and interaction selection. Stagewise approaches start with an empty model and slowly build the model over several iterations, which yields a path of candidate models from which model selection can be performed. This slow brewing approach gives stagewise techniques a unique flexibility that allows simple incorporation of Generalized Estimating Equations; see Vaughan, G., Aseltine, R., Chen, K., Yan, J., (2017) <doi:10.1111/biom.12669> for details.
Many of the models encountered in applications of point process methods to the study of spatio-temporal phenomena are covered in 'stpp'. This package provides statistical tools for analyzing the global and local second-order properties of spatio-temporal point processes, including estimators of the space-time inhomogeneous K-function and pair correlation function. It also includes tools for static and dynamic display of spatio-temporal point patterns. See Gabriel et al. (2013) <doi:10.18637/jss.v053.i02>.
Penalized weighted least-squares estimate for variable selection on correlated multiply imputed data and penalized estimating equations for generalized linear models with multiple imputation. Reference: Li, Y., Yang, H., Yu, H., Huang, H., Shen, Y*. (2023) "Penalized estimating equations for generalized linear models with multiple imputation", <doi:10.1214/22-AOAS1721>. Li, Y., Yang, H., Yu, H., Huang, H., Shen, Y*. (2023) "Penalized weighted least-squares estimate for variable selection on correlated multiply imputed data", <doi:10.1093/jrsssc/qlad028>.
This package provides an object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided.
DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for a given region set with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for the region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of region sets to be leveraged for novel insight into the regulatory state of DNA methylation datasets.
This package provides a scale based normalization (SCBN) method to identify genes with differential expression between different species. It takes into account the available knowledge of conserved orthologous genes and the hypothesis testing framework to detect differentially expressed orthologous genes. The methods in this package are described in the article "A statistical normalization method and differential expression analysis for RNA-seq data between different species" by Yan Zhou, Jiadi Zhu, Tiejun Tong, Junhui Wang, Bingqing Lin, Jun Zhang (2018, pending publication).
This is an implementation of design methods for binomial reliability demonstration tests (BRDTs) with failure count data. The acceptance decision uncertainty of BRDT has been quantified and the impacts of the uncertainty on related reliability assurance activities such as reliability growth (RG) and warranty services (WS) are evaluated. This package is associated with the work from the published paper "Optimal Binomial Reliability Demonstration Tests Design under Acceptance Decision Uncertainty" by Suiyao Chen et al. (2020) <doi:10.1080/08982112.2020.1757703>.
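For orientation, the classical binomial demonstration test that such designs build on can be sketched as below: a test of n units is accepted when at most c failures occur, and n is chosen as the smallest size for which a product at the reliability target would be accepted with probability no greater than one minus the confidence level. This Python sketch uses hypothetical function names and covers only the textbook design, not the acceptance-uncertainty analysis of the cited paper.

```python
from math import comb


def acceptance_probability(n, c, reliability):
    """Probability of observing at most c failures among n units when each
    unit survives independently with the given reliability."""
    p_fail = 1.0 - reliability
    return sum(
        comb(n, k) * p_fail ** k * reliability ** (n - k) for k in range(c + 1)
    )


def minimum_sample_size(c, reliability, confidence, n_max=100_000):
    """Smallest n for which passing the test (at most c failures) demonstrates
    the reliability target at the requested confidence level."""
    for n in range(c + 1, n_max + 1):
        if acceptance_probability(n, c, reliability) <= 1.0 - confidence:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")


n_zero_failures = minimum_sample_size(c=0, reliability=0.9, confidence=0.9)
n_one_failure = minimum_sample_size(c=1, reliability=0.9, confidence=0.9)
```

Allowing more failures raises the required sample size, which is the basic trade-off a test planner faces.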
This package provides a self-contained set of methods to aid clinical trial safety investigators, statisticians and researchers, in the early detection of adverse events using groupings by body-system or system organ class. This work was supported by the Engineering and Physical Sciences Research Council (UK) (EPSRC) [award reference 1521741] and Frontier Science (Scotland) Ltd. The package title c212 is in reference to the original Engineering and Physical Sciences Research Council (UK) funded project which was named CASE 2/12.
Neural networks have potential in forestry modelling. This package is designed to create and assess artificial-intelligence-based neural networks with varying architectures for predicting the volume of forest trees from two input features, height and diameter at breast height, which are the key factors in predicting volume. Developing and validating an efficient neural network model for volume prediction is therefore necessary. This package has been developed using the algorithm of Tabassum et al. (2022) <doi:10.18805/ag.D-5555>.
This package provides a lightweight, dependency-free data engine for R with a grammar for tabular and time-series manipulation. Built entirely on Base R, m61r offers a fluent, chainable API inspired by modern data tools while prioritizing memory efficiency and speed. It includes optimized versions of common data verbs such as filtering, mutation, grouped aggregation, and approximate temporal joins, making it suitable for environments where external dependencies are restricted or where performance in pure R is required.
Ing and Lai (2011) <doi:10.5705/ss.2010.081> proposed a high-dimensional model selection procedure that comprises three steps: orthogonal greedy algorithm (OGA), high-dimensional information criterion (HDIC), and Trim. The first two steps, OGA and HDIC, are used to sequentially select input variables and determine stopping rules, respectively. The third step, Trim, is used to delete irrelevant variables remaining in the second step. This package aims at fitting a high-dimensional linear regression model via OGA+HDIC+Trim.
Fits the regularization path of regression models (linear and logistic) with additively combined penalty terms. All possible combinations with Least Absolute Shrinkage and Selection Operator (LASSO), Smoothly Clipped Absolute Deviation (SCAD), Minimax Concave Penalty (MCP) and Exponential Penalty (EP) are supported. This includes Sparse Group LASSO (SGL), Sparse Group SCAD (SGS), Sparse Group MCP (SGM) and Sparse Group EP (SGE). For more information, see Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., & Wild, P. S. (2024) <doi:10.1002/bimj.202200334>.
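The three best-known penalties in this list have simple closed forms, sketched below in Python for a single coefficient. The SCAD and MCP formulas follow their standard definitions with the commonly used shape parameters a = 3.7 and gamma = 3. The function names and the `combined_penalty` helper are hypothetical, the exponential penalty is omitted, and the package's own parametrization and group-level terms may differ.

```python
def lasso(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def scad(beta, lam, a=3.7):
    """SCAD penalty (requires a > 2): linear near zero, quadratic blend in
    the middle, constant beyond a * lam."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b * b - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def mcp(beta, lam, gamma=3.0):
    """MCP penalty (requires gamma > 1): tapers off to a constant beyond
    gamma * lam."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def combined_penalty(coefficients, terms):
    """Additively combine penalties: each term is a (penalty, lam) pair
    applied to every coefficient and summed."""
    return sum(pen(b, lam) for pen, lam in terms for b in coefficients)


total = combined_penalty([0.5, -2.0, 10.0], [(lasso, 0.1), (scad, 1.0)])
```

Unlike LASSO, both SCAD and MCP stop growing for large coefficients, so strong effects are penalized less heavily.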
This package provides a set of functions for 1) the stratum lengths analysis along a chosen direction, 2) fast estimation of continuous lag spatial Markov chains model parameters and probability computing (also for large data sets), 3) transition probability maps and transiograms drawing, 4) simulation methods for categorical random fields. More details on the methodology are discussed in Sartore (2013) <doi:10.32614/RJ-2013-022> and Sartore et al. (2016) <doi:10.1016/j.cageo.2016.06.001>.
Fit Bayesian hierarchical models of animal abundance and occurrence via the rstan package, the R interface to the Stan C++ library. Supported models include single-season occupancy, dynamic occupancy, and N-mixture abundance models. Covariates on model parameters are specified using a formula-based interface similar to package 'unmarked', while also allowing for estimation of random slope and intercept terms. References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
This package provides support for transformations of numeric aggregates between statistical classifications (e.g. occupation or industry categorisations) using the Crossmaps framework. Implements classes for representing transformations between a source and target classification as graph structures, and methods for validating and applying crossmaps to transform data collected under the source classification into data indexed using the target classification codes. Documentation about the Crossmaps framework is provided in the included vignettes and in Huang (2024, <doi:10.48550/arXiv.2406.14163>).
This package contains extensions to ggplot2.
Geoms: geom_table, geom_plot and geom_grob add insets to plots using native data coordinates, while geom_table_npc, geom_plot_npc and geom_grob_npc do the same using npc coordinates through new aesthetics npcx and npcy.
Statistics: select observations based on 2D density.
Positions: radial nudging away from a center point and nudging away from a line or curve.