Provide functions to estimate the coefficients in high-dimensional linear regressions via a tuning-free and robust approach. The method was published in Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "A Tuning-free Robust and Efficient Approach to High-dimensional Regression", Journal of the American Statistical Association, 115:532, 1700-1714 (JASA's discussion paper), <doi:10.1080/01621459.2020.1840989>. See also Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020), "Rejoinder to 'A tuning-free robust and efficient approach to high-dimensional regression'", Journal of the American Statistical Association, 115, 1726-1729, <doi:10.1080/01621459.2020.1843865>; Peng, B. and Wang, L. (2015), "An Iterative Coordinate Descent Algorithm for High-Dimensional Nonconvex Penalized Quantile Regression", Journal of Computational and Graphical Statistics, 24:3, 676-694, <doi:10.1080/10618600.2014.913516>; Clémençon, S., Colin, I., and Bellet, A. (2016), "Scaling-up empirical risk minimization: optimization of incomplete u-statistics", The Journal of Machine Learning Research, 17(1):2682-2717; Fan, J. and Li, R. (2001), "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties", Journal of the American Statistical Association, 96:456, 1348-1360, <doi:10.1198/016214501753382273>.
The Variable Infiltration Capacity (VIC) model is a macroscale hydrologic model that solves full water and energy balances, originally developed by Xu Liang at the University of Washington (UW). The version of the VIC source code used is 5.0.1, from <https://github.com/UW-Hydro/VIC/>; see Hamman et al. (2018). Development and maintenance of the current official version of the VIC model is led by the UW Hydro (Computational Hydrology) group in the Department of Civil and Environmental Engineering at UW. VIC is a research model and in its various forms it has been applied to most of the major river basins around the world, as well as globally <http://vic.readthedocs.io/en/master/Documentation/References/>. References: "Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges (1994), A simple hydrologically based model of land surface water and energy fluxes for general circulation models, J. Geophys. Res., 99(D7), 14415-14428, <doi:10.1029/94JD00483>"; "Hamman, J. J., Nijssen, B., Bohn, T. J., Gergel, D. R., and Mao, Y. (2018), The Variable Infiltration Capacity model version 5 (VIC-5): infrastructure improvements for new applications and reproducibility, Geosci. Model Dev., 11, 3481-3496, <doi:10.5194/gmd-11-3481-2018>".
Fits relative survival regression models with or without proportional excess hazards and with the additional possibility to correct for background mortality by one or more parameter(s). These models are relevant when the observed mortality in the studied group is not comparable to that of the general population, or in population-based studies where the available life tables used for net survival estimation are insufficiently stratified. In the latter case, the model proposed by Touraine et al. (2020) <doi:10.1177/0962280218823234> can be used. The user can also fit a model that relaxes the proportional expected hazards assumption considered in the Touraine et al. excess hazard model. This extension was proposed by Mba et al. (2020) <doi:10.1186/s12874-020-01139-z> to allow non-proportional effects of the additional variable on the general population mortality. In non-population-based studies, researchers can identify a source of non-comparability bias in terms of the expected mortality of the selected individuals. An excess hazard model correcting this selection bias is presented in Goungounga et al. (2019) <doi:10.1186/s12874-019-0747-3>. This class of models, with a random effect at the cluster level on the excess hazard, is presented in Goungounga et al. (2023) <doi:10.1002/bimj.202100210>.
This package provides a tool that allows users to generate various indices for evaluating statistical models. The fitstat() function computes indices based on the fitting data. The valstat() function computes indices based on the validation data set. Both fitstat() and valstat() return the following indices: SSR: residual sum of squares, TRE: total relative error, Bias: mean bias, MRB: mean relative bias, MAB: mean absolute bias, MAPE: mean absolute percentage error, MSE: mean squared error, RMSE: root mean square error, Percent.RMSE: percentage root mean squared error, R2: coefficient of determination, R2adj: adjusted coefficient of determination, APC: Amemiya's prediction criterion, logL: log-likelihood, AIC: Akaike information criterion, AICc: corrected Akaike information criterion, BIC: Bayesian information criterion, HQC: Hannan-Quinn information criterion. The lower the better for the SSR, TRE, Bias, MRB, MAB, MAPE, MSE, RMSE, Percent.RMSE, APC, AIC, AICc, BIC and HQC indices; the higher the better for the R2 and R2adj indices. References: Stoica, P., Selén, Y. (2004) <doi:10.1109/MSP.2004.1311138>; Zhou et al. (2023) <doi:10.3389/fpls.2023.1186250>; Ogana, F.N., Ercanli, I. (2021) <doi:10.1007/s11676-021-01373-1>; Musabbikhah et al. (2019) <doi:10.1088/1742-6596/1175/1/012270>.
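As a rough illustration of how several of these indices relate to a fitted model, the following base R sketch (not taken from the package itself; the lm() fit on the built-in mtcars data is simply an assumed example) computes a handful of them by hand:

fit <- lm(mpg ~ wt + hp, data = mtcars)
obs <- mtcars$mpg
pred <- fitted(fit)
res <- obs - pred
n <- length(obs)
p <- length(coef(fit))               # number of estimated coefficients
SSR   <- sum(res^2)                  # residual sum of squares
RMSE  <- sqrt(mean(res^2))           # root mean square error
MAPE  <- mean(abs(res / obs)) * 100  # mean absolute percentage error
R2    <- 1 - SSR / sum((obs - mean(obs))^2)
R2adj <- 1 - (1 - R2) * (n - 1) / (n - p)
c(SSR = SSR, RMSE = RMSE, MAPE = MAPE, R2 = R2, R2adj = R2adj, AIC = AIC(fit))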
Fit unidimensional item response theory (IRT) models to a mixture of dichotomous and polytomous data, calibrate online item parameters (i.e., pretest and operational items), estimate examinees' abilities, and examine the IRT model-data fit at the item level in different ways, as well as provide useful functions related to IRT analyses such as IRT model-data fit evaluation and differential item functioning analysis. The bring.flexmirt() and write.flexmirt() functions were written by modifying the read.flexmirt() function (Pritikin & Falk (2022) <doi:10.1177/0146621620929431>). The bring.bilog() and bring.parscale() functions were written by modifying the read.bilog() and read.parscale() functions, respectively (Weeks (2010) <doi:10.18637/jss.v035.i12>). The bisection() function was written by modifying the bisection() function of Howard (2017, ISBN:9780367657918). The code for the inverse test characteristic curve scoring in the est_score() function was written by modifying the irt.eq.tse() function (González (2014) <doi:10.18637/jss.v059.i07>). In the est_score() function, the code for the weighted likelihood estimation method was written by referring to the Pi(), Ji(), and Ii() functions of the catR package (Magis & Barrada (2017) <doi:10.18637/jss.v076.c01>).
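For readers unfamiliar with IRT, the sketch below (plain base R, not this package's API) shows the two-parameter logistic item response function underlying many dichotomous IRT models; the discrimination a = 1.2 and difficulty b = 0 are arbitrary illustrative values:

p_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
theta <- seq(-3, 3, by = 0.5)    # a grid of examinee abilities
p_2pl(theta, a = 1.2, b = 0)     # probability of a correct response to one item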
A multi-visit clinical trial may collect participant responses on an ordinal scale and may utilize a stratified design, such as randomization within centers, to assess treatment efficacy across multiple visits. Baseline characteristics may be strongly associated with the outcome, and adjustment for them can improve power. The win ratio (which ignores ties) and the win odds (which accounts for ties) can be useful when analyzing these types of data from randomized controlled trials. This package provides straightforward functions for adjustment of the win ratio and win odds for stratification and baseline covariates, facilitating the comparison of test and control treatments in multi-visit clinical trials. For additional information concerning the methodologies and applied examples within this package, please refer to the following publications: 1. Weideman, A.M.K., Kowalewski, E.K., & Koch, G.G. (2024). "Randomization-based covariance adjustment of win ratios and win odds for randomized multi-visit studies with ordinal outcomes." Journal of Statistical Research, 58(1), 33-48. <doi:10.3329/jsr.v58i1.75411>. 2. Kowalewski, E.K., Weideman, A.M.K., & Koch, G.G. (2023). "SAS macro for randomization-based methods for covariance and stratified adjustment of win ratios and win odds for ordinal outcomes." SESUG 2023 Proceedings, Paper 139-2023.
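As a simple illustration of the two estimands (not the package's adjusted estimators), the following base R sketch computes an unadjusted win ratio and win odds from all pairwise comparisons of a simulated ordinal outcome, where higher values are assumed to be better:

win_stats <- function(y_test, y_control) {
  d <- outer(y_test, y_control, "-")                          # all pairwise comparisons
  wins <- sum(d > 0); losses <- sum(d < 0); ties <- sum(d == 0)
  c(win_ratio = wins / losses,                                # ignores ties
    win_odds  = (wins + 0.5 * ties) / (losses + 0.5 * ties))  # splits ties evenly
}
set.seed(1)
win_stats(sample(1:5, 40, replace = TRUE, prob = c(1, 2, 3, 3, 2)),
          sample(1:5, 40, replace = TRUE))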
Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low sample size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a setting where the dimensionality is more manageable, and are then able to better apply standard classification or clustering techniques, since we have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validation efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.
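The general workflow can be sketched in base R with simulated data as below; this is a generic project-then-classify illustration (unsupervised PCA followed by logistic regression), not one of the package's own label-aware methods:

set.seed(1)
n <- 40; d <- 200                                   # n << d (HDLSS)
X <- matrix(rnorm(n * d), n, d)
y <- rep(0:1, each = n / 2)
X[y == 1, 1:5] <- X[y == 1, 1:5] + 1.5              # signal in the first five coordinates
Z <- prcomp(X, rank. = 3)$x                         # unsupervised projection (PCA)
fit <- glm(y ~ Z, family = binomial)                # simple classifier in the reduced space
mean((predict(fit, type = "response") > 0.5) == y)  # in-sample accuracy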
The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. The existing methods for detecting significant expression dynamics often fail when the expression dynamics show a large heterogeneity. Moreover, these methods often cannot cope with irregular and sparse measurements. The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These allow the user to quickly recognise the relevance of a detection, to identify possible false positives, and to discriminate early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Paper: Albrecht, Marco, et al. (2017) <DOI:10.1186/s12859-016-1440-8>.
The tsgc package provides comprehensive tools for the analysis and forecasting of epidemic trajectories. It is designed to model the progression of an epidemic over time while accounting for the various uncertainties inherent in real-time data. Underpinned by a dynamic Gompertz model, the package adopts a state space approach, using the Kalman filter for flexible and robust estimation of the non-linear growth pattern commonly observed in epidemic data. The reinitialization feature enhances the model's ability to adapt to the emergence of new waves. The forecasts generated by the package are of value to public health officials and researchers who need to understand and predict the course of an epidemic to inform decision-making. Beyond its application in public health, the package is also a useful resource for researchers and practitioners in fields where the trajectories of interest resemble those of epidemics, such as innovation diffusion. The package includes functionalities for data preprocessing, model fitting, and forecast visualization, as well as tools for evaluating forecast accuracy. The core methodologies implemented in tsgc are based on well-established statistical techniques as described in Harvey and Kattuman (2020) <doi:10.1162/99608f92.828f40de>, Harvey and Kattuman (2021) <doi:10.1098/rsif.2021.0179>, and Ashby, Harvey, Kattuman, and Thamotheram (2024) <https://www.jbs.cam.ac.uk/wp-content/uploads/2024/03/cchle-tsgc-paper-2024.pdf>.
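To convey the idea behind the Gompertz specification, the simplified base R sketch below (not the package's state space implementation) uses the fact that for a Gompertz trajectory the logarithm of the growth rate of the cumulative series declines roughly linearly in time, which is the kind of relationship the model tracks and extrapolates:

t <- 1:60
Y <- 10000 * exp(-5 * exp(-0.1 * t))        # cumulative cases following a Gompertz curve
g <- diff(log(Y))                           # growth rate of the cumulative series
coef(lm(log(g) ~ t[-1]))                    # approximately linear, slope near -0.1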
The Data Driven I-V Feature Extraction is used to extract Current-Voltage (I-V) features from I-V curves. I-V curves indicate the relationship between current and voltage for a solar cell or Photovoltaic (PV) modules. The I-V features such as maximum power point (Pmp), shunt resistance (Rsh), series resistance (Rs), short circuit current (Isc), open circuit voltage (Voc), fill factor (FF), current at maximum power (Imp) and voltage at maximum power (Vmp) contain important information about the performance of PV modules. The traditional method uses the single diode model to model I-V curves and extract I-V features. This package does not use the diode model, but uses a data-driven method which selects different linear parts of the I-V curves to extract I-V features. This method also uses a sampling method to calculate uncertainties when extracting I-V features. Also, because of partially shaded arrays, "steps" occur in I-V curves. The "Segmented Regression" method is used to identify steps in I-V curves. This material is based upon work supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Number DE-EE0007140. Further information can be found in the following paper. [1] Ma, X. et al, 2019. <doi:10.1109/JPHOTOV.2019.2928477>.
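The gist of reading features directly off the measured curve can be sketched in base R as below; the synthetic I-V curve is invented for illustration, and none of the package's sampling or segmented-regression machinery is shown:

V <- seq(0, 40, by = 0.2)
I <- pmax(9 - 9 * exp((V - 40) / 3), 0)     # synthetic I-V curve of a PV module
P <- V * I
Pmp <- max(P)                               # maximum power point
Vmp <- V[which.max(P)]                      # voltage at maximum power
Imp <- I[which.max(P)]                      # current at maximum power
Isc <- I[V == 0]                            # short-circuit current (current at V = 0)
Voc <- max(V[I > 0])                        # open-circuit voltage (largest V with current flowing)
FF  <- Pmp / (Isc * Voc)                    # fill factor
c(Pmp = Pmp, Vmp = Vmp, Imp = Imp, Isc = Isc, Voc = Voc, FF = FF)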
R codes for the (adaptive) Sum of Powered Score ('SPU' and 'aSPU') tests, inverse variance weighted Sum of Powered Score ('SPUw' and 'aSPUw') tests, and gene-based and some pathway-based association tests: pathway-based Sum of Powered Score tests ('SPUpath'), the adaptive SPUpath ('aSPUpath') test, the 'GEEaSPU' test for multiple traits - single SNP (single nucleotide polymorphism) association in generalized estimation equations, the 'MTaSPUs' test for multiple traits - single SNP association with Genome Wide Association Studies ('GWAS') summary statistics, the Gene-based Association Test that uses an extended Simes procedure ('GATES'), the Hybrid Set-based Test ('HYST'), and an extended version of the GATES test for pathway-based association testing ('GATES-Simes'). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative. Summary: (1) single trait - SNP set association with individual-level data ('aSPU', 'aSPUw', 'aSPUr'); (2) single trait - SNP set association with summary statistics ('aSPUs'); (3) single trait - pathway association with individual-level data ('aSPUpath'); (4) single trait - pathway association with summary statistics ('aSPUsPath'); (5) multiple traits - single SNP association with individual-level data ('GEEaSPU'); (6) multiple traits - single SNP association with summary statistics ('MTaSPUs'); (7) multiple traits - SNP set association with summary statistics ('MTaSPUsSet'); (8) multiple traits - pathway association with summary statistics ('MTaSPUsSetPath').
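Conceptually, the SPU statistic with power gamma is simply the sum of the powered score components, and aSPU adapts over several powers by taking the most significant one. A toy base R sketch with a made-up score vector U (not this package's implementation):

set.seed(1)
U <- rnorm(10)                               # hypothetical score vector for 10 SNPs
gammas <- c(1, 2, 4, 8)
spu <- sapply(gammas, function(g) sum(U^g))  # SPU(gamma) = sum of U_j^gamma
names(spu) <- paste0("SPU(", gammas, ")")
spu                                          # p-values are then obtained by simulation or permutation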
Two papers published in the early 2000s (Zeeberg, B.R., Feng, W., Wang, G. et al. (2003) <doi:10.1186/gb-2003-4-4-r28>) and (Zeeberg, B.R., Qin, H., Narashimhan, S., et al. (2005) <doi:10.1186/1471-2105-6-168>) implement GoMiner and High Throughput GoMiner ('HTGM') to map lists of genes to the Gene Ontology (GO) <https://geneontology.org>. Until recently, these were hosted on a server at The National Cancer Institute (NCI). In order to continue providing these services to the bio-medical community, I have developed stand-alone versions. The current package HTGM builds upon my recent package 'GoMiner'. The output of GoMiner is a heatmap showing the relationship of a single list of genes and the significant categories into which they map. High Throughput GoMiner ('HTGM') integrates the results of the individual GoMiner analyses. The output of HTGM is a heatmap showing the relationship of the significant categories derived from each gene list. The heatmap has only two axes, so the identity of the genes is unfortunately "integrated out of the equation." Because the graphic for the heatmap is implemented in Scalable Vector Graphics (SVG) technology, it is relatively easy to hyperlink each picture element to the relevant list of genes. By clicking on the desired picture element, the user can recover the "lost" genes.
This package provides a music notation syntax and a collection of music programming functions for generating, manipulating, organizing, and analyzing musical information in R. Music syntax can be entered directly in character strings, for example to quickly transcribe short pieces of music. The package contains functions for directly performing various mathematical, logical and organizational operations and musical transformations on special object classes that facilitate working with music data and notation. The same music data can be organized in tidy data frames for a familiar and powerful approach to the analysis of large amounts of structured music data. Functions are available for mapping seamlessly between these formats and their representations of musical information. The package also provides an API to LilyPond (<https://lilypond.org/>) for transcribing musical representations in R into tablature ("tabs") and sheet music. LilyPond is open source music engraving software for generating high quality sheet music based on markup syntax. The package generates LilyPond files from R code and can pass them to the LilyPond command line interface to be rendered into sheet music PDF files or inserted into R markdown documents. The package offers nominal MIDI file output support in conjunction with rendering sheet music. The package can read MIDI files and attempts to structure the MIDI data to integrate as well as possible with the data structures and functionality found throughout the package.
This package provides functions for identifying, fitting, and applying continuous-space, continuous-time stochastic-process movement models to animal tracking data. The package is described in Calabrese et al (2016) <doi:10.1111/2041-210X.12559>, with models and methods based on those introduced and detailed in Fleming & Calabrese et al (2014) <doi:10.1086/675504>, Fleming et al (2014) <doi:10.1111/2041-210X.12176>, Fleming et al (2015) <doi:10.1103/PhysRevE.91.032107>, Fleming et al (2015) <doi:10.1890/14-2010.1>, Fleming et al (2016) <doi:10.1890/15-1607>, Péron & Fleming et al (2016) <doi:10.1186/s40462-016-0084-7>, Fleming & Calabrese (2017) <doi:10.1111/2041-210X.12673>, Péron et al (2017) <doi:10.1002/ecm.1260>, Fleming et al (2017) <doi:10.1016/j.ecoinf.2017.04.008>, Fleming et al (2018) <doi:10.1002/eap.1704>, Winner & Noonan et al (2018) <doi:10.1111/2041-210X.13027>, Fleming et al (2019) <doi:10.1111/2041-210X.13270>, Noonan & Fleming et al (2019) <doi:10.1186/s40462-019-0177-1>, Fleming et al (2020) <doi:10.1101/2020.06.12.130195>, Noonan et al (2021) <doi:10.1111/2041-210X.13597>, Fleming et al (2022) <doi:10.1111/2041-210X.13815>, Silva et al (2022) <doi:10.1111/2041-210X.13786>, Alston & Fleming et al (2023) <doi:10.1111/2041-210X.14025>.
Allows researchers to conduct multivariate statistical analyses of survey data with list experiments. This survey methodology is also known as the item count technique or the unmatched count technique and is an alternative to the commonly used randomized response method. The package implements the methods developed by Imai (2011) <doi:10.1198/jasa.2011.ap10415>, Blair and Imai (2012) <doi:10.1093/pan/mpr048>, Blair, Imai, and Lyall (2013) <doi:10.1111/ajps.12086>, Imai, Park, and Greene (2014) <doi:10.1093/pan/mpu017>, Aronow, Coppock, Crawford, and Green (2015) <doi:10.1093/jssam/smu023>, Chou, Imai, and Rosenfeld (2017) <doi:10.1177/0049124117729711>, and Blair, Chou, and Imai (2018) <https://imai.fas.harvard.edu/research/files/listerror.pdf>. This includes a Bayesian MCMC implementation of regression for the standard and multiple sensitive item list experiment designs and a random effects setup, a Bayesian MCMC hierarchical regression model with up to three hierarchical groups, the combined list experiment and endorsement experiment regression model, a joint model of the list experiment that enables the analysis of the list experiment as a predictor in outcome regression models, a method for combining list experiments with direct questions, and methods for diagnosing and adjusting for response error. In addition, the package implements the statistical test that is designed to detect certain failures of list experiments, and a placebo test for the list experiment using data from direct questions.
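The basic logic of the item count technique, before any of the regression machinery above, is the difference in mean item counts between the treatment list (which adds the sensitive item) and the control list. A minimal simulated sketch in base R, not using the package's estimators:

set.seed(2)
y_control <- rbinom(500, size = 3, prob = 0.4)                         # counts over 3 control items
y_treat   <- rbinom(500, size = 3, prob = 0.4) + rbinom(500, 1, 0.15)  # plus the sensitive item
mean(y_treat) - mean(y_control)                                        # estimated prevalence, roughly 0.15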
This package provides functions to implement the methods of the Flood Estimation Handbook (FEH), associated updates and the revitalised flood hydrograph model (ReFH). Currently the package uses NRFA peak flow dataset version 13. Aside from FEH functionality, further hydrological functions are available. Most of the methods implemented in this package are described in one or more of the following: "Flood Estimation Handbook", Centre for Ecology & Hydrology (1999, ISBN:0 948540 94 X). "Flood Estimation Handbook Supplementary Report No. 1", Kjeldsen (2007, ISBN:0 903741 15 7). "Regional Frequency Analysis - an approach based on L-moments", Hosking & Wallis (1997, ISBN: 978 0 521 01940 8). "Proposal of the extreme rank plot for extreme value analysis: with an emphasis on flood frequency studies", Hammond (2019, <doi:10.2166/nh.2019.157>). "Making better use of local data in flood frequency estimation", Environment Agency (2017, ISBN: 978 1 84911 387 8). "Sampling uncertainty of UK design flood estimation", Hammond (2021, <doi:10.2166/nh.2021.059>). "Improving the FEH statistical procedures for flood frequency estimation", Environment Agency (2008, ISBN: 978 1 84432 920 5). "Low flow estimation in the United Kingdom", Institute of Hydrology (1992, ISBN 0 948540 45 1). Wallingford HydroSolutions (2016, <http://software.hydrosolutions.co.uk/winfap4/Urban-Adjustment-Procedure-Technical-Note.pdf>). Data from the UK National River Flow Archive (<https://nrfa.ceh.ac.uk/>, terms and conditions: <https://nrfa.ceh.ac.uk/costs-terms-and-conditions>).
Restic is a program that does backups right and was designed with the following principles in mind:
Easy: Doing backups should be a frictionless process, otherwise you might be tempted to skip it. Restic should be easy to configure and use, so that, in the event of a data loss, you can just restore it. Likewise, restoring data should not be complicated.
Fast: Backing up your data with restic should only be limited by your network or hard disk bandwidth so that you can backup your files every day. Nobody does backups if it takes too much time. Restoring backups should only transfer data that is needed for the files that are to be restored, so that this process is also fast.
Verifiable: Much more important than backup is restore, so restic enables you to easily verify that all data can be restored.
Secure: Restic uses cryptography to guarantee confidentiality and integrity of your data. The location where the backup data is stored is assumed not to be a trusted environment (e.g. a shared space where others like system administrators are able to access your backups). Restic is built to secure your data against such attackers.
Efficient: With the growth of data, additional snapshots should only take the storage of the actual increment. Even more, duplicate data should be de-duplicated before it is actually written to the storage back end to save precious backup space.
An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows the user to construct an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, least-squares methods, etc. Furthermore, in this version we provide a universal framework named 'frbsPMML', which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language that provides a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are able to export and import an FRBS model to/from 'frbsPMML'. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.
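A minimal sketch of what an FRBS does at prediction time is given below in base R; the triangular membership functions and the two rules are invented for illustration and do not reflect this package's internal representation:

tri <- function(x, a, b, c) pmax(pmin((x - a) / (b - a), (c - x) / (c - b)), 0)
temp <- 27
mu_warm <- tri(temp, 20, 25, 30)   # membership of "temperature is warm"
mu_hot  <- tri(temp, 25, 35, 45)   # membership of "temperature is hot"
# Rules: IF temperature is warm THEN fan = 40; IF temperature is hot THEN fan = 80
(mu_warm * 40 + mu_hot * 80) / (mu_warm + mu_hot)   # weighted-average (Sugeno-style) output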
The package calculates indexes for selective strength in codon usage in bacterial species. (1) The package can calculate the strength of selected codon usage bias (sscu, also named s_index) based on Paul Sharp's method. The method takes into account the background mutation rate and focuses only on four pairs of codons with universal translational advantages in all bacterial species; thus the sscu index is comparable among different species. (2) The package can detect the strength of translational accuracy selection by Akashi's test. The test tabulates all codons into four categories according to whether the amino acid is conserved/variable and the codon is optimal/non-optimal. (3) Optimal codon lists (selected codons) can be calculated by either the op_highly function (which uses the highly expressed genes, compared with all genes, to identify optimal codons) or the op_corre_CodonW/op_corre_NCprime functions (a correlative method developed by Hershberg & Petrov). Users will have a list of optimal codons for further analysis, such as input to Akashi's test. (4) Detailed codon usage information, such as the RSCU value, the number of optimal codons in the highly expressed/all gene set, as well as the genomic gc3 value, can be calculated by the optimal_codon_statistics and genomic_gc3 functions. (5) Furthermore, we added one test function, low_frequency_op, to the package. The function tries to find the low-frequency optimal codons among all the optimal codons identified by the op_highly function.
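For orientation, RSCU and GC3 are simple summaries of codon counts. The base R sketch below uses hypothetical glycine codon counts and is not a call to this package's functions:

counts <- c(GGU = 120, GGC = 300, GGA = 60, GGG = 20)   # hypothetical glycine codon counts
round(counts / mean(counts), 2)                         # RSCU: observed count / mean count of synonymous codons
gc3 <- substr(names(counts), 3, 3) %in% c("G", "C")
sum(counts[gc3]) / sum(counts)                          # share of G/C at the third codon position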
The rapid development of single-cell transcriptomic technologies has helped uncover the cellular heterogeneity within cell populations. However, bulk RNA-seq continues to be the main workhorse for quantifying gene expression levels due to technical simplicity and low cost. To most effectively extract information from bulk data given the new knowledge gained from single-cell methods, we have developed a novel algorithm to estimate the cell-type composition of bulk data from a single-cell RNA-seq-derived cell-type signature. Comparison with existing methods using various real RNA-seq data sets indicates that our new approach is more accurate and comprehensive than previous methods, especially for the estimation of rare cell types. More importantly, our method can detect cell-type composition changes in response to external perturbations, thereby providing a valuable, cost-effective method for dissecting the cell-type-specific effects of drug treatments or condition changes. As such, our method is applicable to a wide range of biological and clinical investigations. Dampened weighted least squares ('DWLS') is an estimation method for gene expression deconvolution, in which the cell-type composition of a bulk RNA-seq data set is computationally inferred. This method corrects common biases towards cell types that are characterized by highly expressed genes and/or are highly prevalent, to provide accurate detection across diverse cell types. See: <https://www.nature.com/articles/s41467-019-10802-z.pdf> for more information about the development of DWLS and the methods behind our functions.
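To illustrate the deconvolution setup only, the simplified base R sketch below uses a simulated signature matrix and bulk profile and an ordinary least-squares step as a stand-in; it is not the dampened weighted least squares procedure used by DWLS:

set.seed(3)
S <- matrix(rexp(100 * 3, rate = 0.2), nrow = 100,
            dimnames = list(NULL, c("typeA", "typeB", "typeC")))  # 100 genes x 3 cell types
true_p <- c(0.7, 0.25, 0.05)
b <- as.vector(S %*% true_p) + rnorm(100, sd = 0.5)     # simulated bulk expression profile
p_hat <- coef(lm(b ~ S - 1))                            # unconstrained least-squares estimate
p_hat <- pmax(p_hat, 0)
round(p_hat / sum(p_hat), 3)                            # renormalised to proportions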
Calculates interval estimates for the parameters of linear models with heteroscedastic errors using the bootstrap-t (wild bootstrap) and double bootstrap-t (wild bootstrap). It is also possible to calculate confidence intervals using the percentile bootstrap and double percentile bootstrap. The package can also consistently estimate the covariance matrix of the parameters of linear regression models with heteroscedasticity of unknown form. The bootstrap methods exported by the package are based on the master's thesis of the first author, available at <https://raw.githubusercontent.com/prdm0/hcci/master/references/dissertacao_mestrado.pdf>. The hcci package in previous versions was cited in the book VINOD, Hrishikesh D. Hands-on Intermediate Econometrics Using R: Templates for Learning Quantitative Methods and R Software. 2022, p. 441, ISBN 978-981-125-617-2 (hardcover). The simple bootstrap schemes are based on the work of Cribari-Neto, F. and Lima, M. G. (2009) <doi:10.1080/00949650801935327>, while the double bootstrap schemes for the parameters that index linear models with heteroscedasticity of unknown form are based on the work of Beran (1987) <doi:10.2307/2336685>. The use of the bootstrap for the calculation of interval estimates in regression models with heteroscedasticity of unknown form from a weighting of the residuals was proposed by Wu (1986) <doi:10.1214/aos/1176350142>. This bootstrap scheme is known as the weighted or wild bootstrap.
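The wild bootstrap idea itself can be sketched in a few lines of base R; this is a generic illustration with simulated data and a percentile-type interval, not the hcci implementation:

set.seed(4)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)                 # error variance depends on x
fit <- lm(y ~ x); res <- resid(fit); yhat <- fitted(fit)
boot_slope <- replicate(999, {
  v <- sample(c(-1, 1), n, replace = TRUE)              # Rademacher weights on the residuals
  coef(lm(yhat + res * v ~ x))[2]
})
quantile(boot_slope, c(0.025, 0.975))                   # percentile-type interval for the slope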
An algorithm for identifying candidate driver combinations in cancer. CRSO is based on a theoretical model of cancer in which a cancer rule is defined to be a collection of two or more events (i.e., alterations) that are minimally sufficient to cause cancer. A cancer rule set is a set of cancer rules that collectively are assumed to account for all of the ways to cause cancer in the population. In CRSO every event is designated explicitly as a passenger or driver within each patient. Each event is associated with a patient-specific, event-specific passenger penalty, reflecting how unlikely the event would have happened by chance, i.e., as a passenger. CRSO evaluates each rule set by assigning all samples to a rule in the rule set, or to the null rule, and then calculating the total statistical penalty from all unassigned events. CRSO uses a three-phase procedure to find the best rule set of fixed size K for a range of Ks. A core rule set is then identified from among the best rule sets of size K as the rule set that best balances rule set size and statistical penalty. Users should consult the crso vignette for an example walk-through of a full CRSO run. The full description of the CRSO algorithm is presented in: Klein MI, Cannataro V, Townsend J, Stern DF and Zhao H. "Identifying combinations of cancer drivers in individual patients." BioRxiv 674234 [Preprint]. June 19, 2019. <doi:10.1101/674234>. Please cite this article if you use 'crso'.
Model for simulating language evolution in terms of cultural evolution (Smith & Kirby (2008) <DOI:10.1098/rstb.2008.0145>; Deacon 1997). The focus is on the emergence of argument-marking systems (Dowty (1991) <DOI:10.1353/lan.1991.0021>, Van Valin 1999, Dryer 2002, Lestrade 2015a), i.e. noun marking (Aristar (1997) <DOI:10.1075/sl.21.2.04ari>, Lestrade (2010) <DOI:10.7282/T3ZG6R4S>), person indexing (Ariel 1999, Dahl (2000) <DOI:10.1075/fol.7.1.03dah>, Bhat 2004), and word order (Dryer 2013), but extensions are foreseen. Agents start out with a protolanguage (a language without grammar; Bickerton (1981) <DOI:10.17169/langsci.b91.109>, Jackendoff 2002, Arbib (2015) <DOI:10.1002/9781118346136.ch27>) and interact through language games (Steels 1997). Over time, grammatical constructions emerge that may or may not become obligatory (for which the tolerance principle is assumed; Yang 2016). Throughout the simulation, uniformitarianism of principles is assumed (Hopper (1987) <DOI:10.3765/bls.v13i0.1834>, Givon (1995) <DOI:10.1075/z.74>, Croft (2000), Saffran (2001) <DOI:10.1111/1467-8721.01243>, Heine & Kuteva 2007), in which maximal psychological validity is aimed at (Grice (1975) <DOI:10.1057/9780230005853_5>, Levelt 1989, Gaerdenfors 2000) and language representation is usage based (Tomasello 2003, Bybee 2010). In Lestrade (2015b) <DOI:10.15496/publikation-8640>, Lestrade (2015c) <DOI:10.1075/avt.32.08les>, and Lestrade (2016) <DOI:10.17617/2.2248195>, which reported on the results of preliminary versions, this package was announced as WDWTW (for who does what to whom), but for reasons of pronunciation and generalization the title was changed.