| Title: | Heritability-Based Estimation of Sample Size for RNA-Seq Data |
|---|---|
| Description: | Provides tools for estimating sample sizes primarily based on heritability, while also considering additional parameters such as statistical power and fold change. The package normalizes heritability values according to trait-specific heritability and classification to enhance accuracy in sample size estimation. |
| Authors: | Naina Kumari [aut], Jagajjit Sahu [aut], Sarika Jaiswal [aut, cre], Mir Asif Iquebal [aut], Dinesh Kumar [aut] |
| Maintainer: | Sarika Jaiswal <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.1 |
| Built: | 2025-01-11 09:32:20 UTC |
| Source: | https://github.com/cran/HEssRNA |
This function processes heritability index data, filtering out empty trait names, and calculates the mean heritability for each unique trait. The resulting output is a data frame with traits and their corresponding mean heritability values.
hIndxMeanCalc4Traits(hIndexValDF)
hIndexValDF |
A data frame containing heritability index values with at least two columns: |
A data frame with two columns, Trait.name and MeanValue, where MeanValue represents the mean heritability for each trait.
Hu et al. (2018) doi:10.1093/nar/gky1084
# Example of usage:
hIndexValDF <- data.frame(
  Trait.name = c("Trait1", "Trait2", "Trait1", "Trait2"),
  Heritability = c(0.5, 0.6, 0.7, 0.8)
)
result <- hIndxMeanCalc4Traits(hIndexValDF)
print(result)
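The filtering and averaging that hIndxMeanCalc4Traits is described as performing can be approximated in base R. The following is a minimal sketch, not the package's source code: the helper name meanHeritabilityByTrait and the toy data are hypothetical, and it assumes the input columns are named Trait.name and Heritability as in the example above.

```r
# Hypothetical helper for illustration; not the HEssRNA source code.
meanHeritabilityByTrait <- function(df) {
  # Keep only rows with a non-missing, non-empty trait name
  keep <- !is.na(df$Trait.name) & nzchar(trimws(df$Trait.name))
  df <- df[keep, , drop = FALSE]
  # Average the heritability values within each trait
  out <- aggregate(Heritability ~ Trait.name, data = df, FUN = mean)
  names(out)[names(out) == "Heritability"] <- "MeanValue"
  out
}

# Toy input (made-up values) with one empty trait name that is dropped
toyDF <- data.frame(
  Trait.name = c("Trait1", "Trait2", "", "Trait1", "Trait2"),
  Heritability = c(0.5, 0.6, 0.9, 0.7, 0.8)
)
meanDF <- meanHeritabilityByTrait(toyDF)
print(meanDF)
```

The row with the empty trait name does not contribute to either mean, which mirrors the filtering step described above.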
This function calculates statistical power from count data and sample metadata. It filters the input count data, runs a DESeq2 analysis to identify differentially expressed genes (DEGs), and then estimates the power to detect these DEGs through simulation.
powerCalc(countDat, smplDat, alpha = 0.05, thrsholdFC = 2, inptNoOfReplicates = 3, sims = 10)
countDat |
A matrix or data frame of raw count data where rows represent genes and columns represent samples. |
smplDat |
A data frame of sample information, with at least a |
alpha |
The significance level (FDR threshold) used to identify differentially expressed genes. Default is 0.05. |
thrsholdFC |
The threshold for the absolute value of log2 fold change used to filter DEGs. Default is 2. |
inptNoOfReplicates |
The input number of replicates based on which the power will be calculated. Default is 3. |
sims |
The number of simulations to run for power calculation. Default is 10. |
Example files included with this package:
exmplCountDat.csv: A toy dataset with count data.
exmplSampleDat.csv: A sample dataset with metadata.
These files are stored in the inst/extdata directory and can be accessed using the system.file() function in R.
A data frame containing the calculated power values and related parameters.
Bi et al. (2016) doi:10.1186/s12859-016-0994-9
Love et al. (2014) doi:10.1186/s13059-014-0550-8
# Load example files
countDatPath <- system.file("extdata", "exmplCountDat.csv", package = "HEssRNA")
smplDatPath <- system.file("extdata", "exmplSampleDat.csv", package = "HEssRNA")
if (file.exists(countDatPath) && file.exists(smplDatPath)) {
  countDat <- read.csv(countDatPath)
  smplDat <- read.csv(smplDatPath)
  result <- powerCalc(countDat, smplDat)
  print(result$PowerResults)
} else {
  warning("Example data files not found.")
}
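The idea of estimating power by simulation can be illustrated for a single gene without DESeq2. The sketch below is a deliberately simplified stand-in and not what powerCalc does internally: it draws negative binomial counts for two groups and applies a Welch t-test on log2 counts in place of a DESeq2 analysis. The function name simulatePower and the values for baseMean and size are assumptions chosen only for illustration.

```r
# Simplified stand-in for simulation-based power; NOT the powerCalc algorithm.
# baseMean and size (negative binomial dispersion parameter) are assumed values.
simulatePower <- function(nRep, log2FC, baseMean = 100, size = 5,
                          alpha = 0.05, sims = 200) {
  detected <- replicate(sims, {
    # Simulate counts for one gene in control and treated samples
    ctrl <- rnbinom(nRep, mu = baseMean, size = size)
    trt <- rnbinom(nRep, mu = baseMean * 2^log2FC, size = size)
    # Welch t-test on log2 counts stands in for a DESeq2 test here
    p <- tryCatch(
      t.test(log2(trt + 1), log2(ctrl + 1))$p.value,
      error = function(e) NA_real_
    )
    isTRUE(p < alpha)
  })
  # Power is the fraction of simulations in which the gene was detected
  mean(detected)
}

set.seed(42)
pwr3 <- simulatePower(nRep = 3, log2FC = 2)
pwr10 <- simulatePower(nRep = 10, log2FC = 2)
print(c(replicates3 = pwr3, replicates10 = pwr10))
```

Increasing sims reduces the Monte Carlo noise in the estimate at the cost of run time, which is the same trade-off the sims argument of powerCalc controls.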
This function converts a data frame from the in-house wide format to a long format suitable for model building. It reshapes the replicate columns into rows, extracts the replicate number from each column name, and rounds the power values to 3 decimal places. Use it when your data frame follows the in-house format. To build the model, the data should also include the heritability class and the log fold change value.
prcesDF4modelInhouse(df4modelInhouseFmt)
df4modelInhouseFmt |
A data frame containing the input data in in-house format. The columns should include replicate columns named starting with "R" (e.g., R1, R2, etc.). |
A data frame in long format with columns:
NoOfReplicates |
Numeric representation of the replicate number extracted from column names (R1, R2, etc.). |
pwr |
Power values rounded to 3 decimal places corresponding to the replicate number. |
# Example of usage:
df <- data.frame(
  Gene = c("Gene1", "Gene2"),
  R1 = c(0.85, 0.90),
  R2 = c(0.88, 0.91),
  R3 = c(0.83, 0.89)
)
result <- prcesDF4modelInhouse(df)
print(result)
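The wide-to-long reshaping described for prcesDF4modelInhouse can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the helper name wideToLongPower is hypothetical, and it assumes replicate columns match the pattern R followed by digits while all other columns are carried along unchanged.

```r
# Hypothetical helper for illustration; not the HEssRNA source code.
wideToLongPower <- function(df) {
  # Replicate columns are assumed to be named R1, R2, ...
  repCols <- grep("^R[0-9]+$", names(df), value = TRUE)
  idCols <- setdiff(names(df), repCols)
  pieces <- lapply(repCols, function(col) {
    out <- df[, idCols, drop = FALSE]
    # Replicate number is taken from the column name, e.g. "R3" -> 3
    out$NoOfReplicates <- as.numeric(sub("^R", "", col))
    # Power values are rounded to 3 decimal places
    out$pwr <- round(df[[col]], 3)
    out
  })
  do.call(rbind, pieces)
}

# Toy wide-format input (made-up values)
wideDF <- data.frame(
  Gene = c("Gene1", "Gene2"),
  R1 = c(0.8512, 0.9),
  R2 = c(0.88, 0.91),
  R3 = c(0.83, 0.89)
)
longDF <- wideToLongPower(wideDF)
print(longDF)
```

Each input row yields one output row per replicate column, so two genes with three replicate columns give six rows.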
This function predicts the number of replicates required for a given experiment based on heritability, power, fold change, and tissue type. The model is constructed using the provided data, and the prediction is adjusted based on the selected trait's mean heritability value. The function ensures that the predicted replicates are valid, rounding negative or unrealistic values to sensible minimums based on the heritability class.
smplSizPred(df4model = df4modelInpt, hIndexMeanDFinput = hIndexMeanDF, heritabilityClass, inptPwr, fc, trait = NULL, tissue = NULL)
df4model |
A data frame containing the input data for the model. It should include the following columns: |
hIndexMeanDFinput |
A data frame containing the mean heritability values for each trait. It should include at least the columns |
heritabilityClass |
A character string specifying the heritability class used for filtering and adjusting the prediction. Possible values are "low", "mid", and "high". |
inptPwr |
A numeric value representing the power used in the model. |
fc |
A numeric value representing the fold change used in the model. |
trait |
An optional parameter specifying the trait. If provided, the heritability value for the trait will be used to adjust the heritability class values. |
tissue |
An optional parameter specifying the tissue type. If provided, the model will include tissue as a factor in the regression. If not provided, tissue is excluded. |
A numeric value representing the predicted number of replicates. The value is rounded to the nearest whole number and adjusted to ensure it is valid for the selected heritability class.
Sun et al. (2017) doi:10.1093/nar/gkx204
# Example usage:
df4modelInpt <- data.frame(
  NoOfReplicates = c(3, 5, 7, 9, 11),
  HeritabilityClass = c("high", "mid", "low", "high", "mid"),
  HeritabilityValue = c(0.5, 0.6, 0.7, 0.5, 0.6),
  pwr = c(0.8, 0.9, 0.85, 0.88, 0.86),
  FoldChange = c(2, 3, 2.5, 2.8, 3.2),
  Tissue = c("Liver", "Liver", "Kidney", "Liver", "Kidney")
)
hIndexMeanDF <- data.frame(
  Trait.name = c("Trait1", "Trait2"),
  MeanValue = c(0.3, 0.5)
)
NoOfReplicatesPred <- smplSizPred(
  df4model = df4modelInpt,
  hIndexMeanDFinput = hIndexMeanDF,
  heritabilityClass = "mid",
  inptPwr = 0.85,
  fc = 2.5,
  trait = "Trait1",
  tissue = "Liver"
)
print(NoOfReplicatesPred)
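Two steps in the smplSizPred description lend themselves to a short illustration: looking up a trait's mean heritability, and rounding a raw prediction while enforcing a minimum for the heritability class. The sketch below is hypothetical throughout. The helper names lookupTraitMean and clampReplicates are invented, and the per-class minimums in minByClass are placeholder values, since the values the package uses are not stated in this documentation.

```r
# Hypothetical helpers for illustration; not the HEssRNA source code.

# Return the mean heritability recorded for a trait
lookupTraitMean <- function(hIndexMeanDF, trait) {
  hIndexMeanDF$MeanValue[hIndexMeanDF$Trait.name == trait]
}

# Round a raw model prediction and enforce a per-class minimum
clampReplicates <- function(rawPred, heritabilityClass, minByClass) {
  floorVal <- minByClass[[heritabilityClass]]
  max(round(rawPred), floorVal)
}

traitMeans <- data.frame(
  Trait.name = c("Trait1", "Trait2"),
  MeanValue = c(0.3, 0.5)
)
# Placeholder minimums; the package's actual values are an unknown here
minByClass <- c(low = 3, mid = 2, high = 2)

traitMean <- lookupTraitMean(traitMeans, "Trait1")
negPred <- clampReplicates(-1.4, "low", minByClass)
okPred <- clampReplicates(6.6, "mid", minByClass)
print(c(traitMean = traitMean, negPred = negPred, okPred = okPred))
```

A negative raw prediction is replaced by the class minimum, while a realistic one is only rounded to the nearest whole number.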
This function generates a linear regression model to predict the number of replicates (NoOfReplicates) based on heritability, power, fold change, and tissue type. The form of the model depends on whether tissue information is provided. The function returns the fitted model.
smplSizPredModel(df4model = df4modelInpt, heritabilityClass, inptPwr, fc, trait = NULL, tissue = NULL)
df4model |
A data frame containing the input data for the model. It should include the following columns: |
heritabilityClass |
A character value indicating the class of heritability used for filtering the data. |
inptPwr |
A numeric value representing the power used in the model. |
fc |
A numeric value representing the fold change used in the model. |
trait |
An optional parameter to specify the trait. If provided, it can be used for further filtering, but it's not currently used in the function. |
tissue |
An optional parameter specifying the tissue type. If provided, the model will include the tissue information in the regression. If not provided, the model will exclude tissue information. |
A linear model object (lm class), which contains the fitted linear regression model for the number of replicates prediction.
Sun et al. (2017) doi:10.1093/nar/gkx204
# Example usage:
df4modelInpt <- data.frame(
  NoOfReplicates = c(3, 5, 7, 9, 11),
  HeritabilityClass = c("high", "mid", "low", "high", "mid"),
  HeritabilityValue = c(0.5, 0.6, 0.7, 0.5, 0.6),
  pwr = c(0.8, 0.9, 0.85, 0.88, 0.86),
  FoldChange = c(2, 3, 2.5, 2.8, 3.2),
  Tissue = c("Liver", "Liver", "Kidney", "Liver", "Kidney")
)

# Fit the model
model <- smplSizPredModel(
  df4model = df4modelInpt,
  heritabilityClass = "high",
  inptPwr = 0.8,
  fc = 2,
  tissue = "Liver"
)

# Summarize the results
summary(model)
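A linear model of this kind, with tissue as an optional term, can be sketched with lm() from base R. The exact formula used by smplSizPredModel is not given in this documentation, so the predictors chosen below (HeritabilityValue, pwr, FoldChange, and optionally Tissue) are an assumption, as are the helper name fitReplicateModel and the made-up toy data.

```r
# Hypothetical helper for illustration; the formula is an assumption,
# not the one used by smplSizPredModel.
fitReplicateModel <- function(df, useTissue = FALSE) {
  rhs <- c("HeritabilityValue", "pwr", "FoldChange")
  # Tissue enters the model only when requested
  if (useTissue) rhs <- c(rhs, "Tissue")
  lm(reformulate(rhs, response = "NoOfReplicates"), data = df)
}

# Toy training data (made-up values)
toyModelDF <- data.frame(
  NoOfReplicates = c(3, 4, 5, 6, 7, 8, 9, 10),
  HeritabilityValue = c(0.7, 0.6, 0.6, 0.5, 0.4, 0.4, 0.3, 0.2),
  pwr = c(0.70, 0.74, 0.80, 0.82, 0.85, 0.90, 0.91, 0.95),
  FoldChange = c(3.2, 3.0, 2.8, 2.5, 2.4, 2.2, 2.0, 1.8),
  Tissue = rep(c("Liver", "Kidney"), 4)
)

fitNoTissue <- fitReplicateModel(toyModelDF)
fitTissue <- fitReplicateModel(toyModelDF, useTissue = TRUE)

# Predict replicates for one hypothetical design point
newPoint <- data.frame(HeritabilityValue = 0.5, pwr = 0.85, FoldChange = 2.5)
predReps <- predict(fitNoTissue, newdata = newPoint)
print(predReps)
```

Building the formula with reformulate() keeps the two model variants in one code path, so the tissue term is the only difference between them.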