March 13, 2017

Combining data from multiple RNASeq experiments: release the Kruskal! (...Wallis test)

tl;dr When you combine RNASeq experiments, the assumption about normality across samples with the same nominal factors goes out the window. The standard tests give 16 transcripts even when the experiment batch effect is modelled fully. For transcript-level (e.g. Kallisto output) abundance data with at least ~25 samples, we can test two factors in a non-parametric (expression rank) way by using a Kruskal-Wallis test.

A post-hoc U-Test for each transcript passing Kruskal-Wallis will tell us what factors (or their interaction) are significantly changing the expression level. This is despite a global U-Test of transcripts turning up nothing. In the example below, we get 776 candidates in the combined analysis, with 71 in the factor interaction term. Many of those 71 are enriched in specific pathways, and do not overlap with the 16 found via parametric tests.
_________________________

Recently, I analyzed a RNASeq experiment redo.  The experiment was redone because the original yielded no useful significant transcript changes across the experimental factor contrast of interest (exercise vs. non-exercise).  The redone experiment did yield some significant contrasts for the exercise factor, but I thought to myself: can't we combine the two experiments and get even better statistical power? The short answer is no if you're using the standard tests  like Wald and Likelihood Ratio, because they assume replicates that are normally distributed.  We shouldn't assume that the transcript expression levels for replicates across the two experiments are normally distributed.

Here's a three step process to model the two-factor experiment non-paramterically with the Kruskal Wallis test. 1) Check for outliers, 2) Run Kruskal-Wallis, 3) Run post-hoc pairwise U-tests.

Step 1: Check for outliers

Note that since we are going to be using specific commands to gather the raw abundance data from Kallisto later, it's important that the samples be in alphabetical order in the metadata file.

meta <- read.table("meta_kallisto_combined_sorted.tab", header=TRUE)
meta$path <- as.character(meta$path)

We'll concatenation the two factors' values to yield four distinct factor level combination, giving us a base level (ctl_five), the individual factor effects (ctl_ten for time, exp_five for condition) and the interaction effect (exp_ten).

meta$groups <- as.factor(paste(meta$condition, meta$time, sep="_"))
meta
            sample                     path condition time   groups
1    D10_control_2   D10_control_2.kallisto       ctl  ten  ctl_ten
2    D10_control_3   D10_control_3.kallisto       ctl  ten  ctl_ten
3    D10_control_4   D10_control_4.kallisto       ctl  ten  ctl_ten
4    D10_control_5   D10_control_5.kallisto       ctl  ten  ctl_ten
5     D5_control_1    D5_control_1.kallisto       ctl five ctl_five
6     D5_control_2    D5_control_2.kallisto       ctl five ctl_five
7     D5_control_5    D5_control_5.kallisto       ctl five ctl_five
8     D5_control_6    D5_control_6.kallisto       ctl five ctl_five
9   Day10_Control1  Day10_Control1.kallisto       ctl  ten  ctl_ten
10  Day10_Control2  Day10_Control2.kallisto       ctl  ten  ctl_ten
11  Day10_Control3  Day10_Control3.kallisto       ctl  ten  ctl_ten
12 Day10_Exercise1 Day10_Exercise1.kallisto       exp  ten  exp_ten
13 Day10_Exercise2 Day10_Exercise2.kallisto       exp  ten  exp_ten
14 Day10_Exercise3 Day10_Exercise3.kallisto       exp  ten  exp_ten
15   Day5_Control1   Day5_Control1.kallisto       ctl five ctl_five
16   Day5_Control2   Day5_Control2.kallisto       ctl five ctl_five
17   Day5_Control3   Day5_Control3.kallisto       ctl five ctl_five
18  Day5_Exercise1  Day5_Exercise1.kallisto       exp five exp_five
19  Day5_Exercise2  Day5_Exercise2.kallisto       exp five exp_five
20  Day5_Exercise3  Day5_Exercise3.kallisto       exp five exp_five
21     Exercise_12     Exercise_12.kallisto       exp five exp_five
22     Exercise_13     Exercise_13.kallisto       exp  ten  exp_ten
23     Exercise_17     Exercise_17.kallisto       exp  ten  exp_ten
24     Exercise_18     Exercise_18.kallisto       exp  ten  exp_ten
25     Exercise_19     Exercise_19.kallisto       exp  ten  exp_ten
26      Exercise_1      Exercise_1.kallisto       exp five exp_five
27      Exercise_2      Exercise_2.kallisto       exp five exp_five
28      Exercise_7      Exercise_7.kallisto       exp five exp_five

Now let's read in the transcript abundance data gathered by running Kallisto.  The commands to put this file together are described in a previous blog post.

> abundance_full <- read.table("all_abundance.tsv", header=TRUE)

Now let's make a distance matrix for the samples based on the abundance data, in order to look for outliers that may interfere with our analysis.  This is probably why the first experiment (samples named Day*) yielded no results.
> library("pheatmap")
> library("RColorBrewer")
> library("PoiClaClu")
> poisd <- PoissonDistance(t(abundance_full))
> samplePoisDistMatrix <- as.matrix( poisd$dd )
> rownames(samplePoisDistMatrix) <- colnames(abundance_full)
> colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
> pheatmap(samplePoisDistMatrix,  clustering_distance_rows=poisd$dd,clustering_distance_cols=poisd$dd, col=colors)


What immediately leaps out to me is that the main grouping factor is the experiment, i.e. all the DayXXX samples are grouped together. The second biggest grouping factor within each experiment is the day (5 vs. 10). Also, good contrasts yield a "checkerboard" effect in the distance matrix.  This is much more like a mosaic.

Does this mean that it'll be tough to pull out the exercise factor effect from the combined dataset? This is where a non-parametric test helps out, because it does not assume a normal distribution of the effect amongst "replicate" samples, just that the direction of the effect is the same (I'm glossing over details here, but it's generally true). Let's look at the residual sum of squares (RSS) of the model fit to get a relative measure of how well we are modelling the data with the known factors.

so <- sleuth_prep(meta, ~condition*time)
so <- sleuth_fit(so)
sum(so$fits$full$summary[,2])
[1] 1159405

Getting back to the distance matrix and outlier analysis, do you see the light (distant) streak running through the matrix for samples 9 and 12 (X-axis label)? They are all off on their own (cluster tree on the left margin), and don't relate nicely via the general factor trends. Let's try excluding them from the model and see how it affects our RSS.

meta_no_outliers <- meta[c(1:8,10:11,13:28),]
so_no_outliers <- sleuth_prep(meta_no_outliers, ~condition*time)
so_no_outliers <- sleuth_fit(so_no_outliers)
sum(so_no_outliers$fits$full$summary[,2])
[1] 1052599

So we reduces the RSS by over 9% by excluding 7% of the samples (2 of 28). Generally, excluding a good replicate will reduce the RSS by only up to half its sample weight (i.e. we'd expect an RSS reduction of <5% for these two samples, but we got 9% so they are really outliers).  Maybe these samples were prepared separately, etc. we'd have to go back to the lab book to look for an explanation.

Sample 27 from the second experiment run also looks a bit fishy, all off on its one in the cluster tree.  Let's try dropping it too.

meta_no_outliers_redux <- meta[c(1:8,10:11,13:26,28),]
so_redux <- sleuth_prep(meta_no_outliers_redux, ~condition*time)
so_redux <- sleuth_fit(so_redux)
sum(so_redux$fits$full$summary[,2])
[1] 1048939

The exclusion of Sample 27 (Exercise_2) dropped the RSS by a measly incremental 0.35% even though we dropped 3.8% of the sample (1 of 26). We should keep it because it isn't an outlier contributing to a bad model fit.

While it's fresh in our minds, let's assign an experiment factor to the metadata, which as you can see in the bottom right of the matrix is samples 10-20 once we eliminate the two big outliers. This will help us reduce the RSS later.

> meta_no_outliers$which_experiment <- as.factor(c(rep.int(0,8),rep.int(1,10),rep(0,8)))
> meta_no_outliers
            sample                     path condition time   groups which_experiment
1    D10_control_2   D10_control_2.kallisto      ctl  ten  ctl_ten                0
2    D10_control_3   D10_control_3.kallisto      ctl  ten  ctl_ten                0
3    D10_control_4   D10_control_4.kallisto      ctl  ten  ctl_ten                0
4    D10_control_5   D10_control_5.kallisto      ctl  ten  ctl_ten                0
5     D5_control_1    D5_control_1.kallisto      ctl five ctl_five                0
6     D5_control_2    D5_control_2.kallisto      ctl five ctl_five                0
7     D5_control_5    D5_control_5.kallisto      ctl five ctl_five                0
8     D5_control_6    D5_control_6.kallisto      ctl five ctl_five                0
10  Day10_Control2  Day10_Control2.kallisto      ctl  ten  ctl_ten                1
11  Day10_Control3  Day10_Control3.kallisto      ctl  ten  ctl_ten                1
13 Day10_Exercise2 Day10_Exercise2.kallisto      exp  ten  exp_ten                1
14 Day10_Exercise3 Day10_Exercise3.kallisto      exp  ten  exp_ten                1
15   Day5_Control1   Day5_Control1.kallisto      ctl five ctl_five                1
16   Day5_Control2   Day5_Control2.kallisto      ctl five ctl_five                1
17   Day5_Control3   Day5_Control3.kallisto      ctl five ctl_five                1
18  Day5_Exercise1  Day5_Exercise1.kallisto      exp five exp_five                1
19  Day5_Exercise2  Day5_Exercise2.kallisto      exp five exp_five                1
20  Day5_Exercise3  Day5_Exercise3.kallisto      exp five exp_five                1
21     Exercise_12     Exercise_12.kallisto      exp five exp_five                0
22     Exercise_13     Exercise_13.kallisto      exp  ten  exp_ten                0
23     Exercise_17     Exercise_17.kallisto      exp  ten  exp_ten                0
24     Exercise_18     Exercise_18.kallisto      exp  ten  exp_ten                0
25     Exercise_19     Exercise_19.kallisto      exp  ten  exp_ten                0
26      Exercise_1      Exercise_1.kallisto      exp five exp_five                0
27      Exercise_2      Exercise_2.kallisto      exp five exp_five                0
28      Exercise_7      Exercise_7.kallisto      exp five exp_five                0


Step 2: Run the non-parametric tests

Let's first try the non-parametric U-test (a.k.a. Wilcox or Mann-Whitney test) to check for difference between the exercise and non-exercise samples.

> abundance <- abundance_full[rowSums(abundance_full)>length(meta_no_outliers$sample),c(1:8,10:11,13:28)]

> wil <- function(x){df <- data.frame(tpm=t(abundance[x,])[,1], fctr=m); w <- wilcox.test(tpm ~ fctr, data=df); w$p.value}  #[ed. m will be populated with the factor values for each sample in the next function]

> calc_wilcox_qvals <- function(combo){m <<- as.factor(combo); pvals <<- c(); for(i in 1:length(abundance[,1])){pvals <<- append(pvals,wil(i))}; return(p.adjust(pvals, method="fdr"))}

> condition_wilcox_qvals <- calc_wilcox_qvals(meta_no_outliers$condition)
There were 50 or more warnings (use warnings() to see the first 50) #[ed. This is due to rank ties in the U-test, you can generally ignore these]

> table(condition_wilcox_qvals < .05)

FALSE 
23953

D'oh! We don't have any transcript differential expression picked up by the U-test that survive the False Discovery Rate (Benjaimin-Hochberg in this case) multiple testing correction.  We know from previous experience that we might pick up a two-level contrast once we have at least 15 Kallisto samples.  We can also show that we have the power to do so in this experiment by looking at one of the primary grouping factors in the distance matrix: the time levels of Day5 vs Day10.

> time_wilcox_qvals <- calc_wilcox_qvals(meta_no_outliers$time)
There were 50 or more warnings (use warnings() to see the first 50)

> table(time_wilcox_qvals < .05)

FALSE  TRUE 
21176  2777 

If exercise-vs.-non isn't separating by rank nicely across the two experiments, this might be because at Day 5, the effect of exercise isn't very strong so those exercise samples mix ranks with the non-exercise samples.  The way to improve our power to detect the Day10 exercise effect would be to look at the interaction (synergistic) effect of the day and exercise factors, as per the "groups" metadata factor we created earlier that has four levels. The U-test only works with two factor levels.  The generalization of it for more than two factor levels in the Kruskal-Wallis test, which we'll employ here.

> kru <- function(x){d <- data.frame(tpm=t(abundance[x,])[,1], groups=m); k <- kruskal.test(tpm ~ groups, data=d); k$p.value}


> kru_qvals <- function(combo){m <<- as.factor(combo); pvals <<- c(); for(i in 1:length(abundance[,1])){pvals <<- append(pvals,kru(i))}; return(p.adjust(pvals, method="fdr"))}
> k <- kru_qvals(meta_no_outliers$groups)
> table(k < .05)

FALSE  TRUE 
23193   760 

[Aside: An astute reader will note that we have less than 8 samples in each of the four factor levels, yet we are picking out significant contrasts when the simpler U-test is underpowered at <8 samples per factor level.  Because we have four factor levels, there are more unique permutations possible of the ranks than when there's only two, hence the better performance here.  Thinking of it another way, for any fixed string length, the number of unique ways to write 1's and 0's is << the number of ways to write 0, 1, 2, and 3. More unique ranking choices means more information entropy, which means greater statistical power for any given factor level.]

Let's note the Kruskal-Wallis significant transcripts for later.  

> kru_sig_abundance <- abundance[which(k < .05),]

> sig_ks <- k[which(k < .05)]

Let's do a sanity check and see if a random labelling of the samples gives any significant changes (it shouldn't).

k_random <- kru_qvals(sample(meta_no_outliers$groups))
> table(k_random < .05)

FALSE 

23953 

Phew! You could run the random sampling a bunch of times, that's left as an exercise to the reader.


Stage 3: Post-hoc significance testing with many factor levels

Now, we know that 760 transcripts have factor-levels with significantly different rankings according to our groups label, but which of the groups factor's four levels are the ones that are significant?  We need to contrast each pair of factor levels. This page from the R Companion provides a nice summary of ways to do the post-hoc analysis of the differing groups. Let's use the U-Test since it behaves nicely without an even number of items in each category (remember that we removed two samples, and if you don't believe me...


> table(meta_no_outliers$groups)

ctl_five  ctl_ten exp_five  exp_ten 

       7        6        7        6 
).

No only do we want to use the U-test (a.k.a. Mann-Whitney or Wilcox) for each pair of factor levels and do multiple testing correction (FDR again), but we need to load a couple of libraries that'll give us useful functions to print out the group comparisons later.

> library(multcompView)
> library(rcompanion)


> categorize <- function(x){pt <- pairwise.wilcox.test(as.numeric(x), meta_no_outliers$groups, p.adjust.method="fdr"); multcompLetters(fullPTable(pt$p.value), compare="<", threshold=0.05, Letters=letters, reverse=FALSE)$Letters}

> kru_sig_diff_groups <- apply(kru_sig_abundance, 1, categorize)


This will give us for each of the 760 transcripts a compact representation of the significant grouping information for the 4 factor levels, e.g. 

> kru_sig_diff_groups[,1]
ctl_five ctl_ten exp_five exp_ten
"ab"     "c"     "a"      "bc"

This means that ctl_five is statistically indistinguishable from exp_five (they shared 'a') and exp_ten(they share 'b'), ctl_ten is indistinguishable from exp_ten (they share 'c'), exp_five and exp_ten are distinct (they share no letters), etc..  Let's write a function to keep only the U-test results that have exp_ten (our interaction term, index 4) distinct from either non-exercise control time points (indices 1& 2).


> exercise_only <- function(x){!grepl(x[1],x[4]) && !grepl(x[4],x[1]) && !grepl(x[2],x[4]) && !grepl(x[4],x[2])}
> eo <- apply(kru_sig_diff_groups, 2, exercise_only)
> table(eo)
eo
FALSE  TRUE 

  689    71 
> kru_exercise_sigids <- colnames(kru_sig_diff_groups[,eo])

Eureka!  We have 71 significant Day10 exercise differentially expressed transcripts across both experiments.  This was done based solely on expression rank of transcripts across samples, so we didn't need the experiments to be truly replicates, just moving in the same direction generally.  We could write a function to calculate the log2 fold changes, but it'd get complicated as we have multiple factor levels, and won't include estimates of technical variability, etc that Sleuth provides.  Even if the parametric test results from Sleuth aren't going to be significant due to the non-normality of the samples, the fold change information is useful so let's run it. 

> so <- sleuth_prep(meta_no_outliers, ~condition*time+which_experiment+condition:which_experiment+time:which_experiment)
> sum(so$fits$full$summary[,2])
[1] 864793.9

Note that this is a much better RSS than originally (~1160K). We did two things: exclude the outliers (~9% drop in RSS), and we explicitly modelled the experiments and their effect on the other factors (~18% drop in RSS). This will help the Sleuth results for fold-change* be more realistic. Run the Wald and Likelihood Ratio (LRT) tests.

> so <- sleuth_fit(so, ~time+which_experiment+time:which_experiment, "no_condition")
...
> so <- sleuth_fit(so, ~condition+which_experiment+condition:which_experiment, "no_time")
...
> so <- sleuth_fit(so, ~condition+time+which_experiment+condition:which_experiment+time:which_experiment, "no_interaction")
...
> so <- sleuth_lrt(so, 'no_condition', 'full')
> so <- sleuth_lrt(so, 'no_time', 'full')
> so <- sleuth_lrt(so, 'no_interaction', 'full')
> lrt_condition <- sleuth_results(so, 'no_condition:full', test_type = 'lrt')
> lrt_time <- sleuth_results(so, 'no_time:full', test_type = 'lrt')
> lrt_interaction <- sleuth_results(so, 'no_interaction:full', test_type = 'lrt')
> so <- sleuth_wt(so, 'conditionexp')
> so <- sleuth_wt(so, 'timeten')
> so <- sleuth_wt(so, 'conditionexp:timeten')
> so <- sleuth_wt(so, 'conditionexp:which_experiment1')
> so <- sleuth_wt(so, 'timeten:which_experiment1')
> wt_condition <- sleuth_results(so, 'conditionexp')
> wt_time <- sleuth_results(so, 'timeten')
> wt_interaction <- sleuth_results(so, 'conditionexp:timeten')
> wt_condition1 <- sleuth_results(so, 'conditionexp:which_experiment1')
> wt_time1 <- sleuth_results(so, 'timeten:which_experiment1')

Did the parametric tests show up anything significant?

> table(wt_condition$qval < .05)

FALSE  TRUE 
33456    16 
> table(lrt_condition$qval < .05)

FALSE  TRUE 
33466     6 
> table(lrt_interaction$qval < .05)

A few, but not as many as our non-parametric Kruskal-Wallis (71).  The LRT (6) results are a subset of the Wald results (10).  Let's put the non-paramteric and parametric results together for reporting.

> all_sigids <- unique(c(wt_condition_sigids,kru_exercise_sigids))
> length(all_sigids)
[1] 87

87 = 71 + 16 so there is no overlap in the two tests' results.  Let's define a function that provides the post-hoc pairwise U-test results when available, or NAs when not available, for reporting purposes.

> kru_data <- function(id){if(id %in% colnames(kru_sig_diff_groups)){c(kru_sig_diff_groups[,which(colnames(kru_sig_diff_groups) == id)],sig_ks[which(colnames(kru_sig_diff_groups)==id)])}else{rep(NA,5)}}

Gather the Wald test results (which includes the fold-change*) and the pairwise U-test results for each of the 87 significantly differentially expressed transcripts, then transpose the rows and columns, then write to a file.

> dl <- vapply(all_sigids, function(id){unlist(c(wt_condition[wt_condition$target_id == id,],wt_condition1[wt_condition1$target_id == id,],wt_time[wt_time$target_id == id,],kru_data(id)))}, FUN.VALUE=rep("1", 38))

> write.csv(t(dl), "kruskal_wallis_q_lt_0.05_u_test_fdr_0.05.csv")

I'll load this into Excel and remove extraneous Wald test result columns that just confuse most people reading the analysis, and convert the beta values to log2 fold-changes* with an Excel formula for users'  didactic purposes.**



If you look at the exp_five column, you'll see that it is often indistinguishable from (shares a letter with) one or both of the controls. This explains why the two-factor level U-test failed, and why using the Kruskal-Wallis test was key to seeing the interaction effect.  As per usual, take any individual transcript with a grain of salt, but look for pathway enrichment in IPA, DAVID, etc.


________
* Not really fold-change, but the natural logarithm of the bias estimator.  Read the Sleuth manual for details. It's worth it.

** A clever reader will note that you can change the log base with a constant value. Once again, I used this formula for didactic purposes, making the base conversion more explicit.

No comments:

Post a Comment