16S_Soil_Microbiome2017_FA19
16S_Soil_Microbiome2017_FA19
R Packages
The following code uses several packages including vegan for community ecology, tidyr and plyr for organizing data tables and manipulating strings of text, and ggplot for plotting data.
Project Summary
The following code is used to analyze 16S data from bulk soils collected from 47 Juglans nigra trees. Raw reads were generated on the Illumina Miseq platform (2x250) and processed into OTUs using the bioinformatics pipeline Mothur. Following sequence processing in Mothur a total of 37,604 OTUs were generated from 181,195 unique sequences (2,256,393 sequences total from 2,933,074 reads prior to processing).Taxonomic assignments were made in Mothur using the SILVA database. OTUs were generated in MOTHUR using a 97% similarity threshold for sequences aligned to the V4 region of the 16S rRNA region.
Data Import
Below raw OTU tables (data has not yet been rarefied), metadata which indicates the state from which the sample was collected and clone ID, and the taxonomic classifications for each OTU are imported into R.
#First I import the whole OTU table I recieved from Mothur. Note at this point taxonomy is not included, it is just raw OTU counts (columns) by sample or group (rows). Below I indicate that the row.names are in column 2 of the imported file.
bigfile.sample.soil<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16sotutable_jnigra17_fa19_ajo.shared",header=TRUE,sep="\t",fill=TRUE,strip.white=TRUE, row.names=2)
#Next I import the metadata for the particular habitat of interest. In this case I am working with soils. I will use this metadata to subset the larger OTU file above.
soil.met<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16smetadata_jnigra17_su19_ajo.txt",header=TRUE,sep="\t",fill=TRUE,strip.white=TRUE)
#Next I import the taxonomy file. In earlier versions of this code, this contained taxonomic assignments for the whole data set and not just a single habitat. Current version is only 16S soil data.
taxonomy.soil<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16staxonomy_jnigra17_fa19_ajo.cons.taxonomy",header=TRUE,sep="\t",fill=TRUE,strip.white=TRUE,row.names=1)
soil.names<-soil.met$Group
Data Subsetting
Below I subset the OTU table to remove mothur inserted metadata which indicates the percent identity threshold and the number of OTUs for the whole habitat.
#Here I use the subset function to pull out just the samples belonging to soils.
soil<-subset(bigfile.sample.soil, rownames(bigfile.sample.soil) %in% soil.names)
#Here I am removing any OTU that is not present in any samples. What the below function is doing and summing the columns (OTUs) and any OTU that equals 0 is removed from the dataset. This will hopefully speed up downstream analyses. This line is important when processing multiple habitats that were processed together but analyzed spearately
soil.pure<- soil[,colSums(soil)>0]
#Here I am further subsetting the data. I am removing some MOTHUR metadata that is useless.
soil.pureotu<-soil.pure[,3:37606]
Rarefaction and Singleton Removal
To evaluate coverage and differences in sequencing depth between samples, I generate rarefaction curves with sequencing depth on the x-axis and number of OTUs on the y-axis. These curves can be used to evaluate where to set the rarefaction cut-off to reduce sampling biases when calculating alpha-diversity measures such as diversity and observed richness. Additionally, to reduce the effects of rare taxa on ordinations, I remove singletons here as well.
Rarefaction Curves
Below I generate the rarefaction curves using the rarecurve function from vegan. This will take sometime to run, particularly on larger OTU tables. The output is then subset so that individual samples can be plotted in ggplot.
#Below I create the rarefaction curves for each sample using the rarecurve function from vegan.
library(vegan)
rarebacsoilcurve<-rarecurve(soil.pureotu)
#I then create an object that spans the range of the OTU richness. This will be used to plot the vertical line in my ggplot rarefaction curve.
y<-(0:6500)
#I then create objects containing the x and y coordinates values for each sample.
sequencesoil47<-attr(rarebacsoilcurve[[47]],which="Subsample")
sequencesoil46<-attr(rarebacsoilcurve[[46]],which="Subsample")
sequencesoil45<-attr(rarebacsoilcurve[[45]],which="Subsample")
sequencesoil44<-attr(rarebacsoilcurve[[44]],which="Subsample")
sequencesoil43<-attr(rarebacsoilcurve[[43]],which="Subsample")
sequencesoil42<-attr(rarebacsoilcurve[[42]],which="Subsample")
sequencesoil41<-attr(rarebacsoilcurve[[41]],which="Subsample")
sequencesoil40<-attr(rarebacsoilcurve[[40]],which="Subsample")
sequencesoil39<-attr(rarebacsoilcurve[[39]],which="Subsample")
sequencesoil38<-attr(rarebacsoilcurve[[38]],which="Subsample")
sequencesoil37<-attr(rarebacsoilcurve[[37]],which="Subsample")
sequencesoil36<-attr(rarebacsoilcurve[[36]],which="Subsample")
sequencesoil35<-attr(rarebacsoilcurve[[35]],which="Subsample")
sequencesoil34<-attr(rarebacsoilcurve[[34]],which="Subsample")
sequencesoil33<-attr(rarebacsoilcurve[[33]],which="Subsample")
sequencesoil32<-attr(rarebacsoilcurve[[32]],which="Subsample")
sequencesoil31<-attr(rarebacsoilcurve[[31]],which="Subsample")
sequencesoil30<-attr(rarebacsoilcurve[[30]],which="Subsample")
sequencesoil29<-attr(rarebacsoilcurve[[29]],which="Subsample")
sequencesoil28<-attr(rarebacsoilcurve[[28]],which="Subsample")
sequencesoil27<-attr(rarebacsoilcurve[[27]],which="Subsample")
sequencesoil26<-attr(rarebacsoilcurve[[26]],which="Subsample")
sequencesoil25<-attr(rarebacsoilcurve[[25]],which="Subsample")
sequencesoil24<-attr(rarebacsoilcurve[[24]],which="Subsample")
sequencesoil23<-attr(rarebacsoilcurve[[23]],which="Subsample")
sequencesoil22<-attr(rarebacsoilcurve[[22]],which="Subsample")
sequencesoil21<-attr(rarebacsoilcurve[[21]],which="Subsample")
sequencesoil20<-attr(rarebacsoilcurve[[20]],which="Subsample")
sequencesoil19<-attr(rarebacsoilcurve[[19]],which="Subsample")
sequencesoil18<-attr(rarebacsoilcurve[[18]],which="Subsample")
sequencesoil17<-attr(rarebacsoilcurve[[17]],which="Subsample")
sequencesoil16<-attr(rarebacsoilcurve[[16]],which="Subsample")
sequencesoil15<-attr(rarebacsoilcurve[[15]],which="Subsample")
sequencesoil14<-attr(rarebacsoilcurve[[14]],which="Subsample")
sequencesoil13<-attr(rarebacsoilcurve[[13]],which="Subsample")
sequencesoil12<-attr(rarebacsoilcurve[[12]],which="Subsample")
sequencesoil11<-attr(rarebacsoilcurve[[11]],which="Subsample")
sequencesoil10<-attr(rarebacsoilcurve[[10]],which="Subsample")
sequencesoil9<-attr(rarebacsoilcurve[[9]],which="Subsample")
sequencesoil8<-attr(rarebacsoilcurve[[8]],which="Subsample")
sequencesoil7<-attr(rarebacsoilcurve[[7]],which="Subsample")
sequencesoil6<-attr(rarebacsoilcurve[[6]],which="Subsample")
sequencesoil5<-attr(rarebacsoilcurve[[5]],which="Subsample")
sequencesoil4<-attr(rarebacsoilcurve[[4]],which="Subsample")
sequencesoil3<-attr(rarebacsoilcurve[[3]],which="Subsample")
sequencesoil2<-attr(rarebacsoilcurve[[2]],which="Subsample")
sequencesoil1<-attr(rarebacsoilcurve[[1]],which="Subsample")
#Here I plot the rarefaction curves.
library(ggplot2)
library(ggpubr)
soilrarecurv<-ggplot()+
geom_line(aes(x=sequencesoil47,y=rarebacsoilcurve[[47]]), size=1)+
geom_line(aes(x=sequencesoil46,y=rarebacsoilcurve[[46]]),size=1)+
geom_line(aes(x=sequencesoil45,y=rarebacsoilcurve[[45]]),size=1 )+
geom_line(aes(x=sequencesoil44,y=rarebacsoilcurve[[44]]),size=1)+
geom_line(aes(x=sequencesoil43,y=rarebacsoilcurve[[43]]),size=1)+
geom_line(aes(x=sequencesoil42,y=rarebacsoilcurve[[42]]),size=1)+
geom_line(aes(x=sequencesoil41,y=rarebacsoilcurve[[41]]),size=1)+
geom_line(aes(x=sequencesoil40,y=rarebacsoilcurve[[40]]),size=1)+
geom_line(aes(x=sequencesoil39,y=rarebacsoilcurve[[39]]), size=1)+
geom_line(aes(x=sequencesoil38,y=rarebacsoilcurve[[38]]), size=1)+
geom_line(aes(x=sequencesoil37,y=rarebacsoilcurve[[37]]), size=1)+
geom_line(aes(x=sequencesoil36,y=rarebacsoilcurve[[36]]),size=1)+
geom_line(aes(x=sequencesoil35,y=rarebacsoilcurve[[35]]),size=1 )+
geom_line(aes(x=sequencesoil34,y=rarebacsoilcurve[[34]]),size=1)+
geom_line(aes(x=sequencesoil33,y=rarebacsoilcurve[[33]]),size=1)+
geom_line(aes(x=sequencesoil32,y=rarebacsoilcurve[[32]]),size=1)+
geom_line(aes(x=sequencesoil31,y=rarebacsoilcurve[[31]]),size=1)+
geom_line(aes(x=sequencesoil30,y=rarebacsoilcurve[[30]]),size=1)+
geom_line(aes(x=sequencesoil29,y=rarebacsoilcurve[[29]]), size=1)+
geom_line(aes(x=sequencesoil28,y=rarebacsoilcurve[[28]]), size=1)+
geom_line(aes(x=sequencesoil27,y=rarebacsoilcurve[[27]]), size=1)+
geom_line(aes(x=sequencesoil26,y=rarebacsoilcurve[[26]]),size=1)+
geom_line(aes(x=sequencesoil25,y=rarebacsoilcurve[[25]]),size=1 )+
geom_line(aes(x=sequencesoil24,y=rarebacsoilcurve[[24]]),size=1)+
geom_line(aes(x=sequencesoil23,y=rarebacsoilcurve[[23]]),size=1)+
geom_line(aes(x=sequencesoil22,y=rarebacsoilcurve[[22]]),size=1)+
geom_line(aes(x=sequencesoil21,y=rarebacsoilcurve[[21]]),size=1)+
geom_line(aes(x=sequencesoil20,y=rarebacsoilcurve[[20]]),size=1)+
geom_line(aes(x=sequencesoil19,y=rarebacsoilcurve[[19]]), size=1)+
geom_line(aes(x=sequencesoil18,y=rarebacsoilcurve[[18]]),size=1)+
geom_line(aes(x=sequencesoil17,y=rarebacsoilcurve[[17]]),size=1)+
geom_line(aes(x=sequencesoil16,y=rarebacsoilcurve[[16]]),size=1)+
geom_line(aes(x=sequencesoil15,y=rarebacsoilcurve[[15]]),size=1)+
geom_line(aes(x=sequencesoil14,y=rarebacsoilcurve[[14]]), size=1)+
geom_line(aes(x=sequencesoil13,y=rarebacsoilcurve[[13]]), size=1)+
geom_line(aes(x=sequencesoil12,y=rarebacsoilcurve[[12]]),size=1)+
geom_line(aes(x=sequencesoil11,y=rarebacsoilcurve[[11]]),size=1 )+
geom_line(aes(x=sequencesoil10,y=rarebacsoilcurve[[10]]),size=1)+
geom_line(aes(x=sequencesoil9,y=rarebacsoilcurve[[9]]),size=1)+
geom_line(aes(x=sequencesoil8,y=rarebacsoilcurve[[8]]),size=1)+
geom_line(aes(x=sequencesoil7,y=rarebacsoilcurve[[7]]),size=1)+
geom_line(aes(x=sequencesoil6,y=rarebacsoilcurve[[6]]),size=1)+
geom_line(aes(x=sequencesoil5,y=rarebacsoilcurve[[5]]), size=1)+
geom_line(aes(x=sequencesoil4,y=rarebacsoilcurve[[4]]),size=1)+
geom_line(aes(x=sequencesoil3,y=rarebacsoilcurve[[3]]),size=1)+
geom_line(aes(x=sequencesoil2,y=rarebacsoilcurve[[2]]),size=1)+
geom_line(aes(x=sequencesoil1,y=rarebacsoilcurve[[1]]),size=1)+
geom_path(aes(x=37329,y=y), linetype=5)+
#scale_color_manual(values=c("#00CCCC","#336699","#336699","#336699","#663300","#663300","#00CCCC","#CC9933","#CC9933","#CC9933","#0000FF","#0000FF","#006600","#006600"))+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.background = element_blank(),
panel.border=element_rect(color="black", size=1, fill=NA))+
theme(axis.title.x=element_text(size=14, face="bold"))+
theme(axis.title.y=element_text(size=14, face="bold"))+
theme(axis.text.x=element_text(size=12, face="bold"))+
theme(axis.text.y=element_text(size=12, face="bold"))+
labs(x=("Sequencing Depth"), y=expression(bold(paste("OTU Richness", sep=""))))
soilrarecurv
Sample Rarefaction
Below I rarefy my samples using the rrarefy function from vegan. Rarefaction cut-offs are determined by reviewing the rarefaction curves and identifying the lowest number of sequences that can be retained that minimize sample loss but maximize sequencing depth across samples. Based on the above rarefaction curve, a cut-off of 37,329 sequences per samples.
#Here we are rarefying the data using the rrarefy function from vegan. I first use the sort function to determine the lowest number of sequences in a sample.
sort(rowSums(soil.pureotu[,1:37604]))
## TN272ASBAC TNWTMB19SBAC TN55BSBAC TNLS2SBAC INWTMCB29SBAC
## 37329 37684 38265 38442 39419
## WA130BNL20SBAC INWTMCB33SBAC TN272CSBAC INWTMCB28SBAC IN55MCB6SBAC
## 39425 39683 39851 40200 40940
## WA130RN4SBAC WA132RN8SBAC WA55RN2SBAC IN272MCB10SBAC WA272RN10SBAC
## 41540 42393 42612 42632 42834
## IN130MCB2SBAC TN130ASBAC IN55MCB7SBAC WA132RN7SBAC TN132BSBAC
## 43301 43475 43653 43660 43673
## TN55CSBAC WA55BNL19SBAC TN130BSBAC IN132MCB17SBAC IN272MCB11SBAC
## 44219 44795 44871 44903 45177
## IN55MCB8SBAC TN55ASBAC TNLS3SBAC IN130MCB26SBAC TN272BSBAC
## 45805 45902 46124 46558 46730
## TN132ASBAC WA132RN9SBAC IN130MCB9SBAC TNLS1SBAC WA272BNL17SBAC
## 47123 47622 48478 48700 49595
## TNWTMB20SBAC WAWTBNL22SBAC TN132CSBAC IN132MCB16SBAC WAWTBNL23SBAC
## 49818 50268 50973 51569 51933
## TN130CSBAC WA55RN1SBAC TNWTMB21SBAC IN272MCB24SBAC WAWTBNL21SBAC
## 59111 59566 68835 68965 75457
## IN132MCB27SBAC WA272BNL18SBAC
## 75813 76472
#Then using the rrarefy function from vegan, I determine the best rarefy my samples. The rarefaction cutoff or "sample" value is determined by evaluating my rarefaction curves and choosing the number of sequences that minimizes sample loss but maximizes sampling depth.
library(vegan)
soil.rareotu<-as.data.frame(rrarefy(soil.pureotu, sample=37329))
#Because the rrarefy function doesn't automatically remove samples from the dataframe I subset the rarefied dataframe to remove unrarefied samples (those less than the minimum sequencing depth).
soil.rareotu<-as.data.frame(subset(soil.rareotu,rowSums(soil.rareotu)>=37329))
Data Subsetting
Below I remove unrarefied samples (those less than minimum sequencing depth) as well as singleton OTUs (those with only one sequence following rarefaction)
#First I create an object that contains the sample names of the samples that passed through the rarefaction step. This will be used to filter the metadata file.
soil.samples<-row.names(soil.rareotu)
soil.met.rare<-subset(soil.met,soil.met$Group%in% soil.names)
#I am now removing OTUs present in the OTU table that were only present in samples removed following rarefaction. These OTUs will have column sums of 0.
soil.rareotu<- soil.rareotu[,colSums(soil.rareotu)>0]
#write.table(soil.rareotu, "~/Documents/microbiome2017/16s_soil_otutable_rarefied_fa19_ajo.shared",sep="\t", row.names=TRUE)
#I am now removing singleton OTUs (OTUs with only 1 representative sequence following rarefaction). This is to limit the effects of rare species on our distance matrices used during ordination.
soil.rareotu.nosingletons<-soil.rareotu[,colSums(soil.rareotu)>1]
#write.table(soil.rareotu.nosingletons, "~/Documents/microbiome2017/16s_soil_otutable_rarefied_nosingleton_fa19_ajo.shared",sep="\t", row.names=TRUE)
Relative Abundance Charts
To visualize differences in the relative abundance of different taxa, I reorganize the OTU table and taxonomy strings to create relative abundance stacked bar charts. Below, stacked bar charts are made for phylum, class, and order levels.
Taxonomy Subsetting
To begin, the taxonomy file is subsetted to include only taxonomy information for OTUs present in the rarefied OTU table with singletons removed. The OTU table is first transposed so that OTU IDs are the row names and sample names are the column names. Then the taxonomic assignments are added to the OTU table. Taxonomic strings are then split by semi-colon into individual columns by Phylum, Class, Order, Family, and Genus (SILVA does not go to species level).
#Now that samples have been rarefied I can perform alpha and beta diversity analyses on my data sets. To begin I first reformat my OTU table so that I can do some filtering and add taxonomy information.
soil.rareotu.nosingles<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16sotutable.rarefied.nosingleton_jnigra17_fa19_ajo.shared", header=TRUE, sep="\t")
#I now transpose my OTU table so that I can begin to add in taxonomy information.
t.soil.rareotu.nosingles<-as.data.frame(t(soil.rareotu.nosingles))
#The below function is asking specifically for the OTU IDs. These are the row names. I will use these to subset the taxonomy file.
t.soil.rareotu.nosingleslabs<-labels(t.soil.rareotu.nosingles)
soil.taxrare<-subset(taxonomy.soil, rownames(taxonomy.soil) %in% t.soil.rareotu.nosingleslabs[[1]])
#Now I am subsetting the new taxonomy file to include only the taxonomy information stored in column 2.
soil.taxrareinfo<-(soil.taxrare[,2])
#Below I am using the separate function from tidyr to split the taxonomy strings into columns by semi-colon so that I can rename the OTUs to give them more meaning for downstream analyses.
library(tidyr)
#The SILVA database only provides classifications up to genus level, new columns are created only up to Genus level.
soil.taxonomylabs<-c("kingdom","phylum","class","order","family","genus")
#Below I create a new data frame with a column containing taxonomy information.
t.soil.rareotu.nosinglestax<-data.frame(taxonomy=soil.taxrareinfo,t.soil.rareotu.nosingles)
#I then separate the taxonomy strings into new columns labeled with the taxonomy labels saved in the taxonomylabs object. Strings are being separated based on the presence of semi-colons.
t.soil.rareotu.nosinglestaxsep<-separate(t.soil.rareotu.nosinglestax,into=soil.taxonomylabs,col=taxonomy,sep=";")
Relative Abundance OTU Table
I then create a relative abundance OTU table. This table contains taxonomic information and can be searched to look at specific OTUs more easily.
#Now that samples have been rarefied I can perform alpha and beta diversity analyses on my data sets. To begin I first reformat my OTU table so that I can do some filtering and add taxonomy information.
soil.rareotu.nosingles<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16sotutable.rarefied.nosingleton_jnigra17_fa19_ajo.shared", header=TRUE, sep="\t")
soil.rareotu.nosingles.table<-decostand(soil.rareotu.nosingles,method="total")
#I now transpose my OTU table so that I can begin to add in taxonomy information.
t.soil.rareotu.nosingles.table<-as.data.frame(t(soil.rareotu.nosingles.table))
#The below function is asking specifically for the OTU IDs. These are the row names. I will use these to subset the taxonomy file.
t.soil.rareotu.nosingleslabs<-labels(t.soil.rareotu.nosingles.table)
soil.taxrare<-subset(taxonomy.soil, rownames(taxonomy.soil) %in% t.soil.rareotu.nosingleslabs[[1]])
#Now I am subsetting the new taxonomy file to include only the taxonomy information stored in column 2.
soil.taxrareinfo<-(soil.taxrare[,2])
#Below I am using the separate function from tidyr to split the taxonomy strings into columns by semi-colon so that I can rename the OTUs to give them more meaning for downstream analyses.
library(tidyr)
#The SILVA database only provides classifications up to genus level, new columns are created only up to Genus level.
soil.taxonomylabs<-c("kingdom","phylum","class","order","family","genus")
#Below I create a new data frame with a column containing taxonomy information.
t.soil.rareotu.nosinglestax.table<-data.frame(taxonomy=soil.taxrareinfo,t.soil.rareotu.nosingles.table)
#I then separate the taxonomy strings into new columns labeled with the taxonomy labels saved in the taxonomylabs object. Strings are being separated based on the presence of semi-colons.
t.soil.rareotu.nosinglestaxsep.table<-separate(t.soil.rareotu.nosinglestax.table,into=soil.taxonomylabs,col=taxonomy,sep=";")
library(DT)
datatable(t.soil.rareotu.nosinglestaxsep.table,fillContainer = T, height="500dpx")
Phylum Relative Abundance Chart
Below I generate a phylum relative abundance plot. I first extract the phylum information. I then clean the taxa names to remove assignment confidence values. OTU raw abundance is then summed by phylum and state, phyla that represent less than 1% of the total community are grouped into other, and relative abundance is calculated. Stacked bar charts are made using the melt function and ggplot.
#Below I create a data frame that consists of phylum assignment and the rarefied OTU table.
t.soil.raretabphy<-data.frame(phylum=t.soil.rareotu.nosinglestaxsep$phylum,t.soil.rareotu.nosingles)
#I then remove the assignment confidence values contained in parentheses found in each taxonomic string using the sub function.
t.soil.raretabphy$phylum<-sub("\\(.*)","",x=t.soil.raretabphy$phylum)
#The following code is for the generation of phylum relative abundance stacked bar charts. I first sum each OTU by the phylum it belongs to.
library(plyr)
t.soil.rareotutabphy<-as.data.frame(ddply(t.soil.raretabphy, .(phylum),colwise(sum)))
rownames(t.soil.rareotutabphy)<-make.names(t.soil.rareotutabphy$phylum)
t.soil.rareotutabphy<-t.soil.rareotutabphy[,2:48]
#I then transpose the data frame so that sample names are now row names and column names are OTU names.
soil.rareotutabphy<-as.data.frame(t(t.soil.rareotutabphy))
soil.rareotutabphy.state<-data.frame(state=soil.met.rare$State,soil.rareotutabphy)
#I then sum each OTU by state and subset to exclude sample names. This is because the following functions only work on matrices containing numeric data.
soil.rareotutabphy.statesum<-as.data.frame(ddply(soil.rareotutabphy.state, .(state),colwise(sum)))
soil.phylumcols<-soil.rareotutabphy.statesum[,2:46]
#I am now creating an other category. Other includes those OTUs belonging to a phylum that comprises less than 1% of the total community. This new object is referred to as phyothers.
soil.phyoth<-soil.phylumcols[,colSums(soil.phylumcols)/sum(soil.phylumcols)<=0.01]
soil.phyother<-rowSums(soil.phyoth)
#I then create an object that contains all of the phyla that comprise more than 1% of the total community.
soil.phyreg<-soil.phylumcols[,colSums(soil.phylumcols)/sum(soil.phylumcols)>0.01]
#I then create a new dataframe containingt the state information, the other column, and the remaining phyla.
soil.phytot2<-data.frame(state=soil.rareotutabphy.statesum$state,soil.phyreg,Other=soil.phyother)
#These values are then converted to relative abundance using the decostand function from vegan.
soil.phyrelabund<-decostand(soil.phytot2[,2:13],method="total")
soil.phyrelabund<-data.frame(state=soil.rareotutabphy.statesum$state,soil.phyrelabund)
#Below I use the melt function from the data.table package to reformat the data into stacked bar graph format.
library(data.table)
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
soil.phyrelabundmelt<-melt(soil.phyrelabund, id.vars="state", variable.name="Phylum")
soil.colors.n.phylum <- length(unique(soil.phyrelabundmelt[,'Phylum']))
soil.phyrel<-ggplot(soil.phyrelabundmelt, aes(x=state, y=value, fill=Phylum))+
geom_bar(stat="identity", show.legend=TRUE, color="black")+
scale_fill_manual(values=colorRampPalette(brewer.pal(9, 'Set1'))(soil.colors.n.phylum)) +
xlab("State") +
ylab("Relative Abundance") +
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.x = element_text(size = 14), axis.ticks.x = element_blank(), axis.text.y = element_text(size = 14), axis.line.y.left = element_line(), axis.title.y = element_text(size = 14))+
scale_y_continuous(breaks=c(0,0.25,0.50,0.75,1),limits=c(0,1.05))+
geom_text(data=NULL,aes(x=0.50, y=1.05,label="16S Soil"),hjust=0,colour="black")
ggplotly(soil.phyrel,tooltips=c("Phylum"))
Class Relative Abundance Chart
Below I generate a class relative abundance plot. I first extract the class information from the OTU table that contains the taxonomy information. I then clean the taxa names to remove assignment confidence values and other non-taxonomic associated string portions from the class assignment. I then create an unknown group which consists of uncultured bacteria and bacteria unclassified at the class level. OTU raw abundance is then summed by class and state, classes that represent less than 1% of the total community are grouped into other, and relative abundance is calculated. Stacked bar charts are made using the melt function from ggplot.
#The following chunk creates a relative abundance bar chart at the class level.
library(plyr)
t.soil.raretabcls<-data.frame(class=t.soil.rareotu.nosinglestaxsep$class,t.soil.rareotu.nosingles)
#What I am doing below is cleaning up the class names. These include assignment confidence levels, _or, and _.
t.soil.raretabcls$class<-sub("\\(.*)","",x=t.soil.raretabcls$class)
t.soil.raretabcls$class<-sub("_or","",x=t.soil.raretabcls$class)
t.soil.raretabcls$class<-sub("_"," ",x=t.soil.raretabcls$class)
#I then sum the OTUs by class and make the row names the class name.
t.soil.rareotutabcls<-as.data.frame(ddply(t.soil.raretabcls, .(class),colwise(sum)))
rownames(t.soil.rareotutabcls)<-make.names(t.soil.rareotutabcls$class)
t.soil.rareotutabcls<-t.soil.rareotutabcls[,2:48]
#I then transpose the data frame and sum each class by state.
soil.rareotutabcls<-as.data.frame(t(t.soil.rareotutabcls))
soil.rareotutabcls.state<-data.frame(state=soil.met.rare$State,soil.rareotutabcls)
soil.rareotutabcls.statesum<-as.data.frame(ddply(soil.rareotutabcls.state, .(state),colwise(sum)))
#At the class level, SILVA may classify things as unknown class, phylum.unclassified, or uncultured. These don't provide a lot of information and thus are grouped into a knew group called unknown.
soil.clsunknown<-soil.rareotutabcls.statesum[,grep(".unclassified|Unknown.class|uncultured",names(soil.rareotutabcls.statesum))]
soil.clsunknown<-rowSums(soil.clsunknown)
#I then use the same grep function as above to pull out the class with meaningful information. By setting invert=TRUE I select items lacking the strings in the grep command.
soil.clstot<-soil.rareotutabcls.statesum[,grep(".unclassified|Unknown.class|uncultured",names(soil.rareotutabcls.statesum), invert=TRUE)]
#I then subset the data so that only numerical data is present and creating a new class of objects, other, that includes OTUs from classes that represent 1% or less of the total community.
soil.clscols<-soil.clstot[,2:125]
soil.clsoth<-soil.clscols[,colSums(soil.clscols)/sum(soil.clscols)<=0.01]
soil.clsother<-rowSums(soil.clsoth)
soil.clsreg<-soil.clscols[,colSums(soil.clscols)/sum(soil.clscols)>0.01]
#I then create a new data frame that contains all of the class data and determine relative abundance using the decostand function from vegan.
soil.clstot2<-data.frame(state=soil.clstot$state,soil.clsreg,Unclassified=soil.clsunknown,Other=soil.clsother)
soil.clsrelabund<-decostand(soil.clstot2[,2:23],method="total")
soil.clsrelabund<-data.frame(state=soil.rareotutabcls.statesum$state,soil.clsrelabund)
#Below I use the melt function from the data.table package to reformat the data into stacked bar graph format.
library(data.table)
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
soil.clsrelabundmelt<-melt(soil.clsrelabund, id.vars="state", variable.name="class")
colorCount<- length(unique(soil.clsrelabundmelt[,'class']))
soil.cls.getPalette=colorRampPalette(brewer.pal(9,"Set1"))
#Below I generate the relative abundance bar charts in ggplot
soil.clsrel<-ggplot(soil.clsrelabundmelt, aes(x=state, y=value, fill=class))+
geom_bar(stat="identity",show.legend=TRUE,color="black")+
scale_fill_manual(values=soil.cls.getPalette(colorCount))+
xlab("State") +
ylab("Relative Abundance") +
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.x = element_text(size = 14), axis.ticks.x = element_blank(), axis.text.y = element_text(size = 14), axis.line.y.left = element_line(), axis.title.y = element_text(size = 14))+
scale_y_continuous(breaks=c(0,0.25,0.50,0.75,1),limits=c(0,1.05))+
geom_text(data=NULL,aes(x=0.50, y=1.05,label="16S Soil"),hjust=0,colour="black")
ggplotly(soil.clsrel,tooltip=c("class"))
Order Relative Abundance Chart
Below I generate an order relative abundance plot. I first extract the order information from the OTU table that contains the taxonomy information. I then clean the taxa names to remove assignment confidence values and other non-taxonomic associated string portions from the order assignment. I then create an unknown group which consists of uncultured bacteria and bacteria unclassified at the order level. OTU raw abundance is then summed by order and state, orders that represent less than 1% of the total community are grouped into other, and relative abundance is calculated. Stacked bar charts are made using the melt function from ggplot.
#The following chunk creates a relative abundance bar chart at the order level.
#What I am doing below is cleaning up the order names. These include assignment confidence levels, _or, and _.
library(plyr)
t.soil.raretabord<-data.frame(order=t.soil.rareotu.nosinglestaxsep$order,t.soil.rareotu.nosingles)
t.soil.raretabord$order<-sub("\\(.*)","",x=t.soil.raretabord$order)
t.soil.raretabord$order<-sub("_or","",x=t.soil.raretabord$order)
t.soil.raretabord$order<-sub("_"," ",x=t.soil.raretabord$order)
#I then sum the OTUs by order and make the row names the order name.
t.soil.rareotutabord<-as.data.frame(ddply(t.soil.raretabord, .(order),colwise(sum)))
rownames(t.soil.rareotutabord)<-make.names(t.soil.rareotutabord$order)
t.soil.rareotutabord<-t.soil.rareotutabord[,2:48]
#I then transpose the data frame and sum each order by state.
soil.rareotutabord<-as.data.frame(t(t.soil.rareotutabord))
soil.rareotutabord.state<-data.frame(state=soil.met.rare$State,soil.rareotutabord)
soil.rareotutabord.statesum<-as.data.frame(ddply(soil.rareotutabord.state, .(state),colwise(sum)))
#At the class level, SILVA may classify things as unknown class, phylum.unclassified, or uncultured. These don't provide a lot of information and thus are grouped into a knew group called unknown.
soil.ordunknown<-soil.rareotutabord.statesum[,grep(".unclassified|Unknown.Order|uncultured",names(soil.rareotutabord.statesum))]
#I then use the same grep function as above to pull out the order with meaningful information. By setting invert=TRUE I select items lacking the strings in the grep command.
soil.ordunknownsum<-rowSums(soil.ordunknown)
soil.ordtot<-soil.rareotutabord.statesum[,grep(".unclassified|Unknown.Order|uncultured",names(soil.rareotutabord.statesum), invert=TRUE)]
#I then subset the data so that only numerical data is present and creating a new class of objects, other, that includes OTUs from order that represent 1% or less of the total community.
soil.ordcols<-soil.ordtot[,2:294]
soil.ordoth<-soil.ordcols[,colSums(soil.ordcols)/sum(soil.ordcols)<=0.01]
soil.ordother<-rowSums(soil.ordoth)
soil.ordreg<-soil.ordcols[,colSums(soil.ordcols)/sum(soil.ordcols)>0.01]
soil.pseud<-(soil.ordcols$Pseudomonadales)/sum(soil.ordcols)
#I then create a new data frame that contains all of the order data and determine relative abundance using the decostand function from vegan.
soil.ordtot2<-data.frame(state=soil.ordtot$state,soil.ordreg,Other=soil.ordother,Unclassified=soil.ordunknownsum)
soil.ordrelabund<-decostand(soil.ordtot2[,2:24],method="total")
soil.ordrelabund<-data.frame(state=soil.rareotutabcls.statesum$state,soil.ordrelabund)
#Below I use the melt function from the data.table package to reformat the data into stacked bar graph format.
library(data.table)
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
soil.ordrelabundmelt<-melt(soil.ordrelabund, id.vars="state", variable.name="order")
colorCount<- length(unique(soil.ordrelabundmelt[,'order']))
soil.ord.getPalette=colorRampPalette(brewer.pal(9,"Set1"))
#Below I generate the relative abundance bar charts in ggplot
soil.ordrel<-ggplot(soil.ordrelabundmelt, aes(x=state, y=value, fill=order))+
geom_bar(stat="identity",show.legend=TRUE,color="black")+
scale_fill_manual(values=soil.ord.getPalette(colorCount))+
xlab("State") +
ylab("Relative Abundance") +
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.x = element_text(size = 14), axis.ticks.x = element_blank(), axis.text.y = element_text(size = 14), axis.line.y.left = element_line(), axis.title.y = element_text(size = 14))+
scale_y_continuous(breaks=c(0,0.25,0.50,0.75,1),limits=c(0,1.05))+
geom_text(data=NULL,aes(x=0.50, y=1.05,label="16S Soil"),hjust=0,colour="black")
ggplotly(soil.ordrel,tooltip=c("order"))
Taxonomy Filtering and Merging
Taxonomy is first filtered to include only those OTUs present in the rarefied OTU table with no singletons. Then the OTU table is transposed so that OTU ID is the row name and sample ID is the column name. Taxonomy strings are then added to the OTU table and strings are split by semi-colon.
#Below I am reformatting the OTU table so that I can begin to filter and merge the taxonomy information.
#I first transpose the rarefied OTU table with no singletons.
t.soil.rareotu.nosingles<-as.data.frame(t(soil.rareotu.nosingles))
#The below function is asking specifically for the OTU IDs. These are the row names. I will use these to subset the taxonomy file.
t.soil.rareotu.nosingleslabs<-labels(t.soil.rareotu.nosingles)
soil.taxrare<-subset(taxonomy.soil, rownames(taxonomy.soil) %in% t.soil.rareotu.nosingleslabs[[1]])
soil.taxrareinfo<-(soil.taxrare[,2])
#Below I am using the separate function from tidyr to split the taxonomy strings into columns by semi-colon so that I can rename the OTUs to give them more meaning for downstream analyses.
library(tidyr)
soil.taxonomylabs<-c("kingdom","phylum","class","order","family","genus")
t.soil.rareotutabtax<-data.frame(taxonomy=soil.taxrareinfo,t.soil.rareotu.nosingles)
t.soil.rareotutabtaxsep<-separate(t.soil.rareotutabtax,into=soil.taxonomylabs,col=taxonomy,sep=";")
Renaming of OTUs
OTUs are then renamed. Prior to official renaming, taxonomy strings are cleaned so that assignment confidence values and taxon leaders and any _unclassifed tails are removed.
#Here I am renaming the OTUs using the make.names function. This will make the OTU name something more informative. I am using genus to do this.
t.soil.rareotutabtaxsep$genus<-sub("\\(.*)","",x=t.soil.rareotutabtaxsep$genus)
t.soil.rareotutabtaxsep$genus<-sub("(_unclassified)","",x=t.soil.rareotutabtaxsep$genus)
t.soil.rareotutabtaxsep$genus<-sub("(_ge)","",x=t.soil.rareotutabtaxsep$genus)
rownames(t.soil.rareotutabtaxsep)<-make.names(t.soil.rareotutabtaxsep$genus,unique=TRUE)
t.soil.rareotutabtax.rename<-t.soil.rareotutabtaxsep[,7:53]
soil.rareotutabtax.rename<-as.data.frame(t(t.soil.rareotutabtax.rename))
Principal Coordinate Analysis
To visually assess differences in the community composition of soil bacterial communities by clone and state, I perform a principal coordinate analysis (PCoA).
Relative Abundance Calculation
Prior to conducting a PCoA I first convert my OTU table from raw OTU abundances to relative abundance using the decostand function from vegan.
Principal Coordinate Analysis
I then perform the PCoA using the pcoa function from the ape package. Prior to performing the PCoA, a distance matrix needs to be calculated for the OTU table. In this case, I use the bray-curtis method.
Plotting of Principal Coordinate Analysis
Then to plot the PCoA, I use the ggplot function. To use ggplot I need to first pull out the site scores for each sample and generate ellipses.
#We need to first subset our metadata table to include only those samples that passed rarefaction.
soil.met.rare<-subset(soil.met,soil.met$Group%in% soil.samples)
#Then we extract PCoA site scores (these will be the x and y coordinates for each sample)
soil.bac.pcoavec<-as.data.frame(soil.bac.pcoa$vectors)
soil.bac.pcoasitescores<-data.frame(PC1=soil.bac.pcoavec$Axis.1, PC2=soil.bac.pcoavec$Axis.2)
#Create a new dataframe that includes the site scores from above with metadata from your study.
soil.bac.pcoagraph<-data.frame(soil.bac.pcoasitescores,PC1=soil.bac.pcoasitescores$PC1, PC2=soil.bac.pcoasitescores$PC2, State=soil.met.rare$State, Clone=soil.met.rare$Clone,group=soil.met$Group)
#This is where you make confidence ellipses. I don't know what all the code means. Just know where to plug in my objects.
soil.bac.pcoaellipse<-ordiellipse(soil.bac.pcoasitescores,soil.bac.pcoagraph$State, display="sites", kind="sd",draw="none")
df_ell <- data.frame()
for(g in levels(soil.bac.pcoagraph$State)){
df_ell <- rbind(df_ell, cbind(as.data.frame(with(soil.bac.pcoagraph[soil.bac.pcoagraph$State==g,], vegan:::veganCovEllipse(soil.bac.pcoaellipse[[g]]$cov,soil.bac.pcoaellipse[[g]]$center,soil.bac.pcoaellipse[[g]]$scale))) ,State=g))}
#Plot PCoA in ggplot
library(ggplot2)
soil.pcoa<-ggplot(soil.bac.pcoagraph, aes(PC1,PC2, colour=State))+
#geom_text(aes(label=group))+
geom_point(aes(shape=Clone), size=3.5)+
geom_path(data=df_ell, aes(x=PC1, y=PC2, colour=State),size=1, linetype=2)+
theme(axis.title.x=element_text(size=14, face="bold"))+
theme(axis.title.y=element_text(size=14, face="bold"))+
theme(axis.text.x=element_text(size=12, face="bold"))+
theme(axis.text.y=element_text(size=12, face="bold"))+
scale_color_manual(values=c("#006600","#3399FF","#FF9900"))+
scale_shape_manual(values=c(16,10,8,7,0,2))+
scale_x_continuous(breaks=c(-0.60,-0.30,0,0.30,0.60),limits=c(-0.60,0.60))+
scale_y_continuous(breaks=c(-0.60,-0.30,0,0.30,0.60),limits=c(-.60,.60))+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.border=element_rect(colour="black", size=1, fill=NA))+
theme(panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),
panel.background = element_blank(),
panel.border=element_rect(color="black", size=1, fill=NA))+
geom_text(data=NULL,aes(x=-0.60,y=0.60,label="16S Soil"),hjust=0,colour="black")
soil.pcoa
PERMANOVA
To formally test how well state and clone explain differences in soil bacteria community composition, I perform a PERMANOVA using the adonis function from vegan.
#Below I perform permanovas by state and clone using the adonis function from vegan.
library(vegan)
soil.bac.perm.state<-adonis(soil.relabund~soil.met.rare$State+soil.met.rare$Clone, method="bray",permutations=10000)
soil.bac.perm.state.table<-soil.bac.perm.state$aov.tab
row.names(soil.bac.perm.state.table)[[1]]<-make.names("State")
row.names(soil.bac.perm.state.table)[[2]]<-make.names("Clone")
kable(soil.bac.perm.state.table)
Df | SumsOfSqs | MeanSqs | F.Model | R2 | Pr(>F) | |
---|---|---|---|---|---|---|
State | 2 | 2.0599954 | 1.0299977 | 9.2177029 | 0.3014686 | 0.0001000 |
Clone | 4 | 0.3035548 | 0.0758887 | 0.6791466 | 0.0444235 | 0.8984102 |
Residuals | 40 | 4.4696503 | 0.1117413 | NA | 0.6541079 | NA |
Total | 46 | 6.8332005 | NA | NA | 1.0000000 | NA |
Richness and Diversity
Then to assess the alpha diversity of our samples, I calculate the observed richness and shannon diversity index using the vegan package. This is done using the rarefied OTU table that contains singletons. Singletons are included to keep sampling depth standard between samples.
Richness
Observed OTU richness is first calculated for each sample. Then due to the unbalanced nature of our study design, we use a type 3 ANOVA.
#In this chunk we calculate richness and create a data frame including richness with metadata
#I first import my rarefied OTU table that contains singletons.
soil.rareotu.wsingles<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16sotutable.rarefied_jnigra17_fa19_ajo.shared", header=TRUE, sep="\t")
#Below I calculate observed species richness using the specnumber function from vegan. I then create a new data frame to include sample metadata.
library(vegan)
soil.bac.rich<-specnumber(soil.rareotu.wsingles)
soil.richnesstabmet<-data.frame(State=soil.met.rare$State, clone=soil.met.rare$Clone, sample=soil.met.rare$Group,Richness=soil.bac.rich)
#I first test for a significant interaction between state and clone. Due to the unbalanced design we use a type 3 ANOVA.
library(car)
soil.anova.typ3.rich<-aov((Richness)~State*clone, data=soil.richnesstabmet)
soil.typ3aov.rich<-Anova(soil.anova.typ3.rich,type="III")
kable(soil.typ3aov.rich)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 58353120.3 | 1 | 271.0432117 | 0.0000000 |
State | 1307072.2 | 2 | 3.0355963 | 0.0620663 |
clone | 488691.1 | 4 | 0.5674778 | 0.6880388 |
State:clone | 834679.0 | 8 | 0.4846229 | 0.8579215 |
Residuals | 6889306.8 | 32 | NA | NA |
#There was no significant interaction between state and clone so we drop the interaction term.
soil.anova.add.rich<-aov((Richness)~State+clone, data=soil.richnesstabmet)
soil.add.aov.rich<-Anova(soil.anova.add.rich,type="III")
kable(soil.add.aov.rich)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 123304417.80 | 1 | 638.5533064 | 0.0000000 |
State | 5112004.75 | 2 | 13.2367015 | 0.0000387 |
clone | 99047.17 | 4 | 0.1282332 | 0.9712996 |
Residuals | 7723985.86 | 40 | NA | NA |
#Perform Tukey test
soil.anova.add.rich.tukey<-TukeyHSD(soil.anova.add.rich)
soil.anova.add.rich.tukey.table<-rbind(soil.anova.add.rich.tukey$State,soil.anova.add.rich.tukey$clone)
kable(soil.anova.add.rich.tukey.table)
diff | lwr | upr | p adj | |
---|---|---|---|---|
TN-IN | -153.02222 | -526.9365 | 220.8920 | 0.5836065 |
WA-IN | 630.81905 | 233.3656 | 1028.2725 | 0.0011434 |
WA-TN | 783.84127 | 402.7126 | 1164.9699 | 0.0000341 |
132-130 | 110.58360 | -499.2636 | 720.4308 | 0.9850583 |
272-130 | 29.13915 | -580.7080 | 638.9864 | 0.9999183 |
55-130 | 122.69471 | -487.1525 | 732.5419 | 0.9780435 |
WT-130 | 91.21111 | -481.6406 | 664.0628 | 0.9908401 |
272-132 | -81.44444 | -673.0831 | 510.1942 | 0.9947458 |
55-132 | 12.11111 | -579.5276 | 603.7498 | 0.9999972 |
WT-132 | -19.37249 | -572.7998 | 534.0548 | 0.9999764 |
55-272 | 93.55556 | -498.0831 | 685.1942 | 0.9910765 |
WT-272 | 62.07196 | -491.3554 | 615.4993 | 0.9976229 |
WT-55 | -31.48360 | -584.9109 | 521.9437 | 0.9998364 |
#There was no significant effect of clone so we drop and see just state alone.
soil.anova.sing.rich<-aov((Richness)~State, data=soil.richnesstabmet)
soil.sing.aov.rich<-Anova(soil.anova.sing.rich,type="III")
kable(soil.sing.aov.rich)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 312378420 | 1 | 1756.94650 | 0.00e+00 |
State | 5206781 | 2 | 14.64255 | 1.34e-05 |
Residuals | 7823033 | 44 | NA | NA |
#Perform Tukey test
soil.anova.sing.rich.tukey<-TukeyHSD(soil.anova.sing.rich)
soil.anova.sing.rich.tukey.table<-rbind(soil.anova.sing.rich.tukey$State)
kable(soil.anova.sing.rich.tukey.table)
diff | lwr | upr | p adj | |
---|---|---|---|---|
TN-IN | -153.0222 | -510.5703 | 204.5258 | 0.5571720 |
WA-IN | 630.8190 | 250.7621 | 1010.8760 | 0.0006350 |
WA-TN | 783.8413 | 419.3946 | 1148.2880 | 0.0000139 |
#The below two lines generate plots to evaluate regression assumptions. I keep these muted because they're annoying if you're bulk running code.
plot(soil.anova.sing.rich)
#I then create a box plot for the richness data by state using ggplot.
library(Hmisc)
library(ggpubr)
soil.richness.summarized<-summarize(soil.richnesstabmet$Richness,by=soil.richnesstabmet$State, FUN=max)
soil.richness.summarized<-data.frame(State=soil.richness.summarized$`soil.richnesstabmet$State`,Richness=soil.richness.summarized$`soil.richnesstabmet$Richness`)
library(ggplot2)
library(ggpubr)
#Below I create a boxplot for richness values.
soil.rich<-ggboxplot(soil.richnesstabmet, x="State", y="Richness", outlier.shape=NA)+
geom_jitter(data=soil.richnesstabmet,position=position_jitter(0.2), aes(shape=clone),size=3)+
scale_shape_manual(values=c(16,10,8,7,0,2))+
#geom_text(data=richness.summarized, aes(x=State, y = 25 + Richness, label=Group, fontface="bold"))+
geom_text(data=NULL,aes(x=1, y=5300, label="A"))+
geom_text(data=NULL,aes(x=2, y=5300, label="A"))+
geom_text(data=NULL,aes(x=3, y=6005, label="B"))+
geom_text(data=NULL,aes(x=0.50,y=6100,label="16S Soil"),hjust=0)+
theme(axis.line.x=element_blank(),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())
soil.rich
Diversity
Shannon diversity is first calculated for each sample. Then due to the unbalanced nature of our study design, we use a type 3 ANOVA.
#In this chunk we calculate shannon diversity and create a data frame including diversity with metadata
#I first import my rarefied OTU table that contains singletons.
soil.rareotu.wsingles<-read.table("/Volumes/AaronOnufrakMac/researchprojects_2019.2022_ajo/j.nigra_microbiome_2017_ajo_gmw/soil16sstatistics_jnigra17_2019_ajo/soil16smothur_jnigra17_microbiome/soil16sotutable.rarefied_jnigra17_fa19_ajo.shared", header=TRUE, sep="\t")
#Calculate shannon diversity using the diversity function of vegan
library(vegan)
soil.bac.div<-diversity (soil.rareotu.wsingles, index="shannon")
#I then create a new data frame to include sample metadata.
soil.diversity.tabmet<-data.frame(State=soil.met.rare$State, clone=soil.met.rare$Clone, sample=soil.met.rare$Group,Shannon=soil.bac.div)
#I first test for a significant interaction between state and clone. Due to the unbalanced design we use a type 3 ANOVA.
library(car)
soil.anova.typ3.div<-aov((Shannon)~State*clone, data=soil.diversity.tabmet)
soil.typ3aov.div<-Anova(soil.anova.typ3.div,type="III")
kable(soil.typ3aov.div)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 129.0868144 | 1 | 2951.0386914 | 0.0000000 |
State | 0.5568037 | 2 | 6.3645125 | 0.0047094 |
clone | 0.1237413 | 4 | 0.7072090 | 0.5929482 |
State:clone | 0.1436735 | 8 | 0.4105628 | 0.9060315 |
Residuals | 1.3997709 | 32 | NA | NA |
#There was no significant interaction between state and clone so we drop the interaction term.
soil.anova.add.div<-aov((Shannon)~State+clone, data=soil.diversity.tabmet)
soil.add.aov.div<-Anova(soil.anova.add.div,type="III")
kable(soil.add.aov.div)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 264.6935053 | 1 | 6859.813328 | 0.0000000 |
State | 1.9610185 | 2 | 25.410939 | 0.0000001 |
clone | 0.1004248 | 4 | 0.650654 | 0.6297693 |
Residuals | 1.5434444 | 40 | NA | NA |
#Perform Tukey test
soil.anova.add.div.tukey<-TukeyHSD(soil.anova.add.div)
soil.anova.add.div.tukey.table<-rbind(soil.anova.add.div.tukey$State,soil.anova.add.div.tukey$Clone)
kable(soil.anova.add.div.tukey.table)
diff | lwr | upr | p adj | |
---|---|---|---|---|
TN-IN | 0.0807888 | -0.0863575 | 0.2479351 | 0.4739357 |
WA-IN | 0.4885664 | 0.3108976 | 0.6662351 | 0.0000001 |
WA-TN | 0.4077776 | 0.2374063 | 0.5781488 | 0.0000025 |
#There was no significant interaction between state and clone so we drop clone from the model.
soil.anova.sing.div<-aov((Shannon)~State, data=soil.diversity.tabmet)
soil.sing.aov.div<-Anova(soil.anova.sing.div,type="III")
kable(soil.sing.aov.div)
Sum Sq | Df | F value | Pr(>F) | |
---|---|---|---|---|
(Intercept) | 663.709482 | 1 | 17764.92771 | 0 |
State | 1.995571 | 2 | 26.70685 | 0 |
Residuals | 1.643869 | 44 | NA | NA |
#Perform Tukey test
soil.anova.sing.div.tukey<-TukeyHSD(soil.anova.sing.div)
soil.anova.sing.div.tukey.table<-rbind(soil.anova.sing.div.tukey$State)
kable(soil.anova.sing.div.tukey.table)
diff | lwr | upr | p adj | |
---|---|---|---|---|
TN-IN | 0.0807888 | -0.0831117 | 0.2446894 | 0.4620979 |
WA-IN | 0.4885664 | 0.3143477 | 0.6627850 | 0.0000001 |
WA-TN | 0.4077776 | 0.2407146 | 0.5748405 | 0.0000013 |
#I then create a box plot for the richness data by state using ggplot.
library(Hmisc)
library(ggpubr)
soil.diversity.summarized<-summarize(soil.diversity.tabmet$Shannon,by=soil.diversity.tabmet$State, FUN=max)
soil.diversity.summarized<-data.frame(State=soil.diversity.summarized$`soil.diversity.tabmet$State`,Diversity=soil.diversity.summarized$`soil.diversity.tabmet$Shannon`)
library(ggplot2)
library(ggpubr)
#Below I create a boxplot for richness values.
soil.div<-ggboxplot(soil.diversity.tabmet, x="State", y="Shannon", outlier.shape=NA)+
geom_jitter(data=soil.diversity.tabmet,position=position_jitter(0.2), aes(shape=clone),size=3)+
scale_shape_manual(values=c(16,10,8,7,0,2))+
#geom_text(data=diversity.summarized, aes(x=State, y = 0.5 + Diversity, label=group, fontface="bold"))+
geom_text(data=NULL,aes(x=1, y=7.1, label="A"))+
geom_text(data=NULL,aes(x=2, y=7.1, label="A"))+
geom_text(data=NULL,aes(x=3, y=7.3, label="B"))+
theme(axis.line.x=element_blank(),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
geom_text(data=NULL, aes(x=0.50,y=7.4,label="16S Soil"),hjust=0)
soil.div