BCH709 Introduction to Bioinformatics: 6_Introduction of R & R plotting

PhD Tong Zhou

Tong Zhou, PhD https://tongzhoulab.org/ ` L-207B, Center for Molecular Medicine, MS575
Department of Physiology and Cell Biology
University of Nevada, Reno School of Medicine
1664 North Virginia Street, Reno , NV 89557
` ` The Zhou Lab carries out translational and theoretical research in bioinformatics and computational biology. Much of our research addresses questions of computational molecular medicine and molecular evolution, in particular about the use of genomic data to understand the pathobiology and develop biomarkers for human diseases and to understand the mechanism of exnoic sequence evolution. `

Introduction to R

R (www.r-project.org) is a commonly used free Statistics software. R allows you to carry out statistical analyses in an interactive mode, as well as allowing simple programming. R

Prepare your laptop

Open two terminal

Connect one to pronghorn

ssh <yourID>@pronghorn.rc.unr.edu

Open web browser

Connect to https://pastebin.com/6rba4Q9x

R Installation

Prepare working folder in Pronghorn

cd ~/
mkdir r_plot
cd r_plot
wget -O r_plot.yaml https://pastebin.com/raw/kSAC1AsK
wget -O dataset1.txt https://pastebin.com/raw/N5g8bXg6
conda env create -n r_plot -f r_plot.yaml
conda activate r_plot

Start R

R

R Plot with Gene Expression Data Read

Getting help with functions and features

Getting information on any specific named function

help(plot)
?plot

Getting information on a feature specified by special characters

help("%%")
? "%%"

Launching a web browser for help

help.start()

Vectors and assignment

Setting up a vector named a, consisting of five numbers

a <- c(1, 4, 6, 7, 20)
a = c(1, 4, 6, 7, 20)
a

Setting up a vector named b, consisting of two sequences of characters (strings)

b <- c("Hello", "world")
b = c("Hello", "world")
b

Setting up a logical vector named x

x = c(TRUE, TRUE, FALSE, FALSE)
x = c(T, T, F, F)
x

Arithmetic operators

x = c(8, 4, 2, 1)
y = c(2, 2, 2, 2)
x + y
x - y
x * y
x / y
x ^ y
x %% y

Functions mean(), sd(), and sum()

mean(x)
sd(x)
sum(x

Logical operators

x = c(8, 4, 2, 1)
y = c(2, 2, 2, 2)
x > y; x >= y
x < y; x <= y
x == y
x != y
a = 5
b = 0
a > 0 & b > 0 # and
a < 0 | b > 0 # or
! a == 5 # not
TRUE & TRUE == TRUE
TRUE & FALSE == FALSE
FALSE & FALSE == FALSE
TRUE | TRUE == TRUE
TRUE | FALSE == TRUE 
FALSE | FALSE == FALSE
! TRUE == FALSE
! FALSE == TRUE

Generating regular sequences

Generating a vector c(1, 2, …, 29, 30)

1:30
1:30*2
# What is the difference between 1:30-1 and 1:(30-1) ?

Function seq()

seq(from=3, to=20); seq(3, 20); 3:20
seq(3, 20, by=0.5)

Function rep()

rep(-2, 20)
rep(c(1, 4, 5, 6), 10)

Selecting subsets of a vector

age = c(20, 19, 25, 26, 33, 24)
id= c("Tom", "Susan", "David", "Lauren", "Joe", "Nancy")
names(age) = id

Pick up the first three elements of the “age” vector

age[1:3]
age[c("Tom", "Susan", "David")]

Pick up the last three elements of the “age” vector

age[4:6]
age[-(1:3)]
age[c("Lauren", "Joe", "Nancy")]

Exercise

Generate a vector containing the odd numbers between 1 and 100

1:50*2-1
seq(1, 100, by=2)

Pick up the integers from 1 to 100 that can be divided by both 3 and 4

x = 1:100
x[x %% 3 == 0 & x %% 4 == 0]

Pick up the integers from 300 to 1000 that can be divided by either 12 or 16

x = 300:1000
x[x %% 12 == 0 | x %% 16 == 0]

Basic function: plot()

Scatter plot

pdf("plot.pdf") 
plot(1:10, 1:10)
dev.off()

Line plot

pdf("lineplot.pdf") 
plot(1:10, 1:10, type="l")
dev.off()

Plot a curve

pdf("curveplot.pdf") 
x = seq(0, 2*pi, by=0.01)
plot(x, sin(x), type="l", col="red")
dev.off()

Download results to your laptop

*Try this on your laptop

###Please do it on your desktop
###Please replace <YOURID> to your id.
### If this is not working raise your hand and find me.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/plot.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/lineplot.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/curveplot.pdf ~/bch709

In your local terminal

Windows

cd ~/bch709
explorer.exe .

MacOS

cd ~/bch709
open .

Git Bash

cd ~/bch709
explorer.exe .

Dataset

Dataset (dataset1.txt)

cat dataset1.txt
Gene expression dataset of lung tissue samples (72×78)
72 genes
78 samples
30 samples from normal tissues
48 samples from tumor tissues

Loading data frame

#1 load the expression data sheet - 78 samples and 72 genes  
expr = read.delim("dataset1.txt", row.names="gene")  

#2 the first 30 samples are normal lung tissue while the last 48 samples are from lung tumors  
cl = c(rep("Normal", 30), rep("Tumor", 48))  

Box plot of gene expression

#3 boxplot showing the difference in expression of gene TP63 between normal and tumor tissues  
pdf("T63_boxplot.pdf")  
boxplot(t(expr["TP63",])~cl, xlab="", ylab="Expression", main="TP63", col=c("grey80", "grey40"))  
dev.off()  

#4 boxplot showing the difference in expression of gene SLC2A1 between normal and tumor tissues
pdf("SLC2A1_boxplot2.pdf")  
boxplot(t(expr["SLC2A1",])~cl, xlab="", ylab="Expression", main="SLC2A1", col=c("grey80", "grey40"))  
dev.off()  

Download results to your laptop

*Try this on your laptop

###Please do it on your desktop
###Please replace <YOURID> to your id.

scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/T63_boxplot.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/SLC2A1_boxplot2.pdf ~/bch709

In your local terminal

Windows

cd ~/bch709
explorer.exe .

MacOS

cd ~/bch709
open .

Git Bash

cd ~/bch709
explorer.exe .

Co-expression pattern 1

#5 scatter plot showing the co-expression pattern between gene TP63 and gene SLC2A1
pdf("TP63_SLC2A1_scatter_plot.pdf")  
plot(t(expr["TP63",]), t(expr["SLC2A1",]), xlab="Expression of TP63", ylab="Expression of SLC2A1", pch=19, col="blue", cex=1.2)  
dev.off()  

#6 correlation in expression between between gene TP63 and gene SLC2A1
cor.test(t(expr["TP63",]), t(expr["SLC2A1",]))  

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/TP63_SLC2A1_scatter_plot.pdf  ~/bch709

Co-expression pattern 2


#7 scatter plot showing the co-expression pattern between gene TP63 and gene TSHZ2
pdf("co-expression_pattern_between.pdf")
plot(t(expr["TP63",]), t(expr["TSHZ2",]), xlab="Expression of TP63", ylab="Expression of SLC2A1", pch=21, bg="orange", cex=1.5)
dev.off()

#8 correlation in expression between between gene TP63 and gene TSHZ2
cor.test(t(expr["TP63",]), t(expr["TSHZ2",]))

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/co-expression_pattern_between.pdf  ~/bch709

Heatmap of the gene expression

#9 gene expression heatmap
pdf("heatmap1.pdf")
heatmap(as.matrix(expr), col=cm.colors(256), scale='row')
dev.off()

pdf("heatmap2.pdf")
heatmap(as.matrix(expr), col=cm.colors(256), scale='row', labRow="", labCol="", ColSideColors=c(rep("slateblue", 30), rep("red", 48)))
dev.off()

pdf("heatmap3.pdf")
heatmap(as.matrix(expr), col=rainbow(256), scale='row', labRow="", labCol="", ColSideColors=c(rep("slateblue", 30), rep("red", 48)))
dev.off()

pdf("heatmap4.pdf")
heatmap(as.matrix(expr), col=heat.colors(256), scale='row', labRow="", labCol="", ColSideColors=c(rep("slateblue", 30), rep("red", 48)))
dev.off()

pdf("heatmap5.pdf")
heatmap(as.matrix(expr), col=c("darkblue", "blue", "lightblue", "white", "lightgreen", "green", "darkgreen"), scale='row', labRow="", labCol="", ColSideColors=c(rep("slateblue", 30), rep("red", 48)))
dev.off()

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/heatmap1.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/heatmap2.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/heatmap3.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/heatmap4.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/heatmap5.pdf ~/bch709

PCA of the gene expression

#10 principal component analysis on the gene expression matrix
library(ade4) # if not installed yet, type install.packages(“ade4”)
pca_output = dudi.pca(t(expr), scannf=FALSE, nf=2)
pdf("PCA_plot.pdf")
plot(pca_output$l1$RS1[cl=="Normal"], pca_output$l1$RS2[cl=="Normal"], bg="slateblue", xlim=c(-2, 2), ylim=c(-2, 4.5), xlab='PC1', ylab='PC2', pch=21, cex=1.2)
points(pca_output$l1$RS1[cl=="Tumor"], pca_output$l1$RS2[cl=="Tumor"], bg="red", pch=21, cex=1.2)
legend('topleft', c("Normal tissue", "Tumor tissue"), pch=21, pt.bg=c("slateblue", "red"), pt.cex=1.2, bty='n', ncol=1)
dev.off()

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/PCA_plot.pdf .

Histogram of the gene expression

#11 histogram of the TP63 gene expression
pdf("histogram1.pdf")
hist(t(expr["TP63",]), xlab="Expression", main="")
dev.off()
pdf("histogram2.pdf")
hist(t(expr["TP63",]), xlab="Expression", main="", breaks= -2:12*0.5, col="grey")
dev.off()

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/histogram1.pdf ~/bch709
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/histogram2.pdf ~/bch709

Distribution of the gene expression

#12 distribution of the TP63 gene expression
pdf("TP63_expression.pdf")
plot(density(t(expr["TP63",])), xlab="Expression", main="")
dev.off()

Download results to your laptop

Try this on your laptop

###Please replace <YOURID> to your id.
scp <YOURID>@pronghorn.rc.unr.edu:~/r_plot/TP63_expression.pdf ~/bch709

Reading materials

The R Project for Statistical Computing https://www.r-project.org/ R Graphic cookbook https://learning.oreilly.com/library/view/r-graphics-cookbook/9781491978597/ R workshop https://bioinformatics.ca/workshops/2018-introduction-to-R/ RStudio https://www.rstudio.com/ Wikipedia R (programming language) https://en.wikipedia.org/wiki/R_(programming_language) Wikipedia RStudio https://en.wikipedia.org/wiki/RStudio