BCH709 Introduction to Bioinformatics: midterm_review

Midterm

What is file permission?

 -    (rw-)   (rw-)   (r--) 1 john sap
 |      |       |       |
type  owner   group   others
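
As a minimal sketch, the permission string in the diagram (rw- for owner, rw- for group, r-- for others) corresponds to octal mode 664. The file name `demo.txt` is hypothetical, and the commands run in a throwaway directory so that no real files are modified.

```shell
# Work in a scratch directory (created here only for the demonstration).
workdir=$(mktemp -d)
cd "$workdir"

# Create an empty file with a hypothetical name and set rw-rw-r-- (octal 664).
touch demo.txt
chmod 664 demo.txt

# The first 10 characters of `ls -l` are the file type plus the
# owner, group, and others permission triplets.
perms=$(ls -l demo.txt | cut -c1-10)
echo "$perms"   # prints -rw-rw-r--
```

Each octal digit is the sum of read (4), write (2), and execute (1), so 6 means read plus write and 4 means read only.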

Conda


Conda env clean

conda clean --all

Conda create environment

conda create -n review python=3  

Conda activate environment

conda activate review  

Install software

conda install -c bioconda -c anaconda trinity samtools multiqc fastqc rsem jellyfish bowtie2 salmon trim-galore bioconductor-ctc bioconductor-deseq2 bioconductor-edger bioconductor-biobase bioconductor-qvalue r-ape r-gplots r-fastcluster
conda install -c anaconda openblas
conda install nano
conda install -c eumetsat tree
conda install -c lmfaber transrate

Check installation

conda list

Conda installation

Search the web for the software name plus "conda" (for example, 'hisat2 conda') to find the package name and channel, then install it:
conda install [package name]

Conda export

Export your environment to a YAML file (for example, rnaseq.yaml):

conda env export  > [Name].yaml

Sequencing

Illumina sequencing

Illumina

PacBio sequencing

PacBio

File format

Fasta file

fasta
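
In a FASTA file, every record starts with a header line beginning with `>`, followed by one or more lines of sequence. The sketch below counts records and bases with standard Unix tools. The file `example.fasta` and its two records are made up for illustration.

```shell
# Build a tiny FASTA file with two hypothetical records in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/example.fasta" <<'EOF'
>seq1 hypothetical record
ATGCGT
ACGT
>seq2 hypothetical record
GGGCCC
EOF

# Number of records = number of header lines (lines starting with ">").
n_seqs=$(grep -c '^>' "$workdir/example.fasta")

# Total bases = characters on the non-header lines, with newlines removed.
# The final tr strips the padding that some wc implementations add.
n_bases=$(grep -v '^>' "$workdir/example.fasta" | tr -d '\n' | wc -c | tr -d ' ')

echo "$n_seqs sequences, $n_bases bases"   # prints 2 sequences, 16 bases
```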

Fastq file

Fastq_file basequality
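
Each character on the quality line of a FASTQ record encodes one base quality. The sketch below assumes the Phred+33 encoding (an assumption, since the encoding is not stated here), in which the quality score is the character's ASCII code minus 33. The helper name `phred_score` is hypothetical.

```shell
# Convert one FASTQ quality character to its Phred score, assuming Phred+33.
phred_score() {
    # printf with a leading single quote prints the character's ASCII code.
    ascii=$(printf '%d' "'$1")
    echo $((ascii - 33))
}

q_I=$(phred_score 'I')   # 'I' is ASCII 73
q_5=$(phred_score '5')   # '5' is ASCII 53
printf 'I -> Q%s, 5 -> Q%s\n' "$q_I" "$q_5"   # prints I -> Q40, 5 -> Q20
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1 in 100 chance that the base call is wrong and Q40 means 1 in 10,000.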

GFF format

GFF

  1. Sequence ID
  2. Source
    • Describes the algorithm or the procedure that generated this feature. Typically the name of a program, such as Genescan, or of a database, such as GenBank.
  3. Feature Type
    • Describes what the feature is (mRNA, domain, exon, etc.). These terms are constrained to the Sequence Ontology terms.
  4. Feature Start
  5. Feature End
  6. Score
    • Typically E-values for sequence similarity and P-values for predictions.
  7. Strand
  8. Phase
    • Indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon.
  9. Attributes
    • A semicolon-separated list of tag-value pairs, providing additional information about each feature. Some of these tags are predefined, e.g. ID, Name, Alias, Parent. The full list is given in the GFF3 specification.
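
The nine tab-separated columns listed above can be pulled apart with `awk`. The sketch below is only an illustration: the file `example.gff` and both records are hypothetical, and the feature length is computed as end - start + 1 because GFF coordinates are 1-based and inclusive.

```shell
workdir=$(mktemp -d)

# Two hypothetical GFF3 records; columns are separated by tabs.
printf 'chr1\texample\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=demo\n' >  "$workdir/example.gff"
printf 'chr1\texample\texon\t100\t250\t.\t+\t.\tID=exon1;Parent=gene1\n' >> "$workdir/example.gff"

# For each non-comment line, print the feature type (column 3),
# the feature length (columns 4 and 5), and the ID tag from column 9.
gff_summary=$(awk -F'\t' '!/^#/ {
    split($9, attrs, ";")          # column 9: semicolon-separated tag=value pairs
    id = ""
    for (i in attrs) if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
    print $3, $5 - $4 + 1, id
}' "$workdir/example.gff")

echo "$gff_summary"
# prints:
# gene 401 gene1
# exon 151 exon1
```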

RNA Sequencing


  1. The transcriptome is spatially and temporally dynamic
  2. Data comes from functional units (coding regions)
  3. Only a tiny fraction of the genome

Paper reading

Please read this paper
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8

Basic Unix/Linux command

cd

cd /data/gpfs/assoc/bch709-4/<YOUR_ID>

mkdir

mkdir BCH709_midterm
cd BCH709_midterm

pwd

pwd

wget

File download

wget https://www.dropbox.com/s/yqvfm70yz79jvij/fasta.zip https://www.dropbox.com/s/jjz6aip3euh0d7q/fastq.tar
wget https://www.dropbox.com/s/szzyb3l4243xcsu/bch709.py 

Decompress tar file

tar xvf fastq.tar

ls
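
Before extracting an unfamiliar archive, it can help to list its contents first. The sketch below builds a small archive with hypothetical names (`example.tar`, `reads/sample.txt`) in a scratch directory, lists it with `tar tf`, and then extracts it with `tar xf`.

```shell
# Build a small archive to practice on, inside a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir reads
echo "placeholder" > reads/sample.txt
tar cf example.tar reads     # c = create, f = archive file name
rm -r reads                  # remove the originals so extraction is visible

# t = list the contents without extracting anything.
listing=$(tar tf example.tar)
echo "$listing"

# x = extract; add v (as in `tar xvf`) to print each file as it is unpacked.
tar xf example.tar
ls reads
```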

Decompress zip file

unzip fasta.zip

ls

gz file

zcat

pipe

wc
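
The three commands above are often combined: `zcat` decompresses a gz file to standard output, the pipe (`|`) passes that output to the next command, and `wc -l` counts lines. As a sketch, this counts the reads in a gzip-compressed FASTQ file, relying on the fact that each FASTQ record spans four lines. The file `example.fastq.gz` and its two reads are hypothetical.

```shell
workdir=$(mktemp -d)

# Build a small gzip-compressed FASTQ file with two hypothetical reads.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/example.fastq.gz"

# Decompress to stdout and pipe into wc; the file on disk stays compressed.
# Reading via "<" keeps the command portable across zcat implementations.
n_lines=$(zcat < "$workdir/example.fastq.gz" | wc -l | tr -d ' ')

# Four lines per record: header, sequence, separator, qualities.
n_reads=$((n_lines / 4))
echo "$n_reads reads"   # prints 2 reads
```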

rm

RNA-Seq

RNA Sequencing workflow

Advanced bioinformatics tools

Seqkit

https://plantgenomicslab.github.io/BCH709/seqkit_tutorial/index.html

Trim-Galore
