Midterm
- Six questions
- Thursday (10:00AM) to Monday (2PM)
What are file permissions?
-      (rw-)   (rw-)   (r--)   1 john sap
|        |       |       |
type   owner   group   others
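The layout above can be reproduced on a scratch file. This is a minimal sketch, assuming a Linux-style `ls -l`; the file name `demo.txt` and the variable names are made up for illustration. Mode `664` gives `rw-` to the owner, `rw-` to the group, and `r--` to others, matching the diagram.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/demo.txt"

# 6 = rw-, 6 = rw-, 4 = r--  (owner, group, others)
chmod 664 "$workdir/demo.txt"

# The first 10 characters of `ls -l` are the type plus three permission triplets.
perms=$(ls -l "$workdir/demo.txt" | cut -c1-10)

file_type=$(printf '%s\n' "$perms" | cut -c1)     # "-" means a regular file
owner=$(printf '%s\n' "$perms" | cut -c2-4)
group=$(printf '%s\n' "$perms" | cut -c5-7)
others=$(printf '%s\n' "$perms" | cut -c8-10)

echo "type=$file_type owner=$owner group=$group others=$others"

# Clean up the scratch directory.
rm -r "$workdir"
```

A directory would show `d` as its type character instead of `-`.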
Conda
- Dependencies are one of the main reasons to use Conda.
Sometimes installing a package is not as straightforward as you might think. Imagine a case like this: you want to install Matplotlib, and during installation it asks you to install NumPy and SciPy as well, because Matplotlib needs them to work. These packages are called the dependencies of Matplotlib. NumPy and SciPy may have dependencies of their own, which pull in even more packages.
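The chain of dependencies described above can be sketched as a small shell script. The dependency table is a toy example made up for illustration (it is not the real Conda metadata for these packages), and `deps_of` and `resolve` are hypothetical helper names. Conda's actual solver also reconciles version constraints, which this sketch ignores.

```shell
# Toy dependency table (hypothetical; NOT real Conda package metadata).
deps_of() {
  case "$1" in
    matplotlib) echo "numpy scipy" ;;
    scipy)      echo "numpy openblas" ;;
    numpy)      echo "openblas" ;;
    *)          echo "" ;;   # no further dependencies
  esac
}

ORDER=""   # install order, dependencies first

# Depth-first walk: schedule every dependency before the package itself.
# Assumes the table contains no cycles.
resolve() {
  case " $ORDER " in
    *" $1 "*) return 0 ;;   # already scheduled, nothing to do
  esac
  for dep in $(deps_of "$1"); do
    resolve "$dep"
  done
  ORDER="${ORDER:+$ORDER }$1"
}

resolve matplotlib
echo "$ORDER"
```

With this toy table the script prints `openblas numpy scipy matplotlib`: asking for one package ends up scheduling four, each only after the packages it needs.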
Conda env clean
conda clean --all
Conda create environment
conda create -n review python=3
Conda activate environment
conda activate review
Install software
conda install -c bioconda -c anaconda trinity samtools multiqc fastqc rsem jellyfish bowtie2 salmon trim-galore bioconductor-ctc bioconductor-deseq2 bioconductor-edger bioconductor-biobase bioconductor-qvalue r-ape r-gplots r-fastcluster
conda install -c anaconda openblas
conda install nano
conda install -c eumetsat tree
conda install -c lmfaber transrate
Check installation
conda list
Conda installation
Search the web for the software name plus "conda" (for example, 'hisat2 conda') to find the package name and channel.
conda install [package name]
Conda export
Export your environment to a YAML file, e.g. rnaseq.yaml
conda env export > [Name].yaml
Sequencing
Illumina sequencing
PacBio sequencing
File format
Fasta file
Fastq file
GFF format
1. Sequence ID
2. Source
   - Describes the algorithm or the procedure that generated this feature. Typically the name of a program (e.g. Genescan) or of a database (e.g. GenBank).
3. Feature Type
   - Describes what the feature is (mRNA, domain, exon, etc.). These terms are constrained to the Sequence Ontology terms.
4. Feature Start
5. Feature End
6. Score
   - Typically E-values for sequence similarity and P-values for predictions.
7. Strand
8. Phase
   - Indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon.
9. Attributes
   - A semicolon-separated list of tag-value pairs, providing additional information about each feature. Some of these tags are predefined, e.g. ID, Name, Alias, Parent. The full list is given in the GFF3 specification.
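The nine columns can be pulled apart with standard tools. This is a minimal sketch on a single made-up record (the sequence ID, coordinates, and attribute values are invented for illustration); it assumes tab-separated columns, 1-based inclusive coordinates, and a feature on the `+` strand.

```shell
# One made-up GFF record; columns are separated by tabs.
line=$(printf 'chr1\tGenBank\tCDS\t1300\t1500\t.\t+\t2\tID=cds00001;Parent=mRNA00001')

# cut splits on tabs by default, so -fN selects column N.
feature_type=$(printf '%s\n' "$line" | cut -f3)
start=$(printf '%s\n' "$line" | cut -f4)
end=$(printf '%s\n' "$line" | cut -f5)
phase=$(printf '%s\n' "$line" | cut -f8)
attributes=$(printf '%s\n' "$line" | cut -f9)

# Start and end are both included in the feature, hence the +1.
length=$(( end - start + 1 ))

# On the + strand, skipping `phase` bases from the start reaches the
# first base of the next codon.
first_codon=$(( start + phase ))

# Attributes: one tag=value pair per line, then pick the Parent tag.
parent=$(printf '%s\n' "$attributes" | tr ';' '\n' | awk -F= '$1 == "Parent" { print $2 }')

echo "type=$feature_type length=$length first_codon=$first_codon parent=$parent"
```

For this record the script reports a CDS of length 201 whose first complete codon starts at position 1302 and whose parent is `mRNA00001`.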
RNA Sequencing
- The transcriptome is spatially and temporally dynamic
- Data comes from functional units (coding regions)
- Only a tiny fraction of the genome
Paper reading
Please read this paper
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8
Basic Unix/Linux commands
cd
cd /data/gpfs/assoc/bch709-4/<YOUR_ID>
mkdir
mkdir BCH709_midterm
cd BCH709_midterm
pwd
pwd
wget
Download files
wget https://www.dropbox.com/s/yqvfm70yz79jvij/fasta.zip https://www.dropbox.com/s/jjz6aip3euh0d7q/fastq.tar
wget https://www.dropbox.com/s/szzyb3l4243xcsu/bch709.py
Extract tar file
tar xvf fastq.tar
ls
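To see what the `xvf` flags do without touching the course data, the same steps can be tried on a throwaway archive. This is a sketch only; the directory `reads`, the file `sample.fastq`, and the archive `demo.tar` are made-up names.

```shell
# Build a tiny archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir reads
printf '@read1\nACGT\n+\nIIII\n' > reads/sample.fastq

tar cf demo.tar reads    # c = create an archive, f = name of the archive file
rm -r reads              # remove the originals so extraction is visible

tar xvf demo.tar         # x = extract, v = list each file as it is unpacked, f = archive file
ls reads
```

After extraction, `ls reads` should list `sample.fastq` again.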
Decompress zip file
unzip fasta.zip
ls
gz file
zcat
pipe
wc
rm
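The commands listed above are often combined. The sketch below counts the reads in a gzip-compressed FASTQ file without unpacking it to disk. The two records are made up for illustration, and the count assumes the usual four lines per read (no wrapped sequence lines).

```shell
# Create a tiny FASTQ file with two made-up reads, then compress it.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$workdir/demo.fastq"
gzip "$workdir/demo.fastq"          # replaces demo.fastq with demo.fastq.gz

# zcat writes the decompressed text to standard output;
# the pipe (|) hands it to wc, and wc -l counts the lines.
lines=$(zcat "$workdir/demo.fastq.gz" | wc -l)
reads=$(( lines / 4 ))              # four lines per FASTQ record
echo "reads: $reads"

# rm deletes the file; rmdir removes the now-empty scratch directory.
rm "$workdir/demo.fastq.gz"
rmdir "$workdir"
```

Counting lines and dividing by four is safer than counting lines that start with `@`, because a quality line can also begin with `@`.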
RNA-Seq
Advanced bioinformatics tools
Seqkit
https://plantgenomicslab.github.io/BCH709/seqkit_tutorial/index.html
Trim-Galore
References:
- Conda documentation https://docs.conda.io/en/latest/
- Conda-forge https://conda-forge.github.io/
- BioConda https://bioconda.github.io/