Month: December 2017

Setting PATH variable in different shells

The Linux PATH is an environment variable that lists the directories the shell searches for executable files when a user issues a command. It can be modified temporarily or permanently to include the location of specific software, so that typing the full path to that software is no longer needed. Setting the PATH variable is also useful when a user wants to run a different version of a program that is already in the PATH. Since I often use different types of shells to do my job, I decided to highlight how to change the...
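As a minimal sketch of a temporary change, assuming a Bourne-style shell (sh, bash, zsh), the lines below add a directory to PATH for the current session only. The directory `/opt/mytool/bin` and the variable name `tool_dir` are hypothetical; other shell families use a different syntax.

```shell
# Hypothetical install location, used only for illustration.
tool_dir="/opt/mytool/bin"

# Append: directories already in PATH keep priority over tool_dir.
PATH="$PATH:$tool_dir"
export PATH

# To prefer a different version of a program already in PATH, prepend instead:
#   PATH="$tool_dir:$PATH"

# Check whether the directory is now on the search path.
case ":$PATH:" in
  *":$tool_dir:"*) echo "PATH includes $tool_dir" ;;
  *)               echo "PATH does not include $tool_dir" ;;
esac
```

The change is lost when the session ends. In bash, one common way to make it permanent is to put the same assignment in a startup file such as `~/.bashrc`.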


Adding users to the sudo group

One of the most important things to do after setting up a new Linux server (or after taking over an existing one) is to create a new user, possibly with sudo powers. Sudo is a Linux command that allows users to perform administrator tasks even if they are not system admins. The main reason for having a sudo user (or sudoer) is that logging in as root is usually undesirable, since it often causes trouble, yet we may still want to perform administrator tasks as a non-root user. Moreover, adding...
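A rough sketch of the idea, assuming a Debian or Ubuntu style system where the administrative group is named `sudo` (other distributions often use `wheel`). The account name `alice` and the helper `is_in_group` are hypothetical. The commands that modify the system are shown as comments because they must be run as root.

```shell
# Return success if user $1 belongs to group $2.
is_in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Hypothetical account name, used only for illustration.
new_user="alice"

# Commands an administrator would run as root (not executed here):
#   useradd -m "$new_user"          # create the account with a home directory
#   usermod -aG sudo "$new_user"    # append the account to the sudo group

# Check the membership of the user running this script.
current_user=$(id -un)
if is_in_group "$current_user" sudo; then
  echo "$current_user is in the sudo group"
else
  echo "$current_user is not in the sudo group"
fi
```

The `-a` flag matters: without it, `usermod -G` replaces the user's supplementary groups instead of appending to them.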


Phred quality score

Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a quality score named Phred (or Q score). The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:

Q = -10 log10(P)

In the real world, a quality score of 20 means that there is a possibility in 100...
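The formula can be checked numerically with a short sketch that uses `awk` for the arithmetic. The function names `phred_to_prob` and `prob_to_phred` are made up for this example; the first inverts the formula, P = 10^(-Q/10), and the second applies it directly.

```shell
# Convert a Phred score Q to the error probability P = 10^(-Q/10).
phred_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Convert an error probability P back to Q = -10 * log10(P),
# rounded to the nearest integer. awk's log() is the natural logarithm.
prob_to_phred() {
  awk -v p="$1" 'BEGIN { printf "%.0f\n", -10 * log(p) / log(10) }'
}

# Print the error probability for a few common scores.
for q in 10 20 30 40; do
  echo "Q=$q  P=$(phred_to_prob "$q")"
done
```

Each increase of 10 in Q corresponds to a tenfold drop in the estimated error probability.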


Counting sequences in fasta/fastq files

A well-established bioinformatician usually has a handful of dedicated tools to manipulate and analyse genomic data, for example to count the sequences in a file. Nonetheless, in some cases it is useful to rely on standard Unix commands, for example when your trusty laptop is not available or you’re working on someone else’s machine.

Fasta files

A .fasta file is a plain text file in which every sequence is represented by a header line, beginning with “>” and containing the sequence identifier and details, followed by one or more lines containing the actual sequence:

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL...
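One possible way to do the counting with standard tools is sketched below; it is an illustration, not necessarily the approach the full post takes. The function names and the demo files are made up. The fastq version assumes exactly four lines per record (no wrapped sequences), and it counts lines rather than searching for "@" because quality strings can also start with "@".

```shell
# Count FASTA records: one header line starting with ">" per sequence.
count_fasta() {
  grep -c '^>' "$1"
}

# Count FASTQ records, assuming exactly four lines per record.
count_fastq() {
  awk 'END { print NR / 4 }' "$1"
}

# Small made-up files for demonstration.
seq_demo=$(mktemp -d)
printf '>SEQUENCE_1\nMTEITAAMVK\nELRESTGAGM\n>SEQUENCE_2\nLVSVKVSDDF\n' > "$seq_demo/demo.fasta"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > "$seq_demo/demo.fastq"

count_fasta "$seq_demo/demo.fasta"
count_fastq "$seq_demo/demo.fastq"

# Remove the demonstration files.
rm -r "$seq_demo"
```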


Linux terminal shortcuts

I often find myself looking for shortcuts when working on the terminal with bioinformatics data, so here is a list of the most useful shortcuts and commands for the Unix terminal (credits to cheatsheetworld):

File system

ls  –  list items in the current directory
ls -l  –  list items in the current directory in long format, showing permissions, size and modification date
ls -a  –  list all items in the current directory, including hidden files
ls -F  –  list items in the current directory, marking directories with a slash and executables with a star
ls [dir]...
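To see how the `ls` options above differ, the following sketch builds a scratch directory with a subdirectory, a regular file, a hidden file and an executable script, then lists it with each option. All file names are made up for the demonstration.

```shell
# Scratch directory with one of each kind of entry.
ls_demo=$(mktemp -d)
mkdir "$ls_demo/sub"
touch "$ls_demo/notes.txt" "$ls_demo/.hidden"
printf '#!/bin/sh\necho hi\n' > "$ls_demo/run.sh"
chmod +x "$ls_demo/run.sh"

ls "$ls_demo"       # plain listing; .hidden is not shown
ls -l "$ls_demo"    # long format with permissions, size and date
ls -a "$ls_demo"    # also lists .hidden, plus . and ..
ls -F "$ls_demo"    # sub/ and run.sh* are marked

# Remove the scratch directory.
rm -r "$ls_demo"
```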
