Using the Command Line

I accidentally closed the tab containing the NEARLY FINISHED original post (and it did not magically save itself), so fair warning: this post will be shorter, crankier, and less clever than originally intended.

The terminal/command line allows the user to interact with their computer without using a GUI (graphical user interface) or the mouse. Experience and comfort working in the command line is essential to bioinformatic analysis. Using it for the first time can seem intimidating, but it is fairly simple once you get started. On Mac OS and Linux the command line understands Unix commands, because both are Unix-like operating systems. In the following post I will show you how to access the command line and introduce a few simple commands.

A few quick definitions

command line – the place where you type commands in a terminal window. It looks something like this (colors of text and background may vary):

[Screenshot: a terminal window]

script – a short computer program. Usually computer programs are called scripts when they perform a simple function or a small number of simple functions. Scripts are often strung together into a larger program called a pipeline. Scripts are typically only run from the command line.

directory – for practical purposes, this is just another name for a folder (the same type you would encounter on your desktop or, if you’re working on Mac OS, in Finder).

** These instructions are compatible with a computer running Mac OS or Linux (although if you are running Linux you probably already know how to access/use the command line). Future posts will mainly cover Python syntax/coding either online or in a Python shell, so if you are a Windows user you should keep following the blog!

I did find these tutorials/resources on using the command line with a computer running Windows:

http://www.cs.princeton.edu/courses/archive/spr05/cos126/cmd-prompt.html

http://www.bleepingcomputer.com/tutorials/windows-command-prompt-introduction/

http://www.computerhope.com/issues/chusedos.htm

Locating the Terminal

I have included both videos and text on locating and using the terminal because, while I personally hate learning things from videos, they can be helpful in transmitting information. I recommend at least watching the video on locating the terminal (I tried to make these videos as nonirritating as possible; they may seem a little fast, but when I slowed them down they became painful to watch).

To open the terminal

In Finder, go to Go > Applications, open the Utilities folder, and double-click Terminal

Using the Terminal

In the terminal you move between directories (folders) using the command “cd” (change directory) followed by the path to the directory you want to be in.

When you type in the terminal, your text appears after the $. The text before the $ is the prompt: in the examples below it shows the computer name, a colon, your current directory, and then your user name. Everything after the $ is the command you are executing.

IMPORTANT SIDE NOTE

In Unix (and many/most coding languages) spaces are significant. This is because most Unix commands take the basic form:

command[space]input

For example:

“cd directory_name”

Since spaces are used to separate important parts of the command, file names with spaces can be problematic. For this reason most programmers replace spaces with underscores (_) in file and directory names to avoid screwing up their scripts (e.g. file_name).
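If you want to see the problem for yourself, here is a small sketch you can paste into the terminal. It assumes nothing beyond a throwaway practice directory made with mktemp; the names "my folder" and "my_folder" are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Make one directory with a space in its name and one with an underscore.
mkdir "my folder" my_folder

# Without quotes the shell splits the name into two inputs, "my" and "folder",
# so cd cannot do what you meant and reports an error.
cd my folder 2>/dev/null || echo "cd my folder failed"

# Quoting the name keeps it together as a single input.
cd "my folder" && pwd
cd ..

# Escaping the space with a backslash also works.
cd my\ folder && pwd
cd ..

# The underscore version needs no special treatment.
cd my_folder && pwd
```

Quoting works, but it is easy to forget, which is why underscores are the safer habit.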


 

dhcp16-gc1:~ Madison$ cd Desktop/Workflow/Paper

In the above example I start in my (Madison’s) home directory; the ~ indicates the home directory. To change directories I use the cd command followed by the path I want to take. Here I move to the Paper directory by passing through the Desktop and Workflow directories. I could move along the same path in three separate steps:

dhcp16-gc1:~ Madison$ cd Desktop/

dhcp16-gc1:Desktop Madison$ cd Workflow/

dhcp16-gc1:Workflow Madison$ cd Paper/

To move up a level you can use the “cd” command followed by “..”

dhcp16-gc1:Paper Madison$ cd ..

The above command moves from the Paper directory to the Workflow directory above it

dhcp16-gc1:Workflow Madison$ cd ..

The above command moves from the Workflow directory to the Desktop directory above it

To return directly to the home directory you can use the “cd ~” command

$ cd ~

If you aren’t sure where you are, type the command “pwd” (print working directory), which will display the full path of the current directory

$ pwd

To see all of the files and directories in your current directory use the “ls” command

$ ls
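To practice moving around without touching your real files, something like the following should work. It is only a sketch: it rebuilds the Desktop/Workflow/Paper layout from my examples inside a temporary directory (made with mktemp, which is my addition) and then walks through it with cd, pwd, and ls.

```shell
# Build a practice copy of the Desktop/Workflow/Paper layout in a temporary directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/Desktop/Workflow/Paper"
cd "$sandbox"

cd Desktop/Workflow/Paper   # move down three levels in one step
pwd                         # prints a path ending in Desktop/Workflow/Paper

cd ..                       # up one level, into Workflow
ls                          # lists the contents of Workflow: Paper

cd ../..                    # up two levels at once, back to the practice directory
ls                          # lists: Desktop
pwd                         # prints the path of the practice directory
```

Running pwd after every cd is a good habit while you are still getting used to where you are.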

To specify a file name you need to either be in the directory containing that file, or you need to specify the path to the file.

If I wanted to specify a file in the Paper directory (from the above examples), its path would be ~/Desktop/Workflow/Paper/file_name. The path goes after a command as its input, for example:

$ ls ~/Desktop/Workflow/Paper/file_name
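The two ways of pointing at a file can be sketched like this. The practice directory and the contents of file_name are made up for illustration, and I use cat here simply to print the file to the screen.

```shell
# Set up a practice file inside a temporary copy of the directory layout.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/Desktop/Workflow/Paper"
echo "draft text" > "$sandbox/Desktop/Workflow/Paper/file_name"

# Option 1: move into the directory first, then use the bare file name.
cd "$sandbox/Desktop/Workflow/Paper"
cat file_name

# Option 2: stay somewhere else and give the path to the file.
cd "$sandbox"
cat Desktop/Workflow/Paper/file_name

# From here the bare name does not work, because the file is not in this directory.
cat file_name 2>/dev/null || echo "no file_name in this directory"
```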

PRO TIP

Tab complete is the best! When typing the name of a file or directory, hit tab after you have typed the first few letters and the computer will fill in the rest. If multiple files or directories begin with the same letters, tab complete will fill in the letters up to the point at which they diverge. If you then double tap the tab key it will list all of the files and directories that begin with those characters.

$ less file_name

The “less” command will display the file (listed after the command) in the terminal window one screen at a time; press q to exit. “cat” and “more” perform a similar function.

$ cp file1 file2

The “cp” command will make a copy of file1 named file2. If you want file2 to be located in a different directory you must specify the path before the new file name (e.g. cp file1 ~/Documents/Blog/file2).

$ mv file1 file2

The “mv” command is similar to cp, but instead of making a second copy of the file, it renames file1 to file2 (and moves it if you specify a path).
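Here is a short sketch of the difference between cp and mv, run in a throwaway directory. The Blog directory and the names file1, file2, and file3 are just placeholders.

```shell
# Work in a throwaway directory with one small practice file.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir Blog
echo "some notes" > file1

cp file1 file2          # file1 and file2 now both exist with the same contents
cp file1 Blog/file2     # a second copy, placed inside the Blog directory
ls                      # lists Blog, file1 and file2

mv file1 file3          # rename: file1 is gone, file3 holds its contents
mv file3 Blog/          # move: file3 now lives in Blog and keeps its name
ls                      # lists Blog and file2
ls Blog                 # lists file2 and file3
```

Be careful: if the destination file already exists, cp and mv will normally overwrite it without asking.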

$ head file_name

The “head” command displays the first ten lines of the file

$ tail file_name

The “tail” command displays the last ten lines of the file
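To try head and tail on a file where the result is easy to check, you could use something like this. The file numbers.txt is made up for the example, and seq (which prints a run of numbers) and the -n option are my additions.

```shell
# Work in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# seq prints the numbers 1 to 25, one per line; > saves them to a file.
seq 1 25 > numbers.txt

head numbers.txt         # lines 1 through 10
tail numbers.txt         # lines 16 through 25

# -n changes how many lines are shown.
head -n 3 numbers.txt    # 1, 2, 3
tail -n 2 numbers.txt    # 24, 25
```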

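$ grep 'keyword' file_name

The “grep” command searches a file for the keyword and displays every line that contains it on the terminal screen. Put the keyword in straight single quotes (i.e. 'keyword'); the quotes are required if the keyword contains spaces or special characters, and curly quotes pasted from a web page will not work.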

$ grep -c 'keyword' file_name

The “grep -c” command counts the number of lines in the file that contain the keyword and displays that number on the terminal screen (a line that contains the keyword twice is only counted once). Again, put the keyword in straight single quotes, i.e. 'keyword'.
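A small sketch of grep in action, using a made-up three-line file called notes.txt. Note that grep -c counts the lines that match, which is not always the same as the number of times the keyword appears.

```shell
# Work in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# printf writes three lines; "gene" appears twice on the first line and once on the third.
printf 'gene A and gene B\nprotein C\ngene D\n' > notes.txt

grep 'gene' notes.txt        # prints the two lines that contain "gene"
grep -c 'gene' notes.txt     # prints 2: the number of matching lines, not the 3 occurrences

# Quotes matter once the keyword contains a space.
grep 'gene B' notes.txt      # prints: gene A and gene B
```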

$ whatis command

The “whatis” command followed by the name of a command (e.g. whatis ls) will return a brief description of that command

Here is a cheat sheet of Unix commands and their meanings

[Screenshots: cheat sheet of Unix commands and their meanings]

This tutorial walks you through different unix commands

http://www.ee.surrey.ac.uk/Teaching/Unix/

The Turner lab at University of Virginia also has a great unix/command line tutorial pdf on their blog Getting Genetics Done

Ian Korf at the UC Davis Genome Center also has an excellent unix and perl primer here

This also looked like a useful resource

http://lifehacker.com/5633909/who-needs-a-mouse-learn-to-use-the-command-line-for-almost-anything

Thanks to the awesome Hannah Holland-Moritz for her help writing and editing this post!


Hello World

Welcome to Bioinformatics for beginners!

As a beginner myself I am creating this blog as a place to organize what I have learned (and am learning) about coding and as a resource for other novice coders. If there are any terms/jargon/concepts you don’t understand, feel free to contact me either by leaving a comment below or via Twitter (@MDunitz).

Although I am sometimes shocked by how far I have come in such a short time, I am doing my best to direct this blog towards myself a year ago, a girl who didn’t know an operating system from a…well basically I didn’t know what an operating system was. In this post I am going to summarize what I have learned so far and how/where I learned it. If there are terms you don’t understand don’t worry! I promise I will explain them in future posts.

A little about me

I graduated with a degree in microbiology and political science from UC Davis in December of 2013. I have always been vaguely interested in coding/knowing more about computers but sort of in the same way I was interested in learning French–it would be useful and cool for someday.  In January I began working full time in the Eisen lab at the UC Davis Genome Center.  I work on a variety of fascinating (at least to me) projects using bioinformatics to study microbes in the built environment. For more information check out:

http://microbe.net/

http://phylogenomics.blogspot.com/

Currently I am working on a workflow/methods paper, “From Swab to Publication: a comprehensive workflow for microbial genome sequencing”. The goal of this paper is to make sequencing and de novo assembly of genomes, as well as basic bioinformatics, more accessible to undergraduates and smaller labs.

I am lucky enough to work with an amazing group of scientists. My coworkers range in “computer literacy” from novice bioinformaticians like myself to PhDs designing amazing bioinformatic tools/pipelines for the scientific community (such as A5 and phylosift) and all of them have been happy to assist me in my introduction to bioinformatics.

My experience with coding prior to this past winter was limited to the occasional analysis in STATA (a data analysis/statistics program) for political science classes. I began learning the command line in order to utilize QIIME (a tool for comparing and analyzing microbial communities). I also began learning Python from Codecademy (which I highly recommend, although be warned it is insanely addictive; you may find yourself staying up until three or four in the morning for “just one more level”). I then did the Python Village on Rosalind and I am currently working on the Bioinformatics Stronghold.

I am not entirely sure how I will organize this blog, but I am hoping to explain the basics of the command line, and then review the Python I learned in Codecademy and Rosalind and potentially check out the Python tutorial.

Good Luck!