Bioinformatics group new user guide

Welcome to the bioinformatics group. This guide explains how to access and use our computer services. In addition to describing common resources, it explains how to accomplish common tasks.

The login server

The main way to access our shared computer systems is through SSH on dna.uwaterloo.ca. This server is intended for compilation, testing, editing, and other interactive tasks. For computationally intensive tasks, we have a small cluster running Sun GridEngine, which we explain in detail later. Please use the cluster for such tasks instead of dna.uwaterloo.ca. Running long, computationally intensive jobs on dna.uwaterloo.ca disrupts the work of others.

The first thing you will want to do, however, is change your given password.

Changing your password

To change your password, SSH to dna.uwaterloo.ca and log in. If you are on Windows, you can do this with a free SSH client called PuTTY. Once logged in, run the passwd command and follow the prompts. While there are no restrictions on the password you choose, it is a good idea to use a mix of lowercase and uppercase letters, numbers, and symbols.

The file system and snapshots

The files you access through dna.uwaterloo.ca are stored on our file server. This server is backed up in several ways. For disaster recovery, we back up files to another server in a different physical location. For user convenience, we also provide snapshots.

A snapshot is an image of your files at a particular time. We store a snapshot of your files as of midnight the day before, midnight two days before, and midnight every Saturday. You can access these images through ~/.zfs/snapshot/ where ~ represents the root of your home directory (e.g. /net/home/username).
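
Recovering an older copy of a file from a snapshot can be sketched as below. This is a minimal illustration rather than an official tool: restore_from_snapshot and the SNAPSHOT_ROOT variable are hypothetical names introduced here, and the snapshot directory names are not listed in this guide, so list them first and substitute a real one.

```shell
#!/bin/sh
# Snapshot directory from the guide; SNAPSHOT_ROOT is a hypothetical
# override so the function can be tried somewhere else.
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-$HOME/.zfs/snapshot}"

# restore_from_snapshot SNAPSHOT FILE
#   SNAPSHOT: name of a directory under $SNAPSHOT_ROOT
#   FILE:     path relative to your home directory
# Copies the old version next to the current one as FILE.restored,
# so the current file is not overwritten.
restore_from_snapshot() {
    src="$SNAPSHOT_ROOT/$1/$2"
    if [ ! -f "$src" ]; then
        echo "no such file in snapshot: $src" >&2
        return 1
    fi
    cp -p "$src" "$HOME/$2.restored"
}

# List the available snapshots, or report that the directory is missing.
ls "$SNAPSHOT_ROOT" 2>/dev/null || echo "no snapshot directory at $SNAPSHOT_ROOT" >&2
```

For example, restore_from_snapshot NAME project/notes.txt would create ~/project/notes.txt.restored, where NAME is one of the directory names printed by the ls line and project/notes.txt is a placeholder path.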

Accessing the file server from Windows

To access the file server from Windows, open Explorer and type

\\username@hmsbarracouta.cs.uwaterloo.ca\username

into the address bar. The password is the same as for dna.uwaterloo.ca.

The queue system

The Sun GridEngine queue system allows you to queue up many jobs. It runs each job on the cluster node with the fewest resources currently in use. If the cluster is at capacity, your job remains in the queue until resources become available.

GridEngine commands

The following is a quick summary of the relevant commands needed to use the queue system. The main commands you will want to use are qrun, qdel, and qstat.
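
Of these, qstat and qdel are the quickest to illustrate. The sketch below assumes the standard GridEngine versions of both commands are what is installed on our systems; the -u option of qstat restricts the listing to one user's jobs. The variable name me and the job ID are placeholders.

```shell
#!/bin/sh
# Work out the current user name even if $USER is unset.
me="${USER:-$(id -un)}"

if command -v qstat >/dev/null 2>&1; then
    # Show only your own jobs.
    qstat -u "$me"
else
    echo "qstat not found; run this on a machine with GridEngine installed" >&2
fi

# To cancel a job, pass its job ID (first column of the qstat output) to qdel:
#   qdel 12345     # 12345 is a placeholder job ID
```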

The qrun command takes a command in single quotes and runs it on the queue as if it were run from the current directory. You may include the standard redirects within the single quotes. It is very important to use single quotes rather than double quotes or no quotes. Additionally, you must tell qrun the approximate parameters of your job with the following flags.

By using the correct command for your job size, you allow the queue to manage your job more efficiently. We also provide three quick submit commands. Unlike qrun, these commands do not require quotes around the command and instead try to infer the correct behaviour automatically. To use them, simply prepend one of the commands to the command you want to run in the queue and run it as normal.

Note that the above three commands do not work well inside a script. If you are using a script to submit many jobs, use the qrun command instead.
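
The quoting rule for qrun matters most in scripts, where variables and redirects are common. The sketch below does not call qrun at all. It uses a hypothetical stand-in, show_arg, that prints the one argument it receives, to show what each quoting style hands to a program. Assuming qrun passes the quoted string on to the queue unchanged, this is a likely reason single quotes are required.

```shell
#!/bin/sh
# show_arg is a hypothetical stand-in for qrun: it prints its first
# argument instead of submitting it to the queue.
show_arg() {
    printf '%s\n' "$1"
}

name="local-value"

# Single quotes: the text reaches the program untouched, so the
# variable and the redirect are left for the job itself to interpret.
single=$(show_arg 'echo $name > result.txt')

# Double quotes: the current shell expands $name before the program
# ever sees the string.
double=$(show_arg "echo $name > result.txt")

# No quotes (not run here): the current shell would split the words and
# apply the redirect itself, so the program would never see "> result.txt".

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```

With single quotes the program receives the literal text echo $name > result.txt. With double quotes it receives echo local-value > result.txt, because the expansion has already happened in the submitting shell.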

Finally, if you desire more advanced resource specifications, you may look at the qsub command. This is the native GridEngine job submission command and is extremely flexible. The downside to this command is that it is also more complex to use and does not copy your shell environment by default.
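
If you do turn to qsub, a job script is the usual route. The following is a minimal sketch, not a recipe specific to our cluster: the script name, job name, and output file names are hypothetical, and the options shown (-cwd, -V, -S, -N, -o, -e) are standard GridEngine options whose exact behaviour you should confirm in the qsub man page on the cluster. The -V option addresses the environment issue mentioned above.

```shell
#!/bin/sh
# Hypothetical job script, e.g. saved as example_job.sh and submitted
# with: qsub example_job.sh
# Lines starting with "#$" are read by qsub as options; to the shell
# they are ordinary comments.

# Run the job from the directory it was submitted from.
#$ -cwd
# Export the submitting shell's environment variables to the job.
#$ -V
# Interpret this script with /bin/sh.
#$ -S /bin/sh
# Job name and output files (all hypothetical names).
#$ -N example_job
#$ -o example_job.out
#$ -e example_job.err

# The actual work goes here; this placeholder only reports the node.
node=$(uname -n)
echo "Job running on $node in $(pwd)"
```

Because the option lines are comments to the shell, the script can also be run directly on dna.uwaterloo.ca for a quick test before it is submitted.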

Installed software

We have a large selection of software and libraries installed on the cluster for your use. If you require something that is not currently installed, send your request to the system administrator, whose email address you can find below.

Other questions

If you have other questions or requests, please email me at akhudek at cs uwaterloo ca.