On Github fomightez / JanFifteenth_slides
Created by Wayne Decatur for Feng Lab Group meeting
Easy to arrive at two ways:
open source and open science
... Datapocalypse (yours and the field in general)
comments?
highly recommend Practical Computing for Biologists by Haddock and Dunn
I am a nomad
Maybe a better question at this point is “why code?”
Adds up to better data handling and greater productivity.
Don't forget regular expressions and basic shell commands can handle a lot of your text and file manipulation needs.
When you reach the limit of what you can easily do with those, then you need to code.
For example, I recently wanted something that produced mRNA sequences from a list of NCBI's mRNA entries in FASTA form, such as Schizosaccharomyces pombe 972h- ribonuclease MRP complex subunit, mRNA.
A simple `find and replace` won't work because it changes every `T`, including those in the header lines, and so may change useful information.
A regular expression might have worked if I had figured out how to do a backward (lookbehind) assertion?
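As a sketch of how a regular expression could handle this (not what was actually used for the talk), one option is a negative lookahead anchored at the start of each line instead of a lookbehind. The function name `dna_to_rna_regex` and the sample text are made up for illustration.

```python
import re

def dna_to_rna_regex(fasta_text):
    """Replace T/t with U/u on every line that does not start with '>'."""
    def convert(match):
        # match.group(0) is one whole non-header line
        return match.group(0).replace("T", "U").replace("t", "u")
    # (?m) lets ^ and $ match at every line; (?!>) rejects header lines.
    return re.sub(r"(?m)^(?!>).*$", convert, fasta_text)

# Made-up input: the header contains a T that must survive untouched.
example = ">Example header with a T in it\nATGCTTAG\nttaa\n"
print(dna_to_rna_regex(example))
```

The header line comes back unchanged, while the sequence lines have every `T` or `t` swapped.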
Look closer at the example input here
Go through all the input lines and decide what to do with each. We want to leave lines beginning with > untouched. In all the other lines, we want to replace 't' with 'u'.
Of course, there are lots of options for languages
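The line-by-line plan above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the `DNA_to_RNAsimple.py` script from the workshop: the function name `dna_to_rna_lines` and the sample records are hypothetical, and both upper- and lowercase letters are converted.

```python
def dna_to_rna_lines(lines):
    """Return a new list of lines with T/t changed to U/u, except in headers."""
    converted = []
    for line in lines:
        if line.startswith(">"):
            # Header line: keep it exactly as it is.
            converted.append(line)
        else:
            # Sequence line: swap thymine for uracil in either case.
            converted.append(line.replace("T", "U").replace("t", "u"))
    return converted

# Made-up FASTA lines for illustration only.
fasta_lines = [">hypothetical_entry mRNA, with T in the title", "ATGGCTTAA", "ttgaca"]
for out in dna_to_rna_lines(fasta_lines):
    print(out)
```

In a real script the list of lines would come from reading the FASTA file, with the converted lines written to a new file.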
Examples:
On the following slides, I am going to try to convince you that Python is a good language to learn...
but honestly, as long as you actively develop, the exact language does not matter that much. See here if you want to hear this from others.
Robust, versatile, fully mature language that goes beyond basic 'scripts', despite being called a 'scripting language' because it is interpreted and not compiled. Lots of room to grow and learn. You can even do object-oriented programming.
Widely available and portable
Lots of support and resources
Highly suited to text processing (see here, here and here for examples), which makes up a lot of genomics/NGS work
PYTHON continued
Popular among those using data of many, many kinds. Especially scientists!
Current 'popular kid' among bioinformaticians (and biologists in general) for a general language
You are a fan of British humor and yearn to be known as a Pythonista.
    import random

    numbers = range(1, 50)
    chosen = []
    while len(chosen) < 6:
        number = random.choice(numbers)
        numbers.remove(number)
        chosen.append(number)
    chosen.sort()
    print "This week's numbers are", chosen
    print "The bonus ball is", random.choice(numbers)
Code example borrowed from here.
I vote for 2.x.
Examples:
From Python for next-generation sequencing
The Python library links in this article are for Python 2.7 which is the version currently supported by most of the major scientific and numerical Python packages, as well as being the most commonly used release in the scientific Python community.
Another at Sourcelair
Not a big deal.
Biggest difference for most people is print command.
See key differences between Python 2.7.x and Python 3.x
For basic things it is minor.
Don't let it be a hurdle to getting started.
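To make the print difference concrete, here is a small sketch. The numbers are made up, and building a single string before printing is just one way to write a line that runs on both versions.

```python
chosen = [3, 11, 27]  # made-up numbers for illustration

# Python 2 only (a SyntaxError on Python 3):
#     print "This week's numbers are", chosen

# Runs on both 2.x and 3.x: build one string, then print it in parentheses.
message = "This week's numbers are %s" % chosen
print(message)
```

On Python 3, `print` is a function, so the parentheses are required. On Python 2 they are simply grouping around a single value, which is why this form works on both.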
see the docs
Reminder: the approach used here today was chosen just to avoid installation issues. See here for an example of how installation can hinder getting started.
For exploring and initial development, the cloud approach is fine. As you move towards real development of useful programs, though, I suggest you code in a text editor or IDE on your machine and then run in the cloud or on your machine, depending on how nomadic you are and the computing power/resources needed.
Why do I say to do the code development locally? The main reason is reliability. You can easily undo, and you are more likely to have active backup mechanisms available on your computer or through linking with Dropbox etc., just as you would when working on a Microsoft Word document or presentation. Later you'd be smart to integrate Git/GitHub into your code development workflow, but that serves a different purpose (organizing development and collaboration) than direct recovery of your most recent edits.
Sites go down for spells.
Companies fold.
The internet is not always accessible.
Bottom line: get started, then work out your coding and running style/workflow. It won't be the same for all things, obviously. Examples from a nomad: mostly PythonAnywhere; large jobs in the cloud, unless there is no money, in which case the lab desktop can be used (at the cost of convenience).
Today we'll use Sourcelair and Sagemath Cloud
Wading in...
We'll break it up with some real world examples
Starting with this example, can you write a program to convert DNA FASTA to mRNA?
example, DNA_to_RNAsimple.py
Start small
Test often
Document
Develop modularly as much as you can.
...continued
Leave time for developing. Consider the time to write the code versus the time to do it by hand, and factor in the high degree of accuracy you gain when you code.
...continued
Advice from here
Some basic principles of this kind of computing:
Try to make your code easy to read. Rather than commenting everything to death, make your function and variable names descriptive.
Test out your functions on small data sets where you know the answer.
The third time you find yourself performing the same query or merge, write a function to do it instead. Corollary: don't overplan in advance! Only do this the third time!
It can take effort and some self-education to get to the point where you can make a meaningful query, with the right terminology, that finds anything related.
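As a small illustration of testing a function on data where the answer is known, here is a sketch. The function `count_fasta_records` and the tiny input are hypothetical, chosen only because the correct result is obvious by eye.

```python
def count_fasta_records(lines):
    """Count sequence records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))

# Tiny hand-made input where the answer is known in advance: two records.
small_input = [">seq1", "ATGC", "GGTA", ">seq2", "TTAA"]
assert count_fasta_records(small_input) == 2

# An empty input should give zero, not an error.
assert count_fasta_records([]) == 0
print("all checks passed")
```

If an `assert` fails, Python stops with an `AssertionError`, so a quiet run ending in the final message means the function behaved as expected on the small cases.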
When you know enough to know what is covered elsewhere and what isn't, then ask uncovered questions at those sites.
Practical Computing for Biologists by Haddock and Dunn - overall great resource for all aspects of computing
Install Enthought's Canopy or the Anaconda Python distribution on your system. They have versions that are free for academic use. Supplement or expand on that by using PythonAnywhere and/or Domino Data Lab.
Run other people's Python code to make you more productive.
Just use Python and code! Try things. Run others' code. Looking at others' code (GitHub and blogs) and using resources on the internet (in particular Biostars and Stack Overflow) can get you far.
...continued
I suggest that, as you work, you consider everything you do and see where repetition is dragging you down or where the ability to be more exhaustive would be a benefit.
Another workshop?
Try a course eventually?
Watch some time soon: