Introduction to Coding in Python – January 15th, 2015





JanFifteenth_slides

slides for January 15th Introduction to Python

On Github fomightez / JanFifteenth_slides

Introduction to Coding in Python

January 15th, 2015

Created by Wayne Decatur for Feng Lab Group meeting

Workshop Infrastructure

the hub is here

Easy to reach in two ways:

  • write down http://bit.ly/FengPyCode
  • search for "Wayne Decatur proteopedia", follow the GitHub link, and look for the repo for the Feng Lab workshop

Today is not really about Python

  • working productively
  • automation
  • reusable code
  • modular design
  • develop smart, test often
  • programming concepts and algorithmic thinking
  • troubleshooting
  • collaboration
  • reproducibility --> reproducible science
  • open source and open science

    ... Datapocalypse (yours and the field in general)

Prep reading?

comments?

highly recommend Practical Computing for Biologists by Haddock and Dunn

I am a nomad

Why code in Python?

Maybe a better question at this point is “why code?”

Why code?

  • quickly perform repetitive tasks accurately
  • automate pipelines and workflows
  • implement algorithms, which leads to algorithmic thinking that empowers you in problem-solving
  • help create text and visual assets

Adds up to better data handling and greater productivity.

When to code

Don't forget that regular expressions and basic shell commands can handle a lot of your text and file manipulation needs.

When you reach the limit of what you can easily do with those, then you need to code.

For example, I recently wanted something that produced mRNA sequences from a list of NCBI's mRNA entries in FASTA form, such as Schizosaccharomyces pombe 972h- ribonuclease MRP complex subunit, mRNA. (NCBI shows these sequences with T in place of U.)

How to approach that?

A simple `find and replace` won't work because it changes every `T`, including those in the description lines, and so may alter useful information.

A regular expression might have worked if I had figured out how to do a lookbehind assertion.
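
One way a regular expression can handle this, sketched under the assumption that the input is a single string with uppercase sequence lines: instead of a lookbehind, a negative lookahead anchored at the start of each line selects only the lines that do not begin with `>`, and the replacement happens inside those matches. The function name and the sample record are made up for illustration.

```python
import re

# Matches a whole line only when it does not start with ">".
# re.MULTILINE makes ^ and $ anchor at every line, not just the whole string.
SEQUENCE_LINE = re.compile(r"^(?!>).*$", re.MULTILINE)


def dna_to_rna_regex(fasta_text):
    """Replace T with U on sequence lines, leaving ">" description lines alone."""
    return SEQUENCE_LINE.sub(
        lambda match: match.group(0).replace("T", "U"), fasta_text
    )


# Hypothetical two-line record, not taken from NCBI.
sample = ">Test entry, mRNA\nATGTTTACG"
converted = dna_to_rna_regex(sample)
print(converted)
```

The `T` in the description line survives because that line never matches the pattern.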

So we are getting closer...

Look closer at the example input here

Pseudocode for the problem

Go through all the input lines and decide what to do with each.

Want to leave lines beginning with > unchanged.

For all the other lines, we want to replace 't' with 'u'.
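
The pseudocode above could be sketched in Python roughly as follows. This is a minimal illustration, not the workshop's DNA_to_RNAsimple.py; the function name and the sample lines are assumptions, and both lowercase and uppercase letters are handled because the case used in the input is not stated.

```python
def dna_to_rna_lines(lines):
    """Return a new list of FASTA lines with t/T swapped for u/U in the sequence."""
    converted = []
    for line in lines:
        if line.startswith(">"):
            # Description line: keep it exactly as it is.
            converted.append(line)
        else:
            # Sequence line: swap both lowercase and uppercase thymine.
            converted.append(line.replace("t", "u").replace("T", "U"))
    return converted


# Hypothetical input for illustration only.
example_lines = [">Test entry, mRNA", "ATGTTTACG", "ttgaca"]
result = dna_to_rna_lines(example_lines)
for line in result:
    print(line)
```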

Why code in Python?

Of course there are lots of options for languages

Examples:

  • Perl
  • Java
  • R
  • Antha
  • many others

Why code in Python?

On the following slides, I am going to try to convince you that Python is a good language to learn...

but honestly, as long as you actively develop, the exact language does not matter that much. See here if you want to hear this from others.

Python

  • Robust, versatile, fully mature language that goes beyond basic 'scripts', despite being called a 'scripting language' because it is interpreted rather than compiled. Lots of room to grow and learn. Can even do object-oriented programming.

  • Very high-level language

  • Widely available and portable

  • Lots of support and resources

  • Highly suited to text processing, which makes up a lot of genomics/NGS work (see here, here and here for examples)

PYTHON continued

  • Popular among those using data of many, many kinds. Especially scientists!

  • Current 'popular kid' among bioinformaticians (and biologists in general) for a general language

  • You are a fan of British humor and yearn to be known as a Pythonista.

Python is a Very High Level Language

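from __future__ import print_function  # lets the print() calls below run on 2.7 and 3.x

import random

# The 49 lottery balls, as a list so that drawn balls can be removed.
numbers = list(range(1, 50))
chosen = []

# Draw six distinct balls.
while len(chosen) < 6:
    number = random.choice(numbers)
    numbers.remove(number)
    chosen.append(number)

chosen.sort()

print("This week's numbers are", chosen)
# The bonus ball comes from the balls that are still left.
print("The bonus ball is", random.choice(numbers))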

Code example borrowed from here.

Python versatility

  • Django - Instagram, Mozilla site, Pinterest, Schmitt Lab site
  • PyMOL for structural biology

Python 2.x or 3.x?

I vote for 2.x.

  • For many scientific modules and packages, it is typical to find 2.x supported and 3.x support coming along. (See here for an overview, albeit outdated. The situation is vastly improved now.)

Python 2.x or 3.x?

Examples:

From Python for next-generation sequencing

The Python library links in this article are for Python 2.7 which is the version currently supported by most of the major scientific and numerical Python packages, as well as being the most commonly used release in the scientific Python community.

Another at Sourcelair

Python 2.x or 3.x?

Not a big deal.

The biggest difference for most people is the print command.

See key differences between Python 2.7.x and Python 3.x

For basic things it is minor.
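
As a small illustration of how minor the difference is for basic things: the Python 2 statement form `print "text"` is a syntax error in Python 3, but calling `print()` with a single, already-formatted string behaves the same in 2.7 and 3.x. A minimal sketch (the function name is made up for illustration):

```python
import sys


def version_message():
    """Build the whole message first so print() receives one string."""
    major, minor = sys.version_info[0], sys.version_info[1]
    return "Running Python {0}.{1}".format(major, minor)


message = version_message()
# One parenthesized string prints identically under Python 2.7 and 3.x.
print(message)
```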

Don't let it be a hurdle to getting started.

Where to code and run Python?

see the docs

Sidebar: Running Python for today

Reminder: the approach used here today is just to avoid installation issues. See here for an example of how installation can hinder getting started.

For exploring and initial development, the cloud approach is fine. As you move towards real development of useful programs, though, I suggest you code in a text editor or IDE on your machine and then run in the cloud or on your machine, depending on how nomadic you are and the computing power/resources needed.

Why do I say to do the code development locally? The main reason is reliability. You can easily undo and are more likely to have active backup mechanisms available on your computer or through linking with Dropbox etc., just as you would when working on a Microsoft Word document or presentation. Later you'd be smart to integrate git/GitHub into your code development workflow, but that serves a different purpose (organizing development and collaboration) from direct recovery of your most recent edits.

  • sites go down for spells
  • companies fold
  • internet not always accessible

Bottom line: get started, then work out your coding and running style/workflow. It won't be the same for all things, obviously. Example from a nomad: mostly PythonAnywhere; large jobs in the cloud, unless there is no money, in which case a lab desktop can be used (at the cost of convenience).

Getting started

Today we'll use Sourcelair and Sagemath Cloud

SourceLair

Guide to Python on Sourcelair

Sagemath Cloud

Using Sourcelair to run Python Interactively

Wading in...

We'll break it up with some real world examples

Break I: Running Others' Python Code

Running Python in an IPython Notebook

Modules/Packages/libraries

APIs can now be your friend

Break II: Use some APIs
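
The slides do not say which APIs were used during this break. As one hedged illustration of what using an API from Python can look like, the sketch below builds a request URL for NCBI's E-utilities EFetch service, which returns a nucleotide record as FASTA text. The function name is made up, the accession is just an example chosen for illustration, and `urllib.parse` is the Python 3 location of `urlencode` (in Python 2.7 it lives in `urllib`).

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession):
    """Return the EFetch URL that asks for one nucleotide record as FASTA text."""
    params = [
        ("db", "nuccore"),      # nucleotide database
        ("id", accession),      # record to fetch
        ("rettype", "fasta"),   # FASTA format
        ("retmode", "text"),    # plain text rather than XML
    ]
    return EFETCH_URL + "?" + urlencode(params)


# Example accession chosen for illustration; not from the slides.
url = build_fasta_url("NM_000546")
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would download the record; that step is left out here so the sketch runs without a network connection.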

Putting it all together

Starting with this example, can you write a program to convert DNA FASTA to mRNA?

example, DNA_to_RNAsimple.py

Tips for getting started

  • Start small

  • Test often

  • Document

  • Develop modularly as much as you can.

Tips for getting started

...continued

  • Leave time for developing. Consider the time to write the code vs. the time to do it by hand. Factor in the high degree of accuracy you gain when you code, compared with doing it by hand.

Tips for getting started

...continued

Advice from here

Some basic principles of this kind of computing:

  • Try to make your code easy to read. Rather than commenting everything to death, make your function and variable names descriptive.
  • Test out your functions on small data sets where you know the answer.
  • The third time you find yourself performing the same query or merge, write a function to do it instead. Corollary: don't overplan in advance! Only do this the third time!
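
To make the "small data sets where you know the answer" advice concrete, here is a minimal sketch. The function and its name are invented for illustration and are not from the quoted source; the point is only that the inputs are short enough to check by eye.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA or RNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / float(len(sequence))


# Tiny inputs where the answer is obvious by eye.
assert gc_fraction("GGCC") == 1.0
assert gc_fraction("ATAT") == 0.0
assert gc_fraction("atgc") == 0.5
print("all small checks passed")
```

If one of the `assert` lines fails, Python stops with an `AssertionError` at that line, which points straight at the case that broke.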

Places for questions

  • Try Google, probably will lead you to one of my listed resources or...
  • Stackoverflow for general scripting and computing
  • Biostars

It can take effort and some self-education to get to the point where you can make a meaningful query with the right terminology to find anything related.

When you know enough to know what is covered elsewhere and what isn't, then ask uncovered questions at those sites.

Suggested immediate next steps

Practical Computing for Biologists by Haddock and Dunn - overall great resource for all aspects of computing

Install Enthought's Canopy or the Anaconda Python distribution on your system. They have free academic use versions. Supplement or expand on that with PythonAnywhere and/or Domino Data Lab.

Run other people's Python code to make you more productive.

Just use Python and code! Try things. Run others' code. Looking at others' code (GitHub and blogs) and using resources on the internet (in particular Biostars and Stackoverflow) can get you far.

Suggested immediate next steps

...continued

As you work, I suggest considering everything you do and seeing where repetition is dragging you down or where the ability to be more exhaustive would be a benefit.

Another workshop?

Try a course eventually?

Watch some time soon:

A History of Bioinformatics (in the Year 2039)

THE END
