Unix - Part 3 – How to work (real work) – cURL








On Github yumyai / 2015_experimental_bio

Unix - Part 3

How to work (real work)

Preecha Patumcharoenpol

Command refresher

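echo "Hello"                  # show a message
echo "Hello " > greeting.txt  # or put it into a file instead (> overwrites the file)
echo "Robert" > name.txt
cat greeting.txt name.txt     # concatenate: print both files, one after the other
cat greeting.txt name.txt > hello.txt  # concatenate, then save the result to hello.txt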

Shell programming

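#!/bin/sh
clear                        # clear the screen
echo "Hello $USER"           # $USER holds your login name
printf "Today is "; date     # printf adds no newline, so the date follows on the same line
printf "Number of users logged in: "; who | wc -l
echo "Calendar"
cal                          # this month's calendar
exit 0                       # tell the caller the script succeeded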
					
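A minimal sketch of turning commands like these into a reusable script, assuming a hypothetical file name hello.sh. It also shows $1, the first command-line argument, which is how a script such as 02_kegg.sh can receive the pathway ID typed after its name.

```shell
# Write a tiny script to a file (hello.sh is a hypothetical name).
# The quoted 'EOF' keeps $1 and $USER from being expanded while writing.
cat > hello.sh <<'EOF'
#!/bin/sh
# Greet the first argument, or the current user if no argument is given.
echo "Hello ${1:-$USER}"
exit 0
EOF

chmod +x hello.sh    # mark the file as executable
./hello.sh Robert    # run it directly (the #! line picks the shell)
sh hello.sh          # or hand it to the shell explicitly
```

Running it with sh needs no chmod; the later scripts in this lecture are started that way (sh 02_kegg.sh ...).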

Shell programming

Working in a shell

cat, grep and stuff

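cat 01.fa                  # print a FASTA file
cat 01.fa 02.fa 03.fa      # print several files, one after the other
grep ">" 01.fa             # print only lines containing ">" (the FASTA headers)
wc -l 01.fa                # count the lines in the file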
              

Use them together

# same as grep ">" 01.fa
cat 01.fa | grep ">"
# same as wc -l 01.fa
cat 01.fa | wc -l
              

Use them together

grep ">" 01.fa > temp      # save the header lines in a temporary file
wc -l temp                 # count them: one header per sequence
              

More concise

cat 01.fa
cat 01.fa | grep ">"
cat 01.fa | grep ">" | wc -l  # number of sequences, no temporary file needed
              

Magic

cat 01.fa 02.fa 03.fa | grep ">" | wc -l   # total number of sequences in all three files
              
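The same count as a small reusable function. A sketch with hypothetical helper names (count_seqs, count_per_file); it anchors the pattern as ^> so only lines that start with > are counted, which is slightly stricter than grep ">". For comparison, grep -c counts matching lines by itself, but given several files it prints one file:count line per file rather than a grand total, which is where the cat | grep | wc -l pipe earns its keep.

```shell
# count_seqs FILE...: total number of FASTA records in all given files.
# (count_seqs is a hypothetical helper name for this sketch.)
count_seqs() {
    cat "$@" | grep "^>" | wc -l | tr -d ' '   # tr strips the padding some wc versions print
}

# count_per_file FILE...: with several files, one "file:count" line per file.
# (count_per_file is also a hypothetical name; grep -c does the counting.)
count_per_file() {
    grep -c "^>" "$@"
}
```

For example, count_seqs 01.fa 02.fa 03.fa prints the same number as the pipeline above.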

Where do you get your data?

Most likely from the internet, from public databases such as:

  • NCBI
  • KEGG
  • Uniprot

API - Application Programming Interface

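A web API lets a program request data from a server directly with a URL, without clicking through the website.
KEGG's REST API is documented at http://www.kegg.jp/kegg/rest/keggapi.html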

cURL

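# -s: silent mode (no progress bar). -X GET is curl's default method, so it is optional here.
curl -s -X GET "http://www.google.com"
curl -s -X GET "http://rest.kegg.jp/list/pathway"   # list all KEGG reference pathways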
            
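Inside scripts it pays to notice when a download fails. A sketch of a small wrapper, fetch (a hypothetical name; the flags are standard curl options): -S still prints error messages even with -s, --fail makes curl exit with a non-zero status on HTTP errors such as 404 instead of passing the error page down the pipe, and -o saves the response to a file.

```shell
# fetch URL [OUTFILE]: download URL to standard output, or into OUTFILE if given.
# "fetch" is a hypothetical helper name; all flags are standard curl options.
fetch() {
    if [ -n "$2" ]; then
        curl -sS --fail -o "$2" "$1"   # -o: write the response body to a file
    else
        curl -sS --fail "$1"           # -s: no progress bar, -S: but still show errors
    fi
}
```

For example, fetch http://rest.kegg.jp/list/pathway/cre cre_pathways.txt saves the pathway list for C. reinhardtii (cre) into cre_pathways.txt (an example file name), and the function's exit status tells a script whether it worked.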

Bonus: BLAST

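# 1. Submit a BLASTP search (use your own email); the reply is a job ID. The sequence is URL-encoded FASTA: %3E is ">", %20 a space, %0A a newline
curl -s -X POST "http://www.ebi.ac.uk/Tools/services/rest/ncbiblast/run" --data "email=yumyai%40gmail.com&stype=protein&program=blastp&database=uniprotkb&sequence=%3Eseq1%20Some%20sequence%0AMAGAKEIRSKIASVQNTQKITKAMEMVAASKMRKSQDRMAASRPYAETMRKVIG"
# 2. Check the job status (e.g. RUNNING or FINISHED)
curl -s -X GET "http://www.ebi.ac.uk/Tools/services/rest/ncbiblast/status/----YOUR ID----"
# 3. Download the result; format=6 asks for BLAST's tabular output
curl -s -X GET "http://www.ebi.ac.uk/Tools/services/rest/ncbiblast/result/----YOUR ID-----/out?format=6"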
            
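Steps 2 and 3 can be automated: the run request prints only the job ID, so it can be captured in a variable and the status checked in a loop until the job is done. A sketch under stated assumptions: wait_for_job and its optional base-URL argument are hypothetical, and it assumes the service reports QUEUED or RUNNING while a job is pending (the two values the loop waits on); anything else, such as FINISHED or ERROR, ends the loop.

```shell
EBI_BLAST="http://www.ebi.ac.uk/Tools/services/rest/ncbiblast"

# wait_for_job JOBID [BASE_URL]: poll the job status until it is no longer
# QUEUED or RUNNING, then print the final status (e.g. FINISHED or ERROR).
# Hypothetical helper; the optional BASE_URL lets it be pointed elsewhere.
wait_for_job() {
    base="${2:-$EBI_BLAST}"
    while :; do
        job_status=$(curl -sL "$base/status/$1")   # -L follows redirects (e.g. to https)
        case "$job_status" in
            QUEUED|RUNNING) sleep 5 ;;              # still working: wait before asking again
            *) echo "$job_status"; return 0 ;;
        esac
    done
}

# Usage (network required):
#   JOBID=$(curl -s -X POST "$EBI_BLAST/run" --data "email=...")
#   wait_for_job "$JOBID" && curl -s "$EBI_BLAST/result/$JOBID/out?format=6"
```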

It is all about composability

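# Open http://rest.kegg.jp/list/pathway/cre in a browser (or curl it) to see the pathway IDs for C. reinhardtii (cre)

PATHWAY="path:cre03440"
curl -s "http://rest.kegg.jp/get/$PATHWAY" |   # download the pathway entry as text
gawk '/^GENE/ {seen = 1 } seen {print}' |      # keep everything from the GENE line onward
sed '1s/^GENE//g' |                            # remove the "GENE" label from the first line
sed -n '/^[^ ]/q;p' |                          # stop at the next section (a line not starting with a space)
sed 's/^[ \t]*//' |                            # trim leading whitespace
gawk '{print $1}'                              # print the first column: the gene ID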
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY |
gawk '/^GENE/ {seen = 1 } seen {print}'
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY |
gawk '/^GENE/ {seen = 1 } seen {print}' |
sed '1s/^GENE//g'
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY |
gawk '/^GENE/ {seen = 1 } seen {print}' |
sed '1s/^GENE//g' |
sed -n '/^[^ ]/q;p'
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY |
gawk '/^GENE/ {seen = 1 } seen {print}' |
sed '1s/^GENE//g' |
sed -n '/^[^ ]/q;p' |
sed 's/^[ \t]*//'
            
PATHWAY="path:cre03440"
curl -s http://rest.kegg.jp/get/$PATHWAY |
gawk '/^GENE/ {seen = 1 } seen {print}' |
sed '1s/^GENE//g' |
sed -n '/^[^ ]/q;p' |
sed 's/^[ \t]*//' |
gawk '{print $1}'
exit 0
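While building a pipeline step by step, it is faster to work on a saved copy of the entry than to download it on every attempt. A sketch that wraps the same filter in a function reading standard input; gene_ids is a hypothetical name, plain awk stands in for gawk (the commands used here are standard awk), and [[:space:]] replaces [ \t], which only some sed versions (such as GNU sed) read as space-or-tab.

```shell
# gene_ids: read a KEGG pathway entry on standard input and print the gene IDs
# listed in its GENE section, one per line. (Hypothetical helper name.)
gene_ids() {
    awk '/^GENE/ {seen = 1} seen {print}' |   # keep everything from the GENE line onward
    sed '1s/^GENE//' |                        # remove the "GENE" label from that first line
    sed -n '/^[^ ]/q;p' |                     # stop at the next section header
    sed 's/^[[:space:]]*//' |                 # trim leading whitespace
    awk '{print $1}'                          # the first column is the gene ID
}

# Usage:
#   curl -s "http://rest.kegg.jp/get/path:cre03440" > entry.txt
#   gene_ids < entry.txt
```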
            

Get FASTA file

GENE="cre:CHLREDRAFT_195401"
curl -s "http://rest.kegg.jp/get/$GENE/aaseq"   # amino-acid sequence in FASTA format
          

Put them into scripts

See 02_kegg.sh and 03_getaa.sh
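A script is essentially the commands above saved in a file, with the hard-coded ID replaced by the first argument. A hypothetical sketch in the spirit of 03_getaa.sh (the actual script may differ), written as a function so it can be tried interactively; KEGG_BASE is an added assumption that lets the URL be pointed at a local copy.

```shell
# Hypothetical sketch in the spirit of 03_getaa.sh; the real script may differ.
# KEGG_BASE is an assumption added so the URL can be redirected (e.g. to file://).
KEGG_BASE="${KEGG_BASE:-http://rest.kegg.jp}"

# get_aa ORG:GENE_ID: print the amino-acid sequence of one gene in FASTA format
get_aa() {
    if [ -z "$1" ]; then
        echo "usage: get_aa ORG:GENE_ID" >&2   # complain on stderr, not stdout
        return 1
    fi
    curl -s "$KEGG_BASE/get/$1/aaseq"
}
```

In a script file the body would read the command-line argument $1 directly and use exit instead of return; get_aa cre:CHLREDRAFT_195401 fetches the same record as the curl command under Get FASTA file.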

Let's go

# genes of a pathway -> add the organism prefix -> fetch each amino-acid sequence
sh 02_kegg.sh cre00860 | sed 's/^/cre:/' | xargs -I{} sh 03_getaa.sh {}
sh 02_kegg.sh hsa04740 | sed 's/^/hsa:/' | xargs -I{} sh 03_getaa.sh {}
          
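How the second and third stages cooperate: sed puts the organism code in front of every ID, and xargs -I{} reads its input line by line, running the given command once per line with {} replaced by that line. A sketch as a hypothetical helper, run_per_gene, so any command can take the place of sh 03_getaa.sh:

```shell
# run_per_gene ORG CMD [ARGS...]: for every gene ID on standard input,
# run "CMD ARGS... ORG:ID" once. (Hypothetical helper name.)
run_per_gene() {
    org="$1"
    shift                                      # the remaining arguments form the command
    sed "s/^/$org:/" | xargs -I{} "$@" {}      # prefix each line, then one command per line
}

# Usage, equivalent to the first pipeline above:
#   sh 02_kegg.sh cre00860 | run_per_gene cre sh 03_getaa.sh
# Offline check, with echo standing in for the real script:
#   printf 'ID1\nID2\n' | run_per_gene cre echo fetch
#   -> fetch cre:ID1
#   -> fetch cre:ID2
```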

Don't understand?

The point of this lecture is to understand how programs work together.

What to do if something is wrong

Test them: run the pipeline one stage at a time and check the output before adding the next stage.

Do we really need to go through all of that?

The answer is maybe.

THE END

Quiz

  • Create 04_getnt.sh, which gets the nucleotide sequence instead of the amino-acid sequence and works seamlessly with the old pipeline:
    sh 02_kegg.sh hsa04740 | sed 's/^/hsa:/' | xargs -I{} sh 04_getnt.sh {}
                  
  • Hint: we used the URL "http://rest.kegg.jp/get/$GENE/aaseq" to get the amino-acid sequence.