On GitHub: thoppe / Presentation_GWU_CSintro
Postdoctoral Fellow, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD
GSIGAASMEF CFDVFKELKV HHANENIFYC PIAIMSALAM VYLGAKDSTR TQINKVVRFD KLPGFGDEIE AQCGTSVNVH SSLRDILNQI TKPNDVYSFS LASRLYAEER YPILPEYLQC VKELYRGGLE PINFQTAADQ ARELINSWVE SQTNGIIRNV LQPSSVDSQT AMVLVNAIVF KGLWEKAFKD EDTQAMPFRV TEQESKPVQM MYQIGLFRVA SMASEKMKIL ELPFASGTMS MLVLLPDEVS GLEQLESIIN FEKLTEWTSS NVMEERKIKV YLPRMKMEEK YNLTSVLMAM GITDVFSSSA NLSGISSAES LKISQAVHAA HAEINEAGRE VVGGAEAGVD AASVSEEFRA DHPFLFCIKH IATNAVLFFG RCVSP
Secondary structure: helices [red], sheets [blue]
Tertiary structure: 3D structure
Higher-order structure: complexes, aggregation
How do we model many protein-protein interactions?
Can we predict aggregates from experimental structure?
Human serum albumin (PDB: 1AO6)
Ovalbumin (PDB: 1OVA)
Lysozyme (PDB: 1W6Z)
Bovine serum albumin (PDB: 3V03)
Be able to say what is possible, and what isn't!
Algorithmic design, e.g. linear algebra, molecular dynamics...
Hardware design: specialized hardware, e.g. Anton, GRAPE.
Predicting run-time (non-trivial at model stage!).
Scaling up!
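One reason run time is hard to predict when scaling up: if the cost of a model is dominated by pairwise interactions (an assumption made here for illustration, not a statement about the specific protein models in this talk), the work grows quadratically with the number of objects. A minimal sketch, with hypothetical function names:

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct unordered pairs among n objects: n*(n-1)/2."""
    return n * (n - 1) // 2


def all_pairs(items):
    """Explicitly enumerate every unordered pair (only feasible for small inputs)."""
    return list(combinations(items, 2))


if __name__ == "__main__":
    # Ten times more objects means roughly a hundred times more pairs.
    for n in (10, 100, 1000, 10000):
        print(n, pair_count(n))
```

Doubling the system size roughly quadruples the number of pairs, which is why the choice of algorithm and hardware matters before any code is written.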
Meet The Man Who Gamed Reddit With A Bot
Download Wikipedia
Download all posts with score>1000 for 2013 and 2014 (~5000)
Cross-reference each post to the correct Wikipedia paragraph
Build True positives (known TIL's)
Build Decoys (other paragraphs in TIL's)
Build unknown samples (rest of Wikipedia*)
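The labeling steps above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the talk: the data layout (`articles`, `posts`), the field names, and the reading of "decoys" as the other paragraphs of an article that a TIL post points to are all hypothetical.

```python
def label_paragraphs(articles, posts, min_score=1000):
    """Label every Wikipedia paragraph as true_positive, decoy, or unknown.

    articles: {article_title: [paragraph_text, ...]}           (hypothetical layout)
    posts:    [{"score": int, "title": str, "paragraph": int}]  (hypothetical layout)
    """
    # Keep only high-scoring posts, already cross-referenced to a paragraph.
    matches = {(p["title"], p["paragraph"]) for p in posts if p["score"] > min_score}
    til_titles = {title for title, _ in matches}

    labels = {}
    for title, paragraphs in articles.items():
        for index in range(len(paragraphs)):
            if (title, index) in matches:
                labels[(title, index)] = "true_positive"  # a known TIL
            elif title in til_titles:
                labels[(title, index)] = "decoy"          # same article, not the TIL
            else:
                labels[(title, index)] = "unknown"        # rest of Wikipedia
    return labels


if __name__ == "__main__":
    articles = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"]}
    posts = [
        {"score": 4726, "title": "A", "paragraph": 1},
        {"score": 50, "title": "B", "paragraph": 0},  # below the score cutoff
    ]
    print(label_paragraphs(articles, posts))
```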
>> "Good muffins cost $3.88\n in New York"
['Good', 'muffins', 'cost', 'TOKEN_MONEY', 'in', 'New', 'York', 'TOKEN_EOS']
>> "I sat on the rock"
['I', 'sat', 'on', 'rock']
>> stem("factionally")
'faction'
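The first two preprocessing steps shown above (tokenizing with placeholder tokens, then dropping stop words) can be sketched with the standard library alone. The regular expressions and the three-word stop list are assumptions chosen to reproduce the two examples, not the rules used in the talk; stemming is left out because it depends on which stemmer is chosen.

```python
import re

# Assumed patterns: a dollar amount, and either the placeholder or a plain word.
MONEY = re.compile(r"\$\d+(?:\.\d+)?")
WORD = re.compile(r"TOKEN_MONEY|[A-Za-z']+")

# Tiny illustrative stop list; real lists are much longer.
STOP_WORDS = frozenset({"the", "a", "an"})


def tokenize(text):
    """Split text into word tokens, replace money with TOKEN_MONEY, append TOKEN_EOS."""
    text = MONEY.sub(" TOKEN_MONEY ", text)
    return WORD.findall(text) + ["TOKEN_EOS"]


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(tokenize("Good muffins cost $3.88\n in New York"))
    print(remove_stop_words("I sat on the rock".split()))
```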
>>> model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
>>> model['computer']  # raw numpy vector of a word
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
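To show the idea behind these queries, here is a simplified sketch of cosine similarity and the "king - man + woman" analogy on two-dimensional toy vectors. The vectors are invented for illustration, and the function names are hypothetical; a trained model uses hundreds of dimensions and its library may normalize vectors differently.

```python
import math

# Invented toy vectors: first axis ~ "royalty", second axis ~ "gender".
VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "prince": [0.8, 1.0],
    "man": [0.1, 1.0],
    "woman": [0.1, -1.0],
}


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative, vectors=VECTORS):
    """Return the word closest (by cosine) to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    # Exclude the query words themselves from the candidates.
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    print(analogy(positive=["woman", "king"], negative=["man"]))
```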
Uses far fewer features to store relationships between words!
Training classifier
Test Accuracy: 0.878; Test Accuracy on TP: 0.116; Test Accuracy on TN: 0.998
Figure: Receiver Operating Characteristic
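The gap between the overall accuracy and the accuracy on true positives is what class imbalance looks like. A minimal sketch of how such per-class numbers can be computed, using invented toy labels; it assumes "accuracy on TP/TN" means accuracy restricted to the samples whose true label is positive/negative, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred, only_label=None):
    """Fraction of correct predictions, optionally restricted to one true label."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred)
             if only_label is None or t == only_label]
    if not pairs:
        raise ValueError("no samples with the requested label")
    return sum(t == p for t, p in pairs) / len(pairs)


def implied_positive_fraction(overall, acc_pos, acc_neg):
    """Solve overall = f*acc_pos + (1 - f)*acc_neg for the positive fraction f."""
    return (acc_neg - overall) / (acc_neg - acc_pos)


if __name__ == "__main__":
    # Toy data: 2 positives, 8 negatives.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(accuracy(y_true, y_pred))                 # overall
    print(accuracy(y_true, y_pred, only_label=1))   # on positives
    print(accuracy(y_true, y_pred, only_label=0))   # on negatives
    # Under the weighted-average assumption, the reported numbers imply
    # that only a small share of the test samples were positives.
    print(implied_positive_fraction(0.878, 0.116, 0.998))
```

A classifier that almost always answers "negative" can still score a high overall accuracy on such data, which is why the ROC curve is the more informative summary.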
TIL The Founder Of Japans Mcdonalds Stated | 4726
TIL Mike Kurtz An American Burglar Found Out That | 4123
TIL A Woman That Reported 100 Incidents Of | 2899
TIL During The Sentencing Of His War Crimes Trial | 1551
TIL That Art Spiegelman The Creator Of Maus A | 1144
TIL That Once Officially Labeled As Retarded | 640
TIL Before World War Ii It Was Very Rare For | 498
TIL That A Study Showed Those With A Distressed | 142
TIL Frankie Fraser A Notorious English Gangster | 135
TIL Rafael Quintero A Mexican Drug Trafficker | 68
...
Natural language parsing, NLP.
Supervised and unsupervised learning.
Knowing the right algorithm and its limitations...
Validation and statistics.
... computer science is more than just code ...
For class participation credit, fill out this questionnaire: