On GitHub: arokem/2015-11-16-viscog
Ariel Rokem, University of Washington eScience Institute
Follow along at http://arokem.github.io/2015-11-16-viscog
1. Empirical (experimental)
2. Theoretical (mathematical)
3. Simulation (computational)
4. Data-intensive (eScience)
Programming and software engineering
Data management
Statistics and machine learning
Data visualization and communication
A focus on reproducibility and openness
Facilitate data-intensive research in different fields (inter- and cross-disciplinary)
Focus on methodology
Focus on reproducibility
Contribute to openly available tools, rather than/in addition to peer-reviewed publications
"Career paths for data scientists that recognize and reward contributions in methodology, computation, or development of tools are important."
Focused, intensive, collaborative projects
Data scientists + domain scientists
Results that wouldn't be possible otherwise
Inspired by DSSG program at U Chicago, GA Tech
10-week internship program
16 DSSG fellows/students
6 high-school students from ALVA program
4 projects (+project leads!)
+ Data scientist mentors
Brain connections change with development
Individual differences account for differences in behaviour
Adapt with learning
This has clinical significance
Started in 2009 by Eleftherios Garyfallidis
Contributors from at least six different countries and many different labs
The lingua franca of reproducible computational science
Open source
Easy to learn
Phenomenal ecosystem of open-source tools
model = ReconstModel(gtab, ...)
fit = model.fit(data, ...) # => ReconstFit
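The model/fit pattern above can be sketched with a toy example. `ConstantModel` and `ConstantFit` are hypothetical names invented for illustration and are not part of Dipy; the only assumption is that a model object holds the acquisition scheme and that fitting returns a separate object holding the estimated parameters.

```python
class ConstantModel:
    """Hypothetical toy model: knows the acquisition scheme, not the data."""

    def __init__(self, gtab):
        self.gtab = gtab

    def fit(self, data):
        # Fitting returns a separate Fit object that stores the parameters
        mean = sum(data) / len(data)
        return ConstantFit(self, mean)


class ConstantFit:
    """Hypothetical toy fit: holds the estimated parameter and can predict."""

    def __init__(self, model, mean):
        self.model = model
        self.mean = mean

    def predict(self, gtab=None):
        # Predict one value per measurement direction
        gtab = self.model.gtab if gtab is None else gtab
        return [self.mean for _ in gtab]


gtab = ["dir0", "dir1", "dir2", "dir3"]  # stand-in for a gradient table
model = ConstantModel(gtab)
fit = model.fit([1.0, 2.0, 3.0, 6.0])
prediction = fit.predict()
```

Keeping the model and the fit separate means the same model object can be fit to many data sets, and each fit can predict for a different set of directions.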
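import numpy as np
import dipy.reconst.dti as dti

model = dti.TensorModel(gtab)
fit = model.fit(data1)
prediction = fit.predict(gtab)
# Error of the prediction against a second measurement
RMSE = np.sqrt(np.mean((prediction - data2) ** 2, -1))
# Normalize by the error between the two measurements
rRMSE = RMSE / np.sqrt(np.mean((data1 - data2) ** 2, -1))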
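To get a feel for the scale of rRMSE, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the noise-free `signal`, the Gaussian noise level, and the use of the true signal as a stand-in for a perfect model prediction. Under those assumptions the ratio should come out near 1/√2, because a perfect prediction carries one noise term while the difference between two measurements carries two.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_directions = 5, 150

# Synthetic noise-free signal and two noisy repeats of the same measurement
signal = rng.uniform(0.2, 1.0, size=(n_voxels, n_directions))
noise_sd = 0.05
data1 = signal + rng.normal(0, noise_sd, signal.shape)
data2 = signal + rng.normal(0, noise_sd, signal.shape)

# Stand-in for a perfect model: it predicts the noise-free signal
prediction = signal

# Per-voxel error, averaging over the last axis (directions)
rmse = np.sqrt(np.mean((prediction - data2) ** 2, -1))
test_retest = np.sqrt(np.mean((data1 - data2) ** 2, -1))
rrmse = rmse / test_retest
```

Values of rRMSE below 1 mean the model predicts the second measurement better than the first measurement does.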
from dipy.reconst.cross_validation import kfold_xval

# Use k = 2 folds
dti_pred = kfold_xval(dti_model, data, 2)
csd_pred = kfold_xval(csd_model, data, 2)
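The idea behind k-fold cross-validation can be sketched in a few lines of NumPy. This is not Dipy's implementation: `kfold_predict` is a hypothetical helper, and an ordinary least-squares line fit stands in for the diffusion model. Each measurement is predicted from a fit that never saw it.

```python
import numpy as np


def kfold_predict(design, signal, k, seed=0):
    """Predict every measurement from a fit to the other folds (toy sketch)."""
    n = signal.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, k)
    prediction = np.empty(n, dtype=float)
    for held_out in folds:
        # Fit on everything except the held-out fold
        train = np.setdiff1d(order, held_out)
        coef, *_ = np.linalg.lstsq(design[train], signal[train], rcond=None)
        # Predict only the held-out measurements
        prediction[held_out] = design[held_out] @ coef
    return prediction


# Noise-free line, so the held-out predictions should match the data
x = np.linspace(0, 1, 20)
design = np.column_stack([np.ones_like(x), x])
signal = 2.0 + 3.0 * x
pred = kfold_predict(design, signal, k=2)
```

With k = 2, each half of the measurements is predicted from a fit to the other half, so the result has the same shape as the data and can be compared to it directly.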
Forward model from the tracks to the measured signal
import dipy.tracking.life as life

fiber_model = life.FiberModel(gtab)
fit = fiber_model.fit(data, tracks)
prediction = fit.predict(gtab)
# Keep only the tracks that received a positive weight
optimized_tracks = tracks[fit.beta > 0]
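The selection step can be illustrated with a toy forward model. This is a sketch under stated assumptions, not Dipy's LiFE code: the 4-voxel, 3-track `design` matrix and the signal values are made up, the weights are assumed to be constrained to be non-negative, and a plain projected gradient descent (`fit_track_weights`, a hypothetical helper) stands in for whatever solver the library uses.

```python
import numpy as np


def fit_track_weights(design, signal, n_iter=2000):
    """Non-negative least squares by projected gradient descent (toy sketch)."""
    # Step size 1/L, where L is the largest eigenvalue of design.T @ design
    step = 1.0 / np.linalg.norm(design, 2) ** 2
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        gradient = design.T @ (design @ beta - signal)
        # Gradient step, then project back onto beta >= 0
        beta = np.maximum(0.0, beta - step * gradient)
    return beta


# Rows are voxels, columns are tracks: entry 1 means the track crosses the voxel
design = np.array([[1.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 1.0, 0.0]])
# Made-up signal that the third track does not help to explain
signal = np.array([0.9, 1.1, 1.9, 2.1])

tracks = np.array(["track_a", "track_b", "track_c"])
beta = fit_track_weights(design, signal)
optimized_tracks = tracks[beta > 0]
```

Tracks whose weight is driven to zero contribute nothing to the predicted signal, which is why filtering on a positive weight removes them.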
The eScience Institute
The Dipy project
In vivo validation through statistical learning
Come visit the Data Science Studio!