Daniel Whitenack (@dwhitena): Go as a Data Science Language, Containers, Reproducibility

By Rajib Bahar at June 17, 2017 06:29

Daniel (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines that include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (ODSC, Spark Summit, Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science and engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.

Interviewer: Rajib Bahar

Agenda:
- Many of us may or may not be aware of "Jupyter Notebook", a web application for writing code in various languages such as R, Python, Julia, Node.js, Go, Ruby, and Scala. That application in turn starts a separate kernel process that executes the code and returns the output to the web application. One of the coolest things you do is maintain the Jupyter kernel for Go (aka Golang). Currently, data scientists tend to gravitate toward either R or Python as their language. You're working with a more modern language in data science. Why Go? How is it more useful for statistical analysis or data visualization?
- How do you achieve reproducibility in data science?
- Most of us have heard of virtual machine tools such as VMware, Virtual PC, and VirtualBox. This is the first time I have heard of containers. What are some of their key benefits? Are there websites, like TurnKey Hub, where you can get good images of various OS, software, and DBMS platforms?
- What are some best practices for deploying data science models? Do you do something similar to what DBAs or data engineers do and run a job on a schedule, such as daily or hourly?
- How do you use data pipelines in your projects? Are they used in an ETL-like data-wrangling process?
- Please tell us where we can find you on social media.
