Data Scientists Making Their Code Public? So Crazy It Just Might Work

At Uptake, our data scientists do a lot. They live and breathe industrial data – from overcharging on a single alternator to complex large-scale weather patterns.

What does a typical day look like?

The short answer: there is no typical day. Some of our data scientists develop predictive models for our partners. This involves spending time learning about electrical systems on locomotives, gearboxes on wind turbines, or diesel engines on freight trucks and applying that knowledge to the sensor data generated by our partners’ critical assets.

Others work on assignments with quick turnarounds, taking in telematics data from a prospective customer and exploring what Uptake can do with that data.

Some spend their time building tools for the rest of the data science team to use. The people working on Uptake’s machine learning engines are focused on delivering tools that other data scientists can use to rapidly prototype, deploy and monitor models that create value for our customers.

Uptake’s Open Source Committee

How do we move so fast? By using open source software. About a year ago, we formally created an Open Source Committee at Uptake. The committee’s mission is simple: encourage Uptakers to contribute back to the open source projects they’re passionate about and that make our work possible, and give them the guidance and support they need to make those contributions successful.

Using open source in our stack enables the team to build our proprietary capabilities faster, adapt to new challenges and tap into the wealth of knowledge and creative ability in the broad data science community.

Success in the Open Source Community

The results? Uptakers have made contributions big and small to about a dozen different projects, and have even formally released three of our own:

  • uptasticsearch - an Elasticsearch client written in R and tailored to data science workflows

  • pkgnet - an R package that uses graph theory to explore other R packages

  • updraft - an R package that allows users to compose and execute processing DAGs

Our team is out on the road, speaking about open source at data science conferences here in Chicago and beyond. Our own Stephanie Kirmer will be talking about R packages at the Open Data Science Conference (ODSC) in Boston this May.

We’ve also hosted several internal “Open Source Hack Nights” where data scientists learn more about how to contribute to open source packages and spend the night eating, drinking, and submitting code to the projects they care about. Uptakers have been out on message boards, GitHub issues pages, and the surprisingly active data science corner of Twitter, shaping the discussion about the direction of the projects we care deeply about.

The response has been overwhelming, especially since many of our team members took their first steps into open source only after joining Uptake. It is an exciting time to be working on open source at Uptake. I’m very proud of what the team has accomplished so far, and can’t wait to see what comes next!

Interested in joining the data science team at Uptake? View our open positions.

James Lamb is a Senior Data Scientist at Uptake and leads the data science team’s open source efforts. He is the maintainer and co-author of the uptasticsearch R package (available on CRAN) and has contributed to many other open source data science projects.