Build things. With people.
How open source software can help prepare you for industry
When beginners ask me how they should prepare for a career in the data science industry, my answer is always the same: build things. I made this argument in a previous blog post for SharpestMinds:
Landing a job as a data scientist, machine learning engineer, or really any kind of role writing software, takes more than just math and programming knowledge. In reality, these roles require you to make hundreds of decisions every day… Getting better at making these decisions requires practice, and the only way to get that practice is to build things.
This is great news. With just a laptop and an internet connection, you have all you need to get started and gain experience. But there’s still something missing. When you finally get that job, you won’t just be building things — you’ll be building things with other people.
This is a significant difference. Collaborating on a project with other people introduces a whole new set of problems and constraints. That’s where open source software (OSS) offers a great opportunity to learn. Contributing to OSS lets you practice collaborating with others.
In a thread on the SharpestMinds Slack, Rebecca, a Data Engineer, summed this up well: “The amount of glue work that happens just to work in a collaborative environment is high. … [It] isn’t always immediately understandable why you need to do things — especially if you are self-taught. CI/CD tools and multiple git branches are far less necessary for individuals learning on their own, and figuring out how to work in a multi-user environment is a big first step.”
When you’re working alone, you don’t need to follow best practices. You don’t need to maintain documentation. You don’t need processes to review and test new code contributions. You certainly can do these things when working alone (and it’s a good idea to build those habits), but there’s no real pressure.
Contributing to OSS can be intimidating, but that doesn’t have to stop you. There is a lot more to a software project than just the code. According to SharpestMinds mentor Ray Phan, “It is very daunting to develop new features or add new things to an OSS project, especially if you’re a beginner. Instead, tackle low hanging fruit that can be done by a novice but the main developer does not have time to do.”
Some things you can do:
- Report and/or resolve issues
- Update or create documentation (this can be as simple as fixing a typo!)
- Increase the test coverage of the project
- Offer to test other pull requests, then comment about it
- Review pull requests
- Submit your own pull requests
The good news is that the data science and machine learning space is full of popular open source projects. The bigger, mainstream projects will already have good systems in place for contributors. For example, scikit-learn’s contribution guide offers clear instructions on how to contribute. These bigger projects will usually have issues labelled “good first issue”, a sign that they are good entry points for beginners (e.g. see the issues on dbt core or scikit-learn).
However, according to Rebecca, “The bigger projects are often more complex both in functionality and development setup. Smaller projects are usually simpler and developers are happy to have contributions so you’ll have more of an impact, but you might have to dig around more to figure out how to contribute… If you can find an area that interests you, there is often a lot more synergy that comes out of it. But don’t let not finding your passion stop you from contributing.”
To get started in OSS, don’t worry about how big your contributions are. Even the smallest things help. “Docs are the lowest hanging fruit,” says Alex Strick. “Checking that the ‘example code’… actually works [is a] great way to get to know a tool or a framework… [It’s] an easy win, and way to get to know the development team.”
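One way to act on Alex’s suggestion is to run a project’s documented examples mechanically rather than by eye. The sketch below uses Python’s standard `doctest` module to execute the `>>>` examples in a piece of documentation text and count how many fail. The `DOC_SNIPPET` text and the `check_doc_examples` helper are made up for illustration; a real project will usually describe its own way of testing docs in its contribution guide.

```python
import doctest

# Hypothetical documentation text, standing in for a page from a project's docs.
DOC_SNIPPET = """
To compute the mean of a list:

>>> values = [2, 4, 6]
>>> sum(values) / len(values)
4.0
"""


def check_doc_examples(text, name="docs"):
    """Run every >>> example in `text` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Each call gets a fresh, empty namespace so examples cannot leak state.
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []  # collect failure messages instead of printing them
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted


if __name__ == "__main__":
    failed, attempted = check_doc_examples(DOC_SNIPPET)
    print(f"{attempted} examples run, {failed} failed")
```

If an example fails, that is often a ready-made first contribution: either the docs are out of date or you have found a bug worth reporting.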
By contributing to OSS, you will learn how to work collaboratively on a codebase. You’ll see what CI/CD actually looks like in practice. You’ll learn how to read diffs and pull requests, report and track bugs, manage multiple branches, and do effective code review. All of those things are hard to learn when working on your own.
Plus, OSS contributions can be a powerful signal to hiring managers. SharpestMinds mentor Farid, co-creator of IceVision, an open-source computer vision framework, encourages all of his mentees to contribute to OSS. “Virtually every company uses version control systems,” he said. “Contributing to an open-source project shows the candidate is familiar with a version control.”
Thanks to SM members Ray, Rebecca, Alex Strick, JV Kyle, Denys, and Farid for their comments that helped shape this post!