Building a portfolio for Data Scientists
If you have interviewed for a data scientist role, you will remember this question: “Tell me about a project you worked on recently.” It invites a general overview of your work in data science. But there is a better question an interviewer could ask: “Tell me more about that fake news detection project you have in your portfolio.” The first question seeks a general understanding of the projects you have worked on, while the second seeks specifics. So how do you get interviewers to ask the specific question? By having a portfolio. A data science portfolio highlights the projects you have worked on and the specifics of each one. It showcases your skills across different areas of data science, such as communication, coding, and documentation.

Do data scientists need a portfolio?
The data science learning path is complex. It involves learning many skills, and you need a way to showcase them. A portfolio does exactly that. If you are looking for a job in data science, a portfolio shows the hiring manager what skills you are likely to bring to their company, and it can increase your chances of being shortlisted for an interview. For these reasons, building a strong portfolio is essential for a career in data science. So, let’s discuss the essentials that can impress a hiring manager and give you an edge in the recruitment process.
Elements of a good portfolio
The essential elements of a good data science portfolio depend on the skills you want to communicate. As a data scientist, skills in exploratory data analysis (EDA), data gathering, data preparation, data cleaning, visualization, and modeling are integral to your career development. But that’s not all; your portfolio also needs to show that you can write clean code that adheres to software development standards. It should showcase your documentation and communication skills as well. As a data scientist, you will need to communicate your findings to stakeholders, so demonstrate that ability in your portfolio.
The portfolio must demonstrate the EDA skills mentioned above. Tools such as pandas, NumPy, and matplotlib are essential for showing basic knowledge of the field. Discuss each tool and include projects that show how you use it. Cover different use cases and varied industries so that your projects exercise most of the basic data science skills. Host the projects in Jupyter notebooks or on Google Colab. This shows that you are comfortable in the environments commonly used for data science.
Additionally, your grasp of the types of machine learning and their applications, using libraries such as scikit-learn, is important. Include at least one project each for supervised and unsupervised learning. This shows that you understand the machine learning and modeling part of the data science process.
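To make the distinction concrete, here is a minimal sketch of both learning types with scikit-learn. The iris dataset, logistic regression, and k-means are illustrative choices of mine, not requirements; a portfolio project would use its own data and justify its own models.

```python
# Sketch only: dataset and model choices are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised learning: fit on labeled examples, then score on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, classifier.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.2f}")

# Unsupervised learning: group the same measurements without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)
print("Cluster sizes:", np.bincount(cluster_labels, minlength=3).tolist())
```

In a portfolio write-up, the held-out score and the cluster sizes are only the starting point; explaining why you chose the model and what the result means matters just as much.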
Besides skills, a good portfolio also needs a good profile. The profile should highlight your goals, the industries you are interested in working in, and your outlook on the data ecosystem. It should be concise and direct while leaving room for a good discussion. The portfolio design also needs to be professional so that it does not cost you a place on the shortlist. Make the portfolio eye-catching, with colors that don’t strain the reader’s eyes and with neat, clear, and precise visualizations.
Projects to include
The projects to include in your data science portfolio depend on your goals and the industry you are interested in. Whatever your focus, a varied set of projects covering the areas below will improve your chances of being shortlisted or contacted by a recruiter.
Exploratory Data Analysis
EDA is an important part of any data science project, so a project that covers the full EDA process and highlights data wrangling and cleaning is very valuable. The process lets you show almost all the necessary steps in data science, including data exploration, visualization, cleaning, and documentation. By documentation, I mean the comments most developers leave in their code. It is easy to forget to comment your code and explain the steps you are taking, so a project highlighting EDA gives you a chance to show that you can communicate your process while doing the work. I have included data cleaning here because, when you explore the data, you will likely come across missing values or other data quality issues that need addressing. Handling them lets you show your cleaning skills in the same project.
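As a rough illustration of commented exploration and cleaning, here is a short pandas sketch. The table, its column names, and the choice of median imputation are all hypothetical; the point is that each step says what it does and why.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4],
        "age": [34, 51, 51, np.nan, 29],
        "city": ["Nairobi", "nairobi ", "nairobi ", "Mombasa", None],
        "monthly_spend": [120.0, 85.5, 85.5, np.nan, 60.0],
    }
)

# Step 1: explore. Check the shape, the types, and how much data is missing.
print(raw.shape)
print(raw.dtypes)
print(raw.isna().sum())

# Step 2: clean. Drop exact duplicate rows, then normalize text values so
# that "nairobi " and "Nairobi" count as the same city.
clean = raw.drop_duplicates().copy()
clean["city"] = clean["city"].str.strip().str.title()

# Step 3: handle missing values and record the reasoning.
# The median is less sensitive to outliers than the mean.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["monthly_spend"] = clean["monthly_spend"].fillna(
    clean["monthly_spend"].median()
)
# A missing city stays visible as its own category instead of being guessed.
clean["city"] = clean["city"].fillna("Unknown")

# Step 4: summarize the cleaned data.
print(clean.describe())
```

Whether to impute, drop, or flag missing values depends on the dataset; what a reviewer looks for is that the decision is stated.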
Data gathering
In most cases, data gathering is left to data engineers. However, as a data scientist, you need to know how to gather the data your project requires. A project that highlights these skills, especially one that uses APIs, web scraping, or SQL, is highly valuable. The recruiter needs to know that you can get the data and use it for a given project when needed. A data-gathering project also lets you show that you know what kind of data a specific data science project needs.
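Of the three approaches, SQL is the easiest to demonstrate without network access. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the `orders` table, its columns, and the `total_sales_by_region` helper are hypothetical names invented for this example.

```python
import sqlite3

# An in-memory database with a hypothetical "orders" table stands in for a
# real data source, so the example runs without any external setup.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
connection.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0), ("North", 200.0)],
)
connection.commit()


def total_sales_by_region(conn, min_amount):
    """Return (region, total) rows for orders of at least min_amount."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM orders
        WHERE amount >= ?
        GROUP BY region
        ORDER BY total DESC
    """
    # The "?" placeholder passes the value as a parameter instead of
    # pasting it into the SQL string.
    return conn.execute(query, (min_amount,)).fetchall()


rows = total_sales_by_region(connection, min_amount=60.0)
for region, total in rows:
    print(region, total)
connection.close()
```

Filtering and aggregating in the query, instead of loading every row and processing it in Python, is one way to show that you have thought about what data the analysis actually needs.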
Machine learning
When working as a business-side data scientist, you will often need to build models to test decisions or activities in the organization. This is where machine learning comes in. You need the skills to apply machine learning to your data and produce results that can inform those decisions. A project that demonstrates this belongs in your portfolio.
Visualization
Visualization comes in different formats. Creating dashboards, for instance, is important for client-facing data scientists. You need the skills to build dashboards that communicate the necessary information to clients and stakeholders, so highlight tools such as Looker Studio, Tableau, and Power BI. Similarly, highlight your use of libraries such as seaborn, matplotlib, and Plotly. These show the recruiter that you have the necessary skills in visualization and data storytelling, which are part of the communication step in a data science project.
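As a small example of a chart that tells a story, here is a matplotlib sketch. The monthly signup figures and the output filename are made up for illustration; the habits worth showing are the descriptive title, the labeled axes, and an annotation that points the reader to the takeaway.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 160, 150, 190, 230]
positions = list(range(len(months)))

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(positions, signups, marker="o", color="tab:blue")

# Label everything so the chart can be read without the surrounding text.
ax.set_title("Monthly signups, January to June")
ax.set_xlabel("Month")
ax.set_ylabel("Signups")
ax.set_xticks(positions)
ax.set_xticklabels(months)

# Point the reader to the main takeaway.
peak = signups.index(max(signups))
ax.annotate(
    "Peak",
    xy=(peak, signups[peak]),
    xytext=(peak - 1.5, signups[peak]),
    arrowprops={"arrowstyle": "->"},
)

# Remove visual clutter that carries no information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
fig.savefig("monthly_signups.png", dpi=150)
```

In a notebook you would display the figure inline instead of saving it to a file.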
Once you have your portfolio ready, the question is, where do you showcase it?
Platforms to showcase your portfolio
GitHub
GitHub provides both a version control platform and free hosting for your portfolio. Hosting your portfolio there has two advantages. First, it shows that you can use GitHub, which is essential for version control. Second, it gives you a freely accessible web page to showcase your skills. You can attach the link to your social media pages or send it to anyone who wants to know about your skills. A portfolio is essentially an online resume that you can keep updating. Links in your GitHub portfolio can point to your repositories, so a recruiter can view your code and the changes you have made to your projects over time. This makes GitHub a very important platform for any data scientist.
LinkedIn
Data science is a dynamic, ever-evolving field. As a result, you must constantly learn and gain new skills. LinkedIn offers a chance to showcase what you are learning and the skills you have gained. Using LinkedIn to highlight your learning process or new skills shows recruiters that you can be what they are looking for. It also allows you to interact and network with other people learning in the same field and see what they are learning.
Kaggle
Kaggle provides a platform for practicing and working on data-related projects. As a data scientist, having a diverse set of projects, datasets, and contributions on Kaggle can help build your reputation in the industry. It also offers a good opportunity for your portfolio, as it hosts your projects for free and encourages comments from the community. Since Kaggle also provides a profile that links to the projects and comments you have made in the community, it is a good place to showcase your portfolio while also showing your collaborative skills.
Medium
One essential skill in data science is communication. You are one step closer to being hired if you can show that you can communicate your findings or process in data science projects. Medium offers a free blogging platform for anyone in the tech industry. As a beginner, you can share your learning or projects on Medium so that they are accessible to anyone you share the link with. Your posts can also attract comments from industry experts, and the platform can help you get noticed by employers whether you are a beginner or an expert in data science.
Conclusion
When building your data science portfolio, focus on the skills and tools integral to data science projects. As a beginner, it is vital to showcase your work and skills with different tools. You will have other choices to make, such as which language to focus on and which machine learning libraries to specialize in, but the steps in a data science project are the same no matter what language you use. In this article, I have highlighted the tools and libraries needed when using Python. Still, the guidelines can be applied to any other language you might want to use.