Best Practices

Data science is a discipline that requires integrity, clarity, and responsibility. Below we provide some key practices that every data scientist should follow in their work.

Understanding Your Data

The first step in data science is to understand your data: where it comes from and what it actually measures. For example, a FiveThirtyEight article misinterpreted kidnapping data from Nigeria because the data counted news reports about kidnappings rather than the kidnappings themselves. This underlines the importance of understanding the context of your data. Before starting your analysis, ask yourself what you’re aiming to achieve and what you hope to convey to your readers.

Data Analysis

Throughout your analysis, “interview” your data to understand the who, what, when, where, why, and how. Add context to your data without discarding the original data. When summarizing data, distinguish between the mean and the median. If the data is skewed or contains outliers, use the median, which is less affected by extreme values; if the distribution is roughly symmetric (for example, approximately normal), use the mean. Look for patterns and outliers that tell a story, and use anecdotes to enhance your storytelling.
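The mean-versus-median rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the helper names (has_outliers, typical_value), the sample numbers, and the choice of the 1.5 × IQR rule of thumb for flagging outliers are all assumptions made for the example.

```python
import statistics


def has_outliers(values, k=1.5):
    """Return True if any value lies more than k * IQR outside the quartiles.

    The 1.5 * IQR cutoff is a common rule of thumb, assumed here for
    illustration; other outlier definitions are equally valid.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in values)


def typical_value(values):
    """Pick the median when outliers are present, otherwise the mean."""
    if has_outliers(values):
        return "median", statistics.median(values)
    return "mean", statistics.mean(values)


# Hypothetical sample data: one extreme value drags the mean upward.
incomes = [42, 45, 47, 50, 52, 55, 400]
# Hypothetical sample data: roughly symmetric, no extreme values.
heights = [48, 49, 50, 51, 52]

print(typical_value(incomes))
print(typical_value(heights))
```

For the first list, the single extreme value pulls the mean far above what most entries look like, so the sketch reports the median instead. A check like this is a starting point; plotting the distribution is still the more reliable way to decide.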

Data Visualization

Visualization is a powerful tool in data science. The key is to strike a balance between clarity and exploration: the viewer should be able to grasp the main point of your visualization intuitively, but also have the option to explore further. Use color and style to communicate a mood. When choosing charts, remember that line charts show trends over time, bar charts compare values across categories, and pie charts show parts of a whole and should only be used for 2-3 categories.
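The chart-selection guidance above can be written down as a small decision helper. This is only a sketch: the function name suggest_chart and the data_kind labels are hypothetical, and falling back to a bar chart when a parts-of-a-whole dataset has more than three categories is an assumption, not a rule stated above.

```python
def suggest_chart(data_kind, n_categories=0):
    """Suggest a chart type from the rules of thumb in this section.

    data_kind is a hypothetical label: "time_series", "categorical",
    or "parts_of_whole".
    """
    if data_kind == "time_series":
        return "line"  # trends over time
    if data_kind == "parts_of_whole" and 2 <= n_categories <= 3:
        return "pie"  # only for 2-3 categories
    if data_kind in ("categorical", "parts_of_whole"):
        # Assumption: too many slices for a pie, so compare with bars.
        return "bar"
    raise ValueError(f"unknown data_kind: {data_kind!r}")


print(suggest_chart("time_series"))
print(suggest_chart("parts_of_whole", n_categories=3))
print(suggest_chart("parts_of_whole", n_categories=8))
```

The helper is deliberately rigid. Real chart choices also depend on the audience and the story, so treat its answer as a default to question rather than a verdict.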

Writing With Data

In writing, maintain the open data mission. Explain where your data comes from, what it contains, and how you extracted insights from it. Use layman’s terms when possible, and explain specialized concepts. This not only earns your readers’ trust but also makes your work accountable. When writing, remember to strike the HODP tone: it’s not an academic paper, nor a blog, but a journalistic middle ground.

Choosing Your Project

Choose a topic that is meaningful to you. Ask yourself what you have always wondered about the subject matter and what you wish you knew. Stick to the data, but also tell a story. Be sure to state your hypotheses and back up your insights with rigorous statistical analysis. Group your results by subject and explain why they are important.

In conclusion, the best practices in data science involve understanding your data, analyzing it correctly, visualizing it effectively, writing with clarity, and choosing meaningful projects. These practices will lead to successful projects that convey clear, honest, and transparent insights.