There’s some confusion surrounding the role of a data scientist. In 2001, William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” 1 This paper was the first to merge the fields of statistics and computer science into a new area of innovation called “data science.” At about the same time, Leo Breiman published “Statistical Modeling: The Two Cultures,” which described how statisticians should change their mindset and embrace a more diverse set of tools. These two papers laid a foundation for data science, but it was a foundation built on the field of statistics.
In 2008, some top data gurus from Facebook and LinkedIn got together to discuss their day-to-day challenges. They realized they were doing similar things. They saw their role as a crossover of many different disciplines.
They decided to call this role “data scientist.” At the time, a data scientist was defined by little more than a list of qualities. For example:
• Understand data
• Know statistics and math
• Apply machine learning
• Know programming
• Be curious
• Be a great communicator and hacker
They were Renaissance enthusiasts who crossed over into many different fields. The problem is that this combination of skills is rarely found in one person. Each of us is predisposed to certain areas based on our talents. We usually gravitate toward those talents and then work to refine our craft.
A statistician will often work to become a better statistician. A business analyst will work to refine his or her communication skills. There is also a lot of organizational pressure to specialize. Most large organizations are divided into functional areas. There’s some need for common understanding, but not always common expertise.
People are also notoriously bad at assessing their own abilities. The famous Dunning-Kruger study 3 found that people with limited skill in an area often dramatically overestimated their ability in it. A gifted statistician may rate themselves as an excellent communicator, but you don’t need to be a good communicator to be a great statistician. A great statistician could easily have a long career even if he or she fumbles through presentations.
That’s why most organizations divide the work among teams. Individuals on a team have their own areas of expertise. A cross-functional team doesn’t assume that everyone is an expert in everything. Instead, it encourages individuals to learn from each other’s strengths and cover each other’s weaknesses. A team made up only of data scientists may not be able to identify those weaknesses, and with no one to point out its blind spots, the team will fumble without realizing it.
I once worked for an organization that had a team of data scientists building out a cluster. The business was concerned because the higher-ups had no idea what the team was building; they were frustrated because they were paying for something they didn’t understand.
I went to a few of the meetings. The team of data scientists demonstrated a simple MapReduce job. The business managers stared blankly at the screen and occasionally glanced at their smartphones. To an outsider, it seemed obvious from the yawns and eye rubbing that the team was not doing a great job communicating. After the meeting, I wrote a matrix on the whiteboard. I listed the following six skill sets:
• Data
• Development
• Machine learning
• Statistics
• Math
• Communication
I asked the data scientists to rate how they felt they were doing in each of these areas from 1 to 10 (1 being poor and 10 being best) so we could look for areas to improve. I took that same list of skill sets and showed it to one of the business analysts. I asked them to rate the team.
Skill Set          Data Scientists’ Ratings   Business Analysts’ Ratings
Data
Development        8                          10
Machine learning   7                          9
Statistics         8                          9
Math               8                          10
Communication      9                          6
It was a classic Dunning-Kruger result. The data scientists gave themselves their highest rating in communication (9), yet that was the one area where the business analyst rated them lowest (6). In development, machine learning, statistics, and math, the analyst actually rated the team higher than the team rated itself. The data scientists all came from quantitative fields: they were statisticians, mathematicians, and data analysts. They couldn’t identify their own blind spots. It took someone from an entirely different field to shine a light on their challenges.
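The comparison behind this conclusion is simple subtraction. The sketch below is a hypothetical helper, not part of the original whiteboard exercise: it takes the five skills whose ratings appear in the table, subtracts the business analyst’s rating from the team’s self-rating, and flags any skill where the team rated itself higher. The names `ratings`, `rating_gaps`, and `overestimated` are illustrative assumptions.

```python
# Ratings from the skills matrix (1 = poor, 10 = best), stored as
# (data scientists' self-rating, business analyst's rating).
# The "Data" skill is left out because its ratings are not shown.
ratings = {
    "Development": (8, 10),
    "Machine learning": (7, 9),
    "Statistics": (8, 9),
    "Math": (8, 10),
    "Communication": (9, 6),
}


def rating_gaps(ratings):
    """Return {skill: self_rating - outside_rating}.

    A positive gap means the team rated itself higher than the outside
    observer did (possible overestimation); a negative gap means the
    team rated itself lower.
    """
    return {skill: self_r - outside_r for skill, (self_r, outside_r) in ratings.items()}


def overestimated(ratings):
    """Return (skill, gap) pairs with a positive gap, largest gap first."""
    gaps = rating_gaps(ratings)
    positive = [(skill, gap) for skill, gap in gaps.items() if gap > 0]
    return sorted(positive, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for skill, gap in rating_gaps(ratings).items():
        print(f"{skill:<18} gap {gap:+d}")
    print("Overestimated:", overestimated(ratings))
```

Run on the table’s numbers, only communication comes out with a positive gap (+3); every other recorded skill has a negative gap, meaning the business analyst rated the team higher than it rated itself.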
If you’re part of a large organization trying to get value from data science, it would be a mistake to rely on a few superhero data scientists. Individuals who come from a similar background tend to share the same blind spots. Academic research shows that you often get better insights from a cross-functional team with varied backgrounds.
There is some wisdom in our eclectic organizational structures. People with marketing, business, and management backgrounds deserve their place at the data science table. It’s unrealistic to expect people with a quantitative background to come up with every question and insight on their own. Keep your team varied, and you’re more likely to get great results.