Data science has been attracting a lot of attention recently. The potential of using data science and artificial intelligence to analyze data is the fuel of the next industrial revolution. The question that remains is how hard it is to join the "party": what kind of skills are needed to practice data science? In this article, we discuss the barriers to entering the field.
It is common practice to split the background knowledge data scientists need into three parts:
Computer skills/IT skills – this is a relatively "easy" requirement. Computer skills such as SQL and scripting are becoming "common knowledge", and a rather large portion of the data science community possesses them. However, these skills matter most when working with big data: if your IT skills are not good enough, every stage of the work can take a long time.
Domain/Business knowledge – while this sounds easy, a good analyst who knows how to ask the right questions, rather than just answer them, is rare. This skill requires you to understand your data better than anyone else and is very domain-specific, which makes it hard to transfer across disciplines.
Math and Statistics knowledge – this is a very challenging requirement. Even basic algorithms require an understanding of linear algebra and calculus at a university level. These are courses most people overlook or underestimate in their first year of university.
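To make the math requirement concrete, here is a minimal sketch of one of the most basic algorithms: fitting a straight line by gradient descent. It is only an illustration; the function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for this example. Even here, the update rule comes from taking partial derivatives of the error (calculus), and the sums are dot products between vectors (linear algebra).

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error.

    lr and steps are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual of the current line at every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2)
        # with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data that lies exactly on y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Without knowing where `grad_w` and `grad_b` come from, it is hard to tell why the loop converges, or why it stops converging when the learning rate is set too high.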
Math, or better yet "scientific" thinking, is also required for designing good and correct data experiments. Good experiment design starts with a hypothesis, an idea of what can be done with the data. Creating an experiment or simulation to test that hypothesis is not an easy task and is hardly ever taught in any course, which means it is usually learned only on the job.
A good data scientist is expected to combine all of these skills, which makes them a true "unicorn" in the job market. Even then, each data scientist will usually fit only one domain.
The paradigm must change. For machine learning to take place properly, a combined mathematical and IT team needs to build the data channels for the domain expert. It is the domain expert, not the data scientist, who needs to clean the data, analyze it, and then "teach" the machine.