If you are like me, there is a good chance that you are confused as well about the most recent terminology to use in the field of data science … pardon, artificial intelligence … no, I mean data science. No, I mean artificial intelligence. Please, somebody tell me what I should call it and what the difference is!
Isn’t artificial intelligence just a cool new name for good old data science? Don’t both concepts cover the same algorithms? And isn’t it all machine learning anyway? This is what I used to think until I took a pause to write this post.
During this breather, I went back in time and tried to remember all the names that have been used to label what is essentially data analytics. Let’s see …
At the beginning of the 90s, I started working on my master’s thesis on neural networks and statistical algorithms. Statistical algorithms at that time mainly referred to Naive Bayes. Neural networks mainly meant the Backpropagation algorithm. C4.5 had not been published yet.
And the cool name to use was machine learning.
The stress was on “learning” — algorithms that were able to learn by themselves how to implement a certain task, just from a set of data. The whole enthusiasm for the new technology relied on this self-learning feature, which was sharply in contrast with previous expert systems where knowledge rules were manually translated from human expertise.
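To make that contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy temperature data, the function names, and the choice of a single learned threshold as the simplest possible "self-learning" model. It is not meant to reproduce any specific algorithm from that period.

```python
# Hypothetical toy data, invented for illustration:
# (body temperature in Celsius, label) where 1 means "fever".
DATA = [(35.0, 0), (36.5, 0), (37.0, 0), (38.2, 1), (39.0, 1), (40.1, 1)]


def expert_rule(temperature):
    """Expert-system style: a human wrote the threshold down by hand."""
    return 1 if temperature > 38.0 else 0


def learn_threshold(data):
    """Self-learning style: choose the threshold with the fewest errors on the data."""
    values = sorted(t for t, _ in data)
    # Candidate thresholds are the midpoints between consecutive observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((1 if t > threshold else 0) != label for t, label in data)

    return min(candidates, key=errors)


def learned_rule(temperature, threshold):
    """Apply the threshold that was extracted from the data."""
    return 1 if temperature > threshold else 0


threshold = learn_threshold(DATA)
print("learned threshold:", threshold)
```

The rule has the same shape in both cases. The only difference is where the number comes from: from an expert in the first function, from the data in the second.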
More or less a decade later, the name data mining started to appear to designate the same set of machine learning algorithms. In the meantime, a few more self-learning algorithms had gained popularity, for example, the whole family of decision trees (ID3, C4.5, CART).
Data mining was defined as the process of discovering patterns in large data sets, usually by means of machine learning algorithms. The focus had then moved from the algorithms and their self-learning property to the data and the knowledge that can be extracted from them.
A few years later, with the explosion of big data and parallel computation, the term data science popped up. Notice that the definition of data science is very similar to the definition of data mining since it still uses machine learning algorithms to extract knowledge from data.
In the meantime, cheaper and larger storage devices made it possible to work with large amounts of data. Thus, big data, within data science, introduced much larger data sets and faster machine learning algorithms able to parallelize and speed up the computational process. Apache Spark, in particular, offered a number of parallelized machine learning algorithms for data analytics inside its MLlib library.
So, we can say that the “science” in data science referred to the ability to deal with massive amounts of data, and to do so faster than in the previous data mining decade.
So, let’s see what has changed recently in the data analytics world to justify the need for yet another name, artificial intelligence. The answer is deep learning, which represents the modern evolution of neural networks. Faster machines and, again, parallel computation have paved the way for more sophisticated neural layers and more complex neural architectures. Examples are the networks dedicated to image processing, text processing, and, more generally, time series analysis.
Complex predictions based on past characters, words, or images are the great innovation of deep learning architectures. In the combined field of text processing and time series analysis, especially, a few new tasks have become possible, for example, the generation of free text.
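The idea of generating text by predicting the next character from the past ones can be sketched in a few lines of Python. To be clear about the swap: this is not a deep learning model. It is a simple count-based character model (a Markov chain), used here only because it fits in a few lines and needs no external library. A neural network would replace the lookup table with a learned function, but the generation loop would stay the same. The corpus, the function names, and the context length of three characters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=3):
    """Map each context of `order` characters to the characters seen right after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length=40, rng=None):
    """Extend `seed` one character at a time; len(seed) must equal the model order."""
    rng = rng or random.Random(0)  # fixed seed so that runs are repeatable
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # this context never appeared in the training text
        out += rng.choice(followers)
    return out


# Hypothetical toy corpus, repeated so that every context has a follower.
CORPUS = "machine learning, data mining, data science, artificial intelligence. " * 3
model = build_model(CORPUS, order=3)
sample = generate(model, seed="dat", length=40)
print(sample)
```

Each new character depends only on the three characters before it, so the output is locally plausible but quickly loses the thread. Deep architectures earn their reputation by keeping track of much longer contexts.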
I believe that these new, more creative skills — different from traditional classification or automation of repetitive tasks — have earned the term “intelligence” for this field.
To Conclude …
I am still not sure whether it is worth changing the name every few years to describe machine learning algorithms applied to data. However, I am starting to see the small shifts in focus that have happened over time and that have produced (a need for) each new name.
The focus moved first from self-learning algorithms to knowledge extraction from data, then to making the algorithms work faster on larger data sets, and finally to claiming a more creative role for the latest generation of machine learning models. These shifts over the years seem to have triggered the need for a new name for the same old field of data analytics.
Let’s go back to our work now, whatever that might be: machine learning, data mining, data science, or artificial intelligence.
As first published in BetaNews.