Renowned Data Science Personalities

As big data and artificial intelligence advanced, so did the need to use them efficiently and ethically. Before the AI boom, companies focused mainly on data storage and management. As frameworks matured, the focus shifted to data processing and analytics, which require knowledge of programming, mathematics, and statistics. This practice is now popularly known as Data Science. A few names stand out whenever data science is mentioned, largely because they have devoted their careers and research to advancing the field. Let’s talk about some of the best-known data scientists in the world.


Andrew Ng

Andrew Ng is one of the most prominent leaders in AI and Data Science and is counted among the best machine learning and artificial intelligence experts in the world. He is an adjunct professor at Stanford University and a co-founder of Coursera. He formerly headed the AI unit at Baidu and worked in close collaboration with Google on its Google Brain project. He is also an enthusiastic researcher, having authored or co-authored around 100 research papers on machine learning, AI, deep learning, robotics, and related fields. He is highly regarded by new practitioners and researchers in data science and is among the most followed data scientists on social media and other channels.

DJ Patil

DJ Patil, often called the Data Science Man, needs no introduction. He is one of the most famous data scientists in the world and an influential figure well beyond Data Science. He co-coined the term “data scientist”. He served as Chief Data Scientist at the White House. At LinkedIn he was Head of Data Products, Chief Scientist, and Chief Security Officer, and at eBay Inc. he was Director of Strategy, Analytics, and Product as well as a Distinguished Research Scientist. The list goes on.

DJ Patil is inarguably one of the top data scientists in the world. He received his PhD in Applied Mathematics from the University of Maryland, College Park.

Kirk Borne

Kirk Borne has been the chief data scientist and a leading executive advisor at Booz Allen Hamilton since 2015. A former NASA astrophysicist, he was part of many major projects. After the 9/11 attack on the WTC, he was called upon by the then President of the US to analyze data in an attempt to prevent further attacks. He is one of the top data scientists to follow, with over 250K followers on Twitter.

Geoffrey Hinton

Geoffrey Hinton is known for his astonishing work on artificial neural networks. He co-authored the 1986 paper that popularized the backpropagation algorithm, which is used to train deep neural networks. At the time of writing, he works with the AI team at Google while also finding time for the Computer Science department at the University of Toronto. His research group has done groundbreaking work behind the resurgence of neural networks and deep learning.

Geoff coined the term ‘Dark Knowledge’.

Yoshua Bengio

Having worked at AT&T and MIT as a machine learning expert, Yoshua holds a Ph.D. in Computer Science from McGill University, Montreal. He is currently the head of the Montreal Institute for Learning Algorithms (MILA) and has been a professor at Université de Montréal for the past 24 years.

Yann LeCun

Director of AI Research at Facebook, Yann holds 14 registered US patents. He has a PhD in Computer Science from Pierre and Marie Curie University. He is also a professor of Computer Science and Neural Science at New York University and the founding director of the NYU Center for Data Science.

Peter Norvig

Peter Norvig is the co-author of ‘Artificial Intelligence: A Modern Approach’ and the author of ‘Paradigms of AI Programming: Case Studies in Common Lisp’, two insightful books on artificial intelligence and programming. He has close to 45 publications to his name. Currently an Engineering Director at Google, he previously worked in various computational sciences roles at NASA for three years. Peter received his Ph.D. in Computer Science from the University of California, Berkeley.

Alex “Sandy” Pentland

Named one of the world’s most powerful data scientists by Forbes, Alex has been a professor at MIT for the past 31 years. He has also been a chief advisor at Nissan and Telefonica. Alex has co-founded many companies over the years, including Home, Sense Networks, and Cogito Corp. Currently, he is on the board of directors of the UN Global Partnership for Sustainable Development Data.

These are only a few leaders from a vast community. Many unnamed contributors are the reason we have recommender systems, advanced neural networks, fraud detection algorithms, and the many other intelligent systems we rely on in daily life.

Artificial Intelligence vs Machine Learning vs Deep Learning

Artificial Intelligence, Machine Learning, and Deep Learning are among the most prominent topics in technology at present. The three terms are often used interchangeably, but are they really the same? Almost every technophile gets stuck at least once on artificial intelligence vs machine learning vs deep learning. Let us find out how these three terms actually differ.

The easiest way to think of the relationship between these terms is to visualize them as concentric circles, in the sense of nested sets. AI, the idea that came first, is the largest circle. Machine learning, which blossomed later, sits inside it. Deep learning, the most recent of the three and the driver of today’s AI explosion, fits inside both.

Graphically this relation can be explained as in the picture below.

As the image of three concentric circles shows, Deep Learning is a subset of ML, which in turn is a subset of AI. AI is the all-encompassing concept that emerged first, followed by ML, which thrived later, and lastly Deep Learning, which promises to take the advances of AI to another level.
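The nesting can also be written down with ordinary Python sets. This is only a sketch: the technique names placed in each set are illustrative picks, not an official taxonomy, and the helper name is_nested is made up for the example.

```python
# Illustrative technique labels only; the grouping is a simplification.
deep_learning = {"convolutional network", "recurrent network"}

# Machine learning contains everything in deep learning, plus other methods.
machine_learning = deep_learning | {"decision tree", "k-means clustering"}

# AI contains everything in machine learning, plus non-learning approaches.
artificial_intelligence = machine_learning | {
    "rule-based expert system",
    "search-based planner",
}


def is_nested(inner, outer):
    """Return True when `inner` is a strict subset of `outer`."""
    return inner < outer


print(is_nested(deep_learning, machine_learning))
print(is_nested(machine_learning, artificial_intelligence))
```

Every deep learning technique here is also a machine learning technique and an AI technique, but the reverse does not hold: a rule-based expert system counts as AI without involving any learning.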

Starting with AI, let us look at each of these terms in more depth.

Artificial Intelligence

Intelligence, as defined by Wikipedia, is “Perceiving the information through various sources, followed by retaining them as knowledge and applying them with real-life challenges.” Artificial intelligence is the science of machines that are programmed to think and act like humans. Wikipedia defines it as the simulation of human intelligence in machines using programs and algorithms.

Machines built on AI are of two types: General AI and Narrow AI.

General AI refers to machines that would have all of our senses and reasoning abilities and think much as we do. We have seen such General AI in sci-fi movies like The Terminator. In real life, a lot of work has gone into developing these machines, but much more research is needed to bring them into existence.

What we can do today falls under “Narrow AI”: technologies that can perform specific tasks as well as, or better than, humans can. Examples include classifying emails as spam or not spam and facial recognition on Facebook. These technologies exhibit some facets of human intelligence.

Where does that intelligence come from? That brings us to our next term: Machine Learning.

Machine Learning

Learning, as defined by Wikipedia, is “acquiring information and finding a pattern between the outcome and the inputs from the set of examples given.” ML aims to enable machines to learn by themselves from the data provided and to make accurate predictions. Machine Learning is a subset of AI. More importantly, it is a method of training algorithms so that they can learn to make decisions.

Machine learning algorithms can be classified as supervised or unsupervised, depending on the type of problem being solved. In supervised learning, the machine is trained on well-labelled data, that is, data already tagged with the correct answer. In unsupervised learning, the machine is trained on data that is neither classified nor labelled, and the algorithm must find structure in it without guidance. There is also semi-supervised learning, in which the algorithm learns from a dataset that includes both labelled and unlabelled data.
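To make the contrast concrete, here is a minimal sketch in plain Python. The two algorithms (a nearest-neighbour classifier for the supervised case and a one-dimensional two-cluster k-means for the unsupervised case) are chosen only for illustration, and the numbers stand for a made-up feature such as the count of suspicious words in an email.

```python
def nearest_neighbour_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labelled example."""
    distances = [abs(point - query) for point in train_points]
    return train_labels[distances.index(min(distances))]


def two_means(values, iterations=10):
    """Unsupervised: split unlabelled numbers into two groups (k-means, k=2)."""
    low, high = min(values), max(values)  # initial centroids
    if low == high:
        return sorted(values), []  # all values identical, nothing to split
    for _ in range(iterations):
        # Assign every value to the nearer centroid.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centroid to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Hypothetical data: the same numbers, once with labels and once without.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

prediction = nearest_neighbour_predict(points, labels, query=7.0)
clusters = two_means(points)
print(prediction)
print(clusters)
```

The supervised function needs the labels list to answer anything, while two_means only ever sees the raw numbers and still recovers two groups. It cannot say which group is "spam", because nobody told it what the groups mean.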

Training in machine learning requires feeding a lot of data to the machine, which allows the model to learn more about the information it processes.

Deep Learning 

Deep Learning is another algorithmic approach that came from the early machine-learning crowd. Neural networks form the basis of deep learning and are inspired by our understanding of the biology of the human brain. However, unlike a biological brain, where any neuron can connect to any other neuron within a certain physical distance, artificial neural networks (ANNs) have discrete layers, connections, and directions of data propagation.

For a system designed to recognize a STOP sign, a neural network model can come up with a “probability score”, which is a highly educated guess based on the algorithm. In this example, the system might be 86% confident the image is a stop sign, 7% confident it’s a speed limit sign, 5% confident it’s a kite stuck in a tree, and so on.
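One common way to turn a network's raw output scores into such probabilities is the softmax function. The sketch below assumes softmax is used, which the text above does not state, and the raw scores are invented for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score keeps exp() numerically stable.
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores from the last layer of a sign classifier.
labels = ["stop sign", "speed limit sign", "kite stuck in a tree"]
raw_scores = [4.0, 1.5, 1.2]

probabilities = softmax(raw_scores)
best_label = labels[probabilities.index(max(probabilities))]

for label, probability in zip(labels, probabilities):
    print(f"{label}: {probability:.0%}")
print("prediction:", best_label)
```

The label with the highest probability becomes the prediction, and the remaining probabilities show how much weight the model gave to the alternatives.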

A trained neural network is one that has been shown millions of samples and adjusted until it gets the answer right practically every time.

Deep Learning can automatically discover new features to use for classification. Classical Machine Learning, on the other hand, needs these features to be provided manually. Also, in contrast to classical Machine Learning, Deep Learning requires high-end machines and considerably larger amounts of training data to deliver accurate results.

Wrapping up, AI has a bright future, considering the development of deep learning. At the current pace, we can expect driverless vehicles, better recommender systems, and more in the near future. AI, ML, and Deep Learning (DL) are closely related, but they are not the same.

A Short History of Data Science

Over the past two decades, tremendous progress has been made in the field of Information Technology, with exponential growth in technology and machines. Data and Analytics have been among the most commonly used words of the past decade. As the two are interrelated, it is essential to know how they relate to each other and how they are evolving and reshaping businesses.

Data Science has been officially accepted as a field of study since 2011, although different or related names have been in use since 1962.

The development of Data Science can be summarised in six stages:

Stage 1: Contemplating the power of Data
This stage, which began in the early 1960s, witnessed the rise of the data warehouse, where business records and transactions were centralised into a vast repository. In 1962, John Tukey published the article The Future of Data Analysis, a source that established a relation between statistics and data analysis. In 1974, another data enthusiast, Peter Naur, gained popularity for his book Concise Survey of Computer Methods. He also coined the term “Data Science”, which grew into a vast field with a great many applications in the 21st century.

Stage 2: More research on the importance of data
In this period, businesses started researching the importance of collecting vast amounts of data. In 1977, the International Association for Statistical Computing (IASC) was founded. In the same year, Tukey published his second major work, “Exploratory Data Analysis”, arguing that more emphasis should be placed on using data to suggest hypotheses to test, and that exploratory and confirmatory data analysis should proceed side by side. The year 1989 saw the first workshop on Knowledge Discovery in Databases (KDD), which grew into what is now the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

Stage 3: Data Science gained attention
Early forms of the market for data began to appear during this phase. Data Science started attracting the attention of businesses, and the idea of analysing data was sold and popularised. The 1994 Business Week cover story titled “Database Marketing” reflects this rise. Businesses began to see the importance of collecting and applying data for profit, and various companies started stockpiling massive amounts of it. However, they did not yet know what to do with it or how to use it for their benefit. This led to the beginning of a new era in the history of Data Science.

The term Data Science was used again in 1996, at the conference of the International Federation of Classification Societies (IFCS) in Kobe, Japan. In the same year, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth published “From Data Mining to Knowledge Discovery in Databases”. They described data mining as “the application of specific algorithms for extracting patterns from data.”

They added that the additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data.

Stage 4: Data Science started being practised
The dawn of the 21st century saw significant developments in the history of data science. Throughout the 2000s, various academic journals began to recognise data science as an emerging discipline, and data science and big data proved a natural fit for the developing technology. Another notable contributor to the field is William S. Cleveland. He co-edited Tukey’s collected works, developed valuable statistical methods, and published the paper “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”.

Cleveland argued that data science was an independent discipline and named six areas in which data scientists should be educated: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.

Stage 5: A New Era of Data Science
By this point, the world had seen plenty of the advantages of analysing data. The term “data scientist” is attributed to Jeff Hammerbacher and DJ Patil, who chose the word carefully, and a buzzword was born. The term “data science” was not yet prevalent, but the practice was already proving incredibly useful and developing significantly. In 2013, IBM shared the statistic that 90% of the world’s data had been created in the previous two years alone. By this time, companies had also begun to view data as a commodity on which they could capitalise. Transforming large volumes of data into usable information and finding usable patterns gained importance.

Stage 6: Data Science in Demand
The major tech giants saw significant growth in demand for their products after applying data science. Apple issued a statement crediting Big Data and data mining for increased sales. Amazon said that it had sold more Kindle e-books than ever. Companies like Google and Microsoft used deep learning for speech and voice recognition. AI techniques further enhanced the use of data. Data became so precious that companies started collecting all kinds of it from all sorts of sources.

Putting it all together, data science did not have a very prestigious beginning and was initially ignored by researchers. Once researchers and businesses properly understood its importance, however, it helped them generate large profits.

What is Natural Language Processing?

What is Natural Language Processing (NLP)?

Natural Language Processing, commonly abbreviated as NLP, is a subfield of computer science and artificial intelligence. It is mainly concerned with the interaction between computers and the languages humans speak, such as English, Italian, and French. In particular, it is used to program machines to process and analyze large amounts of natural language data.

Developing NLP applications is quite challenging because computers traditionally require humans to communicate with them through a programming language, which is precise and highly structured. Human speech, however, is not always precise. It is often ambiguous and depends on factors such as the emphasis placed on a particular word or expression. These are the factors that a computer finds very difficult to understand.

How does Natural Language Processing work?

Syntactic and semantic analysis are the two main techniques used in NLP. Syntax is the arrangement of words in a sentence so that they make grammatical sense. The syntax methods used include:

  • Parsing
  • Word segmentation
  • Sentence breaking
  • Morphological segmentation
  • Stemming
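Three of the methods listed above (sentence breaking, word segmentation, and stemming) can be sketched with nothing but Python's standard library. The rules below are deliberately naive and English-only, and the function names are made up for this illustration. Real toolkits handle abbreviations, punctuation, and irregular word forms far more carefully.

```python
import re


def break_sentences(text):
    """Sentence breaking: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def segment_words(sentence):
    """Word segmentation: lower-case the sentence and keep runs of letters."""
    return re.findall(r"[a-z']+", sentence.lower())


def naive_stem(word):
    """Stemming: strip a few common English suffixes (a crude heuristic)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "The dogs barked. Cats are sleeping!"
for sentence in break_sentences(text):
    words = segment_words(sentence)
    print(words, [naive_stem(word) for word in words])
```

Each function feeds the next: the text is cut into sentences, each sentence into words, and each word is reduced to a rough stem, so that "barked" and "sleeping" can be matched with "bark" and "sleep".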

Semantics concerns the use and meaning of words. NLP applies algorithms to understand the grammar and meaning of sentences. The techniques used in semantic analysis include:

  • Named Entity Recognition
  • Natural Language Generation

Current approaches to NLP are mainly based on Deep Learning, a type of AI that examines and uses patterns in data to improve a program’s understanding. It relies largely on supervised learning, in which labelled data is divided into a training set, used to fit the model, and a test set, used to evaluate it.
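A minimal sketch of dividing labelled examples into a training set and a test set is shown below. The function name, the 25% test fraction, and the tiny sentiment dataset are all assumptions made for illustration.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labelled examples and hold out a fraction of them for testing."""
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled sentences: (text, sentiment label).
examples = [
    ("great service", "positive"),
    ("loved the food", "positive"),
    ("would come again", "positive"),
    ("friendly staff", "positive"),
    ("terrible wait", "negative"),
    ("cold food", "negative"),
    ("rude waiter", "negative"),
    ("never again", "negative"),
]

train_set, test_set = train_test_split(examples)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

The model is fitted only on train_set, and test_set is kept aside so that the evaluation is done on sentences the model has never seen.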

Three tools very commonly used for NLP are NLTK, Gensim, and Intel NLP Architect. The Natural Language Toolkit (NLTK) is an open-source Python module with data sets and tutorials. Gensim is a Python library for topic modeling and document indexing. Intel NLP Architect is another Python library, focused on deep learning topologies and techniques.

What are the Uses of Natural Language Processing?

The idea behind NLP can be traced back to Alan Turing, who published an article titled “Computing Machinery and Intelligence”. Widespread use, however, came only from the 1980s, after the introduction of Machine Learning. Before 1980, most natural language processing systems were based on complex sets of hand-written rules. Writing these rules took a lot of labor, and the results were inaccurate because of the diversity in how a language is spoken. The introduction of Machine Learning sped up the development of Natural Language Processing.

Natural language Processing is very widely used today in our daily routine. It finds its application in:

  • Chatbots – Chatbots handle many clients and answer their queries without considerable human effort. They are trained on a vast set of data and so learn to process only the essential part of a conversation. Companies like Uber and Zomato use chatbots to minimize human involvement.
  • Voice Assistants – This is the most significant use of NLP. Technology giants like Google, Microsoft, and Amazon each offer their own voice assistant to help users communicate with smart devices quickly. Amazon assigns over 1000 personnel globally to enhancing its voice assistant.
  • Grammarly – A brilliant use of NLP is Grammarly, a tool that checks writers’ text, points out grammatical errors, and suggests better phrases.
  • Google Translate – Google Translate also uses NLP to translate a webpage from one language to another by understanding its content.

What are the Challenges faced by NLP?

NLP, though a technology with many advantages, is not yet fully developed. For example, semantic and grammatical analysis is still a challenge for NLP. Other difficulties include sarcasm, since NLP cannot easily work out how the meaning of words changes with the speaker’s emphasis. NLP is also challenged by the fact that dialects vary from region to region.

On a final note, Natural Language Processing is a very handy tool, although it is still developing and faces some difficulties. Recent developments in NLP have made it a gem for the technology giants. The future of NLP through Machine Learning and Deep Learning seems quite bright.