A Short History of Data Science

Over the past two decades, tremendous progress has been made in the field of Information Technology, with exponential growth in both technology and machines. "Data" and "analytics" have become two of the most commonly used words of the past decade. Because the two are closely interrelated, it is essential to understand how they relate to each other and how they are evolving and reshaping businesses.

Data Science only came to be widely recognised as a field of study around 2011, although related names had been in use since 1962.

The development of Data Science can be summarised in six stages:

Stage 1: Contemplating the power of data
This stage, beginning in the early 1960s, saw the rise of the data warehouse, in which business and transaction data were centralised into a vast repository. In 1962, John Tukey published the article "The Future of Data Analysis", which established a relationship between statistics and data analysis. In 1974, another data enthusiast, Peter Naur, gained recognition for his work Concise Survey of Computer Methods, in which he used the term "Data Science", a term that would grow into a vast field with many applications in the 21st century.

Stage 2: More research on the importance of data
In this period, businesses began researching the value of collecting data on a large scale. In 1977, the International Association for Statistical Computing (IASC) was founded. In the same year, Tukey published his second major work, Exploratory Data Analysis, arguing that more emphasis should be placed on using data to suggest hypotheses to test, and that exploratory and confirmatory data analysis should proceed side by side. The year 1989 saw the first workshop on Knowledge Discovery in Databases (KDD), which has since grown into the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

Stage 3: Data Science gained attention
The early forms of data markets began to appear during this phase, and data science started attracting the attention of businesses. The idea of analysing data was sold and popularised; the 1994 Business Week cover story titled "Database Marketing" reflects this rise. Businesses began to see the value of collecting data and applying it for profit, and various companies started stockpiling massive amounts of data. However, they did not yet know what to do with it or how to use it for their benefit. This marked the beginning of a new era in the history of Data Science.

The term Data Science appeared again in 1996, in the title of the International Federation of Classification Societies (IFCS) conference held in Kobe, Japan. In the same year, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth published "From Data Mining to Knowledge Discovery in Databases", in which they described data mining and stated: "Data mining is the application of specific algorithms for extracting patterns from data."

The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, became essential to ensure that useful knowledge is derived from the data.

Stage 4: Data Science started being practised
The dawn of the 21st century saw significant developments in the history of data science. Throughout the 2000s, various academic journals began to recognise data science as an emerging discipline, and data science and big data seemed to fit ideally with the developing technology. Another notable figure who contributed greatly to the field is William S. Cleveland. He co-edited Tukey's collected works, developed valuable statistical methods, and in 2001 published the paper "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics".

Cleveland put forward the notion that data science was an independent discipline and named six areas in which data scientists should be educated: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.

Stage 5: A New Era of Data Science
By now, the world had seen plenty of the advantages of analysing data. The term "data scientist" is attributed to Jeff Hammerbacher and DJ Patil, who chose the word carefully around 2008 while building the data teams at Facebook and LinkedIn respectively; a buzzword was born. The term "data science" was not yet prevalent, but it was proving incredibly useful and developing rapidly. In 2013, IBM shared the statistic that 90% of the world's data had been created in the previous two years alone. By this time, companies had also begun to view data as a commodity on which they could capitalise, and the importance of transforming large volumes of data into usable information and finding usable patterns in it gained emphasis.

Stage 6: Data Science in Demand
The major tech giants saw significant growth in demand for their products after applying data science. Apple credited big data and data mining for increased sales, and Amazon reported that it had sold more Kindle books online than ever before. Companies like Google and Microsoft used deep learning for speech and voice recognition, and AI techniques further enhanced the use of data. Data became so precious that companies started collecting all kinds of data from all sorts of sources.

Putting it all together, data science did not have a prestigious beginning and was long ignored by researchers, but once its importance was properly understood by researchers and businesses alike, it helped them generate a great deal of profit.

Ethical issues in Artificial Intelligence – Problems and Promises

With the growth of Artificial Intelligence (AI) in the 21st century, the ethical issues surrounding AI have grown in importance along with the technology itself. Ethics in AI is typically divided into robo-ethics and machine ethics. Robo-ethics concerns the moral behaviour of humans as they design, construct, and use artificially intelligent beings, while machine ethics concerns the ethical conduct of artificial moral agents (AMAs). In today's world, countries are stockpiling weapons, artificially intelligent robots, and other AI-driven machines, so it becomes important to analyse the risks of artificial intelligence, such as whether it will take over major jobs and how its uncontrolled or unethical use could affect humanity. These ethics were formulated to protect humanity from the ill effects and risks of artificial intelligence.

Robotics is unarguably one of the major topics within artificial intelligence. Robot ethics, more popularly known as roboethics, concerns the morality of how humans design, construct, use, interact with, and treat robots. It considers how artificially intelligent beings may be used to harm humans and how they may be used to benefit them, and it emphasises that machines with artificial intelligence should prioritise human safety above everything else while keeping human morality in perspective.

Can AI be a threat to human dignity?

The first prominent voice against the potential ill effects of an artificially intelligent being was raised in 1976, when Joseph Weizenbaum argued that AI should not be used to replace people in positions that require respect and care, such as:

  • A customer service representative
  • A therapist
  • A soldier
  • A police officer
  • A judge

Weizenbaum explained that we require authentic feelings of empathy from people in these positions; if machines replace them, the people they serve will feel alienated, devalued, and frustrated. However, there are also voices in support of AI on the question of partiality, arguing that a machine could be impartial and fair.

Biases in AI Systems

The most widespread use of AI today is in voice and facial recognition, and cases of AI bias are increasing accordingly. Some of these systems have real business implications and directly affect people. A biased training set will result in a biased predictor, and bias can creep into algorithms in many ways, making it one of the biggest threats in AI. As a result, large companies such as IBM and Google have started researching ways to detect and address bias.
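As a minimal, entirely hypothetical illustration of how a skewed training set produces a skewed predictor, the sketch below trains a classifier on synthetic data in which one group was historically approved less often at the same skill level, then compares per-group selection rates; the groups, features, and numbers are invented for the example.

```python
# Toy demonstration (synthetic data) of a biased training set yielding a biased model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                 # 0 = group A, 1 = group B (hypothetical)
skill = rng.normal(0, 1, n)                   # the feature that *should* drive the decision

# Historical labels are biased: group B was approved less often at equal skill.
label = (skill - 0.8 * group + rng.normal(0, 0.5, n) > 0).astype(int)

X = np.column_stack([skill, group])           # the model can "see" group membership
pred = LogisticRegression().fit(X, label).predict(X)

for g, name in [(0, "A"), (1, "B")]:
    print(f"selection rate for group {name}: {pred[group == g].mean():.2f}")
# A large gap between the two rates (a demographic parity difference) is one
# simple signal that the learned predictor has inherited the historical bias.
```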

Weaponization of Artificial Intelligence

Ever since Weizenbaum questioned the arming of machines in 1976, there have been disputes over whether robots should be given some degree of autonomous function.

There has been a recent outcry over the engineering of artificial intelligence weapons, including fears of a robot takeover of humanity. In the near future, these AI weapons will present a type of danger far different from that of human-controlled weapons, and powerful nations have already begun to fund programs to develop them.

"If any major military power pushes ahead with AI weapon development, a global arms race is virtually inevitable, and the endpoint of this technological trajectory is obvious: autonomous weapons will become the Kalashnikovs of tomorrow," read the words of an open letter signed by Skype co-founder Jaan Tallinn and many MIT professors, among other opponents of AI weaponry.

Machine ethics, or machine morality, is the field of research concerned with designing Artificial Moral Agents (AMAs): robots and artificially intelligent beings that are made to behave morally, or as though moral. The science-fiction writer Isaac Asimov considered the issue in the 1950s in his famous collection I, Robot, in which he proposed his three fundamental Laws of Robotics. His work also suggests that no set of fixed laws can sufficiently anticipate all possible circumstances. In 2009, during an experiment at the Laboratory of Intelligent Systems at the École Polytechnique Fédérale de Lausanne in Switzerland, robots that were programmed to cooperate eventually learned to lie to each other in an attempt to hoard a beneficial resource.

In conclusion, Artificial Intelligence is a necessary evil. Friendly AI could be a gigantic leap forward in human technological development, and it comes with a set of miraculous advantages. However, if it falls into the wrong hands, the destruction could be unimaginable and unstoppable. As Claude Shannon put it, "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines." Thus, ethics in the age of artificial intelligence is supremely important.

Video surveillance and video analytics

Video analytics was invented with one motive: to help review the ever-growing hours of surveillance video that a security guard or a system manager (or any human) would never have time to watch. Video surveillance systems equipped with video analytics can help us find minor details that cannot be perceived by the naked eye. Video Analytics, or Video Content Analysis, is the computerised analysis of video footage that uses algorithms to differentiate between object types and identify specific behaviour or actions in real time, providing alerts and insights to users. Since video analytics is based on artificial intelligence, experience plays a significant role: a well-trained model can pick out very minute details in video footage.

This technical capability is being used in a wide range of domains, including entertainment, healthcare, retail, transport, home automation, flame and smoke detection, safety, and security.

Video analytics relies on useful video input. To make the video useful, the following techniques are commonly applied to improve the quality of the recorded video (one of them is sketched just below the list):
1. Video Denoising
2. Image Stabilisation
3. Unsharp Masking
4. Super-Resolution
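As one concrete example of these enhancement steps, unsharp masking can be approximated in a few lines with OpenCV by blurring a frame and adding back a weighted difference between the original and the blur. This is only a minimal sketch; the file name is a placeholder and the weights would be tuned per camera.

```python
# Minimal unsharp-masking sketch with OpenCV (file name and weights are placeholders).
import cv2

frame = cv2.imread("frame.jpg")                      # one frame of surveillance footage
blurred = cv2.GaussianBlur(frame, (9, 9), 10)        # low-pass version of the frame

# sharpened = 1.5 * original - 0.5 * blurred, i.e. add back the high-frequency detail
sharpened = cv2.addWeighted(frame, 1.5, blurred, -0.5, 0)

cv2.imwrite("frame_sharpened.jpg", sharpened)
```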

What are the Commercial Applications of Video Analytics?

CCTV Systems – This is the most widespread application of video analytics. VCA (Video Content Analysis) is either distributed on the cameras (at the edge) or centralised on dedicated processing systems. These CCTV systems can, for example, detect and report suspicious activities of shoppers in a store. Another popular example is the PIDS (Perimeter Intrusion Detection System), deployed in areas whose perimeter extends over a large radius, such as airports, seaports, and railways. With this technology, intrusions can be tracked in real time, giving sufficient time to react.
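Before moving on to other applications, here is a rough sketch of how such an alert might be raised on an edge device using OpenCV's background subtraction; the video path and the area threshold are placeholders, and a real PIDS would use a trained detector rather than this simple motion heuristic.

```python
# Rough motion-alert sketch using background subtraction (path and threshold are placeholders).
import cv2

cap = cv2.VideoCapture("perimeter_camera.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                        # foreground (moving) pixels
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if any(cv2.contourArea(c) > 1500 for c in contours):  # ignore tiny noise blobs
        print("ALERT: possible intrusion detected in this frame")

cap.release()
```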

  • Traffic Systems – Deploying video analytics at busy intersections in crowded cities around the world can be a massive time-saver for both the public and the government. At peak times of the day, when traffic is heaviest, targeted use of analytics can help detect and relieve congestion.
  • Counter-Flow Detection – Walking against the flow in certain locations, such as airport security checkpoints and gates, can be a sign of danger and can potentially result in terminal shutdowns. Likewise, a vehicle entering a one-way street in the wrong direction can cause congestion affecting a large number of people. Video analytics allows such events to be detected and responded to quickly.
  • Suspect Search – Facial-recognition data can be combined with video analytics to detect criminals at high-security locations such as airport immigration counters, baggage collection areas, and taxi stands. This can lead to the smooth and swift apprehension of such people or the removal of suspicious objects. Time is of the essence when looking for something critical.
  • Long Queue Problems at Shopping Centres – In densely populated countries such as China and India, store crowds increase significantly during the festive season. Trends in the data can be used to analyze the crowd and make short-term adjustments, increasing store efficiency and saving customers' time.
  • Reducing Retail Shrinkage – Retail and logistics companies can use video surveillance analytics to minimize inventory loss significantly. The model is trained to detect unusual activities such as presence at unexpected times, unauthorized access, or suspicious movement of inventory.
  • Improving Patient Satisfaction – Video analytics can help hospitals and clinics improve the overall patient experience. AI-enabled cameras can continuously monitor patients waiting to see the doctor and ensure they are attended to within a given time; alerts can also be sent to staff about any patient who has been left unattended for too long.

Video analytics is a smart way of engaging customers, reducing wasted time, and improving security. The volume of video data collected is massive, and it would be practically impossible for humans to review it without computers. With the fast pace of life and the amount of video content produced today, video analytics is a lifesaver in many fields.

Everything you need to know about Automated Machine Learning

What is Automated Machine Learning?

It is the term used for technology that automates the end-to-end process of applying machine learning to real-world problems. A typical machine learning problem requires a dataset consisting of input data on which a model is to be trained. The input data may not arrive in a form to which every machine learning algorithm can be applied, so an ML expert needs to implement the appropriate procedures (including data pre-processing, feature scaling, and feature extraction) to produce a dataset suitable for machine learning. Building the model then involves selecting the algorithm that maximises performance on that dataset. Many of these steps are beyond the abilities of non-experts. With this in mind, AutoML was proposed as an AI-based solution to the daunting challenge of applying machine learning, and AutoML tools for Python and R started gaining popularity. (Read – AI and ML. Are they one and the same?)
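To make those manual steps concrete, here is a small scikit-learn sketch of the kind of pipeline an expert would otherwise wire together by hand (scaling, feature extraction, model choice, and hyperparameter search), which is exactly what AutoML tools try to automate; the dataset and the parameter grid are chosen purely for illustration.

```python
# The manual pipeline an expert builds by hand, which AutoML aims to automate.
# Dataset and parameter grid are illustrative only.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),              # feature scaling
    ("pca", PCA()),                           # feature extraction
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = {"pca__n_components": [2, 3], "clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```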

What is the Need for AutoML?

The idea of AutoML took off with the development in the field of Artificial Intelligence. It all took shape when Jeff Dean, Google’s Head of AI, suggested that “100x computational power could replace the need for machine learning expertise”. This raised several questions:

Do hundreds of thousands of developers need to “design new neural nets for their particular needs,” or is there an effective way for Neural Networks to generalize similar problems? Or can a large amount of computation power replace machine learning expertise?

Clearly, the answer is NO. Many factors support the idea of AutoML:

  • Shortage of machine learning expertise
  • Machine learning expertise is expensive

AutoML cannot replace a machine learning expert in large organizations that require high performance, but it can be a cost-effective and useful option for smaller organizations.

Applications of AutoML

AutoML can be used for the following tasks, via AutoML platforms such as Google Cloud AutoML:

  • Automated Data Preparation
    This involves column type detection, intent detection, and automated task detection within the dataset.
  • Feature Engineering
    This includes feature scaling, meta-learning, and feature selection.
  • Automated Model Selection
    AutoML can search over candidate algorithms and their hyperparameters to select a suitable model.
  • Automated Problem Checking
    Problem checking and debugging can be automated.
  • Automated Analysis of Results
    Automating the analysis of the results obtained saves time and money.

Here is a good read – Two Real Life Examples of Google’s Automated Machine Learning.

Popular AutoML libraries such as Featuretools, auto-sklearn, MLBox, TPOT, H2O, and Auto-Keras are the ones contributing to an enhanced AutoML experience; a short usage sketch follows.
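As a sketch of how such a library is typically used, the snippet below runs TPOT on a toy dataset; the generation and population settings are arbitrary, and the exact API may vary slightly between TPOT versions.

```python
# Hedged TPOT usage sketch on a toy dataset (settings are arbitrary;
# the API may differ slightly between TPOT versions).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=2)
automl.fit(X_train, y_train)                      # searches preprocessing + model pipelines
print("test accuracy:", automl.score(X_test, y_test))
automl.export("best_pipeline.py")                 # writes the winning pipeline as plain code
```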

Advantages of AutoML

  • The libraries are effortless to install.
  • The introduction of Cloud AutoML has sped up the development of AutoML.
  • It is cost-effective and labour-efficient.
  • It requires a lower level of expertise.

Limitations of AutoML

Although it comes with a set of advantages, advanced AutoML introduces its own hyperparameters, which themselves need to be learnt. AutoML can usefully be applied to tasks that can be generalised, but for tasks that are unique and require some level of expertise, AutoML turns out to be a disaster.

Future of AutoML

Automated Machine Learning (AutoML) has been gaining traction within the data science community. This surge of interest is reflected in the development and release of numerous open-source AutoML tools and libraries, mentioned above, and in the emergence of businesses focused on building and commercialising AutoML systems (such as DataRobot, DarwinAI, H2O.ai, and OneClick.ai). AutoML is a hot topic for the industry, but it is not set to replace data scientists. Quite apart from the difficulty of automating many data science tasks, its purpose is to assist data scientists and free them from the burden of repetitive, less demanding jobs that can be generalised, so that they can invest their time in tasks that are more challenging, creative, and harder to automate. (AutoML: The Next Wave of Machine Learning)

In conclusion, we live in an era where the growth of data outpaces our ability to make sense of it. AutoML is an exciting field that has been in the spotlight and promises to mitigate this problem through further developments in Artificial Intelligence.

We expect significant progress in this field in the near future, and we expect AutoML systems to help solve many of the challenges we face.

Quantum Computing – The Unexplored Miracle

What is Quantum Computing?
Quantum computing is the use of quantum-mechanical phenomena such as superposition and entanglement to perform computation. A quantum computer is a device that performs such computations; it can be studied theoretically or implemented physically. The field of quantum computing is a sub-field of quantum information science, which also includes quantum cryptography and quantum communication. The idea of quantum computing took shape in the early 1980s, when Richard Feynman and Yuri Manin expressed the idea that a quantum computer had the potential to simulate things that a classical computer could not.

The year 1994 saw further development when Peter Shor published an algorithm able to efficiently solve a problem, integer factorisation, that underpins much of asymmetric cryptography and is considered very hard for a classical computer. There are currently two main approaches to physically implementing a quantum computer: analog and digital. Analog approaches are further divided into quantum simulation, quantum annealing, and adiabatic quantum computation.

Basic Fundamentals of Quantum Computing
Digital quantum computers use quantum logic gates to perform computation. Both approaches use quantum bits, or qubits. Qubits are fundamental to quantum computing and are somewhat analogous to bits in a classical computer. Like a regular bit, a qubit can be found in the 0 or 1 state; the speciality is that it can also be in a superposition of the 0 and 1 states. However, when a qubit is measured, the result is always either 0 or 1; the probabilities of the two outcomes depend on the quantum state the qubit was in.

Principle of Operation of Quantum Computing
A quantum computer with a given number of quantum bits is fundamentally very different from a classical computer composed of the same number of bits. For example, representing the state of an n-qubit system on a traditional computer requires the storage of 2^n complex coefficients, while to characterize the state of a classical n-bit system it is sufficient to provide the values of the n bits, that is, only n numbers.

A classical computer has a memory made up of bits, where each bit is represented by either a one or a zero. A quantum computer, on the other hand, maintains a sequence of qubits, which can represent a one, a zero, or any quantum superposition of those two qubit states; a pair of qubits can be in any quantum superposition of 4 states, and three qubits in any superposition of 8 states. In general, a quantum computer with n qubits can be in any superposition of up to 2^n different states. Quantum algorithms are often probabilistic, as they provide the correct solution only with a certain known probability.
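A small NumPy sketch can make these 2^n amplitudes concrete: starting from two qubits in |00⟩, a Hadamard gate followed by a CNOT produces a Bell state, whose four amplitudes give the measurement probabilities described above. This is, of course, only a classical simulation written to illustrate the state vector.

```python
# Classical NumPy simulation of two qubits, illustrating the 2^n amplitudes.
import numpy as np

ket0 = np.array([1.0, 0.0])                           # the |0> state of one qubit
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)          # controlled-NOT on two qubits

# Start in |00>, put the first qubit into superposition, then entangle with CNOT.
state = np.kron(H @ ket0, ket0)                       # (|00> + |10>) / sqrt(2)
state = CNOT @ state                                  # Bell state (|00> + |11>) / sqrt(2)

probs = np.abs(state) ** 2                            # 2^2 = 4 amplitudes -> 4 probabilities
for label, p in zip(["00", "01", "10", "11"], probs):
    print(f"P(|{label}>) = {p:.2f}")                  # 0.50, 0.00, 0.00, 0.50

# A measurement collapses the state to one outcome with exactly these probabilities.
print("sampled outcome:", np.random.choice(["00", "01", "10", "11"], p=probs))
```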

What is the Potential that Quantum Computing offers?
Quantum computing is still such a specialised field that relatively few people work in it, which leaves a lot of room for development and a great deal of scope. Some of the areas in which it is making inroads today are:

  • Cryptography – A quantum computer could efficiently solve the integer factorisation problem using algorithms such as Shor's. This ability would allow it to break many of the public-key cryptographic systems in use today.
  • Quantum Search – Quantum computers offer polynomial speedups for some problems. The best-known example is quantum database search, which can be solved by Grover's algorithm using quadratically fewer queries to the database than any classical algorithm requires (see the sketch after this list).
  • Quantum Simulation – Since chemistry and nanotechnology rely on understanding quantum systems, and such systems are impossible to simulate efficiently classically, many believe quantum simulation will be one of the most important applications of quantum computing.
  • Quantum Annealing and Adiabatic Optimization
  • Solving Linear Equations – The quantum algorithm for linear systems of equations, or "HHL algorithm", named after its discoverers Harrow, Hassidim, and Lloyd, is expected to provide a speedup over its classical counterparts.
  • Quantum Supremacy
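To give a flavour of the quantum search item above, here is a tiny classical NumPy simulation of Grover's algorithm on a database of N = 4 items (two qubits); with four items, a single Grover iteration already concentrates essentially all of the probability on the marked item, whereas a classical search would need several queries.

```python
# Classical NumPy simulation of Grover's search over N = 4 items (2 qubits).
import numpy as np

N = 4
marked = 2                                    # index of the item we are searching for

state = np.full(N, 1 / np.sqrt(N))            # uniform superposition over all items

oracle = np.eye(N)
oracle[marked, marked] = -1                   # the oracle flips the sign of the marked item

s = np.full(N, 1 / np.sqrt(N))
diffusion = 2 * np.outer(s, s) - np.eye(N)    # "inversion about the mean"

state = diffusion @ (oracle @ state)          # one Grover iteration

probs = state ** 2
print("probability of each item:", np.round(probs, 3))   # -> [0. 0. 1. 0.]
print("most likely item:", int(np.argmax(probs)))        # -> 2, the marked item
```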

In conclusion, quantum computers could spur breakthroughs in science, medications to save lives, machine learning methods to diagnose illnesses sooner, materials to make more efficient devices and structures, financial strategies to live well in retirement, and algorithms to direct resources such as ambulances quickly. The scope of quantum computing is beyond imagination, and further developments in this field will have a significant impact on the world.

What is Natural Language Processing?

What is Natural Language Processing (NLP)?

Natural Language Processing, commonly abbreviated as NLP, is a subfield of computer science and artificial intelligence. It is mainly concerned with the interaction between computers and the languages humans speak, such as English, Italian, and French, among many others. In particular, it is used to program machines to process and analyze large amounts of natural language data.

Developing NLP applications is quite challenging because computers traditionally require humans to communicate with them in a precise, unambiguous programming language. Human speech, however, is not always precise; it is often ambiguous and depends on factors such as the emphasis placed on a particular word or expression, factors that a computer finds very difficult to understand.

How does Natural Language Processing work?

Syntactic and semantic analysis are the two main techniques used in NLP. Syntax is the arrangement of words in a sentence so that they make grammatical sense. Different syntactic methods used are listed below (a short sketch of a few of them follows the list):

  • Parsing
  • Word segmentation
  • Sentence breaking
  • Morphological segmentation and
  • Stemming
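As a small illustration of some of these syntactic steps, the NLTK sketch below performs sentence breaking, word segmentation, and stemming; it assumes NLTK is installed and that the "punkt" tokenizer data has been downloaded (resource names can vary slightly between NLTK versions).

```python
# Sentence breaking, word segmentation, and stemming with NLTK
# (assumes `pip install nltk`; the example text is arbitrary).
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)

text = "NLP breaks language into smaller pieces. Computers then analyse those pieces."
sentences = sent_tokenize(text)                    # sentence breaking
words = word_tokenize(sentences[0])                # word segmentation
stems = [PorterStemmer().stem(w) for w in words]   # stemming, e.g. "breaks" -> "break"

print(sentences)
print(words)
print(stems)
```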

Semantics concerns the use and meaning behind the words. NLP applies algorithms to understand the meaning and structure of sentences. The techniques used by NLP in semantic analysis include:

  • Named Entity Recognition
  • Natural Language Generation

The current approaches to NLP are mainly based on deep learning, a type of AI that examines and uses patterns in data to improve a program's understanding. It relies largely on supervised learning, in which a model is trained on a labelled training set and evaluated on a separate test set.

Three tools very commonly used for NLP are NLTK, Gensim, and Intel NLP Architect. The Natural Language Toolkit (NLTK) is an open-source Python module with data sets and tutorials. Gensim is a Python library for topic modelling and document indexing. Intel NLP Architect is another Python library for deep learning topologies and techniques.
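As a minimal sketch of named entity recognition, one of the semantic techniques listed above, the snippet below uses NLTK; it assumes the tokenizer, part-of-speech tagger, and named-entity chunker resources have been downloaded, the sentence is just an example, and the exact entities reported may vary between NLTK versions.

```python
# Named entity recognition with NLTK (the required corpora/models are downloaded first;
# resource names can differ slightly between NLTK versions).
import nltk

for resource in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(resource, quiet=True)

sentence = "Google was founded by Larry Page and Sergey Brin in California."
tokens = nltk.word_tokenize(sentence)        # word segmentation
tagged = nltk.pos_tag(tokens)                # part-of-speech tags
tree = nltk.ne_chunk(tagged)                 # groups tagged tokens into named entities

for subtree in tree:
    if hasattr(subtree, "label"):            # entity chunks are nltk.Tree objects
        entity = " ".join(word for word, tag in subtree.leaves())
        print(subtree.label(), "->", entity) # e.g. PERSON -> Larry Page
```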

What are the Uses of Natural Language Processing?

NLP can be traced back to Alan Turing's 1950 article "Computing Machinery and Intelligence", but its widespread use began only in the 1980s, after the introduction of machine learning. Before 1980, most natural language processing systems were based on complex sets of hand-written rules; writing these rules involved a lot of labour and the results were inaccurate given the diversity of real language. The introduction of machine learning sped up the development of natural language processing considerably.

Natural language processing is very widely used in our daily routines today. It finds applications in:

  • Chatbots – Chatbots handle various clients and answer their queries without considerable human effort. They are trained on vast sets of data and hence pick out only the essential parts of a conversation. Companies like Uber and Zomato use chatbots to minimize human involvement.
  • Voice Assistants – The most significant use of NLP is for this purpose. Technology giants like Google, Microsoft, and Amazon each use their own voice assistant to help users communicate with smart devices quickly. Amazon has assigned over 1,000 people globally to enhancing its voice assistant.
  • A very clever use of NLP is Grammarly, a tool that keeps a check on writers' text, points out grammatical errors, and suggests better phrasing.
  • Google Translate also uses NLP to translate a web page from one language to another by understanding its content.

What are the Challenges faced by NLP?

NLP, though a new technology with many advantages, is not yet fully developed. Semantic and grammatical analysis, for example, is still a challenge for NLP. Another difficulty is that NLP does not handle sarcasm easily, since it cannot work out how the meaning of words changes with a speaker's emphasis. NLP is also challenged by the fact that people's dialects change from region to region.

On a final note, natural language processing is a very handy tool, although it is still developing and faces some difficulties. Recent developments in NLP have made it a gem for the technology giants, and the future of NLP through machine and deep learning looks quite bright.