Hacking the AI: The Dark Side of Machine Learning 


    Introduction 

    As we venture into the era of the 4th Industrial Revolution, where data reigns supreme, the utilization of Machine Learning has surged dramatically. From email filters to self-driving cars, Machine Learning has become an integral part of our lives. However, this technological advancement brings along a new set of challenges. Adversarial Machine Learning, a branch of AI (Artificial Intelligence), exposes vulnerabilities within these models, enabling adversaries to manipulate them for malicious purposes. In this blog, we will explore the concept of Adversarial Machine Learning, examine real-life examples, and discuss potential defence mechanisms. 

    The Adversarial Nature of Machine Learning  

    While Machine Learning offers immense benefits, it is not impervious to manipulation. Adversaries can exploit the vulnerabilities of AI systems by introducing inaccurate or misleading data during training or by crafting malicious inputs to deceive trained models. This adversarial behavior poses significant risks in various domains. For instance, hackers could alter stop signs in a way that confuses self-driving cars, potentially leading to accidents. Similarly, internet trolls manipulated Microsoft’s AI chatbot, Tay, into generating offensive content, resulting in its prompt shutdown. Such instances emphasize the need for robust defence mechanisms. 


    Examples of Adversarial Attacks  

    To comprehend the extent of adversarial attacks, let us examine a few real-world instances: Researchers from Samsung, the Universities of Washington, Michigan, and UC Berkeley modified stop signs subtly, rendering them unrecognizable to self-driving cars’ computer vision algorithms. This manipulation could cause unpredictable behavior and potential accidents. Moreover, researchers at Carnegie Mellon University discovered that wearing special glasses could deceive facial recognition systems into misidentifying individuals as celebrities. These examples highlight the ease with which adversaries can exploit vulnerabilities, necessitating proactive measures to mitigate the risks associated with adversarial machine learning.  
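    To make the manipulation concrete, below is a minimal, illustrative sketch of the Fast Gradient Sign Method (FGSM), one well-known way of crafting adversarial inputs against an image classifier. It is not the specific technique used in the studies above; the PyTorch model, image, and label are assumed placeholders.

    ```python
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, image, label, epsilon=0.03):
        """Craft an adversarial example with the Fast Gradient Sign Method.

        image: tensor of shape (1, C, H, W) with values in [0, 1]
        label: tensor of shape (1,) holding the true class index
        """
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)   # loss w.r.t. the true label
        loss.backward()                               # gradient of the loss w.r.t. the input

        # Nudge every pixel slightly in the direction that increases the loss.
        perturbed = image + epsilon * image.grad.sign()
        return perturbed.clamp(0, 1).detach()

    # Hypothetical usage with a pre-trained classifier and one labelled image:
    # adv_image = fgsm_attack(model, image, label)
    # model(adv_image).argmax(dim=1)  # often differs from the true label
    ```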

    Defending Against Adversarial Attacks  

    Protecting Machine Learning models from adversarial attacks requires a multi-faceted approach. Adversarial training, which involves training models with adversarial examples, can enhance resilience. Another strategy involves deploying ensemble models, which combine multiple models to collectively make predictions, making it harder for adversaries to manipulate them. Additionally, the development of more generalized models that can withstand diverse adversarial inputs can enhance robustness. However, these defence mechanisms often come at a cost, both in terms of computational resources and time required for development. Therefore, there is a pressing need for further research and innovation to strengthen the defences against adversarial attacks in Machine Learning systems. 
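    As a hedged illustration of adversarial training, here is a minimal PyTorch sketch of a single training step that mixes clean and FGSM-perturbed examples. The model, optimiser, and batch are assumed to exist, and the equal loss weighting and epsilon value are purely illustrative.

    ```python
    import torch
    import torch.nn.functional as F

    def adversarial_training_step(model, optimizer, images, labels, epsilon=0.03):
        """One step of adversarial training: learn from clean and perturbed data."""
        # Craft FGSM-perturbed versions of the current batch.
        images_req = images.clone().detach().requires_grad_(True)
        F.cross_entropy(model(images_req), labels).backward()
        adv_images = (images + epsilon * images_req.grad.sign()).clamp(0, 1).detach()

        # Optimise the model on both the clean and the adversarial batch.
        optimizer.zero_grad()
        loss = 0.5 * F.cross_entropy(model(images), labels) \
             + 0.5 * F.cross_entropy(model(adv_images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()
    ```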


    Conclusion

    Machine Learning has revolutionized numerous domains, but it also introduces new risks through adversarial attacks. Adversaries can exploit vulnerabilities, causing potential harm and disruption. To protect against these attacks, researchers and practitioners are actively exploring various defence mechanisms. However, this remains an ongoing challenge, and the development of effective and efficient defence strategies is paramount.

    As we continue to embrace the power of Machine Learning, it is crucial to address the risks associated with adversarial machine learning and strive towards building more secure and resilient AI systems that can withstand the evolving threat landscape. 

    The Machine Learning Lifecycle

     

    The machine learning life cycle is a cyclic process to develop, train, and use machine learning models. It elucidates each step that an organization needs to follow to take advantage of machine learning techniques and use them in their business. In simple words, it defines an end-to-end process of solving a machine learning problem.

    The machine learning life cycle is data-driven because the model and the output of training are tied to the data on which the model was trained. Thus, the initial steps of the cycle deal exclusively with raw data. Patterns are then extracted from the data and finally used to predict an attribute or element. Let us understand each of these steps in detail.

    PART 1: From data collection to exploratory data analysis

    • Data collection– Gathering data is the first step of the machine learning life cycle. Data can be collected from various sources such as the internet and databases and can be stored in formats like CSV and XML. The quality and quantity of data determine the efficiency of the output obtained. The more the data available, the more accurate the prediction.
    • Data exploration– The next step after data collection is data exploration. This step includes looking for similarities between elements of the dataset, finding correlations, and managing inconsistent or missing information that could skew findings later.
    • Data wrangling- The collected raw data may have issues like missing values, duplicate or insignificant values, and noise. Dirty data can affect the accuracy of the predicted outcome. To fit the machine learning model well, the prepared data should be formatted and cleaned. Data wrangling is the process of cleaning complex datasets for easy access and analysis (a minimal sketch follows this list). It consists of the following processes- data cleaning, variable selection for the model, and transformation of data into a proper format for analysis.
    • Data interpretation and analysis- After data cleaning, machine learning algorithms are used to build a suitable model. A large variety of machine learning models can be used, for example regression models, clustering models, and reinforcement learning models. These models use various statistical and mathematical methods to analyze and visualize data and predict outcomes. Statistical plots like histograms, pair plots, distribution plots, and heat maps are used to analyze the data and draw comparisons between its elements.
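    The following is a small, hypothetical sketch of the exploration and wrangling steps above using pandas; the file name and column names ("target", "category") are placeholders for whatever your dataset contains.

    ```python
    import pandas as pd

    df = pd.read_csv("raw_data.csv")            # hypothetical raw dataset

    # Data exploration: look at the data, its summary statistics, and correlations
    print(df.head())
    print(df.describe())
    print(df.isna().sum())                      # missing values per column
    print(df.select_dtypes("number").corr())    # correlations between numeric columns

    # Data wrangling: clean the raw data before modelling
    df = df.drop_duplicates()                               # remove duplicate rows
    df = df.dropna(subset=["target"])                       # drop rows missing the label
    df = df.fillna(df.median(numeric_only=True))            # impute remaining numeric gaps
    df["category"] = df["category"].astype("category")      # use proper dtypes for analysis
    ```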

    PART 2: From model training to evaluation

    • Train model – On completion of thorough data analysis, the entire dataset is split into training and testing sets. Training sets are used to fit and tune models. Training the model is required to understand patterns and trends from the cleaned data. Machine learning algorithms use this training data to find model parameters such as coefficients and intercepts, and then use these to predict outcomes for test data.
    • Test model – How are we tested when we attend university? – Through assessments and examinations. In machine learning as well, this needs to happen. Once the model is trained, it is tested. Testing the model determines its accuracy on the problem being dealt with. The test data is treated as new, unseen data whose output values are predicted by the model. Predictions are gathered for the test dataset from the trained model (see the sketch after this list).
    • Model deployment – Deployment of a machine learning model is the process of integrating a machine learning model into an existing production mechanism to make practical business decisions. If the resulting model produces accurate outputs efficiently, then it can be deployed to real-world systems.
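    To illustrate the train/test/evaluate flow described above, here is a minimal scikit-learn sketch. The feature matrix X and target y are assumed to come from an already-wrangled dataset, and linear regression is just one example of a model that could be fitted.

    ```python
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # X (features) and y (target) are assumed to be prepared earlier in the life cycle.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)              # training: learn coefficients and intercept

    predictions = model.predict(X_test)      # testing: predict on unseen data
    print("R^2 on test data:", r2_score(y_test, predictions))

    # If the accuracy is acceptable, the fitted model can be serialised for deployment,
    # e.g. joblib.dump(model, "model.joblib").
    ```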


     

    Importance of the machine learning lifecycle

    It is important because it describes the role of every person in a company who deals with data science initiatives and projects. It takes every project from beginning to completion and gives a high-level perspective of how the organization's data should be structured and handled to obtain practical business value and profit from it. If there is an error in the execution of any one of the steps in the Machine Learning life cycle, the resulting model will not give accurate values and will not be of any practical use to organizations.

    With Machine Learning gaining more traction in businesses and boosting their profit rates, a development life cycle that supports building custom Machine Learning algorithms and applications has become crucial.

    Understanding every step of the machine learning life cycle, and using that understanding to select and apply the most appropriate Machine Learning model, is the ultimate aim of studying it.

    Latest developments in Natural Language Processing (NLP)

    Ever wondered how robots and machines perceive a command given to them?

    Well, Natural language processing (NLP) gives them the ability to read, understand, and deduce useful information from human languages. Interpreting commands is just one of the many applications of natural language processing.

    As defined by Wikipedia, Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. It is concerned with how to program computers to process and analyze large amounts of natural language data. NLP finds numerous applications in today’s world, with major ones being chatbots, sentiment analysis, and market intelligence.

    Following are some of the major trends and advancements and some NLP examples that have dominated AI and the tech world in recent years-

    • Business Intelligence– There is a close connection between business intelligence and NLP. NLP facilitates the user’s interaction with complicated databases.
      • Using NLP, companies gain insights such as marketing and sales information, customer service feedback, brand reputation, and the current talent pool of a company.
      • Another popular method of NLP used in BI is opinion mining. It uses NLP to extract customer sentiments from their reviews and ratings.
    • Semantic modelling- Semantic analysis is the process of relating syntactic structures, from the levels of phrases, clauses, sentences, and paragraphs to the level of the writing as a whole, to their language-independent meanings. Its goal is to draw the exact meaning from a text. It processes the logical structure of the text, identifies the most relevant elements in it, and understands the relationships between those elements, which are then used in further analysis.
    • Chatbots– Natural Language Processing in AI is very popular. At least one-fourth of organizations will have chatbots, virtual customer assistants, or some other type of NLP included in their customer service systems by 2020. Chatbots learn semantic relations, understand the objective of the questions asked, and then automatically perform the filtration and organization necessary to serve a relatable and significant answer, rather than simply showing the data. For instance, Microsoft’s Cortana is helping many small and large-scale businesses do research and process data by voice.
    • Human-machine interaction- One of the most common examples of NLP in human-machine interaction is spam detection, where emails are filtered by NLP-based algorithms according to whether or not they are spam (a minimal sketch appears after this list).
    • Deep learning for NLP– Deep learning techniques like Recurrent Neural Networks are used to get accurate results after analyzing the data.
    • Supervised and unsupervised learning– Natural Language Processing in machine learning is used for text analytics, where statistical methods identify sentiments, expressions, and parts of speech.
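    As a small, self-contained example of the spam-detection idea mentioned above, here is a sketch of a supervised text classifier built with scikit-learn. The tiny toy dataset is purely illustrative; a real filter would be trained on many thousands of labelled emails.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy labelled examples (hypothetical); 'ham' means a legitimate email.
    emails = [
        "Win a free prize now, click here",
        "Meeting moved to 3pm, see the agenda attached",
        "Cheap loans, limited time offer, act now",
        "Can you review the quarterly report before Friday?",
    ]
    labels = ["spam", "ham", "spam", "ham"]

    classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
    classifier.fit(emails, labels)                            # supervised training on tagged text

    print(classifier.predict(["Claim your free prize now"]))  # expected to print ['spam']
    ```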

     

     


     

    Let us discuss the supervised and unsupervised learning aspects of NLP in more detail below since they house some of the most popular techniques.

    In supervised learning, a set of text documents is tagged with examples of what the machine should look for. These supplied examples are used to train a model that is later expected to analyze unlabelled or untagged text. Some of the most popular supervised learning NLP techniques are Support Vector Machines and Neural Networks. (A good read – Natural Language Processing (NLP) for Machine Learning)

    In unsupervised learning, the model is trained without pre-tagging. Clustering, Latent Semantic Indexing (LSI), and Matrix Factorization are some popular techniques in unsupervised learning NLP.

    • In clustering, similar types of data are grouped into sets and later sorted based on relevance using algorithms (see the sketch after this list).
    • LSI, on the other hand, involves identifying words and phrases that frequently occur together in the given text.
    • Matrix Factorization is different from the other two as it deals with breaking larger matrices into smaller ones using latent factors i.e., similarities between two or more items.
    • Reinforcement learning- Reinforcement learning along with supervised and unsupervised learning forms the three basic paradigms for ML. It allows machines and software applications to determine the ideal behaviour within a specific context. Tasks such as summarization of a text are performed by reinforcement learning algorithms.
    • Company monitoring- The impact of social media is irreplaceable. It has become an integral part of the everyday life of every individual, and perhaps this is why companies and organisations have started focusing on social media interactions, more than ever before, to promote and grow their business and extend their reach. Social media monitoring tools such as Buffer and Hootsuite have been built using the latest NLP algorithms. Tools like these help in monitoring a company’s engagement in the market.
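    Below is a minimal sketch of the clustering technique referred to in the list above, using scikit-learn to group a handful of hypothetical documents by topic. The documents and the choice of two clusters are illustrative only.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    documents = [
        "The new phone has a great camera and battery life",
        "Battery drains quickly on the latest phone model",
        "The restaurant served delicious pasta and dessert",
        "Great food and friendly service at the new restaurant",
    ]

    vectors = TfidfVectorizer().fit_transform(documents)      # turn text into numeric features
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
    print(kmeans.labels_)   # phone-related and restaurant-related documents fall into separate groups
    ```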

     

    General Natural Language Processing Tasks

    Now as we have seen various applications of NLP, let us walk through the general NLP tasks that are followed when NLP systems deal with a language-

    • Content categorization– Includes summarization along with content indexing, duplicate detection, and content alerts.
    • Topic discovery and modelling– Deducing meaning from the text and applying analytics.
    • Contextual extraction– Extracting information from text-based sources.
    • Sentiment analysis– Includes identifying specific moods and opinion mining.
    • Speech-to-text and text-to-speech conversion– Transforming voice commands into written text, and vice versa.
    • Document summarization– Generating concise summaries from large bodies of text.
    • Machine translation– Automatic translation of text or speech from one language to another.

    NLP has gained popularity since its inception. Devices like Amazon’s Alexa are being used widely all across the globe today. And for enterprises, business intelligence and customer monitoring are fast becoming popular and will dominate the sector in the coming years.

    Best Practices in Python and why Python is so popular

    Python is a versatile language that has attracted a broad base of users in recent times, and it has become one of the most popular programming languages. Its popularity grew exponentially over the last decade; according to one estimate, the previous five years saw more new Python developers than conventional Java/C++ programmers. Now the question is, why is Python so popular? The primary reasons are its simplicity, speed, and performance.

    Why does Python have an edge over the other programming languages? Let’s find out!

    • Everything is an object in Python
    • Support for Object-Oriented Programming – including multiple inheritance, instance methods, and class methods
    • Attribute access customization
    • List, dictionary, and set comprehensions
    • Generator expressions and generator functions (lazy iteration) – see the short sketch after this list
    • Standard library support for queues, fixed-precision decimals, and rational numbers
    • Wide-ranging standard library including OS access, Internet access, cryptography, and much more.
    • Strict nested scoping rules
    • Support for modules and packages
    • Python is used in the data science field
    • Python is used in machine learning and deep learning
    • Parallel Programming
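    As a short sketch of a few of the language features listed above (comprehensions, lazy generator expressions, and the standard library's queues, fixed-precision decimals, and rational numbers):

    ```python
    from collections import deque
    from decimal import Decimal
    from fractions import Fraction

    # List, dictionary, and set comprehensions
    squares = [n * n for n in range(10)]
    square_map = {n: n * n for n in range(10)}
    even_squares = {n * n for n in range(10) if n % 2 == 0}

    # A generator expression is lazy: values are produced only when requested.
    lazy_squares = (n * n for n in range(1_000_000))
    print(next(lazy_squares), next(lazy_squares))   # 0 1

    # Standard library: queues, fixed-precision decimals, rational numbers
    queue = deque([1, 2, 3])
    queue.append(4)
    queue.popleft()
    print(Decimal("0.1") + Decimal("0.2"))   # 0.3 exactly, avoiding float rounding error
    print(Fraction(1, 3) + Fraction(1, 6))   # 1/2
    ```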

    As a Python developer, you must know some basic techniques and practices which could help you by providing a free-flowing work environment. Some of the best practices in Python are listed below.

    Create Readable Documentation

    In Python, writing readable documentation is a best practice. You may find it a little burdensome, but it leads to clean code. For this purpose, you can use Markdown, reStructuredText, Sphinx, or docstrings. reStructuredText and Markdown are markup languages with plain-text formatting syntax that make it easy to mark up text and convert it into formats like HTML or PDF. Sphinx is a tool to create intelligent and beautiful documentation easily and export it to formats like HTML, while docstrings let you keep documentation in-line with your code.
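    A minimal example of an in-line docstring, written so that help() or a documentation generator such as Sphinx can pick it up (the function itself is just an illustration):

    ```python
    def fahrenheit_to_celsius(temperature_f: float) -> float:
        """Convert a temperature from Fahrenheit to Celsius.

        Args:
            temperature_f: Temperature in degrees Fahrenheit.

        Returns:
            The equivalent temperature in degrees Celsius.
        """
        return (temperature_f - 32) * 5 / 9

    # help(fahrenheit_to_celsius) prints the docstring, and Sphinx (with its
    # autodoc extension) can pull it into generated HTML or PDF documentation.
    ```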

    Follow Style Guidelines

    Python follows a system of community-generated proposals known as Python Enhancement Proposals (PEPs), which attempt to provide a basic set of guidelines and standards for a wide variety of topics relevant to proper Python development. One of the most widely referenced PEPs ever created is PEP 8, often called the “Python community Bible” for properly styling your code.
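    For illustration, here is the same small function written twice: once ignoring PEP 8 and once following it (snake_case names, spaces around operators, and a docstring):

    ```python
    # Not PEP 8 compliant: unclear name, camelCase, no spacing, no docstring
    def CalcAvg(l):return sum(l)/len(l)

    # PEP 8 compliant
    def calculate_average(values):
        """Return the arithmetic mean of a sequence of numbers."""
        return sum(values) / len(values)
    ```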

    Immediately Correct your Code

    When creating a Python application, it is almost always more beneficial in the long term to acknowledge and repair broken code quickly. (Join the Xaltius Academy to learn how!)

    Give Preference to PyPI over Manual Coding

    The above will help you obtain clean and elegant code. However, one of the best tools for improving your use of Python is the huge module repository known as the Python Package Index (PyPI). Whatever the level and experience of the Python developer, this repository will be very beneficial. Most projects initially begin by utilizing existing packages from PyPI, which hosts over 10,000 projects at the time of writing. There is undoubtedly some code there that will fulfil your project needs.

    Watch out for Exceptions

    The developer should watch out for exceptions. They creep in from anywhere and are difficult to debug.

    Example: One of the most annoying is the KeyError exception. To handle this, a programmer must first check whether or not a key exists in the dictionary.
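    For example, a few standard ways to avoid or handle a KeyError on a dictionary (the dictionary and keys here are hypothetical):

    ```python
    user = {"name": "Alice"}

    # Option 1: check that the key exists before accessing it
    if "email" in user:
        print(user["email"])

    # Option 2: supply a default value with dict.get()
    email = user.get("email", "not provided")

    # Option 3: catch the exception explicitly
    try:
        print(user["email"])
    except KeyError:
        print("No email address on record")
    ```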

    Write Modular and non-repetitive Code

    A class or function should be defined if some operation needs to be performed multiple times. This shortens your code while also increasing readability and reducing debugging time.
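    A small illustration of the idea, with hypothetical field names: the repeated cleaning steps are pulled into one reusable function.

    ```python
    raw_name, raw_city = "  Alice  Smith ", "  New  York "

    # Repetitive: the same cleaning steps copied for every field
    name = raw_name.strip().lower().replace("  ", " ")
    city = raw_city.strip().lower().replace("  ", " ")

    # Modular: the shared operation lives in one well-named function
    def normalise(text: str) -> str:
        """Trim whitespace, lowercase, and collapse double spaces."""
        return text.strip().lower().replace("  ", " ")

    name, city = normalise(raw_name), normalise(raw_city)
    ```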

    Use the right data structures

    The benefits of choosing the right data structure are well known: faster execution, reduced storage space, and more efficient code.
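    One classic illustration: membership tests are linear-time on a list but effectively constant-time on a set, so the right choice matters as data grows (the sizes below are arbitrary).

    ```python
    import timeit

    items_list = list(range(100_000))
    items_set = set(items_list)

    # The same lookup, repeated 1,000 times against each structure
    print(timeit.timeit(lambda: 99_999 in items_list, number=1_000))  # noticeably slower
    print(timeit.timeit(lambda: 99_999 in items_set, number=1_000))   # much faster
    ```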

    These are good practices that every Python developer should follow for a smooth experience with the language. Python is a growing language, and its increased use in Data Analytics and Machine Learning has proved very useful for developers. Python for AI has also gained popularity in recent years. In the upcoming years, Python looks set to have a very bright future, and programmers who are proficient in it will have an advantage.

    A Short History of Data Science

    Over the past two decades, tremendous progress has been made in the field of Information Technology, with exponential growth in technology and machines. Data and Analytics have become two of the most commonly used words over the past decade. Since they are interrelated, it is essential to know the relationship between them and how they are evolving and reshaping businesses.

    Although Data Science was only officially accepted as a field of study around 2011, different and related names had been in use since 1962.

    There are six stages in which the development of Data Science can be summarised-

    Stage 1: Contemplating the power of Data
    This stage witnessed the rise of the data warehouse, where business and transaction data were centralised into a vast repository. The period began in the early 1960s. In 1962, John Tukey published the article The Future of Data Analysis, a source that established a relation between statistics and data analysis. In 1974, another data enthusiast, Peter Naur, gained popularity for his book Concise Survey of Computer Methods. He went on to coin the term “Data Science”, which grew into a vast field with many applications in the 21st century.

    Stage 2: More research on the importance of data
    This was a period in which businesses started researching the importance of collecting vast amounts of data. In 1977, the International Association for Statistical Computing (IASC) was founded. In the same year, Tukey published his second major work, Exploratory Data Analysis, arguing that more emphasis should be placed on using data to suggest hypotheses to test, and that exploratory and confirmatory data analysis should proceed side by side. The year 1989 saw the establishment of the first workshop on knowledge discovery, titled Knowledge Discovery in Databases (KDD), which is now more popularly known as the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

    Stage 3: Data Science gained attention
    The early forms of markets began to appear during this phase, and Data Science started attracting the attention of businesses. The idea of analysing data was sold and popularised. The 1994 Business Week cover story titled “Database Marketing” supports this rise. Businesses started to recognise the importance of collecting data and applying it for their profit, and various companies started stockpiling massive amounts of data. However, they didn’t know what to do with it or how to use it for their benefit. This led to the beginning of a new era in the history of Data Science.

    The term Data Science was taken up again in 1996 at the International Federation of Classification Societies (IFCS) conference in Kobe, Japan. In the same year, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth published “From Data Mining to Knowledge Discovery in Databases”. They described Data Mining and stated, “Data mining is the application of specific algorithms for extracting patterns from data.”

    The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, became essential to ensure that useful knowledge is derived from the data.

    Stage 4: Data Science started being practised
    The dawn of the 21st century saw significant developments in the history of data science. Throughout the 2000s, various academic journals began to recognise data science as an emerging discipline. Data science and big data seemed to work ideally with the developing technology. Another notable figure who contributed largely to this field is William S. Cleveland. He co-edited Tukey’s collected works, developed valuable statistical methods, and published the paper “Data Science: An Action Plan for Expanding the Technical Areas of the field of Statistics”.

    Cleveland put forward his notion that data science was an independent discipline and named six areas where data scientists should be educated namely multidisciplinary investigations, models and methods of data, computing with data, pedagogy, tool evaluation, and theory.

    Stage 5: A New Era of Data Science
    By now, the world had seen plenty of the advantages of analysing data. The term data scientist is attributed to Jeff Hammerbacher and DJ Patil, who chose the word carefully; a buzzword was born. Although the term “data science” was not yet prevalent, the field itself was becoming incredibly useful and developing significantly. In 2013, IBM shared the statistic that 90% of the world’s data had been created in the previous two years alone. By this time, companies had also begun to view data as a commodity upon which they could capitalise, and the importance of transforming large clusters of data into usable information and finding usable patterns gained emphasis.

    Stage 6: Data Science in Demand
    The major tech giants saw significant growth in demand for their products after applying data science. Apple credited big data and data mining for increased sales, and Amazon reported selling more Kindle books online than ever before. Companies like Google and Microsoft used deep learning for speech and voice recognition. Using AI techniques, the usage of data was further enhanced. Data became so precious that companies started collecting all kinds of data from all sorts of sources.

    Putting it all together, data science did not have a very prestigious beginning and was at first ignored by researchers, but once its importance was properly understood by researchers and businesses alike, it helped them gain a large amount of profit.

    Ethical issues in Artificial Intelligence – Problems and Promises

    With the growth of Artificial Intelligence (AI) in the 21st century, the ethical issues surrounding AI grow in importance alongside the technology itself. Typically, ethics in AI is divided into robo-ethics and machine ethics. Robo-ethics concerns the moral behaviour of humans as they design and construct artificially intelligent beings, while machine ethics relates to the ethical conduct of artificial moral agents (AMAs). In the modern world, countries are stockpiling weapons, artificially intelligent robots, and other AI-driven machines. So, analysing the risks of artificial intelligence, such as whether it will take over major jobs and how its uncontrolled and unethical usage could affect humanity, also becomes important. These ethics were framed to protect humanity from the ill effects and risks of artificial intelligence.

    Robotics is unarguably one of the major topics in the field of artificial intelligence. Robot ethics, more popularly known as roboethics, is the morality of how humans design, construct, use, interact with, and treat robots. It considers how artificially intelligent beings may be used to harm humans and how they may be used to benefit them. It emphasizes that machines with artificial intelligence should prioritize human safety above everything else and keep human morality in perspective.

    Can AI be a threat to human dignity?

    One of the first voices raised against the potential ill effects of an artificially developed being came in 1976, when Joseph Weizenbaum argued that AI should not be used to replace people in positions that require respect and care, such as:

    • A customer service representative
    • A therapist
    • A soldier
    • A Police Officer
    • A Judge

    Weizenbaum explains that we require authentic feelings of empathy from people in these positions. If machines replace them, the people they serve will feel alienated, devalued, and frustrated. However, there are voices in support of AI when it comes to the matter of impartiality, as a machine would be impartial and fair.

    Biases in AI Systems

    The most widespread use of AI in today’s world is in voice and facial recognition, and cases of AI bias are also increasing. Many of these systems have real business implications and directly impact other people. A biased training set will result in a biased predictor. Bias can creep into algorithms in many ways, and it poses one of the biggest threats in AI. As a result, large companies such as IBM and Google have started researching and addressing bias.

    Weaponization of Artificial Intelligence

    Ever since Weizenbaum questioned the arming of machines in 1976, there have been disputes over whether robots should be given some degree of autonomous function.

    There has been a recent outcry about the engineering of artificial intelligence weapons that have included ideas of a robot takeover of humanity. In the near future of AI, these AI weapons present a type of danger far different from that of human-controlled weapons. Powerful nations have begun to fund programs to develop AI weapons.

    “If any major military power pushes ahead with AI weapon development, a global arms race is virtually inevitable, and the endpoint of this technological trajectory is obvious: autonomous weapons will become the Kalashnikovs of tomorrow,” reads a petition against AI weaponry signed by Skype co-founder Jaan Tallinn and many MIT professors, among other supporters.

    Machine ethics, or machine morality, is the field of research concerned with designing Artificial Moral Agents (AMAs): robots and artificially intelligent beings that are made to behave morally, or as though moral. The science fiction author Isaac Asimov considered the issue in the 1950s in his famous book I, Robot, in which he proposed his Three Laws of Robotics as a form of machine ethics. His work also suggests that no set of fixed laws can sufficiently anticipate all possible circumstances. In 2009, during an experiment at the Laboratory of Intelligent Systems at the École Polytechnique Fédérale de Lausanne in Switzerland, robots that were programmed to cooperate eventually learned to lie to each other in an attempt to hoard a beneficial resource.

    In conclusion, Artificial Intelligence is a necessary evil. Artificial Intelligence-based beings (friendly AIs) can be a gigantic leap for humans in technological development, and they come with a set of miraculous advantages. However, if the technology falls into the wrong hands, the destruction could be unimaginable and unstoppable. As Claude Shannon put it, “I visualize a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.” Thus, ethics in the age of artificial intelligence is supremely important.