Concept Learning: The Stepping Stone Toward Machine Learning With Find-S
Learn about one of the simplest algorithms in machine learning, the Find-S algorithm, to help you get started with machine learning.
In a previous blog, we came across the awesome stuff a machine can do with machine learning and all the math required before you take a deep dive into it. Now that we know the prerequisites for machine learning, let's start the journey toward it with small but effective steps toward awesomeness.
Many of us often wonder how machines can learn from data and predict the future. We're living in an era where many of us work on big data technologies with great efficiency and speed. But having a huge amount of data is not the only thing we need to focus on; we also need to focus on using that data optimally, finding patterns in it and using those patterns to predict future events and identify solutions.
To understand how a machine can learn from past experience and predict the future based on that experience, we first need to understand how the human brain works. Once we understand the patterns the human brain follows when solving a problem, we can make a machine learn in a nearly identical way, although the human brain has no limits and there is still a lot to explore. Since machine learning is a huge field of study with a lot of possibilities, let's discuss one of its simplest algorithms: the Find-S algorithm.
What Is Learning?
There are several definitions of "learning." One of the simplest definitions is:
“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something.”
Just as there are various definitions for learning, there are various categories of learning methods.
As humans, we learn a lot of things throughout our lives. Some of that learning is based on experience and some on memorization. On that basis, we can divide learning methods into five types:
Rote learning (memorization): Memorizing things without knowing the concept/logic behind them.
Passive learning (instructions): Learning from a teacher/expert.
Analogy (experience): Learning new things from our past experience.
Inductive learning (experience): On the basis of past experience, formulating a generalized concept.
Deductive learning: Deriving new facts from past facts.
Inductive learning is based on formulating a generalized concept after observing examples of the concept. For example, if a kid is asked to write an answer to 2*8=x, they can either use the rote learning method to memorize the answer or use inductive learning (i.e. thinking how 2*1=2, 2*2=4, and so on) to formulate a concept to calculate the results. In this way, the kid will be able to solve similar types of questions using the same concept.
Similarly, we can make our machines learn from past data and have them identify whether an object falls into a specific category of interest.
What Is Concept Learning?
In terms of machine learning, "concept learning" can be defined as:
“The problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.” — Tom Mitchell
Much of human learning involves acquiring general concepts from past experience. For example, humans identify different vehicles among all vehicles based on a specific set of features defined over a larger set of features. This special set of features differentiates the subset of cars within the set of vehicles. A set of features that differentiates cars in this way can be called a concept.
Similarly, machines can learn from concepts to identify whether an object belongs to a specific category by processing past/training data to find a hypothesis that best fits the training examples.
Target concept:
The set of items/objects over which the concept is defined is called the set of instances and is denoted by X. The concept or function to be learned is called the target concept and is denoted by c. It can be seen as a Boolean-valued function defined over X and can be represented as c: X -> {0, 1}.
If we have a set of training examples labeled with a specific target concept c, the problem faced by the learner is to estimate c from the training data.
H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. The goal of the learner is to find a hypothesis h in H that can identify all the objects in X, so that h(x) = c(x) for all x in X.
An algorithm that supports concept learning requires the following (see the sketch after this list):
Training data (past experiences to train our models)
Target concept (hypothesis to identify data objects)
Actual data objects (for testing the models)
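As a minimal sketch of how these three ingredients might be represented in Python (all names here are illustrative, not from the original article):

```python
# 1. Training data: past experiences, each a (features, label) pair,
#    where the label is the value the target concept assigns.
training_data = [
    ((True, False, True), True),
    ((False, False, True), False),
]

# 2. Target concept: the unknown boolean function c: X -> {0, 1};
#    the learner searches for a hypothesis h with h(x) == c(x).

# 3. Actual data objects: unseen instances used to test the model.
test_objects = [
    (True, True, True),
]
```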
Inductive Learning Hypothesis
As we discussed earlier, the ultimate goal of concept learning is to identify a hypothesis h identical to the target concept c over the data set X, where the only available information about c is its value over the training examples. Therefore, our algorithm can only guarantee that the hypothesis fits the training data. In other words:
"Any hypothesis found approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples."
For example, suppose that whether a person goes to a movie depends on four binary features, each either true or false:
Has money
Has free time
It’s a holiday
Has pending work
We have training data with two data objects as positive samples and one as negative (also written out in Python below):
x1: <true, true, false, false> : +ve
x2: <true, true, false, true> : +ve
x3: <true, false, false, true> : -ve
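For illustration, those samples can be written down in Python, with each instance a tuple in the feature order (has money, has free time, it's a holiday, has pending work) and True in the label position marking a positive sample (the variable name is mine):

```python
# The three training samples above; feature order:
# (has_money, has_free_time, is_holiday, has_pending_work).
training_samples = [
    ((True, True, False, False), True),   # x1: +ve
    ((True, True, False, True), True),    # x2: +ve
    ((True, False, False, True), False),  # x3: -ve
]
```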
Hypothesis Notations
Each data object can itself be viewed as a hypothesis. A hypothesis such as <true, true, false, false> is maximally specific because it covers only one sample. To express more general hypotheses, we add some notation. We have the following notations:
∅ (represents a hypothesis that rejects all)
<?, ?, ?, ?> (accepts all)
<true, false, ?, ?> (accepts some)
The hypothesis ∅ will reject all data samples. The hypothesis <?, ?, ?, ?> will accept all data samples. The ? notation indicates that the value of that specific feature does not affect the result.
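Here is a short sketch of how these notations behave in code, using '?' for the wildcard and None for the reject-all symbol ∅ (the helper name matches is mine, not part of the algorithm's definition):

```python
# A hypothesis accepts an instance only if every position is either
# '?' or equal to the instance's value; any None (reject-all) position
# rejects everything.
def matches(hypothesis, instance):
    for h_val, x_val in zip(hypothesis, instance):
        if h_val is None:
            return False
        if h_val != '?' and h_val != x_val:
            return False
    return True

print(matches(('?', '?', '?', '?'), (True, False, False, True)))    # True: accepts all
print(matches((True, False, '?', '?'), (True, True, False, True)))  # False: second feature differs
```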
The total number of possible hypotheses is (3 * 3 * 3 * 3) + 1 = 82: each feature can be true, false, or ?, plus one more hypothesis that rejects all (∅).
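That count is easy to verify with a quick, throwaway enumeration:

```python
from itertools import product

# Each of the 4 features takes one of 3 values (true, false, '?'),
# giving 3**4 = 81 hypotheses, plus the single reject-all hypothesis.
count = len(list(product([True, False, '?'], repeat=4))) + 1
print(count)  # 82
```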
General to Specific
Many machine learning algorithms rely on the concept of a general-to-specific ordering of hypotheses. Consider the following two hypotheses:
h1 = <true, true, ?, ?>
h2 = <true, ?, ?, ?>
Any instance classified as positive by h1 will also be classified as positive by h2, so we can say that h2 is more general than h1. Using this concept, we can find a general hypothesis that can be defined over the entire dataset X.
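One way to test this ordering in code is a sketch like the following (the function name is illustrative):

```python
# h2 is more general than (or equal to) h1 if every position of h2 is
# either '?' or agrees with the corresponding position of h1; then any
# instance h1 accepts, h2 accepts as well.
def more_general_or_equal(h2, h1):
    return all(g == '?' or g == s for g, s in zip(h2, h1))

h1 = (True, True, '?', '?')
h2 = (True, '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True
```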
To find a single hypothesis defined on X, we can use the more-general-than partial ordering. One way to do this is to start with the most specific hypothesis in H and generalize it each time it fails to classify an observed positive training sample as positive. This is exactly what the Find-S algorithm does:
Step 1: Start with the most specific hypothesis, denoted by h <- <∅, ∅, ∅, ∅>.
Step 2: Pick the next training sample and apply Step 3 to it.
Step 3: Observe the sample. If it is negative, the hypothesis remains unchanged and we return to Step 2 for the next training sample. Otherwise, we proceed to Step 4.
Step 4: If the sample is positive and our current hypothesis is too specific because it does not cover the sample, we need to update the hypothesis. This is done by the pairwise conjunction (a logical AND-like operation) of the current hypothesis and the training sample: keep each value on which the two agree, and put ? wherever they differ.
If the next training sample is <true, true, false, false> and the current hypothesis is <∅, ∅, ∅, ∅>, we can directly replace the existing hypothesis with the new one.
If the next positive training sample is <true, true, false, true> and the current hypothesis is <true, true, false, false>, we perform the pairwise conjunction of the two, putting ? wherever the values differ:
<true, true, false, true> ∧ <true, true, false, false> = <true, true, false, ?>
Now, we can replace our existing hypothesis with the new one: h <- <true, true, false, ?>
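That update rule, keep the values on which the hypothesis and the sample agree and put ? where they differ, can be sketched as a small helper (the name generalize is mine):

```python
# Minimally generalize hypothesis h so it covers positive sample x:
# keep each value on which h and x agree; put '?' where they differ.
# The reject-all hypothesis (all None) is simply replaced by x itself.
def generalize(h, x):
    if all(v is None for v in h):
        return tuple(x)
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

print(generalize((True, True, False, False), (True, True, False, True)))
# (True, True, False, '?')
```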
Step 5: Repeat Step 2 until there are no more training samples.
Step 6: Once there are no training samples left, the current hypothesis is the one we were looking for. We can use this final hypothesis to classify real objects.
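Putting the six steps together, here is a minimal Find-S sketch in Python, reusing the training_samples and generalize sketches from above (again, the names are mine, not a standard API):

```python
def find_s(samples, n_features=4):
    # Step 1: start with the most specific hypothesis (reject-all).
    h = (None,) * n_features
    # Steps 2-5: scan every sample; Step 3 skips negatives, Step 4
    # minimally generalizes the hypothesis on each positive sample.
    for instance, is_positive in samples:
        if is_positive:
            h = generalize(h, instance)
    # Step 6: the remaining hypothesis is the answer.
    return h

print(find_s(training_samples))  # (True, True, False, '?')
```

On the example data, the result matches the hypothesis we derived by hand.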
Limitations of the Find-S Algorithm
The Find-S algorithm for concept learning is one of the most basic algorithms of machine learning, but it has some limitations and disadvantages:
There's no way to determine whether the final hypothesis (found by Find-S) is the only one consistent with the data or if there are more hypotheses that are consistent with the data.
Inconsistent sets of training examples can mislead the Find-S algorithm, as it ignores negative data samples entirely (see the check after this list). An algorithm that can detect inconsistencies in the training data would be better.
A good concept learning algorithm should be able to backtrack on its choice of hypothesis so that the resulting hypothesis can be improved over time. Unfortunately, Find-S provides no such method.
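To make the second limitation concrete, here is a hedged check that Find-S itself never performs, reusing the matches, find_s, and training_samples sketches from above:

```python
# Does the learned hypothesis wrongly accept any negative sample?
# Because Find-S skips negatives entirely, inconsistent training data
# can slip through without this extra test.
def consistent_with_negatives(h, samples):
    return all(not matches(h, instance)
               for instance, is_positive in samples
               if not is_positive)

h = find_s(training_samples)
print(consistent_with_negatives(h, training_samples))  # True for this data
```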
Many of these limitations are addressed by one of the most important algorithms of concept learning: the candidate elimination algorithm.
In our next blog, we will explain the Find-S algorithm with a basic example. To explore a Find-S implementation, please check out the next part of this blog (coming soon).