The Complexity of Learning Concept
How much data do we need to use an ML algorithm?
Although this is the most common question, it is hard to answer because the amount of data needed depends mainly on how complex the learning
concept is. In Machine Learning (ML), learning complexity can be broken down into informational and computational complexity.
Informational complexity covers two aspects: how many training examples are needed (sample complexity) and how fast a
learner/model’s estimate converges to the true population parameters (rate of convergence). Computational complexity refers to the
types of algorithms and the computational resources needed to extract the learner/model’s prediction within a reasonable time. As you can
guess by now, this blog will cover informational complexity to answer the question.
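As a rough illustration of rate of convergence (a hypothetical sketch, not from the original document), the following Python snippet estimates a population mean from increasingly large samples; the error of the sample mean shrinks roughly as 1/sqrt(n):

import numpy as np

# Minimal sketch: estimate the mean of a population whose true mean is 5.0.
# The error of the sample mean shrinks roughly as 1/sqrt(n), which is one
# concrete example of a rate of convergence.
rng = np.random.default_rng(0)
true_mean = 5.0
for n in (10, 100, 1000, 10000, 100000):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    estimate = sample.mean()
    print(f"n={n:>6}  estimate={estimate:.4f}  error={abs(estimate - true_mean):.4f}")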
Learn from an example: ‘To be or not to be banana’
Let’s try to learn what a banana is. In this example, banana is the learning concept (one hypothesis, that is, to be or not to be banana),
and the various descriptions associated with banana, such as colors and shapes, can be the features. Unlike a human, who does not need
non-banana information to recognize a banana, a typical machine learning algorithm requires counter-examples. Although One Class
Classification (OCC) has been widely used for outlier or anomaly detection, it is a harder problem than conventional binary/multi-class
classification.
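To make the distinction concrete, here is a minimal, hypothetical sketch in Python with scikit-learn, using apples as the counter-example class introduced in the next paragraph. The feature encoding matches the one defined below (Table 1); the specific models, OneClassSVM and LogisticRegression, are illustrative choices, not taken from the original document:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

# Toy encoding: color (Yellow=1, Green=2, Red=3), shape (Cylinder=1, Sphere=2);
# class labels banana=0, apple=1.
bananas = np.array([[1, 1], [2, 1], [1, 1], [2, 1]])   # yellow/green cylinders
apples = np.array([[1, 2], [2, 2], [3, 2]])            # yellow/green/red spheres

# One Class Classification: learn "banana" from banana examples only.
occ = OneClassSVM(nu=0.1).fit(bananas)
print(occ.predict(np.array([[3, 2]])))   # +1 means "looks like banana", -1 means outlier

# Binary classification: learn "banana vs. apple" using counter-examples.
X = np.vstack([bananas, apples])
y = np.array([0, 0, 0, 0, 1, 1, 1])      # banana=0, apple=1
clf = LogisticRegression().fit(X, y)
print(clf.predict(np.array([[3, 2]])))   # expected: 1 (apple)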
Let’s place another concept, ‘Apple’, into this example and turn the practice into a binary classification. By doing this, we just
made the learning concept simpler: ‘to be banana = not apple’ and ‘not to be banana = apple’. This is a little counter-intuitive, since adding
an additional learning concept to a model makes the model simpler; however, OCC basically means one versus all others, and the
number of all other cases is practically infinite. This is where we are in terms of ML: one of the simplest learning activities for a human
is among the most difficult problems to solve in ML. Before generating some data for banana, we need to define some terms.
Instances X describe a banana with features such as color (f1 = yellow, green, or red; |f1| = 3), shape (f2 = cylinder or sphere; |f2| = 2)
and a class label (C = {banana, apple}, |C| = 2). The values for color and shape need to be enumerated. For example,
we can assign integers to each value, like (Yellow=1, Green=2, Red=3), (Cylinder=1, Sphere=2) and (banana=0, apple=1)
(Table 1).
The target function t generates a prediction for ‘is this banana or apple’ as a number in the range 0 ≤ t(x_i) ≤ 1. Typically,
we want the prediction t(x_i) to be as close as possible to c(x_i), 0 ≤ i ≤ n, where n is the total number of samples.
The hypothesis space H can be defined as the conjunction of the features and the target function: h(x_i) = (f1_i, f2_i, t(x_i)).
Training examples S must contain roughly the same number of banana (0) and apple (1) examples. A sample is described as
s(x_i) = (f1_i, f2_i, c(x_i)).
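Because Table 1 is simply the enumeration of these encoded values, a short sketch (plain Python; the dictionary names are made up for illustration) can generate its rows as the Cartesian product of the feature values and the class labels:

from itertools import product

# Enumerated values as defined above: color f1, shape f2, and class label C.
colors = {"Yellow": 1, "Green": 2, "Red": 3}    # |f1| = 3
shapes = {"Cylinder": 1, "Sphere": 2}           # |f2| = 2
classes = {"banana": 0, "apple": 1}             # |C| = 2

# All possible instances: |X| = |f1| x |f2| x |C| = 3 x 2 x 2 = 12 rows of Table 1.
X = list(product(colors.values(), shapes.values(), classes.values()))
for row in X:
    print(row)          # (color, shape, class)
print(len(X))           # 12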
Sample complexity estimates the size of the training data set in a quick and dirty way
Ideally, we want the training sample set S to contain instances covering all possible combinations of the features with respect to t,
as you can see in Table 1. There are three possible values for f1 and two possible values for f2. Also, there are two classes in this
example. Therefore, the number of all possible instances is |X| = |f1| x |f2| x |C| = 3 x 2 x 2 = 12. However, f2 is a lucky feature that is
mutually exclusive between banana and apple (all bananas are cylinders and all apples are spheres). Hence, |f2| can be treated as 1 in this case,
reducing the count to 3 x 1 x 2 = 6. In addition, we can subtract one case because there is no red banana. For this example, only 5 instances
exhaust the entire sample space. In general, the required number of training samples (|S| = n) grows exponentially with the number of
features (columns) in a data set. If we assume that all
