IST718 WK8


Week 8 | Machine Learning, Part I

8.1 Introduction to Machine Learning

To get us started, we’re going to use machine learning for classification. We’re going to use a data set that many of you are familiar with, the Iris data set. We’ll start with a single-layer perceptron and an adaptive linear neuron (Adaline) model, to give us a flavor for both neural networks and machine learning.

8.2 Data Review

8.3 Perceptron

Welcome back. Today we’re going to talk about perceptrons. In 1957, Frank Rosenblatt published the first concept of a perceptron learning rule. It was based on the McCulloch-Pitts neuron model, but Rosenblatt’s proposed algorithm would automatically learn the optimal weight coefficients, which are then multiplied by the input features in order to decide whether the neuron fires.

So the whole idea behind the McCulloch-Pitts neuron and Rosenblatt’s threshold perceptron is to use a reductionist approach to replicate how a single neuron in the brain works. It either fires, or it doesn’t fire, so it’s either a 0 or a 1, very much like a bit in a computer. Based on this construct, Rosenblatt’s perceptron rule is fairly simple and can be summarized in the following steps:

  • Initialize the weights to zero or to small random numbers.
  • For each training example, compute the predicted class label and, if the prediction is wrong, update the weights.

If the net input is greater than or equal to the threshold, we predict class 1; if it’s less than the threshold, we predict class negative 1, which in this case would be our setosa. In Rosenblatt’s perceptron algorithm, the activation function is a simple unit step function. We’ll draw that out later in the session so we can see the difference between it and the continuous activation function we’ll see in one of the next lessons.

So we have our input values and a corresponding weight vector. Together, these give us our net input z = w^T x: the weight vector transposed times the input vector. The activation function is then applied to that net input to decide whether the neuron fires.
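As a rough sketch of that decision step (the function names and the bias-in-w[0] convention here are my own, not from the lecture), the net input and the unit step activation might look like this in Python:

```python
import numpy as np

def net_input(w, x):
    """Net input z = w^T x for a single example x, with the bias stored in w[0]."""
    return np.dot(w[1:], x) + w[0]

def predict(w, x, threshold=0.0):
    """Unit step activation: fire (class 1) if z >= threshold, else class -1."""
    return np.where(net_input(w, x) >= threshold, 1, -1)
```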

This new equation is our perceptron learning rule: the change in each weight, delta w_j = eta * (y_i − y-hat_i) * x_ij, is used to update the weight vector. Here eta is the learning rate, y_i is the true value, and y-hat_i is the predicted value.
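A minimal sketch of that update, assuming the classes are encoded as -1 and +1 and the bias is kept in w[0] (again, my own naming, not the lecture’s):

```python
import numpy as np

def perceptron_update(w, x, y_true, eta=0.01):
    """Apply one perceptron update: delta_w = eta * (y_true - y_pred) * x."""
    z = np.dot(w[1:], x) + w[0]        # net input
    y_pred = 1 if z >= 0.0 else -1     # unit step activation
    update = eta * (y_true - y_pred)   # zero when the prediction is already correct
    w[1:] += update * x                # feature weights
    w[0] += update                     # bias weight (its "input" is a constant 1)
    return w
```

Looping this update over every training example for several epochs is essentially Rosenblatt’s algorithm: the weights only move when a prediction is wrong.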

8.4 Adaline

  • Linear activation function
  • Quantizer for predicting the class

Developed by Widrow and Hoff at Stanford, Adaline can be considered an improvement over Rosenblatt’s work. Adaline introduces the idea of defining and minimizing a cost function. The key difference between the two algorithms is that while Rosenblatt’s perceptron updates the weights based on a unit step function, Adaline updates them based on a linear activation function. And again, for this intro, we are continuing to look at how we can draw the line that separates the two classes in the iris data set.
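To make that difference concrete, here is a bare-bones Adaline-style learner using a linear activation and batch gradient descent on the sum-of-squared-errors cost. The class name, parameters, and the choice of batch (rather than stochastic) gradient descent are my own for illustration:

```python
import numpy as np

class AdalineGD:
    """Adaptive linear neuron trained with batch gradient descent."""

    def __init__(self, eta=0.01, n_epochs=50):
        self.eta = eta            # learning rate
        self.n_epochs = n_epochs  # passes through the training data

    def fit(self, X, y):
        self.w_ = np.zeros(X.shape[1] + 1)   # weights; w_[0] is the bias
        self.cost_ = []
        for _ in range(self.n_epochs):
            output = np.dot(X, self.w_[1:]) + self.w_[0]  # linear activation
            errors = y - output
            self.w_[1:] += self.eta * X.T.dot(errors)     # gradient step on the weights
            self.w_[0] += self.eta * errors.sum()         # gradient step on the bias
            self.cost_.append((errors ** 2).sum() / 2.0)  # sum-of-squared-errors cost
        return self

    def predict(self, X):
        """Quantizer: threshold the linear activation to get the class label."""
        return np.where(np.dot(X, self.w_[1:]) + self.w_[0] >= 0.0, 1, -1)
```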

If our learning rate is small, we’re going to climb down slowly toward our global minimum, and each one of those steps is another epoch, another pass through the data. If we make that learning rate large, we might keep overshooting the global minimum, so our error would keep getting larger rather than smaller. So again, the learning rate is very important: it affects both the error and the number of epochs it takes to reach convergence.
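To see that effect, one could train the sketch above with two different learning rates and watch the cost across epochs. This assumes X holds the chosen iris features and y the -1/+1 class labels, and that AdalineGD is the class from the previous block:

```python
import matplotlib.pyplot as plt

# X (feature matrix) and y (labels encoded as -1/+1) are assumed to be defined already.
slow = AdalineGD(eta=0.0001, n_epochs=15).fit(X, y)  # small steps toward the minimum
fast = AdalineGD(eta=0.01, n_epochs=15).fit(X, y)    # may overshoot and diverge

plt.plot(range(1, 16), slow.cost_, marker='o', label='eta = 0.0001')
plt.plot(range(1, 16), fast.cost_, marker='x', label='eta = 0.01')
plt.xlabel('Epoch')
plt.ylabel('Sum-of-squared-errors cost')
plt.legend()
plt.show()
```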

8.5 Code Review

This is your opportunity to get hands on the keyboard and start exploring the data yourself. Using Python on your computer (or the Syracuse lab environment) and the text as a guide, try to recreate the graphics and the model for a classification problem. You will have the opportunity to upload your notebook file in a later segment.
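If you want a starting point, a minimal sketch along those lines might look like the following. It uses scikit-learn’s built-in copy of the iris data and its Perceptron class; the particular features and classes chosen here are my own, so adjust them to match the text:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

# Keep two classes and two features so the decision boundary is easy to draw.
iris = load_iris()
X = iris.data[:100, [0, 2]]   # sepal length and petal length
y = iris.target[:100]         # 0 = setosa, 1 = versicolor

# Scatter plot of the two classes.
plt.scatter(X[y == 0, 0], X[y == 0, 1], marker='o', label='setosa')
plt.scatter(X[y == 1, 0], X[y == 1, 1], marker='x', label='versicolor')
plt.xlabel('sepal length (cm)')
plt.ylabel('petal length (cm)')
plt.legend()
plt.show()

# Fit a simple perceptron classifier to the same data.
clf = Perceptron(eta0=0.1, max_iter=50)
clf.fit(X, y)
print('training accuracy:', clf.score(X, y))
```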

8.6 Neural Networks

So in your interview questions, when someone asks what a deep neural network is, it’s a multilayer network with more than one hidden layer. In this particular model, the error term is always being passed forward. This will be interesting in a couple of sessions when we start talking about forward and backward propagation.
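For a concrete picture of what “more than one hidden layer” means, here is a toy forward pass through a 4-5-5-3 network. The layer sizes, random weights, and sigmoid activation are my own choices for illustration, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Weight matrices for a 4 -> 5 -> 5 -> 3 feedforward network:
# one input layer, two hidden layers (hence "deep"), one output layer.
W1 = rng.normal(size=(4, 5))
W2 = rng.normal(size=(5, 5))
W3 = rng.normal(size=(5, 3))

x = rng.normal(size=4)        # a single input example

# Forward propagation: each layer's output feeds the next layer.
h1 = sigmoid(x @ W1)
h2 = sigmoid(h1 @ W2)
out = sigmoid(h2 @ W3)
print(out)
```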

8.7 Propagation

8.8 Template Matching

8.9 Code Creation

8.10 Introduction to Case Study
