ECE448/CS440: Introduction to Artificial Intelligence
Spring 2009

PROBLEM SET 4
Handed Out: 4/9/2009
Due: 4/23/2009

Programming the MultiLayer Perceptron

In this machine problem, you will implement the back-propagation algorithm for training a multi-layer neural network. A discussion of the algorithm can be found in section 11.3 of the textbook.

The Data Set

The data you will be using is a real data set of mushrooms from the Audubon Society Field Guide. The data set is posted on the course web site. There are two files: a README file that describes the data, and the actual DATASET. The data comprises 8124 instances of mushrooms, each having 22 attributes or features. The first is the target value for the training data given by the remaining 21 values. The target values are binary, either "p" for poisonous or "e" for edible. Your network is to be trained to reproduce these values as closely as possible.

The features are all discrete variables that take on only a finite number of values. You must transform these variables into a set of binary variables valued only 0 or 1, which should be regarded as numerical rather than logical values. Use the following conversion method. If a feature can take k values, convert it into a sequence of k binary values so that the l-th bit is 1 if the feature is the l-th symbol. All other bits for the feature are set to 0. If a feature value is missing, set all of its bits to 0. In this way, each mushroom will correspond to a binary vector of 60 bits.

The Network

The input layer will have exactly 60 units. Use one hidden layer with as many units in it as you see fit. You may try different values. The output layer will have two units, one for "poisonous" and one for "edible". The decision rule is to label the sample according to which of the two output units has the greater activation level. Use the standard sigmoid function as the unit non-linearity.

Training

Assign randomly selected values for all the weights.
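
As an illustration of the conversion method, the network layout, and the random initial weights described so far, here is a minimal Python sketch. Nothing in it is prescribed by the assignment: the function names, the bias weight appended to each unit, the initial weight range of [-0.5, 0.5], the choice of output unit 0 for "poisonous", and sending ties to "edible" are all assumptions made for the example.

```python
import math
import random

def one_hot(value, symbols):
    """k bits for a feature with k symbols; all zeros if the value is missing."""
    return [1.0 if value == s else 0.0 for s in symbols]

def encode(sample, feature_symbols):
    """Concatenate the one-hot codes of every feature of one sample."""
    bits = []
    for value, symbols in zip(sample, feature_symbols):
        bits.extend(one_hot(value, symbols))
    return bits

def sigmoid(x):
    """Standard sigmoid non-linearity."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng, scale=0.5):
    """One weight row per unit; the last entry of each row is a bias (assumed)."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def layer_output(weights, inputs):
    """Sigmoid activations of one layer; a constant 1.0 feeds the bias weight."""
    extended = inputs + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, extended)))
            for row in weights]

def classify(w_hidden, w_out, bits):
    """Decision rule: pick the output unit with the greater activation."""
    hidden = layer_output(w_hidden, bits)
    out = layer_output(w_out, hidden)
    # Assumed convention: unit 0 is "poisonous", unit 1 is "edible".
    return "p" if out[0] > out[1] else "e"
```

With the toy alphabets [["b", "c", "x"], ["n", "y"]] (hypothetical, not taken from the README), encode(["c", "?"], ...) gives [0.0, 1.0, 0.0, 0.0, 0.0]: the second feature is missing, so both of its bits stay 0.
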
Use about 75% of the data for training and the remaining 25% for testing. Which samples you hold out for testing is up to you. Use back-propagation to determine the weights, with a convergence criterion and a step size of your choosing.

Testing

Put the held-out data through the network and use the decision rule above to classify each sample. Compare each decision to the true category given in the data set and compute the error rate. Try various types of networks and step sizes to get the best performance.

Turn in the following

a) A diagram of your final best network
b) The performance of the networks you tried as a function of the number of hidden units, the step size, and the convergence criterion
c) Your code
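
The training and testing procedure described above might be sketched in Python as follows. This is one possible arrangement, not a reference solution: it assumes per-sample (stochastic) updates on the squared error, a bias weight on every unit, initial weights drawn from [-0.5, 0.5], reshuffling of the training samples each epoch, and a convergence test on the mean squared error per epoch with a cap on the number of epochs. All function and parameter names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_out, x):
    """Return hidden and output activations; 1.0 is appended for the bias."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, o

def train_step(w_hidden, w_out, x, target, step):
    """One gradient step on a single sample; returns its squared error."""
    h, o = forward(w_hidden, w_out, x)
    # Output deltas: (target - output) times the sigmoid derivative o * (1 - o).
    d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
    # Hidden deltas use the output weights before they are updated.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(w_out):
        for j, v in enumerate(h + [1.0]):
            row[j] += step * d_out[k] * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] += step * d_hid[j] * v
    return sum((t - y) ** 2 for t, y in zip(target, o))

def split(data, train_fraction, rng):
    """Shuffle a copy of the data and cut it into training and testing parts."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def train(train_set, n_hidden, step, tol, max_epochs, rng):
    """Train until the mean squared error of an epoch drops below tol."""
    n_in = len(train_set[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(2)]
    order = train_set[:]
    for _ in range(max_epochs):
        rng.shuffle(order)
        mse = sum(train_step(w_hidden, w_out, x, t, step) for x, t in order) / len(order)
        if mse < tol:
            break
    return w_hidden, w_out

def error_rate(w_hidden, w_out, data):
    """Fraction of samples whose larger output unit disagrees with the target."""
    wrong = 0
    for x, target in data:
        _, o = forward(w_hidden, w_out, x)
        predicted = 0 if o[0] > o[1] else 1
        actual = 0 if target[0] > target[1] else 1
        wrong += predicted != actual
    return wrong / len(data)
```

Here each sample is a pair (x, target), where x is the list of input bits and target is [1.0, 0.0] for one class and [0.0, 1.0] for the other. A typical run would call split(data, 0.75, rng), pass the first part to train, and report error_rate on the second part for each combination of hidden units, step size, and tolerance tried.
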