Problem Set #3
For this problem set we won’t work directly from an exercise in the textbook. Instead, first download and unzip the NIST dataset from the authors’ website. Then open and read the file Datasets/Datasets/README.txt for a description of the data format.
Next, open the folder Datasets/Datasets/8_3 Data in 2 Columns and look at some of the files there. Pick a file at random (or one that appeals to you because of your research interests), open it in Excel, and make sure the X data are lined up in one column and the Y data in another. (Since you guys seem to be a lot better at Excel than I am, I won’t bore you with details on how to do this, but please do contact me if you’re struggling with that step.) Use a scatter plot to visualize the X/Y relationship, making sure that the X values are on the horizontal axis.
Now look at the NIST Dataset Archives in the References on p. 370 of the textbook. For each dataset, NIST provides the modeling equation and their own X/Y plot. Your goal will be to replicate NIST’s results using their model parameters. For each dataset, you should try to replicate (1) the plot on the webpage for the model; (2) the sum of squares and standard deviation in the Certified Values page for the model. Include these plots and values in your PDF writeup. How do these statistical values (sum of squares, standard deviation) relate to the relative error that we’ve been working with in class? I’m looking for an overall formula, not a separate answer for each model!
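If you’d like to cross-check your Excel numbers in code, here is a minimal Python sketch of the two statistics. Everything in it is an assumption for illustration: the function name residual_stats, the straight-line model, the parameter values, and the four data points are all made up and do not come from any NIST file. It also assumes the standard deviation is the residual standard deviation with n - p degrees of freedom (n data points, p model parameters); I believe that is how NIST reports it, but check the Certified Values page for your dataset.

```python
import math

def residual_stats(xs, ys, model, params):
    """Return (residual sum of squares, residual standard deviation).

    Assumes the standard deviation uses n - p degrees of freedom,
    where n is the number of points and p the number of parameters.
    """
    residuals = [y - model(x, *params) for x, y in zip(xs, ys)]
    rss = sum(r * r for r in residuals)
    dof = len(xs) - len(params)
    return rss, math.sqrt(rss / dof)

# Hypothetical model and made-up data, for illustration only.
def line(x, b0, b1):
    return b0 + b1 * x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

rss, sd = residual_stats(xs, ys, line, (1.0, 2.0))
print(f"residual sum of squares    = {rss:.6f}")
print(f"residual standard deviation = {sd:.6f}")
```

To use it on a real dataset, swap in the NIST modeling equation for line and the certified parameter values for (1.0, 2.0), then compare the printed values against the Certified Values page.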
For full credit, find a dataset – online or from another class or research project of yours – and do a similar model-building and evaluation exercise, without knowing the model or parameters beforehand.
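For the simplest case, where a straight line looks like a reasonable model for your data, the parameters can be estimated with the closed-form least-squares formulas. The sketch below is only a starting point: the function name fit_line and the data are made up for illustration, and your own dataset may well call for a nonlinear model and a different fitting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

b0, b1 = fit_line(xs, ys)
print(f"intercept b0 = {b0:.4f}, slope b1 = {b1:.4f}")
```

Once you have fitted parameters, evaluate the model the same way as for the NIST datasets: plot it against the data and report the sum of squares and standard deviation.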