Because this assignment is graded through a class competition, the choice of which model to fit to the dataset is up to you. This means that even if all of the selected independent variables are significant, you, as the modeler, may decide to remove some of them from the model. All variables with non-significant parameters must be removed from the model. However, there is no obligation to keep every remaining significant variable, and you can decide which ones to include. For instance, suppose that X1, X2, X4, and X6 are found to be significant. You may then include all four of them in the model, or only X1, X2, and X4, or only X1, X2, and X6. For every combination of independent variables you select, a new model must be fitted to the data to re-estimate the parameters. This is the art of modeling and a part of the competition.
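
The point about refitting can be illustrated with a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are synthetic (not the assignment's training set), the helper name fit_ols is hypothetical, and X6 is deliberately generated to be correlated with X1 so that the effect of dropping a variable is visible.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, intercept first (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic stand-in data; the real training set is NOT used here.
rng = np.random.default_rng(0)
names = ["X1", "X2", "X4", "X6"]
X = rng.normal(size=(300, 4))
X[:, 3] = 0.8 * X[:, 0] + 0.6 * rng.normal(size=300)  # make X6 correlated with X1
y = (50 + 10 * X[:, 0] + 5 * X[:, 1] + 3 * X[:, 2] + 4 * X[:, 3]
     + rng.normal(size=300))

# Candidate models: every 3-variable subset plus the full 4-variable model.
candidates = [c for k in (3, 4) for c in combinations(range(4), k)]

fits = {}
for cols in candidates:
    label = ", ".join(names[i] for i in cols)
    # Each subset gets its own fit; coefficients are never copied across models.
    fits[label] = fit_ols(X[:, list(cols)], y)
    print(label, np.round(fits[label], 2))
```

In this synthetic setup the coefficient of X1 changes noticeably when X6 is dropped, which is why the parameters of a larger model cannot simply be reused in a smaller one.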

It is suggested that you first partition the 300 records in the training set into two groups of 250 and 50 houses. Fit an appropriate multiple linear regression model, including all significant parameters, to the first 250 houses. Then, as suggested in Item 1, try different models on the first 250 houses and use each of them to predict the values for the other 50 houses. Because you have the actual values for those 50 houses, you can measure the total error of your predictions (e.g., the average absolute error, i.e., the average of |actual value - predicted value| over all 50 houses). Select as the best fitted model the one that achieves the minimum total error.
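
The hold-out procedure described above can be sketched as follows, again in Python with NumPy. This is only one possible way to organize the comparison, under stated assumptions: the data are synthetic placeholders, the four columns merely stand in for X1, X2, X4, and X6, and the helper names (fit_ols, predict, mean_abs_error) are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, intercept first (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    return coef[0] + X @ coef[1:]


def mean_abs_error(actual, predicted):
    """Average of |actual value - predicted value|."""
    return float(np.mean(np.abs(actual - predicted)))


# Synthetic stand-in for the 300 training records (NOT the assignment data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))  # columns stand in for X1, X2, X4, X6
y = (200 + 30 * X[:, 0] + 20 * X[:, 1] + 10 * X[:, 2]
     + rng.normal(scale=5, size=300))

# First 250 houses for fitting, remaining 50 for checking predictions.
X_fit, y_fit = X[:250], y[:250]
X_hold, y_hold = X[250:], y[250:]

# Candidate models, given as column indices into X.
candidates = {
    "X1, X2, X4, X6": [0, 1, 2, 3],
    "X1, X2, X4": [0, 1, 2],
    "X1, X2, X6": [0, 1, 3],
}

errors = {}
for label, cols in candidates.items():
    coef = fit_ols(X_fit[:, cols], y_fit)        # refit on the 250 houses
    preds = predict(coef, X_hold[:, cols])       # predict the other 50
    errors[label] = mean_abs_error(y_hold, preds)
    print(f"{label}: average absolute error = {errors[label]:.2f}")

best = min(errors, key=errors.get)
print("Smallest hold-out error:", best)
```

The sketch takes the first 250 rows as they come, matching the wording above. If the records happened to be sorted in some meaningful order, shuffling them before the split would be a reasonable precaution.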

Use the final fitted regression model to estimate the mean price of the 100 houses given in the data file Test_set.xlsx, and store your predictions in Column B (labelled Predicted Price).