{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting the ages of abalones using Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: if you don't know what an abalone is, you might want to educate yourself before proceeding further:*\n", " https://en.wikipedia.org/wiki/Abalone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we use linear regression to predict the ages of abalones.\n", "The dataset used in this short tutorial is available here: https://archive.ics.uci.edu/ml/datasets/abalone." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset provides measurements of physical characteristics of abalones such as length, diameter, height, weight, etc. These physical features will be used to infer the age of abalones." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data Visualization\n", "Let's load and visualize the dataset using pandas." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "np.random.seed(123)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',\n", " 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "abalone_df = pd.read_csv('abalone.data', names=names)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Sex | \n", "Length | \n", "Diameter | \n", "Height | \n", "Whole weight | \n", "Shucked weight | \n", "Viscera weight | \n", "Shell weight | \n", "Rings | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | \n", "M | \n", "0.455 | \n", "0.365 | \n", "0.095 | \n", "0.5140 | \n", "0.2245 | \n", "0.1010 | \n", "0.1500 | \n", "15 | \n", "
1 | \n", "M | \n", "0.350 | \n", "0.265 | \n", "0.090 | \n", "0.2255 | \n", "0.0995 | \n", "0.0485 | \n", "0.0700 | \n", "7 | \n", "
2 | \n", "F | \n", "0.530 | \n", "0.420 | \n", "0.135 | \n", "0.6770 | \n", "0.2565 | \n", "0.1415 | \n", "0.2100 | \n", "9 | \n", "
3 | \n", "M | \n", "0.440 | \n", "0.365 | \n", "0.125 | \n", "0.5160 | \n", "0.2155 | \n", "0.1140 | \n", "0.1550 | \n", "10 | \n", "
4 | \n", "I | \n", "0.330 | \n", "0.255 | \n", "0.080 | \n", "0.2050 | \n", "0.0895 | \n", "0.0395 | \n", "0.0550 | \n", "7 | \n", "
5 | \n", "I | \n", "0.425 | \n", "0.300 | \n", "0.095 | \n", "0.3515 | \n", "0.1410 | \n", "0.0775 | \n", "0.1200 | \n", "8 | \n", "
6 | \n", "F | \n", "0.530 | \n", "0.415 | \n", "0.150 | \n", "0.7775 | \n", "0.2370 | \n", "0.1415 | \n", "0.3300 | \n", "20 | \n", "
7 | \n", "F | \n", "0.545 | \n", "0.425 | \n", "0.125 | \n", "0.7680 | \n", "0.2940 | \n", "0.1495 | \n", "0.2600 | \n", "16 | \n", "
8 | \n", "M | \n", "0.475 | \n", "0.370 | \n", "0.125 | \n", "0.5095 | \n", "0.2165 | \n", "0.1125 | \n", "0.1650 | \n", "9 | \n", "
9 | \n", "F | \n", "0.550 | \n", "0.440 | \n", "0.150 | \n", "0.8945 | \n", "0.3145 | \n", "0.1510 | \n", "0.3200 | \n", "19 | \n", "
10 | \n", "F | \n", "0.525 | \n", "0.380 | \n", "0.140 | \n", "0.6065 | \n", "0.1940 | \n", "0.1475 | \n", "0.2100 | \n", "14 | \n", "
11 | \n", "M | \n", "0.430 | \n", "0.350 | \n", "0.110 | \n", "0.4060 | \n", "0.1675 | \n", "0.0810 | \n", "0.1350 | \n", "10 | \n", "
12 | \n", "M | \n", "0.490 | \n", "0.380 | \n", "0.135 | \n", "0.5415 | \n", "0.2175 | \n", "0.0950 | \n", "0.1900 | \n", "11 | \n", "
13 | \n", "F | \n", "0.535 | \n", "0.405 | \n", "0.145 | \n", "0.6845 | \n", "0.2725 | \n", "0.1710 | \n", "0.2050 | \n", "10 | \n", "
14 | \n", "F | \n", "0.470 | \n", "0.355 | \n", "0.100 | \n", "0.4755 | \n", "0.1675 | \n", "0.0805 | \n", "0.1850 | \n", "10 | \n", "
15 | \n", "M | \n", "0.500 | \n", "0.400 | \n", "0.130 | \n", "0.6645 | \n", "0.2580 | \n", "0.1330 | \n", "0.2400 | \n", "12 | \n", "
16 | \n", "I | \n", "0.355 | \n", "0.280 | \n", "0.085 | \n", "0.2905 | \n", "0.0950 | \n", "0.0395 | \n", "0.1150 | \n", "7 | \n", "
17 | \n", "F | \n", "0.440 | \n", "0.340 | \n", "0.100 | \n", "0.4510 | \n", "0.1880 | \n", "0.0870 | \n", "0.1300 | \n", "10 | \n", "
18 | \n", "M | \n", "0.365 | \n", "0.295 | \n", "0.080 | \n", "0.2555 | \n", "0.0970 | \n", "0.0430 | \n", "0.1000 | \n", "7 | \n", "
19 | \n", "M | \n", "0.450 | \n", "0.320 | \n", "0.100 | \n", "0.3810 | \n", "0.1705 | \n", "0.0750 | \n", "0.1150 | \n", "9 | \n", "
20 | \n", "M | \n", "0.355 | \n", "0.280 | \n", "0.095 | \n", "0.2455 | \n", "0.0955 | \n", "0.0620 | \n", "0.0750 | \n", "11 | \n", "
21 | \n", "I | \n", "0.380 | \n", "0.275 | \n", "0.100 | \n", "0.2255 | \n", "0.0800 | \n", "0.0490 | \n", "0.0850 | \n", "10 | \n", "
22 | \n", "F | \n", "0.565 | \n", "0.440 | \n", "0.155 | \n", "0.9395 | \n", "0.4275 | \n", "0.2140 | \n", "0.2700 | \n", "12 | \n", "
23 | \n", "F | \n", "0.550 | \n", "0.415 | \n", "0.135 | \n", "0.7635 | \n", "0.3180 | \n", "0.2100 | \n", "0.2000 | \n", "9 | \n", "
24 | \n", "F | \n", "0.615 | \n", "0.480 | \n", "0.165 | \n", "1.1615 | \n", "0.5130 | \n", "0.3010 | \n", "0.3050 | \n", "10 | \n", "
25 | \n", "F | \n", "0.560 | \n", "0.440 | \n", "0.140 | \n", "0.9285 | \n", "0.3825 | \n", "0.1880 | \n", "0.3000 | \n", "11 | \n", "
26 | \n", "F | \n", "0.580 | \n", "0.450 | \n", "0.185 | \n", "0.9955 | \n", "0.3945 | \n", "0.2720 | \n", "0.2850 | \n", "11 | \n", "
27 | \n", "M | \n", "0.590 | \n", "0.445 | \n", "0.140 | \n", "0.9310 | \n", "0.3560 | \n", "0.2340 | \n", "0.2800 | \n", "12 | \n", "
28 | \n", "M | \n", "0.605 | \n", "0.475 | \n", "0.180 | \n", "0.9365 | \n", "0.3940 | \n", "0.2190 | \n", "0.2950 | \n", "15 | \n", "
29 | \n", "M | \n", "0.575 | \n", "0.425 | \n", "0.140 | \n", "0.8635 | \n", "0.3930 | \n", "0.2270 | \n", "0.2000 | \n", "11 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
4147 | \n", "M | \n", "0.695 | \n", "0.550 | \n", "0.195 | \n", "1.6645 | \n", "0.7270 | \n", "0.3600 | \n", "0.4450 | \n", "11 | \n", "
4148 | \n", "M | \n", "0.770 | \n", "0.605 | \n", "0.175 | \n", "2.0505 | \n", "0.8005 | \n", "0.5260 | \n", "0.3550 | \n", "11 | \n", "
4149 | \n", "I | \n", "0.280 | \n", "0.215 | \n", "0.070 | \n", "0.1240 | \n", "0.0630 | \n", "0.0215 | \n", "0.0300 | \n", "6 | \n", "
4150 | \n", "I | \n", "0.330 | \n", "0.230 | \n", "0.080 | \n", "0.1400 | \n", "0.0565 | \n", "0.0365 | \n", "0.0460 | \n", "7 | \n", "
4151 | \n", "I | \n", "0.350 | \n", "0.250 | \n", "0.075 | \n", "0.1695 | \n", "0.0835 | \n", "0.0355 | \n", "0.0410 | \n", "6 | \n", "
4152 | \n", "I | \n", "0.370 | \n", "0.280 | \n", "0.090 | \n", "0.2180 | \n", "0.0995 | \n", "0.0545 | \n", "0.0615 | \n", "7 | \n", "
4153 | \n", "I | \n", "0.430 | \n", "0.315 | \n", "0.115 | \n", "0.3840 | \n", "0.1885 | \n", "0.0715 | \n", "0.1100 | \n", "8 | \n", "
4154 | \n", "I | \n", "0.435 | \n", "0.330 | \n", "0.095 | \n", "0.3930 | \n", "0.2190 | \n", "0.0750 | \n", "0.0885 | \n", "6 | \n", "
4155 | \n", "I | \n", "0.440 | \n", "0.350 | \n", "0.110 | \n", "0.3805 | \n", "0.1575 | \n", "0.0895 | \n", "0.1150 | \n", "6 | \n", "
4156 | \n", "M | \n", "0.475 | \n", "0.370 | \n", "0.110 | \n", "0.4895 | \n", "0.2185 | \n", "0.1070 | \n", "0.1460 | \n", "8 | \n", "
4157 | \n", "M | \n", "0.475 | \n", "0.360 | \n", "0.140 | \n", "0.5135 | \n", "0.2410 | \n", "0.1045 | \n", "0.1550 | \n", "8 | \n", "
4158 | \n", "I | \n", "0.480 | \n", "0.355 | \n", "0.110 | \n", "0.4495 | \n", "0.2010 | \n", "0.0890 | \n", "0.1400 | \n", "8 | \n", "
4159 | \n", "F | \n", "0.560 | \n", "0.440 | \n", "0.135 | \n", "0.8025 | \n", "0.3500 | \n", "0.1615 | \n", "0.2590 | \n", "9 | \n", "
4160 | \n", "F | \n", "0.585 | \n", "0.475 | \n", "0.165 | \n", "1.0530 | \n", "0.4580 | \n", "0.2170 | \n", "0.3000 | \n", "11 | \n", "
4161 | \n", "F | \n", "0.585 | \n", "0.455 | \n", "0.170 | \n", "0.9945 | \n", "0.4255 | \n", "0.2630 | \n", "0.2845 | \n", "11 | \n", "
4162 | \n", "M | \n", "0.385 | \n", "0.255 | \n", "0.100 | \n", "0.3175 | \n", "0.1370 | \n", "0.0680 | \n", "0.0920 | \n", "8 | \n", "
4163 | \n", "I | \n", "0.390 | \n", "0.310 | \n", "0.085 | \n", "0.3440 | \n", "0.1810 | \n", "0.0695 | \n", "0.0790 | \n", "7 | \n", "
4164 | \n", "I | \n", "0.390 | \n", "0.290 | \n", "0.100 | \n", "0.2845 | \n", "0.1255 | \n", "0.0635 | \n", "0.0810 | \n", "7 | \n", "
4165 | \n", "I | \n", "0.405 | \n", "0.300 | \n", "0.085 | \n", "0.3035 | \n", "0.1500 | \n", "0.0505 | \n", "0.0880 | \n", "7 | \n", "
4166 | \n", "I | \n", "0.475 | \n", "0.365 | \n", "0.115 | \n", "0.4990 | \n", "0.2320 | \n", "0.0885 | \n", "0.1560 | \n", "10 | \n", "
4167 | \n", "M | \n", "0.500 | \n", "0.380 | \n", "0.125 | \n", "0.5770 | \n", "0.2690 | \n", "0.1265 | \n", "0.1535 | \n", "9 | \n", "
4168 | \n", "F | \n", "0.515 | \n", "0.400 | \n", "0.125 | \n", "0.6150 | \n", "0.2865 | \n", "0.1230 | \n", "0.1765 | \n", "8 | \n", "
4169 | \n", "M | \n", "0.520 | \n", "0.385 | \n", "0.165 | \n", "0.7910 | \n", "0.3750 | \n", "0.1800 | \n", "0.1815 | \n", "10 | \n", "
4170 | \n", "M | \n", "0.550 | \n", "0.430 | \n", "0.130 | \n", "0.8395 | \n", "0.3155 | \n", "0.1955 | \n", "0.2405 | \n", "10 | \n", "
4171 | \n", "M | \n", "0.560 | \n", "0.430 | \n", "0.155 | \n", "0.8675 | \n", "0.4000 | \n", "0.1720 | \n", "0.2290 | \n", "8 | \n", "
4172 | \n", "F | \n", "0.565 | \n", "0.450 | \n", "0.165 | \n", "0.8870 | \n", "0.3700 | \n", "0.2390 | \n", "0.2490 | \n", "11 | \n", "
4173 | \n", "M | \n", "0.590 | \n", "0.440 | \n", "0.135 | \n", "0.9660 | \n", "0.4390 | \n", "0.2145 | \n", "0.2605 | \n", "10 | \n", "
4174 | \n", "M | \n", "0.600 | \n", "0.475 | \n", "0.205 | \n", "1.1760 | \n", "0.5255 | \n", "0.2875 | \n", "0.3080 | \n", "9 | \n", "
4175 | \n", "F | \n", "0.625 | \n", "0.485 | \n", "0.150 | \n", "1.0945 | \n", "0.5310 | \n", "0.2610 | \n", "0.2960 | \n", "10 | \n", "
4176 | \n", "M | \n", "0.710 | \n", "0.555 | \n", "0.195 | \n", "1.9485 | \n", "0.9455 | \n", "0.3765 | \n", "0.4950 | \n", "12 | \n", "
4177 rows × 9 columns
\n", "