{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Exercise 1 - Bayes classification system" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import some useful libraries\n", "\n", "import math\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import OrdinalEncoder, StandardScaler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1a. Getting started with Bayes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a) Read the training data from the file ex1-data-train.csv. The first two columns are x1 and x2. The last column holds the class label y." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def read_data(file):\n", "    dataset = pd.read_csv(file, names=['x1','x2','y'])\n", "    print(dataset.head())\n", "    return dataset[[\"x1\", \"x2\"]], dataset[\"y\"].values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, y_train = read_data(\"ex1-data-train.csv\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare a function to compute accuracy\n", "def accuracy_score(y_true, y_pred):\n", "    return (y_true == y_pred).sum() / y_true.size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "b) Compute the priors of both classes, P(C0) and P(C1)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# TODO: Compute the priors\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "c) Compute histograms of x1 and x2 for each class (4 histograms in total). Plot these histograms. Hint: use the NumPy `histogram(a, bins=\"auto\")` function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# TODO: Compute histograms\n", "\n", "\n", "\n", "# TODO: Plot histograms\n", "\n", "plt.figure(figsize=(16,6))\n", "\n", "plt.subplot(1, 2, 1)\n", "...\n", "plt.xlabel('Likelihood hist - Exam 1')\n", "\n", "plt.subplot(1, 2, 2)\n", "...\n", "plt.xlabel('Likelihood hist - Exam 2')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "d) Use the histograms to compute the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). For this, define a function `likelihood_hist(x, hist_values, bin_edges)` that returns the likelihood of x for a given histogram (defined by its values and bin edges, as returned by the NumPy `histogram()` function)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def likelihood_hist(x: float, hist_values: np.ndarray, bin_edges: np.ndarray) -> float:\n", "    # TODO: compute the likelihood of x from the histogram values and bin edges\n", "\n", "    return ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "e) Implement the classification decision according to Bayes' rule and compute the overall accuracy of the system on the test set ex1-data-test.csv:\n", "- using only feature x1\n", "- using only feature x2\n", "- using x1 and x2, making the naive Bayes hypothesis of feature independence, i.e. 
p(X|Ck) = p(x1|Ck) · p(x2|Ck)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "X_test, y_test = read_data(\"ex1-data-test.csv\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# TODO: predict on test set in the 3 cases described above\n", "\n", "y_pred = []\n", "\n", "...\n", "\n", "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which system is the best?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TODO: answer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1b. Bayes - Univariate Gaussian distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the same as in 1a, but this time use univariate Gaussian distributions to model the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). You may use the NumPy functions `mean()` and `var()` to compute the mean μ and variance σ² of each distribution. To model the likelihood of both features together, again make the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def likelihood_univariate_gaussian(x: float, mean: float, var: float) -> float:\n", "    # TODO: compute the likelihood of x under a Gaussian with the given mean and variance\n", "\n", "    return ..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# TODO: Compute mean and variance for each class and each feature (8 values)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: predict on test set in the 3 cases\n", "\n", "y_pred = []\n", "\n", "...\n", "\n", "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.7" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 1 }