MachLePublic/PW-3/ex1/ex1-bayes-stud.ipynb
gabriel.marinoja b72020d0f7 feat: added PW3
2025-10-01 17:58:36 +02:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 1 - Bayes classification system"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import some useful libraries\n",
"\n",
"import math\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import OrdinalEncoder, StandardScaler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1a. Getting started with Bayes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"a) Read the training data from file ex1-data-train.csv. The first two columns are x1 and x2. The last column holds the class label y."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def read_data(file):\n",
"    dataset = pd.read_csv(file, names=['x1','x2','y'])\n",
"    print(dataset.head())\n",
"    return dataset[[\"x1\", \"x2\"]], dataset[\"y\"].values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train, y_train = read_data(\"ex1-data-train.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prepare a function to compute accuracy\n",
"def accuracy_score(y_true, y_pred):\n",
"    return (y_true == y_pred).sum() / y_true.size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Compute the priors of both classes, P(C0) and P(C1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute the priors\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) Compute histograms of x1 and x2 for each class (4 histograms in total) and plot them. Tip: use the NumPy function `histogram(a, bins=\"auto\")`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute histograms\n",
"\n",
"\n",
"\n",
"# TODO: plot histograms\n",
"\n",
"plt.figure(figsize=(16,6))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 1')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 2')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"d) Use the histograms to compute the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). To do so, define a function `likelihood_hist(x, hist_values, bin_edges)` that returns the likelihood of x for a given histogram (defined by its values and bin edges, as returned by the NumPy `histogram()` function)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_hist(x: float, hist_values: np.ndarray, bin_edges: np.ndarray) -> float:\n",
"    # TODO: compute the likelihood of x from the histogram values and bin edges\n",
"\n",
"    return ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"e) Implement the classification decision according to Bayes' rule and compute the overall accuracy of the system on the test set ex1-data-test.csv:\n",
"- using only feature x1\n",
"- using only feature x2\n",
"- using x1 and x2 making the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"X_test, y_test = read_data(\"ex1-data-test.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases described above\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which system is the best?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TODO: answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1b. Bayes - Univariate Gaussian distribution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do the same as in 1a, but this time use a univariate Gaussian distribution to model the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). You may use the NumPy functions `mean()` and `var()` to compute the mean μ and variance σ² of each distribution. To model the likelihood of both features together, you may again make the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_univariate_gaussian(x: float, mean: float, var: float) -> float:\n",
"    # TODO: compute the likelihood of x under a Gaussian with the given mean and variance\n",
"\n",
"    return ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute the mean and variance for each class and each feature (8 values)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.7"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}