feat: added PW3

This commit is contained in:
gabriel.marinoja
2025-10-01 17:58:36 +02:00
parent 492686df00
commit b72020d0f7
7 changed files with 21240 additions and 0 deletions

@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 1 - Bayes classification system"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import some useful libraries\n",
"\n",
"import math\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import OrdinalEncoder, StandardScaler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1a. Getting started with Bayes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"a) Read the training data from the file `ex1-data-train.csv`. The first two columns are x1 and x2; the last column holds the class label y."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def read_data(file):\n",
"    dataset = pd.read_csv(file, names=['x1','x2','y'])\n",
"    print(dataset.head())\n",
"    return dataset[[\"x1\", \"x2\"]], dataset[\"y\"].values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train, y_train = read_data(\"ex1-data-train.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
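"# Prepare a function to compute accuracy\n",
"def accuracy_score(y_true, y_pred):\n",
"    # Convert to arrays so that a plain Python list of predictions also works\n",
"    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)\n",
"    return (y_true == y_pred).sum() / y_true.size"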
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Compute the priors of both classes, P(C0) and P(C1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute the priors\n",
"\n"
]
},
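{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one way to estimate the priors, assuming the labels take only the values 0 and 1. The array `y_demo` is hypothetical and stands in for `y_train`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical labels standing in for y_train (assumed to contain only 0 and 1)\n",
"y_demo = np.array([0, 1, 1, 0, 1])\n",
"\n",
"# Prior of a class = relative frequency of that class in the training labels\n",
"prior_c0 = np.mean(y_demo == 0)\n",
"prior_c1 = np.mean(y_demo == 1)\n",
"print(prior_c0, prior_c1)\n",
"```\n",
"\n",
"With two classes the two priors sum to 1, which makes a quick sanity check."
]
},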
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) Compute histograms of x1 and x2 for each class (4 histograms in total) and plot them. Tip: use the NumPy `histogram(a, bins=\"auto\")` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute histograms\n",
"\n",
"\n",
"\n",
"# TODO: plot histograms\n",
"\n",
"plt.figure(figsize=(16,6))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 1')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 2')\n",
"\n",
"plt.show()"
]
},
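{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the histogram step for a single feature, using synthetic samples rather than the exercise data. The names `x1_c0` and `x1_c1` and the chosen means and spreads are assumptions made only for illustration.\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# Hypothetical samples standing in for feature x1 of class 0 and class 1\n",
"rng = np.random.default_rng(0)\n",
"x1_c0 = rng.normal(45.0, 10.0, size=200)\n",
"x1_c1 = rng.normal(70.0, 10.0, size=200)\n",
"\n",
"# np.histogram returns the bin counts and the bin edges (one more edge than counts)\n",
"values_c0, edges_c0 = np.histogram(x1_c0, bins='auto')\n",
"values_c1, edges_c1 = np.histogram(x1_c1, bins='auto')\n",
"\n",
"# Draw each histogram as bars that span their own bins\n",
"plt.bar(edges_c0[:-1], values_c0, width=np.diff(edges_c0), align='edge', alpha=0.5, label='C0')\n",
"plt.bar(edges_c1[:-1], values_c1, width=np.diff(edges_c1), align='edge', alpha=0.5, label='C1')\n",
"plt.xlabel('x1 (synthetic)')\n",
"plt.legend()\n",
"plt.show()\n",
"```\n",
"\n",
"Each class gets its own bin edges here, so the counts and edges of a class have to be kept together for the likelihood step."
]
},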
{
"cell_type": "markdown",
"metadata": {},
"source": [
"d) Use the histograms to compute the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). To do this, define a function `likelihood_hist(x, hist_values, bin_edges)` that returns the likelihood of x for a given histogram (defined by its values and bin edges, as returned by the NumPy `histogram()` function)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_hist(x: float, hist_values: np.ndarray, bin_edges: np.ndarray) -> float:\n",
"    # TODO: compute the likelihood of x from the histogram outputs\n",
"\n",
"    return ..."
]
},
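{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible reading of the histogram likelihood, offered as a sketch rather than the expected solution. The function name `likelihood_hist_sketch` is hypothetical. Two choices here are assumptions: counts are normalised to a density (count divided by total count times bin width), and values outside the histogram range get a likelihood of 0.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def likelihood_hist_sketch(x, hist_values, bin_edges):\n",
"    # Values outside the histogram range get zero likelihood (an assumption)\n",
"    if x < bin_edges[0] or x > bin_edges[-1]:\n",
"        return 0.0\n",
"    # Index of the bin containing x; np.histogram closes the last bin on the right\n",
"    idx = np.searchsorted(bin_edges, x, side='right') - 1\n",
"    idx = min(idx, len(hist_values) - 1)\n",
"    # Normalise the count to a density: count / (total count * bin width)\n",
"    width = bin_edges[idx + 1] - bin_edges[idx]\n",
"    return hist_values[idx] / (hist_values.sum() * width)\n",
"\n",
"# Hypothetical usage on a tiny hand-made sample\n",
"values, edges = np.histogram(np.array([1.0, 2.0, 2.5, 3.0, 4.0]), bins=3)\n",
"print(likelihood_hist_sketch(2.2, values, edges))\n",
"print(likelihood_hist_sketch(10.0, values, edges))\n",
"```\n",
"\n",
"A zero likelihood for unseen ranges forces the posterior of that class to 0, so a small floor value may be preferable when the two classes use different bin ranges."
]
},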
{
"cell_type": "markdown",
"metadata": {},
"source": [
"e) Implement the classification decision according to Bayes' rule and compute the overall accuracy of the system on the test set `ex1-data-test.csv`:\n",
"- using only feature x1\n",
"- using only feature x2\n",
"- using x1 and x2 making the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"X_test, y_test = read_data(\"ex1-data-test.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases described above\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
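{
"cell_type": "markdown",
"metadata": {},
"source": [
"The decision rule above can be sketched as follows, assuming the per-class likelihoods have already been computed for every test sample. The helper `bayes_decide` and all numeric values are hypothetical, and ties are resolved in favour of class 0 by choice.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def bayes_decide(lik_c0, lik_c1, prior_c0, prior_c1):\n",
"    # Posterior is proportional to likelihood * prior; the evidence p(x) cancels out\n",
"    score_c0 = np.asarray(lik_c0) * prior_c0\n",
"    score_c1 = np.asarray(lik_c1) * prior_c1\n",
"    # Predict class 1 only when its score is strictly larger\n",
"    return (score_c1 > score_c0).astype(int)\n",
"\n",
"# Hypothetical per-feature likelihoods for two test samples\n",
"lik_x1_c0 = np.array([0.30, 0.02])\n",
"lik_x2_c0 = np.array([0.25, 0.05])\n",
"lik_x1_c1 = np.array([0.05, 0.20])\n",
"lik_x2_c1 = np.array([0.10, 0.30])\n",
"\n",
"# Single feature: use the x1 likelihoods alone\n",
"print(bayes_decide(lik_x1_c0, lik_x1_c1, 0.4, 0.6))\n",
"\n",
"# Naive Bayes: multiply the per-feature likelihoods within each class\n",
"print(bayes_decide(lik_x1_c0 * lik_x2_c0, lik_x1_c1 * lik_x2_c1, 0.4, 0.6))\n",
"```\n",
"\n",
"Comparing unnormalised scores is enough for classification because both posteriors share the same denominator p(x)."
]
},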
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which system is the best?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TODO: answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1b. Bayes - Univariate Gaussian distribution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do the same as in 1a, but this time use a univariate Gaussian distribution to model the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). You may use the NumPy functions `mean()` and `var()` to compute the mean μ and variance σ² of each distribution. To model the likelihood of both features together, you may again make the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_univariate_gaussian(x: float, mean: float, var: float) -> float:\n",
"    # TODO: compute the likelihood of x under a Gaussian with the given mean and variance\n",
"\n",
"    return ..."
]
},
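{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the univariate Gaussian density with mean $\\mu_k$ and variance $\\sigma_k^2$ is\n",
"\n",
"$$p(x \\mid C_k) = \\frac{1}{\\sqrt{2\\pi\\sigma_k^2}} \\exp\\left(-\\frac{(x-\\mu_k)^2}{2\\sigma_k^2}\\right)$$\n",
"\n",
"A direct transcription might look like the sketch below. The name `gaussian_pdf_sketch` is hypothetical, and the function assumes `var` is strictly positive.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def gaussian_pdf_sketch(x, mean, var):\n",
"    # Normalising constant 1 / sqrt(2 * pi * var)\n",
"    norm = 1.0 / math.sqrt(2.0 * math.pi * var)\n",
"    # Exponent -(x - mean)^2 / (2 * var)\n",
"    return norm * math.exp(-((x - mean) ** 2) / (2.0 * var))\n",
"\n",
"# The density is symmetric around the mean and largest at the mean\n",
"print(gaussian_pdf_sketch(0.0, 0.0, 1.0))\n",
"print(gaussian_pdf_sketch(1.0, 0.0, 1.0), gaussian_pdf_sketch(-1.0, 0.0, 1.0))\n",
"```\n",
"\n",
"Unlike a histogram, this density is never exactly 0, so no special handling is needed for values outside the training range."
]
},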
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute the mean and variance for each class and each feature (8 values)\n",
"\n"
]
},
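{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of how the eight statistics could be collected, using a tiny hypothetical array instead of the training data. `X_demo`, `y_demo` and the `stats` dictionary are illustrative names; with the notebook's data one would presumably pass `X_train.values` and `y_train`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical data: 4 samples, 2 features, labels 0 and 1\n",
"X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])\n",
"y_demo = np.array([0, 0, 1, 1])\n",
"\n",
"stats = {}\n",
"for c in (0, 1):\n",
"    # Boolean mask selects the rows that belong to class c\n",
"    rows = X_demo[y_demo == c]\n",
"    # axis=0 gives one mean and one variance per feature column\n",
"    stats[c] = {'mean': rows.mean(axis=0), 'var': rows.var(axis=0)}\n",
"\n",
"print(stats[0]['mean'], stats[0]['var'])\n",
"print(stats[1]['mean'], stats[1]['var'])\n",
"```\n",
"\n",
"Note that `var()` uses `ddof=0` by default, which is the maximum-likelihood estimate of the variance rather than the unbiased sample variance."
]
},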
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.7"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}