feat: added PW3

This commit is contained in:
gabriel.marinoja
2025-10-01 17:58:36 +02:00
parent 492686df00
commit b72020d0f7
7 changed files with 21240 additions and 0 deletions


@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 1 - Bayes classification system"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import some useful libraries\n",
"\n",
"import math\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import OrdinalEncoder, StandardScaler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1a. Getting started with Bayes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"a) Read the training data from file ex1-data-train.csv. The first two columns are x1 and x2. The last column holds the class label y."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def read_data(file):\n",
" dataset = pd.read_csv(file, names=['x1','x2','y'])\n",
" print(dataset.head())\n",
" return dataset[[\"x1\", \"x2\"]], dataset[\"y\"].values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train, y_train = read_data(\"ex1-data-train.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prepare a function to compute accuracy\n",
"def accuracy_score(y_true, y_pred):\n",
" return (y_true == y_pred).sum() / y_true.size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"b) Compute the priors of both classes P(C0) and P(C1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute the priors\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"c) Compute histograms of x1 and x2 for each class (total of 4 histograms). Plot these histograms. Advice : use the numpy `histogram(a, bins=\"auto\")` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute histograms\n",
"\n",
"\n",
"\n",
"# TODO: plot histograms\n",
"\n",
"plt.figure(figsize=(16,6))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 1')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"...\n",
"plt.xlabel('Likelihood hist - Exam 2')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"d) Use the histograms to compute the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). For this define a function `likelihood_hist(x, hist_values, edge_values)` that returns the likelihood of x for a given histogram (defined by its values and bin edges as returned by the numpy `histogram()` function)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_hist(x: float, hist_values: np.ndarray, bin_edges: np.ndarray) -> float:\n",
" # TODO: compute likelihoods from histograms outputs\n",
"\n",
" return ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"e) Implement the classification decision according to Bayes rule and compute the overall accuracy of the system on the test set ex1-data-test.csv. :\n",
"- using only feature x1\n",
"- using only feature x2\n",
"- using x1 and x2 making the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"X_test, y_test = read_data(\"ex1-data-test.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases described above\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which system is the best ?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TODO: answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1b. Bayes - Univariate Gaussian distribution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do the same as in a) but this time using univariate Gaussian distribution to model the likelihoods p(x1|C0), p(x1|C1), p(x2|C0) and p(x2|C1). You may use the numpy functions `mean()` and `var()` to compute the mean μ and variance σ2 of the distribution. To model the likelihood of both features, you may also do the naive Bayes hypothesis of feature independence, i.e. p(X|Ck) = p(x1|Ck) · p(x2|Ck).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"def likelihood_univariate_gaussian(x: float, mean: float, var: float) -> float:\n",
" # TODO: compute likelihoods from histograms outputs\n",
"\n",
" return ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": false
}
},
"outputs": [],
"source": [
"# TODO: Compute mean and variance for each classes and each features (8 values)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO: predict on test set in the 3 cases\n",
"\n",
"y_pred = []\n",
"\n",
"...\n",
"\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.7"
},
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}
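
A minimal sketch of parts 1a d) and e) of the notebook above, under stated assumptions rather than a reference solution. It assumes the histograms are built with `np.histogram(..., density=True)` so that bin values are densities, and that a value outside the binned range gets likelihood 0. The helper `predict_naive_bayes_hist` and the toy arrays are illustrative names and made-up data, not part of the exercise files.

```python
import numpy as np


def likelihood_hist(x, hist_values, bin_edges):
    """Return the histogram density estimate at x (0.0 outside the bins)."""
    # Assumption: hist_values comes from np.histogram(..., density=True),
    # so each value is already a density rather than a raw count.
    if x < bin_edges[0] or x > bin_edges[-1]:
        return 0.0
    # searchsorted(side="right") - 1 gives the bin whose left edge is <= x.
    idx = np.searchsorted(bin_edges, x, side="right") - 1
    # The last bin is closed on the right, so clamp x == bin_edges[-1].
    idx = min(idx, len(hist_values) - 1)
    return float(hist_values[idx])


def predict_naive_bayes_hist(X, hists, priors):
    """Pick, per row of X, the class maximising prior * product of likelihoods.

    hists[k][j] is the (hist_values, bin_edges) pair of feature j for class k.
    """
    preds = []
    for row in np.asarray(X, dtype=float):
        scores = []
        for k, prior in enumerate(priors):
            score = prior
            for j, x in enumerate(row):
                score *= likelihood_hist(x, *hists[k][j])
            scores.append(score)
        preds.append(int(np.argmax(scores)))
    return np.array(preds)


# Tiny synthetic example (made-up numbers, not the exercise data).
X_toy = np.array(
    [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Priors are the class frequencies in the training labels.
priors = [float(np.mean(y_toy == k)) for k in (0, 1)]
# One histogram per (class, feature) pair: 4 histograms in total.
hists = [
    [np.histogram(X_toy[y_toy == k, j], bins="auto", density=True) for j in range(2)]
    for k in (0, 1)
]
y_pred_toy = predict_naive_bayes_hist(X_toy, hists, priors)
```

Restricting `row` to a single feature gives the x1-only and x2-only variants. Returning 0 outside the histogram range is one design choice among several; a small floor value would avoid a zero product when a test point falls in an empty bin.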

100
PW-3/ex1/ex1-data-test.csv Normal file

@@ -0,0 +1,100 @@
39.1963341568658,78.53029405902203,0
40.448499233673424,86.83946993295656,1
65.57192032694599,44.303496565835594,0
79.64811329486565,70.8065641864705,1
66.26022052135889,41.67270317074954,0
97.6637443782087,68.3249232452966,1
30.548823788843436,57.31847952965393,0
89.47322095778219,85.94680780258534,1
50.93087801180052,34.2357678392285,0
39.79292275937423,83.42467462939659,1
47.45440952767612,43.40242137611206,0
69.97497171303611,84.4084067760751,1
66.57906119077748,42.13570922437346,0
85.05872976046471,54.31025004023918,1
66.50445545099684,46.515380367647104,0
75.67274744410004,93.79012528285647,1
30.589637766842877,71.58841488039977,0
43.2174833244174,83.55961536494472,1
58.04023606927604,39.47235992846592,0
40.15801957067056,94.28873609786281,1
65.40785754453304,39.872039582416946,0
58.25386824923051,64.96454852577446,1
90.05150698066501,34.03096751205591,0
72.24873848000416,90.1077757094509,1
32.732305095404456,98.49269418173134,0
74.06410532697512,66.96252809184301,1
30.074888412046263,56.513104954256875,0
87.57197590933474,68.15013081653733,1
54.562040422189284,49.542441977062865,0
78.30902280632358,72.23271250670665,1
57.870305028845,48.514216465966285,0
91.35751201085463,85.6201641726489,1
32.89942225933118,68.89835152862396,0
75.96271751468554,73.37079167632794,1
49.73784613458287,59.13494209712587,0
73.5544567377702,66.04140381033584,1
34.20510941997501,72.62513617755425,0
54.49230689236608,75.50968920375037,1
48.50711697988822,47.74600670205531,0
92.3876668476141,76.82950398511272,1
39.89720264828788,62.09872615693186,0
75.76883065897587,43.6375457580161,1
32.938859931422954,75.6959591164835,0
44.53335294213268,86.44202248365731,1
51.265631719309845,60.12130845234037,0
70.78776945843022,84.2462083261098,1
28.94644639193278,39.599160546805116,0
47.53708530844937,73.62887169594207,1
49.02408652102979,48.50397486087145,0
78.37067490088779,93.91476948225585,1
48.806979396137145,62.206605350437144,0
72.03919354554785,88.5636216577281,1
31.23633606784064,96.30534895479137,0
51.56156298671939,89.15548481990747,1
65.08996501958059,39.488228986986606,0
81.75983894249494,47.952028645978714,1
46.466982795222684,43.17493123886225,0
64.49601863360589,82.20819682836424,1
65.59947425235588,42.79658543523777,0
50.66778894002708,64.22662181783375,1
30.665280235026138,42.70685221873931,0
76.60228200416394,65.62163965042933,1
60.39824874786827,38.54265995207925,0
80.7498890348191,47.942468664004934,1
81.83730756343084,39.62946723071423,0
76.67188156208798,73.0039571691345,1
31.702591304883626,73.4485451232566,0
89.75853252236888,65.1794033434368,1
31.111272744640324,77.90680809560692,0
56.360076920020845,68.81541270666031,1
47.365528695867354,59.268265092300844,0
81.99701278469126,55.477765254828924,1
73.19627144242138,28.399910031060564,0
50.28593379220375,85.68597173591368,1
30.532888808836397,77.17395841411421,0
66.62736064332904,65.14099834530835,1
30.563843972698294,44.15958836055778,0
69.30483520344725,90.15732087213348,1
40.63104177166124,61.47155968946135,0
67.51887729702649,76.70896125160789,1
33.6944962783859,43.961979616998335,0
54.61941030575024,73.60040410454849,1
29.956247697479498,91.60028497230863,0
59.56176709683286,81.89054923262506,1
29.097516205452173,92.0159604576793,0
87.75444054660184,65.2841177353011,1
79.14696413604753,40.118482227299694,0
74.48492746059782,92.34246943037195,1
26.332352061636747,44.9551699040027,0
54.346942016509146,58.43293962287077,1
29.947060203169244,93.06082834209418,0
96.32633710641187,64.80350360838675,1
29.864465690194475,73.11550264372423,0
62.2263271267271,57.84956855286749,1
35.2611254453108,72.85531587549292,0
47.340681257438895,69.41232032562911,1
63.19534209968015,36.963350930620166,0
59.46464897992196,72.40245846384263,1
60.08389682243888,42.48638233127113,0
57.45295498601704,73.67928309399463,1

100
PW-3/ex1/ex1-data-train.csv Normal file

@@ -0,0 +1,100 @@
34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
38.78580379679423,64.99568095539578,0
61.379289447425,72.80788731317097,1
85.40451939411645,57.05198397627122,1
52.10797973193984,63.12762376881715,0
52.04540476831827,69.43286012045222,1
40.23689373545111,71.16774802184875,0
54.63510555424817,52.21388588061123,0
33.91550010906887,98.86943574220611,0
64.17698887494485,80.90806058670817,1
74.78925295941542,41.57341522824434,0
34.1836400264419,75.2377203360134,0
83.90239366249155,56.30804621605327,1
51.54772026906181,46.85629026349976,0
94.44336776917852,65.56892160559052,1
82.36875375713919,40.61825515970618,0
51.04775177128865,45.82270145776001,0
62.22267576120188,52.06099194836679,0
77.19303492601364,70.45820000180959,1
97.77159928000232,86.7278223300282,1
62.07306379667647,96.76882412413983,1
91.56497449807442,88.69629254546599,1
79.94481794066932,74.16311935043758,1
99.2725269292572,60.99903099844988,1
90.54671411399852,43.39060180650027,1
34.52451385320009,60.39634245837173,0
50.2864961189907,49.80453881323059,0
49.58667721632031,59.80895099453265,0
97.64563396007767,68.86157272420604,1
32.57720016809309,95.59854761387875,0
74.24869136721598,69.82457122657193,1
71.79646205863379,78.45356224515052,1
75.3956114656803,85.75993667331619,1
35.28611281526193,47.02051394723416,0
56.25381749711624,39.26147251058019,0
30.05882244669796,49.59297386723685,0
44.66826172480893,66.45008614558913,0
66.56089447242954,41.09209807936973,0
40.45755098375164,97.53518548909936,1
49.07256321908844,51.88321182073966,0
80.27957401466998,92.11606081344084,1
66.74671856944039,60.99139402740988,1
32.72283304060323,43.30717306430063,0
64.0393204150601,78.03168802018232,1
72.34649422579923,96.22759296761404,1
60.45788573918959,73.09499809758037,1
58.84095621726802,75.85844831279042,1
99.82785779692128,72.36925193383885,1
47.26426910848174,88.47586499559782,1
50.45815980285988,75.80985952982456,1
60.45555629271532,42.50840943572217,0
82.22666157785568,42.71987853716458,0
88.9138964166533,69.80378889835472,1
94.83450672430196,45.69430680250754,1
67.31925746917527,66.58935317747915,1
57.23870631569862,59.51428198012956,1
80.36675600171273,90.96014789746954,1
68.46852178591112,85.59430710452014,1
42.0754545384731,78.84478600148043,0
75.47770200533905,90.42453899753964,1
78.63542434898018,96.64742716885644,1
52.34800398794107,60.76950525602592,0
94.09433112516793,77.15910509073893,1
90.44855097096364,87.50879176484702,1
55.48216114069585,35.57070347228866,0
74.49269241843041,84.84513684930135,1
89.84580670720979,45.35828361091658,1
83.48916274498238,48.38028579728175,1
42.2617008099817,87.10385094025457,1
99.31500880510394,68.77540947206617,1
55.34001756003703,64.9319380069486,1
74.77589300092767,89.52981289513276,1
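
For part 1b of the Bayes notebook, the Gaussian variant could be sketched as follows. This is an illustration under assumptions, not the expected solution: `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical helper names, the toy arrays are made-up data, and variances are assumed to be strictly positive.

```python
import math

import numpy as np


def likelihood_univariate_gaussian(x, mean, var):
    """Density of N(mean, var) evaluated at x; var is assumed to be > 0."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gaussian_nb(X, y, classes=(0, 1)):
    """Estimate, per class, the prior and the per-feature mean and variance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    params = {}
    for k in classes:
        Xk = X[y == k]
        # np.var defaults to the population variance (ddof=0).
        params[k] = (len(Xk) / len(X), Xk.mean(axis=0), Xk.var(axis=0))
    return params


def predict_gaussian_nb(X, params):
    """Return, per row, the class maximising prior * product of likelihoods."""
    classes = list(params)
    preds = []
    for row in np.asarray(X, dtype=float):
        scores = []
        for k in classes:
            prior, means, variances = params[k]
            score = prior
            # Naive Bayes hypothesis: features are independent given the class.
            for x, m, v in zip(row, means, variances):
                score *= likelihood_univariate_gaussian(x, m, v)
            scores.append(score)
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)


# Tiny synthetic example (made-up numbers, not the exercise data).
X_toy = np.array(
    [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
)
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_toy, y_toy)
y_pred_toy = predict_gaussian_nb(X_toy, params)
```

With many features, summing log-likelihoods instead of multiplying densities is the usual way to avoid numerical underflow; with two features the plain product is normally adequate.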


@@ -0,0 +1,535 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bcf79585",
"metadata": {},
"source": [
"# Exercice 2 - System evaluation"
]
},
{
"cell_type": "markdown",
"id": "f642cedb",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9421a4e1",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "a0d67fa6",
"metadata": {},
"source": [
"## Load data"
]
},
{
"cell_type": "markdown",
"id": "5fe90672",
"metadata": {},
"source": [
"Define the path of the data file"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ecd4a4cf",
"metadata": {},
"outputs": [],
"source": [
"path = \"ex2-system-a.csv\""
]
},
{
"cell_type": "markdown",
"id": "246e7392",
"metadata": {},
"source": [
"Read the CSV file using `read_csv`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "623096a5",
"metadata": {},
"outputs": [],
"source": [
"dataset_a = pd.read_csv(path, sep=\";\", index_col=False, names=[\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\", \"y_true\"])"
]
},
{
"cell_type": "markdown",
"id": "6f764c56",
"metadata": {},
"source": [
"Display first rows"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c59a1651",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" <th>8</th>\n",
" <th>9</th>\n",
" <th>y_true</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.348450e-08</td>\n",
" <td>7.493480e-10</td>\n",
" <td>8.083470e-07</td>\n",
" <td>2.082290e-05</td>\n",
" <td>5.222360e-10</td>\n",
" <td>2.330260e-08</td>\n",
" <td>5.241270e-12</td>\n",
" <td>9.999650e-01</td>\n",
" <td>4.808590e-07</td>\n",
" <td>0.000013</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.334270e-03</td>\n",
" <td>3.202960e-05</td>\n",
" <td>8.504280e-01</td>\n",
" <td>1.669090e-03</td>\n",
" <td>1.546460e-07</td>\n",
" <td>2.412940e-04</td>\n",
" <td>1.448280e-01</td>\n",
" <td>1.122810e-11</td>\n",
" <td>1.456330e-03</td>\n",
" <td>0.000011</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.643050e-06</td>\n",
" <td>9.962760e-01</td>\n",
" <td>2.045910e-03</td>\n",
" <td>4.210530e-04</td>\n",
" <td>2.194020e-05</td>\n",
" <td>1.644130e-05</td>\n",
" <td>2.838160e-04</td>\n",
" <td>3.722960e-04</td>\n",
" <td>5.150120e-04</td>\n",
" <td>0.000044</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>9.998200e-01</td>\n",
" <td>2.550390e-10</td>\n",
" <td>1.112010e-05</td>\n",
" <td>1.653200e-05</td>\n",
" <td>5.375730e-10</td>\n",
" <td>8.999750e-05</td>\n",
" <td>9.380920e-06</td>\n",
" <td>4.464470e-05</td>\n",
" <td>2.418440e-06</td>\n",
" <td>0.000006</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2.092460e-08</td>\n",
" <td>7.464220e-08</td>\n",
" <td>3.560820e-05</td>\n",
" <td>5.496200e-07</td>\n",
" <td>9.988960e-01</td>\n",
" <td>3.070920e-08</td>\n",
" <td>2.346150e-04</td>\n",
" <td>9.748010e-07</td>\n",
" <td>1.071610e-06</td>\n",
" <td>0.000831</td>\n",
" <td>4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4 \\\n",
"0 5.348450e-08 7.493480e-10 8.083470e-07 2.082290e-05 5.222360e-10 \n",
"1 1.334270e-03 3.202960e-05 8.504280e-01 1.669090e-03 1.546460e-07 \n",
"2 3.643050e-06 9.962760e-01 2.045910e-03 4.210530e-04 2.194020e-05 \n",
"3 9.998200e-01 2.550390e-10 1.112010e-05 1.653200e-05 5.375730e-10 \n",
"4 2.092460e-08 7.464220e-08 3.560820e-05 5.496200e-07 9.988960e-01 \n",
"\n",
" 5 6 7 8 9 y_true \n",
"0 2.330260e-08 5.241270e-12 9.999650e-01 4.808590e-07 0.000013 7 \n",
"1 2.412940e-04 1.448280e-01 1.122810e-11 1.456330e-03 0.000011 2 \n",
"2 1.644130e-05 2.838160e-04 3.722960e-04 5.150120e-04 0.000044 1 \n",
"3 8.999750e-05 9.380920e-06 4.464470e-05 2.418440e-06 0.000006 0 \n",
"4 3.070920e-08 2.346150e-04 9.748010e-07 1.071610e-06 0.000831 4 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset_a.head()"
]
},
{
"cell_type": "markdown",
"id": "41f040b0",
"metadata": {},
"source": [
"Store some useful statistics (class names + number of classes)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fd0adce4",
"metadata": {},
"outputs": [],
"source": [
"class_names = [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"]\n",
"nb_classes = len(class_names)"
]
},
{
"cell_type": "markdown",
"id": "5a0ab85a",
"metadata": {},
"source": [
"## Exercise's steps"
]
},
{
"cell_type": "markdown",
"id": "66ae582e",
"metadata": {},
"source": [
"a) Write a function to take classification decisions on such outputs according to Bayesrule."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3c36b377",
"metadata": {},
"outputs": [],
"source": [
"def bayes_classification(df):\n",
" \"\"\"\n",
" Take classification decisions according to Bayes rule.\n",
" \n",
" Parameters\n",
" ----------\n",
" df : Pandas DataFrame of shape (n_samples, n_features + ground truth)\n",
" Dataset.\n",
" \n",
" Returns\n",
" -------\n",
" preds : Numpy array of shape (n_samples,)\n",
" Class labels for each data sample.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "markdown",
"id": "b5e8140b",
"metadata": {},
"source": [
"b) What is the overall error rate of the system ?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f3b21bfb",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: compute and print the error rate of the system"
]
},
{
"cell_type": "markdown",
"id": "a4f0fa5f",
"metadata": {},
"source": [
"c) Compute and report the confusion matrix of the system."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "bb106415",
"metadata": {},
"outputs": [],
"source": [
"def confusion_matrix(y_true, y_pred, n_classes):\n",
" \"\"\"\n",
" Compute the confusion matrix.\n",
" \n",
" Parameters\n",
" ----------\n",
" y_true : Numpy array of shape (n_samples,)\n",
" Ground truth.\n",
" y_pred : Numpy array of shape (n_samples,)\n",
" Predictions.\n",
" n_classes : Integer\n",
" Number of classes.\n",
" \n",
" Returns\n",
" -------\n",
" cm : Numpy array of shape (n_classes, n_classes)\n",
" Confusion matrix.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1b38e3a8",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: compute and print the confusion matrix"
]
},
{
"cell_type": "markdown",
"id": "ed8db908",
"metadata": {},
"source": [
"d) What are the worst and best classes in terms of precision and recall ?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0e229ce0",
"metadata": {},
"outputs": [],
"source": [
"def precision_per_class(cm):\n",
" \"\"\"\n",
" Compute the precision per class.\n",
" \n",
" Parameters\n",
" ----------\n",
" cm : Numpy array of shape (n_classes, n_classes)\n",
" Confusion matrix.\n",
" \n",
" Returns\n",
" -------\n",
" precisions : Numpy array of shape (n_classes,)\n",
" Precision per class.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "95325772",
"metadata": {},
"outputs": [],
"source": [
"def recall_per_class(cm):\n",
" \"\"\"\n",
" Compute the recall per class.\n",
" \n",
" Parameters\n",
" ----------\n",
" cm : Numpy array of shape (n_classes, n_classes)\n",
" Confusion matrix.\n",
" \n",
" Returns\n",
" -------\n",
" recalls : Numpy array of shape (n_classes,)\n",
" Recall per class.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "a0fb19e3",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: find and print the worst and best classes in terms of precision"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "42c3edd8",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: find and print the worst and best classes in terms of recall"
]
},
{
"cell_type": "markdown",
"id": "7ac6fe5d",
"metadata": {},
"source": [
"e) In file `ex1-system-b.csv` you find the output of a second system B. What is the best system between (a) and (b) in terms of error rate and F1."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b98c2545",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: load the data of the system B"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "050091b9",
"metadata": {},
"outputs": [],
"source": [
"def system_accuracy(cm):\n",
" \"\"\"\n",
" Compute the system accuracy.\n",
" \n",
" Parameters\n",
" ----------\n",
" cm : Numpy array of shape (n_classes, n_classes)\n",
" Confusion matrix.\n",
" \n",
" Returns\n",
" -------\n",
" accuracy : Float\n",
" Accuracy of the system.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "adc0f138",
"metadata": {},
"outputs": [],
"source": [
"def system_f1_score(cm):\n",
" \"\"\"\n",
" Compute the system F1 score.\n",
" \n",
" Parameters\n",
" ----------\n",
" cm : Numpy array of shape (n_classes, n_classes)\n",
" Confusion matrix.\n",
" \n",
" Returns\n",
" -------\n",
" f1_score : Float\n",
" F1 score of the system.\n",
" \"\"\"\n",
" # Your code here\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f1385c87",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: compute and print the accuracy and the F1 score of the system A"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "50c64d08",
"metadata": {},
"outputs": [],
"source": [
"# Your code here: compute and print the accuracy and the F1 score of the system B"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

10000
PW-3/ex2/ex2-system-a.csv Normal file

File diff suppressed because it is too large

10000
PW-3/ex2/ex2-system-b.csv Normal file

File diff suppressed because it is too large

@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ad0d40d6",
"metadata": {},
"source": [
"# Exercice 3 - Review questions"
]
},
{
"cell_type": "markdown",
"id": "3e556a9d",
"metadata": {},
"source": [
"**a) Assuming an univariate input *x*, what is the complexity at inference time of a Bayesian classifier based on histogram computation of the likelihood ?**"
]
},
{
"cell_type": "markdown",
"id": "8d2fb7ef",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "markdown",
"id": "99632770",
"metadata": {},
"source": [
"**b) Bayesian models are said to be generative as they can be used to generate new samples. Taking the implementation of the exercise 1.a, explain the steps to generate new samples using the system you have put into place.**\n",
" "
]
},
{
"cell_type": "markdown",
"id": "88ab64b2",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "markdown",
"id": "e2f611fe",
"metadata": {},
"source": [
"***Optional*: Provide an implementation in a function generateSample(priors, histValues, edgeValues, n)**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "14aba0f7",
"metadata": {},
"outputs": [],
"source": [
"pass"
]
},
{
"cell_type": "markdown",
"id": "ed8c4f6b",
"metadata": {},
"source": [
"**c) What is the minimum overall accuracy of a 2-class system relying only on priors and that is built on a training set that includes 5 times more samples in class A than in class B?**"
]
},
{
"cell_type": "markdown",
"id": "4bb03365",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "markdown",
"id": "58450ff6",
"metadata": {},
"source": [
"**d) Lets look back at the PW02 exercise 3 of last week. We have built a knn classification systems for images of digits on the MNIST database.**\n",
"\n",
"**How would you build a Bayesian classification for the same task ? Comment on the prior probabilities and on the likelihood estimators. More specifically, what kind of likelihood estimator could we use in this case ?**"
]
},
{
"cell_type": "markdown",
"id": "d2bf1500",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "markdown",
"id": "a3ca9715",
"metadata": {},
"source": [
"***Optional:* implement it and report performance !**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4de72736",
"metadata": {},
"outputs": [],
"source": [
"pass"
]
},
{
"cell_type": "markdown",
"id": "b812b46f",
"metadata": {},
"source": [
"**e) Read [europe-border-control-ai-lie-detector](https://theintercept.com/2019/07/26/europe-border-control-ai-lie-detector/). The described system is \"a virtual policeman designed to strengthen European borders\". It can be seen as a 2-class problem, either you are a suspicious traveler or you are not. If you are declared as suspicious by the system, you are routed to a human border agent who analyses your case in a more careful way.**\n",
"\n",
"1. What kind of errors can the system make ? Explain them in your own words.\n",
"2. Is one error more critical than the other ? Explain why.\n",
"3. According to the previous points, which metric would you recommend to tune your MLsystem ?"
]
},
{
"cell_type": "markdown",
"id": "1adf1760",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "markdown",
"id": "195a1f73-c0f7-4707-9551-c71bfa379960",
"metadata": {},
"source": [
"**f) When a deep learning architecture is trained using an unbalanced training set, we usually observe a problem of bias, i.e. the system favors one class over another one. Using the Bayes equation, explain what is the origin of the problem.**"
]
},
{
"cell_type": "markdown",
"id": "fa5ffd45-0645-4093-9a1b-0a7aeaeece0e",
"metadata": {},
"source": [
"TODO"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}