Overview
You will have a project component to the course.
Project 1: Probability, Data, and Conditional Reasoning
Due Date: Wednesday, February 18th, 11:59PM
Get help early!
This project is meant to be a straightforward extension of what you are learning in class and recitation, but it still may be challenging for you! Make sure you’re getting started early and looking for help!
Mohi’s office hours
Feel free to attend Kenny’s office hours, listed on the syllabus, or Mohi’s, which are from 2-3PM Mondays in the first floor atrium of Davis Hall (Mohi will find a table and post up!)
This is an individual assignment. Please ensure you follow the rules of Academic Integrity established by UB and in the course!
This project is designed to introduce you to working with data in Python using basic Python features (such as lists, dictionaries, loops, and functions) and NumPy, while reinforcing key ideas from Chapters 1 and 2 of the course, including sets, counting, probability, conditional probability, and Bayes’ rule.
You will complete the project in three parts. Each part builds toward using real data to reason about uncertainty in a meaningful real-world context. You can check this link for help.
Part 1: Sets, Counting, and Probability (30 points)
In this part, you will practice basic Python operations and use code to compute quantities that would be difficult or tedious to calculate by hand.
- Using Python sets, define at least two sets (can be defined using any random functions in Python or manually defined by you) and compute:
  - their union
  - their intersection
  - their difference
  Briefly explain what each operation represents. (10 points)
- Consider a simple probability scenario (for example, drawing cards from a deck, rolling dice, or another discrete experiment).
  - Use Python to compute the probability of an event of interest. Your code should be able to receive inputs and print the proper output.
  - Explain why using code is helpful for this calculation.
  (20 points)
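As a rough illustration of the two tasks above, the sketch below shows the three set operations and a probability computed by counting outcomes. The set contents, the dice scenario, and the function name `prob_sum_equals` are hypothetical choices made for this example, not part of the assignment.

```python
from fractions import Fraction
from itertools import product

# Two manually defined sets (hypothetical example values).
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print("Union:       ", A | B)   # elements in A or in B (or both)
print("Intersection:", A & B)   # elements in both A and B
print("Difference:  ", A - B)   # elements in A but not in B


def prob_sum_equals(target, num_dice=2, sides=6):
    """Probability that the faces of `num_dice` fair dice sum to `target`."""
    # Enumerate every equally likely outcome, then count the favorable ones.
    outcomes = list(product(range(1, sides + 1), repeat=num_dice))
    favorable = [o for o in outcomes if sum(o) == target]
    return Fraction(len(favorable), len(outcomes))


print("P(sum of two dice = 7) =", prob_sum_equals(7))
```

Taking the target and the number of dice as function arguments is one way to let the code receive inputs; counting by enumeration quickly becomes tedious by hand as the number of dice grows.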
Part 2: Conditional Probability (30 points)
In this part, you will use data visualization to explore conditional probabilities.
- Create a data structure to represent a small dataset using lists or NumPy arrays (for example, a list of lists or a 2D NumPy array where rows represent observations and columns represent variables). (5 points)
- Use basic Python loops or NumPy operations to compute summary statistics (such as counts or frequencies) for your data and print them in a readable format (e.g., as a text-based table). (10 points)
- Compute one conditional probability of your own choice and explain what it means. (10 points)
- Compute one conditional probability of your own choice and explain what it means. (5 points)
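A minimal sketch of these steps, assuming a made-up dataset with two 0/1 columns (`studied`, `passed`). The data, the column meanings, and the helper name `conditional_probability` are hypothetical and only meant to show the counting idea.

```python
# Hypothetical toy dataset: each row is [studied, passed], coded as 0/1.
data = [
    [1, 1],
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0],
    [0, 0],
    [1, 1],
    [0, 0],
]


def conditional_probability(rows, event_col, given_col):
    """Estimate P(event = 1 | given = 1) from 0/1 data by counting rows."""
    given_rows = [r for r in rows if r[given_col] == 1]
    if not given_rows:
        raise ValueError("the conditioning event never occurs in the data")
    both = sum(1 for r in given_rows if r[event_col] == 1)
    return both / len(given_rows)


# Frequency of each (studied, passed) combination.
counts = {}
for studied, passed in data:
    counts[(studied, passed)] = counts.get((studied, passed), 0) + 1

# Print the counts as a small text table.
print("studied  passed  count")
for (studied, passed), n in sorted(counts.items()):
    print(f"{studied:>7}  {passed:>6}  {n:>5}")

p = conditional_probability(data, event_col=1, given_col=0)
print(f"P(passed | studied) = {p:.2f}")
```

Here the conditional probability is the fraction of "studied" rows that also "passed", which restricts the sample space to the conditioning event before counting.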
Part 3: Bayes’ Rule and Real-World Data (40 points)
The dataset for this project concerns COVID-19 symptoms and test results.
You can download the CSV file from here.
Each row represents an individual, and columns indicate whether that person tested positive for COVID-19 and whether they experienced certain symptoms, as well as Gender, Age, and City.
Use Python’s built-in csv module to read the file into a list of lists or dictionaries, then convert relevant parts to NumPy arrays for computations if needed. Avoid using any external libraries beyond NumPy for data processing.
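One possible way to read the file with the built-in `csv` module is sketched below. The in-memory sample and its column names (`Age`, `City`, `Fever`, `TestResult`) are hypothetical stand-ins; the real file's headers may differ, so check them before relying on any key.

```python
import csv
import io

# Stand-in for the downloaded file; these column names are hypothetical.
sample_csv = io.StringIO(
    "Age,City,Fever,TestResult\n"
    "34,Buffalo,Yes,Positive\n"
    "51,Albany,No,Negative\n"
    "27,Buffalo,Yes,Negative\n"
)


def read_rows(file_obj):
    """Read an open CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(file_obj))


rows = read_rows(sample_csv)
print(len(rows), "rows; columns:", list(rows[0].keys()))

# With a file on disk, the same helper could be used as:
#     with open("your_file.csv", newline="") as f:
#         rows = read_rows(f)
```

Every value comes back as a string, so numeric or yes/no columns need to be converted before any counting or NumPy work.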
- Compute the probability of a positive COVID-19 test in this dataset. (10 points)
- Compute at least two conditional probabilities involving symptoms (for example, the probability of a symptom given a positive test, or the probability of a positive test given a symptom). (10 points)
- Use Bayes’ rule to compute other conditional probabilities indirectly. Use your previous examples. (10 points)
- Interpret your result in plain language. What does this probability mean, and why is Bayes’ rule useful in this context? (10 points)
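The sketch below illustrates how Bayes' rule can recover one conditional probability from the other, using a small made-up list of records rather than the course dataset. The records and variable names are hypothetical.

```python
# Hypothetical records: (has_symptom, tested_positive) as booleans.
records = [
    (True, True), (True, True), (True, False), (False, True),
    (False, False), (False, False), (True, True), (False, False),
    (False, False), (True, False),
]

n = len(records)
n_pos = sum(1 for s, t in records if t)
n_sym = sum(1 for s, t in records if s)
n_both = sum(1 for s, t in records if s and t)

p_pos = n_pos / n                    # P(Positive)
p_sym = n_sym / n                    # P(Symptom)
p_sym_given_pos = n_both / n_pos     # P(Symptom | Positive)

# Bayes' rule: P(Positive | Symptom) = P(Symptom | Positive) P(Positive) / P(Symptom)
p_pos_given_sym = p_sym_given_pos * p_pos / p_sym

# Direct count of the same quantity, as a sanity check.
direct = n_both / n_sym

print(f"Bayes:  P(Positive | Symptom) = {p_pos_given_sym:.3f}")
print(f"Direct: P(Positive | Symptom) = {direct:.3f}")
```

Both routes should agree up to floating-point rounding; Bayes' rule is most useful when only one direction of the conditional is easy to measure.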
Submission Instructions
Submit a single Jupyter notebook that includes:
- All code used in the project
- Written explanations in Markdown cells
- Deadline: Feb 18th
- Upload location: UBLearns → Course Page → Assignments → Project 1
Clarity of explanation and correctness of reasoning are more important than code complexity.
Project 2: Matrices and Vectors
Matrix Vector Multiplication
Please read the two provided scenarios carefully and then proceed to the tasks.
You should come up with a way to INPUT the matrices and vectors for whatever scenario you choose. Please don’t reuse the student examples; you can use service ratings, warehouse storage, or another scenario.
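One possible input approach, sketched under the assumption that matrices are typed as text with rows separated by `;` and entries separated by spaces. This text format and the helper names `parse_matrix` and `parse_vector` are hypothetical; any clear input method would do.

```python
def parse_matrix(text):
    """Parse text like "1 6; 2 5" into a list of rows of floats.

    Rows are separated by ';' and entries by whitespace (a hypothetical format).
    """
    rows = [[float(v) for v in row.split()] for row in text.split(";") if row.strip()]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same number of entries")
    return rows


def parse_vector(text):
    """Parse text like "2 1" into a flat list of floats."""
    return [float(v) for v in text.split()]


# In a notebook, the strings could come from input(), e.g. parse_matrix(input("X: ")).
X = parse_matrix("1 6; 2 5; 3 4")
w = parse_vector("2 1")
print(X)
print(w)
```

Validating that every row has the same length catches the most common typing mistake before any multiplication is attempted.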
Scenario
A school wants a simple scoring rule to estimate whether a student is likely to pass a quiz.
Each student has two inputs:
- $x_1$ = hours studied
- $x_2$ = hours of sleep
The scoring rule uses a weight vector:
\[\mathbf{w} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}\]
This means studying counts twice as much as sleep. The score for one student is:
\[s = \mathbf{x} \cdot \mathbf{w} = 2x_1 + 1x_2\]
Given Data
The input data for 6 students is stored in the matrix X, where each row is one student:
\[X = \begin{bmatrix} 1 & 6 \\ 2 & 5 \\ 3 & 4 \\ 4 & 3 \\ 5 & 2 \\ 6 & 1 \end{bmatrix}\]
Compute the score vector:
\[\mathbf{s} = X\mathbf{w}\]
Sample Answer
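Given
\[X = \begin{bmatrix} 1 & 6 \\ 2 & 5 \\ 3 & 4 \\ 4 & 3 \\ 5 & 2 \\ 6 & 1 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}\]
Compute
\[\mathbf{s} = X\mathbf{w}\]
by multiplying each row of $X$ by $\mathbf{w}$:
\[\mathbf{s} = \begin{bmatrix} (1)(2) + (6)(1) \\ (2)(2) + (5)(1) \\ (3)(2) + (4)(1) \\ (4)(2) + (3)(1) \\ (5)(2) + (2)(1) \\ (6)(2) + (1)(1) \end{bmatrix} = \begin{bmatrix} 8 \\ 9 \\ 10 \\ 11 \\ 12 \\ 13 \end{bmatrix}\]
If the passing rule is $s \geq 10$, then the predicted label vector is:
\[\hat{\mathbf{y}} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}\]
because $8 < 10$, $9 < 10$, and $10, 11, 12, 13 \geq 10$.
If we change the weight vector to:
\[\mathbf{w}' = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\]
then:
\[\mathbf{s}' = X\mathbf{w}' = \begin{bmatrix} (1)(1) + (6)(2) \\ (2)(1) + (5)(2) \\ (3)(1) + (4)(2) \\ (4)(1) + (3)(2) \\ (5)(1) + (2)(2) \\ (6)(1) + (1)(2) \end{bmatrix} = \begin{bmatrix} 13 \\ 12 \\ 11 \\ 10 \\ 9 \\ 8 \end{bmatrix}\]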
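The sample answer above can be checked with NumPy. This is a small sketch using the `@` operator for the matrix-vector product; the variable names are chosen for this example.

```python
import numpy as np

# Data and weight vectors from the sample answer.
X = np.array([[1, 6], [2, 5], [3, 4], [4, 3], [5, 2], [6, 1]])
w = np.array([2, 1])
w_prime = np.array([1, 2])

s = X @ w                        # one score per student (row of X)
y_hat = (s >= 10).astype(int)    # passing rule: s >= 10
s_prime = X @ w_prime            # scores under the alternative weights

print("s     =", s)
print("y_hat =", y_hat)
print("s'    =", s_prime)
```

Swapping the weights reverses the ranking here because the two features in this dataset always sum to 7, so students with more study hours have fewer sleep hours.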
Tasks
- Come up with a scenario with 3 inputs.
- Compute the score for each desired case using matrix vector multiplication.
- List the scores as a column vector.
- Define a passing rule and check if that happens for each case.
- Answer these questions:
  - (a) Which cases are predicted to pass?
  - (b) How many cases are predicted to pass?
  - (c) Which case has the highest score?
  - (d) What happens to all scores if we change the weight vector to:
\[\mathbf{w} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\]
  Explain what this means in words.
Expected Output Format
Students should print results in a clear format such as:
- Case number
- Input vector [x1, x2, x3]
- Score s
- Prediction y_hat
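A possible way to print results in this format is sketched below. The 3-input data, the weights, the threshold, and the helper name `format_results` are hypothetical placeholders, not a required layout.

```python
def format_results(X, w, threshold):
    """Return one formatted line per case: number, inputs, score, prediction."""
    lines = []
    for i, x in enumerate(X, start=1):
        s = sum(xi * wi for xi, wi in zip(x, w))   # dot product of row and weights
        y_hat = 1 if s >= threshold else 0         # passing rule: s >= threshold
        lines.append(f"Case {i}: x = {x}, s = {s}, y_hat = {y_hat}")
    return lines


# Hypothetical 3-input data, weights, and threshold, chosen only for illustration.
X = [[1, 2, 3], [4, 0, 1], [2, 2, 2]]
w = [1, 2, 3]
for line in format_results(X, w, threshold=10):
    print(line)
```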
Two Matrix Multiplication
Scenario
A teacher has two raw grades for each student:
- Homework
- Quizzes
The teacher wants to convert these raw grades into two new scores that will appear on the report card:
- Knowledge Score
- Participation Score
The idea is:
- Knowledge Score should depend more on Quizzes.
- Participation Score should depend more on Homework.
Given Data
The raw grade matrix for 4 students is:
\[X = \begin{bmatrix} 9 & 7 \\ 6 & 8 \\ 10 & 5 \\ 7 & 6 \end{bmatrix}\]
Weight Matrix
The teacher chooses a weight matrix W that turns HW, Quiz into Knowledge, Participation:
\[W = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}\]
This means:
- Knowledge Score = 1·HW + 3·Quiz
- Participation Score = 2·HW + 1·Quiz
Compute the report card scores using:
\[Y = XW\]
Answer
\[Y = XW = \begin{bmatrix} 9 & 7 \\ 6 & 8 \\ 10 & 5 \\ 7 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}\]
Row 1 of $Y$ (student 1):
\[y_{11} = (9)(1) + (7)(3) = 30\]
\[y_{12} = (9)(2) + (7)(1) = 25\]
Row 2 of $Y$ (student 2):
\[y_{21} = (6)(1) + (8)(3) = 30\]
\[y_{22} = (6)(2) + (8)(1) = 20\]
Row 3 of $Y$ (student 3):
\[y_{31} = (10)(1) + (5)(3) = 25\]
\[y_{32} = (10)(2) + (5)(1) = 25\]
Row 4 of $Y$ (student 4):
\[y_{41} = (7)(1) + (6)(3) = 25\]
\[y_{42} = (7)(2) + (6)(1) = 20\]
So the final report card score matrix is:
\[Y = \begin{bmatrix} 30 & 25 \\ 30 & 20 \\ 25 & 25 \\ 25 & 20 \end{bmatrix}\]
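The same product can be reproduced with NumPy as a quick check. The sketch below also computes a per-student mean of the two scores as one possible summary; that summary is an illustrative assumption, not part of the worked answer.

```python
import numpy as np

# Raw grades (columns: Homework, Quizzes) and the teacher's weight matrix.
X = np.array([[9, 7], [6, 8], [10, 5], [7, 6]])
W = np.array([[1, 2], [3, 1]])

Y = X @ W                      # columns: Knowledge Score, Participation Score
row_means = Y.mean(axis=1)     # hypothetical summary: mean score per student

print(Y)
print("Mean score per student:", row_means)
```

Each column of `W` defines one output score, so adding a third report card score would only require adding a third column to `W`.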
Tasks
- Come up with a scenario of the same nature.
- Compute the score for each desired case using matrix multiplication.
- List the scores.
- Define a rule (for example, a threshold on the mean of the scores, or a separate rule for each score) and check whether each case satisfies it.
- Answer these questions:
  - (a) Which cases are predicted to pass?
  - (b) How many cases are predicted to pass?
  - (c) Which case has the highest average score?
Expected Output Format
Students should print results in a clear format such as:
- Case number
- Input Matrix [x1, x2]
- Score Vector s
- Prediction y_hat
Final Deliverables
Students will submit:
- A Notebook that includes their code and prints results.
- Short written explanations INSIDE the notebook’s Markdown cells.
Project 3: Building and Training a Neuron
1. Introduction
This project builds a beginner-friendly machine learning model that connects vectors and matrices to neural networks. The model used here is a single neuron with a sigmoid activation, which is equivalent to logistic regression. The purpose of the project is to show how an input vector can be combined with a weight vector and a bias to produce a probability between 0 and 1. This gives a simple example of how neural networks work at their most basic level.
2. A Motivating Example
Suppose we want to classify points into one of two groups using two input features. Each data point has two values, and the goal is to predict whether the label should be 0 or 1. Instead of making the prediction by hand each time, we want a model that learns a rule from the data.
This project follows the same general learning process used in class:
- Define a model class
- Define a loss or error idea
- Define an optimization method to improve the model
3. Model Class
The model class in this project is a single neuron. Each example is represented by a vector:
\[\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\]
The model parameters are a weight vector and a bias:
\[\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad b \in \mathbb{R}\]
The linear score is computed as:
\[z = \mathbf{w} \cdot \mathbf{x} + b\]
This score is then passed through the sigmoid function to get a probability:
\[p = \sigma(z) = \frac{1}{1 + e^{-z}}\]
The final class prediction is:
\[\hat{y} = \begin{cases} 1 & \text{if } p \geq 0.5 \\ 0 & \text{if } p < 0.5 \end{cases}\]
4. Loss and Error Idea
The model makes a probability prediction $p$, and we compare it to the true label $y$. A simple error term for one example is:
\[\text{error} = p - y\]
This error tells us whether the model predicted too high or too low. If the error is positive, the prediction was too large. If the error is negative, the prediction was too small.
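The model class and the error term can be sketched in plain Python as follows. The parameter values and the example point are hypothetical, chosen so that the score $z$ is exactly 0.

```python
import math


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(z):
    """Map a real score to (0, 1); fine for moderate |z|, may overflow for very negative z."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 for input x: p = sigmoid(w . x + b)."""
    return sigmoid(dot(w, x) + b)


# Hypothetical parameters and example, chosen only for illustration.
w = [1.0, -1.0]
b = 0.0
x = [2.0, 2.0]
y = 1

p = predict_proba(x, w, b)       # z = 2 - 2 + 0 = 0, so p = 0.5
y_hat = 1 if p >= 0.5 else 0     # threshold at 0.5
error = p - y                    # negative: the prediction was too small
print(p, y_hat, error)
```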
5. Optimization with Gradient Descent
To improve the model, we update the weights and bias using gradient descent. For a learning rate $\alpha$, the update rules are:
\[w_1 \leftarrow w_1 - \alpha \cdot \text{error} \cdot x_1\]
\[w_2 \leftarrow w_2 - \alpha \cdot \text{error} \cdot x_2\]
\[b \leftarrow b - \alpha \cdot \text{error}\]
These updates are repeated over the dataset until the model improves.
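A single application of these update rules might look like the sketch below. The function name `sgd_step`, the starting parameters, and the example are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, b, x, y, alpha):
    """Apply the three update rules once for a single example (x, y)."""
    z = w[0] * x[0] + w[1] * x[1] + b
    error = sigmoid(z) - y                      # error = p - y
    new_w = [w[0] - alpha * error * x[0],       # w1 update
             w[1] - alpha * error * x[1]]       # w2 update
    new_b = b - alpha * error                   # bias update
    return new_w, new_b


# Hypothetical starting point: zero weights, one example with label 1.
w, b = sgd_step([0.0, 0.0], 0.0, x=[1.0, 2.0], y=1, alpha=0.1)
print(w, b)
```

With zero weights the prediction is 0.5, so the error is -0.5 and every parameter moves in the direction that raises the predicted probability for this example.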
6. Vector and Matrix Representation
The full dataset can be stored as an input matrix:
\[X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}\]
and a label vector:
\[\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}\]
This matrix form makes it easier to compute predictions for many examples and connects directly to the linear algebra ideas discussed in class.
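With the data in matrix form, all predictions can be computed in one expression. The NumPy sketch below assumes made-up data and parameters; `predict_proba_batch` is a hypothetical helper name.

```python
import numpy as np


def predict_proba_batch(X, w, b):
    """Return sigmoid(Xw + b) for every row of X at once."""
    z = X @ w + b                      # one linear score per row
    return 1.0 / (1.0 + np.exp(-z))    # element-wise sigmoid


# Hypothetical data and parameters, chosen only for illustration.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, -2.0]])
w = np.array([1.0, 1.0])
b = -1.0

p = predict_proba_batch(X, w, b)
y_hat = (p >= 0.5).astype(int)
print(p)
print(y_hat)
```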
7. Implementation Steps
The notebook for this project should follow these steps:
- Create a small dataset with two input features and binary labels
- Store the data as a matrix $X$ and a label vector $\mathbf{y}$
- Write a dot product function
- Write the sigmoid function
- Write a prediction function that returns probabilities
- Train the model using gradient descent
- Evaluate the model using accuracy
- Display a small prediction table and explain what the learned weights mean
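The steps above could fit together roughly as in the following sketch. The dataset, learning rate, and epoch count are hypothetical choices, and the per-example update loop is only one way to organize the training.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    return sigmoid(dot(w, x) + b)


def train(X, y, alpha=0.1, epochs=1000):
    """Repeat the per-example gradient descent updates over the dataset."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            error = predict_proba(x_i, w, b) - y_i
            w = [w_j - alpha * error * x_j for w_j, x_j in zip(w, x_i)]
            b -= alpha * error
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(
        1 for x_i, y_i in zip(X, y)
        if (1 if predict_proba(x_i, w, b) >= 0.5 else 0) == y_i
    )
    return correct / len(y)


# Hypothetical, clearly separable toy data: label 1 when both features are large.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train(X, y)
print("weights:", w, "bias:", b)
print("accuracy:", accuracy(X, y, w, b))
for x_i, y_i in zip(X, y):
    print(x_i, y_i, round(predict_proba(x_i, w, b), 2))
```

Starting from zero weights keeps the sketch deterministic; a real notebook might also track the error over epochs to show that training is making progress.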
8. Results
After training, the model should report the final learned parameters, predicted values, and classification accuracy.
| Example | x1 | x2 | True Label | Predicted Probability | Predicted Label |
|---|---|---|---|---|---|
| 1 | 0.2 | 1.1 | 0 | 0.31 | 0 |
| 2 | 1.4 | 2.0 | 1 | 0.84 | 1 |
| 3 | 0.8 | 0.5 | 0 | 0.42 | 0 |
| 4 | 1.7 | 1.3 | 1 | 0.76 | 1 |
Table 1. Example predictions from the trained single neuron model.
8.1 Accuracy
The classification accuracy is computed as:
\[\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}\]
9. Interpretation of the Weights
The learned weights show how strongly each feature affects the prediction. A positive weight means that increasing that feature makes class 1 more likely. A negative weight means that increasing that feature makes class 1 less likely. The bias shifts the decision boundary.
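The effect of a weight's sign can be seen by nudging one feature at a time. The parameters below are hypothetical, not learned values from any real run.

```python
import math


def predict_proba(x, w, b):
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: positive weight on x1, negative weight on x2.
w, b = [1.5, -0.8], -0.5

base = predict_proba([1.0, 1.0], w, b)
more_x1 = predict_proba([2.0, 1.0], w, b)   # x1 increased, x2 unchanged
more_x2 = predict_proba([1.0, 2.0], w, b)   # x2 increased, x1 unchanged
print(f"base={base:.3f}, more x1={more_x1:.3f}, more x2={more_x2:.3f}")
```

Increasing the feature with the positive weight raises the probability of class 1, while increasing the feature with the negative weight lowers it.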
10. What the Final Submission Should Look Like
The final submission should be a completed Jupyter Notebook. It should not just be raw code by itself. The notebook should combine short markdown explanations with code cells and printed results.
A strong submission should include the following parts in order:
- A title and short introduction explaining the purpose of the project
- A motivating example or short explanation of the classification task
- A code cell that creates a small dataset with two input features and binary labels
- A markdown explanation of the input vector, weight vector, bias, and dataset matrix
- Code cells for the dot product function and sigmoid function
- A markdown explanation of how the model computes the linear score $z = \mathbf{w} \cdot \mathbf{x} + b$ and converts that score into a probability.
- A code cell for the prediction function
- A markdown explanation of gradient descent and the update rule
- A code cell that trains the model by updating the weights and bias
- A results section that prints the final weights, final bias, and model accuracy
- A small example prediction table with actual values
- A short concluding explanation of what the learned weights mean
The final notebook should read like a guided walkthrough. It should include both code and short written explanations inside markdown cells. The example prediction table should be filled in with real model outputs, not left blank.
11. Conclusion
This project shows how a simple neural network unit can be built from basic linear algebra. By representing data as vectors and matrices, computing a dot product, applying the sigmoid function, and updating parameters with gradient descent, we can train a beginner-friendly classification model. This makes the connection between linear algebra and neural networks much easier to see.