Probability Basics
Probability quantifies uncertainty. It measures the likelihood of an event occurring on a scale from 0 (impossible) to 1 (certain). Understanding probability is essential for statistical inference, machine learning, and data-driven decision making.
What is Probability?
Probability is a number between 0 and 1 that represents the likelihood of an event. P(A) = 0 means event A is impossible; P(A) = 1 means event A is certain.
Key Terminology
Experiment
A process that generates well-defined outcomes (e.g., rolling a die).
Sample Space
The set of all possible outcomes (e.g., {1, 2, 3, 4, 5, 6}).
Event
A subset of the sample space (e.g., "rolling an even number").
Probability
A number from 0 to 1 indicating the likelihood of an event.
Basic Probability Calculations in Python
Interactive: Law of Large Numbers
Watch how the proportion of heads approaches 0.5 (50%) as you flip more coins. This demonstrates the Law of Large Numbers!
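The interactive demo can be reproduced with a short NumPy simulation; this is a minimal sketch, and the sample sizes and seed below are arbitrary choices for illustration.

```python
import numpy as np

# Simulate fair coin flips at increasing sample sizes and watch
# the proportion of heads converge toward 0.5
rng = np.random.default_rng(42)  # seeded so the run is reproducible

for n in [10, 100, 1000, 10000, 100000]:
    flips = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(f"{n:>6} flips: proportion of heads = {flips.mean():.4f}")
```

The fluctuations shrink roughly like 1/sqrt(n), so the largest run should land very close to 0.5.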
import numpy as np
from fractions import Fraction
# Example: Rolling a fair die
sample_space = {1, 2, 3, 4, 5, 6}
# Event A: Rolling an even number
event_a = {2, 4, 6}
p_even = len(event_a) / len(sample_space)
print(f"P(even) = {p_even}") # 0.5
# Event B: Rolling a number greater than 4
event_b = {5, 6}
p_greater_4 = len(event_b) / len(sample_space)
print(f"P(>4) = {p_greater_4}") # 0.333...
# Using fractions for exact probability
p_even_fraction = Fraction(len(event_a), len(sample_space))
print(f"P(even) = {p_even_fraction}") # 1/2
# Simulating probability
n_simulations = 100000
rolls = np.random.randint(1, 7, n_simulations)
simulated_prob = np.mean(rolls % 2 == 0)
print(f"Simulated P(even) = {simulated_prob:.4f}") # ~0.5
Types of Probability
Classical
Based on equally likely outcomes. Used when all outcomes have the same chance.
P(A) = favorable / total
Empirical
Based on observed data. Calculated from relative frequency of occurrence.
P(A) = count of A / total trials
Subjective
Based on personal judgment or belief. Used when data is unavailable.
Expert opinion, intuition
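The classical and empirical approaches can be contrasted in code for the same event (rolling an even number); this is a small sketch, and the trial count is an arbitrary choice.

```python
import numpy as np

# Classical: count favorable outcomes among equally likely outcomes
sample_space = {1, 2, 3, 4, 5, 6}
event = {2, 4, 6}  # rolling an even number
p_classical = len(event) / len(sample_space)

# Empirical: relative frequency over many simulated rolls
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=50000)
p_empirical = np.mean(rolls % 2 == 0)

print(f"Classical P(even) = {p_classical}")      # 0.5
print(f"Empirical P(even) = {p_empirical:.4f}")  # close to 0.5
```

With enough trials the empirical estimate converges to the classical value, which is the Law of Large Numbers at work again.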
Probability Rules
Probability rules provide the mathematical framework for combining and manipulating probabilities. These rules allow us to calculate complex probabilities from simpler ones.
Rule 1: Complement Rule
The probability of an event NOT occurring is 1 minus the probability of it occurring.
P(not A) = P(A') = 1 - P(A)
# Complement Rule Example
# P(at least one head in 3 coin flips)
# Direct calculation is complex, use complement
# P(at least one head) = 1 - P(no heads)
p_no_heads = (1/2) ** 3 # All tails
p_at_least_one_head = 1 - p_no_heads
print(f"P(at least one head) = {p_at_least_one_head}") # 0.875
# Verification with simulation
n_sims = 100000
flips = np.random.choice(['H', 'T'], size=(n_sims, 3))
at_least_one = np.sum(np.any(flips == 'H', axis=1)) / n_sims
print(f"Simulated: {at_least_one:.4f}")
Rule 2: Addition Rule
The probability of A OR B occurring depends on whether they're mutually exclusive.
Mutually Exclusive
P(A or B) = P(A) + P(B)
Events cannot happen together
Example: Rolling a 2 OR a 5
Not Mutually Exclusive
P(A or B) = P(A) + P(B) - P(A and B)
Events can overlap
Example: Rolling even OR >3
# Addition Rule Examples
sample_space = {1, 2, 3, 4, 5, 6}
# Mutually exclusive: rolling 2 OR 5
p_2 = 1/6
p_5 = 1/6
p_2_or_5 = p_2 + p_5 # = 2/6 = 1/3
print(f"P(2 or 5) = {p_2_or_5:.4f}")
# Not mutually exclusive: even OR greater than 3
even = {2, 4, 6}
greater_3 = {4, 5, 6}
both = even & greater_3 # {4, 6}
p_even = len(even) / 6
p_greater_3 = len(greater_3) / 6
p_both = len(both) / 6
p_even_or_greater_3 = p_even + p_greater_3 - p_both
print(f"P(even or >3) = {p_even_or_greater_3:.4f}") # 4/6 = 0.667
Rule 3: Multiplication Rule
The probability of A AND B occurring depends on whether they're independent.
Independent Events
P(A and B) = P(A) × P(B)
Outcome of A doesn't affect B
Example: Two coin flips
Dependent Events
P(A and B) = P(A) × P(B|A)
Outcome of A affects B
Example: Drawing cards without replacement
# Multiplication Rule Examples
# Independent: Two coin flips, both heads
p_head = 0.5
p_two_heads = p_head * p_head # 0.25
print(f"P(two heads) = {p_two_heads}")
# Dependent: Drawing 2 aces without replacement
# First ace: 4/52
# Second ace: 3/51 (one ace removed)
p_first_ace = 4/52
p_second_ace_given_first = 3/51
p_two_aces = p_first_ace * p_second_ace_given_first
print(f"P(two aces) = {p_two_aces:.6f}") # ~0.0045
# Compare to WITH replacement (independent)
p_two_aces_replace = (4/52) * (4/52)
print(f"P(two aces with replacement) = {p_two_aces_replace:.6f}") # ~0.0059
Practice: Probability Rules
Task: On an e-commerce site, P(purchase) = 0.15, P(newsletter signup) = 0.25, and P(both purchase AND signup) = 0.08. What is the probability that a visitor either makes a purchase OR signs up for the newsletter?
Show Solution
# Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
p_purchase = 0.15
p_newsletter = 0.25
p_both = 0.08
p_either = p_purchase + p_newsletter - p_both
print(f"P(purchase OR newsletter) = {p_either:.2f}") # 0.32
# 32% of visitors either buy or sign up (or both)
Task: A secure facility has 3 independent checkpoints. The probability of passing each checkpoint is: Badge scan (0.98), Facial recognition (0.92), PIN entry (0.95). What's the probability of passing all three? What's the probability of failing at least one?
Show Solution
# Multiplication Rule for independent events
p_badge = 0.98
p_facial = 0.92
p_pin = 0.95
# P(all three) - multiply for AND with independent events
p_all_pass = p_badge * p_facial * p_pin
print(f"P(pass all 3) = {p_all_pass:.4f}") # 0.8566
# P(fail at least one) = complement of passing all
p_fail_at_least_one = 1 - p_all_pass
print(f"P(fail at least 1) = {p_fail_at_least_one:.4f}") # 0.1434
# ~14% chance of being stopped at some checkpoint
Conditional Probability
Conditional probability measures the likelihood of an event occurring given that another event has already occurred. It's written as P(A|B), read as "probability of A given B."
P(A|B) = P(A and B) / P(B)
Probability of A given that B has occurred
Understanding Conditional Probability
Conditional probability updates our beliefs based on new information. The sample space is reduced to only those outcomes where the condition is true.
import pandas as pd
import numpy as np
# Example: Medical test results
# Create contingency table
data = {
    'Disease': ['Yes', 'Yes', 'No', 'No'],
    'Test': ['Positive', 'Negative', 'Positive', 'Negative'],
    'Count': [45, 5, 10, 940]
}
df = pd.DataFrame(data)
print("Contingency Table:")
pivot = df.pivot(index='Disease', columns='Test', values='Count')
print(pivot)
# Total people
total = df['Count'].sum() # 1000
# P(Disease | Positive Test)
# Given a positive test, what's the probability of disease?
positive_tests = 45 + 10 # 55
disease_and_positive = 45
p_disease_given_positive = disease_and_positive / positive_tests
print(f"\nP(Disease | Positive) = {p_disease_given_positive:.4f}") # 0.818
Using the Formula
# Using the conditional probability formula
# P(A|B) = P(A and B) / P(B)
# Continuing medical example
total = 1000
p_disease_and_positive = 45 / total # P(A and B)
p_positive = 55 / total # P(B)
p_disease_given_positive = p_disease_and_positive / p_positive
print(f"P(Disease | Positive) = {p_disease_given_positive:.4f}") # 0.818
# The OTHER conditional probability
# P(Positive | Disease)
p_disease = 50 / total # P(A): 50 of 1000 people have the disease
p_positive_given_disease = p_disease_and_positive / p_disease
print(f"P(Positive | Disease) = {p_positive_given_disease:.4f}") # 0.90
# Note: P(A|B) ≠ P(B|A) in general!
Independence and Conditional Probability
Two events are independent if knowing one doesn't change the probability of the other: P(A|B) = P(A).
# Testing for independence
# Events are independent if P(A|B) = P(A)
# Example: Coin flips
p_head_first = 0.5
p_head_second_given_first = 0.5 # Doesn't change
# Since P(H2|H1) = P(H2), events are independent
# Example: Drawing cards WITHOUT replacement
# P(Ace on 2nd draw | Ace on 1st) = 3/51
# P(Ace on 2nd draw) = 4/52
# Since 3/51 ≠ 4/52, events are NOT independent
p_ace = 4/52
p_ace_second_given_ace_first = 3/51
print(f"P(Ace 2nd | Ace 1st) = {p_ace_second_given_ace_first:.4f}")
print(f"P(Ace) = {p_ace:.4f}")
print(f"Independent? {abs(p_ace - p_ace_second_given_ace_first) < 0.001}")
Law of Total Probability
If B₁, B₂, ..., Bₙ partition the sample space, then P(A) is the weighted sum of conditional probabilities.
# Law of Total Probability
# P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... + P(A|Bn)P(Bn)
# Example: Factory production
# Factory has 3 machines
# Machine 1: produces 30%, 2% defective
# Machine 2: produces 45%, 3% defective
# Machine 3: produces 25%, 5% defective
# What's the overall defect rate?
p_m1, p_def_m1 = 0.30, 0.02
p_m2, p_def_m2 = 0.45, 0.03
p_m3, p_def_m3 = 0.25, 0.05
p_defective = (p_def_m1 * p_m1 +
               p_def_m2 * p_m2 +
               p_def_m3 * p_m3)
print(f"P(Defective) = {p_defective:.4f}") # 0.032 = 3.2%
Practice: Conditional Probability
Task: A SaaS company has data: 15% of customers churned last year. Among churners, 80% had filed 3+ support tickets. Among non-churners, only 25% filed 3+ tickets. Using the Law of Total Probability, what is the overall probability that a random customer filed 3+ support tickets?
Show Solution
# Law of Total Probability: P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)
p_churn = 0.15
p_no_churn = 0.85
p_tickets_given_churn = 0.80
p_tickets_given_no_churn = 0.25
# Total probability of 3+ tickets
p_tickets = (p_tickets_given_churn * p_churn +
             p_tickets_given_no_churn * p_no_churn)
print(f"P(3+ tickets) = {p_tickets:.3f}") # ≈0.3325
# About a third of all customers file 3+ support tickets
Task: From analytics: P(conversion) = 0.05, P(viewed pricing) = 0.30, P(conversion AND viewed pricing) = 0.04. What is the conversion rate for users who viewed the pricing page? Is viewing pricing a good indicator of intent?
Show Solution
# Conditional probability: P(A|B) = P(A and B) / P(B)
p_conversion = 0.05
p_pricing = 0.30
p_conversion_and_pricing = 0.04
# P(conversion | viewed pricing)
p_conv_given_pricing = p_conversion_and_pricing / p_pricing
print(f"Overall conversion rate: {p_conversion*100:.1f}%")
print(f"Conversion rate if viewed pricing: {p_conv_given_pricing*100:.1f}%")
# Compare: 13.3% vs 5% - viewing pricing indicates 2.7x higher intent!
lift = p_conv_given_pricing / p_conversion
print(f"Lift: {lift:.1f}x more likely to convert")
Bayes' Theorem
Bayes' theorem is one of the most powerful tools in statistics and machine learning. It allows us to update our beliefs (probabilities) based on new evidence, flipping conditional probabilities.
Bayes' Theorem
P(A|B) = [P(B|A) × P(A)] / P(B)
P(A) = Prior probability (belief before evidence)
P(B|A) = Likelihood (probability of evidence given hypothesis)
P(B) = Marginal likelihood (total probability of evidence)
P(A|B) = Posterior probability (updated belief after evidence)
Classic Example: Medical Diagnosis
A disease affects 1% of the population. A test has 99% sensitivity (it detects the disease when present) and a 5% false positive rate. If you test positive, what's the probability you actually have the disease?
# Bayes' Theorem: Medical Test Example
# P(Disease) = 0.01 (1% prevalence)
# P(Positive | Disease) = 0.99 (sensitivity)
# P(Positive | No Disease) = 0.05 (false positive rate)
# We want: P(Disease | Positive)
p_disease = 0.01 # Prior
p_positive_given_disease = 0.99 # Likelihood
p_positive_given_no_disease = 0.05 # False positive
# Calculate P(Positive) using Law of Total Probability
p_no_disease = 1 - p_disease
p_positive = (p_positive_given_disease * p_disease +
              p_positive_given_no_disease * p_no_disease)
print(f"P(Positive) = {p_positive:.4f}") # 0.0594
# Bayes' Theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
print(f"P(Disease | Positive) = {p_disease_given_positive:.4f}") # 0.167!
# Surprising! Only ~17% chance of disease even with positive test
# This is because the disease is rare (low prior)
Bayes' Theorem in Practice
# Create a Bayesian inference function
def bayes_theorem(prior, likelihood, false_positive_rate):
    """
    Calculate posterior probability using Bayes' theorem.

    prior: P(A) - probability of hypothesis before evidence
    likelihood: P(B|A) - probability of evidence if hypothesis is true
    false_positive_rate: P(B|not A) - probability of evidence if hypothesis is false
    """
    # P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)
    p_evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # P(A|B) = P(B|A) * P(A) / P(B)
    posterior = (likelihood * prior) / p_evidence
    return posterior
# Medical test example
posterior = bayes_theorem(
    prior=0.01,
    likelihood=0.99,
    false_positive_rate=0.05
)
print(f"P(Disease | Positive) = {posterior:.4f}") # 0.167
# What if disease is more common?
posterior_10pct = bayes_theorem(0.10, 0.99, 0.05)
print(f"If 10% prevalence: {posterior_10pct:.4f}") # 0.688
Updating Beliefs with Multiple Evidence
Bayes' theorem can be applied iteratively: the posterior from one update becomes the prior for the next.
# Sequential Bayesian updating
# Start with prior belief, update with each piece of evidence
def bayesian_update(prior, evidence_likelihood, evidence_false_positive):
    """Update belief with new evidence"""
    return bayes_theorem(prior, evidence_likelihood, evidence_false_positive)
# Initial belief: 50% chance of spam email
prior = 0.50
# Evidence 1: Contains "FREE" - 60% of spam has it, 10% of non-spam
posterior1 = bayesian_update(prior, 0.60, 0.10)
print(f"After 'FREE': P(spam) = {posterior1:.4f}") # 0.857
# Evidence 2: Contains link - 80% of spam, 40% of non-spam
posterior2 = bayesian_update(posterior1, 0.80, 0.40)
print(f"After link: P(spam) = {posterior2:.4f}") # 0.923
# Evidence 3: From known contact - 5% of spam, 70% of non-spam
posterior3 = bayesian_update(posterior2, 0.05, 0.70)
print(f"After known contact: P(spam) = {posterior3:.4f}") # 0.462
Practice: Bayes' Theorem
Task: Your email data shows: 40% of emails are spam. The word "FREE" appears in 70% of spam emails but only 5% of legitimate emails. An email contains the word "FREE". What's the probability it's spam?
Show Solution
# Bayes' Theorem: P(Spam|FREE) = P(FREE|Spam) * P(Spam) / P(FREE)
p_spam = 0.40
p_legit = 0.60
p_free_given_spam = 0.70
p_free_given_legit = 0.05
# Total probability of "FREE"
p_free = (p_free_given_spam * p_spam +
          p_free_given_legit * p_legit)
# Posterior probability
p_spam_given_free = (p_free_given_spam * p_spam) / p_free
print(f"P(FREE) = {p_free:.3f}") # 0.310
print(f"P(Spam | FREE) = {p_spam_given_free:.3f}") # 0.903
# 90.3% chance it's spam if it contains "FREE"!
Task: A rapid COVID test has: Sensitivity (true positive rate) = 85%, Specificity (true negative rate) = 98%. During low prevalence (2% population infected), you test positive. What's the actual probability you have COVID? Why is this counterintuitive?
Show Solution
# Bayes' Theorem for medical diagnostics
p_covid = 0.02 # Prevalence (prior)
p_no_covid = 0.98
sensitivity = 0.85 # P(Positive | COVID)
specificity = 0.98 # P(Negative | No COVID)
false_positive_rate = 1 - specificity # P(Positive | No COVID) = 0.02
# P(Positive) using Law of Total Probability
p_positive = (sensitivity * p_covid +
              false_positive_rate * p_no_covid)
# P(COVID | Positive) - what we want to know
p_covid_given_positive = (sensitivity * p_covid) / p_positive
print(f"P(Positive test) = {p_positive:.4f}")
print(f"P(COVID | Positive test) = {p_covid_given_positive:.3f}")
print(f"Probability as percentage: {p_covid_given_positive*100:.1f}%")
# Only ~46.4% chance of actually having COVID with a positive test!
# This is because false positives in the large healthy population
# outnumber true positives in the small infected population.
Common Probability Distributions
Probability distributions describe how likely different outcomes are. They're the foundation of statistical modeling. Let's explore the most important distributions for data science.
Discrete Distributions
Bernoulli Distribution
Single trial with two outcomes (success/failure). The building block for other distributions.
from scipy import stats
import numpy as np
# Bernoulli: Single coin flip (p=0.5)
bernoulli = stats.bernoulli(p=0.5)
print(f"P(X=1) = {bernoulli.pmf(1)}") # 0.5
print(f"P(X=0) = {bernoulli.pmf(0)}") # 0.5
# Generate samples
samples = bernoulli.rvs(size=10)
print(f"Samples: {samples}")
Binomial Distribution
Number of successes in n independent Bernoulli trials.
# Binomial: 10 coin flips, probability of heads
n, p = 10, 0.5
binomial = stats.binom(n=n, p=p)
# P(exactly 5 heads)
print(f"P(X=5) = {binomial.pmf(5):.4f}") # 0.246
# P(at most 3 heads)
print(f"P(X≤3) = {binomial.cdf(3):.4f}") # 0.172
# Mean and variance
print(f"Mean: {binomial.mean()}") # 5
print(f"Variance: {binomial.var()}") # 2.5
Poisson Distribution
Number of events in a fixed interval (time, space) when events occur independently at a constant rate.
# Poisson: Average 5 customers per hour
lambda_param = 5
poisson = stats.poisson(mu=lambda_param)
# P(exactly 3 customers)
print(f"P(X=3) = {poisson.pmf(3):.4f}") # 0.140
# P(more than 7 customers)
print(f"P(X>7) = {1 - poisson.cdf(7):.4f}") # 0.133
# Mean = Variance = lambda
print(f"Mean = Variance = {poisson.mean()}")
Continuous Distributions
Normal (Gaussian) Distribution
The "bell curve" - the most important distribution in statistics. Many natural phenomena follow it.
# Normal distribution: mean=0, std=1 (standard normal)
normal = stats.norm(loc=0, scale=1)
# Probability density at x=0
print(f"PDF at x=0: {normal.pdf(0):.4f}") # 0.399
# P(X < 1.96) - important for confidence intervals
print(f"P(X < 1.96) = {normal.cdf(1.96):.4f}") # 0.975
# P(-1 < X < 1) - within 1 std
print(f"P(-1 < X < 1) = {normal.cdf(1) - normal.cdf(-1):.4f}") # 0.683
# Z-score: how many std from mean
x = 75 # observed value
mean, std = 70, 5
z_score = (x - mean) / std
print(f"Z-score: {z_score}") # 1.0
# Generate normal samples
samples = np.random.normal(mean, std, 1000)
Uniform Distribution
All values in a range are equally likely.
# Uniform distribution between 0 and 10
uniform = stats.uniform(loc=0, scale=10)
# P(X < 3)
print(f"P(X < 3) = {uniform.cdf(3):.4f}") # 0.3
# Mean and variance
print(f"Mean: {uniform.mean()}") # 5.0
print(f"Variance: {uniform.var():.4f}") # 8.33
# Generate samples
samples = np.random.uniform(0, 10, 1000)
Exponential Distribution
Time between events in a Poisson process. Models waiting times.
# Exponential: average 5 minutes between customers
lambda_rate = 1/5 # rate = 1/mean
exponential = stats.expon(scale=1/lambda_rate) # scipy uses scale = 1/rate = mean
# P(wait less than 3 minutes)
print(f"P(X < 3) = {exponential.cdf(3):.4f}") # 0.451
# P(wait more than 10 minutes)
print(f"P(X > 10) = {1 - exponential.cdf(10):.4f}") # 0.135
# Mean = 1/lambda
print(f"Mean wait time: {exponential.mean()}") # 5
Practice: Distributions
Task: A factory produces widgets with a 3% defect rate. In a batch of 50 widgets, what's the probability of finding exactly 2 defects? What's the probability of finding 2 or fewer defects?
Show Solution
from scipy import stats
# Binomial: n trials, p probability of "success" (defect)
binom = stats.binom(n=50, p=0.03)
# P(exactly 2 defects)
p_exactly_2 = binom.pmf(2)
print(f"P(exactly 2 defects) = {p_exactly_2:.4f}") # ~0.2555
# P(2 or fewer defects) - cumulative
p_2_or_fewer = binom.cdf(2)
print(f"P(≤2 defects) = {p_2_or_fewer:.4f}") # ~0.8108
# Expected number of defects
expected = 50 * 0.03
print(f"Expected defects per batch: {expected:.1f}")
Task: SAT scores are normally distributed with mean=1050 and std=200. (1) What percentile is a score of 1250? (2) What score is needed to be in the top 10%? (3) What percentage of students score between 900 and 1200?
Show Solution
from scipy import stats
sat = stats.norm(loc=1050, scale=200)
# (1) Percentile for score of 1250
percentile_1250 = sat.cdf(1250) * 100
print(f"Score 1250 = {percentile_1250:.1f}th percentile")
# (2) Score needed for top 10% (90th percentile)
top_10_score = sat.ppf(0.90)
print(f"Top 10% requires: {top_10_score:.0f}+")
# (3) Percentage between 900 and 1200
p_between = (sat.cdf(1200) - sat.cdf(900)) * 100
print(f"% scoring 900-1200: {p_between:.1f}%")
# Z-score calculation
z_1250 = (1250 - 1050) / 200
print(f"\nZ-score for 1250: {z_1250:.1f} std above mean")
Task: A coffee shop averages 12 customers per hour during morning rush. What's the probability of: (1) exactly 15 customers in an hour? (2) more than 18 customers (understaffed scenario)? (3) fewer than 8 customers (overstaffed scenario)?
Show Solution
from scipy import stats
# Poisson: λ = 12 customers/hour (average rate)
poisson = stats.poisson(mu=12)
# (1) P(exactly 15 customers)
p_exactly_15 = poisson.pmf(15)
print(f"P(exactly 15) = {p_exactly_15:.4f}") # ~0.0724
# (2) P(more than 18) - understaffed risk
p_more_than_18 = 1 - poisson.cdf(18)
print(f"P(>18 customers) = {p_more_than_18:.4f}") # ~0.0374
# (3) P(fewer than 8) - overstaffed risk
p_less_than_8 = poisson.cdf(7) # cdf(7) gives P(X ≤ 7)
print(f"P(<8 customers) = {p_less_than_8:.4f}") # ~0.0895
# Staffing insight
print(f"\n~3.7% risk of being overwhelmed (>18)")
print(f"~9.0% risk of being overstaffed (<8)")
Key Takeaways
Probability Range
Probability is always between 0 and 1. P=0 means impossible, P=1 means certain. The sum of all probabilities in a sample space equals 1.
Addition & Multiplication
OR → Add (subtract overlap). AND → Multiply. Remember to adjust for independence and mutual exclusivity.
Conditional ≠ Reverse
P(A|B) ≠ P(B|A). The probability of disease given positive test differs from probability of positive test given disease. Bayes' theorem connects them.
Bayes Updates Beliefs
Posterior = (Likelihood × Prior) / Evidence. Start with a prior belief, update with evidence to get a posterior. Priors matter!
Normal is Central
The normal distribution appears everywhere due to the Central Limit Theorem. Know the 68-95-99.7 rule for quick estimation.
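The 68-95-99.7 rule quoted above can be verified directly with scipy.stats, using the same frozen-distribution pattern as the earlier examples:

```python
from scipy import stats

# Standard normal: probability mass within k standard deviations of the mean
normal = stats.norm(loc=0, scale=1)
for k in [1, 2, 3]:
    p = normal.cdf(k) - normal.cdf(-k)
    print(f"Within {k} std: {p:.4f}")
# Within 1 std: 0.6827
# Within 2 std: 0.9545
# Within 3 std: 0.9973
```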
SciPy for Distributions
Use scipy.stats for all distributions. Methods: .pmf()/.pdf() for probability, .cdf() for cumulative, .rvs() for samples.
Knowledge Check
Test your understanding of probability fundamentals with this quiz.
If P(A) = 0.3 and P(B) = 0.4, and A and B are independent, what is P(A AND B)?
Which method calculates cumulative probability up to a value in scipy.stats?
In the 68-95-99.7 rule for normal distributions, what percentage falls within 2 standard deviations?
Which distribution models the number of events in a fixed time period when events occur at a constant rate?
What does Bayes' theorem help us calculate?
If a coin is flipped 10 times, which distribution models the number of heads?