Seaborn Statistical Plots

Introduction to Seaborn

Seaborn is a powerful statistical visualization library built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics with minimal code. Seaborn integrates closely with pandas DataFrames, making it the go-to choice for exploratory data analysis in Python.

Key Concept

What is Seaborn?

Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. It is designed to work with pandas DataFrames and integrates statistical analysis directly into visualizations.

Why it matters: Seaborn simplifies complex visualizations that would require many lines of Matplotlib code into simple one-liner functions, while automatically handling statistical computations and aesthetic styling.

Why Use Seaborn Over Matplotlib?

While Matplotlib is extremely flexible, Seaborn provides several advantages for statistical visualization. It offers better default aesthetics, built-in themes, automatic statistical aggregation, and seamless integration with pandas. For exploratory data analysis, a single Seaborn call often replaces several lines of equivalent Matplotlib code, because grouping, aggregation, and legend handling are done for you.

Beautiful Defaults

Attractive visualizations out of the box with professional color palettes and styling

Statistical Focus

Built-in statistical estimation, aggregation, and uncertainty visualization

DataFrame Integration

Works directly with pandas DataFrames using column names as parameters

Installing and Importing Seaborn

Seaborn can be installed via pip and is typically imported alongside pandas, numpy, and matplotlib. The conventional alias for seaborn is sns, named after the character Samuel Norman Seaborn from "The West Wing" TV series.

# Install seaborn (if not already installed)
# pip install seaborn

# Standard imports for data visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the default style
sns.set_theme(style="darkgrid")

# Check seaborn version
print(f"Seaborn version: {sns.__version__}")

Loading Built-in Datasets

Seaborn comes with several built-in datasets that are perfect for learning and experimentation. These datasets cover various domains and are commonly used in tutorials and documentation. You can load them using the load_dataset() function.

# Load popular built-in datasets
tips = sns.load_dataset("tips")       # Restaurant tips data
iris = sns.load_dataset("iris")       # Famous iris flower dataset
titanic = sns.load_dataset("titanic") # Titanic passenger data
penguins = sns.load_dataset("penguins") # Palmer penguins dataset

# View the tips dataset structure
print(tips.head())
print(f"\nShape: {tips.shape}")
print(f"\nColumns: {tips.columns.tolist()}")

Pro Tip: Use sns.get_dataset_names() to see all available built-in datasets. These are great for practicing visualization techniques before working with your own data.

Seaborn Plot Types Overview

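This tutorial groups Seaborn's plotting functions into three categories based on the kind of question you are asking of the data. Knowing the categories helps you quickly choose the right plot. The grouping here is by purpose: Seaborn's own documentation lists boxplot and violinplot under categorical plots, and treats pairplot as a multi-plot grid and heatmap as a matrix plot.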

Category	Purpose	Key Functions
Distribution	Visualize data distributions	`histplot`, `kdeplot`, `boxplot`, `violinplot`
Categorical	Compare categories	`barplot`, `countplot`, `stripplot`, `swarmplot`
Relational	Show variable relationships	`scatterplot`, `lineplot`, `pairplot`, `heatmap`

Distribution Plots

Distribution plots help you understand the underlying distribution of your data. Seaborn provides several functions for visualizing univariate and bivariate distributions, from simple histograms to sophisticated kernel density estimates and box plots.

Histogram with histplot()

The histplot() function creates histograms that show the frequency distribution of a continuous variable. It can also overlay a kernel density estimate (KDE) curve to show the estimated probability density.

import seaborn as sns
import matplotlib.pyplot as plt

# Load the tips dataset
tips = sns.load_dataset("tips")

# Create a figure with subplots
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Basic histogram
sns.histplot(data=tips, x="total_bill", ax=axes[0])
axes[0].set_title("Basic Histogram")

# Histogram with KDE overlay
sns.histplot(data=tips, x="total_bill", kde=True, ax=axes[1])
axes[1].set_title("Histogram with KDE")

# Histogram with hue (grouping)
sns.histplot(data=tips, x="total_bill", hue="time", ax=axes[2])
axes[2].set_title("Histogram by Time of Day")

plt.tight_layout()
plt.show()

Kernel Density Estimation with kdeplot()

The kdeplot() function creates smooth density curves that estimate the probability density function of a continuous variable. KDE plots are great for comparing distributions and identifying multimodal patterns.

# Create KDE plots
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Single variable KDE
sns.kdeplot(data=tips, x="total_bill", fill=True, ax=axes[0])
axes[0].set_title("Filled KDE Plot")

# Multiple KDEs with hue
sns.kdeplot(data=tips, x="total_bill", hue="day", 
            fill=True, alpha=0.5, ax=axes[1])
axes[1].set_title("KDE by Day of Week")

plt.tight_layout()
plt.show()

Box Plots with boxplot()

Box plots (also called box-and-whisker plots) display the distribution of data through quartiles. They show the median, interquartile range (IQR), and potential outliers, making them excellent for comparing distributions across categories.

# Create box plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Basic box plot
sns.boxplot(data=tips, x="day", y="total_bill", ax=axes[0])
axes[0].set_title("Total Bill by Day")

# Box plot with hue
sns.boxplot(data=tips, x="day", y="total_bill", hue="sex", ax=axes[1])
axes[1].set_title("Total Bill by Day and Gender")

plt.tight_layout()
plt.show()

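Box Plot Anatomy: The box spans Q1 to Q3 (the IQR) and the line inside is the median. By default, each whisker extends to the most extreme data point that lies within 1.5 × IQR of the box edge, and points beyond the whiskers are drawn individually as outliers.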
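
The quantities in a box plot can be reproduced by hand. This is a minimal sketch with NumPy on a small made-up sample, assuming the default whisker rule of 1.5 × IQR; it is meant to illustrate the anatomy, not to mirror Seaborn's internal code.

```python
import numpy as np

# Small made-up sample with one obvious outlier
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

Note that the upper whisker ends at 9, the largest value inside the fence, rather than at the fence value itself.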

Violin Plots with violinplot()

Violin plots combine box plots with kernel density estimation. They show the full distribution shape, making it easier to see multimodal distributions and compare density across categories.

# Create violin plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Basic violin plot
sns.violinplot(data=tips, x="day", y="total_bill", ax=axes[0])
axes[0].set_title("Violin Plot - Total Bill by Day")

# Split violin plot
sns.violinplot(data=tips, x="day", y="total_bill", hue="sex", 
               split=True, ax=axes[1])
axes[1].set_title("Split Violin - By Day and Gender")

plt.tight_layout()
plt.show()

Practice: Distribution Plots

Task: Using the tips dataset, create overlapping KDE plots to compare the distribution of tips between smokers and non-smokers. Use different colors and make the plots semi-transparent so both are visible.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

plt.figure(figsize=(10, 6))
sns.kdeplot(data=tips, x="tip", hue="smoker", fill=True, 
            alpha=0.5, palette=["coral", "steelblue"])
plt.title("Tip Distribution: Smokers vs Non-Smokers")
plt.xlabel("Tip Amount ($)")
plt.ylabel("Density")
plt.show()

Task: Create a 2x2 grid of histograms showing total bill distribution, with rows representing time (Lunch/Dinner) and columns representing days (Sat/Sun only). Add KDE overlays and use a consistent color scheme.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

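# Filter for weekend only
weekend_tips = tips[tips['day'].isin(['Sat', 'Sun'])]

# 'day' is categorical, so name the columns explicitly; otherwise the
# unused Thur/Fri categories still get their own (empty) facets.
# Note: tips has no weekend lunch records, so the Lunch row stays empty.
g = sns.FacetGrid(weekend_tips, row="time", col="day",
                  col_order=["Sat", "Sun"], height=4, aspect=1.2)
g.map_dataframe(sns.histplot, x="total_bill", kde=True, 
                color="teal", alpha=0.7)
g.set_titles(row_template="{row_name}", col_template="{col_name}")
g.set_axis_labels("Total Bill ($)", "Count")
g.figure.suptitle("Weekend Bill Distribution by Time & Day", y=1.02)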
plt.show()

Categorical Plots

Categorical plots are designed for visualizing data that involves categorical variables. Seaborn provides specialized functions for comparing values across categories, showing counts, and visualizing the distribution of a continuous variable within each category.

Bar Plots with barplot()

The barplot() function shows point estimates and confidence intervals for a numerical variable across categories. By default, it calculates the mean and shows a 95% confidence interval, but you can customize the estimator function.

# Load tips dataset
tips = sns.load_dataset("tips")

# Create bar plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Basic bar plot (shows mean with confidence interval)
sns.barplot(data=tips, x="day", y="total_bill", ax=axes[0])
axes[0].set_title("Average Total Bill by Day")

# Bar plot with hue grouping
sns.barplot(data=tips, x="day", y="total_bill", hue="sex", ax=axes[1])
axes[1].set_title("Average Total Bill by Day and Gender")

plt.tight_layout()
plt.show()

Count Plots with countplot()

The countplot() function is like a histogram for categorical data. It shows the count of observations in each category, which is useful for understanding the distribution of categorical variables.

# Create count plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Count of observations by day
sns.countplot(data=tips, x="day", ax=axes[0])
axes[0].set_title("Number of Visits by Day")

# Count with hue for a grouped (side-by-side) comparison
sns.countplot(data=tips, x="day", hue="time", ax=axes[1])
axes[1].set_title("Visits by Day and Time")

plt.tight_layout()
plt.show()

Strip and Swarm Plots

Strip plots and swarm plots show individual data points. Strip plots can have overlapping points, while swarm plots adjust points to avoid overlap, showing the distribution more clearly.

# Create strip and swarm plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Strip plot (points may overlap)
sns.stripplot(data=tips, x="day", y="total_bill", ax=axes[0], alpha=0.6)
axes[0].set_title("Strip Plot - Individual Points")

# Swarm plot (points don't overlap)
sns.swarmplot(data=tips, x="day", y="total_bill", ax=axes[1], size=4)
axes[1].set_title("Swarm Plot - No Overlap")

plt.tight_layout()
plt.show()

Performance Note: Swarm plots can be slow for large datasets (>1000 points). Use strip plots or sample your data for better performance.
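
One way to follow the advice above is to draw the full dataset as a strip plot and only a random sample as a swarm plot. The sketch below uses a synthetic 5,000-row DataFrame; the column names and the sample size of 300 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic "large" dataset (an assumption for illustration)
rng = np.random.default_rng(2)
big = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=5000),
    "value": rng.normal(size=5000),
})

# Draw a reproducible random sample for the swarm plot
sample = big.sample(n=300, random_state=0)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
order = ["A", "B", "C"]

# Strip plot handles all 5,000 points; transparency reveals density
sns.stripplot(data=big, x="group", y="value", order=order,
              alpha=0.2, size=2, ax=axes[0])
axes[0].set_title("Strip plot: all 5,000 points")

# Swarm plot on the sample stays fast and readable
sns.swarmplot(data=sample, x="group", y="value", order=order,
              size=3, ax=axes[1])
axes[1].set_title("Swarm plot: random sample of 300")
```

Fixing random_state keeps the sample, and therefore the figure, the same from run to run.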

Combining Plot Types

One of Seaborn's strengths is the ability to combine plot types. For example, you can overlay a strip plot on a box plot to show both the distribution summary and individual data points.

# Combine box plot with strip plot
plt.figure(figsize=(10, 6))

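# First, draw the box plot
# (seaborn 0.13+ warns when palette is used without hue; there, add
#  hue="day", legend=False for the same result)
sns.boxplot(data=tips, x="day", y="total_bill", 
            palette="pastel", width=0.6)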

# Then overlay individual points
sns.stripplot(data=tips, x="day", y="total_bill", 
              color="black", alpha=0.4, size=4)

plt.title("Box Plot with Individual Points Overlay")
plt.show()

Practice: Categorical Plots

Task: Calculate tip percentage (tip/total_bill * 100) and create a combined visualization: violin plot showing distribution shape with a swarm plot overlay showing individual data points. Group by party size (column "size"). Use a pastel color palette for violins and dark points for swarm.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
tips['tip_pct'] = (tips['tip'] / tips['total_bill']) * 100

plt.figure(figsize=(12, 6))

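# Violin plot as base
# (seaborn 0.13+ warns when palette is used without hue; there, add
#  hue="size", legend=False for the same result)
sns.violinplot(data=tips, x="size", y="tip_pct", 
               palette="pastel", inner=None, alpha=0.7)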

# Swarm plot overlay
sns.swarmplot(data=tips, x="size", y="tip_pct", 
              color=".2", size=3, alpha=0.7)

plt.title("Tip Percentage Distribution by Party Size")
plt.xlabel("Party Size")
plt.ylabel("Tip Percentage (%)")
plt.ylim(0, 35)
plt.show()

Task: Create a grouped bar plot showing average total_bill for each day, with bars grouped by time (Lunch vs Dinner). Add error bars, use a professional color palette, and ensure bars are ordered Thursday through Sunday.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# Set day order
day_order = ['Thur', 'Fri', 'Sat', 'Sun']

plt.figure(figsize=(10, 6))
sns.barplot(data=tips, x="day", y="total_bill", hue="time",
            order=day_order, palette="Set2", errorbar="sd")

plt.title("Average Spending: Lunch vs Dinner by Day")
plt.xlabel("Day of Week")
plt.ylabel("Average Total Bill ($)")
plt.legend(title="Meal Time")
plt.show()

Relationship Plots

Relationship plots help you visualize the relationship between two or more variables. These are essential for understanding correlations, patterns, and trends in your data. Seaborn provides powerful functions for scatter plots, pair plots, and heatmaps.

Scatter Plots with scatterplot()

The scatterplot() function creates scatter plots that can encode additional variables using color (hue), size, and style. This allows you to visualize relationships between multiple variables simultaneously.

# Load tips dataset
tips = sns.load_dataset("tips")

# Create scatter plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Basic scatter plot
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[0])
axes[0].set_title("Total Bill vs Tip")

# Scatter plot with multiple encodings
sns.scatterplot(data=tips, x="total_bill", y="tip", 
                hue="day", size="size", style="time",
                ax=axes[1])
axes[1].set_title("Multi-variable Scatter Plot")

plt.tight_layout()
plt.show()

Pair Plots with pairplot()

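The pairplot() function creates a grid of scatter plots for all pairs of numerical variables in a DataFrame, with each variable's own distribution on the diagonal (a histogram by default, or KDE curves when hue is set, as in the example below). This is especially useful for exploratory data analysis.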

# Load iris dataset
iris = sns.load_dataset("iris")

# Create a pair plot
sns.pairplot(data=iris, hue="species", height=2.5)
plt.suptitle("Iris Dataset Pairplot", y=1.02)
plt.show()

Pairplot Options: Use diag_kind="kde" for KDE plots on diagonal, corner=True for lower triangle only, and vars=["col1", "col2"] to select specific columns.

Heatmaps with heatmap()

Heatmaps visualize matrix data using color intensity. They are commonly used for correlation matrices, confusion matrices, and any 2D data where color represents magnitude.

# Calculate correlation matrix
tips_numeric = tips.select_dtypes(include=['number'])
correlation = tips_numeric.corr()

# Create heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap="coolwarm", 
            center=0, linewidths=0.5, fmt=".2f",
            square=True)
plt.title("Correlation Heatmap")
plt.show()

Regression Plots with regplot() and lmplot()

Seaborn can automatically fit and plot regression lines. The regplot() function adds a regression line to a scatter plot, while lmplot() provides more options for faceting.

# Create regression plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Basic regression plot
sns.regplot(data=tips, x="total_bill", y="tip", ax=axes[0])
axes[0].set_title("Linear Regression: Bill vs Tip")

# Polynomial regression
sns.regplot(data=tips, x="total_bill", y="tip", 
            order=2, ax=axes[1])
axes[1].set_title("Polynomial Regression (degree=2)")

plt.tight_layout()
plt.show()

# lmplot with faceting
g = sns.lmplot(data=tips, x="total_bill", y="tip", 
               col="time", hue="smoker", height=4)
g.figure.suptitle("Regression by Time and Smoker Status", y=1.02)
plt.show()
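
Seaborn draws the regression line but does not return its coefficients. If you need the numbers, one option is to fit the same kind of least-squares line yourself. The sketch below does that with np.polyfit on synthetic data generated with a known slope of 0.15; it illustrates what the line represents and is not a description of Seaborn's internals.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data with a known linear relationship plus noise
rng = np.random.default_rng(4)
x = rng.uniform(5, 50, 200)
y = 1.0 + 0.15 * x + rng.normal(0, 1.0, 200)

# Least-squares straight line: returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn's fitted line (confidence band switched off for clarity)
sns.regplot(x=x, y=y, ci=None, scatter_kws={"alpha": 0.4}, ax=ax)

# Overlay the manually fitted line for comparison
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, intercept + slope * xs, linestyle="--", color="black",
        label="np.polyfit line")
ax.legend()
```

For full regression output such as standard errors or p-values, a dedicated statistics library is the better tool.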

Practice: Relationship Plots

Task: Create a scatter plot of total_bill vs tip with a regression line showing the trend. Color points by time (Lunch/Dinner) and add a 95% confidence interval around the regression line.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

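plt.figure(figsize=(10, 6))
# Color the points by time of day
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", alpha=0.5)
# Overlay one regression line for all points, with a 95% confidence band
sns.regplot(data=tips, x="total_bill", y="tip", 
            scatter=False, ci=95, color="teal")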
plt.title("Total Bill vs Tip with Regression Line")
plt.xlabel("Total Bill ($)")
plt.ylabel("Tip ($)")
plt.show()

Task: Load the iris dataset and create a correlation heatmap. Mask the upper triangle (redundant info), annotate values to 2 decimal places, use a diverging colormap centered at 0, and add a title explaining what the correlation reveals about sepal/petal measurements.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

iris = sns.load_dataset("iris")
iris_numeric = iris.select_dtypes(include=['number'])
corr = iris_numeric.corr()

# Create mask for upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

plt.figure(figsize=(8, 6))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="RdBu_r", center=0, vmin=-1, vmax=1,
            square=True, linewidths=0.5)
plt.title("Iris Feature Correlations\n(Petal measurements are highly correlated)")
plt.tight_layout()
plt.show()

Task: Create a scatter plot of total_bill vs tip where: color represents day of week, point size represents party size, and marker style represents smoker status. Add a legend that explains all encodings.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

plt.figure(figsize=(12, 7))
sns.scatterplot(data=tips, x="total_bill", y="tip",
                hue="day", size="size", style="smoker",
                sizes=(50, 200), alpha=0.7,
                palette="husl")

plt.title("Tipping Behavior Analysis\n(Color=Day, Size=Party, Shape=Smoker)")
plt.xlabel("Total Bill ($)")
plt.ylabel("Tip ($)")
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()

Themes & Color Palettes

Seaborn provides built-in themes and color palettes that make your visualizations look professional with minimal effort. Understanding how to customize themes and colors is essential for creating publication-quality figures and maintaining visual consistency.

Built-in Themes

Seaborn comes with five built-in themes that control the overall appearance of your plots: darkgrid, whitegrid, dark, white, and ticks. Each theme is suitable for different contexts.

# Demonstrate different themes
# A style only affects axes created while it is active, so each subplot
# is added inside its own axes_style() context.
fig = plt.figure(figsize=(15, 8))
themes = ['darkgrid', 'whitegrid', 'dark', 'white', 'ticks']

for i, theme in enumerate(themes, start=1):
    with sns.axes_style(theme):
        ax = fig.add_subplot(2, 3, i)
    sns.lineplot(x=[1, 2, 3, 4, 5], y=[1, 3, 2, 4, 3], ax=ax)
    ax.set_title(f"Theme: {theme}")

plt.tight_layout()
plt.show()

Theme	Description	Best For
`darkgrid`	Gray background with white grid lines	Default, general purpose
`whitegrid`	White background with grid lines	Presentations, publications
`dark`	Gray background without grid	Emphasizing data over structure
`white`	White background without grid	Clean, minimal look
`ticks`	White background with axis ticks	Scientific publications

Setting Themes Globally

You can set themes globally using sns.set_theme() or temporarily using the context manager sns.axes_style(). The global setting affects all subsequent plots.

# Set theme globally
sns.set_theme(style="whitegrid")

# Set theme with additional parameters
sns.set_theme(style="whitegrid", 
              palette="husl",
              font_scale=1.2)

# Temporary theme change
with sns.axes_style("dark"):
    # Plots here use dark style
    sns.histplot(data=tips, x="total_bill")
    plt.show()
# After the block, style reverts to global setting

Color Palettes

Seaborn offers many color palettes for different purposes: qualitative palettes for categorical data, sequential palettes for ordered data, and diverging palettes for data with a meaningful center point.

# Display color palettes
fig, axes = plt.subplots(3, 2, figsize=(12, 8))

palettes = [
    ('deep', 'Qualitative'),
    ('husl', 'Qualitative'),
    ('Blues', 'Sequential'),
    ('viridis', 'Sequential'),
    ('coolwarm', 'Diverging'),
    ('RdYlGn', 'Diverging')
]

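for ax, (pal, ptype) in zip(axes.flat, palettes):
    # palplot() always opens its own figure and has no ax argument,
    # so draw the 8 swatches on each subplot as a 1 x 8 RGB image instead
    ax.imshow([sns.color_palette(pal, 8)], aspect="auto")
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title(f"{pal} ({ptype})")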

plt.tight_layout()
plt.show()

Using Palettes in Plots

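You can apply palettes to plots using the palette parameter. Palettes can be specified by name, as a list of colors, or generated using Seaborn functions. In seaborn 0.13 and later, passing palette without hue is deprecated and triggers a warning; to color each category there, assign the x variable to hue as well and set legend=False.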

# Using different palettes
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Named palette
sns.barplot(data=tips, x="day", y="total_bill", 
            palette="Set2", ax=axes[0, 0])
axes[0, 0].set_title("Set2 Palette")

# Custom palette
custom = ["#ff6b6b", "#4ecdc4", "#45b7d1", "#96ceb4"]
sns.barplot(data=tips, x="day", y="total_bill", 
            palette=custom, ax=axes[0, 1])
axes[0, 1].set_title("Custom Colors")

# Color Brewer palette
sns.boxplot(data=tips, x="day", y="total_bill", 
            palette="RdYlBu", ax=axes[1, 0])
axes[1, 0].set_title("RdYlBu Diverging")

# Cubehelix palette
sns.violinplot(data=tips, x="day", y="total_bill", 
               palette="ch:start=.2,rot=-.3", ax=axes[1, 1])
axes[1, 1].set_title("Cubehelix Palette")

plt.tight_layout()
plt.show()

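Colorblind-Friendly: Use palette="colorblind" for visualizations that need to be accessible to viewers with color vision deficiency. The "husl" palette spaces hues evenly but is not designed for this purpose, so some of its colors can be hard to tell apart.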
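
A palette is just a list of colors, so it can be inspected and reused. This sketch pulls four colors from the "colorblind" palette, converts them to hex strings, and pins two of them to specific hue levels with a dictionary. The toy DataFrame and the meal_colors name are invented for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Ask for 4 colors from the named "colorblind" palette
palette = sns.color_palette("colorblind", 4)
hex_codes = palette.as_hex()  # the same colors as "#rrggbb" strings
print(hex_codes)

# Hypothetical toy data, invented for this illustration
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 1, 4, 3, 6, 5],
    "meal": ["Lunch", "Dinner"] * 3,
})

# A dict pins each hue level to a specific color
meal_colors = {"Lunch": hex_codes[0], "Dinner": hex_codes[1]}

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="x", y="y", hue="meal",
                palette=meal_colors, ax=ax)
ax.set_title("Fixed colors per category")
```

Using a dictionary keeps each category's color stable across several figures, even if a category is missing from one of them.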

Customizing Figure Size and Context

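Use sns.set_context() to control the scale of plot elements (font sizes, line widths, tick sizes) for different contexts like papers, notebooks, talks, or posters. Its counterpart sns.plotting_context() works as a context manager for temporary changes, and is used in the comparison below.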

# Set context for different use cases
contexts = ['paper', 'notebook', 'talk', 'poster']

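# Like styles, a plotting context applies to axes created while it is
# active, so each subplot is added inside its own context.
fig = plt.figure(figsize=(12, 10))

for i, context in enumerate(contexts, start=1):
    with sns.plotting_context(context):
        ax = fig.add_subplot(2, 2, i)
        sns.lineplot(x=[1, 2, 3, 4], y=[1, 2, 1.5, 3], ax=ax)
        ax.set_title(f"Context: {context}")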

plt.tight_layout()
plt.show()
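
The scaling that each context applies can also be read directly, since sns.plotting_context() returns a dictionary of parameters when called outside a with statement. The sketch below compares the base font size across the four contexts and shows the optional font_scale multiplier.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# Look up the base font size that each context would apply
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in contexts}
for name, size in font_sizes.items():
    print(f"{name:>8}: base font size {size}")

# font_scale multiplies the font sizes of a context independently
# of the other elements
scaled = sns.plotting_context("notebook", font_scale=1.5)["font.size"]
print(f"notebook with font_scale=1.5: {scaled}")
```

The sizes increase from paper to poster, which is why the same figure code can serve both a manuscript and a slide deck.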

Practice: Themes & Palettes

Task: Create a bar plot showing average tips by day with: "whitegrid" style, "talk" context (larger text for presentations), the "muted" color palette, and properly labeled axes. Remove the top and right spines for a cleaner look.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# Set presentation-ready theme
sns.set_theme(style="whitegrid", context="talk", palette="muted")

plt.figure(figsize=(10, 6))
sns.barplot(data=tips, x="day", y="tip", 
            order=['Thur', 'Fri', 'Sat', 'Sun'])
plt.title("Average Tips by Day of Week")
plt.xlabel("Day")
plt.ylabel("Tip Amount ($)")
sns.despine()  # Remove top and right spines
plt.tight_layout()
plt.show()

# Reset to default after
sns.set_theme()

Task: Create a 1x2 subplot with: (1) scatter plot of bill vs tip colored by time, (2) box plot of tips by day. Use the "colorblind" palette throughout, "ticks" style, "paper" context for publication, and add subplot labels (A, B) in the top-left corners.

Show Solution

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

# Publication-ready settings
sns.set_theme(style="ticks", context="paper", palette="colorblind")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Panel A: Scatter
sns.scatterplot(data=tips, x="total_bill", y="tip", 
                hue="time", ax=axes[0])
axes[0].set_title("Bill vs Tip by Time")
axes[0].text(-0.1, 1.05, 'A', transform=axes[0].transAxes, 
             fontsize=14, fontweight='bold')

# Panel B: Box plot
sns.boxplot(data=tips, x="day", y="tip", ax=axes[1],
            order=['Thur', 'Fri', 'Sat', 'Sun'])
axes[1].set_title("Tips by Day")
axes[1].text(-0.1, 1.05, 'B', transform=axes[1].transAxes, 
             fontsize=14, fontweight='bold')

sns.despine()  # Clean look
plt.tight_layout()
plt.savefig('publication_figure.png', dpi=300, bbox_inches='tight')
plt.show()

# Reset
sns.set_theme()

Key Takeaways

High-Level Interface

Seaborn provides a high-level API built on Matplotlib, enabling beautiful statistical visualizations with minimal code

Distribution Plots

Use histplot, kdeplot, boxplot, and violinplot to visualize data distributions and identify patterns

Categorical Comparisons

barplot, countplot, and swarmplot help compare values and distributions across categorical groups

Relationship Analysis

scatterplot, pairplot, and heatmap reveal correlations and relationships between multiple variables

Themes & Palettes

Built-in themes (darkgrid, whitegrid) and color palettes create professional, consistent visualizations

DataFrame Integration

Seaborn works seamlessly with pandas DataFrames, using column names directly as parameters

Knowledge Check

1. Which Seaborn function would you use to visualize the distribution of a continuous variable as a smooth curve?

2. What does the `hue` parameter do in Seaborn plots?

3. Which function creates a grid of scatter plots for all pairs of numerical variables in a DataFrame?

4. What is the default estimator used by barplot() to calculate bar heights?

5. Which Seaborn theme would be most appropriate for a scientific publication?

6. What is the difference between stripplot() and swarmplot()?