Project 4: Social Network Graph | Java DSA Course

Project Overview

This capstone project brings together everything you have learned in the Java DSA course. You will work with real social network data containing 50 users, 100 connections, and 50 posts with engagement metrics. Your goal is to implement graph algorithms that power features like friend suggestions, connection paths, and community detection - the same algorithms used by LinkedIn, Facebook, and Twitter.

Skills Applied: This project tests your proficiency in Graphs (adjacency list/matrix), BFS/DFS Traversals (friend recommendations), Dijkstra's Algorithm (shortest paths), Union-Find (communities), and HashMap/HashSet (efficient lookups).

Graph Structure

Adjacency list with weighted edges

Friend Suggestions

BFS/DFS based recommendations

Connection Paths

Dijkstra's shortest path algorithm

Communities

Union-Find for group detection

Learning Objectives

Technical Skills

Implement Graph using adjacency list representation
Build BFS traversal for friend-of-friend recommendations
Apply Dijkstra's algorithm for weighted shortest paths
Use Union-Find data structure for community detection
Calculate network metrics like degree centrality and influence

Problem-Solving Skills

Analyze real-world social network patterns
Understand graph algorithm time complexity
Handle large-scale graph operations efficiently
Design recommendation systems based on connections
Test with actual social network interaction data

Ready to submit? Already completed the project? Submit your work now!

Submit Now

Problem Scenario

ConnectHub

You have been hired as a Backend Engineer at ConnectHub, a growing social networking platform. The company is struggling with user engagement - their current system shows random friend suggestions and users can't discover how they're connected to others. The product team needs you to build an intelligent recommendation engine using graph algorithms.

"Our users want to see 'People You May Know' based on mutual friends, not random suggestions. They also want to know 'how am I connected to this person?' like LinkedIn's connection path feature. Can you build a graph-based system that handles these features efficiently?"

Marcus Thompson, VP of Product

Key Challenges to Solve

Friend Recommendations

Find friends-of-friends efficiently (2-hop connections)
Rank suggestions by mutual friend count
Filter already-connected users
Handle edge cases gracefully

Connection Paths

Find shortest path between two users
Consider connection strength as edge weight
Display the chain of connections
Handle disconnected users

Community Detection

Identify groups of closely connected users
Use Union-Find for efficient grouping
Detect isolated communities
Analyze community sizes and patterns

Influence Analysis

Calculate degree centrality (connections)
Identify most influential users
Analyze network topology metrics
Find bridge users between communities

Pro Tip: Think like a software engineer! Your graph implementation should be efficient, scalable, and handle real-world data patterns. Focus on algorithm optimization and clean code structure.

The Dataset

You will work with a comprehensive social network dataset containing real-world interaction patterns. Download the CSV files containing users, connections, and posts data for graph analysis:

Dataset Download

Download the social network dataset files and save them to your project folder. The CSV files contain all necessary data for graph algorithms and network analysis.

social_network_users.csv social_network_connections.csv social_network_posts.csv

Original Data Source

This project uses a curated Social Network Graph Dataset inspired by popular Kaggle datasets and real-world platforms like LinkedIn, Twitter, and Facebook. The dataset contains realistic user profiles, friendship connections with varying strengths, and engagement metrics. Perfect for learning graph algorithms and network analysis techniques used by major tech companies.

View on Kaggle Explore Similar Datasets

Dataset Schema

Column	Type	Description
`user_id`	String	Unique user identifier (U001-U050)
`username`	String	Username handle for the profile
`full_name`	String	User's full name
`email`	String	User's email address
`join_date`	Date	Account creation date (YYYY-MM-DD)
`location`	String	User's city/region
`followers_count`	Integer	Number of followers
`following_count`	Integer	Number of users followed
`posts_count`	Integer	Total posts published
`is_verified`	Boolean	Account verification status
`profile_status`	String	Active, Inactive, or Dormant

Column	Type	Description
`connection_id`	String	Unique connection identifier (C001-C100)
`user_id`	String	Source user (follower/connector)
`friend_id`	String	Target user (followed/connected)
`connection_type`	String	Friend, Follower, or Following
`connected_date`	Date	Date connection was established
`interaction_count`	Integer	Total interactions between users
`last_interaction`	Date	Date of most recent interaction
`strength_score`	Decimal	Connection strength (0.0-1.0), edge weight

Column	Type	Description
`post_id`	String	Unique post identifier (P001-P050)
`user_id`	String	User who created the post
`content`	String	Post text/content
`post_date`	Date	Date post was published (YYYY-MM-DD)
`likes_count`	Integer	Number of likes received
`comments_count`	Integer	Number of comments
`shares_count`	Integer	Number of shares
`post_type`	String	Text, Image, Video, or Article
`visibility`	String	Public, Friends Only, or Private
`engagement_score`	Decimal	Calculated engagement metric (0.0-100.0)

Project Requirements

Your project must include all of the following components using Java, graph data structures, and algorithms. Structure your implementation with clear organization and comprehensive documentation.

Graph Data Structure & CSV Parsing

Data Loading:

Parse users.csv and populate user profiles (User objects)
Parse connections.csv and build adjacency list graph
Parse posts.csv for engagement metrics (optional for recommendation scoring)
Validate data integrity and handle malformed CSV rows

Graph Construction:

Implement adjacency list using HashMap<String, List<Edge>>
Store connection strength as edge weights (0.0-1.0)
Support bidirectional friendship connections
Efficient O(1) neighbor lookup and O(E) edge iteration

Data Classes:

User.java - Store user profile with followers, following, posts count
Connection.java - Edge representation with strength score
SocialGraph.java - Main graph class with core operations

Deliverable: Java classes for User, Connection, and SocialGraph with CSV parsing and proper error handling.

Friend Recommendations (BFS Algorithm)

Algorithm Implementation:

Implement BFS to explore 2-hop connections from source user
Level 1: Direct friends (visited, exclude from recommendations)
Level 2: Friends-of-friends (candidates for recommendation)
Skip self-loops and already-connected users

Ranking & Filtering:

For each Level 2 user, count number of mutual friends
Rank candidates by mutual friend count (descending)
Return top N recommendations with scores
Handle users with no friends (return empty list)

Performance Requirements:

Time Complexity: O(V + E) for BFS traversal
Space Complexity: O(V) for visited set and result storage
Optimize with HashMap for constant-time lookups

Deliverable: FriendRecommender.java class with recommendFriends() method returning top-N recommendations with mutual friend counts.

Shortest Connection Path (Dijkstra's Algorithm)

Algorithm Implementation:

Implement Dijkstra's algorithm with binary heap (PriorityQueue)
Use connection strength as edge weight: cost = 1 - strength_score
Stronger connections (higher strength) = shorter distances (preferred)
Find optimal path maximizing total connection strength

Path Reconstruction:

Track parent pointers during algorithm execution
Reconstruct full path from source to destination
Return path as list of User objects or IDs
Include total distance/cost in result

Edge Cases:

Handle disconnected users (no path exists)
Handle direct connections (source == destination)
Handle single-hop paths (direct friends)

Deliverable: PathFinder.java class with findShortestPath() method returning the path with minimum cost (maximum strength).

Community Detection (Union-Find Algorithm)

Union-Find Implementation:

Implement Disjoint Set Union (Union-Find) data structure
Use int[] parent array to track set membership
Use int[] rank array for union-by-rank optimization
Implement path compression in find() method

Community Grouping:

Union users connected by edges with strength_score ≥ 0.7 (strong connections)
Identify distinct connected components (communities)
Return mapping of users to community IDs
Report community sizes and member lists

Optimizations:

Path compression in find() - O(α(n)) amortized time
Union by rank - maintain balanced tree structure
Overall complexity: Near O(1) per operation

Deliverable: CommunityDetector.java class with detectCommunities() method returning community assignments for all users.

Bonus Features (+75 points)

Influence Analysis (+50 pts)

Calculate network influence metrics:

Degree centrality for each user (connection count)
Weighted degree (sum of connection strengths)
Identify top 10 most influential users
Find bridge users connecting multiple communities
Betweenness centrality (times in shortest paths)

Network Visualization (+25 pts)

Generate visual representation:

Export graph in DOT format (Graphviz)
Color-code nodes by community membership
Size nodes by degree centrality
Label edges with connection strength
Highlight bridge users and communities

Algorithm Complexity Reference

Understanding the time and space complexity of each algorithm is critical for performance and scalability. These are the expected complexities for your implementations.

BFS (Friend Recommendations)

Time: O(V + E)
Space: O(V)
Why: Visit each vertex and edge once

With 50 users and 100 connections: ~150 operations

Dijkstra's (Shortest Path)

Time: O((V+E) log V)
Space: O(V)
Why: Extract-min from heap V times

With 50 users and 100 connections: ~600 operations

Union-Find (Communities)

Time: O(α(n)) per op
Space: O(V)
Why: Path compression ≈ constant

With 50 users: ~50 near-constant ops

DFS (Graph Traversal)

Time: O(V + E)
Space: O(V)
Why: Stack depth + edges

With 50 users and 100 connections: ~150 operations

Data Structures

Choose the right data structure for each component. Efficiency matters when working with graphs!

Adjacency List

HashMap<String, List<Edge>>

Map user ID → list of neighbors
O(1) neighbor access
O(E) space for edges

Priority Queue

PriorityQueue<Vertex>

Extract min distance in O(log V)
Used for Dijkstra's algorithm
Custom Comparator by distance

Union-Find

int[] parent, int[] rank

Path compression in find()
Union by rank optimization
Nearly O(1) per operation

Hash Collections

HashMap, HashSet

O(1) average lookups
Visited tracking (BFS/DFS)
User/connection storage

Submission Requirements

GitHub Repository

Create a public GitHub repository named social-network-graph with complete source code, data files, test cases, and comprehensive documentation. The repository will be verified for required files and code structure.

Submit Your Project

Required Repository Structure

social-network-graph/

├── src/

│ ├── User.java

│ ├── Connection.java

│ ├── SocialGraph.java

│ ├── FriendRecommender.java

│ ├── PathFinder.java

│ └── CommunityDetector.java

├── test/

│ └── SocialGraphTest.java (or use JUnit)

├── data/

│ ├── social_network_users.csv

│ ├── social_network_connections.csv

│ └── social_network_posts.csv

├── README.md

└── .gitignore

README.md Must Include

Project Details

Your full name and submission date
Project overview and features
Core algorithms implemented
Bonus features (if any)

Technical Documentation

Time/space complexity analysis
Data structure explanations
Compilation & execution instructions
Sample output for each feature

Design & Implementation

Design decisions and trade-offs
Graph representation explanation
Algorithm choice justification
Edge case handling

Testing & Results

Test cases with expected results
Performance on dataset
Screenshots of output
Known limitations

Grading Rubric

Your project will be evaluated on correctness of implementation, code quality, and comprehensive documentation. Bonus features offer additional points for advanced work.

Component	Points	Criteria
Graph Construction	100	Correct CSV parsing, adjacency list implementation, bidirectional edges, weight handling
Friend Recommendations (BFS)	100	Correct BFS traversal, 2-hop detection, ranking by mutual friends, proper exclusions
Shortest Path (Dijkstra's)	100	Correct algorithm implementation, priority queue usage, path reconstruction, edge weighting
Community Detection (Union-Find)	100	Correct Union-Find implementation, path compression, union by rank, threshold-based grouping
Code Quality	50	Clean code structure, meaningful variable names, proper comments, efficient implementation
Documentation & Testing	50	Comprehensive README, complexity analysis, test cases, sample output, design explanation
Bonus: Influence Analysis	+50	Degree centrality calculation, top influencers, bridge user detection
Bonus: Network Visualization	+25	DOT format export, community coloring, node sizing by influence
Total	500	Base score + up to +75 bonus points

Tip: Bonus features can boost your score significantly. Start with core features, then add influence analysis and visualization if you have time. Both bonuses are worth pursuing!

Social Network Graph System

What You Will Build

Contents

Project Overview

Graph Structure

Friend Suggestions

Connection Paths

Communities

Learning Objectives

Technical Skills

Problem-Solving Skills

Problem Scenario

ConnectHub

Key Challenges to Solve

The Dataset

Dataset Download

Original Data Source

Dataset Schema

Project Requirements

Graph Data Structure & CSV Parsing

Friend Recommendations (BFS Algorithm)

Shortest Connection Path (Dijkstra's Algorithm)

Community Detection (Union-Find Algorithm)

Bonus Features (+75 points)

Influence Analysis (+50 pts)

Network Visualization (+25 pts)

Algorithm Complexity Reference

Data Structures

Submission Requirements

GitHub Repository

Required Repository Structure

README.md Must Include

Project Details

Technical Documentation

Design & Implementation

Testing & Results

Grading Rubric

Social Network Graph System

What You Will Build

Contents

Project Overview

Graph Structure

Friend Suggestions

Connection Paths

Communities

Learning Objectives

Technical Skills

Problem-Solving Skills

Problem Scenario

ConnectHub

Key Challenges to Solve

The Dataset

Dataset Download

Original Data Source

Dataset Schema

1 social_network_users.csv (50 user profiles)

2 social_network_connections.csv (100 friendships/edges)

3 social_network_posts.csv (50 posts with engagement)

Project Requirements

Graph Data Structure & CSV Parsing

Friend Recommendations (BFS Algorithm)

Shortest Connection Path (Dijkstra's Algorithm)

Community Detection (Union-Find Algorithm)

Bonus Features (+75 points)

Influence Analysis (+50 pts)

Network Visualization (+25 pts)

Algorithm Complexity Reference

Data Structures

Submission Requirements

GitHub Repository

Required Repository Structure

README.md Must Include

Project Details

Technical Documentation

Design & Implementation

Testing & Results

Grading Rubric