DSPython Logo DSPython

SQL Set Operations

Combine the results of two or more independent SQL queries using UNION, INTERSECT, and EXCEPT.

SQL Intermediate 45 min

💡 Introduction: Set Theory in SQL

**Set Operations** are used to combine the **result sets of two or more independent queries** based on mathematical set theory (like **Venn Diagrams**).

Unlike a JOIN, which combines columns horizontally, Set Operations combine the **rows vertically**. This requires the columns in the participating tables to be fully compatible.

[Image of Venn diagram illustrating SQL set operations]

⬆️ Topic 1: UNION vs UNION ALL

The **UNION** operator is the most common set operation. It stacks the rows of two or more queries into a single result.

⚠️ Key Compatibility Rule:

  • The number of columns in all queries must be **identical**.
  • The corresponding columns must have **compatible data types**.

UNION (Distinct Union):

The standard **UNION** automatically removes duplicate rows from the final result set.

UNION ALL (Full Union):

The **UNION ALL** operator keeps all rows, including duplicates. Since it doesn't have to check for duplicates, **UNION ALL is faster** than UNION.

💻 Example: UNION ALL

-- Combine data from two regions, keeping all rows
SELECT Name, City FROM Customers_North
UNION ALL
SELECT Name, City FROM Customers_South;

✂️ Topic 2: INTERSECT and EXCEPT

These operations allow you to compare results to find commonalities or differences between the two sets of data.

1. INTERSECT:

Returns only the rows that are present in **both** the first and the second query results. (The overlapping area of the Venn Diagram).

2. EXCEPT / MINUS:

Returns rows that are in the **first query** result but **NOT** in the second query result. (The difference).

💻 Example: Finding Users with Access in Both Systems

-- Find users who are both 'Managers' AND 'Developers'
SELECT UserID FROM Managers
INTERSECT
SELECT UserID FROM Developers;

⚔️ Topic 3: The Key Distinction: JOIN vs SET

This is the most crucial distinction for interviews. While both operations combine data, they do so on different axes.

JOIN: Horizontal Combination

Combines tables side-by-side, **adding columns**. Requires a common key (e.g., CustomerID).

UNION: Vertical Combination

Combines tables end-to-end, **adding rows**. Requires columns to match up in type and number.

💻 Example: When to use UNION ALL

-- If you get data logs from two separate servers (Log_A and Log_B)
-- and need all of them, use UNION ALL to merge the results quickly.
SELECT Timestamp, ErrorCode FROM Log_A
UNION ALL
SELECT Timestamp, ErrorCode FROM Log_B;

📚 Module Summary

  • UNION: Combines rows, removes duplicates (Slower).
  • UNION ALL: Combines rows, keeps duplicates (Faster).
  • INTERSECT: Finds rows present in both queries.
  • EXCEPT: Finds rows in Query 1 that are missing in Query 2.
  • Compatibility: All participating queries must have the same column count and types.

🤔 Interview Q&A

Tap on the questions below to reveal the answers.

UNION ALL is faster because it skips the internal process of sorting the entire result set and checking for duplicate rows, which is mandatory for the standard UNION operator.

UNION stacks rows vertically, meaning the result must fit under a single column structure. JOIN combines tables horizontally, meaning the columns from both tables are added side-by-side, so only the linking key must match.

If the result sets have different column names but the same number of columns and compatible types, **INTERSECT will still run**, but it will use the column names from the first query in the final result set.

You use EXCEPT when comparing two large and complex result sets. While NOT IN works with subqueries, EXCEPT often offers a cleaner, more optimized way to find the difference between two full queries, especially in modern SQL systems.

🤖
DSPython AI Assistant
👋 Hi! I’m your AI assistant. Paste your code here, I will find bugs for you.