Multivariate Analysis
Analyze three or more variables at once to find complex relationships.
Topic 1: What is Multivariate Analysis?
Multivariate Analysis is the analysis of **three or more variables** simultaneously. This is the **"3D Glasses"** view of your data: it reveals a deeper context that simple 2D plots miss.
For example, bivariate analysis tells you that wealthier passengers paid a higher **Fare**. Multivariate analysis asks: *Do wealthier passengers still pay a higher Fare, even after controlling for their Age?*
We achieve this by assigning data variables to the visual properties of the plot: X-axis, Y-axis, **Color (Hue)**, **Size**, and **Shape (Style)**.
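As a rough sketch of the mapping described above, the snippet below places five variables on one scatterplot, including the **Shape (Style)** channel. The rows are made-up illustrative values, not real Titanic records, and the 'Embarked' column is an assumption added only to have a second categorical variable for `style`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative rows (NOT real Titanic records); column names mirror this module.
# 'Embarked' is an assumed extra categorical column used only to demonstrate style.
toy = pd.DataFrame({
    "Age":      [22, 38, 26, 35, 54, 2, 27, 14],
    "Fare":     [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1],
    "Sex":      ["male", "female", "female", "female", "male", "male", "female", "female"],
    "Pclass":   [3, 1, 3, 1, 1, 3, 3, 2],
    "Embarked": ["S", "C", "S", "S", "S", "S", "S", "C"],
})

# x and y carry two variables; hue, size and style carry three more.
ax = sns.scatterplot(data=toy, x="Age", y="Fare",
                     hue="Sex", size="Pclass", style="Embarked")
ax.set_title("Five variables in one scatterplot")
# Call plt.show() here to display the figure.
```

In practice, more than four or five encodings on one plot tends to get hard to read, so it is usually worth adding them one at a time.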
Topic 2: Multidimensional Plotting (Hue & Size)
We use **hue** and **size** parameters to represent the third and fourth variables directly on a Scatterplot.
1. hue (The Color Code):
Adds a **categorical** variable (like 'Sex' or 'Pclass'). Seaborn automatically colors the data points differently for each group, letting you compare the X-Y relationship across those groups.
2. size (The Scale):
Adds a **numerical** variable (like 'Family Size'). The size of each plotted dot scales with the variable's value, adding a fourth dimension of information.
Example: 4-Variable Plot
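import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be the Titanic DataFrame, already loaded with pandas
# X=Age, Y=Fare, Color=Sex, Size=Pclass (4 variables!)
sns.scatterplot(x='Age', y='Fare', data=df, hue='Sex', size='Pclass')
plt.show()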
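One detail the example above leaves implicit is how large the dots get. A minimal sketch, assuming made-up rows rather than real Titanic data: the `sizes` parameter of `sns.scatterplot` accepts a `(min, max)` tuple that bounds the marker sizes used for a numerical `size` variable.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative rows (NOT real Titanic records).
toy = pd.DataFrame({
    "Age":   [22, 38, 26, 35, 54, 2, 27, 14],
    "Fare":  [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1],
    "Sex":   ["male", "female", "female", "female", "male", "male", "female", "female"],
    "SibSp": [1, 1, 0, 1, 0, 3, 0, 1],
})

# sizes=(30, 300) keeps every marker size between 30 and 300 (points squared).
ax = sns.scatterplot(data=toy, x="Age", y="Fare", hue="Sex",
                     size="SibSp", sizes=(30, 300))

# Inspect the marker sizes that were actually drawn, one per row.
areas = ax.collections[0].get_sizes()
print(areas.min(), areas.max())
# Call plt.show() here to display the figure.
```

A wider range makes small differences in the size variable easier to see, at the cost of more overlap between large dots.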
Topic 3: The Grid View: sns.pairplot()
The **sns.pairplot()** function is one of the quickest ways to get a high-level overview of **all** pairwise relationships in your numerical data.
How Pair Plot Works:
It automatically creates a **matrix/grid** where each numerical variable is plotted against every other numerical variable. By default, the diagonal shows a **Histogram** of each individual variable, or a **KDE Plot** when hue is used.
Example: Full Numerical Overview
# Analyze Age, Fare, and a third variable 'SibSp'
sns.pairplot(df[['Age', 'Fare', 'SibSp']])
plt.show()
Topic 4: Combining Pair Plot with hue
You can combine the power of the **Pair Plot** with the insight of the **hue** parameter. This creates separate color groups for every scatterplot in the matrix, making it a powerful multivariate tool.
Example: Titanic Survival Prediction
# Visualize how Age, Fare, and Pclass relate for SURVIVED (1) vs DIED (0)
sns.pairplot(df[['Age', 'Fare', 'Pclass', 'Survived']], hue='Survived')
plt.show()
Module Summary
- Multivariate Analysis: Study of 3 or more variables simultaneously.
- Hue: Assigns a **Color** dimension (categorical variable).
- Size: Assigns a **Scale/Marker Size** dimension (numerical variable).
- Pair Plot: Creates a quick matrix of all numerical relationships.
- Goal: Find deep interactions missed by simple 2D analysis.
Interview Q&A
What is the primary benefit of using hue?
The primary benefit of hue is that it lets you visually check for an interaction effect. It helps you see whether the relationship between your X and Y variables differs across subgroups (e.g., if Age affects Salary only for men, but not for women).
How many variables and relationships does a Pair Plot show?
A Pair Plot typically involves three or more numerical variables (N) and shows all N*(N-1)/2 unique bivariate relationships (each appears twice, mirrored across the diagonal), plus the N distributions on the diagonal. You can add one more variable using hue.
What is the main limitation of using size to represent a variable?
The main limitation is that the human eye is poor at accurately judging differences in circle area. While hue groups are easy to tell apart, using size to represent a fourth variable can make the plot difficult to interpret accurately.
What does the diagonal of a Pair Plot show?
By default, the diagonal shows a Histogram (or a Kernel Density Estimate/KDE plot when hue is set) for the distribution of the single variable listed in that row/column. This is used for quick Univariate Analysis.
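The N*(N-1)/2 count in the Pair Plot answer above can be checked with a few lines of standard-library Python. The variable names are just the Titanic columns used in this module.

```python
from itertools import combinations

variables = ["Age", "Fare", "SibSp", "Pclass"]
n = len(variables)

# Every unordered pair of distinct variables is one unique bivariate relationship.
pairs = list(combinations(variables, 2))

unique_pairs = len(pairs)          # N*(N-1)/2
off_diagonal_panels = n * (n - 1)  # each pair is drawn twice, mirrored
diagonal_panels = n                # one distribution per variable

print(unique_pairs, off_diagonal_panels, diagonal_panels)  # 6 12 4
```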
CRITICAL NOTE: External Libraries
If you use **Seaborn** functions (sns.scatterplot, sns.pairplot) in your code, the following are required for them to render in the output console:
- Import: `import matplotlib.pyplot as plt` and `import seaborn as sns` must be run.
- Output: You must call **plt.show()** after every visualization; only then is the plot sent to the browser.
- Installation: The **Matplotlib** and **Seaborn** packages must be installed with **`pip install`** in your server environment.
If the visualization does not appear, the cause is most probably **missing Python packages** or a **Matplotlib output settings** problem. The code logic (hue, size) in this code is correct.
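When no plot appears, one quick way to rule out the missing-package case from the note above is to check whether both libraries can be found before running any plotting code. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Matplotlib or Seaborn.

```python
import importlib.util


def missing_packages(names=("matplotlib", "seaborn")):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    # Suggest the install command for whatever is absent.
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("matplotlib and seaborn are both available")
```

If both packages are reported as available and the plot still does not show, the remaining suspects are a missing plt.show() call or the Matplotlib output settings.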