How to scatter plot each group of a pandas DataFrame

I am making a scatter plot with the geyser dataset from seaborn. I am coloring the points based on the 'kind' column, but for some reason the legend only shows 'long' and leaves out 'short'. I don't know what I am missing. I was also wondering if there is a simpler way to color-code the data, one that does not use a for-loop. Thanks!

x = geyser_df['waiting']
y = geyser_df['duration']
col = []
for i in range(len(geyser_df)):
    if geyser_df['kind'][i] == 'short':
        col.append('MediumVioletRed')
    elif geyser_df['kind'][i] == 'long':
        col.append('Navy')
plt.scatter(x, y, c=col)
plt.legend(('long','short'))
plt.xlabel('Waiting')
plt.ylabel("Duration")
plt.suptitle("Waiting vs Duration")
plt.show()

2 Answers

The idiomatic way to do this with pandas is to split the data with pandas.DataFrame.groupby and plot each group with pandas.DataFrame.plot, so that every group gets its own legend entry.
Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3, seaborn 0.11.2

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load data
df = sns.load_dataset('geyser')
# plot
fig, ax = plt.subplots(figsize=(6, 4))
colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
for kind, data in df.groupby('kind'):
    data.plot(kind='scatter', x='waiting', y='duration', label=kind, color=colors[kind], ax=ax)
ax.set(xlabel='Waiting', ylabel='Duration')
fig.suptitle('Waiting vs Duration')
plt.show()
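
The loop over groups above creates one artist per group, which is what lets the legend list both kinds. If a single scatter call is preferred and the only goal is to drop the row-by-row loop from the question, one possible sketch is to build the colors with Series.map and assemble the legend by hand. This assumes a small hand-written stand-in for the geyser data (the values are illustrative, not the real dataset), and the proxy Line2D handles are just one way to create legend entries.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd

# Stand-in for sns.load_dataset('geyser'); values are illustrative only
df = pd.DataFrame({
    "duration": [3.6, 1.8, 3.3, 2.3, 4.5, 2.9],
    "waiting": [79, 54, 74, 62, 85, 55],
    "kind": ["long", "short", "long", "short", "long", "short"],
})

colors = {"short": "MediumVioletRed", "long": "Navy"}

# Vectorized lookup: one color name per row, no explicit Python loop
col = df["kind"].map(colors)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["waiting"], df["duration"], c=col.tolist())

# One scatter call creates one artist, so the legend entries are built by hand
handles = [
    Line2D([], [], marker="o", linestyle="", color=color, label=kind)
    for kind, color in colors.items()
]
ax.legend(handles=handles)
ax.set(xlabel="Waiting", ylabel="Duration")
fig.suptitle("Waiting vs Duration")
```

Call plt.show() afterwards to display the figure. The trade-off is that the legend is no longer derived from the plotted artists, so the colors dict has to stay in sync with the data.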

The easiest way is with seaborn, a high-level API for matplotlib, where hue is used to separate groups by color.
- sns.scatterplot: an axes-level plot
- sns.relplot: a figure-level plot where kind='scatter' is the default plot style

fig, ax = plt.subplots(figsize=(6, 4))
colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
sns.scatterplot(data=df, x='waiting', y='duration', hue='kind', palette=colors, ax=ax)
ax.set(xlabel='Waiting', ylabel='Duration')
fig.suptitle('Waiting vs Duration')
plt.show()

colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
p = sns.relplot(data=df, x='waiting', y='duration', hue='kind', palette=colors, height=4, aspect=1.5)
ax = p.axes.flat[0] # extract the single subplot axes
ax.set(xlabel='Waiting', ylabel='Duration')
p.fig.suptitle('Waiting vs Duration', y=1.1)
plt.show()

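You're passing x = geyser_df['waiting'] and y = geyser_df['duration'] as a single dataset, so plt.scatter creates only one artist. plt.legend(('long', 'short')) assigns labels to the existing artists in order, and with only one artist only the first label, 'long', is shown. I don't have much experience with these libraries, but to get the plot you describe, split the data by kind and call plt.scatter once per group, each with its own label: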

long = [[], []]
short = [[], []]
col=['MediumVioletRed', 'Navy']
for i in range(len(geyser_df["kind"])):
    # append plain values, not one-element lists
    if geyser_df["kind"][i] == "long":
        long[0].append(geyser_df['waiting'][i])
        long[1].append(geyser_df['duration'][i])
    else:
        short[0].append(geyser_df['waiting'][i])
        short[1].append(geyser_df['duration'][i])
plt.scatter(long[0], long[1], c=col[1], label="long")
plt.scatter(short[0], short[1], c=col[0], label="short")
plt.legend()
plt.xlabel('Waiting')
plt.ylabel("Duration")
plt.suptitle("Waiting vs Duration")
plt.show()
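
The row-by-row loop above can probably be avoided by letting pandas do the splitting with a boolean mask. The following is a minimal sketch under that assumption, again using a small hand-written stand-in for geyser_df (illustrative values, not the real dataset). It still loops, but only once per kind rather than once per row.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the question's geyser_df; values are illustrative only
geyser_df = pd.DataFrame({
    "duration": [3.6, 1.8, 3.3, 2.3, 4.5, 2.9],
    "waiting": [79, 54, 74, 62, 85, 55],
    "kind": ["long", "short", "long", "short", "long", "short"],
})

colors = {"long": "Navy", "short": "MediumVioletRed"}

fig, ax = plt.subplots()
for kind, color in colors.items():
    # Boolean mask selects every row of this kind in one step
    subset = geyser_df[geyser_df["kind"] == kind]
    # One scatter call per kind, so each gets its own labeled artist
    ax.scatter(subset["waiting"], subset["duration"], c=color, label=kind)

ax.legend()  # picks up the label= of each scatter call
ax.set(xlabel="Waiting", ylabel="Duration")
fig.suptitle("Waiting vs Duration")
```

Because every scatter call carries its own label, ax.legend() needs no arguments and lists both kinds. Call plt.show() to display the result.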
