
Python age brackets: grouping/categorizing an age column in Python Pandas

Date: 2023-12-14 01:13:16


I have a DataFrame, say df, which has a column 'Age':

>>> df['Age']

I want to bin these ages and create a new column along these lines:

If age >= 0 & age < 2 then AgeGroup = Infant

If age >= 2 & age < 4 then AgeGroup = Toddler

If age >= 4 & age < 13 then AgeGroup = Kid

If age >= 13 & age < 20 then AgeGroup = Teen

and so on .....

How can I achieve this with the Pandas library?

I tried something like this:

X_train_data['AgeGroup'][ X_train_data.Age < 13 ] = 'Kid'

X_train_data['AgeGroup'][ X_train_data.Age < 3 ] = 'Toddler'

X_train_data['AgeGroup'][ X_train_data.Age < 1 ] = 'Infant'

But doing it this way gives me the following warning:

/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: /pandas-docs/stable/indexing.html#indexing-view-versus-copy

This is separate from the ipykernel package so we can avoid doing imports until

/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame

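How can I avoid this warning and do this in a better way?

Possible duplicate of: pandas create new column based on values from other columns

What is the expected output for -1?

@jezrael It could be Unknown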

Use pandas.cut with the parameter right=False, so that each bin includes its left edge and excludes its right edge:

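import pandas as pd

X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35, -1, 54]})

# Bin edges; with right=False the intervals are [0, 2), [2, 4), [4, 13), [13, 20), [20, 110)
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)
print(X_train_data)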

   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1      NaN
6   54    Adult
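To see what right=False changes, the following minimal sketch runs the same bins and labels through pandas.cut twice, once with right=False and once with the default right=True. The Series name ages and the sample values are chosen here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Illustrative ages sitting exactly on the bin edges (assumed sample data)
ages = pd.Series([0, 2, 4, 13, 20])
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

# right=False: intervals are [0, 2), [2, 4), [4, 13), [13, 20), [20, 110)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# Default right=True: intervals are (0, 2], (2, 4], (4, 13], (13, 20], (20, 110]
right_closed = pd.cut(ages, bins=bins, labels=labels)

comparison = pd.DataFrame({
    'Age': ages,
    'right=False': left_closed,
    'right=True': right_closed,
})
print(comparison)
```

With the default setting, an age that equals a bin edge falls into the lower group, and age 0 matches no bin at all, which is why right=False fits rules of the form "age >= 2 and age < 4".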

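Finally, replace the missing values using add_categories and fillna:

# 'unknown' must be added as a category before fillna can use it
X_train_data['AgeGroup'] = (X_train_data['AgeGroup']
                            .cat.add_categories('unknown')
                            .fillna('unknown'))
print(X_train_data)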

   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult

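Alternatively, add -1 as the lowest bin edge so that it gets its own 'unknown' label:

# With right=False, the first interval [-1, 0) catches the -1 placeholder
bins = [-1, 0, 2, 4, 13, 20, 110]
labels = ['unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)
print(X_train_data)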

   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult
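One thing worth checking with the bin-based approach is what happens to ages outside the listed edges. The sketch below wraps the bins and labels from above in a helper; the function name add_age_group and the sample values are hypothetical and only serve to illustrate the behavior.

```python
import pandas as pd

def add_age_group(df, age_col='Age'):
    """Return a copy of df with an 'AgeGroup' column (hypothetical helper)."""
    bins = [-1, 0, 2, 4, 13, 20, 110]
    labels = ['unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']
    out = df.copy()  # work on a copy so the caller's frame is left unchanged
    out['AgeGroup'] = pd.cut(out[age_col], bins=bins, labels=labels, right=False)
    return out

# Assumed sample data: 110 and -5 lie outside every interval
sample = pd.DataFrame({'Age': [-1, 1, 19, 110, -5]})
result = add_age_group(sample)
print(result)
```

Only values in [-1, 0) are labeled 'unknown' here. Anything below -1, or at or above 110, still comes out as NaN, so the add_categories and fillna step may still be useful if such values can occur.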

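Please edit to show how -1 gets set to Unknown @jezrael

@AnandSiddharth - The answer has been edited.

We could just add -1 to the bins and be done with it

@AnandSiddharth - Yes, that is the better solution ;)

So it would look like this? bins = [-1, 0, 2, 4, 13, 20, 110] labels = ['Unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

@AnandSiddharth - Yes, exactly; the answer has been edited.

Just use: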

X_train_data.loc[(X_train_data.Age < 13), 'AgeGroup'] = 'Kid'
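The warning in the question comes from chained indexing: X_train_data['AgeGroup'][mask] = ... performs two separate indexing steps, and the assignment may land on a temporary copy. A single .loc[mask, column] assignment avoids that. As a rough sketch, the same pattern can cover every group if the thresholds are applied from widest to narrowest; the thresholds follow the rules stated in the question, while the sample ages and the 'Adult' default are assumptions borrowed from the pandas.cut answer.

```python
import pandas as pd

# Assumed sample data for illustration
X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35]})

# Start with the widest group, then overwrite with narrower ones
X_train_data['AgeGroup'] = 'Adult'
X_train_data.loc[X_train_data['Age'] < 20, 'AgeGroup'] = 'Teen'
X_train_data.loc[X_train_data['Age'] < 13, 'AgeGroup'] = 'Kid'
X_train_data.loc[X_train_data['Age'] < 4, 'AgeGroup'] = 'Toddler'
X_train_data.loc[X_train_data['Age'] < 2, 'AgeGroup'] = 'Infant'

print(X_train_data)
```

The order matters because each later assignment overwrites part of the previous one. For more than a handful of groups, pandas.cut is usually easier to maintain.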
