Cannot convert the series to

I have a dataset with an age column. I want to remove all rows whose age is more than 90 and less than 1856.

This is the head of the dataframe:


This is what I attempted:



3 Answers

Your error is on line 2: df['intage'] = int(df['age']) is not valid, because you can't pass a pandas Series to the built-in int function.

If df['age'] has object dtype, use astype instead:

df['intage'] = df['age'].astype(int)
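A minimal sketch of the difference, assuming a hypothetical object-dtype age column holding integer-like strings (the real data from the question isn't shown). Calling int() on a multi-element Series raises the TypeError from the title, while astype converts element by element:

```python
import pandas as pd

# Hypothetical object-dtype column holding integer-like strings
df = pd.DataFrame({"age": ["83", "108", "63"]})

try:
    int(df["age"])  # int() expects a single scalar, not a whole Series
except TypeError as exc:
    print("TypeError:", exc)

# astype converts every element of the column
df["intage"] = df["age"].astype(int)
print(df)
```
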

Or, since you are subtracting two dates (which produces a timedelta64 column), use the dt accessor's days attribute to get the number of days as an integer:

df['intage'] = df['age'].dt.days
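A sketch of the whole flow, assuming two hypothetical date columns named start and end (the question's real column names are only in the missing screenshot). It computes the age in days and then drops the rows that are more than 90 and less than 1856 days old, as the question asks:

```python
import pandas as pd

# Hypothetical date columns; the question's real column names are not known
df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-01"]),
    "end": pd.to_datetime(["2000-03-01", "2001-01-01", "2000-01-11"]),
})

# Subtracting two datetime columns gives a timedelta64[ns] column
df["age"] = df["end"] - df["start"]

# .dt.days extracts the whole number of days as integers
df["intage"] = df["age"].dt.days

# Integer days strictly between 90 and 1856 means 91..1855 inclusive
in_range = df["intage"].between(91, 1855)

# Keep only the rows outside that range, i.e. remove the ones inside it
result = df[~in_range]
print(result)
```

Use df[in_range] instead if you want to keep the rows inside the range.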

Since the dtype is timedelta64[ns], you can either use between with two timedeltas as the endpoints, or first convert the values to a numeric number of days using numpy.

Setup

import pandas as pd
import numpy as np
df = pd.DataFrame({'age': [83, 108, 83, 63, 81]})
df['age'] = pd.to_timedelta(df.age, unit='days')

Find those between 82 and 107 days:

df[df.age.between(pd.to_timedelta(82, unit='days'), pd.to_timedelta(107, unit='days'))]
# age
#0 83 days
#2 83 days

With numpy

df[(df.age / np.timedelta64(1, 'D')).between(82, 107)]
# age
#0 83 days
#2 83 days
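Both variants of between used here include their endpoints. The question asks for strict bounds (more than 90, less than 1856), which can be written directly as timedelta comparisons without converting to numbers. A minimal sketch on the same setup data:

```python
import pandas as pd

# Same toy data as the setup: ages stored as timedeltas
df = pd.DataFrame({"age": pd.to_timedelta([83, 108, 83, 63, 81], unit="D")})

# Strict bounds: more than 90 days and less than 1856 days
lower = pd.Timedelta(days=90)
upper = pd.Timedelta(days=1856)
mask = (df["age"] > lower) & (df["age"] < upper)

kept = df[mask]      # rows inside the range
dropped = df[~mask]  # the frame with those rows removed
print(dropped)
```

On pandas 1.3 or later, df["age"].between(lower, upper, inclusive="neither") should give the same mask.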

One solution is to extract the number of days from the timedelta values in the age column.

The toy example below shows how:

import pandas as pd
from datetime import timedelta as td

# Create example DataFrame (ages as timedeltas in days)
df = pd.DataFrame([td(days=83), td(days=108), td(days=83), td(days=63), td(days=81)], columns=["age"])
print(df)
# Get whole days from the timedeltas
df.age = df.age.apply(lambda x: x.days)
print(df)
# Filter ages: more than 90 and less than 1856 days (between is inclusive, so 91..1855)
df = df[df.age.between(91, 1855)]
print(df)

This prints:

        age
0   83 days
1  108 days
2   83 days
3   63 days
4   81 days
   age
0   83
1  108
2   83
3   63
4   81
   age
1  108
