AttributeError: 'list' object has no attribute 'words' in python gensim module

While training using doc2vec, I got this error:

AttributeError: 'list' object has no attribute 'words'

This is my code:

# Extracting titles from csv to list
with open(query+'_titles.csv', 'rb') as f:
    reader = csv.reader(f)
    titlelist = list(reader)
# build
model = doc2vec.Doc2Vec(size=30, window=1, alpha=0.01, min_count=2, sample=1e-5, workers=100)
model.build_vocab(titlelist)
titlearray = np.asarray(titlelist)
print 'Training Model...'

I am using Python 2.7.11 and gensim 3.2.0, if that helps. There must be something I am missing.

1 Answer

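Doc2Vec requires not a plain list of sentences but a list of tagged sentences: each document must expose words and tags attributes, which is why a bare list raises the AttributeError. From this discussion on DS.SE: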

In word2vec there is no need to label the words, because every word has their own semantic meaning in the vocabulary. But in case of doc2vec, there is a need to specify that how many number of words or sentences convey a semantic meaning, so that the algorithm could identify it as a single entity. For this reason, we are specifying labels or tags to sentence or paragraph depending on the level of semantic meaning conveyed.
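To make the difference concrete, here is a minimal Python 3 sketch of a plain list versus a tagged document. It does not call gensim itself: the namedtuple is a stand-in that assumes the same two fields (words, tags) as gensim's TaggedDocument, and the sample tokens and the tag name are made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same two fields as gensim's TaggedDocument (words, tags),
# used here only so the sketch runs without gensim installed.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

plain = ["some", "example", "title"]  # a bare list of tokens
tagged = TaggedDocument(words=plain, tags=["title_0"])  # hypothetical tag name

# The tagged document exposes both attributes.
print(tagged.words, tagged.tags)

# A plain list has no .words attribute, so accessing it fails
# in the same way as the error in the question.
try:
    plain.words
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Anything that reads each document's words attribute will fail in this way on a bare list, which matches the traceback in the question.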

Consequently, Gensim expects the following input:

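# tags must be a list; a bare string such as 'tag' is treated as one tag per character
sentences = [doc2vec.TaggedDocument(words=sentence, tags=['tag']) for sentence in titlelist]
model.build_vocab(sentences)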

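You will probably want a distinct tag for each sentence (its index, for example) so that every sentence gets its own meaningful vector. By the way, are you sure you want to read the CSV in binary mode? That is what the csv module expects on Python 2, but on Python 3 the file must be opened in text mode with newline=''.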
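As a rough sketch of both points (one tag per title, and reading the CSV in text mode), something like the following could work on Python 3. The CSV contents, the whitespace tokenization, and the title_N tag scheme are assumptions made for illustration, not part of the original code. The namedtuple fallback only exists so the snippet runs where gensim is not installed.

```python
import csv
import io
from collections import namedtuple

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Stand-in with the same fields, so the sketch runs without gensim.
    TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Hypothetical file contents: one title per row. With a real file on
# Python 3, use open(path, newline='') in place of io.StringIO.
csv_text = "First example title\nSecond example title\n"

with io.StringIO(csv_text, newline="") as f:
    rows = [row for row in csv.reader(f) if row]  # skip empty rows

# Assumption: the title is the first cell of each row and is split on
# whitespace; each document gets its own tag built from the row index.
documents = [
    TaggedDocument(words=row[0].lower().split(), tags=["title_%d" % i])
    for i, row in enumerate(rows)
]

for doc in documents:
    print(doc.tags, doc.words)
```

With gensim installed, a list built this way is the kind of input that model.build_vocab(documents) expects, and each title's vector can then be looked up by its own tag.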
