AttributeError: 'list' object has no attribute 'words' in python gensim module

While training using doc2vec, I got this error:

AttributeError: 'list' object has no attribute 'words'

This is my code:

# Extracting titles from csv to list
with open(query+'_titles.csv', 'rb') as f:
    reader = csv.reader(f)
    titlelist = list(reader)
# build
model = doc2vec.Doc2Vec(size=30, window=1, alpha=0.01, min_count=2, sample=1e-5, workers=100)
model.build_vocab(titlelist)
titlearray = np.asarray(titlelist)
print 'Training Model...'

I use Python 2.7.11 and gensim 3.2.0, if that helps. There must be something I am missing.

1 Answer

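Doc2Vec requires not just a list of sentences, but a list of tagged sentences. While building the vocabulary, gensim reads the words attribute of every document, and a plain list has no such attribute, hence the error. From this discussion on DS.SE: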

In word2vec there is no need to label the words, because every word has their own semantic meaning in the vocabulary. But in case of doc2vec, there is a need to specify that how many number of words or sentences convey a semantic meaning, so that the algorithm could identify it as a single entity. For this reason, we are specifying labels or tags to sentence or paragraph depending on the level of semantic meaning conveyed.

Consequently, Gensim expects the following input:

sentences = [doc2vec.TaggedDocument(sentence, ['tag']) for sentence in titlelist]
model.build_vocab(sentences)

You will probably want a different tag for each sentence to get meaningful vectors. By the way, reading the CSV in binary mode is what the csv module expects on Python 2, but on Python 3 the file has to be opened in text mode (with newline='').
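To make the tagging step concrete, here is a minimal sketch of turning the rows returned by csv.reader into tagged documents, each with its own tag. It rests on a few assumptions that are not in the question: the title text sits in the first CSV column, a plain whitespace split is good enough as a tokenizer, and the helper name tag_titles and the 'title_N' tag format are made up for illustration. The namedtuple fallback is only a stand-in so the snippet runs without gensim installed.

```python
from collections import namedtuple

try:
    # The real class, used whenever gensim is installed.
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Stand-in with the same two fields, only so this sketch runs without gensim.
    TaggedDocument = namedtuple('TaggedDocument', 'words tags')


def tag_titles(rows):
    """Convert csv.reader rows into TaggedDocument objects with unique tags.

    Hypothetical helper: assumes the title text is in the first column.
    """
    documents = []
    for index, row in enumerate(rows):
        if not row:
            continue  # skip blank CSV lines
        words = row[0].lower().split()  # naive whitespace tokenizer
        # tags is a list, one unique string tag per title
        documents.append(TaggedDocument(words=words, tags=['title_%d' % index]))
    return documents


# Made-up rows, shaped like list(csv.reader(f)) for a one-column file.
titlelist = [['Deep learning for text'], [], ['Text mining with Python']]
sentences = tag_titles(titlelist)
```

The resulting sentences list can be passed to model.build_vocab(sentences) in place of the raw titlelist. Note that each row from csv.reader is a list of cells, not a list of words, which is why the sketch tokenizes the title before tagging it.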
