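Problem description
I am trying to collect all the words that belong to a single label. I split each review into a list of words and then try to add them to variables named pos_bag_of_words / neg_bag_of_words. This seems to work for one review, but when I loop over the whole corpus of reviews, it seems to overwrite the previous word list for one label, and the list for the other label ends up empty. What am I doing wrong?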
review1 = 'this dumbest films ever seen rips nearly ever'
review2 = 'whole mess there plot afterthought \
acting goes there nothing good nothing honestly cant \
understand this type nonsense gets produced actually'
review3 = 'released does somebody somewhere some stage think this \
really load shite call crap like this that people'
review4 = 'downloading illegally trailer looks like completely \
different film least have download haven wasted your \
time money waste your time this painful'
labels = 'POSITIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE'
reviews = [review1, review2, review3, review4]
for review, label in zip(reviews, labels):
    pos_bag_of_words = []
    neg_bag_of_words = []
    if label == 'NEGATIVE':
        # neg_bag_of_words.extend(list(review.split()))
        neg_bag_of_words = list(review.split()) + neg_bag_of_words
    if label == 'POSITIVE':
        # pos_bag_of_words.extend(list(review.split()))
        pos_bag_of_words = list(review.split()) + pos_bag_of_words
Output:
#There are positive words in the entire corpus... but I get nothing
>>> pos_bag_of_words
['downloading',
'illegally',
'trailer',
'looks',
'like',
'completely',
'different',
'film',
'least',
'have',
'download',
'haven',
'wasted',
'your',
'time',
'money',
'waste',
'your',
'time',
'this',
'painful']
>>> neg_bag_of_words
[]
Answer 1
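You should move the initialization of neg_bag_of_words and pos_bag_of_words outside the for loop.
Otherwise, both lists are reset to empty lists on every iteration of the loop, so only the words from the last review survive. The last review is labelled POSITIVE, which is why pos_bag_of_words holds only that review's words and neg_bag_of_words ends up empty.
Do something like this: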
pos_bag_of_words = []
neg_bag_of_words = []
for review, label in zip(reviews, labels):
    if label == 'NEGATIVE':
        neg_bag_of_words = list(review.split()) + neg_bag_of_words
    if label == 'POSITIVE':
        pos_bag_of_words = list(review.split()) + pos_bag_of_words
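As a possible variation on the fix above, here is a minimal sketch that keeps one list per label in a dictionary and appends with extend(), so the words stay in corpus order instead of each new review being prepended. The shortened review strings and the name `bags` are assumptions made for illustration; they are not from the original question.

```python
from collections import defaultdict

# Hypothetical shortened stand-ins for the reviews in the question.
reviews = [
    'this dumbest films ever seen',
    'whole mess there plot afterthought',
    'really load shite call crap',
    'downloading illegally trailer looks like',
]
labels = ['POSITIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE']

# One list per label, created once, before the loop starts.
# `bags` is an illustrative name, not part of the original code.
bags = defaultdict(list)

for review, label in zip(reviews, labels):
    # extend() appends in place, so words from earlier reviews are kept.
    bags[label].extend(review.split())

pos_bag_of_words = bags['POSITIVE']
neg_bag_of_words = bags['NEGATIVE']

print(pos_bag_of_words)
print(neg_bag_of_words)
```

Because the lists live in a dictionary keyed by label, the same loop should keep working if more labels are added later, without another if branch per label.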