Joining New York Times article keywords into a list

Popularity: 55   Posted: 2023-06-13 17:17:53

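Each article comes back with a list of keywords. Using the key -> value pairs, we want to join all of the values into a single list, as shown below. Before appending, I would like to remove the "u" from the list items. We then want to compare the two lists, count how many words they have in common, and return the result.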

Example lists returned by dic['keywords']:

First article returns:

[
  {
    u'value': u'Dunford, Joseph F Jr',
    u'name': u'persons',
    u'rank': u'1'
  },
  {
    u'value': u'Afghanistan',
    u'name': u'glocations',
    u'rank': u'1'
  },
  {
    u'value': u'Afghan National Police',
    u'name': u'organizations',
    u'rank': u'1'
  },
  {
    u'value': u'Afghanistan War (2001- )',
    u'name': u'subject',
    u'rank': u'1'
  },
  {
    u'value': u'Defense and Military Forces',
    u'name': u'subject',
    u'rank': u'2'
  }
]

Second article returns:

[
  {
    u'value': u'Gall, Carlotta',
    u'name': u'persons',
    u'rank': u'1'
  },
  {
    u'value': u'Gannon, Kathy',
    u'name': u'persons',
    u'rank': u'2'
  },
  {
    u'value': u'Niedringhaus, Anja (1965-2014)',
    u'name': u'persons',
    u'rank': u'3'
  },
  {
    u'value': u'Kabul (Afghanistan)',
    u'name': u'glocations',
    u'rank': u'2'
  },
  {
    u'value': u'Afghanistan',
    u'name': u'glocations',
    u'rank': u'1'
  },
  {
    u'value': u'Afghan National Police',
    u'name': u'organizations',
    u'rank': u'1'
  },
  {
    u'value': u'Afghanistan War (2001- )',
    u'name': u'subject',
    u'rank': u'1'
  }
]

Desired output:

List1 = ['Dunford, Joseph F Jr', 'Afghanistan', 'Afghan National Police', 'Afghanistan War (2001- )', 'Defense and Military Forces']
List2 = ['Gall, Carlotta', 'Gannon, Kathy', 'Niedringhaus, Anja (1965-2014)', 'Afghanistan']

Keywords in common: 2

My code is as follows:

from flask import Flask, render_template, request, session, g, redirect, url_for
from nytimesarticle import articleAPI

api = articleAPI('X')

articles = api.search(q='Afghan War',
                      fq={'headline': '', 'source': ['Reuters', 'AP', 'The New York Times']},
                      begin_date=20111231)

def parse_articles(articles):
    '''
    This function takes in a response to the NYT api and parses
    the articles into a list of dictionaries
    '''
    news = []
    for i in articles['response']['docs']:
        dic = {}
        dic['id'] = i['_id']
        if i['abstract'] is not None:
            dic['abstract'] = i['abstract'].encode("utf8")
        dic['headline'] = i['headline']['main'].encode("utf8")
        dic['desk'] = i['news_desk']
        dic['date'] = i['pub_date'][0:10]  # keep the date, cut the time of day
        dic['section'] = i['section_name']
        dic['keywords'] = i['keywords']
        print(dic['keywords'])
        if i['snippet'] is not None:
            dic['snippet'] = i['snippet'].encode("utf8")
        dic['source'] = i['source']
        dic['type'] = i['type_of_material']
        dic['url'] = i['web_url']
        dic['word_count'] = i['word_count']
        # locations: values of the keywords named 'glocations'
        dic['locations'] = [kw['value'] for kw in i['keywords']
                            if 'glocations' in kw['name']]
        # subjects: values of the keywords named 'subject'
        dic['subjects'] = [kw['value'] for kw in i['keywords']
                           if 'subject' in kw['name']]
        news.append(dic)
    return news

print(parse_articles(articles))

You can use a list comprehension to build the list from the given dictionaries.

d = [
    {u'value': u'Dunford, Joseph F Jr', u'name': u'persons', u'rank': u'1'},
    {u'value': u'Afghanistan', u'name': u'glocations', u'rank': u'1'},
    {u'value': u'Afghan National Police', u'name': u'organizations', u'rank': u'1'},
    {u'value': u'Afghanistan War (2001- )', u'name': u'subject', u'rank': u'1'},
    {u'value': u'Defense and Military Forces', u'name': u'subject', u'rank': u'2'},
]
print([v['value'] for v in d])
# Python 2 prints: [u'Dunford, Joseph F Jr', u'Afghanistan', u'Afghan National Police', u'Afghanistan War (2001- )', u'Defense and Military Forces']
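
The same comprehension can be applied to both articles, and a set intersection then gives the shared keywords. The sketch below is only one possible way to do it: the helper names keyword_values and common_keywords are made up for illustration, and the sample data is copied from the question with the 'rank' field left out. Note that the "u" is not part of the strings. It is just how Python 2 displays unicode strings inside a printed list, so there is nothing to strip; printing an individual string shows it without the prefix. On the complete sample data from the question, three values are shared.

```python
# Sample keyword lists copied from the question ('rank' omitted for brevity).
first = [
    {u'value': u'Dunford, Joseph F Jr', u'name': u'persons'},
    {u'value': u'Afghanistan', u'name': u'glocations'},
    {u'value': u'Afghan National Police', u'name': u'organizations'},
    {u'value': u'Afghanistan War (2001- )', u'name': u'subject'},
    {u'value': u'Defense and Military Forces', u'name': u'subject'},
]
second = [
    {u'value': u'Gall, Carlotta', u'name': u'persons'},
    {u'value': u'Gannon, Kathy', u'name': u'persons'},
    {u'value': u'Niedringhaus, Anja (1965-2014)', u'name': u'persons'},
    {u'value': u'Kabul (Afghanistan)', u'name': u'glocations'},
    {u'value': u'Afghanistan', u'name': u'glocations'},
    {u'value': u'Afghan National Police', u'name': u'organizations'},
    {u'value': u'Afghanistan War (2001- )', u'name': u'subject'},
]


def keyword_values(keywords):
    """Hypothetical helper: return the 'value' of each keyword dict, in order."""
    return [kw['value'] for kw in keywords]


def common_keywords(a, b):
    """Hypothetical helper: sorted values that occur in both keyword lists."""
    return sorted(set(keyword_values(a)) & set(keyword_values(b)))


list1 = keyword_values(first)
list2 = keyword_values(second)
common = common_keywords(first, second)

# Printing the strings one by one shows them without any u prefix.
for value in list1:
    print(value)

print(common)       # the shared values
print(len(common))  # 3
```

Comparing the whole 'value' strings treats 'Kabul (Afghanistan)' and 'Afghanistan' as different keywords. If individual words should be compared instead, each value would have to be split into words first.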