Problem description

I am new to Python; I just started yesterday. I want to scrape a website and collect the data in a dictionary. All imports are added at the beginning of the Python script.
title_and_urls = {}  # dictionary
totalNumberOfPages = 12
for x in range(1, int(totalNumberOfPages) + 1):
    url_pages = 'https://abd.com/api?&page=' + str(x) + '&year=2017'
    resp = requests.get(url_pages, timeout=60)
    soup = BeautifulSoup(resp.text, 'lxml')
    for div in soup.find_all('div', {"class": "block2"}):
        a = div.find('a')
        h3 = a.find('h3')
        print(h3, url_pages)  # prints correctly
        title_and_urls[h3.text] = base_enthu_url + a.attrs['href']

print(title_and_urls)

with open('dict.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in title_and_urls.items():
        writer.writerow([key, value])
There are a few problems here:
1. I have 12 pages in total, but pages 7 and 8 are skipped.
2. The print line print(h3, url_pages) printed 60 items, but the CSV file has only 36.

I appreciate any help and explanation. Please suggest best practices.
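For the skipped pages, it is worth first ruling out the loop itself: range(1, int(totalNumberOfPages) + 1) does yield 7 and 8, so the skip most likely happens during the request or the parsing, not in the iteration. A quick check, pure Python with no network (the URL pattern is copied from the question):

```python
totalNumberOfPages = 12
pages = list(range(1, int(totalNumberOfPages) + 1))
urls = ['https://abd.com/api?&page=' + str(x) + '&year=2017' for x in pages]
print(7 in pages, 8 in pages, len(urls))  # the loop covers all 12 pages
```

Since the loop is sound, a reasonable next step is to print resp.status_code for each page and see whether pages 7 and 8 return something other than 200.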
Answer 1

Use a try/except block:
title_and_urls = {}  # dictionary
totalNumberOfPages = 12
for x in range(1, int(totalNumberOfPages) + 1):
    try:
        url_pages = 'https://abd.com/api?&page=' + str(x) + '&year=2017'
        resp = requests.get(url_pages, timeout=60)
        soup = BeautifulSoup(resp.text, 'lxml')
        for div in soup.find_all('div', {"class": "block2"}):
            a = div.find('a')
            h3 = a.find('h3')
            print(h3, url_pages)
            title_and_urls[h3.text] = base_enthu_url + a.attrs['href']
    except Exception:
        pass  # a page that fails is skipped instead of crashing the whole run

with open('dict.csv', 'w', newline='') as csv_file:  # text mode, not 'wb', for csv in Python 3
    writer = csv.writer(csv_file)
    for key, value in title_and_urls.items():
        writer.writerow([key, value])
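One likely reason the CSV has fewer rows than the print count: a dict keeps only one value per key, so any two entries whose h3.text is identical overwrite each other, while print still shows every item. A minimal sketch of that effect (the titles and hrefs below are made up for illustration):

```python
# Hypothetical scraped items: the titles 'A' and 'B' repeat across pages
items = [('A', '/1'), ('B', '/2'), ('A', '/3'),
         ('B', '/4'), ('A', '/5'), ('C', '/6')]

printed = 0
title_and_urls = {}
for title, href in items:
    printed += 1                    # every item reaches the print statement
    title_and_urls[title] = href    # but a repeated title overwrites the old URL

print(printed, len(title_and_urls))  # 6 printed, only 3 rows would reach the CSV
```

If every row should be kept, collect (title, url) tuples in a list instead of a dict, or map each title to a list of URLs.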