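This script uses BeautifulSoup4 to scrape PM2.5 data from www.pm25.com. I picked this site because it publishes a ranking of cities by PM2.5 concentration (the real reason, honestly, is that it was the first result when I searched Baidu for PM2.5!).
The program only compares two cities, so the speedup from multithreading is not very noticeable. You could try 10 cities with 10 threads.
One last gripe: why is Shanghai's air quality so bad?!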
PM25.py
The code is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# by ustcwq
import threading
from time import ctime
from urllib.request import urlopen

from bs4 import BeautifulSoup


def getPM25(cityname):
    site = 'http://www.pm25.com/' + cityname + '.html'
    html = urlopen(site)
    soup = BeautifulSoup(html, 'html.parser')

    city = soup.find(class_='bi_loaction_city')  # city name
    aqi = soup.find('a', class_='bi_aqiarea_num')  # AQI index
    quality = soup.select('.bi_aqiarea_right span')  # air quality level
    result = soup.find('div', class_='bi_aqiarea_bottom')  # air quality description

    print(city.text + ' AQI index: ' + aqi.text +
          '\nAir quality: ' + quality[0].text + result.text)
    print('*' * 20 + ctime() + '*' * 20)


def one_thread():  # single thread: fetch the cities one after another
    print('One_thread Start: ' + ctime() + '\n')
    getPM25('hefei')
    getPM25('shanghai')


def two_thread():  # multiple threads: one thread per city
    print('Two_thread Start: ' + ctime() + '\n')
    threads = []
    t1 = threading.Thread(target=getPM25, args=('hefei',))
    threads.append(t1)
    t2 = threading.Thread(target=getPM25, args=('shanghai',))
    threads.append(t2)
    for t in threads:
        # t.daemon = True
        t.start()
    for t in threads:
        t.join()  # wait for every fetch to finish before returning


if __name__ == '__main__':
    one_thread()
    print('\n' * 2)
    two_thread()
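The suggestion of trying 10 cities with 10 threads can be sketched roughly as below, in Python 3. This is only an illustration, not part of PM25.py: fake_fetch is a hypothetical stand-in that sleeps instead of calling the site, the city names are placeholders rather than real pm25.com page names, and the helpers run_sequential, run_threaded and timed are made up for this example. To try it against the real site you would pass getPM25 in place of fake_fetch.

```python
import threading
import time


def fake_fetch(cityname):
    """Hypothetical stand-in for getPM25: sleeps to imitate network latency."""
    time.sleep(0.1)
    return cityname + ': done'


def run_sequential(cities, fetch):
    """Call fetch for each city one after another."""
    return [fetch(city) for city in cities]


def run_threaded(cities, fetch):
    """Start one thread per city and wait for all of them to finish."""
    results = [None] * len(cities)

    def worker(index, city):
        # Each thread writes only to its own slot, so no lock is needed.
        results[index] = fetch(city)

    threads = [threading.Thread(target=worker, args=(i, city))
               for i, city in enumerate(cities)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # Placeholder names, not real pm25.com city pages.
    cities = ['city%d' % i for i in range(10)]
    _, t_seq = timed(run_sequential, cities, fake_fetch)
    _, t_thr = timed(run_threaded, cities, fake_fetch)
    print('sequential: %.2fs' % t_seq)
    print('threaded:   %.2fs' % t_thr)
```

Because each fetch spends most of its time waiting, the threaded run should take roughly as long as one fetch, while the sequential run takes roughly ten times that. With real network requests the gap will vary with the server and the connection.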