Measurements for Shenyang, Chengdu, Beijing, Guangzhou, and Shanghai
Data source: https://www.kaggle.com/uciml/pm25-data-for-five-chinese-cities
Beijing PM2.5 over time
Data columns
The data cover the period from Jan 1st, 2010 to Dec 31st, 2015. Missing data are denoted as NA.
- No: row number
- year: year of data in this row
- month: month of data in this row
- day: day of data in this row
- hour: hour of data in this row
- season: season of data in this row
- PM: PM2.5 concentration (ug/m^3)
- DEWP: Dew point (degrees Celsius)
- TEMP: Temperature (degrees Celsius)
- HUMI: Humidity (%)
- PRES: Pressure (hPa)
- cbwd: Combined wind direction
- Iws: Cumulated wind speed (m/s)
- precipitation: hourly precipitation (mm)
- Iprec: Cumulated precipitation (mm)
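As a quick check of how the NA markers behave, the sketch below parses a tiny CSV excerpt and counts the missing readings. The rows are invented for illustration and are not taken from the dataset; pandas' read_csv treats the string NA as a missing value by default.

```python
import io
import pandas as pd

# Invented sample rows that mimic part of the column layout described above
sample_csv = io.StringIO(
    "No,year,month,day,hour,PM_US Post,TEMP\n"
    "1,2010,1,1,0,NA,-5.0\n"
    "2,2010,1,1,1,129.0,-4.0\n"
    "3,2010,1,1,2,NA,NA\n"
)

sample = pd.read_csv(sample_csv)  # "NA" is parsed as NaN by default

# Number of missing values per column
missing = sample.isna().sum()
print(missing)

# Share of rows that have a usable PM reading
valid_share = sample["PM_US Post"].notna().mean()
print(valid_share)
```

Running the same two checks on the real files gives a feel for how many hours each station actually reported before any averaging is done.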
Combine the separate time fields in the data into a time series
period = pd.PeriodIndex(year=df['year'], month=df['month'], day=df['day'], hour=df['hour'], freq='H')
df['datetime'] = period
freq sets the time frequency; 'H' means hourly
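For comparison, here is a minimal sketch of the same idea on a few invented rows. It swaps in pd.to_datetime, which can assemble timestamps from a DataFrame holding year, month, day, and hour columns, and produces a DatetimeIndex instead of a PeriodIndex. This is an alternative to the constructor call above, not the method used in this post.

```python
import pandas as pd

# Invented rows with the same separate time fields as the dataset
demo = pd.DataFrame({
    "year": [2010, 2010, 2010],
    "month": [1, 1, 12],
    "day": [1, 1, 31],
    "hour": [0, 1, 23],
    "PM_US Post": [129.0, 148.0, 90.0],
})

# to_datetime assembles one timestamp per row from the four columns
demo["datetime"] = pd.to_datetime(demo[["year", "month", "day", "hour"]])
demo = demo.set_index("datetime")
print(demo.index)
```

Because each timestamp is built from the row's own fields, this approach does not depend on the rows being complete or sorted.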
Set datetime as the index
- inplace: True modifies the original data in place; the default, False, returns a new object
df.set_index('datetime', inplace=True)
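The difference between the two inplace settings is easy to see on a toy table. The sketch below uses an invented two-row frame; only set_index itself comes from the text above.

```python
import pandas as pd

frame = pd.DataFrame({"datetime": ["2010-01", "2010-02"], "value": [1.0, 2.0]})

# Default (inplace=False): a new DataFrame is returned, the original is unchanged
indexed = frame.set_index("datetime")
print(indexed.index.name)            # datetime
print("datetime" in frame.columns)   # True

# inplace=True: the original is modified and the call returns None
result = frame.set_index("datetime", inplace=True)
print(result)                        # None
print(frame.index.name)              # datetime
```

Since the inplace call returns None, writing df = df.set_index('datetime', inplace=True) would throw the table away, which is a common slip.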
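The data set is large, so take the mean for each month (numeric_only=True skips the text column cbwd, which recent pandas versions refuse to average)
df = df.resample('M').mean(numeric_only=True)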
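To see what the monthly resampling does, here is a small sketch on synthetic hourly data. It assumes a DatetimeIndex rather than the PeriodIndex used above, and the "MS" (month start) alias rather than 'M'; the readings are invented.

```python
import numpy as np
import pandas as pd

# Hourly timestamps covering January and February 2010 (synthetic data)
hours = pd.date_range("2010-01-01 00:00", "2010-02-28 23:00",
                      freq=pd.Timedelta(hours=1))

# Invented readings: 10 in January, 30 in February, with one missing hour
values = np.where(hours.month == 1, 10.0, 30.0)
values[5] = np.nan
hourly = pd.Series(values, index=hours)

# "MS" groups rows by calendar month and labels each group by its first day;
# mean() skips NaN values by default
monthly = hourly.resample("MS").mean()
print(monthly)
```

The 1416 hourly rows collapse to two monthly values, and the missing hour does not turn January's mean into NaN.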
Full code
import pandas as pd
from matplotlib import pyplot as plt

file_path = './data/BeijingPM20100101_20151231.csv'
df = pd.read_csv(file_path)

# Combine the separate time fields into a time series
period = pd.PeriodIndex(year=df['year'], month=df['month'], day=df['day'], hour=df['hour'], freq='H')
df['datetime'] = period

# Use datetime as the index
df.set_index('datetime', inplace=True)

# Take monthly means; numeric_only=True skips the text column cbwd
df = df.resample('M').mean(numeric_only=True)

# Data from the US monitoring post and from the Nongzhanguan station
data = df['PM_US Post']
data_china = df['PM_Nongzhanguan']

_x = [i.strftime('%Y%m%d') for i in data.index]
_x_china = [i.strftime('%Y%m%d') for i in data_china.index]
_y = data.values
_y_china = data_china.values

# SimHei renders Chinese characters; it is only needed for Chinese labels
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.figure(figsize=(20, 8))
plt.plot(range(len(_x)), _y, label='US')
plt.plot(range(len(_x_china)), _y_china, label='Nongzhanguan')
plt.xticks(range(0, len(_x_china), 3), list(_x_china)[::3])
plt.legend()
plt.xlabel('Date')
plt.ylabel('PM2.5 concentration (ug/m^3)')
plt.title('PM2.5 trend in Beijing, 2010-2015')
plt.show()
Comparing PM2.5 across the five cities
For consistency, the US-monitored data is used for every city
import pandas as pd
from matplotlib import pyplot as plt

bej_file_path = './data/BeijingPM20100101_20151231.csv'
ctu_file_path = './data/ChengduPM20100101_20151231.csv'
snh_file_path = './data/ShanghaiPM20100101_20151231.csv'
gnz_file_path = './data/GuangzhouPM20100101_20151231.csv'
shy_file_path = './data/ShenyangPM20100101_20151231.csv'

bej_df = pd.read_csv(bej_file_path)
ctu_df = pd.read_csv(ctu_file_path)
snh_df = pd.read_csv(snh_file_path)
gnz_df = pd.read_csv(gnz_file_path)
shy_df = pd.read_csv(shy_file_path)

# Combine the separate time fields into a time series.
# Beijing's fields are reused for every city, which assumes that all five
# files list the same hours in the same order.
period = pd.PeriodIndex(year=bej_df['year'], month=bej_df['month'], day=bej_df['day'], hour=bej_df['hour'], freq='H')

bej_df['datetime'] = period
ctu_df['datetime'] = period
snh_df['datetime'] = period
gnz_df['datetime'] = period
shy_df['datetime'] = period

# Use datetime as the index
bej_df.set_index('datetime', inplace=True)
ctu_df.set_index('datetime', inplace=True)
snh_df.set_index('datetime', inplace=True)
gnz_df.set_index('datetime', inplace=True)
shy_df.set_index('datetime', inplace=True)

# Take monthly means; numeric_only=True skips the text column cbwd
bej_df = bej_df.resample('M').mean(numeric_only=True)
ctu_df = ctu_df.resample('M').mean(numeric_only=True)
snh_df = snh_df.resample('M').mean(numeric_only=True)
gnz_df = gnz_df.resample('M').mean(numeric_only=True)
shy_df = shy_df.resample('M').mean(numeric_only=True)

# Take the US-monitored data
bej_data = bej_df['PM_US Post']
ctu_data = ctu_df['PM_US Post']
snh_data = snh_df['PM_US Post']
gnz_data = gnz_df['PM_US Post']
shy_data = shy_df['PM_US Post']

_x_bej = [i.strftime('%Y%m%d') for i in bej_data.index]
_x_ctu = [i.strftime('%Y%m%d') for i in ctu_data.index]
_x_snh = [i.strftime('%Y%m%d') for i in snh_data.index]
_x_gnz = [i.strftime('%Y%m%d') for i in gnz_data.index]
_x_shy = [i.strftime('%Y%m%d') for i in shy_data.index]

_y_bej = bej_data.values
_y_ctu = ctu_data.values
_y_snh = snh_data.values
_y_gnz = gnz_data.values
_y_shy = shy_data.values

# SimHei renders Chinese characters; it is only needed for Chinese labels
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.figure(figsize=(20, 8))
plt.plot(range(len(_x_bej)), _y_bej, label='Beijing')
plt.plot(range(len(_x_ctu)), _y_ctu, label='Chengdu')
plt.plot(range(len(_x_snh)), _y_snh, label='Shanghai')
plt.plot(range(len(_x_gnz)), _y_gnz, label='Guangzhou')
plt.plot(range(len(_x_shy)), _y_shy, label='Shenyang')
plt.xticks(range(0, len(_x_bej), 3), list(_x_bej)[::3])
plt.legend()
plt.xlabel('Date')
plt.ylabel('PM2.5 concentration (ug/m^3)')
plt.title('PM2.5 trends in the five cities, 2010-2015')
plt.show()
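The five-city script repeats every step once per city. One possible way to shorten it is sketched below. The helpers monthly_us_pm and compare_cities are hypothetical names introduced here, the tiny tables are invented stand-ins for the CSV files, and the sketch uses pd.to_datetime with the "MS" alias instead of the PeriodIndex approach above, so each city's timestamps come from its own time fields.

```python
import pandas as pd


def monthly_us_pm(raw):
    """Return monthly mean 'PM_US Post' values for one city's hourly table."""
    # Build the timestamps from this table's own year/month/day/hour columns
    stamps = pd.to_datetime(raw[["year", "month", "day", "hour"]])
    series = pd.Series(raw["PM_US Post"].to_numpy(), index=stamps)
    return series.resample("MS").mean()


def compare_cities(frames):
    """Combine per-city monthly series into one DataFrame, one column per city."""
    return pd.DataFrame({city: monthly_us_pm(raw) for city, raw in frames.items()})


# Tiny invented tables standing in for the CSV files
beijing = pd.DataFrame({"year": [2010, 2010], "month": [1, 2], "day": [1, 1],
                        "hour": [0, 0], "PM_US Post": [100.0, 80.0]})
shanghai = pd.DataFrame({"year": [2010, 2010], "month": [1, 2], "day": [1, 1],
                         "hour": [0, 0], "PM_US Post": [60.0, 40.0]})

table = compare_cities({"Beijing": beijing, "Shanghai": shanghai})
print(table)
```

With the real data, the dictionary would map each city name to pd.read_csv of its file, and table.plot(figsize=(20, 8)) would draw one line per city.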