Forecasting Restaurant Visitors with LightGBM, Part 1: Data Processing

Popularity: 67   Published: 2023-12-20 21:56:33
  • Restaurant visit data
import pandas as pd
air_visit = pd.read_csv('air_visit_data.csv')
air_visit.index = pd.to_datetime(air_visit['visit_date'])
air_visit.head()

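# Resample each store to one row per calendar day.
# min_count=1 keeps days without any record as NaN (a plain sum() returns 0
# in recent pandas), so they can be flagged before being filled.
air_visit = air_visit.groupby('air_store_id').apply(
    lambda g: g['visitors'].resample('1D').sum(min_count=1)).reset_index()
air_visit['visit_date'] = air_visit['visit_date'].dt.strftime('%Y-%m-%d')
# Flag the days that had no record, then fill the missing counts with 0
air_visit['was_mil'] = air_visit['visitors'].isnull()
air_visit['visitors'] = air_visit['visitors'].fillna(0)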
air_visit.head()
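
To see what the per-store daily resampling does, here is a minimal sketch on a made-up two-store log. The values and the `was_missing` column name are illustrative and not taken from the dataset. It assumes `min_count=1` is wanted, so that a day without any record comes out as NaN and can be flagged before it is filled with 0.

```python
import pandas as pd

# Made-up visit log: store "a" has no record for 2016-01-02.
toy = pd.DataFrame({
    "air_store_id": ["a", "a", "b", "b"],
    "visit_date": ["2016-01-01", "2016-01-03", "2016-01-01", "2016-01-02"],
    "visitors": [5, 7, 3, 4],
})
toy.index = pd.to_datetime(toy["visit_date"])

# One row per store and calendar day; empty days stay NaN thanks to min_count=1.
daily = (
    toy.groupby("air_store_id")["visitors"]
    .resample("1D")
    .sum(min_count=1)
    .reset_index()
)

# Flag the gap first, then fill it with 0.
daily["was_missing"] = daily["visitors"].isnull()
daily["visitors"] = daily["visitors"].fillna(0)
print(daily)
```

Store "a" gains a row for 2016-01-02 with 0 visitors and the flag set, which is why the resampled table has more rows than the raw file.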

  • Calendar data

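shift() moves the values of a column up or down by one row, so each day can see whether the previous day and the next day are holidays.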

date_info = pd.read_csv('date_info.csv')
date_info.rename(columns={
    'holiday_flg':'is_holiday', 'calendar_date':'visit_date'}, inplace=True)
date_info['prev_day_is_holiday'] = date_info['is_holiday'].shift().fillna(0)
date_info['next_day_is_holiday'] = date_info['is_holiday'].shift(-1).fillna(0)
date_info.head()
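
A small sketch of the same shift() pattern on a made-up four-day calendar (the dates and flags are illustrative). It assumes the rows are sorted by date with exactly one row per day, since shift() works on row position and not on the date itself.

```python
import pandas as pd

# Made-up calendar: the first and last day are holidays.
cal = pd.DataFrame({
    "visit_date": ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"],
    "is_holiday": [1, 0, 0, 1],
})

# shift() pushes values down one row: each row receives the previous day's flag.
cal["prev_day_is_holiday"] = cal["is_holiday"].shift().fillna(0)
# shift(-1) pulls values up one row: each row receives the next day's flag.
cal["next_day_is_holiday"] = cal["is_holiday"].shift(-1).fillna(0)
print(cal)
```

The first row has no previous day and the last row has no next day, so shift() leaves NaN there and fillna(0) treats both as non-holidays.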

  • Store location data
air_store_info = pd.read_csv('air_store_info.csv')
air_store_info.head()

  • Test set
import numpy as np
submission = pd.read_csv('sample_submission.csv')
# The id is the 20-character store id, an underscore, then the visit date
submission['air_store_id'] = submission['id'].str.slice(0, 20)
submission['visit_date'] = submission['id'].str.slice(21)
submission['is_test'] = True
submission['visitors'] = np.nan
submission['test_number'] = range(len(submission))
submission.head()
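
The fixed-width slicing can be checked on a single id. The id below is made up but follows the layout the slices imply (20 characters of store id, one underscore, then the date). The rsplit variant is an alternative shown only for comparison; it does not depend on the store id being exactly 20 characters long.

```python
import pandas as pd

# Made-up id with the same layout as the submission ids.
ids = pd.Series(["air_0123456789abcdef_2017-04-23"])

# Fixed-width slicing, as in the code above.
store_fixed = ids.str.slice(0, 20)
date_fixed = ids.str.slice(21)

# Alternative: split once at the last underscore.
parts = ids.str.rsplit("_", n=1, expand=True)
store_split, date_split = parts[0], parts[1]

print(store_fixed[0], date_fixed[0])
print(store_split[0], date_split[0])
```

Both approaches give the same two pieces here. The slice version is shorter, while the split version would keep working if the id length ever changed.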

  • Combining the data

print(air_visit.shape, submission.shape)
data = pd.concat((air_visit, submission.drop('id', axis = 'columns')))
print(data.shape)
data.head()

(296279, 4) (32019, 6)
(328298, 6)
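
The column count of 6 comes from pd.concat taking the union of the columns of both frames and filling whatever one side lacks with NaN. A minimal sketch with made-up rows (same column names as above, illustrative values):

```python
import numpy as np
import pandas as pd

# Made-up training rows: 4 columns, no test-only columns.
train = pd.DataFrame({
    "air_store_id": ["a", "a"],
    "visit_date": ["2016-01-01", "2016-01-02"],
    "visitors": [5.0, 0.0],
    "was_mil": [False, True],
})
# Made-up test row: 5 columns after dropping 'id', no 'was_mil'.
test = pd.DataFrame({
    "air_store_id": ["a"],
    "visit_date": ["2017-04-23"],
    "visitors": [np.nan],
    "is_test": [True],
    "test_number": [0],
})

both = pd.concat((train, test))
print(both.shape)
# Columns missing on one side are NaN in the rows that came from that side.
print(both[["was_mil", "is_test", "test_number"]].isna().sum())
```

The training rows end up with NaN in `is_test` and `test_number`, and the test row with NaN in `was_mil`, so any such column needs an explicit fill before it is used as a flag.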

# Training rows have no is_test value after the concat; mark them as False
data['is_test'] = data['is_test'].fillna(False)
print(date_info.shape, data.shape)
data = pd.merge(left=data, right=date_info, on='visit_date', how='left')
print(air_store_info.shape, data.shape)
data = pd.merge(left=data, right=air_store_info, on='air_store_id', how='left')
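
Printing the shapes before each merge is a manual way to confirm that a left join does not add rows. As a sketch of an alternative guard, pandas' `validate='many_to_one'` argument raises an error if a key is duplicated on the right side. The frames below are made up, and the `area` column is a hypothetical stand-in for the store attributes.

```python
import pandas as pd

# Made-up combined data: two stores, two dates.
data = pd.DataFrame({
    "air_store_id": ["a", "a", "b"],
    "visit_date": ["2016-01-01", "2016-01-02", "2016-01-01"],
    "visitors": [5, 0, 3],
})
date_info = pd.DataFrame({
    "visit_date": ["2016-01-01", "2016-01-02"],
    "is_holiday": [1, 0],
})
# 'area' is a hypothetical column used only for this illustration.
store_info = pd.DataFrame({"air_store_id": ["a", "b"], "area": ["X", "Y"]})

rows_before = len(data)
# validate='many_to_one' fails loudly if the right-hand keys are not unique.
data = pd.merge(left=data, right=date_info, on="visit_date",
                how="left", validate="many_to_one")
data = pd.merge(left=data, right=store_info, on="air_store_id",
                how="left", validate="many_to_one")

# A left join on unique right-hand keys keeps the row count unchanged.
assert len(data) == rows_before
print(data)
```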