Pandas Date MultiIndex with missing dates - Rolling sum

I have a pandas Series that looks like this:

Attribute      DateEvent     Value
Type A         2015-04-01    4
               2015-04-02    5
               2015-04-05    3
Type B         2015-04-01    1
               2015-04-03    4
               2015-04-05    1

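How can I convert these values into a rolling sum (say, over the past two days) while making sure the missing dates in the DateEvent index are accounted for (assuming its start and end dates define the full range)? For example, Type A is missing 2015-04-03 and 2015-04-04, and Type B is missing 2015-04-02 and 2015-04-04.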

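I made a few assumptions about what you want; please clarify if any of them is wrong:

You want rows for the missing dates to be treated as having Value = NaN.
Consequently, if a date is missing inside the rolling window, the rolling sum over the past 2 days should return NaN.
You want the rolling sum computed separately within each group, Type A and Type B.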
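
To make the second assumption concrete, here is a minimal sketch on a throwaway Series (the names `values`, `strict`, and `lenient` are made up for illustration). With the default `min_periods`, a fixed window of 2 yields NaN whenever either slot is NaN. Passing `min_periods=1` is shown only as a contrast, for the case where a missing day should simply contribute nothing.

```python
import numpy as np
import pandas as pd

# Illustrative data: the second observation is missing
values = pd.Series([1.0, np.nan, 3.0, 4.0])

# Default behaviour: a window of 2 needs 2 non-NaN observations
strict = values.rolling(2).sum()

# Contrast: accept windows with at least 1 non-NaN observation
lenient = values.rolling(2, min_periods=1).sum()

print(pd.DataFrame({"value": values, "strict": strict, "lenient": lenient}))
```

Only the last row of `strict` is a number, because it is the only window in which both days are present.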

If my assumptions are correct, the steps are as follows.

Create a sample dataset

import pandas as pd
import numpy as np
import io

datastring = io.StringIO(
"""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")

# read_csv returns a one-column DataFrame indexed by (Attribute, DateEvent)
s = pd.read_csv(
    datastring,
    index_col=['Attribute', 'DateEvent'],
    parse_dates=True)
print(s)

This is what it looks like. Both Type A and Type B are missing 2017-04-01.

                      Value
Attribute DateEvent        
Type A    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4
Type B    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4

Solution

You have to rebuild the index and then reindex the Series so that its index contains all the dates.

# reconstruct index with all the dates
dates = pd.date_range("2017-04-01","2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]
# create a new MultiIndex
index = pd.MultiIndex.from_product([attributes,dates], 
        names=["Attribute","DateEvent"])
# reindex the series
sNew = s.reindex(index)

This adds the missing dates with Value = NaN.

                      Value
Attribute DateEvent        
Type A    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
Type B    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
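
If you would rather not hard-code the dates and attribute names, they can be derived from the data. The sketch below assumes the full range runs from the earliest to the latest date present, and uses the data from the question; `df`, `all_dates`, `all_attributes`, and `df_full` are illustrative names.

```python
import io
import pandas as pd

# Data from the question, with gaps in the dates of both groups
csv_text = """Attribute,DateEvent,Value
Type A,2015-04-01,4
Type A,2015-04-02,5
Type A,2015-04-05,3
Type B,2015-04-01,1
Type B,2015-04-03,4
Type B,2015-04-05,1
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateEvent"])
df = df.set_index(["Attribute", "DateEvent"])

# Assumption: the complete range spans the earliest to the latest observed date
dates_level = df.index.get_level_values("DateEvent")
all_dates = pd.date_range(dates_level.min(), dates_level.max(), freq="D")
all_attributes = df.index.get_level_values("Attribute").unique()

# Every (attribute, date) combination; combinations not in df become NaN
full_index = pd.MultiIndex.from_product(
    [all_attributes, all_dates], names=["Attribute", "DateEvent"])
df_full = df.reindex(full_index)
print(df_full)
```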

Now group the Series by the `Attribute` index level and apply a rolling window of size 2 with sum():

# group the series by the `Attribute` column
grouped = sNew.groupby(level="Attribute")
# Apply a 2 day rolling window
summed = grouped.rolling(2).sum()

Final output

                                Value
Attribute Attribute DateEvent        
Type A    Type A    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
Type B    Type B    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
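
A different technique, not the reindexing approach above: if you do not need rows for the missing dates and only want each existing row to sum the values from the last two calendar days, a time-based window (`rolling("2D")`) may be enough. This is a sketch under the assumption that a missing day should contribute nothing rather than force NaN, and it relies on the dates being sorted within each group. It uses the data from the question; `df` and `rolled` are illustrative names.

```python
import io
import pandas as pd

# Data from the question, with gaps in the dates of both groups
csv_text = """Attribute,DateEvent,Value
Type A,2015-04-01,4
Type A,2015-04-02,5
Type A,2015-04-05,3
Type B,2015-04-01,1
Type B,2015-04-03,4
Type B,2015-04-05,1
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateEvent"])

# A time-based window needs a DatetimeIndex, so only the date goes in the index
df = df.set_index("DateEvent")

# "2D" is a two-calendar-day window ending at (and including) each row's date
rolled = df.groupby("Attribute")["Value"].rolling("2D").sum()
print(rolled)
```

With this window, 2015-04-02 for Type A also picks up 2015-04-01, while a row that follows a gap of two or more days only sees itself.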

Final note: I'm not sure why there are now two `Attribute` index columns. If anyone knows, please let me know.
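
On the duplicated `Attribute` level: as far as I can tell, `groupby(...).rolling(...)` prepends the group keys to the original index, and since `Attribute` is already an index level it ends up appearing twice. A minimal sketch of two ways around it, assuming the reindexed frame from the answer (rebuilt here so the snippet runs on its own; `flat` and `same_shape` are illustrative names):

```python
import numpy as np
import pandas as pd

# Rebuild the reindexed frame from the answer (2017-04-01 is NaN in both groups)
index = pd.MultiIndex.from_product(
    [["Type A", "Type B"],
     pd.date_range("2017-04-01", "2017-04-05", freq="D")],
    names=["Attribute", "DateEvent"])
sNew = pd.DataFrame({"Value": [np.nan, 1, 2, 3, 4] * 2}, index=index)

# Option 1: keep groupby().rolling() and drop the prepended group-key level
summed = sNew.groupby(level="Attribute").rolling(2).sum()
flat = summed.droplevel(0)

# Option 2: transform returns a result aligned to the original index
same_shape = sNew.groupby(level="Attribute").transform(
    lambda g: g.rolling(2).sum())

print(flat)
print(same_shape)
```

Both variants should give the same values as the final output above, just with a two-level index.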

Edit: it turns out a similar question has been asked before. Take a look at it.
