How to make multiple if statements run faster in Python

Popularity: 52   Published: 2023-07-14 09:52:53

I have the following pandas DataFrame:

 Code      Sum      Quantity
 0         -12      0
 1          23      0
 2         -10      0
 3         -12      0
 4         100      0
 5         102      201
 6          34      0
 7         -34      0
 8         -23      0
 9         100      0
 10        100      0
 11        102      300
 12        -23       0
 13        -25       0
 14        100      123
 15        167      167  
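
For anyone who wants to reproduce this, the table can be rebuilt roughly as follows. Treating Code as an ordinary column rather than the index is an assumption, and the name test is simply the one used in the loop further down.

```python
import pandas as pd

# Sample data copied from the table above. Keeping "Code" as a regular
# column is an assumption; it could equally be the DataFrame index.
test = pd.DataFrame({
    "Code": list(range(16)),
    "Sum": [-12, 23, -10, -12, 100, 102, 34, -34,
            -23, 100, 100, 102, -23, -25, 100, 167],
    "Quantity": [0, 0, 0, 0, 0, 201, 0, 0,
                 0, 0, 0, 300, 0, 0, 123, 167],
})

print(test.head())
```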

The DataFrame I want is:

Code      Sum      Quantity    new_sum
0         -12      0          -12
1          23      0           23
2         -10      0          -10
3         -12      0          -12
4         100      0           0
5         102      201         202 
6          34      0           34
7         -34      0          -34
8         -23      0          -23
9         100      0           0
10        100      0           0
11        102      300         302
12        -23       0          -23
13        -25       0          -25
14        100      123         100 
15        167      167         167

The logic is:

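First, I look for non-zero values in the Quantity column. In the sample data above, the first non-zero Quantity appears at index 5, namely 201. From that row I then want to add up the Sum values of the preceding rows, working backwards until I reach a row with a negative value; the rows that were added are set to 0 in new_sum.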

I wrote some code that uses nested if statements. However, because of the multiple ifs and the row-wise comparisons, it takes a long time to run.

current_stock = 0
for i in range(len(test)):
    if(test['Quantity'][i] != 0):
        current_stock = test['Sum'][i]
        if(test['Sum'][i-1] > 0):
            current_stock = current_stock + test['Sum'][i-1]
            test['new_sum'][i-1] = 0
            if(test['Sum'][i-2] > 0):
                current_stock = current_stock + test['Sum'][i-2]
                test['new_sum'][i-2] = 0
                if(test['Sum'][i-3] > 0):
                    current_stock = current_stock + test['Sum'][i-3]
                    test['new_sum'][i-3] = 0
                else:
                    test['new_sum'][i] = current_stock
            else:
                test['new_sum'][i] = current_stock
        else:
            test['new_sum'][i] = current_stock
    else:
        test['new_sum'][i] =  test['Sum'][i]

Is there a better way to do this?

Let's look at three solutions, with a performance comparison at the end.

One approach that tries to stay close to pandas is the following:

def f1(df):
    # Group together the elements of df.Sum that might have to be added
    pos_groups = (df.Sum <= 0).cumsum()
    pos_groups[df.Sum <= 0] = -1
    # Create the new column and populate it with what is in df.Sum
    df['new_sum'] = df.Sum
    # Find the indices of the new column that need to be calculated as a sum
    indices = df[df.Quantity > 0].index
    for i in indices:
        # Find the relevant group of positive integers to be summed, ensuring
        # that we only consider those that come /before/ the one to be calculated
        group = pos_groups[:i+1] == pos_groups[i]
        rows = group[group].index
        # Calculate the actual sum from the untouched Sum column
        total = df.loc[rows, 'Sum'].sum()
        # Zero out all the elements that will be part of the sum
        # (.loc is used instead of chained assignment, which may not write back)
        df.loc[rows, 'new_sum'] = 0
        # Store the sum in the row being calculated
        df.loc[i, 'new_sum'] = total

f1(df)

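One place where there may be room for improvement is pos_groups[:i+1] == pos_groups[i], which checks all i+1 elements where, depending on what your data looks like, checking only a fraction of them would probably be enough. Still, this may well be the more efficient option in practice. If not, you may want to iterate manually to find the groups: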

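import numpy as np

def f2(sums, quantities):
    # Work on a copy so the input array is left untouched
    new_sums = np.copy(sums)
    # Positions of the rows with a positive Quantity
    indices = np.where(quantities > 0)[0]
    for i in indices:
        a = i
        # Walk backwards while the values are positive; the a >= 0 check
        # keeps a negative index from wrapping around to the end of the array
        while a >= 0 and sums[a] > 0:
            s = new_sums[a]
            new_sums[a] = 0
            new_sums[i] += s
            a -= 1
    return new_sums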

df['new_sum'] = f2(df.Sum.values, df.Quantity.values)
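
As a quick sanity check, f2 can be run on the sample data from the question. The sketch below restates f2 (with a bounds check on a) so that it runs on its own; the arrays sums and quantities are just the two columns typed out by hand.

```python
import numpy as np

def f2(sums, quantities):
    new_sums = np.copy(sums)
    indices = np.where(quantities > 0)[0]
    for i in indices:
        a = i
        # Collect positive values backwards, stopping at the array start
        while a >= 0 and sums[a] > 0:
            s = new_sums[a]
            new_sums[a] = 0
            new_sums[i] += s
            a -= 1
    return new_sums

# The Sum and Quantity columns of the sample table
sums = np.array([-12, 23, -10, -12, 100, 102, 34, -34,
                 -23, 100, 100, 102, -23, -25, 100, 167])
quantities = np.array([0, 0, 0, 0, 0, 201, 0, 0,
                       0, 0, 0, 300, 0, 0, 123, 167])

result = f2(sums, quantities)
print(result.tolist())
# [-12, 23, -10, -12, 0, 202, 34, -34, -23, 0, 0, 302, -23, -25, 0, 267]
```

Rows 0 to 13 match the expected output in the question. Rows 14 and 15 come out as 0 and 267 rather than 100 and 167, because row 14 is positive and directly precedes row 15, so it is absorbed into row 15's total; the loop in the question appears to behave the same way.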

Finally, again depending on what your data looks like, the latter approach can likely be improved by using Numba:

from numba import jit
f3 = jit(f2)
df['new_sum'] = f3(df.Sum.values, df.Quantity.values)
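
One thing to keep in mind when timing this: Numba compiles a function lazily, the first time it is called with a given set of argument types, so that first call is much slower than later ones. Below is a minimal sketch of warming the function up before measuring. It uses njit (Numba's nopython-mode decorator) rather than jit, the small arrays are made up for illustration, and the fallback to plain Python when Numba is not installed is only there so the snippet runs anywhere.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run uncompiled.
    def njit(func):
        return func

@njit
def f3(sums, quantities):
    new_sums = np.copy(sums)
    indices = np.where(quantities > 0)[0]
    for i in indices:
        a = i
        while a >= 0 and sums[a] > 0:
            s = new_sums[a]
            new_sums[a] = 0
            new_sums[i] += s
            a -= 1
    return new_sums

# Made-up illustrative data, not the data from the question
sums = np.array([-1, 5, 7, -2, 3], dtype=np.int64)
quantities = np.array([0, 0, 4, 0, 0], dtype=np.int64)

f3(sums, quantities)  # warm-up call: triggers compilation if Numba is present

start = time.perf_counter()
result = f3(sums, quantities)
elapsed = time.perf_counter() - start
print(result.tolist(), f"{elapsed * 1e6:.1f} microseconds")
```

Timings taken after the warm-up call reflect only the already compiled function.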

For the data provided in the question (which may be too small to give a proper picture), the performance tests look as follows:

In [13]: %timeit f1(df)
5.32 ms ± 77.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [14]: %timeit df['new_sum'] = f2(df.Sum.values, df.Quantity.values)
190 µs ± 5.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [18]: %timeit df['new_sum'] = f3(df.Sum.values, df.Quantity.values)
178 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Here, most of the time is spent updating the DataFrame. With data 1,000 times larger, the Numba solution ends up the clear winner:

In [28]: df_large = pd.concat([df]*1000).reset_index()

In [29]: %timeit f1(df_large)
5.82 s ± 63.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [30]: %timeit df_large['new_sum'] = f2(df_large.Sum.values, df_large.Quantity.values)
6.27 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [31]: %timeit df_large['new_sum'] = f3(df_large.Sum.values, df_large.Quantity.values)
215 µs ± 5.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)