Problem description
I have the following DataFrame in pandas:
prod S X
a 10 123
b 20 150
b 30 140
a 40 100
The formulas for products a and b are as follows:
a = IF(S>X, (0.6/100*(S-X)), 0)
b = IF(S>X, (0.2/100*(S-X)), 0)
How can I compute a new column from the formulas for products a and b in the existing DataFrame?
Answer 1
You can use np.where and then np.select.
The data is from @AnnaIliukovich-Strakovskaia.
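import numpy as np

# Per-product payoff: rate * (S - X) where S > X, otherwise 0
a = np.where(df['S'] > df['X'], 0.6/100*(df['S'] - df['X']), 0)
b = np.where(df['S'] > df['X'], 0.2/100*(df['S'] - df['X']), 0)
# Pick the formula matching each row's product; NaN for any other product
df['result'] = np.select([df['prod'].eq('a'), df['prod'].eq('b')], [a, b], np.nan)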
print(df)
prod S X result
0 a 10 123 0.00
1 b 20 150 0.00
2 b 30 140 0.00
3 a 140 100 0.24
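As a self-contained sketch of the same idea, the snippet below rebuilds the sample data and adds a hypothetical product 'c' (not in the original question) to show what the third argument of np.select does: rows that match none of the conditions receive the default, here NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the thread, plus a hypothetical product 'c' with no formula.
df = pd.DataFrame({'prod': ['a', 'b', 'b', 'a', 'c'],
                   'S': [10, 20, 30, 140, 200],
                   'X': [123, 150, 140, 100, 100]})

diff = df['S'] - df['X']

# One candidate column per product: rate * (S - X) where S > X, else 0.
a = np.where(diff > 0, 0.6 / 100 * diff, 0)
b = np.where(diff > 0, 0.2 / 100 * diff, 0)

# np.select picks a or b depending on the product; unmatched rows get the default.
df['result'] = np.select([df['prod'].eq('a'), df['prod'].eq('b')],
                         [a, b],
                         default=np.nan)
print(df)
```

Using NaN as the default makes a product without a formula stand out instead of silently becoming 0.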
Answer 2
You can use apply with a function you define.
Data:
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'b', 'b', 'a'],
                   'S': [10, 20, 30, 140],
                   'X': [123, 150, 140, 100]})
S X prod
0 10 123 a
1 20 150 b
2 30 140 b
3 140 100 a
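Function:
def func(row):
    # row is a single row of the DataFrame (apply is called with axis=1)
    result = 0
    if row.S > row.X:
        # 'prod' needs bracket access because Series.prod is a method
        if row['prod'] == 'a':
            result = 0.6/100*(row.S-row.X)
        if row['prod'] == 'b':
            result = 0.2/100*(row.S-row.X)
    return result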
Usage:
df.join(df.apply(func, axis=1).rename('col'))
Result:
S X prod col
0 10 123 a 0.00
1 20 150 b 0.00
2 30 140 b 0.00
3 140 100 a 0.24
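Note that df.join returns a new DataFrame and leaves df itself unchanged. A minimal variant, assuming you want to store the column on df directly, is sketched below; the RATES lookup table and the payoff function name are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'b', 'b', 'a'],
                   'S': [10, 20, 30, 140],
                   'X': [123, 150, 140, 100]})

# Hypothetical lookup table: percentage rate per product.
RATES = {'a': 0.6, 'b': 0.2}

def payoff(row):
    # Unknown products fall back to a rate of 0 (an assumption of this sketch).
    rate = RATES.get(row['prod'], 0)
    # max(..., 0) implements the IF(S > X, ..., 0) part of the formula.
    return rate / 100 * max(row['S'] - row['X'], 0)

# Assign the row-wise result directly as a new column.
df['col'] = df.apply(payoff, axis=1)
print(df)
```

Adding a product then only requires a new entry in the lookup table rather than another if branch.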
Answer 3
If you are looking for speed, this is faster, though less readable. The where and if logic is done with boolean indexing.
Setup
import pandas as pd
import numpy as np
df = pd.DataFrame({'prod': ['a', 'b', 'b', 'a'],
                   'S': [10, 20, 30, 140],
                   'X': [123, 150, 140, 100]})
print(df)
prod S X
0 a 10 123
1 b 20 150
2 b 30 140
3 a 140 100
Code
# make an array to hold results
results = np.zeros(len(df))
# make arrays from df values
SX_vals = df[['S', 'X']].values
prod = df['prod'].values
# product multiplier dictionary
prod_dict = {'a': .006, 'b': .002}
# make array of S - X
sub_result = np.subtract(SX_vals[:,0], SX_vals[:,1])
# boolean mask: True where S - X is positive
s_bigger = (sub_result > 0)
# loop through products (keys) of prod_dict
for key in prod_dict.keys():
    # mask where (S-X) > 0 and prod == key
    mask = s_bigger & (prod == key)
    # multiply and insert into result array
    results[mask] = sub_result[mask] * prod_dict[key]
# assign result array to dataframe
df['result'] = results
Result
print(df)
prod S X result
0 a 10 123 0.00
1 b 20 150 0.00
2 b 30 140 0.00
3 a 140 100 0.24
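To check the speed claim on your own machine, a rough timing harness along the following lines could be used. The frame size, random data, and function names are assumptions made for this sketch, and no timings are quoted here because they depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame with random data, just for timing.
rng = np.random.default_rng(0)
n = 5_000
big = pd.DataFrame({'prod': rng.choice(['a', 'b'], size=n),
                    'S': rng.integers(0, 200, size=n),
                    'X': rng.integers(0, 200, size=n)})
rates = {'a': 0.006, 'b': 0.002}

def with_apply(df):
    # Row-wise Python function, as in the apply-based answer.
    return df.apply(lambda r: rates[r['prod']] * max(r['S'] - r['X'], 0),
                    axis=1).to_numpy()

def with_masks(df):
    # Boolean indexing on NumPy arrays, as in the answer above.
    out = np.zeros(len(df))
    diff = df['S'].to_numpy() - df['X'].to_numpy()
    prod = df['prod'].to_numpy()
    for key, rate in rates.items():
        mask = (diff > 0) & (prod == key)
        out[mask] = diff[mask] * rate
    return out

# Both approaches should agree before their timings are compared.
assert np.allclose(with_apply(big), with_masks(big))
print('apply:', timeit.timeit(lambda: with_apply(big), number=3))
print('masks:', timeit.timeit(lambda: with_masks(big), number=3))
```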
Answer 4
pandas.Series.map and pandas.Series.where
d = {'a': .6, 'b': .2}
df.assign(
    result=df['prod'].map(d).mul(df.S - df.X).where(df.S > df.X, 0) / 100
)
prod S X result
0 a 10 123 0.00
1 b 20 150 0.00
2 b 30 140 0.00
3 a 140 100 0.24
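The one-liner can be easier to follow when split into steps. The sketch below is equivalent to the chain above; the intermediate names rate, raw, and out are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'prod': ['a', 'b', 'b', 'a'],
                   'S': [10, 20, 30, 140],
                   'X': [123, 150, 140, 100]})
d = {'a': .6, 'b': .2}

# Step 1: look up each row's percentage rate from its product.
rate = df['prod'].map(d)
# Step 2: rate * (S - X); this is negative wherever S < X.
raw = rate.mul(df['S'] - df['X'])
# Step 3: keep values where S > X, replace the rest with 0, then scale by 1/100.
result = raw.where(df['S'] > df['X'], 0) / 100

# assign returns a new DataFrame; df itself is not modified.
out = df.assign(result=result)
print(out)
```

Because assign returns a copy, the result has to be bound to a name (or back to df) to be kept.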