Method 1 (pandas):
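import pandas as pd
# Split the training set by class; label 0 is assumed to be the majority class
df_class_0 = df_train[df_train['label'] == 0]
df_class_1 = df_train[df_train['label'] == 1]
count_class_0 = len(df_class_0)
# Resample the minority class with replacement until it matches the majority count
df_class_1_over = df_class_1.sample(count_class_0, replace=True)
df_test_over = pd.concat([df_class_0, df_class_1_over], axis=0)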
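To check the pandas approach above end to end, here is a minimal self-contained sketch. The toy `df_train`, the `feature` column, the `random_state` value, and the name `df_train_over` are assumptions made for illustration; only the `label` column and the sampling steps come from the snippet above.

```python
import pandas as pd

# Hypothetical toy training set: 8 rows of class 0, 2 rows of class 1
df_train = pd.DataFrame({
    "feature": range(10),
    "label": [0] * 8 + [1] * 2,
})

df_class_0 = df_train[df_train["label"] == 0]
df_class_1 = df_train[df_train["label"] == 1]

# Size of the majority class, used as the target size for the minority class
count_class_0 = len(df_class_0)

# Draw minority rows with replacement until both classes have the same size
df_class_1_over = df_class_1.sample(count_class_0, replace=True, random_state=2018)

# Combine both classes and rebuild the index, since sampling duplicates index values
df_train_over = pd.concat([df_class_0, df_class_1_over], axis=0).reset_index(drop=True)

counts = df_train_over["label"].value_counts()
print(counts)
```

Oversampling like this should normally be applied to the training split only, so that duplicated rows do not leak into evaluation data.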
Method 2 (PySpark):
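from pyspark.sql.functions import col
# Step 1. Oversample the label = 1 rows: with replacement, fraction 10.0
train_1 = train_initial.where(col('label') == 1).sample(True, 10.0, seed=2018)
# Step 2. Merge this data with the label = 0 data
train_0 = train_initial.where(col('label') == 0)
train_final = train_0.union(train_1)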
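The fraction 10.0 above is hard-coded. One possible way to derive it from the class counts is sketched below in plain Python; the helper name `oversample_fraction` and the example counts are hypothetical. When sampling with replacement, Spark treats the fraction as the expected number of times each row is drawn, so the resulting row count is approximate rather than exact.

```python
def oversample_fraction(n_majority: int, n_minority: int) -> float:
    """Expected number of copies of each minority row needed to match the majority count."""
    if n_minority <= 0:
        raise ValueError("minority class is empty")
    return n_majority / n_minority


# Hypothetical counts; in practice they could come from
# train_initial.groupBy('label').count()
fraction = oversample_fraction(n_majority=50_000, n_minority=5_000)
print(fraction)
```

The resulting value could then be passed as the second argument of `sample` in place of the literal 10.0.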
Reference:
- Stack Overflow