1. Sampling (randomly split the data into 30% and 70%)
esproc
A | |
1 | =now() |
2 | =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt") |
3 | =A2.import@t() |
4 | =A3.sort(rand())(to(A3.len()*0.3)) |
5 | =A3\A4 |
6 | =interval@ms(A1,now()) |
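A4: A.sort(x) sorts A by x. Sorting by rand() puts the records in random order, and to(A3.len()*0.3) then takes the first 30% of them.
A5: The set difference A3\A4 yields the remaining 70%.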
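The shuffle-then-take idea behind A4 and A5 can be sketched in plain Python. This is only an illustration of the technique, not how esproc implements it: the helper name shuffle_split, the seed parameter, and the sample rows are made up for the example, and truncating the sample size with int() is an assumption.

```python
import random

def shuffle_split(records, frac=0.3, seed=None):
    """Split records into a random `frac` sample and the remainder."""
    rng = random.Random(seed)
    # Shuffle positions rather than the records themselves, so duplicate
    # records cannot confuse the difference step below.
    order = list(range(len(records)))
    rng.shuffle(order)
    k = int(len(records) * frac)  # sample size (truncation is an assumption)
    picked = set(order[:k])
    sample = [records[i] for i in order[:k]]
    # Everything that was not picked, in original order
    rest = [r for i, r in enumerate(records) if i not in picked]
    return sample, rest

# Hypothetical stand-in for the EMPLOYEE records
rows = [f"emp{i}" for i in range(10)]
sample, rest = shuffle_split(rows, frac=0.3, seed=42)
print(len(sample), len(rest))
```

Working with positions instead of values keeps the two parts disjoint even when the data contains identical records.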
python:
import time
import pandas as pd
s = time.time()
# Read the tab-separated file into a DataFrame
data = pd.read_csv("C:/Users/Sean/Desktop/esproc_vs_python/EMPLOYEE_nan.txt", sep="\t")
# Row numbers 0..n-1
row_no = pd.Series(range(data.shape[0]))
# Randomly pick 30% of the row numbers
per_30_no = row_no.sample(frac=0.3)
# The remaining 70%: row numbers that are not in the sample
per_70_no = row_no[~row_no.isin(per_30_no)]
# Select the rows by position
data_per_30 = data.iloc[per_30_no, :]
data_per_70 = data.iloc[per_70_no, :]
print(data_per_30)
print(data_per_70)
e = time.time()
print(e - s)
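pd.Series() builds a Series holding the row numbers of all rows.
Series.sample() draws the random sample, and ~ is logical NOT. Finally, iloc[] slices out the corresponding rows.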
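As a shorter variant of the same idea, the sample can be drawn from the DataFrame directly and the rest obtained by dropping the sampled index labels. This is a minimal sketch, not the code that was timed: the small in-memory table and its column names EID and NAME are hypothetical stand-ins for the EMPLOYEE file, and random_state is set only to make the run repeatable.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the EMPLOYEE table
data = pd.DataFrame({
    "EID": range(1, 11),
    "NAME": [f"emp{i}" for i in range(1, 11)],
})

# Draw 30% of the rows; random_state only makes the sketch reproducible
data_per_30 = data.sample(frac=0.3, random_state=0)
# Rows whose index labels were not drawn form the remaining 70%
data_per_70 = data.drop(data_per_30.index)

print(len(data_per_30), len(data_per_70))
```

This relies on the index labels being unique, which holds for the default RangeIndex that read_csv produces.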
Results:
esproc
python
Elapsed time | |
esproc | 0.006 |
python | 0.067 |
2. Keep numeric fields unchanged and convert the other fields to numbers
esproc
A | B | |
1 | =now() | |
2 | =file("C:\\Users\\Sean\\Desktop\\esproc_vs_python\\EMPLOYEE.txt") | |
3 | =A2.import@t() | |
4 | = A3(1).array().pselect@a(!ifnumber(~)) | |
5 |