Trimming part of each value in a numpy array (python)

I only want the first 10 characters of each value in the array.

Here is the array:

array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000']

I want to write code that gives me this:

array(['2018-06-30','2018-06-30'   .... etc

Update: my code is:

x = np.array(df4['per_end_date'])
x

The output is:

array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000',
   '2018-09-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000', etc

I only want the first 10 characters of each value in the array. The following code gives me the error IndexError: invalid index to scalar variable.

x = np.array([y[:9] for y in x])

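Although numpy is not always the best tool for manipulating strings, you can vectorize this operation, and as a rule vectorized functions should be preferred over iteration.

Setup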

arr = np.array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
   '2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000'],
  dtype='<U29')

Using np.frombuffer

np.frombuffer(
    arr.view((str, 1)).reshape(arr.shape[0], -1)[:, :10].tobytes(),  # tostring() is the deprecated name for tobytes()
    dtype=(str,10)
)

array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
       '2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30'],
      dtype='<U10')
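As a simpler alternative to the buffer approach, casting a fixed-width string array to a shorter string dtype also truncates each element. The following is a minimal sketch, assuming the values really are strings (dtype `<U29`) as in the setup; the names `arr` and `trimmed` are only illustrative.

```python
import numpy as np

# Two sample values in the same format as the setup array.
arr = np.array(['2018-06-30T00:00:00.000000000',
                '2018-09-30T00:00:00.000000000'])

# Casting to a 10-character unicode dtype keeps only the first 10 characters.
trimmed = arr.astype('U10')
print(trimmed)
```

This does not help if the array holds datetime64 values rather than strings, which is worth checking with `arr.dtype` first.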

Timings

arr = np.repeat(arr, 10000)

%timeit np.array([y[:10] for y in arr])
48.6 ms ± 961 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
np.frombuffer(
    arr.view((str, 1)).reshape(arr.shape[0], -1)[:, :10].tobytes(),
    dtype=(str,10)
)

6.87 ms ± 311 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.array(arr,dtype= 'datetime64[D]')
44.9 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
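The `%timeit` lines above only work inside IPython. A rough equivalent with the standard `timeit` module might look like the sketch below; the helper names `with_comprehension` and `with_frombuffer` are made up for illustration, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

# Build a larger string array, in the spirit of the np.repeat call above.
arr = np.repeat(np.array(['2018-06-30T00:00:00.000000000',
                          '2018-09-30T00:00:00.000000000']), 10000)

def with_comprehension(a):
    # Slice each string in a Python-level loop.
    return np.array([s[:10] for s in a])

def with_frombuffer(a):
    # View the array as single characters, keep 10 per row, reinterpret as <U10.
    chars = a.view((str, 1)).reshape(a.shape[0], -1)[:, :10]
    return np.frombuffer(chars.tobytes(), dtype=(str, 10))

for fn in (with_comprehension, with_frombuffer):
    seconds = timeit.timeit(lambda: fn(arr), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1e3:.2f} ms per call")
```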

Working with lists is a very basic task in Python.

import numpy
x = numpy.array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
           '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
           '2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
           '2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000'])
numpy.array([y[:10] for y in x])
# array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
#        '2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30'],
#       dtype='<U10')

For more information, you should read the relevant documentation.

OK, I figured it out.

df4['per_end_date'].dtype

Output:

dtype('<M8[ns]')

So the following code works perfectly.

x = np.array(df4['per_end_date'],dtype= 'datetime64[D]')
x

Output:

array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
   '2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30',
   '2018-09-30', '2018-09-30', '2018-09-30', '2018-09-30',
   '2018-09-30', '2018-09-30', '2018-09-30', '2018-09-30', etc
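The dtype `<M8[ns]` also explains the original IndexError: the elements are numpy.datetime64 scalars rather than strings, so they cannot be sliced. Here is a small sketch of that, using a hand-built array as a hypothetical stand-in for `np.array(df4['per_end_date'])`, and `np.datetime_as_string` as one option if actual strings are wanted.

```python
import numpy as np

# Hypothetical stand-in for np.array(df4['per_end_date']): datetime64[ns] values.
x = np.array(['2018-06-30', '2018-09-30'], dtype='datetime64[ns]')

# Each element is a numpy.datetime64 scalar, not a str, so slicing it fails.
try:
    x[0][:10]
    slicing_failed = False
except IndexError:
    slicing_failed = True

# Truncate to day precision; the result is still a datetime64 array.
days = x.astype('datetime64[D]')

# If real strings are needed, format the dates with day resolution.
as_text = np.datetime_as_string(x, unit='D')
print(slicing_failed, days, as_text)
```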

It's great when you can figure it out yourself. :)
