While running distributed training, the job aborted with the following output:
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
I0408 04:01:41.507015 140706188736256 cross_device_ops.py:427] Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Create CheckpointSaverHook.
I0408 04:01:44.424420 140706188736256 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::append
Fatal Python error: Aborted
After a long, roundabout investigation, reducing the number of GPUs allowed the job to run normally.
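One way to try the fewer-GPUs workaround is to restrict which devices the process can see before TensorFlow is imported. The following is only a minimal sketch: it assumes CUDA GPUs, `limit_visible_gpus` is a hypothetical helper that does not appear in the original job, and the count of 2 is an arbitrary example rather than the value that fixed this run.

```python
import os


def limit_visible_gpus(num_gpus):
    """Expose only the first `num_gpus` GPUs to CUDA-based libraries.

    Hypothetical helper for illustration. It must be called before
    TensorFlow is imported, because the CUDA runtime reads
    CUDA_VISIBLE_DEVICES when it initializes.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    # Build a comma-separated list of device indices, e.g. "0,1".
    device_ids = ",".join(str(i) for i in range(num_gpus))
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return device_ids


# Assumption: 2 is just an example; lower the count step by step
# until the crash no longer reproduces.
visible = limit_visible_gpus(2)
print("CUDA_VISIBLE_DEVICES =", visible)
```

The same variable can also be set in the shell that launches the training script, which avoids touching the code at all.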