
mxnet Check failed CUDA: unknown error simple_bind

Popularity: 15   Published: 2024-02-04 12:45:17.0

Running MXNet 1.6.0 fails with the following error:


...
cuda error:Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: unknown error....
Traceback (most recent call last):
File "/home/user1/anaconda3/lib/python3.x/site-packages/mxnet/symbol/symbol.py", line 1488, in simple_bind
ctypes.byref(exe_handle)))
File "/home/user1/anaconda3/lib/python3.x/site-packages/mxnet/base.py", line 146, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user1/mxnetproject/new_scene.py", line 90, in <module>
mod_score = fit(new_sym, new_args, aux_params, train, val, batch_size, num_gpus=1)
File "/home/user1/mxnetproject/new_scene.py", line 84, in fit
eval_metric='acc')
File "/home/user1/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py", line 460, in fit
for_training=True, force_rebind=force_rebind)
File "/home/user1/anaconda3/lib/python3.6/site-packages/mxnet/module/module.py", line 428, in bind
state_names=self._state_names)
File "/home/user1/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 237, in __init__
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/use1/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 333, in bind_exec
shared_group))
File "/home/use1/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 611, in _bind_ith_exec
shared_buffer=shared_data_arrays, **input_shapes)
File "/home/user1/anaconda3/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1494, in simple_bind
raise RuntimeError(error_msg)
...

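Check: run $ nvidia-smi to see whether one of the GPUs is reporting an error. A GPU in an error state is what makes the bind fail. If you find one, try rebooting the machine. If the problem persists after the reboot, try running on the remaining healthy GPUs only.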
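
The check above can be scripted. The following is a minimal sketch, not part of the original post, that asks nvidia-smi for a per-GPU CSV listing and hides any GPU that looks broken from the process by setting CUDA_VISIBLE_DEVICES. The helper names healthy_gpu_indices and restrict_to_healthy_gpus are hypothetical. It also assumes that a failing GPU shows "ERR!" or "Unknown Error" in at least one queried field, which may differ between driver versions, so compare it against what nvidia-smi actually prints on your machine.

```python
import os
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, no header line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu,memory.used",
    "--format=csv,noheader",
]


def healthy_gpu_indices(csv_text):
    """Return the indices of GPUs whose rows contain no error marker."""
    healthy = []
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if not fields[0].isdigit():
            continue  # skip blank lines and free-form error messages
        # Assumption: a broken GPU reports "ERR!" or "Unknown Error"
        # in place of a normal value in one of its fields.
        if any("ERR!" in f or "Unknown Error" in f for f in fields[1:]):
            continue
        healthy.append(int(fields[0]))
    return healthy


def restrict_to_healthy_gpus():
    """Expose only healthy GPUs to CUDA; call this before importing mxnet."""
    result = subprocess.run(QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True)
    ids = healthy_gpu_indices(result.stdout)
    # Make CUDA number devices the same way nvidia-smi does.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return ids


if __name__ == "__main__":
    print("Usable GPUs:", restrict_to_healthy_gpus())
```

The environment variables only take effect if they are set before CUDA is initialized, so call restrict_to_healthy_gpus() at the top of the training script, before importing mxnet. Afterwards the visible GPUs are renumbered from 0, so with one healthy card left, mx.gpu(0) refers to that card.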
