
MXNet base.h:459: Check failed: e == cudaSuccess (30 vs. 0) : CUDA: unknown error, and nvidia-smi shows ERR! for the GPU

Popularity: 40   Published: 2023-12-15 16:38:02.0

MXNet raises the following error:

Traceback (most recent call last):
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1776, in simple_bind
    ctypes.byref(exe_handle)))
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:24:40] include/mxnet/base.h:459: Check failed: e == cudaSuccess (30 vs. 0) :  CUDA: unknown error
Stack trace:
  [bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b41eb) [0x7fcc1ae1a1eb]
  [bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3715455) [0x7fcc1de7b455]
  [bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x40c0ae5) [0x7fcc1e826ae5]
  [bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0xbb9d0d) [0x7fcc1b31fd0d]
  [bt] (4) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x373d40f) [0x7fcc1dea340f]
  [bt] (5) /home/user1/miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fccc14eaec0]
  [bt] (6) /home/user1/miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fccc14ea87d]
  [bt] (7) /home/user1/miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fccc16ffede]
  [bt] (8) /home/user1/miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12914) [0x7fccc1700914]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_0723.py", line 455, in <module>
    main()
  File "train_0723.py", line 451, in main
    train_net(args)
  File "train_0723.py", line 445, in train_net
    epoch_end_callback=epoch_cb)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 535, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 237, in bind
    force_rebind=False, shared_module=None)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/module/executor_group.py", line 280, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/module/executor_group.py", line 376, in bind_exec
    shared_group))
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/module/executor_group.py", line 670, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1782, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (100, 3, 112, 112)
softmax_label: (100,)
[11:24:40] include/mxnet/base.h:459: Check failed: e == cudaSuccess (30 vs. 0) :  CUDA: unknown error
Stack trace:
  [bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b41eb) [0x7fcc1ae1a1eb]
  [bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3715455) [0x7fcc1de7b455]
  [bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x40c0ae5) [0x7fcc1e826ae5]
  [bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0xbb9d0d) [0x7fcc1b31fd0d]
  [bt] (4) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x373d40f) [0x7fcc1dea340f]
  [bt] (5) /home/user1/miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fccc14eaec0]
  [bt] (6) /home/user1/miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fccc14ea87d]
  [bt] (7) /home/user1/miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fccc16ffede]
  [bt] (8) /home/user1/miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12914) [0x7fccc1700914]

At the same time, the nvidia-smi command is slow to print its output and reports ERR.
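Both symptoms (slow output and ERR fields) can be checked from a script. The following is a minimal sketch, not an official tool: the helper names `run_nvidia_smi` and `lines_with_err` are made up for illustration, the 30-second timeout is an arbitrary assumption, and the check simply looks for the literal text `ERR!` in the plain `nvidia-smi` output.

```python
import subprocess
from typing import List, Tuple


def run_nvidia_smi(timeout_s: float = 30.0) -> Tuple[str, str]:
    """Run plain `nvidia-smi` and return (status, text).

    status is one of "ok", "failed", "timeout" or "missing".
    The timeout value is an assumption; pick one that suits the machine.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # nvidia-smi did not finish in time, which matches the "slow" symptom.
        return "timeout", ""
    except FileNotFoundError:
        # The nvidia-smi binary is not on PATH.
        return "missing", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout + result.stderr


def lines_with_err(smi_text: str, marker: str = "ERR!") -> List[str]:
    """Return the lines of nvidia-smi output that contain the error marker."""
    return [line for line in smi_text.splitlines() if marker in line]
```

Calling `run_nvidia_smi()` and passing the returned text to `lines_with_err` gives the rows to look at; a `"timeout"` status or a non-empty list would both point at the condition described here.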
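Try the following fixes, in this order:
1. Reboot the machine (tried, it worked)
2. Reinstall the driver
3. Repair or replace the GPU (tried, it worked)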

GPU model: TITAN Xp 12GB, multiple cards in one machine
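On a multi-GPU machine it can help to find out which card is failing before reinstalling the driver or pulling hardware. The sketch below is one possible way to do that, not the method used in this post: it tries a tiny allocation on each GPU through MXNet and records which device indices raise. The names `mxnet_probe` and `find_failing_gpus` are hypothetical, and the number of GPUs is passed in by the caller.

```python
from typing import Callable, Dict


def mxnet_probe(gpu_id: int) -> None:
    """Allocate a tiny array on one GPU; raises if CUDA cannot use the device."""
    import mxnet as mx  # imported lazily so find_failing_gpus works without MXNet

    # wait_to_read() blocks until the asynchronous allocation has actually run,
    # so a CUDA failure surfaces here instead of at some later call.
    mx.nd.zeros((1,), ctx=mx.gpu(gpu_id)).wait_to_read()


def find_failing_gpus(
    num_gpus: int, probe: Callable[[int], None] = mxnet_probe
) -> Dict[int, str]:
    """Return {gpu_id: first line of the error} for every device whose probe raised."""
    failures = {}
    for gpu_id in range(num_gpus):
        try:
            probe(gpu_id)
        except Exception as exc:  # MXNet reports CUDA failures as MXNetError
            message = str(exc).strip()
            failures[gpu_id] = message.splitlines()[0] if message else type(exc).__name__
    return failures
```

For example, `find_failing_gpus(4)` would probe GPUs 0 to 3. An error on one device may leave the process in a bad state, so running each probe in a fresh process is likely to give a cleaner answer.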

Another problem seen on the same machine:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
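Since this message says nvidia-smi could not reach the driver, a first check on Linux is whether the NVIDIA kernel module is loaded at all. The sketch below reads `/proc/modules` and lists the module names that start with `nvidia`. The function names are made up for illustration, and the check only covers the "module not loaded" case, not every cause of this message.

```python
from typing import List


def loaded_nvidia_modules(proc_modules_text: str) -> List[str]:
    """Return the names of loaded kernel modules that start with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the kernel's module list; returns '' if the file cannot be read."""
    try:
        with open(path, "r") as handle:
            return handle.read()
    except OSError:
        return ""
```

An empty result from `loaded_nvidia_modules(read_proc_modules())` would suggest the driver module is not loaded, which fits fix 2 (reinstall the driver); it does not by itself explain why the module is missing.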
