cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

I ran into this error while calling torch.load, which puzzled me: I only have one GPU, so how could there be an invalid device ordinal?
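For context, here is a minimal sketch of how this situation typically arises. The filename checkpoint.pt and the tensor are hypothetical, and the save step assumes the checkpoint was originally produced on a machine with at least two GPUs:

import torch

# On the machine where the checkpoint was created (at least two GPUs):
# the tensor lives on GPU 1, so its storage is tagged 'cuda:1' when saved.
t = torch.randn(3, 3).cuda(1)
torch.save(t, 'checkpoint.pt')

# On a machine with a single GPU (only cuda:0 exists), the default load
# tries to restore the storage back onto cuda:1 and fails with
# "cuda runtime error (10) : invalid device ordinal".
t = torch.load('checkpoint.pt')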

Tracing the error into the source in __init__.py, I found that self.idx was indeed 1, when it should have been 0. A careful read of the docstring of the load function shows that the fix is already spelled out there.

"""Loads an object saved with :func:`torch.save` from a file.torch.load uses Python's unpickling facilities but treats storages,which underlie tensors, specially. They are first deserialized on theCPU and are then moved to the device they were saved from. If this fails(e.g. because the run time system doesn't have certain devices), an exceptionis raised. However, storages can be dynamically remapped to an alternativeset of devices using the map_location argument.If map_location is a callable, it will be called once for each serializedstorage with two arguments: storage and location. The storage argumentwill be the initial deserialization of the storage, residing on the CPU.Each serialized storage has a location tag associated with it whichidentifies the device it was saved from, and this tag is the secondargument passed to map_location. The builtin location tags are 'cpu' forCPU tensors and 'cuda:device_id' (e.g. 'cuda:2') for CUDA tensors.map_location should return either None or a storage. If map_location returnsa storage, it will be used as the final deserialized object, already moved tothe right device. Otherwise, torch.load will fall back to the default behavior,as if map_location wasn't specified.If map_location is a dict, it will be used to remap location tagsappearing in the file (keys), to ones that specify where to put thestorages (values).User extensions can register their own location tags and tagging anddeserialization methods using register_package.Args:f: a file-like object (has to implement fileno that returns a filedescriptor, and must implement seek), or a string containing a filenamemap_location: a function or a dict specifying how to remap storagelocationspickle_module: module used for unpickling metadata and objects (has tomatch the pickle_module used to serialize file)Example:>>> torch.load('tensors.pt')# Load all tensors onto the CPU>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage)# Load all tensors onto GPU 1>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1))# Map tensors from GPU 1 to GPU 0>>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})"""

The last example is the important one: to remap GPU 1 to GPU 0, use that line from the docstring (replacing tensors.pt with your own file):

torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})

With GPU 1 mapped to GPU 0, the problem was solved.
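For reference, the docstring quoted above allows a few other equivalent workarounds. A short sketch, where tensors.pt is again a placeholder filename and the string/torch.device forms assume a reasonably recent PyTorch release:

import torch

# Dict form: remap every storage tagged 'cuda:1' onto 'cuda:0'.
obj = torch.load('tensors.pt', map_location={'cuda:1': 'cuda:0'})

# Callable form: keep every storage on the CPU regardless of where it was saved.
obj = torch.load('tensors.pt', map_location=lambda storage, loc: storage)

# Newer PyTorch releases also accept a device string or a torch.device directly.
obj = torch.load('tensors.pt', map_location='cpu')
obj = torch.load('tensors.pt', map_location=torch.device('cuda:0'))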
