Ox01

gdb对pytorch进行c++语言级调试的过程

编译安装PyTorch软件包

git clone https://github.com/pytorch/pytorch.git 

cd pytorch

DEBUG=1  python setup.py develop

检测是否安装成功

(pytorch_learn) fengwen@oneflow:/data/dataset/fengwen/pytorch/.fengwen$ python
Python 3.8.16 (default, Mar  2 2023, 03:21:46) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()   
True

卸载 PyTorch 软件包

pip uninstall torch && python setup.py clean

Developing PyTorch 详细介绍

> 截取自pytorch开发指导 > 原文链接地址： https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#developing-pytorch ## Developing PyTorch Follow the instructions for [installing PyTorch from source](https://github.com/pytorch/pytorch#from-source). If you get stuck when developing PyTorch on your machine, check out the [tips and debugging](#tips-and-debugging) section below for common solutions. ### Tips and Debugging * If you want to have no-op incremental rebuilds (which are fast), see [Make no-op build fast](#make-no-op-build-fast) below. * When installing with `python setup.py develop` (in contrast to `python setup.py install`) you will symlink the Python files from the current local source-tree into the Python install. This way you do not need to repeatedly install after modifying Python files (`.py`). However, you would need to reinstall if you modify Python interface (`.pyi`, `.pyi.in`) or non-Python files (`.cpp`, `.cc`, `.cu`, `.h`, ...). To reinstall, first uninstall all existing PyTorch installs. You may need to run `pip uninstall torch` multiple times. You'll know `torch` is fully uninstalled when you see `WARNING: Skipping torch as it is not installed`. (You should only have to `pip uninstall` a few times, but you can always `uninstall` with `timeout` or in a loop if you're feeling lazy.)

conda uninstall pytorch -y
yes | pip uninstall torch

Next run `python setup.py clean`. After that, you can install in `develop` mode again. * If a commit is simple and doesn't affect any code (keep in mind that some docstrings contain code that is used in tests), you can add `[skip ci]` (case sensitive) somewhere in your commit message to [skip all build / test steps](https://github.blog/changelog/2021-02-08-github-actions-skip-pull-request-and-push-workflows-with-skip-ci/). Note that changing the pull request body or title on GitHub itself has no effect. * If you run into errors when running `python setup.py develop`, here are some debugging steps: 1. Run `printf '#include \nint main() { printf("Hello World");}'|clang -x c -; ./a.out` to make sure your CMake works and can compile this simple Hello World program without errors. 2. Nuke your `build` directory. The `setup.py` script compiles binaries into the `build` folder and caches many details along the way, which saves time the next time you build. If you're running into issues, you can always `rm -rf build` from the toplevel `pytorch` directory and start over. 3. If you have made edits to the PyTorch repo, commit any change you'd like to keep and clean the repo with the following commands (note that clean _really_ removes all untracked files and changes.):

git submodule deinit -f .
git clean -xdf
python setup.py clean
git submodule update --init --recursive # very important to sync the submodules
python setup.py develop                 # then try running the command again

4. The main step within `python setup.py develop` is running `make` from the `build` directory. If you want to experiment with some environment variables, you can pass them into the command:

ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop

* If you run into issue running `git submodule update --init --recursive`. Please try the following: - If you encounter an error such as

error: Submodule 'third_party/pybind11' could not be updated

check whether your Git local or global config file contains any `submodule.*` settings. If yes, remove them and try again. (please reference [this doc](https://git-scm.com/docs/git-config#Documentation/git-config.txt-submoduleltnamegturl) for more info). - If you encounter an error such as

fatal: unable to access 'https://github.com/pybind11/pybind11.git': could not load PEM client certificate ...

this is likely that you are using HTTP proxying and the certificate expired. To check if the certificate is valid, run `git config --global --list` and search for config like `http.proxysslcert=`. Then check certificate valid date by running

openssl x509 -noout -in <cert_file> -dates

- If you encounter an error that some third_party modules are not checked out correctly, such as

Could not find .../pytorch/third_party/pybind11/CMakeLists.txt

remove any `submodule.*` settings in your local git config (`.git/config` of your pytorch repo) and try again. * If you're a Windows contributor, please check out [Best Practices](https://github.com/pytorch/pytorch/wiki/Best-Practices-to-Edit-and-Compile-Pytorch-Source-Code-On-Windows). * For help with any part of the contributing process, please don’t hesitate to utilize our Zoom office hours! See details [here](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours)

gdb demo

test.py

# aten/src/ATen/native/Scalar.cpp 
import torch 
x = torch.tensor([1]).cuda() 
y = x.item()

demo

(pytorch_learn) fengwen@oneflow-25:/data/dataset/fengwen/pytorch/.fengwen$ gdb --args python test.py 
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb) b aten/src/ATen/native/Scalar.cpp:19
No source file named aten/src/ATen/native/Scalar.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (aten/src/ATen/native/Scalar.cpp:19) pending.
(gdb) r
Starting program: /data/home/fengwen/miniconda3/envs/pytorch_learn/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 468387]
....
Thread 1 "python" hit Breakpoint 1, at::native::item (self=...)
    at /data/dataset/fengwen/pytorch/aten/src/ATen/native/Scalar.cpp:19
warning: Source file is more recent than executable.
19        auto numel = self.sym_numel();
(gdb) l
14      namespace at {
15      namespace native {
16        
17
18      Scalar item(const Tensor& self) {
19        auto numel = self.sym_numel();
20        TORCH_CHECK(numel == 1, "a Tensor with ", numel, " elements cannot be converted to Scalar");
21        if (self.is_sparse()) {
22          if (self._nnz() == 0) return Scalar(0);
23          if (self.is_coalesced()) return at::_local_scalar_dense(self._values());