This shows you the differences between two versions of the page.
| tamiwiki:projects:egpu [2023/06/18 18:11] – [P40] yair | tamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== EGPU ====== | ====== EGPU ====== | ||
| - | https:// | + | {{ : |
| + | |||
| + | we are using the [[https:// | ||
| + | |||
| + | Linux Kernel notes > https:// | ||
| + | [[https:// | ||
| + | |||
| + | |||
| + | === ThunderBolt check and setup === | ||
| TLDR | TLDR | ||
| - upgrade kernel (??) | - upgrade kernel (??) | ||
| Line 46: | Line 55: | ||
| <code bash> | <code bash> | ||
| (base) user@eight: | (base) user@eight: | ||
| - | 1 | ||
| - | (base) user@eight: | ||
| - | == / | ||
| - | modalias : pci: | ||
| - | vendor | ||
| - | model : GF106GL [Quadro 2000] | ||
| - | manual_install: | ||
| - | driver | ||
| - | driver | ||
| - | |||
| </ | </ | ||
| - | just an old card... | ||
| - | but EEK | ||
| - | <code bash> | + | ==== 1080Ti |
| - | (base) user@eight: | + | |
| - | 08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06) | + | |
| - | 09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) | + | |
| - | 09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1) | + | |
| - | + | ||
| - | $sudo dmesg | + | |
| - | [ 1041.053826] nvidia: module license ' | + | |
| - | [ 1041.053831] Disabling lock debugging due to kernel taint | + | |
| - | [ 1041.484017] nvidia-nvlink: | + | |
| - | [ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | [ 1041.501047] NVRM: No NVIDIA GPU found. | + | |
| - | [ 1041.521176] nvidia-nvlink: | + | |
| - | [ 1042.332830] nvidia-nvlink: | + | |
| - | [ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | [ 1042.335282] NVRM: No NVIDIA GPU found. | + | |
| - | [ 1042.335835] nvidia-nvlink: | + | |
| - | </ | + | |
| - | + | ||
| - | <WRAP center round alert 33%>WE ARE TAINTED</ | + | |
| - | + | ||
| - | ==== driver | + | |
| - | + | ||
| - | we went with ubuntu selection | + | |
| - | + | ||
| - | but cute https:// | + | |
| + | {{ : | ||
| + | looks legit | ||
| <code bash> | <code bash> | ||
| - | $ sudo apt install nvidia-headless-535 | + | $sudo dmesg -w |
| - | + | [96236.873213] | |
| - | + | ||
| - | #downgrade nvidia to quadro supported version | + | |
| - | sudo apt install nvidia-headless-390 | + | |
| - | + | ||
| - | # EEK | + | |
| - | ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported | + | |
| - | Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64) | + | |
| - | Consult / | + | |
| - | dpkg: error processing package | + | |
| - | | + | |
| - | dpkg: dependency problems prevent configuration of nvidia-headless-390: | + | |
| - | | + | |
| - | Package nvidia-dkms-390 | + | |
| - | + | ||
| - | dpkg: error processing package nvidia-headless-390 (--configure): | + | |
| - | | + | |
| - | Processing triggers for libc-bin (2.36-0ubuntu4) ... | + | |
| - | No apport report written because the error message indicates its a followup error from a previous failure. | + | |
| - | / | + | |
| - | ndi.so.4 is not a symbolic link | + | |
| - | + | ||
| - | Processing triggers for man-db (2.10.2-2) ... | + | |
| - | Processing triggers for initramfs-tools (0.140ubuntu17) ... | + | |
| - | update-initramfs: | + | |
| - | Errors were encountered while processing: | + | |
| - | | + | |
| - | | + | |
| + | [96236.874544] nvidia 0000: | ||
| + | [96236.874646] nvidia 0000: | ||
| + | [96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module | ||
| + | [96237.009537] nvidia-modeset: | ||
| + | [96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver | ||
| + | [96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000: | ||
| + | [96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint. | ||
| + | [96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507. | ||
| + | [96238.399348] NVRM: API mismatch: the client has the version 390.157, but | ||
| + | NVRM: this kernel module has the version 535.43.02. | ||
| + | NVRM: make sure that this kernel module and all NVIDIA driver | ||
| </ | </ | ||
| - | downgrading but to headless, | + | update |
| - | without touching | + | |
| - | + | ||
| - | + | ||
| - | going with [[https:// | + | |
| - | + | ||
| - | [1] A 3D video game environment and benchmark designed from scratch for reinforcement learning research | + | |
| <code bash> | <code bash> | ||
| - | conda create | + | $ ubuntu-drivers devices |
| - | conda activate avalon | + | == / |
| + | modalias : pci: | ||
| + | vendor | ||
| + | model : GP102 [GeForce GTX 1080 Ti] | ||
| + | manual_install: | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| + | driver | ||
| - | sudo apt install | + | $ sudo ubuntu-drivers autoinstall |
| - | #this will also install torch... | ||
| - | pip install avalon-rl[train] | ||
| - | |||
| - | python -m avalon.install_godot_binary | ||
| - | python -m avalon.common.check_install | ||
| </ | </ | ||
| - | why even bother, the Quadro is just a test. \\ | ||
| - | need to cleanly remove the 390 driver and move back to | ||
| - | NVIDIA-CURRENT | + | ==== P40 ==== |
| + | <WRAP center round important 60%> | ||
| + | this doesn't work on our test machine | ||
| + | </ | ||
| - | ==== P40 ==== | ||
| {{ : | {{ : | ||
| - | <WRAP center round alert 60%> | + | |
| - | unlike other cards the blue LED doesn't turn green on Thunderbolt connection.\\ | + | the P40 needs a modern motherboard that allows for '' |
| - | </WRAP> | + | |
| NVIDIA Tesla P40 24GB DDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\ | NVIDIA Tesla P40 24GB DDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\ | ||
| Line 199: | Line 152: | ||
| :( | :( | ||
| - | ==== 1080Ti ==== | ||
| - | |||
| - | {{ : | ||
| - | looks legit | ||
| - | <code bash> | ||
| - | $sudo dmesg -w | ||
| - | [96236.873213] nvidia-nvlink: | ||
| - | |||
| - | [96236.874544] nvidia 0000: | ||
| - | [96236.874646] nvidia 0000: | ||
| - | [96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module | ||
| - | [96237.009537] nvidia-modeset: | ||
| - | [96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver | ||
| - | [96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000: | ||
| - | [96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint. | ||
| - | [96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507. | ||
| - | [96238.399348] NVRM: API mismatch: the client has the version 390.157, but | ||
| - | NVRM: this kernel module has the version 535.43.02. | ||
| - | NVRM: make sure that this kernel module and all NVIDIA driver | ||
| - | |||
| - | </ | ||
| - | |||
| - | update the driver to fit | ||
| - | <code bash> | ||
| - | $ ubuntu-drivers devices | ||
| - | == / | ||
| - | modalias : pci: | ||
| - | vendor | ||
| - | model : GP102 [GeForce GTX 1080 Ti] | ||
| - | manual_install: | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | driver | ||
| - | |||
| - | $ sudo ubuntu-drivers autoinstall | ||
| - | The following additional packages will be installed: | ||
| - | libnvidia-common-535 libnvidia-compute-535: | ||
| - | libnvidia-decode-535 libnvidia-decode-535: | ||
| - | libnvidia-encode-535 libnvidia-encode-535: | ||
| - | libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535: | ||
| - | libnvidia-gl-535 libnvidia-gl-535: | ||
| - | nvidia-settings nvidia-utils-535 screen-resolution-extra | ||
| - | xserver-xorg-video-nvidia-535 | ||
| - | The following packages will be REMOVED: | ||
| - | libnvidia-common-390 libnvidia-gl-390 | ||
| - | The following NEW packages will be installed: | ||
| - | libnvidia-common-535 libnvidia-compute-535: | ||
| - | libnvidia-decode-535 libnvidia-decode-535: | ||
| - | libnvidia-encode-535 libnvidia-encode-535: | ||
| - | libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535: | ||
| - | libnvidia-gl-535 libnvidia-gl-535: | ||
| - | nvidia-prime nvidia-settings nvidia-utils-535 | ||
| - | screen-resolution-extra xserver-xorg-video-nvidia-535 | ||
| - | |||
| - | </ | ||
| ==== misc ==== | ==== misc ==== | ||
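
The 1080Ti dmesg output above ends with an NVRM "API mismatch" (client 390.157 vs kernel module 535.43.02), which is what the driver update was meant to fix. Below is a minimal sketch for pulling those two versions out of log text so they can be compared at a glance. The function name ''nvrm_mismatch_versions'' is made up for this page, and the patterns assume the message wording is exactly as in the log quoted above; other driver versions may word it differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tool): reads kernel log text on
# stdin and prints the two versions named in an NVRM "API mismatch" message.
# Assumes the wording seen in the dmesg output on this page.
nvrm_mismatch_versions() {
  local text client module
  text=$(cat)

  # Version reported for the user-space client, e.g. "390.157"
  client=$(printf '%s\n' "$text" \
    | sed -n 's/.*the client has the version \([0-9][0-9.]*[0-9]\).*/\1/p' \
    | head -n1)

  # Version reported for the loaded kernel module, e.g. "535.43.02"
  module=$(printf '%s\n' "$text" \
    | sed -n 's/.*this kernel module has the version \([0-9][0-9.]*[0-9]\).*/\1/p' \
    | head -n1)

  # Return non-zero when no complete mismatch message was found
  if [ -z "$client" ] || [ -z "$module" ]; then
    return 1
  fi

  printf 'client=%s module=%s\n' "$client" "$module"
}
```

Possible usage: ''sudo dmesg | nvrm_mismatch_versions''. If it prints two different versions, the user-space libraries and the kernel module come from different driver packages; a non-zero exit status just means no mismatch message was found in the input.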