tamiwiki:projects:egpu

Differences

This shows you the differences between two versions of the page.


tamiwiki:projects:egpu [2023/06/18 18:11] – [P40] yair
tamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair
Line 1: Line 1:
 ====== EGPU ======
-https://docs.kernel.org/admin-guide/thunderbolt.html
 +{{ :tamiwiki:projects:pasted:20230618-183833.png}}
 +
 +We are using the [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3 eGPU external Thunderbolt]] dock.\\
 +
 +Linux kernel notes: https://docs.kernel.org/admin-guide/thunderbolt.html\\
 +[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]]
 +
 +
 +=== Thunderbolt check and setup ===
 TLDR
   - upgrade kernel (??)
Line 46: Line 55:
 <code bash>
 (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
-1 
-(base) user@eight:~$ sudo ubuntu-drivers devices 
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
-modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00 
-vendor   : NVIDIA Corporation 
-model    : GF106GL [Quadro 2000] 
-manual_install: True 
-driver   : nvidia-driver-390 - distro non-free recommended 
-driver   : xserver-xorg-video-nouveau - distro free builtin 
- 
 </code>
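A small sketch for checking the Thunderbolt state before authorizing, assuming the sysfs layout described in the kernel's Thunderbolt admin guide linked above (''domainX/security'', and ''authorized'' / ''device_name'' under each device). The helper names ''tb_security'', ''tb_list'', ''tb_authorize'' and the ''TB_SYSFS'' override are hypothetical, not part of any tool; ''tb_authorize'' only wraps the ''echo 1 | sudo tee'' command above.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helpers): inspect Thunderbolt security level and
# device authorization via sysfs, layout as in docs.kernel.org/admin-guide/thunderbolt.html.
# TB_SYSFS is a hypothetical override, e.g. for trying this on a fake tree.
TB_SYSFS="${TB_SYSFS:-/sys/bus/thunderbolt/devices}"

# Print the security level of each Thunderbolt domain (none, user, secure, ...).
tb_security() {
  local root="${1:-$TB_SYSFS}" d
  for d in "$root"/domain*; do
    if [ -f "$d/security" ]; then
      printf '%s security=%s\n' "$(basename "$d")" "$(cat "$d/security")"
    fi
  done
}

# Print every entry that has an "authorized" attribute, with its name and state.
tb_list() {
  local root="${1:-$TB_SYSFS}" dev name
  for dev in "$root"/*; do
    [ -f "$dev/authorized" ] || continue   # skips domainX entries
    name="$(cat "$dev/device_name" 2>/dev/null || echo unknown)"
    printf '%s %s authorized=%s\n' "$(basename "$dev")" "$name" "$(cat "$dev/authorized")"
  done
}

# Authorize one device, e.g. "0-1"; same as: echo 1 | sudo tee .../0-1/authorized
tb_authorize() {
  local dev="$1" root="${2:-$TB_SYSFS}"
  echo 1 | sudo tee "$root/$dev/authorized" >/dev/null
}
</code>

Usage: run ''tb_security'' then ''tb_list''; if the domain reports ''user'', a device showing ''authorized=0'' can be approved with ''tb_authorize 0-1''. With ''none'', the kernel docs say devices are connected automatically, so no write should be needed.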
  
-just an old card... 
  
-but EEK  
  
  
-<code bash>
 +==== 1080Ti ====
-(base) user@eight:~$ lspci | tail
-08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
-09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
-09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
-
-$ sudo dmesg
-[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel.
-[ 1041.053831] Disabling lock debugging due to kernel taint
-[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
-[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
-               NVRM:  this GPU.  Continuing probe...
-[ 1041.501047] NVRM: No NVIDIA GPU found.
-[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
-[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
-[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
-               NVRM:  this GPU.  Continuing probe...
-[ 1042.335282] NVRM: No NVIDIA GPU found.
-[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
-</code>
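The ''NVRM'' lines above already name the driver branch the card needs. A hedged sketch that pulls that branch out of ''dmesg'' text and maps it to the ''nvidia-headless-<branch>'' package name tried further down; the helper names are hypothetical, and, as the DKMS log below shows, the legacy package can still fail to build on a newer kernel.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helpers): find the legacy branch named in an
# NVRM "... Legacy drivers" warning. Both read dmesg text on stdin.

# Prints e.g. "390.xx" (first match only), or nothing if there is no such warning.
nv_legacy_branch() {
  grep -o 'NVIDIA [0-9][0-9]*\.xx Legacy' | head -n 1 | awk '{ print $2 }'
}

# Maps the branch to the headless package name used on this page, e.g. nvidia-headless-390.
nv_legacy_pkg() {
  local branch
  branch="$(nv_legacy_branch)"
  [ -n "$branch" ] && echo "nvidia-headless-${branch%%.*}"
}
</code>

Usage: ''sudo dmesg | nv_legacy_pkg''.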
-
-<WRAP center round alert 33%>WE ARE TAINTED</WRAP>
-
-==== driver ====
-
-We went with the Ubuntu selection.
-
-But cute: https://www.nvidia.com/en-us/drivers/unix/
  
 +{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}}
 +looks legit
 <code bash>
-$ sudo apt install nvidia-headless-535
 +$ sudo dmesg -w
-
 +[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
-
-# downgrade the NVIDIA driver to a Quadro-supported version
-sudo apt install nvidia-headless-390
-
-# EEK
-ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported
-Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64)
-Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information.
-dpkg: error processing package nvidia-dkms-390 (--configure):
- installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10
-dpkg: dependency problems prevent configuration of nvidia-headless-390:
- nvidia-headless-390 depends on nvidia-dkms-390; however:
-  Package nvidia-dkms-390 is not configured yet.
-
-dpkg: error processing package nvidia-headless-390 (--configure):
- dependency problems - leaving unconfigured
-Processing triggers for libc-bin (2.36-0ubuntu4) ...
-No apport report written because the error message indicates its a followup error from a previous failure.
-/sbin/ldconfig.real: /lib/libndi.so.4 is not a symbolic link
-
-Processing triggers for man-db (2.10.2-2) ...
-Processing triggers for initramfs-tools (0.140ubuntu17) ...
-update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic
-Errors were encountered while processing:
- nvidia-dkms-390
- nvidia-headless-390
  
 +[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007)
 +[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
 +[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.43.02  Mon May 22 20:46:13 UTC 2023
 +[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.43.02  Mon May 22 20:25:24 UTC 2023
 +[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver
 +[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1
 +[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
 +[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507.
 +[96238.399348] NVRM: API mismatch: the client has the version 390.157, but
 +               NVRM: this kernel module has the version 535.43.02.  Please
 +               NVRM: make sure that this kernel module and all NVIDIA driver
  
 </code>
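The ''API mismatch'' message means the user-space NVIDIA components (the "client", 390.157 here) and the loaded kernel module (535.43.02) come from different driver branches; the message asks for all of them to have the same version. A minimal sketch that extracts the two versions from ''dmesg'' output; ''nv_api_mismatch'' is a hypothetical helper and the ''sed'' patterns assume the exact wording shown above.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helper): pull both versions out of an NVRM "API mismatch"
# message. Reads dmesg text on stdin, prints "client=X module=Y".
nv_api_mismatch() {
  local text client module
  text="$(cat)"
  # User-space side: "the client has the version 390.157, but"
  client="$(printf '%s\n' "$text" |
    sed -n 's/.*API mismatch: the client has the version \([0-9.]*\),.*/\1/p' | head -n 1)"
  # Kernel side: "this kernel module has the version 535.43.02.  Please"
  module="$(printf '%s\n' "$text" |
    sed -n 's/.*this kernel module has the version \([0-9.]*[0-9]\).*/\1/p' | head -n 1)"
  if [ -n "$client" ] && [ -n "$module" ]; then
    printf 'client=%s module=%s\n' "$client" "$module"
  else
    return 1   # no mismatch message found
  fi
}
</code>

Usage: ''sudo dmesg | nv_api_mismatch''. The loaded module's version can also be read from ''/proc/driver/nvidia/version''.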
  
-Downgrading, but to headless,\\
 +Update the driver to fit:
-without touching the X config?
-
-
-Going with the [[https://github.com/Avalon-Benchmark/avalon|avalon]] README:
-
-[1] A 3D video game environment and benchmark designed from scratch for reinforcement learning research
 <code bash>
-conda create -n avalon python=3.9
 +$ ubuntu-drivers devices
-conda activate avalon
 +== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
 +modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00 
 +vendor   : NVIDIA Corporation 
 +model    : GP102 [GeForce GTX 1080 Ti] 
 +manual_install: True 
 +driver   : nvidia-driver-450-server - distro non-free 
 +driver   : nvidia-driver-510 - distro non-free 
 +driver   : nvidia-driver-390 - distro non-free 
 +driver   : nvidia-driver-470 - distro non-free 
 +driver   : nvidia-driver-525-server - distro non-free 
 +driver   : nvidia-driver-525 - distro non-free 
 +driver   : nvidia-driver-535 - third-party non-free recommended 
 +driver   : nvidia-driver-515 - distro non-free 
 +driver   : nvidia-driver-515-server - distro non-free 
 +driver   : nvidia-driver-530 - distro non-free 
 +driver   : nvidia-driver-470-server - distro non-free 
 +driver   : xserver-xorg-video-nouveau - distro free builtin
  
-sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra
 +sudo ubuntu-drivers autoinstall
 
-# this will also install torch...
-pip install avalon-rl[train]  
- 
-python -m avalon.install_godot_binary 
-python -m avalon.common.check_install 
 </code>
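To see which package ''ubuntu-drivers autoinstall'' is likely to pick (presumably the one marked ''recommended'') and to match the card against ''lspci -nn'', the ''driver'' and ''modalias'' lines can be parsed. A sketch assuming the output format shown above; ''ud_recommended'' and ''ud_pci_id'' are hypothetical helpers.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helpers): summarize `ubuntu-drivers devices` output,
# format assumed from this page. Both read the command's output on stdin.

# Print the driver package(s) marked "recommended".
ud_recommended() {
  awk -F': *' '/^driver/ && /recommended/ { split($2, a, " "); print a[1] }'
}

# Turn "modalias : pci:v000010DEd00001B06..." into "10de:1b06" (lspci -nn style).
ud_pci_id() {
  sed -n 's/^modalias *: *pci:v0000\([0-9A-F]\{4\}\)d0000\([0-9A-F]\{4\}\).*/\1:\2/p' |
    tr 'A-F' 'a-f'
}
</code>

For the 1080 Ti list above, ''ubuntu-drivers devices | ud_recommended'' prints ''nvidia-driver-535'' and ''ud_pci_id'' prints ''10de:1b06''.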
  
-Why even bother? The Quadro is just a test.\\
-Need to cleanly remove the 390 driver and move back to
 
-the current NVIDIA driver.
 +==== P40 ====
 +<WRAP center round important 60%>
 +This doesn't work on our test machine.
 +</WRAP>
  
  
-==== P40 ==== 
 {{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}
-<WRAP center round alert 60%>
-Unlike other cards, the blue LED doesn't turn green on Thunderbolt connection.\\
-</WRAP>
 +The P40 needs a modern motherboard whose BIOS allows ''Enable Above 4G memory'' (see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|link]]); see the [[tamiwiki:projects:P40a|P40]] page for info on the dedicated machine.
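Whether a card needs ''Above 4G'' decoding can be estimated from its memory BARs: a single BAR of 4 GiB or more cannot fit in the 32-bit address space at all. A sketch that reads ''lspci -v'' output and flags such BARs; ''bar_needs_above_4g'' is a hypothetical helper, the ''[size=...]'' format is assumed from typical ''lspci'' output, and ''09:00.0'' is simply the address the eGPU had in the logs on this page.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helper): flag PCI memory BARs of 4 GiB or more in
# `lspci -v` output read on stdin. Exit status 0 if at least one is found.
bar_needs_above_4g() {
  awk '
    /Memory at/ && match($0, /\[size=[0-9]+[KMGT]\]/) {
      s = substr($0, RSTART + 6, RLENGTH - 7)          # "[size=32G]" -> "32G"
      unit = substr(s, length(s), 1)
      n = substr(s, 1, length(s) - 1) + 0
      big = (unit == "T" || (unit == "G" && n >= 4))  # cannot fit below 4 GiB
      suffix = big ? " above-4G" : ""
      print s suffix
      if (big) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
</code>

Usage: ''sudo lspci -v -s 09:00.0 | bar_needs_above_4g''. Only BARs of 4 GiB or more are flagged; smaller but still large BARs may also fail to fit below 4 GiB on some boards, so an unflagged result does not prove the BIOS option is unnecessary.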
  
 NVIDIA Tesla P40 24GB DDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\
Line 199: Line 152:
 :(
  
-==== 1080Ti ==== 
- 
-{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}} 
-looks legit 
-<code bash> 
-$ sudo dmesg -w
-[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
- 
-[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007) 
-[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none 
-[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.43.02  Mon May 22 20:46:13 UTC 2023 
-[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.43.02  Mon May 22 20:25:24 UTC 2023 
-[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver 
-[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1 
-[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint. 
-[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507. 
-[96238.399348] NVRM: API mismatch: the client has the version 390.157, but 
-               NVRM: this kernel module has the version 535.43.02.  Please 
-               NVRM: make sure that this kernel module and all NVIDIA driver 
- 
-</code> 
- 
-update the driver to fit 
-<code bash> 
-$ ubuntu-drivers devices 
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
-modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00 
-vendor   : NVIDIA Corporation 
-model    : GP102 [GeForce GTX 1080 Ti] 
-manual_install: True 
-driver   : nvidia-driver-450-server - distro non-free 
-driver   : nvidia-driver-510 - distro non-free 
-driver   : nvidia-driver-390 - distro non-free 
-driver   : nvidia-driver-470 - distro non-free 
-driver   : nvidia-driver-525-server - distro non-free 
-driver   : nvidia-driver-525 - distro non-free 
-driver   : nvidia-driver-535 - third-party non-free recommended 
-driver   : nvidia-driver-515 - distro non-free 
-driver   : nvidia-driver-515-server - distro non-free 
-driver   : nvidia-driver-530 - distro non-free 
-driver   : nvidia-driver-470-server - distro non-free 
-driver   : xserver-xorg-video-nouveau - distro free builtin 
- 
-$ sudo ubuntu-drivers autoinstall 
-The following additional packages will be installed:
-  libnvidia-common-535 libnvidia-compute-535:i386 
-  libnvidia-decode-535 libnvidia-decode-535:i386 
-  libnvidia-encode-535 libnvidia-encode-535:i386 
-  libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535:i386 
-  libnvidia-gl-535 libnvidia-gl-535:i386 nvidia-prime 
-  nvidia-settings nvidia-utils-535 screen-resolution-extra 
-  xserver-xorg-video-nvidia-535 
-The following packages will be REMOVED: 
-  libnvidia-common-390 libnvidia-gl-390 
-The following NEW packages will be installed: 
-  libnvidia-common-535 libnvidia-compute-535:i386 
-  libnvidia-decode-535 libnvidia-decode-535:i386 
-  libnvidia-encode-535 libnvidia-encode-535:i386 
-  libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535:i386 
-  libnvidia-gl-535 libnvidia-gl-535:i386 nvidia-driver-535 
-  nvidia-prime nvidia-settings nvidia-utils-535 
-  screen-resolution-extra xserver-xorg-video-nvidia-535 
- 
-</code> 
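Before letting ''autoinstall'' swap driver branches, a simulated install shows which packages would be removed, as in the log above where ''libnvidia-common-390'' and ''libnvidia-gl-390'' were dropped. A sketch that extracts the ''REMOVED'' block from apt output; ''apt_removed'' is a hypothetical helper and assumes apt's English summary headers.

<code bash>
#!/usr/bin/env bash
# Sketch (hypothetical helper): list the packages under apt's
# "will be REMOVED" header. Reads apt/apt-get output (English locale) on stdin.
apt_removed() {
  awk '
    /^The following packages will be REMOVED:/ { inblk = 1; next }
    /^[^ ]/ { inblk = 0 }                  # any unindented line ends the block
    inblk { for (i = 1; i <= NF; i++) print $i }
  '
}
</code>

Usage: ''LC_ALL=C apt-get -s install nvidia-driver-535 | apt_removed'' (''-s'' only simulates, so nothing is changed).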
 ==== misc ====
  
tamiwiki/projects/egpu.1687101075.txt.gz · Last modified: 2023/06/18 18:11 by yair