Differences

This shows you the differences between two versions of the page.
tamiwiki:projects:egpu [2023/06/15 02:36] – [driver] yair tamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair
Line 1: Line 1:
 ====== EGPU ====== ====== EGPU ======
-https://docs.kernel.org/admin-guide/thunderbolt.html+{{ :tamiwiki:projects:pasted:20230618-183833.png}} 
 + 
 +We are using a [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3 external Thunderbolt eGPU dock]].\\ 
 + 
 +Linux kernel notes: https://docs.kernel.org/admin-guide/thunderbolt.html\\ 
 +[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]] 
 + 
 + 
 +=== Thunderbolt check and setup === 
 TLDR TLDR
   - upgrade kernel (??)   - upgrade kernel (??)
Line 46: Line 55:
 <code bash> <code bash>
 (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
-1 
-(base) user@eight:~$ sudo ubuntu-drivers devices 
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
-modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00 
-vendor   : NVIDIA Corporation 
-model    : GF106GL [Quadro 2000] 
-manual_install: True 
-driver   : nvidia-driver-390 - distro non-free recommended 
-driver   : xserver-xorg-video-nouveau - distro free builtin 
- 
 </code> </code>
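A minimal sketch for seeing which Thunderbolt devices are still waiting for authorization, assuming the sysfs layout from the kernel Thunderbolt guide linked above (''/sys/bus/thunderbolt/devices/<id>/authorized''). The function name ''tb_status'' and its optional directory argument are made up for illustration, they are not part of any tool.

```shell
#!/usr/bin/env bash
# tb_status [ROOT]: print "<device> <authorized>" for each Thunderbolt device.
# ROOT defaults to the real sysfs directory; the argument is a hypothetical
# convenience so the function can be pointed at a test directory.
tb_status() {
    local root="${1:-/sys/bus/thunderbolt/devices}"
    local f dev
    for f in "$root"/*/authorized; do
        [ -r "$f" ] || continue          # glob did not match, or file unreadable
        dev="$(basename "$(dirname "$f")")"
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Run ''tb_status'' with no arguments after plugging the dock in. A device that reports ''0'' has not been authorized yet and can be enabled with the ''echo 1 | sudo tee ...'' command above.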
  
-just an old card... 
  
-but EEK  
  
  
 +==== 1080Ti ====
 +
 +{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}}
 +looks legit
 <code bash> <code bash>
-(base) user@eight:~$ lspci | tail +$ sudo dmesg -w 
-08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06) +[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) + 
-09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1) +[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007) 
 +[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none 
 +[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.43.02  Mon May 22 20:46:13 UTC 2023 
 +[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.43.02  Mon May 22 20:25:24 UTC 2023 
 +[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver 
 +[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1 
 +[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint. 
 +[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507. 
 +[96238.399348] NVRM: API mismatch: the client has the version 390.157, but 
 +               NVRM: this kernel module has the version 535.43.02.  Please 
 +               NVRM: make sure that this kernel module and all NVIDIA driver
  
-$sudo dmesg 
-[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel. 
-[ 1041.053831] Disabling lock debugging due to kernel taint 
-[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1041.501047] NVRM: No NVIDIA GPU found. 
-[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
-[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1042.335282] NVRM: No NVIDIA GPU found. 
-[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
 </code> </code>
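The ''API mismatch'' lines at the end of that log say the userspace client (390.157) and the loaded kernel module (535.43.02) are different driver versions. A small sketch that pulls both numbers out of kernel log text, assuming the message wording shown above; the helper name ''nvrm_mismatch'' is hypothetical.

```shell
#!/usr/bin/env bash
# nvrm_mismatch: read kernel log text on stdin and print
# "client=<ver> module=<ver>" if an NVRM API mismatch is reported.
# Returns non-zero when no complete mismatch message is found.
nvrm_mismatch() {
    local log client module
    log="$(cat)"
    # Version strings are digits and dots; the last match in the log wins.
    client="$(printf '%s\n' "$log" \
        | sed -n 's/.*the client has the version \([0-9.]*[0-9]\).*/\1/p' \
        | tail -n 1)"
    module="$(printf '%s\n' "$log" \
        | sed -n 's/.*this kernel module has the version \([0-9.]*[0-9]\).*/\1/p' \
        | tail -n 1)"
    [ -n "$client" ] && [ -n "$module" ] || return 1
    printf 'client=%s module=%s\n' "$client" "$module"
}
```

Typical use would be ''sudo dmesg | nvrm_mismatch''. If it prints two different versions, the userspace packages and the kernel module need to be brought to the same release.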
  
-<WRAP center round alert 33%>WE ARE TAINTED</WRAP> +update the userspace driver so its version matches the loaded kernel module (535.43.02)
- +
-==== driver ==== +
- +
-we went with ubuntu selection  +
- +
-but cute https://www.nvidia.com/en-us/drivers/unix/ +
 <code bash> <code bash>
-sudo apt installl nvidia-headless-535+ubuntu-drivers devices 
 +== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
 +modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00 
 +vendor   : NVIDIA Corporation 
 +model    : GP102 [GeForce GTX 1080 Ti] 
 +manual_install: True 
 +driver   : nvidia-driver-450-server - distro non-free 
 +driver   : nvidia-driver-510 - distro non-free 
 +driver   : nvidia-driver-390 - distro non-free 
 +driver   : nvidia-driver-470 - distro non-free 
 +driver   : nvidia-driver-525-server - distro non-free 
 +driver   : nvidia-driver-525 - distro non-free 
 +driver   : nvidia-driver-535 - third-party non-free recommended 
 +driver   : nvidia-driver-515 - distro non-free 
 +driver   : nvidia-driver-515-server - distro non-free 
 +driver   : nvidia-driver-530 - distro non-free 
 +driver   : nvidia-driver-470-server - distro non-free 
 +driver   : xserver-xorg-video-nouveau - distro free builtin
  
 +$ sudo ubuntu-drivers autoinstall
  
-#downgrade nvidia to quadro supported version +</code>
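To see which package ''autoinstall'' is likely to pick, the line tagged ''recommended'' can be filtered out of the ''ubuntu-drivers devices'' listing. A rough sketch, assuming the output keeps the ''driver   : <package> - ...'' layout shown above; ''recommended_driver'' is a made-up helper name.

```shell
#!/usr/bin/env bash
# recommended_driver: read `ubuntu-drivers devices` output on stdin and
# print the first package whose line is tagged "recommended".
recommended_driver() {
    # Fields on a driver line: $1="driver", $2=":", $3=<package name>
    awk '!found && $1 == "driver" && /recommended/ { print $3; found = 1 }'
}
```

Usage would be ''ubuntu-drivers devices | recommended_driver'', which for the listing above should give the 535 package.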
-sudo apt install nvidia-headless-390+
  
-# EEK 
-ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported 
-Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64) 
-Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information. 
-dpkg: error processing package nvidia-dkms-390 (--configure): 
- installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10 
-dpkg: dependency problems prevent configuration of nvidia-headless-390: 
- nvidia-headless-390 depends on nvidia-dkms-390; however: 
-  Package nvidia-dkms-390 is not configured yet. 
  
-dpkg: error processing package nvidia-headless-390 (--configure): +==== P40 ==== 
- dependency problems - leaving unconfigured +<WRAP center round important 60%> 
-Processing triggers for libc-bin (2.36-0ubuntu4) ... +this doesn't work on our test machine 
-No apport report written because the error message indicates its a followup error from a previous failure. +</WRAP>
-                                                                                                          /sbin/ldconfig.real: /lib/lib +
-ndi.so.4 is not a symbolic link+
  
-Processing triggers for man-db (2.10.2-2) ... 
-Processing triggers for initramfs-tools (0.140ubuntu17) ... 
-update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic 
-Errors were encountered while processing: 
- nvidia-dkms-390 
- nvidia-headless-390 
  
 +{{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}
  
-</code>+The P40 needs a modern motherboard whose BIOS has an ''Enable Above 4G memory'' option (see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|BIOS settings]]). See the [[tamiwiki:projects:P40a|P40]] page for info on the dedicated machine.
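One way to check whether the firmware actually mapped any of a card's memory regions above the 4 GiB boundary is to read its ''resource'' file in sysfs, which lists one region per line as ''start end flags'' in hex. A minimal sketch under that assumption; ''bar_report'' is a hypothetical helper, and the PCI address in the usage note is simply the one from the logs on this page.

```shell
#!/usr/bin/env bash
# bar_report FILE: FILE is a PCI "resource" file from sysfs
# (one region per line: "start end flags", all hex).
# Prints index, size in MiB (regions under 1 MiB show as 0) and whether
# the region starts above the 4 GiB boundary. Unassigned regions are skipped.
bar_report() {
    local file="$1" idx=0 start end flags size where
    while read -r start end flags; do
        if [ -n "$start" ] && (( end > start )); then
            size=$(( (end - start + 1) / 1048576 ))
            if (( start >= 0x100000000 )); then
                where="above-4G"
            else
                where="below-4G"
            fi
            printf 'region %d: %d MiB %s\n' "$idx" "$size" "$where"
        fi
        idx=$(( idx + 1 ))
    done < "$file"
}
```

For example ''bar_report /sys/bus/pci/devices/0000:09:00.0/resource''. If the card shows up at all but every large region is missing or sits below 4 GiB, the BIOS setting is a likely suspect.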
  
-downgrading but to headless,\\ +NVIDIA Tesla P40 24GB GDDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\ 
-without touching the x config?+needs to be retrofitted with a fan; it doesn't come with one
  
 +got one on eBay for $200 (+ shipping) ([[https://archive.md/SL4Kq|ebay mirror]])\\
  
-going with [[https://github.com/Avalon-Benchmark/avalon|avalon]] readme+some dude got it working, https://github.com/JingShing/How-to-use-tesla-p40 
 +=== SPECIFICATIONS: ===
  
-[1]  A 3D video game environment and benchmark designed from scratch for reinforcement learning research +    * GPU Architecture: NVIDIA Pascal  
 +    * Single-Precision Performance: 12 TeraFLOPS*  
 +    * Integer Operations (INT8): 47 TOPS* (TeraOperations per Second)  
 +    * GPU Memory: 24 GB  
 +    * Memory Bandwidth: 346 GB/s  
 +    * System Interface: PCI Express 3.0 x16  
 +    * Form Factor: 4.4” H x 10.5” L, Dual Slot, Full Height  
 +    * Max Power: 250 W  
 +    * Enhanced Programmability with Page Migration Engine: Yes  
 +    * ECC Protection: Yes  
 +    * Server-Optimized for Data Center Deployment: Yes  
 +    * Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engine 
 +    * NVPN: 699-2G610-0200-100 
 +    * NVIDIA® CUDA® cores: 3840
  
 +
 +installing 
 <code bash> <code bash>
-conda create -n avalon python=3.9 +sudo apt install nvidia-headless-535 
-conda activate avalon+</code>
  
-sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra+There is an issue: unlike with the other cards, the blue LED doesn't turn green when Thunderbolt is connected.\\ 
 +No power is reaching the GPU.\\
  
-#this will also install torch... +:(
-pip install avalon-rl[train] +
  
-python -m avalon.install_godot_binary 
-python -m avalon.common.check_install 
-</code> 
 ==== misc ==== ==== misc ====
  
tamiwiki/projects/egpu.1686785773.txt.gz · Last modified: 2023/06/15 02:36 by yair