Differences

This shows you the differences between two versions of the page.

tamiwiki:projects:egpu [2023/06/15 13:08] – [P40] yairtamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair
Line 1: Line 1:
 ====== EGPU ====== ====== EGPU ======
-https://docs.kernel.org/admin-guide/thunderbolt.html+{{ :tamiwiki:projects:pasted:20230618-183833.png}} 
 + 
 +we are using the [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3 eGPU external thunderbolt]] thing.\\ 
 + 
 +Linux kernel notes > https://docs.kernel.org/admin-guide/thunderbolt.html\\ 
 +[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]] 
 + 
 + 
 +=== ThunderBolt check and setup === 
 TLDR TLDR
   - upgrade kernel (??)   - upgrade kernel (??)
Line 46: Line 55:
 <code bash> <code bash>
 (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
-1 
-(base) user@eight:~$ sudo ubuntu-drivers devices 
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
-modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00 
-vendor   : NVIDIA Corporation 
-model    : GF106GL [Quadro 2000] 
-manual_install: True 
-driver   : nvidia-driver-390 - distro non-free recommended 
-driver   : xserver-xorg-video-nouveau - distro free builtin 
- 
 </code> </code>
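A minimal sketch of scripting the authorization step above. It assumes each Thunderbolt device shows up as a directory with an `authorized` file under `/sys/bus/thunderbolt/devices`, as the kernel admin guide linked on this page describes. The helper name `tb_unauthorized` and its optional directory argument are made up for illustration; the argument only exists so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# List Thunderbolt devices that are connected but not yet authorized.
# tb_unauthorized is a hypothetical helper name, not part of any tool.
# $1: devices directory (assumed default: /sys/bus/thunderbolt/devices)
tb_unauthorized() {
    dir="${1:-/sys/bus/thunderbolt/devices}"
    for f in "$dir"/*/authorized; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        # A value of 0 means the device is waiting for authorization.
        if [ "$(cat "$f")" = "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}
```

Each printed name (for example `0-1`) can then be authorized the same way as in the session above, by writing `1` to its `authorized` file with `sudo tee`.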
  
-just an old card... 
  
-but EEK  
  
  
 +==== 1080Ti ====
 +
 +{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}}
 +looks legit
 <code bash> <code bash>
-(base) user@eight:~$ lspci | tail +$sudo dmesg -w 
-08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06) +[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) + 
-09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1) +[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007) 
 +[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none 
 +[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.43.02  Mon May 22 20:46:13 UTC 2023 
 +[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.43.02  Mon May 22 20:25:24 UTC 2023 
 +[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver 
 +[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1 
 +[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint. 
 +[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507. 
 +[96238.399348] NVRM: API mismatch: the client has the version 390.157, but 
 +               NVRM: this kernel module has the version 535.43.02.  Please 
 +               NVRM: make sure that this kernel module and all NVIDIA driver
  
-$sudo dmesg 
-[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel. 
-[ 1041.053831] Disabling lock debugging due to kernel taint 
-[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1041.501047] NVRM: No NVIDIA GPU found. 
-[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
-[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1042.335282] NVRM: No NVIDIA GPU found. 
-[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
 </code> </code>
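The `API mismatch` lines in the log above carry two version numbers: the user-space client and the loaded kernel module. A small sketch for pulling them out of `dmesg` output follows. The function name `nvrm_mismatch` is hypothetical, and the pattern assumes the message wording is exactly as in the log on this page.

```shell
#!/bin/sh
# Extract the two versions from an NVRM "API mismatch" message.
# nvrm_mismatch is a hypothetical helper; it reads log text on stdin
# and prints client=<version> and module=<version>, one per line.
nvrm_mismatch() {
    sed -n \
        -e 's/.*the client has the version \([0-9.]*[0-9]\).*/client=\1/p' \
        -e 's/.*this kernel module has the version \([0-9.]*[0-9]\).*/module=\1/p'
}
```

Run as `sudo dmesg | nvrm_mismatch`. When the two values differ, as they do here (390.157 against 535.43.02), leftover user-space packages from the other driver series are the likely cause.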
  
-<WRAP center round alert 33%>WE ARE TAINTED</WRAP> +update the driver so the client (user-space) version matches the kernel module
- +
-==== driver ==== +
- +
-we went with ubuntu selection  +
- +
-but cute https://www.nvidia.com/en-us/drivers/unix/ +
 <code bash> <code bash>
-sudo apt installl nvidia-headless-535 +ubuntu-drivers devices 
- +== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
- +modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00 
-#downgrade nvidia to quadro supported version +vendor   : NVIDIA Corporation 
-sudo apt install nvidia-headless-390 +model    : GP102 [GeForce GTX 1080 Ti] 
- +manual_install: True 
-# EEK +driver   : nvidia-driver-450-server - distro non-free 
-ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported +driver   : nvidia-driver-510 - distro non-free 
-Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64) +driver   : nvidia-driver-390 - distro non-free 
-Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information. +driver   : nvidia-driver-470 - distro non-free 
-dpkg: error processing package nvidia-dkms-390 (--configure): +driver   : nvidia-driver-525-server - distro non-free 
- installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10 +driver   : nvidia-driver-525 - distro non-free 
-dpkg: dependency problems prevent configuration of nvidia-headless-390: +driver   : nvidia-driver-535 - third-party non-free recommended 
- nvidia-headless-390 depends on nvidia-dkms-390; however: +driver   : nvidia-driver-515 - distro non-free 
-  Package nvidia-dkms-390 is not configured yet. +driver   : nvidia-driver-515-server - distro non-free 
- +driver   : nvidia-driver-530 - distro non-free 
-dpkg: error processing package nvidia-headless-390 (--configure): +driver   : nvidia-driver-470-server - distro non-free 
- dependency problems - leaving unconfigured +driver   : xserver-xorg-video-nouveau - distro free builtin
-Processing triggers for libc-bin (2.36-0ubuntu4) ... +
-No apport report written because the error message indicates its a followup error from a previous failure. +
-/sbin/ldconfig.real: /lib/libndi.so.4 is not a symbolic link +
- +
-Processing triggers for man-db (2.10.2-2) ... +
-Processing triggers for initramfs-tools (0.140ubuntu17) ... +
-update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic +
-Errors were encountered while processing: +
- nvidia-dkms-390 +
- nvidia-headless-390+
  
 +$ sudo ubuntu-drivers autoinstall
  
 </code> </code>
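For picking the package by hand instead of running `autoinstall`, the listing above can be filtered for the line marked `recommended`. This is only a sketch: the function name `recommended_driver` is hypothetical, and it assumes the column layout of `ubuntu-drivers devices` stays as shown on this page.

```shell
#!/bin/sh
# Print the package name from the "driver" line tagged "recommended".
# recommended_driver is a hypothetical helper; it reads the output of
# `ubuntu-drivers devices` on stdin.
recommended_driver() {
    # Fields: driver  :  <package>  -  <origin> <license> [recommended]
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}
```

Used as `ubuntu-drivers devices | recommended_driver`, this prints `nvidia-driver-535` for the listing above. With more than one GPU attached it may print several lines, one per device.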
  
-downgrading but to headless,\\ 
-without touching the x config? 
  
 +==== P40 ====
 +<WRAP center round important 60%>
 +this doesn't work on our test machine
 +</WRAP>
  
-going with [[https://github.com/Avalon-Benchmark/avalon|avalon]] readme 
  
-[1]  A 3D video game environment and benchmark designed from scratch for reinforcement learning research +{{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}
  
-<code bash> +the P40 needs modern motherboard that allow for ''Enable Above 4G memory'' bios see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|link]], see [[tamiwiki:projects:P40a|P40]] page for info on dedicated machine.
-conda create -n avalon python=3.+
-conda activate avalon+
  
-sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra +NVIDIA Tesla P40 24GB GDDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\ 
 +need to retrofit with a fan, it doesn't come with one
  
-#this will also install torch... +got one on ebay for $200 (+shipping) ([[https://archive.md/SL4Kq|ebay mirror]])\\
-pip install avalon-rl[train+
  
-python -m avalon.install_godot_binary +some dude got it working, https://github.com/JingShing/How-to-use-tesla-p40 
-python -m avalon.common.check_install +=== SPECIFICATIONS: ===
-</code>+
  
-why even bother, the quaDRO IS JUST A TEST\\ +    * GPU Architecture: NVIDIA Pascal  
-NEED TO CLEAN REMOVE THE 390 driver AND MOVE BACK TO +    * Single-Precision Performance: 12 TeraFLOPS*  
 +    * Integer Operations (INT8): 47 TOPS* (Tera-Operations per Second)  
 +    * GPU Memory: 24 GB  
 +    * Memory Bandwidth: 346 GB/s  
 +    * System Interface: PCI Express 3.0 x16  
 +    * Form Factor: 4.4” H x 10.5” L, Dual Slot, Full Height  
 +    * Max Power: 250 W  
 +    * Enhanced Programmability with Page Migration Engine: Yes  
 +    * ECC Protection: Yes  
 +    * Server-Optimized for Data Center Deployment: Yes  
 +    * Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engine 
 +    * NVPN: 699-2G610-0200-100 
 +    * NVIDIA® CUDA® cores: 3840
  
-NVIDIA-CURRENT 
- 
- 
-==== P40 ==== 
-https://github.com/JingShing/How-to-use-tesla-p40 
  
 installing  installing 
 <code bash> <code bash>
-sudo apt instakll nvidia-headless-535+sudo apt install nvidia-headless-535
 </code> </code>
  
-there is some issue(?) with the power connector\\ +there is some issue: unlike other cards, the blue LED doesn't turn green on thunderbolt connection.\\ 
-not sure its needed +no power is passing to the GPU.\\
  
-{{:tamiwiki:projects:pasted:20230615-124931.png?400}}+:(
  
 ==== misc ==== ==== misc ====
tamiwiki/projects/egpu.1686823724.txt.gz · Last modified: 2023/06/15 13:08 by yair