tamiwiki:projects:egpu

Differences

This shows you the differences between two versions of the page.

tamiwiki:projects:egpu [2023/06/15 14:39] – [P40] yair
tamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair
Line 1: Line 1:
 ====== EGPU ======
-https://docs.kernel.org/admin-guide/thunderbolt.html
+{{ :tamiwiki:projects:pasted:20230618-183833.png}}
 + 
 +We are using the [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3 eGPU external thunderbolt]] dock.\\ 
 + 
 +Linux kernel notes: https://docs.kernel.org/admin-guide/thunderbolt.html\\ 
 +[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]] 
 + 
 + 
 +=== ThunderBolt check and setup === 
 TLDR
   - upgrade kernel (??)
Line 46: Line 55:
 <code bash>
 (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
-
+</code>
-(base) user@eight:~$ sudo ubuntu-drivers devices
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
-modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00
-vendor   : NVIDIA Corporation
-model    : GF106GL [Quadro 2000]
-manual_install: True
-driver   : nvidia-driver-390 - distro non-free recommended
-driver   : xserver-xorg-video-nouveau - distro free builtin
  
-</code> 
  
-just an old card... 
  
-but EEK  
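To see which Thunderbolt devices are still waiting for authorization before writing ''1'' by hand, something like the sketch below may help. It only assumes the sysfs layout described in the kernel Thunderbolt guide linked above (''authorized'' reads ''0'' until the device is approved). The function name ''tb_list_authorized'' and its optional directory argument are made up for this page.

```shell
#!/usr/bin/env bash
# Print "<device> <authorized>" for every Thunderbolt device in sysfs.
# The optional first argument is a hypothetical override of the sysfs
# directory, only there so the function can be tried on a test tree.
tb_list_authorized() {
    local base="${1:-/sys/bus/thunderbolt/devices}"
    local dev
    for dev in "$base"/*; do
        # Entries without an "authorized" attribute (e.g. the domain) are skipped.
        [ -f "$dev/authorized" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/authorized")"
    done
}
```

Calling ''tb_list_authorized'' with no argument reads the real sysfs tree. A line such as ''0-1 0'' would mean the dock at ''0-1'' is plugged in but not yet authorized.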
  
 +==== 1080Ti ====
  
 +{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}}
 +looks legit
 <code bash>
-(base) user@eight:~$ lspci | tail
-08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
-09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
-09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
+$ sudo dmesg -w
+[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
+
+[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007)
+[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
+[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.43.02  Mon May 22 20:46:13 UTC 2023
+[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.43.02  Mon May 22 20:25:24 UTC 2023
+[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver
+[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1
+[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
+[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507.
+[96238.399348] NVRM: API mismatch: the client has the version 390.157, but
+               NVRM: this kernel module has the version 535.43.02.  Please
+               NVRM: make sure that this kernel module and all NVIDIA driver
  
-$sudo dmesg 
-[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel. 
-[ 1041.053831] Disabling lock debugging due to kernel taint 
-[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1041.501047] NVRM: No NVIDIA GPU found. 
-[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
-[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1042.335282] NVRM: No NVIDIA GPU found. 
-[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
 </code>

-<WRAP center round alert 33%>WE ARE TAINTED</WRAP>
-
-==== driver ====
-
-we went with ubuntu selection
-
-but cute https://www.nvidia.com/en-us/drivers/unix/
+update the driver so the client libraries match the kernel module version
 <code bash>
-sudo apt installl nvidia-headless-535
-
-
-#downgrade nvidia to quadro supported version
-sudo apt install nvidia-headless-390
-
-# EEK
-ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported
-Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64)
-Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information.
-dpkg: error processing package nvidia-dkms-390 (--configure):
- installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10
-dpkg: dependency problems prevent configuration of nvidia-headless-390:
- nvidia-headless-390 depends on nvidia-dkms-390; however:
-  Package nvidia-dkms-390 is not configured yet.
-
-dpkg: error processing package nvidia-headless-390 (--configure):
- dependency problems - leaving unconfigured
-Processing triggers for libc-bin (2.36-0ubuntu4) ...
-No apport report written because the error message indicates its a followup error from a previous failure.
-                                                                                                          /sbin/ldconfig.real: /lib/lib
-ndi.so.4 is not a symbolic link
-
-Processing triggers for man-db (2.10.2-2) ...
-Processing triggers for initramfs-tools (0.140ubuntu17) ...
-update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic
-Errors were encountered while processing:
- nvidia-dkms-390
- nvidia-headless-390
+ubuntu-drivers devices
+== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
+modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00
+vendor   : NVIDIA Corporation
+model    : GP102 [GeForce GTX 1080 Ti]
+manual_install: True
+driver   : nvidia-driver-450-server - distro non-free
+driver   : nvidia-driver-510 - distro non-free
+driver   : nvidia-driver-390 - distro non-free
+driver   : nvidia-driver-470 - distro non-free
+driver   : nvidia-driver-525-server - distro non-free
+driver   : nvidia-driver-525 - distro non-free
+driver   : nvidia-driver-535 - third-party non-free recommended
+driver   : nvidia-driver-515 - distro non-free
+driver   : nvidia-driver-515-server - distro non-free
+driver   : nvidia-driver-530 - distro non-free
+driver   : nvidia-driver-470-server - distro non-free
+driver   : xserver-xorg-video-nouveau - distro free builtin

+$ sudo ubuntu-drivers autoinstall

 </code>
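To confirm the mismatch is really gone after the reinstall, one option is to pull both version numbers out of the ''NVRM: API mismatch'' message. This is a minimal sketch that assumes the message wording seen in the dmesg output above; ''nvrm_versions'' is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and print every version number that follows
# "has the version": the client (userspace) version comes first, then the
# kernel module version, in the order the message reports them.
nvrm_versions() {
    grep -oE 'has the version [0-9]+(\.[0-9]+)+' | awk '{print $4}'
}
```

Usage would be ''sudo dmesg | nvrm_versions''. Two different numbers mean the userspace libraries and the kernel module still disagree, and no output means no mismatch message was logged.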
  
-downgrading but to headless,\\ 
-without touching the x config? 
  
 +==== P40 ====
 +<WRAP center round important 60%>
 +this doesn't work on our test machine
 +</WRAP>
  
-going with [[https://github.com/Avalon-Benchmark/avalon|avalon]] readme 
  
-[1]  A 3D video game environment and benchmark designed from scratch for reinforcement learning research
+{{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}

-<code bash>
-conda create -n avalon python=3.
-conda activate avalon
+the P40 needs a modern motherboard whose BIOS has the ''Enable Above 4G memory'' option, see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|link]]. See the [[tamiwiki:projects:P40a|P40]] page for info on the dedicated machine.

-sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra
+NVIDIA Tesla P40 24GB GDDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\
+need to retrofit it with a fan, it doesn't come with one

-#this will also install torch...
-pip install avalon-rl[train]
+got one on eBay for $200 (+shipping) ([[https://archive.md/SL4Kq|ebay mirror]])\\

-python -m avalon.install_godot_binary
-python -m avalon.common.check_install
-</code>
+some dude got it working: https://github.com/JingShing/How-to-use-tesla-p40
+=== SPECIFICATIONS: ===

-why even bother, the quaDRO IS JUST A TEST\\
-NEED TO CLEAN REMOVE THE 390 driver AND MOVE BACK TO
+    * GPU Architecture: NVIDIA Pascal
+    * Single-Precision Performance: 12 TeraFLOPS*
+    * Integer Operations (INT8): 47 TOPS* (TeraOperations per Second)
+    * GPU Memory: 24 GB
+    * Memory Bandwidth: 346 GB/s
+    * System Interface: PCI Express 3.0 x16
+    * Form Factor: 4.4” H x 10.5” L, Dual Slot, Full Height
+    * Max Power: 250 W
+    * Enhanced Programmability with Page Migration Engine: Yes
+    * ECC Protection: Yes
+    * Server-Optimized for Data Center Deployment: Yes
+    * Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engine
+    * NVPN: 699-2G610-0200-100
+    * NVIDIA® CUDA® cores: 3840
  
-NVIDIA-CURRENT 
- 
- 
-==== P40 ==== 
-https://github.com/JingShing/How-to-use-tesla-p40 
  
 installing
 <code bash>
-sudo apt instakll nvidia-headless-535
+sudo apt install nvidia-headless-535
 </code>
  
 there is some issue: unlike with other cards, the blue LED doesn't turn green on thunderbolt connection.\\
 no power is passing to the GPU.\\
-unlike with other cards we tried (quadro 2000 and 660Ti) 
  
 :(
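One way to narrow this down is to check whether the card shows up on the PCI bus at all, since a card that gets no power would presumably not enumerate. This is a rough sketch that assumes ''lspci'' output is piped in; ''count_nvidia_devices'' is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count the lines of lspci output (read from stdin) that mention NVIDIA.
# grep -c exits non-zero when nothing matches but still prints 0,
# so "|| true" keeps the function usable under "set -e".
count_nvidia_devices() {
    grep -ci 'nvidia' || true
}
```

Usage would be ''lspci | count_nvidia_devices''. A result of ''0'' with the dock connected and authorized would point at power or enumeration rather than at the driver.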
 +
 ==== misc ====
  
tamiwiki/projects/egpu.1686829194.txt.gz · Last modified: 2023/06/15 14:39 by yair