tamiwiki:projects:egpu

Differences

This shows you the differences between two versions of the page.

tamiwiki:projects:egpu [2023/06/17 00:27] – [1080Ti] yairtamiwiki:projects:egpu [2023/11/04 11:07] (current) – [1080Ti] yair
Line 1: Line 1:
 ====== EGPU ====== ====== EGPU ======
-https://docs.kernel.org/admin-guide/thunderbolt.html+{{ :tamiwiki:projects:pasted:20230618-183833.png}} 
 + 
 +we are using the [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3 eGPU external Thunderbolt]] dock.\\ 
 + 
 +Linux kernel notes > https://docs.kernel.org/admin-guide/thunderbolt.html\\ 
 +[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]] 
 + 
 + 
 +=== Thunderbolt check and setup === 
 TLDR TLDR
   - upgrade kernel (??)   - upgrade kernel (??)
Line 46: Line 55:
 <code bash> <code bash>
 (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized (base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
-1 
-(base) user@eight:~$ sudo ubuntu-drivers devices 
-== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 == 
-modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00 
-vendor   : NVIDIA Corporation 
-model    : GF106GL [Quadro 2000] 
-manual_install: True 
-driver   : nvidia-driver-390 - distro non-free recommended 
-driver   : xserver-xorg-video-nouveau - distro free builtin 
- 
 </code> </code>
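
The authorization step above writes ''1'' to a device's ''authorized'' attribute in sysfs. Before doing that it can help to see which Thunderbolt devices the kernel knows about and whether they are already authorized. The following is a minimal sketch, not something taken from this page: ''tb_status'' is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at a test directory instead of the real ''/sys/bus/thunderbolt/devices''.

```bash
#!/usr/bin/env bash
# Sketch: list Thunderbolt devices and their authorization state.
# tb_status is a hypothetical helper; $1 optionally overrides the sysfs
# directory (default is the path used in the command above).
tb_status() {
    local base="${1:-/sys/bus/thunderbolt/devices}"
    local dev name state
    for dev in "$base"/*; do
        # Skip entries that expose no "authorized" attribute (e.g. domains)
        [ -f "$dev/authorized" ] || continue
        name="unknown"
        [ -f "$dev/device_name" ] && name="$(cat "$dev/device_name")"
        state="$(cat "$dev/authorized")"
        printf '%s\t%s\tauthorized=%s\n' "$(basename "$dev")" "$name" "$state"
    done
}
```

Calling ''tb_status'' with no arguments reads the real sysfs tree; a device that reports ''authorized=0'' is a candidate for the ''echo 1 | sudo tee …/authorized'' step.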
  
-just an old card... 
  
-but EEK  
  
- 
-<code bash> 
-(base) user@eight:~$ lspci | tail 
-08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06) 
-09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) 
-09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1) 
- 
-$sudo dmesg 
-[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel. 
-[ 1041.053831] Disabling lock debugging due to kernel taint 
-[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1041.501047] NVRM: No NVIDIA GPU found. 
-[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
-[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509 
-[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is 
-               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please 
-               NVRM:  visit http://www.nvidia.com/object/unix.html for more 
-               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore 
-               NVRM:  this GPU.  Continuing probe... 
-[ 1042.335282] NVRM: No NVIDIA GPU found. 
-[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509 
-</code> 
- 
-<WRAP center round alert 33%>WE ARE TAINTED</WRAP> 
- 
-==== driver ==== 
- 
-we went with ubuntu selection  
- 
-but cute https://www.nvidia.com/en-us/drivers/unix/ 
- 
-<code bash> 
 -$ sudo apt install nvidia-headless-535 
- 
- 
-#downgrade nvidia to quadro supported version 
-sudo apt install nvidia-headless-390 
- 
-# EEK 
-RROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported 
-Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64) 
-Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information. 
-dpkg: error processing package nvidia-dkms-390 (--configure): 
- installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10 
-dpkg: dependency problems prevent configuration of nvidia-headless-390: 
- nvidia-headless-390 depends on nvidia-dkms-390; however: 
-  Package nvidia-dkms-390 is not configured yet. 
- 
-dpkg: error processing package nvidia-headless-390 (--configure): 
- dependency problems - leaving unconfigured 
-Processing triggers for libc-bin (2.36-0ubuntu4) ... 
-No apport report written because the error message indicates its a followup error from a previous failure. 
-                                                                                                          /sbin/ldconfig.real: /lib/lib 
-ndi.so.4 is not a symbolic link 
- 
-Processing triggers for man-db (2.10.2-2) ... 
-Processing triggers for initramfs-tools (0.140ubuntu17) ... 
-update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic 
-Errors were encountered while processing: 
- nvidia-dkms-390 
- nvidia-headless-390 
- 
- 
-</code> 
- 
-downgrading but to headless,\\ 
-without touching the x config? 
- 
- 
-going with [[https://github.com/Avalon-Benchmark/avalon|avalon]] readme 
- 
-[1]  A 3D video game environment and benchmark designed from scratch for reinforcement learning research  
- 
-<code bash> 
-conda create -n avalon python=3.9 
-conda activate avalon 
- 
-sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra 
- 
-#this will also install torch... 
-pip install avalon-rl[train]  
- 
-python -m avalon.install_godot_binary 
-python -m avalon.common.check_install 
-</code> 
- 
-why even bother, the quaDRO IS JUST A TEST. \\ 
-NEED TO CLEAN REMOVE THE 390 driver AND MOVE BACK TO  
- 
-NVIDIA-CURRENT 
- 
- 
-==== P40 ==== 
-https://github.com/JingShing/How-to-use-tesla-p40 
- 
-installing  
-<code bash> 
 -sudo apt install nvidia-headless-535 
-</code> 
- 
-there is some issue, unlike other cards the blue led doesnt turn green on thunderbolt connection.\\ 
-no power passing to the gPU.\\ 
-unlike with other cards we tried (quadro 2000 and 660Ti) 
- 
-:( 
  
 ==== 1080Ti ==== ==== 1080Ti ====
Line 217: Line 104:
  
 $ sudo ubuntu-drivers autoinstall $ sudo ubuntu-drivers autoinstall
-1The following additional packages will be installed: 
-  libnvidia-common-535 libnvidia-compute-535:i386 
-  libnvidia-decode-535 libnvidia-decode-535:i386 
-  libnvidia-encode-535 libnvidia-encode-535:i386 
-  libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535:i386 
-  libnvidia-gl-535 libnvidia-gl-535:i386 nvidia-prime 
-  nvidia-settings nvidia-utils-535 screen-resolution-extra 
-  xserver-xorg-video-nvidia-535 
-The following packages will be REMOVED: 
-  libnvidia-common-390 libnvidia-gl-390 
-The following NEW packages will be installed: 
-  libnvidia-common-535 libnvidia-compute-535:i386 
-  libnvidia-decode-535 libnvidia-decode-535:i386 
-  libnvidia-encode-535 libnvidia-encode-535:i386 
-  libnvidia-extra-535 libnvidia-fbc1-535 libnvidia-fbc1-535:i386 
-  libnvidia-gl-535 libnvidia-gl-535:i386 nvidia-driver-535 
-  nvidia-prime nvidia-settings nvidia-utils-535 
-  screen-resolution-extra xserver-xorg-video-nvidia-535 
  
 </code> </code>
 +
 +
 +==== P40 ====
 +<WRAP center round important 60%>
 +this doesn't work on our test machine
 +</WRAP>
 +
 +
 +{{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}
 +
 +the P40 needs a modern motherboard whose BIOS offers the ''Enable Above 4G memory'' setting (see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|link]]); see the [[tamiwiki:projects:P40a|P40]] page for info on the dedicated machine.
 +
 +NVIDIA Tesla P40 24GB GDDR5 GPU Accelerator Card Dual PCI-E 3.0 x16\\
 +it needs to be retrofitted with a fan, since it doesn't come with one
 +
 +got one on eBay for $200 (+shipping) ([[https://archive.md/SL4Kq|ebay mirror]])\\
 +
 +some dude got it working, https://github.com/JingShing/How-to-use-tesla-p40
 +=== SPECIFICATIONS: ===
 +
 +    * GPU Architecture: NVIDIA Pascal 
 +    * Single-Precision Performance: 12 TeraFLOPS* 
 +    * Integer Operations (INT8): 47 TOPS* (Tera-Operations per Second) 
 +    * GPU Memory: 24 GB 
 +    * Memory Bandwidth: 346 GB/s 
 +    * System Interface: PCI Express 3.0 x16 
 +    * Form Factor: 4.4” H x 10.5” L, Dual Slot, Full Height 
 +    * Max Power: 250 W 
 +    * Enhanced Programmability with Page Migration Engine: Yes 
 +    * ECC Protection: Yes 
 +    * Server-Optimized for Data Center Deployment: Yes 
 +    * Hardware-Accelerated Video Engine: 1x Decode Engine, 2x Encode Engine
 +    * NVPN: 699-2G610-0200-100
 +    * NVIDIA® CUDA® cores: 3840
 +
 +
 +installing 
 +<code bash>
 +sudo apt install nvidia-headless-535
 +</code>
 +
 +there is an issue: unlike with other cards, the blue LED doesn't turn green on Thunderbolt connection.\\
 +no power is passing to the GPU.\\
 +
 +:(
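
When the LED stays blue, one quick way to confirm that the card never reached the PCI bus is to look for NVIDIA devices in sysfs. This is only a sketch under assumptions: ''list_nvidia_pci'' is a hypothetical helper name, the optional directory argument is there for testing, and ''0x10de'' is NVIDIA's PCI vendor ID (the same ''10DE'' that appears in the ''modalias'' output of the earlier revision).

```bash
#!/usr/bin/env bash
# Sketch: print PCI addresses and device IDs of NVIDIA devices.
# list_nvidia_pci is a hypothetical helper; $1 optionally overrides
# the sysfs directory that is scanned.
list_nvidia_pci() {
    local base="${1:-/sys/bus/pci/devices}"
    local dev
    for dev in "$base"/*; do
        [ -f "$dev/vendor" ] || continue
        # 0x10de is the PCI vendor ID assigned to NVIDIA
        if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
            printf '%s\t%s\n' "$(basename "$dev")" "$(cat "$dev/device" 2>/dev/null)"
        fi
    done
}
```

If ''list_nvidia_pci'' prints nothing after the dock is connected and authorized, the GPU was not enumerated, which would be consistent with the card not receiving power.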
 +
 ==== misc ==== ==== misc ====
  
tamiwiki/projects/egpu.1686950832.txt.gz · Last modified: 2023/06/17 00:27 by yair