EGPU

TLDR of https://docs.kernel.org/admin-guide/thunderbolt.html

  1. upgrade the kernel (??)
  2. install the gfx (nvidia|amd) drivers
  3. plug in the card
  4. reboot
  5. trust (authorize) the thunderbolt device
The authorized attribute reads 0 which means no PCIe tunnels are created yet. The user can authorize the device by simply entering:

# echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized

This will create the PCIe tunnels and the device is now connected.
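
Before authorizing anything it helps to see which devices the kernel has enumerated and what their authorized attribute currently reads. A minimal sketch, assuming the standard sysfs layout from the kernel doc above; the function name tb_status and the optional directory argument are made up for illustration (the argument only exists so the function can be pointed at a fake tree):

```shell
# List Thunderbolt devices and the value of their "authorized" attribute.
# $1 (optional, hypothetical): directory to scan instead of the real sysfs path.
tb_status() {
    root="${1:-/sys/bus/thunderbolt/devices}"
    for dev in "$root"/*; do
        # Skip entries that expose no "authorized" attribute.
        [ -f "$dev/authorized" ] || continue
        name="$(basename "$dev")"
        state="$(cat "$dev/authorized")"
        if [ "$state" = "0" ]; then
            echo "$name: not authorized"
        else
            echo "$name: authorized ($state)"
        fi
    done
}
```

Run tb_status with no argument on the real machine. A device that reads "not authorized" is the one to echo 1 into.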

upgrade kernel

Install from mainline; the kernel from the ubuntu dist-upgrade is too conservative (5.19).

cd /tmp
# clear out stale packages first, since dpkg -i below installs every .deb in here
rm -i *.deb
 
# headers (arch-specific + common), unsigned image and modules for 6.3.7
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307_6.3.7-060307.202306090936_all.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-image-unsigned-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-modules-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
 
sudo dpkg -i *.deb
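
After the reboot it is worth confirming that the machine actually booted the new kernel and not the old 5.19 one. A rough sketch: version_at_least is a made-up helper, it relies on GNU sort -V, and 6.3.7 is used as the reference only because that is the version installed above (the notes do not establish a real minimum version):

```shell
# Succeeds when version $1 is at least version $2 (needs GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running="$(uname -r)"
if version_at_least "$running" "6.3.7"; then
    echo "running $running: mainline kernel is active"
else
    echo "running $running: still on the older kernel, check the boot entry"
fi
```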

trust

Hmm, the card needs to be connected before boot.
Now the permissions:

$ sudo dmesg
dprobe" pid=563 comm="apparmor_parser"
[    7.888207] audit: type=1400 audit(1686781044.331:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=563 comm="apparmor_parser"

authorized the tamala!

(base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
1
(base) user@eight:~$ sudo ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GF106GL [Quadro 2000]
manual_install: True
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
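
To pull just the recommended package out of that output, something like the following could work. It is only a sketch: recommended_driver is a made-up name, and it assumes the "driver : <package> - ... recommended" line format shown above stays the same.

```shell
# Print the package that `ubuntu-drivers devices` marks as recommended.
# Reads the command's output on stdin; prints nothing if no line is marked.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# usage: sudo ubuntu-drivers devices | recommended_driver
```

Note that "recommended" only says which branch supports the card; it does not guarantee that the module builds against a mainline kernel.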

just an old card…

but EEK

(base) user@eight:~$ lspci | tail
08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
 
$ sudo dmesg
[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel.
[ 1041.053831] Disabling lock debugging due to kernel taint
[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[ 1041.501047] NVRM: No NVIDIA GPU found.
[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[ 1042.335282] NVRM: No NVIDIA GPU found.
[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
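
The NVRM message already names the driver branch the card needs, so it can be extracted from the log instead of read by eye. A small sketch, assuming the message wording stays exactly as in the log above; nvrm_legacy_branch is a made-up name:

```shell
# Print the legacy driver branch (e.g. 390) that NVRM says the GPU needs.
# Reads kernel log text on stdin; prints nothing if no such message is found.
nvrm_legacy_branch() {
    sed -n 's/.*supported through the NVIDIA \([0-9][0-9]*\)\.xx Legacy drivers.*/\1/p' | sort -u
}

# usage: sudo dmesg | nvrm_legacy_branch
```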

WE ARE TAINTED

driver

We went with the ubuntu selection (the packaged driver).

Cute, though: https://www.nvidia.com/en-us/drivers/unix/

$ sudo apt install nvidia-headless-535
 
 
#downgrade nvidia to quadro supported version
sudo apt install nvidia-headless-390
 
# EEK
ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported
Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64)
Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information.
dpkg: error processing package nvidia-dkms-390 (--configure):
 installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-headless-390:
 nvidia-headless-390 depends on nvidia-dkms-390; however:
  Package nvidia-dkms-390 is not configured yet.
 
dpkg: error processing package nvidia-headless-390 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.36-0ubuntu4) ...
No apport report written because the error message indicates its a followup error from a previous failure.
/sbin/ldconfig.real: /lib/libndi.so.4 is not a symbolic link
 
Processing triggers for man-db (2.10.2-2) ...
Processing triggers for initramfs-tools (0.140ubuntu17) ...
update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic
Errors were encountered while processing:
 nvidia-dkms-390
 nvidia-headless-390
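
The list of packages that dpkg left unconfigured sits at the very end of a long log. A sketch for pulling it out of saved apt output, assuming the "Errors were encountered while processing:" trailer format shown above; failed_packages is a made-up name:

```shell
# Print the packages listed after dpkg's "Errors were encountered" trailer.
# Reads saved apt/dpkg output on stdin, one package name per output line.
failed_packages() {
    awk '
        /^Errors were encountered while processing:/ { capture = 1; next }
        # Indented lines after the trailer are package names.
        capture && /^[[:space:]]+[^[:space:]]/ { sub(/^[[:space:]]+/, ""); print; next }
        # Anything else ends the list.
        capture { capture = 0 }
    '
}

# usage: failed_packages < apt-output.txt   (apt-output.txt is a hypothetical saved log)
```

Here the real cause is the DKMS build failure further up, so the make.log named in the error is the place to look first.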

Downgrading, but to the headless package,
so the X config stays untouched?

Going with the avalon readme:

[1] A 3D video game environment and benchmark designed from scratch for reinforcement learning research

conda create -n avalon python=3.9
conda activate avalon
 
sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra
 
# this will also install torch...
# (quoted so the shell does not try to glob the brackets)
pip install 'avalon-rl[train]'
 
python -m avalon.install_godot_binary
python -m avalon.common.check_install

Why even bother, the Quadro is just a test.
Need to cleanly remove the 390 driver and move back to

NVIDIA-CURRENT

P40

https://github.com/JingShing/How-to-use-tesla-p40

installing

sudo apt install nvidia-headless-535

There is an issue: unlike the other cards we tried (Quadro 2000 and 660 Ti), the blue LED doesn't turn green on thunderbolt connection.
No power is passing to the GPU.

:(

misc

 lspci -v | grep -A 2 -E "(VGA comp|3D)"
00:02.0 VGA compatible controller: Intel Corporation Iris Pro Graphics 580 (rev 09) (prog-if 00 [VGA controller])
	DeviceName:  CPU
	Subsystem: Intel Corporation Iris Pro Graphics 580
--
09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation GF106GL [Quadro 2000]
	Flags: bus master, fast devsel, latency 0
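
The same check narrowed to NVIDIA devices is a quick way to see whether a card enumerated at all after plugging it in. A sketch only: nvidia_gpus is a made-up name, and matching the "3D controller" class is an assumption based on the "3D" alternative in the grep above (some compute cards report that class instead of VGA).

```shell
# Keep only NVIDIA display devices from lspci output read on stdin.
# Matches both the VGA and the 3D controller class; prints nothing if none.
nvidia_gpus() {
    grep -E '(VGA compatible controller|3D controller): NVIDIA' || true
}

# usage: lspci | nvidia_gpus
```

Empty output while the enclosure is connected and authorized means the card never showed up on the PCIe bus.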

Power comes from the 12 V DC plug (150 W?)
https://www.reddit.com/r/eGPU/comments/ukqto9/comment/ige1rwv

https://egpu.io/forums/thunderbolt-linux-setup/

tamiwiki/projects/egpu.1686829194.txt.gz · Last modified: 2023/06/15 14:39 by yair