EGPU

TL;DR of https://docs.kernel.org/admin-guide/thunderbolt.html:

  1. upgrade the kernel (not sure this is required)
  2. install the graphics (nvidia|amd) drivers
  3. plug in the card
  4. reboot
  5. authorize (trust) the thunderbolt device
The authorized attribute reads 0 which means no PCIe tunnels are created yet. The user can authorize the device by simply entering:

# echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized

This will create the PCIe tunnels and the device is now connected.
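
Before writing to the authorized attribute it helps to see which devices exist and what state they are in. A minimal sketch, assuming the sysfs layout from the kernel guide linked above; the function name tb_list and its optional directory argument are made up here for illustration:

```shell
#!/bin/sh
# List Thunderbolt devices with their current authorization state.
# tb_list is a hypothetical helper; the optional argument overrides the
# sysfs directory (handy for trying it against a fake tree).
tb_list() {
    base="${1:-/sys/bus/thunderbolt/devices}"
    for dev in "$base"/*; do
        # Skip entries that have no "authorized" attribute (e.g. domains).
        [ -f "$dev/authorized" ] || continue
        name="unknown"
        [ -f "$dev/device_name" ] && name=$(cat "$dev/device_name")
        printf '%s\t%s\tauthorized=%s\n' \
            "$(basename "$dev")" "$name" "$(cat "$dev/authorized")"
    done
}
```

Calling tb_list with no argument reads the real sysfs tree. A device showing authorized=0 is the one to echo 1 into.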

upgrade kernel

Install from the Ubuntu mainline kernel builds; the kernel you get from an Ubuntu dist-upgrade is too conservative (5.19).

cd /tmp
rm -i *.deb
 
wget -c   https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c   https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307_6.3.7-060307.202306090936_all.deb
wget -c   https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-image-unsigned-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c   https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-modules-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
 
sudo dpkg -i *.deb
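
After the reboot it is worth confirming that the machine actually booted the new kernel and not the old 5.19 one. A minimal sketch, assuming GNU sort (for -V); version_ge is a hypothetical helper name and 6.3.7 is just the version installed above:

```shell
#!/bin/sh
# Succeeds if version $1 >= version $2. Relies on GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
# Strip the "-060307-generic" style suffix before comparing.
if version_ge "${running%%-*}" "6.3.7"; then
    echo "running kernel $running is at or above 6.3.7"
else
    echo "running kernel $running is older than 6.3.7, check the boot entry"
fi
```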

trust

Hmm, it looks like the enclosure has to be connected before boot.
Next: permissions.

$ sudo dmesg
dprobe" pid=563 comm="apparmor_parser"
[    7.888207] audit: type=1400 audit(1686781044.331:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=563 comm="apparmor_parser"

authorized the tamala!

(base) user@eight:~$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
1
(base) user@eight:~$ sudo ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
modalias : pci:v000010DEd00000DD8sv000010DEsd0000084Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GF106GL [Quadro 2000]
manual_install: True
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

just an old card…
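
The recommended package can be pulled out of that output instead of being read by eye. A minimal sketch, assuming the output format shown above; recommended_driver is a hypothetical helper name:

```shell
#!/bin/sh
# Read `ubuntu-drivers devices` output on stdin and print the package
# on the first "driver" line that is marked "recommended".
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Usage would be `sudo ubuntu-drivers devices | recommended_driver`. Note that it prints the nvidia-driver-NNN name, while these notes install the nvidia-headless-NNN variant of the same branch.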

but EEK

(base) user@eight:~$ lspci | tail
08:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
 
$ sudo dmesg
[ 1041.053826] nvidia: module license 'NVIDIA' taints kernel.
[ 1041.053831] Disabling lock debugging due to kernel taint
[ 1041.484017] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1041.484032] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[ 1041.501047] NVRM: No NVIDIA GPU found.
[ 1041.521176] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
[ 1042.332830] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 1042.332842] NVRM: The NVIDIA Quadro 2000 GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 535.43.02 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[ 1042.335282] NVRM: No NVIDIA GPU found.
[ 1042.335835] nvidia-nvlink: Unregistered Nvlink Core, major device number 509

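WE ARE TAINTED, and worse, the 535.43.02 driver ignores the card: the Quadro 2000 is only supported by the 390.xx legacy branch.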
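
The NVRM message in dmesg names the legacy branch the card needs. A minimal sketch that extracts that number, assuming the message wording shown above; legacy_branch is a hypothetical helper name:

```shell
#!/bin/sh
# Read dmesg output on stdin and print the legacy driver branch named in
# the NVRM "supported through the NVIDIA NNN.xx Legacy drivers" message.
# Prints nothing if no such message is present.
legacy_branch() {
    sed -n 's/.*NVIDIA \([0-9][0-9]*\)\.xx Legacy.*/\1/p' | head -n 1
}
```

Usage would be `sudo dmesg | legacy_branch`; for the log above it should print the 390 that matches the nvidia-*-390 packages.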

driver

We went with the Ubuntu-packaged drivers.

NVIDIA's own driver list is cute, though: https://www.nvidia.com/en-us/drivers/unix/

$ sudo apt install nvidia-headless-535
 
 
# downgrade nvidia to the version that supports the quadro
sudo apt install nvidia-headless-390
 
# EEK
ERROR (dkms apport): kernel package linux-headers-6.3.7-060307-generic is not supported
Error! Bad return status for module build on kernel: 6.3.7-060307-generic (x86_64)
Consult /var/lib/dkms/nvidia/390.157/build/make.log for more information.
dpkg: error processing package nvidia-dkms-390 (--configure):
 installed nvidia-dkms-390 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-headless-390:
 nvidia-headless-390 depends on nvidia-dkms-390; however:
  Package nvidia-dkms-390 is not configured yet.
 
dpkg: error processing package nvidia-headless-390 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.36-0ubuntu4) ...
No apport report written because the error message indicates its a followup error from a previous failure.
                                                                                                          /sbin/ldconfig.real: /lib/lib
ndi.so.4 is not a symbolic link
 
Processing triggers for man-db (2.10.2-2) ...
Processing triggers for initramfs-tools (0.140ubuntu17) ...
update-initramfs: Generating /boot/initrd.img-6.3.7-060307-generic
Errors were encountered while processing:
 nvidia-dkms-390
 nvidia-headless-390

Downgrading, but to the headless package.
Does that leave the X config untouched?

Following the Avalon README:

[1] A 3D video game environment and benchmark designed from scratch for reinforcement learning research

conda create -n avalon python=3.9
conda activate avalon
 
sudo apt install --no-install-recommends libegl-dev libglew-dev libglfw3-dev libnvidia-gl libopengl-dev libosmesa6 mesa-utils-extra
 
# this will also install torch...
pip install 'avalon-rl[train]'
 
python -m avalon.install_godot_binary
python -m avalon.common.check_install

Why even bother: the Quadro is just a test card.
Need to cleanly remove the 390 driver and move back to nvidia-current.
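
A minimal sketch of that cleanup, assuming the package names from the failed install above (nvidia-dkms-390, nvidia-headless-390) and that 535 is the branch to return to. The run and swap_driver helpers are hypothetical; by default the script only prints the commands, and RUN=1 makes it execute them:

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# swap_driver OLD NEW: purge the OLD branch, install the NEW headless one.
swap_driver() {
    old="$1"
    new="$2"
    # Remove the packages left half-configured by the failed DKMS build.
    run sudo apt-get purge -y "nvidia-dkms-$old" "nvidia-headless-$old"
    run sudo apt-get autoremove -y
    # Go back to the current headless branch.
    run sudo apt-get install -y "nvidia-headless-$new"
}

swap_driver 390 535
```

Reading the printed commands first and only then rerunning with RUN=1 seems the safer order, given how the last install went.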

P40

https://github.com/JingShing/How-to-use-tesla-p40

installing

sudo apt install nvidia-headless-535

There may be an issue with the power connector;
not sure it's needed.

misc

$ lspci -v | grep -A 2 -E "(VGA comp|3D)"
00:02.0 VGA compatible controller: Intel Corporation Iris Pro Graphics 580 (rev 09) (prog-if 00 [VGA controller])
	DeviceName:  CPU
	Subsystem: Intel Corporation Iris Pro Graphics 580
--
09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation GF106GL [Quadro 2000]
	Flags: bus master, fast devsel, latency 0

Power comes from the 12 V DC plug (150 W?)
https://www.reddit.com/r/eGPU/comments/ukqto9/comment/ige1rwv

https://egpu.io/forums/thunderbolt-linux-setup/

tamiwiki/projects/egpu.1686823724.txt.gz · Last modified: 2023/06/15 13:08 by yair