feat: add instructions for creating GPU containers and for registering nodes in agent mode

This commit is contained in:
spiritlhl
2026-05-16 13:06:31 +00:00
parent 6c8fc20e46
commit 73b2db7042
25 changed files with 210 additions and 0 deletions

@@ -216,6 +216,7 @@ function getGuideSidebarZhCN() {
{ text: '系统和硬件配置要求', link: '/guide/oneclickvirt/oneclickvirt_precheck.html' },
{ text: '主体安装', link: '/guide/oneclickvirt/oneclickvirt_install.html' },
{ text: '使用说明', link: '/guide/oneclickvirt/oneclickvirt_usage.html' },
{ text: '自定义', link: '/guide/oneclickvirt/oneclickvirt_custom.html' },
{ text: '致谢', link: '/guide/oneclickvirt/oneclickvirt_thanks.html' },
{ text: '常见问题答疑', link: '/guide/oneclickvirt/oneclickvirt_qa.html' }
]
@@ -412,6 +413,7 @@ function getGuideSidebarEnUS() {
{ text: 'Configuration requirements', link: '/en/guide/oneclickvirt/oneclickvirt_precheck.html' },
{ text: 'Main installation', link: '/en/guide/oneclickvirt/oneclickvirt_install.html' },
{ text: 'Instructions for use', link: '/en/guide/oneclickvirt/oneclickvirt_usage.html' },
{ text: 'Custom', link: '/en/guide/oneclickvirt/oneclickvirt_custom.html' },
{ text: 'Acknowledgements', link: '/en/guide/oneclickvirt/oneclickvirt_thanks.html' },
{ text: 'FAQ', link: '/en/guide/oneclickvirt/oneclickvirt_qa.html' }
]

[11 binary image files added, not shown. Sizes: 30, 65, 33, 58, 51, 61, 105, 79, 105, 238, 59 KiB]

@@ -0,0 +1,104 @@
---
outline: deep
---
# Custom
## Manage nodes without a dedicated public IPv4 address via Agent mode
Some local devices have outbound IPv4 internet access but no dedicated public IPv4 address, whether fixed or dynamic. Such a node cannot be managed directly over SSH in standard mode, so a separate management method is provided: Agent mode.
![](./images/agent0.png)
When adding a node, select the corresponding mode to open the Basic Information page.
![](./images/agent1.png)
Unlike standard mode, the IP address and port are optional, and the node can still be managed with both left empty. Leave them empty for local nodes; fill them in for cloud servers or other nodes with a fixed public IPv4 address. With the fields empty, `Network Type` later supports only `No Port Mapping`. With the fields filled in, `Network Type` offers the same options as standard mode.
![](./images/agent2.png)
After clicking Save, a command generation button appears under `Connection Configuration`. The node token is fixed once the node is saved: to change it, you must delete the node, add it again, and re-enter all configuration. Keep the token secret.
![](./images/agent3.png)
After clicking Generate (as shown below), copy the command and run it on the local node server to bring the node under management. When installation finishes, verify the connection with the detection button at the bottom of this page.
![](./images/agent4.png)
![](./images/agent7.jpeg)
After detection succeeds, the remaining configuration pages work as described in the standard-mode instructions.
![](./images/agent5.png)
Only this section differs for local nodes: choose `No Port Mapping`, so that you can later use `Manual Add Port` on the administrator `Port Management` page to tunnel ports to the controller's IPv4 address.
![](./images/agent6.png)
When manually adding a port mapping, choose `Controller Forwarding (Intranet Penetration)`. Fields that are not required can be left empty; the system automatically selects controller ports for the mapping.
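Once a mapping has been added, it can be useful to confirm from another machine that the controller port actually accepts connections. The following is a minimal sketch, assuming `bash` and coreutils `timeout` are available; the function name `port_open` is hypothetical and not part of the panel.

```shell
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `port_open 203.0.113.10 20022 && echo reachable`, where `203.0.113.10` and `20022` are placeholders for the controller's IPv4 address and the controller port shown in `Port Management`.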
There is one limitation: the `Controller Panel` must be deployed by `Script Deployment` or compiled and deployed locally. Docker and Docker Compose deployments are not supported, and neither are non-`Linux` systems. The controller needs control over the firewall of the server it runs on, so it must also be deployed in a `root environment`.
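These requirements can be checked on the controller host before relying on port forwarding. The sketch below is an illustration only: the function name `check_controller_env` and its messages are hypothetical, and the `/.dockerenv` test is a common but not guaranteed marker of a Docker container.

```shell
# Preflight sketch for the controller host.
# Usage: check_controller_env [os] [uid] [in_docker]
# Arguments default to values detected on the current machine.
check_controller_env() {
    os="${1:-$(uname -s)}"
    uid="${2:-$(id -u)}"
    if [ -n "${3:-}" ]; then
        in_docker="$3"
    elif [ -f /.dockerenv ]; then
        # Heuristic: Docker usually creates this file inside containers.
        in_docker=yes
    else
        in_docker=no
    fi

    if [ "$os" != "Linux" ]; then
        echo "unsupported: controller must run on Linux (found $os)"
        return 1
    fi
    if [ "$uid" -ne 0 ]; then
        echo "unsupported: controller must run as root (uid $uid)"
        return 1
    fi
    if [ "$in_docker" = "yes" ]; then
        echo "unsupported: Docker or Docker Compose deployment"
        return 1
    fi
    echo "ok: host meets the Controller Forwarding requirements"
}
```

Calling `check_controller_env` with no arguments inspects the current machine; passing explicit values makes it easy to see how each condition is reported.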
The `Intranet Penetration Port` feature is `only for nodes managed in Agent mode` and forwards traffic through a WSS proxy. When deploying, make sure the reverse proxy passes WS/WSS as described in the instructions; this is easy to forget when you configure your own reverse proxy.
Also, when the controller is upgraded, upgrade the node side to match: click Edit Node, open `Connection Configuration`, regenerate the command, and run the installation again.
## Use LXD/INCUS to create containers with shared GPU devices
Before adding a node that will share GPU devices, make sure the matching GPU driver is already installed on the node and that the GPU's own commands run without errors, for example:
```shell
nvidia-smi
```
Make sure the output is similar to:
```
root@a12-ThinkStation-P620:/root/sharefile# nvidia-smi
Sat May 16 20:23:07 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               Off | 00000000:61:00.0 Off |                  Off |
| 30%   42C    P0              83W / 300W |      0MiB / 49140MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
Only after the host machine has the driver installed can GPU resources be shared into containers.
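The host check can also be scripted, which makes it easy to record the driver version that the container will need to match later. This is a minimal sketch; the function name `host_gpu_summary` is hypothetical, while `--query-gpu` and `--format` are standard `nvidia-smi` options.

```shell
# Prints "<gpu name>, <driver version>" for each GPU on the host,
# or fails with a message if the driver tools are not installed.
host_gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: install the GPU driver on the host first" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

Run `host_gpu_summary` on the host before adding the node; a non-zero exit status means the driver still needs to be installed or repaired.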
Then follow the Incus/LXD tutorial in this documentation to install the local environment. Once it is installed, add the node through the controller in Agent mode and make sure the health check passes before continuing.
It is recommended to set the node to redemption-code-only mode, then create containers with GPU devices from the administrator's redemption-code page.
![](./images/gpu1.jpeg)
After creation succeeds, switch to the administrator's regular-user view to redeem the code, then switch back to the administrator view and open the Port Management page to tunnel the container's ports, so that you can connect and configure it directly through web SSH.
![](./images/gpu2.jpeg)
After adding successfully, you can directly use web SSH to connect and manage this new local container.
Inside the container, install the same driver version as on the host. This time the driver must not be loaded into the kernel, so pass the `--no-kernel-module` parameter to the installer.
For detailed driver installation steps, refer to: https://www.spiritysdx.top/20240513/#%E5%AE%B9%E5%99%A8%E5%86%85%E5%AE%89%E8%A3%85gpu%E9%A9%B1%E5%8A%A8
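As a rough sketch of that step, the helpers below build the installer file name from the host's driver version and run it without the kernel module. They assume an x86_64 container, NVIDIA's usual `.run` installer naming, and an installer already downloaded into the current directory; both function names are hypothetical.

```shell
# Builds the conventional NVIDIA .run installer file name for a driver
# version (x86_64 is an assumption; adjust for other architectures).
nvidia_installer_name() {
    echo "NVIDIA-Linux-x86_64-$1.run"
}

# Installs only the user-space part of the driver inside the container.
# $1 must be the driver version that nvidia-smi reports on the host.
install_userspace_driver() {
    installer=$(nvidia_installer_name "$1")
    # Assumes the installer file is already present in the current directory.
    sh "./$installer" --no-kernel-module
}
```

With the host output shown earlier, the call would be `install_userspace_driver 535.171.04`.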
After installation, running `nvidia-smi` inside the container should also return output, which proves GPU sharing is active.
![](./images/gpu3.jpg)
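If `nvidia-smi` fails inside the container, one thing worth ruling out is that no GPU device is attached to the instance at all. The sketch below inspects the expanded instance configuration on the host; `has_gpu_device` is a hypothetical helper, and how the panel attaches the device internally is not described here.

```shell
# Reads an instance's expanded config (YAML) on stdin and succeeds if any
# device of type "gpu" is present.
has_gpu_device() {
    grep -Eq '^[[:space:]]+type:[[:space:]]*gpu[[:space:]]*$'
}
```

On an Incus host this could be used as `incus config show gpu-template --expanded | has_gpu_device && echo "gpu device attached"`, where `gpu-template` is a placeholder instance name; on LXD the equivalent command is `lxc config show`.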
At this point you can stop the container and use it as a template: in the redemption-code batch container creation, choose copy mode, set this container as the source container, and clone new containers from it.

[11 binary image files added, not shown. Sizes: 30, 65, 33, 58, 51, 61, 105, 79, 105, 238, 59 KiB]

@@ -0,0 +1,104 @@
---
outline: deep
---
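# Custom
## Manage nodes without a dedicated public IPv4 address via Agent mode
Some local device nodes have IPv4 internet access but no dedicated IPv4 address, whether fixed or dynamic, so they cannot be managed directly over SSH in standard mode. A separate management method is provided for them: Agent mode.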
![](./images/agent0.png)
When adding a node, select the corresponding mode to open the Basic Information page.
![](./images/agent1.png)
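Unlike registration in standard mode, the IP address and port are no longer required, and the node can be managed with both left empty. Leave these two fields empty for local nodes; fill them in for cloud servers and other nodes with a fixed IPv4 address. With the fields empty, `Network Type` later supports only `No Port Mapping`. With the fields filled in, `Network Type` offers every type, the same as standard mode.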
![](./images/agent2.png)
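After clicking Save, the command generation button appears under `Connection Configuration`. The node token is fixed once the node is saved: to change it, you must delete the node, add it again, and fill in all configuration from scratch. Do not leak the token, or you will have to go through all of that again.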
![](./images/agent3.png)
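After clicking Generate (as shown below), copy the command and run it on the local node server to bring the node under management. When installation finishes, verify the connection with the detection button at the bottom of the current page.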
![](./images/agent4.png)
![](./images/agent7.jpeg)
After detection succeeds, the remaining configuration pages work as described in the standard-mode instructions; nothing differs.
![](./images/agent5.png)
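Only this section differs for local nodes: choose the `No Port Mapping` type, so that you can later use `Manual Add Port` on the administrator `Port Management` page to tunnel ports to the controller's IPv4 address.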
![](./images/agent6.png)
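When manually adding a port mapping, choose `Controller Forwarding (Intranet Penetration)`. Fields that are not required can be left empty; controller ports are selected automatically for the mapping.
There is one limitation: the `Controller Panel` must be deployed by `Script Deployment` or compiled and deployed locally. Docker and Docker Compose deployments are not supported, and neither are non-`Linux` systems. The controller needs control over the firewall of the server it is deployed on, so it must also be deployed in a `root environment`.
The `Intranet Penetration Port` feature is `only for nodes managed in Agent mode` and forwards traffic through a WSS proxy. When deploying, make sure the reverse proxy passes the WS/WSS protocol as described in the instructions; do not forget this when you configure your own reverse proxy.
Also, when the controller is upgraded to a new version, upgrade the node side to match: click Edit Node, open the `Connection Configuration` page, regenerate the command, and run the installation again.
## Use LXD/INCUS to create containers with shared GPU devices
Before adding a node that will share GPU devices, make sure the matching GPU driver is already installed on the node and that the GPU's own commands run without errors, for example: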
```shell
nvidia-smi
```
Make sure the output is similar to:
```
root@a12-ThinkStation-P620:/root/sharefile# nvidia-smi
Sat May 16 20:23:07 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               Off | 00000000:61:00.0 Off |                  Off |
| 30%   42C    P0              83W / 300W |      0MiB / 49140MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```
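GPU resources can be shared into containers only after the driver is installed on the host.
Then follow the Incus/LXD tutorial in this documentation to install the local environment. Once it is installed, add the node through the controller in Agent mode and make sure the health check passes before continuing.
It is recommended to set the node to redemption-code-only mode, then create containers with GPU devices from the administrator's redemption-code page.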
![](./images/gpu1.jpeg)
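After creation succeeds, switch to the administrator's regular-user view to redeem the code, then return to the administrator view and open the Port Management page to tunnel the container's ports, so that you can connect and configure it directly through web SSH.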
![](./images/gpu2.jpeg)
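After the port is added, you can connect to and manage this new local container directly through web SSH.
Inside the container, install the same driver version as on the host. This time the driver must not be loaded into the kernel, so pass the `--no-kernel-module` parameter to the installer.
For how to find and install the driver, see https://www.spiritysdx.top/20240513/#%E5%AE%B9%E5%99%A8%E5%86%85%E5%AE%89%E8%A3%85gpu%E9%A9%B1%E5%8A%A8
After installation, running `nvidia-smi` inside the container should also return output, which proves that the GPU is shared.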
![](./images/gpu3.jpg)
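At this point you can stop the container and use it as a template: in the redemption-code batch container creation, choose copy mode, set this container as the source container, and clone new containers from it.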