17 | 17 | | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict] — Expected kernel module name -> {version, etc.}. Analyzer checks collected modules match.<br>- `regex_filter`: list[str] — List of regex patterns to filter which collected modules are checked (default: amd). | - | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) | |
18 | 18 | | MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float — Required free-memory ratio (0-1). Analysis fails if free/total < ratio.<br>- `memory_threshold`: str — Minimum free memory required (e.g. '30Gi', '1T'). Used when ratio is not sufficient. | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) | |
19 | 19 | | NetworkPlugin | ip addr show<br>curl<br>ethtool -S {interface}<br>ethtool {interface}<br>lldpcli show neighbor<br>lldpctl<br>ip neighbor show<br>ping<br>ip route show<br>ip rule show<br>wget | - | **Collection Args:**<br>- `url`: Optional[str] — Optional URL to probe for network connectivity (used with netprobe).<br>- `netprobe`: Optional[Literal['ping', 'wget', 'curl']] — Tool to use for network connectivity probe: ping, wget, or curl. | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - | |
20 | | -| NicPlugin | - | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) | |
| 20 | +| NicPlugin | niccli --listdev<br>niccli --list<br>niccli --list_devices<br>niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering<br>niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering<br>niccli -dev {device_num} nvm -getoption performance_profile<br>niccli --dev {device_num} nvm --getoption performance_profile<br>niccli -dev {device_num} nvm -getoption support_rdma -scope 0<br>niccli -dev {device_num} getqos<br>niccli --dev {device_num} nvm --getoption support_rdma<br>niccli --dev {device_num} qos --ets --show<br>niccli --version<br>nicctl show card<br>nicctl --version<br>nicctl show card flash partition --json<br>nicctl show card interrupts --json<br>nicctl show card logs --non-persistent<br>nicctl show card logs --boot-fault<br>nicctl show card logs --persistent<br>nicctl show card profile --json<br>nicctl show card time --json<br>nicctl show card statistics packet-buffer summary --json<br>nicctl show lif statistics --json<br>nicctl show lif internal queue-to-ud-pinning<br>nicctl show pipeline internal anomalies<br>nicctl show pipeline internal rsq-ring<br>nicctl show pipeline internal statistics memory<br>nicctl show port fsm<br>nicctl show port transceiver --json<br>nicctl show port statistics --json<br>nicctl show port internal mac<br>nicctl show qos headroom --json<br>nicctl show rdma queue --json<br>nicctl show rdma queue-pair --detail --json<br>nicctl show version firmware<br>nicctl show dcqcn<br>nicctl show environment<br>nicctl show lif<br>nicctl show pcie ats<br>nicctl show port<br>nicctl show qos<br>nicctl show rdma statistics<br>nicctl show version host-software<br>nicctl show dcqcn --card {card_id} --json<br>nicctl show card hardware-config --card {card_id} | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) | |
21 | 21 | | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name}<br>nvme list -o json | - | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - | |
22 | 22 | | OsPlugin | sh -c '( lsb_release -ds || (cat /etc/*release | grep PRETTY_NAME) || uname -om ) 2>/dev/null | head -n1'<br>cat /etc/*release | grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list] — Expected OS name/version string(s) to match (e.g. from lsb_release or /etc/os-release).<br>- `exact_match`: bool — If True, require exact match for exp_os; otherwise substring match. | - | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) | |
23 | 23 | | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]] — Map package name -> expected version (None = any version). Checked against installed packages.<br>- `regex_match`: bool — If True, match package versions with regex; otherwise exact or prefix match.<br>- `rocm_regex`: Optional[str] — Optional regex to identify ROCm package version (used when enable_rocm_regex is True).<br>- `enable_rocm_regex`: bool — If True, use rocm_regex (or default pattern) to extract ROCm version for checks. | - | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) | |
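
The NicPlugin row describes `require_qos_consistent_across_adapters` as requiring every adapter to report the same `prio_map`, `pfc_enabled`, and `tsa_map`. A minimal sketch of that kind of check follows. The input shape (a dict of adapter name to QoS fields) and the helper name `qos_mismatches` are assumptions made for illustration and are not taken from NicAnalyzer.

```python
from typing import Any, Dict, List

# Field names taken from the analyzer-arg description in the table above.
QOS_FIELDS = ("prio_map", "pfc_enabled", "tsa_map")


def qos_mismatches(adapters: Dict[str, Dict[str, Any]]) -> List[str]:
    """Compare every adapter against the first one (hypothetical helper).

    `adapters` maps an adapter name to its parsed QoS values; this shape is
    an assumption, not the layout of NicDataModel.
    """
    names = sorted(adapters)
    if not names:
        return []
    reference_name = names[0]
    reference = adapters[reference_name]
    problems: List[str] = []
    for name in names[1:]:
        for field in QOS_FIELDS:
            if adapters[name].get(field) != reference.get(field):
                problems.append(f"{name}: {field} differs from {reference_name}")
    return problems


example = {
    "dev1": {"prio_map": {0: 0, 1: 1}, "pfc_enabled": 8, "tsa_map": {0: "ets"}},
    "dev2": {"prio_map": {0: 0, 1: 1}, "pfc_enabled": 0, "tsa_map": {0: "ets"}},
}
print(qos_mismatches(example))
```

An empty result means the adapters agree on all three fields. Comparing against a single reference adapter keeps the report short when one device is misconfigured.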
@@ -439,10 +439,135 @@ Collect raw output from niccli (Broadcom) and nicctl (Pensando) commands. |
439 | 439 |
|
440 | 440 | **Link to code**: [nic_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/nic/nic_collector.py) |
441 | 441 |
|
| 442 | +### Class Variables |
| 443 | + |
| 444 | +- **CMD_NICCLI_VERSION**: `niccli --version` |
| 445 | +- **CMD_NICCLI_LIST**: `niccli --list` |
| 446 | +- **CMD_NICCLI_LIST_DEVICES**: `niccli --list_devices` |
| 447 | +- **CMD_NICCLI_LIST_DEVICES_LEGACY**: `niccli --listdev` |
| 448 | +- **CMD_NICCLI_DISCOVERY_LEGACY**: `['niccli --listdev', 'niccli --list']` |
| 449 | +- **CMD_NICCLI_DISCOVERY_NEW**: `['niccli --list_devices', 'niccli --list']` |
| 450 | +- **CMD_NICCLI_DISCOVERY**: `['niccli --listdev', 'niccli --list']` |
| 451 | +- **CMD_NICCLI_DISCOVERY_ALL**: `frozenset({'niccli --listdev', 'niccli --list_devices', 'niccli --list'})` |
| 452 | +- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0` |
| 453 | +- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption performance_profile` |
| 454 | +- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering` |
| 455 | +- **CMD_NICCLI_QOS_TEMPLATE_LEGACY**: `niccli -dev {device_num} getqos` |
| 456 | +- **CMD_NICCLI_PER_DEVICE_LEGACY**: `[ |
| 457 | + niccli -dev {device_num} nvm -getoption support_rdma -scope 0, |
| 458 | + niccli -dev {device_num} nvm -getoption performance_profile, |
| 459 | + niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering, |
| 460 | + niccli -dev {device_num} getqos |
| 461 | +]` |
| 462 | +- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption support_rdma` |
| 463 | +- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption performance_profile` |
| 464 | +- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering` |
| 465 | +- **CMD_NICCLI_QOS_TEMPLATE_NEW**: `niccli --dev {device_num} qos --ets --show` |
| 466 | +- **CMD_NICCLI_PER_DEVICE_NEW**: `[ |
| 467 | + niccli --dev {device_num} nvm --getoption support_rdma, |
| 468 | + niccli --dev {device_num} nvm --getoption performance_profile, |
| 469 | + niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering, |
| 470 | + niccli --dev {device_num} qos --ets --show |
| 471 | +]` |
| 472 | +- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0` |
| 473 | +- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE**: `niccli -dev {device_num} nvm -getoption performance_profile` |
| 474 | +- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering` |
| 475 | +- **CMD_NICCLI_PER_DEVICE**: `[ |
| 476 | + niccli -dev {device_num} nvm -getoption support_rdma -scope 0, |
| 477 | + niccli -dev {device_num} nvm -getoption performance_profile, |
| 478 | + niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering, |
| 479 | + niccli -dev {device_num} getqos |
| 480 | +]` |
| 481 | +- **CMD_NICCTL_CARD_TEXT**: `nicctl show card` |
| 482 | +- **CMD_NICCTL_GLOBAL**: `[ |
| 483 | + nicctl --version, |
| 484 | + nicctl show card flash partition --json, |
| 485 | + nicctl show card interrupts --json, |
| 486 | + nicctl show card logs --non-persistent, |
| 487 | + nicctl show card logs --boot-fault, |
| 488 | + nicctl show card logs --persistent, |
| 489 | + nicctl show card profile --json, |
| 490 | + nicctl show card time --json, |
| 491 | + nicctl show card statistics packet-buffer summary --json, |
| 492 | + nicctl show lif statistics --json, |
| 493 | + nicctl show lif internal queue-to-ud-pinning, |
| 494 | + nicctl show pipeline internal anomalies, |
| 495 | + nicctl show pipeline internal rsq-ring, |
| 496 | + nicctl show pipeline internal statistics memory, |
| 497 | + nicctl show port fsm, |
| 498 | + nicctl show port transceiver --json, |
| 499 | + nicctl show port statistics --json, |
| 500 | + nicctl show port internal mac, |
| 501 | + nicctl show qos headroom --json, |
| 502 | + nicctl show rdma queue --json, |
| 503 | + nicctl show rdma queue-pair --detail --json, |
| 504 | + nicctl show version firmware |
| 505 | +]` |
| 506 | +- **CMD_NICCTL_PER_CARD**: `['nicctl show dcqcn --card {card_id} --json', 'nicctl show card hardware-config --card {card_id}']` |
| 507 | +- **CMD_NICCTL_LEGACY_TEXT**: `[ |
| 508 | + nicctl show card, |
| 509 | + nicctl show dcqcn, |
| 510 | + nicctl show environment, |
| 511 | + nicctl show lif, |
| 512 | + nicctl show pcie ats, |
| 513 | + nicctl show port, |
| 514 | + nicctl show qos, |
| 515 | + nicctl show rdma statistics, |
| 516 | + nicctl show version host-software |
| 517 | +]` |
| 518 | + |
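
The per-device templates above carry a `{device_num}` placeholder. As a rough illustration, assuming the placeholder is filled with plain `str.format` substitution (the collector's actual mechanism is not shown in this diff), the legacy and new template lists could be expanded like this. The helper `render_per_device` is hypothetical.

```python
from typing import Iterable, List

# Templates copied from the class variables listed above.
CMD_NICCLI_PER_DEVICE_LEGACY = [
    "niccli -dev {device_num} nvm -getoption support_rdma -scope 0",
    "niccli -dev {device_num} nvm -getoption performance_profile",
    "niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering",
    "niccli -dev {device_num} getqos",
]
CMD_NICCLI_PER_DEVICE_NEW = [
    "niccli --dev {device_num} nvm --getoption support_rdma",
    "niccli --dev {device_num} nvm --getoption performance_profile",
    "niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering",
    "niccli --dev {device_num} qos --ets --show",
]


def render_per_device(templates: List[str], device_nums: Iterable[int]) -> List[str]:
    """Expand every template once per device number (hypothetical helper)."""
    return [t.format(device_num=n) for n in device_nums for t in templates]


for cmd in render_per_device(CMD_NICCLI_PER_DEVICE_NEW, [1]):
    print(cmd)
```

Note that the two syntaxes differ in more than the dash count: the legacy RDMA query passes `-scope 0`, and the QoS query changes from `getqos` to `qos --ets --show`.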
442 | 519 | ### Provides Data |
443 | 520 |
|
444 | 521 | NicDataModel |
445 | 522 |
|
| 523 | +### Commands |
| 524 | + |
| 525 | +- niccli --listdev |
| 526 | +- niccli --list |
| 527 | +- niccli --list_devices |
| 528 | +- niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering |
| 529 | +- niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering |
| 530 | +- niccli -dev {device_num} nvm -getoption performance_profile |
| 531 | +- niccli --dev {device_num} nvm --getoption performance_profile |
| 532 | +- niccli -dev {device_num} nvm -getoption support_rdma -scope 0 |
| 533 | +- niccli -dev {device_num} getqos |
| 534 | +- niccli --dev {device_num} nvm --getoption support_rdma |
| 535 | +- niccli --dev {device_num} qos --ets --show |
| 536 | +- niccli --version |
| 537 | +- nicctl show card |
| 538 | +- nicctl --version |
| 539 | +- nicctl show card flash partition --json |
| 540 | +- nicctl show card interrupts --json |
| 541 | +- nicctl show card logs --non-persistent |
| 542 | +- nicctl show card logs --boot-fault |
| 543 | +- nicctl show card logs --persistent |
| 544 | +- nicctl show card profile --json |
| 545 | +- nicctl show card time --json |
| 546 | +- nicctl show card statistics packet-buffer summary --json |
| 547 | +- nicctl show lif statistics --json |
| 548 | +- nicctl show lif internal queue-to-ud-pinning |
| 549 | +- nicctl show pipeline internal anomalies |
| 550 | +- nicctl show pipeline internal rsq-ring |
| 551 | +- nicctl show pipeline internal statistics memory |
| 552 | +- nicctl show port fsm |
| 553 | +- nicctl show port transceiver --json |
| 554 | +- nicctl show port statistics --json |
| 555 | +- nicctl show port internal mac |
| 556 | +- nicctl show qos headroom --json |
| 557 | +- nicctl show rdma queue --json |
| 558 | +- nicctl show rdma queue-pair --detail --json |
| 559 | +- nicctl show version firmware |
| 560 | +- nicctl show dcqcn |
| 561 | +- nicctl show environment |
| 562 | +- nicctl show lif |
| 563 | +- nicctl show pcie ats |
| 564 | +- nicctl show port |
| 565 | +- nicctl show qos |
| 566 | +- nicctl show rdma statistics |
| 567 | +- nicctl show version host-software |
| 568 | +- nicctl show dcqcn --card {card_id} --json |
| 569 | +- nicctl show card hardware-config --card {card_id} |
| 570 | + |
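
The list above mixes commands that run once per host with commands that need a `{device_num}` or `{card_id}` value. One way to tell them apart, shown here only as a sketch, is to read the placeholder names with the standard library's `string.Formatter`. The function `placeholders` is a hypothetical helper, not part of NicCollector.

```python
from string import Formatter
from typing import Set


def placeholders(command: str) -> Set[str]:
    """Return the format-field names found in a command template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(command)
        if field_name  # skip literal-only segments (None) and empty names
    }


# A few entries copied from the command list above.
commands = [
    "niccli --version",
    "niccli --dev {device_num} qos --ets --show",
    "nicctl show card",
    "nicctl show dcqcn --card {card_id} --json",
]

for cmd in commands:
    needs = placeholders(cmd)
    scope = ", ".join(sorted(needs)) if needs else "global"
    print(f"{scope:12} {cmd}")
```

Commands with no placeholder can be issued as written, while the others must be expanded once per discovered device or card.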
446 | 571 | ## Collector Class NvmeCollector |
447 | 572 |
|
448 | 573 | ### Description |