Commit 2456858

Merge pull request #165 from amd/alex_doc_fix
Documentation update
2 parents 5697b1a + afad8da commit 2456858

3 files changed

Lines changed: 299 additions & 187 deletions

docs/PLUGIN_DOC.md

Lines changed: 126 additions & 1 deletion
@@ -17,7 +17,7 @@
 | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict] — Expected kernel module name -> {version, etc.}. Analyzer checks collected modules match.<br>- `regex_filter`: list[str] — List of regex patterns to filter which collected modules are checked (default: amd). | - | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
 | MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float — Required free-memory ratio (0-1). Analysis fails if free/total < ratio.<br>- `memory_threshold`: str — Minimum free memory required (e.g. '30Gi', '1T'). Used when ratio is not sufficient. | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
 | NetworkPlugin | ip addr show<br>curl<br>ethtool -S {interface}<br>ethtool {interface}<br>lldpcli show neighbor<br>lldpctl<br>ip neighbor show<br>ping<br>ip route show<br>ip rule show<br>wget | - | **Collection Args:**<br>- `url`: Optional[str] — Optional URL to probe for network connectivity (used with netprobe).<br>- `netprobe`: Optional[Literal['ping', 'wget', 'curl']] — Tool to use for network connectivity probe: ping, wget, or curl. | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
-| NicPlugin | - | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) |
+| NicPlugin | niccli --listdev<br>niccli --list<br>niccli --list_devices<br>niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering<br>niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering<br>niccli -dev {device_num} nvm -getoption performance_profile<br>niccli --dev {device_num} nvm --getoption performance_profile<br>niccli -dev {device_num} nvm -getoption support_rdma -scope 0<br>niccli -dev {device_num} getqos<br>niccli --dev {device_num} nvm --getoption support_rdma<br>niccli --dev {device_num} qos --ets --show<br>niccli --version<br>nicctl show card<br>nicctl --version<br>nicctl show card flash partition --json<br>nicctl show card interrupts --json<br>nicctl show card logs --non-persistent<br>nicctl show card logs --boot-fault<br>nicctl show card logs --persistent<br>nicctl show card profile --json<br>nicctl show card time --json<br>nicctl show card statistics packet-buffer summary --json<br>nicctl show lif statistics --json<br>nicctl show lif internal queue-to-ud-pinning<br>nicctl show pipeline internal anomalies<br>nicctl show pipeline internal rsq-ring<br>nicctl show pipeline internal statistics memory<br>nicctl show port fsm<br>nicctl show port transceiver --json<br>nicctl show port statistics --json<br>nicctl show port internal mac<br>nicctl show qos headroom --json<br>nicctl show rdma queue --json<br>nicctl show rdma queue-pair --detail --json<br>nicctl show version firmware<br>nicctl show dcqcn<br>nicctl show environment<br>nicctl show lif<br>nicctl show pcie ats<br>nicctl show port<br>nicctl show qos<br>nicctl show rdma statistics<br>nicctl show version host-software<br>nicctl show dcqcn --card {card_id} --json<br>nicctl show card hardware-config --card {card_id} | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name}<br>nvme list -o json | - | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds &#124;&#124; (cat /etc/*release &#124; grep PRETTY_NAME) &#124;&#124; uname -om ) 2>/dev/null &#124; head -n1'<br>cat /etc/*release &#124; grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list] — Expected OS name/version string(s) to match (e.g. from lsb_release or /etc/os-release).<br>- `exact_match`: bool — If True, require exact match for exp_os; otherwise substring match. | - | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
 | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]] — Map package name -> expected version (None = any version). Checked against installed packages.<br>- `regex_match`: bool — If True, match package versions with regex; otherwise exact or prefix match.<br>- `rocm_regex`: Optional[str] — Optional regex to identify ROCm package version (used when enable_rocm_regex is True).<br>- `enable_rocm_regex`: bool — If True, use rocm_regex (or default pattern) to extract ROCm version for checks. | - | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
@@ -439,10 +439,135 @@ Collect raw output from niccli (Broadcom) and nicctl (Pensando) commands.
 
 **Link to code**: [nic_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/nic/nic_collector.py)
 
+### Class Variables
+
+- **CMD_NICCLI_VERSION**: `niccli --version`
+- **CMD_NICCLI_LIST**: `niccli --list`
+- **CMD_NICCLI_LIST_DEVICES**: `niccli --list_devices`
+- **CMD_NICCLI_LIST_DEVICES_LEGACY**: `niccli --listdev`
+- **CMD_NICCLI_DISCOVERY_LEGACY**: `['niccli --listdev', 'niccli --list']`
+- **CMD_NICCLI_DISCOVERY_NEW**: `['niccli --list_devices', 'niccli --list']`
+- **CMD_NICCLI_DISCOVERY**: `['niccli --listdev', 'niccli --list']`
+- **CMD_NICCLI_DISCOVERY_ALL**: `frozenset({'niccli --listdev', 'niccli --list_devices', 'niccli --list'})`
+- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0`
+- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption performance_profile`
+- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering`
+- **CMD_NICCLI_QOS_TEMPLATE_LEGACY**: `niccli -dev {device_num} getqos`
+- **CMD_NICCLI_PER_DEVICE_LEGACY**: `[
+niccli -dev {device_num} nvm -getoption support_rdma -scope 0,
+niccli -dev {device_num} nvm -getoption performance_profile,
+niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering,
+niccli -dev {device_num} getqos
+]`
+- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption support_rdma`
+- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption performance_profile`
+- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering`
+- **CMD_NICCLI_QOS_TEMPLATE_NEW**: `niccli --dev {device_num} qos --ets --show`
+- **CMD_NICCLI_PER_DEVICE_NEW**: `[
+niccli --dev {device_num} nvm --getoption support_rdma,
+niccli --dev {device_num} nvm --getoption performance_profile,
+niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering,
+niccli --dev {device_num} qos --ets --show
+]`
+- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0`
+- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE**: `niccli -dev {device_num} nvm -getoption performance_profile`
+- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering`
+- **CMD_NICCLI_PER_DEVICE**: `[
+niccli -dev {device_num} nvm -getoption support_rdma -scope 0,
+niccli -dev {device_num} nvm -getoption performance_profile,
+niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering,
+niccli -dev {device_num} getqos
+]`
+- **CMD_NICCTL_CARD_TEXT**: `nicctl show card`
+- **CMD_NICCTL_GLOBAL**: `[
+nicctl --version,
+nicctl show card flash partition --json,
+nicctl show card interrupts --json,
+nicctl show card logs --non-persistent,
+nicctl show card logs --boot-fault,
+nicctl show card logs --persistent,
+nicctl show card profile --json,
+nicctl show card time --json,
+nicctl show card statistics packet-buffer summary --json,
+nicctl show lif statistics --json,
+nicctl show lif internal queue-to-ud-pinning,
+nicctl show pipeline internal anomalies,
+nicctl show pipeline internal rsq-ring,
+nicctl show pipeline internal statistics memory,
+nicctl show port fsm,
+nicctl show port transceiver --json,
+nicctl show port statistics --json,
+nicctl show port internal mac,
+nicctl show qos headroom --json,
+nicctl show rdma queue --json,
+nicctl show rdma queue-pair --detail --json,
+nicctl show version firmware
+]`
+- **CMD_NICCTL_PER_CARD**: `['nicctl show dcqcn --card {card_id} --json', 'nicctl show card hardware-config --card {card_id}']`
+- **CMD_NICCTL_LEGACY_TEXT**: `[
+nicctl show card,
+nicctl show dcqcn,
+nicctl show environment,
+nicctl show lif,
+nicctl show pcie ats,
+nicctl show port,
+nicctl show qos,
+nicctl show rdma statistics,
+nicctl show version host-software
+]`
+
 ### Provides Data
 
 NicDataModel
 
+### Commands
+
+- niccli --listdev
+- niccli --list
+- niccli --list_devices
+- niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering
+- niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering
+- niccli -dev {device_num} nvm -getoption performance_profile
+- niccli --dev {device_num} nvm --getoption performance_profile
+- niccli -dev {device_num} nvm -getoption support_rdma -scope 0
+- niccli -dev {device_num} getqos
+- niccli --dev {device_num} nvm --getoption support_rdma
+- niccli --dev {device_num} qos --ets --show
+- niccli --version
+- nicctl show card
+- nicctl --version
+- nicctl show card flash partition --json
+- nicctl show card interrupts --json
+- nicctl show card logs --non-persistent
+- nicctl show card logs --boot-fault
+- nicctl show card logs --persistent
+- nicctl show card profile --json
+- nicctl show card time --json
+- nicctl show card statistics packet-buffer summary --json
+- nicctl show lif statistics --json
+- nicctl show lif internal queue-to-ud-pinning
+- nicctl show pipeline internal anomalies
+- nicctl show pipeline internal rsq-ring
+- nicctl show pipeline internal statistics memory
+- nicctl show port fsm
+- nicctl show port transceiver --json
+- nicctl show port statistics --json
+- nicctl show port internal mac
+- nicctl show qos headroom --json
+- nicctl show rdma queue --json
+- nicctl show rdma queue-pair --detail --json
+- nicctl show version firmware
+- nicctl show dcqcn
+- nicctl show environment
+- nicctl show lif
+- nicctl show pcie ats
+- nicctl show port
+- nicctl show qos
+- nicctl show rdma statistics
+- nicctl show version host-software
+- nicctl show dcqcn --card {card_id} --json
+- nicctl show card hardware-config --card {card_id}
+
 ## Collector Class NvmeCollector
 
 ### Description
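
The per-device entries documented above are templates with a `{device_num}` placeholder. The diff does not show how `NicCollector` fills them in, so the following is only a minimal sketch that assumes plain `str.format` substitution; the helper name `expand_per_device` and the device numbers are hypothetical, and the template strings are copied from the `CMD_NICCLI_PER_DEVICE_NEW` listing.

```python
from typing import Iterable, List

# Templates copied from the CMD_NICCLI_PER_DEVICE_NEW entry in the diff.
PER_DEVICE_NEW: List[str] = [
    "niccli --dev {device_num} nvm --getoption support_rdma",
    "niccli --dev {device_num} nvm --getoption performance_profile",
    "niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering",
    "niccli --dev {device_num} qos --ets --show",
]


def expand_per_device(templates: Iterable[str], device_nums: Iterable[int]) -> List[str]:
    """Hypothetical helper: fill {device_num} for each device, grouped by device."""
    template_list = list(templates)  # allow a one-shot iterable to be reused per device
    return [t.format(device_num=n) for n in device_nums for t in template_list]


# Device numbers 1 and 2 are illustrative, not taken from the source.
commands = expand_per_device(PER_DEVICE_NEW, [1, 2])
for cmd in commands:
    print(cmd)
```

Grouping by device keeps all queries for one adapter adjacent in the output, which is one reasonable ordering; the collector itself may order them differently.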

docs/generate_plugin_doc_bundle.py

Lines changed: 3 additions & 19 deletions
@@ -439,17 +439,9 @@ def generate_plugin_table_rows(plugins: List[type]) -> List[List[str]]:
         an = get_attr(p, "ANALYZER", None)
         args = get_attr(p, "ANALYZER_ARGS", None)
         collector_args_cls = get_attr(p, "COLLECTOR_ARGS", None)
-        cmds = []
+        cmds: List[str] = []
         if inspect.isclass(col):
-            cmds += extract_cmds_from_classvars(col)
-            seen = set()
-            uniq = []
-            for c in cmds:
-                key = " ".join(c.split())
-                if key not in seen:
-                    seen.add(key)
-                    uniq.append(c)
-            cmds = uniq
+            cmds = extract_cmds_from_classvars(col)
 
         # Extract regexes and args from analyzer
         regex_and_args = []
@@ -505,16 +497,8 @@ def render_collector_section(col: type, link_base: str, rel_root: Optional[str])
     dm = get_attr(col, "DATA_MODEL", None)
     s += md_header("Provides Data", 3) + (f"{dm.__name__}\n\n" if inspect.isclass(dm) else "-\n\n")
 
-    cmds = []
-    cmds += extract_cmds_from_classvars(col)
+    cmds = extract_cmds_from_classvars(col)
     if cmds:
-        seen, uniq = set(), []
-        for c in cmds:
-            key = " ".join(c.split())
-            if key not in seen:
-                seen.add(key)
-                uniq.append(c)
-        cmds = uniq
         s += md_header("Commands", 3) + md_list(cmds)
 
     return s
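
Both hunks in this file delete the same inline loop: an order-preserving de-duplication that treats two commands as equal when they differ only in whitespace. The diff does not show where that behaviour lives after the change (it may now sit inside `extract_cmds_from_classvars`, but that is an assumption). As a rough sketch, the removed logic could be expressed as a standalone helper; the name `dedupe_commands` is hypothetical.

```python
from typing import Iterable, List, Set


def dedupe_commands(cmds: Iterable[str]) -> List[str]:
    """Drop repeated commands, keeping the first occurrence and the original order.

    Two commands count as duplicates when they match after collapsing runs of
    whitespace, mirroring the `" ".join(c.split())` key used in the removed code.
    """
    seen: Set[str] = set()
    uniq: List[str] = []
    for c in cmds:
        key = " ".join(c.split())  # normalize internal and surrounding whitespace
        if key not in seen:
            seen.add(key)
            uniq.append(c)  # keep the original spelling of the first occurrence
    return uniq


# Illustrative input: the second entry differs from the first only in spacing.
result = dedupe_commands(["nicctl show card", "nicctl  show   card", "nicctl --version"])
print(result)
```

A set tracks the normalized keys while a list preserves insertion order, so the helper stays linear in the number of commands.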
