monitoring items
monitoring item list of the host
| classification | key | description | data type | unit |
| --- | --- | --- | --- | --- |
| | host.gpuCount | total number of GPUs | unsigned int | |
| | host.gpuUsed | number of GPUs used | unsigned int | |
| GPU | host.gpu.0.name | product name of the first GPU device; the number in the key is the device index, starting from 0 (the same applies below) | string | |
| GPU | host.gpu.0.busId | PCI identifier, as the tuple domain:bus:device.function | string | |
| GPU | host.gpu.0.memTotal | total physical device memory | unsigned long long | B |
| GPU | host.gpu.0.memFree | unallocated device memory | unsigned long long | B |
| GPU | host.gpu.0.memUsed | sum of reserved and allocated device memory | unsigned long long | B |
| GPU | host.gpu.0.gpuUtilization | GPU utilization: percent of time over the past sample period during which one or more kernels were executing on the GPU | unsigned int | % |
| GPU | host.gpu.0.memUtilization | memory utilization: percent of time over the past sample period during which global (device) memory was being read or written | unsigned int | % |
| GPU | host.gpu.0.powerUsage | power usage, in milliwatts | unsigned int | mW |
| GPU | host.gpu.0.powerCap | power management limit, in milliwatts | unsigned int | mW |
| GPU | host.gpu.0.temperature | current temperature reading of the device | unsigned int | °C |
| virtual machine | host.vmCount | total number of virtual machines | unsigned int | |
| virtual machine | host.vmRunning | number of running virtual machines | unsigned int | |
| CPU | host.cpuUsage | CPU usage | float | % |
| memory | host.memTotal | total memory | unsigned long long | KB |
| memory | host.memFree | free memory | unsigned long long | KB |
| memory | host.memUsage | memory usage | float | % |
| network traffic | host.rxFlow | total traffic received | long long | B |
| network traffic | host.txFlow | total traffic sent | long long | B |
| data disk | host.diskTotal | data disk capacity | unsigned long long | KB |
| data disk | host.diskFree | data disk free capacity | unsigned long long | KB |
| data disk | host.diskUsage | data disk usage | float | % |
| data disk | host.diskMountStatus | mount status of the data disk, "lost" or "normal" | string | |
| load average | host.loadAverage.1 | average load over the past 1 minute | float | |
| load average | host.loadAverage.5 | average load over the past 5 minutes | float | |
| load average | host.loadAverage.15 | average load over the past 15 minutes | float | |
| | host.dbcVersion | DBC version number | string | |
monitoring item list of the virtual machine
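| classification | key | description | data type | unit |
| --- | --- | --- | --- | --- |
| virtual machine monitoring / basic information | dom.state | running state of the virtual machine, such as "running" | string | |
| virtual machine monitoring / basic information | dom.maxMem | the maximum memory in KBytes allowed | unsigned int | KB |
| virtual machine monitoring / basic information | dom.memory | the memory in KBytes used by the domain | unsigned int | KB |
| virtual machine monitoring / basic information | dom.nrVirtCpu | the number of virtual CPUs for the domain | unsigned int | |
| virtual machine monitoring / basic information | dom.cpuTime | the CPU time used in nanoseconds | unsigned long long | ns |
| virtual machine monitoring / basic information | dom.cpuUsage | average CPU usage | float | % |
| virtual machine monitoring / memory information | memory.total | total memory | unsigned long long | KB |
| virtual machine monitoring / memory information | memory.unused | real-time unused memory | unsigned long long | KB |
| virtual machine monitoring / memory information | memory.available | real-time available memory | unsigned long long | KB |
| virtual machine monitoring / memory information | memory.usage | real-time memory usage | float | % |
| virtual machine monitoring / disk information | disk.0.name | name of the first disk; the number in the key is the device index, starting from 0 (the same applies below) | string | |
| virtual machine monitoring / disk information | disk.0.capacity | logical size in bytes of the image (how much storage the guest will see) | unsigned long long | KB |
| virtual machine monitoring / disk information | disk.0.allocation | host storage in bytes occupied by the image (such as highest allocated extent if there are no holes, similar to 'du') | unsigned long long | KB |
| virtual machine monitoring / disk information | disk.0.physical | host physical size in bytes of the image container (last offset, similar to 'ls') | unsigned long long | KB |
| virtual machine monitoring / disk information | disk.0.rd_req | number of read requests | long long | |
| virtual machine monitoring / disk information | disk.0.rd_bytes | number of read bytes | long long | B |
| virtual machine monitoring / disk information | disk.0.wr_req | number of write requests | long long | |
| virtual machine monitoring / disk information | disk.0.wr_bytes | number of written bytes | long long | B |
| virtual machine monitoring / disk information | disk.0.errs | In Xen this returns the mysterious 'oo_req' | long long | |
| virtual machine monitoring / disk information | disk.0.rd_speed | average disk read speed | float | B/s |
| virtual machine monitoring / disk information | disk.0.wr_speed | average disk write speed | float | B/s |
| virtual machine monitoring / network information | net.0.name | name of the first network card; the number in the key is the device index, starting from 0 (the same applies below) | string | |
| virtual machine monitoring / network information | net.0.rx_bytes | bytes received | long long | B |
| virtual machine monitoring / network information | net.0.rx_packets | packets received | long long | |
| virtual machine monitoring / network information | net.0.rx_errs | receive errors | long long | |
| virtual machine monitoring / network information | net.0.rx_drop | received packets dropped | long long | |
| virtual machine monitoring / network information | net.0.tx_bytes | bytes sent | long long | B |
| virtual machine monitoring / network information | net.0.tx_packets | packets sent | long long | |
| virtual machine monitoring / network information | net.0.tx_errs | transmit errors | long long | |
| virtual machine monitoring / network information | net.0.tx_drop | transmitted packets dropped | long long | |
| virtual machine monitoring / network information | net.0.rx_speed | average receive speed | float | B/s |
| virtual machine monitoring / network information | net.0.tx_speed | average send speed | float | B/s |
| virtual machine monitoring / GPU | gpu.graphicsDriverVersion | version of the system's graphics driver | string | |
| virtual machine monitoring / GPU | gpu.nvmlVersion | version of the NVML library | string | |
| virtual machine monitoring / GPU | gpu.cudaVersion | version of the CUDA driver | string | |
| virtual machine monitoring / GPU | gpu.0.name | product name of the first GPU device; the number in the key is the device index, starting from 0 (the same applies below) | string | |
| virtual machine monitoring / GPU | gpu.0.busId | PCI identifier, as the tuple domain:bus:device.function | string | |
| virtual machine monitoring / GPU | gpu.0.memTotal | total physical device memory | unsigned long long | B |
| virtual machine monitoring / GPU | gpu.0.memFree | unallocated device memory | unsigned long long | B |
| virtual machine monitoring / GPU | gpu.0.memUsed | sum of reserved and allocated device memory | unsigned long long | B |
| virtual machine monitoring / GPU | gpu.0.gpuUtilization | GPU utilization: percent of time over the past sample period during which one or more kernels were executing on the GPU | unsigned int | % |
| virtual machine monitoring / GPU | gpu.0.memUtilization | memory utilization: percent of time over the past sample period during which global (device) memory was being read or written | unsigned int | % |
| virtual machine monitoring / GPU | gpu.0.powerUsage | power usage, in milliwatts | unsigned int | mW |
| virtual machine monitoring / GPU | gpu.0.powerCap | power management limit, in milliwatts | unsigned int | mW |
| virtual machine monitoring / GPU | gpu.0.temperature | current temperature reading of the device | unsigned int | °C |
| protocol | version | version number of dbc | string | |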
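The indexed keys in the tables above (`host.gpu.0.name`, `disk.0.rd_bytes`, `net.0.rx_bytes`, and so on) are flat strings, so a consumer usually needs to regroup them per device. The following is a minimal sketch of one way to do that. It assumes the monitoring items arrive as a flat key/value mapping; the function name `group_indexed_items` and the sample values are hypothetical and not part of dbc.

```python
import re
from collections import defaultdict

# Matches indexed keys such as "gpu.0.name", "disk.1.rd_bytes" or
# "host.gpu.0.memTotal". Keys like "host.loadAverage.1" do not match,
# because nothing follows the number.
INDEXED_KEY = re.compile(
    r"^(?P<prefix>(?:host\.)?[A-Za-z]+)\.(?P<index>\d+)\.(?P<field>\w+)$"
)


def group_indexed_items(items):
    """Split flat monitoring items into per-device dicts and plain scalars.

    `items` is assumed to be a flat mapping of key -> value (an assumption
    about the transport format, not something dbc guarantees).
    """
    devices = defaultdict(lambda: defaultdict(dict))
    scalars = {}
    for key, value in items.items():
        match = INDEXED_KEY.match(key)
        if match:
            index = int(match["index"])  # device index, starting from 0
            devices[match["prefix"]][index][match["field"]] = value
        else:
            scalars[key] = value
    # Convert the nested defaultdicts to plain dicts before returning.
    return {prefix: dict(by_index) for prefix, by_index in devices.items()}, scalars


# Hypothetical sample values, for illustration only.
sample = {
    "dom.state": "running",
    "disk.0.name": "vda",
    "disk.0.rd_bytes": 1024,
    "gpu.0.name": "GPU A",
    "gpu.1.name": "GPU B",
}
devices, scalars = group_indexed_items(sample)
print(devices["gpu"][1]["name"])
print(scalars)
```

Keeping non-indexed keys in a separate mapping avoids treating values such as `host.loadAverage.1` as device entries.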
GPU monitoring must read
Because graphics card (GPU) devices are isolated on the host, dbc cannot directly obtain detailed graphics card information there. We therefore built on the qemu guest agent, integrated the functions of the NVIDIA Management Library (NVML), and implemented an independent service, the dbc guest agent. dbc obtains detailed graphics card information by communicating with this agent inside the virtual machine.
To monitor graphics card information in a custom image, install the dbc guest agent service inside the virtual machine.
- Ubuntu virtual machine installation script: http://112.192.16.27:9000/dbc_guest_agent/install.sh
- Windows 64-bit virtual machine installer: http://112.192.16.27:9000/dbc_guest_agent/qemu-ga-x86_64.msi
Note!
Graphics card monitoring currently supports NVIDIA graphics cards only.
Graphics card monitoring only shows graphics card devices that are used by the virtual machine.
Calculation of usage and speed
- CPU usage = (cpuTime2 - cpuTime1) / (realTime2 - realTime1) / Number of CPUs
- memory usage = (total - unused) / total
- disk average read speed = (rd_bytes2 - rd_bytes1) / (realTime2 - realTime1)
- average reception speed = (rx_bytes2 - rx_bytes1) / (realTime2 - realTime1)
Note!
When the interval between two data collections is long, for example one minute, the disk read/write speed and network transmission speed represent only the average over that interval, not the real-time speed.
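The formulas above can be sketched in code as follows. This is an illustration under stated assumptions, not dbc's implementation: it assumes `realTime` is a wall-clock timestamp in nanoseconds (the same unit as `dom.cpuTime`), that the results are ratios to be multiplied by 100 for the `%` columns, and that the function names are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_usage(cpu_time1, cpu_time2, real_time1, real_time2, nr_virt_cpu):
    """CPU usage ratio between two collections.

    All times are assumed to be in nanoseconds; nr_virt_cpu is dom.nrVirtCpu.
    """
    elapsed = real_time2 - real_time1
    if elapsed <= 0 or nr_virt_cpu <= 0:
        raise ValueError("need a positive interval and at least one CPU")
    return (cpu_time2 - cpu_time1) / elapsed / nr_virt_cpu


def memory_usage(total, unused):
    """Memory usage ratio from memory.total and memory.unused."""
    if total <= 0:
        raise ValueError("total memory must be positive")
    return (total - unused) / total


def average_speed(bytes1, bytes2, real_time1, real_time2):
    """Average speed in B/s from a cumulative byte counter.

    Works for rd_bytes and rx_bytes; applying it to wr_bytes and tx_bytes
    is an assumption made by analogy.
    """
    elapsed_seconds = (real_time2 - real_time1) / NS_PER_SECOND
    if elapsed_seconds <= 0:
        raise ValueError("need a positive interval")
    return (bytes2 - bytes1) / elapsed_seconds


# Two hypothetical collections taken 60 seconds apart on a 2-vCPU machine.
t1, t2 = 0, 60 * NS_PER_SECOND
print(cpu_usage(0, 30 * NS_PER_SECOND, t1, t2, nr_virt_cpu=2) * 100)  # percent
print(memory_usage(total=8_000_000, unused=2_000_000) * 100)          # percent
print(average_speed(0, 6_000_000, t1, t2))                            # B/s
```

Because each value is computed from the difference between two cumulative counters, a burst inside the interval is averaged over the whole interval, which is why a long collection interval hides short peaks.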