monitoring items

monitoring item list of the host

| classification | key | description | data type | unit |
|---|---|---|---|---|
| | host.gpuCount | total number of GPUs | unsigned int | |
| | host.gpuUsed | number of GPUs in use | unsigned int | |
| GPU | host.gpu.0.name | Product name of the first GPU device. The number in the key is the device index, starting from 0; the same applies below | string | |
| GPU | host.gpu.0.busId | PCI identifier as the tuple domain:bus:device.function | string | |
| GPU | host.gpu.0.memTotal | Total physical device memory | unsigned long long | B |
| GPU | host.gpu.0.memFree | Unallocated device memory | unsigned long long | B |
| GPU | host.gpu.0.memUsed | Sum of reserved and allocated device memory | unsigned long long | B |
| GPU | host.gpu.0.gpuUtilization | GPU utilization: percent of time over the past sample period during which one or more kernels were executing on the GPU | unsigned int | |
| GPU | host.gpu.0.memUtilization | Memory utilization: percent of time over the past sample period during which global (device) memory was being read or written | unsigned int | |
| GPU | host.gpu.0.powerUsage | Power usage in milliwatts | unsigned int | milliwatt |
| GPU | host.gpu.0.powerCap | Power management limit in milliwatts | unsigned int | milliwatt |
| GPU | host.gpu.0.temperature | Current temperature reading of the device | unsigned int | degrees C |
| virtual machine | host.vmCount | total number of virtual machines | unsigned int | |
| virtual machine | host.vmRunning | number of running virtual machines | unsigned int | |
| CPU | host.cpuUsage | CPU usage | float | % |
| memory | host.memTotal | total memory | unsigned long long | KB |
| memory | host.memFree | free memory | unsigned long long | KB |
| memory | host.memUsage | memory usage | float | % |
| network traffic | host.rxFlow | total traffic received | long long | B |
| network traffic | host.txFlow | total traffic sent | long long | B |
| data disk | host.diskTotal | data disk capacity | unsigned long long | KB |
| data disk | host.diskFree | data disk free capacity | unsigned long long | KB |
| data disk | host.diskUsage | data disk usage | float | % |
| data disk | host.diskMountStatus | Mount status of the data disk: "lost" or "normal" | string | |
| load average | host.loadAverage.1 | Average load over the past 1 minute | float | |
| load average | host.loadAverage.5 | Average load over the past 5 minutes | float | |
| load average | host.loadAverage.15 | Average load over the past 15 minutes | float | |
| | host.dbcVersion | DBC version number | string | |
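
The indexed keys in this list (host.gpu.0.name, host.gpu.1.name, and so on) can be regrouped per device by whoever consumes the data. The following is a minimal sketch, assuming the monitoring items have already been parsed into a flat Python mapping of key to value. The function name group_indexed and the sample values are hypothetical and not part of DBC.

```python
import re
from collections import defaultdict

# Matches keys of the form <prefix>.<index>.<field>, for example
# "host.gpu.0.name", where <index> is the 0-based device number.
_INDEXED_KEY = re.compile(r"^(?P<prefix>.+)\.(?P<index>\d+)\.(?P<field>[^.]+)$")


def group_indexed(metrics, prefix):
    """Group flat keys '<prefix>.<n>.<field>' into {n: {field: value}}."""
    devices = defaultdict(dict)
    for key, value in metrics.items():
        match = _INDEXED_KEY.match(key)
        if match and match.group("prefix") == prefix:
            devices[int(match.group("index"))][match.group("field")] = value
    return dict(devices)


# Hypothetical sample data, for illustration only.
sample = {
    "host.gpuCount": 2,
    "host.gpu.0.name": "GPU A",
    "host.gpu.0.memTotal": 1000,
    "host.gpu.1.name": "GPU B",
    "host.gpu.1.memTotal": 2000,
    "host.loadAverage.1": 0.5,
}
gpus = group_indexed(sample, "host.gpu")
```

Keys without a numeric index followed by a field name, such as host.gpuCount or host.loadAverage.1, are ignored by this grouping.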

monitoring item list of the virtual machine

| classification | key | description | data type | unit |
|---|---|---|---|---|
| basic information | dom.state | Running state of the virtual machine, such as running | string | |
| basic information | dom.maxMem | the maximum memory in KBytes allowed | unsigned int | KB |
| basic information | dom.memory | the memory in KBytes used by the domain | unsigned int | KB |
| basic information | dom.nrVirtCpu | the number of virtual CPUs for the domain | unsigned int | |
| basic information | dom.cpuTime | the CPU time used in nanoseconds | unsigned long long | |
| basic information | dom.cpuUsage | average CPU usage | float | % |
| memory information | memory.total | total memory | unsigned long long | KB |
| memory information | memory.unused | real-time unused memory | unsigned long long | KB |
| memory information | memory.available | real-time available memory | unsigned long long | KB |
| memory information | memory.usage | real-time memory usage | float | % |
| disk information | disk.0.name | Name of the first disk. The number in the key is the disk index, starting from 0; the same applies below | string | |
| disk information | disk.0.capacity | logical size in bytes of the image (how much storage the guest will see) | unsigned long long | KB |
| disk information | disk.0.allocation | host storage in bytes occupied by the image (such as highest allocated extent if there are no holes, similar to 'du') | unsigned long long | KB |
| disk information | disk.0.physical | host physical size in bytes of the image container (last offset, similar to 'ls') | unsigned long long | KB |
| disk information | disk.0.rd_req | number of read requests | long long | |
| disk information | disk.0.rd_bytes | number of bytes read | long long | B |
| disk information | disk.0.wr_req | number of write requests | long long | |
| disk information | disk.0.wr_bytes | number of bytes written | long long | B |
| disk information | disk.0.errs | In Xen this returns the mysterious 'oo_req' | long long | |
| disk information | disk.0.rd_speed | average disk read speed | float | B/s |
| disk information | disk.0.wr_speed | average disk write speed | float | B/s |
| network information | net.0.name | Name of the first network card. The number in the key is the card index, starting from 0; the same applies below | string | |
| network information | net.0.rx_bytes | bytes received | long long | B |
| network information | net.0.rx_packets | packets received | long long | |
| network information | net.0.rx_errs | | long long | |
| network information | net.0.rx_drop | | long long | |
| network information | net.0.tx_bytes | bytes sent | long long | B |
| network information | net.0.tx_packets | packets sent | long long | |
| network information | net.0.tx_errs | | long long | |
| network information | net.0.tx_drop | | long long | |
| network information | net.0.rx_speed | average receive speed | float | B/s |
| network information | net.0.tx_speed | average send speed | float | B/s |
| GPU | gpu.graphicsDriverVersion | Version of the system's graphics driver | string | |
| GPU | gpu.nvmlVersion | Version of the NVML library | string | |
| GPU | gpu.cudaVersion | Version of the CUDA driver | string | |
| GPU | gpu.0.name | Product name of the first GPU device. The number in the key is the device index, starting from 0; the same applies below | string | |
| GPU | gpu.0.busId | PCI identifier as the tuple domain:bus:device.function | string | |
| GPU | gpu.0.memTotal | Total physical device memory | unsigned long long | B |
| GPU | gpu.0.memFree | Unallocated device memory | unsigned long long | B |
| GPU | gpu.0.memUsed | Sum of reserved and allocated device memory | unsigned long long | B |
| GPU | gpu.0.gpuUtilization | GPU utilization: percent of time over the past sample period during which one or more kernels were executing on the GPU | unsigned int | |
| GPU | gpu.0.memUtilization | Memory utilization: percent of time over the past sample period during which global (device) memory was being read or written | unsigned int | |
| GPU | gpu.0.powerUsage | Power usage in milliwatts | unsigned int | milliwatt |
| GPU | gpu.0.powerCap | Power management limit in milliwatts | unsigned int | milliwatt |
| GPU | gpu.0.temperature | Current temperature reading of the device | unsigned int | degrees C |
| protocol | version | version number of DBC | string | |

GPU monitoring: required reading

Because graphics card devices are isolated on the host computer, DBC cannot directly obtain detailed information about them. We therefore built on the QEMU guest agent, integrated the functions of the NVIDIA Management Library (NVML), and implemented an independent service, the dbc guest agent. DBC obtains the detailed graphics card information from inside the virtual machine by communicating with this service.

For custom images, to monitor graphics card information, please install the dbc guest agent service inside the virtual machine.

Note!

  1. Graphics card monitoring currently only supports NVIDIA graphics cards.

  2. The graphics card monitor can only see the graphics card devices that have been used by the virtual machine.

Calculation of usage and speed

  • CPU usage = (cpuTime2 - cpuTime1) / (realTime2 - realTime1) / Number of CPUs
  • memory usage = (total - unused) / total
  • disk average read speed = (rd_bytes2 - rd_bytes1) / (realTime2 - realTime1)
  • average reception speed = (rx_bytes2 - rx_bytes1) / (realTime2 - realTime1)
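
The formulas above can be sketched as follows. This is a minimal sketch, assuming that timestamps and dom.cpuTime are both expressed in nanoseconds and that results are multiplied by 100 to match the % unit used in the tables. The Sample class and all function names are hypothetical and not part of DBC.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One collection point (hypothetical structure)."""
    real_time_ns: int  # wall-clock time of the collection, in nanoseconds
    cpu_time_ns: int   # dom.cpuTime at that moment
    rd_bytes: int      # disk.0.rd_bytes at that moment
    rx_bytes: int      # net.0.rx_bytes at that moment


def cpu_usage_percent(s1, s2, n_cpus):
    """(cpuTime2 - cpuTime1) / (realTime2 - realTime1) / number of CPUs."""
    elapsed_ns = s2.real_time_ns - s1.real_time_ns
    if elapsed_ns <= 0 or n_cpus <= 0:
        raise ValueError("need a positive interval and CPU count")
    return (s2.cpu_time_ns - s1.cpu_time_ns) / elapsed_ns / n_cpus * 100.0


def memory_usage_percent(total, unused):
    """(total - unused) / total."""
    return (total - unused) / total * 100.0


def bytes_per_second(bytes1, bytes2, s1, s2):
    """(bytes2 - bytes1) / (realTime2 - realTime1), with time in seconds."""
    elapsed_s = (s2.real_time_ns - s1.real_time_ns) / 1e9
    if elapsed_s <= 0:
        raise ValueError("need a positive interval")
    return (bytes2 - bytes1) / elapsed_s


# Two hypothetical samples taken 10 seconds apart on a 2-vCPU domain.
s1 = Sample(real_time_ns=0, cpu_time_ns=0, rd_bytes=0, rx_bytes=0)
s2 = Sample(real_time_ns=10_000_000_000, cpu_time_ns=5_000_000_000,
            rd_bytes=1_000_000, rx_bytes=2_000_000)

cpu = cpu_usage_percent(s1, s2, n_cpus=2)
mem = memory_usage_percent(total=8000, unused=2000)
rd_speed = bytes_per_second(s1.rd_bytes, s2.rd_bytes, s1, s2)
rx_speed = bytes_per_second(s1.rx_bytes, s2.rx_bytes, s1, s2)
```

The write speed and send speed follow the same pattern with wr_bytes and tx_bytes.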

Note!

When the interval between two data collections is long, for example one minute, the computed disk read/write speed and network transmission speed represent only the average over that interval, not the real-time speed.
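
To illustrate the effect of the collection interval, the following sketch compares the speed computed from one-minute samples with the speed computed from one-second samples. The workload is hypothetical: a cumulative byte counter that grows by 12,000,000 bytes per second for 5 seconds and then stays idle. The function name average_speeds is an assumption made for this example.

```python
def average_speeds(counter, interval):
    """Average speed in B/s between samples taken every `interval` seconds.

    counter[t] is the cumulative byte count at second t.
    """
    return [
        (counter[t + interval] - counter[t]) / interval
        for t in range(0, len(counter) - interval, interval)
    ]


# Hypothetical counter: a 5-second burst at 12,000,000 B/s, then 55 idle seconds.
counter = [min(t, 5) * 12_000_000 for t in range(61)]

per_minute = average_speeds(counter, 60)  # a single value for the whole minute
per_second = average_speeds(counter, 1)   # one value per second
```

With one-minute sampling the burst is spread over the whole interval, so the reported speed is far lower than the peak visible with one-second sampling.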
