monitoring items


monitoring item list of the host

classification	key	description	data type	unit
	host.gpuCount	total number of GPUs	unsigned int
	host.gpuUsed	number of GPUs used	unsigned int
GPU	host.gpu.0.name	Product name of the first GPU device; the number in the key is the device index, starting from 0 (the same applies to the keys below)	string
	host.gpu.0.busId	PCI identifier as the tuple domain:bus:device.function	string
	host.gpu.0.memTotal	Total physical device memory	unsigned long long	B
	host.gpu.0.memFree	Unallocated device memory	unsigned long long	B
	host.gpu.0.memUsed	Sum of reserved and allocated device memory	unsigned long long	B
	host.gpu.0.gpuUtilization	GPU utilization: percentage of time over the past sample period during which one or more kernels were executing on the GPU	unsigned int	%
	host.gpu.0.memUtilization	Memory utilization: percentage of time over the past sample period during which global (device) memory was being read or written	unsigned int	%
	host.gpu.0.powerUsage	Current power usage	unsigned int	milliwatt
	host.gpu.0.powerCap	Power management limit	unsigned int	milliwatt
	host.gpu.0.temperature	Current temperature reading of the device	unsigned int	degrees C
virtual machine	host.vmCount	total number of virtual machines	unsigned int
virtual machine	host.vmRunning	number of running virtual machines	unsigned int
CPU	host.cpuUsage	CPU usage	float	%
memory	host.memTotal	total memory	unsigned long long	KB
	host.memFree	free memory	unsigned long long	KB
	host.memUsage	memory usage	float	%
traffic	host.rxFlow	total traffic received	long long	B
traffic	host.txFlow	total traffic sent	long long	B
data disk	host.diskTotal	data disk capacity	unsigned long long	KB
	host.diskFree	data disk free capacity	unsigned long long	KB
	host.diskUsage	data disk usage	float	%
	host.diskMountStatus	Mount status of the data disk: "lost" or "normal"	string
load average	host.loadAverage.1	Average load over the past 1 minute	float
	host.loadAverage.5	Average load over the past 5 minutes	float
	host.loadAverage.15	Average load over the past 15 minutes	float
	host.dbcVersion	DBC version number	string

monitoring item list of the virtual machine

	classification	key	description	data type	unit
virtual machine monitoring	basic information	dom.state	running state of the virtual machine, for example "running"	string
		dom.maxMem	maximum memory allowed for the domain	unsigned int	KB
		dom.memory	memory used by the domain	unsigned int	KB
		dom.nrVirtCpu	number of virtual CPUs for the domain	unsigned int
		dom.cpuTime	CPU time used by the domain	unsigned long long	ns
		dom.cpuUsage	average CPU usage	float	%
	memory information	memory.total	total memory	unsigned long long	KB
		memory.unused	real-time unused memory	unsigned long long	KB
		memory.available	real-time available memory	unsigned long long	KB
		memory.usage	real-time memory usage	float	%
	disk information	disk.0.name	Name of the first disk; the number in the key is the device index, starting from 0 (the same applies to the keys below)	string
		disk.0.capacity	logical size of the image (how much storage the guest will see)	unsigned long long	KB
		disk.0.allocation	host storage occupied by the image (such as the highest allocated extent if there are no holes, similar to 'du')	unsigned long long	KB
		disk.0.physical	host physical size of the image container (last offset, similar to 'ls')	unsigned long long	KB
		disk.0.rd_req	number of read requests	long long
		disk.0.rd_bytes	number of read bytes	long long	B
		disk.0.wr_req	number of write requests	long long
		disk.0.wr_bytes	number of written bytes	long long	B
		disk.0.errs	number of errors (in Xen this returns the 'oo_req' value)	long long
		disk.0.rd_speed	disk average read speed	float	B/s
		disk.0.wr_speed	disk average write speed	float	B/s
	network information	net.0.name	Name of the first network card; the number in the key is the device index, starting from 0 (the same applies to the keys below)	string
		net.0.rx_bytes	bytes received	long long	B
		net.0.rx_packets	packets received	long long
		net.0.rx_errs	receive errors	long long
		net.0.rx_drop	received packets dropped	long long
		net.0.tx_bytes	bytes sent	long long	B
		net.0.tx_packets	packets sent	long long
		net.0.tx_errs	transmit errors	long long
		net.0.tx_drop	outgoing packets dropped	long long
		net.0.rx_speed	average reception speed	float	B/s
		net.0.tx_speed	average sending speed	float	B/s
	GPU	gpu.graphicsDriverVersion	the version of the system's graphics driver	string
		gpu.nvmlVersion	the version of the NVML library	string
		gpu.cudaVersion	the version of the CUDA driver	string
		gpu.0.name	Product name of the first GPU device; the number in the key is the device index, starting from 0 (the same applies to the keys below)	string
		gpu.0.busId	PCI identifier as the tuple domain:bus:device.function	string
		gpu.0.memTotal	Total physical device memory	unsigned long long	B
		gpu.0.memFree	Unallocated device memory	unsigned long long	B
		gpu.0.memUsed	Sum of reserved and allocated device memory	unsigned long long	B
		gpu.0.gpuUtilization	GPU utilization: percentage of time over the past sample period during which one or more kernels were executing on the GPU	unsigned int	%
		gpu.0.memUtilization	Memory utilization: percentage of time over the past sample period during which global (device) memory was being read or written	unsigned int	%
		gpu.0.powerUsage	Current power usage	unsigned int	milliwatt
		gpu.0.powerCap	Power management limit	unsigned int	milliwatt
		gpu.0.temperature	Current temperature reading of the device	unsigned int	degrees C
	protocol	version	version number of dbc	string
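
The indexed keys in both tables (for example gpu.0.name, disk.0.rd_bytes, net.0.rx_bytes) follow the pattern prefix.index.field. The following is a minimal Python sketch of how such keys could be grouped per device. It assumes the monitoring data has already been loaded into a flat mapping from key to value; this page does not specify how the data is delivered, so `items` and `group_by_device` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Matches keys such as "gpu.0.name" or "host.gpu.1.memTotal":
# a dotted prefix, a numeric device index, then the field name.
# Keys without an index followed by a field (e.g. "gpu.cudaVersion",
# "host.loadAverage.1") do not match and are skipped.
INDEXED_KEY = re.compile(r"^(?P<prefix>[A-Za-z.]+)\.(?P<index>\d+)\.(?P<field>\w+)$")


def group_by_device(items):
    """Group flat monitoring keys by (prefix, device index).

    `items` is assumed to be a dict of key -> value (hypothetical input shape).
    Returns a dict mapping (prefix, index) to a dict of field -> value.
    """
    devices = defaultdict(dict)
    for key, value in items.items():
        match = INDEXED_KEY.match(key)
        if match:
            device = (match["prefix"], int(match["index"]))
            devices[device][match["field"]] = value
    return dict(devices)
```

With this grouping, all values for the first GPU of a virtual machine would be found under the `("gpu", 0)` entry, and those of the second network card under `("net", 1)`.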

gpu monitoring must read

Because graphics card devices are isolated on the host, dbc cannot read detailed graphics card information directly. We therefore built a separate service on top of the qemu guest agent, the dbc guest agent, which integrates the functions of the NVIDIA Management Library (NVML). dbc communicates with the virtual machine through this agent to obtain detailed information about the graphics cards inside it.

For custom images, to monitor graphics card information, please install the dbc guest agent service inside the virtual machine.

Ubuntu virtual machine installation script: http://112.192.16.27:9000/dbc_guest_agent/install.sh
Windows 64-bit virtual machine installer: http://112.192.16.27:9000/dbc_guest_agent/qemu-ga-x86_64.msi

Note!

Graphics card monitoring currently only supports NVIDIA graphics cards.
The graphics card monitor can only see the graphics card devices that have been used by the virtual machine.

Calculation of usage and speed

CPU usage = (cpuTime2 - cpuTime1) / (realTime2 - realTime1) / Number of CPUs
memory usage = (total - unused) / total
disk average read speed = (rd_bytes2 - rd_bytes1) / (realTime2 - realTime1)
average reception speed = (rx_bytes2 - rx_bytes1) / (realTime2 - realTime1)

Note!

When the interval between two data collections is very long, such as once every minute, the disk read/write speed and network transmission speed can only represent the average speed, not the real-time speed.
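
The formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions: the collection timestamps (realTime1, realTime2) are taken to be in seconds, cpuTime is in nanoseconds as listed for dom.cpuTime, and the results are returned as fractions (multiply by 100 for a percentage). The function names are hypothetical and are not part of dbc.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_usage(cpu_time1, cpu_time2, real_time1, real_time2, nr_virt_cpu):
    """CPU usage between two collections, as a fraction of all virtual CPUs.

    cpu_time1/cpu_time2: dom.cpuTime in nanoseconds.
    real_time1/real_time2: collection timestamps in seconds (assumed unit).
    """
    elapsed = real_time2 - real_time1
    if elapsed <= 0 or nr_virt_cpu <= 0:
        raise ValueError("need a positive interval and at least one CPU")
    busy_seconds = (cpu_time2 - cpu_time1) / NS_PER_SECOND
    return busy_seconds / elapsed / nr_virt_cpu


def memory_usage(total, unused):
    """Memory usage as a fraction; both arguments in the same unit (e.g. KB)."""
    if total <= 0:
        raise ValueError("total memory must be positive")
    return (total - unused) / total


def average_speed(count1, count2, real_time1, real_time2):
    """Average speed in bytes per second from two cumulative byte counters.

    Works for rd_bytes, wr_bytes, rx_bytes, and tx_bytes alike.
    """
    elapsed = real_time2 - real_time1
    if elapsed <= 0:
        raise ValueError("need a positive interval")
    return (count2 - count1) / elapsed
```

Because every result is divided by the full interval between the two collections, a burst of activity that lasts only a few seconds within a one-minute interval is spread over the whole minute.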
