DUNE-DAQ
DUNE Trigger and Data Acquisition software
Before using opmonlib it is important to understand and define what needs to be monitored. Monitorable objects can then be captured in a schema file, from which C++ structs are generated using ProtoBuf. Documentation and instructions on generating schema data structures and using ProtoBuf can be found on the ProtoBuf website; the description of the C++ API is also a relevant page.
In general, each .proto file contains definitions of blocks that are published as single units. Each schema file will generate a C++ header file containing the structures which hold the monitoring data, as defined in the .proto file. Typically each module only needs one struct to hold its monitoring information; however, it is possible to create multiple nested structs within a schema, all filled by the same module.
It is preferred to organise the ProtoBuf schemas in the following way:
- in a schema/opmon directory inside your repository
- in an opmon namespace

An example is DFOModule.proto, which contains the schemas used by the DFOModule plugin in dfmodules.
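A sketch in the spirit of that file is shown below; the package follows the opmon namespace convention, but the message and field names are illustrative rather than a verbatim copy of DFOModule.proto:

```protobuf
syntax = "proto3";

package dunedaq.dfmodules.opmon;

// counters published as a single block by the DFOModule
message DFOInfo {
  uint64 tokens_received = 1;
  uint64 decisions_received = 2;
  uint64 decisions_sent = 3;
}

// per-trigger-type counters, published with a custom origin (see below)
message TriggerInfo {
  uint64 received = 1;
  uint64 completed = 2;
}
```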
As a generic schema language, ProtoBuf allows you to use simple types, but also lists, maps, etc. Be aware that apart from basic types and nested messages, other quantities are ignored by the monitoring system. An OpMonEntry message is generated whenever a structure with at least one publishable field is passed to the publish method; see the next section.
The ProtoBuf C++ API guide describes how to fill the structures you created. In order to publish the metric, the object has to be created from within a MonitorableObject, see the header file. In particular, a DAQModule is a MonitorableObject. Two main functions are relevant for publishing:
- publish takes a ProtoBuf schema object, timestamps it with the time of the function call, serializes it (synchronously) and publishes it (asynchronously) via one of the configured OpMonFacilities. This function can be called at any time.
- generate_opmon_data is a function which the monitoring system calls regularly (on the order of seconds). Its default behaviour is 'null'. Every developer can freely implement this in their MonitorableObject in order to avoid setting up a thread to generate information regularly. Specific implementations are expected to call the publish function to actually publish the metric.

An example of metric publication is shown below.
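A minimal sketch, assuming the DFOInfo message from the schema sketch above and atomic counters kept as members of the module:

```cpp
// inside a method of a MonitorableObject (e.g. a DAQModule):
// fill the ProtoBuf-generated struct and hand it to the monitoring chain
dunedaq::dfmodules::opmon::DFOInfo info;
info.set_tokens_received( m_tokens_received.exchange(0) );       // setters follow the
info.set_decisions_received( m_decisions_received.exchange(0) ); // ProtoBuf C++
info.set_decisions_sent( m_decisions_sent.exchange(0) );         // naming convention
publish( std::move(info) );  // timestamped, serialized and sent asynchronously
```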
Optional arguments of the publish function allow you to:
- add a custom origin in the form of a map<string, string>, where the key is the type of the source, e.g. channel, and the value is the corresponding identifier, e.g. 4345; this extends the opmon_id of the caller MonitorableObject for the specific metric with more detailed information on the source of this metric
- set the OpMonLevel of the metric

The OpMonLevel is a priority level designed to control the quantity of metrics generated by a tree. As a default, all messages are published. The lower the level, the higher the priority. The system can decide to entirely disable metric publication regardless of the OpMonLevel. The system already provides some values to specify the OpMonLevel via an enum:
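The exact names and values are defined in the opmonlib headers; the sketch below only conveys the idea, with the underlying type as an assumption:

```cpp
#include <cstdint>
#include <limits>

using OpMonLevel = std::uint32_t;  // assumption: an unsigned integral level type

// illustrative paraphrase: the lower the value, the higher the priority,
// with large gaps left free for user-defined levels in between
enum class EntryOpMonLevel : OpMonLevel {
  kTopPriority    = std::numeric_limits<OpMonLevel>::min(),
  kEventDriven    = std::numeric_limits<OpMonLevel>::max() / 4,
  kDefault        = std::numeric_limits<OpMonLevel>::max() / 2,
  kLowestPriority = std::numeric_limits<OpMonLevel>::max()
};
```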
but users are welcome to fill the gaps with whatever number they are happy to associate with their metric.
The usage of a custom origin is designed to provide information that is unrelated to the software stack. While the software stack might change (e.g., the name of an application or of a module can change because of configuration), some information, like a crate number or a channel, is hardware related and independent of the software stack that provides it. Examples of valid tags to be used in custom origins are: server name, channel, links, etc. The value of a tag should not grow indefinitely, for retrieval efficiency in the database; so things like the run number should not become a custom origin. Adding information like the application name or the session in the custom origin is discouraged because it would be redundant. In the example below, TriggerInfo contains counters grouped by trigger type.
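A sketch of such a call, assuming the TriggerInfo message from the schema sketch above, a member map of per-trigger-type counters, and that the custom origin is passed as the map<string, string> second argument of publish:

```cpp
// publish one TriggerInfo block per trigger type, tagging each block with a
// custom origin so the per-type metrics remain distinguishable downstream
for ( const auto & [trigger_type, counters] : m_trigger_counters ) {
  dunedaq::dfmodules::opmon::TriggerInfo ti;
  ti.set_received( counters.received );
  ti.set_completed( counters.completed );
  publish( std::move(ti), { {"trigger", std::to_string(trigger_type)} } );
}
```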
In order to work correctly, each MonitorableObject has to be part of a monitoring tree, i.e. every MonitorableObject has to be registered to the chain. This is done via a registration method on the parent object, sketched below.
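A paraphrase of the relevant declaration (the header file is authoritative; the exact name and argument types are assumptions here):

```cpp
// registers a child under the given name; only a weak pointer is kept
// internally, so ownership stays with the caller
void register_node( std::string name, std::shared_ptr<MonitorableObject> child );
```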
If not registered, the metric is not completely lost; an ERS error will be generated reporting the usage of an unconfigured reporting system.
The metrics generated by the child will have an opmon_id in the form parent_opmon_id.child_name. The registration does not imply ownership of the child by the parent, as internally only weak pointers are utilised. If the child is destroyed, its pointer will eventually be removed from the chain.
DAQModules will be automatically registered by the application framework, and developers have to write their code assuming that the module is registered in the monitoring tree from the moment of its creation. On the other hand, developers have to take care of the registration of subcomponents living inside their modules.
An example of registration is:
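A minimal sketch, where MySubComponent is a hypothetical class deriving from MonitorableObject:

```cpp
#include <memory>

// inside a DAQModule (itself a MonitorableObject): create a subcomponent in
// response to some event and attach it to the monitoring tree
auto sub = std::make_shared<MySubComponent>();  // hypothetical child class
register_node( "sub_component", sub );
// its metrics will carry the opmon_id <module_opmon_id>.sub_component
```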
Notice that here the registration is event driven: something triggers the creation of an object, and it is possible to register the object at any time. Of course, a more static approach is possible too.
The registration does not imply ownership, so in order to unregister an object you just need to delete the shared pointer.
The configuration of opmonlib is currently managed through the environment variables DUNEDAQ_OPMON_INTERVAL and DUNEDAQ_OPMON_LEVEL. Their use can be seen in Application.cpp:
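A sketch of that logic, using only standard C++ environment accessors (the actual code in Application.cpp may differ in detail):

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// read the opmon settings from the environment, falling back to the defaults
auto interval = std::chrono::seconds(10);                  // default: 10 seconds
if ( const char* var = std::getenv("DUNEDAQ_OPMON_INTERVAL") )
  interval = std::chrono::seconds( std::stoi(var) );

int level = 1;                                             // default level
if ( const char* var = std::getenv("DUNEDAQ_OPMON_LEVEL") )
  level = std::stoi(var);
```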
Here, DUNEDAQ_OPMON_INTERVAL sets the interval in seconds between successive calls to generate_opmon_data (currently defaulting to 10 seconds), and DUNEDAQ_OPMON_LEVEL allows the user to define the level for generate_opmon_data (currently set to 1).