DUNE-DAQ
DUNE Trigger and Data Acquisition software
The trigger record builder (TRB), by construction, controls the flow of information, since it both starts a data loop and closes it. Because of that, it is naturally suited to report a lot of information. This page describes the metrics in detail and outlines some typical situations and how they can be recognised from the metrics.
Metrics are grouped in categories whose logic reflects the different ways they are sampled. See the dedicated paragraphs for the details.
These are metrics that report instantaneous glimpses of the status of the TRB. They are
In normal conditions these metrics are usually low, because the system completes TR construction much faster than it samples the metrics. Still, it is not an error if these values are larger than zero; in fact, this is expected once we move from regular triggers to properly random requests.
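The sampling argument above can be illustrated with a small sketch. It is only a toy model: the helper `pending_at` and all the numbers (a trigger every 100 ms, 1 ms to build a TR, one sample every 10 s, a burst of three close triggers) are assumptions made for illustration, not values taken from the TRB.

```python
def pending_at(sample_time, trigger_times, build_time):
    """Count TRs that are started but not yet completed at sample_time."""
    return sum(1 for t in trigger_times if t <= sample_time < t + build_time)


# Hypothetical numbers, in integer milliseconds to avoid rounding issues.
regular_triggers = range(0, 60_000, 100)   # one trigger every 100 ms
samples = range(5_050, 60_000, 10_000)     # one metric sample every 10 s

# Regular triggers, fast construction (1 ms): no TR is in flight at any sample.
fast = [pending_at(s, regular_triggers, build_time=1) for s in samples]

# A burst of close triggers (as random requests can produce) overlapping a
# sample: the instantaneous metric is non-zero without any error.
burst = pending_at(5_050, [5_048, 5_049, 5_050], build_time=5)
```

With regular, well separated triggers every sample reads zero, while the clustered requests give a non-zero reading that is perfectly legitimate.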
As the name suggests, these metrics monitor error conditions, allowing undesirable events to be recognised quickly without having to look at the logs. If these metrics are not zero, detailed errors will be present in the logs as well. Because of the importance of these events, these metrics are reset only at the start of a new run; during the run they keep accumulating the error conditions.
The list is
trigger number, run number and sequence number. If different trigger decisions arrive bearing the same identifier, the TR cannot be created even if the timestamps are different. In that case the trigger decision is dropped, again causing hypothetical data to be lost. Please note that keeping track of all past trigger decisions is not efficient, so if a TR is sent out and another decision with the same ID is received later, it will not be discarded: this is still an error condition, but it will not be flagged by the TRB, neither in the metrics nor in the logs.
When stop is called, the present TRs are sent to writing. If the push is not possible because the queue is full, the system does not wait for the queue to be free, as this would delay the completion of the stop transition, so the TRs are deleted. If that happens, this counter keeps track of it. The number of lost fragments is also increased according to the number of fragments contained in the deleted TR.
In a well configured run, the most likely error condition occurs when fragments are late, and the signature is lost fragments = unexpected fragments != 0. Yet, because of when the metrics are set, during the run this manifests as unexpected fragments < lost fragments, since fragments can be flagged as lost as soon as their TR times out, while fragments can only be flagged as unexpected when they are received. Using only metrics, a proper understanding of what happened during the run can only be reached once stop is called and, even then, only assuming that the stop did not prevent the late fragments from being received and properly flagged as unexpected. Of course the logs will flag the details of the situation during the run, without delay.
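The behaviour of the error counters described above can be sketched with a toy model. This is not the TRB implementation: the class `ToyTRB`, its method names, the attribute names and the integer time units are all hypothetical. It only reproduces three statements from the text: duplicates are detected only among TRs still under construction, lost fragments are counted at timeout, and unexpected fragments are counted only on arrival.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTRB:
    """Toy model of the error counters; names and logic are assumptions."""
    timeout: int                                  # TR timeout, arbitrary time units
    pending: dict = field(default_factory=dict)   # trigger ID -> [creation time, missing fragments]
    duplicated_ids: int = 0
    lost_fragments: int = 0
    unexpected_fragments: int = 0

    def start(self):
        """Error counters are reset only here, at the start of a run."""
        self.pending.clear()
        self.duplicated_ids = self.lost_fragments = self.unexpected_fragments = 0

    def on_decision(self, trigger_id, now, n_fragments):
        # trigger_id is the tuple (trigger number, run number, sequence number).
        # Only TRs still under construction are remembered, so a duplicate of
        # an already sent TR would go unnoticed.
        if trigger_id in self.pending:
            self.duplicated_ids += 1
            return False
        self.pending[trigger_id] = [now, n_fragments]
        return True

    def on_fragment(self, trigger_id):
        if trigger_id not in self.pending:
            self.unexpected_fragments += 1        # flagged only on arrival
            return
        self.pending[trigger_id][1] -= 1
        if self.pending[trigger_id][1] == 0:
            del self.pending[trigger_id]          # complete TR leaves the TRB

    def check_timeouts(self, now):
        expired = [t for t, (t0, _) in self.pending.items() if now - t0 > self.timeout]
        for tid in expired:
            self.lost_fragments += self.pending.pop(tid)[1]   # flagged at timeout


trb = ToyTRB(timeout=10)
trb.start()
tid = (1, 42, 0)
trb.on_decision(tid, now=0, n_fragments=2)
trb.on_decision(tid, now=1, n_fragments=2)   # same ID while pending: duplicate
trb.on_fragment(tid)                         # first fragment is on time
trb.check_timeouts(now=11)                   # TR times out, one fragment is lost
during_run = (trb.lost_fragments, trb.unexpected_fragments)
trb.on_fragment(tid)                         # the late fragment finally arrives
after_late = (trb.lost_fragments, trb.unexpected_fragments)
```

In this sketch `during_run` shows unexpected fragments < lost fragments, and the two counters only become equal in `after_late`, once the late fragment has actually been received.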
These are quantities evaluated over relatively short time intervals (seconds). Specifically, they are calculated between calls of get_info(). The list is
In normal conditions the average time per trigger is smaller than the TR timeout. In non-busy conditions, it can go down to the sleep time set for the loop.
The sleep and loop counters are designed to monitor how busy the TRB is. In the tests performed so far, the sleep counter far outnumbers the loop counter, since the operations are trivial. Once the event size grows, this might change.
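A minimal sketch of how quantities "calculated between the calls of get_info()" can be accumulated and reset is given here. The class `IntervalMetrics`, its field names and the exact meaning given to the loop and sleep counters (iterations that did some work versus iterations that found nothing to do) are assumptions for illustration; only the name get_info() comes from the text.

```python
class IntervalMetrics:
    """Toy accumulator for quantities reported between two get_info() calls."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.completed_trs = 0
        self.total_time = 0.0    # summed construction time, in seconds
        self.loop_count = 0      # assumed: iterations that did some work
        self.sleep_count = 0     # assumed: iterations that found nothing to do

    def record_tr(self, construction_time):
        self.completed_trs += 1
        self.total_time += construction_time

    def record_iteration(self, did_work):
        if did_work:
            self.loop_count += 1
        else:
            self.sleep_count += 1

    def get_info(self):
        """Return the values for the interval since the last call, then reset."""
        average = self.total_time / self.completed_trs if self.completed_trs else 0.0
        info = {
            "average_time_per_trigger": average,
            "loop_counter": self.loop_count,
            "sleep_counter": self.sleep_count,
        }
        self._reset()
        return info


metrics = IntervalMetrics()
metrics.record_tr(0.5)
metrics.record_tr(0.25)
for did_work in (True, False, False, False):
    metrics.record_iteration(did_work)
first = metrics.get_info()    # covers everything recorded so far
second = metrics.get_info()   # nothing happened since the previous call
```

Resetting inside get_info() is what makes each reported value refer only to the latest interval, in contrast to the error counters, which accumulate over the whole run.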
These are counters that increase across the run and are used to cross check that messages and data are correctly received between modules. For the TRB, the counters are:
received trigger decisions: the number of valid trigger decisions received during the run. Valid means that the corresponding TR will be created and the corresponding data requests will be sent.
generated trigger records: the number of trigger records pushed into the output queue.
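One possible cross check between the two counters is sketched here. It assumes that every valid trigger decision produces exactly one trigger record, so that any difference is explained either by TRs still under construction or by TRs that never reached the output queue. The function name and the `pending_trs` argument are hypothetical.

```python
def unaccounted_decisions(received_trigger_decisions, generated_trigger_records, pending_trs=0):
    """Decisions that neither produced a TR on the output queue nor are still pending."""
    return received_trigger_decisions - generated_trigger_records - pending_trs


# Mid-run: two TRs are still under construction, nothing is missing.
mid_run = unaccounted_decisions(1000, 998, pending_trs=2)

# After stop nothing is pending any more, so the same counters reveal
# two decisions whose TRs never reached the output queue.
at_stop = unaccounted_decisions(1000, 998)
```

A non-zero result after stop is only a hint that something went missing; the error counters and the logs are still needed to find out why.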