-
Notifications
You must be signed in to change notification settings - Fork 60
Feature Request: Record and Replay #1606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
======== | ||
|
||
Recording is a function allowing to record a trace, called a *Trace Record*, of | ||
the activity of the system while it is executing. Replay is a function which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be more precise here.
Instead of "system", I would propose "constrained set of FEO activities". This implies we can ignore the process and task-chain stuff for Replay/Simulation except for some scheduling constraints (like order and triggers).
Instead of "activity", we could write "in- and outputs". While this still ignores the internal state aspect, I'd argue any solution turns the internal state into "in- and output" anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ad "system"
I am not sure if we can ignore process and task-chains. As soon as you have more than one FEO activity, you need to know how these are chained, don't you. You can replicate that for the replay/reprocessing situation "out-of-line" with the actual deployment, but somehow things need to be connected. But I agree, system might to too lazy terminology.
Ad "activity vs in- and outputs"
I agree to that. In replay you can only observe inputs and outputs. Only in reprocessing, you can also observe all execution-time aspects.
|
||
A trace can be recorded from Reprocessing in a similar way as from a SUR. That | ||
Trace Record can be used for both use-cases, replay and reprocessing. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an additional aspect: This allows to verify if an activity behaves deterministically. Reprocess it two or more times and check if the outputs are identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, assuming the entire reprocessing machinery behaves deterministically in relevant aspects.
reprocessing. | ||
* The Trace Record shall resemble the time-order of messages as seen by FEO at | ||
execution time. (Do we need that requirement?) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be useful to trim recordings. Examples:
- Remove outputs to save space (assuming you are sufficiently confident about the determinism of the SUR)
- Remove confidential parts and give recordings to third-parties
- Cut out pieces by time. An application engineer records 3h of driving data, then you only use minute 73 for your regression test suite as it contains an interesting situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take that as a yes, we want that requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was not related to the "Do we need that requirement?" question. It is a proposal for an additional bullet point. Like this:
* The Trace Record format shall support efficient cutting by time, removal of individual signals, and removal of individual ports. |
internal state needed to reestablish its state for initialization for | ||
reprocessing. | ||
* The Trace Record shall resemble the time-order of messages as seen by FEO at | ||
execution time. (Do we need that requirement?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a pointless restriction to me. How does it derive from the Rationale?
Maybe something else is behind this requirement: We need good performance such that the data format matters? Then we should rather require that specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, which restriction are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The restriction is "resemble the time-order of messages as seen by FEO". At least, I understand this is a restriction on the trace record data format.
Functionally, we only need to ability to recreate the time-order but there is no need to store it in that order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, understood. I will clarify.
|
||
|trace| | ||
|
||
* Initialization information shall only be written once to the Trace Record. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is useful to have more than a single initialization info just like movie formats have regular key frames. It allows to skip forwards and backwards efficiently. With hours of recordings that can save a lot of time (or compute power).
It is possible to work with a single init info though but we should not restrict ourselves at the requirements level here. At least, I don't see how it follows from the Rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid. My motivation was to keep the trace files smaller. We can request the condition/frequency at which internal state is written to the recording to be a configuration option of the recording system.
reprocessing. | ||
* The Trace Record shall resemble the time-order of messages as seen by FEO at | ||
execution time. (Do we need that requirement?) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compatibility reasons a self-descriptive file format seems useful to me. For example, ProtoBuf is not self-descriptive because you always need to know the schema from somewhere else. In my experience it is tough to establish the necessary discipline to ensure that in large projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add that, but I don't know if that is technically possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly does conflict with other needs like keeping the trace files small.
JSON and XML are the most well-known self-descriptive data formats but they are not suitable here. There are binary alternatives.
information is available in the Trace Record, shall be restored to that state. | ||
* All output interfaces shall publish in reprocessing such that a **reprocessed | ||
trace** can be recorded. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some visualization tools support "live" use. For example, feeding virtual data into the SUR. Currently, I don't see any danger that we violate it but maybe making that an explicit requirement could become useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You refer to tooling visualizing the live output of the SUR, right? Since the requirements states that the usual output interfaces shall be active also in reprocessing, I would conclude, that any for-deployment tooling (i.e., in-car tooling) also can be connected to a reprocessing session. Does that work for you?
* Activities (replace with the correct term from FEO denoting an **execution | ||
unit**) shall be able and required to register (with a recording daemon) their | ||
internal state needed to reestablish its state for initialization for | ||
reprocessing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to leave the serialization format unspecified?
Somehow it should be ensured that IPC communication and recording format will stay consistent. Otherwise trace recording might eventually require a separate serialization and that would probably a prohibitive performance cost at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional serialization costs are to be avoided. Maybe we need to further discuss how to streamline that in the best possible way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we could do no serialization at all at runtime and shift all of that to some post-processing.
|record| | ||
|
||
* **Input:** The blue box depicts all parts of the larger system which generates | ||
inputs consumed by the SUR via its input interfaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly are the "input interfaces"? Naively it would just be communication / IPC.
However, there are other inputs which are relevant for determinism: What about time, file systems, ConfigProvider, floating point accuracy, shared memory, GPU scheduling, CPU info, GPIO, POSIX signal handlers, scheduling jitter, build configuration, Key-Value-Storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are trade-offs to be made. Let's discuss.
inputs, executes on them, and generates its outputs. While executing, the SUR | ||
might acquire internal state. | ||
* **Recorder:** The Recorder is a function able to read inputs, outputs, and | ||
possibly the internal state of the SUR and writes them time-ordered to a file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we want to record internal state, in practice it might mean "not really all" internal state. Developers often need the ability to control what internal state is recorded and what is not. Of course, that requires a careful assessment to stay deterministic.
The essential question is the default. Record everything by default (risking performance) or only record explicitly marked things (risking replay correctness).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further down there is the requirement
Activities shall be able and required to register (with a recording daemon) their internal state needed to reestablish its state for initialization for reprocessing.
As you say, it's specific to the activity what needs recording. I would leave that up to the developers of that activity.
I'm not really familiar with S-CORE bureaucracy, but since this pull request isn't merged yet, it missed some kind of deadline for feature proposals, didn't it? |
Has this PR something todo with the Feature Request here: #1167? |
Added to the feature request. @johannes-esr can you please take a look |
No description provided.