Conversation

@franciscovalentecastro
Contributor

@franciscovalentecastro franciscovalentecastro commented Oct 24, 2025

Description

Implement LoggingProcessorParseMultilineRegex and LoggingProcessorParseRegexComplex in Otel Logging.

Details

  • Fixed the saphana receiver's incorrect use of type "int", which should be "integer".
  • Created the logging-otel-receiver_kafka confgenerator test to validate the resulting config.
  • Updated many relevant transformation test goldens.

Related issue

b/440599473

How has this been tested?

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

@franciscovalentecastro franciscovalentecastro force-pushed the fcovalente-parse-multiline-regex branch from 3f4322a to 68b4a04 Compare November 11, 2025 18:54
@franciscovalentecastro franciscovalentecastro requested review from a team, hsmatulis and ridwanmsharif and removed request for a team and hsmatulis November 11, 2025 22:32
@franciscovalentecastro franciscovalentecastro force-pushed the fcovalente-parse-multiline-regex branch from 367d8fa to 6ac6e16 Compare November 13, 2025 01:51
@franciscovalentecastro franciscovalentecastro requested review from a team, avilevy18, jefferbrecht and quentinmit and removed request for a team, avilevy18 and ridwanmsharif November 13, 2025 16:30
@franciscovalentecastro franciscovalentecastro added the kokoro:force-run label (Forces kokoro to run integration tests on a CL) Nov 13, 2025
@stackdriver-instrumentation-release stackdriver-instrumentation-release removed the kokoro:force-run label (Forces kokoro to run integration tests on a CL) Nov 13, 2025

var exprParts []string
for _, r := range isFirstEntry {
	exprParts = append(exprParts, fmt.Sprintf("body.message matches %q", r))
Member

Why not build exprParts directly in the first loop and eliminate isFirstEntry altogether?

Contributor Author

@franciscovalentecastro franciscovalentecastro Nov 13, 2025

No real reason. The other parse_multiline PR has a more complicated expression setup, so it made more sense to build it in steps there. I simplified it here. Done!
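
For reference, a minimal sketch of the single-loop shape suggested above. It is not the code in this PR: the `MultilineRule` struct is redeclared locally with only the fields visible in this thread, `firstEntryExpr` is a hypothetical helper name, and joining the parts with " or " is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// MultilineRule is redeclared here so the sketch is self-contained.
// It only carries the fields that appear in this thread.
type MultilineRule struct {
	StateName string
	NextState string
	Regex     string
}

// firstEntryExpr (hypothetical name) builds the expression parts in a single
// pass, without an intermediate isFirstEntry slice.
// Assumption: the parts are combined with " or ".
func firstEntryExpr(rules []MultilineRule) string {
	var exprParts []string
	for _, r := range rules {
		if r.StateName == "start_state" {
			exprParts = append(exprParts, fmt.Sprintf("body.message matches %q", r.Regex))
		}
	}
	return strings.Join(exprParts, " or ")
}

func main() {
	rules := []MultilineRule{
		{StateName: "start_state", NextState: "cont", Regex: `^\d{4}-\d{2}-\d{2}`},
		{StateName: "cont", NextState: "cont", Regex: `^(?!\d{4}-\d{2}-\d{2})`},
	}
	fmt.Println(firstEntryExpr(rules))
}
```

Only rules whose StateName is "start_state" contribute to the expression; every other rule is skipped, which matches the behavior discussed in the next thread.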

// TODO: b/459877163 - Update implementation when opentelemetry supports "state-machine" multiline parsing.
if r.StateName == "start_state" {
	isFirstEntry = append(isFirstEntry, r.Regex)
}
Member

I assume that by ignoring some states, there will be some possible log inputs that won't parse properly as multiline. (If that's not true, then we should be able to refactor the receivers to only have start_state.)

How do you want to approach testing for that gap? E.g. will you add/change transformation tests later along with b/459877163 to validate whichever edge cases wouldn't work today without a full state machine?

Contributor Author

All current uses of LoggingProcessorParseMultilineRegex [1] set only two states, start_state and cont_state, in a simplified manner: start_state is the first-log-line regex and cont_state is the negation of that same regex. (Note: I have just double-checked [1]; git grep -C 10 "start_state" also helps.)

ops-agent/apps/solr.go

Lines 84 to 94 in ccfedc9

Rules: []confgenerator.MultilineRule{
	{
		StateName: "start_state",
		NextState: "cont",
		Regex:     `^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s[A-z]+\s{1,5}`,
	},
	{
		StateName: "cont",
		NextState: "cont",
		Regex:     `^(?!\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s[A-z]+\s{1,5})`,
	},

This implies all current uses of LoggingProcessorParseMultilineRegex can be fully replicated by only setting "is_first_entry" in otel logging.

Proposed Refactor

We could refactor LoggingProcessorParseMultilineRegex so that only a start_state can be set, and then derive cont_state programmatically as the "negation" of start_state. This would enforce the simplified use of the multiline features.
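
A rough sketch of what deriving the continuation regex could look like, assuming the negation follows the same negative-lookahead pattern as the solr rules quoted above. `contRegexFor` is a hypothetical name and is not taken from the draft refactor.

```go
package main

import (
	"fmt"
	"strings"
)

// contRegexFor (hypothetical helper) wraps the start regex in a negative
// lookahead anchored at the beginning of the line, mirroring the solr rules.
// Assumption: the start regex is a single pattern with at most one leading "^".
func contRegexFor(startRegex string) string {
	body := strings.TrimPrefix(startRegex, "^")
	return "^(?!" + body + ")"
}

func main() {
	start := `^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\s[A-z]+\s{1,5}`
	fmt.Println(contRegexFor(start))
}
```

Note that Go's regexp package does not support lookaheads, so the derived string is only meant for the engine that consumes the multiline rules; it cannot be compiled with regexp.MustCompile.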

How do you want to approach testing for that gap? E.g. will you add/change transformation tests later along with b/459877163 to validate whichever edge cases wouldn't work today without a full state machine?

Re @jefferbrecht

It depends. What do you think of the Proposed Refactor?

If we refactor LoggingProcessorParseMultilineRegex to only set a start_state, then there won't be any feature gaps and the 3P app receiver tests are good enough for this.

If we don't refactor it, creating a "transformation_test" would be artificial, since we would need a "test processor" that uses all the features of the "state-machine". A "transformation_test" needs a "registered" processor that can be added to a pipeline.

Footnotes

  1. https://github.com/search?q=repo%3AGoogleCloudPlatform%2Fops-agent%20LoggingProcessorParseMultilineRegex&type=code

Contributor Author

@franciscovalentecastro franciscovalentecastro Nov 13, 2025

NVM, not "all" 3P app receiver implementations use the simplified set of start_state and cont_state. However, only mysql_slow and elasticsearch_json use a more complicated (though not by much) set of regexes.
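
One way such exceptions could be spotted mechanically is a small shape check over a rule set. This is only a sketch: `MultilineRule` is redeclared locally, `isSimplified` is a hypothetical helper, and the three-state example in main is invented for illustration and is not the actual mysql_slow or elasticsearch_json configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// MultilineRule is redeclared here so the sketch is self-contained.
type MultilineRule struct {
	StateName string
	NextState string
	Regex     string
}

// isSimplified (hypothetical helper) reports whether rules consist of exactly
// a start_state rule followed by one self-looping continuation rule whose
// regex is the negative lookahead of the start regex.
func isSimplified(rules []MultilineRule) bool {
	if len(rules) != 2 {
		return false
	}
	start, cont := rules[0], rules[1]
	if start.StateName != "start_state" || start.NextState != cont.StateName {
		return false
	}
	if cont.NextState != cont.StateName {
		return false
	}
	return cont.Regex == "^(?!"+strings.TrimPrefix(start.Regex, "^")+")"
}

func main() {
	simple := []MultilineRule{
		{StateName: "start_state", NextState: "cont", Regex: `^\d{4}`},
		{StateName: "cont", NextState: "cont", Regex: `^(?!\d{4})`},
	}
	// Invented three-state rule set, not taken from any real receiver.
	complex := []MultilineRule{
		{StateName: "start_state", NextState: "header", Regex: `^# Time`},
		{StateName: "header", NextState: "body", Regex: `^# User`},
		{StateName: "body", NextState: "body", Regex: `^[^#]`},
	}
	fmt.Println(isSimplified(simple), isSimplified(complex))
}
```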

How do you want to approach testing for that gap? E.g. will you add/change transformation tests later along with b/459877163 to validate whichever edge cases wouldn't work today without a full state machine?

The existing mysql_slow and elasticsearch_json tests can serve as the comparison point for behavior that relies on the full "state-machine" features.

See draft refactor : dc991f6#diff-7def08d2dee0c2606af18bb82d03c649a7ece83d1fb913fcfb800c6558eb942e

@franciscovalentecastro franciscovalentecastro force-pushed the fcovalente-parse-multiline-regex branch from 6ac6e16 to 67d3e0e Compare November 13, 2025 19:35
4 participants