POC JSONTransformWriter
What does this MR do and why?
This MR introduces JSONTransformWriter, an io.Writer that performs
context-aware replacements in streaming JSON data. It identifies a specific
JSON key and replaces occurrences of a from value with a to value only
when they appear as the value for that key.
The writer handles JSON with arbitrary whitespace formatting and correctly
distinguishes between keys and values, avoiding false matches within string
values or comments.
Example that transforms
{"tarball": "https://old.com/package.tgz"}
into
{"tarball":"https://new.com/package.tgz"}:
tw := jsonstream.NewJSONTransformWriter(w, "old.com", "new.com", `"tarball"`)
defer tw.Close()
io.Copy(tw, reader)
References
NPM virtual registry: investigate metadata endp... (#549781)
Screenshots or screen recordings
No.
How to set up and validate locally
Any JSON file that is proxied through Workhorse can be used to try the transformation.
In my case, I have an npm metadata cache entry with a JSON file that I want to transform.
File content
{"name"=>"@packages/sample",
"versions"=>
{"7.0.0"=>
{"dist"=>{"shasum"=>"b7fe281f2dfc22a653b4cd34dd9c54906642fb26", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-7.0.0.tgz"},
"name"=>"@packages/sample",
"version"=>"7.0.0"},
"5.0.0"=>
{"dist"=>{"shasum"=>"60cfe19157b0018dc4380648ad3bfc7ff1cc192e", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-5.0.0.tgz"},
"name"=>"@packages/sample",
"version"=>"5.0.0"},
"6.0.0"=>
{"dist"=>{"shasum"=>"6e132459ea1f11ec3db2b4acca4e35d95070d09a", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-6.0.0.tgz"},
"name"=>"@packages/sample",
"version"=>"6.0.0"}},
"dist-tags"=>{"latest"=>"7.0.0"}}
I'll tell Workhorse to replace the host http://gdk.test:3000 with https://registry.npmjs.org when it proxies the response.
Workhorse diff
diff --git a/workhorse/internal/sendurl/sendurl.go b/workhorse/internal/sendurl/sendurl.go
index f372b8544d74..abd49ecf4ba7 100644
--- a/workhorse/internal/sendurl/sendurl.go
+++ b/workhorse/internal/sendurl/sendurl.go
@@ -15,6 +15,7 @@ import (
 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/prometheus/client_golang/prometheus/promauto"
+	"gitlab.com/gitlab-org/gitlab/workhorse/internal/jsonstream"
 	"gitlab.com/gitlab-org/gitlab/workhorse/internal/metrics"
 	"gitlab.com/gitlab-org/labkit/mask"
@@ -149,6 +150,9 @@ func (e *entry) Inject(w http.ResponseWriter, r *http.Request, sendData string)
 		return
 	}
 	params.RestrictForwardedResponseHeaders.ForwardResponseHeaders(w, resp, preserveHeaderKeys, params.ResponseHeaders)
+	// Remove Content-Length header since we're transforming the response
+	// and the size will change
+	w.Header().Del("Content-Length")
 	w.WriteHeader(resp.StatusCode)
 	defer func() {
@@ -212,7 +216,14 @@ func (e *entry) handleRequestError(w http.ResponseWriter, r *http.Request, err e
 }
 func (e *entry) streamResponse(w http.ResponseWriter, body io.Reader) error {
-	n, err := io.Copy(newFlushingResponseWriter(w), body)
+	// Create the flushing writer for immediate client delivery
+	fw := newFlushingResponseWriter(w)
+
+	// Wrap with transformer to modify tarball URLs
+	transformer := jsonstream.NewJSONTransformWriter(fw, "http://gdk.test:3000", "https://registry.npmjs.org", `"tarball"`)
+	defer transformer.Close()
+
+	n, err := io.Copy(transformer, body)
 	sendURLBytes.Add(float64(n))
 	return err
 }
The last thing I need to do is to have proxy_download enabled in the object store configuration:
object_store:
  proxy_download: true
Then I trigger the HTTP request to get the metadata:
$ curl -vv --header "PRIVATE-TOKEN: <Token>" http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages%2Fsample
The host in each tarball URL should be updated:
Updated response body
{
"dist-tags": {
"latest": "7.0.0"
},
"name": "@packages/sample",
"versions": {
"5.0.0": {
"dist": {
"shasum": "60cfe19157b0018dc4380648ad3bfc7ff1cc192e",
"tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-5.0.0.tgz"
},
"name": "@packages/sample",
"version": "5.0.0"
},
"6.0.0": {
"dist": {
"shasum": "6e132459ea1f11ec3db2b4acca4e35d95070d09a",
"tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-6.0.0.tgz"
},
"name": "@packages/sample",
"version": "6.0.0"
},
"7.0.0": {
"dist": {
"shasum": "b7fe281f2dfc22a653b4cd34dd9c54906642fb26",
"tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-7.0.0.tgz"
},
"name": "@packages/sample",
"version": "7.0.0"
}
}
}
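Instead of checking the response by eye, a small helper can list the scheme and host of every tarball URL. This is only a sketch for local validation: the metadata struct and tarballHosts are hypothetical names, and the struct models just the fields visible in the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// metadata models only the parts of the npm metadata document used here.
type metadata struct {
	Versions map[string]struct {
		Dist struct {
			Tarball string `json:"tarball"`
		} `json:"dist"`
	} `json:"versions"`
}

// tarballHosts returns "scheme://host" of the tarball URL for each version.
func tarballHosts(doc []byte) (map[string]string, error) {
	var m metadata
	if err := json.Unmarshal(doc, &m); err != nil {
		return nil, err
	}
	hosts := make(map[string]string, len(m.Versions))
	for version, v := range m.Versions {
		u, err := url.Parse(v.Dist.Tarball)
		if err != nil {
			return nil, fmt.Errorf("version %s: %w", version, err)
		}
		hosts[version] = u.Scheme + "://" + u.Host
	}
	return hosts, nil
}

func main() {
	// A shortened, made-up document in the same shape as the response above.
	doc := []byte(`{"versions":{"7.0.0":{"dist":{"tarball":"https://registry.npmjs.org/@packages/sample/-/@packages/sample-7.0.0.tgz"}}}}`)
	hosts, err := tarballHosts(doc)
	if err != nil {
		panic(err)
	}
	for version, host := range hosts {
		fmt.Println(version, host)
	}
}
```

Piping the curl output into a program like this makes it easy to spot any version whose tarball URL still points at the original host.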
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #549781