
POC JSONTransformWriter

What does this MR do and why?

This MR introduces JSONTransformWriter, an io.Writer that performs context-aware replacements in streaming JSON data. It identifies a specific JSON key and replaces occurrences of a from value with a to value only when they appear in the value of that key.

The writer handles JSON with arbitrary whitespace formatting and correctly distinguishes between keys and values, avoiding false matches within string values or comments.

Example: the writer transforms {"tarball": "https://old.com/package.tgz"} into {"tarball":"https://new.com/package.tgz"}:

tw := jsonstream.NewJSONTransformWriter(w, "old.com", "new.com", `"tarball"`)
defer tw.Close()
io.Copy(tw, reader)
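
To illustrate what "context-aware" means here, the following is a minimal, self-contained sketch of the idea. It is not the implementation from this MR: keyValueReplacer, newKeyValueReplacer, and transform are hypothetical names. The sketch also makes simplifying assumptions: it buffers each string token whole, passes whitespace through unchanged, only rewrites direct string values of the key, and matches the from value against the raw (still escaped) string bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// keyValueReplacer is a hypothetical, simplified stand-in for the writer
// described above. It is NOT the implementation from this MR.
type keyValueReplacer struct {
	dst        io.Writer
	from, to   []byte
	key        []byte // quoted key, for example `"tarball"`
	str        []byte // raw bytes of the string token being read
	lastString []byte // last completed string token, a candidate key
	// inString: inside a string; escaped: previous byte was a backslash;
	// isValue: the next string token is the value of key.
	inString, escaped, isValue bool
}

func newKeyValueReplacer(dst io.Writer, from, to, key string) *keyValueReplacer {
	return &keyValueReplacer{dst: dst, from: []byte(from), to: []byte(to), key: []byte(key)}
}

// Write passes bytes outside strings through unchanged. String tokens are held
// back until their closing quote arrives, so a token may span several calls.
func (w *keyValueReplacer) Write(p []byte) (int, error) {
	var out bytes.Buffer
	for _, b := range p {
		if w.inString {
			w.str = append(w.str, b)
			switch {
			case w.escaped:
				w.escaped = false
			case b == '\\':
				w.escaped = true
			case b == '"':
				w.finishString(&out)
			}
			continue
		}
		switch b {
		case '"':
			w.inString = true
			w.str = append(w.str[:0], b)
			continue // held back until the token is complete
		case ':':
			// A string followed by a colon is a key.
			w.isValue = bytes.Equal(w.lastString, w.key)
		case ' ', '\t', '\n', '\r': // whitespace keeps the current context
		default:
			w.isValue = false
			w.lastString = w.lastString[:0]
		}
		out.WriteByte(b)
	}
	_, err := w.dst.Write(out.Bytes())
	return len(p), err
}

// finishString emits the completed token, rewritten if it is the target value.
func (w *keyValueReplacer) finishString(out *bytes.Buffer) {
	token := w.str
	if w.isValue {
		token = bytes.ReplaceAll(token, w.from, w.to)
	}
	out.Write(token)
	w.lastString = append(w.lastString[:0], w.str...)
	w.str = w.str[:0]
	w.inString, w.isValue = false, false
}

// Close flushes an unterminated string token, if any (truncated input).
func (w *keyValueReplacer) Close() error {
	_, err := w.dst.Write(w.str)
	w.str = nil
	return err
}

// transform feeds in to the writer one byte per Write call, so every
// multi-byte token spans several calls.
func transform(in, from, to, key string) (string, error) {
	var sb strings.Builder
	w := newKeyValueReplacer(&sb, from, to, key)
	_, err := io.Copy(w, iotest.OneByteReader(strings.NewReader(in)))
	if err == nil {
		err = w.Close()
	}
	return sb.String(), err
}

func main() {
	in := `{"note": "old.com", "tarball" : "https://old.com/package.tgz"}`
	out, err := transform(in, "old.com", "new.com", `"tarball"`)
	fmt.Println(out, err)
}
```

In this sketch only the "tarball" value is rewritten. The "note" value contains the same text but is left alone, because the string that precedes its colon is a different key.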

References

NPM virtual registry: investigate metadata endp... (#549781)

Screenshots or screen recordings

No.

How to set up and validate locally

To validate this, we can transform any JSON file that is proxied through Workhorse.

In my case, I have an npm metadata cache entry with a JSON file that I want to transform.

File content (printed as a Ruby hash)
{"name"=>"@packages/sample",
 "versions"=>
  {"7.0.0"=>
    {"dist"=>{"shasum"=>"b7fe281f2dfc22a653b4cd34dd9c54906642fb26", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-7.0.0.tgz"},
     "name"=>"@packages/sample",
     "version"=>"7.0.0"},
   "5.0.0"=>
    {"dist"=>{"shasum"=>"60cfe19157b0018dc4380648ad3bfc7ff1cc192e", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-5.0.0.tgz"},
     "name"=>"@packages/sample",
     "version"=>"5.0.0"},
   "6.0.0"=>
    {"dist"=>{"shasum"=>"6e132459ea1f11ec3db2b4acca4e35d95070d09a", "tarball"=>"http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-6.0.0.tgz"},
     "name"=>"@packages/sample",
     "version"=>"6.0.0"}},
 "dist-tags"=>{"latest"=>"7.0.0"}}

I'll tell Workhorse to replace the host http://gdk.test:3000 with https://registry.npmjs.org when it proxies the response.

workhorse diff
diff --git a/workhorse/internal/sendurl/sendurl.go b/workhorse/internal/sendurl/sendurl.go
index f372b8544d74..abd49ecf4ba7 100644
--- a/workhorse/internal/sendurl/sendurl.go
+++ b/workhorse/internal/sendurl/sendurl.go
@@ -15,6 +15,7 @@ import (
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
 
+       "gitlab.com/gitlab-org/gitlab/workhorse/internal/jsonstream"
        "gitlab.com/gitlab-org/gitlab/workhorse/internal/metrics"
 
        "gitlab.com/gitlab-org/labkit/mask"
@@ -149,6 +150,9 @@ func (e *entry) Inject(w http.ResponseWriter, r *http.Request, sendData string)
                return
        }
        params.RestrictForwardedResponseHeaders.ForwardResponseHeaders(w, resp, preserveHeaderKeys, params.ResponseHeaders)
+       // Remove Content-Length header since we're transforming the response
+       // and the size will change
+       w.Header().Del("Content-Length")
        w.WriteHeader(resp.StatusCode)
 
        defer func() {
@@ -212,7 +216,14 @@ func (e *entry) handleRequestError(w http.ResponseWriter, r *http.Request, err e
 }
 
 func (e *entry) streamResponse(w http.ResponseWriter, body io.Reader) error {
-       n, err := io.Copy(newFlushingResponseWriter(w), body)
+       // Create the flushing writer for immediate client delivery
+       fw := newFlushingResponseWriter(w)
+
+       // Wrap with transformer to modify tarball URLs
+       transformer := jsonstream.NewJSONTransformWriter(fw, "http://gdk.test:3000", "https://registry.npmjs.org", `"tarball"`)
+       defer transformer.Close()
+
+       n, err := io.Copy(transformer, body)
        sendURLBytes.Add(float64(n))
        return err
 }
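
The reason for dropping Content-Length in the diff above can be reproduced in isolation. The sketch below is not Workhorse code: proxyHandler and fetch are hypothetical helpers, the sample body is made up, and strings.ReplaceAll stands in for the transform writer. It relies on net/http rejecting writes that exceed a declared Content-Length (http.ErrContentLength), and it starts a local test server with httptest.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

const (
	upstreamBody = `{"tarball":"http://gdk.test:3000/sample-7.0.0.tgz"}`
	fromHost     = "http://gdk.test:3000"
	toHost       = "https://registry.npmjs.org"
)

// proxyHandler mimics a proxy that forwards the upstream Content-Length and
// then rewrites the body. When dropLength is true, the header is deleted
// before the status line is written. The handler's write error goes to errs.
func proxyHandler(dropLength bool, errs chan<- error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Length", strconv.Itoa(len(upstreamBody)))
		if dropLength {
			w.Header().Del("Content-Length")
		}
		w.WriteHeader(http.StatusOK)
		_, err := io.WriteString(w, strings.ReplaceAll(upstreamBody, fromHost, toHost))
		errs <- err
	}
}

// fetch performs one request and returns the body the client managed to read
// together with the write error observed inside the handler.
func fetch(dropLength bool) (string, error) {
	errs := make(chan error, 1)
	srv := httptest.NewServer(proxyHandler(dropLength, errs))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body) // a read error is expected in the stale case
	return string(body), <-errs
}

func main() {
	for _, drop := range []bool{false, true} {
		body, err := fetch(drop)
		fmt.Printf("dropLength=%v body=%q handler error=%v\n", drop, body, err)
	}
}
```

With the stale header in place, the rewritten body is longer than the declared length, so the handler's write fails and the client does not receive the full document. With the header removed, the server is free to compute the length itself or to use chunked encoding.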

The last thing I need to do is to have proxy_download enabled in the object store configuration:

object_store:
  proxy_download: true

Then I trigger the HTTP request to get the metadata:

$ curl -vv --header "PRIVATE-TOKEN: <Token>" http://gdk.test:3000/api/v4/projects/19/packages/npm/@packages%2Fsample

The host in each tarball URL should be updated:

Updated response body
{
  "dist-tags": {
    "latest": "7.0.0"
  },
  "name": "@packages/sample",
  "versions": {
    "5.0.0": {
      "dist": {
        "shasum": "60cfe19157b0018dc4380648ad3bfc7ff1cc192e",
        "tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-5.0.0.tgz"
      },
      "name": "@packages/sample",
      "version": "5.0.0"
    },
    "6.0.0": {
      "dist": {
        "shasum": "6e132459ea1f11ec3db2b4acca4e35d95070d09a",
        "tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-6.0.0.tgz"
      },
      "name": "@packages/sample",
      "version": "6.0.0"
    },
    "7.0.0": {
      "dist": {
        "shasum": "b7fe281f2dfc22a653b4cd34dd9c54906642fb26",
        "tarball": "https://registry.npmjs.org:3000/api/v4/projects/19/packages/npm/@packages/sample/-/@packages/sample-7.0.0.tgz"
      },
      "name": "@packages/sample",
      "version": "7.0.0"
    }
  }
}
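
Instead of checking the response by eye, a small program can decode the body and list the versions whose tarball URL does not start with the expected host. This is only a sketch: metadata and staleTarballs are hypothetical names, the sample body in main is made up, and only the fields used for the check are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// metadata models only the fields needed for the check (hypothetical type).
type metadata struct {
	Versions map[string]struct {
		Dist struct {
			Tarball string `json:"tarball"`
		} `json:"dist"`
	} `json:"versions"`
}

// staleTarballs returns, in sorted order, the versions whose tarball URL
// does not start with wantPrefix.
func staleTarballs(body []byte, wantPrefix string) ([]string, error) {
	var m metadata
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	var stale []string
	for version, entry := range m.Versions {
		if !strings.HasPrefix(entry.Dist.Tarball, wantPrefix) {
			stale = append(stale, version)
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	// Made-up sample: one rewritten URL and one that still has the old host.
	body := []byte(`{"versions": {
		"5.0.0": {"dist": {"tarball": "https://registry.npmjs.org/sample-5.0.0.tgz"}},
		"6.0.0": {"dist": {"tarball": "http://gdk.test:3000/sample-6.0.0.tgz"}}
	}}`)
	stale, err := staleTarballs(body, "https://registry.npmjs.org")
	fmt.Println(stale, err)
}
```

An empty result means every tarball URL in the response was rewritten.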

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #549781

Edited by Dzmitry (Dima) Meshcharakou
