Description
Recently I added more monitoring to the software I am developing and noticed the following sequence of errors in the storage module while listing a bucket:
Logs:
2024-02-05 07:54:58 UTC [ERROR] a transient error has happened: Get "https://storage.googleapis.com/storage/v1/...&versions=false": read tcp IP_A:49888->IP_B:443: read: connection reset by peer
2024-02-05 07:54:58 UTC [ERROR] a transient error has happened: Get "https://storage.googleapis.com/storage/v1/...&versions=false": read tcp IP_A:49888->IP_B:443: read: connection reset by peer
// here the error is classified as non-transient and is unpacked:
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<*net.OpError Value>", value: "read tcp IP_A:49888->IP_B:443: read: connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<*os.SyscallError Value>", value: "read: connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<syscall.Errno Value>", value: "connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] error is not unpackable
// the error is propagated further
2024-02-05 07:54:58 UTC [ERROR] ... error while listing the source bucket: error while listing 'gs://...': read tcp IP_A:49888->IP_B:443: read: connection reset by peer
Google client creation
client, err := storage.NewClient(ctx, options...)
// ...
client.SetRetry(storage.WithPolicy(storage.RetryAlways))
client.SetRetry(storage.WithErrorFunc(customErrorFunc))
c = &CloudClient{client: client}
_, err := c.ListPrefix(...)
if err != nil {
	return fmt.Errorf("error while listing %q: %w", ..., err)
}
Code for custom retry function:
// unpackError logs the type and message of err, then recurses into the
// wrapped error (if any) until the chain ends.
func unpackError(err error) {
	if err == nil {
		return
	}
	rValue := reflect.ValueOf(err)
	logger.Infof("type: %q, value: %q", rValue.String(), err.Error())
	if e, ok := err.(interface{ Unwrap() error }); ok {
		unpackError(e.Unwrap())
	} else {
		logger.Infoln("error is not unpackable")
	}
}

// customErrorFunc defers to storage.ShouldRetry and logs the outcome.
func customErrorFunc(err error) bool {
	isRetry := storage.ShouldRetry(err)
	if isRetry {
		logger.Error(TransientErrorf("a transient error has happened: %v", err))
	} else {
		unpackError(err)
	}
	return isRetry
}
Code for listing:
func (c CloudClient) ListPrefix(ctx context.Context, location *Location, timeout time.Duration) ([]File, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	query := &storage.Query{
		Prefix:      strings.TrimRight(location.Path, "/") + "/",
		Versions:    false,
		StartOffset: "",
		EndOffset:   "",
		Projection:  0,
	}
	// Fetch only the attributes that are actually used.
	err := query.SetAttrSelection([]string{"Bucket", "Name", "Size", "MD5", "Updated"})
	if err != nil {
		logger.Fatalf("%v", err)
	}
	iter := c.client.Bucket(location.Bucket).Objects(ctx, query)
	var files []File
	for {
		next, err := iter.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return nil, fmt.Errorf("error while listing '%s': %w", location, err)
		}
		files = append(files, extractFile(next))
	}
	return files, nil
}
I have read the documentation, which states that in the case of "ECONNRESET" only a url.Error is considered a transient error. Here the chain starts with a *net.OpError instead, so I suppose the error is raised while reading the response (this is my assumption). I have also seen a similar (but not identical) discussion here: Azure/go-autorest#450. There is also a link to a Go standard library test, but it doesn't clarify things: https://go.dev/src/net/net_test.go.
What is the goal of my issue?
From my perspective (and from a high-level view of this functionality), any error related to "ECONNRESET" should always be considered retryable. Could you explain whether this makes sense or if perhaps I am mistaken?