storage: Why is "ECONNRESET on read" not a transient error? #9478

@0rps

I recently added more monitoring to the software I am developing and noticed the following sequence of errors in the storage module while listing a bucket:

Logs:

2024-02-05 07:54:58 UTC [ERROR] a transient error has happened: Get "https://storage.googleapis.com/storage/v1/...&versions=false": read tcp IP_A:49888->IP_B:443: read: connection reset by peer
2024-02-05 07:54:58 UTC [ERROR] a transient error has happened: Get "https://storage.googleapis.com/storage/v1/...&versions=false": read tcp IP_A:49888->IP_B:443: read: connection reset by peer

// here the same error is classified as non-transient, so it is unpacked:
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<*net.OpError Value>", value: "read tcp IP_A:49888->IP_B:443: read: connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<*os.SyscallError Value>", value: "read: connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] type: "<syscall.Errno Value>", value: "connection reset by peer"
2024-02-05 07:54:58 UTC [INFO] [component:storage] error is not unpackable

// the error is propagated further 
2024-02-05 07:54:58 UTC [ERROR] ... error while listing the source bucket: error while listing 'gs://...': read tcp IP_A:49888->IP_B:443: read: connection reset by peer

Google client creation

client, err := storage.NewClient(ctx, options...)
// ...
client.SetRetry(storage.WithPolicy(storage.RetryAlways))
client.SetRetry(storage.WithErrorFunc(customErrorFunc))
c = &CloudClient{client: client}
_, err = c.ListPrefix(...)
if err != nil {
  return fmt.Errorf("error while listing %q: %w", ..., err)
}

Code for custom retry function:

// unpackError logs the type and message of err and of every error it wraps.
func unpackError(err error) {
    if err == nil {
        return
    }
    rValue := reflect.ValueOf(err)
    logger.Infof("type: %q, value: %q", rValue.String(), err.Error())

    if e, ok := err.(interface{ Unwrap() error }); ok {
        unpackError(e.Unwrap())
    } else {
        logger.Infoln("error is not unpackable")
    }
}

// customErrorFunc defers to the library's default retry decision and logs the outcome.
func customErrorFunc(err error) bool {
    isRetry := storage.ShouldRetry(err)
    if isRetry {
        logger.Error(TransientErrorf("a transient error has happened: %v", err))
    } else {
        // Not retried: log the full error chain for diagnosis.
        unpackError(err)
    }
    return isRetry
}

Code for listing:

// ListPrefix lists all objects under location.Path within the given timeout.
func (c CloudClient) ListPrefix(ctx context.Context, location *Location, timeout time.Duration) ([]File, error) {
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    query := &storage.Query{
        Prefix:      strings.TrimRight(location.Path, "/") + "/",
        Versions:    false,
        StartOffset: "",
        EndOffset:   "",
        Projection:  0,
    }

    // Fetch only the attributes that are needed.
    err := query.SetAttrSelection([]string{"Bucket", "Name", "Size", "MD5", "Updated"})
    if err != nil {
        logger.Fatalf("%v", err)
    }

    iter := c.client.Bucket(location.Bucket).Objects(ctx, query)
    var files []File

    for {
        next, err := iter.Next()
        if err == iterator.Done {
            break
        }

        if err != nil {
            // This is where the non-retried ECONNRESET surfaces.
            return nil, fmt.Errorf("error while listing '%s': %w", location, err)
        }
        files = append(files, extractFile(next))
    }

    return files, nil
}

I have read the documentation, which states that in the case of "ECONNRESET" only a url.Error is considered a transient error. My assumption is that the error in my case is raised while reading. I have also seen a similar (but not identical) discussion in Azure/go-autorest#450. There is also a related test in the Go standard library, but it doesn't clarify things: https://go.dev/src/net/net_test.go.

What is the goal of my issue?
From my perspective (and from a high-level view of this functionality), any error related to "ECONNRESET" should always be considered retryable. Could you explain whether this makes sense, or whether I am mistaken?

Labels: api: storage (Issues related to the Cloud Storage API.), type: question (Request for information or clarification. Not an issue.)
