
Conversation


@ghost ghost commented Sep 18, 2021

No description provided.

trevorbenson added 30 commits September 15, 2021 14:58
…r s3a filesystem access, updates to config template to handle srebuild changes to arc driver paths
…yle. Documentation mentions arc_protection requirement.
trevorbenson added 16 commits September 20, 2021 16:41
…ter PR merged, root README.md should have an example, and SOFS_FSCK scripts need to be updated to set from config.
trevorbenson added 5 commits September 29, 2021 15:43
    srebuild_single_path => srebuild_arc_path,
    srebuild_double_path => srebuild_arcdata_path.
    COS defined in the config-template.yml.

revlookupid() relocated into p4, reduction of stored data and total lookups.
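As a reference for the renames above, a minimal sketch of how the new keys might be read follows. Only the names srebuild_arc_path, srebuild_arcdata_path, and arc_protection come from the commit messages and docs note; the file layout, paths, and values are assumptions.

# Hypothetical sketch only: key names are taken from the commit messages above,
# but the surrounding layout of config-template.yml is an assumption.
import yaml

with open("config-template.yml") as fh:
    cfg = yaml.safe_load(fh)

arc_path = cfg["srebuild_arc_path"]          # formerly srebuild_single_path
arcdata_path = cfg["srebuild_arcdata_path"]  # formerly srebuild_double_path
arc_protection = cfg["arc_protection"]       # ARC class of service (COS); assumed key name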
@ghost ghost requested a review from patrickdos September 30, 2021 00:57

ghost commented Sep 30, 2021

Actually ready for review now. listkeys also now writes via a DataFrame instead of print(), the ARC replica class of service is handled, and there is general cleanup.
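For context, a minimal sketch of the print()-to-DataFrame change described here, assuming a PySpark job; the sample rows, column names, and output path are placeholders, not the actual listkeys code.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("listkeys-sketch").getOrCreate()

# Made-up sample rows purely for illustration; the real job reads keys from the cluster.
raw_keys = [
    "A1B2C3,2,4096,site-a\n",
    "D4E5F6,2,8192,site-b\n",
]

# Old approach: format each row and print it to an open file handle.
# New approach sketched here: collect the selected fields, build a DataFrame,
# and let the cluster write it out.
klist = [tuple(k.rstrip().split(',')[i] for i in [0, 1, 2, 3]) for k in raw_keys]
df = spark.createDataFrame(klist, schema=["key", "cos", "size", "site"])  # column names assumed
df.write.mode("overwrite").csv("s3a://example-bucket/listkeys/")          # placeholder path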

key = nnarc[1]
lst.append(key.upper())
for ooarc in oarc:
    key = oarc[1]

@patrickdos I didn't grok the marc, narc, oarc.

Should line 115 be key = oarc[1] or key = ooarc[1], like the mmarc and nnarc sections?
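For reference, a sketch of what the oarc block would look like if it is meant to mirror the nnarc/mmarc sections (an assumption, pending confirmation):

for ooarc in oarc:
    key = ooarc[1]           # was oarc[1]; ooarc[1] would match the nnarc/mmarc pattern
    lst.append(key.upper())  # lst and oarc come from the surrounding listkeys code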

klist.append([ k.rstrip().split(',')[i] for i in [0,1,2,3] ])
# data = [ k.rstrip().split(',')[i] for i in [0,1,2,3] ]
# data = ",".join(data)
# print >> f, data

Not sure moving the write from print() to dataframe.write() is beneficial for cluster memory requirements.

  1. Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?
  2. Can we partition the dataframe beyond the total of 36 rows?

@ghost (Author) replied:

Not sure moving the write from print() to dataframe.write() is beneficial for cluster memory requirements.

  1. Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?

I only see the cluster write out full files into temp (or at least, at the scale of my lab, I don't see any partially written files). I suspect it does mean an increase in required memory to parse all keys before writing them to storage.

  2. Can we partition the dataframe beyond the total of 36 rows?

A partition count of 180 only appears to write 36 files to _temporary before moving them to the final storage path. So at first glance it seems like the number of partitions is limited by the number of rows contained in the dataframe.
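A minimal sketch of the kind of check behind this observation, assuming a PySpark session named spark; the stand-in 36-row DataFrame and the output path are placeholders.

# Stand-in 36-row DataFrame, mirroring the 36-row listkeys result described above.
df36 = spark.range(36).toDF("id")
print(df36.rdd.getNumPartitions())    # partition count before repartitioning

df180 = df36.repartition(180)
print(df180.rdd.getNumPartitions())   # reports 180 partitions

# 36 rows can occupy at most 36 of the 180 partitions; in the lab run described
# above, only 36 part files showed up under _temporary before the move to the
# final storage path, consistent with the empty partitions producing no output.
df180.write.mode("overwrite").csv("/tmp/listkeys-partition-test")  # placeholder path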

@ghost ghost marked this pull request as ready for review October 1, 2021 14:37

@patrickdos patrickdos left a comment


LGTM

@ghost ghost merged commit fa2cf1a into master Oct 1, 2021
@ghost ghost deleted the improvement/s3-bucket-storage branch October 1, 2021 15:26
This pull request was closed.

Labels: enhancement (New feature or request)

