improvement/s3-bucket-storage #1
Conversation
…e spark s3a filsystem access.
…r s3a filesystem access, updates to config template to handle srebiuldd changes to arc driver paths
…yle. Documentation mentions arc_protection requirement.
…te space consumed by non indexed keys.
…ecution commands.
…ter PR merged root README.md should have example as well as SOFS_FSCK scripts need to be updated to set from config.
… the bucket without storing it first.
srebuild_single_path => srebuild_arc_path, srebuild_double_path => srebuild_arcdata_path. COS defined in the config-template.yml. revlookupid() relocated into p4, reduction of stored data and total lookups.
…tkeys as no longer a dataframe.
Actually ready for review. Listkeys also now writes via a dataframe instead of print, arc replica class of service is handled, and general cleanup.
    key = nnarc[1]
    lst.append(key.upper())
for ooarc in oarc:
    key = oarc[1]
@patrickdos I didn't grok the marc, narc, oarc naming. Should line 115 be `key = oarc[1]`, or `key = ooarc[1]` to use the loop variable, like the mmarc and nnarc sections do?
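The concern above can be sketched in isolation. The contents of `oarc` here are hypothetical stand-ins (tuples whose second element is the key string), just to show why the choice of variable matters inside the loop:

```python
# Hypothetical stand-in for the oarc list under review: tuples whose
# second element is the key string.
oarc = [("obj-a", "key1"), ("obj-b", "key2")]

lst = []
for ooarc in oarc:
    key = ooarc[1]          # loop variable: the current tuple's key
    lst.append(key.upper())

# With `key = oarc[1]` instead, every iteration would read the second
# tuple of the outer list, not the current element of the loop.
print(lst)
```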
klist.append([ k.rstrip().split(',')[i] for i in [0,1,2,3] ])
# data = [ k.rstrip().split(',')[i] for i in [0,1,2,3] ]
# data = ",".join(data)
# print >> f, data
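For reference, the retained comprehension builds a list of four-field rows from the raw key lines, where the legacy (commented-out) path joined the same fields back into a CSV string and wrote it with print. The sample input below is made up for illustration:

```python
# Made-up sample of raw key lines in the comma-separated form the
# script parses; real keys come from the listkeys output.
raw_keys = ["key1,size,owner,cos,extra\n", "key2,size,owner,cos\n"]

klist = []
for k in raw_keys:
    # Keep only the first four comma-separated fields, as in the PR.
    klist.append([k.rstrip().split(',')[i] for i in [0, 1, 2, 3]])

# The legacy path re-joined these fields into a CSV line and printed
# it to a file handle; the new path hands klist to a dataframe write.
print(klist)
```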
Not sure moving the write function from print() to dataframe.write() is beneficial for cluster memory requirements.
- Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?
- Can we partition the dataframe beyond the total of 36 rows?
> Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?

I only see the cluster write out full files into temp (or at least at the size of my lab there are no partially written files). I suspect it does mean an increase in required memory to parse all keys before writing them to storage.

> Can we partition the dataframe beyond the total of 36 rows?

A partition count of 180 only appears to write 36 files to _temporary before moving them to the final storage path. So at first glance, partitions seem limited by the number of rows contained in the dataframe.
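The 36-file observation matches how Spark assigns rows to partitions: when there are fewer rows than requested partitions, the surplus partitions are empty and produce no output files. A plain-Python sketch of that assignment (round-robin here is an illustrative assumption; Spark's actual row distribution varies by operation):

```python
def round_robin_partition(rows, n_partitions):
    """Distribute rows across n_partitions buckets, round-robin."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

rows = [f"key{i}" for i in range(36)]     # 36 rows, as in the lab run
parts = round_robin_partition(rows, 180)  # repartition(180) requested
non_empty = sum(1 for p in parts if p)
print(non_empty)  # only 36 partitions hold data; 144 write nothing
```

Under this model, requesting 180 partitions for a 36-row dataframe can never yield more than 36 non-empty output files.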
LGTM
No description provided.