improvement/s3-bucket-storage #1
Conversation
…e spark s3a filsystem access.
…r s3a filesystem access, updates to config template to handle srebiuldd changes to arc driver paths
…yle. Documentation mentions arc_protection requirement.
…te space consumed by non indexed keys.
…ecution commands.
…ter PR merged root README.md should have example as well as SOFS_FSCK scripts need to be updated to set from config.
… the bucket without storing it first.
srebuild_single_path => srebuild_arc_path, srebuild_double_path => srebuild_arcdata_path. COS defined in the config-template.yml. revlookupid() relocated into p4, reduction of stored data and total lookups.
…tkeys as no longer a dataframe.
Actually ready for review. Listkeys also now writes via a dataframe instead of print, arc replica class of service is handled, and general cleanup.
    key = nnarc[1]
    lst.append(key.upper())
for ooarc in oarc:
    key = oarc[1]
@patrickdos I didn't grok the marc, narc, oarc naming. Should line 115 be `key = oarc[1]`, or `key = ooarc[1]` to use the loop variable, like the mmarc and nnarc sections do?
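The concern above can be sketched in isolation. The contents of `oarc` here are hypothetical stand-ins (tuples whose second element is the key string), just to show why the choice of variable matters inside the loop:

```python
# Hypothetical stand-in for the oarc list under review: tuples whose
# second element is the key string.
oarc = [("obj-a", "key1"), ("obj-b", "key2")]

lst = []
for ooarc in oarc:
    key = ooarc[1]          # loop variable: the current tuple's key
    lst.append(key.upper())

# With `key = oarc[1]` instead, every iteration would read the second
# tuple of the outer list, not the current element of the loop.
print(lst)
```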
klist.append([ k.rstrip().split(',')[i] for i in [0,1,2,3] ])
# data = [ k.rstrip().split(',')[i] for i in [0,1,2,3] ]
# data = ",".join(data)
# print >> f, data
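For reference, the retained comprehension builds a list of four-field rows from the raw key lines, where the legacy (commented-out) path joined the same fields back into a CSV string and wrote it with print. The sample input below is made up for illustration:

```python
# Made-up sample of raw key lines in the comma-separated form the
# script parses; real keys come from the listkeys output.
raw_keys = ["key1,size,owner,cos,extra\n", "key2,size,owner,cos\n"]

klist = []
for k in raw_keys:
    # Keep only the first four comma-separated fields, as in the PR.
    klist.append([k.rstrip().split(',')[i] for i in [0, 1, 2, 3]])

# The legacy path re-joined these fields into a CSV line and printed
# it to a file handle; the new path hands klist to a dataframe write.
print(klist)
```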
Not sure moving the write function from print() to dataframe.write() is beneficial for cluster memory requirements.
- Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?
- Can we partition the dataframe beyond the total of 36 rows?
> Does this mean the total storage required for listkeys is held in memory until the write operation, increasing the RAM required?

I only see the cluster write out full files into temp (or at least at the size of my lab there are no partially written files). I suspect it does mean an increase in required memory to parse all keys before writing them to storage.

> Can we partition the dataframe beyond the total of 36 rows?

A partition count of 180 only appears to write 36 files to _temporary before moving them to the final storage path. So at first glance, partitions seem limited by the number of rows contained in the dataframe.
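The 36-file observation matches how Spark assigns rows to partitions: when there are fewer rows than requested partitions, the surplus partitions are empty and produce no output files. A plain-Python sketch of that assignment (round-robin here is an illustrative assumption; Spark's actual row distribution varies by operation):

```python
def round_robin_partition(rows, n_partitions):
    """Distribute rows across n_partitions buckets, round-robin."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

rows = [f"key{i}" for i in range(36)]     # 36 rows, as in the lab run
parts = round_robin_partition(rows, 180)  # repartition(180) requested
non_empty = sum(1 for p in parts if p)
print(non_empty)  # only 36 partitions hold data; 144 write nothing
```

Under this model, requesting 180 partitions for a 36-row dataframe can never yield more than 36 non-empty output files.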
LGTM
No description provided.