Problem
The aggregation service team has heard feedback (#67) from our partners about the difficulties they face when submitting jobs to the aggregation service. These difficulties stem from the API's rigid input prefix requirements, which might not fit the data organization pattern an adtech has chosen.
Currently, the aggregation service's CreateJob API accepts a single value for the parameter input_data_blob_prefix. This parameter refers to the input prefix under which the aggregatable reports are stored. When generating the summary report, the aggregation job includes all the reports stored under this prefix hierarchy.
This becomes a problem when only a subset of the reports stored under a prefix is intended for an aggregation job, yet that prefix is the only common prefix of all such reports. An example of such a situation can be found in issue #67.
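To make the problem concrete, here is a minimal Python sketch. It assumes the service matches input_data_blob_prefix as a plain string prefix, the way object stores list keys; the bucket listing and the helper select_by_prefix are hypothetical. The adtech wants only the reports for hours 00 and 13, but their longest common prefix also covers hours 01 and 02.

```python
import os.path


def select_by_prefix(blob_names, prefix):
    # Plain string-prefix match over a bucket listing (assumption: the
    # service selects reports under a prefix the same way).
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical listing of one bucket, organized by month/day/hour.
BLOBS = [
    "my-month/my-day/hour-00/reports.avro",
    "my-month/my-day/hour-01/reports.avro",
    "my-month/my-day/hour-02/reports.avro",
    "my-month/my-day/hour-13/reports.avro",
]

# The adtech wants to aggregate only these reports.
wanted = [
    "my-month/my-day/hour-00/reports.avro",
    "my-month/my-day/hour-13/reports.avro",
]

# With a single input prefix, the narrowest usable choice is the longest
# common (character-wise) prefix of the wanted reports...
common = os.path.commonprefix(wanted)
selected = select_by_prefix(BLOBS, common)

# ...which also matches reports that were never meant to be aggregated.
extra = [name for name in selected if name not in wanted]

if __name__ == "__main__":
    print("common prefix:", common)
    print("unintended reports:", extra)
```

Here the only usable prefix is "my-month/my-day/hour-", so every hour of the day ends up in the job.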
The current workaround for this issue (suggested here) is for adtechs to reorganize their reports so that the reports under a given input prefix are exactly the ones intended for the given aggregation job. When an adtech's data organization pattern differs from its querying pattern, it must either copy the reports or move them around to meet this requirement. This is error-prone and time-consuming, and copying reports also increases storage costs.
Proposal
To address this problem, we are proposing the following changes to the CreateJob API.
- Introduction of a new field input_data_blob_prefix_list, which would accept a list of input prefixes under which the aggregatable reports for a job are stored. The aggregation worker would read all reports stored under each of the prefixes in the list and include their contributions in the generated summary report.
  - This field would accept a list of at most 50 entries. This limit may be increased in the future based on adtech feedback.
- Backwards compatibility
  - We would introduce this field in a backwards-compatible way: it will be an optional field in the current version of the API.
  - Exactly one of the two fields input_data_blob_prefix and input_data_blob_prefix_list would be required in a CreateJob request.
  - Users of the aggregation service who do not need to specify multiple input prefixes can continue using the field input_data_blob_prefix.
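The request rules above can be sketched as a small validation step. This is a hypothetical helper, not the service's actual code: it enforces that exactly one of input_data_blob_prefix and input_data_blob_prefix_list is present and that the list holds at most 50 entries. Rejecting an empty list is an additional assumption not stated in the proposal.

```python
from typing import Any, List, Mapping

# Limit proposed in this issue; it may be raised later based on feedback.
MAX_INPUT_PREFIXES = 50


def validate_input_prefixes(request: Mapping[str, Any]) -> List[str]:
    """Return the effective list of input prefixes for a CreateJob request.

    Hypothetical helper: raises ValueError unless exactly one of the two
    prefix fields is present and the list respects the size limit.
    """
    has_single = "input_data_blob_prefix" in request
    has_list = "input_data_blob_prefix_list" in request
    if has_single == has_list:
        # Both present, or neither present.
        raise ValueError(
            "exactly one of input_data_blob_prefix and "
            "input_data_blob_prefix_list must be specified")
    if has_single:
        # An existing single-prefix request behaves like a one-element list.
        return [request["input_data_blob_prefix"]]
    prefixes = list(request["input_data_blob_prefix_list"])
    if not prefixes:
        # Assumption: an empty list is treated as an invalid request.
        raise ValueError("input_data_blob_prefix_list must not be empty")
    if len(prefixes) > MAX_INPUT_PREFIXES:
        raise ValueError(
            f"input_data_blob_prefix_list has {len(prefixes)} entries; "
            f"the maximum is {MAX_INPUT_PREFIXES}")
    return prefixes
```

Normalizing both forms to a list lets the rest of a job pipeline handle a single code path, which is one way the backwards-compatible field could be absorbed.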
API changes
Current CreateJob API request payload schema
```
{
  // other fields of CreateJob request
  "input_data_bucket_name": "my-bucket",
  "input_data_blob_prefix": "my-month/my-day/",
  "job_parameters": {
    // fields inside this json object
  }
}
```
Proposed CreateJob API request payload schema
```
{
  // other fields of CreateJob request
  "input_data_bucket_name": "my-bucket",
  "input_data_blob_prefix": "my-month/my-day", // should be absent if input_data_blob_prefix_list is provided
  "input_data_blob_prefix_list": [
    "my-month/my-day/hour-00",
    "my-month/my-day/hour-01",
    "my-month/my-day/hour-02",
    "my-month/my-day/hour-03",
    "my-month/my-day/hour-04"
  ], // optional field
  "job_parameters": {
    // fields inside this json object
  }
}
```
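How the worker would combine several prefixes can be sketched as a union of per-prefix matches. This is a minimal Python sketch over a hypothetical bucket listing; resolve_input_blobs is an illustrative name, and including a report only once when overlapping prefixes both match it is an assumption, since the proposal does not say how overlaps are handled.

```python
def resolve_input_blobs(blob_names, prefixes):
    """Collect every blob stored under any of the given prefixes.

    Hypothetical sketch: prefixes are plain string prefixes, and a blob
    matched by more than one prefix is included once (an assumption).
    """
    selected = []
    seen = set()
    for prefix in prefixes:
        for name in blob_names:
            if name.startswith(prefix) and name not in seen:
                seen.add(name)
                selected.append(name)
    return selected


# Hypothetical listing of one bucket, organized by month/day/hour.
BLOBS = [
    "my-month/my-day/hour-00/reports.avro",
    "my-month/my-day/hour-01/reports.avro",
    "my-month/my-day/hour-02/reports.avro",
    "my-month/my-day/hour-13/reports.avro",
]

# Two narrow prefixes select exactly the wanted hours.
prefixes = ["my-month/my-day/hour-00", "my-month/my-day/hour-13"]
result = resolve_input_blobs(BLOBS, prefixes)

if __name__ == "__main__":
    print(result)
```

Because each list entry can point at a narrow prefix, the job picks up only hours 00 and 13, with no need to copy or move reports into a dedicated prefix.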
Feedback request
If you have any feedback on this proposal, please let us know by responding to this issue.
We would especially appreciate feedback on the following questions about these API changes:
- Would adtechs find this feature useful?
- We're proposing a limit of 50 on the number of input prefixes. Do adtechs find this limit sufficient for their use cases?