Terminology
Seller Frontend Service (SFS) is the front-end service of the system that runs in the trusted execution environment on a supported cloud platform. The service receives requests from the seller's ad service to initiate the Protected Audience auction flow. Then the service orchestrates requests (in parallel) to Buyers / DSPs participating in the auction for bidding.
Buyer Frontend Service (BFS) is the front-end service of the system that runs in the trusted execution environment on a supported cloud platform. This service receives requests to generate bids from a SellerFrontEnd service. This service fetches real-time bidding signals that are required for bidding and calls the Bidding service.
ROMA is a C++ library used for executing untrusted code in a secure, isolated environment. It is currently used by the BFS and SFS.
Problem Statement
Hosting B&A servers is a considerable incremental cost driver, primarily for DSPs. The DSP BuyerFrontEnd, Bidding, and Key/Value servers will be called for every combination of a DSP-owned interest group and the SSP that communicates the impression opportunity to the DSP, which could lead to thousands of calls per impression opportunity and greatly inflate processing costs. Most of these calls would be redundant or duplicative, so most of the incremental cost would be wasted. The expected hardware savings from traffic shaping is around 50%.
High-level Design
Writing encoded logs in Seller Frontend Service
To build models, it is necessary to log parameters of incoming requests such as interest groups and contextual signals. Buyer interest groups should be made available to the Seller's UDF, which is responsible for filtering. To prevent user tracking, the logs are encrypted, and the keys for decrypting them are managed in the same way as the keys for decoding auction data from browsers (coordinators provide the keys).
Parameters for encrypted logging are provided by the Seller's ML engineers. The configuration of the Seller Frontend Service may contain a section that describes which fields are logged, or alternatively an API may be exposed to UDFs.
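As a hedged illustration of the configuration alternative, such a section might enumerate the logged fields and the exploration rate (all field names below are hypothetical, not part of any existing SFS configuration schema):

```json
{
  "encrypted_logging": {
    "fields": [
      "bid",
      "buyer",
      "interestGroups",
      "sellerSignals.country",
      "sellerSignals.publisher"
    ],
    "exploration_rate": 0.05
  }
}
```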
```javascript
function shouldSendRequestToBuyer(auctionConfig, trafficShapingParams) {
  // Exploration group: 5% of traffic always reaches the buyer,
  // providing unbiased training data for the model.
  const isLearning = Math.random() < 0.05;
  if (isLearning) {
    return {
      requestSent: true,
      isLearning: isLearning
    };
  } else {
    // Exploitation group: send the request only if the model predicts
    // a sufficiently high probability of a useful bid.
    const p = runInference('./path-to-model', {
      buyer: trafficShapingParams.buyer,
      interestGroups: trafficShapingParams.interestGroups,
      country: auctionConfig.sellerSignals.country,
      publisher: auctionConfig.sellerSignals.publisher
    });
    return {
      requestSent: p > 0.42,
      isLearning: isLearning
    };
  }
}
```
```javascript
function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals, browserSignals,
                 directFromSellerSignals, crossOriginTrustedSignals, trafficShapingParams) {
  // Only requests from the exploration group are logged for model training.
  if (trafficShapingParams.isLearning) {
    writeEncryptedLog({
      bid: bid,
      buyer: trafficShapingParams.buyer,
      interestGroups: trafficShapingParams.interestGroups,
      country: auctionConfig.sellerSignals.country,
      publisher: auctionConfig.sellerSignals.publisher
    });
  }
}
```
Introduction of Seller Model Builder Service
It is a service that will build inference models from the encrypted logs of the Seller Frontend Service. It is hosted in a TEE and sandboxed in the same way as the Seller and Buyer Frontend Services. Given the amount of traffic, the Seller Model Builder Service should be scalable. Keys for decoding are managed in the same way as keys for decoding auction data from browsers (coordinators provide the keys).
The code for building models is provided by the Seller's ML engineers. It could be deployed to cloud storage and used by the Seller Model Builder Service. Below is an example of code used for building models.
```python
# Libraries
import torch as T
import torch.nn as nn
import numpy as np

# Specify the device
device = T.device("cpu")

# Set up the model configuration
class NeuralNet(nn.Module):
    def __init__(self, x_data_dim):
        super(NeuralNet, self).__init__()
        self.hidden_layer1 = nn.Linear(x_data_dim, 2)

    def forward(self, x):
        return T.log_softmax(self.hidden_layer1(x), dim=1)

# Training dataset
class SellerDataset(T.utils.data.Dataset):
    def __init__(self, src_file):
        tmp_x = self._load_features_from_encrypted_logs(src_file)
        tmp_y = self._load_labels_from_encrypted_logs(src_file)
        self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
        self.y_data = T.tensor(tmp_y, dtype=T.long).to(device)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        return self.x_data[idx], self.y_data[idx]

    def _load_features_from_encrypted_logs(self, src_file):
        # The API for reading encrypted logs is used in this function
        ...

    def _load_labels_from_encrypted_logs(self, src_file):
        # The API for reading encrypted logs is used in this function
        ...

# Data for training
training_dataset = SellerDataset("some_path_to_encrypted_logs")

# Initialize a model
x_data_dim = training_dataset.x_data.shape[1]
model = NeuralNet(x_data_dim)

# Some code for model training
...

# Save the model
model_scripted = T.jit.script(model)
model_scripted.save('path-to-model')
```
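The elided training step could be sketched as a standard PyTorch loop. This is a minimal sketch, not the service's actual code: synthetic tensors stand in for the encrypted-log dataset so the example is self-contained, and `NLLLoss` is chosen because it pairs with the `log_softmax` output of the model above.

```python
import torch as T
import torch.nn as nn

# Synthetic stand-in for the encrypted-log dataset (hypothetical shapes).
T.manual_seed(0)
x_data = T.randn(64, 4)                  # 64 records, 4 features each
y_data = (x_data.sum(dim=1) > 0).long()  # binary labels {0, 1}

class NeuralNet(nn.Module):
    def __init__(self, x_data_dim):
        super().__init__()
        self.hidden_layer1 = nn.Linear(x_data_dim, 2)

    def forward(self, x):
        return T.log_softmax(self.hidden_layer1(x), dim=1)

model = NeuralNet(x_data.shape[1])
loss_fn = nn.NLLLoss()  # expects log-probabilities, matching log_softmax
optimizer = T.optim.SGD(model.parameters(), lr=0.1)

# Full-batch gradient descent for a few epochs.
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x_data), y_data)
    loss.backward()
    optimizer.step()

# Training accuracy on the synthetic data.
accuracy = (model(x_data).argmax(dim=1) == y_data).float().mean().item()
```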
Feedback loop between Seller Model Builder Service and Seller Frontend Service
Delivery of encrypted logs from the Seller Frontend Service to the Seller Model Builder Service, and of encrypted models in the opposite direction, should be automated.
Metrics
- We suggest tracking the following metrics, most of which should also be available for large slices of traffic (e.g., country, publisher, interest group):
- Seller Model Builder Service: cross entropy (log-likelihood), ROC AUC, and the success rate on the train, validation, and test datasets
- Seller Model Builder Service: number of tries and successes per prediction bucket (each bucket contains requests with a prediction score in a predefined range; 5 buckets are enough)
- Seller Frontend Service: number of tries and successes, and the average prediction score, per prediction bucket for the exploration and exploitation groups
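The per-bucket accounting described above could be computed as in the following sketch, which uses five equal-width prediction buckets; the record layout is hypothetical, standing in for whatever the service actually emits.

```python
# Each record: (prediction score in [0, 1), whether a useful bid came back).
records = [
    (0.05, False), (0.15, False), (0.35, True),
    (0.55, True), (0.75, True), (0.95, True),
]

NUM_BUCKETS = 5
tries = [0] * NUM_BUCKETS
successes = [0] * NUM_BUCKETS
score_sum = [0.0] * NUM_BUCKETS

for score, success in records:
    # Map a score to one of 5 equal-width buckets: [0,0.2), [0.2,0.4), ...
    bucket = min(int(score * NUM_BUCKETS), NUM_BUCKETS - 1)
    tries[bucket] += 1
    successes[bucket] += int(success)
    score_sum[bucket] += score

# Success rate and average prediction score per bucket (None if empty).
success_rate = [s / t if t else None for s, t in zip(successes, tries)]
avg_score = [sc / t if t else None for sc, t in zip(score_sum, tries)]
```

Comparing `success_rate` against `avg_score` per bucket gives a quick calibration check: a well-calibrated model should show similar values in each bucket.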
Questions
- What technology may be used in Seller Model Builder Service for building inference models?
- Does the Chrome team see violations of users' privacy, given that logs and models are encrypted and used only inside the sandbox?
- What effort is required to build all required infrastructure?
- Is the current debug mode enough to debug the traffic-shaping models?
- What should A/B tests look like in the described setup?
- What is the Chrome team's opinion on usability for Sellers?