Description
Hi, with reference to Example 2 in https://github.com/anthropics/anthropic-cookbook/blob/main/misc/prompt_caching.ipynb:
Turn 1:
...
User input tokens: 4
Output tokens: 22
Input tokens (cache read): 0
Input tokens (cache write): 187354
0.0% of input prompt cached (4 tokens)
Time taken: 20.37 seconds
Turn 2:
...
User input tokens: 4
Output tokens: 297
Input tokens (cache read): 187354
Input tokens (cache write): 36
100.0% of input prompt cached (187358 tokens)
Time taken: 7.53 seconds
Turn 3:
...
User input tokens: 4
Output tokens: 289
Input tokens (cache read): 187390
Input tokens (cache write): 308
100.0% of input prompt cached (187394 tokens)
Time taken: 6.76 seconds
I was under the impression that the number of tokens written to the cache on Turn N would be the sum of the input and output tokens from Turn N-1. But we see a difference of +10 and +7 tokens on Turn 2 and Turn 3 respectively (e.g. on Turn 2 I expected a cache write of 4 + 22 = 26 tokens, but the API reports 36). My understanding is that this is due to some extra metadata being cached, but I couldn't find any specifics in the Anthropic docs or anywhere else online. Claude 4 Sonnet seems to agree that this is the case:
https://bench.io/share/artifact/5SaWYiuBgxwwKVY7R5ZC8t (direct from Sonnet through the API)
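For reference, this is how I'm computing the discrepancy from the numbers above (just a quick check; the assumption about what "should" be written on each turn is mine):

```python
# Per-turn usage numbers copied from the notebook output above.
turns = [
    {"input": 4, "output": 22,  "cache_read": 0,      "cache_write": 187354},  # Turn 1
    {"input": 4, "output": 297, "cache_read": 187354, "cache_write": 36},      # Turn 2
    {"input": 4, "output": 289, "cache_read": 187390, "cache_write": 308},     # Turn 3
]

for n in range(1, len(turns)):
    prev, curr = turns[n - 1], turns[n]
    # My assumption: Turn N's cache write = Turn N-1's input + output tokens.
    expected_write = prev["input"] + prev["output"]
    delta = curr["cache_write"] - expected_write
    print(f"Turn {n + 1}: expected cache write {expected_write}, "
          f"actual {curr['cache_write']}, delta {delta:+d}")

# Output:
# Turn 2: expected cache write 26, actual 36, delta +10
# Turn 3: expected cache write 301, actual 308, delta +7
```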
The discrepancy is not a huge deal, but I'm mainly hoping to get some clarity on how large it should be and any insight into what specifically contributes to it. Thanks!