
State-aware video procedural captioning


Abstract

Video procedural captioning (VPC), which generates procedural text from instructional videos, is an essential task for scene understanding and real-world applications. The main challenge of VPC is to describe how to manipulate materials accurately. This paper focuses on this challenge by designing a new VPC task, generating a procedural text from the clip sequence of an instructional video and material set. In this task, the state of materials is sequentially changed by manipulations, yielding their state-aware visual representations (e.g., eggs are transformed into cracked, stirred, then fried forms). The essential difficulty is to convert such visual representations into textual representations; that is, a model should track the material states after manipulations to better associate the cross-modal relations. To achieve this, we propose a novel VPC method, which modifies an existing textual simulator for tracking material states as a visual simulator and incorporates it into a video captioning model. Our experimental results show the effectiveness of the proposed method, which outperforms state-of-the-art video captioning models. We further analyze the learned embedding of materials to demonstrate that the simulators capture their state transition.


Data Availability Statement

The datasets generated during and/or analysed during the current study are available in our repository (Footnote 7): https://github.com/misogil0116/svpc_pp

Notes

  1. We perform an experiment on the full prediction setting in Section 4.7.

  2. We employ pre-trained 300D word embeddings, which can be downloaded from http://nlp.stanford.edu/data/glove.6B.zip

  3. To handle materials that consist of multiple words, we divide the probability by the number of words.

  4. We will release annotated ingredients and the dataset split.

  5. The attention weight in the material selector was higher than 0.5.

  6. The raw and updated ingredients correspond to an embedding \(\boldsymbol {\mathcal {E}}^{0}\) by the material encoder and an embedding \(\boldsymbol {\mathcal {E}}^{n}\) updated from \(\boldsymbol {\mathcal {E}}^{0}\) by the visual simulator, respectively.

  7. Currently, the repository is in private mode. After our manuscript has been accepted, we will release the code and dataset.

  8. \(\boldsymbol {w}_{p}^{j}\) represents the j-th value of wp; thus, Eq. (6) indicates the normalization of wp

  9. https://github.com/flairNLP/flair

  10. In the tag definitions of the E-rFG corpus, we display food entities as the estimated ingredients. These entities cannot be directly used for our dataset because the definition of food differs slightly from the definition of ingredients in this paper (for example, “it” and “salad” are recognized as food in the E-rFG corpus). Therefore, we asked annotators to delete or rewrite ingredients if they were not appropriate.

References

  1. Akbik A, Blythe D, Vollgraf R (2018) Contextual string embeddings for sequence labeling. In: Proc COLING, pp 1638–1649

  2. Alayrac J-B, Bojanowski P, Agrawal N, Sivic J, Laptev I, Lacoste-Julien S (2016) Unsupervised learning from narrated instruction videos. In: Proc CVPR, pp 4575–4583

  3. Alayrac J-B, Sivic J, Laptev I, Lacoste-Julien S (2017) Joint discovery of object states and manipulation actions. In: Proc ICCV, pp 2127–2136

  4. Amac MS, Yagcioglu S, Erdem A, Erdem E (2019) Procedural reasoning networks for understanding multimodal procedures. In: Proc CoNLL, pp 441–451

  5. Banerjee S, Lavie A (2005) METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proc ACL workshop IEEMMTS, pp 65–72

  6. Bosselut A, Levy O, Holtzman A, Ennis C, Fox D, Choi Y (2018) Simulating action dynamics with neural process networks. In: Proc ICLR

  7. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: Proc ECCV, pp 213–229

  8. Chen J, Ngo C-W (2016) Deep-based ingredient recognition for cooking recipe retrieval. In: Proc ACMMM, pp 32–41

  9. Dai Z, Yang Z, Yang Y, Carbonell J, Le Q, Salakhutdinov R (2019) Transformer-XL: attentive language models beyond a fixed-length context. In: Proc ACL, pp 2978–2988

  10. Dalvi B, Huang L, Tandon N, Yih W-t, Clark P (2018) Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. In: Proc NAACL, pp 1595–1604

  11. Damen D, Doughty H, Farinella GM, Fidler S, Furnari A, Kazakos E, Moltisanti D, Munro J, Perrett T, Price W, Wray M (2018) Scaling egocentric vision: The EPIC-KITCHENS dataset. In: Proc ECCV, pp 720–736

  12. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proc NAACL, pp 4171–4186

  13. Donahue J, Hendricks LA, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proc CVPR, pp 2625–2634

  14. Escorcia V, Heilbron FC, Niebles JC, Ghanem B (2016) DAPs: deep action proposals for action understanding. In: Proc ECCV, pp 768–784

  15. Gupta A, Durrett G (2019) Tracking discrete and continuous entity state for process understanding. In: Proc NAACL workshop SPNLP, pp 7–12

  16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proc CVPR, pp 770–778

  17. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proc ICML, pp 448–456

  18. Jang E, Gu S, Poole B (2017) Categorical reparameterization with Gumbel-Softmax. In: Proc ICLR

  19. Jermsurawong J, Habash N (2015) Predicting the structure of cooking recipes. In: Proc EMNLP, pp 781–786

  20. Kiddon C, Ponnuraj GT, Zettlemoyer L, Choi Y (2015) Mise en Place: unsupervised interpretation of instructional recipes. In: Proc EMNLP, pp 982–992

  21. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Proc ICLR, USA

  22. Lei J, Wang L, Shen Y, Yu D, Berg T, Bansal M (2020) MART: memory-augmented recurrent transformer for coherent video paragraph captioning. In: Proc ACL, pp 2603–2614

  23. Lin C-Y, Och FJ (2004) Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In: Proc ACL, pp 605–612

  24. Maeta H, Sasada T, Mori S (2015) A framework for procedural text understanding. In: Proc IWPT, pp 50–60

  25. Miech A, Alayrac J-B, Smaira L, Laptev I, Sivic J, Zisserman A (2020) End-to-end learning of visual representations from uncurated instructional videos. In: Proc CVPR, pp 9879–9889

  26. Miech A, Zhukov D, Alayrac J-B, Tapaswi M, Laptev I, Sivic J (2019) HowTo100M: learning a text-video embedding by watching hundred million narrated video clips. In: Proc ICCV, pp 2630–2640

  27. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proc NeurIPS, pp 3111–3119

  28. Mintz M, Bills S, Snow R, Jurafsky D (2009) Distant supervision for relation extraction without labeled data. In: Proc ACL-IJCNLP, pp 1003–1011

  29. Nishimura T, Hashimoto A, Ushiku Y, Kameko H, Mori S (2021) State-aware video procedural captioning. In: Proc ACMMM

  30. Nishimura T, Hashimoto A, Ushiku Y, Kameko H, Yamakata Y, Mori S (2020) Structure-aware procedural text generation from an image sequence. IEEE Access 9:2125–2141


  31. Nishimura T, Sakoda K, Hashimoto A, Ushiku Y, Tanaka N, Ono F, Kameko H, Mori S (2021) Egocentric biochemical video-and-language dataset. In: Proc CLVL, pp 3129–3133

  32. Pan L, Chen J, Wu J, Liu S, Ngo C-W, Kan M-Y, Jiang Y-G, Chua T-S (2020) Multi-modal cooking workflow construction for food recipes. In: Proc ACMMM, pp 1132–1141

  33. Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proc ACL, pp 311–318

  34. Park JS, Rohrbach M, Darrell T, Rohrbach A (2019) Adversarial inference for multi-sentence video description. In: Proc CVPR, pp 6598–6608

  35. Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proc EMNLP, pp 1532–1543

  36. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434

  37. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proc NeurIPS, pp 91–99

  38. Salvador A, Hynes N, Aytar Y, Marin J, Ofli F, Weber I, Torralba A (2017) Learning cross-modal embeddings for cooking recipes and food images. In: Proc CVPR, pp 3020–3028

  39. Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, Wierstra D, Vinyals O, Pascanu R, Lillicrap T (2019) Relational recurrent neural networks. In: Proc NeurIPS, pp 7299–7310

  40. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proc ACL, pp 1073–1083

  41. Shi B, Ji L, Liang Y, Duan N, Chen P, Niu Z, Zhou M (2019) Dense procedure captioning in narrated instructional videos. In: Proc ACL, pp 6382–6391

  42. Shi B, Ji L, Niu Z, Duan N, Zhou M, Chen X (2020) Learning semantic concepts and temporal alignment for narrated video procedural captioning. In: Proc ACMMM, pp 4355–4363

  43. Sun C, Myers A, Vondrick C, Murphy K, Schmid C (2019) VideoBERT: a joint model for video and language representation learning. In: Proc ICCV, pp 7464–7473

  44. Tan G, Liu D, Wang M, Zha Z-J (2020) Learning to discretely compose reasoning module networks for video captioning. In: Proc IJCAI, pp 745–752

  45. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605


  46. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Proc NeurIPS, pp 5998–6008

  47. Vedantam R, Zitnick CL, Parikh D (2015) CIDEr: consensus-based image description evaluation. In: Proc CVPR, pp 4566–4575

  48. Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1:270–280


  49. Xiong Y, Dai B, Lin D (2018) Move forward and tell: a progressive generator of video descriptions. In: Proc ECCV, pp 489–505

  50. Yamakata Y, Mori S, Carroll J (2020) English recipe flow graph corpus. In: Proc LREC, pp 5187–5194

  51. Zamir N, Noy A, Friedman I, Protter M, Zelnik-Manor L (2020) Asymmetric loss for multi-label classification. arXiv preprint arXiv:2009.14119

  52. Zhou L, Kalantidis Y, Chen X, Corso JJ, Rohrbach M (2019) Grounded video description. In: Proc CVPR, pp 6578–6587

  53. Zhou L, Xu C, Corso JJ (2018) Towards automatic learning of procedures from web instructional videos. In: Proc AAAI, pp 7590–7598

  54. Zhou L, Zhou Y, Corso JJ, Socher R, Xiong C (2018) End-to-end dense video captioning with masked transformer. In: Proc CVPR, pp 8739–8748


Funding

This work was supported by JSPS KAKENHI Grant Numbers JP21J20250 and JP20H04210, and partially supported by JP21H04910, JP17H06100, JST-Mirai Program Grant Number JPMJMI21G2, and JST ACT-I Grant Number JPMJPR17U5.

Author information


Corresponding author

Correspondence to Taichi Nishimura.

Ethics declarations

Competing interests

All of the above are research grants from the Japanese government.

Conflict of Interests

All authors state that no financial/non-financial support has been received from any organization that may have an interest in this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Details of simulator

In this section, we describe the details of the visual simulator for the reproducibility of the proposed method. Given the encoded vectors of the clip sequence and material list \((\boldsymbol {{\mathscr{H}}}, \boldsymbol {\mathcal {E}}^{0})\), the visual simulator, shown in Fig. 3 in the main paper, recurrently reasons about the state transitions of materials at each step. Specifically, at the n-th step, given the n-th clip hn and the (n − 1)-th material list \(\boldsymbol {\mathcal {E}}^{n-1}\), the visual simulator predicts the executed actions and involved materials with (1) the action selector and (2) the material selector, and then updates the state of the materials with (3) the updater. After the n-th reasoning step, it outputs a state-aware step vector \(\boldsymbol {u}_{n} \in \mathbb {R}^{3 \times d}\), which concatenates the n-th clip hn, the selected action vector \(\bar {\boldsymbol {f}}_{n}\), and the selected material vector \(\bar {\boldsymbol {e}}_{n}\) (d denotes the dimension of these vectors). The visual simulator repeats this process recurrently until it has processed the last element of the clip sequence. For clarity, we explain the simulation process of the visual simulator at the n-th step.
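The following is a minimal PyTorch-style sketch of this recurrent loop. The submodule interfaces (action_selector, material_selector, updater), the tensor shapes, and the module name VisualSimulator are illustrative assumptions that match the component sketches in the subsections below; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class VisualSimulator(nn.Module):
    """Recurrent reasoning over a clip sequence and a material list (sketch)."""
    def __init__(self, action_selector, material_selector, updater):
        super().__init__()
        self.action_selector = action_selector      # (1) which actions are executed
        self.material_selector = material_selector  # (2) which materials are involved
        self.updater = updater                      # (3) state update of the materials

    def forward(self, H, E0):
        # H: (N, d) encoded clip vectors, E0: (M, d) encoded raw materials
        E_prev, a_prev = E0, None
        step_vectors = []
        for n in range(H.size(0)):
            h_n = H[n]
            f_bar, w_p = self.action_selector(h_n)                         # action selector
            e_bar, a_n = self.material_selector(h_n, w_p, E_prev, a_prev)  # material selector
            E_prev = self.updater(f_bar, e_bar, a_n, E_prev)               # updater
            a_prev = a_n
            # state-aware step vector u_n = [h_n; f_bar; e_bar], a 3d-dimensional vector
            step_vectors.append(torch.cat([h_n, f_bar, e_bar], dim=-1))
        return torch.stack(step_vectors), E_prev
```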

Action selector

Given a clip vector hn, the action selector outputs the selected action vector \(\bar {\boldsymbol {f}}_{n}\) by choosing the actions executed in the clip from the predefined action embedding \(\boldsymbol {\mathcal {F}}\). For example, in Fig. 3 in the main paper, the actions “crack” and “stir” are executed in the clip, so both fcrack and fstir should be selected. To consider multiple actions, the action selector computes a soft selection wp as an action probability for each action in \(\boldsymbol {\mathcal {F}}\). It then outputs the selected action vector \(\boldsymbol {\bar {f}}_{n}\) as the sum of the action embedding \(\boldsymbol {\mathcal {F}}\) weighted by the action probability wp:

$$ \begin{array}{@{}rcl@{}} \boldsymbol{w}_{p} &=& \text{MLP}(\boldsymbol{h}_{n}) \end{array} $$
(A1)
$$ \begin{array}{@{}rcl@{}} \bar{\boldsymbol{w}}_{p} &=& \frac{\boldsymbol{w}_{p}}{{\sum}_{j}\boldsymbol{w}_{p}^{j}} \end{array} $$
(A2)
$$ \begin{array}{@{}rcl@{}} \bar{\boldsymbol{f}}_{n} &=& \bar{\boldsymbol{w}}_{p}^{T} \boldsymbol{\mathcal{F}}, \end{array} $$
(A3)

where MLP(⋅) represents a two-layer MLP with the sigmoid function, and \(\boldsymbol {w}_{p} \in \mathbb {R}^{\|\boldsymbol {\mathcal {F}}\|}\) is the attention distribution over the \(\|\boldsymbol {\mathcal {F}}\|\) possible actions (Footnote 8).
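A minimal sketch of (A1)–(A3) follows. The hidden size of the MLP and the ReLU between its two layers are assumptions (the paper specifies only a two-layer MLP with a sigmoid), and the module name ActionSelector is ours.

```python
import torch
import torch.nn as nn

class ActionSelector(nn.Module):
    """Soft selection of executed actions, (A1)-(A3) (sketch)."""
    def __init__(self, d, num_actions):
        super().__init__()
        # predefined action embedding F: one d-dimensional vector per action
        self.F = nn.Parameter(torch.randn(num_actions, d))
        # two-layer MLP with a sigmoid, as in (A1); hidden size d is an assumption
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                 nn.Linear(d, num_actions), nn.Sigmoid())

    def forward(self, h_n):
        w_p = self.mlp(h_n)                          # (A1) per-action probabilities
        w_bar = w_p / w_p.sum(dim=-1, keepdim=True)  # (A2) normalization
        f_bar = w_bar @ self.F                       # (A3) weighted sum over actions
        return f_bar, w_p
```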

Material selector

Based on the action probability wp and the clip vector hn, the material selector outputs the selected material vector \(\boldsymbol {\bar {e}}_{n}\) by choosing the materials involved in the clip from the material list \(\boldsymbol {\mathcal {E}}^{n-1}\). For example, in Fig. 3 in the main paper, the raw “cheese” and the manipulated “eggs” and “butter” should be selected. To handle such a combination of raw and manipulated materials, the material selector has two attention modules: (1) clip attention and (2) recurrent attention. A minimal code sketch of the full selector is given after the equations below.

  1. The clip attention chooses relevant materials from the clip vector hn and the action probability wp:

    $$ \begin{array}{@{}rcl@{}} \hat{\boldsymbol{\textit{h}}}_{n} &=& \text{ReLU}(\boldsymbol{W}_{1} \boldsymbol{\textit{h}}_{n} + \boldsymbol{b}_{1}) \end{array} $$
    (A4)
    $$ \begin{array}{@{}rcl@{}} \boldsymbol{d}_{m} &=& \sigma((\boldsymbol{\textit{e}}_{m}^{n-1})^{\textsf{T}} \boldsymbol{W}_{2} [\hat{\boldsymbol{\textit{h}}}_{n};\boldsymbol{w}_{p}]) \end{array} $$
    (A5)

    where W1 and W2 are linear and bilinear mappings, b1 and b2 are bias terms, and \(\boldsymbol {\textit {e}}_{m}^{n-1}\) and dm denote the m-th material vector and its attention weight, respectively.

  2. Recurrent attention selects materials based on information from both the current and previous clips. Using the result of the clip attention, it computes a soft selection an as a material probability for each material in the material list:

    $$ \begin{array}{@{}rcl@{}} \boldsymbol{c} &=& \text{softmax}(\boldsymbol{W}_{3} \hat{\boldsymbol{\textit{h}}}_{n} + \boldsymbol{b}_{3}) \end{array} $$
    (A6)
    $$ \begin{array}{@{}rcl@{}} \boldsymbol{a}_{m}^{n} &=& \boldsymbol{c}_{1}\boldsymbol{d}_{m} + \boldsymbol{c}_{2}\boldsymbol{a}_{m}^{n-1} + \boldsymbol{c}_{3} \boldsymbol{0} \end{array} $$
    (A7)

    where W3 is a linear mapping, \(\boldsymbol {c} \in \mathbb {R}^{3}\) is the choice distribution, \(\boldsymbol {a}_{m}^{n-1}\) is the attention weight of the previous clip for each material, \(\boldsymbol {a}_{m}^{n}\) is the final distribution for each material, and 0 is a vector of zeros (providing the option not to select any materials). Finally, using the calculated attention weights, the selected material vector \(\boldsymbol {\bar {e}}_{n}\) is computed as the normalized weighted sum of the selected materials.

    $$ \begin{array}{@{}rcl@{}} \boldsymbol{\alpha}_{m}^{n} &=& \frac{\boldsymbol{a}_{m}^{n}}{{\sum}_{j}\boldsymbol{a}_{j}^{n}} \end{array} $$
    (A8)
    $$ \begin{array}{@{}rcl@{}} \bar{\boldsymbol{\textit{e}}}_{n} &=& \sum\limits_{m}\boldsymbol{\alpha}_{m}^{n}\boldsymbol{e}_{m}^{n-1}. \end{array} $$
    (A9)
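The sketch below puts (A4)–(A9) together. The use of nn.Bilinear for W2, the zero initialization of the previous attention at the first step, and the module name MaterialSelector are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MaterialSelector(nn.Module):
    """Clip attention (A4)-(A5) followed by recurrent attention (A6)-(A9) (sketch)."""
    def __init__(self, d, num_actions):
        super().__init__()
        self.proj = nn.Linear(d, d)                         # W1, b1 in (A4)
        self.bilinear = nn.Bilinear(d, d + num_actions, 1)  # W2 in (A5)
        self.choice = nn.Linear(d, 3)                       # W3, b3 in (A6)

    def forward(self, h_n, w_p, E_prev, a_prev):
        # h_n: (d,), w_p: (num_actions,), E_prev: (M, d), a_prev: (M,) or None
        M = E_prev.size(0)
        h_hat = torch.relu(self.proj(h_n))                               # (A4)
        ctx = torch.cat([h_hat, w_p], dim=-1).unsqueeze(0).expand(M, -1)
        d_m = torch.sigmoid(self.bilinear(E_prev, ctx)).squeeze(-1)      # (A5)
        c = torch.softmax(self.choice(h_hat), dim=-1)                    # (A6)
        if a_prev is None:                  # first step: no previous attention
            a_prev = torch.zeros_like(d_m)
        a_n = c[0] * d_m + c[1] * a_prev + c[2] * 0.0                    # (A7)
        alpha = a_n / a_n.sum()                                          # (A8)
        e_bar = alpha @ E_prev                                           # (A9)
        return e_bar, a_n
```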

Updater

Based on the selected actions and materials, the updater represents the state transition of materials by computing a new material vector \(\hat {\boldsymbol {e}}_{m}\). To this end, it first calculates an action-aware proposal vector ln of materials with a bilinear transformation of the selected action and material vectors \((\boldsymbol {\bar {f}}_{n},\boldsymbol {\bar {e}}_{n})\):

$$ \boldsymbol{l}_{n} = \text{ReLU}(\bar{\boldsymbol{f}}_{n}\boldsymbol{W}_{4}\bar{\boldsymbol{\textit{e}}}_{n} + \boldsymbol{b}_{4}), $$
(A10)

where W4 is a bilinear mapping.

Then, based on the material probability \(\boldsymbol {a}_{m}^{n}\), it computes the new material vector \(\hat {\boldsymbol {e}}_{m}\) by interpolating between the action-aware proposal vector ln and the current material vector \(\boldsymbol {e}_{m}^{n-1}\):

$$ \hat{\boldsymbol{e}}_{m} = \boldsymbol{a}_{m}^{n}\boldsymbol{l}_{n} + (1 - \boldsymbol{a}_{m}^{n}) \boldsymbol{e}_{m}^{n-1}. $$
(A11)

The new m-th material vector \(\hat {\boldsymbol {e}}_{m}\) is assigned to \(\boldsymbol {\mathcal {E}}_{m}^{n}\), which is forwarded to the (n + 1)-th step.
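A minimal sketch of the updater, (A10)–(A11), follows. The output dimension of the bilinear layer (d) and the module name Updater are assumptions.

```python
import torch
import torch.nn as nn

class Updater(nn.Module):
    """Action-aware state update of the material list, (A10)-(A11) (sketch)."""
    def __init__(self, d):
        super().__init__()
        self.bilinear = nn.Bilinear(d, d, d)   # W4, b4 in (A10)

    def forward(self, f_bar, e_bar, a_n, E_prev):
        # f_bar, e_bar: (d,); a_n: (M,); E_prev: (M, d)
        l_n = torch.relu(self.bilinear(f_bar, e_bar))   # (A10) action-aware proposal
        a = a_n.unsqueeze(-1)                           # per-material gate a_m^n
        return a * l_n + (1.0 - a) * E_prev             # (A11) interpolation
```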

Appendix B: Detailed annotation process

We additionally annotated ingredients for the remaining 126 recipes of the YouCook2-ingredient dataset and built the YouCook2-ingredient+ dataset.

We increased the dataset size by obtaining the missing videos from the YouCook2 authors and hiring one annotator to annotate ingredients for these additional videos using the web tool shown in Fig. 10. This annotation tool presents a recipe, the corresponding video, and text boxes for writing ingredients. In this paper, ingredients are defined as raw materials that are necessary to complete the dish. For example, “tomato” and “cucumber” should be written as ingredients, whereas “salad” should not be written because it represents a mixture of ingredients.

Fig. 10 A screen of our browser-based web annotation tool. Annotators write the ingredients that appear in the recipes. To ease annotation, we preliminarily estimated ingredients using the NER method [1] pre-trained on the English recipe flow graph corpus [50] and set them as the default values of the inputs

To ease the annotation, “jump” buttons, which allow annotators to see the clip corresponding to a step, are implemented based on the start/end timestamps from the original YouCook2 dataset. Moreover, to help annotators write ingredients, the tool displays estimated ingredients using the named entity recognition (NER) model flair (Footnote 9) [1] pre-trained on the English recipe flow graph corpus (E-rFG corpus) [50] (Footnote 10). If the estimated words are not appropriate as ingredients, annotators can delete or rewrite them.

Appendix C: Baseline implementation details

As comparative models, we employed two state-of-the-art transformer-based video captioning models: Transformer-XL [9] and MART [22]. These models originally take no ingredient set as input and have no copy mechanism in their decoder; thus, for a fair comparison, we prepare additional +ingredient (-I) baseline models, which incorporate the material encoder (Section 3.2) and the copy mechanism into the baselines.

These models are based on the transformer, which encodes sequential inputs and decodes a sentence by attending to all of the elements in the input sequence. To fit this characteristic, we concatenate the encoded ingredient and video vectors and input them to the model, as shown in Fig. 11. When decoding, based on the decoder output ok and the ingredient vectors \(\boldsymbol {\mathcal {E}}^{0}\), the copy mechanism calculates a copying gate to make a soft choice between copying an ingredient from the ingredient set and generating a word from the vocabulary.

Fig. 11 An overview of our baseline +ingredient (-I) implementation. These models incorporate the material encoder described in Section 3.2 and the copy mechanism into the baselines
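The sketch below illustrates one way such a copying gate can be realized. The gate parameterization (a sigmoid over a linear projection of the decoder output ok), the dot-product attention over the ingredient vectors, and the module name CopyMechanism are plausible assumptions for illustration, not the exact baseline implementation.

```python
import torch
import torch.nn as nn

class CopyMechanism(nn.Module):
    """Soft choice between copying an ingredient and generating a vocabulary word (sketch)."""
    def __init__(self, d, vocab_size):
        super().__init__()
        self.gate = nn.Linear(d, 1)                 # copying gate computed from o_k
        self.vocab_proj = nn.Linear(d, vocab_size)  # generation head

    def forward(self, o_k, E0, ingr_to_vocab):
        # o_k: (d,) decoder output at step k; E0: (M, d) encoded ingredient vectors
        # ingr_to_vocab: (M,) vocabulary indices of the M ingredients
        p_copy = torch.sigmoid(self.gate(o_k))               # gate in (0, 1)
        p_gen = torch.softmax(self.vocab_proj(o_k), dim=-1)  # generation distribution
        copy_attn = torch.softmax(E0 @ o_k, dim=-1)          # attention over ingredients
        p_word = (1.0 - p_copy) * p_gen
        # add the copying probability mass onto the ingredients' vocabulary entries
        p_word = p_word.index_add(0, ingr_to_vocab, p_copy * copy_attn)
        return p_word
```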

Appendix D: Implementation and training details on full prediction settings

Here, we discuss the implementation and training details of the full prediction setting, where the material set is not given but is predicted from the video clips in advance. To address this, as described in Section 4.7, we added an ingredient decoder, a multi-label classifier, and trained the entire model in a multi-task learning manner.

Figure 12 shows how the ingredient decoder is integrated into the model. The ingredient decoder consists of a two-layer MLP with a sigmoid function and converts \(\hat {\boldsymbol {h}}\), a max-pooled vector of the clip vectors, into a q-dimensional probability vector over materials, where q indicates the number of unique ingredients appearing more than three times in the training set (we obtained q = 668 in the experiment). During training, we compute the ingredient decoder loss \({\mathscr{L}}_{ingr}\), an asymmetric loss [51] for the multi-label classification setting, and add it to the total loss defined in (4). Note that we adopt teacher forcing [48] to stabilize the training; while the models use the ground-truth ingredients for the downstream process in the training phase, they generate a recipe based on the predicted ingredients in the inference phase (we take the top k = 15 ingredients from the predicted probabilities). Another modification to the model is the removal of the copy mechanism, because we find that it degrades the captioning performance in this setting.

Fig. 12 An overview of how to integrate the ingredient decoder into the model
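A minimal sketch of the ingredient decoder and the top-k inference step follows. The hidden size of the MLP and the plain binary cross-entropy shown in the usage comments (standing in for the asymmetric loss [51]) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class IngredientDecoder(nn.Module):
    """Two-layer MLP with a sigmoid mapping a max-pooled clip vector to
    per-ingredient probabilities (sketch)."""
    def __init__(self, d, q=668):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                 nn.Linear(d, q), nn.Sigmoid())

    def forward(self, H):
        # H: (N, d) clip vectors; max-pool over the clip axis to obtain \hat{h}
        h_hat = H.max(dim=0).values
        return self.mlp(h_hat)  # (q,) ingredient probabilities

# Training (hypothetical shapes): multi-label target y in {0, 1}^q
# decoder = IngredientDecoder(d=768)
# probs = decoder(H)
# loss_ingr = nn.functional.binary_cross_entropy(probs, y)  # stand-in for the asymmetric loss
# Inference: take the top k = 15 ingredients from the predicted probabilities
# topk_ingredients = probs.topk(15).indices
```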

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nishimura, T., Hashimoto, A., Ushiku, Y. et al. State-aware video procedural captioning. Multimed Tools Appl 82, 37273–37301 (2023). https://doi.org/10.1007/s11042-023-14774-7

