
Table 8 Comparison of performance and model details, including the number of parameters (# Params), FLOPS, and inference time (seconds/iteration), across different backbones and methods

From: Matching Compound Prototypes for Few-Shot Action Recognition

| Method | Backbone | Object | # Params | FLOPS-b | FLOPS-m | FLOPS-o | Inference time (s/it) | SSv2-Small | SSv2-Full | Kinetics |
|---|---|---|---|---|---|---|---|---|---|---|
| MatchNet | ResNet-50 | / | 24.6M | 33.0G | 0 | 0 | 0.4 | 34.9 | 35.1 | 54.6 |
| TRX (Perrett et al., 2021) | ResNet-50 | / | 27.2M | 33.0G | 10.57G | 0 | 0.8 | 37.1 | 41.5 | 64.6 |
| ITA-Net (Zhang et al., 2021b) | ResNet-50 | / | 30.9M | 33.0G | 11.3G | 0 | 0.9 | 38.4 | 46.1 | 72.6 |
| Ours | ResNet-50 | / | 32.0M | 33.0G | 2.2G | 0 | 0.6 | 38.9 | 49.3 | 73.3 |
| Ours-ms | ResNet-50 | / | 39.8M | 33.0G | 8.82G | 0 | 0.8 | 42.6 | 52.3 | 74.0 |
| Ours-ms | ResNet-18 | / | 26.7M | 15.6G | 8.82G | 0 | 0.6 | 40.8 | 50.2 | 71.4 |
| Ours-ms | DenseNet | / | 23.2M | 26.0G | 8.82G | 0 | 0.7 | 41.0 | 50.7 | 71.7 |
| Ours-obj | ResNet-50 | 41.8M | 37.2M | 33.0G | 8.06G | 3T | 3.3 | 57.1 | 59.6 | 81.0 |
| Ours-obj | ResNet-18 | 41.8M | 24.1M | 15.6G | 8.06G | 3T | 3.2 | 53.4 | 56.2 | 77.3 |
| Ours-obj | DenseNet | 41.8M | 20.6M | 26.0G | 8.06G | 3T | 3.2 | 53.7 | 56.5 | 77.6 |

  1. FLOPS-b, FLOPS-m, and FLOPS-o denote the computation cost of the backbone, the few-shot learning module, and the object detector, respectively. "Ours-ms" indicates our method with multi-scale features, and "Ours-obj" denotes our method with an additional object detector. All experiments are conducted on a single NVIDIA V100 GPU.
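
As a rough illustration of how the cost columns relate, the sketch below is not code from the paper: it assumes the backbone is the standard torchvision ResNet-50, that parameter and FLOP counts are measured with fvcore, that a clip consists of 8 sampled frames at 224×224, and that FLOPS-b, FLOPS-m, and FLOPS-o simply add up to the total cost of one forward pass. The variant names and numbers are copied from Table 8.

```python
import torch
from torchvision.models import resnet50
from fvcore.nn import FlopCountAnalysis, parameter_count

# Backbone-side figures (# Params, FLOPS-b): one possible way to measure them.
backbone = resnet50()
clip = torch.randn(8, 3, 224, 224)            # assumed: 8 sampled frames per video at 224x224

params = parameter_count(backbone)[""]        # total parameter count of the backbone
flops_b = FlopCountAnalysis(backbone, clip).total()
print(f"ResNet-50 backbone: {params / 1e6:.1f}M params, "
      f"~{flops_b / 1e9:.1f} GFLOPs per clip")

# Combining the three FLOPS columns of Table 8 into a rough per-clip total,
# assuming they are additive.
GIGA, TERA = 1e9, 1e12
variants = {
    # name: (FLOPS-b, FLOPS-m, FLOPS-o), values taken from Table 8
    "Ours-ms (ResNet-50)":  (33.0 * GIGA, 8.82 * GIGA, 0.0),
    "Ours-obj (ResNet-50)": (33.0 * GIGA, 8.06 * GIGA, 3 * TERA),
}
for name, (b, m, o) in variants.items():
    total = b + m + o
    print(f"{name}: ~{total / GIGA:.1f} GFLOPs total "
          f"(object detector share {o / total:.0%})")
```

Under these assumptions, the breakdown makes the table's trend explicit: for the "Ours-obj" variants the object detector dominates the computation by roughly two orders of magnitude, which is consistent with their much longer inference times.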