[validator] Global OOB read in IsUtf8WhiteSpaceChar #40420

@hgarrereyn

Hi, there is a potential bug in the validator's HTML parser utility IsUtf8WhiteSpaceChar, reachable by, for example, calling SplitStrAtUtf8Whitespace with a string containing \xff.

This bug was reproduced on a43d97c.

Description

The crash is a global-buffer-overflow in htmlparser::Strings::IsUtf8WhiteSpaceChar, called by SplitStrAtUtf8Whitespace. The function implements a small DFA using the global kWhitespaceTable, declared as std::array<std::array<uint8_t, 255>, 10>. Each iteration does state = kWhitespaceTable[state][c], where c is the next input byte converted to uint8_t. A uint8_t can hold 256 distinct values (0..255), but each row has only 255 elements (valid indices 0..254), so when c == 255 (0xff) the code indexes column 255, which is out of bounds. AddressSanitizer reports a read 0 bytes past the end of kWhitespaceTable (10*255 = 2550 bytes), matching the definition at whitespacetable.h:36. An offset of exactly 2550 bytes corresponds to row 9, column 255, so the DFA was in its last state when it consumed the 0xff byte.
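
The index arithmetic can be checked in isolation. The following is a minimal sketch, not the library's code: Table, FlatOffset, and PastEndOfTable are hypothetical names, and the table contents are irrelevant here because only the shape matters.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same shape as the reported kWhitespaceTable.
constexpr std::size_t kRows = 10;
constexpr std::size_t kCols = 255;  // one short of the 256 possible byte values
using Table = std::array<std::array<std::uint8_t, kCols>, kRows>;

// The rows are laid out back to back; this matches the 2550 bytes ASan reports.
static_assert(sizeof(Table) == kRows * kCols, "no padding expected");

// Byte offset from the start of the table that table[state][c] would touch.
constexpr std::size_t FlatOffset(std::size_t state, std::uint8_t c) {
  return state * kCols + c;
}

// True when the lookup lands outside the whole table, not just outside its row.
constexpr bool PastEndOfTable(std::size_t state, std::uint8_t c) {
  return FlatOffset(state, c) >= sizeof(Table);
}
```

With these definitions, FlatOffset(9, 0xff) equals sizeof(Table), which is the "0 bytes after" case in the ASan report. For an earlier row such as FlatOffset(3, 0xff), the offset equals FlatOffset(4, 0): the access is still out of bounds for its row, but it lands on the first element of the next row, so ASan's global redzone would not be expected to flag it.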

Fortunately, neither IsUtf8WhiteSpaceChar nor SplitStrAtUtf8Whitespace appears to be used elsewhere, so the bug does not currently seem to be triggerable.

Suggested Patch

Update the kWhitespaceTable shape to std::array<std::array<uint8_t, 256>, 10> instead of std::array<std::array<uint8_t, 255>, 10>.
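
A minimal sketch of what the widened shape could look like, assuming nothing about the real table contents. FixedTable, kNumStates, kNumByteValues, and ColumnInBounds are hypothetical names; the static_assert is an optional guard that ties the row width to the index type, so a future change cannot reintroduce the off-by-one.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical names; the real table is defined in whitespacetable.h.
constexpr std::size_t kNumStates = 10;
constexpr std::size_t kNumByteValues =
    static_cast<std::size_t>(std::numeric_limits<std::uint8_t>::max()) + 1;

// Widened shape: one column for every value a uint8_t index can take.
using FixedTable =
    std::array<std::array<std::uint8_t, kNumByteValues>, kNumStates>;

static_assert(kNumByteValues == 256, "row width must cover bytes 0..255");
static_assert(sizeof(FixedTable) == kNumStates * 256, "10 rows of 256 bytes");

// True when column index c is valid for a row with `cols` elements.
constexpr bool ColumnInBounds(std::size_t cols, std::uint8_t c) {
  return c < cols;
}
```

One point to verify when applying the patch: if the existing initializer lists still provide only 255 values per row, aggregate initialization fills the new column 255 with 0 in every row. Whether state 0 is the intended transition for byte 0xff depends on the real table, which this sketch does not reproduce.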

POC

The following testcase demonstrates the bug:

testcase.cpp

#include <string>
#include <vector>
#include <string_view>
#include "/fuzz/install/include/cpp/htmlparser/strings.h"

int main(){
  // Sequence drives the DFA into a multi-byte UTF-8 state, then feeds 0xff.
  // The 0xff makes the code index kWhitespaceTable[state][255], but the table
  // has only 255 columns (0..254), causing a global OOB read.
  std::string s;
  s.push_back((char)227);  // 0xe3
  s.push_back((char)128);  // 0x80
  s.push_back((char)255);  // 0xff -> offending index
  s.push_back((char)127);  // arbitrary following byte
  auto cols = htmlparser::Strings::SplitStrAtUtf8Whitespace(s);
  size_t total=0; for (auto v: cols) total+=v.size();
  volatile size_t sink = total; (void)sink;
  return 0;
}
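
The (char) casts in the testcase matter because char may be signed: (char)255 is typically stored as -1, and it only becomes column index 255 after the conversion to uint8_t described in the Description. A minimal sketch of that conversion, using a hypothetical helper ColumnIndex that is not part of the library:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the column index a table lookup would use for one
// input byte, after converting the (possibly signed) char to uint8_t.
constexpr std::size_t ColumnIndex(char byte) {
  return static_cast<std::uint8_t>(byte);
}
```

Under this sketch the four bytes pushed by the testcase map to columns 227, 128, 255, and 127, and only the third one exceeds the last valid column (254) of a 255-element row.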

stdout

=================================================================
==1==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557ec4155616 at pc 0x557ec413d350 bp 0x7fff0b674f90 sp 0x7fff0b674f88
READ of size 1 at 0x557ec4155616 thread T0
    #0 0x557ec413d34f in htmlparser::Strings::SplitStrAtUtf8Whitespace(std::basic_string_view<char, std::char_traits<char>>) (/fuzz/test+0x17334f) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b)
    #1 0x557ec412fefc in main /fuzz/testcase.cpp:15:15
    #2 0x7f39baa19d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #3 0x7f39baa19e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #4 0x557ec4054b34 in _start (/fuzz/test+0x8ab34) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b)

0x557ec4155616 is located 0 bytes after global variable 'htmlparser::kWhitespaceTable' defined in 'strings.cc' (0x557ec4154c20) of size 2550
SUMMARY: AddressSanitizer: global-buffer-overflow (/fuzz/test+0x17334f) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b) in htmlparser::Strings::SplitStrAtUtf8Whitespace(std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
  0x557ec4155380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x557ec4155400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x557ec4155480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x557ec4155500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x557ec4155580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x557ec4155600: 00 00[06]f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x557ec4155680: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x557ec4155700: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x557ec4155780: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x557ec4155800: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  0x557ec4155880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING

Steps to Reproduce

The crash was triaged with the following Dockerfile:

Dockerfile

# Ubuntu 22.04 with some packages pre-installed
FROM hgarrereyn/stitch_repro_base@sha256:3ae94cdb7bf2660f4941dc523fe48cd2555049f6fb7d17577f5efd32a40fdd2c

RUN git clone https://github.com/ampproject/amphtml /fuzz/src && \
    cd /fuzz/src && \
    git checkout a43d97cc3d240be9aad844d782796f20658999d9 && \
    git submodule update --init --remote --recursive

ENV LD_LIBRARY_PATH=/fuzz/install/lib
ENV ASAN_OPTIONS=hard_rss_limit_mb=1024:detect_leaks=0

RUN echo '#!/bin/bash\nexec clang-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper && \
    chmod +x /usr/local/bin/clang_wrapper && \
    echo '#!/bin/bash\nexec clang++-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper++ && \
    chmod +x /usr/local/bin/clang_wrapper++

# Install required build tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
 && rm -rf /var/lib/apt/lists/*

# Build a minimal static library from the AMP HTML C++ htmlparser components
# Install prefix
ENV PREFIX=/fuzz/install
RUN mkdir -p ${PREFIX}/lib ${PREFIX}/include/htmlparser ${PREFIX}/include/cpp/htmlparser /tmp/build

# Compile selected sources that have no external dependencies
WORKDIR /fuzz/src/validator/cpp/htmlparser
RUN clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
    strings.cc -o /tmp/build/strings.o && \
    clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
    atomutil.cc -o /tmp/build/atomutil.o && \
    clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
    token.cc -o /tmp/build/token.o && \
    clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
    url.cc -o /tmp/build/url.o

# Create static archive
RUN ar rcs ${PREFIX}/lib/libamphtmlparser.a /tmp/build/strings.o /tmp/build/atomutil.o /tmp/build/token.o /tmp/build/url.o && \
    ranlib ${PREFIX}/lib/libamphtmlparser.a

# Install public headers in both layouts to satisfy includes
RUN for d in ${PREFIX}/include/htmlparser ${PREFIX}/include/cpp/htmlparser; do \
      cp -v \
        allocator.h \
        atom.h \
        atomutil.h \
        casetable.h \
        comparators.h \
        defer.h \
        entity.h \
        glog_polyfill.h \
        hash.h \
        logging.h \
        strings.h \
        token.h \
        url.h \
        whitespacetable.h \
        $d/; \
    done

ENV CPATH=${PREFIX}/include
ENV LIBRARY_PATH=${PREFIX}/lib
ENV LD_LIBRARY_PATH=${PREFIX}/lib

Build Command

clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lamphtmlparser && /fuzz/test

Reproduce

  1. Copy Dockerfile and testcase.cpp into a local folder.
  2. Build the repro image:
docker build . -t repro --platform=linux/amd64
  3. Compile and run the testcase in the image:
docker run \
    -it --rm \
    --platform linux/amd64 \
    --mount type=bind,source="testcase.cpp",target=/fuzz/testcase.cpp \
    repro \
    bash -c "clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lamphtmlparser && /fuzz/test"


Additional Info

This testcase was discovered by STITCH, an autonomous fuzzing system. All reports are reviewed manually (by a human) before submission.
