Description
Hi, there is a potential bug in the validator HTML parser utility IsUtf8WhiteSpaceChar, reachable by e.g. calling SplitStrAtUtf8Whitespace with a string containing \xff.
This bug was reproduced on a43d97c.
The crash is a global-buffer-overflow in htmlparser::Strings::IsUtf8WhiteSpaceChar, called by SplitStrAtUtf8Whitespace. The function implements a small DFA using the global kWhitespaceTable, declared as std::array<std::array<uint8_t, 255>, 10>. Each iteration does state = kWhitespaceTable[state][c], where c is the next input byte promoted to uint8_t. When c == 255 (0xff), the code indexes column 255, which is out of bounds for a 255-element row (valid indices 0..254). AddressSanitizer reports a read just past the end of kWhitespaceTable (10*255 = 2550 bytes), matching the definition at whitespacetable.h:36.
Fortunately, neither IsUtf8WhiteSpaceChar nor SplitStrAtUtf8Whitespace appears to be used elsewhere, so the bug does not currently seem to be triggerable.
Suggested Patch
Update the kWhitespaceTable shape to std::array<std::array<uint8_t, 256>, 10> instead of std::array<std::array<uint8_t, 255>, 10>.
POC
The following testcase demonstrates the bug:
testcase.cpp
#include <string>
#include <vector>
#include <string_view>
#include "/fuzz/install/include/cpp/htmlparser/strings.h"
int main() {
  // Sequence drives the DFA into a multi-byte UTF-8 state, then feeds 0xff.
  // The 0xff makes the code index kWhitespaceTable[state][255], but the table
  // has only 255 columns (0..254), causing a global OOB read.
  std::string s;
  s.push_back((char)227); // 0xe3
  s.push_back((char)128); // 0x80
  s.push_back((char)255); // 0xff -> offending index
  s.push_back((char)127); // arbitrary following byte
  auto cols = htmlparser::Strings::SplitStrAtUtf8Whitespace(s);
  size_t total = 0; for (auto v : cols) total += v.size();
  volatile size_t sink = total; (void)sink;
  return 0;
}
stdout
=================================================================
==1==ERROR: AddressSanitizer: global-buffer-overflow on address 0x557ec4155616 at pc 0x557ec413d350 bp 0x7fff0b674f90 sp 0x7fff0b674f88
READ of size 1 at 0x557ec4155616 thread T0
#0 0x557ec413d34f in htmlparser::Strings::SplitStrAtUtf8Whitespace(std::basic_string_view<char, std::char_traits<char>>) (/fuzz/test+0x17334f) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b)
#1 0x557ec412fefc in main /fuzz/testcase.cpp:15:15
#2 0x7f39baa19d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#3 0x7f39baa19e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#4 0x557ec4054b34 in _start (/fuzz/test+0x8ab34) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b)
0x557ec4155616 is located 0 bytes after global variable 'htmlparser::kWhitespaceTable' defined in 'strings.cc' (0x557ec4154c20) of size 2550
SUMMARY: AddressSanitizer: global-buffer-overflow (/fuzz/test+0x17334f) (BuildId: cfed22865b06e91e502590dafc8e07de9485533b) in htmlparser::Strings::SplitStrAtUtf8Whitespace(std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
0x557ec4155380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x557ec4155400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x557ec4155480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x557ec4155500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x557ec4155580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x557ec4155600: 00 00[06]f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x557ec4155680: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x557ec4155700: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x557ec4155780: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x557ec4155800: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x557ec4155880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
Steps to Reproduce
The crash was triaged with the following Dockerfile:
Dockerfile
# Ubuntu 22.04 with some packages pre-installed
FROM hgarrereyn/stitch_repro_base@sha256:3ae94cdb7bf2660f4941dc523fe48cd2555049f6fb7d17577f5efd32a40fdd2c
RUN git clone https://github.com/ampproject/amphtml /fuzz/src && \
cd /fuzz/src && \
git checkout a43d97cc3d240be9aad844d782796f20658999d9 && \
git submodule update --init --remote --recursive
ENV LD_LIBRARY_PATH=/fuzz/install/lib
ENV ASAN_OPTIONS=hard_rss_limit_mb=1024:detect_leaks=0
RUN echo '#!/bin/bash\nexec clang-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper && \
chmod +x /usr/local/bin/clang_wrapper && \
echo '#!/bin/bash\nexec clang++-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper++ && \
chmod +x /usr/local/bin/clang_wrapper++
# Install required build tools
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Build a minimal static library from the AMP HTML C++ htmlparser components
# Install prefix
ENV PREFIX=/fuzz/install
RUN mkdir -p ${PREFIX}/lib ${PREFIX}/include/htmlparser ${PREFIX}/include/cpp/htmlparser /tmp/build
# Compile selected sources that have no external dependencies
WORKDIR /fuzz/src/validator/cpp/htmlparser
RUN clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
strings.cc -o /tmp/build/strings.o && \
clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
atomutil.cc -o /tmp/build/atomutil.o && \
clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
token.cc -o /tmp/build/token.o && \
clang_wrapper++ -O2 -fPIC -I/fuzz/src/validator -I/fuzz/src/validator/cpp -I/fuzz/src/validator/cpp/htmlparser -c \
url.cc -o /tmp/build/url.o
# Create static archive
RUN ar rcs ${PREFIX}/lib/libamphtmlparser.a /tmp/build/strings.o /tmp/build/atomutil.o /tmp/build/token.o /tmp/build/url.o && \
ranlib ${PREFIX}/lib/libamphtmlparser.a
# Install public headers in both layouts to satisfy includes
RUN for d in ${PREFIX}/include/htmlparser ${PREFIX}/include/cpp/htmlparser; do \
cp -v \
allocator.h \
atom.h \
atomutil.h \
casetable.h \
comparators.h \
defer.h \
entity.h \
glog_polyfill.h \
hash.h \
logging.h \
strings.h \
token.h \
url.h \
whitespacetable.h \
$d/; \
done
ENV CPATH=${PREFIX}/include
ENV LIBRARY_PATH=${PREFIX}/lib
ENV LD_LIBRARY_PATH=${PREFIX}/lib
Build Command
clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lamphtmlparser && /fuzz/test
Reproduce
- Copy Dockerfile and testcase.cpp into a local folder.
- Build the repro image:
docker build . -t repro --platform=linux/amd64
- Compile and run the testcase in the image:
docker run \
-it --rm \
--platform linux/amd64 \
--mount type=bind,source="testcase.cpp",target=/fuzz/testcase.cpp \
repro \
bash -c "clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lamphtmlparser && /fuzz/test"
Additional Info
This testcase was discovered by STITCH, an autonomous fuzzing system. All reports are reviewed manually (by a human) before submission.