Update OEMCrypto fuzzing documentation

- Add details for triaging crashes and writing fuzz tests.
- Move internal documentation not needed by partners to g3doc.
- Remove infrastructure details covered in the design document.

Change-Id: Ib60b2bea954f4371595b0f891434e2274366fdd2
Ian Benz
2023-08-04 16:56:07 +00:00
committed by Robert Shih
parent 9a24732f5b
commit e19927f4bf
3 changed files with 171 additions and 269 deletions


@@ -1,229 +1,144 @@
# OEMCRYPTO Fuzzing
# OEMCrypto fuzzing
Refer to [Setting up Clusterfuzz](build_clusterfuzz.md) if you are interested
in setting up a local instance of cluster fuzz to run fuzzing on your own
OEMCrypto implementations on linux.
ClusterFuzz and Google Cloud infrastructure continuously run OEMCrypto fuzz
tests and reports crashes. To create a new automated fuzzing setup, refer to
[*ClusterFuzz setup*][1].
## Objective
## Run fuzz tests locally
* Run fuzzing on OEMCrypto public APIs on linux using google supported
clusterfuzz infrastructure to find security vulnerabilities.
Design Document - https://docs.google.com/document/d/1mdSV2irJZz5Y9uYb5DmSIddBjrAIZU9q8G5Q_BGpA4I/edit?usp=sharing
Fuzzing at google -
[go/fuzzing](https://g3doc.corp.google.com/security/fuzzing/g3doc/fuzzing_resources.md?cl=head)
## Monitoring
### Cluster fuzz statistics
* Performance of OEMCrypto fuzz binaries running continuously using cluster
fuzz infrastructure can be monitored
[here](https://clusterfuzz.corp.google.com/fuzzer-stats).
The options to select are `Job type: libfuzzer_asan_oemcrypto` and `Fuzzer:
fuzzer name you are looking for`
Example: [load_license_fuzz](https://clusterfuzz.corp.google.com/fuzzer-stats?group_by=by-day&date_start=2022-07-11&date_end=2022-07-17&fuzzer=libFuzzer_oemcrypto_load_license_fuzz&job=libfuzzer_asan_oemcrypto)
### Issues filed by clusterfuzz - Fixing those issues
* Any issues found with the fuzz target under test are reported by clusterfuzz
[here](https://b.corp.google.com/hotlists/2442954).
* The bug will have a link to the test case that generated the bug. Download
the test case and follow the steps from
[testing fuzzer locally](#testing-fuzzer-locally) section to run the fuzzer
locally using the test case that caused the crash.
* Once the issue is fixed, consider adding the test case that caused the crash
to the seed corpus zip file. Details about seed corpus and their location
are mentioned in
[this section](#build-oemcrypto-unit-tests-to-generate-corpus).
## Corpus
* Once the fuzzer scripts are ready and running continuously using clusterfuzz
or android infrastructure, we can measure the efficiency of fuzzers by
looking at code coverage and number of new features that have been
discovered by fuzzer scripts here Fuzz script statistics.
A fuzzer which tries to start from random inputs and figure out intelligent
inputs to crash the libraries can be time consuming and not effective. A way
to make fuzzers more effective is by providing a set of valid and invalid
inputs of the library so that fuzzer can use those as a starting point.
These sets of valid and invalid inputs are called corpus.
The idea is to run OEMCrypto unit tests and read required data into binary
corpus files before calling into respective OEMCrypto APIs under test.
Writing corpus data to binary files is controlled by --generate_corpus flag.
### Build OEMCrypto unit tests to generate corpus
* Install Pre-requisites
1. Build the fuzz tests:
```shell
$ sudo apt-get install gyp ninja-build
$ cd <cdm_repo_path>
$ oemcrypto/test/fuzz_tests/build_oemcrypto_fuzztests
```
* download cdm source code (including ODK & OEMCrypto unit tests):
2. Run the fuzz test:
```shell
$ git clone sso://widevine-internal/cdm
$ out/Default/<fuzz_test> [<corpus_dir>]
```
* Build OEMCrypto unit tests and run with --generate_corpus flag to generate
corpus files:
The corpus directory is optional and can either be a seed corpus from the
`corpus` subdirectory or be an empty directory. The corpus will be extended
with new inputs while the fuzz test is running.
## Triage crashes
To reproduce a crash locally for debugging:
1. Download the minimized testcase from the ClusterFuzz report.
2. In the *cdm* repository, switch branches based on the Fuzz Target prefix.
The prefix is *oemcrypto* for the master branch, *oemcrypto_v17* for the
oemcrypto-v17 branch, *oemcrypto_v18* for the oemcrypto-v18 branch, etc.
3. Build the fuzz tests:
```shell
$ cd /path/to/cdm/repo
$ export CDM_DIR=/path/to/cdm/repo
$ export PATH_TO_CDM_DIR=..
$ gyp --format=ninja --depth=$(pwd) oemcrypto/oemcrypto_unittests.gyp
$ ninja -C out/Default/
$ mkdir oemcrypto/test/fuzz_tests/corpus/<fuzzername>_seed_corpus
# Generate corpus by excluding buffer overflow tests.
$ ./out/Default/oemcrypto_unittests --generate_corpus \
--gtest_filter=-"*Huge*"
$ oemcrypto/test/fuzz_tests/build_oemcrypto_fuzztests
```
* There can be lot of duplicate corpus files that are generated from unit
tests. We can minimize the corpus files to only a subset of files that
cover unique paths within the API when run using fuzzer. Run following
command to minimize corpus.
4. Debug the crash:
```shell
$ cd /path/to/cdm/repo
# build fuzzer binaries
$ ./oemcrypto/test/fuzz_tests/build_oemcrypto_fuzztests
$ mkdir /tmp/minimized_corpus
# minimize corpus
$ ./out/Default/<fuzz_target_binary> -merge=1 /tmp/minimized_corpus \
<FULL_CORPUS_DIR>
$ gdb --args <fuzz_test_path> -timeout=0 <testcase_path>
```
* To avoid uploading huge binary files to git repository, the minimized corpus
files will be saved in fuzzername_seed_corpus.zip format in blockbuster
project's oemcrypto_fuzzing_corpus GCS bucket using gsutil. If you need
permissions for blockbuster project, contact widevine-engprod@google.com.
Example after substituting fuzz test and test case paths:
```shell
$ gsutil cp gs://oemcrypto_fuzzing_corpus/<fuzzername_seed_corpus.zip> \
<destination_path>
$ gdb --args out/Default/oemcrypto_opk_decrypt_cenc_fuzz -timeout=0 \
clusterfuzz-testcase-minimized-oemcrypto_v17_opk_decrypt_cenc_fuzz-6727459932078080
```
## Testing fuzzer locally
5. If reproducing the crash is unsuccessful, download the unminimized testcase
from the ClusterFuzz report and try again. If still unsuccessful, this may
indicate there is a persistent state issue with the fuzz test.
* Corpus needed to run fuzz tests locally are available in blockbuster
project's oemcrypto_fuzzing_corpus GCS bucket. If you need permissions for
this project, contact widevine-engprod@google.com. Download corpus.
Once the root cause of the crash is identified, its severity and complexity
should be assessed. The [*SEI CERT C Coding Standard*][2] is a good resource for
risk assessment. The ClusterFuzz report will also provide input in the Security
field. For complex fixes with a longer timeline, ClusterFuzz may report
duplicate crashes with the same root cause.
```shell
$ gsutil cp gs://oemcrypto_fuzzing_corpus/<fuzzername_seed_corpus.zip> \
<destination_path>
```
## Write fuzz tests
* Add flags to generate additional debugging information. Add '-g3' flag to
oemcrypto_fuzztests.gypi cflags_cc in order to generate additional debug
information locally.
While fuzzing has random elements, input data mutations are heavily influenced
by coverage feedback. Since discovering new control flow edges is a time
consuming process, input bytes should map to control flow edges in a simple,
predictable way. [`FuzzedDataProvider`][3], a class supplied with LLVM's
libFuzzer, can be used to easily split input data:
* Build and test fuzz scripts locally using following commands. The build
script builds fuzz binaries for opk implementation.
```shell
$ cd PATH_TO_CDM_DIR
$ ./oemcrypto/test/fuzz_tests/build_oemcrypto_fuzztests
$ mkdir /tmp/new_interesting_corpus
$ ./out/Default/fuzzer_binary /tmp/new_interesting_corpus \
/path/to/fuzz/seed/corpus/folder
```
* In order to run fuzz script against a crash input, follow the above steps
and run the fuzz binary against crash input rather than seed corpus.
```shell
$ ./out/Default/fuzzer_binary crash_input_file
```
## Adding a new OEMCrypto fuzz script
* In order to fuzz a new OEMCrypto API in future, a fuzz script can be added
to oemcrypto/test/fuzz_tests folder which starts with oemcrypto and ends
with fuzz.cc(GCB build script for oemcrypto fuzzers expects the format).
* In the program, define the function LLVMFuzzerTestOneInput with the following signature:
```
```cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
<your test code goes here>
return 0;
FuzzedDataProvider fuzzed_data(data, size);
// One bit of input data maps to this control flow edge:
if (fuzzed_data.ConsumeBool()) {
// ...
}
// ...
}
```
*Note*: Make sure LLVMFuzzerTestOneInput calls the function you want to fuzz.
* Add a new target to oemcrypto_fuzztests.gyp file and follow instructions in
[testing fuzzer locally](#testing-fuzzer-locally) to build and test locally.
Fuzzing API methods with complex, structured input may benefit from a seed
corpus containing a representative set of starting inputs. Unfortunately,
`FuzzedDataProvider` is not suitable for fuzz tests utilizing a seed corpus
since there is no equivalent serialization functionality for generating the
corpus. OEMCrypto fuzz tests have previously used struct-based serialization,
but this is no longer recommended due to portability issues. Protocol Buffers or
another portable serialization format should be considered instead.
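
As a rough illustration of what "portable" means here, the sketch below hand-rolls a fixed little-endian, length-prefixed encoding that a corpus generator and a fuzz test could share. It is a stand-in for a real format such as Protocol Buffers, not the format OEMCrypto fuzz tests use; `FuzzInput`, `Serialize`, and `Deserialize` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fuzz input: a 32-bit selector plus a variable-length payload.
struct FuzzInput {
  uint32_t selector = 0;
  std::vector<uint8_t> payload;
};

// Appends `value` as four little-endian bytes, independent of host byte order
// and struct padding.
void AppendU32(std::vector<uint8_t>& out, uint32_t value) {
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>(value >> shift));
  }
}

// Reads four little-endian bytes starting at `data`.
uint32_t ReadU32(const uint8_t* data) {
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value |= static_cast<uint32_t>(data[i]) << (8 * i);
  }
  return value;
}

// Corpus side: encodes selector, payload length, then payload bytes.
std::vector<uint8_t> Serialize(const FuzzInput& input) {
  std::vector<uint8_t> out;
  AppendU32(out, input.selector);
  AppendU32(out, static_cast<uint32_t>(input.payload.size()));
  out.insert(out.end(), input.payload.begin(), input.payload.end());
  return out;
}

// Fuzz test side: decodes the same layout. Returns false for truncated or
// inconsistent input so the fuzz test can reject it early.
bool Deserialize(const uint8_t* data, size_t size, FuzzInput* input) {
  if (size < 8) return false;
  const uint32_t length = ReadU32(data + 4);
  if (length > size - 8) return false;
  input->selector = ReadU32(data);
  input->payload.assign(data + 8, data + 8 + length);
  return true;
}
```

Because every field is written byte by byte, a corpus file generated on one platform decodes identically on another, which a raw `struct` copy does not guarantee.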
## Building OEMCrypto fuzz scripts and uploading them to Google Cloud Storage:
Fuzz tests must be deterministic to reproduce and debug a crash. A common
pitfall is not resetting the OEMCrypto API state between calls to
`LLVMFuzzerTestOneInput`. Fully terminating OEMCrypto between inputs is
preferred, but in some cases, it may be necessary to implement careful
optimizations to achieve acceptable performance. Candidates for optimization
typically have less than 1000 executions per second (exec/s).
`LLVMFuzzerInitialize` can be used for global initialization, but there is no
corresponding termination method.
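
The "fully terminate between inputs" pattern can be sketched as follows. The `fake_api` namespace is a hypothetical stand-in for a stateful library and is not the OEMCrypto API; a real fuzz test would call the actual initialization and termination functions in the same positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stateful API used only for illustration.
namespace fake_api {
bool g_initialized = false;
int g_open_sessions = 0;

void Initialize() {
  g_initialized = true;
  g_open_sessions = 0;
}

void Terminate() {
  g_initialized = false;
  g_open_sessions = 0;
}

void OpenSession() {
  assert(g_initialized);  // Calling into the API before Initialize() is a bug.
  ++g_open_sessions;
}
}  // namespace fake_api

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Start every input from a freshly initialized API.
  fake_api::Initialize();

  // Each input byte with its low bit set opens one session.
  for (size_t i = 0; i < size; ++i) {
    if (data[i] & 1) fake_api::OpenSession();
  }

  // Tear everything down so no state leaks into the next input.
  fake_api::Terminate();
  return 0;
}
```

With this structure, replaying a single test case gives the same behavior as it did in the middle of a long fuzzing run, at the cost of paying the initialization time on every input.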
* We are using Google Cloud Build (GCB) in order to setup continuous
integration which uploads OEMCrypto fuzz binaries to Google Cloud Storage.
GCB expects build script in form of a docker image that is uploaded to
Google Container Registry(GCR).
A good starting example is [`oemcrypto_install_oem_private_key_fuzz.cc`][4].
Targets should be added to `oemcrypto_opk_fuzztests.gyp` and, if the fuzz test
applies to partner OEMCrypto implementations, `partner_oemcrypto_fuzztests.gyp`.
The infrastructure expects that the target name starts with *oemcrypto* and ends
with *fuzz*.
The cloud build scripts (docker images) for widevine projects are
[here](https://widevine-internal.googlesource.com/cloud/+/refs/heads/master/docker/README.md)
For additional information about writing fuzz tests, see
[*What makes a good fuzz target*][5].
Refer to README of the project to setup a new docker image and uploading
the image to GCR.
## Generate corpus with OEMCrypto unit tests
* Git on borg repository needs to be integrated with GCB and a git trigger
needs to be set up in order to achieve continuous integration. Git trigger
will mention which docker image the GCB needs to use in order to build fuzz
binaries. GCB searches for docker images from GCR.
1. Build the unit tests:
Design document lists the steps to create a git trigger.
```shell
$ cd <cdm_repo_path>
$ export CDM_DIR=${PWD}
$ export PATH_TO_CDM_DIR=..
$ gyp --format=ninja --depth=${PWD} oemcrypto/oemcrypto_unittests.gyp
$ ninja -C out/Default
```
### Adding a new fuzz script to the build script:
2. Run the unit tests with the `--generate_corpus` flag:
* As long as a new fuzz script is added which starts with oemcrypto and ends
with fuzz, the build command can be added to build_oemcrypto_fuzztests.
GCB script uses build_oemcrypto_fuzztests script to build fuzz binaries
and make them available for clusterfuzz to run continuously.
```shell
$ mkdir oemcrypto/test/fuzz_tests/corpus/<fuzz_test>_seed_corpus
$ out/Default/oemcrypto_unittests --generate_corpus --gtest_filter='-*Huge*'
```
* If the new fuzzer cannot follow the naming convention OR GCB script needs
to be updated for any other reason, refer to [this section](https://docs.google.com/document/d/1mdSV2irJZz5Y9uYb5DmSIddBjrAIZU9q8G5Q_BGpA4I/edit#heading=h.bu9yfftdonkg)
section.
3. The unit tests can generate many duplicate corpus files. To minimize the
corpus to only the subset of inputs that cover unique paths within the API:
## Generate code coverage reports locally
```shell
$ oemcrypto/test/fuzz_tests/build_oemcrypto_fuzztests
$ mkdir /tmp/minimized_corpus
$ out/Default/<fuzz_test> -merge=1 /tmp/minimized_corpus <full_corpus_dir>
```
* Code coverage is a means of measuring fuzzer performance. We want to make
sure that our fuzzer covers all the paths in our code and make any tweaks to
fuzzer logic so we can maximize coverage to get better results.
Coverage reports for git on borg project is not automated and needs to be
generated manually. Future plan is to build a dashboard for git on borg
coverage reports.
### Generate code coverage reports using script from Google cloud build
* A docker image with script to generate code coverage reports for oemcrypto
fuzz scripts is linked with a GCB trigger
`oemcrypto-fuzzing-code-coverage-git-trigger`. More information about clang
source based coverage can be found
[here](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html).
* This trigger when invoked compiles oemcrypto fuzz scripts with clang source
based code coverage enabled, downloads latest corpus from cluster fuzz
for the respective fuzzer, generates and uploads code coverage html reports
to [GCS](https://pantheon.corp.google.com/storage/browser/oemcrypto_fuzzing_code_coverage_reports;tab=objects?forceOnBucketsSortingFiltering=false&project=google.com:blockbuster-1154&prefix=).
* The trigger can be invoked manually using cloud scheduler
`oemcrypto_fuzzing_code_coverage_reports`.
* In order to generate latest code coverage reports from master branch,
go to pantheon->cloud scheduler->oemcrypto_fuzzing_code_coverage_reports and
click on `RUN NOW` button.
* The above step should invoke a google cloud build. Go to cloud build console
and find latest build job with Trigger Name
`oemcrypto-fuzzing-code-coverage-git-trigger`.
* Once the build job is successful, latest code coverage reports can be
downloaded from [GCS](https://pantheon.corp.google.com/storage/browser/oemcrypto_fuzzing_code_coverage_reports;tab=objects?forceOnBucketsSortingFiltering=false&project=google.com:blockbuster-1154&prefix=).
The coverage report folder uploaded to GCS is appended with timestamp.
[1]: clusterfuzz_setup.md
[2]: https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
[3]: https://github.com/llvm/llvm-project/blob/main/compiler-rt/include/fuzzer/FuzzedDataProvider.h
[4]: oemcrypto_install_oem_private_key_fuzz.cc
[5]: https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md


@@ -1,14 +1,14 @@
# OEMCRYPTO Fuzzing - Build clusterfuzz and run fuzzing
# ClusterFuzz setup
[ClusterFuzz][1]
## Objective
* Run fuzzing on OEMCrypto public APIs on linux by building open sourced
clusterfuzz source code in order to find security vulnerabilities.
[Clusterfuzz][1]
* Run fuzzing on OEMCrypto public APIs on Linux by building open sourced
ClusterFuzz source code in order to find security vulnerabilities.
* Partners who implement OEMCrypto can follow these instructions to build
clusterfuzz, the fuzzing framework and run fuzzing using fuzzer scripts
ClusterFuzz, the fuzzing framework and run fuzzing using fuzzer scripts
provided by the Widevine team at Google.
## Glossary
@@ -17,32 +17,32 @@
inputs are fed to APIs in order to crash those, thereby catching any
security vulnerabilities with the code.
* Fuzzing engines - [libfuzzer][4], afl, honggfuzz are the actual fuzzing
engines that get the coverage information from API, use that to generate
more interesting inputs which can be passed to fuzzer.
* Fuzzing engines - [libFuzzer][4], AFL, Honggfuzz, etc. are the actual
fuzzing engines that get the coverage information from API, use that to
generate more interesting inputs which can be passed to fuzzer.
* Seed corpus - Fuzzing engine trying to generate interesting inputs from an
empty file is not efficient. Seed corpus is the initial input that a fuzzer
can accept and call the API with that. Fuzzing engine can then mutate this
seed corpus to generate more inputs to fuzzer.
* Clusterfuzz - ClusterFuzz is a scalable fuzzing infrastructure that finds
* ClusterFuzz - ClusterFuzz is a scalable fuzzing infrastructure that finds
security and stability issues in software. Google uses ClusterFuzz to fuzz
all Google products. Clusterfuzz provides us with the capability, tools to
all Google products. ClusterFuzz provides us with the capability, tools to
upload fuzz binaries and make use of the fuzzing engines to run fuzzing,
find crashes and organizes the information. Clusterfuzz framework is open
find crashes and organizes the information. ClusterFuzz framework is open
sourced, the source code can be downloaded and framework can be built
locally or by using google cloud.
locally or by using Google Cloud.
* Fuzzing output - Fuzzing is used to pass random inputs to API in order to
ensure that API is crash resistant. We are not testing functionality via
fuzzing. Fuzz scripts run continuously until they find a crash with the API
under test.
## Building fuzz scripts
## Build fuzz scripts
This section outlines the steps to build fuzz binaries that can be run
continuously using clusterfuzz.
continuously using ClusterFuzz.
> **Note:** All the directories mentioned below are relative to cdm repository
> root directory.
@@ -50,8 +50,8 @@ continuously using clusterfuzz.
1. Fuzz scripts for OEMCrypto APIs are provided by the Widevine team at Google
located under `oemcrypto/test/fuzz_tests` directory.
> **Note:** Prerequisites to run the following step are [here][10]. We also need
> to install ninja.
> **Note:** Prerequisites to run the following step are [here][10]. We also
> need to install Ninja.
2. Build a static library of your OEMCrypto implementation.
* Compile and link your OEMCrypto implementation source with
@@ -64,11 +64,9 @@ continuously using clusterfuzz.
* This will generate fuzz binaries under the `out/Default` directory.
> **Note:** Alternatively, you can use your own build systems, for which you
> will need to define your own build files with the OEMCrypto fuzz source files
> included. You can find the fuzz source files in
> will need to define your own build files with the OEMCrypto fuzz source
> files included. You can find the fuzz source files in
> `oemcrypto/test/fuzz_tests/partner_oemcrypto_fuzztests.gyp` and
> `oemcrypto/test/fuzz_tests/partner_oemcrypto_fuzztests.gypi`.
@@ -78,7 +76,7 @@ continuously using clusterfuzz.
4. Create a zip file `oemcrypto_fuzzers_yyyymmddhhmmss.zip` with fuzz binaries
and respective seed corpus zip files. Structure of a sample zip file with
fuzzer binaries and seed corpus would look like following:
fuzzer binaries and seed corpus would look like this:
```
* fuzzerA
@@ -88,59 +86,58 @@ continuously using clusterfuzz.
* fuzzerC (fuzzerC doesn't have seed corpus associated with it)
```
## Building clusterfuzz
## Build ClusterFuzz
* OEMCrypto implementation can be fuzzed by building clusterfuzz code which is
open sourced and using it to run fuzzing. Use a Linux VM to build
clusterfuzz.
OEMCrypto implementation can be fuzzed by building ClusterFuzz code, which is
open source, and using it to run fuzzing. Use a Linux VM to build ClusterFuzz.
> **Note:** You may see some issues with python modules missing, please install
> those modules if you see errors. If you have multiple versions of python on
> **Note:** You may see some issues with Python modules missing. Please install
> those modules if you see errors. If you have multiple versions of Python on
> the VM, then use `python<version> -m pipenv shell` when you are at [this][3]
> step.
* Follow these [instructions][2] in order to download clusterfuzz repository,
build it locally or create a continuous fuzz infrastructure setup using
google cloud.
Follow these [instructions][2] in order to download the ClusterFuzz repository,
build it locally or create a continuous fuzz infrastructure setup using Google
Cloud.
## Running fuzzers on local clusterfuzz instance
## Run fuzzers on local ClusterFuzz instance
* If you prefer to run fuzzing on a local machine instead of having a
production setup using google cloud, then follow these [instructions][6] to
add a job to the local clusterfuzz instance.
If you prefer to run fuzzing on a local machine instead of having a production
setup using Google Cloud, then follow these [instructions][5] to add a job to
the local ClusterFuzz instance.
> **Note:** Job name should have a fuzzing engine and sanitizer as part of it. A
> libfuzzer and asan jobs should have libfuzzer_asan in the job name.
> libFuzzer and AddressSanitizer job should have libfuzzer_asan in the job name.
* Create a job, e.g. `libfuzzer_asan_oemcrypto`, and upload the previously
created `oemcrypto_fuzzers_yyyymmddhhmmss.zip` as a custom build. The name of
each later zip file must sort after the current name. Following the timestamp
naming standard above keeps zip file names in ascending order.
* Once the job is added and clusterfuzz bot is running, fuzzing should be up
* Once the job is added and ClusterFuzz bot is running, fuzzing should be up
and running. Results can be monitored as mentioned [here][6].
* On a local clusterfuzz instance, only one fuzzer is being fuzzed at a time.
* On a local ClusterFuzz instance, only one fuzzer is being fuzzed at a time.
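
A minimal sketch of the `oemcrypto_fuzzers_yyyymmddhhmmss.zip` naming convention mentioned above. It assumes the timestamp is taken in UTC, which the instructions do not specify; `FuzzerZipName` is a hypothetical helper, not part of any provided script.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Builds "oemcrypto_fuzzers_yyyymmddhhmmss.zip" from a timestamp.
// Assumption: UTC is used so names do not depend on the machine's time zone.
std::string FuzzerZipName(std::time_t when) {
  const std::tm* utc = std::gmtime(&when);
  if (utc == nullptr) return std::string();

  char stamp[32];
  const std::size_t written =
      std::strftime(stamp, sizeof(stamp), "%Y%m%d%H%M%S", utc);
  if (written == 0) return std::string();

  return std::string("oemcrypto_fuzzers_") + stamp + ".zip";
}
```

Because every field is zero-padded and ordered from most to least significant, comparing two names as plain strings gives the same result as comparing their timestamps, so a later build always has a "greater" name.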
> **Note:** Fuzzing is time consuming. Finding issues as well as clusterfuzz
> **Note:** Fuzzing is time consuming. Finding issues as well as ClusterFuzz
> regressing and fixing the issues can take time. We need fuzzing to run at
> least for a couple of weeks to have good coverage.
## Finding fuzz crashes
## Find fuzz crashes
Once the clusterfuzz finds an issue, it logs crash information such as the
* Once ClusterFuzz finds an issue, it logs crash information such as the
build, test case and stack trace for the crash.
* Test cases tab should show the fuzz crash and test case that caused the
crash. Run `./fuzz_binary <test_case>` in order to debug the crash locally.
More information about different types of logs is as below:
More information about different types of logs is below:
* [Bot logs][7] will show information related to fuzzing, number of crashes
that a particular fuzzer finds, number of new crashes, number of known
crashes etc.
* [Local GCS][8] in your clusterfuzz checkout folder will store the fuzz
* [Local GCS][8] in your ClusterFuzz checkout folder will store the fuzz
binaries that are being fuzzed, seed corpus etc.
* `local_gcs/test-fuzz-logs-bucket` will store information related to fuzz
@@ -151,16 +148,16 @@ More information about different types of logs is as below:
* `/path/to/my-bot/clusterfuzz/log.txt` will have any log information from
fuzzer script and OEMCrypto implementation.
## Fixing issues
## Fix issues
* Once you are able to debug using the crash test case, apply fix to the
1. Once you are able to debug using the crash test case, apply the fix to the
implementation and create `oemcrypto_fuzzers_yyyymmddhhmmss.zip` with the
latest fuzz binaries.
* Upload the latest fuzz binary to the fuzz job that was created earlier.
2. Upload the latest fuzz binary to the fuzz job that was created earlier.
Fuzzer will recognize the fix and mark the crash as fixed in test cases tab
once the regression finishes. You do not need to update crashes as fixed,
clusterfuzz will do that.
ClusterFuzz will do that.
[1]: https://google.github.io/clusterfuzz/
[2]: https://google.github.io/clusterfuzz/getting-started/


@@ -1,10 +0,0 @@
#include <stddef.h>
#include <stdint.h>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
if (size > 0 && data[0] == 'H')
if (size > 1 && data[1] == 'I')
if (size > 2 && data[2] == '!')
__builtin_trap();
return 0;
}