HITB2012AMS Day 2 – Taint Analysis
Automatically Searching for Vulnerabilities: How to use Taint Analysis to find Security Flaws
(by Alex Bazhanyuk (not present) and Nikita Tarakanov, Reverse Engineers, CISS)
Nikita explains that they have been reversing binaries and auditing source code for a long time. Alex currently works on the BitBlaze project, and moved to the US to be better placed to do security research. The presentation is based on work Alex and Nikita did a while ago, before Alex moved from Ukraine to the US.
Nikita, an independent researcher, enjoys reversing kernels.
The agenda for the talk contains the following topics:
- Taint Analysis
- BitBlaze theory
- SASV implementation
- Lulz Time
- Pitfalls
- Conclusion
Taint Analysis
Nikita explains that they mainly focused on IDA Pro plugins and BitBlaze (Vine + utils, TEMU + plugins), and that BitBlaze needed customization to work properly.
Most people look for vulnerabilities by fuzzing, generating mutation cases, etc. Nikita explains that fuzzing might not be very easy when the protocol implements crypto or CRC checks, or uses unknown formats. A better way is to use taint analysis. From a taint source perspective, you can taint network input/output, keyboard input, memory, disk, function output, etc. The idea is to follow the tainted data and trace how the application behaves when processing it.
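To illustrate why a checksum gets in the way of blind mutation, here is a minimal Python sketch. The packet layout (payload followed by a 4-byte CRC32) and the function names are assumptions made for this example; they were not shown in the talk.

```python
import struct
import zlib
from typing import Optional


def make_packet(payload: bytes) -> bytes:
    """Hypothetical format: payload followed by a 4-byte big-endian CRC32."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def parse_packet(packet: bytes) -> Optional[bytes]:
    """Return the payload if the checksum matches, otherwise None."""
    if len(packet) < 4:
        return None
    payload = packet[:-4]
    (crc,) = struct.unpack(">I", packet[-4:])
    if zlib.crc32(payload) != crc:
        return None  # rejected before any interesting parsing code runs
    return payload


def flip_byte(data: bytes, index: int) -> bytes:
    """Mutate a single byte, the way a naive mutation fuzzer might."""
    mutated = bytearray(data)
    mutated[index] ^= 0xFF
    return bytes(mutated)


packet = make_packet(b"hello")
blind = flip_byte(packet, 0)                     # mutate, keep the old CRC
aware = make_packet(flip_byte(b"hello", 0))      # mutate, then fix the CRC
print(parse_packet(blind), parse_packet(aware))
```

The blind mutation is thrown away by the checksum test, so the fuzzer never reaches the code behind it. A fuzzer has to know the format to fix up the CRC, which is exactly the knowledge you lack with unknown protocols.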
There are a couple of ways to perform taint analysis:
Static taint analysis: analysis performed over multiple paths of a program (mostly performed within IDA Pro). It’s typically performed on a control flow graph, where statements are nodes, and there is an edge between two nodes if there is a possible transfer of control.
Dynamic taint analysis: to perform dynamic taint analysis, the researchers used BitBlaze. It allows you to automatically extract security-related properties from binary code. It was built as a unified binary analysis platform for security that leverages recent advances in program analysis, formal methods and binary instrumentation, and it can greatly decrease the amount of time needed to find/detect exploitable conditions.
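The static variant described above can be sketched as a small worklist algorithm over a control flow graph. This is only an illustration in Python, not the IDA Pro tooling used by the speakers; the node encoding (`source`, `assign`, `sink`) and the example graph are assumptions made for this sketch.

```python
from collections import deque


def static_taint(nodes, edges, entry):
    """Propagate taint over a CFG until nothing changes.

    nodes: {id: (kind, dst, srcs)} with kind in {"source", "assign", "sink"}
    edges: list of (from_id, to_id) control transfers
    Returns the ids of sinks that may use tainted data, plus the set of
    possibly tainted variables on entry to each node.
    """
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)

    taint_in = {n: set() for n in nodes}
    flagged = set()
    seen = {entry}
    work = deque([entry])
    while work:
        n = work.popleft()
        kind, dst, srcs = nodes[n]
        out = set(taint_in[n])
        uses_taint = any(s in out for s in srcs)
        if kind == "source":
            out.add(dst)                 # e.g. a buffer filled by recv()
        elif kind == "assign":
            if uses_taint:
                out.add(dst)             # taint flows into the destination
            else:
                out.discard(dst)         # overwritten with clean data
        elif kind == "sink" and uses_taint:
            flagged.add(n)
        for s in succs[n]:
            grew = not out <= taint_in[s]
            if grew or s not in seen:
                taint_in[s] |= out       # union: tainted on ANY path
                seen.add(s)
                work.append(s)
    return flagged, taint_in


# Hypothetical function: size comes from the input on one path only.
nodes = {
    0: ("source", "buf", []),        # buf = recv(...)
    1: ("assign", "size", ["buf"]),  # size = buf[0]
    2: ("assign", "size", []),       # size = 16
    3: ("sink", None, ["size"]),     # memcpy(dst, src, size)
    4: ("sink", None, ["count"]),    # memset(dst, 0, count)
}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
flagged, taint_in = static_taint(nodes, edges, entry=0)
print(sorted(flagged))
```

Because the analysis merges all paths, node 3 is reported even though one of the two paths leading to it uses a constant size. That over-approximation is the usual price of the static approach.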
BitBlaze
BitBlaze consists of a couple of components. It has an emulator, a taint analysis engine and a semantics extractor, made available to plugins via a TEMUAPI interface. TEMU is based on an older version of QEMU, which makes it slow and buggy. In this setup, TEMU is only used to perform tracing.
VINE is an intermediate language that sits in between the tracing (TEMU) and the output (graphs, logs, etc.). Nikita dives into some details about the IL and STP, the constraint solver.
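To make the idea of an emulator paired with a taint engine more concrete, here is a toy Python sketch. It has nothing to do with the real TEMU API: the instruction set, the register names and the `memcpy_len` event name are all invented for this example. Each register carries a shadow set of input byte offsets that influenced it, and the emulator logs an event whenever a tainted value reaches a sensitive operation.

```python
def run(program, input_bytes):
    """Execute a tiny register program while tracking taint per register."""
    regs = {}
    taint = {}    # shadow state: register -> set of input offsets
    events = []
    for op, *args in program:
        if op == "load_input":          # reg = input[idx], the taint source
            reg, idx = args
            regs[reg] = input_bytes[idx]
            taint[reg] = {idx}
        elif op == "const":             # reg = constant, never tainted
            reg, value = args
            regs[reg] = value
            taint[reg] = set()
        elif op == "add":               # dst = a + b, taint is the union
            dst, a, b = args
            merged = taint[a] | taint[b]
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
            taint[dst] = merged
        elif op == "call":              # sensitive use of a register
            name, reg = args
            if taint[reg]:
                events.append((name, reg, sorted(taint[reg])))
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, events


program = [
    ("load_input", "r0", 0),
    ("load_input", "r1", 3),
    ("const", "r2", 8),
    ("add", "r3", "r0", "r1"),    # depends on input bytes 0 and 3
    ("add", "r4", "r2", "r2"),    # depends on constants only
    ("call", "memcpy_len", "r3"),
    ("call", "memcpy_len", "r4"),
]
regs, events = run(program, b"\x10\x00\x00\x20")
print(events)
```

Only the first call is reported, together with the input offsets an attacker would need to control. A real trace contains the same kind of information, just for millions of x86 instructions instead of seven toy ones.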
SASV Components
To set up the SASV environment, they used:
- TEMU
- Vine
- STP
- IDA plugins (Dangerousfunctions, IndirectCalls, ida2sql (zynamics)) to find calls to dangerous functions, find indirect jumps and calls, and load the IDB into MySQL
- iterators – wrappers for TEMU, Vine and STP
- various publishers (for DeviceIoControl etc.)
Nikita explains that the minimum goal for SASV is to get maximum coverage of the dangerous code. The maximum goal is to get maximum coverage of all of the code.
The basic SASV algorithm contains the following steps:
- First, using IDA plugins, the dangerous places in the app are identified.
- Using publishers, they invoke the targeted code and start using TEMU to trace.
- Trace -> appreplay -> IL
- IL -> change path algo -> IL’ (change symbolic execution)
- wputil -> stp’ code
- stp
- repeat
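The loop above (trace, flip a branch, solve, repeat) can be sketched in a few lines of Python. This is a heavily simplified illustration, not the SASV code: the `target` function is a made-up program, the recorded branch conditions stand in for the IL, and a brute-force search over one input byte stands in for STP.

```python
def target(x, trace):
    """Hypothetical traced program; each branch records (condition, outcome)."""
    def branch(cond):
        outcome = cond(x)
        trace.append((cond, outcome))
        return outcome

    if branch(lambda v: v > 100):
        if branch(lambda v: v % 7 == 3):
            if branch(lambda v: (v ^ 0x55) == 0xC3):
                return "dangerous"
    return "safe"


def solve(constraints):
    """Stand-in for STP: brute-force one byte that satisfies every constraint."""
    for candidate in range(256):
        if all(cond(candidate) == wanted for cond, wanted in constraints):
            return candidate
    return None


def explore(seed):
    """Trace, flip one branch of the path, solve for a new input, repeat."""
    worklist, seen_paths, results = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        trace = []
        results[x] = target(x, trace)                 # "trace" step
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        for i, (cond, outcome) in enumerate(trace):
            # "change path" step: keep the prefix, negate branch i
            flipped = trace[:i] + [(cond, not outcome)]
            new_input = solve(flipped)                # "stp" step
            if new_input is not None and new_input not in results:
                worklist.append(new_input)
    return results


results = explore(seed=0)
print(results)
```

Starting from the uninteresting input 0, the loop reaches the nested "dangerous" branch after a handful of runs, whereas random mutation of a single byte would hit it with probability 1/256 per attempt. With real binaries the constraints are far larger, which is where the performance problems mentioned next come from.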
There are some disadvantages. Defining what a vulnerability is, is difficult, and things can be very, very slow, depending on the required functionality and the overhead introduced by hooking functions. On top of that, if you’re tracing big applications, the trace log file might be huge, and appreplay may not even be able to use it.
To enhance the performance of the process, Nikita says, it would probably be a good idea to get rid of the QEMU layer altogether… but it would be a huge task to do so.
Nikita continues by explaining that automated exploit generation would require you to build primitives (within the correct exploitation state), deal with a lot of exploit mitigations… and that EIP control does not mean you can build a weaponized exploit nowadays. It would require the automation of finding memory disclosures as well. :)
Unfortunately, the flow of this talk was a bit slow. With lots of time spent on the BitBlaze components and the Intermediate Language, the speaker had to rush a bit at the end, which was a pity (because I had the impression the final part had more interesting content than the first part of the presentation).
© 2012, Peter Van Eeckhoutte (corelanc0d3r). All rights reserved.