Evaluation of the VMAttack IDA plugin
My overall evaluation of this plugin: it is still immature, but promising.
I first introduce the plugin, then evaluate it from two aspects: Automated Analysis and Manual Analysis.
Introduction
VMAttack is an IDA plugin that generates and analyses an execution trace of a PE file. Everything else depends on that trace: if it is not produced correctly, the plugin is useless.
Trace generation is automatic; on completion it prints a success notification in IDA's output window.
Traversed paths are colored in shades of blue, where a darker shade represents a higher number of traversals. This shows the global distribution of traced code at a glance.
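The coloring idea can be illustrated with a small sketch. This is not VMAttack's code: the function name blue_shades and the 50..200 intensity range are assumptions made for illustration, and no IDA API is used.

```python
from collections import Counter

def blue_shades(trace_addresses):
    """Map each traced address to an (R, G, B) shade; more hits give a darker blue."""
    counts = Counter(trace_addresses)
    if not counts:
        return {}
    max_hits = max(counts.values())
    shades = {}
    for addr, hits in counts.items():
        # Red/green drop from 200 (light) towards 50 (dark) as hits grow;
        # the blue channel stays at full intensity. The range is an assumption.
        level = 200 - int(150 * hits / max_hits)
        shades[addr] = (level, level, 255)
    return shades

# Hypothetical trace: one address executed three times, another once.
trace = [0x404339, 0x40433B, 0x404339, 0x404339]
shades = blue_shades(trace)
```

The most frequently executed address gets the darkest shade, so hot loops such as a VM dispatcher stand out immediately.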
Initially the trace shows system calls and user-defined function calls together with their arguments. This is useful when the PE has explicit function boundaries, and it gives a rough overview. VMAttack can step over system functions while still extracting their arguments, which saves space in the trace.
Since the best way to understand this plugin is to use it, I also collected all the required tools and wrote an installer (install.bat) for practice.
The demo samples are the original binary and the obfuscated binary of an add function:
addvm_3AE2BABAA4920BEF3E466F34B0075FFB.exe
addvm_B4E34E39CFDD13E65D070E9FB9717620.vmp.exe
They are available at https://github.com/anatolikalysch/VMAttack/tree/master/Example/addvmp . In the team discussion I sent an email titled 'decode vmprotect is possible?', which describes the construction of that VMP sample in detail. A good way to learn the plugin is to read that email, debug the sample with my mimic program, then run install.bat and practice with VMAttack. This makes the plugin easier to understand and therefore easier to evaluate.
Automated Analysis
Automated Analysis extracts useful information from the trace automatically or semi-automatically. It includes Input / Output Analysis, Clustering Analysis, Grading Analysis, Dynamic Trace Optimization, and Static deobfuscate.
Input / Output Analysis
The input/output analysis could provide leads as to how the input arguments of the VM function are used and whether there is a connection between function input and function output.
evaluation: for real-world samples, the connection between function input and output can be exposed, but it is not obvious or clear-cut.
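To make the kind of connection concrete, here is a minimal sketch that checks whether an output register holds a copy of an input or the 32-bit sum of the inputs. It is not the plugin's algorithm; find_io_links is a hypothetical helper, and the values are the ones from the addvmp demo listed under Manual Analysis.

```python
def find_io_links(inputs, outputs):
    """Return simple relations between input values and output register values."""
    total = sum(inputs) & 0xFFFFFFFF  # 32-bit wrap-around, as on x86
    links = []
    for reg, value in outputs.items():
        if value in inputs:
            links.append((reg, "copy of input", value))
        elif value == total:
            links.append((reg, "sum of inputs", value))
    return links

# Input parameters and output registers taken from the addvmp demo sample.
inputs = [0xBABE5, 0xAFFE1]
outputs = {"eax": 0x16ABC6, "edx": 0xAFFE1, "edi": 0x0}
links = find_io_links(inputs, outputs)
```

For the demo sample this reports eax as the sum of the two inputs and edx as a copy of one input. Real-world samples rarely expose such a direct relation, which is why the result there is less clear.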
Clustering Analysis
If a group of instructions is executed more than once, it may be treated as a cluster. For example, if Cluster Heuristic Threshold is set to 3, an address that is encountered more than twice is taken as the start of a cluster.
evaluation: Clustering is a good feature. It can fold and greatly reduce the trace, especially when the Greedy Clustering option is set. VMAttack can quickly remove unnecessary clusters, and it can also roll back wrongly removed ones. If basic block detection was not deactivated in the settings, the clusters are additionally subdivided into basic blocks. The basic block description is a good summary; furthermore, instructions whose computations are simply overwritten are not displayed, which is a good in-block deobfuscation feature.
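The threshold heuristic described above can be sketched as follows. This is a simplified illustration, not VMAttack's implementation: split_clusters is a hypothetical name, and the trace is reduced to a plain list of addresses.

```python
from collections import Counter

def split_clusters(trace_addresses, threshold=3):
    """Group consecutive trace entries whose address occurs >= threshold times."""
    counts = Counter(trace_addresses)
    clusters, current = [], []
    for addr in trace_addresses:
        if counts[addr] >= threshold:
            current.append(addr)      # repeated address: extend the cluster
        elif current:
            clusters.append(current)  # unique address: close the open cluster
            current = []
    if current:
        clusters.append(current)
    return clusters

# Hypothetical trace: a two-instruction loop executed three times.
A, B = 0x404339, 0x40433B
trace = [0x401000, A, B, A, B, A, B, 0x40100C]
clusters = split_clusters(trace, threshold=3)
```

Here the six loop entries collapse into one cluster, while the two addresses seen only once stay outside it.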
Grading Analysis
Each instruction, block of instructions, and cluster of instructions has a different importance. The grade of an instruction is affected by Memory usage Importance, Clustering Importance, Input/Output Importance, and so on.
At the end of the grading analysis, the graded trace is presented in the grading viewer. The trace can then be filtered either by double-clicking a grade or via the context menu, where the user is prompted for the grade threshold to display.
evaluation: Grading Analysis is useful. For example, if you give Input/Output Importance a very high weight, the instruction with the highest grade should be the one that performs the add operation on the two addends.
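A weighted grading of this kind could look like the sketch below. The weights, the flag names ("memory", "cluster", "io") and both function names are assumptions for illustration; the plugin's real grading rules are not reproduced here.

```python
# Hypothetical importance weights; Input/Output is deliberately rated highest.
WEIGHTS = {"memory": 1, "cluster": 1, "io": 5}

def grade_trace(lines, weights):
    """Grade each (disasm, flags) line by summing the weights of its flags."""
    graded = [(sum(weights[f] for f in flags), disasm) for disasm, flags in lines]
    return sorted(graded, key=lambda item: item[0], reverse=True)

def filter_by_grade(graded, threshold):
    """Keep only lines whose grade reaches the threshold, as in the grading viewer."""
    return [item for item in graded if item[0] >= threshold]

trace = [
    ("mov al, [esi]", {"memory", "cluster"}),
    ("add eax, edx", {"io"}),
    ("push ebp", set()),
]
graded = grade_trace(trace, WEIGHTS)
important = filter_by_grade(graded, 5)
```

With Input/Output weighted highest, the add instruction ends up at the top of the graded trace and is the only line left after filtering at grade 5.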
Dynamic Trace Optimization
Dynamic Trace Optimization applies optimizations that make the trace easier to read.
evaluation: constant folding and folding of unused operands are good deobfuscation features.
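As an illustration of constant folding on a trace, the sketch below merges a 'mov reg, imm' that is directly followed by an 'add reg, imm' into a single 'mov'. The tuple representation (op, dst, src) and the function name are assumptions; this handles only one pattern and is far simpler than a real optimizer.

```python
def fold_constants(trace):
    """Fold 'mov reg, imm' + 'add reg, imm' pairs into one 'mov reg, imm'."""
    out = []
    for op, dst, src in trace:
        prev = out[-1] if out else None
        if (op == "add" and isinstance(src, int) and prev is not None
                and prev[0] == "mov" and prev[1] == dst
                and isinstance(prev[2], int)):
            # Both operands are known constants: compute the result now (32-bit).
            out[-1] = ("mov", dst, (prev[2] + src) & 0xFFFFFFFF)
        else:
            out.append((op, dst, src))
    return out

# Hypothetical trace lines as (mnemonic, destination, source).
trace = [("mov", "eax", 0x10), ("add", "eax", 0x20), ("add", "eax", "edx")]
folded = fold_constants(trace)
```

The first two lines collapse into one, while the add with a register operand is kept because its value is not a known constant.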
Static deobfuscate
The static deobfuscate function tries to statically determine the instructions that will be executed by the byte code in the provided virtual machine function. The semi-automatic version of this analysis tries to determine all necessary values automatically (these values are introduced later, in the VM Context model).
evaluation: see the Manual Analysis version of Static deobfuscate for the evaluation.
Manual Analysis
Most Manual Analysis features depend on the following VM Context model:
- Code Start - the start of the byte code (vm_insts, more exactly vm_insts_start)
- Code End - the end of the byte code (vm_insts_end)
- VM Addr - the start address of the virtual machine function (Protect Func, pf for short)
- Base Addr - the base address of the jump table (the dispatch table, or inst_engines); for VMP:
.vmp0:00404339 8A 06 mov al, [esi]
.vmp0:0040433B 0F B6 C0 movzx eax, al ; op code
.vmp0:0040433E 83 C6 01 add esi, 1
.vmp0:00404341 FF 24 85 9C 43 40 00 jmp dword ptr ds:inst_engines[eax*4]
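The four instructions above form the VM's fetch-and-dispatch step. A minimal Python mimic of that loop is sketched below; the opcodes 0x01 and 0x02 and their handlers are hypothetical and are not real VMProtect handlers.

```python
def push_imm8(bytecode, esi, stack):
    """Hypothetical handler: push the next byte code byte, consume one operand."""
    stack.append(bytecode[esi])
    return esi + 1

def add(bytecode, esi, stack):
    """Hypothetical handler: pop two values and push their sum."""
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return esi

# Plays the role of the jump table at Base Addr (inst_engines).
HANDLERS = {0x01: push_imm8, 0x02: add}

def run_vm(bytecode, handlers):
    esi, stack = 0, []              # esi indexes the byte code (Code Start = 0)
    while esi < len(bytecode):      # stop at Code End
        opcode = bytecode[esi]      # mov al, [esi] / movzx eax, al
        esi += 1                    # add esi, 1
        handler = handlers[opcode]  # jmp dword ptr ds:inst_engines[eax*4]
        esi = handler(bytecode, esi, stack)
    return stack

result = run_vm(bytes([0x01, 0x05, 0x01, 0x03, 0x02]), HANDLERS)
```

This shows why Code Start, Code End and Base Addr are enough to walk the byte code statically: every step reads one opcode and selects a handler through the table.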
There are three ways to determine the VM Context:
- the Settings menu entry
- the 'find statically' entry under Manual_Analysis->VM_Context
- the 'find dynamically' entry under Manual_Analysis->VM_Context
The following are the so-called Manual Analysis features:
- Find VM Function (Protect Func, pf) Input Parameter: the plugin prints "BABE5 , OFFSET WORD_40489A , AFFE1". BABE5 and AFFE1 are passed in from the protected function (pf); WORD_40489A is vm_insts.
evaluation: useful
- Find VM Function Output Parameter, for the demo sample addvmp:
.text:0040102E ; .text:00401000
.text:0040102E ; .text:00401000 55 push ebp
.text:0040102E ; .text:00401001 89 E5 mov ebp, esp
.text:0040102E ; .text:00401003 8B 55 08 mov edx, [ebp+arg_0]
.text:0040102E ; .text:00401006 8B 45 0C mov eax, [ebp+arg_4]
.text:0040102E ; .text:00401009 01 D0 add eax, edx
.text:0040102E ; .text:0040100B 5D pop ebp
.text:0040102E ; .text:0040100C C3 retn
edi:0
eax:16ABC6 //affected
ebp:28FF88
esp:28FF60
edx:AFFE1 //affected
ebx:7EFDE000
esi:0
ecx:76728E8A
evaluation: very useful
- Find Virtual Reg to Reg mapping, for the demo sample addvmp:
.vmp0:0040432C 89 E5 mov ebp, esp ; vms_top
.vmp0:0040432E 81 EC C0 00 00 00 sub esp, 0C0h ; vmd , vm data, virtual registers
edi:28FF3C
eax:28FF58
ebp:28FF4C
edx:28FF50
ebx:28FF44
esi:28FF48
ecx:28FF54
evaluation: not verified; I would not trust this feature
- Follow Virtual Register: This provides a manual interface to the register tracking functionality.
evaluation: not useful; I will not use this feature
- Address Count: reads in a trace and prints, in IDA's output window, how often each address occurs, in the form (Address (disasm): frequency of occurrence)
evaluation: not useful, except as a counter for conditional breakpoints
Apart from the features above, the plugin provides a "Deobfuscate from..." menu. It appears to attempt deobfuscation of the VM byte code, but I believe this feature is not implemented yet.