Презентация Decompiler internals: microcode онлайн
На нашем сайте вы можете скачать и просмотреть онлайн доклад-презентацию на тему Decompiler internals: microcode абсолютно бесплатно. Урок-презентация на эту тему содержит всего 58 слайдов. Все материалы созданы в программе PowerPoint и имеют формат ppt или же pptx. Материалы и темы для презентаций взяты из открытых источников и загружены их авторами, за качество и достоверность информации в них администрация сайта не отвечает, все права принадлежат их создателям. Если вы нашли то, что искали, отблагодарите авторов - поделитесь ссылкой в социальных сетях, а наш сайт добавьте в закладки.
Презентации » Устройства и комплектующие » Decompiler internals: microcode
Оцените!
Оцените презентацию от 1 до 5 баллов!
- Тип файла:ppt / pptx (powerpoint)
- Всего слайдов:58 слайдов
- Для класса:1,2,3,4,5,6,7,8,9,10,11
- Размер файла:826.50 kB
- Просмотров:70
- Скачиваний:0
- Автор:неизвестен
Слайды и текст к этой презентации:
№2 слайд
![Presentation Outline](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img1.jpg)
Содержание слайда: Presentation Outline
Decompiler architecture
Overview of the microcode
Opcodes and operands
Stack and registers
Data flow analysis, aliasibility
Microcode availability
Your feedback
Online copy of this presentation is available at
http://www.hex-rays.com/products/ida/support/ppt/recon2018.ppt
№6 слайд
![Why microcode? It helps to](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img5.jpg)
Содержание слайда: Why microcode?
It helps to get rid of the complexity of processor instructions
Also we get rid of processor idiosyncrasies. Examples:
x86: segment registers, fpu stack
ARM: thumb mode addresses
PowerPC: multiple copies of CF register (and other condition registers)
MIPS: delay slots
Sparc: stack windows
It makes the decompiler portable. We “just” need to replace the microcode generator
Writing a decompiler without an intermediate language looks like waste of time
№8 слайд
![Why not use an existing IR?](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img7.jpg)
Содержание слайда: Why not use an existing IR?
There are tons of other intermediate languages: LLVM, REIL, Binary Ninja's ILs, RetDec's IL, etc.
Yes, we could use something
But I started to work on the microcode when none of the above languages existed
This is the main reason why we use our own IR
№9 слайд
![A long evolution I started to](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img8.jpg)
Содержание слайда: A long evolution
I started to work on the microcode in 1998 or earlier
The name is nothing fancy but reflects the nature of it
Some design decisions turned out to be bad (and some of them are already very difficult to fix)
For example, the notion of virtual stack registers
We will fix it, though. Just takes time
Even today we modify our microcode when necessary
For example, I reshuffled the instruction opcodes for this talk...
№10 слайд
![Design highlights Simplicity](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img9.jpg)
Содержание слайда: Design highlights
Simplicity:
No processor specific stuff
One microinstruction does one thing
Small number of instructions (only 45 in 1999, now 72)
Simple instruction operands (register, number, memory)
Consider only compiler generated code
Discard things we do not care about:
Instruction timing (anyway it is a lost battle)
Instruction order (exceptions are a problem!)
Order of memory accesses (later we added logic to preserve indirect memory accesses)
Handcrafted code
№11 слайд
![Generated microcode Initially](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img10.jpg)
Содержание слайда: Generated microcode
Initially the microcode looks like RISC code:
Memory loads and stores are done using dedicated microinstructions
The desired operation is performed on registers
Microinstructions have no side effects
Each output register is initialized by a separate microinstruction
It is very verbose. Example:
№15 слайд
![Minor details Reading](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img14.jpg)
Содержание слайда: Minor details
Reading microcode is not easy (but hey, it was not designed for that! :)
All operand sizes are spelled out explicitly
The initial microcode is very simple (RISC like)
As we transform microcode, nested subinstructions may appear
We implemented the translation from processor instructions to microinstructions in plain C++
We do not use automatic code generators or machine descriptions to generate them. Anyway there are too many processor specific details to make them feasible
№22 слайд
![Opcodes condition codes](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img21.jpg)
Содержание слайда: Opcodes: condition codes
Perform the operation on (l)left and (r)ight
Generate carry or overflow bits
Store CF or OF into (d)estination
We need these instructions to precisely track carry and overflow bits
Normally these instructions get eliminated during microcode transformations
№23 слайд
![Opcodes unconditional flow](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img22.jpg)
Содержание слайда: Opcodes: unconditional flow control
Initially calls have only the callee address
The decompiler retrieves the callee prototype from the database or tries to guess it
After that the 'd' operand contains all information about the call, including the function prototype and actual arguments
№26 слайд
![Opcodes miscellaneous Some](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img25.jpg)
Содержание слайда: Opcodes: miscellaneous
Some operations can not be expressed in microcode
If possible, we use intrinsic calls for them (e.g. sqrtpd)
If no intrinsic call exists, we use “ext” for them and only try to keep track of data dependencies (e.g. “aam”)
“und” is used when a register is spoiled in a way that we can not predict or describe (e.g. ZF after mul)
№28 слайд
![Operands! As everyone else,](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img27.jpg)
Содержание слайда: Operands!
As everyone else, initially we had only:
constant integer numbers
registers
Life was simple and easy in the good old days!
Alas, the reality is more diverse. We quickly added:
stack variables
global variables
address of an operand
list of cases (for switches)
result of another instruction
helper functions
call arguments
string and floating point constants
№29 слайд
![Register operands The](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img28.jpg)
Содержание слайда: Register operands
The microcode engine provides unlimited (in theory) number of microregisters
Processor registers are mapped to microregisters:
eax => microregisters (mreg) 8, 9, 10, 11
al => mreg 8
ah => mreg 9
Usually there are more microregisters than the processor registers. We allocate them as needed when generating microcode
Examples:
№30 слайд
![Stack as microregisters I was](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img29.jpg)
Содержание слайда: Stack as microregisters
I was reluctant to introduce a new operand type for stack variables and decided to map the stack frame to microregisters
Like, the stack frame is mapped to the microregister #100 and higher
A bright idea? Nope!
Very soon I realized that we have to handle indirect references to the stack frame
Not really possible with microregisters
But there was so much code relying on this concept that we still have it
Laziness pays off now and in the future (negatively)
№32 слайд
![More operand types! -bit](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img31.jpg)
Содержание слайда: More operand types!
64-bit values are represented as pairs of registers
Usually it is a standard pair like edx:eax
Compilers get better and nowadays use any registers as a pair; or even pair a stack location with a register: sp+4:esi
We ended up with a new operand type:
operand pair
It consists of low and high halves
They can be located anywhere (stack, registers, glbmem)
№33 слайд
![Scattered operands The](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img32.jpg)
Содержание слайда: Scattered operands
The nightmare has just begun, in fact
Modern compilers use very intricate rules to pass structs and unions by value to and from the called functions
A register like RDI may contain multiple structure fields
Some structure fields may be passed on the stack
Some in the floating registers
Some in general registers (unaligned wrt register start)
We had no other choice but to add
scattered operands
that can represent all the above
№36 слайд
![More detailed look at](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img35.jpg)
Содержание слайда: More detailed look at microcode transformations
The initial “preoptimization” step uses very simple constant and register propagation algorithm
It is very fast
It gets rid of most temporary registers and reduces the microcode size by two
Normally we use a more sophisticated propagation algorithm
It also works on the basic block level
It is much slower but can:
handle partial registers (propagate eax into an expression that uses ah)
move entire instruction inside another
work with operands other that registers (stack and global memory, pair and scattered operands)
№37 слайд
![Global optimization We build](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img36.jpg)
Содержание слайда: Global optimization
We build the control flow graph
Perform data flow analysis to find where each operand is used or defined
The use/def information is used to:
delete dead code (if the instruction result is not used, then we delete the instruction)
propagate operands and instructions across block boundaries
generate assertions for future optimizations (we know that eax is zero at the target of “jz eax” if there are no other predecessors; so we generate “mov 0, eax”)
№41 слайд
![Data dependency dependent](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img40.jpg)
Содержание слайда: Data dependency dependent rules
Naturally, all these rules are compiler-independent, they use common algebraic number properties
Unfortunately we do not have a language to describe these rules, so we manually added these rules in C++
However, the pattern recognition does not naively check if the previous or next instruction is the expected one. We use data dependencies to find the instructions that form the pattern
For example, the rule CMB43 looks for the 'low' instruction by searching forward for an instruction that accesses the 'x' operand:
№44 слайд
![Hooks It is possible to hook](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img43.jpg)
Содержание слайда: Hooks
It is possible to hook to the optimization engine and add your own transformation rules
The Decompiler SDK has some examples how to do it
Currently it is not possible to disable an existing rule
However, since (almost?) all of them are sound and do not use heuristics, it is not a problem
In fact the processor specific parts of the decompiler internally use these hooks as well
№45 слайд
![ARM hooks For example, the](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img44.jpg)
Содержание слайда: ARM hooks
For example, the ARM decompiler has the following rule:
so that a construct like this: BX LR
will be converted into: RET
only if we can prove that the value of LR at the "BX LR" instruction is equal to the initial value of LR at the entry point.
However, how do we find if we jump to the initial_lr? Data analysis is to help us
№46 слайд
![Data flow analysis In fact](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img45.jpg)
Содержание слайда: Data flow analysis
In fact virtually all transformation rules are based on data flow analysis. Very rarely we check the previous or the next instruction for pattern matching
Instead, we calculate the use/def lists for the instruction and search for the instructions that access them
We keep track of what is used and what is defined by every microinstruction (in red). These lists are calculated when necessary:
№47 слайд
![Use-def lists Similar lists](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img46.jpg)
Содержание слайда: Use-def lists
Similar lists are maintained for each block. Instead of calculating them on request we keep them precalculated:
We keep both “must” and “may” access lists
The values in parenthesis are part of the “may” list
For example, an indirect memory access may read any memory:
№48 слайд
![Usefulness of use-def lists](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img47.jpg)
Содержание слайда: Usefulness of use-def lists
Based on use-def lists of each block the decompiler can build global use-def chains and answer questions like:
Is a defined value used anywhere? If yes, where exactly? Just one location? If yes, what about moving the definition there? If the value is used nowhere, what about deleting it?
Where does a value come from? If only from one location, can we propagate (or even move) it?
What are the values are the used but never defined?These are the candidates for input arguments
What are the values that are defined but never used but reach the last block? These are the candidates for the return values
№51 слайд
![Data flow analysis The devil](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img50.jpg)
Содержание слайда: Data flow analysis
The devil is in details
Our analysis engine can handle partial registers (they are a pain)
Big endian and little endian can be handled as well (however, we sometimes end up with the situations when a part of the operand is little endian and another part – big endian)
The stack frame and registers are handled
Registers can be addressed only directly
Stack location can be addressed indirectly and our analysis takes this into account
Well, we have to make some assumptions...
№52 слайд
![Aliasability Take this](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img51.jpg)
Содержание слайда: Aliasability
Take this example:
can we claim that %stkvar == 1 after stx?
Naturally, in general case we can not
But it turns out that in some case we can claim it
Namely:
If we haven't taken the address of any stack variable
Or, if we did, the address we took is higher (*)
Or, if the address is lower, it was not moved into eax
Overall it is a tough question
№54 слайд
![Minimal stack reference](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img53.jpg)
Содержание слайда: Minimal stack reference
Aliasability is unsolvable problem in general
We should optimize things only if we can prove the correctness of the transformation
We keep track of expressions like &stkvar and calculate the minimal reference (minstkref)
We assume that everything below minstkref can be accessed only directly, i.e. is not aliasable
We propagate this information over the control graph
One value is maintained per block (we could probably improve things by calculating minstkref for each instruction)
A similar value is maintained for the incoming stack arguments (minargref)
№56 слайд
![Testing the microcode](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img55.jpg)
Содержание слайда: Testing the microcode
Microcode if verified for consistency after every transformation
BTW, third party plugins should do the same
Very few microcode related bug reports
We have quite extensive test suites that constantly grow
A hundred or so of processors cores running tests
However, after publishing microcode there will be a new wave of bug reports
Found a bug? Send us the database with the description how to reproduce it
Most problems are solved within one day or faster
№57 слайд
![Publishing microcode The](/documents_6/b34b3f99223d3ced53f1494951a9f70d/img56.jpg)
Содержание слайда: Publishing microcode
The microcode API for C++ will be available in the next version of IDA
Python API won't be available yet
We will start beta testing the next week
Decompiler users with active support: feel free to send an email to support@hex-rays.com if you want to participate
Check out the sample plugins that show how to use the new API
Скачать все slide презентации Decompiler internals: microcode одним архивом:
Похожие презентации
-
Advanced x86. BIOS and System Management Mode Internals Reset Vector
-
Циклы с параметром
-
Графические возможности Pascal
-
Графические возможности яп VB
-
Переменные в VB
-
Строковый тип данных в языке программирования Pascal
-
Для чего используется моделирование?
-
Создание простых механизмов с помощью конструктора LegoWeDo
-
Типы данных. Программирование на Паскале
-
Идентификаторы переменных
-
Задачи с функциями