[EVM - 2] Bytecode, Opcode and the stack.
Now that we know what a stack data structure looks like, let's find out how it is used in the EVM. But first, we need to know what bytecode and opcodes are.
Whenever you compile a Solidity source (.sol) file, you get .abi and .bin output.
Note: Solidity is not the only high-level language that compiles into the instruction set for the EVM. Another example of an EVM high-level language is Vyper. There are others as well, but Solidity is the most popular one, and for the purposes of this course, we are sticking with Solidity.
If you have ever used Remix, you know that once you compile your source code, you will see the two files here:
You will get access to these files no matter what smart contract framework you use. All the major smart contract frameworks, e.g. Truffle, Hardhat, Brownie, Foundry, and Dapp-tools, give you access to these files one way or another. This is not a Remix-only thing.
The ABI is a contract's "Application Binary Interface" which tells a third party application how to interact with your smart contract. Essentially, it maps what your functions are, and what parameters you need to send to those functions to interact with them successfully.
In short (and according to the Solidity docs), it is the standard way to interact with contracts in the Ethereum ecosystem, both from outside the blockchain and for contract-to-contract interaction.
If you are using JavaScript to interact with the blockchain (like we did in our Ethers JS track), this is the file that libraries such as ethers.js or web3.js use to talk to your smart contract.
Just to be clear, you are not limited to JavaScript. You can write applications in Python using web3.py, which uses the ABI as well to interact with a smart contract. Other languages have helper libraries of their own, or you can always roll your own.
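To make the idea of the ABI as a "map of functions and parameters" more concrete, here is a minimal Python sketch that reads an ABI and lists the function signatures it describes. The ABI below is hypothetical (an imaginary contract with a single store(uint256) function), and function_signatures is just an illustrative helper name, not part of any library.

```python
import json

# Hypothetical ABI for an imaginary contract with one function,
# store(uint256 value). Real ABIs come from the compiler output.
ABI_JSON = """
[
  {
    "type": "function",
    "name": "store",
    "inputs": [{"name": "value", "type": "uint256"}],
    "outputs": [],
    "stateMutability": "nonpayable"
  }
]
"""


def function_signatures(abi_json):
    """Return signatures such as 'store(uint256)' for each function entry."""
    abi = json.loads(abi_json)
    signatures = []
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip events, constructors, errors, ...
        arg_types = ",".join(arg["type"] for arg in entry["inputs"])
        signatures.append(f"{entry['name']}({arg_types})")
    return signatures


print(function_signatures(ABI_JSON))
```

Libraries like ethers.js or web3.py do considerably more with the ABI (encoding arguments, decoding return values), but the starting point is the same: a JSON description of each function's name and parameter types.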
For this track, we do not really care about the .abi file. We will dive deeper into the bytecode contained in the .bin file and how it interacts with the EVM.
The .bin (binary) file holds the bytecode that is deployed to the Ethereum network and executed by the EVM. This bytecode is stored on the network and is responsible for the actions of an address governed by code (a contract account, as opposed to an EOA).
The .bin file looks like this:
The "object" key (red block in the diagram above) holds the bytecode, while the "opcodes" key (the cyan block) holds the opcodes.
The bytecode looks like this:
608060405234801561001057600080fd5b50610222806100206000396000f3fe608060405234801561001057600080fd5b506004361061002b5760003560e01c806368bc2ad114610030575b600080fd5b61004a600480360381019061004591906100d8565b610060565b6040516100579190610172565b60405180910390f35b60008282604051610072929190610159565b6040518091039020905092915050565b60008083601f840112610098576100976101d8565b5b8235905067ffffffffffffffff8111156100b5576100b46101d3565b5b6020830191508360018202830111156100d1576100d06101dd565b5b9250929050565b600080602083850312156100ef576100ee6101e7565b5b600083013567ffffffffffffffff81111561010d5761010c6101e2565b5b61011985828601610082565b92509250509250929050565b61012e81610198565b82525050565b6000610140838561018d565b935061014d8385846101c4565b82840190509392505050565b6000610166828486610134565b91508190509392505050565b60006020820190506101876000830184610125565b92915050565b600081905092915050565b60007fffffffff0000000000000000000000000000000000000000000000000000000082169050919050565b82818337600083830152505050565b600080fd5b600080fd5b600080fd5b600080fd5b600080fdfea2646970667358221220ebf7ac2036683f284b295199f01a0ef8ece4f8c16504b9eda78b04b81fdf933a64736f6c63430008070033
This is utter gibberish!
In other words, Mr. Byte Officer - we have one question for you - “what are thoooose”
Well, they are just a series of bytes that correspond to the values in the "opcodes" key in the .bin file:
"PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH2 0x10 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH2 0x222 DUP1 PUSH2 0x20 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN INVALID PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH2 0x10 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x4 CALLDATASIZE LT PUSH2 0x2B JUMPI PUSH1 0x0 CALLDATALOAD PUSH1 0xE0 SHR DUP1 PUSH4 0x68BC2AD1 EQ PUSH2 0x30 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH2 0x4A PUSH1 0x4 DUP1 CALLDATASIZE SUB DUP2 ADD SWAP1 PUSH2 0x45 SWAP2 SWAP1 PUSH2 0xD8 JUMP JUMPDEST PUSH2 0x60 JUMP JUMPDEST PUSH1 0x40 MLOAD PUSH2 0x57 SWAP2 SWAP1 PUSH2 0x172 JUMP JUMPDEST PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN JUMPDEST PUSH1 0x0 DUP3 DUP3 PUSH1 0x40 MLOAD PUSH2 0x72 SWAP3 SWAP2 SWAP1 PUSH2 0x159 JUMP JUMPDEST PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 KECCAK256 SWAP1 POP SWAP3 SWAP2 POP POP JUMP JUMPDEST PUSH1 0x0 DUP1 DUP4 PUSH1 0x1F DUP5 ADD SLT PUSH2 0x98 JUMPI PUSH2 0x97 PUSH2 0x1D8 JUMP JUMPDEST JUMPDEST DUP3 CALLDATALOAD SWAP1 POP PUSH8 0xFFFFFFFFFFFFFFFF DUP2 GT ISZERO PUSH2 0xB5 JUMPI PUSH2 0xB4 PUSH2 0x1D3 JUMP JUMPDEST JUMPDEST PUSH1 0x20 DUP4 ADD SWAP2 POP DUP4 PUSH1 0x1 DUP3 MUL DUP4 ADD GT ISZERO PUSH2 0xD1 JUMPI PUSH2 0xD0 PUSH2 0x1DD JUMP JUMPDEST JUMPDEST SWAP3 POP SWAP3 SWAP1 POP JUMP JUMPDEST PUSH1 0x0 DUP1 PUSH1 0x20 DUP4 DUP6 SUB SLT ISZERO PUSH2 0xEF JUMPI PUSH2 0xEE PUSH2 0x1E7 JUMP JUMPDEST JUMPDEST PUSH1 0x0 DUP4 ADD CALLDATALOAD PUSH8 0xFFFFFFFFFFFFFFFF DUP2 GT ISZERO PUSH2 0x10D JUMPI PUSH2 0x10C PUSH2 0x1E2 JUMP JUMPDEST JUMPDEST PUSH2 0x119 DUP6 DUP3 DUP7 ADD PUSH2 0x82 JUMP JUMPDEST SWAP3 POP SWAP3 POP POP SWAP3 POP SWAP3 SWAP1 POP JUMP JUMPDEST PUSH2 0x12E DUP2 PUSH2 0x198 JUMP JUMPDEST DUP3 MSTORE POP POP JUMP JUMPDEST PUSH1 0x0 PUSH2 0x140 DUP4 DUP6 PUSH2 0x18D JUMP JUMPDEST SWAP4 POP PUSH2 0x14D DUP4 DUP6 DUP5 PUSH2 0x1C4 JUMP JUMPDEST DUP3 DUP5 ADD SWAP1 POP SWAP4 SWAP3 POP POP POP JUMP JUMPDEST PUSH1 0x0 PUSH2 0x166 DUP3 DUP5 DUP7 PUSH2 0x134 JUMP JUMPDEST SWAP2 POP DUP2 
SWAP1 POP SWAP4 SWAP3 POP POP POP JUMP JUMPDEST PUSH1 0x0 PUSH1 0x20 DUP3 ADD SWAP1 POP PUSH2 0x187 PUSH1 0x0 DUP4 ADD DUP5 PUSH2 0x125 JUMP JUMPDEST SWAP3 SWAP2 POP POP JUMP JUMPDEST PUSH1 0x0 DUP2 SWAP1 POP SWAP3 SWAP2 POP POP JUMP JUMPDEST PUSH1 0x0 PUSH32 0xFFFFFFFF00000000000000000000000000000000000000000000000000000000 DUP3 AND SWAP1 POP SWAP2 SWAP1 POP JUMP JUMPDEST DUP3 DUP2 DUP4 CALLDATACOPY PUSH1 0x0 DUP4 DUP4 ADD MSTORE POP POP POP JUMP JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x0 DUP1 REVERT INVALID LOG2 PUSH5 0x6970667358 0x22 SLT KECCAK256 0xEB 0xF7 0xAC KECCAK256 CALLDATASIZE PUSH9 0x3F284B295199F01A0E 0xF8 0xEC 0xE4 0xF8 0xC1 PUSH6 0x4B9EDA78B04 0xB8 0x1F 0xDF SWAP4 GASPRICE PUSH5 0x736F6C6343 STOP ADDMOD SMOD STOP CALLER
Alright, we can see this makes a little more sense, but not a whole lot.
Basically, the generated bytecode corresponds to a series of opcodes that the EVM interprets. Opcodes are the instruction set of the EVM. The EVM uses only around 140 unique opcodes (the exact number changes as network upgrades add new ones). You can check out the full list of EVM opcodes at evm.codes or ethervm.io.
Let's look at a trivial example to make more sense of this. We will start with the simplest bytecode that I can think of - 6005600401
Yeah okay, you are right - even the simplest bytecode above makes no sense! However, we can go to evm.codes or ethervm.io to find out what it means. For example, the first two hex digits, 60, form one byte that corresponds to PUSH1, the opcode that pushes the single byte following it onto the stack.
If we map all the bytes to opcodes, this is what the entire translation looks like:
60 05 -> PUSH1 0x05
60 04 -> PUSH1 0x04
01 -> ADD
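The same byte-to-opcode mapping can be sketched in a few lines of Python. This is a toy disassembler, not a complete one: the OPCODES table only contains the two non-PUSH opcodes needed here, and disassemble is an illustrative name. It relies on the fact that bytes 0x60 to 0x7f are PUSH1 to PUSH32, each followed by that many bytes of data.

```python
# Partial opcode table: only the mnemonics needed for this example.
# Byte values are taken from the public opcode references.
OPCODES = {
    0x01: "ADD",
    0x52: "MSTORE",
}


def disassemble(bytecode_hex):
    """Translate a hex bytecode string into a list of opcode strings."""
    code = bytes.fromhex(bytecode_hex)
    instructions = []
    pc = 0  # program counter: index of the byte being decoded
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next (op - 0x5f) bytes are data, not opcodes.
            size = op - 0x5F
            operand = code[pc:pc + size]
            pc += size
            instructions.append(f"PUSH{size} 0x{operand.hex()}")
        else:
            instructions.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
    return instructions


print(disassemble("6005600401"))
# ['PUSH1 0x05', 'PUSH1 0x04', 'ADD']

# The first five bytes of the contract bytecode shown earlier:
print(disassemble("6080604052"))
# ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

The second call reproduces the start of the "opcodes" output from the .bin file, which is a quick way to convince yourself that the two representations really are the same thing.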
Now, what do these opcodes do?
Well, I told you earlier that the EVM is stack based. Let's see how the stack works in the EVM by running through the instructions one by one:
- With PUSH1 0x05, we push 0x05 on the stack. The stack now holds 5.
- With PUSH1 0x04, we push 0x04 on the stack. The stack now holds both 5 and 4, with 4 at the top.
- Then we ADD, which removes the top two numbers from the stack and pushes their sum back on top (in our case, it becomes the only element in the stack).
- We are left with a stack holding the number 9.
Let's try to visualize the process below:
There you have it - bytecode, opcodes and how it all works with stacks underneath the hood!
This was a very simple set of instructions to show how stack-based computation works. We will look at a more practical example later.