If you are developing a smart language parser, you may be interested in using Flex and Bison.
With Flex and Bison you can do a lot of things, such as building a compiler, running your own interpreted language, script or shell, parsing configuration files, etc.
These are meta-code generators, much like the meta-language Qt uses to build graphical user interfaces.
For Windows, getting the compiled version from GnuWin32 is the easiest option. The setup installer is very convenient; alternatively, you can download the binaries archive and the dependencies archive, plus the developer files archive if needed.
By the way, if you are looking for a state machine compiler, have a look at Ragel as well.
GnuWin32 does not package every new release, and the current compiled version is quite old: 2.4.1.
This version suffers from a bug caused by a limitation in Microsoft's lazy implementation of _spawnvp(), which does not handle arguments containing spaces.
The function _spawnvp() is called whenever the create_pipe() function is used.
m4: cannot open `Files': No such file or directory
m4: cannot open `(x86)GnuWin32/share/bison': No such file or directory
m4: cannot open `C:\Program': No such file or directory
m4: cannot open `Files': No such file or directory
m4: cannot open `(x86)GnuWin32/share/bison/m4sugar/m4sugar.m4': No such file or directory
What happened? Bison tried to run m4.exe as a sub-process connected through a bidirectional pipe. To do so, it passes several parameters to the m4 program, but each argument containing a space was split into several parameters.
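The splitting can be reproduced with a tiny model. This is only a sketch, assuming the spawn function glues the arguments together with single spaces and the receiving program splits the resulting command line on whitespace; the function name `spawn_without_quoting` and the install path are hypothetical, chosen to resemble the error messages above.

```python
def spawn_without_quoting(argv):
    """Model of the flaw: join the arguments with spaces, with no quoting,
    then let the receiving side split the command line on whitespace."""
    command_line = " ".join(argv)
    return command_line.split()


# Hypothetical m4 invocation with an include path that contains spaces.
sent = ["m4", "-I", r"C:\Program Files (x86)\GnuWin32\share\bison"]
received = spawn_without_quoting(sent)

print(len(sent), "arguments sent,", len(received), "received")
for arg in received:
    print(arg)
```

Three arguments go in and five come out: the single path has become `C:\Program`, `Files` and `(x86)\GnuWin32\share\bison`, which is the pattern m4 complains about.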
Newer versions of Bison use a different implementation and, I guess, no longer have this bug.
So we have three main choices: rebuild the entire Bison project on the Windows platform from the latest version, find a workaround by patching Bison, or get Microsoft to improve their implementation. Here, I will develop the second solution.
The goal is to put a small hook on the function _spawnvp() that quotes the string arguments provided to the sub-process. In fact, to keep it simple, we will quote every parameter that does not start with a hyphen, as I prefer not to quote plain options.
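Before going down to machine code, the quoting rule can be sketched at a higher level. The names `quote_args` and `parse_command_line` are hypothetical, and the parser is a deliberately simplified stand-in for the receiving side: real Windows command-line parsing has extra rules for backslashes and embedded quotes that this sketch ignores.

```python
def quote_args(argv):
    """Wrap every argument that does not start with '-' in double quotes."""
    return [arg if arg.startswith("-") else '"' + arg + '"' for arg in argv]


def parse_command_line(command_line):
    """Simplified receiver: split on spaces, except inside double quotes."""
    args, current, in_quotes = [], "", False
    for ch in command_line:
        if ch == '"':
            in_quotes = not in_quotes      # quotes delimit, they are not kept
        elif ch == " " and not in_quotes:
            if current:
                args.append(current)
                current = ""
        else:
            current += ch
    if current:
        args.append(current)
    return args


# Hypothetical m4 invocation with an include path that contains spaces.
sent = ["m4", "-I", r"C:\Program Files (x86)\GnuWin32\share\bison"]
command_line = " ".join(quote_args(sent))
print(command_line)
print(parse_command_line(command_line) == sent)
```

With the quotes in place, the path survives the trip as a single argument, while options such as `-I` are passed through untouched.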
We will need some space in the binary code of the Bison program.
As you may know, the executable file format contains several sections; we need free space in the executable .text section.
With any good portable executable editor, we can see that there is free space at the end of the .text section.
The size in memory is 0x37CBC, while in the file the section occupies 0x37E00 bytes.
XS-PeEditor
 * Internal program version : 0.26b
]open bison.exe
 * Creating a backup file...ok!
:list
ID  section   memory stuff             hard stuff               righ mem. data
00 *[.text  ] @00401000h [00037CBCh]   @00000400h [00037E00h]   -rx  ---  --c
01 *[.data  ] @00439000h [0000A7A0h]   @00038200h [0000A800h]   wr-  ---  -i-
02 -[.bss   ] @00444000h [00002060h]   @00000000h /             wr-  ---  u--
03 +[.idata ] @00447000h [000008E4h]   @00042A00h [00000A00h]   wr-  ---  -i-
04 +[.rsrc  ] @00448000h [00000EFCh]   @00043400h [00001000h]   wr-  ---  -i-
By subtraction (0x37E00 - 0x37CBC = 0x144), we see that we can gain 324 bytes simply by aligning the memory size with the space used in the file; this is enough for our needs.
Let's enlarge the memory size of the section up to the space it occupies in the file.
:resize 0 37e00 37e00
You should perform a defrag command after resizing a section
:list
ID  section   memory stuff             hard stuff               righ mem. data
00 *[.text  ] @00401000h [00037E00h]   @00000400h [00037E00h]   -rx  ---  --c
01 *[.data  ] @00439000h [0000A7A0h]   @00038200h [0000A800h]   wr-  ---  -i-
02 -[.bss   ] @00444000h [00002060h]   @00000000h /             wr-  ---  u--
03 +[.idata ] @00447000h [000008E4h]   @00042A00h [00000A00h]   wr-  ---  -i-
04 +[.rsrc  ] @00448000h [00000EFCh]   @00043400h [00001000h]   wr-  ---  -i-
:close
]quit
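Summary of the newly available space: 324 bytes (0x144) at the end of the .text section, at virtual addresses 00438CBC to 00438DFF (file offsets 000380BC to 000381FF).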
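The location of the free area follows from the section listing with a little arithmetic. The sketch below redoes it; the four constants are read from the .text row of the listing before the resize, while the constant and function names are only illustrative.

```python
# Values read from the .text row of the section listing (before the resize).
SECTION_VA = 0x00401000    # virtual address where .text is mapped
RAW_OFFSET = 0x00000400    # offset of .text inside the file
VIRTUAL_SIZE = 0x37CBC     # bytes declared as used in memory
RAW_SIZE = 0x37E00         # bytes the section occupies in the file


def free_space():
    """Return (size, first free virtual address, first free file offset)."""
    size = RAW_SIZE - VIRTUAL_SIZE
    return size, SECTION_VA + VIRTUAL_SIZE, RAW_OFFSET + VIRTUAL_SIZE


size, start_va, start_offset = free_space()
print(f"free bytes     : {size} (0x{size:X})")
print(f"virtual address: {start_va:08X} .. {start_va + size - 1:08X}")
print(f"file offset    : {start_offset:08X} .. {start_offset + size - 1:08X}")
```

The routine shown later starts at 00438CC0, which falls just inside this range.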
The function _spawnvp has the prototype below.
intptr_t _spawnvp( int mode, const char *cmdname, const char *const *argv );
We are interested in the argv parameter. It is a dynamically sized array of char pointers (C strings), terminated by a NULL pointer.
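For readers less used to C, the layout of argv can be mimicked from Python with the standard ctypes module. This is only an illustration of the data structure; `make_argv` and `walk_argv` are hypothetical helper names and nothing here calls _spawnvp.

```python
import ctypes


def make_argv(args):
    """Build a NULL-terminated array of char pointers, shaped like argv."""
    array_type = ctypes.c_char_p * (len(args) + 1)   # one extra cell for NULL
    return array_type(*[arg.encode() for arg in args], None)


def walk_argv(argv):
    """Read cells one by one until the NULL terminator is found."""
    strings = []
    index = 0
    while argv[index] is not None:                   # NULL cell reads as None
        strings.append(argv[index].decode())
        index += 1
    return strings


argv = make_argv(["m4", "-I", "some directory"])
print(len(argv), "cells, the last one is NULL:", argv[len(argv) - 1] is None)
print(walk_argv(argv))
```

The array carries no length field, so the only way to find the end is to walk the cells until the NULL pointer.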
Programming in binary is done incrementally, in 4 steps:
Below is the subroutine I wrote for this; I added comments to help you understand it.
Its role is to replace the direct call to _spawnvp(), so it must handle the parameters as the original function would and, at the end of the routine, call _spawnvp().
:00438CC0 55                      push ebp
:00438CC1 89E5                    mov ebp, esp
:00438CC3 81EC00040000            sub esp, 00000400               ; 1 KiB for our string buffer
:00438CC9 53                      push ebx
:00438CCA 57                      push edi
:00438CCB 56                      push esi
:00438CCC 8B5510                  mov edx, dword ptr [ebp+10]     ; here is argv
:00438CCF 8D9D00FCFFFF            lea ebx, dword ptr [ebp+FFFFFC00] ; windasm shows the offset unsigned: it is [ebp-400], our string buffer

* Referenced by a (U)nconditional or (C)onditional Jump at Address:   ; parsing loop
|:00438D08(U)
|
:00438CD5 833A00                  cmp dword ptr [edx], 00000000   ; is the cell NULL?
:00438CD8 7430                    je 00438D0A
:00438CDA 8B32                    mov esi, dword ptr [edx]
:00438CDC 803E2D                  cmp byte ptr [esi], 2D          ; does the string start with a '-'?
:00438CDF 7424                    je 00438D05
:00438CE1 89DF                    mov edi, ebx
:00438CE3 C60722                  mov byte ptr [edi], 22          ; opening quote '"'
:00438CE6 47                      inc edi

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00438CEE(U)
|
:00438CE7 803E00                  cmp byte ptr [esi], 00          ; until the end of the cell string
:00438CEA 7404                    je 00438CF0
:00438CEC FC                      cld
:00438CED A4                      movsb                           ; copy one character
:00438CEE EBF7                    jmp 00438CE7

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00438CEA(C)
|
:00438CF0 C60722                  mov byte ptr [edi], 22          ; closing quote '"'
:00438CF3 47                      inc edi
:00438CF4 C60700                  mov byte ptr [edi], 00          ; null-terminate the string
:00438CF7 52                      push edx                        ; save edx, it is our cell pointer
:00438CF8 53                      push ebx                        ; string buffer

* Reference To: msvcrt._strdup, Ord:01C1h
|
:00438CF9 E8EAFBFFFF              call 004388E8                   ; duplicate the string (it gets its own memory)
:00438CFE 83C404                  add esp, 00000004               ; cdecl: caller cleans the stack
:00438D01 5A                      pop edx                         ; restore the cell pointer
:00438D02 90                      nop
:00438D03 8902                    mov dword ptr [edx], eax        ; update the cell with the quoted copy

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00438CDF(C)
|
:00438D05 83C204                  add edx, 00000004               ; next cell
:00438D08 EBCB                    jmp 00438CD5                    ; array parsing loop

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00438CD8(C)
|
:00438D0A 5E                      pop esi
:00438D0B 5F                      pop edi
:00438D0C 5B                      pop ebx
:00438D0D 5A                      pop edx                         ; extra pop, harmless since leave resets esp
:00438D0E 81C400040000            add esp, 00000400
:00438D14 C9                      leave                           ; restore esp and ebp

* Reference To: msvcrt._spawnvp, Ord:01B8h
|
:00438D15 E9D6FBFFFF              jmp 004388F0                    ; jump rather than call: the arguments and the return address are already on the stack
Last and easiest part: update the call site so that it points to our routine.
:0042BEDF FF742434                push [esp+34]                   ; argv
:0042BEE3 8B7C2438                mov edi, dword ptr [esp+38]
:0042BEE7 FF37                    push dword ptr [edi]            ; cmdname
:0042BEE9 6A01                    push 00000001                   ; mode

* Reference To: msvcrt._spawnvp, Ord:01B8h
|
:0042BEEB E800CA0000              call 004388F0                   ; call to change
The call instruction must now target address 00438CC0. The displacement is counted from the next instruction (0042BEF0), so it is 00438CC0 - 0042BEF0 = 0000CDD0 and the new opcode is E8D0CD0000 (the original call to 004388F0 is E800CA0000).
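The opcode of a near call or jump can be recomputed rather than trusted. As a sketch: a 5-byte `call` (E8) or `jmp` (E9) stores the distance from the end of the instruction to the target, as a signed 32-bit little-endian value. The helper name `rel32_branch` is hypothetical; the addresses come from the listings above.

```python
import struct

CALL, JMP = 0xE8, 0xE9


def rel32_branch(opcode, at, target):
    """Encode a 5-byte x86 near call/jmp located at address `at`."""
    displacement = target - (at + 5)          # relative to the next instruction
    return bytes([opcode]) + struct.pack("<i", displacement)


def show(label, code):
    print(f"{label:<28}{code.hex().upper()}")


show("call site -> our routine", rel32_branch(CALL, 0x0042BEEB, 0x00438CC0))
show("call site -> _spawnvp", rel32_branch(CALL, 0x0042BEEB, 0x004388F0))
show("routine   -> _strdup", rel32_branch(CALL, 0x00438CF9, 0x004388E8))
show("routine   -> _spawnvp", rel32_branch(JMP, 0x00438D15, 0x004388F0))
```

For the call at 0042BEEB this gives E8D0CD0000 to reach the routine at 00438CC0 and E800CA0000 to reach the original target 004388F0. The last two lines reproduce the bytes of the `call` and `jmp` inside the routine (E8EAFBFFFF and E9D6FBFFFF), which is a handy cross-check of the method.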