Beege Posted July 30, 2019 (edited)

Here is the latest assembly engine from Tomasz Grysztar, flat assembler g (fasmg), as a dll, which I compiled using the original fasm engine. He doesn't have it compiled in the download package, but it was as easy as compiling an exe in AutoIt if you ever have to do it yourself. Just open up the file in the fasm editor and press F5.

You can read about what makes fasmg different from the original fasm HERE if you want. The minimum you should understand is that this engine is bare bones and by itself is not capable of very much. The macro engine is the major difference, and it uses macros for basically everything now, including implementing the x86 instructions and formats. All of these macros are located within the include folder, and you should keep that in its original form.

When I first got the dll compiled I couldn't get it to generate code in flat binary format. It was working, but the size of the output was over 300 bytes no matter what the assembly code was, and I could tell it was outputting a different format than binary. Eventually I figured out that the primary "include\win32ax.inc" executes a macro "format PE GUI 4.0" if x86 has not been defined. I emphasize macro there because at first I didn't realize it was a macro (adding a bunch of other includes), and I wasted loads of time because of it, since in version 1 the statement "format binary" was the default if not specified and specifically means add nothing extra to the code. So long story short, the part that I was missing was including the cpu type and extensions from the include\cpu folder. By default I add the x64 type and SSE4 extension includes. Note that x64 here is not about what mode we are running in; it is about what instructions your cpu supports. If you are running on some really old hardware that may need to be adjusted, or if you are on to more advanced instructions like the AVX extensions, you may have to add those includes to your source.
Differences from previous dll function

I like the error reporting much better in this one. With the last one we had a ton of error codes and a variable return structure depending on what kind of error it had. I even had an example showing what kind of error would give you correct line numbers and what kind wouldn't. With this one, stdout is passed to the dll function and it simply prints the line/details it had a problem with to the console. The return value is the number of errors counted. It also handles its own memory needs automatically now. If the output region is not big enough it will VirtualAlloc a new one and VirtualFree the previous one.

Differences in Code

Earlier this year I showed some examples of how to use the macros to make writing assembly a little more familiar. Almost all the same functionality exists here, but a couple of syntax sugar items are gone and there are slight changes in other areas. What's gone is FIX and PTR, both syntax sugar that doesn't really matter. There are a couple of changes to structures as well, but these are for the better. One is that unnamed elements are allowed now, but if an element does not have a name you cannot initialize it during creation, because elements can only be initialized via the syntax name:value. Previously you initialized the elements by specifying values in a comma separated list in a specific order, like value1,value2,etc. That had a problem because it expected commas even when the elements were just padding for alignment, so having to specify the name works out better and there is no need for the _FasmFixInit function. "<" and ">" are no longer used in the initializers either.
OLD:

    $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
    _(_FasmAu3StructDef('AU3TEST', $sTag)) ;convert and add definition to source
    _(' tTest AU3TEST ' & _FasmFixInit('1,222,<"AutoItFASM",0>,<41,43,43,44,45>,6,7,"au3",12345', $sTag)) ;create and initialize

New:

    $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
    _(_fasmg_Au3StructDef('AU3TEST', $sTag)) ;convert and add definition to source
    _(' tTest AU3TEST x:11,y:22,sNote:"AutoItFASM",odd:41,odd+4:42,odd+8:43,w:6,p:7,ext:"au3",finish:12345') ;create and initialize

Extra Includes

I created an includeEx folder for the extra macros I wrote/found on the forums. Most of them are written by Tomasz, so they may eventually end up in the standard library.

Edit: There's only the one include folder now. All the default includes are in their own folder within that folder and all the custom ones are top level.

Align.inc, Nop.inc, Listing.inc

The Align and Nop macros work together to align the next statement to whatever boundary you specify, using multibyte nop codes to fill in the space. Filling the space with nops is the default, but you can also specify a fill value if you want. Align.assume is another macro in align.inc that can be used to tell the engine that a certain starting point is assumed to be at a certain boundary alignment, and it will do its align calculations based on that value. Listing is a macro great for seeing where and what opcodes are getting generated from each line of assembly code. Below is an example of the source and the output you would see printed to the console during assembly. I picked this slightly longer example because it best shows the use of align and nop, and then the use of listing to verify the align/nop code. Nop codes are instructions that do nothing, and one use of them is as space fillers when you want a certain portion of your code to land on a specific boundary offset.
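As a rough sanity check of what align has to emit, the number of fill bytes can be worked out by hand from the current offset and the boundary. The snippet below is only a sketch of that arithmetic in plain AutoIt; the helper name _AlignPad is hypothetical, it is not part of the attached package, and it is not how the fasmg macro itself is implemented. The four offsets are the ones at which align 8 appears in the listing output in this post.

```autoit
; Hypothetical helper (not from the package): number of fill bytes needed to
; move $iOffset up to the next multiple of $iBoundary (0 if already aligned).
Func _AlignPad($iOffset, $iBoundary)
    Return Mod($iBoundary - Mod($iOffset, $iBoundary), $iBoundary)
EndFunc

; Offsets taken from the "align 8" rows of the listing output in this post.
Global $aOffsets[4] = [0x25, 0x65, 0x8F, 0xD1]

For $iOffset In $aOffsets
    ; 0x25 -> 3, 0x65 -> 3, 0x8F -> 1, 0xD1 -> 7
    ConsoleWrite(StringFormat('offset 0x%X needs %d fill byte(s)', $iOffset, _AlignPad($iOffset, 8)) & @CRLF)
Next
```

The results agree with the byte counts in the listing: a 3 byte nop at 0x25, three 0xCC bytes at 0x65, a single 0x90 at 0x8F and a 7 byte nop at 0xD1.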
I don't know all the best practices here (if you do, please post!) but it's a type of optimization for the cpu. Because a nop does nothing, I can't just run the code and confirm it's correct because it didn't crash. I need to look at what opcodes the actual align statements produced, and listing made that easy.

source example:

    _('procf _main stdcall, pAdd')
    _(' mov eax, [pAdd]')
    _(' mov dword[eax], _crc32')
    _(' mov dword[eax+4], _strlen')
    _(' mov dword[eax+8], _strcmp')
    _(' mov dword[eax+12], _strstr')
    _(' ret')
    _('endp')

    _('EQUAL_ORDERED = 1100b')
    _('EQUAL_ANY = 0000b')
    _('EQUAL_EACH = 1000b')
    _('RANGES = 0100b')
    _('NEGATIVE_POLARITY = 010000b')
    _('BYTE_MASK = 1000000b')

    _('align 8')
    _('proc _crc32 uses ebx ecx esi, pStr')
    _(' mov esi, [pStr]')
    _(' xor ebx, ebx')
    _(' not ebx')
    _(' stdcall _strlen, esi')
    _(' .while eax >= 4')
    _(' crc32 ebx, dword[esi]')
    _(' add esi, 4')
    _(' sub eax, 4')
    _(' .endw')
    _(' .while eax')
    _(' crc32 ebx, byte[esi]')
    _(' inc esi')
    _(' dec eax')
    _(' .endw')
    _(' not ebx')
    _(' mov eax, ebx')
    _(' ret')
    _('endp')

    _('align 8, 0xCC') ; fill with 0xCC instead of NOP
    _('proc _strlen uses ecx edx, pStr')
    _(' mov ecx, [pStr]')
    _(' mov edx, ecx')
    _(' mov eax, -16')
    _(' pxor xmm0, xmm0')
    _(' .repeat')
    _(' add eax, 16')
    _(' pcmpistri xmm0, dqword[edx + eax], 1000b') ; EQUAL_EACH
    _(' .until ZERO?') ; repeat loop until Zero flag (ZF) is set
    _(' add eax, ecx') ; add remainder
    _(' ret')
    _('endp')

    _('align 8')
    _('proc _strcmp uses ebx ecx edx, pStr1, pStr2') ; ecx = string1, edx = string2
    _(' mov ecx, [pStr1]') ; ecx = start address of str1
    _(' mov edx, [pStr2]') ; edx = start address of str2
    _(' mov eax, ecx') ; eax = start address of str1
    _(' sub eax, edx') ; eax = ecx - edx | eax = start address of str1 - start address of str2
    _(' sub edx, 16')
    _(' mov ebx, -16')
    _(' STRCMP_LOOP:')
    _(' add ebx, 16')
    _(' add edx, 16')
    _(' movdqu xmm0, dqword[edx]')
    _(' pcmpistri xmm0, dqword[edx + eax], EQUAL_EACH + NEGATIVE_POLARITY') ; find the first *different* bytes, hence negative polarity
    _(' ja STRCMP_LOOP') ; ja: jump if above (CF = 0 and ZF = 0)
    _(' jc STRCMP_DIFF') ; jc: jump if carry (CF = 1)
    _(' xor eax, eax') ; the strings are equal
    _(' ret')
    _(' STRCMP_DIFF:')
    _(' mov eax, ebx')
    _(' add eax, ecx')
    _(' ret')
    _('endp')

    _('align 8')
    _('proc _strstr uses ecx edx edi esi, sStrToSearch, sStrToFind')
    _(' mov ecx, [sStrToSearch]')
    _(' mov edx, [sStrToFind]')
    _(' pxor xmm2, xmm2')
    _(' movdqu xmm2, dqword[edx]') ; load the first 16 bytes of needle
    _(' pxor xmm3, xmm3')
    _(' lea eax, [ecx - 16]')
    _(' STRSTR_MAIN_LOOP:') ; find the first possible match of 16-byte fragment in haystack
    _(' add eax, 16')
    _(' pcmpistri xmm2, dqword[eax], EQUAL_ORDERED')
    _(' ja STRSTR_MAIN_LOOP')
    _(' jnc STRSTR_NOT_FOUND')
    _(' add eax, ecx ') ; save the possible match start
    _(' mov edi, edx')
    _(' mov esi, eax')
    _(' sub edi, esi')
    _(' sub esi, 16')
    _(' @@:') ; compare the strings
    _(' add esi, 16')
    _(' movdqu xmm1, dqword[esi + edi]')
    _(' pcmpistrm xmm3, xmm1, EQUAL_EACH + NEGATIVE_POLARITY + BYTE_MASK') ; mask out invalid bytes in the haystack
    _(' movdqu xmm4, dqword[esi]')
    _(' pand xmm4, xmm0')
    _(' pcmpistri xmm1, xmm4, EQUAL_EACH + NEGATIVE_POLARITY')
    _(' ja @b')
    _(' jnc STRSTR_FOUND')
    _(' sub eax, 15') ; continue searching from the next byte
    _(' jmp STRSTR_MAIN_LOOP')
    _(' STRSTR_NOT_FOUND:')
    _(' xor eax, eax')
    _(' ret')
    _(' STRSTR_FOUND:')
    _(' sub eax, [sStrToSearch]')
    _(' inc eax')
    _(' ret')
    _('endp')

Listing Output:

    00000000:                         use32
    00000000: 55 89 E5                procf _main stdcall, pAdd
    00000003: 8B 45 08                mov eax, [pAdd]
    00000006: C7 00 28 00 00 00       mov dword[eax], _crc32
    0000000C: C7 40 04 68 00 00 00    mov dword[eax+4], _strlen
    00000013: C7 40 08 90 00 00 00    mov dword[eax+8], _strcmp
    0000001A: C7 40 0C D8 00 00 00    mov dword[eax+12], _strstr
    00000021: C9 C2 04 00             ret
    00000025:                         localbytes = current
    00000025:                         purge ret?,locals?,endl?,proclocal?
    00000025:                         end namespace
    00000025:                         purge endp?
    00000025:                         EQUAL_ORDERED = 1100b
    00000025:                         EQUAL_ANY = 0000b
    00000025:                         EQUAL_EACH = 1000b
    00000025:                         RANGES = 0100b
    00000025:                         NEGATIVE_POLARITY = 010000b
    00000025:                         BYTE_MASK = 1000000b
    00000025: 0F 1F 00                align 8
    00000028: 55 89 E5 53 51 56       proc _crc32 uses ebx ecx esi, pStr
    0000002E: 8B 75 08                mov esi, [pStr]
    00000031: 31 DB                   xor ebx, ebx
    00000033: F7 D3                   not ebx
    00000035: 56 E8 2D 00 00 00       stdcall _strlen, esi
    0000003B: 83 F8 04 72 0D          .while eax >= 4
    00000040: F2 0F 38 F1 1E          crc32 ebx, dword[esi]
    00000045: 83 C6 04                add esi, 4
    00000048: 83 E8 04                sub eax, 4
    0000004B: EB EE                   .endw
    0000004D: 85 C0 74 09             .while eax
    00000051: F2 0F 38 F0 1E          crc32 ebx, byte[esi]
    00000056: 46                      inc esi
    00000057: 48                      dec eax
    00000058: EB F3                   .endw
    0000005A: F7 D3                   not ebx
    0000005C: 89 D8                   mov eax, ebx
    0000005E: 5E 59 5B C9 C2 04 00    ret
    00000065:                         localbytes = current
    00000065:                         purge ret?,locals?,endl?,proclocal?
    00000065:                         end namespace
    00000065:                         purge endp?
    00000065: CC CC CC                align 8, 0xCC
    00000068: 55 89 E5 51 52          proc _strlen uses ecx edx, pStr
    0000006D: 8B 4D 08                mov ecx, [pStr]
    00000070: 89 CA                   mov edx, ecx
    00000072: B8 F0 FF FF FF          mov eax, -16
    00000077: 66 0F EF C0             pxor xmm0, xmm0
    0000007B:                         .repeat
    0000007B: 83 C0 10                add eax, 16
    0000007E: 66 0F 3A 63 04 02 08    pcmpistri xmm0, dqword[edx + eax], 1000b
    00000085: 75 F4                   .until ZERO?
    00000087: 01 C8                   add eax, ecx
    00000089: 5A 59 C9 C2 04 00       ret
    0000008F:                         localbytes = current
    0000008F:                         purge ret?,locals?,endl?,proclocal?
    0000008F:                         end namespace
    0000008F:                         purge endp?
    0000008F: 90                      align 8
    00000090: 55 89 E5 53 51 52       proc _strcmp uses ebx ecx edx, pStr1, pStr2
    00000096: 8B 4D 08                mov ecx, [pStr1]
    00000099: 8B 55 0C                mov edx, [pStr2]
    0000009C: 89 C8                   mov eax, ecx
    0000009E: 29 D0                   sub eax, edx
    000000A0: 83 EA 10                sub edx, 16
    000000A3: BB F0 FF FF FF          mov ebx, -16
    000000A8:                         STRCMP_LOOP:
    000000A8: 83 C3 10                add ebx, 16
    000000AB: 83 C2 10                add edx, 16
    000000AE: F3 0F 6F 02             movdqu xmm0, dqword[edx]
    000000B2: 66 0F 3A 63 04 02 18    pcmpistri xmm0, dqword[edx + eax], EQUAL_EACH + NEGATIVE_POLARITY
    000000B9: 77 ED                   ja STRCMP_LOOP
    000000BB: 72 09                   jc STRCMP_DIFF
    000000BD: 31 C0                   xor eax, eax
    000000BF: 5A 59 5B C9 C2 08 00    ret
    000000C6:                         STRCMP_DIFF:
    000000C6: 89 D8                   mov eax, ebx
    000000C8: 01 C8                   add eax, ecx
    000000CA: 5A 59 5B C9 C2 08 00    ret
    000000D1:                         localbytes = current
    000000D1:                         purge ret?,locals?,endl?,proclocal?
    000000D1:                         end namespace
    000000D1:                         purge endp?
    000000D1: 0F 1F 80 00 00 00 00    align 8
    000000D8: 55 89 E5 51 52 57 56    proc _strstr uses ecx edx edi esi, sStrToSearch, sStrToFind
    000000DF: 8B 4D 08                mov ecx, [sStrToSearch]
    000000E2: 8B 55 0C                mov edx, [sStrToFind]
    000000E5: 66 0F EF D2             pxor xmm2, xmm2
    000000E9: F3 0F 6F 12             movdqu xmm2, dqword[edx]
    000000ED: 66 0F EF DB             pxor xmm3, xmm3
    000000F1: 8D 41 F0                lea eax, [ecx - 16]
    000000F4:                         STRSTR_MAIN_LOOP:
    000000F4: 83 C0 10                add eax, 16
    000000F7: 66 0F 3A 63 10 0C       pcmpistri xmm2, dqword[eax], EQUAL_ORDERED
    000000FD: 77 F5                   ja STRSTR_MAIN_LOOP
    000000FF: 73 30                   jnc STRSTR_NOT_FOUND
    00000101: 01 C8                   add eax, ecx
    00000103: 89 D7                   mov edi, edx
    00000105: 89 C6                   mov esi, eax
    00000107: 29 F7                   sub edi, esi
    00000109: 83 EE 10                sub esi, 16
    0000010C:                         @@:
    0000010C: 83 C6 10                add esi, 16
    0000010F: F3 0F 6F 0C 3E          movdqu xmm1, dqword[esi + edi]
    00000114: 66 0F 3A 62 D9 58       pcmpistrm xmm3, xmm1, EQUAL_EACH + NEGATIVE_POLARITY + BYTE_MASK
    0000011A: F3 0F 6F 26             movdqu xmm4, dqword[esi]
    0000011E: 66 0F DB E0             pand xmm4, xmm0
    00000122: 66 0F 3A 63 CC 18       pcmpistri xmm1, xmm4, EQUAL_EACH + NEGATIVE_POLARITY
    00000128: 77 E2                   ja @b
    0000012A: 73 0F                   jnc STRSTR_FOUND
    0000012C: 83 E8 0F                sub eax, 15
    0000012F: EB C3                   jmp STRSTR_MAIN_LOOP
    00000131:                         STRSTR_NOT_FOUND:
    00000131: 31 C0                   xor eax, eax
    00000133: 5E 5F 5A 59 C9 C2 08 00 ret
    0000013B:                         STRSTR_FOUND:
    0000013B: 2B 45 08                sub eax, [sStrToSearch]
    0000013E: 40                      inc eax
    0000013F: 5E 5F 5A 59 C9 C2 08 00 ret
    00000147:                         localbytes = current
    00000147:                         purge ret?,locals?,endl?,proclocal?
    00000147:                         end namespace
    00000147:                         purge endp?

procf and forcea macros

In my previous post I spoke about the force macro and why it is needed. I added two more macros (procf and forcea) that combine the two and also set align.assume to the same function. As clarified in the previous post, you should only have to use these macros for the first procedure being defined (since nothing calls that procedure). And since it's the first function, it should be the starting memory address, which is a good place to initially set the align.assume address to.

Attached package should include everything needed and has all the previous examples I posted updated. Let me know if I missed something or you have any issues running the examples, and thanks for looking.

Update 04/19/2020:

A couple new macros added. I also got rid of the IncludeEx folder and just made one include folder that has the default include folder within it and all others top level.

dllstruct macro does the same thing as _fasmg_Au3StructDef(). You can use either one; they both use the macro.

getmempos macro does the delta trick I showed below using anonymous labels.

stdcallw and invokew macros will push any parameters that are raw (quoted) strings as wide characters.

Ifex include file gives .if .ifelse .while .until the ability to use stdcall/invoke/etc inline. So if you had a function called "_add" you could do .if stdcall(_add,5,5) = 10.
All this basically does in the background is perform the stdcall, replace the call in the comparison with eax, and pass it on to the original default macros, but it is super helpful for cleaning up code and took a ton of time learning the macro language to get in place.

Update 05/19/2020:

Added fastcallw that does same as stdcallw only

Added fastcall support for Ifex

Corrected missing include file include\macro\if.inc within win64au3.inc

fasmg 5-19-2020.zip

Previous versions: fasmg 7-29-2019.zip, fasmg 8-25-2019.zip, fasmg 9-14-2019.zip, fasmg 10-14-2019.zip, fasmg 10-26-2019.zip, fasmg 4-19-2020.zip

Edited May 19, 2020 by Beege: Added x64 support - corrected stack alignment issues

Assembly Code: fasmg . fasm . BmpSearch . Au3 Syntax Highlighter . Bounce Multithreading Example . IDispatchASM
UDFs: Explorer Frame . ITaskBarList . Scrolling Line Graph . Tray Icon Bar Graph . Explorer Listview . Wiimote . WinSnap . Flicker Free Labels . iTunes
Programs: Ftp Explorer . Snipster . Network Meter . Resistance Calculator
Beege Posted August 26, 2019 (edited)

I updated this with a couple new functions to make compiling and debugging variables easier. _fasmg_CompileAu3 will generate the au3 code needed for a standalone function without the library and place it on the clipboard. It will also encode/compress the opcode and generate the required code to decode/decompress, but only if it's worth it. If the amount of extra au3 code it takes to decode is more than the number of characters saved from encoding, then I don't see the point. Below is the input/output you would get from the first example, where base64 would not be worth it. The base64 saves us about 20 characters in the opcode, but even with a single one-liner (with the length precalculated when compiled) to decode the string it's about 170 characters long, so for this example it would pick base16.

Input:

    _('procf _main, pConsoleWriteCB, parm1, parm2')
    _(' mov edx, [parm1]')
    _(' add edx, [parm2]')
    _(' invoke pConsoleWriteCB, "edx = ", edx')
    _(' ret')
    _('endp')
    _fasmg_CompileAu3($g_sFasm)

Output:

    ;base16
    Local Static $dBinExec = Binary('0x5589E58B550C03551052E807000000656478203D2000FF5508C9C20C00')
    Local Static $tBinExec = DllStructCreate('byte[' & BinaryLen($dBinExec) & ']'), $bSet = DllStructSetData($tBinExec, 1, $dBinExec)
    Local Static $aProtect = DllCall('kernel32.dll', 'bool', 'VirtualProtect', 'struct*', $tBinExec, 'dword_ptr', DllStructGetSize($tBinExec), 'dword', 0x40, 'dword*', 0)
    Local Static $aFlush = DllCall('kernel32.dll', 'bool', 'FlushInstructionCache', 'handle', (DllCall('kernel32.dll', 'handle', 'GetCurrentProcess')[0]), 'struct*', $tBinExec, 'dword_ptr', DllStructGetSize($tBinExec))

    ;base64
    Local Static $sBinExec = 'VYnli1UMA1UQUugHAAAAZWR4ID0gAP9VCMnCDAA='
    Local Static $aDec = DllCall('Crypt32.dll', 'bool', 'CryptStringToBinary', 'str', $sBinExec, 'dword', Null, 'dword', 1, 'struct*', DllStructCreate('byte[' & 29 & ']'), 'dword*', 29, 'ptr', 0, 'ptr', 0)
    Local Static $tBinExec = DllStructCreate('byte[' & $aDec[5] & ']', DllStructGetPtr($aDec[4]))
    Local Static $aProtect = DllCall('kernel32.dll', 'bool', 'VirtualProtect', 'struct*', $tBinExec, 'dword_ptr', DllStructGetSize($tBinExec), 'dword', 0x40, 'dword*', 0)
    Local Static $aFlush = DllCall('kernel32.dll', 'bool', 'FlushInstructionCache', 'handle', (DllCall('kernel32.dll', 'handle', 'GetCurrentProcess')[0]), 'struct*', $tBinExec, 'dword_ptr', DllStructGetSize($tBinExec))

_fasmg_Debug is the other much needed function that has been added. I've gotten annoyed with copying callbacks from script to script just to debug and wanted something native that could handle all the basics. The function inserts pointers to callbacks directly into the code, so you can't use it in a compiled function, but you wouldn't want to do that anyway. This also means we don't have to pass the callback as a parameter. I have 4 callbacks set up that I think cover all my bases, but let me know if not. The function accepts a comma delimited string of "type:var" pairings, using the ":" operator to pair, and will generate asm code to call the correct callback. It accepts all AutoIt types allowed for DllCall and actually uses the literal string specifying the type to dynamically register a function for that variable type. I didn't always know this (never really needed to) but apparently we can register the same AutoIt function multiple times with DllCallbackRegister, specifying different parameter types with each registration. With that functionality I'm able to just create a "wildcard" type function and register the types as needed so that one callback covers most situations. The other 3 callbacks cover dll structures, raw quoted strings (no variable specified), and lastly references (ex: int*).
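The multiple-registration behaviour can be tried in isolation with a few lines of plain AutoIt. This is only a sketch and not the code used inside _fasmg_Debug: the function name _DbgAny and the test values are made up for illustration, and the callbacks are invoked through DllCallAddress rather than from assembled code.

```autoit
; Hypothetical "wildcard" callback: prints whatever single value it receives.
Func _DbgAny($vValue)
    ConsoleWrite(VarGetType($vValue) & ': ' & $vValue & @CRLF)
EndFunc

; Register the same AutoIt function twice, each time with a different parameter type.
Global $hCbInt = DllCallbackRegister('_DbgAny', 'none', 'int')
Global $hCbStr = DllCallbackRegister('_DbgAny', 'none', 'str')

; Each registration has its own pointer and its own idea of the parameter type.
DllCallAddress('none', DllCallbackGetPtr($hCbInt), 'int', -12345)
DllCallAddress('none', DllCallbackGetPtr($hCbStr), 'str', 'AutoItFASM')

; Release the callbacks when done.
DllCallbackFree($hCbInt)
DllCallbackFree($hCbStr)
```

Registering per type on demand keeps the AutoIt side down to one function, at the cost of holding one callback handle per type that has been seen.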
For whatever reason, specifying the parameters in the callback as references doesn't work out right and doesn't dereference, so to get around that I send the pointer to a function that creates a dllstruct at the pointer address and accesses it that way. I also added the ability to specify hi/low values using the '|' operator. So "int64:eax|edx" would show you the combined 64bit value of eax and edx. For dll structures I made a modified version (basically a copy) of _WinAPI_DisplayStruct that prints the data to the console instead of showing you a listview. Despite the length of that function, it was actually really easy to modify by just replacing all the GUICtrlCreateListViewItem statements with _addarray() to turn it into a 2D array. All the statements are already delimited with '|' so that worked out nicely. I then pass that 2D array to _Array2DToFormatedStr() to convert and format the alignment of all the columns based on the longest element in each column before printing, so we have a nice table. The dashes help out with the colors too.
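For reference, the combined value that a pairing like uint64:eax|edx represents is the second register shifted up by 32 bits plus the first. The sketch below shows only that arithmetic in plain AutoIt, assuming the first register is the low half and the second the high half (which is how rdtsc fills eax and edx). The function name is made up, and the high value 734426 is derived by hand from the console output in this post (eax: 3773413439, eax|edx: 3154339424745535); it is not printed there directly.

```autoit
; Hypothetical helper: combine a low and a high 32-bit value into one 64-bit number.
Func _CombineLoHi($iLo, $iHi)
    ; 4294967296 = 2^32; Int() makes sure the result is returned as an integer.
    Return Int($iHi * 4294967296 + $iLo)
EndFunc

; Low half taken from the console output in this post, high half derived from it.
ConsoleWrite(_CombineLoHi(3773413439, 734426) & @CRLF) ; 3154339424745535
```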
Here is the output from the example in the helpfile:

Debug Example:

    $_dbg = _fasmg_Debug ; (shortcut)
    $sTag = 'byte x;short y;char sNote[13];long odd[3];word w;dword p;char ext[3];word finish'
    _(_fasmg_Au3StructDef('AU3TEST', $sTag))
    _('procf _main, parm1, parm2')
    _(' locals')
    _(' iTestSigned dd -12345')
    _(' iTestUnsigned dd 1234567')
    _(' stest db "This is a local string test",0')
    _(' sBuffer db 128 dup ?')
    _(' tTest AU3TEST x:11,y:22,sNote:"Test Msg",odd:333,odd+4:444,odd+8:555,w:77,p:6666,ext:"exe",finish:1234')
    _(' endl')
    _(" xor eax, eax") ; eax = 0
    _(' mov bl, 13') ; bl = 13
    _($_dbg('str:"#########################"')) ; raw string - no variable specified
    _($_dbg('int:[iTestSigned]')) ; signed test
    _($_dbg('uint:[iTestUnsigned]')) ; unsigned test
    _(" cpuid") ; places cpuid string in ebx:edx:ecx
    _(" lea edi, [sBuffer]") ; Load address (ptr) of sBuffer into edi
    _(" mov [edi], ebx") ; first 4 chars of cpuid
    _(" mov [edi + 4], edx")
    _(" mov [edi + 8], ecx")
    _(" mov dword[edi + 12], 0")
    _($_dbg('str:edi')) ; will print cpuid string. Edi is holding ptr to local var "sBuffer"
    _(' mov edx, [parm1]')
    _(' add edx, [parm2]')
    _($_dbg('struct:addr tTest:' & $sTag)) ; printing structure. Only type that expects an extra ':' to specify "$sTag"
    _(' mov ebx, 77')
    _(' rdtsc')
    _($_dbg('str:"-----------------------"'))
    _($_dbg('uint*:addr iTestUnsigned')) ; testing passing a reference to it
    _($_dbg('uint:[iTestUnsigned]')) ; should be same
    _($_dbg('str:"-----------------------"'))
    _(' rdtsc') ; this places 64bit timestamp value in eax|edx
    _($_dbg('dword:ebx,uint:[iTestUnsigned],str:addr stest,dword:eax,uint64:eax|edx')) ; debugging multiple vars. printing 64bit timestamp stored in eax|edx
    _(' rdtsc')
    _($_dbg('uint64:eax|edx,dword:[parm1]'))
    _(' ret')
    _('endp')

Console Output:

    #########################
    [iTestSigned]: -12345
    [iTestUnsigned]: 1234567
    edi: GenuineIntel
    Struct addr tTest:
    ====================================================================
    #  Member  Offset      Type         Size    Value
    ====================================================================
    -  -       0x005EF118  <struct>     0       -
    1  x       0           BYTE         1       11
    -  -       -           <alignment>  1       -
    2  y       2           short        2       22
    3  sNote   4           CHAR[13]     13      Test Msg
    -  -       -           <alignment>  3       -
    4  odd     20          long[3]      12 (4)  [1] 333
    -  -       24          -            -       [2] 444
    -  -       28          -            -       [3] 555
    5  w       32          WORD         2       77
    -  -       -           <alignment>  2       -
    6  p       36          DWORD        4       6666
    7  ext     40          CHAR[3]      3       exe
    -  -       -           <alignment>  1       -
    8  finish  44          WORD         2       1234
    -  -       -           <alignment>  2       -
    -  -       0x005EF148  <endstruct>  48      -
    ====================================================================
    -----------------------
    iTestUnsigned*: 1234567
    [iTestUnsigned]: 1234567
    -----------------------
    ebx: 77
    [iTestUnsigned]: 1234567
    stest*: This is a local string test
    eax: 3773413439
    eax|edx: 3154339424745535
    eax|edx: 3154339424841536
    [parm1]: 5

Update 9/14/2019:

I added some additional commenting/renaming options for _fasmg_Debug and also added a small example showing a new trick I learned recently, used in position independent code (PIC) to establish where you are in memory. For the debugging portion, you can now optionally change the name/comment of what you are printing by adding a semicolon ";" followed by the new name/comment. Normally the function will just use the variable name. This replaces it with whatever you put after ";".
Let's say here some_var = 6:

    _fasmg_Debug('dword:[some_var]') ; would print to console -> [some_var]: 6
    _fasmg_Debug('dword:[some_var];value of some_var') ; would print to console -> value of some_var: 6

For the trick I learned, I like this as an alternative to passing the address of the structure to your function if you're using any kind of global/static variable like data reservations. Here's a partial snip from the original _Ex_GlobalVars example of what I'm talking about. pMem is the address of the dllstructure holding our binary code and is passed as a parameter so we can calculate the address of g_Var1:

    _('procf _main uses ebx, pMem, pConsoleWriteCB, parm1')
    _(' mov ebx, [pMem]') ; This is where our code starts in memory.
    _(' mov [ebx + g_Var1], 111')

The alternative way can be done with the three lines below. Whenever the x86 call instruction is executed, the address the function needs to return to when finished (which is the next instruction) is always pushed onto the stack. This is normally complemented by the ret instruction, which pops that value off the stack and returns you to where you need to go next. Here we never execute ret and instead pop the address off into eax. delta is then subtracted from that address, giving us the original starting point.

    _('call delta')
    _('delta: pop eax') ; eax now equals address of delta
    _('sub eax, delta') ; eax now equals start of memory

Here's the full example I added to the zip. This also shows how, with fasmg, you get true namespaces, which makes any data reservation inside procedures more like static variables. Here both _subfunc1 and _subfunc2 use the same names for the label delta and the data reservation var. I call the function 3 times just to show the variables holding on to their values between calls. In this example the life of those variables would end when the AutoIt function exits, since I only declared $tBinary as Local and not Static, so keep that in mind. Also notice that I defined those with a 0 and not "?".
I don't have an example, but without the 0 the data is just empty, and therefore I think it could cause the dllstructure to not have enough bytes defined to hold that extra space for variables. Using 0 will always fill the data out so our BinaryLen() calculation will match up.

    Func _Ex_CurrentPOS()
        $g_sFasm = ''
        _('procf _main')
        _(' stdcall _subfunc1')
        _(' stdcall _subfunc2')
        _(' ret')
        _('endp')

        _('proc _subfunc1')
        _(' call delta') ; Calling anything places the address of the next instruction on the stack. We just happen to be calling the next line
        _(' delta: pop eax') ; in this function the label delta is equal to value 18, which is the offset or distance in bytes from _main
        _(' sub eax, delta') ; after popping the address of delta (not offset - real address) to eax we subtract delta to give us the address of _main
        _(' add [eax+var], 2')
        _(' add [eax+g_var], 2')
        _($_dbg('dword:[eax+var];subfunc1 var'))
        _($_dbg('dword:[eax+g_var];subfunc1 g_var'))
        _(' stdcall _subfunc2')
        _(' ret')
        _(' var dd 0')
        _('endp')

        _('proc _subfunc2')
        _(' call delta')
        _(' delta: pop eax') ; here delta=47
        _(' sub eax, delta')
        _(' add [eax+var], 3')
        _(' add [eax+g_var], 3')
        _($_dbg('dword:[eax+var];subfunc2 var'))
        _($_dbg('dword:[eax+g_var];subfunc2 g_var'))
        _(' ret')
        _(' var dd 0')
        _('endp')

        _('g_var dd 0')

        Local $tBinary = _fasmg_Assemble($g_sFasm, 1)
        If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

        DllCallAddress('dword', DllStructGetPtr($tBinary))
        DllCallAddress('dword', DllStructGetPtr($tBinary))
        DllCallAddress('dword', DllStructGetPtr($tBinary))
    EndFunc

Edited September 15, 2019 by Beege: code comments
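The arithmetic behind the call/pop trick can be followed with plain numbers. The snippet below is only an illustration in AutoIt and does not assemble or run any machine code: the load address is invented, and 18 is the offset of delta quoted in the comments of _subfunc1.

```autoit
; Hypothetical values for illustration only.
Global $pLoad = 0x00500000   ; invented address where the binary was placed in memory
Global $iDelta = 18          ; offset of the label delta from the start of the code

; "call delta" pushes the address of the next instruction, which is the label itself,
; and "pop eax" retrieves it.
Global $pPopped = $pLoad + $iDelta

; "sub eax, delta" removes the offset again and leaves the start address.
Global $pStart = $pPopped - $iDelta

ConsoleWrite(StringFormat('popped = 0x%08X, start = 0x%08X', $pPopped, $pStart) & @CRLF)
; popped = 0x00500012, start = 0x00500000
```

Any data reservation can then be reached as start plus its own offset, which is what [eax+var] and [eax+g_var] do in the example.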
Beege Posted October 15, 2019 (edited)

X64 Support:

The other day @UEZ gave me some motivation to get x64 working in my UDF. At this time there is no fasmg.dll that runs in 64bit mode, but much like the original fasm.dll, the 32bit fasmg.dll is perfectly capable of generating 64bit code, since generating code and executing code are two different things. So to get x64 working, a simple method I implemented is to just pass the source over to a 32bit script, generate the code and pass it back. I tried the same thing years ago and it did work when it comes to the 64bit code generation. The problem I had was that I lost so many of the debugging options I had in place when running in 32bit mode, which defeated a lot of the library's purpose of making asm code writing easier. This time I am happy to report that I'm able to get it all working and not lose any of the debugging/format options that are in place for 32bit. All the macros I have been showing in 32bit exist for 64bit just the same, and I have added code to make the proper selection. I've only written the one debugging example for testing purposes so far, but it covers a decent amount of the functionality, plus it took me a solid weekend to get there.

Here are some of the basics you should know about x64 calling conventions and the pitfalls that I have learned so far. First is fastcall. There is really only one call type and that is fastcall. With fastcall the first 4 parameters are passed in rcx, rdx, r8, r9. Space is still reserved on the stack for the first 4 parameters, but it is empty at first and you can choose to transfer the values to that space once in the function. rax, rcx, rdx, r8-r11 are all volatile registers, so if you are going to be calling another function from within your function or using those registers for other purposes, you should back up the parameter values in that space.
For the other registers that are nonvolatile (rbx, rdi, rsi, r12-r15), it's important to save those values using the "uses" operator if you are using them ANYWHERE in your function. x64 seems way less forgiving in that aspect than 32bit did.

    _('procf _main uses rbx rdi, parm1, parm2, parm3, parm4')
    _(' mov [parm1], rcx')
    _(' mov [parm2], rdx')
    _(' mov [parm3], r8')
    _(' mov [parm4], r9')

Two items tripped me up for a little bit. One is calling a function if the pointer is in rcx. I was passing a callback pointer as my first parameter when I discovered this. It's kinda obvious now that the problem is tied in with rcx being a parameter register, but I was not thinking about it at the time. The other issue is trying to preserve any kind of volatile register across a function call via push/pop. For whatever reason any kind of push R#X; call func; pop R#X fails with some kind of stack alignment issue. You can get around this by either backing up the values in a local variable or transferring whatever volatile register to r12-r15 if not being used. If you know what the problem is there please let me know.

    ;use of push/pop is ok
    _(' push rdx')
    _(' pop rcx')
    _(' fastcall rax') ; call some function
    _(' push rax')
    _(' pop rcx')

    ;it's calling a function whenever the stack is not balanced that fails
    _(' push rdx') ; preserve rdx
    _(' fastcall rax') ; call some function
    _(' pop rdx') ; restore

First post has been updated. Reporting issues, questions, comments. Believe it or not these are all allowed.

Update - 10/26/2019

I got the stack alignment issues figured out. The key is keeping the stack 16 byte aligned, and it's very easy. Just always push the values you want to preserve in multiples of 2. In the code above I only push rdx (1 register), so the alignment is now on 8. We just need to push the value once more, or you could also add a 'sub rsp, 8' statement, but that is a little more code.
Here we are pushing 2 registers so this is aligned ok:

    _(' push rcx') ; preserve rcx
    _(' push rdx') ; preserve rdx
    _(' fastcall rax') ; call some function
    _(' pop rdx') ; restore
    _(' pop rcx') ; restore

Here we would have an odd number of 3, so I'll push the last one twice:

    _(' push rax') ; preserve rax
    _(' push rcx') ; preserve rcx
    _(' push rdx') ; preserve rdx
    _(' push rdx') ; preserve rdx ; <===========Extra push
    _(' fastcall rax') ; call some function
    _(' pop rdx') ; restore ; <===========Extra pop
    _(' pop rdx') ; restore
    _(' pop rcx') ; restore
    _(' pop rax') ; restore

With that figured out, the _fasmg_Debug function now properly preserves all the volatile registers so you can see what's in them without wiping other registers.

First post updated.

Edited October 27, 2019 by Beege
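The push-in-pairs rule follows from each 64bit push moving rsp by 8 bytes. A small sketch of that bookkeeping in plain AutoIt, assuming rsp is 16 byte aligned before the pushes start; the helper name _StackPadding is made up for illustration and nothing here touches a real stack.

```autoit
; Hypothetical helper: extra bytes needed to bring rsp back to a 16 byte
; boundary after $iPushes 8-byte pushes, assuming it was aligned before them.
Func _StackPadding($iPushes)
    Return Mod($iPushes * 8, 16)
EndFunc

For $i = 1 To 4
    ; 1 -> 8, 2 -> 0, 3 -> 8, 4 -> 0
    ConsoleWrite($i & ' push(es) -> ' & _StackPadding($i) & ' byte(s) of padding' & @CRLF)
Next
```

A result of 8 corresponds to the cases above that need either one extra push or a 'sub rsp, 8' before the call.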
UEZ Posted October 15, 2019

Thanks for the update @Beege. Meanwhile I've solved my x64 addressing issue and the code seems to run properly for 15/16-bit images.
Beege Posted October 15, 2019

5 hours ago, UEZ said: Thanks for the update @Beege. Meanwhile I've solved my x64 addressing issue and the code seems to run properly for 15/16-bit images.

You got it. Good job on your x64 code! I wish it was as easy as swapping R's with E's... but that is certainly not the case!