

Posted (edited)

HiHo Forum,

in my program SMF - Search my Files, I use a special method to calculate fake (and fast) MD5 hashes to identify duplicate files. With the standard settings, SMF reads the first 8 KB, 8 KB from the middle and the last 8 KB of a file and calculates an MD5 hash over that data. Now I've noticed that the larger the files, the slower the calculation (makes sense :D). I assume that's because the SetFilePointer() operation also takes its time on large files. Throughput drops from roughly 300 files/sec to 30 files/sec (depending, of course, on the file sizes and on your machine's power).
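For reference, the synchronous "first / middle / last 8 KB" scheme described above could be sketched with native AutoIt file functions and the Crypt UDF like this. It's only an illustration under my own assumptions, not SMF's actual code: _QuickFileHash is a hypothetical name, the middle block is taken at (size - 8 KB) / 2, and files of 24 KB or less are simply hashed completely.

```autoit
#include <Crypt.au3>
#include <FileConstants.au3>

; Hypothetical helper, NOT SMF's actual code: MD5 over the first, middle and
; last $iChunk bytes of a file, read synchronously with native AutoIt functions.
Func _QuickFileHash($sFile, $iChunk = 8192)
    Local $iSize = FileGetSize($sFile)
    If @error Or $iSize = 0 Then Return SetError(1, 0, "")
    Local $hFile = FileOpen($sFile, $FO_BINARY) ; read mode, raw bytes
    If $hFile = -1 Then Return SetError(2, 0, "")
    _Crypt_Startup() ; keep the crypto context alive across the chained calls
    Local $vHash, $hHash
    If $iSize <= 3 * $iChunk Then
        ; Small file: the three blocks would overlap, so hash the whole file
        $vHash = _Crypt_HashData(FileRead($hFile), $CALG_MD5)
    Else
        ; $bFinal = False returns the open hash object so more data can be fed in
        $hHash = _Crypt_HashData(FileRead($hFile, $iChunk), $CALG_MD5, False)
        FileSetPos($hFile, Int(($iSize - $iChunk) / 2), $FILE_BEGIN) ; middle block
        $hHash = _Crypt_HashData(FileRead($hFile, $iChunk), $CALG_MD5, False, $hHash)
        FileSetPos($hFile, $iSize - $iChunk, $FILE_BEGIN) ; last block
        $vHash = _Crypt_HashData(FileRead($hFile, $iChunk), $CALG_MD5, True, $hHash)
    EndIf
    Local $iErr = @error
    _Crypt_Shutdown()
    FileClose($hFile)
    Return SetError($iErr, 0, $vHash)
EndFunc   ;==>_QuickFileHash
```

Feeding the three blocks into one hash object avoids having to concatenate binary variants first; the result is a 16-byte binary MD5.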

Now I thought about a way to further improve search & hashing speed and stumbled over the possibility of "Overlapped" (Microsoft's term for asynchronous) file access. You request several portions of a file and wait for the results to show up: the ReadFile() call itself returns right away (usually reporting ERROR_IO_PENDING) and does not wait for the buffer to be filled, as opposed to the standard behavior, where the function only returns once the operation has finished.
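A minimal sketch of a single overlapped read, assuming the WinAPIEx UDF (for _WinAPI_CreateFileEx) and the standard $tagOVERLAPPED structure; _ReadOverlapped is a hypothetical helper name. What it tries to show: the read position is passed inside the OVERLAPPED structure, a FALSE return together with ERROR_IO_PENDING (997) only means "queued", and _WinAPI_GetOverlappedResult() with $bWait = True blocks until the data is there. It deliberately leaves out FILE_FLAG_NO_BUFFERING, so offset and length need no sector alignment.

```autoit
#include <WinAPIEx.au3>
#include <StructureConstants.au3>

; Hypothetical helper: read $iCount bytes at $iOffset with one overlapped ReadFile.
; No FILE_FLAG_NO_BUFFERING here, so offset and size need no sector alignment.
Func _ReadOverlapped($sFile, $iOffset, $iCount)
    ; 3 = OPEN_EXISTING, 0x80000000 = GENERIC_READ, 7 = share read/write/delete,
    ; 0x40000080 = FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL
    Local $hFile = _WinAPI_CreateFileEx($sFile, 3, 0x80000000, 7, 0x40000080)
    If Not $hFile Then Return SetError(1, 0, 0)
    Local $tBuffer = DllStructCreate("byte[" & $iCount & "]")
    Local $tOverlapped = DllStructCreate($tagOVERLAPPED)
    ; The read position travels in the OVERLAPPED structure, not in the file pointer
    DllStructSetData($tOverlapped, "Offset", _WinAPI_LoDWord($iOffset))
    DllStructSetData($tOverlapped, "OffsetHigh", _WinAPI_HiDWord($iOffset))
    Local $iRead = 0
    If Not _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $iCount, $iRead, DllStructGetPtr($tOverlapped)) Then
        ; FALSE together with ERROR_IO_PENDING (997) means "queued", not "failed"
        If _WinAPI_GetLastError() <> 997 Then
            _WinAPI_CloseHandle($hFile)
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    ; ... other work could be done here while the read is in flight ...
    ; $bWait = True: block until this request has completed
    Local $bOK = _WinAPI_GetOverlappedResult($hFile, DllStructGetPtr($tOverlapped), $iRead, True)
    _WinAPI_CloseHandle($hFile)
    If Not $bOK Then Return SetError(3, 0, 0)
    Return BinaryMid(DllStructGetData($tBuffer, 1), 1, $iRead)
EndFunc   ;==>_ReadOverlapped
```

Usage: `_ReadOverlapped(@AutoItExe, 0, 2)` should return the two bytes "MZ" of the executable header.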

Currently, SMF requests the 3 blocks of data and calculates the hash once they've been read. What I'll try out is to request all 3 blocks in overlapped mode and start hashing each block independently as soon as it arrives (maybe even in a separate thread, as Ward has pointed out). My assumption is that this might improve performance for large files (> 10 MB? I'll just have to give it a shot); for smaller files I think the standard synchronous read should be superior.
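The plan (request all 3 blocks, then hash each one as soon as it arrives) might look roughly like this. It's a sketch under my own assumptions, not tested SMF code: _HashBlocksAsCompleted is a hypothetical name, each block gets its own MD5 instead of one combined hash, and every request gets its own manual-reset event so _WinAPI_WaitForMultipleObjects() can report which read finished first. MSDN discourages relying on the file handle itself (hEvent = 0) when several requests are outstanding on the same handle, which is what the WIP example in this post does.

```autoit
#include <WinAPIEx.au3>
#include <StructureConstants.au3>
#include <Crypt.au3>

; Hypothetical helper, not SMF code: queue three overlapped reads (start / middle /
; end) and MD5-hash each block as soon as its read completes, in completion order.
; Assumption: no FILE_FLAG_NO_BUFFERING, so offsets need no sector alignment.
Func _HashBlocksAsCompleted($sFile, $iChunk = 8192)
    Local $iSize = FileGetSize($sFile)
    If @error Or $iSize < 3 * $iChunk Then Return SetError(1, 0, 0)
    ; 3 = OPEN_EXISTING, 0x80000000 = GENERIC_READ, 7 = share read/write/delete,
    ; 0x40000080 = FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL
    Local $hFile = _WinAPI_CreateFileEx($sFile, 3, 0x80000000, 7, 0x40000080)
    If Not $hFile Then Return SetError(2, 0, 0)
    Local $aOffset[3] = [0, Int(($iSize - $iChunk) / 2), $iSize - $iChunk]
    Local $aBuffer[3], $aOverlapped[3], $aHash[3], $iRead = 0, $iPending = 0
    Local $tEvents = DllStructCreate("ptr[3]") ; handle array for WaitForMultipleObjects
    _Crypt_Startup()
    For $i = 0 To 2
        $aBuffer[$i] = DllStructCreate("byte[" & $iChunk & "]")
        $aOverlapped[$i] = DllStructCreate($tagOVERLAPPED)
        ; Manual-reset event, initially non-signaled, one per request
        DllStructSetData($tEvents, 1, _WinAPI_CreateEvent(0, True, False), $i + 1)
        DllStructSetData($aOverlapped[$i], "hEvent", DllStructGetData($tEvents, 1, $i + 1))
        DllStructSetData($aOverlapped[$i], "Offset", _WinAPI_LoDWord($aOffset[$i]))
        DllStructSetData($aOverlapped[$i], "OffsetHigh", _WinAPI_HiDWord($aOffset[$i]))
        ; TRUE (already done) or FALSE + ERROR_IO_PENDING (997) both mean the event
        ; will be signaled; any other error means the request was never queued
        If _WinAPI_ReadFile($hFile, DllStructGetPtr($aBuffer[$i]), $iChunk, $iRead, DllStructGetPtr($aOverlapped[$i])) _
                Or _WinAPI_GetLastError() = 997 Then $iPending += 1
    Next
    Local $iIdx
    While $iPending > 0
        ; Index of a signaled event (WAIT_OBJECT_0 + n); -1 = WAIT_FAILED
        $iIdx = _WinAPI_WaitForMultipleObjects(3, DllStructGetPtr($tEvents), False, -1)
        If $iIdx < 0 Or $iIdx > 2 Then ExitLoop
        ; Reset the event so the next wait does not report this request again
        DllCall("kernel32.dll", "bool", "ResetEvent", "handle", DllStructGetData($tEvents, 1, $iIdx + 1))
        If _WinAPI_GetOverlappedResult($hFile, DllStructGetPtr($aOverlapped[$iIdx]), $iRead, False) Then
            $aHash[$iIdx] = _Crypt_HashData(BinaryMid(DllStructGetData($aBuffer[$iIdx], 1), 1, $iRead), $CALG_MD5)
        EndIf
        $iPending -= 1
    WEnd
    If $iPending > 0 Then ; only after WAIT_FAILED: cancel the rest before the buffers go away
        DllCall("kernel32.dll", "bool", "CancelIoEx", "handle", $hFile, "ptr", 0)
        For $i = 0 To 2
            _WinAPI_GetOverlappedResult($hFile, DllStructGetPtr($aOverlapped[$i]), $iRead, True)
        Next
    EndIf
    _Crypt_Shutdown()
    For $i = 1 To 3
        _WinAPI_CloseHandle(DllStructGetData($tEvents, 1, $i))
    Next
    _WinAPI_CloseHandle($hFile)
    Return $aHash ; [0] = start, [1] = middle, [2] = end; "" if that read failed
EndFunc   ;==>_HashBlocksAsCompleted
```

Whether this actually beats the synchronous version is exactly what needs measuring; the sketch only shows the mechanics of handling completions out of order.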

As I haven't found any example of asynchronous file access on the forum, I thought I'd just post the WIP code for those interested to take a look and help me improve it :rip:... It's just a crude and raw example, but at least it works (on my machines, anyhow).

Yashied's most excellent WinAPIEx UDF is required for the example to work.

#region ;**** Directives created by AutoIt3Wrapper_GUI ****
#AutoIt3Wrapper_UseUpx=n
#AutoIt3Wrapper_UseX64=n
#AutoIt3Wrapper_Res_requestedExecutionLevel=asInvoker
#endregion ;**** Directives created by AutoIt3Wrapper_GUI ****
; http://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
; http://msdn.microsoft.com/en-us/library/windows/desktop/aa365683(v=vs.85).aspx
; http://support.microsoft.com/kb/156932
#include <StructureConstants.au3>
#include <WinAPIEx.au3>
#include <Memory.au3>
#include <array.au3>

Global $nBytes, $hFile
Global Const $ERROR_IO_INCOMPLETE = 996 ; Overlapped I/O event is not in a signaled state
Global Const $ERROR_IO_PENDING = 997 ; Overlapped I/O operation is in progress
$sFile = FileOpenDialog("Select a large file to open for overlapped (asynchronous) reading...", StringLeft(@WindowsDir, 3), "All (*.*)", 3)
If @error Then Exit
$sFile = StringReplace($sFile, "|", @CRLF)
If FileGetSize($sFile) < 1024 * 100 Then
    MsgBox(0, "", "Larger than 100kb would make sense...")
    Exit
EndIf
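$aDrive = _WinAPI_GetDriveNumber(StringLeft($sFile, 2))
If @error Then
    MsgBox(0, "", "Could not determine the drive number of " & $sFile)
    Exit
EndIf
$aData = _WinAPI_GetDriveGeometryEx_RO($aDrive[1])
If @error Then
    ; Report @error of the geometry call before anything else overwrites it
    ConsoleWrite("! _WinAPI_GetDriveGeometryEx_RO failed (@error = " & @error & "): " & _WinAPI_GetLastErrorMessage() & @CRLF)
    Exit
EndIf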
; 'Bytes per Sector: ' & $aData[4]
; FILE_FLAG_NO_BUFFERING
; File access sizes, including the optional file offset in the OVERLAPPED structure, if specified,
; must be for a number of bytes that is an integer multiple of the volume sector size
$iBytesToRead = 16 * $aData[4] ; 8,192 bytes with 512 bytes per sector
; Because buffer addresses for read and write operations must be sector-aligned, the application must have direct control of how these buffers are allocated.
; One way to sector-align buffers is to use the VirtualAlloc function to allocate the buffers
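$pBuffer_Mem = _MemVirtualAlloc(0, $iBytesToRead, $MEM_COMMIT, $PAGE_READWRITE)
; Note: $tBuffer below is created by DllStructCreate and does not use $pBuffer_Mem,
; so its address is not guaranteed to be sector-aligned as required above
$tBuffer = DllStructCreate("byte[" & $iBytesToRead & "];byte[" & $iBytesToRead & "];byte[" & $iBytesToRead & "]")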
; $GENERIC_READ = 0x80000000
; $FILE_ATTRIBUTE_NORMAL = 0x00000080
; $FILE_FLAG_OVERLAPPED = 0x40000000
; $FILE_FLAG_NO_BUFFERING = 0x20000000
Global Const $FILE_FLAG_OVERLAPPED = 0x40000000
Global Const $FILE_FLAG_NO_BUFFERING = 0x20000000
$iTimer = TimerInit()
$hFile = _WinAPI_CreateFileEx($sFile, 3, $GENERIC_READ, 7, BitOR($FILE_ATTRIBUTE_NORMAL, $FILE_FLAG_OVERLAPPED, $FILE_FLAG_NO_BUFFERING))
$iFileGetSize = _WinAPI_GetFileSizeEx($hFile)
;ConsoleWrite(@CRLF & "+ Filesize " & $iFileGetSize & @CRLF & @CRLF)
$iTimer_0 = TimerDiff($iTimer)
; Global Const $tagOVERLAPPED = "int Internal;int InternalHigh;int Offset;int OffsetHigh;int hEvent"
$tOverlapped1 = DllStructCreate($tagOVERLAPPED)
$pOverlapped1 = DllStructGetPtr($tOverlapped1)
_WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 1), $iBytesToRead, $nBytes, $pOverlapped1)
$iTimer_1 = TimerDiff($iTimer)
$sTimer_1 = _WinAPI_GetLastError()
$tOverlapped2 = DllStructCreate($tagOVERLAPPED)
$pOverlapped2 = DllStructGetPtr($tOverlapped2)
$iOffset = Int(($iFileGetSize / 2) - ($iBytesToRead / 2 + 1))
$iOffset = (Floor($iOffset / $aData[4])) * $aData[4]
DllStructSetData($tOverlapped2, "Offset", _WinAPI_LoDWord($iOffset)) ; Setting "Filepointer" to the middle of the file (SetFilePointer is not valid for overlapped operations)
DllStructSetData($tOverlapped2, "OffsetHigh", _WinAPI_HiDWord($iOffset))
_WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 2), $iBytesToRead, $nBytes, $pOverlapped2)
$iTimer_2 = TimerDiff($iTimer)
$sTimer_2 = _WinAPI_GetLastError()
$tOverlapped3 = DllStructCreate($tagOVERLAPPED)
$pOverlapped3 = DllStructGetPtr($tOverlapped3)
$iOffset = $iFileGetSize - $iBytesToRead
$iOffset = (Floor($iOffset / $aData[4])) * $aData[4]
DllStructSetData($tOverlapped3, "Offset", _WinAPI_LoDWord($iOffset)) ; Setting "Filepointer" to the end of the file (SetFilePointer is not valid for overlapped operations)
DllStructSetData($tOverlapped3, "OffsetHigh", _WinAPI_HiDWord($iOffset))
_WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 3), $iBytesToRead, $nBytes, $pOverlapped3)
$iTimer_3 = TimerDiff($iTimer)
$sTimer_3 = _WinAPI_GetLastError()
$sRes1 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped1, $nBytes)
$sRes1 = $sRes1 & @TAB & $nBytes
$iTimer_4 = TimerDiff($iTimer)
$sTimer_4 = _WinAPI_GetLastError()
$sRes2 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped2, $nBytes)
$sRes2 = $sRes2 & @TAB & $nBytes
$iTimer_5 = TimerDiff($iTimer)
$sTimer_5 = _WinAPI_GetLastError()
$sRes3 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped3, $nBytes)
$sRes3 = $sRes3 & @TAB & $nBytes
$iTimer_6 = TimerDiff($iTimer)
$sTimer_6 = _WinAPI_GetLastError()
_WinAPI_CloseHandle($hFile)
$iTimer_7 = TimerDiff($iTimer)
$sTimer_7 = _WinAPI_GetLastError()
$sRes4 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped1, $nBytes)
$sRes4 = $sRes4 & @TAB & $nBytes
$sRes5 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped2, $nBytes)
$sRes5 = $sRes5 & @TAB & $nBytes
$sRes6 = _WinAPI_GetOverlappedResult($hFile, $pOverlapped3, $nBytes)
$sRes6 = $sRes6 & @TAB & $nBytes

_MemVirtualFree($pBuffer_Mem, 0, $MEM_RELEASE) ; MEM_RELEASE (size 0) frees the whole allocation, MEM_DECOMMIT would leave it reserved
$iTimer = TimerInit()
$t_ReadFile_Standard = _ReadFile_Standard($sFile, $iFileGetSize)
ConsoleWrite("_WinAPI_CloseHandle - Before " & $sRes1 & @TAB & $sRes2 & @TAB & $sRes3 & @CRLF)
ConsoleWrite("_WinAPI_CloseHandle - After " & $sRes4 & @TAB & $sRes5 & @TAB & $sRes6 & @CRLF & @CRLF)
ConsoleWrite($iTimer_0 & @CRLF & $iTimer_1 & @CRLF & $iTimer_2 & @CRLF & $iTimer_3 & @CRLF & $iTimer_4 & @CRLF & $iTimer_5 & @CRLF & $iTimer_6 & @CRLF & $iTimer_7 & @CRLF & @CRLF & TimerDiff($iTimer) & @CRLF & @CRLF)
ConsoleWrite($sTimer_1 & @CRLF & $sTimer_2 & @CRLF & $sTimer_3 & @CRLF & $sTimer_4 & @CRLF & $sTimer_5 & @CRLF & $sTimer_6 & @CRLF & $sTimer_7 & @CRLF)
MsgBox(0, "", StringLeft(DllStructGetData($tBuffer, 1), 5) & StringRight(DllStructGetData($tBuffer, 1), 5) & @CRLF _
         & StringLeft(DllStructGetData($tBuffer, 2), 5) & StringRight(DllStructGetData($tBuffer, 2), 5) & @CRLF _
         & StringLeft(DllStructGetData($tBuffer, 3), 5) & StringRight(DllStructGetData($tBuffer, 3), 5) & @CRLF _
         & @CRLF & @CRLF _
         & StringLeft(DllStructGetData($t_ReadFile_Standard, 1), 5) & StringRight(DllStructGetData($t_ReadFile_Standard, 1), 5) & @CRLF _
         & StringLeft(DllStructGetData($t_ReadFile_Standard, 2), 5) & StringRight(DllStructGetData($t_ReadFile_Standard, 2), 5) & @CRLF _
         & StringLeft(DllStructGetData($t_ReadFile_Standard, 3), 5) & StringRight(DllStructGetData($t_ReadFile_Standard, 3), 5))
Func _ReadFile_Standard($sFile, $iFileGetSize, $iFlag = 0)
    ; Local $hFile = _WinAPI_CreateFile($Checksum_Filename, 2, 2, 7), $Checksum_Result, $nBytes
    ; FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000
    Local $hFile = _WinAPI_CreateFileEx($sFile, 3, $GENERIC_READ, 7, $iFlag), $nBytes
    If $hFile = 0 Then Return SetError(4) ;"File was locked and could not be analyzed..."
    Local $tBuffer = DllStructCreate("byte[" & $iBytesToRead & "];byte[" & $iBytesToRead & "];byte[" & $iBytesToRead & "]")
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 1), $iBytesToRead, $nBytes)
    $iOffset = Int(($iFileGetSize / 2) - ($iBytesToRead / 2 + 1))
    $iOffset = (Floor($iOffset / $aData[4])) * $aData[4]
    _WinAPI_SetFilePointerEx($hFile, $iOffset)
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 2), $iBytesToRead, $nBytes)
    $iOffset = $iFileGetSize - $iBytesToRead
    $iOffset = (Floor($iOffset / $aData[4])) * $aData[4]
    _WinAPI_SetFilePointerEx($hFile, $iOffset) ; $iBytesToRead/2 +1
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer, 3), $iBytesToRead, $nBytes)
    _WinAPI_CloseHandle($hFile)
    Return $tBuffer
EndFunc   ;==>_ReadFile_Standard

Func _WinAPI_GetDriveGeometryEx_RO($iDrive)

    Local $hFile = _WinAPI_CreateFileEx('\\.\PhysicalDrive' & $iDrive, 3, 0, 0x01) ; access 0: querying the geometry needs no admin rights

    If Not $hFile Then
        Return SetError(1, 0, 0)
    EndIf

    Local $tDGEX = DllStructCreate('int64;dword;dword;dword;dword;int64')
    Local $Ret = DllCall('kernel32.dll', 'int', 'DeviceIoControl', 'ptr', $hFile, 'dword', 0x000700A0, 'ptr', 0, 'dword', 0, 'ptr', DllStructGetPtr($tDGEX), 'dword', DllStructGetSize($tDGEX), 'dword*', 0, 'ptr', 0)

    If (@error) Or (Not $Ret[0]) Then
        $Ret = 0
    EndIf
    _WinAPI_CloseHandle($hFile)
    If Not IsArray($Ret) Then
        Return SetError(2, 0, 0)
    EndIf

    Local $Result[6]

    For $i = 0 To 5
        $Result[$i] = DllStructGetData($tDGEX, $i + 1)
    Next
    Return $Result
EndFunc   ;==>_WinAPI_GetDriveGeometryEx_RO
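The FILE_FLAG_NO_BUFFERING comments in the example above require sector-aligned offsets, sizes and buffer addresses. One possible way to get aligned buffers is to lay the DllStruct over VirtualAlloc'd memory using DllStructCreate's pointer parameter. This is a sketch with hypothetical helper names (_AlignDown, _AlignUp, _CreateAlignedBuffers), assuming the sector size is no larger than one memory page (4 KB).

```autoit
#include <Memory.au3>

; Hypothetical helpers. FILE_FLAG_NO_BUFFERING wants offsets, sizes and buffer
; addresses that are multiples of the sector size ($iSector, in bytes).
Func _AlignDown($iValue, $iSector)
    Return $iValue - Mod($iValue, $iSector)
EndFunc   ;==>_AlignDown

Func _AlignUp($iValue, $iSector)
    Return _AlignDown($iValue + $iSector - 1, $iSector)
EndFunc   ;==>_AlignUp

; VirtualAlloc returns page-aligned memory (assumption: sector size <= page size),
; so three $iChunk buffers laid over it stay aligned as long as $iChunk is a
; multiple of the sector size. Release with _MemVirtualFree($pMem, 0, $MEM_RELEASE),
; and only after every read into these buffers has completed.
Func _CreateAlignedBuffers($iChunk, ByRef $pMem)
    $pMem = _MemVirtualAlloc(0, 3 * $iChunk, $MEM_COMMIT, $PAGE_READWRITE)
    If Not $pMem Then Return SetError(1, 0, 0)
    Return DllStructCreate("byte[" & $iChunk & "];byte[" & $iChunk & "];byte[" & $iChunk & "]", $pMem)
EndFunc   ;==>_CreateAlignedBuffers
```

For example, `_AlignDown(8193, 512)` gives 8192 and `_AlignUp(8193, 512)` gives 8704. The memory must not be freed while overlapped reads into it are still pending.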

Edit:

Just wanted to point out that this technique only makes sense in very special cases. The default synchronous file read goes through the Windows cache manager, is very likely faster than this unbuffered asynchronous method (and certainly easier to handle), and should be the choice for 99.9% of your needs. For further info, take a look at these pages:

MSDN - ReadFile function

MSDN - Synchronous and Asynchronous I/O

MSKB - Asynchronous Disk I/O Appears as Synchronous

One observation I've made: the _WinAPI_CloseHandle($hFile) call seems to block until the pending operations have finished. Before that, the progress of the individual requests can be monitored with the _WinAPI_GetOverlappedResult() function.
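Related to that CloseHandle behavior: per MSDN, the read buffer and the OVERLAPPED structure must stay valid until a request has completed, so freeing them while reads are pending is risky. A sketch of cancelling outstanding requests and waiting for them before closing the handle, assuming Windows Vista or later for CancelIoEx (called directly via DllCall, since I'm not aware of a UDF wrapper for it); _CancelAndClose is a hypothetical name.

```autoit
#include <WinAPIEx.au3>
#include <StructureConstants.au3>

; Hypothetical helper: cancel all outstanding overlapped requests on $hFile,
; wait until each one has completed or been aborted, then close the handle.
; $aOverlappedPtr holds the OVERLAPPED pointers that were passed to ReadFile;
; the matching buffers must stay allocated until this function returns.
Func _CancelAndClose($hFile, $aOverlappedPtr)
    ; CancelIoEx (Vista+): lpOverlapped = NULL cancels all I/O on this handle.
    ; It fails with ERROR_NOT_FOUND (1168) if nothing is pending, which is harmless here.
    DllCall("kernel32.dll", "bool", "CancelIoEx", "handle", $hFile, "ptr", 0)
    Local $iBytes = 0
    For $i = 0 To UBound($aOverlappedPtr) - 1
        ; Returns FALSE with ERROR_OPERATION_ABORTED (995) for cancelled requests.
        ; Giving each OVERLAPPED its own hEvent keeps this wait unambiguous.
        _WinAPI_GetOverlappedResult($hFile, $aOverlappedPtr[$i], $iBytes, True)
    Next
    Return _WinAPI_CloseHandle($hFile)
EndFunc   ;==>_CancelAndClose
```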

Best Regards

Edited by KaFu
Posted

Very interesting KaFu. Thanks :D


Posted (edited)

Just realized that the example fails on Win7. The reason is that the _WinAPI_GetDriveGeometryEx() function requires admin rights to work. I've changed the requested access rights for _WinAPI_CreateFileEx() in that function from 0x80000000 ($GENERIC_READ) to 0, and it works fine :D. This is also documented on MSDN ("Note The dwDesiredAccess parameter can be zero...").
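A different route around the admin-rights problem (a swapped-in API, not what the post does): GetDiskFreeSpaceW takes a root path like "C:\" instead of a \\.\PhysicalDrive handle and also reports bytes per sector. As far as I know this is the logical sector size, i.e. the same value DISK_GEOMETRY's BytesPerSector returns, but treat that as an assumption worth verifying on 512e/4Kn drives. _GetBytesPerSector is a hypothetical name, and UNC paths aren't handled.

```autoit
; Hypothetical helper (swapped-in API, not the post's method): bytes per sector
; of the volume holding $sFile via GetDiskFreeSpaceW, which needs no admin rights.
; Assumes a drive-letter path such as "C:\..."; UNC paths are not handled.
Func _GetBytesPerSector($sFile)
    Local $sRoot = StringLeft($sFile, 3) ; e.g. "C:\"
    Local $aRet = DllCall("kernel32.dll", "bool", "GetDiskFreeSpaceW", "wstr", $sRoot, _
            "dword*", 0, "dword*", 0, "dword*", 0, "dword*", 0)
    If @error Or Not $aRet[0] Then Return SetError(1, 0, 0)
    Return $aRet[3] ; [2] = sectors per cluster, [3] = bytes per sector
EndFunc   ;==>_GetBytesPerSector
```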

Edited by KaFu
