[SOLVED] FlateDecode , Zlib , PDF

DeltaRocked · July 25, 2011

Hello ,

I have been trying to parse a PDF and decode the JavaScript embedded in it. The problem I am facing is decoding the FlateDecode objects.

I have been using zlib_udf.au3 by w00ter and also the Zlib function provided by Ward, without any success.

With w00ter's UDF I am getting -3 as the error, i.e. $Z_DATA_ERROR.
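For reference, the numeric values are zlib's standard return codes. A small sketch of a lookup helper for log output; the name _ZlibErrorName is made up for illustration and is not part of either UDF:

```autoit
; Hypothetical helper: translate a zlib return code into its constant name.
Func _ZlibErrorName($iCode)
    Switch $iCode
        Case 0
            Return "Z_OK"
        Case 1
            Return "Z_STREAM_END"
        Case 2
            Return "Z_NEED_DICT"
        Case -1
            Return "Z_ERRNO"
        Case -2
            Return "Z_STREAM_ERROR"
        Case -3
            Return "Z_DATA_ERROR" ; input is corrupt or not zlib data
        Case -4
            Return "Z_MEM_ERROR"
        Case -5
            Return "Z_BUF_ERROR" ; output buffer too small
        Case -6
            Return "Z_VERSION_ERROR"
        Case Else
            Return "unknown (" & $iCode & ")"
    EndSwitch
EndFunc   ;==>_ZlibErrorName

ConsoleWrite(_ZlibErrorName(-3) & @CRLF)
```

Note that the -7 returned by the modified _Zlib_Uncompress in this thread is not a zlib code; it only signals that the DllCall itself failed.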

Any input?

regards

Deltarocked

The SOLVED code. Huge thanks to ProgAndy and Ward.

Post no. 7 by ProgAndy: use ProgAndy's modifications if you are using zlib1.dll.

Code which needs to be modified in zlib_udf.au3:

; Decompresses data. $UncompressedSize is the size of the output buffer.
; On return, $CompressedSize (ByRef) holds the actual size of the decompressed data.
Func _Zlib_Uncompress($CompressedPtr, ByRef $CompressedSize, $UncompressedPtr, $UncompressedSize)
    ; modified by ProgAndy
    Local $call = DllCall($Zlib_Dll, "int:cdecl", "uncompress", "ptr", $UncompressedPtr, "long*", $UncompressedSize, "ptr", $CompressedPtr, "long", $CompressedSize)
    If @error Then Return SetError(1,0,-7) ; the DllCall itself failed
    $CompressedSize = $call[2] ; destLen as updated by zlib
    Return $call[0] ; zlib return code, 0 = Z_OK
EndFunc   ;==>_Zlib_Uncompress

Func _ZLib_UncompressBinary($bBinary, $iLength = 0)
    ; ProgAndy
    Local $i=1, $tBuf, $iSize, $iRes
    Local $tBin = DllStructCreate("byte[" & BinaryLen($bBinary) & "]")
    DllStructSetData($tBin, 1, $bBinary)
    If $iLength < 1 Then $iLength = DllStructGetSize($tBin) * 2
    $bBinary = 0
    Do
        $tBuf = DllStructCreate("byte[" & $iLength * $i & "]")
        $iSize = DllStructGetSize($tBin)
        $iRes = _Zlib_Uncompress(DllStructGetPtr($tBin), $iSize, DllStructGetPtr($tBuf), DllStructGetSize($tBuf))
        $i += 1
    Until $iRes <> -5
    If $iRes <> 0 Then Return SetError($iRes, 0, "")
    $tBin = 0
    Return DllStructGetData(DllStructCreate("byte[" & $iSize & "]", DllStructGetPtr($tBuf)), 1)
EndFunc

#include<string.au3>
#include <array.au3>
#include "zlib_udf.au3"
;~ #include "zlib.au3"

;~ IOS
;~ $file = 'C:\pdf\ios_poc\'
;~ $file &= 'iPad1,1_3.2.1.pdf'

;~ Contagio
;~ $file = 'C:\pdf\contagio\'
;~ $file &= 'invitation.pdf'
;~ $file &= 'RB.pdf'
;~ $file &= 'SB.pdf'

;~ INFECTED
;~ $file = 'C:\pdf\infected\'
;~ $file &= '116d92f036f68d325068f3c7bbf1d535.pdf'
;~ $file &=  '0_infect_invitation.pdf'

;~ POC
;~ $file = 'c:\pdf\poc\'
;~ $file &= 'eicar.pdf'
;~ $file &= 'goodness.pdf'
;~ $file &= 'hello-world-reverse-uri8.pdf'
;~ $file &= 'launch-action-cmd.pdf'
;~ $file &= 'testx.pdf'

;~ ADOBE INFECTED
;~ $file = 'C:\pdf\infected\adobe-0day\'
;~ $file &= '721601bdbec57cb103a9717eeef0bfca'

;~ Normal PDFs
$file = 'C:\pdf\'
;~ $file = 'a4_1008-Form23AC.PDF'
;~ $file &= 'a3_R-intro.pdf'
$file &= 'a2.pdf'
;~ $file &= 'ab.txt'

$start_pt = '(?i) obj'
$start_obj = '(?i)\d* \d* obj'
$end_pt = '(?i)endobj'
_Zlib_Startup()

_CountPDFObj($file, $start_pt, $end_pt)

;~ _Zlib_Shutdown()

Exit

Func _CountPDFObj($fullfilename, $start_pt, $end_pt)
    Local $strpos = 0, $length = 10, $count_loop, $ex_data, $Decompressed, $header, $binlen
    Local $start_ex_pt = '(?i)>>\s*stream' ; & '\r\n' ;at the end of the string. this will include @CR @LF in the search.
    Local $end_ex_pt = '(?i)endstream' ;'\r\n' & ; at the start of the string
    If Not FileExists($fullfilename) Then Return SetError(1, 0, 0)
    $sData = FileRead($fullfilename)
    $start_array = StringRegExp($sData, $start_pt, 3)
    $start_obj_array = StringRegExp($sData, $start_obj, 3)
    $end_array = StringRegExp($sData, $end_pt, 3)
    FileDelete('c:\pdf\test.log')
    FileWrite('c:\pdf\test.log', 'Analyzing ' & $fullfilename & @CRLF)

    If IsArray($start_array) And IsArray($end_array) Then
        If UBound($start_array) == UBound($end_array) Then
            $count_loop = 1
            While $count_loop <= UBound($start_array)
                $start_pos = StringInStr($sData, $start_array[$count_loop - 1], 2, $count_loop) + StringLen($start_array[$count_loop - 1])
                $end_pos = StringInStr($sData, $end_array[$count_loop - 1], 2, $count_loop)
                $ex_data = StringMid($sData, $start_pos, $end_pos - $start_pos)
                If StringInStr($ex_data, 'stream', 2) And StringInStr($ex_data, '/flatedecode', 2) And StringInStr($ex_data, '/predictor', 2) == 0 And StringInStr($ex_data, '/BBox', 2) == 0 And StringInStr($ex_data, '/ASCIIHexDecode', 2) == 0 Then
                    $start_extract_array = StringRegExp($ex_data, $start_ex_pt, 3)
                    $end_extract_array = StringRegExp($ex_data, $end_ex_pt, 3)
                    If IsArray($start_extract_array) And IsArray($end_extract_array) Then
                        $start_ex_pos = StringInStr($ex_data, $start_extract_array[0], 2, 1) + StringLen($start_extract_array[0])
                        $end_ex_pos = StringInStr($ex_data, $end_extract_array[0], 2, 1)
                        $ex_ex_data = StringStripWS(StringMid($ex_data, $start_ex_pos, $end_ex_pos - $start_ex_pos), 3)
                        $binlen = BinaryLen($ex_ex_data)
                        $header = StringStripWS(StringLeft($ex_data, $start_ex_pos), 7) ; used for writing logs and has got nothing to do with the exracted stream
                        $Decompressed = zlib($ex_ex_data, $binlen)
;~                      If StringInStr($Decompressed, '/javascript', 2) <> 0 Or StringInStr($Decompressed, 'else if', 2) <> 0 Then
;~                          MsgBox(0,'Info','JS Decrypted')
                        FileWrite('c:\pdf\test.log', $start_obj_array[$count_loop - 1] & @CRLF & $header & @CRLF & _
                                'BinaryLen of the extracted compressed stream = ' & $binlen & @CRLF & '-------------------------' & _
                                @CRLF & StringReplace($Decompressed, '>><<', '>>' & @CRLF & '<<') & @CRLF & '-------------------------' & @CRLF)
;~                      EndIf
                    EndIf
                EndIf
                $count_loop += 1
            WEnd
            Return UBound($start_array)
        Else
            Return SetError(1, 0, 0)
        EndIf
    Else
        Return SetError(2, 0, 0)
    EndIf
EndFunc   ;==>_CountPDFObj


Func zlib($ex_ex_data, $binlen) ; requires zlib1.dll, zlib_udf.au3 and modification by progandy
    Local $Decompressed = _Zlib_UncompressBinary($ex_ex_data, $binlen)
    ; _HexToString adds the '0x' prefix itself when it is missing
    Return _HexToString($Decompressed)
EndFunc   ;==>zlib


;~ Func zlib($ex_ex_data, $binlen) ; requires zlib.au3 by WARD
;~  Dim $Decompressed = _ZLIB_Uncompress($ex_ex_data)
;~  If StringLeft($Decompressed, 2) == '0x' Then
;~      $Decompressed = _HexToString($Decompressed)
;~      Return $Decompressed
;~  Else
;~      $Decompressed = _HexToString($Decompressed)
;~      Return $Decompressed
;~  EndIf
;~ EndFunc   ;==>zlib

PS: this is part of a tool I have been working on for PDF analysis. Extraction of the FlateDecode stream is complete, but I am unable to decode it.

Edited July 29, 2011 by deltarocked

ProgAndy · July 25, 2011

Hello,

If you want an answer, you should add an example script and a PDF file to test it.

DeltaRocked · July 25, 2011

Hi Progandy,

My problem is that these PDFs are infected, so would it be all right if I just upload the code and a link to the PDFs? I do not want to end up getting banned for uploading something malicious.

regards

deltarocked...

With this I am getting -5, i.e. Z_BUF_ERROR, or sometimes -3, i.e. Z_DATA_ERROR, and it decodes only very rarely. The string which it is able to decode is as follows and has been extracted from an infected PDF file.

uploaded the extracted PDF log ....

searching for 116d92f036f68d325068f3c7bbf1d535.pdf in google will provide you with the link.

Edited July 29, 2011 by deltarocked

ProgAndy · July 25, 2011

You must not use StringStripWS on the compressed data.
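A tiny sketch of why this matters: compressed data is arbitrary bytes, and any byte that happens to be a whitespace character at either end gets thrown away. The four bytes below are made up for the demonstration and are not a real zlib stream:

```autoit
; Four made-up bytes standing in for compressed data: TAB, "A", "B", SPACE
Local $bRaw = Binary("0x09414220")

; Flag 3 = strip leading and trailing whitespace, as in the script above
Local $sStripped = StringStripWS(BinaryToString($bRaw), 3)

ConsoleWrite("before: " & BinaryLen($bRaw) & " bytes" & @CRLF)
ConsoleWrite("after:  " & BinaryLen(StringToBinary($sStripped)) & " bytes" & @CRLF)
```

This should report 4 bytes before and 2 after. If that happens to a real stream, zlib sees truncated or altered input and is likely to fail with an error such as Z_DATA_ERROR.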

The format is stream@LF{{DATA}}@LFendstream. You need the unmodified data between stream@LF and @LFendstream and decompress it:

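; $sFile holds the PDF contents, $posOfStream is the position of "stream" & @LF
$posOfEnd = StringInStr($sFile, @LF & "endstream", 1, 1, $posOfStream)
$data = StringMid($sFile, $posOfStream + 7, $posOfEnd - ($posOfStream + 7)) ; third parameter is a count, not an end position
$data = StringToBinary($data, 1)
$UncompressedLength = BinaryLen($data) * 2 ; starting guess; I think the real value is /Length1 in the obj-descriptor ( /Length1 {{Length}} )
$decompress = _Zlib_UncompressBinary($data, $UncompressedLength)
MsgBox(0, "", BinaryToString($decompress))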

If /Length1 is not available or incorrect, try this: in the beginning, use BinaryLen($data)*2, and each time Z_BUF_ERROR occurs, double the uncompressed size.
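The doubling idea could look roughly like this. It is only a sketch: it assumes the four-parameter _Zlib_Uncompress with the ByRef size that is posted in this thread, and the helper name _InflateWithDoubling and the 64 MB cap are my own additions for illustration:

```autoit
#include "zlib_udf.au3" ; assumes the modified _Zlib_Uncompress from this thread

; Hypothetical helper: retry with a doubled output buffer while zlib reports Z_BUF_ERROR (-5).
; $iMaxSize is an arbitrary cap so a malformed or hostile stream cannot grow the buffer forever.
Func _InflateWithDoubling($bCompressed, $iMaxSize = 67108864)
    Local $iInLen = BinaryLen($bCompressed)
    If $iInLen = 0 Then Return SetError(1, 0, Binary(""))

    Local $tIn = DllStructCreate("byte[" & $iInLen & "]")
    DllStructSetData($tIn, 1, $bCompressed)

    Local $iOutLen = $iInLen * 2, $tOut, $iSize, $iRes
    Do
        $tOut = DllStructCreate("byte[" & $iOutLen & "]")
        $iSize = $iInLen ; ByRef: receives the decompressed size on success
        $iRes = _Zlib_Uncompress(DllStructGetPtr($tIn), $iSize, DllStructGetPtr($tOut), $iOutLen)
        $iOutLen *= 2 ; next attempt gets twice the room
    Until $iRes <> -5 Or $iOutLen > $iMaxSize

    If $iRes <> 0 Then Return SetError($iRes, 0, Binary(""))
    Return BinaryMid(DllStructGetData($tOut, 1), 1, $iSize)
EndFunc   ;==>_InflateWithDoubling
```

_Zlib_Startup() has to be called before using it. On failure @error carries the zlib return code, so the caller can tell -3 from -5.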

Edited July 25, 2011 by ProgAndy

DeltaRocked · July 25, 2011

Hi progandy,

thanks for the input.

rgds

delta rocked...

DeltaRocked · July 26, 2011

Hi,

Something is really wrong with this code. ab.txt contains the execution code. The problem is that obj 53 is decoded while all the others give a -5 error (Z_BUF_ERROR).

Decoded Objects with this code:

53 0 obj

Decoded Text within quotes:

"37 0 <</IDS 20 0 R/Javascript 50 0 R/URLS 21 0 R>>"

After decoding we learn that the next object to be read and executed is -- 50 0 obj

From this object we are pointed to

51 0 obj

This object tells us that Adobe Reader needs to execute a /JS entry, i.e. JavaScript, which is available in the object 52 0 obj
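Following such a chain by hand gets tedious, so here is a rough sketch of pulling the indirect references out of a decoded dictionary. The sample string is the decoded text quoted above; the regular expression is my own and only handles simple "/Name n g R" entries:

```autoit
Local $sDecoded = "37 0 <</IDS 20 0 R/Javascript 50 0 R/URLS 21 0 R>>"

; Flag 3 returns all matches in one flat array: key, object number, generation, key, ...
Local $aRefs = StringRegExp($sDecoded, "/(\w+)\s+(\d+)\s+(\d+)\s+R", 3)
If Not @error Then
    For $i = 0 To UBound($aRefs) - 1 Step 3
        ConsoleWrite($aRefs[$i] & " -> " & $aRefs[$i + 1] & " " & $aRefs[$i + 2] & " obj" & @CRLF)
    Next
EndIf
```

For the string above this should list IDS, Javascript and URLS together with the objects they point to (20 0, 50 0 and 21 0), which gives the next objects to look up.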

why only one object gets decoded ?

[EDIT UPDATE]

I have been analysing the PDF using Python tools, and even those are not able to decode some of the sections, so I might be wrong.

Edited July 29, 2011 by deltarocked

ProgAndy · July 26, 2011

$binlen must be the size of the uncompressed data. You use the size of the compressed data. That causes problems, since compression reduces the size and as a result, your buffer is too small.

Edit: I modified the functions to automatically adjust the size of the buffer if it is too small:

; Decompresses data. $UncompressedSize is the size of the output buffer.
; On return, $CompressedSize (ByRef) holds the actual size of the decompressed data.
Func _Zlib_Uncompress($CompressedPtr, ByRef $CompressedSize, $UncompressedPtr, $UncompressedSize)
    ; modified by ProgAndy
    Local $call = DllCall($Zlib_Dll, "int:cdecl", "uncompress", "ptr", $UncompressedPtr, "long*", $UncompressedSize, "ptr", $CompressedPtr, "long", $CompressedSize)
    If @error Then Return SetError(1,0,-7) ; the DllCall itself failed
    $CompressedSize = $call[2] ; destLen as updated by zlib
    Return $call[0] ; zlib return code, 0 = Z_OK
EndFunc   ;==>_Zlib_Uncompress

Func _ZLib_UncompressBinary($bBinary, $iLength = 0)
    ; ProgAndy
    Local $i=1, $tBuf, $iSize, $iRes
    Local $tBin = DllStructCreate("byte[" & BinaryLen($bBinary) & "]")
    DllStructSetData($tBin, 1, $bBinary)
    If $iLength < 1 Then $iLength = DllStructGetSize($tBin) * 2
    $bBinary = 0
    Do
        $tBuf = DllStructCreate("byte[" & $iLength * $i & "]")
        $iSize = DllStructGetSize($tBin)
        $iRes = _Zlib_Uncompress(DllStructGetPtr($tBin), $iSize, DllStructGetPtr($tBuf), DllStructGetSize($tBuf))
        $i += 1
    Until $iRes <> -5
    If $iRes <> 0 Then Return SetError($iRes, 0, "")
    $tBin = 0
    Return DllStructGetData(DllStructCreate("byte[" & $iSize & "]", DllStructGetPtr($tBuf)), 1)
EndFunc

Edited July 26, 2011 by ProgAndy

DeltaRocked · July 26, 2011

$binlen must be the size of the uncompressed data. You use the size of the compressed data. That causes problems, since compression reduces the size and as a result, your buffer is too small.

Hi ProgAndy,

It's done. I will be posting the complete code for analyzing PDFs very soon. Thanks for your patience.

Thanks a Million.

regards

DeltaRocked.

[UPDATE]

Ran into a small problem with /BBox ... anyway it is not a concern, as nothing can be hidden inside the TextInput box structure ... ROFL ... this took me by surprise ... I was wondering why I was getting 0 as the return value.

[UPDATE]

Tested both ZLIB UDFs

A: (monoceres - edited by ProgAndy) and

B: Ward

Same results ... wondering where I am going wrong. Will be posting about the Python result shortly...

Edited July 27, 2011 by deltarocked

ReFran · July 27, 2011

Mmmh,

I really wonder whether Zlib can be used to decompress and/or decrypt a PDF.

Is that really the right tool for that?

However, to compress/encrypt and decompress/decrypt a PDF you can use PDFTK.exe, a command-line tool for handling PDFs.

best regards, Reinhard

DeltaRocked · July 29, 2011

Mmmh,
I really wonder whether Zlib can be used to decompress and/or decrypt a PDF.
Is that really the right tool for that?
However, to compress/encrypt and decompress/decrypt a PDF you can use PDFTK.exe, a command-line tool for handling PDFs.
best regards, Reinhard

Hi,

Yes, it is used. Very soon I will be uploading the code for analysing the PDF. Once this is over I will be going ahead with the ASCII85 decode routine.

PDFTK is good, but it requires manual intervention. There are loads of Python scripts available, but this is AutoIt, and I need an analyzer.

Regards

Deltarocked

Edited July 29, 2011 by deltarocked

Avee · November 27, 2013

Hi,

I am trying to use this code to get some text out of a pdf.

Unfortunately, my script never executes the

    $end_array = StringRegExp($sData, $end_pt, 3)

correctly. It will stop evaluating the $sData input to the StringRegExp as soon as it hits a 00h value.

As soon as I point it to a spot further on in $sData as such:

    $end_array = StringRegExp( StringMid($sData,3936,1000), $end_pt, 3)

it will find the $end_pt expression.

So the expression is in the $sData string, but StringRegExp refuses to evaluate the whole string.

How can I fix this? Surely the OP must have hit some 00h values in his pdfs as well?

Edited November 27, 2013 by Avee

Avee · December 6, 2013

It is still a mystery to me how the OP got the regex to find the end of a stream. I completely rewrote the code. I have a slightly different requirement, I want to get text data out of a stream.

I tried to cut down on the regex functions. Since the regex selfdestructs on hitting a null byte in most streams and I move the pointer after each found stream, the function is not that expensive I think. On my four year old laptop it chews through 8 megabytes in 5-6 seconds. This code doesn't drill down into the tagging, it just searches for a typical start of a stream with a length declaration, and then trusts that length declaration to find the end of the stream. The regex repeats on the rest of the data that is past the found stream. This probably is a dirty way to do it, but it seems to work for me with a variety of pdfs.

; Find Streams within a pdf and return contents assuming text string
#include <Array.au3>
#include "zlib_udf.au3"

Dim $stream_Array[1]
Dim $streamtext
Dim $x
_Zlib_Startup("zlib1.dll")

$stream_Array = GetPdfStreamContent(FileOpenDialog("select pdf File", @MyDocumentsDir, "Adobe PDF Files (*.pdf)")); calls the function that extracts the streams

For $x = 1 To UBound($stream_Array) - 1

    $Streaminput = BinaryToString(_Zlib_UncompressBinary($stream_Array[$x], 0))
    If StringIsASCII ( $Streaminput ) Then $streamtext &= $Streaminput

Next

FileWrite( "result.txt", $streamtext  )


Func GetPdfStreamContent($fullfilename) ;Finds streams within a PDF. Returns an Array with the streams starting at $strm_Array[1]
    Local $start_obj = '(?i)(?s)\d* \d* obj[\n|\r]<<.*/Length (\d+).*>>\n*stream\r*\n' ;this is regex, (\d+) will return length, @extended the position of the stream
    Local $offset = 0
    Local $strm_Array[1]

    $sData = FileRead($fullfilename)

    While 1
            ;Find the stream via regex. We get the length declaration since the regex will die at null bytes, making it impossible to find endstream tags
        $result_array = StringRegExp(StringTrimLeft($sData, $offset), $start_obj, 1)

        If @error = 0 Then ;A stream with length declaration was found

            $offset += @extended

            ;store stream in array
            _ArrayAdd($strm_Array, StringMid($sData, $offset, $result_array[0]))

            ;sometimes the stream length is wrong due to missing @cr at start, this will leave a @LF at the end. We delete it here
            $strm_Array[UBound($strm_Array) - 1] = StringStripWS($strm_Array[UBound($strm_Array) - 1], 2)

            ;Advance offset past the end of the stream, so that the regex won't run into a null byte prior to the next stream
            $offset += StringLen($strm_Array[UBound($strm_Array) - 1])

        Else
            ExitLoop
        EndIf

    WEnd

    Return $strm_Array
EndFunc   ;==>GetPdfStreamContent

I guess one could also use it to extract image content, depends on how you handle the returned data.

Edited December 7, 2013 by Avee
