antmar904 Posted January 2, 2020 (edited)

Hi. I am trying to use @jguinch's XPDF UDF to extract certain data from a PDF file. I have converted the PDF to a text file, but I am not sure how to move forward.

In the text file (test.txt) that I attached to this thread, I need to extract the "Asset" IP and/or the "Asset Name:", along with the details from the "Details:" section, which includes "Port:", "(u)" and "(p)".

Any help is greatly appreciated. Would regex be an easier way to accomplish this?

test.txt

Edited January 2, 2020 by antmar904
markyrocks Posted January 2, 2020 (edited)

I'm not very familiar with regex, so the best path I can point you to is to open the file with FileOpen(), then set up a loop that reads the file line by line, comparing each line to the specific items you need. When it finds a match, work out how to extract the relevant data based on the structure of the file. I only say this because I'm not sure whether the data you're expecting sits beside the keyword or below it. You also need a way to reject lines that match but are otherwise just empty strings.

There is probably an easier way to do this; what I'm describing is the brute-force method, and it is probably slow as well. Also, once you find the data you are looking for, you need to save it by passing it to an array and then do whatever you need with it from there.

Edited January 2, 2020 by markyrocks

"I Believe array math to be potentially fatal, I may be dying from array math poisoning"
markyrocks Posted January 2, 2020

This is the best I could come up with, because I'm not exactly sure what you plan to do with the information after you sort it out. I really have no idea if this is finding everything, but it seems to be working. It will definitely need to be tweaked to squeeze out exactly what you are looking for: more filtering and shifting around. I could have used FileReadToArray() as well, but it's been a while since I played around with files, so I kind of forgot about that function. Merry Christmas.

    #include <Array.au3>
    #include <File.au3>

    Global $result[25], $result_count = 0
    Global $path = @ScriptDir & "\test.txt"

    Global $file = FileOpen($path)
    If $file = -1 Then
        MsgBox(0, 'ERROR', "file failed to open")
        Exit ; nothing to read, so stop here
    EndIf

    Global $linecount = _FileCountLines($path)
    ;~ MsgBox(0, 'line count', $linecount)

    For $x = 1 To $linecount
        Local $a[7]
        Local $line = FileReadLine($file) ; no line number given, so each call reads the next line
        ;~ MsgBox(0, 'line', $line, 1)
        $a[0] = StringInStr($line, "Asset Name:") ; each $a entry is the keyword's position, 0 if absent
        $a[1] = StringInStr($line, "IP ")
        $a[2] = StringInStr($line, "Details:")
        $a[3] = StringInStr($line, "Port")
        $a[4] = StringInStr($line, "(u)")
        $a[5] = StringInStr($line, "(p)")
        $a[6] = StringInStr($line, "Asset")

        ; if several keywords are found, keep the lowest position in the line
        Local $pos = 0
        For $y = 0 To UBound($a) - 1
            If $a[$y] <> 0 And ($pos = 0 Or $a[$y] < $pos) Then $pos = $a[$y]
        Next

        If $pos <> 0 Then
            Local $Trim_Left = StringTrimLeft($line, $pos - 1) ; trim off any garbage before the part we're looking for
            ; let's see if there's anything after what we're looking for in the line
            Local $String_Split = StringSplit($Trim_Left, " ") ; [0] is the element count, [1] should be a found keyword so we can ignore it
            For $n = 2 To $String_Split[0]
                If $String_Split[$n] <> "" Then ; consecutive spaces give empty elements; anything else means the line is good
                    If $result_count > UBound($result) - 1 Then
                        ReDim $result[$result_count + 1]
                    EndIf
                    $result[$result_count] = $Trim_Left
                    $result_count += 1
                    ExitLoop
                EndIf
            Next
        EndIf
    Next

    FileClose($file)
    If $result_count > 0 Then ReDim $result[$result_count] ; drop the unused slots
    _ArrayDisplay($result)
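For comparison, here is a minimal sketch of the FileReadToArray() variant mentioned in the post above. It is not the poster's code. It assumes test.txt sits next to the script, and it uses the keyword list from the original question. It reports the first keyword that matches in list order, not the left-most one in the line, so treat it as a starting point only.

```autoit
; Sketch only: assumes test.txt is in the script directory.
Local $aLines = FileReadToArray(@ScriptDir & "\test.txt")
If @error Then
    MsgBox(0, "ERROR", "Could not read the file")
    Exit
EndIf

; Keywords taken from the original question.
Local $aKeys[5] = ["Asset Name:", "Details:", "Port:", "(u)", "(p)"]
Local $sHits = ""

For $i = 0 To UBound($aLines) - 1
    For $j = 0 To UBound($aKeys) - 1
        Local $iPos = StringInStr($aLines[$i], $aKeys[$j])
        If $iPos Then
            ; keep the text from the keyword onward, minus leading/trailing whitespace
            $sHits &= StringStripWS(StringTrimLeft($aLines[$i], $iPos - 1), 3) & @CRLF
            ExitLoop ; first matching keyword in list order wins
        EndIf
    Next
Next

ConsoleWrite($sHits)
```

Reading the whole file into an array up front avoids the separate _FileCountLines() pass and the open file handle.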
Jury Posted January 5, 2020 (edited)

Converting into and out of PDF files is always going to give you inconsistent data as far as order and layout are concerned (especially if tables in the original are involved). You could fiddle around with the pdftotext.exe command-line options, but I've had little success getting clean, consistent data from them (PDF is for 'looks' and printing, nothing more).

This is the best I could do with regex: first regularizing the data (the StringStripWS and StringRegExpReplace calls), then trying to capture the data you want. Note that the last item in your test file has the Asset after the IPAddress (if you mean COMPUTERNAME as the Asset:). Anyhow, here is something for you to go crazy trying to sort out, if you wish. There are hundreds of variations in regex, so no doubt someone will provide different and perhaps even better examples.

Joe

    #include <MsgBoxConstants.au3>
    #include <StringConstants.au3>
    #include <FileConstants.au3>

    Local $processing = @MyDocumentsDir & '\AutoIt_code\getter\processing\test.txt'

    ; Open the file for reading and store the handle to a variable.
    Local $hFileOpen = FileOpen($processing, $FO_READ)
    If $hFileOpen = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
        Exit ; nothing to process
    EndIf

    ; Read the contents of the file using the handle returned by FileOpen.
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    ; Strip ALL whitespace (including line breaks), then start a new line before each Asset record.
    $sFileRead = StringStripWS($sFileRead, $STR_STRIPALL)
    $sFileRead = StringRegExpReplace($sFileRead, '(?i)(?-s)(Asset:.*?\w*COMPUTERNAME\d*.*?)(?=Discovery:)', @CRLF & '$1')
    ;ConsoleWrite($sFileRead & @CRLF)

    ; Note: the final lazy (.*?) has nothing after it, so that last group always captures an empty string.
    Local $sPattern = '(?i)(?-s)([A-Z]+COMPUTERNAME\d*)IPAddress:([\d\.]*).*?Details:(.*?)Details:(.*?)\(u\):(.*?)\(p\):(.*?)'
    ; Alternative for records where the asset name follows the IP address:
    ;~ Local $sPattern = '(?i)(?-s)IPAddress:([\d\.]*).*?([A-Z]+COMPUTERNAME\d*).*?Details:(.*?)Details:(.*?)\(u\):(.*?)\(p\):(.*?)'

    Local $aArray = StringRegExp($sFileRead, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then
        ConsoleWrite("No matches found." & @CRLF)
    Else
        For $i = 0 To UBound($aArray) - 1
            ConsoleWrite($aArray[$i] & @CRLF)
        Next
    EndIf

Edited January 5, 2020 by Jury
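As an illustration of the pdftotext.exe options mentioned above, here is a hedged sketch that re-runs the conversion with the -layout switch, which asks pdftotext to preserve the physical layout of the page. All three paths are hypothetical placeholders, and whether -layout actually gives cleaner output for this particular report is untested.

```autoit
; Sketch only: the paths below are hypothetical placeholders, adjust them to your setup.
Local $sExe = @ScriptDir & "\pdftotext.exe"
Local $sPdf = @ScriptDir & "\report.pdf"
Local $sTxt = @ScriptDir & "\test.txt"

; Quote every path in case it contains spaces.
Local $sCmd = '"' & $sExe & '" -layout "' & $sPdf & '" "' & $sTxt & '"'

; RunWait returns the program's exit code and sets @error if it could not be started.
Local $iExit = RunWait($sCmd, @ScriptDir, @SW_HIDE)
If @error Or $iExit <> 0 Then
    ConsoleWrite("pdftotext failed, exit code: " & $iExit & @CRLF)
EndIf
```

Swapping -layout for -raw (content-stream order) is another variation worth comparing against the same regex.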