ijourneaux Posted April 4, 2019

I am having an intermittent issue processing XML files. Each XML file contains one <Entities> node and multiple <Entity> nodes:

```xml
<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    <Property Name="LastWrite_Time_as_String" IsReadOnly="True" ValueType="System.String">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_Time_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373280</Property>
    <Property Name="LastWrite_Time" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_ProgramID" IsReadOnly="False" ValueType="System.Int16">37</Property>
    <Property Name="ExpertCode" IsReadOnly="False" ValueType="System.SByte">0</Property>
    <Property Name="ActualDate_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373251</Property>
    <Property Name="ActualDate" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:20:51 AM</Property>
    <Property Name="UnitsString" IsReadOnly="False" ValueType="System.String"> </Property>
  </Entity>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.SpectraData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    <Property Name="LastWrite_Time_as_String" IsReadOnly="True" ValueType="System.String">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_Time_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373280</Property>
    <Property Name="LastWrite_Time" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_ProgramID" IsReadOnly="False" ValueType="System.Int16">37</Property>
    <Property Name="IsTruePeak" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsDigitalOverall" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsZoom" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsDigitallyIntegrated" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsAWeighted" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="Is3rdOctave" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="CollectedInAnalyzeMode" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="AnalysisFlag" IsReadOnly="False" ValueType="System.Int16">0</Property>
    <Property Name="OwningDCS" IsReadOnly="False" ValueType="System.Int32">4203</Property>
    <Property Name="ContinuationRecord" IsReadOnly="False" ValueType="System.Int32">1365089</Property>
  </Entity>
</Entities>
```

I would like to break up the XML file so that each new XML file has only one <Entity> node. I used recommendations from another user to break the files apart. Conceptually it is easy:

1. Open the XML file.
2. Verify that the minimum bits are there.
3. Select the <Entity> nodes.
4. Create a new XML file for each <Entity> node.

```autoit
#include <File.au3> ; _PathSplit
#include "XML.au3"  ; XML UDF: _XML_CreateDOMDocument, _XML_Load, _XML_SelectNodes, ...

; _WriteErrorLog, MoveFileToSubFolder and the $ProblemFolder, $OriginalXMLFolder and
; $SaveOriginalXMLFiles globals are defined elsewhere in the script (not shown here).
Func BreakXMLFileApart($sFileXML)
    Local $oXmlDoc
    Local $aAttributes
    Local $oProperties
    Local $oProperty
    Local $oNode
    Local $iNodeCount
    Local $sString
    Local $i
    Local $cnt
    Local $oEntities, $oEntity
    Local $sDrive, $sDir, $sFilename, $sExtension
    Local $aPathSplit = _PathSplit($sFileXML, $sDrive, $sDir, $sFilename, $sExtension)

    ;Create XML Document object and load XML file
    $oXmlDoc = _XML_CreateDOMDocument(Default)
    _XML_Load($oXmlDoc, $sFileXML) ;<== ENTER XML FILE PATH HERE
    If @error Then
        _WriteErrorLog(StringFormat("_XML_load error - @error = %s", @error) & @CRLF)
        _WriteErrorLog("-" & $sFileXML & " - Moved to Problem Folder")
        MoveFileToSubFolder($sFileXML, $ProblemFolder)
        Return 1
    EndIf

    ;If no specified nodes exist, log error and exit
    If Not _XML_NodeExists($oXmlDoc, "//Entity") Then
        ; ConsoleWrite("No specified nodes exist" & @CRLF)
        Return 1
    EndIf

    ;Get number of Entity nodes
    $iNodeCount = _XML_GetNodesCount($oXmlDoc, "//Entity")
    If ($iNodeCount > 1) Then
        ;Select the Entity nodes
        $oEntities = _XML_SelectNodes($oXmlDoc, "//Entity")
        $iNodeCount = @extended
        $cnt = 0
        For $oEntity In $oEntities
            $cnt = $cnt + 1
            $sString = ""
            ; One output file per Entity, named <original name>-<n><extension>
            FileWrite($sDrive & $sDir & $sFilename & "-" & $cnt & $sExtension, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & $oEntity.xml & @CRLF & "</Entities>" & @CRLF)
        Next
        ConsoleWrite($SaveOriginalXMLFiles & @CRLF)
        If ($SaveOriginalXMLFiles = "True") Then
            ConsoleWrite($SaveOriginalXMLFiles & @CRLF)
            MoveFileToSubFolder($sFileXML, $OriginalXMLFolder)
        EndIf
    EndIf
    Return 0
EndFunc   ;==>BreakXMLFileApart
```

This works 99% of the time. Unfortunately, in the remaining 1% of cases, the new XML file ends up with two identical sections:

```xml
<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    ...
  </Entity>
</Entities>
<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    ...
  </Entity>
</Entities>
```

It is as if the FileWrite statement is getting executed twice. I know it would be better if I uploaded a working sample, but the problem is intermittent. To make things even worse, one time I process the file I will see this issue, and if I try a second time, it works.

I'd appreciate any comments that can help me figure out what is going on.
Neutro Posted April 5, 2019 (edited)

Hey @ijourneaux,

I think the main problem here is that you're using object-management functions, which aren't easy to debug, so you have no way to see exactly what your script is doing step by step.

If I were you, I would instead read the XML file with _FileReadToArray(), locate the lines containing "<Entity" and "Entity>", and write what's between them into a new text file with an ".xml" extension. This way you can add debug information at every step of the processing and know exactly what's happening.

That being said, you're calling FileWrite with a file name as the first argument. This means that if a file with the same name already exists, FileWrite APPENDS the content of the second argument to the end of that file instead of erasing the existing content and writing only the new text. Run this code several times and you'll see what I mean:

```autoit
FileWrite(@ScriptDir & "\test.txt", "this is a test " & @HOUR & "H" & @MIN & "M" & @SEC & @CRLF)
```

So if for any reason your program calls your XML-processing function on the same file twice by mistake, FileWrite will append to the files that already exist, and they will end up containing duplicated sections exactly as you described.

Instead of calling FileWrite with a file name as the first argument, you should create the file with FileOpen using mode 10 as the second argument (2 = write mode, erasing previous contents, plus 8 = create the directory structure if it doesn't exist), then write to the returned handle. That prevents this from happening.
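A minimal sketch of the FileOpen suggestion, assuming Windows with the Microsoft.XMLDOM COM object available; _SplitEntitiesOverwrite and the @TempDir file names are hypothetical. Each output file is opened with $FO_OVERWRITE + $FO_CREATEPATH (mode 10) and closed after writing, so running the split twice on the same input should leave exactly one copy of each entity per file:

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not from the thread): writes one output file per <Entity> node.
; Mode $FO_OVERWRITE (2) + $FO_CREATEPATH (8) = 10 empties any existing file first,
; so running the split twice on the same input cannot append duplicate sections.
Func _SplitEntitiesOverwrite($sFileXML, $sOutPrefix)
    Local $oXML = ObjCreate("Microsoft.XMLDOM")
    If Not IsObj($oXML) Then Return SetError(1, 0, 0)
    $oXML.async = False ; load synchronously so the nodes exist before selectNodes runs
    If Not $oXML.load($sFileXML) Then Return SetError(2, 0, 0)

    Local $oEntities = $oXML.selectNodes("/Entities/Entity")
    Local $iCount = 0, $hFile
    For $oEntity In $oEntities
        $iCount += 1
        $hFile = FileOpen($sOutPrefix & "-" & $iCount & ".xml", $FO_OVERWRITE + $FO_CREATEPATH)
        If $hFile = -1 Then Return SetError(3, $iCount, 0)
        FileWrite($hFile, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & _
                $oEntity.xml & @CRLF & "</Entities>" & @CRLF)
        FileClose($hFile) ; release the handle before moving on to the next entity
    Next
    Return $iCount
EndFunc   ;==>_SplitEntitiesOverwrite

; Demo: create a small sample file, then split it twice into the same output names.
Local $sSample = @TempDir & "\entities_sample.xml"
Local $hSample = FileOpen($sSample, $FO_OVERWRITE)
FileWrite($hSample, '<?xml version="1.0"?><Entities><Entity RecordType="A"/><Entity RecordType="B"/></Entities>')
FileClose($hSample)

_SplitEntitiesOverwrite($sSample, @TempDir & "\entities_out")
_SplitEntitiesOverwrite($sSample, @TempDir & "\entities_out") ; second run overwrites instead of appending
ConsoleWrite(FileRead(@TempDir & "\entities_out-1.xml"))
```

Had the loop called FileWrite with a file name instead of a handle, the second run would have added a second <?xml ...> section to each output file, which matches the symptom in the question.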
This is the best explanation I can think of. Maybe it's the cause, maybe it isn't; if it isn't, then as I said earlier, you'd be better off dropping object processing and processing the file as text yourself. It's fast and easy.

Edited April 5, 2019 by Neutro
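The text-based alternative could look roughly like the sketch below. It is not a general XML parser: it uses the built-in FileReadToArray (rather than _FileReadToArray from File.au3) and assumes that no single line holds parts of two different Entity elements, which is true for pretty-printed files like the sample in the question. _SplitEntitiesByLines and the @TempDir file names are hypothetical.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not from the thread): line-based split. Assumes no line
; contains parts of two different <Entity> elements; not a general XML parser.
Func _SplitEntitiesByLines($sFileXML, $sOutPrefix)
    Local $aLines = FileReadToArray($sFileXML) ; built-in: zero-based, no count element
    If @error Then Return SetError(1, 0, 0)

    Local $iCount = 0, $bInside = False, $sBuffer = "", $hFile
    For $i = 0 To UBound($aLines) - 1
        ; "<Entity" does not occur in "<Entities>" or "</Entity>", so only start tags match
        If StringInStr($aLines[$i], "<Entity") Then
            $bInside = True
            $sBuffer = ""
        EndIf
        If $bInside Then $sBuffer &= $aLines[$i] & @CRLF
        If $bInside And StringInStr($aLines[$i], "</Entity>") Then
            $bInside = False
            $iCount += 1
            ; Debug trace so each processing step is visible in the console
            ConsoleWrite("Entity " & $iCount & " ends on line " & ($i + 1) & @CRLF)
            $hFile = FileOpen($sOutPrefix & "-" & $iCount & ".xml", $FO_OVERWRITE)
            If $hFile = -1 Then Return SetError(2, $iCount, 0)
            FileWrite($hFile, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & $sBuffer & "</Entities>" & @CRLF)
            FileClose($hFile)
        EndIf
    Next
    Return $iCount
EndFunc   ;==>_SplitEntitiesByLines

; Demo: a small pretty-printed sample with two entities.
Local $sSample = @TempDir & "\entities_lines.xml"
Local $hSample = FileOpen($sSample, $FO_OVERWRITE)
FileWrite($hSample, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & _
        '<Entity RecordType="A">' & @CRLF & '<Property Name="X">1</Property>' & @CRLF & "</Entity>" & @CRLF & _
        '<Entity RecordType="B">' & @CRLF & '<Property Name="Y">2</Property>' & @CRLF & "</Entity>" & @CRLF & _
        "</Entities>" & @CRLF)
FileClose($hSample)
ConsoleWrite(_SplitEntitiesByLines($sSample, @TempDir & "\entities_lines_out") & " file(s) written" & @CRLF)
```

The trade-off is that the line-matching rules are easy to trace and log, but they break as soon as the input is minified or an element spans lines in an unexpected way, which a DOM-based split handles without extra work.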
ijourneaux (Author) Posted April 11, 2019

You have given me a couple of great ideas. I did not appreciate that FileWrite appends. I am not sure how I would end up with the same filename twice, but it looks pretty suspicious to me. Thanks for taking the time to comment on a code snippet; I know it isn't the best way to ask for help.
jdelaney Posted April 11, 2019 (edited)

I would do something like this... much easier:

```autoit
#include <File.au3>

Local $sMyXMLFile = "test.xml"
Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.async = False ; load synchronously so the nodes are ready before selectNodes runs
$oXML.Load($sMyXMLFile)
Local $oEntity_Nodes = $oXML.selectNodes("/Entities/Entity")
Local $i = 1
For $oEntity In $oEntity_Nodes
    _FileCreate("XMLOutput." & $i & ".xml") ; creates the file, or empties it if it already exists
    FileWrite("XMLOutput." & $i & ".xml", $oEntity.xml)
    $i += 1
Next
```

Of course, update the file name as you see fit; I just used a simple counter to make each name unique. Or, I would recommend continuing with the XML UDF rather than regular expressions.

Just a note: no error handling is required for how I set this up. If the XML is malformed, $oXML will not be populated with data, but .selectNodes will not blow up; it will just return an empty collection, and the loop will never be entered because the collection is empty. All of XMLDOM is set up like that, as long as you don't chain nested object references, which WOULD require validating that the parent node is an object. An example of that would be something like this:

```autoit
$oXML.selectSingleNode("/Entities").selectNodes("./Entity")
```

If no Entities node is found, this would blow up without an error handler.

Edited April 11, 2019 by jdelaney
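To illustrate the nested-reference case, here is a minimal sketch, assuming Windows with Microsoft.XMLDOM available, that splits the chained call, checks the intermediate node with IsObj, and registers a COM error handler through ObjEvent("AutoIt.Error", ...) so an unexpected COM failure is reported rather than aborting the script. _COMErrFunc is a hypothetical handler name and the inline XML is made-up test data.

```autoit
; Register a COM error handler so a failing COM call is reported instead of aborting.
Global $oCOMErrorHandler = ObjEvent("AutoIt.Error", "_COMErrFunc")

Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.async = False
$oXML.loadXML("<Root><Other/></Root>") ; deliberately contains no <Entities> node

; Break the chained call apart and validate the intermediate node with IsObj()
Local $oEntitiesNode = $oXML.selectSingleNode("/Entities")
If IsObj($oEntitiesNode) Then
    ConsoleWrite($oEntitiesNode.selectNodes("./Entity").length & " Entity node(s)" & @CRLF)
Else
    ConsoleWrite("No <Entities> node found, skipping" & @CRLF)
EndIf

; Hypothetical handler: logs the COM error details to the console
Func _COMErrFunc($oError)
    ConsoleWrite("COM error 0x" & Hex($oError.number, 8) & " on script line " & _
            $oError.scriptline & ": " & $oError.windescription & @CRLF)
EndFunc   ;==>_COMErrFunc
```

The IsObj check handles the expected "node not found" case explicitly; the error handler is only a safety net for failures you did not anticipate.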