
Comparing 2 word files


Go to solution Solved by keldepulo,


Hello everyone,

I was able to make a script that compares 2 txt files and notifies me of the differences between them.

 

I did this by using _FileReadToArray() on each txt file and then comparing both arrays for differences.

But in the Word.au3 UDF, I don't see a similar function to _FileReadToArray().

How would I go about creating a script that compares 2 Word documents and tells me the differences between them?

Thanks


Comparing Word documents is much more complex than comparing simple text files.

Can you describe what you are trying to do? How should differences in formatting, tables etc. be handled?


several resources at your disposal:

1. Word 2010+ built-in function to visually compare documents side-by-side (Review > Compare)

2. Word COM property or method to retrieve the content of documents (absent from the Word UDF it seems - perhaps water can comment on that?)

3. Word COM method to compare documents, with or without formatting: 

http://msdn.microsoft.com/en-us/library/ff192559(v=office.14).aspx

4. use 7zip to extract the content file from the Word file (DOCX format only!) and parse it for content

5. external utility to extract only text from Word documents (and other formats):

http://freemind.s57.xrea.com/xdocdiffPlugin/en/

6. this is also used as a plugin for WinMerge - highly recommended full featured diff tool.

perhaps other ways exist too. my favorite is WinMerge.

 

and of course as water commented, you should better describe your purpose. what changes interest you: text? formatting? tables? charts? authors? etc.
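On point 2, here is a minimal sketch of reading a document's text through the Word UDF, assuming Word is installed. `_WordDocToArray` is a hypothetical helper name, not part of Word.au3. The text comes from the COM property `Content.Text`, in which Word separates paragraphs with a carriage return. The returned array mimics the layout of _FileReadToArray() (count in element [0]), so it could feed the same kind of array comparison.

```autoit
#include <Word.au3>
#include <Array.au3>

; Hypothetical helper (not part of Word.au3): returns the paragraphs of a
; Word document as an array, with the paragraph count in element [0].
Func _WordDocToArray($sPath)
    ; Hidden, separate Word instance so a session the user has open is left alone
    Local $oWord = _Word_Create(False, True)
    If @error Then Return SetError(1, @error, 0)

    ; Fifth parameter True = open read-only
    Local $oDoc = _Word_DocOpen($oWord, $sPath, Default, Default, True)
    If @error Then
        Local $iErr = @error
        _Word_Quit($oWord)
        Return SetError(2, $iErr, 0)
    EndIf

    ; Plain text of the main document body, without formatting
    Local $sText = $oDoc.Content.Text

    _Word_DocClose($oDoc)
    _Word_Quit($oWord)

    ; Paragraphs end with @CR, so the last element is usually an empty string
    Return StringSplit($sText, @CR)
EndFunc   ;==>_WordDocToArray

; Usage (the path is only an example)
Local $aLines = _WordDocToArray(@ScriptDir & "\example.docx")
If @error Then
    MsgBox(16, "Error", "Could not read the document.")
Else
    _ArrayDisplay($aLines, "Paragraphs")
EndIf
```

Table cells and other non-paragraph content show up as control characters in this text, so this only suits a plain text comparison.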


Thanks for replying, both of you.

Simply comparing the text difference in each doc is good enough for me. The Word doc's formatting, tables, charts or authors can be ignored.

Is there a way to do it without downloading WinMerge?

Here's an example of my script that simply compares text in .txt documents.

#include <File.au3>
#include <Array.au3>
#include <GUIConstantsEx.au3>
#include <GuiListView.au3> ;For making the clear listview function work
#include <GuiListBox.au3> ;For making a guictrlcreatelist
#include <EditConstants.au3> ;Allows positioning of gui input box text
#include <StaticConstants.au3> ;Allows formatting of gui labels
#include <WindowsConstants.au3>

Global $GUI = GuiCreate("Cute Fluffy Hamster Text Compare Tool", 890, 550, 0, 0) ;Creates the GUI

Dim $File1Array[10], $File2Array[10]

Global $FileInfo1 = GUICtrlCreateListView("Line # | File Document #1", 10, 50, 430, 385, -1) ;Listview showing the lines of file #1
_GUICtrlListView_SetColumnWidth($FileInfo1, 0, 50) ;Sets the width of the "Line #" column
_GUICtrlListView_SetColumnWidth($FileInfo1, 1, 380) ;Sets the width of the "File Document #1" column
_GUICtrlListView_JustifyColumn($FileInfo1, 0, 0) ;Left-aligns the "Line #" column (0 = left)

Global $FileInfo2 = GUICtrlCreateListView("Line # | File Document #2", 450, 50, 430, 385, -1) ;Listview showing the lines of file #2
_GUICtrlListView_SetColumnWidth($FileInfo2, 0, 50) ;Sets the width of the "Line #" column
_GUICtrlListView_SetColumnWidth($FileInfo2, 1, 380) ;Sets the width of the "File Document #2" column
_GUICtrlListView_JustifyColumn($FileInfo2, 0, 0) ;Left-aligns the "Line #" column (0 = left)

Global $Compare = GUICtrlCreateButton("Compare!", 780, 15, 100, 30)

Global $Input1 = GUICtrlCreateInput("", 120, 20, 265, 22, $ES_READONLY) ;Read-only box showing the path of file #1
Global $ChooseFile1 = GUICtrlCreateButton("Select File #1", 10, 15, 100, 28)

Global $Input2 = GUICtrlCreateInput("", 505, 20, 265, 22, $ES_READONLY) ;Read-only box showing the path of file #2
Global $ChooseFile2 = GUICtrlCreateButton("Select File #2", 395, 15, 100, 28)

Global $Exit = GUICtrlCreateButton("Exit", 10, 495, 870, 50)

Global $InfoConsole = GUICTRLCREATELIST("", 10, 440, 870, 60, -1, $SS_ETCHEDFRAME) ;The console info that describes all the changes



GUISetState(@SW_SHOW) ;Makes the GUI Appear


While 1
   Sleep(10)

   Switch GUIGetMsg()


            Case $GUI_EVENT_CLOSE ;If the "x" button on the GUI is clicked then exit while loop (which will lead to the last line of code which tells GUI to close)
                ExitLoop

                  Case $Exit ;If the exit button is pushed then close the GUI
                ExitLoop

            Case $ChooseFile1
                Local $FileOpen1 = FileOpenDialog("Choose 1st File", @WindowsDir & "\", "Text (*.txt)|Documents (*.doc;*.docx)", $FD_FILEMUSTEXIST) ;Single selection only: a multi-select result cannot be passed to _FileReadToArray
                If @error Then
                   MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
                   FileChangeDir(@ScriptDir)
                Else
                   FileChangeDir(@ScriptDir)
                   GUICtrlSetData($Input1, $FileOpen1) ;Change input box to display the opened file location
                   _FileReadToArray ($FileOpen1, $File1Array) ;Stores the contents of the file in an array

                EndIf

            Case $ChooseFile2
                Local $FileOpen2 = FileOpenDialog("Choose 2nd File", @WindowsDir & "\", "Text (*.txt)|Documents (*.doc;*.docx)", $FD_FILEMUSTEXIST) ;Single selection only: a multi-select result cannot be passed to _FileReadToArray
                If @error Then
                   MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
                   FileChangeDir(@ScriptDir)
                Else
                   FileChangeDir(@ScriptDir)
                   GUICtrlSetData($Input2, $FileOpen2) ;Change input box to display the opened file location
                   _FileReadToArray ($FileOpen2, $File2Array) ;Stores the contents of the file in an array

                EndIf



            Case $Compare

                If GUICtrlRead($Input1) <> "" And GUICtrlRead($Input2) <> "" Then ;If both input boxes contain a file path then do the comparing

                    _GUICtrlListView_DeleteAllItems($FileInfo1) ;Clear the first listview window
                    _GUICtrlListView_DeleteAllItems($FileInfo2) ;Clear the second listview window
                    _GUICtrlListBox_ResetContent($InfoConsole) ;Clear the console info window

                    For $i = 1 To UBound($File1Array) - 1

                        ;Search file 2 for each line of file 1, starting at index 1 so the line count in element [0] is never matched
                        Local $aResult = _ArrayFindAll($File2Array, $File1Array[$i], 1)
                        GUICtrlCreateListViewItem($i & "|" & $File1Array[$i], $FileInfo1) ;Populates the listview window with info

                        ;_ArrayFindAll returns -1 (not an array) when nothing matches, so UBound is 0: this line of file 1 does not exist in file 2
                        If UBound($aResult) < 1 And $File1Array[$i] <> "" Then

                            GUICtrlSetBkColor(-1, 0xFFFF00) ;Sets a diff background color indicating the difference

                            GUICtrlSetData($InfoConsole, "File #1 contains the word '" & $File1Array[$i] & "' (as seen on line " & $i & ") while File #2 does not contain that word.")
                            GUICtrlSetFont($InfoConsole, 10)
                            GUICtrlSetColor($InfoConsole, 0xFF0000)
                        EndIf
                    Next

                    For $j = 1 To UBound($File2Array) - 1

                        ;Search file 1 for each line of file 2, again skipping the count in element [0]
                        Local $bResult = _ArrayFindAll($File1Array, $File2Array[$j], 1)
                        GUICtrlCreateListViewItem($j & "|" & $File2Array[$j], $FileInfo2) ;Populates the listview window with info

                        ;No match means this line of file 2 does not exist in file 1
                        If UBound($bResult) < 1 And $File2Array[$j] <> "" Then

                            GUICtrlSetBkColor(-1, 0xFFFF00) ;Sets a diff background color indicating the difference

                            GUICtrlSetData($InfoConsole, "File #2 contains the word '" & $File2Array[$j] & "' (as seen on line " & $j & ") while File #1 does not contain that word.")
                            GUICtrlSetFont($InfoConsole, 10)
                            GUICtrlSetColor($InfoConsole, 0xFF0000)
                        EndIf
                    Next

                Else ;If both input boxes do not contain a file path then display message
                    MsgBox(0, "File(s) Needed", "Please select 2 files")

                EndIf


                EndSwitch

Wend

Even though I allowed .doc's to be an option when you select a file, don't select a doc file :P

And Yes I purposely made my exit button that big :D


i think your best choice would be this:

5. external utility to extract only text from Word documents (and other formats):

http://freemind.s57.xrea.com/xdocdiffPlugin/en/

 

(this can be used as a plugin for WinMerge, but it is a standalone app as well, with COM support)

It's just that I prefer not to generate 2 extra files for the user, but if it has to be that way, then so be it

 

why? temp files are used all around the place, by practically all apps you can think of.

- if it's security concerns, wipe the temp files when you're done with them.

- if it's space concerns, text files are never that large to be concerned about - especially compared to their Word origin.

and these files are not for the user - you can have your user select a doc file, and before it is processed, your script can convert it to text. this is transparent to the end user.

 

EDIT: your script does not detect a change in the order of lines. b.t.w. it seems it checks whole lines, not words, so you had better rephrase the messages in the info console.
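One way to catch reordered lines is to compare the two arrays position by position instead of searching for each line anywhere in the other file. A minimal sketch, assuming both arrays keep their line count in element [0] as _FileReadToArray() returns them; `_CompareByPosition` is a hypothetical helper name. This is not a real diff: one inserted line shifts every line after it, so everything below the insertion gets flagged.

```autoit
#include <Array.au3>

; Hypothetical helper: returns the line numbers at which two arrays differ.
; Both inputs carry their line count in element [0]; so does the result.
Func _CompareByPosition(Const ByRef $aA, Const ByRef $aB)
    Local $iMax = ($aA[0] > $aB[0]) ? $aA[0] : $aB[0]
    Local $aDiff[$iMax + 1]
    $aDiff[0] = 0

    Local $sA, $sB
    For $i = 1 To $iMax
        ; A line missing from the shorter file is treated as an empty line
        $sA = ($i <= $aA[0]) ? $aA[$i] : ""
        $sB = ($i <= $aB[0]) ? $aB[$i] : ""
        If Not ($sA == $sB) Then ; == is a case-sensitive string comparison
            $aDiff[0] += 1
            $aDiff[$aDiff[0]] = $i
        EndIf
    Next

    ReDim $aDiff[$aDiff[0] + 1] ; Trim the unused tail
    Return $aDiff
EndFunc   ;==>_CompareByPosition

; Usage with two small hand-made arrays: lines 2 and 3 are swapped
Local $aOne[4] = [3, "alpha", "beta", "gamma"]
Local $aTwo[4] = [3, "alpha", "gamma", "beta"]
Local $aChanged = _CompareByPosition($aOne, $aTwo)
_ArrayDisplay($aChanged, "Differing line numbers")
```

With the sample data the search-anywhere approach reports no difference, while this one flags lines 2 and 3.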


Hey orbs,

Thanks for the feedback, I'll go with the text file route since that sounds simpler and you're right about how several programs create temp files.

And yeah my script only detects changes in lines atm, not each word individually. I'm working on fine tuning it to specifically look for each individual word right now.

Also, it seems that I can't see the scrollbar in my Info Console D:

Thanks again


perhaps this is what you need?

#NoTrayIcon

$prog = "WordCMP"

if $CmdLine[0] < 2 then
    msgbox(0, $prog, "Use:  " & $prog & ' "<Full Path Name 1>" "<Full Path Name 2>"', 10 )
    exit
endif

$doc1 = $CmdLine[1]
$doc2 = $CmdLine[2]

if Not FileExists ( $doc1 ) then
    msgbox(0, $prog, "File " & $doc1 & " not found!")
    exit
endif

if Not FileExists ( $doc2 ) then
    msgbox(0, $prog, "File " & $doc2 & " not found!")
    exit
endif

RegWrite("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FE-0000-0000-C000-000000000046}\LocalServer32", "LocalServer32", "REG_MULTI_SZ", "']gAVn-}f(ZXfeAR6.jiWORDFiles>P`os,1@SW=P7v6GPl]Xh /safe /Automation")
RegWrite("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FF-0000-0000-C000-000000000046}\LocalServer32", "LocalServer32", "REG_MULTI_SZ", "']gAVn-}f(ZXfeAR6.jiWORDFiles>P`os,1@SW=P7v6GPl]Xh /safe /Automation")

_Msg("WordCMP running ...", 1)
$oWord = ObjCreate("Word.Application")
$oWord.Visible = 0

_Msg("Loading doc1 ...", 1)
$docA = $oWord.Documents.Open( $doc1)

_Msg("Loading doc2 ...", 1)
$docB = $oWord.Documents.Open( $doc2)

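_Msg("Comparing doc1 and doc2 ...", 1)
; Arguments after the two documents: Destination 2 = put the result in a new document,
; Granularity 1 = word level, CompareFormatting = True, CompareCaseChanges = True
$docC = $oWord.CompareDocuments($docA, $docB, 2, 1, 1, 1)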

$docA.close
$docB.close

$oWord.Visible = 1
$oWord.DisplayAlerts = 0

_Msg($prog, 0)

RegDelete ("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FE-0000-0000-C000-000000000046}\LocalServer32")
RegDelete ("HKEY_CURRENT_USER\Software\Classes\CLSID\{000209FF-0000-0000-C000-000000000046}\LocalServer32")

Func _Msg($msg, $state)
    $Width = StringLen ($msg) * 8
    $Height = 40
    $left = @DesktopWidth - $Width - 10
    $top = @DesktopHeight - $Height - 40

    if $state = 1 then
        SplashTextOn ( "", $msg, $Width, $Height, $left, $top, 5, "Tahoma", 11)
    else
        SplashOff ( )
    EndIf
EndFunc

keldepulo,

When you post code please use Code tags - see here how to do it. Then you get a scrolling box and syntax colouring as you can see above now I have added the tags. ;)

M23


Hi keldepulo, I will definitely try your method when I get home (I'm currently at work). I assume all Microsoft Word versions have the .CompareDocuments method? :D

You also gave me the idea that if I do indeed decide to generate 2 extra files for the user, then I should place them somewhere in the HKEY_CURRENT_USER directory and then delete them once my program is done comparing. If I hadn't read your post, I would have placed the 2 extra files somewhere in the same directory as the person's Word docs or something D:
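On the version question: as far as I know, Application.CompareDocuments only exists from Word 2007 (version 12) onwards, while older versions offer Document.Compare instead. A rough sketch of choosing between the two, under that assumption; the file paths are placeholders and none of this is taken from keldepulo's script.

```autoit
; Sketch: pick the comparison call based on the installed Word version.
Local $sOriginal = @ScriptDir & "\original.docx" ; placeholder path
Local $sRevised = @ScriptDir & "\revised.docx" ; placeholder path

Local $oWord = ObjCreate("Word.Application")
If Not IsObj($oWord) Then Exit MsgBox(16, "WordCMP", "Could not start Word.")

Local $oDocA = $oWord.Documents.Open($sOriginal)

; Version is a string such as "12.0"; Number() turns it into a number
If Number($oWord.Version) >= 12 Then
    Local $oDocB = $oWord.Documents.Open($sRevised)
    ; 2 = put the result in a new document, 1 = compare at word level
    $oWord.CompareDocuments($oDocA, $oDocB, 2, 1)
    $oDocB.Close(0) ; 0 = do not save changes
    $oDocA.Close(0)
Else
    ; Older object model: compare the open document against a file on disk
    $oDocA.Compare($sRevised)
EndIf

$oWord.Visible = True ; Show the comparison result to the user
```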


It's just that I prefer not to generate 2 extra files for the user, but if it has to be that way, then so be it

 

Why? The 2 files could be in a temp directory, just for reading them; you just have to delete them after comparing them.


  • 4 years later...

Five years after the fact, I'd like to ask a question about the code in this post (above): 

I'm a super newbie to AutoIt, and keldepulo's code is probably not the correct place to start 🙂, but throwing caution to the wind, I'd like to ask two questions.

  1. Why are the RegWrite calls necessary?
    1. I'm a little afraid to try this macro until I understand why it wants to modify my registry.
  2. Is it possible to add this to my Windows Explorer context menu? Ideally, I would want to be able to select two files, right-click, and choose Compare in Word.
    1. I can currently do this with Beyond Compare 3.

Thanks in advance if anyone can help me out.
