KickStarter15 Posted April 27, 2019 (edited)

Hi Experts,

Hope everyone is having a good day today!😊

I have a new task that involves creating XML from given data. I've been searching for how to create XML from plain text data, but so far I could not find a thread on it. Maybe I missed something in my searching.😅

I'm posting this without any sample code because I'm still looking for a head start, and I hope you can point me to a thread or suggest where I should start.

Below is the data text that I need to convert into XML:

    John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.

Here's what the XML should look like:

    <File xml:id="name-of-filename">
    <Citation type="letter" xml:id="name-of-filename"><Person><familyName>John</familyName> <givenName>J.</givenName></Person>,
    <Person><familyName>Gracy</familyName>, <givenName>D.</givenName></Person>, et al.
    (<pubYear year="2019">2019</pubYear>).
    <Title>This is a sample Title sentence here</Title>. <SubTitle>Then another here</SubTitle>,
    <vol>5</vol>(<issue>2</issue>); <FirstPage>101</FirstPage>–<SecondPage>109</SecondPage>.
    <url href="https://doi.org/1001.10110/aj21.j1j.10.">doi:1001.10110/aj21.j1j.10.</url></Citation>
    </File>

If you have any suggestions, or can refer me to a relevant thread, that would be a big help, Experts.

Thank you in advance😁
KS15

Edited May 6, 2019 by KickStarter15

Programming is "To make it so simple that there are obviously no deficiencies" or "To make it so complicated that there are no obvious deficiencies" by C.A.R. Hoare.
orbs Posted April 27, 2019

obviously, the first task at hand is to parse the text into fields - i.e. split it into an array where each element contains a single data item. that i leave to you.

once you have the individual data items, i advise you compose the XML yourself. do not use any existing UDF for that, it is overkill for such a simple task. you know the field names, data, order and hierarchy; just write it down.

Signature - my forum contributions:
UDF:
LFN - support for long file names (over 260 characters)
InputImpose - impose valid characters in an input control
TimeConvert - convert UTC to/from local time and/or reformat the string representation
AMF - accept multiple files from Windows Explorer context menu
DateDuration - literal description of the difference between given dates
Apps:
Touch - set the "modified" timestamp of a file to current time
Show For Files - tray menu to show/hide files extensions, hidden & system files, and selection checkboxes
SPDiff - Single-Pane Text Diff
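A minimal sketch of the "compose the XML yourself" approach, assuming the data items have already been parsed. The helper names `_XmlEscape` and `_XmlTag` are hypothetical (they are not from this thread or from any UDF); only the built-in `StringReplace` and `ConsoleWrite` are used.

```autoit
; Hypothetical helper: escape the characters that are special in XML text.
Func _XmlEscape($sText)
    ; Replace & first so the entities added below are not escaped twice.
    $sText = StringReplace($sText, '&', '&amp;')
    $sText = StringReplace($sText, '<', '&lt;')
    $sText = StringReplace($sText, '>', '&gt;')
    $sText = StringReplace($sText, '"', '&quot;')
    Return $sText
EndFunc   ;==>_XmlEscape

; Hypothetical helper: wrap an escaped value in an opening and a closing tag.
Func _XmlTag($sName, $sValue)
    Return '<' & $sName & '>' & _XmlEscape($sValue) & '</' & $sName & '>'
EndFunc   ;==>_XmlTag

; Compose one <Person> element from two already-parsed data items.
Local $sXml = '<Person>' & _XmlTag('familyName', 'John') & _XmlTag('givenName', 'J.') & '</Person>'
ConsoleWrite($sXml & @CRLF)
```

Building the closing tag from the same name as the opening tag avoids mismatched pairs, and escaping matters as soon as a title contains `&` or `<`.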
jdelaney Posted April 28, 2019

Search the forum for Microsoft.XMLDOM: createElement, appendChild, setAttribute.

IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without excel installed
GetAllWindowControls - Output all control data on a given window.
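For readers following the DOM route, here is a small sketch of what those calls might look like in AutoIt. It assumes the MSXML COM component is available on the machine (it ships with Windows); the element names come from the XML in the first post, and the values are hard-coded for illustration only.

```autoit
; Create an empty XML document through the MSXML COM object.
Local $oDoc = ObjCreate('Microsoft.XMLDOM')
If Not IsObj($oDoc) Then Exit MsgBox(16, 'Error', 'Could not create Microsoft.XMLDOM')

; Root element <File>
Local $oFile = $oDoc.createElement('File')
$oDoc.appendChild($oFile)

; <Citation type="letter"> as a child of <File>
Local $oCitation = $oDoc.createElement('Citation')
$oCitation.setAttribute('type', 'letter')
$oFile.appendChild($oCitation)

; <Person><familyName>John</familyName></Person>
Local $oPerson = $oDoc.createElement('Person')
$oCitation.appendChild($oPerson)
Local $oFamily = $oDoc.createElement('familyName')
$oFamily.text = 'John'
$oPerson.appendChild($oFamily)

; The xml property holds the serialized document; save() would write it to a file.
ConsoleWrite($oDoc.xml & @CRLF)
```

The DOM takes care of escaping and of matching closing tags, at the cost of more calls per element than plain string concatenation.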
KickStarter15 Posted April 29, 2019 (edited)

@orbs,

Thanks, I tried using StringSplit but could not get it right. Can you guide me on what you mean? Maybe I'm just too upset right now, knowing that I still need to learn XML creation. 😓 I don't have any idea yet of what I should do.

First, I tried FileWrite(), but then I need to type all the names into the FileWrite() call just to generate the XML I want.

Second, I tried StringSplit, but I'm stuck in the For loop and the result is still not what I expected. Here:

    #include <MsgBoxConstants.au3>
    #include <StringConstants.au3>

    $data_file = @ScriptDir & "\data.txt"
    $File = FileRead($data_file)
    $FilePath = @ScriptDir & "\Test.xml"

    XMLFile()

    Func XMLFile()
        Local $sText = $File
        Local $aArray = StringSplit($sText, ' ', $STR_ENTIRESPLIT)

        For $i = 1 To $aArray[0] ; Loop through the array returned by StringSplit to display the individual values.
            MsgBox($MB_SYSTEMMODAL, "", "$aArray[" & $i & "] - " & $aArray[$i])
            $Text = $aArray[$i]
        Next

        FileWrite($FilePath, '<File xml:id="name-of-filename">' & @CRLF & _
                '<Citation type="letter" xml:id="name-of-filename">' & @CRLF & _
                '<Person><familyName>' & $Text & '</familyName> <givenName>' & $Text & '</givenName></Person>, ' & @CRLF & _
                '</Citation>' & @CRLF & _
                '</File>')
    EndFunc

Yeah, it's funny but true😂. I put this together from the help file, trying to get a head start. I really need to learn these XML UDFs so I can avoid asking for so much help.😅

@jdelaney,

Yes, I did search for that in the forum and on Google, but the results are all about appending to or creating new elements in an existing XML file. I need to create an XML file from the data I posted above. Maybe I didn't land on the right thread, so can you point me to an existing thread related to my question? Much appreciated, jdelaney.

Thanks😁

Edited April 29, 2019 by KickStarter15
FrancescoDiMuro Posted April 29, 2019

@KickStarter15

Just extract all the values from the XML with a regular expression (SRE), then concatenate them, filtering as needed:

    #include <StringConstants.au3>

    Global $strFileContent = '<File xml:id="name-of-filename">' & _
            '<Citation type="letter" xml:id="name-of-filename"><Person><familyName>John</familyName><givenName>J.</givenName></Person>,' & _
            '<Person><familyName>Gracy</familyName>, <givenName>D.</givenName></Person>, et al. (<pubYear year="2019">2019</pubYear>).' & _
            '<Title>This is a sample Title sentence here</Title>. <SubTitle>Then another here</SubTitle>, <vol>5</vol>(<issue>2</issue>);' & _
            '<FirstPage>101</FirstPage>–<SecondPage>109</SecondPage>.' & _
            '<url href="https://doi.org/1001.10110/aj21.j1j.10.">doi:1001.10110/aj21.j1j.10.</url></Citation>' & _
            '</File>', _
            $arrResult, _
            $strResult

    ; Capture every run of text that sits between a ">" and the next "<".
    $arrResult = StringRegExp($strFileContent, '>([^<]+)<', $STR_REGEXPARRAYGLOBALMATCH)

    For $i = 0 To UBound($arrResult) - 1 Step 1
        If $arrResult[$i] = "–" Then
            $strResult &= "-"
        Else
            $strResult &= StringReplace($arrResult[$i], ';', '.')
        EndIf
    Next

    ConsoleWrite($strResult & @CRLF)
KickStarter15 Posted April 30, 2019

@FrancescoDiMuro,

I think you understood it the other way around😊 What I need is this. From data.txt:

    John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.

it should be captured in an XML file:

    <File xml:id="name-of-filename">
    <Citation type="letter" xml:id="name-of-filename"><Person><familyName>John</familyName> <givenName>J.</givenName></Person>,
    <Person><familyName>Gracy</familyName>, <givenName>D.</givenName></Person>, et al.
    (<pubYear year="2019">2019</pubYear>).
    <Title>This is a sample Title sentence here</Title>. <SubTitle>Then another here</SubTitle>,
    <vol>5</vol>(<issue>2</issue>); <FirstPage>101</FirstPage>–<SecondPage>109</SecondPage>.
    <url href="https://doi.org/1001.10110/aj21.j1j.10.">doi:1001.10110/aj21.j1j.10.</url></Citation>
    </File>

Or maybe I misunderstood your suggestion😅.
mikell Posted April 30, 2019

orbs is definitely right. First parse your string, whichever way you like, to get a cute array. Example:

    #Include <Array.au3>

    $str = "John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10."
    $res = StringRegExp($str, '(?x) (?| ([[:alpha:]]+)\h(?=[A-Z]) | ([^,]+),\h | \((\d+)\)\h | ([A-Z][^.]+)\.\h | (\d+) | (\S+)$ ) ', 3)
    _ArrayDisplay($res)

Then loop through the array and build the XML content string using conditions on each array element.

Such an XML is a custom thingy, so there is no 'generic' way to do the job.

Good luck :)
KickStarter15 Posted May 2, 2019

@mikell,

Thanks, that's the problem now: how can I loop through the array and assign each array element to a specific XML element?☹️

Alternatively, can you help me take the array produced by the code you gave, add a unique delimiter for each element, and then create the XML based on those new delimiters? Would that be possible?
orbs Posted May 2, 2019 (edited)

you need to review carefully that fabulous regex magic by @mikell, to properly identify the data fields you receive. not all titles would have exactly two authors, right? and not all authors would have exactly two components of the name, right? same goes for the other data items. i advise you apply that regex on multiple different sample data strings - as many as you can find - to confirm its usefulness in all scenarios.

b.t.w. i notice the data string contains three names, yet your desired XML contains only two - "Jame R." is not included in the XML. is this intended?

once you have a verified array of identified data items, composing them into XML is easy. but let's fry one fish at a time, ok?

Edited May 2, 2019 by orbs
mikell Posted May 2, 2019

Obviously, if the original string is formatted differently the regex may fail. Such an expression should be tested against many input strings to check its reliability and to change it if needed. This first step is mandatory.

Anyway, to build such an XML the hard work of creating and populating the fields can't be avoided, whichever way is used: the XML way as jdelaney said, or the string way as below.

    #Include <Array.au3>

    $str = "John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10."

    ;$res = StringRegExp($str, '(?x) (?| ([^,]+),\h | \((\d+)\)\h | ([A-Z][^.]+)\.\h | (\d+) | (\S+)$ ) ', 3)
    $res = StringRegExp($str, '(?x) (?| ([[:alpha:]]+)\h(?=[A-Z]) | ([^,]+),\h | \((\d+)\)\h | ([A-Z][^.]+\.)\h | (\d+) | (\S+)$ ) ', 3)
    _ArrayDisplay($res)

    Local $i = 0, $s = '<File xml:id="name-of-filename">' & @crlf & _
            '<Citation type="letter" xml:id="name-of-filename">' & @crlf

    ; names come in pairs: family name at $i, given name at $i+1
    While StringRegExp($res[$i], '^[A-Z]')
        $s &= '<Person>' & @crlf & '<familyName>' & $res[$i] & '</familyName>' & _
                '<givenName>' & $res[$i+1] & '</givenName>' & @crlf & '</Person>' & @crlf
        $i += 2
    Wend

    ; "et al."
    $s &= $res[$i] & @crlf
    $i += 1

    $s &= '<pubYear year="' & $res[$i] & '">' & $res[$i] & '</pubYear>' & @crlf
    $i += 1

    ; and so on

    $s &= '</Citation>' & @crlf
    Msgbox(0,"", $s)
KickStarter15 Posted May 2, 2019

@mikell,

Thanks, I see now what you mean. Sorry, I don't have a RegExp background yet and I'm still learning that part🤤. I tried your code and it already displays and creates the XML output that I need.

However, if there are more than three person names, for example Person 1, Person 2, Person 3, Person 4 and so on, the code stops with an error saying the array range was exceeded. Can you please advise where in the code I can adjust the range?😅
mikell Posted May 2, 2019

As long as the formatting is unchanged, the number of persons doesn't matter (see below). If trouble occurs, this means that something was different in the formatting, as orbs warned. You might post some string examples.

    ;this works with my previous snippet
    $str = "John J., Gracy D., Jame R., Starter K., Orb S., Mikell C., Delaney J., Melba S., SoOn And, et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10."

BTW there are other ways to parse the string, but if the format of the string is not constant then none of them will work flawlessly.
KickStarter15 Posted May 3, 2019 (edited)

@mikell,

Well, now it all makes sense to me. Thanks, it worked perfectly. However, some formats do not use "et al.," after the last person. Is there an "else if" for this?😅 Or should I use a different RegExp? Say I have three different patterns: would each pattern have its own RegExp()?

Please advise. I tried reading the explanation below from the RegExp site, but I only understand a little of it. Well, I need to learn this now.

Edited May 3, 2019 by KickStarter15
KickStarter15 Posted May 3, 2019

@mikell,

These are the other sample strings that I'm worried about. I already tried changing the regexp but could not get it right.😥

    ;with "et al.,"
    John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.

    ;without "et al.,"
    John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.

    ;without issue number "(2)"
    John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109. doi:1001.10110/aj21.j1j.10.

    ;without "doi:" and issue "(2)" numbers
    John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109.
mikell Posted May 3, 2019 (edited)

The regex can be slightly changed, but if some elements are likely to be missing then this can be managed using conditions when building the XML, as I mentioned in my first post.

    #Include <Array.au3>

    Local $astr[4] = ["John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.", _
            "Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109. doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109."]

    $n = 0
    For $str In $astr
        $res = StringRegExp($str, '(?x) (?| ' & _
                '([[:alpha:]]+)\h([A-Z].) ' & _ ; names
                ' | ([[a-z\h.]+),\h ' & _ ; et al
                ' | \((\d+)\)\h ' & _ ; year
                ' | ([A-Z][^.]+\.)\h ' & _ ; title, subtitle
                ' | (\d+[\(\);.-]) ' & _ ; vol, issue, pages
                ' | (\S+)$ ) ' , 3) ; the rest
        $n += 1
        _ArrayDisplay($res, $n)

        Local $i = 0, $s = '<File xml:id="name-of-filename">' & @crlf & _
                '<Citation type="letter" xml:id="name-of-filename">' & @crlf

        While StringRegExp($res[$i], '^[A-Z]')
            $s &= '<Person>' & '<familyName>' & $res[$i] & '</familyName>' & _
                    '<givenName>' & $res[$i+1] & '</givenName>' & '</Person>' & @crlf
            $i += 2
        Wend

        ; "et al." is optional
        If StringRegExp($res[$i], '^[a-z\h.]+$') Then
            $s &= $res[$i] & @crlf
            $i += 1
        EndIf

        $s &= '<pubYear year="' & $res[$i] & '">' & $res[$i] & '</pubYear>' & @crlf
        $i += 1

        $s &= '<Title>' & $res[$i] & '</Title>' & @crlf
        $i += 1

        $s &= '<SubTitle>' & $res[$i] & '</SubTitle>' & @crlf
        $i += 1

        ; the trailing "(" or ";" was kept in the element to identify the volume
        If StringRegExp($res[$i], '\d+[\(;]$') Then
            $s &= '<vol>' & StringTrimRight($res[$i], 1) & '</vol>' & @crlf
            $i += 1
        EndIf

        ; the trailing ")" identifies the issue, which is optional
        If StringRegExp($res[$i], '\d+\)$') Then
            $s &= '<issue>' & StringTrimRight($res[$i], 1) & '</issue>' & @crlf
            $i += 1
        EndIf

        ; and so on

        $s &= '</Citation>' & @crlf
        Msgbox(0,$n, $s)
    Next

Edited May 3, 2019 by mikell
KickStarter15 Posted May 4, 2019 (edited)

@mikell,

I got the error below after checking the string without the url. The code also gives the same error if the first page and last page are changed.

And another thing: it includes the hyphen in the first page and the period in the last page, which should not be included. I tried making some changes but did not succeed😅. Can you please advise?

    ' | (\d+[\(\);.-]) ' & _ ; vol, issue, pages

Edited May 4, 2019 by KickStarter15
orbs Posted May 4, 2019 (edited)

@KickStarter15, you must be much less fuzzy in describing your conditions. if you cannot properly define the data items components and delimiters, how do you expect the computer could?

speaking for myself, i'm no regex expert - not a regex novice even - so i know i cannot maintain such an elaborate code. i would walk the direct path of string manipulation, but first i would properly define the input string structure.

read the following code carefully - especially the comments - it is a bit long, but very simple to understand, troubleshoot and maintain.

    ; ref: https://www.autoitscript.com/forum/topic/198739-craeting-xml-file-based-on-data-text/

    Global $aSample[4]
    ;with "et al.,"
    $aSample[0] = 'John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.'
    ;without "et al.,"
    $aSample[1] = 'John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.'
    ;without issue number "(2)"
    $aSample[2] = 'John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109. doi:1001.10110/aj21.j1j.10.'
    ;without "doi:" and issue "(2)" numbers
    $aSample[3] = 'John J., Gracy D., Jame R., John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109.'

    For $i = 0 To UBound($aSample) - 1
        _StringToXML($aSample[$i])
    Next

    Func _StringToXML($sString)
        ConsoleWrite(@CRLF)
        ConsoleWrite('-source string:' & @CRLF)
        ConsoleWrite($sString & @CRLF)

        ; declare variables for data items
        Local $s_authors, $s_year, $s_title, $s_subtitle, $s_vol, $s_issue, $s_firstpage, $s_lastpage, $s_doi
        ; declare temporary variables
        Local $iPos, $sSubstring, $aSubString, $aSubStringPartial

        ; if 'doi:' exists then it must be after the last space
        $iPos = StringInStr($sString, ' ', Default, -1)
        If $iPos = 0 Then Return SetError(1, 0, '') ; no whitespace -> something went horribly wrong!
        ; get the part of the string after the last space
        $sSubstring = StringRight($sString, StringLen($sString) - $iPos)
        ; check if it is doi. if not, then $s_doi simply remains empty
        If StringLeft($sSubstring, 4) = 'doi:' Then
            ; store the value for later use
            $s_doi = StringTrimLeft($sSubstring, 4)
            ; trim the entire substring from the string, also trim the last whitespace
            $sString = StringTrimRight($sString, StringLen($sSubstring) + 1)
        EndIf
        ; now the input string does not contain the doi part, whether it existed or not

        ; the vol/issue/pages part must be after the last space
        $iPos = StringInStr($sString, ' ', Default, -1)
        If $iPos = 0 Then Return SetError(1, 0, '') ; no whitespace -> something went horribly wrong!
        ; get the part of the string after the last space
        $sSubstring = StringRight($sString, StringLen($sString) - $iPos)
        ; remove dot from the end
        If StringRight($sSubstring, 1) = '.' Then
            $sSubstring = StringTrimRight($sSubstring, 1)
        Else
            Return SetError(1, 0, '') ; no dot -> something went horribly wrong!
        EndIf
        ; split the substring to two parts by semicolon
        $aSubString = StringSplit($sSubstring, ';')
        If $aSubString[0] <> 2 Then Return SetError(1, 0, '') ; not two parts -> something went horribly wrong!

        ; handle the pages part
        $aSubStringPartial = StringSplit($aSubString[2], '-')
        If $aSubStringPartial[0] <> 2 Then Return SetError(1, 0, '') ; not two page numbers -> something went horribly wrong!
        ; check if page parts are numbers
        If StringIsDigit($aSubStringPartial[1]) And StringIsDigit($aSubStringPartial[2]) Then
            ; store the value for later use
            $s_firstpage = $aSubStringPartial[1]
            $s_lastpage = $aSubStringPartial[2]
            ; trim the entire substring from the string, also trim the last whitespace and dot
            $sString = StringTrimRight($sString, StringLen($sSubstring) + 2)
        Else
            Return SetError(1, 0, '') ; not numbers -> something went horribly wrong!
        EndIf

        ; handle the vol/issue part
        $aSubStringPartial = StringSplit($aSubString[1], '(')
        Switch $aSubStringPartial[0]
            Case 2 ; two parts - vol and issue exist
                If StringRight($aSubStringPartial[2], 1) = ')' Then
                    $s_issue = StringTrimRight($aSubStringPartial[2], 1)
                    If Not StringIsDigit($s_issue) Then Return SetError(1, 0, '') ; issue is not a number -> something went horribly wrong!
                    $s_vol = $aSubStringPartial[1]
                    If Not StringIsDigit($s_vol) Then Return SetError(1, 0, '') ; vol is not a number -> something went horribly wrong!
                Else
                    Return SetError(1, 0, '') ; issue number does not end with ')' -> something went horribly wrong!
                EndIf
            Case 1 ; one part - only vol exist
                $s_vol = $aSubStringPartial[1]
                If Not StringIsDigit($s_vol) Then Return SetError(1, 0, '') ; vol is not a number -> something went horribly wrong!
            Case Else
                Return SetError(1, 0, '') ; neither one nor two parts -> something went horribly wrong!
        EndSwitch
        ; now the input string does not contain the doi part and the vol/issue/pages part

        ConsoleWrite('-elements: ' & @CRLF)
        ConsoleWrite('vol: ' & $s_vol & @CRLF)
        ConsoleWrite('issue: ' & $s_issue & @CRLF)
        ConsoleWrite('firstpage: ' & $s_firstpage & @CRLF)
        ConsoleWrite('lastpage: ' & $s_lastpage & @CRLF)
        ConsoleWrite('doi: ' & $s_doi & @CRLF)
        ConsoleWrite('-remaining string: ' & @CRLF)
        ConsoleWrite($sString & @CRLF)
    EndFunc   ;==>_StringToXML

oh, and when you post "code" that is not code - please, please, please select "Plain" from the dropdown list at the bottom-right corner.

Edited May 4, 2019 by orbs
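The function above stops at the "remaining string", which still holds the authors, the year and the titles. One possible next step, sketched under the assumption that the year always appears as four digits in parentheses: the variable names `$sRest`, `$sAuthors` and `$sTitles` are illustrative and not part of orbs' code.

```autoit
; Assumed input: what is left once the doi and vol/issue/pages parts were trimmed off.
Local $sRest = 'John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here.'

; Flag 1 returns an array holding the captured group of the first match.
Local $aYear = StringRegExp($sRest, '\((\d{4})\)', 1)
If @error Then
    ConsoleWrite('no "(year)" found -> the input does not have the expected structure' & @CRLF)
Else
    ; position of the opening parenthesis of the "(year)" part
    Local $iPos = StringInStr($sRest, '(' & $aYear[0] & ')')
    ; everything before the year is the author list
    Local $sAuthors = StringStripWS(StringLeft($sRest, $iPos - 1), 3)
    ; everything after the closing parenthesis holds the title and the optional subtitle
    Local $sTitles = StringStripWS(StringMid($sRest, $iPos + StringLen($aYear[0]) + 2), 3)

    ConsoleWrite('year: ' & $aYear[0] & @CRLF)
    ConsoleWrite('authors: ' & $sAuthors & @CRLF)
    ConsoleWrite('titles: ' & $sTitles & @CRLF)
EndIf
```

Using the year as the anchor keeps the author list and the titles apart without having to know how many authors there are or whether "et al.," is present.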
mikell Posted May 4, 2019

To prevent the array error, the index must be checked in the conditions.

The hyphen (and other chars) is intentionally left in the array elements to allow an easy check later when building the XML. These checks could be done using a bunch of String* funcs, nested or not, but the regex way remains the easiest way to check, for instance, whether a string "contains only digit(s) and a trailing hyphen".

As I said before, there are many ways to parse the source string. I used one big regular expression, but it could be done using several smaller ones as well, or "classic" String* funcs. But whatever the method, the purpose is the same: you have to get a list of "checkable" elements to build a consistent XML.

Here is my last try:

    #Include <Array.au3>

    Local $astr[7] = ["John J., Gracy D., Jame R., et al., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., (2019) This is a sample sentence here.", _
            "John J., Gracy D., (2019). doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., (2019) This is a sample sentence here. 5(2);101-109. doi:1001.10110/aj21.j1j.10.", _
            "Jame R., John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;246-251. doi:1001.10110/aj21.j1j.10.", _
            "John J., Gracy D., Jame R., (2019) This is a sample sentence here. Then another here. 5;101-109."]

    $n = 0
    For $str In $astr
        $res = StringRegExp($str, '(?x) (?| ' & _
                '([[:alpha:]]+)\h([A-Z]\.) ' & _ ; names
                ' | ([[a-z\h.]+),\h? ' & _ ; et al
                ' | \((\d+)\)\.?\h? ' & _ ; year
                ' | ([A-Z][^.]+\.)\h? ' & _ ; title, subtitle
                ' | (\d+[\(\);.-]) ' & _ ; vol, issue, pages
                ' | (\w+:\S+)$ ) ' , 3) ; url
        $n += 1
        _ArrayDisplay($res, $n)

        Local $i = 0, $s = '<File xml:id="name-of-filename">' & @crlf & _
                '<Citation type="letter" xml:id="name-of-filename">' & @crlf

        While StringRegExp($res[$i], '^[A-Z]')
            $s &= '<Person>' & '<familyName>' & $res[$i] & '</familyName>' & _
                    '<givenName>' & $res[$i+1] & '</givenName>' & '</Person>' & @crlf
            $i += 2
        Wend

        If StringRegExp($res[$i], '^[a-z\h.]+$') Then
            $s &= $res[$i] & @crlf
            $i += 1
        EndIf

        ; from here on every element is optional, so check the index before reading it
        If $i < UBound($res) AND StringIsDigit($res[$i]) Then
            $s &= '<pubYear year="' & $res[$i] & '">' & $res[$i] & '</pubYear>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringIsUpper(StringLeft($res[$i], 1)) Then
            $s &= '<Title>' & $res[$i] & '</Title>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringIsUpper(StringLeft($res[$i], 1)) Then
            $s &= '<SubTitle>' & $res[$i] & '</SubTitle>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringRegExp($res[$i], '^\d+[\(;]$') Then
            $s &= '<vol>' & StringTrimRight($res[$i], 1) & '</vol>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringRegExp($res[$i], '^\d+\)$') Then
            $s &= '<issue>' & StringTrimRight($res[$i], 1) & '</issue>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringRegExp($res[$i], '^\d+-$') Then
            $s &= '<FirstPage>' & StringTrimRight($res[$i], 1) & '</FirstPage>' & @crlf
            $i += 1
        EndIf

        If $i < UBound($res) AND StringRegExp($res[$i], '^\d+\.$') Then
            $s &= '<SecondPage>' & StringTrimRight($res[$i], 1) & '</SecondPage>' & @crlf
            $i += 1
        EndIf

        ; drop the "doi:" prefix and the trailing dot for the link, keep the prefix in the text
        If $i < UBound($res) AND StringLeft($res[$i], 4) = "doi:" Then
            $s &= '<Url href="https://doi.org/' & StringTrimRight(StringTrimLeft($res[$i], 4), 1) & '">' & StringTrimRight($res[$i], 1) & '</Url>' & @crlf
        EndIf

        $s &= '</Citation>' & @crlf & '</File>' & @crlf
        Msgbox(0, $n, $s)
    Next
KickStarter15 Posted May 6, 2019

@mikell,

Thank you so much. This is perfect, and all the conditions are handled correctly. I only have one more question.

Question: I changed the expression below to cater for given names with two initials, by adding "+" after the character range A-Z. When the initials use a hyphen, e.g. "S-J.", "S -J.", ..., how can I add that to the expression below? I tried checking my guide found here and made some attempts, but still could not get the correct expression I need. I tried using "[^.]+" but it affects the other formats.

    $res = StringRegExp($str, '(?x) (?| ' & _
            '([[:alpha:]]+)\h([A-Z][^.]+\.) ' & _ ; names

Honestly, I really appreciate the time and attention you have given to providing the solution I need, and I know how hard it is to have someone depending on you, but please don't leave me now.😥 There is a lot of confusion in me right now, and only you have put me on the right path. I hope this is not your last help, Mikell. Thank you so much!☺️

It's really hard to learn this Regular Expression thing, but I'll do my best to learn this type of coding for future needs.😅
FrancescoDiMuro Posted May 6, 2019 (edited)

11 minutes ago, KickStarter15 said:

    ([A-Z][^.]+\.)

This part of the pattern already captures a capital letter, immediately followed by anything that is not a dot (from 1 to N characters, greedy), immediately followed by a dot.

Maybe post a runnable example where this pattern is not doing what you are describing.

Edited May 6, 2019 by FrancescoDiMuro
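A runnable sketch for the hyphenated-initials case, offered as one possible variation rather than as mikell's pattern. Note that `[A-Z][^.]+\.` requires at least one character between the capital and the dot, so a single initial such as "J." no longer matches it, which may be the side effect described above. The alternative below makes the hyphenated part optional instead; the sample names are made up for illustration.

```autoit
; Made-up sample: two hyphenated initials (with and without a space) and one plain initial.
Local $sNames = 'Kim S-J., Lee S -J., John J., '

; Family name, one horizontal space, then a capital letter,
; optionally followed by "-X" (with optional spaces around the hyphen), then a dot.
Local $sPattern = '([[:alpha:]]+)\h([A-Z](?:\h?-\h?[A-Z])?\.)'

; Flag 3 returns all captured groups of all matches in one flat array.
Local $aParts = StringRegExp($sNames, $sPattern, 3)
If @error Then
    ConsoleWrite('no names matched' & @CRLF)
Else
    ; The array alternates: family name, given-name initials, family name, ...
    For $i = 0 To UBound($aParts) - 2 Step 2
        ConsoleWrite($aParts[$i] & ' | ' & $aParts[$i + 1] & @CRLF)
    Next
EndIf
```

If this works on the real data, the second capture group could replace `([A-Z]\.)` in the names branch of the larger expression; it is worth testing against all sample strings first, as advised earlier in the thread.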