Jump to content

Transcode single column date to multiple columns


Go to solution Solved by Malkey,

Recommended Posts

I have an interesting project, where I need to convert a text file. Essentially I need to take 1 column and turn it into multiple columns. The purpose is to convert a single col. text file database into a "real" database using 1 row per record.

Example:

1
2
3
4
5
6

becomes

(using 3 *n format)
1,2,3
4,5,6
or
(using 4*n format)
1,2,3,4
5,6,,

the no. of columns is variable. Data is always left-right, top-bottom. Missing data is empty as per last example.

Excel can do this via formulas, but if of course limited to 65535 rows (XLS format)

I have the basic idea of how to do, but I stumble into a problem when the no. of rows is not equal to cols*rows (second example 4*2 = 8 which is > 6.)

My idea something like

column =1
for each line
   while column =< max_cols
      print line
      print separator
      column = column +1
   wend
   column= 1
   print linefeed
next line

I have also considered putting all lines into a large array (1 * n), but not sure about if I meed some boundaries here ? And what improvements this would give me.

I would prefer using AutoIT, since I need to add a small GUI on top. That part is simple.

Thankful for any input. Complete example will be posted here once done.

I am just a hobby programmer, and nothing great to publish right now.

Link to comment
Share on other sites

  • Solution

Here are two ways.

Local $sStr = "12" & @CRLF & "34" & @CRLF & "56" & @CRLF & "7" & @CRLF & "89" & @CRLF & "10" & _
        @CRLF & "11" & @CRLF & "13" & @CRLF & "14" & @CRLF & "15" & @CRLF & "16" & @CRLF & "99" & @CRLF
Local $iCols = 5

ConsoleWrite(_StringDisplayRE($sStr, $iCols) & @LF)
ConsoleWrite(_StringColumnDisplay($sStr, $iCols) & @LF)

MsgBox(0, "RE Display in " & $iCols & " columns.", $sStr & @LF & _StringDisplayRE($sStr, $iCols))
MsgBox(0, "Text Display in " & $iCols & " columns.", $sStr & @LF & _StringColumnDisplay($sStr, $iCols))


Func _StringDisplayRE($sStr, $iNumOfCol = 1)
    If $iNumOfCol < 1 Then $iNumOfCol = 1
    Local $a = StringRegExpReplace($sStr, "(.*[^v])(?:v+)", "${1},")
    Local $iNum = @extended
    Local $extra = Mod($iNumOfCol - Mod($iNum, $iNumOfCol), $iNumOfCol)
    If $extra <> 0 Then $a &= StringRegExpReplace(StringFormat("%" & $extra & "s", " "), ".", ",")
    Return StringTrimRight(StringRegExpReplace($a, "((?:[^,]*,){" & ($iNumOfCol - 1) & "}(?:[^,]*))+?,", "$1$2" & @LF), 1)
EndFunc   ;==>_StringDisplayRE

Func _StringColumnDisplay($sStr, $iNumOfCol = 1)
    If $iNumOfCol < 1 Then $iNumOfCol = 1
    Local $sRet
    Local $aLines = StringRegExp($sStr, "(?:([^v]+)(?:v+|$))", 3)
    Local $iNumLines = UBound($aLines)
    Local $extra = Mod($iNumOfCol - Mod($iNumLines, $iNumOfCol), $iNumOfCol)
    For $i = 1 To $iNumLines + $extra
        If $i < $iNumLines + 1 Then
            If Mod($i, $iNumOfCol) Then
                $sRet &= $aLines[$i - 1] & ","
            Else
                $sRet &= $aLines[$i - 1]
                If $i < $iNumLines + $extra Then $sRet &= @LF
            EndIf
        Else
            If $i < $iNumLines + $extra Then $sRet &= ","
        EndIf
    Next
    Return $sRet
EndFunc   ;==>_StringColumnDisplay
Link to comment
Share on other sites

Something like this?

Global $aArrayIn[10] = [1,2,3,4,5,6,7,8,9,10]
Global $iColumns = 4
Global $sOutRec = ""
For $i = 0 To UBound($aArrayIn) -1
    If Mod($i+1, $iColumns) = 0 Then
        $sOutRec &= $aArrayIn[$i] & ","
        ConsoleWrite(StringLeft($sOutrec, StringLen($sOutRec)-1) & @CRLF)
        $sOutRec = ""
    Else
        $sOutRec &= $aArrayIn[$i] & ","
    Endif
Next
If $sOutRec <> "" Then ConsoleWrite(StringLeft($sOutrec, StringLen($sOutRec)-1) & @CRLF)

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

If your aim is a real DB, then you may have to reconsider grouping entries. It all depends on the semantics of your data and the nature or structure of the queries you're going to run against it.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

  • 4 weeks later...

Malkey, thanks for these methods. Man you saved me countless hours of programming this algorithm. Thanks!

For some reason, your second method fails when colno. = 3, returning only 1 char of last line.

I modified the first method a bit to have userselectable separator.

Tested so far on 3 mio. lines, converted to 120.000 x 25, in 23 secs. (slow-mo computer). Not sure if file-based attempt compared to slurping files into memory would improve performance.

Real aim is for larger datafiles, but for now this is OK.

Func _StringDisplayRE2($sStr, $iNumOfCol = 1, $iSep = ";")
If $iNumOfCol < 1 Then $iNumOfCol = 1
Local $a = StringRegExpReplace($sStr, "(.*[^v])(?:v+)", _
   "${1}" & $iSep)
Local $iNum = @extended
Local $extra = Mod($iNumOfCol - Mod($iNum, $iNumOfCol), $iNumOfCol)
If $extra <> 0 Then $a &= StringRegExpReplace( _
   StringFormat("%" & $extra & "s", _
   " "), _
   ".", _
   $iSep)
Return StringTrimRight( _
   StringRegExpReplace($a, _
   "((?:[^" & $iSep & "]*" & $iSep & "){" & ($iNumOfCol - 1) & "}(?:[^" & $iSep & "]*))+?" & $iSep, _
   "$1$2" & @LF), _
   1)
EndFunc   ;==>_StringDisplayRE2

I am just a hobby programmer, and nothing great to publish right now.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...