Home   Archive   Permalink



Working with CSV files

Hello,
    
This application is probably trivial for most programmers, but I need your help...
    
I want to change the format in very big (1-2 GB) CSV files:
    
Input File Format:
    
dd/MM/yy hh:mm:ss,[Bid Price],[Ask Price]
    
Example:
    
29/11/06 00:04:31,1383.00,1387.00
29/11/06 00:04:33,1383.05,1387.06
    
    
Desired Ouput File Format:
    
yyyyMMdd hhmmss;[Price];[Volume]
    
Example:
    
20061129 000431;1383.00;25
20061129 000433;1383.05;36
    
User should select:
    
Century (XX/XXI) used to fill the two first digits in date field (19yy/20yy)
Price (bid or ask) used to fill the price field
Number (constant) to fill the Volume field
    
    
Please, can you give me any hint?.
    
Cheers
    
Francisco

posted by:   Francisco       12-Sep-2010/5:39:06-7:00



Not sure if you intended to do it like this with a GUI, but this should get you started with parsing the data and selecting values based on input data:
    
R E B O L []
view layout [
     f1: field 300 {29/11/06 00:04:31,1383.00,1387.00}
     r1: rotary data [{XX} {XXI}]
     r2: rotary data [{bid} {ask}]
     f2: field 50 {25}
     btn "Convert" [
         p: parse f1/text {/ ,} ; probe p
         f3/text: rejoin [
             (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
             (replace/all p/4 {:} {}) {;}
             do (select [{bid} p/5 {ask} p/6] r2/text) {;}
             f2/text
         ]
         show f3
     ]
     f3: field 300
]

posted by:   Nick       12-Sep-2010/14:42:53-7:00



Thanks a lot for your help:
    
R E B O L [
    Title: "Oanda Tick Data to Ninjatrader converter"
    Version: 0.0.1
]
    
view layout [
     f1: field 300 {29/11/06 00:04:31,1383.00,1387.00}
     Across H1 100 "Century: " r1: rotary data [{XX} {XXI}] Below
     Across H1 100 "Price: " r2: rotary data [{bid} {ask}] Below
     Across H1 120 "Volume: " f2: field 50 Right {1} Below
btn "Convert"
[ p: parse f1/text {/ ,} ; probe p
         f3/text: rejoin [
             (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
             (replace/all p/4 {:} {}) {;}
             do (select [{bid} p/5 {ask} p/6] r2/text) {;}
             f2/text
         ]
         show f3
     ]
     f3: field 300
]
    
I've modified a bit the GUI just to add some labels.
    
Parsing really requires more effort from my part.
    
I'd like to add a file selection widget for input and another file selection one for output.
    
And then the code to read the input file lines and writing the modified data to the output file lines until EOF.
    
Any idea will be welcomed
    
Francisco

posted by:   Francisco       12-Sep-2010/18:42:21-7:00



This is the simpler "mode" of parse, where the rule is just a list of characters used to split the string - in this case, the string is split at the characters required by your scenario (forward slash, space, and comma). Parse is very easy to use for this type of string splitting:
    
     parse "29/11/06 00:04:31,1383.00,1387.00" "/ ,"
    
Here's an example that reads/writes files:
    
     R E B O L []
        
     ; example data file:
        
     write %input.txt {29/11/06 00:04:31,1383.00,1387.00
     29/11/06 00:04:33,1383.05,1387.06}
        
     input-file: to-file request-file/file/filter %input.txt ["*.txt" "*.dat"]
     output-string: copy ""
        
     foreach line (read/lines input-file) [
         r1: "XXI" ; use a GUI to get these values from a user, read from a file, etc.
         r2: "bid"
         f2: "25"
         p: copy parse line {/ ,}
         output: copy rejoin [
             (select [{XX} {19} {XXI} {20}] r1) p/3 p/2 p/1 { }
             (replace/all p/4 {:} {}) {;}
             do (select [{bid} p/5 {ask} p/6] r2) {;}
             f2 newline
         ]
         append output-string output
     ]
        
     output-file: to-file request-file/save/file/filter %output.txt ["*.txt" "*.dat"]
     write output-file output-string
     editor output-file
    
Use write/append if you don't want to erase the former file content.
    
Be sure to add some code to confirm file overwrites and other potential issues.
    
    


posted by:   Nick       12-Sep-2010/22:54:38-7:00



Thanks again...
    
I'm still working with the GUI.
    
This is what I have at the moment:
    
    
R E B O L [
    Title: "Oanda Tick Data to Ninjatrader converter"
    Version: 0.0.2
]
    
    
    
view layout [
    
     Across H1 150 "Input File: "x: field 300 button " ... " [x: request-file] Below
     Across H1 150 "Output File: "y: field 300 button " ... " [y: request-file] Below
     f1: field 300 {29/11/06 00:04:31,1383.00,1387.00} ; this line will be changed by the data read from input file
     Across H1 100 "Century: " r1: rotary data [{XX} {XXI}] Below
     Across H1 100 "Price: " r2: rotary data [{bid} {ask}] Below
     Across H1 120 "Volume: " f2: field 50 Right {1} Below
btn "Convert"
[ p: parse f1/text {/ ,} ; probe p
         f3/text: rejoin [
             (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
             (replace/all p/4 {:} {}) {;}
             do (select [{bid} p/5 {ask} p/6] r2/text) {;}
             f2/text
         ]
         show f3
     ]
     f3: field 300 ; this line will be changed by a progress bar
]
    
I'd like that the user can fill the fields with the file paths or using the request-file button action.
    
Is it possible to fill those fields with the result of the request-file selections to give the users a visual feedback?

posted by:   Francisco       13-Sep-2010/0:55:17-7:00



Hi,
Try this for input file:
     b: [input-file: field 300 button "..." [if f: request-file/only [input-file/text: copy form f show input-file]]]
     view layout b
    
by the way, if your file is that big, 1-2 gb you should read it line by line with direct (unbuffered) mode.
otherwise it reads the whole file into memory and it will take long time.
    
also look: http://stackoverflow.com/questions/3033936/reading-large-binary-files-fails-in-rebol


posted by:   Endo       13-Sep-2010/4:17:16-7:00



and also:
http://stackoverflow.com/questions/2768563/i-need-to-generate-50-millions-rows-csv-file-with-random-data-how-to-optimize-th/2771205#2771205


posted by:   Endo       13-Sep-2010/4:19:29-7:00



Francisco,
    
I'm a little puzzled about exactly how you intend users to go through the file. A 1 gig file will be millions of lines - you can use the foreach loop example above to work through each line, but are users supposed to step through each line individually and manually select the options for each line, or should the options chosen in the GUI be applied to all the lines in the file, or should users be able to search for lines with specific content, etc.?

posted by:   Nick       13-Sep-2010/12:51:34-7:00



Here are some more example code that may be useful. This will run MUCH faster if you don't display the progress:
    
     R E B O L []
        
     random/seed now/time
     test-data: copy ""
     loop 5000 [
         append test-data rejoin [
             "1" random 9 "/0" random 9 "/0" random 9
             " 0" random 9 ":5" random 9 ":0" random 9 ","
             (q1: (2000 + random 999)) ".7" random 9 ","
             (q1 + 5) ".9" random 9 newline
         ]
     ]
     write %input.txt test-data
        
     svv/vid-face/color: white
     view layout [
         across
         tabs 140
         style lil h3 95 right
         style big h1 blue 100
         lil "Input File: " f1: field 250
         button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return
         lil "Output File: " f2: field 250
         button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab
         big "Century: " r1: rotary data [{XXI} {XX}] return tab
         big "Price: " r2: rotary data [{bid} {ask}] return tab
         big "Volume: " f5: field 100 Right {1} return
         lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return
         lil "Output Data: " f4: field 300 return tab
         t1: text 175 "" button "Convert" [
             count: 0
             input-data: read/lines x
             total: length? input-data
             output-string: copy ""
             foreach line input-data [
                 f3/text: line show f3
                 p: parse line {/ ,}
                 f4/text: copy rejoin [
                     (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
                     (replace/all p/4 {:} {}) {;}
                     do (select [{bid} p/5 {ask} p/6] r2/text) {;}
                     f5/text newline
                 ]
                 show f4
                 count: count + 1
                 t1/text: rejoin [count " of " total " lines"] show t1
                 append output-string f4/text
             ]
             write y output-string editor y
         ]
     ]

posted by:   Nick       13-Sep-2010/14:34:28-7:00



To demonstrate the difference in speed is about 10x on my machine. Try uncommenting the lines in the convert button block to see the difference:
    
     R E B O L []
        
     random/seed now/time
     test-data: copy ""
     loop 50000 [
         append test-data rejoin [
             "1" random 9 "/0" random 9 "/0" random 9
             " 0" random 9 ":5" random 9 ":0" random 9 ","
             (q1: (2000 + random 999)) ".7" random 9 ","
             (q1 + 5) ".9" random 9 newline
         ]
     ]
     write %input.txt test-data
        
     svv/vid-face/color: white
     view layout [
         across
         tabs 140
         style lil h3 95 right
         style big h1 blue 100
         lil "Input File: " f1: field 250
         button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return
         lil "Output File: " f2: field 250
         button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab
         big "Century: " r1: rotary data [{XXI} {XX}] return tab
         big "Price: " r2: rotary data [{bid} {ask}] return tab
         big "Volume: " f5: field 100 Right {1} return
         lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return
         lil "Output Data: " f4: field 300 return tab
         t1: text 175 "" button "Convert" [
             start-time: now/time
             count: 0
             input-data: read/lines x
             total: length? input-data
             output-string: copy ""
             foreach line input-data [
                 ; f3/text: line show f3
                 p: parse line {/ ,}
                 f4/text: copy rejoin [
                     (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
                     (replace/all p/4 {:} {}) {;}
                     do (select [{bid} p/5 {ask} p/6] r2/text) {;}
                     f5/text newline
                 ]
                 ; show f4
                 ; count: count + 1
                 ; t1/text: rejoin [count " of " total " lines"] show t1
                 append output-string f4/text
             ]
             write y output-string editor y
             alert form (now/time - start-time)
         ]
     ]

posted by:   Nick       13-Sep-2010/15:29:53-7:00



Hi,
    
Thanks for your help...
    
Users will select the options to use and then click the Convert button just once. The whole input file will be converted line by line creating the output file.

posted by:   Francisco       13-Sep-2010/16:44:13-7:00



As Endo said read and write should be done line by line to avoid memory issues.
    


posted by:   Francisco       13-Sep-2010/17:45:40-7:00



@Nick: Update progress bar every 10th times or every length-of-file / 100 etc.
Users may want to see progress especially for big files. But 10x speed diff. is too much.

posted by:   Endo       14-Sep-2010/5:22:58-7:00



Performance is very good for this - I updated every 1000 lines:
    
     R E B O L []
        
     random/seed now/time
     test-data: copy ""
     flash "Generating random data file - please wait..."
     loop 500000 [
         append test-data rejoin [
             "1" random 9 "/0" random 9 "/0" random 9
             " 0" random 9 ":5" random 9 ":0" random 9 ","
             (q1: (2000 + random 999)) ".7" random 9 ","
             (q1 + 5) ".9" random 9 newline
         ]
     ]
     write %input.txt test-data
     unview
        
     svv/vid-face/color: white
     view center-face layout [
         across
         tabs 140
         style lil h3 95 right
         style big h1 blue 100
         lil "Input File: " f1: field 250
         button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return
         lil "Output File: " f2: field 250
         button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab
         big "Century: " r1: rotary data [{XXI} {XX}] return tab
         big "Price: " r2: rotary data [{bid} {ask}] return tab
         big "Volume: " f5: field 100 Right {1} return
         lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return
         lil "Output Data: " f4: field 300 return
         t1: text right 160 "0" t2: text 135 " of 0 lines"
         button "Convert" [
             start-time: now/time
             count: pcount: 0
             input-data: read/lines x
             total: length? input-data
             t2/text: rejoin [count " of " total " lines"] show t2
             output-string: copy ""
             foreach line input-data [
                 ; f3/text: line show f3
                 p: parse line {/ ,}
                 the-text: copy rejoin [
                     (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
                     (replace/all p/4 {:} {}) {;}
                     do (select [{bid} p/5 {ask} p/6] r2/text) {;}
                     f5/text newline
                 ]
                 ; f4/text: copy the-text show f4
                 count: count + 1
                 if count - pcount > 999 [pcount: count t1/text: form count show t1]
                 append output-string the-text
             ]
             write y output-string editor y
             alert form (now/time - start-time)
         ]
     ]

posted by:   Nick       14-Sep-2010/10:29:57-7:00



I added direct read/write and took out some unnecessary lines:
    
     R E B O L []
     svv/vid-face/color: white
     view center-face layout [
         across
         tabs 140
         style lil h3 95 right
         style big h1 blue 100
         lil "Input File: " f1: field 250
         button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return
         lil "Output File: " f2: field 250
         button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab
         big "Century: " r1: rotary data [{XXI} {XX}] return tab
         big "Price: " r2: rotary data [{bid} {ask}] return tab
         big "Volume: " f5: field 100 Right {1} return tab
         button 210 "Convert" [
             count: pcount: 0
             output-string: copy ""
             file: open/direct/lines x
             if exists? y [delete y]
             out-file: open/direct y
             while [attempt [input-data: copy/part file 1000]] [
                 the-text: copy ""
                 foreach line input-data [
                     p: parse line {/ ,}
                     the-text: append the-text rejoin [
                         (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { }
                         (replace/all p/4 {:} {}) {;}
                         do (select [{bid} p/5 {ask} p/6] r2/text) {;}
                         f5/text newline
                     ]
                     count: count + 1
                     if count - pcount > 999 [pcount: count t1/text: form count show t1]
                 ]
                 append out-file the-text
             ]
             close file close out-file
             editor y
         ] return tab
         text "lines read:" t1: text 100 "0"
     ]
    
Check the output data carefully before using this, and use it at your own risk - I haven't tested it at all.

posted by:   Nick       14-Sep-2010/12:12:50-7:00



Thanks!

posted by:   Francisco       26-Sep-2010/2:10:11-7:00