Working with CSV files
Hello, This application is probably trivial for most programmers, but I need your help... I want to change the format in very big (1-2 GB) CSV files: Input File Format: dd/MM/yy hh:mm:ss,[Bid Price],[Ask Price] Example: 29/11/06 00:04:31,1383.00,1387.00 29/11/06 00:04:33,1383.05,1387.06 Desired Ouput File Format: yyyyMMdd hhmmss;[Price];[Volume] Example: 20061129 000431;1383.00;25 20061129 000433;1383.05;36 User should select: Century (XX/XXI) used to fill the two first digits in date field (19yy/20yy) Price (bid or ask) used to fill the price field Number (constant) to fill the Volume field Please, can you give me any hint?. Cheers Francisco
posted by: Francisco 12-Sep-2010/5:39:06-7:00
Not sure if you intended to do it like this with a GUI, but this should get you started with parsing the data and selecting values based on input data: R E B O L [] view layout [ f1: field 300 {29/11/06 00:04:31,1383.00,1387.00} r1: rotary data [{XX} {XXI}] r2: rotary data [{bid} {ask}] f2: field 50 {25} btn "Convert" [ p: parse f1/text {/ ,} ; probe p f3/text: rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f2/text ] show f3 ] f3: field 300 ]
posted by: Nick 12-Sep-2010/14:42:53-7:00
Thanks a lot for your help: R E B O L [ Title: "Oanda Tick Data to Ninjatrader converter" Version: 0.0.1 ] view layout [ f1: field 300 {29/11/06 00:04:31,1383.00,1387.00} Across H1 100 "Century: " r1: rotary data [{XX} {XXI}] Below Across H1 100 "Price: " r2: rotary data [{bid} {ask}] Below Across H1 120 "Volume: " f2: field 50 Right {1} Below btn "Convert" [ p: parse f1/text {/ ,} ; probe p f3/text: rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f2/text ] show f3 ] f3: field 300 ] I've modified a bit the GUI just to add some labels. Parsing really requires more effort from my part. I'd like to add a file selection widget for input and another file selection one for output. And then the code to read the input file lines and writing the modified data to the output file lines until EOF. Any idea will be welcomed Francisco
posted by: Francisco 12-Sep-2010/18:42:21-7:00
This is the simpler "mode" of parse, where the rule is just a list of characters used to split the string - in this case, the string is split at the characters required by your scenario (forward slash, space, and comma). Parse is very easy to use for this type of string splitting: parse "29/11/06 00:04:31,1383.00,1387.00" "/ ," Here's an example that reads/writes files: R E B O L [] ; example data file: write %input.txt {29/11/06 00:04:31,1383.00,1387.00 29/11/06 00:04:33,1383.05,1387.06} input-file: to-file request-file/file/filter %input.txt ["*.txt" "*.dat"] output-string: copy "" foreach line (read/lines input-file) [ r1: "XXI" ; use a GUI to get these values from a user, read from a file, etc. r2: "bid" f2: "25" p: copy parse line {/ ,} output: copy rejoin [ (select [{XX} {19} {XXI} {20}] r1) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2) {;} f2 newline ] append output-string output ] output-file: to-file request-file/save/file/filter %output.txt ["*.txt" "*.dat"] write output-file output-string editor output-file Use write/append if you don't want to erase the former file content. Be sure to add some code to confirm file overwrites and other potential issues.
posted by: Nick 12-Sep-2010/22:54:38-7:00
Thanks again... I'm still working with the GUI. This is what I have at the moment: R E B O L [ Title: "Oanda Tick Data to Ninjatrader converter" Version: 0.0.2 ] view layout [ Across H1 150 "Input File: "x: field 300 button " ... " [x: request-file] Below Across H1 150 "Output File: "y: field 300 button " ... " [y: request-file] Below f1: field 300 {29/11/06 00:04:31,1383.00,1387.00} ; this line will be changed by the data read from input file Across H1 100 "Century: " r1: rotary data [{XX} {XXI}] Below Across H1 100 "Price: " r2: rotary data [{bid} {ask}] Below Across H1 120 "Volume: " f2: field 50 Right {1} Below btn "Convert" [ p: parse f1/text {/ ,} ; probe p f3/text: rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f2/text ] show f3 ] f3: field 300 ; this line will be changed by a progress bar ] I'd like that the user can fill the fields with the file paths or using the request-file button action. Is it possible to fill those fields with the result of the request-file selections to give the users a visual feedback?
posted by: Francisco 13-Sep-2010/0:55:17-7:00
Hi, Try this for input file: b: [input-file: field 300 button "..." [if f: request-file/only [input-file/text: copy form f show input-file]]] view layout b by the way, if your file is that big, 1-2 gb you should read it line by line with direct (unbuffered) mode. otherwise it reads the whole file into memory and it will take long time. also look: http://stackoverflow.com/questions/3033936/reading-large-binary-files-fails-in-rebol
posted by: Endo 13-Sep-2010/4:17:16-7:00
and also: http://stackoverflow.com/questions/2768563/i-need-to-generate-50-millions-rows-csv-file-with-random-data-how-to-optimize-th/2771205#2771205
posted by: Endo 13-Sep-2010/4:19:29-7:00
Francisco, I'm a little puzzled about exactly how you intend users to go through the file. A 1 gig file will be millions of lines - you can use the foreach loop example above to work through each line, but are users supposed to step through each line individually and manually select the options for each line, or should the options chosen in the GUI be applied to all the lines in the file, or should users be able to search for lines with specific content, etc.?
posted by: Nick 13-Sep-2010/12:51:34-7:00
Here are some more example code that may be useful. This will run MUCH faster if you don't display the progress: R E B O L [] random/seed now/time test-data: copy "" loop 5000 [ append test-data rejoin [ "1" random 9 "/0" random 9 "/0" random 9 " 0" random 9 ":5" random 9 ":0" random 9 "," (q1: (2000 + random 999)) ".7" random 9 "," (q1 + 5) ".9" random 9 newline ] ] write %input.txt test-data svv/vid-face/color: white view layout [ across tabs 140 style lil h3 95 right style big h1 blue 100 lil "Input File: " f1: field 250 button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return lil "Output File: " f2: field 250 button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab big "Century: " r1: rotary data [{XXI} {XX}] return tab big "Price: " r2: rotary data [{bid} {ask}] return tab big "Volume: " f5: field 100 Right {1} return lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return lil "Output Data: " f4: field 300 return tab t1: text 175 "" button "Convert" [ count: 0 input-data: read/lines x total: length? input-data output-string: copy "" foreach line input-data [ f3/text: line show f3 p: parse line {/ ,} f4/text: copy rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f5/text newline ] show f4 count: count + 1 t1/text: rejoin [count " of " total " lines"] show t1 append output-string f4/text ] write y output-string editor y ] ]
posted by: Nick 13-Sep-2010/14:34:28-7:00
To demonstrate the difference in speed is about 10x on my machine. Try uncommenting the lines in the convert button block to see the difference: R E B O L [] random/seed now/time test-data: copy "" loop 50000 [ append test-data rejoin [ "1" random 9 "/0" random 9 "/0" random 9 " 0" random 9 ":5" random 9 ":0" random 9 "," (q1: (2000 + random 999)) ".7" random 9 "," (q1 + 5) ".9" random 9 newline ] ] write %input.txt test-data svv/vid-face/color: white view layout [ across tabs 140 style lil h3 95 right style big h1 blue 100 lil "Input File: " f1: field 250 button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return lil "Output File: " f2: field 250 button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab big "Century: " r1: rotary data [{XXI} {XX}] return tab big "Price: " r2: rotary data [{bid} {ask}] return tab big "Volume: " f5: field 100 Right {1} return lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return lil "Output Data: " f4: field 300 return tab t1: text 175 "" button "Convert" [ start-time: now/time count: 0 input-data: read/lines x total: length? input-data output-string: copy "" foreach line input-data [ ; f3/text: line show f3 p: parse line {/ ,} f4/text: copy rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f5/text newline ] ; show f4 ; count: count + 1 ; t1/text: rejoin [count " of " total " lines"] show t1 append output-string f4/text ] write y output-string editor y alert form (now/time - start-time) ] ]
posted by: Nick 13-Sep-2010/15:29:53-7:00
Hi, Thanks for your help... Users will select the options to use and then click the Convert button just once. The whole input file will be converted line by line creating the output file.
posted by: Francisco 13-Sep-2010/16:44:13-7:00
As Endo said read and write should be done line by line to avoid memory issues.
posted by: Francisco 13-Sep-2010/17:45:40-7:00
@Nick: Update progress bar every 10th times or every length-of-file / 100 etc. Users may want to see progress especially for big files. But 10x speed diff. is too much.
posted by: Endo 14-Sep-2010/5:22:58-7:00
Performance is very good for this - I updated every 1000 lines: R E B O L [] random/seed now/time test-data: copy "" flash "Generating random data file - please wait..." loop 500000 [ append test-data rejoin [ "1" random 9 "/0" random 9 "/0" random 9 " 0" random 9 ":5" random 9 ":0" random 9 "," (q1: (2000 + random 999)) ".7" random 9 "," (q1 + 5) ".9" random 9 newline ] ] write %input.txt test-data unview svv/vid-face/color: white view center-face layout [ across tabs 140 style lil h3 95 right style big h1 blue 100 lil "Input File: " f1: field 250 button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return lil "Output File: " f2: field 250 button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab big "Century: " r1: rotary data [{XXI} {XX}] return tab big "Price: " r2: rotary data [{bid} {ask}] return tab big "Volume: " f5: field 100 Right {1} return lil "Input Data: " f3: field 300 {29/11/06 00:04:31,1383.00,1387.00} return lil "Output Data: " f4: field 300 return t1: text right 160 "0" t2: text 135 " of 0 lines" button "Convert" [ start-time: now/time count: pcount: 0 input-data: read/lines x total: length? input-data t2/text: rejoin [count " of " total " lines"] show t2 output-string: copy "" foreach line input-data [ ; f3/text: line show f3 p: parse line {/ ,} the-text: copy rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f5/text newline ] ; f4/text: copy the-text show f4 count: count + 1 if count - pcount > 999 [pcount: count t1/text: form count show t1] append output-string the-text ] write y output-string editor y alert form (now/time - start-time) ] ]
posted by: Nick 14-Sep-2010/10:29:57-7:00
I added direct read/write and took out some unnecessary lines: R E B O L [] svv/vid-face/color: white view center-face layout [ across tabs 140 style lil h3 95 right style big h1 blue 100 lil "Input File: " f1: field 250 button 40 " ... " [if x: request-file/only/file %input.txt [f1/text: copy form x show f1]] return lil "Output File: " f2: field 250 button 40 " ... " [if y: request-file/only/file %output.txt [f2/text: copy form y show f2]] return tab big "Century: " r1: rotary data [{XXI} {XX}] return tab big "Price: " r2: rotary data [{bid} {ask}] return tab big "Volume: " f5: field 100 Right {1} return tab button 210 "Convert" [ count: pcount: 0 output-string: copy "" file: open/direct/lines x if exists? y [delete y] out-file: open/direct y while [attempt [input-data: copy/part file 1000]] [ the-text: copy "" foreach line input-data [ p: parse line {/ ,} the-text: append the-text rejoin [ (select [{XX} {19} {XXI} {20}] r1/text) p/3 p/2 p/1 { } (replace/all p/4 {:} {}) {;} do (select [{bid} p/5 {ask} p/6] r2/text) {;} f5/text newline ] count: count + 1 if count - pcount > 999 [pcount: count t1/text: form count show t1] ] append out-file the-text ] close file close out-file editor y ] return tab text "lines read:" t1: text 100 "0" ] Check the output data carefully before using this, and use it at your own risk - I haven't tested it at all.
posted by: Nick 14-Sep-2010/12:12:50-7:00
Thanks!
posted by: Francisco 26-Sep-2010/2:10:11-7:00
|