


parse question

Using PARSE, how do I parse the following:
    
TITLE: bla bla bla
         LINE1: 123
         TEST2: hello
         LINE3:
    
         LINE4: sfsjflk sfja;lfl sflajfldj
         asfjlsf sdfjlsf fddsfj
         ojijh
    
         TESTLINE: c
    
         LINE1: 456
         TEST2: Bello
         LINE3:
    
         LINE4: eiuteptieot
         fls flksflasj dljf
         asfljd dlfj sfjlsf sdlkjsfj
         tyhjmuikl
    
         TESTLINE c
    
         LINE1: 789
         TEST2: Kello
         LINE3:
    
         LINE4: lohlghkyju
         lkhgf oiu
    
         TESTLINE: c
    
    
    
    
TITLE: vla va vla
         LINE1: 8123
         TEST2: hellobla
         LINE3:
    
         LINE4: rtyu sfsjflk sfja;lfl sflajfldj
         aploik sfjlsf sdfjlsf fddsfj
         ojijh lala
    
         TESTLINE: c
            
         LINE1: 45677
         TEST2: Bello ssf
         LINE3:
            
         LINE4: tgr eiuteptieot
         hytr fls flk sfla sj dljf
         hjas fljd dlfj sfjlsf sdlkjsfj
         tyhjmuikl
            
         TESTLINE: c
            
         LINE1: 76689
         TEST2: Kriello
         LINE3:
            
         LINE4: hoho ohlghkyju
         ha ha lkhgf oiu
    
         TESTLINE: c
    
    
    
to extract the following:
    
TITLE: bla bla bla
         LINE1: 123
         TEST2: hello
    
         LINE1: 456
         TEST2: Bello
    
         LINE1: 789
         TEST2: Kello
    
TITLE: vla va vla
         LINE1: 8123
         TEST2: hellobla
            
         LINE1: 45677
         TEST2: Bello ssf
        
         LINE1: 76689
         TEST2: Kriello
        
    
I don't want to use a line-by-line approach with FIND. I am hoping there is a more elegant way to achieve this with PARSE, using less code.
    
I tried the following rule but it did not work.
    
parse/all myInput [ some [
                     opt thru "TITLE"
                     tO "LINE1" copy txnid to "LINE3"
                     thru "LINE4" to "TESTLINE"
                     ]
    
            ]


posted by:   momo     29-Nov-2017/22:28:21-8:00



Source:
    
title-rule: [keep ["TITLE:" thru newline]]
line-rule: [keep [any space "LINE1:" thru newline]]
test-rule: [keep [any space "TEST2:" thru newline] keep (newline)]
    
rules: [
    collect [
        some [
            title-rule
        |    line-rule
        |    test-rule    
        |    skip    
        ]
    ]
]
    
rejoin parse data rules
    
    
Result:
    
>> print do %delme2.red
TITLE: bla bla bla
         LINE1: 123
         TEST2: hello
    
         LINE1: 456
         TEST2: Bello
    
         LINE1: 789
         TEST2: Kello
    
TITLE: vla va vla
         LINE1: 8123
         TEST2: hellobla
    
         LINE1: 45677
         TEST2: Bello ssf
    
         LINE1: 76689
         TEST2: Kriello
    
If you have tabs in your source, you need to fix `any space` to something like `any [space | tab]`.
    
Also, it's written in Red's `parse` variant, so if you want to run it under Rebol, change `keep` to `copy value` and add `(append result value)` to the rule or something like that.
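
A rough sketch of that conversion for Rebol 2 could look like the following. It is untested and makes a few assumptions: `data` holds the input string as above, `result` and `value` are the helper words mentioned in the previous paragraph, and `ws` is a hypothetical charset added here so that both spaces and tabs are accepted.

```rebol
result: copy ""          ; collected output, replaces Red's collect
ws: charset " ^-"        ; assumed helper: space or tab

; each rule copies the matched text, then appends it in a paren action
title-rule: [copy value ["TITLE:" thru newline] (append result value)]
line-rule:  [copy value [any ws "LINE1:" thru newline] (append result value)]
test-rule:  [
    copy value [any ws "TEST2:" thru newline]
    (append result value  append result newline)   ; blank line after each record
]

; /all keeps whitespace significant in Rebol 2
parse/all data [
    some [
        title-rule
    |   line-rule
    |   test-rule
    |   skip             ; anything else: advance one character
    ]
]

print result
```

Since `result` is built up directly as a string, no `rejoin` is needed at the end.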

posted by:   rebolek     30-Nov-2017/2:51:30-8:00



Thanks, Rebolek.
    
Wow, I have a long way to go before I get to that level of code :-)
    
I will need to digest your code and try to convert it to Rebol. I have not played with Red yet, so I am still plodding along learning Rebol.
    


posted by:   momo     30-Nov-2017/22:23:10-8:00