Parse trying to learn parse
this is my log1.txt file

    line0: header
    line1: aaaa
    line1: bbbb
    line1: cccc
    line2: ddd
    line2: eee

this is my code

    REBOL []
    log: read/string %log1.txt
    parse/all log [
        some [
            thru "line0:" copy msgLine0 to newline
            | thru "line1:" copy msgLine1 to newline
            | thru "line2:" copy msgLine2 to newline
            (
                print rejoin [msgLine0 " * " msgLine1 " * " msgLine2]
                msgLine0: msgLine1: msgLine2: copy ""
            )
        ]
    ]

my goal is to extract the 2nd column from the log1.txt file using parse.

these are the results:-

    header * cccc * ddd
    * * eee

it has missed lines aaaa, bbbb

the logic I am using, in plain language, is:- one or more line0 | one or more line1 | one or more line2

wonder why this does not work. can someone help me understand? thank you
posted by: nubie 20-Feb-2020/21:14:21-8:00
Code in parentheses runs when the rule it is part of matches. The way you have written this, your code in parentheses is connected only with the case where the third rule in the alternate list matches.

E.g. if you were to write:

    some [ rule0 | rule1 | rule2 (code) ]

It's like you had written:

    some [ rule0 | rule1 | [rule2 (code)] ]

So it will only be when rule2 matches that the code runs. Hence you are running the code exactly two times; once for each time the line2 matches.

If you wish the code to run each time *any* of the rules matches, you would write:

    some [ [rule0 | rule1 | rule2] (code) ]

If you wanted the code to run after a repeated match of all the rules (e.g. when the SOME rule has finished), you would write:

    some [rule0 | rule1 | rule2] (code)

Hopefully that clarifies things...
posted by: Fork 20-Feb-2020/22:37:40-8:00
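As a minimal sketch of the three placements described in the reply above, the following Rebol 2 script shows how the position of the parenthesized code changes how often it runs. The input string "abc" and the printed messages are made up for illustration and are not taken from the thread.

```rebol
REBOL []

; 1. Paren belongs to the last alternate only:
;    it runs just once, when "c" matches.
parse "abc" [some ["a" | "b" | "c" (print "last alternate matched")]]

; 2. Alternates grouped in their own block, paren after the group:
;    it runs once for every match of any alternate (three times here).
parse "abc" [some [["a" | "b" | "c"] (print "some alternate matched")]]

; 3. Paren after the SOME loop:
;    it runs once, when the repetition is over.
parse "abc" [some ["a" | "b" | "c"] (print "loop finished")]
```

The only difference between the three calls is where the inner block brackets sit, which is why the first form is easy to write by accident.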
Thanks Fork for the clear explanations. I modified the code according to your advice. It works :) I am able to extract the 2nd column from the data file. It is so much easier than reading line by line and extracting the 2nd column.

showing the modified code

    REBOL []
    log: read/string %log1.txt
    msgLine0: msgLine1: msgLine2: copy ""
    parse/all log [
        some [
            [
                thru "line0:" copy msgLine0 to newline
                | thru "line1:" copy msgLine1 to newline
                | thru "line2:" copy msgLine2 to newline
            ]
            (
                print rejoin [msgLine0 " * " msgLine1 " * " msgLine2]
                msgLine0: msgLine1: msgLine2: copy ""
            )
        ]
    ]

and results after running the code:-

    header * *
    * aaaa *
    * bbbb *
    * cccc *
    * * ddd
    * * eee
posted by: nubie 21-Feb-2020/8:01:05-8:00
Hi Fork, I tried something a little bit different but the results are weird. I just repeated the first group of data a 2nd time

    line0: header
    line1: aaaa
    line1: bbbb
    line1: cccc
    line2: ddd
    line2: eee
    line0: header2
    line1: fff
    line1: ggg
    line1: hhh
    line2: iii
    line2: jjj

and I ran the same code as above, and this is what I am getting

    header * *
    header2 * *
    * fff *
    * ggg *
    * hhh *
    * * iii
    * * jjj

where did the aaaa, bbbb, cccc, ddd, eee go? They were there when there was less data. I am kind of puzzled as to how parse works.

generally speaking, how does parse work? does it take the whole data set and compare it against rule1, then the whole data set against rule2, then the whole data set against rule3? or does it take one line at a time and go against rule1, then rule2, then rule3? i.e. line 1 of data against rule1, line 1 of data against rule2, line 1 of data against rule3, then line 2 and then line 3 and so on...

I tried adding a (print index? log) on each parse rule, e.g.

    thru "line0:" copy msgLine0 to newline ( print index? log )

but it returns me 1 all the time. kind of scratching my head...
posted by: nubie 21-Feb-2020/17:53:54-8:00
You are using alternates (the pipe character, |, is used to separate the alternates). They are run in priority order, and you are using THRU with that... so the THRU of the earlier rules will always take priority. For instance:

    parse "aba" [
        some [
            thru "a" (print "A!")
            | thru "b" (print "B!")
        ]
    ]

That will give you:

    A!
    A!

Because it will try the alternate ruleset once, find it can reach an "a". Then try the alternate ruleset a second time, and find it can reach an "a" again. It never even looks for a "b" until it has already passed it.

So combining THRU and an alternates list is going to get you a pecking order you don't appear to like. What other choice you make depends on what you are looking for. For instance:

    some [
        ; grab the data assuming `line` starts each line
        ; copy up TO (but not including) the newline
        ;
        [
            "line0:" space copy msgLine0 to newline
            | "line1:" space copy msgLine1 to newline
            | "line2:" space copy msgLine2 to newline
        ]
        newline  ; now consume the newline (could also SKIP)
        (
            print rejoin [msgLine0 " * " msgLine1 " * " msgLine2]
            msgLine0: msgLine1: msgLine2: copy ""
        )
    ]

This is assuming that your input data has a newline even on the last line (this is actually a convention in Unix--that the last line of a file should have a newline on it--which has good reasons). But if you don't like that assumption you can have rules like `[newline | end]` to match either. And you can say things like `nend: [newline | end]` to make compound rules and reference them.
posted by: Fork 21-Feb-2020/21:41:49-8:00
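The `[newline | end]` idea from the reply above could look roughly like this in Rebol 2. This is only a sketch: the sample string, the word `msg`, and the helper rule `to-nend` are assumptions added for illustration; only the name `nend` comes from the post.

```rebol
REBOL []

; Sample input whose last line has no trailing newline (made up for this sketch).
log: "line1: aaaa^/line1: bbbb"

nend: [newline | end]             ; matches the end of a line, or the end of the input
to-nend: [to newline | to end]    ; hypothetical helper: advance up to whichever comes first

parse/all log [
    some [
        ; match the label, copy the rest of the line, then consume the line ending
        "line1:" space copy msg to-nend nend
        (print msg)
    ]
]
```

Naming the rules keeps the main rule short, and the same `nend` can be reused after every alternate that ends at a line boundary.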
Thanks a lot Fork for the clear explanations again. I was using thru, as I thought it was easier to use and it would cover many scenarios, not really knowing the full implications. Now it is clearer to me with the explanations you provided above.

I have modified the code now based on your advice. I have defined a space variable and use "any space" in the rule, just to cover the scenario where a line may start with spaces.

    REBOL []
    log: read/string %log1.txt
    msgLine0: msgLine1: msgLine2: copy ""
    space: " "
    parse/all log [
        some [
            [
                any space "line0:" copy msgLine0 to newline
                | any space "line1:" copy msgLine1 to newline
                | any space "line2:" copy msgLine2 to newline
            ]
            skip
            (
                print rejoin [msgLine0 " * " msgLine1 " * " msgLine2]
                msgLine0: msgLine1: msgLine2: copy ""
            )
        ]
    ]

I will continue playing with parse, to learn more about it. So far, after those 4 days reading the parse chapter in the Rebol core docs and playing with it, I like what it can do.

The next thing I will try is to see if I can find a way to trace it when it is running. One thing that I have been trying to see is what gets fed into each rule at each step of the way and where the pointer is.

Thanks again for your help.
posted by: nubie 22-Feb-2020/9:14:28-8:00
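On seeing where the pointer is: `index? log` always reports 1 because the word `log` keeps referring to the head of the string, while PARSE tracks its own position internally. One way to observe that position is to put a set-word inside the rule, which PARSE sets to the current input position. A small sketch for Rebol 2, with the word `pos` and the sample string as illustrative assumptions:

```rebol
REBOL []

; Two sample lines, each ending in a newline (made up for this sketch).
log: "line0: header^/line1: aaaa^/"

parse/all log [
    some [
        pos:    ; set-word: PARSE stores the current input position in POS
        (print ["about to match at index" index? pos])
        thru newline
    ]
]
```

Because `pos:` sits at the start of the repeated block, it also fires on the final attempt at the end of the input, just before `thru newline` fails and SOME stops.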
Nice that you are enjoying PARSE. It's a fairly addictive alternative to RegEx. Being able not just to form named rules to break your problem into smaller parts... but also to build those rules programmatically... can be kind of a revelation (especially if you're not coming from a Lisp background). It's one of the best practical examples so far of how Rebol has taken the same box of parts (like blocks and words and code in parentheses) and given them a new meaning. There's this freedom for what you can make it do when there are really "no keywords". So once you catch on to this idea, you can look at solving other problem domains with a similar "liberated" mindset.
posted by: Fork 22-Feb-2020/10:04:23-8:00
Yes definitely better than regular expressions :)
posted by: nubie 24-Feb-2020/22:24:51-8:00