More parsing confusion

I happened to be reading some of Nick's documentation and came upon the parsing example below. The purpose of the example is to remove in-line comments from REBOL code. I understand in concept what it is doing, namely, find the first semicolon, then look ahead to the end of the line, calculate the number of characters between that semicolon and the end of the line, and remove that many characters. But I can't EXPLAIN it. What does the "begin:" mean? And the "ending:"? And why is there the ":begin" at the end of the "remove" line? And why is there the "any" in the rule? I feel like there is some key concept in parsing that I am supposed to know but don't, and if I did, I would have this head-slapping moment of clarity where I would say, "Oh, of course, it's so obvious."
    
I wonder if someone might give me some guidance in understanding this example so I can continue my march toward parsing knowledge.
    
Thank you.
    
R E B O L []
    
CODE: {
Owner_Name: ""     ;; A
Co-Owner_Name: "" ;; B
Mail_Address_1: "" ;; C
Mail_Address_2: "" ;; D
In_care_of: ""     ;; E
City: ""         ;; F
State: ""         ;; G
Country: ""        ;; H
Zip: ""            ;; I
}
    
parse/all code [any [
     to #";" begin: to newline ending: (
         remove/part begin ((index? ending) - (index? begin))) :begin
     ]
]
    
write %uncommented.txt CODE
editor CODE ; all comments removed


posted by:   Steven White       8-May-2019/17:13:10-7:00



This method of comment removal doesn't actually work in general, because a semicolon inside a string is not a comment, and the rule can't tell the difference. Interestingly enough, I was writing a parse rule that did actual comment removal last week (it uses the Ren-C/Red-ism AHEAD, but shows the general method):
    
https://github.com/metaeducation/ren-c/blob/65fcd12516f220a08893b9045bfd6ec79e72cabb/tools/common.r#L386
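
To make the failure concrete, here is a minimal sketch, assuming Rebol 2 and reusing the rule from the question on a made-up one-line input (the `Title` line is hypothetical, not from Nick's documentation). The first semicolon sits inside the string, so the rule cuts the string in half:

```rebol
REBOL []

; Hypothetical input: the first semicolon is inside a string, not a comment.
code: copy {Title: "a;b" ; real comment^/}

parse/all code [any [
    to #";" begin: to newline ending: (
        remove/part begin ((index? ending) - (index? begin))) :begin
]]

; Everything from the first semicolon to the newline is gone,
; including the second half of the string literal.
print code = {Title: "a^/}   ; true
```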
    
    
> What does the "begin:" mean?
    
Rebol has historically used SET-WORD!s to capture the current parse position into a variable. Hence the SET is referring to setting the variable. Correspondingly it has used GET-WORD!s to seek the parse position to the position held in that variable.
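
A small sketch of just those two mechanics, assuming Rebol 2 (the names `text`, `mark` and `rest` are made up for illustration). The SET-WORD! saves where parse currently is, and the GET-WORD! moves parse back there:

```rebol
REBOL []

text: "abc;def"

parse/all text [
    to #";"            ; advance to the semicolon
    mark:              ; SET-WORD!: save the current position into MARK
    to end             ; advance all the way to the end of the input
    :mark              ; GET-WORD!: move the parse position back to MARK
    copy rest to end   ; copy from the restored position to the end
]

print index? mark   ; 4
print rest          ; ;def
```

Note that `mark` is an ordinary series value afterwards, so `index? mark` and friends work on it outside of parse as well.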
    
I think this is questionable. For one thing it was hard for me in the beginning to know SET-WORD! didn't mean "set the parse position", and GET-WORD! didn't mean "get the parse position". But also it seems a keyword like SEEK would be clearer (you'd have been less confused, right?)
    
https://forum.rebol.info/t/changing-set-word-and-get-word-in-parse/1139
    
> And why is there the ":begin" at the end of the "remove" line?
    
The parse position was at the end of the line. If you remove material from the start of the comment to the end of the line, the parse position will now be out of date, and somewhere on a future line. Seeking to the index saved in the `begin` variable puts you at the right place for processing the next comment.
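
Here is the same rule from the question on a shorter input, with each step commented. This is only a sketch assuming Rebol 2, and the two-line input is made up:

```rebol
REBOL []

code: copy {a: 1 ; one^/b: 2 ; two^/}

parse/all code [any [
    to #";"        ; move to the next semicolon
    begin:         ; remember where the comment starts
    to newline     ; move to the end of that line
    ending:        ; remember where the comment stops
    (
        ; delete the characters between the two saved positions
        remove/part begin ((index? ending) - (index? begin))
    )
    :begin         ; the text shifted left, so seek back to where the
                   ; comment started, which is now the newline itself
]]

print code = "a: 1 ^/b: 2 ^/"   ; true
```

The space that preceded each semicolon is still there, because the removal starts at the semicolon.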
    
R3-Alpha and Red have a REMOVE command in the parse dialect that takes care of this problem in one swoop. You say `remove [...rule...]` and it will effectively mark the begin, end, and fix up the position.
    
> And why is there the "any" in the rule?
    
The rule inside the ANY finds and removes one comment. If your goal is to remove several comments, you need some rule that does iteration...like ANY or SOME.
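
As a sketch of the difference, assuming Rebol 2, the inner rule can be given a name (`one-comment` here is a hypothetical name, as are the inputs) and run once versus under ANY:

```rebol
REBOL []

; Hypothetical name for the inner rule: it removes exactly one comment.
one-comment: [
    to #";" begin: to newline ending: (
        remove/part begin ((index? ending) - (index? begin))) :begin
]

; Without ANY the rule is applied a single time.
first-only: copy {x ; 1^/y ; 2^/}
parse/all first-only one-comment
print first-only = "x ^/y ; 2^/"   ; true

; With ANY the rule repeats until TO can't find another semicolon.
every-one: copy {x ; 1^/y ; 2^/}
parse/all every-one [any one-comment]
print every-one = "x ^/y ^/"       ; true
```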
    
In writing my own comment removal rule linked above, I gained an understanding of how WHILE is different from ANY, and why it is necessary to have:
    
https://forum.rebol.info/t/parses-advancement-rule-bad/1159

posted by:   Fork       8-May-2019/17:34:54-7:00


