
Parsing baby steps

Still just not getting the parsing. I am looking at the manual, and at the makedoc2 source code (a parse-fest if there ever was one) and trying to concoct a simple example to see what it is in my head that might be blocking my understanding. The example below has an explanation in the comments for what I am trying to do in the example. I understand the principle. When I get the === I will copy what follows to the end of the line, trim it, and surround it with h1 tags. When I get a blank line, I will copy to the next blank line and surround what I get with the p tags. But I just don't see what to do to make that happen and I wonder if anyone can offer guidance.
Thank you.
R E B O L []
;; [---------------------------------------------------------------------------]
;; [ Demo for the purpose of trying to understand parsing.                     ]
;; [                                                                         ]
;; [ This demo will transform simple text with one markup item into html.     ]
;; [ The one markup item is the === on one line that indicates a heading     ]
;; [ at the h1 level. The text on that line should be trimmed and surrounded ]
;; [ by the h1 tags. The other lines of text should be divided on the         ]
;; [ blank line and be surrounded by the "p" tags.                             ]
;; [                                                                         ]
;; [ So text like this:                                                        ]
;; [                                                                         ]
;; [ ===Heading 1                                                             ]
;; [                                                                         ]
;; [ Paragraph 1-1                                                             ]
;; [                                                                         ]
;; [ ===Heading 2                                                             ]
;; [                                                                         ]
;; [ Paragraph 2-1                                                             ]
;; [                                                                         ]
;; [ Should be transformed to this:                                            ]
;; [                                                                         ]
;; [ <h1>Heading 1</h1>                                                        ]
;; [ <p>Paragraph 1-1</p>                                                     ]
;; [ <h1>Heading 2</h1>                                                        ]
;; [ <p>Paragraph 2-1</p>                                                     ]
;; [                                                                         ]
;; [ ...or something equivalent.                                             ]
;; [---------------------------------------------------------------------------]
;; -- This is the sample input data that we will parse.
IN-TEXT: {
===Heading one

This is a paragraph of text under heading one.
We would want it surrounded by the "p" tags.

This is a second paragraph
that should have its own set of "p" tags.

===A second heading

The above heading would be emitted with the "h1"

And here is a second paragraph under the second
heading just to show things are working
}
;; -- This will be the parsed input data with its html tags.
HTML-OUT: copy ""
;; -- Parse IN-TEXT, mark it up, and append it to HTML-OUT.
;; ???
;; -- Display the output and halt for probing.
print HTML-OUT

posted by:   Steven White       10-Sep-2018/13:13:06-7:00

A simplistic approach to this would be to use BITSET! to positively identify content portions. This is sort-of how MakeDoc works, but with a few more subtle rules.
; anything but newlines
content: complement charset "^/"
scan-doc: func [text [string!]][
     ; parse/all for rebol 2
     collect [
         parse/all text [
             any [
                 ; skip line breaks between blocks
                 newline
                 | "===" opt " " copy part some content (
                     keep 'heading
                     keep part
                 )
                 | copy part [some content any [newline some content]] (
                     keep 'para
                     keep part
                 )
             ]
         ]
     ]
]
probe scan-doc {
=== A Header

A Paragraph

Another Paragraph
}

posted by:   Chris       10-Sep-2018/13:30:33-7:00

* This line was missing a closing bracket:
| copy part [some content any [newline some content]] (
Note that this lets you identify multiline paragraphs.
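The block that scan-doc collects is a flat series of word/string pairs, so turning it into HTML can be a separate, simple pass. The following is only a sketch under that assumption: TO-HTML is a hypothetical helper name (it is not part of MakeDoc), and the sample block is written out by hand so the snippet runs on its own.

```rebol
; Assumption: the scanner yields pairs like [heading "A Header" para "A Paragraph"].
; to-html is a hypothetical name used for illustration only.
to-html: func [doc [block!] /local out][
    out: copy ""
    foreach [type text] doc [
        append out switch type [
            heading [rejoin ["<h1>" trim text "</h1>^/"]]
            para    [rejoin ["<p>" text "</p>^/"]]
        ]
    ]
    out
]

print to-html [
    heading "A Header"
    para "A Paragraph"
    para "Another Paragraph"
]
```

Keeping the scan and the markup separate means that a new block type only needs one more parse alternative and one more SWITCH case.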

posted by:   Chris       10-Sep-2018/13:33:45-7:00

While this seems an easy task, a few matters make it more difficult than it looks with historical PARSE.
One of the not-so-easy aspects is that TO and THRU don't allow you to use complex rules, which complicates your paragraph termination conditions. This is a decision which was reversed in Red (and will be also in Ren-C, when time permits).
You can try this in Red. There are likely workarounds for it in Rebol2 and R3-Alpha, but I'd rather consider the fact that this doesn't work there as-is a bug than figure out what the workaround would be:
     heading-rule: [
         "===" copy heading to "^/" (
             append html-out reduce [
                 <h1> heading </h1> newline
             ]
         )
     ]
     paragraph-rule: [
         copy paragraph to ["^/^/" | "^/" end | end] (
             append html-out reduce [
                 <p> paragraph </p> newline
             ]
         )
     ]
     parse in-text [
         (html-out: copy {})
         some [newline | heading-rule | paragraph-rule]
     ]
     print mold html-out
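For Rebol 2, one possible way around the TO limitation is to avoid TO altogether and describe a paragraph positively with a charset, as in the earlier reply. This is a sketch under that assumption rather than a tested port of the Red rules. The CONTENT charset and the shortened sample text are illustrative choices, and PARSE/ALL is the Rebol 2 form (Red has no /ALL refinement).

```rebol
; Sample input: headings start with ===, paragraphs are separated by blank lines.
in-text: {===Heading one

First paragraph,
on two lines.

===A second heading

Second paragraph.
}

content: complement charset "^/"   ; any character except newline
html-out: copy ""

heading-rule: [
    "===" copy heading some content (
        append html-out rejoin ["<h1>" trim heading "</h1>^/"]
    )
]
paragraph-rule: [
    ; one or more non-empty lines; a blank line ends the paragraph
    copy paragraph [some content any [newline some content]] (
        append html-out rejoin ["<p>" paragraph "</p>^/"]
    )
]

; heading-rule is tried before paragraph-rule so === lines are not taken as text
parse/all in-text [any [newline | heading-rule | paragraph-rule]]
print html-out
```

Because every alternative consumes at least one character, the ANY loop cannot stall at the end of the input.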

posted by:   Fork       10-Sep-2018/15:17:01-7:00
