Parsing with to
I'm building a web scraping application which, when modeled like this, works as I expect:

    p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "
    chars: charset [ #"a" - #"z" ]
    h1: [ <h1> copy title any chars (print title) </h1> ]
    para: [ <p> copy ptext any chars (print ptext) </p> ]
    r: parse p [ any chars h1 some para any chars end ]
    print r

Output is:

    a header
    some words
    more words
    true

If, instead, I do something similar with "to", I get an error.

    p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "
    header: [ <h1> copy title to </h1> (print title) ]
    para: [ <p> copy ptext to </p> (print ptext) ]
    r: parse p [ to header [some para ] to end]
    print r

    ** Script Error: Invalid argument: <h1> copy title to </h1> print title
    ** Near: r: parse p [to header [some para] to end] print

Can someone explain why "to" behaves so differently than "any chars"? I'm sure that I'm missing something. Thanks for your help.
posted by: Andyh 25-Jan-2012/23:46:28-8:00
This is because string parsing and block parsing are totally different. Take a look at the Core manual for details. In string parsing you match characters; in block parsing you match REBOL values (words, numbers, blocks).
posted by: Endo 26-Jan-2012/3:26:36-8:00
TO cannot accept a rule as an argument. You could use this instead:

    r: parse p [ to <h1> header [some para ] to end]
posted by: DocKimbel 28-Jan-2012/3:38:03-8:00
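For reference, here is a sketch of how the suggestion above might look as a complete script. It assumes REBOL 2 string parsing without /all, as in the original post, and it is not taken from the reply itself. One detail is an addition: TO stops just before its target rather than after it, so each rule below also matches its closing tag explicitly before the next rule runs.

```rebol
REBOL []

p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "

; TO advances up to its target but does not consume it,
; so each rule matches the closing tag itself afterwards.
; (Consuming the closing tags is an assumption added here,
; not part of the suggested one-line fix.)
header: [ <h1> copy title to </h1> </h1> (print title) ]
para:   [ <p> copy ptext to </p> </p> (print ptext) ]

; TO is given a literal target (the tag <h1>), not the rule HEADER.
r: parse p [ to <h1> header some para to end ]
print r
```

If the closing tags are left unconsumed, the input position after HEADER sits at </h1>, so a following rule that expects <p> has nothing to match there.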
Thanks a bunch Doc. I think I've got it. Now I'll try some real web pages!
posted by: Andyh 29-Jan-2012/15:02:26-8:00
Happy scraping! That is what made me want to learn REBOL in the first place, twelve years ago. ;-)
posted by: DocKimbel 30-Jan-2012/13:47:12-8:00