Parsing with to

I'm building a web scraping application which, when modeled like this, works as I expect:
    
p:     " stuff

a header

some words

more words

whatever "
    
chars: charset [ #"a" - #"z" ]
h1: [

copy title any chars (print title)

]
para: [

copy ptext any chars (print ptext)

]
    
r:     parse p [ any chars h1 some para any chars end ]
    
print r
    
Output is:
    a header
    some words
    more words
    true
    
If, instead, I do something similar with "to", I get an error.
    
p:     " stuff

a header

some words

more words

whatever "
    
    
header: [

copy title to

(print title) ]
para: [

copy ptext to

(print ptext)]
    
r: parse p [ to header [some para ] to end]
print r
    
    
** Script Error: Invalid argument:

copy title to

print title
** Near: r: parse p [to header [some para] to end]
print
    
Can someone explain why "to" behaves so differently than "any chars"? I'm sure that I'm missing something.
    
Thanks for your help.

posted by:   Andyh     25-Jan-2012/23:46:28-8:00



This is because string parsing and block parsing are totally different. Take a look at the Core manual for details.
In string parsing the input is made of characters; in block parsing it is made of REBOL values (words, numbers, blocks).
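
To illustrate the difference, here is a small sketch, assuming REBOL 2. The names letters and digits are hypothetical and chosen only for this example:

```rebol
REBOL []

; Character sets used by the string rule (hypothetical names)
letters: charset [#"a" - #"z"]
digits:  charset [#"0" - #"9"]

; String parsing: the input is a series of characters,
; so the rule matches character sets and a literal space
print parse/all "abc 123" [some letters #" " some digits]

; Block parsing: the input is a series of REBOL values,
; so the rule matches datatypes instead of characters
print parse [abc 123] [word! integer!]
```

The same dialect is used in both calls, but what a rule element means depends on whether the input is a string or a block.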


posted by:   Endo     26-Jan-2012/3:26:36-8:00



TO cannot accept a rule as its argument. You could use this instead:
    
     r: parse p [ to

header [some para ] to end]
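
A minimal sketch of that approach, assuming REBOL 2. The markup in the posts above did not survive, so the <h1> and <p> tags and the sample page string here are assumptions made for illustration only:

```rebol
REBOL []

; Hypothetical input: the tag names are assumed, not taken from the original post
page: {stuff <h1>a header</h1> <p>some words</p> <p>more words</p> whatever}

; Each rule skips past its own opening tag, then copies up to the closing tag.
; TO and THRU are only ever given a literal tag, never a rule.
header: [thru <h1> copy title to </h1> (print title)]
para:   [thru <p> copy ptext to </p> (print ptext)]

; TO moves to the literal <h1>, and the HEADER rule then runs from that position
r: parse/all page [to <h1> header some para to end]
print r
```

The point of the design is that TO is handed a literal value to search for, and the rule that does the copying is applied as a separate step afterwards.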
    


posted by:   DocKimbel     28-Jan-2012/3:38:03-8:00



Thanks a bunch Doc. I think I've got it. Now I'll try some real web pages!

posted by:   Andyh     29-Jan-2012/15:02:26-8:00



Happy scraping! That's what made me want to learn REBOL in the first place, twelve years ago. ;-)

posted by:   DocKimbel     30-Jan-2012/13:47:12-8:00