
Parsing a Python line; why does this even work?

I am trying to transition to Python for the sake of our shared expertise in that language, and I am trying to centralize our ODBC database connection strings for easy maintenance. I discovered a format for coding the connection strings into a Python module so that they can also be parsed and used by a REBOL script. One format to rule them all, as it were. It works, but my question is: why? As shown in the demo below, I can divide the file into sets of
(connection-name-1) (connection-string-1)
(connection-name-2) (connection-string-2)
based on the equal sign that separates the connection name from the connection string, but why doesn't it also divide on all the equal signs inside the connection strings?
I am delighted that this works as I hoped, but I would like to be able to explain why.
Thank you.
R E B O L [
     Title: "General ODBC functions"
     Purpose: {Isolate ODBC connection strings for easy maintenance.
     Store them in a format such that they can be a Python module
     and still be used by a REBOL program.}
;; -- This is a demo. These will be stored in a file.
ODBC-OPEN: func [
] [
     ODBC-CON: open [
         scheme: 'odbc
     ODBC-CMD: first ODBC-CON
] [
     insert ODBC-CMD ODBC-SQL
     return copy ODBC-CMD
ODBC-CLOSE: does [
     close ODBC-CMD
     print [mold NAME ":" mold CONSTRING]
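
For the Python side of the shared file described above, here is a minimal sketch of how the same name/string pairs could be read. It assumes each line of the module has the form NAME="connection string" with no spaces around the equal sign. The names SALES and HR, the connection strings, and the helper read_connection_strings are all hypothetical, made up for illustration; they are not from the original file.

```python
import ast

# Hypothetical module contents: one NAME="connection string" per line.
SOURCE = (
    'SALES="DSN=sales;UID=reader;PWD=secret"\n'
    'HR="DSN=hr;UID=reader;PWD=secret"\n'
)


def read_connection_strings(source):
    """Collect simple NAME = "string" assignments from Python source text."""
    found = {}
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            found[node.targets[0].id] = node.value.value
    return found


if __name__ == "__main__":
    for name, constring in read_connection_strings(SOURCE).items():
        print(name, ":", constring)
```

Using ast here rather than importing the module is only one option; a plain import would work just as well if the file is on the Python path.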

posted by:   Steven White       3-Jun-2019/14:22:39-7:00

It looks like it's the quotes:
     >> parse/all {test="a=1,b=2"} "=,"
     == ["test" "a=1,b=2"]
     >> parse/all {test="a=1,b=2",x=3,"y=4"} "=,"
     == ["test" "a=1,b=2" "x" "3" "y=4"]
     >> parse/all {test='a=1,b=2'} "=,"
     == ["test" "'a" "1" "b" "2'"]
But I don't know why it works that way.

posted by:   Endo       9-Jun-2019/13:58:12-7:00

Parse in Rebol 2 (and Rebol 3 Alpha) in 'Split' mode has one or two quirks, one of which is to skip content within quotes (if a quote character immediately follows a delimiter).
I suspect the reason for this is rooted in handling a certain CSV pattern more common when the interpreter was first written, but could also just be a bug.
Red and Ren-C both deprecated 'Split' mode in favour of a separate SPLIT function.
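
To make the quirk concrete, here is a rough Python emulation of the behaviour seen in the console examples earlier in this thread. It is only a sketch of the observed results, not REBOL's actual implementation, and the function name split_quote_aware is made up for illustration. A double quote that sits at the very start of a field (start of input, or immediately after a delimiter) causes everything up to the closing quote to be taken as one item.

```python
def split_quote_aware(text, delimiters):
    """Split text on any delimiter character, but keep a double-quoted
    run that starts a field together as a single item."""
    fields = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            # Quote at the start of a field: take everything up to the
            # closing quote verbatim, delimiters included.
            end = text.find('"', i + 1)
            if end == -1:
                end = n
            fields.append(text[i + 1:end])
            i = end + 1
            # Skip ahead to just past the next delimiter.
            while i < n and text[i] not in delimiters:
                i += 1
            i += 1
        else:
            # Ordinary field: read up to the next delimiter.
            start = i
            while i < n and text[i] not in delimiters:
                i += 1
            fields.append(text[start:i])
            i += 1
    return fields


if __name__ == "__main__":
    print(split_quote_aware('test="a=1,b=2"', "=,"))
    # ['test', 'a=1,b=2']
    print(split_quote_aware("test='a=1,b=2'", "=,"))
    # ['test', "'a", '1', 'b', "2'"]
```

Single quotes get no special treatment, which matches the third console example above, where the string is split on every equal sign and comma.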

posted by:   Chris       13-Jun-2019/16:45:08-7:00

From Chapter 15 of the REBOL Core Manual:
Parsing splits a sequence of characters or values into smaller parts...parse ... has the general form:
parse series rules
The series argument is the input [to be] parsed and can be a string or a block. If the argument is a string, it is parsed by character.
... parse ... also accepts two refinements: /all and /case. The /all refinement parses all the characters within a string, including all delimiters, such as space, tab, newline, comma, and semicolon.
... parse ... normally ignores all intervening whitespace between patterns that it scans. To enforce a specific spacing convention, use parse with the /all refinement.
... parse ... splits the input ... string into a block of multiple strings, breaking each string wherever it encounters a delimiter
Thus: parse/all ODBC-CONNECTIONS "=^/"
Says this: parse all characters in the string, splitting only on each equal sign and each newline encountered.
Try something simpler first:
>> parse "Test1=This Test2=That" "=^/"
== ["Test1" "This" "Test2" "That"]
or the string spanning two lines:
>> parse first [ {Test3=this
{        Test4=that}] "=^/"
== ["Test3" "this" "Test4" "that"]
Without the /all refinement, parse breaks on the =, the space, and the invisible newline character:
>> newline
== #"^/"
Now, what happens with the /all refinement?
>> parse/all "Test1=This Test2=That" "=^/"
== ["Test1" "This Test2" "That"]
Because REBOL is now checking every character, including ones such as the space that it would otherwise treat as delimiters, parse first breaks on =, which gives:
"Test1"
It then passes over the space, because there is no rule for it, and breaks on the next equal sign, which gives:
"This Test2"
Finally it reaches the end of the string (this input contains no newline), which gives:
"That"
With a string spanning lines, what happens?
>> parse/all first [ {Test3=this
{        Test4=that}] "=^/"
== ["Test3" "this " "Test4" "that"]
REBOL using parse breaks on =
"Test3"
and then on newline
"this "
and then on =
"Test4"
and then stops at the end of the string
"that"
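
The difference between the two modes walked through above can be approximated in Python as follows. This is a rough sketch that only reproduces the four console results shown in this post; the function name split_mode and the all_chars flag are made up for illustration, and the quote handling discussed earlier in the thread is deliberately left out.

```python
import re


def split_mode(text, delimiters, all_chars=False):
    """Approximate parse in split mode.

    all_chars=True  -> like /all: split only on the given delimiters.
    all_chars=False -> whitespace also separates items, and whitespace
                       next to a delimiter is absorbed into the break.
    """
    char_class = "[" + re.escape(delimiters) + "]"
    if all_chars:
        pattern = char_class
    else:
        pattern = r"\s*" + char_class + r"\s*|\s+"
    return re.split(pattern, text)


if __name__ == "__main__":
    one_line = "Test1=This Test2=That"
    two_lines = "Test3=this \nTest4=that"

    print(split_mode(one_line, "=\n"))
    # ['Test1', 'This', 'Test2', 'That']
    print(split_mode(one_line, "=\n", all_chars=True))
    # ['Test1', 'This Test2', 'That']
    print(split_mode(two_lines, "=\n", all_chars=True))
    # ['Test3', 'this ', 'Test4', 'that']
```

The two-line input assumes a trailing space after "this", which is what the "this " item in the /all result above suggests.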

posted by:   Stone Johnson       24-Aug-2019/15:20:25-7:00


