Parsing a Python line; why does this even work?

I am trying to transition to Python for the sake of our shared expertise in that language, and I am trying to centralize our ODBC database connection strings for easy maintenance. I discovered a format for coding the connection strings into a Python module so that they can also be parsed and used by a REBOL script. One format to rule them all, as it were. It works, but the question is why. As shown in the demo below, I can divide the file into sets of
    
[
(connection-name-1) (connection-string-1)
(connection-name-2) (connection-string-2)
...
]
    
based on the "equal" sign that divides the connection name from the connection string, but why doesn't it also divide on the "equal" signs inside the connection strings?
    
I am delighted that this works as I hoped, but I would like to be able to explain why.
    
Thank you.
    
R E B O L [
     Title: "General ODBC functions"
     Purpose: {Isolate ODBC connection strings for easy maintenance.
     Store them in a format such that they can be a Python module
     and still be used by a REBOL program.}
]
    
;; -- This is a demo. These will be stored in a file.
ODBC-CONNECTIONS:
{DB1_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER1};DATABASE=DB1;UID=user1;PWD=password1"
DB2_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER2};DATABASE=DB2;UID=user2;PWD=password2"
DB3_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER3};DATABASE=DB3;UID=user3;PWD=password3"
DB4_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER4};DATABASE=DB4;UID=user4;PWD=password4"
DB5_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER5};DATABASE=DB5;UID=user5;PWD=password5"}
    
ODBC-CONNECTIONLIST: parse/all ODBC-CONNECTIONS "=^/"
    
ODBC-OPEN: func [
     ODBC-CONNECTIONNAME
     /local ODBC-CONNECTIONSTRING
] [
     ODBC-CONNECTIONSTRING: select ODBC-CONNECTIONLIST ODBC-CONNECTIONNAME
     ODBC-CON: open [
         scheme: 'odbc
         target: ODBC-CONNECTIONSTRING
     ]
     ODBC-CMD: first ODBC-CON
]
    
ODBC-EXECUTE: func [
     ODBC-SQL
] [
     insert ODBC-CMD ODBC-SQL
     return copy ODBC-CMD
]
    
ODBC-CLOSE: does [
     close ODBC-CMD
]
    
foreach [NAME CONSTRING] ODBC-CONNECTIONLIST [
     print [mold NAME ":" mold CONSTRING]
]
halt
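
For the Python side of the same file, each line is an ordinary string assignment, so the text should also be readable as a module. The following is only a sketch under that assumption: it reads the assignments with the standard ast module instead of importing the file. The names load_connections and SAMPLE are made up for illustration and are not part of the script above.

```python
import ast

# Two lines in the same format as the REBOL demo above (sample values only).
SAMPLE = (
    'DB1_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER1};DATABASE=DB1;UID=user1;PWD=password1"\n'
    'DB2_DBCONNECT="DRIVER={SQL Server};SERVER={SERVER2};DATABASE=DB2;UID=user2;PWD=password2"\n'
)


def load_connections(source):
    """Collect top-level NAME = <literal> assignments into a dict."""
    connections = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            # literal_eval accepts only literals, so no code in the file is executed.
            connections[node.targets[0].id] = ast.literal_eval(node.value)
    return connections


if __name__ == "__main__":
    for name, connect_string in load_connections(SAMPLE).items():
        print(name, ":", connect_string)
```

The equal signs inside the quoted value cause no trouble here either, because Python treats everything between the double quotes as one string literal.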


posted by:   Steven White     3-Jun-2019/14:22:39-7:00



It looks like it's the quotes:
    
     >> parse/all {test="a=1,b=2"} "=,"
     == ["test" "a=1,b=2"]
    
     >> parse/all {test="a=1,b=2",x=3,"y=4"} "=,"
     == ["test" "a=1,b=2" "x" "3" "y=4"]
    
     >> parse/all {test='a=1,b=2'} "=,"
     == ["test" "'a" "1" "b" "2'"]
    
But I don't know why it works that way.


posted by:   Endo     9-Jun-2019/13:58:12-7:00



Parse in Rebol 2 (and Rebol 3 Alpha) in 'Split' mode has one or two quirks, one of which is to skip content within quotes (if a quote character immediately follows a delimiter).
    
I suspect the reason for this is rooted in handling a certain CSV pattern more common when the interpreter was first written, but could also just be a bug.
    
Red and Ren-C both deprecated 'Split' mode in favour of a separate SPLIT function.
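
To make the quote-skipping rule concrete, here is a rough Python emulation. It is inferred only from the three console examples earlier in this thread, not from the REBOL source, so treat it as a sketch: a double quote at the start of a field (that is, at the start of the input or right after a delimiter) swallows everything up to the closing quote, and a quote anywhere else is an ordinary character. The name split_like_rebol2 is made up for illustration.

```python
def split_like_rebol2(text, delims):
    """Split text on any character in delims, treating a field that
    starts with a double quote as running to the closing quote."""
    parts = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            # Quoted field: take everything up to the closing quote,
            # delimiters included.
            end = text.find('"', i + 1)
            if end == -1:
                end = n  # assumption: an unclosed quote runs to the end
            parts.append(text[i + 1:end])
            i = end + 1
            # Skip anything between the closing quote and the next delimiter.
            while i < n and text[i] not in delims:
                i += 1
            i += 1  # step past the delimiter itself
        else:
            # Plain field: read up to the next delimiter.
            start = i
            while i < n and text[i] not in delims:
                i += 1
            parts.append(text[start:i])
            i += 1  # step past the delimiter itself
    return parts


if __name__ == "__main__":
    print(split_like_rebol2('test="a=1,b=2"', "=,"))
    print(split_like_rebol2('test="a=1,b=2",x=3,"y=4"', "=,"))
    print(split_like_rebol2("test='a=1,b=2'", "=,"))
```

Applied to a line such as DB1_DBCONNECT="DRIVER={SQL Server};..." with "=" and newline as delimiters, the same rule yields the name and the whole quoted connection string as two fields, which matches what the original demo prints.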

posted by:   Chris     13-Jun-2019/16:45:08-7:00



From Chapter 15 of the REBOL Core Manual:
    
Parsing splits a sequence of characters or values into smaller parts...parse ... has the general form:
    
parse series rules
    
The series argument is the input [to be] parsed and can be a string or a block. If the argument is a string, it is parsed by character.
    
... parse ... also accepts two refinements: /all and /case. The /all refinement parses all the characters within a string, including all delimiters, such as space, tab, newline, comma, and semicolon.
    
... parse ... normally ignores all intervening whitespace between patterns that it scans. To enforce a specific spacing convention, use parse with the /all refinement.
    
... parse ... splits the input ... string into a block of multiple strings, breaking each string wherever it encounters a delimiter
    
Thus: parse/all ODBC-CONNECTIONS "=^/"
    
Says this: parse all characters in the string, splitting only at each equal sign and each newline.
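
The difference the /all refinement makes can be sketched in Python. This is an emulation based on the manual excerpts above, not REBOL itself: without /all, whitespace counts as a delimiter in addition to the characters in the rule string; with /all, only the rule characters split. The name split_demo and the sample input are made up for illustration, and quoted values are deliberately left out.

```python
def split_demo(text, delims, all_chars):
    """Split text on the characters in delims.

    all_chars=True  stands in for parse/all: only delims split.
    all_chars=False stands in for plain parse: whitespace splits too.
    """
    seps = set(delims)
    if not all_chars:
        seps |= set(" \t\n")
    parts = []
    field = ""
    for ch in text:
        if ch in seps:
            # Without /all, runs of delimiters collapse, so empty fields
            # are dropped. Keeping them with /all is an assumption.
            if field or all_chars:
                parts.append(field)
            field = ""
        else:
            field += ch
    if field:
        parts.append(field)
    return parts


if __name__ == "__main__":
    sample = "NAME1=value one\nNAME2=value two"
    print(split_demo(sample, "=\n", all_chars=True))
    print(split_demo(sample, "=\n", all_chars=False))
```

With all_chars=True the values "value one" and "value two" stay whole; with all_chars=False the embedded spaces break them apart, which is why /all is needed for connection strings such as "DRIVER={SQL Server}".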
    
Try something simpler first:
    
>> parse "Test1=This Test2=That" "=^/"
== ["Test1" "This" "Test2" "That"]
    
or the string spanning two lines:
    
>> parse first [ {Test3=this
{        Test4=that}] "=^/"
== ["Test3" "this" "Test4" "that"]
    
Without the /all refinement, parse breaks on the =, the " " and the invisible newline, see:
    
>> newline
== #"^/"
    
Now, what happens with the /all refinement?
    
>> parse/all "Test1=This Test2=That" "=^/"
== ["Test1" "This Test2" "That"]
    
Because of /all, whitespace such as the space is no longer treated as a delimiter; only the characters in the rule string are. So parse breaks on the first =, which gives:
    
"Test1"
    
and then it passes over the space, because the rule string does not list it, and splits on the next equal sign, which gives:
    
"This Test2"
    
And then it reaches the end of the string (this input contains no newline), which gives:
    
"That"
    
With a string spanning lines, what happens?
    
>> parse/all first [ {Test3=this
{        Test4=that}] "=^/"
== ["Test3" "this " "Test4" "that"]
    
REBOL using parse breaks on =
    
"Test3"
    
and then on newline
    
"this "
    
and then on =
    
"Test4"
    
and then at the end of the string
    
"that"


posted by:   Stone Johnson     24-Aug-2019/15:20:25-7:00