Parsing a COBOL word
I am reviewing some old code in light of some new understanding of parsing. I have a brute-force function that checks a string to see if it is a valid COBOL word, that is, starts with a letter, contains only letters, digits, or the hyphen, and is no more than 30 characters long.
I am trying to perform that check using parse, and I think I have got it, except for the length restriction of 30 characters. I am wondering if a parse rule can check the parsed data for its length, or if that check is best done elsewhere. In other words, check for the maximum length, and THEN parse it for valid characters.
Thank you. Code sample follows.
REBOL [Title: "Test for COBOL word"]
LETTER: charset [#"A" - #"Z"]
DIGIT: charset [#"0" - #"9"]
VALIDCHARACTER: [some LETTER | some DIGIT | #"-"]
COBOLWORD: [1 LETTER some VALIDCHARACTER]
print parse "123456" COBOLWORD ;; should be false; starts with number
print parse "ABCDEF" COBOLWORD ;; should be true; all letters
print parse "A-1-STEAK-SAUCE" COBOLWORD ;; should be true; starts with letter
print parse "4runner" COBOLWORD ;; should be false; starts with number
print parse "$ average" COBOLWORD ;; should be false; invalid character
print parse "A----BCDE" COBOLWORD ;; should be true; multiple - allowed
posted by: Steven White 6-Jul-2018/14:54:45-7:00
COBOLWORD: [1 LETTER
and [not 30 VALIDCHARACTER]
posted by: Giulio Lunati 6-Jul-2018/18:56:34-7:00
When you say `<integerA> <integerB> rule`, that means "between A and B matches of the rule".
1 LETTER 1 29 [LETTER | DIGIT | "-"]
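As a small illustration of the range form described above, here is a sketch using a throwaway rule that is not part of the original question. It only shows how the two integers bound the number of matches.

```rebol
;; <min> <max> rule: match the rule at least min and at most max times
print parse "AAA" [1 3 "A"]    ;; should be true; three matches, within 1 to 3
print parse "AAAA" [1 3 "A"]   ;; should be false; a fourth "A" is left unmatched
print parse "" [1 3 "A"]       ;; should be false; at least one match is required
print parse "" [0 3 "A"]       ;; should be true; zero matches are allowed
```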
I would skip the separate definition of VALIDCHARACTER; it doesn't seem necessary. The name is also confusing: those characters are valid, but not at the beginning, so it should be called VALIDNOTFIRSTCHARACTER or something. Not naming it seems best.
Because I think one of the big points of the language is aesthetics over micro-optimization, I think parse rules should use string literals where possible. Character literals are ugly. Matching a single-character string is slightly slower than matching a character, but there's no compelling reason why the difference should be significant. YMMV.
posted by: Fork 8-Jul-2018/4:33:11-7:00
That would have to become
1 LETTER 0 29 [LETTER | DIGIT | "-"]
Because a single letter is also a perfectly valid name for a variable.
posted by: iArnold 8-Jul-2018/15:33:21-7:00
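Putting the replies together, a minimal sketch of the whole test might look like the following. It reuses the charsets from the original question and the `0 29` range from the last reply; the extra test strings and the name WORDCHARS are made up here for illustration.

```rebol
REBOL [Title: "Test for COBOL word, with length limit"]

LETTER: charset [#"A" - #"Z"]
DIGIT: charset [#"0" - #"9"]

;; one letter, then at most 29 more letters, digits, or hyphens (30 in total)
COBOLWORD: [1 LETTER 0 29 [LETTER | DIGIT | "-"]]

print parse "A" COBOLWORD                ;; should be true; a single letter
print parse "A-1-STEAK-SAUCE" COBOLWORD  ;; should be true; starts with letter
print parse "123456" COBOLWORD           ;; should be false; starts with number
print parse "$ AVERAGE" COBOLWORD        ;; should be false; invalid character
print parse "ABCDEFGHIJKLMNOPQRSTUVWXYZ-123" COBOLWORD   ;; should be true; exactly 30 characters
print parse "ABCDEFGHIJKLMNOPQRSTUVWXYZ-1234" COBOLWORD  ;; should be false; 31 characters

;; the alternative raised in the question: check the length first, then parse
;; WORDCHARS (hypothetical name) places no limit on the number of characters
WORDCHARS: [LETTER any [LETTER | DIGIT | "-"]]
word: "A-1-STEAK-SAUCE"
print all [(length? word) <= 30  parse word WORDCHARS]   ;; should be true
```

Both approaches should accept the same strings. The range keeps the whole definition in one rule, while the separate length test keeps the rule simpler and makes the limit easy to change.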