


parse question - how do I extract date and time from a string?

I am reading every line from a log file. It has lines like
' processA start 12/03/2016 12:30:02'
' processZZZZ start 12/03/2016 12:32:56'
    
The lines are not all the same length.

How do I parse each line to extract the date into one variable and the time into another?

I have never used parse before, so I gave it a try, but couldn't figure out how to get it to work.
    
m_input: ' process start 12:30:12 2/10/2016 '
== ' process start 12:30:12 2/10/2016 '
>> (escape)
>> parse m_input [copy zzz time! ]
== false
>> probe zzz
' '
    


posted by:   yuem     17-Jul-2016/12:07:33-7:00



I will take a look at the parse approach when I get a few moments. Assuming the date and time part of your log is always the last 18 characters of the line, my first idea would be to go to the tail of each line and jump back 20 chars.
    
>> m_input: trim " process start 12:30:12 2/10/2016 "
== "process start 12:30:12 2/10/2016"
>> at (tail m_input) -18
== "12:30:12 2/10/2016"
    
    
    


posted by:   Edoc     17-Jul-2016/14:27:50-7:00



In case you didn't catch my typo, that should read: "jump back 18 chars"

posted by:   Edoc     17-Jul-2016/14:29:20-7:00



If the dates are not in American format, it should be relatively straightforward. This is one approach:
    
Build up a valid vocabulary that positively matches dates or times.
    
------------------------------------------
digit: charset "0123456789"
    
date: [1 2 digit "/" 1 2 digit "/" 4 digit]
time: [1 2 digit ":" 2 digit opt [":" 2 digit]]
------------------------------------------
    
Then assign a word to other non-space values:
    
------------------------------------------
other: complement charset " ^-^/"
------------------------------------------
    
And we're ready to go:
    
------------------------------------------
sample: [
    "processA start 12/03/2016 12:30:02"
    "processZZZZ start 12/03/2016 12:32:56"
]
    
collect [
    foreach item sample [
        keep/only collect [
            parse item [ ; PARSE/ALL for Rebol 2
                any [
                     copy part date (keep to date! part)
                    | copy part time (keep to time! part)
                    | some other
                    | some " "
                ]
            ]
        ]
    ]
]
------------------------------------------
    
There are a few flaws in the above method: for one, there's no check that a time or a date is delimited by a space, but it shouldn't be too hard an exercise to add this check.
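
One way to add that check, as a minimal sketch: only keep a date or time if a space or the end of the input follows it. The word BOUNDARY is a name made up for this example; the other rules are the ones defined above, repeated so the snippet runs on its own.

------------------------------------------
digit: charset "0123456789"
date: [1 2 digit "/" 1 2 digit "/" 4 digit]
time: [1 2 digit ":" 2 digit opt [":" 2 digit]]
other: complement charset " ^-^/"

; hypothetical helper: a match only counts if a space or the end follows it
boundary: [" " | end]

collect [
    parse "processA start 12/03/2016 12:30:02" [ ; PARSE/ALL for Rebol 2
        any [
              copy part date boundary (keep to date! part)
            | copy part time boundary (keep to time! part)
            | some other
            | some " "
        ]
    ]
]
------------------------------------------

With this in place, a token such as 12/03/2016xyz falls through to SOME OTHER instead of yielding a date. The leading edge needs no extra rule, because SOME OTHER always consumes a whole token.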

posted by:   Chris     17-Jul-2016/17:55:05-7:00



Mixed approach                                                                    
                                                                                    
Always four items per line?                                                                                                                                
    
foreach line read/lines %log.txt [                                                
     set [date time] load form skip parse line " " 2                        
     print [date time]                                                        
]                                                                                
                                                                                    
Otherwise maybe                                                                
    
foreach line read/lines %log.txt [                                                
     line: find/tail parse line " " "start"                                    
     set [date time] load form copy/part line 2                                
     print [date time]                                                        
]                                                                                
                                                                                    
Assumes d/m/y date format!!!


posted by:   Bert     17-Jul-2016/22:52:36-7:00



If the dates are in dd/mm/yyyy format, and the leading text is a valid Rebol value, then it's trivial to do just a "to block!" on each line.
    
to block! {processA start 12/03/2016 12:30:00
processZZZZ start 12/03/2016 12:32:56
     }
    
== [processA start 12-Mar-2016 12:30
     processZZZZ start 12-Mar-2016 12:32:56
]

posted by:   Graham     18-Jul-2016/6:21:02-7:00



Yes, and in that case it should be possible to do:
data: load %log.txt

posted by:   Bert     18-Jul-2016/12:20:24-7:00



Thanks for all your responses. I will give it a try.

The actual message looks like this (I have hidden the real system name). It's in a big log file.
    
tid=493129247. PROCESSA Authorization Verification Starting. startTime: 2016-06-03 11:27:42.434, "
tid=493129247. System B call begin. startTime: 2016-06-03 11:27:42.435, "
tid=493129247. From System B :soapenv:Client.XACMLFailure-XACML authorization failure in method authorize. startTime: 2016-06-03 11:27:42.435, endTime: 2016-06-03 11:27:42.510, "
tid=493129247. System C Authorization Failed. startTime: 2016-06-03 11:27:42.434, endTime: 2016-06-03 11:27:42.510, "
    
I want to extract the information into an Excel spreadsheet with 3 columns:
system name | starttime | endtime

so that we can analyse the time spent in each system.
I was hoping that parse would have a simple way to get the data I need out, rather than me having to write complicated logic to count bytes and positions.

posted by:   yuem     18-Jul-2016/22:39:44-7:00



Maybe not what you want. Tested with R3.
    
REBOL []
    
log: read/string %log.txt
    
print-line: to-paren [
     stime/11: #"/" all [etime etime/11: #"/"]
     print
         format [50 30 30 30]
         reduce [
             msg stime etime
             all [etime difference to-date etime to-date stime]
         ]
]
    
parse/all log [
     some [
         thru "tid=" thru ". " copy msg to "."
         thru "startTime: " copy stime to ", "
         (etime: none)
         opt [", endTime: " copy etime to ","]
         print-line
     ]
]


posted by:   Bert     19-Jul-2016/23:54:09-7:00



Hmm printf...

posted by:   Bert     20-Jul-2016/0:07:41-7:00



Thanks Bert. Your code is very useful.

One thing I don't understand is: format [50 30 30 30]
I searched the site, but could not find any description of a format command.

posted by:   yuem     20-Jul-2016/19:43:48-7:00



Should have been
    
printf [50 30 30] reduce [
     msg stime etime
     all [etime difference to-date etime to-date stime]
]
    
Formatted print. Not in my (outdated) R2.


posted by:   Bert     20-Jul-2016/20:34:27-7:00



Thanks Bert.
I have downloaded R3 to try your example; I was still using R2.

I am actually learning quite a bit about parse, from the example you showed and from what the other Rebolers posted on this thread.
    
Thanks a lot to you all.

posted by:   yuem     22-Jul-2016/22:55:46-7:00



"load" is mentioned above. Of course it should be "to block!"


posted by:   Bert     6-Aug-2016/16:13:56-7:00



The useless parse code (19-Jul-2016/23:54) is also incorrect. Please disregard.
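
For reference, parsing one line at a time is easier to get right than a single pass over the whole file. A minimal sketch for Rebol 2, assuming every line has the shape of yuem's sample (some text, then ". startTime: ", then an optional "endTime: "). The file name %log.txt and the tab-separated output are assumptions as well.

------------------------------------------
foreach line read/lines %log.txt [
    msg: stime: etime: none
    ok: parse/all line [
        thru "tid=" thru ". "
        copy msg to ". startTime: "      ; the text may itself contain periods
        thru "startTime: " copy stime to ","
        opt [thru "endTime: " copy etime to ","]
        to end
    ]
    if ok [
        ; one row per line: message, start time, end time (empty if absent)
        print rejoin [msg tab stime tab any [etime ""]]
    ]
]
------------------------------------------

Lines that do not match are skipped. The printed rows are tab-separated, so they can be pasted into a spreadsheet as three columns.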

posted by:   Bert     10-Aug-2016/13:22:06-7:00



Neither "load" nor "to block!" is safe to use on untrusted data unless you do careful checks, which are missing in post #5. This is also discussed in the "Accessing the REBOL header" thread.
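
As an illustration of the kind of check meant here, a minimal sketch: trap syntax errors with ATTEMPT, then accept the block only if it has exactly the expected shape. SAFE-FIELDS is a made-up name, and the shape (some words, then a date, then a time) is an assumption taken from the sample lines. It does not claim to cover every risk.

------------------------------------------
safe-fields: func [
    {Return the values loaded from LINE, or none unless they are words + date + time}
    line [string!]
    /local blk
][
    blk: attempt [to block! line]    ; none on a syntax error
    all [
        block? blk
        parse blk [some word! date! time!]
        blk
    ]
]

safe-fields "processA start 12/03/2016 12:30:02"   ; well-formed line
safe-fields "processA start [oops"                 ; unbalanced bracket
------------------------------------------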


posted by:   jj     16-Sep-2016/2:52:01-7:00



Given by you (the problem):
    
    
parse question - how do I extract date and time from a string?
    
I am reading every line from a log file. it has details like
' processA start 12/03/2016 12:30:02'
' processZZZZ start 12/03/2016 12:32:56'
    
Solution:
    
data: {processA start 12/03/2016 12:30:02
processZZZZ start 12/03/2016 12:32:56}
    
;; basic sets
letter: charset [#"A" - #"Z" #"a" - #"z"]
figure: charset [#"0" - #"9"]
dash: [#"-" | #"—"]   ;; either kind of dash, not one after the other
    
;; basic symbols
slash: #"/"
colon: #":"
    
;; token rules
¿word: [some letter]
¿date: [
    1 2 figure [ slash | dash ] 1 2 figure [ slash | dash ] 2 4 figure
]
¿time: [1 2 figure colon 1 2 figure colon 1 2 figure]
    
;; record layout
;; word is 1 or more times
;; account for new line
    
record: [some ¿word
    copy mx ¿date (append out mx)
    copy mx ¿time (append out mx)
    opt newline    ;; the last record has no trailing newline
    ]
    
;; rule
;; 1 or more times
rule: [some record]
    
;; for the entire string of data
;; match records
    
parse data rule
    
    
Explanation:
    
1) Assumes that you READ in the log file as a string!
    
Hint:
    
You should have a parsesets.r file that loads on start up so you need not define these each time.
    
    
    


posted by:   Time Series Lord     24-Sep-2016/15:25:56-7:00



addendum:
    
;;data out block
out: copy []
    
;;put that above
parse data rule

posted by:   Time Series Lord     24-Sep-2016/15:27:42-7:00



;; Solution #2
    
Solution:
        
data: {processA start 12/03/2016 12:30:02
processZZZZ start 12/03/2016 12:32:56}
    
    
    
;; data out
out: copy []
data: parse/all data " ^/"    ;; split on spaces and newlines
forall data [
    attempt [if parse to-block data/1 [date!][append out to-date data/1]]
    attempt [if parse to-block data/1 [time!][append out to-time data/1]]
]
    
    
Note this interesting aspect of REBOL 2.7.8:
    
>> To-block ["12/03/2016"]
== ["12/03/2016"]
    
>> d: "12/03/2016"
== "12/03/2016"
    
>> to-block d
== [12-Mar-2016]

posted by:   Time Series Lord     24-Sep-2016/15:58-7:00



Final Thoughts:
    
First. Read my work: https://timeserieslord.github.io/red/
Check out the section titled: What is Your Function?
    
Learn about function chains and FFP. That is what Carl Sassenrath had in mind for you when he created REBOL.
    
Ban from your mind the idea that you need to process strings. Many compiler-based langs as well as VMs with langs force you to work with strings.
    
The REBOL VM does not. This is true for Red too.
    
Let REBOL do the work for you. Try to work with block!(s) and REBOL datatypes! exclusively. That should be your goal for every solution.
    
1) parse your log file into string! tokens
2) convert the string tokens into REBOL datatypes!
3) grab the ones you want.
    
Two functions defined below will let you do this:
    
>> outlog data
== [12-Mar-2016 12:30:02 12-Mar-2016 12:32:56]
    
I use a similar model with parsesets to handle any data, however messed up.
    
SOLUTION #3
    
tranny: func [
{transform a string into a REBOL datatype!}
value
/local
][
    
    any [
     attempt [if parse to-block value [date!][return to-date value]]
     attempt [if parse to-block value [time!][return to-time value]]
     ;; add other types here
     ]
]
    
    
outlog: func [
{ Return the log file cleaned up}
data [string!]
/local out
][
    
;; data out
out: copy []
    
;; into a block of stupid strings
data: parse/all data " ^/"    ;; split on spaces and newlines
    
;; into a block of smart REBOL datatypes!
forall data [
     append out tranny data/1
]
    
;; remove the nones
remove-each j out [none? j]
    
;; out for further processing
return out
]

posted by:   Time Series Lord     24-Sep-2016/16:35:12-7:00



TWEAK IT!
    
Let's say that you need it in two columns
    
;; rather than do this:
new-line/all/skip outlog data on 2
    
Let's refine outlog to do this:
    
>> outlog/longform data
== [
     12-Mar-2016 12:30:02
     12-Mar-2016 12:32:56
]
    
You could even have a /csv refinement so you could pull the cleaned up log into a spreadsheet.
    
    
    
outlog: func [
    { Return the log file cleaned up}
    data [string!]
    /longform
    /csv
    /local out
][
    
    ;; data out
    out: copy []
    data: parse/all data " ^/"    ;; split on spaces and newlines
    forall data [
        append out tranny data/1
    ]
    remove-each j out [none? j]
    
    ;; better than either [][]
    any [
        if longform [return new-line/all/skip out on 2]
        if csv [
            ;; for you to do
        ]
        ;; default
        return out
    ]
]
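
A possible starting point for the /csv branch, as a sketch only. PAIRS-TO-CSV is a hypothetical helper name, the file name %log.csv is an assumption, and the block is assumed to hold date and time values in pairs, as in the output above.

------------------------------------------
pairs-to-csv: func [
    {Join date/time pairs into CSV text, one pair per line}
    items [block!]
    /local text
][
    text: copy ""
    foreach [d t] items [
        append text rejoin [d "," t newline]
    ]
    text
]

;; usage, assuming OUTLOG and DATA from above
write %log.csv pairs-to-csv outlog data
------------------------------------------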
    
    
Read my work: https://timeserieslord.github.io/red/

posted by:   Time Series Lord     24-Sep-2016/17:07:19-7:00