


Accessing nested block data

I'm trying to process a file of LaTeX document index references in Rebol.
The input file consists of records like these:
    
    
\indexentry{loop}{22}
\indexentry{gui}{32}    
... etc ...
    
    
I extract the required text parts and put each entry into a block:
    
indexentry: [ ["loop" ["22"]] ["gui" ["32"]] .... ]
    
    
However, I need to find any duplicate entries as I process the input file
and simply update the inner block of page numbers.
    
    
>> do %scantexindex.r3
Script: "Edit LaTeX .idx file" Version: none Date: 12-Nov-2013
14-Nov-2013/18:10:17+11:00
++:whileTrue: 32
Reference: whileTrue:        PageNo: 32
++:Stream~File 33
Reference: Stream~File     PageNo: 33
++:FileStream 33
Reference: FileStream        PageNo: 33
++:nextPut: 33
Reference: nextPut:         PageNo: 33
++:FileStream 38
Reference: FileStream        PageNo: 38
++:Stream~File 201
Reference: Stream~File     PageNo: 201
    
I produce this :
    
    ["ReadStream~whileTrue:" ["32"]]
    ["Stream~File" ["33"]]
    ["FileStream" ["33"]]    
    ["nextPut:" ["33"]]
    ["FileStream" ["38"]]
    ["Stream~File" ["201"]]
    
    
But this is what I want to achieve :
    
["ReadStream~whileTrue:" ["32"]]
["Stream~File" ["33" "201" ]]
["FileStream" ["33" "38"]]
["nextPut:" ["33"]]
    
    
============================================================
I can do this manually ( in a console )
    
>> t: ["loop" ["22"]]    
== ["loop" ["22"]]
    
>> pick t/2 1
== "22"
    
>> append/only t/2 "77"
== ["22" "77"]
    
>> t
== ["loop" ["22" "77"]]
    
===========================================================
    
But I can't figure out how to access the separate fields in the
found entry when inside the foreach loop.
    
eg:
    
foreach [ entry ] indexlist
     [ if find entry "FileStream" [ prin "***** Found "
                        
                ;; here, I've found an entry, but can't update it !!!
                                
         ]
    ]
    
===================== complete program ============================
    
R E B O L [Title: "Edit LaTeX .idx file" Date: 12/11/2013 ]
    
; to use : ./rebol scantexindx.r3 > xxxx.txt
; then : move xxxx.txt to latex directory and rename as required
    
; eg:
; \indexentry{loop}{22} ---> loop 22
    
infile: read/lines/string %/home/brett/Saphir/smalltalk.idx
; -- short list
    
indexlist: copy [] ; create empty list
    
print now
    
foreach line infile [
     prin cr print line
     replace/all line "\indexentry" ""
     replace/all line "{" ""
     replace/all line "}" " "
        
     trim line     ; remove any leading/trailing spaces
        
     prin "++: " print reduce [ line ]
        
     pos: find line space
     ref: copy/part line pos
     rest: copy next pos
        
     prin "Reference: " prin ref prin tab prin"     PageNo: " print rest
        
     append/only indexlist remold [ ref reduce [ rest ] ]
        
] ; end major loop
    
print "" print indexlist    
    
print space print "***** search "
    
foreach [ entry ] indexlist
     [ if find entry "FileStream" [
                            prin "***** Found "
                            t: entry        
                         print t
         ]
    ]
==================== end program ==================================
***** search
***** Found ["FileStream" ["33"]]
***** Found ["FileStream" ["38"]]
== none
    
There is most likely a very obvious way to do this !! ??
    


posted by:   dragoncity       14-Nov-2013/7:12:34-8:00



Hi,
I assume that you have already prepared the first block, so here is the rest:
    
R E B O L []
    
;input values
b: [
    ["ReadStream~whileTrue:" ["32"]]
    ["Stream~File" ["33"]]
    ["FileStream" ["33"]]    
    ["nextPut:" ["33"]]
    ["FileStream" ["38"]]
    ["Stream~File" ["201"]]
]
    
;prepare a block
c: copy []
foreach v b [append c reduce [v/1 copy []]]
c: unique/skip c 2
    
;fill the block with necessary values
foreach v b [
    if p: find/tail c v/1 [
        append p/1 v/2/1
    ]
]
    
? c
halt


posted by:   Endo       14-Nov-2013/8:46:04-8:00



The result is:
C is a block of value: ["ReadStream~whileTrue:" ["32"] "Stream~File" ["33" "201"] "FileStream" ["33" "38"] "nextPut:" ["33"]]
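
Note that this result is a flat block of key / page-block pairs rather than the nested layout asked for in the question. A minimal sketch of converting one into the other, assuming the flat block shown above (the word `nested` is a hypothetical name, not part of the original script):

```rebol
; c as printed above: a flat block of key / page-block pairs
c: [
    "ReadStream~whileTrue:" ["32"]
    "Stream~File" ["33" "201"]
    "FileStream" ["33" "38"]
    "nextPut:" ["33"]
]

nested: copy []   ; hypothetical name for the nested result
foreach [key pages] c [
    ; wrap each pair in its own block
    append/only nested reduce [key pages]
]

foreach entry nested [probe entry]

; the flat form is still handy for lookups
probe select c "FileStream"
```

The flat form suits `find` and `select`; the nested form matches the record-per-entry layout in the question.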

posted by:   Endo       14-Nov-2013/8:46:44-8:00



Hehe, Endo beat me to it, and as always with a slick Rebolish answer. Here's my simple solution, including a loop to build the initial data block you presented (labeled 'foundlist here):
    
R E B O L []
foundlist: copy []
foreach [entry] indexlist [
     if find entry "FileStream" [
         print rejoin ["***** Found " entry]        
         append foundlist entry
     ]
]
keys: copy []
foreach line foundlist [append keys line/1]
keys: unique keys
finallist: copy []
foreach key keys [
     final: copy reduce [key copy []]
     foreach found foundlist [
         if found/1 = key [append final/2 found/2]
     ]
     append finallist final
]
probe finallist
halt

posted by:   Nick       14-Nov-2013/10:00:03-8:00



Thanks for the responses. Unfortunately, neither actually worked
using my data :-)
    
I was somewhat amused that I needed to actually transfer block data
into new blocks
as I expected to be able to work on the data 'in place' as it were.
ie: having found an item, update "here"
( But that's OK :-)
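
A minimal sketch of the "update here" idea, assuming every entry is a real block of the form [reference [pages]] (not a string). The words `ref`, `page` and `found` are hypothetical and only used for illustration:

```rebol
indexlist: copy/deep [
    ["Stream~File" ["33"]]
    ["FileStream" ["33"]]
]

ref: "FileStream"
page: "38"

; look for an existing entry with the same reference
found: none
foreach entry indexlist [
    if entry/1 = ref [found: entry]
]

either found [
    ; FOUND is the same inner block that sits in INDEXLIST,
    ; so appending here changes the list in place
    append found/2 page
][
    append/only indexlist reduce [ref reduce [page]]
]

probe indexlist
```

This works because `foreach` hands out the inner blocks themselves, not copies of them.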
    
    
The results from Nick's code were so weird, i.e. lots of empty values, e.g.
["" "" .... etc., that I chose to work on Endo's example for the moment.
    
    
Notice how different the data is at the "## X" and "## b" tags in the
"results of run" section below.
The ## X block is created by my code reading the input file; the ## b block is Endo's
internally defined block. They contain very similar textual data, but
they are actually quite different data formats!
    
    
So of course when I tell the code to use my data it fails. How do I
reformat my data into the internal block format so Endo's code will work?
    
    
========================================================================
R E B O L [Title: "Edit LaTeX .idx file - Endo version" Date: 12/11/2013 ]
    
; to use : ./rebol scantexindx.r3 > xxxx.txt
; then : move xxxx.txt to latex directory and rename as required
    
; eg:
; \indexentry{loop}{22} ---> loop 22
    
infile: read/lines/string %/home/brett/Saphir/smalltalk.idx
; -- short list
    
indexlist: copy [] ; create empty list
    
print now
    
foreach line infile [
     prin cr print line
     replace/all line "\indexentry" ""
     replace/all line "{" ""
     replace/all line "}" " "
        
     trim line     ; remove any leading/trailing spaces
        
; prin "++: " print reduce [ line ]
        
     pos: find line space
     ref: copy/part line pos
     rest: copy next pos
        
;    prin "Reference: " prin ref prin tab prin"     PageNo: " print rest
        
;;    append/only indexlist [ ref reduce [ rest ] ]
     append/only indexlist remold [ ref reduce [ rest ] ] ; original
    
] ; end major loop
    
prin " ## X " print indexlist    
    
        
;input values set up by "endo" ( a forum person :-)
b: [
     ["ReadStream~whileTrue:" ["32"]]
     ["Stream~File" ["33"]]
     ["FileStream" ["33"]]    
     ["nextPut:" ["33"]]
     ["FileStream" ["38"]]
     ["Stream~File" ["201"]]
]
    
; input value set up by my program ( brett )
;;; b: copy reform [ indexlist ]    
    
prin " ## b " print b
        
;prepare a block
c: copy []
foreach v b [append c reduce [v/1 copy []]]
c: unique/skip c 2
        
;fill the block with necessary values
foreach v b [
     if p: find/tail c v/1 [
         append p/1 v/2/1
     ]
]
        
? c
    
================= results of run ========================
    
    
>> do %scantexindex-endo.r3
Script: "Edit LaTeX .idx file - Endo version" Version: none Date: 12-Nov-2013
15-Nov-2013/17:12:17+11:00
\indexentry{whileTrue:}{32}
\indexentry{Stream~File}{33}
\indexentry{FileStream}{33}
\indexentry{nextPut:}{33}
\indexentry{FileStream}{38}
\indexentry{Stream~File}{201}
## X ["whileTrue:" ["32"]] ["Stream~File" ["33"]] ["FileStream" ["33"]] ["nextPut:" ["33"]] ["FileStream" ["38"]] ["Stream~File" ["201"]]
## b ReadStream~whileTrue: 32 Stream~File 33 FileStream 33 nextPut: 33 FileStream 38 Stream~File 201
C is a block of value: ["ReadStream~whileTrue:" ["32"] "Stream~File" ["33" "201"] "FileStream" ["33" "38"] "nextPut:" ["33"] "FileStream" [] "Stream~File" []]
>>
    
=================== my example data file ( smalltalk.idx )===============
\indexentry{whileTrue:}{32}
\indexentry{Stream~File}{33}
\indexentry{FileStream}{33}
\indexentry{nextPut:}{33}
\indexentry{FileStream}{38}
\indexentry{Stream~File}{201}
==========================================


posted by:   dragoncity       15-Nov-2013/1:16:32-8:00



Nailed it! Figured out how to match the file-created block to what Endo's code needed.
It was the result of lots of trial & error, not exactly straightforward !!!
    
The double REDUCE, in place of the original REMOLD data formatting,
solved the problem.
    
===============================    
     append/only indexlist reduce [ ref reduce [ rest ] ]
        
     ; create block 'like' internally defined block without enclosing quotes
     ;eg: whileTrue: 32 Stream~File 33 | NOT | "whileTrue:" "32" "Stream~File" "33"
    
============================================
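
A short sketch of why that change matters, assuming the same `ref` and `rest` strings as in the script. REMOLD is MOLD applied to the result of REDUCE, so it yields a string that only looks like a block, while REDUCE alone yields a real nested block. The words `as-text` and `as-block` are hypothetical names:

```rebol
ref: "loop"
rest: "22"

as-text: remold [ref reduce [rest]]    ; molded text, i.e. a string
as-block: reduce [ref reduce [rest]]   ; a real nested block

probe string? as-text     ; true
probe block? as-block     ; true

; path access only reaches the page block in the block version
probe as-block/2          ; ["22"]
```

With the string version, `v/1` and `v/2/1` pick out characters rather than the reference and page number, which would explain why the later code failed on the file data.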
I'm not doubting the value of Rebol , but its data handling is a bit weird :-)
    
Thanks again for your help.


posted by:   dragoncity       15-Nov-2013/6:14:49-8:00



I probably wasn't clear enough about the first part of my code. It *creates* the data block that Endo provided for you:
        
R E B O L []
foundlist: copy []
foreach [entry] indexlist [
     if find entry "FileStream" [
         print rejoin ["***** Found " entry]        
         append foundlist entry
     ]
]
    
This part of the code does the exact same thing as Endo's example. My block labeled 'foundlist is the same as his block labeled 'b - the code above creates that block, from the raw data example that you provided. That's probably why you saw a bunch of empty values - the foundlist block was empty if you didn't provide it your initial raw dataset:
    
R E B O L []
foundlist: [
     ["ReadStream~whileTrue:" ["32"]]
     ["Stream~File" ["33"]]
     ["FileStream" ["33"]]    
     ["nextPut:" ["33"]]
     ["FileStream" ["38"]]
     ["Stream~File" ["201"]]
]
keys: copy []
foreach line foundlist [append keys line/1]
keys: unique keys
finallist: copy []
foreach key keys [
     final: copy reduce [key copy []]
     foreach found foundlist [
         if found/1 = key [append final/2 found/2]
     ]
     append finallist final
]
probe finallist
halt

posted by:   Nick       15-Nov-2013/8:13:08-8:00



PS - I used your example with the 'indexlist label to create the first part of the code above, so that you could just drop it into your existing code (I tried to do a bit more automation for you than in Endo's example). If your data formatting changed from the example you gave, of course, it won't work properly:
    
foreach [ entry ] indexlist
     [ if find entry "FileStream" [
                             prin "***** Found "
                             t: entry        
                         print t
         ]
     ]

posted by:   Nick       15-Nov-2013/8:21:54-8:00



BTW, it looks like you're going to a lot of extra effort converting your original .idx file, where you could just use parse. The following script does the entire process for you:
    
R E B O L []
foundlist: copy []
foreach line read/lines %smalltalk.idx [append/only foundlist parse line "{}"]
keys: copy []
foreach line foundlist [append keys line/2]
keys: unique keys
finallist: copy []
foreach key keys [
     final: copy reduce [key copy []]
     foreach found foundlist [
         if found/2 = key [append final/2 found/4]
     ]
     append finallist final
]
probe finallist
halt
    
Using shorter variable labels like Endo's:
    
R E B O L []
b: copy [] foreach l read/lines %smalltalk.idx [append/only b parse l "{}"]
k: copy [] foreach l b [append k l/2] k: unique k
f: copy [] foreach v k [
     n: copy reduce [v copy []]
     foreach d b [if d/2 = v [append n/2 d/4]] append f n
]
editor f
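
A small sketch of the splitting step those scripts rely on, assuming the string-delimiter form of PARSE behaves as it does in the scripts above (this may differ between Rebol versions). The word `parts` is a hypothetical name:

```rebol
line: "\indexentry{FileStream}{38}"

; split the line wherever a "{" or "}" occurs
parts: parse line "{}"

probe parts      ; the full list of pieces
probe parts/2    ; the reference, as used by line/2 above
probe parts/4    ; the page number, as used by line/4 above
```

Probing the whole `parts` block first is a quick way to check which positions hold the reference and the page number on a given interpreter.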

posted by:   Nick       15-Nov-2013/12:44:03-8:00



And here's your entire program using Endo's example, with parse:
    
R E B O L []
b: copy [] foreach line read/lines %smalltalk.idx [append/only b parse line "{}"]
c: copy [] foreach v b [append c reduce [v/2 copy []]]
c: unique/skip c 2    ; c holds key/block pairs, so each record is 2 values wide
foreach v b [if p: find/tail c v/2 [append p/1 v/4]]
editor c

posted by:   Nick       15-Nov-2013/12:59:48-8:00



Either of those previous examples reads your smalltalk.idx file, parses it, and formats into blocks as you requested.

posted by:   Nick       15-Nov-2013/14:38:30-8:00



R E B O L [ Title: "scantexindex-nick3"]
    
; a working version from Nick, using file input instead of internal data block
    
print "**input text file of : \indexentry{nextPut:}{pageno} ...."
foundlist: copy []
foreach line read/lines %smalltalk.idx [append/only foundlist parse line "{}"]
prin "%%read:" print foundlist
    
keys: copy []
foreach line foundlist [append keys line/2] ; build key ( names ) list
keys: unique keys
prin "@@keys:" print keys
    
finallist: copy []
foreach key keys [                                    ; use key block
     final: copy reduce [key copy []]             ; to
     foreach found foundlist [                     ; scan foundlist block
         if found/2 = key [append final/2 found/4] ; building (pageno) block
     ]                                        
     append finallist final                         ; making finallist block
]
prin "&&probe:" probe finallist
prin "$$print:" print finallist
halt
        


posted by:   dragoncity       15-Nov-2013/21:59:16-8:00



Brilliant !
    
Thanks Nick & Endo,
I had attached this text to the revised program, above, that I sent, but it did not seem to go?
    
You & Endo have done a nice job of showing off Rebol.
    
Esp. how you reduced my 13 lines of code to process the incoming text records to ONE !!
    
I have added a few PRINT commands to display the effects of the data passing through the
code to its intended end, should anybody be interested. This example will find
its way into my Collected Notes on Rebol.
    
Endo's 6 line result is an amazing demo of the power of Rebol & the need to understand
how to manipulate Rebol Blocks.
    
BTW: this rebol program will replace a 3 x A4 page Ada program which I wrote a few years ago and does much the same thing.
    
Thanks Again.

posted by:   dragoncity       16-Nov-2013/1:54:07-8:00



Another, slightly different version with the code nested in a PARSE block; it seems to work!
    
R E B O L [
    Title: "scantexindex-dreamyToto"
]
    
index: copy []
result: copy []
foreach line read/lines %smalltalk.idx [
    parse line [thru "\indexentry{" copy key to "}" thru "{" copy page-no to "}" to end (
        page-no: to-integer page-no
        print rejoin ["[PARSED] key=" key " - page-no=" page-no]
        either found: find/skip index key 2 [
             append found/2 page-no
        ][
             blk: reduce [key (copy reduce [page-no])]
             append index blk
             append/only result blk
        ]
     )
    ]
]
probe result
    


posted by:   dreamyToto       17-Nov-2013/13:39:52-8:00



Hi dreamyToto,
nice solution. It's actually more in the style of my Ada program, with the either ..else.. coding, which was how I was developing my program, compared to Nick & Endo's more Rebol'ish solutions.
    
It's interesting to see your use of Parse.
    
Thanks for your interest.

posted by:   dragoncity       17-Nov-2013/23:16:41-8:00



@dragoncity
Thank you. I should do that more often. It's a way for me to progress; I don't have many occasions to use Rebol...
This code came to me like this, with the re-use of the same block of page numbers between the "index" block (for searching) and the "result" block !
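
A minimal sketch of that block sharing, using made-up sample values. REDUCE does not copy series values, so the same page block can sit in both `index` and `result`:

```rebol
pages: copy [32]   ; hypothetical word for the shared page block
index: reduce ["whileTrue:" pages]
result: copy []
append/only result reduce ["whileTrue:" pages]

; update through the flat search block...
found: find/skip index "whileTrue:" 2
append found/2 33

; ...and the change shows up in the nested result as well
probe result   ; [["whileTrue:" [32 33]]]
```

The flat `index` is only there to make `find/skip` easy; `result` keeps the nested layout for output.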
    
It's Rebol code, don't know if it's Rebol'ish ! That's an interesting notion : is my code Rebol'ish ? Is it written respecting Rebol spirit ? I cannot reply myself !


posted by:   dreamyToto       18-Nov-2013/15:01:22-8:00



@dreamyToto
To me, Rebol's "spirit" is about keeping things simple. Rebol/core code is characterized by series manipulations, unnecessary syntactic cruft is avoided, and rebolers tend to craft short, readable solutions. Your code looks like Rebol to me because there are some familiar code patterns, and the whole parse evaluation, I think, is itself a Rebol'ism. You collapse some potential multiple lines of code into one, which is another Rebol'ism (doing any more of that in your code would just reduce readability). At first glance your parse approach can't be reduced to any simpler algorithm, so yes, it's Rebol'ish I think. To me, Endo's code seemed closest to what I think of as Rebol's "spirit" for short core examples like this, because he used refined series functions to craft the most concise solution possible.

posted by:   Nick       18-Nov-2013/16:38:36-8:00



I think for bigger problems, Rebol's goal is to enable users to build dialects for other users, which make use of the simple core API, series structure, native data types, etc., using the ability of parse to define elegant and simple language models which solve a given problem, without extra syntactic cruft. I don't think this has been explored nearly to the degree Carl had intended, in his initial design of the language. If users could grasp how powerful this concept is at simplifying code patterns, I think Rebol-like languages would enjoy a much more important place among modern development tools.
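
As a rough illustration of the dialect idea, here is a made-up mini-dialect for index entries, handled with block PARSE. The `entry` keyword and the words `spec`, `index` and `ok?` are invented for this sketch and are not part of any existing Rebol dialect:

```rebol
; made-up dialect: ENTRY <string> <integer>
spec: [
    entry "FileStream" 33
    entry "Stream~File" 33
    entry "FileStream" 38
]

index: copy []
ok?: parse spec [
    any [
        'entry set key string! set page integer! (
            either found: find/skip index key 2 [
                append found/2 page                      ; known key: add the page
            ][
                append index reduce [key reduce [page]]  ; new key: start a page block
            ]
        )
    ]
]

probe ok?      ; true
probe index    ; ["FileStream" [33 38] "Stream~File" [33]]
```

The data reads like a small language of its own, and the grammar and the merging logic fit in a single PARSE rule.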

posted by:   Nick       19-Nov-2013/4:45:17-8:00



@Nick
I like the fact that you can do the job using only 1 loop, the one reading the file line by line.
    
If the file is not too big, you can even remove the "foreach" loop and use the "any" keyword in PARSE, like this:
    
R E B O L [
    Title: "scantexindex-parse-any-dreamyToto"
]
    
index: copy []
result: copy []
file-content: to-string read %smalltalk.idx    
parse file-content [any [thru "\indexentry{" copy key to "}" thru "{" copy page-no to "}" to newline (
     page-no: to-integer page-no
     print rejoin ["[PARSED] key=" key " - page-no=" page-no]
     either found: find/skip index key 2 [
            append found/2 page-no
     ][
            blk: reduce [key (copy reduce [page-no])]
            append index blk
            append/only result blk
     ]
    )
] ]
probe result
    
Concerning dialects, I dream of a big software system like SAP written only in Rebol or Red (maybe more Red, because it targets better server performance under heavy load and it will have concurrency/multi-tasking). You could exchange Rebol blocks containing business dialects (DSL) between modules (ordering, invoicing, accounting,...) ! Could be great ! Let's dream !
    


posted by:   dreamyToto       19-Nov-2013/14:02:54-8:00



If only we could see the kind of interest develop around Red, now, which developed around Rebol during the early 2000's, the industry could see developer productivity increase in a big way. Rebol failed commercially largely because it was closed source, and the community was ignored by Carl for several long periods. I think Red will be in a great position to gain some traction during the next year. We need to keep Doc funded :)

posted by:   Nick       19-Nov-2013/16:53:07-8:00