Website error plus answer to another question

I tried to post an answer to my own question from another message and got:
    
** Script Error: append expected series argument of type: series port ** Near: append bbs/:topicnumber submitted-message append bbs/:topicnumber
    
In addition, here is the answer I was trying to post. I hope nobody takes too much time to try to answer it for me.
    
Never mind, answered my own question, plus one I didn't bother to ask myself.
    
After I submitted the above, I thought a bit more and came up with one variation I had not tried, and it worked. The answer is indeed the "load" function placed here:
    
set DATA-NAME load pick DATA-VALUES VALUE-COUNTER
    
Note the addition of the "load" function.
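As a minimal sketch of what that line does (the word names and sample values below are made up for illustration, not taken from my actual script), each string is passed through "load" before "set" assigns it:

```rebol
; Hypothetical sample data: one row of CSV fields, still held as strings.
data-names:  [item qty shipped]
data-values: ["widget" "12" "25-Mar-2014"]

value-counter: 1
foreach data-name data-names [
    ; "load" turns the string into a REBOL value before "set" assigns it.
    set data-name load pick data-values value-counter
    value-counter: value-counter + 1
]

print type? qty      ; integer
print type? shipped  ; date
print type? item     ; word
```

The unquoted text "widget" comes back as a word rather than a string, which is the sort of surprise a loosely formatted file can cause.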
    
However, if the CSV file is not well-formatted, that is, with quotes around things that are supposed to be strings, dates formatted correctly, and so on, then "loading" what is in the file gives bad results. In other words, the thing that I was trying to do is something I don't really want to do. I was hoping to make a general-purpose CSV module so that if anyone gave me any CSV file I could do stuff with it. The problem is that I can't count on any arbitrary CSV file from any arbitrary person having data that makes sense. Better, I think, to take everything in strings, initially, to see what I might be up against.
    
At least the writing out of my problem helped me clarify it. Thank you.
    
By the way, the ability to do this, that is, create words with values at run time, is a really cool feature.

posted by:   Steven White       25-Mar-2014/11:25:19-7:00



Steve, have you seen http://www.rebol.org/view-script.r?script=csv-tools.r&sid=m382hp ? There have been a number of other scripts floating around over the years to help with CSV files. Ashley recently released munge, which can save to Excel or CSV format: http://www.dobeash.com/munge.html . If you search for CSV in http://business-programming.com you'll see several sections devoted to dealing with CSV files, and a bunch of more general info about dealing with tables of data.

posted by:   Nick       26-Mar-2014/16:02:31-7:00



I did read your business programming article about CSV files. I do appreciate the annotated scripts. I saw that the use of the foreach loop, as in
    
foreach [data-name-1 data-name-2] block-name-1 [
]
    
assigns values to data-name-1 and data-name-2, but I wanted to assign values to words without the loop function, for other uses I have in mind. I heard about "munge" through altme, but did not really follow what it was about until just now. Regarding csv-tools, I actually downloaded it some time ago, and tried it again a couple days ago, but was not able to see immediately how to use it or when I might want to use it. The documentation seems limited to comments in the code.
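On the loop-free assignment mentioned above: as far as I can tell, one way to do it is to give "set" a block of words and a block of values. A minimal sketch, with made-up sample values:

```rebol
; Hypothetical sample block holding two values.
block-name-1: ["first value" "second value"]

; Given a block of words, "set" assigns the values in order, with no loop.
set [data-name-1 data-name-2] block-name-1

print data-name-1
print data-name-2
```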
    
I am beginning to think I occupy the somewhat unique position of being smart enough to use REBOL but not smart enough to understand it. Figuring out what a program does from reading the code is not so easy for me. I notice this when I look at scripts by Carl. In my opinion there is a lot of genius out there that is lost or misunderstood because of lack of documentation. You are filling that need with the documentation you have produced.    
    
On a positive note, I am getting better. Just last week someone came by with a little programming problem, and I surprised myself by popping out a little custom REBOL application in about 20 minutes.    
    
Thank you.    
    
Maybe REBOL needs a marketing genius.

posted by:   Steven White       27-Mar-2014/10:54:57-7:00



I've written a few CSV parsers, including some for hairy circumstances in which CSV files were created from spreadsheets, and I've always just created them as needed for each situation. Human-readable section headers, notes and visual formatting sections, along with data entry errors (for example, letters included in SKU codes), mixed in with otherwise nicely formatted columns and rows of computable data, are the norm. Often there is some easily recognizable pattern within the mess of extraneous human-readable junk in spreadsheets. For example, if you regularly see a long string of text which matches a given pattern in the fourth column, where a date or other recognizable data format would otherwise be expected, you can likely write some simple code to remove such rows. You can even copy such header data and include it in each row of data. There's a short example of handling this sort of thing at http://business-programming.com/business_programming.html#section-17.8
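Here's a rough sketch of that kind of row filtering. The sample rows are made up, and it assumes the fourth column should always hold a date:

```rebol
; Hypothetical rows, as a CSV parser might return them (all fields as strings).
rows: [
    ["1001" "Widget" "12" "25-Mar-2014"]
    ["Notes" "" "" "Quarterly shipment summary for the east region"]
    ["1002" "Gadget" "7" "26-Mar-2014"]
]

; Work on a copy so the original block of rows is left untouched.
clean: copy rows

; Drop every row whose fourth column cannot be converted to a date.
; "attempt" returns none when the conversion fails.
remove-each row clean [not date? attempt [to-date row/4]]

print length? clean   ; 2
```

A real file would need its own test in place of the date check, but the shape of the code stays about the same.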
    
I've written several parsers which accepted CSV input created by third party apps which scraped data from PDF files. And often, the original PDFs in those situations were scans of randomly formatted documents which came from any number of totally different sources (in one case, shipment receipts from varied suppliers). In all those cases, I was able to extract usable columns of data with less than a page of code. In all the cases I've seen, there's always been some sort of simple logical way to determine where rows and columns of data values began and ended. Often, something as simple as a few extra blank lines, or a few characters of unique text, or some repetitive data format can be easily identified. Those sorts of problems just need to be approached on a case by case basis, and usually with a bit of common sense.
    
Brian's csv-tools.r is made to handle CSV data which is properly formatted according to http://tools.ietf.org/html/rfc4180 . It seems pretty simple to use:
    
     x: load-csv http://re-bol.com/Download.csv
     editor x
    
Aside from the long comments at the beginning of the script, you can see how to use the functions by reading each function definition. They're well documented:
    
load-csv: funct [
    "Load and parse CSV-style delimited data. Returns a block of blocks."
    [catch]
    source [file! url! string! binary! block!] "File or url will be read"
    /binary "Don't convert the data to string (if it isn't already)"
    /with "Specify field delimiter (preferably char, or length of 1)"
    delimiter [char! string! binary!] {Default #","}
    /into "Insert into a given block, rather than make a new one"
    output [block! list!] "Block returned at position after the insert"
    /part "Get only part of the data, and set to the position afterwards"
    count [integer!] "Number of lines to return"
    after [any-word! none!] "Set to data at position after decoded part"
]
    
The 'load-csv function accepts a CSV data source, which can be any of these types of values: [file! url! string! binary! block!]. It returns a block of blocks (1 block for every row of data in the CSV file).
    
There are some refinements (optional parameters) which can be included when running the function above: /binary /with /into /part.
    
For example, if you don't want the data in the CSV converted to string, and would rather leave the binary data as-is, use:
    
     load-csv/binary %filename.csv    
    
The /into option, as documented above, allows data to be inserted into an existing block (the 'output value is the name of that block):
    
     y: copy [["some existing data"]]
     load-csv/into http://re-bol.com/Download.csv y
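     editor y
    
The /part refinement requires 2 parameters: 'count (the number of lines to return) and 'after (a word which gets set to the data at the position after the decoded part). Does the format make sense?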
    
    


posted by:   Nick       29-Mar-2014/16:24:05-7:00



The source for all mezzanine functions in Rebol is documented the same way:
    
source extract
extract: func [
     {Extracts a value from a series at regular intervals.}
     [catch]
     series [series!]
     width [integer!] "Size of each entry (the skip)"
     /index "Extract from an offset position"
     pos "The position" [number! logic! block!]
     /default "Use a default value instead of none"
     value {The value to use (will be called each time if a function)}
     /into {Insert into a buffer instead (returns position after insert)}
     output [series!] "The buffer series (modified)"
     /local len val
][
     if zero? width [return any [output make series 0]]
     len: either positive? width [
         divide length? series width
     ] [
         divide index? series negate width
     ]
     unless index [pos: 1]
     either block? pos [
         if empty? pos [return any [output make series 0]]
         parse pos [some [number! | logic! | set pos skip (
                     throw-error 'script 'expect-set reduce [[number! logic!] type? get/any 'pos]
                 )]]
         unless into [output: make series len * length? pos]
         if all [not default any-string? output] [value: copy ""]
         if binary? series [series: as-string series]
         forskip series width [forall pos [
                 if none? set/any 'val pick series pos/1 [set/any 'val value]
                 output: insert/only output get/any 'val
             ]]
     ] [
         unless into [output: make series len]
         if all [not default any-string? output] [value: copy ""]
         if binary? series [series: as-string series]
         forskip series width [
             if none? set/any 'val pick series pos [set/any 'val value]
             output: insert/only output get/any 'val
         ]
     ]
     either into [output] [head output]
]

posted by:   Nick       29-Mar-2014/16:30:29-7:00



The above function requires 2 parameters: 'series (which must be a series! data type) and 'width (which must be an integer! data type):
    
     extract system/locale/months 2
    
The optional /index refinement requires an additional argument labeled 'pos:
    
     extract/index system/locale/months 2 2
    
The /default and /into refinements likewise each require one additional argument. /local just ensures that the values of the 'len and 'val words will only be changed locally within the function (not elsewhere, if defined globally).
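Here's a quick sketch of those two refinements, using made-up sample blocks. The expected results in the comments come from tracing the source above:

```rebol
; /default takes one extra argument: the value used in place of none.
; "reduce" turns the word none in the sample block into a real none value.
print mold extract/default reduce [1 2 none 4] 2 0      ; [1 0]

; /into takes one extra argument: the series to insert into.
out: copy []
extract/into [1 2 3 4] 2 out
print mold out                                          ; [1 3]

; With several refinements, the extra arguments follow in the same order
; as the refinements: first 'pos for /index, then 'value for /default.
print mold extract/index/default reduce [1 none 3 none] 2 2 0   ; [0 0]
```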
    
I hope that helps to make reading source a little simpler :)

posted by:   Nick       29-Mar-2014/16:39:42-7:00