Freeing memory

Not an urgent problem because memory is cheap and plentiful, but just wondering...
    
Let's say that I have a large text file and I want to make a lookup table out of some of the data in it.
    
I read the file into a block of lines with 'read/lines'. For example, 'RAW-DATA: read/lines %datafile.txt'.
    
Then I go through each line of RAW-DATA, extract some stuff, and store it in some other block to be used later. When I am done, the original file now somewhere in memory and referred to by 'RAW-DATA' is not needed.    
    
If I do something like 'RAW-DATA: none' or 'RAW-DATA: copy []' will that remove from memory all that un-needed data from %datafile.txt? Or doesn't that concept mean anything anymore?
    
Thank you.

posted by:   Steven White     24-Jan-2017/10:49:10-8:00



In Ren-C, from boot:
    
     >> recycle
     == 3716
    
This first recycle just cleans up some boot residue. In order to make booting faster, it doesn't necessarily recycle everything before presenting you the command prompt. The number returned is a count of GC'd "nodes", e.g. the data behind series/strings/API-managed value cells, etc.
    
Worth noting is that in order to run this command, a block was created called [recycle]. It was held live across the recycle command itself by the command prompt, and will not be a candidate for freeing.
    
     >> recycle
     == 1
    
We're now at a steady state. This command prompt has again made a block that contains [recycle], but the previous command prompt's [recycle] block is now available for freeing. So that's the 1 recycled node you are seeing.
    
     >> data: read/lines %test.txt
     == [
         "Line One"
         "Line Two"
     ]
    
     >> recycle
     == 20
    
The loading process itself uses GC'able nodes internally, above and beyond the data and command line you are getting.
    
Here the command line took up 4 series. One for the new word `data:`, one for the path `read/lines`, one for `%test.txt`, and another for the entire block [data: read/lines %test.txt].
    
The result itself holds onto 4 series: it keeps alive the word `data`, which was already counted above, and it generates a new block and two strings. So 3 new series, for 7 in all.
    
The recycle is able to get rid of 4 of those 7. That means (20 - 4 = 16) are internal temporaries that are part of the process of reading itself (file buffers, filenames translated into OS-native format, etc.)
    
     >> recycle
     == 1
    
Again at steady state, recycling the [recycle]. The `data:`, "Line One", "Line Two", and block are still live.
    
     >> data: blank
     == _
    
Since we've created no additional references to any of the items, this should release them.
    
     >> recycle
     == 5
    
This recycles the previous [recycle], as well as the 4 elements (data:, "Line One", "Line Two", and the block containing the strings)
    
     >> recycle
     == 1
    
Back to steady state, recycling the previous command line.
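
For the workflow in the original question, a minimal sketch might look like the following. It assumes Rebol 2 (where `parse` with a `none` rule splits a string on whitespace and `stats` returns the interpreter's memory figure); the file name %sample-data.txt, its contents, and the word names are hypothetical, chosen only to keep the example self-contained. The figure reported by `stats` may not fall by the full size of the data.

```rebol
REBOL [Title: "Release raw data after building a lookup table (sketch)"]

; Hypothetical sample file so the sketch can run on its own.
write %sample-data.txt "alpha 1^/beta 2^/gamma 3^/"

raw-data: read/lines %sample-data.txt

lookup: copy []
foreach line raw-data [
    fields: parse line none              ; Rebol 2: split on whitespace
    if not empty? fields [
        append lookup copy first fields  ; keep only the first field
    ]
]

before: stats      ; memory figure while the lines are still referenced
raw-data: none     ; drop the only reference to the block of lines
recycle            ; ask the garbage collector to run now
after: stats

print ["before:" before "after:" after]
probe lookup
```

The extracted strings in `lookup` are separate series from the lines in `raw-data`, so clearing `raw-data` does not affect the lookup table.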
    
It should be noted that with Rebol's memory implementation, recycling returns the memory to the memory pools...not to the OS:
    
https://en.wikipedia.org/wiki/Memory_pool
    
To actually return the memory to the OS would require the pool to be empty. There could be a command (much like an old school hard-drive defragmenter) that would compact the memory and update pointers that referred to that memory, which would be feasible but non-trivial.
    
Note: R3-Alpha will give similar results to the above, though it runs a recycle directly before giving you the command line...the numbers are lower as words are not GC'd nor are API handles. Sometimes it will apparently not recycle the command line, or...something, e.g.:
    
     >> recycle
     == 2
    
     >> recycle
     == 0
    
     >> recycle
     == 1
    
     >> recycle
     == 0
    
     >> do [recycle]
     == 2
    
     >> recycle
     == 1
    
     >> recycle
     == 1
    
     >> recycle
     == 0
    
The curious are invited to try and figure out why.
    
Red has no GC and I would predict it being non-trivial to add one after the fact. It's disappointing to me that GC'ing words has never been a priority, as I feel dialect designers should not feel worried to use either strings or words in long running processes.

posted by:   Fork     24-Jan-2017/12:49:29-8:00



Correction on the above case with:
    
     >> data: blank
     == _
    
     >> recycle
     == 5
    
What I should have said was:
    
> This recycles the previous [recycle], as well as the 3
> elements ("Line One", "Line Two", and the block containing
> the strings), as well as the previous command line [data: blank]
    
That is, this is not an example of a WORD! being recycled. Firstly, the word "data" is already accounted for in the system somewhere as a function parameter or field. But secondly, it hasn't gone away entirely...it was just set to blank, so the key is still there.
    
When typing at the command line, you often don't see word recycling currently. That's because in the REPL, every unknown word gets added to the user context, and stays there permanently.
    
(This isn't how modules work, and arguably shouldn't be how the console works either.)
    
But to show it does work, this debug-build feature using RECYCLE/VERBOSE will PROBE() each series--prior to freeing any of them:
    
     >> data: reduce [make set-word! "bargle" make word! "nawdle" make issue! "vous"]
     == [bargle: nawdle #vous]
    
     >> recycle
     == 9
    
     >> data: _
     == _
    
     >> recycle/verbose
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     bargle
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     nawdle
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     vous
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     [bargle: nawdle #vous]
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     [recycle]
    
     ** PROBE() tick 28349 ../src/core/n-system.c:171
     [data: _]


posted by:   Fork     25-Jan-2017/9:14:35-8:00



Steve,
    
Google "Rebol Garbage Collection" for the most common answers to this question.
    
One sure-fire way to clean up memory after an operation is to launch it in a separate process. Open Windows Task Manager, then run the following script and watch the Rebol processes:
    
write %bigmemoryoperation.r {
    REBOL []
    x: copy []
    insert/dup x "asdf" 10000000   ; build a block of 10 million entries
    wait 5                         ; keep the process alive long enough to watch it
}
launch %bigmemoryoperation.r
    
Slightly off the topic of freeing memory, but a very important point about memory use is Doc's response at http://stackoverflow.com/questions/16041017/why-allocate-a-variable-in-rebol

posted by:   Nick     25-Jan-2017/9:57:29-8:00



Thank you, that note from DOC is very important for something I am working on right now. I have several programs that go through a database table of around 30,000 records and pull out some fields, and add those fields to a block that is later used as a lookup table. I start the empty block with "LOOKUPTABLE: copy []". So if I understand that note, when I add item 20,001 to LOOKUPTABLE which is full after adding item 20,000, REBOL copies the full table to another area before adding item 20,001. And so on for every item added. This will change the way I do things for sure.
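
A sketch of the preallocating alternative, assuming Rebol 2 and a record count known (or estimated) in advance. The count of 30000 and the "key-" strings are placeholders standing in for the real database fields. Whatever the interpreter's exact growth policy is, `make block!` with a size reserves room for that many values up front, so the appends below should not need to enlarge the block.

```rebol
REBOL [Title: "Preallocate a lookup table (sketch)"]

record-count: 30000                     ; hypothetical: roughly the table size

lookuptable: make block! record-count   ; empty block with room reserved

repeat i record-count [
    append lookuptable join "key-" i    ; placeholder for a real extracted field
]

print length? lookuptable               ; prints 30000
```

The block still starts out empty (`length?` is 0 right after `make`), so the rest of the program can keep using `append` exactly as it did with `copy []`.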

posted by:   Steven White     25-Jan-2017/12:39:59-8:00