Create a disk cache object — diskCache
diskCache(dir = NULL, max_size = 10 * 1024^2, max_age = Inf, max_n = Inf, evict = c("lru", "fifo"), destroy_on_finalize = FALSE, missing = key_missing(), exec_missing = FALSE, logfile = NULL)
Arguments
dir
Directory to store files for the cache. If ...
max_size
Maximum size of the cache, in bytes. If the cache exceeds this size, cached objects will be removed according to the value of evict.
max_age
Maximum age of files in cache before they are evicted, in seconds. Use Inf for no age limit.
max_n
Maximum number of objects in the cache. If the number of objects exceeds this value, then cached objects will be removed according to the value of evict.
evict
The eviction policy to use to decide which objects are removed when a cache pruning occurs. Currently, "lru" and "fifo" are supported.
destroy_on_finalize
If ...
missing
A value to return or a function to execute when get() is called with a key that is not in the cache.
exec_missing
If TRUE and missing is a function, missing is executed on a cache miss.
logfile
An optional filename or connection object to which logging information will be written. To log to the console, use ...
Description
A disk cache object is a key-value store that saves the values as files in a directory on disk. Objects can be stored and retrieved using the get() and set() methods. Objects are automatically pruned from the cache according to the parameters max_size, max_age, max_n, and evict.
Missing Keys
The missing and exec_missing parameters control what happens when get() is called with a key that is not in the cache (a cache miss). The default behavior is to return a key_missing() object. This is a sentinel value that indicates that the key was not present in the cache. You can test if the returned value represents a missing key by using the is.key_missing() function. You can also have get() return a different sentinel value, like NULL.
If you want to throw an error on a cache miss, you can do so by providing a function for missing that takes one argument, the key, and also use exec_missing=TRUE.
When the cache is created, you can supply a value for missing, which sets the default value to be returned for missing values. It can also be overridden when get() is called, by supplying a missing argument. For example, if you use cache$get("mykey", missing = NULL), it will return NULL if the key is not in the cache.
If your cache is configured so that get() returns a sentinel value to represent a cache miss, then set() will not allow you to store that sentinel value in the cache. It will throw an error if you attempt to do so.
Instead of returning the same sentinel value each time there is a cache miss, the cache can execute a function each time get() encounters a missing key. If the function returns a value, then get() will in turn return that value. However, a more common use is for the function to throw an error. If an error is thrown, then get() will not return a value.
To do this, pass a one-argument function to missing, and use exec_missing=TRUE. For example, if you want to throw an error that prints the missing key, you could do this:
diskCache(
  missing = function(key) {
    stop("Attempted to get missing key: ", key)
  },
  exec_missing = TRUE
)
If you use this, the code that calls get() should be wrapped with tryCatch() to gracefully handle missing keys.
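The tryCatch() pattern can be sketched roughly as follows. To keep the snippet runnable without a cache directory, get_value() is a hypothetical stand-in for cache$get() on a cache configured with the error-throwing missing function shown above; get_or_default() and the store list are likewise illustrative names, not part of the cache API.

```r
# Hypothetical stand-in for cache$get() on a cache whose `missing` function
# throws an error. A plain list plays the role of the cache contents.
store <- list(a = 1)
get_value <- function(key) {
  if (!key %in% names(store)) {
    stop("Attempted to get missing key: ", key)
  }
  store[[key]]
}

# Wrap the lookup in tryCatch() and fall back to a default on a cache miss.
get_or_default <- function(key, default = NULL) {
  tryCatch(
    get_value(key),
    error = function(e) {
      message("Cache miss: ", conditionMessage(e))
      default
    }
  )
}

hit <- get_or_default("a", default = 0)
miss <- get_or_default("mykey", default = 0)
```

In real code the error handler would typically recompute the value and store it with set() rather than return a fixed default.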
Cache pruning
Cache pruning occurs when set() is called, or it can be invoked manually by calling prune().
The disk cache will throttle the pruning so that it does not happen on every call to set(), because the filesystem operations for checking the status of files can be slow. Instead, it will prune once in every 20 calls to set(), or if at least 5 seconds have elapsed since the last prune occurred, whichever comes first. These parameters are currently not customizable, but may be in the future.
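One way to picture the throttling rule is the sketch below. It is not the cache's actual implementation: make_prune_throttle(), its arguments, and the injected now() clock are all hypothetical names, and the clock is faked so that the result does not depend on real elapsed time.

```r
# Sketch of a prune throttle: allow a prune on every 20th call, or when at
# least 5 seconds have passed since the last prune. Names are hypothetical.
make_prune_throttle <- function(every_n = 20, min_interval = 5,
                                now = function() as.numeric(Sys.time())) {
  calls_since_prune <- 0
  last_prune <- now()

  # Call once per set(); returns TRUE when a prune should run.
  function() {
    calls_since_prune <<- calls_since_prune + 1
    current <- now()
    due <- calls_since_prune >= every_n ||
      (current - last_prune) >= min_interval
    if (due) {
      calls_since_prune <<- 0
      last_prune <<- current
    }
    due
  }
}

# Use a fake clock so the behaviour is deterministic.
clock <- 0
should_prune <- make_prune_throttle(now = function() clock)

# 45 rapid calls with no time passing: a prune is due on calls 20 and 40.
n_prunes <- sum(vapply(1:45, function(i) should_prune(), logical(1)))

# After 5 simulated seconds, the next call is due regardless of call count.
clock <- 5
prune_after_wait <- should_prune()
```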
When a pruning occurs, if there are any objects that are older than max_age, they will be removed.
The max_size and max_n parameters are applied to the cache as a whole, in contrast to max_age, which is applied to each object individually.
If the number of objects in the cache exceeds max_n, then objects will be removed from the cache according to the eviction policy, which is set with the evict parameter. Objects will be removed so that the number of items is max_n.
If the size of the objects in the cache exceeds max_size, then objects will be removed from the cache. Objects will be removed from the cache so that the total size remains under max_size. Note that the size is calculated using the size of the files, not the amount of disk space used by the files. These two values can differ because files are stored in blocks on disk. For example, if the block size is 4096 bytes, then a file that is one byte in size will take 4096 bytes on disk.
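The distinction can be seen with base R alone. The snippet below only illustrates the kind of size being counted (file length in bytes); whether the cache calls file.size() internally is an assumption, not something stated here.

```r
# Write a single byte to a temporary file.
path <- tempfile()
writeBin(as.raw(0x41), path)

# file.size() reports the file's length in bytes (1 here), not the space
# the file occupies on disk, which depends on the filesystem's block size.
apparent_size <- file.size(path)

unlink(path)
```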
Another time that objects can be removed from the cache is when get() is called. If the target object is older than max_age, it will be removed and the cache will report it as a missing value.
Eviction policies
If max_n or max_size is used, then objects will be removed from the cache according to an eviction policy. The available eviction policies are:
"lru"
Least Recently Used. The least recently used objects will be removed. This uses the filesystem's mtime property. When "lru" is used, each time get() is called, it will update the file's mtime.
"fifo"
First-in-first-out. The oldest objects will be removed.
Both of these policies use files' mtime. Note that some filesystems (notably FAT) have poor mtime resolution. (atime is not used because support for atime is worse than mtime.)
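As a rough illustration of mtime-based eviction, the sketch below keeps the max_n most recently modified files in a directory and deletes the rest. It is not the cache's own code: evict_oldest() and the demo file names are hypothetical, and an "lru" read is simulated by updating a file's mtime with Sys.setFileTime().

```r
# Sketch of mtime-based eviction: keep the max_n most recently modified
# files in a directory and delete the rest. Names are hypothetical.
evict_oldest <- function(dir_path, max_n) {
  files <- list.files(dir_path, full.names = TRUE)
  mtimes <- file.info(files)$mtime
  # Order newest first; anything past position max_n is evicted.
  ordered <- files[order(mtimes, decreasing = TRUE)]
  to_remove <- ordered[seq_along(ordered) > max_n]
  if (length(to_remove) > 0) {
    file.remove(to_remove)
  }
  invisible(basename(to_remove))
}

# Demo in a throwaway directory with three files.
cache_dir <- tempfile("cache-demo-")
dir.create(cache_dir)
paths <- file.path(cache_dir, c("a", "b", "c"))
for (p in paths) writeLines("x", p)

# Set explicit mtimes so the order does not depend on timer resolution:
# "a" is oldest, "c" is newest.
base_time <- Sys.time()
offsets <- c(-30, -20, -10)
for (i in seq_along(paths)) {
  Sys.setFileTime(paths[i], base_time + offsets[i])
}

# Simulate an "lru" read of "a": updating its mtime makes it the newest.
Sys.setFileTime(paths[1], base_time)

removed <- evict_oldest(cache_dir, max_n = 2)
remaining <- sort(list.files(cache_dir))

unlink(cache_dir, recursive = TRUE)
```

Under "fifo" the read would not update the mtime, so "a" would still be the oldest file and would be the one removed instead of "b".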
Sharing among multiple processes
The directory for a DiskCache can be shared among multiple R processes. To do this, each R process should have a DiskCache object that uses the same directory. Each DiskCache will do pruning independently of the others, so if they have different pruning parameters, then one DiskCache may remove cached objects before another DiskCache would do so.
Even though it is possible for multiple processes to share a DiskCache directory, this should not be done on networked file systems, because the slow performance of networked file systems can cause problems. If you need a high-performance shared cache, you can use one built on a database like Redis, SQLite, MySQL, or similar.
When multiple processes share a cache directory, there are some potential race conditions. For example, if your code calls exists(key) to check if an object is in the cache, and then calls get(key), the object may be removed from the cache in between those two calls, and get(key) will throw an error. Instead of calling the two functions, it is better to simply call get(key), and use tryCatch() to handle the error that is thrown if the object is not in the cache. This effectively tests for existence and gets the object in one operation.
It is also possible for one process to prune objects at the same time that another process is trying to prune objects. If this happens, you may see a warning from file.remove() failing to remove a file that has already been deleted.
Methods
A disk cache object has the following methods:
get(key, missing, exec_missing)
Returns the value associated with key. If the key is not in the cache, then it returns the value specified by missing or, if missing is a function and exec_missing=TRUE, executes missing. The function can throw an error or return the value. If either of these parameters is specified here, it will override the default that was set when the DiskCache object was created. See section Missing Keys for more information.
set(key, value)
Stores the key-value pair in the cache.
exists(key)
Returns TRUE if the cache contains the key, otherwise FALSE.
size()
Returns the number of items currently in the cache.
keys()
Returns a character vector of all keys currently in the cache.
reset()
Clears all objects from the cache.
destroy()
Clears all objects in the cache, and removes the cache directory from disk.
prune()
Prunes the cache, using the parameters specified by max_size, max_age, max_n, and evict.