Clear the file stat cache in R? - r

In a RUnit test I have this snippet:
checkTrue(!file.exists(fname))
doSomething()
#cat(fname);print(file.info(fname))
checkTrue(file.exists(fname))
The second call to checkTrue() is failing, even though doSomething() creates a file. I've confirmed the file is there with the commented out line, shown above.
So, I'm wondering if the second call to file.exists is using cached data? In PHP there is a function called clearstatcache that stops this happening. Is there an R equivalent? (Or, perhaps, someone knows that R never caches results of stat calls?)

file.exists does not cache the result of the stat call, so there is no equivalent of PHP's clearstatcache. (Aside: that also means excessive use of calling file.exists, or any stat function, might degrade performance.)
As of R 3.0.1, file.exists uses an internal method. If you follow it through the source you will eventually end up with a call to R_FileExists, in sysutils.c:
Rboolean R_FileExists(const char *path)
{
struct stat sb;
return stat(R_ExpandFileName(path), &sb) == 0;
}
(On Windows it instead uses a call to _stat64 which I've just confirmed also does not do caching.)

Related

Rcpp::compileAttributes() not updating .R file

I am trying to build a package in R involving the Rcpp package. When I generated the package using the command Rcpp.package.skeleton("pck338").
By default, the files rcpp_hello_world.cpp is included, and the RcppExports.cpp file is included as well.
To my understanding, the compileAttributes() function needs to be run every time a new .cpp function is added to the src directory.
To that end, I wrote a simple function in the rcpp_dance.cpp file that goes like this:
# include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp:export]]
int rcpp_dance(int x) {
int val = x + 5;
return val;
}
However, when I run the compileAttributes(), the RcppExports.cpp stays the same, so the dance function is not converted to an R function. Why is this happening? Any specific and general feedback would be appreciated.
In a case like this where it smells like a possible error, check for a possible error. I learned (the hard way) to first assume I goofed...
In your case: :: != :.
You wanted Rcpp::export with two colons. Try that, rinse, repeat...
(And for the other conjecture: you need to re-run compileAttributes() each time an interface changes: adding or removing or renaming or retyping an argument in the signature, and of course adding or removing whole functions. But thankfully the function is so fast that you may as well get in the habit of running it often. If in doubt, run it.)

How to systematically populate a whitelist for a sandboxing program?

On pp. 260-263 of Programming in Lua (4th ed.), the author discusses how to implement "sandboxing" (i.e. the running of untrusted code) in Lua.
When it comes to imposing limiting the functions that untrusted code can run, he recommends a "whitelist approach":
We should never think in terms of what functions to remove, but what functions to add.
This question is about tools and techniques for putting this suggestion into practice. (I expect there will be confusion on this point I want to emphasize it upfront.)
The author gives the following code as an illustration of a sandbox program based on a whitelist of allowed functions. (I have added or moved around some comments, and removed some blank lines, but I've copied the executable content verbatim from the book).
-- From p. 263 of *Programming in Lua* (4th ed.)
-- Listing 25.6. Using hooks to bar calls to unauthorized functions
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fn")
if not validfunc[info.func] then
error("calling bad function: " .. (info.name or "?"))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
debug.sethook(hook, "", 100) -- set hook
f() -- run chunk
Right off the bat I am puzzled by this code, since the hook tests for event type (if event == "call" then...), and yet, when the hook is set, only count events are requested (debug.sethook(hook, "", 100)). Therefore, the whole song-and-dance with validfunc is for naught.
Maybe it is a typo. So I tried experimenting with this code, but I found it very difficult to put the whitelist technique in practice. The example below is a very simplified illustration of the type of problems I ran into.
First, here is a slightly modified version of the author's code.
#!/usr/bin/env lua5.3
-- Filename: sandbox
-- ----------------------------------------------------------------------------
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
[io.stdout.write] = true,
-- ... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fnS")
if not validfunc[info.func] then
error(string.format("calling bad function (%s:%d): %s",
info.short_src, info.linedefined, (info.name or "?")))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
validfunc[f] = true
debug.sethook(hook, "c", 100) -- set hook
f() -- run chunk
The most significant differences in the second snippet relative to the first one are:
the call to debug.sethook has "c" as mask;
the f function for the loaded chunk gets added to the validfunc whitelist;
io.stdout.write is added to the validfunc whitelist;
When I use this sandbox program to run the one-line script shown below:
# Filename: helloworld.lua
io.stdout:write("Hello, World!\n")
...I get the following error:
% ./sandbox helloworld.lua
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
I tried to fix this by adding the following to validfunc:
[getmetatable(io.stdout).__index] = true,
...but I still get pretty much the same error. I could go on guessing and trying more things to add, but this is what I would like to avoid.
I have two related questions:
What can I add to validfunc so that sandbox will run helloworld (as is) to completion?
More importantly, what is a systematic way to find determine what to add to a whitelist table?
Part (2) is the heart of this post. I am looking for tools/techniques that remove the guesswork from the problem of populating a whitelist table.
(I know that I can get helloworld to work if I replace io.stdout:write with print, register print in sandbox's validfunc, and pass {print = print} as the last argument to loadfile, but doing this does not answer the general question of how to systematically determine what needs to be added to the whitelist to allow some specific code to work in the sandbox.)
EDIT: Ask #DarkWiiPlayer pointed out, the calling bad function error is being triggered by the calling of an unregistered function (__index?), which happened as part of the response to an earlier attempt to index a nil value error. So, this post's questions are all about systematically determining what to add to validfunc to allow Lua to emit the attempt to index a nil value error normally.
I should add that the question of which function's call triggered the hook's execution responsible for the calling bad function error message is at the moment completely unclear. This error message blames the error on __index, but I suspect that this may be a red herring, possibly due to a bug in Lua.
Why suspect a bug in Lua? If I change the error call in sandbox slightly to
error(string.format("calling bad function (%s:%d): %s (%s)",
info.short_src, info.linedefined, (info.name or "?"),
info.func))
...then the error message looks like this:
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index (function: 0x55b391b79ef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
Nothing surprising there, but if now I change helloworld.lua to
# Filename: helloworld.lua
nonexistent()
io.stdout:write("Hello, World!\n")
...and run it under sandbox, the error message becomes
lua5.3: ./sandbox:20: calling bad function ([C]:-1): nonexistent (function: 0x556a161cdef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in global 'nonexistent'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
From this error message, one may conclude that nonexistent is a real function; after all, it's sitting right there at 0x556a161cdef0! But we know that nonexistent lives up to its name: it doesn't exist!
The whiff of a bug is definitely in the air. It could be that the function that is triggering the hook should really be excluded from those that trigger such "c"-masked hooks? Be that as it may, it appears that, in this particular situation, the call to debug.info is returning inconsistent information (since the name of the function [e.g. nonexistent] clearly does not correspond at all to the actual function object [e.g. function: 0x556a161cdef0] that is supposedly triggering the hook).
(Final answer at the bottom, feel free to skip until the <hr> line)
I'll explain my debugging step by step.
This is a really weird phenomenon. After some testing, I've managed to narrow it down a bit:
Since you pass {} to load, the function runs with an empty environment, so io is, in fact, nil (and io.stdout would error anyway)
The error happens directly when attempting to index io (which is a nil value)
The functio __index is a C function (see error message)
My first intuition was that __index was called somewhere internally. Thus, to find out what it does, I decided to look at its locals in hopes of guessing what it does.
A quick helper function I threw together:
local function locals(f)
return function(f, n)
local name, value = debug.getlocal(f+1, n)
if name then
return n+1, name, value
end
end, f, 1
end
Insert that right before the line where the error is raised:
for idx, name, value in locals(2) do
print(name, value)
end
error(string.format("calling bad function (%s:%d): %s", info.short_src, info.linedefined, (info.name or "?")))
This led to an interesting result:
(*temporary) stdin:43: attempt to index a nil value (global 'io')
(*temporary) table: 0x563cef2fd170
lua: stdin:29: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
stdin:29: in function <stdin:21>
[C]: in metamethod '__index'
stdin:43: in function 'f'
stdin:49: in main chunk
[C]: in ?
shell returned 1
Why is there a temporary string value with a completely different error message?
By the way, this error makes total sense; io does not exist because of the empty environment, so indexing it should obviously raise just that error.
It's honestly a very interesting error, but I'll leave it at this, as you're learning the language and this hint might be enough for you to figure it out on your own. It's also a very nice chance to actually use (and get to know) the debug module in a more practical context.
Actual Solution
After some time has now passed, I came back to add a proper solution to this problem, but I really already did just that. The weird error reporting is just Lua being weird. The real error is the empty environment that's set when loading the chunk, as I mentioned a few paragraphs above.
From the manual:
load (chunk [, chunkname [, mode [, env]]])
Loads a chunk.
[...]
If the resulting function has upvalues, the first upvalue is set to the value of env, if that parameter is given, or to the value of the global environment. Other upvalues are initialized with nil. (When you load a main chunk, the resulting function will always have exactly one upvalue, the _ENV variable (see §2.2). However, when you load a binary chunk created from a function (see string.dump), the resulting function can have an arbitrary number of upvalues.) All upvalues are fresh, that is, they are not shared with any other function.
[...]
Now, in a "main chunk", i.e. one loaded from a text Lua file, the first (and only) upvalue is always the environment of the chunk, so where it will look for "globals" (this is slightly different in Lua 5.1). Since an empty table is passed in, the chunk has no access to any of the global variables like string or io.
Therefore, when the function f() tries to index io, Lua throws an error "attempt to index a nil value", because io is nil. For whatever reason Lua then makes some internal function calls that end up triggering the blacklist, causing a new error that shadows the previous one; this makes debugging this error extremely inconvenient and almost impossible without using the debug library to get additional information about the call stack.
I ultimately only realized this myself after I noticed the original error message while looking at the locals of the function that made the blocked call.
I hope this solves the problem :)

How to quit/exit from file included in the terminal

What can I do within a file "example.jl" to exit/return from a call to include() in the command line
julia> include("example.jl")
without existing julia itself. quit() will just terminate julia itself.
Edit: For me this would be useful while interactively developing code, for example to include a test file and return from the execution to the julia prompt when a certain condition is met or do only compile the tests I am currently working on without reorganizing the code to much.
I'm not quite sure what you're looking to do, but it sounds like you might be better off writing your code as a function, and use a return to exit. You could even call the function in the include.
Kristoffer will not love it, but
stop(text="Stop.") = throw(StopException(text))
struct StopException{T}
S::T
end
function Base.showerror(io::IO, ex::StopException, bt; backtrace=true)
Base.with_output_color(get(io, :color, false) ? :green : :nothing, io) do io
showerror(io, ex.S)
end
end
will give a nice, less alarming message than just throwing an error.
julia> stop("Stopped. Reason: Converged.")
ERROR: "Stopped. Reason: Converged."
Source: https://discourse.julialang.org/t/a-julia-equivalent-to-rs-stop/36568/12
You have a latent need for a debugging workflow in Julia. If you use Revise.jl and Rebugger.jl you can do exactly what you are asking for.
You can put in a breakpoint and step into code that is in an included file.
If you include a file from the julia prompt that you want tracked by Revise.jl, you need to use includet(.
The keyboard shortcuts in Rebugger let you iterate and inspect variables and modify code and rerun it from within an included file with real values.
Revise lets you reload functions and modules without needing to restart a julia session to pick up the changes.
https://timholy.github.io/Rebugger.jl/stable/
https://timholy.github.io/Revise.jl/stable/
The combination is very powerful and is described deeply by Tim Holy.
https://www.youtube.com/watch?v=SU0SmQnnGys
https://youtu.be/KuM0AGaN09s?t=515
Note that there are some limitations with Revise, such as it doesn't reset global variables, so if you are using some global count or something, it won't reset it for the next run through or when you go back into it. Also it isn't great with runtests.jl and the Test package. So as you develop with Revise, when you are done, you move it into your runtests.jl.
Also the Juno IDE (Atom + uber-juno package) has good support for code inspection and running line by line and the debugging has gotten some good support lately. I've used Rebugger from the julia prompt more than from the Juno IDE.
Hope that helps.
#DanielArndt is right.
It's just create a dummy function in your include file and put all the code inside (except other functions and variable declaration part that will be place before). So you can use return where you wish. The variables that only are used in the local context can stay inside dummy function. Then it's just call the new function in the end.
Suppose that the previous code is:
function func1(...)
....
end
function func2(...)
....
end
var1 = valor1
var2 = valor2
localVar = valor3
1st code part
# I want exit here!
2nd code part
Your code will look like this:
var1 = valor1
var2 = valor2
function func1(...)
....
end
function func2(...)
....
end
function dummy()
localVar = valor3
1st code part
return # it's the last running line!
2nd code part
end
dummy()
Other possibility is placing the top variables inside a function with a global prefix.
function dummy()
global var1 = valor1
global var2 = valor2
...
end
That global variables can be used inside auxiliary function (static scope) and outside in the REPL
Another variant only declares the variables and its posterior use is free
function dummy()
global var1, var2
...
end

How do I make a `Pipe` or `TTY` with a custom callback in Julia?

I'd like to fancy up my embedding of Julia in a MATLAB mex function by hooking up Julia's STDIN, STDOUT, and STDERR to the MATLAB terminal. The documentation for redirect_std[in|out|err] says that the stream that I pass in as the argument needs to be a TTY or a Pipe (or a TcpSocket, which wouldn't seem to apply).
I know how I will define the right callbacks for each stream (basically, wrappers around calls to MATLAB's input and fprintf), but I'm not sure how to construct the required stream.
Pipe was renamed PipeEndpoint in https://github.com/JuliaLang/julia/pull/12739, but the corresponding documentation was not updated and PipeEndpoint is now considered internal. Even so, creating the pipe up front is still doable:
pipe = Pipe()
Base.link_pipe(pipe)
redirect_stdout(pipe.in)
#async while !eof(pipe)
data = readavailable(pipe)
# Pass data to whatever function handles display here
end
Furthermore, the no-argument version of these functions already create a pipe object, so the recommended way to do this would be:
(rd,wr) = redirect_stdout()
#async while !eof(rd)
data = readavailable(rd)
# Pass data to whatever function handles display here
end
Nevertheless, all of this is less clear than it could be, so I have created a pull request to clean up this API: https://github.com/JuliaLang/julia/pull/18253. Once that pull request is merged, the link_pipe call will become unnecessary and pipe can be passed directly into redirect_stdout. Further, the return value from the no-argument version will become a regular Pipe.

Calling the original function after tweaking arguments is an incorrect use of the R-API?

I am trying to create a small extension for R here for embedding the current time on the R prompt: https://github.com/musically-ut/extPrompt
Things seem to be working overall, but R CMD check . raised a warning:
File '[truncated]..Rcheck/extPrompt/libs/extPrompt.so’:
Found non-API call to R: ‘ptr_R_ReadConsole’
Compiled code should not call non-API entry points in R.
The concerned file is this: https://github.com/musically-ut/extPrompt/blob/master/src/extPrompt.c and occurs on line 38, I think.
void extPrompt() {
// Initialize the plugin by replacing the R_ReadConsole function
old_R_ReadConsole = ptr_R_ReadConsole;
ptr_R_ReadConsole = extPrompt_ReadConsole;
// ...
}
int extPrompt_ReadConsole(const char *old_prompt, unsigned char *buf, int len,
int addtohistory) {
// ...
// Call the old function with the `new_prompt`
return (*old_R_ReadConsole)(new_prompt, buf, len, addtohistory);
}
I am trying to make the R_ReadConsole API call. However, since a different plugin (like mine) could have overridden it already, I do not want to directly invoke R_ReadConsole but the function which previously was at ptr_R_ReadConsole.
Is this an incorrect use of the API?
From the r-devel mailing list:
As for incorrectly using the API - yes. Since R CMD check is telling
you this is a problem, it is officially a problem. This is of no
consequence if the package works for you and any users of it, and
the package is not to be hosted on CRAN. However, using this symbol in
your code may not work on all platforms, and is not guaranteed to
work in the future (but probably will!).

Resources