How to safely remove an item from a vector?

Let's say I have this vector:
let mut v = vec![1,2,3];
And I want to remove some item from it:
v.remove(3);
It panics. How can I catch or gracefully handle that panic? I tried to use panic::catch_unwind, but it doesn't seem to work with vectors (std::vec::Vec<i32> may not be safely transferred across an unwind boundary). Should I manually check whether an item exists at the index before removing it?

In general, vector and slice methods consider it a programming error if they receive an index that is out of range, and the convention in Rust is to panic for programming errors. If your code panics, you generally need to fix the code to uphold the invariant that was disregarded.
Some of the slice methods have variants that don't panic for invalid indices. One example pair is the indexing operator [index], which panics for an out-of-bounds index, and the get() method, which instead returns None if the index is out of bounds.
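For example, a minimal sketch contrasting the two:
fn main() {
    let v = vec![1, 2, 3];
    // let x = v[3]; // would panic: index out of bounds
    match v.get(3) {
        Some(x) => println!("got {}", x),
        None => println!("index 3 is out of bounds"),
    }
}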
The remove() method does not have an equivalent that does not panic. You should check the index manually before passing it in:
if index < v.len() {
    v.remove(index);
} else {
    // Handle error
}
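If you would rather get an Option back than write the branch yourself, the same check can be folded into bool::then; remove_checked below is just an illustrative helper, not a standard library method:
fn remove_checked<T>(v: &mut Vec<T>, index: usize) -> Option<T> {
    // Returns None instead of panicking when `index` is out of range.
    (index < v.len()).then(|| v.remove(index))
}

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(remove_checked(&mut v, 3), None); // out of bounds, no panic
    assert_eq!(remove_checked(&mut v, 0), Some(1));
}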
In real applications, this should rarely be necessary, though. The code that generates the index to be deleted can usually be written so that it only yields in-bounds indices.

Related

How does subtracting .as_ptr() values work?

Looking at a code snippet for parsing HTTP requests provided as part of the Tokio examples, I see the following code:
let toslice = |a: &[u8]| {
    let start = a.as_ptr() as usize - src.as_ptr() as usize;
    assert!(start < src.len());
    (start, start + a.len())
};
As I understand it, the above code snippet gets the pointer location of the input slice a and the pointer location of a variable outside the scope of the closure, and subtracts them. Then it returns a tuple containing this calculated value and the calculated value plus the length of the input slice.
What is this trying to accomplish? One could end up with a negative number and then panic because it wouldn't cast to usize. In fact, when I compile the example, this is exactly what happens when the input is the bytes for the string GET or POST, but not for other values. Is this a performance optimization for doing some sort of substring from a vector?
Yes, this is just subtracting pointers. The missing context is that the closure clearly intends a to be a subslice (substring) of the closed-over src slice. Thus toslice(a) ends up returning the start and end indices of a inside src. More explicitly, let (start, end) = toslice(a); means that src[start] is a[0] (not just equal in value; they are the same address), and src[end - 1] is the last byte of a.
Violating this assumption will likely produce panics. That's fine, because this closure is a local variable that is only used locally and not exposed to any unknown users, so the only calls to the closure are in the example you linked, and they evidently all satisfy the constraint.
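To make the invariant concrete, here is a self-contained sketch (the request bytes and the subslice range are made up for illustration):
fn main() {
    let src = b"GET /index.html HTTP/1.1";
    // `a` is a subslice of `src`, so it points into the same buffer.
    let a = &src[4..15];
    let toslice = |a: &[u8]| {
        let start = a.as_ptr() as usize - src.as_ptr() as usize;
        assert!(start < src.len());
        (start, start + a.len())
    };
    let (start, end) = toslice(a);
    // The recovered indices locate `a` inside `src`.
    assert_eq!(&src[start..end], a);
    println!("a spans src[{}..{}]", start, end);
}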

Does Rust protect me from iterator invalidation when pushing to a vector while iterating over it?

Does Rust protect me from iterator invalidation here or am I just lucky with realloc? What guarantees are given for an iterator returned for &'a Vec<T>?
fn main() {
    let mut v = vec![0; 2];
    println!("capacity: {}", v.capacity());
    {
        let v_ref = &mut v;
        for _each in v_ref.clone() {
            for _ in 0..101 {
                (*v_ref).push(1); // ?
            }
        }
    }
    println!("capacity: {}", v.capacity());
}
In Rust, most methods take an &self - a reference to self. In most circumstances, a call like some_string.len() internally "expands" to something like this:
let a: String = "abc".to_string();
let a_len: usize = String::len(&a); // This is identical to calling `a.len()`.
However, consider a reference to an object: a_ref, which is an &String that references a. Rust is smart enough to determine whether a reference needs to be added or removed, as we saw above (a becomes &a). In this case, a_ref.len() expands to:
let a: String = "abc".to_string();
let a_ref: &String = &a;
let a_len: usize = String::len(a_ref); // This is identical to calling `a_ref.len();`. Since `a_ref` is a reference already, it doesn't need to be altered.
Notice that this is basically equivalent to the original example, except that we're using an explicitly-set reference to a rather than a directly.
This means that v.clone() expands to Vec::clone(&v), and similarly, v_ref.clone() expands to Vec::clone(v_ref); since v_ref is &v (or, more precisely, &mut v), we can simplify this back into Vec::clone(&v). In other words, these calls are equivalent: calling clone() on a plain reference (&) to an object does not clone the reference, it clones the referenced object.
In other words, Tamas Hedgeus' comment is correct: You are iterating over a new vector, which contains elements that are clones of the elements in v. The item being iterated over in your for loop is not a &Vec, it's a Vec that is separate from v, and therefore iterator invalidation is not an issue.
As for your question about the guarantees Rust provides, you'll find that Rust's borrow checker handles this rather well without any strings attached.
If you were to remove clone() from the for loop, though, you would receive the error use of moved value: '*v_ref', because v_ref is considered 'moved' into the for loop when you iterate over it and cannot be used for the remainder of the function. To avoid this, the iter function creates an iterator object that only borrows the vector, allowing you to reuse the vector after the loop ends (and the iterator is dropped).
And if you were to try iterating over and mutating v without the v_ref abstraction, the error reads cannot borrow 'v' as mutable because it is also borrowed as immutable. v is borrowed immutably by the iterator spawned by v.iter() (which has the signature fn iter(&self) -> Iter<T> - note that it borrows the vector), and the borrow checker will not let you mutate the vector until the iterator is dropped at the end of the for loop. However, since you can have multiple immutable references to a single object, you can still read from the vector within the for loop, just not write to it.
If you need to mutate an element of a vector while iterating over the vector, you can use iter_mut, which returns mutable references to one element at a time and lets you change that element only. You still cannot mutate the iterated vector itself with iter_mut, because Rust ensures that there is only one mutable reference to an object at a time, as well as ensuring there are no mutable references to an object in the same scope as immutable references to that object.
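A minimal sketch of the iter_mut case:
fn main() {
    let mut v = vec![1, 2, 3];
    // `iter_mut` yields one mutable reference at a time,
    // so each element can be modified in place.
    for x in v.iter_mut() {
        *x *= 10;
        // The vector itself stays mutably borrowed here, so
        // something like `v.push(4)` at this point would not compile.
    }
    assert_eq!(v, vec![10, 20, 30]);
}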

In OCaml, how do I re-assign a global variable inside a function

My program has the following global variable:
let a = (0.0,0.0);;
And the following, where eval e1 returns a string_of_float and somefunc e2 returns a tuple.
let rec output_expr = function
  Binop(e1, op, e2) ->
    let onDist = float_of_string(eval e1) and onDir = somefunc e2 in
    let newA = onDir in (
      fprintf oc "\n\t%s" ("blah");
      fprintf oc "\n\t%s" ("blah");
      fprintf oc "\n\t%s" ("blah");
      let a = newA
    )
Now, the code above gives me the following error:
Error: This expression has type bool
but an expression was expected of type unit
Command exited with code 2.
I want let a = newA to change the value of the global variable a. How can I do that?
To do this, you need to make the value a reference:
let a = ref (0.0, 0.0)
Then later that state can be changed with:
a := (1.0, 2.0);
In a functional world you would not want to have this global state. Sometimes it is very helpful, but in this particular case that is doubtful. You should pass the value a into your function and return a new value (a') that can be used subsequently; note that the value itself never changes, but new values take its place and are used in further computation.
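A minimal sketch of that style (step is a made-up stand-in for your real computation):
(* Thread the value through the computation instead of mutating a global. *)
let step (x, y) = (x +. 1.0, y *. 2.0)

let () =
  let a = (0.0, 0.0) in
  let a' = step a in                  (* a is unchanged; a' takes its place *)
  let (x, y) = step a' in
  Printf.printf "%f %f\n" x y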
In your particular case, I think you need to ask yourself why a function named output_expr modifies some global state, or returns anything but unit. But maybe this is a toy example for our consumption, so I will leave it at that.
You cannot assign to a variable (local or global alike) in OCaml. There's simply no syntax in the language for it. In other words, variables in OCaml are what other languages call "constants": they get a value once, at initialization, and that's it.
However, you can use a mutable data structure, which offers ways to modify its contents. Data structures are reference types, you can hold a reference to the data structure in a variable, and modify the contents, without needing to assign to the variable.
nlucaroni mentioned such a data structure, ref, which is a simple mutable cell holding a value of the desired type. There are other mutable data structures, like arrays, strings, and any record with mutable fields. Each has its own way of modifying the contents.
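A short sketch of each kind of mutation mentioned above (the names are illustrative):
let cell = ref (0.0, 0.0)            (* a ref is a one-slot mutable cell *)
let () = cell := (1.0, 2.0)          (* replace its contents *)

type point = { mutable x : float }   (* a record with a mutable field *)
let p = { x = 0.0 }
let () = p.x <- 3.5                  (* update the field in place *)

let arr = [| 1; 2; 3 |]              (* arrays are mutable too *)
let () = arr.(0) <- 42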
However, mutable state can mostly be avoided in functional programming, and if you are relying on mutable state, it may be an indication that you are not doing it the functional way.
In OCaml, values are immutable. You can't change the contents of a value and should reorganize your code so that you don't need to.
Here, your function output_expr should return newA, and that value should be used instead of a from then on.
Actually you can have mutable variables using references but you should only use them if you know what you do and think they are better suited for a particular use case, never because you don't understand immutability.

Initialize an Empty Array of Tuples in Julia

I can't figure out how to initialize an empty array of tuples. The manual says:
The type of a tuple of values is the tuple of types of values... Accordingly, a tuple of types can be used anywhere a type is expected.
Yet this does not work:
myarray = (Int64,Int64)[]
But this does:
Int64[]
It would seem a type is expected in front of the empty square brackets, but the tuple type doesn't work. This <type>[] syntax is the only way I can find to get an empty typed Array (other methods seem to produce a bunch of #undef values). Is this the only way to do it, and if it is, how can I type the Array with tuples?
BTW, my use case is creating an array of initially indeterminate length and pushing tuples onto it in a loop.
For people looking for the latest solution:
Tuple{Int, Int}[] works in v0.4.
The verbose form Array{Tuple{Int, Int}}(0) works in v0.4 as well.
Both create a 0-element Array{Tuple{Int64,Int64},1}.
Note that in v1.0 you'd need to write
Array{Tuple{Int, Int}}(undef, 0)
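Putting it together with the use case from the question - start with an empty typed array and push tuples onto it in a loop (a small sketch using the question's variable name):
myarray = Tuple{Int, Int}[]     # 0-element array of Tuple{Int64,Int64}
for i in 1:5
    push!(myarray, (i, i^2))
end
# myarray is now [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]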
You can do Array((Int,Int),0) for this (this was the syntax before v0.4). It is probably feasible to add methods to getindex to make (Int,Int)[] work, but I'm not sure it's worth it. Feel free to open an issue.

How to force an error if non-finite values (NA, NaN, or Inf) are encountered

There's a conditional debugging flag I miss from Matlab: dbstop if infnan described here. If set, this condition will stop code execution when an Inf or NaN is encountered (IIRC, Matlab doesn't have NAs).
How might I achieve this in R in a more efficient manner than testing all objects after every assignment operation?
At the moment, the only ways I see to do this are via hacks like the following:
Manually insert a test after all places where these values might be encountered (e.g. a division, where division by 0 may occur). The testing would be to use is.finite(), described in this Q & A, on every element.
Use body() to modify the code to call a separate function, after each operation or possibly just each assignment, which tests all of the objects (and possibly all objects in all environments).
Modify R's source code (?!?)
Attempt to use tracemem to identify those variables that have changed, and check only these for bad values.
(New - see note 2) Use some kind of call handlers / callbacks to invoke a test function.
The 1st option is what I am doing at present. This is tedious, because I can't guarantee I've checked everything. The 2nd option will test everything, even if an object hasn't been updated. That is a massive waste of time. The 3rd option would involve modifying assignments of NA, NaN, and infinite values (+/- Inf), so that an error is produced. That seems like it's better left to R Core. The 4th option is like the 2nd - I'd need a call to a separate function listing all of the memory locations, just to ID those that have changed, and then check the values; I'm not even sure this will work for all objects, as a program may do an in-place modification, which seems like it would not invoke the duplicate function.
Is there a better approach that I'm missing? Maybe some clever tool by Mark Bravington, Luke Tierney, or something relatively basic - something akin to an options() parameter or a flag when compiling R?
Example code Here is some very simple example code to test with, incorporating the addTaskCallback function proposed by Josh O'Brien. The code isn't interrupted, but an error does occur in the first scenario, while no error occurs in the second case (i.e. badDiv(0,0,FALSE) doesn't abort). I'm still investigating callbacks, as this looks promising.
badDiv <- function(x, y, flag){
    z = x / y
    if(flag == TRUE){
        return(z)
    } else {
        return(FALSE)
    }
}
addTaskCallback(stopOnNaNs)
badDiv(0, 0, TRUE)
addTaskCallback(stopOnNaNs)
badDiv(0, 0, FALSE)
Note 1. I'd be satisfied with a solution for standard R operations, though a lot of my calculations involve objects used via data.table or bigmemory (i.e. disk-based memory mapped matrices). These appear to have somewhat different memory behaviors than standard matrix and data.frame operations.
Note 2. The callbacks idea seems a bit more promising, as this doesn't require me to write functions that mutate R code, e.g. via the body() idea.
Note 3. I don't know whether or not there is some simple way to test the presence of non-finite values, e.g. meta information about objects that indexes where NAs, Infs, etc. are stored in the object, or if these are stored in place. So far, I've tried Simon Urbanek's inspect package, and have not found a way to divine the presence of non-numeric values.
Follow-up: Simon Urbanek has pointed out in a comment that such information is not available as meta information for objects.
Note 4. I'm still testing the ideas presented. Also, as suggested by Simon, testing for the presence of non-finite values should be fastest in C/C++; that should surpass even compiled R code, but I'm open to anything. For large datasets, e.g. on the order of 10-50GB, this should be a substantial savings over copying the data. One may get further improvements via use of multiple cores, but that's a bit more advanced.
The idea sketched below (and its implementation) is very imperfect. I'm hesitant to even suggest it, but: (a) I think it's kind of interesting, even in all of its ugliness; and (b) I can think of situations where it would be useful. Given that it sounds like you are right now manually inserting a check after each computation, I'm hopeful that your situation is one of those.
Mine is a two-step hack. First, I define a function nanDetector() which is designed to detect NaNs in several of the object types that might be returned by your calculations. Then it uses addTaskCallback() to call the function nanDetector() on .Last.value after each top-level task/calculation is completed. When it finds an NaN in one of those returned values, it throws an error, which you can use to avoid any further computations.
Among its shortcomings:
If you do something like setting options(error = recover), it's hard to tell where the error was triggered, since the error is always thrown from inside of stopOnNaNs().
When it throws an error, stopOnNaNs() is terminated before it can return TRUE. As a consequence, it is removed from the task list, and you'll need to reset it with addTaskCallback(stopOnNaNs) if you want to use it again. (See the 'Arguments' section of ?addTaskCallback for more details.)
Without further ado, here it is:
# Sketch of a function that tests for NaNs in several types of objects
nanDetector <- function(X) {
    # To examine data frames
    if(is.data.frame(X)) {
        return(any(unlist(sapply(X, is.nan))))
    }
    # To examine vectors, matrices, or arrays
    if(is.numeric(X)) {
        return(any(is.nan(X)))
    }
    # To examine lists, including nested lists
    if(is.list(X)) {
        return(any(rapply(X, is.nan)))
    }
    return(FALSE)
}

# Set up the taskCallback
stopOnNaNs <- function(...) {
    if(nanDetector(.Last.value)) {stop("NaNs detected!\n")}
    return(TRUE)
}
addTaskCallback(stopOnNaNs)
# Try it out
j <- 1:100
y <- rnorm(99)
l <- list(a=1:4, b=list(j=1:4, k=NaN))
# Error in function (...) : NaNs detected!
# Subsequent time consuming code that could be avoided if the
# error thrown above is used to stop its evaluation.
I fear there is no such shortcut. In theory on unix there is SIGFPE that you could trap on, but in practice
there is no standard way to enable FP operations to trap it (even C99 doesn't include a provision for that) - it is highly system-specific (e.g. feenableexcept on Linux, fp_enable_all on AIX etc.) or requires the use of assembler for your target CPU
FP operations are nowadays often done in vector units like SSE so you can't even be sure that the FPU is involved, and
R intercepts some operations on things like NaNs, NAs and handles them separately so they won't make it to the FP code
That said, you could hack yourself an R that will catch some exceptions for your platform and CPU if you tried hard enough (disable SSE etc.). It is not something we would consider building into R, but for a special purpose it may be doable.
However, it would still not catch NaN/NA operations unless you change R internal code. In addition, you would have to check every single package you are using since they may be using FP operations in their C code and may also handle NA/NaN separately.
If you are only worried about things like division by zero or over/underflows, the above will work and is probably the closest to something like a solution.
Just checking your results may not be very reliable, because you don't know whether a result is based on some intermediate NaN calculation that changed an aggregated value which may not itself be NaN. If you are willing to discard such cases, then you could simply walk recursively through your result objects or the workspace. That should not be extremely inefficient, because you only need to worry about REALSXP and not anything else (unless you don't like NAs either - then you'd have more work).
This is an example code that could be used to traverse R object recursively:
static int do_isFinite(SEXP x) {
    /* recurse into generic vectors (lists) */
    if (TYPEOF(x) == VECSXP) {
        int n = LENGTH(x);
        for (int i = 0; i < n; i++)
            if (!do_isFinite(VECTOR_ELT(x, i))) return 0;
    }
    /* recurse into pairlists */
    if (TYPEOF(x) == LISTSXP) {
        while (x != R_NilValue) {
            if (!do_isFinite(CAR(x))) return 0;
            x = CDR(x);
        }
        return 1;
    }
    /* I wouldn't bother with attributes except for S4
       where attributes are slots */
    if (IS_S4_OBJECT(x) && !do_isFinite(ATTRIB(x))) return 0;
    /* check reals */
    if (TYPEOF(x) == REALSXP) {
        int n = LENGTH(x);
        double *d = REAL(x);
        for (int i = 0; i < n; i++) if (!R_finite(d[i])) return 0;
    }
    return 1;
}

SEXP isFinite(SEXP x) { return ScalarLogical(do_isFinite(x)); }

# in R: .Call("isFinite", x)
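A hypothetical usage sketch, assuming the C code above is saved as isfinite.c and compiled with R CMD SHLIB isfinite.c:
dyn.load("isfinite.so")             # "isfinite.dll" on Windows
x <- list(a = 1:3, b = c(1.5, Inf))
.Call("isFinite", x)                # FALSE: the Inf in x$b is detected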
