How do I specify non-scalar parameters in a Slim table?

I want to test functions that have non-scalar parameters and return types, e.g., "given a matrix M, check that M times its inverse is the appropriate identity" or "given a row vector V, check that transpose(V) is the correct column vector."
Do I need a clever use of TableTable?

Related

Hash functions in R: keys and values as doubles?

I'm fairly new to R. I want to use a hash table to hold key-value pairs in which both keys and values are doubles/numerics.
I have a function f(x) = y where the function f takes the input x as a double and returns y as a double. I use this function f(x) over 10^8 times in my code (and I would like to use it much more often), and 99% of the time f(x) has already been computed for that x. I would like to store my answers as key-value pairs, so I can look them up instead of calculating them again.
I read the article below about using environments as hash tables, but I cannot figure out how to use doubles as key-value pairs.
https://www.r-bloggers.com/hash-table-performance-in-r-part-i/
How do I use doubles as key-value pairs in a hash table in R?
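For illustration only, here is the caching idea (memoization) sketched in Python rather than R, with a dictionary keyed directly on the double input. `expensive_f` is a hypothetical stand-in for f, and a lookup only hits when the key is exactly the same double. In R, names in an environment are character strings, so a double would first have to be converted to a string; `as.character()` keeps only 15 significant digits, so distinct doubles can end up with the same key unless a round-trippable format such as `sprintf("%.17g", x)` is used.

```python
import math

def make_memoized(func):
    """Wrap func so each distinct float input is computed only once."""
    cache = {}  # maps float input -> float output

    def memoized(x: float) -> float:
        if x in cache:          # exact-equality lookup on the double key
            return cache[x]
        result = func(x)
        cache[x] = result
        return result

    memoized.cache = cache      # expose the table for inspection
    return memoized

# Hypothetical stand-in for the expensive f(x) from the question.
calls = 0
def expensive_f(x: float) -> float:
    global calls
    calls += 1
    return math.exp(-x) * math.sin(x)

f = make_memoized(expensive_f)
for x in [0.5, 1.25, 0.5, 0.5, 1.25]:
    f(x)
print(calls)  # 2: only the two distinct inputs were evaluated
```

Python's standard library offers the same behaviour through `functools.lru_cache(maxsize=None)`; the hand-written version just makes the key-value table explicit.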

Standard ML : Calculating the average of a given set

I recently had the assignment to calculate the average of a set (given by input) in Standard ML.
The idea is to have a function like the one below, which takes a list of real numbers and returns their average (also a real), so that when you enter the function the REPL reports its type as:
average = fn : real list -> real
We discussed this in a tutorial as well but I wanted to know if there was some sort of trick when creating such functions in Standard ML.
Thanks in advance!
Sum the numbers and divide by the length. A simple recursive sum is typically one of the first examples that you would see in any SML tutorial. You would need to have the empty-list basis case of sum evaluate to 0.0 rather than 0 to make sure that the return type is real. Once you define a sum function, you can define average in one line using sum and the built-in length function. A subtlety is that SML doesn't allow a real to be divided by an int. You could use the conversion function Real.fromInt on the length before dividing the sum by it. There is some inefficiency in passing over the same list twice, once to sum it and once to calculate its length, but there is little reason to worry about such things when you are first learning the language.
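The two-pass approach described above can be sketched in Python for comparison (this is not SML): a recursive sum whose empty-list case returns 0.0, followed by division by the length. The helper name `total` is illustrative.

```python
from typing import List

def total(nums: List[float]) -> float:
    """Recursive sum; the empty-list base case returns 0.0, as suggested for SML."""
    if not nums:
        return 0.0
    return nums[0] + total(nums[1:])

def average(nums: List[float]) -> float:
    # Two passes over the list: one inside total(), one inside len().
    # Python's / accepts a float divided by an int; SML needs Real.fromInt for this.
    return total(nums) / len(nums)

print(average([1.0, 2.0, 4.5]))  # 2.5
```

Python has no tail-call elimination and `nums[1:]` copies the list, so the recursive form is only for illustration; on long lists the built-in `sum(nums)` is the practical choice.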
On Edit: Since you have found a natural solution and shared it in the comments, here is a more idiomatic version which computes the average in one pass over the list:
fun average nums =
    let
        fun av (s,n,[]) = s/Real.fromInt(n)
          | av (s,n,x::xs) = av (s+x,n+1,xs)
    in
        av (0.0, 0, nums)
    end;
It works by defining a helper function which does the heavy lifting. Helper functions like this are used extensively in functional programming. In the absence of mutable state, a common trick is to pass explicitly, as parameters, the quantities which would be successively modified by a corresponding loop in an imperative language. Such parameters are often called accumulators since they typically accumulate growing lists, running sums, running products, etc. Here s and n are the accumulators, with s the sum of the elements and n the length of the list. In the basis case (s,n,[]) there is nothing more to accumulate, so the final answer is returned. In the non-basis case (s,n,x::xs), s and n are updated appropriately and passed to the helper function along with the tail of the list. The definition of av is tail-recursive and hence will run with the speed of a loop without growing the stack. The only thing that the overall average function needs to do is to invoke the helper function with the appropriate initial values. The let ... helper def ... in ... helper called with start-up values ... end is a common idiom used to prevent the top level of a program from being cluttered with helper functions.
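To make the accumulator/loop correspondence concrete, here is the same one-pass average written as an imperative loop in Python (an illustration, not SML). The variables s and n play exactly the role of av's accumulators: the loop body does what the non-basis case does, and the final division matches the basis case.

```python
from typing import List

def average(nums: List[float]) -> float:
    s, n = 0.0, 0            # accumulators: running sum and running length
    for x in nums:           # like av (s, n, x::xs) = av (s+x, n+1, xs)
        s, n = s + x, n + 1
    return s / n             # like av (s, n, []) = s / Real.fromInt n
```

Unlike SML, Python does not eliminate tail calls, which is why the loop rather than a tail-recursive helper is the idiomatic form there. Note that this version raises `ZeroDivisionError` on an empty list.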
Since only non-empty lists can have averages, an alternative to John Coleman's answer is:
fun average [] = NONE
  | average nums =
    let
        fun av (s,n,[]) = s/Real.fromInt(n)
          | av (s,n,x::xs) = av (s+x,n+1,xs)
    in
        SOME (av (0.0, 0, nums))
    end;
Whether a function for calculating averages should take empty lists into account depends on whether you intend to export it or only use it within a scope in which you guarantee elsewhere that the input list is non-empty.
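The same idea in Python, as a rough analogue rather than a translation: `Optional[float]` stands in for SML's real option, with `None` playing the role of NONE and a plain float the role of SOME. Callers are then forced to handle the empty case explicitly.

```python
from typing import List, Optional

def average(nums: List[float]) -> Optional[float]:
    """Return None for an empty list (like NONE), otherwise the mean (like SOME)."""
    if not nums:
        return None
    s, n = 0.0, 0            # accumulators, as in the SML helper av
    for x in nums:
        s, n = s + x, n + 1
    return s / n
```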

Boolean (BitArray) multidimensional array indexing or masking in Julia?

As part of a larger algorithm, I need to produce the residuals of an array relative to a specified limit. In other words, I need to produce an array which, given someArray, comprises elements which encode the amount by which the corresponding element of someArray exceeds a limit value. My initial inclination was to use a distributed comparison to determine when a value has exceeded the threshold. As follows:
# Generate some test data.
residualLimit = 1
someArray = 2.1.*(rand(10,10,3).-0.5)
# Determine the residuals.
someArrayResiduals = (residualLimit-someArray)[(residualLimit-someArray.<0)]
The problem is that someArrayResiduals is a one-dimensional vector containing the residual values, rather than a mask of (residualLimit-someArray). If you check (residualLimit-someArray.<0) you'll find that it is behaving as expected; it's producing a BitArray. The question is, why doesn't Julia allow this BitArray to be used to mask someArray?
Casting the Bools in the BitArray to Ints using int() and multiplying element-wise using .* produces the desired result, but is a bit inelegant... See the following:
# Generate some test data.
residualLimit = 1
someArray = 2.1.*(rand(10,10,3).-0.5)
# Determine the residuals.
someArrayResiduals = (residualLimit-someArray).*int(residualLimit-someArray.<0)
# This array should be (and is) limited at residualLimit. This is correct...
someArrayLimited = someArray + someArrayResiduals
Anyone know why a BitArray can't be used to mask an array? Or, any way that this entire process can be simplified?
Thanks, all!
Indexing with a logical array simply selects the elements at indices where the logical array is true. You can think of it as transforming the logical index array with find (findall in Julia 1.0 and later) before doing the indexing expression. Note that this can be used in both array indexing and indexed assignment. These logical arrays are often themselves called masks, but indexing is more like a "selection" operation than a clamping operation.
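To show the "selection" behaviour in a language-neutral way, here is a plain-Python analogue (not Julia; flat lists stand in for arrays, and the values are made up). Applying a Boolean mask gathers the selected elements into a shorter, flat result, which is the same as first turning the mask into indices and then indexing with them.

```python
values = [0.4, 1.7, -1.5, 2.3, 0.9]
limit = 1.0

mask = [v > limit for v in values]                    # like values .> limit

# Logical indexing: keep only the elements where the mask is True.
selected = [v for v, keep in zip(values, mask) if keep]

# Equivalent two-step view: mask -> indices -> indexing (indices are 0-based here).
indices = [i for i, keep in enumerate(mask) if keep]
selected_via_indices = [values[i] for i in indices]

print(selected)  # [1.7, 2.3]: shorter and flat, not the shape of values
```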
The suggestions in the comments are good, but you can also solve your problem using logical indexing with indexed assignment:
overLimitMask = someArray .> residualLimit
someArray[overLimitMask] .= residualLimit
In this case, though, I think the most readable way to solve this problem is with min or clamp, broadcast over the array with the dot syntax: min.(someArray, residualLimit) or clamp.(someArray, -residualLimit, residualLimit)
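The equivalence between masked assignment and an element-wise min can be checked with a small plain-Python sketch (not Julia; the sample values are made up). Overwriting only the positions where the mask is true gives the same result as taking `min(v, limit)` for every element. A two-sided clamp additionally raises values below `-limit`.

```python
values = [0.4, 1.7, -1.5, 2.3, 0.9]
limit = 1.0

# Indexed assignment through a mask: overwrite only the selected positions.
assigned = list(values)
for i, v in enumerate(assigned):
    if v > limit:                       # like someArray[overLimitMask] .= residualLimit
        assigned[i] = limit

# Element-wise min gives the same array without building a mask.
clamped_above = [min(v, limit) for v in values]        # like min.(someArray, residualLimit)

# Two-sided clamp, like clamp.(someArray, -residualLimit, residualLimit).
clamped_both = [max(-limit, min(v, limit)) for v in values]

print(assigned == clamped_above)  # True
```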

Optimisation tool in R to find the input parameter of function that minimises output value?

I wish to find an optimisation tool in R that lets me determine the value of an input parameter (say, a specific value between 0.001 and 0.1) that results in my function producing a desired output value.
My function takes an input parameter and computes a value. I want this output value to exactly match a predetermined number, so the function outputs the absolute of the difference between these two values; when they are identical, the output of the function is zero.
I've tried optimize(), but it seems to be set up to minimise the input parameter, not the output value. I also tried uniroot(), but it produces the error f() values at end points not of opposite sign, suggesting that it doesn't like the fact that increasing/decreasing the input parameter reduces the output value up to a point, but going beyond that point then increases it again.
Apologies if I'm missing something obvious here—I'm completely new to optimising functions.
Indeed you are missing something obvious :-) Your problem can be formulated as a root-finding problem.
Assume the function whose value must equal a desired output value is f.
Define a function g satisfying
g <- function(x) f(x) - output_value
Now you can use uniroot to find a zero of g. But you must provide endpoints that satisfy the requirements of uniroot. I.e. the value of g for one endpoint must be positive and the value of g for the other endpoint must be negative (or the other way around).
Example:
f <- function(x) x - 10
g <- function(x) f(x) - 8
then
uniroot(g,c(0,20))
will do what you want but
uniroot(g,c(0,2))
will issue the error message "f() values at end points not of opposite sign".
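R's uniroot is based on Brent's zeroin algorithm. As a simpler stand-in that has the same sign requirement, here is a plain bisection sketch in Python (not R) applied to the f and g from the example. `bisect_root` is a hypothetical helper name. It rejects an interval whose endpoint values have the same sign. This is also why an absolute difference, which never goes negative, cannot be handed to this kind of root finder.

```python
def bisect_root(g, lower, upper, tol=1e-10, max_iter=200):
    """Find x in [lower, upper] with g(x) close to 0, assuming g is continuous."""
    g_lo, g_hi = g(lower), g(upper)
    if g_lo == 0:
        return lower
    if g_hi == 0:
        return upper
    if (g_lo > 0) == (g_hi > 0):
        # The same situation uniroot reports as an error.
        raise ValueError("f() values at end points not of opposite sign")
    for _ in range(max_iter):
        mid = (lower + upper) / 2
        g_mid = g(mid)
        if g_mid == 0 or (upper - lower) / 2 < tol:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lower, g_lo = mid, g_mid    # sign change lies in the upper half
        else:
            upper = mid                 # sign change lies in the lower half
    return (lower + upper) / 2

def f(x):
    return x - 10

def g(x):
    return f(x) - 8                     # zero exactly where f(x) equals 8

print(bisect_root(g, 0, 20))            # approximately 18
```

Calling `bisect_root(g, 0, 2)` raises the error, just like `uniroot(g, c(0,2))`, and so does `bisect_root(lambda x: abs(g(x)), 0, 20)`, because |g| is never negative.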
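You could also use an optimization function, but then you want to minimize abs(g) (or g^2), not g itself. To set you straight: optimize does not minimize the input parameter. It searches the given interval for the input value at which the function's output is smallest, and returns that input as $minimum and the output there as $objective. Read the help thoroughly.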
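The minimisation route can also be sketched in Python (not R). R's optimize combines golden-section search with parabolic interpolation; the hypothetical `golden_section_min` below uses plain golden-section search only, and assumes the function has a single minimum in the interval. Minimising |g| returns the input at which f hits the target, which is what the question asks for.

```python
import math

def golden_section_min(h, lower, upper, tol=1e-10):
    """Return an x in [lower, upper] minimising h, assuming h is unimodal there."""
    inv_phi = (math.sqrt(5) - 1) / 2   # about 0.618
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)      # interior probe points, c < d
        d = a + inv_phi * (b - a)
        if h(c) < h(d):
            b = d                      # a minimum lies in [a, d]
        else:
            a = c                      # a minimum lies in [c, b]
    return (a + b) / 2

def g(x):
    return (x - 10) - 8                # f(x) - 8, with f(x) = x - 10 as in the example

x_best = golden_section_min(lambda x: abs(g(x)), 0, 20)
print(x_best)                          # approximately 18
```

Minimising g itself instead of |g| would just drive x to the lower end of the interval here, because g is increasing.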

how to reduce dimensionality of vector

I have a set of vectors. I'm working on ways to reduce an n-dimensional vector to a single (1-d) value, say
(x1,x2,....,xn) ------> y
This single value needs to be the characteristic value of the vector: each distinct vector must produce a distinct output value. Which of the following methods is appropriate?
1- norm of the vector - square root of sum of squares that measures Euclidean distance from origin
2- compute hash of F, using some hashing techniques avoiding collision
3- use linear regression to compute, y = w1*x1 + w2*x2 + ... + wn*xn - unlikely to be good if there is no good dependence of input values on output
4- feature extraction technique like PCA that assigns weights to each of x1,x2,..xn based on the set of input vectors
It's unclear from the question what properties you need this transform to have, so I'm guessing that you don't need the transformation to preserve any properties other than uniqueness, and possibly invertibility.
None of the techniques you suggest can in general avoid collisions:
Norm - two vectors pointing in opposite directions have the same norm.
Hash - if the input isn't known a priori, no good: what is generally meant by a hash function has a finite image, and you have an infinite number of possible vectors, so collisions are unavoidable.
Linear regression - for any set of fitted weights, it's easy to find two different vectors which give the same y (think about it).
PCA is a specific kind of linear transformation - hence the same problem as with linear regression.
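A quick numerical check of the norm and linear-map points, sketched in Python. The weights are arbitrary illustrative values, not fitted ones. For any non-zero weight vector w = (w1, w2), the vector d = (w2, -w1) satisfies w . d = 0, so adding d to an input changes the vector but not y. The same argument applies to any fixed linear (or affine) map from n >= 2 dimensions to one, which includes projecting onto a principal component.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def linear(w, v):
    # y = w1*x1 + w2*x2 + ... + wn*xn
    return sum(wi * xi for wi, xi in zip(w, v))

# Norm: opposite vectors (indeed any two of equal length) collide.
print(norm((3.0, 4.0)), norm((-3.0, -4.0)))   # 5.0 5.0

# Linear map: adding any d with linear(w, d) == 0 leaves y unchanged.
w = (2.0, 5.0)                     # arbitrary illustrative weights
v = (1.0, 1.0)
d = (w[1], -w[0])                  # (5, -2), orthogonal to w
u = (v[0] + d[0], v[1] + d[1])     # (6, -1), a different vector
print(linear(w, v), linear(w, u))  # 7.0 7.0
```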
So - if you're just looking for uniqueness, you could "stringify" your vectors. One way to do it is to write them down as text strings, with the different coordinates separated by a special character (underscore, for example). Then take the binary value of this string as your representation.
If space is important and you need a more efficient representation, you could think of a more efficient bit encoding: each character in the set 0,1,...,9,'.','_' can be represented by 4 bits, i.e. a hexadecimal digit (map '.' to A and '_' to B; if coordinates can be negative, '-' needs a digit too, e.g. C). Now encode this string as a hexadecimal number, saving half the space compared with one byte per character.
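A minimal Python sketch of the stringify-then-hex idea. Several details are assumptions rather than part of the answer: the hypothetical helper names, the use of `repr` (the shortest string that round-trips to the same double), and the extra digits. '-' maps to C, and 'e' and '+' (which `repr` uses for very large or small numbers) map to D and E. The result is kept as a string of hex digits rather than an integer, so a leading 0 is not lost. Non-finite values and empty vectors are not handled.

```python
# Character -> hex digit: '0'-'9' map to themselves, plus five extra symbols.
ENCODE = {str(d): str(d) for d in range(10)}
ENCODE.update({".": "A", "_": "B", "-": "C", "e": "D", "+": "E"})
DECODE = {v: k for k, v in ENCODE.items()}

def vector_to_hex(vec):
    """Encode a non-empty vector of finite floats as hex digits (4 bits per character)."""
    text = "_".join(repr(float(x)) for x in vec)   # e.g. "1.5_-2.0_3e-07"
    return "".join(ENCODE[ch] for ch in text)      # KeyError for inf/nan

def hex_to_vector(code):
    """Invert vector_to_hex, showing that the encoding is unique and reversible."""
    text = "".join(DECODE[ch] for ch in code)
    return [float(part) for part in text.split("_")]

code = vector_to_hex([1.5, -2.0, 3e-07])
print(code)  # 1A5BC2A0B3DC07
```

Because `repr` never produces '_', splitting on it recovers the coordinates exactly, so distinct vectors always get distinct codes.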
