How to check equality of two FStar.Set's - functional-programming

How can you check whether two sets are equal in FStar? The following expression is of type Type0 not Tot Prims.bool so I'm not sure how to use it to determine if the sets are equal (for example in a conditional). Is there a different function that should be used instead of Set.equal?
Set.equal (Set.as_set [1; 2; 3]) Set.empty

The sets defined in FStar.Set are using functions as representation.
Therefore, a set s of integers for instance, is nothing else than a function mapping integers to booleans.
For instance, the set {1, 2} is represented as the following function:
// {1, 2}
fun x -> if x = 1 then true
else (
if x = 2 then true
else false
)
You can add/remove value (that is, crafting a new lambda), or asks for a value being a member (that is, applying the function).
However, when it comes to comparing two sets of type T, you're out of luck : for s1 and s2 two sets, s1 = s2 means that for any value x : T, s1 x = s2 x. When the set of T's inhabitants is inifinite, this is not computable.
Solution The function representation is not suitable for you. You should have a representation whose comparaison is computable. FStar.OrdSet.fst defines sets as lists: you should use that one instead.
Note that this OrdSet module requires a total order on the values held in the set. (If you want have set of non-ordered values, I implemented that a while ago, but it's a bit hacky...)

Related

How to find the common members between different vectors by a majority voting rule

I am trying to find the common elements between multiple vectors. The current situation is a little tricky, where the common elements do not need to be the totally same, but could have some errors, say +/- 1, even the common elements do not need to show in all of these vectors, which are chosen by a majority voting rule. Besides, these vectors have different lengths. Here is an example,
a <- c(5,7,11,18,27,30);
b <- c(5,8,18,26);
c <- c(6,7,10,26,30)
5 in a, 5 in b, 6 in c, will be regarded as a common element, which will be taken the floor(the average), i.e. 5;
7 in a, 8 in b, 7 in c, will be regarded as a common element, which will be taken the floor(the average), i.e. 7;
11 in a, 10 in c, will be regarded as a common element, which will be taken the floor(the average), i.e. 10;
The same rules apply to 18,26,30
Therefore, the final result that I should get is c(5,7,10,18,26,30)
You can do this via a moving window approach. First we can combine the values into a single vector, and then we use a moving window of size 'majority' (in this case, 2). If the values in the moving window are within the error range (in this case, 1), add the lowest value to our list of common values, then remove those values. Then continue checking the values.
Here it is as a function:
#vec_list -> list of the vectors to compare
#error -> margin of error to tolerate
#returns a vector of the common numbers
get_common_values = function(vec_list, error){
all = sort(unlist(vec_list)) #put all the numbers in one vector and sort them
majority = ceiling(length(vec_list)/2) #get the number of values that constitute a majority
common = c() #initialize an empty vector to store the values
#now we'll loop over 'all' and find common values - and we'll remove them as we go
#this is basically a moving window approach
while(length(all) >= majority){ #keep going until we've gone through all the elements
vals_i = all[1:majority] #get the values at the front
if(diff(range(vals_i)) <= error){ #if the range is <= to the error, then this value can be considered one of the 'common' values
new_val = min(vals_i) #get the minimum of the values
common = c(common, new_val) #add it to our vector of common values
all = all[!(all %in% new_val:(new_val+error))] #remove any of the values that fall within this range
} else {
all = all[2:length(all)] #if the range isn't greater than the error, just remove the first element
}
}
return(common)
}
We can use it like this:
a = c(5,7,11,18,27,30)
b = c(5,8,18,26)
c = c(6,7,10,26,30)
get_common_values(list(a,b,c),1)
This returns 5 7 10 18 26 30.
It also works with more than three vectors, and with different error tolerances:
set.seed(0)
a = sample(1:100,8)
b = sample(1:100,10)
c = sample(1:100,7)
d = sample(1:100,11)
e = sample(1:100,9)
get_common_values(list(a,b,c,d,e),2)
Note that this assumes that there are no duplicates within each vector, which seems to be a valid assumption based on the info you've provided.

Assigning specific values to a boolean array

Say I am tossing a fair coin where 'tails' is assigned the value x = -1/2 and 'heads' is assigned x = 1/2.
I do this N times and I want to obtain the sum. This is what I have tried:
p = 0.5
N = 1e4
X(N,p)=(rand(N).<p)
I know this is incomplete but when I check (rand(N).<p) I see an array consisting of true, false. I interpret this as 'Tails' or 'Heads'. However, I don't know how to assign the values 1/2 and -1/2 to each of these elements in order for me to find the sum. If I simply use sum((rand(N).<p)) I do get an integer value, but I don't think this is the right way to do it because I haven't specified the values 1/2 and -1/2 anywhere.
Any help is greatly appreciated.
As indicated by the comments already, you want to do
sum(rand([-0.5, 0.5], N))
where N must be an integer (you wrote N=1e4, therefore typeof(N) == Float64 and rand won't work).
The documentation of rand (obtained by ?rand) describes what rand(S, N) does:
Pick a random element or array of random elements from the set of
values specified by S
Here, S can be an optional indexable collection, an array of values in your case (or a type like Int). So, above S = [-0.5, 0.5] and rand draws N random elements from this collection, which we can afterwards sum up.
Assigning specific values to a boolean array
Since this is the title of your question, and the answer above doesn't actually address this, let me comment on this as well.
You could do sum((rand(N).<p)-0.5), i.e. you shift all the ones to 0.5 and all the zeros to -0.5 to get the wanted result. Note that this is a general strategy: Let's say you want true to be a and false to be b, where a and b are numbers. You achieve this by (rand(N).<p)*(a-b) + b.
However, beyond being more "complicated", sum((rand(N).<p)-0.5) will allocate temporary arrays, first one of booleans, then one of numbers, the latter of which will eventually go into sum. Because of these unnecessary allocations this approach will be slower than the solution above.

Generate Unique Combinations of Integers

I am looking for help with pseudo code (unless you are a user of Game Maker 8.0 by Mark Overmars and know the GML equivalent of what I need) for how to generate a list / array of unique combinations of a set of X number of integers which size is variable. It can be 1-5 or 1-1000.
For example:
IntegerList{1,2,3,4}
1,2
1,3
1,4
2,3
2,4
3,4
I feel like the math behind this is simple I just cant seem to wrap my head around it after checking multiple sources on how to do it in languages such as C++ and Java. Thanks everyone.
As there are not many details in the question, I assume:
Your input is a natural number n and the resulting array contains all natural numbers from 1 to n.
The expected output given by the combinations above, resembles a symmetric relation, i. e. in your case [1, 2] is considered the same as [2, 1].
Combinations [x, x] are excluded.
There are only combinations with 2 elements.
There is no List<> datatype or dynamic array, so the array length has to be known before creating the array.
The number of elements in your result is therefore the binomial coefficient m = n over 2 = n! / (2! * (n - 2)!) (which is 4! / (2! * (4 - 2)!) = 24 / 4 = 6 in your example) with ! being the factorial.
First, initializing the array with the first n natural numbers should be quite easy using the array element index. However, the index is a property of the array elements, so you don't need to initialize them in the first place.
You need 2 nested loops processing the array. The outer loop ranges i from 1 to n - 1, the inner loop ranges j from 2 to n. If your indexes start from 0 instead of 1, you have to take this into consideration for the loop limits. Now, you only need to fill your target array with the combinations [i, j]. To find the correct index in your target array, you should use a third counter variable, initialized with the first index and incremented at the end of the inner loop.
I agree, the math behind is not that hard and I think this explanation should suffice to develop the corresponding code yourself.

R in simple terms - why do I have to feel like such an idiot? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Improve this question
my question is simple... every reference I find in books and on the internet for learning R programming is presented in a very linear way with no context. When I try and learn things like functions, I see the code and my brain just freezes because it's looking for something to relate these R terms to and I have no frame of reference. I have a PhD and did a lot of statistics for my dissertation but that was years ago when we were using different programming languages and when it comes to R, I don't know why I can't get this into my head. Is there someone who can explain in plain english an example of this simple code? So for example:
above <- function(x, n){
use <- x > n
x[use]
}
x <- 1:20
above(x, 12)
## [1] 13 14 15 16 17 18 19 20
I'm trying to understand what's going on in this code but simply don't. As a result, I could never just write this code on my own because I don't have the language in my head that explains what is happening with this. I get stuck at the first line:
above <- function(x, n) {
Can someone just explain this code sample in plain English so I have some kind of context for understanding what I'm looking at and why I'm doing what I'm doing in this code? And what I mean by plain English is, walking through the code, step by step and not just repeating the official terms from R like vector and function and array and all these other things, but telling me, in a common sense way, what this means.
Since your background ( phd in statsitics) the best way to understand this
is in mathematics words.
Mathematically speaking , you are defining a parametric function named above that extracts all element from a vector x above a certain value n. You are just filtering the set or the vector x.
In sets notation you can write something like :
above:{x,n} --> {y in x ; y>n}
Now, Going through the code and paraphrasing it (in the left the Math side , in the right its equivalent in R):
Math R
---------------- ---------------------
above: (x,n) <---> above <- function(x, n)
{y in x ; y>n} <---> x[x > n]
So to wrap all the statments together within a function you should respect a syntax :
function_name <- function(arg1,arg2) { statements}
Applying the above to this example (we have one statement here) :
above <- function(x,n) { x[x>n]}
Finally calling this function is exactly the same thing as calling a mathematical function.
above(x,2)
ok I will try, if this is too detailed let me know, but I tried to go really slowly:
above <- function(x, n)
this defines a function, which is just some procedure which produces some output given some input, the <- means assign what is on the right hand side to what is on the left hand side, or in other words put everything on the right into the object on the left, so for example container <- 1 puts 1 into the container, in this case we put a function inside the object above,
function(x, n) everything in the paranthesis specifys what inputs the function takes, so this one takes two variables x and n,
now we come to the body of the function which defines what it does with the inputs x and n, the body of the function is everything inside the curley braces:
{
use <- x > n
x[use]
}
so let's explain that piece by piece:
use <- x > n
this part again puts whats on the right side into the object on the left, and what is happening on the right hand side? a comparison returning TRUE if x is bigger than n and FALSE if x is equal to or smaller then n, so if x is 5 and n is 3 the result will be TRUE, and this value will get stored inside use, so use contains TRUE now, now if we have more than one value inside x than every value inside x will get compared to n, so for example if x = [1, 2, 3] and n = 2
than we have
1 > 2 FALSE
2 > 2 FALSE
3 > 2 TRUE
, so use will contain FALSE, FALSE, TRUE
x[use]
now we are taking a part of x, the square brackets specify which parts of x we want, so in my example case x has 3 elements and use has 3 elements if we combine them we have:
x use
1 FALSE
2 FALSE
3 TRUE
so now we say I dont want 1,2 but i want 3 and the result is 3
so now we have defined the function, now we call it, or in normal words we use it:
x <- 1:20
above(x, 12)
first we assign the numbers 1 through 20 to x, and then we tell the function above to execute (do everything inside its curley braces with the inputs x = 1:20 and n = 12, so in other words we do the following:
above(x, 12)
execute the function above with the inputs x = 1:20 and n = 12
use <- 1:20 > 12
compare 12 to every number from 1:20 and return for each comparison TRUE if the number is in fact bigger than 12 and FALSE if otherwise, than store all the results inside use
x[use]
now give me the corresponding elements of x for which the vector use contains TRUE
so:
x use
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 FALSE
8 FALSE
9 FALSE
10 FALSE
11 FALSE
12 FALSE
13 TRUE
14 TRUE
15 TRUE
16 TRUE
17 TRUE
18 TRUE
19 TRUE
20 TRUE
so we get the numbers 13:20 back as a result
I'll give it a crack too. A few basic points that should get you going in the right direction.
1) The idea of a function. Basically, a function is reusable code. Say I know that in my analysis for some bizarre reason I will often want to add two numbers, multiply them by a third, and divide them by a fourth. (Just suspend disbelief here.) So one way I could do that would just be to write the operation over and over, as follows:
(75 + 93)*4/18
(847 + 3)*3.1415/2.7182
(999 + 380302)*-6901834529/2.5
But that's tedious and error-prone. (What happens if I forget a parenthesis?) Alternatively, I can just define a function that takes whatever numbers I feed into it and carries out the operation. In R:
stupidMath <- function(a, b, c, d){
result <- (a + b)*c/d
}
That code says "I'd like to store this series of commands and attach them to the name "stupidMath." That's called defining a function, and when you define a function, the series of commands is just stored in memory---it doesn't actually do anything until you "call" it. "Calling" it is just ordering it to run, and when you do so, you give it "arguments" ---the stuff in the parentheses in the first line are the arguments it expects, i.e., in my example, it wants four distinct pieces of data, which will be called 'a', 'b', 'c', and 'd'.
Then it'll do the things it's supposed to do with whatever you give it. "The things it's supposed to do" is the stuff in the curly brackets {} --- that's the "body" of the function, which describes what to do with the arguments you give it. So now, whenever you want to carry that mathematical operation you can just "call" the function. To do the first computation, for example, you'd just write stupidMath(75, 93, 4, 18) Then the function gets executed, treating 75 as 'a', 83 as 'b', and so forth.
In your example, the function is named "above" and it takes two arguments, denoted 'x' and 'n'.
2) The "assignment operator": R is unique among major programming languages in using <- -- that's equivalent to = in most other languages, i.e., it says "the name on the left has the value on the right." Conceptually, it's just like how a variable in algebra works.
3) so the "body" of the function (the stuff in the curly brackets) first assigns the name "use" to the expression x > n. What's going on there. Well, an expression is something that the computer evaluates to get data. So remember that when you call the function, you give it values for x and n. The first thing this function does is figures out whether x is greater than n or less than n. If it's greater than n, it evaluates the expression x > n as TRUE. Otherwise, FALSE.
So if you were to define the function in your example and then call it with above(10, 5), then the first line of the body would set the local variable (don't worry right now about what a 'local' variable is) 'use' to be 'TRUE'. This is a boolean value.
Then the next line of the function is a "filter." Filtering is a long topic in R, but basically, R things of everything as a "vector," that is, a bunch of pieces of data in a row. A vector in R can be like a vector in linear algebra, i.e., (1, 2, 3, 4, 5, 99) is a vector, but it can also be of stuff other than numbers. For now let's just focus on numbers.
The wacky thing about R (one of the many wacky things about R) is that it treats a single number (a "scalar" in linear algebra terms) just as a vector with only one item in it.
Ok, so why did I just go into that? Because in lots of places in R, a vector and a scalar are interchangable.
So in your example code, instead of giving a scalar for the first argument, when we call the function we've given 'above' a vector for its first argument. R likes vectors. R really likes vectors. (Just talk to R people for a while. They're all obsessed with doing every goddmamn thing in terms of a vector.) So it's no problem to pass a vector for the first argument. But what that means is that the variable 'use' is going to be a vector too. Specifically, 'use' is going to be a vector of booleans, i.e., of TRUE or FALSE for each individual value of X.
To take a simpler version: suppose you said:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
when the code runs, the first thing it's going to do is define that 'use' variable. But x is a vector now, not a scalar (the c(5,10) code said "make a vector with two elements, and fill them with the numbers '5' and '10'), so R's going to go ahead and carry out the comparison for each element of x. Since 5 is less than 7 and 10 is greater than 7, use becomes the two item-vector of boolean values (FALSE, TRUE)
Ok, now we can talk about filtering. So a vector of boolean values is called a 'logical vector.' And the code x[use] says "filter x by the stuff in the variable use." When you tell R to filter something by a logical vector, it spits back out the elements of the thing being filtered which correspond to the values of 'TRUE'
So in the example just given:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
the value of myresult will just be 10. Why? Because the function filtered 'x' by the logical vector 'use,' 'x' was (5, 10), and 'use' was (FALSE, TRUE); since the second element of the logical was the only true, you only got the second element of x.
And that gets assigned to the variable myresult because myresult <- above(mynums, 7) means "assign the name myresult to the value of above(mynums, 7)"
voila.

Can any set of numbers be expressed by a function?

Is it possible to express ANY random set of numbers by a function?
Question clarification:
for example:
if desired result set = {1,2,3,4,5}
so I don't mean something like this:
function getSet(){
return {1,2,3,4,5};
}
but more like this:
function genSet(){
result = {}
for(i=0;i<5;i++){
result.push(i);
}
return result;
}
So in other words, can there be a logic to calculate any desired set?
There is a lot of mathematics behind this question. There are some interesting results.
Any set of (real) numbers can be define by a polynomial function f(x) = a + b x + c x^2 + ... so that a number is in the set if f(x)=0. Technically this is an algebraic curve in 1D. While this might seem a optimistic result there is not limit on how complex the polynomial could be and polynomials above the degree 5 have no explicit result.
There is a whole field of study on Computable numbers, real numbers which can be can be computed to within any desired precision by a finite, terminating algorithm, and their converse: non computable numbers, which can't. The bad news is there are a lot more non-computable numbers than computable ones.
The above has been based on real numbers which are decidedly more tricky than the integers or even a finite set of integers which is all we can represent by int or long datatypes. There is a big field of study in this see Computability theory (computer science). I think the Turings halting problem come in to play, this is about if you can determine if a algorithm will terminate. Unfortunately this can't be determined and a consequence is "Not every set of natural numbers is computable." The proof of this does require the infinite size of the naturals so I'm not sure about finite sets.
Representations
There are two common representations used for sets when programming. Suppose the set S is a subset of some universe of items U.
Membership Predicate
One way to represent the set S is a function member from S to { true, false }. For all x in U:
member(x) = true if x is in S
member(x) = false if x is not in S
Pseudocode
bool member(int n)
return 1 <= n <= 5
Enumeration
Another way to represent the S is to store all of its members in a data structure, such as a list, hash table, or binary tree.
Pseudocode
enumerable<int> S()
for int i = 1 to 5
yield return i
Operations
With either of these representations, most set operations can be defined. For example, the union of two sets would look as follows with each of the two representations.
Membership Predicate
func<int, bool> union(func<int, bool> s, func<int, bool> t)
return x => s(x) || t(x)
Enumeration
enumrable<int> union(enumerable<int> s, enumerable<int> t)
hashset<int> r
foreach x in s
r.add(x)
foreach x in t
if x not in r
r.add(x)
return r
Comparison
The membership predicate representation can be extremely versatile because all kinds of set operations from mathematics can be very easily expressed (complement, Cartesian product, etc.). The drawback is that there is no general way to enumerate all the members of a set represented in this way. The set of all positive real numbers, for example, cannot even be enumerated.
The enumeration representation typically involves much more expensive set operations, and some operations (such as the complement of the integer set {1, 2, 3, 4, 5}) cannot even be represented. It should be chosen if you need to be able to enumerate the members of a set, not just test membership.

Resources