Evaluating combinations with a vectorized function in Julia

In Julia, the dot syntax vectorizes a function so that it is applied element-wise.
Running f.(x) means f(x[1]), f(x[2]), ... are evaluated in turn.
However, suppose I have a function that takes two arguments, say g(x, y).
I want g(x[1],y[1]), g(x[2],y[1]), g(x[3],y[1]), ..., g(x[1],y[2]), g(x[2],y[2]), g(x[3],y[2]), ...
Is there any way to evaluate g over all combinations of x and y?

Matt's answer is good, but I'd like to provide an alternative using an array comprehension:
julia> x = 1:5
y = 10:10:50
[i + j for i in x, j in y]
5×5 Array{Int64,2}:
11 21 31 41 51
12 22 32 42 52
13 23 33 43 53
14 24 34 44 54
15 25 35 45 55
In my opinion the array comprehension can often be more readable and more flexible than broadcast and reshape.

Yes, reshape y such that it is orthogonal to x. The . vectorization uses broadcast to do its work. I imagine this as "extruding" singleton dimensions across all the other dimensions.
That means that for vectors x and y, you can evaluate a function over all combinations of x and y simply by reshaping one of them:
julia> x = 1:5
y = 10:10:50
(+).(x, reshape(y, 1, length(y)))
5×5 Array{Int64,2}:
11 21 31 41 51
12 22 32 42 52
13 23 33 43 53
14 24 34 44 54
15 25 35 45 55
Note that the shape of the array matches the orientation of the arguments; x spans the rows and y spans the columns since it was transposed to a single-row matrix.

Related

Split a vector list with M elements into 2 lists of N and M-N elements

I created a vector list, aa, with 50 elements. And I need to split aa into two vector lists called bb and cc. bb has the first 20 elements of aa while cc has the last 30 elements of aa. How do I do it?
Creation of original vector list
aa <- list (sample (1:50))
aa
#[[1]]
# [1] 29 30 39 45 17 11 43 14 24 34 3 1 28 2 21 23 6 31 5 27 44 7 4 46 49 22 33 38 50 36 15 48 8 16 25 42 13 41 47
#[40] 37 26 32 35 9 18 10 20 40 19 12
Sorry all, I know my question is really basic. Maybe it is because the question is too simple and the solution is thus not easily found from the internet.
Since I couldn't find a direct question answering this, I'm adding an answer. We can first subset the list using [[ and then select individual elements in it with [.
bb <- aa[[1]][1:20]
cc <- aa[[1]][21:50]
We can also use head and tail to select first 20 and last 30 elements respectively.
bb <- head(aa[[1]], 20)
cc <- tail(aa[[1]], 30)
We can use split to create a list of vectors
lst1 <- split(aa[[1]], rep(1:2, c(20, 30)))
and extract the vector with [[
lst1[[1]]
lst1[[2]]
It can be extended to any number of splits (i.e. generalized version) where we just need to change the rep
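As a rough sketch of that generalization, assuming hypothetical group sizes of 10, 15 and 25 (the sizes and the names sizes and parts are illustrative, not from the question), the grouping vector can be built from the sizes themselves:

```r
# Illustrative only: `sizes` and `parts` are made-up names for this sketch.
aa <- list(sample(1:50))

# Hypothetical group sizes; they must sum to the length of the vector.
sizes <- c(10, 15, 25)
stopifnot(sum(sizes) == length(aa[[1]]))

# rep() labels the first 10 elements 1, the next 15 elements 2, and so on;
# split() then returns one vector per label.
parts <- split(aa[[1]], rep(seq_along(sizes), sizes))

lengths(parts)   # number of elements in each piece
```

Because the labels are contiguous, concatenating the pieces in order should give back the original vector.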

Ceil and floor values in R

I have a data.table of integers with values between 1 and 60.
My question is about flooring or ceiling any number to the following values: 12 18 24 30 36 ... 60.
For example, let's say my data.table contains the number 13. I want R to "transform" this number into 12 and 18 as 13 lies in between those numbers. Moreover, if I have 18 I want R to keep it at 18.
If my data.table contains the value 50, I want R to convert that number into 48 and 54 and so on.
My goal is to get two different data.tables. One where the floored values are saved and one where the ceiled values are saved.
Any idea how one could do this in R?
EDIT: Numbers smaller than 12 should always be transformed to 12.
Example output:
If I have the following data.table: data.table(c(1,28,29,41,53,53,17,41,41,53))
I want the following two output data.tables:
floored values: data.table(c(12,24,24,36,48,48,12,36,36,48))
ceiled values: data.table(c(12,30,30,42,54,54,18,42,42,54))
Here is a fairly direct way (edited to round up to 12 if any values are below):
df <- data.frame(nums = 10:20)
df$floors <- with(df,pmax(12,6*floor(nums/6)))
df$ceils <- with(df,pmax(12,6*ceiling(nums/6)))
Leading to:
> df
nums floors ceils
1 10 12 12
2 11 12 12
3 12 12 12
4 13 12 18
5 14 12 18
6 15 12 18
7 16 12 18
8 17 12 18
9 18 18 18
10 19 18 24
11 20 18 24
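To check the pmax formula against the sample data from the question, here is a small sketch using plain vectors rather than a data.table (the variable names are illustrative):

```r
# Sample values taken from the question.
nums <- c(1, 28, 29, 41, 53, 53, 17, 41, 41, 53)

# Round down / up to the nearest multiple of 6, but never below 12.
floors <- pmax(12, 6 * floor(nums / 6))
ceils  <- pmax(12, 6 * ceiling(nums / 6))

floors
ceils
```

The two vectors should match the floored and ceiled outputs requested in the question.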
Here's a way we could do this, using sapply and the which.min functions. From your question, it's not immediately clear how values < 12 should be handled.
x <- 1:60
num_list <- seq(12, 60, 6)
floorr <- sapply(x, function(x){
diff_vec <- x - num_list
diff_vec <- ifelse(diff_vec < 0, Inf, diff_vec)
num_list[which.min(diff_vec)]
})
ceill <- sapply(x, function(x){
diff_vec <- num_list - x
diff_vec <- ifelse(diff_vec < 0, Inf, diff_vec)
num_list[which.min(diff_vec)]
})
tail(cbind(x, floorr, ceill))
x floorr ceill
[55,] 55 54 60
[56,] 56 54 60
[57,] 57 54 60
[58,] 58 54 60
[59,] 59 54 60
[60,] 60 60 60
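If the per-element sapply loop becomes slow on large inputs, one possible vectorized alternative is base R's findInterval. This is a sketch of a swapped-in technique rather than the method used above, and it assumes all values lie between 1 and 60 as stated in the question (floorr2 and ceill2 are illustrative names):

```r
x <- 1:60
num_list <- seq(12, 60, 6)

# Floor: index of the last grid value <= x; index 0 (x < 12) is bumped to 1.
floor_idx <- pmax(1, findInterval(x, num_list))
floorr2 <- num_list[floor_idx]

# Ceiling: count the grid values strictly below x, then step to the next one.
ceil_idx <- findInterval(x, num_list, left.open = TRUE) + 1
ceill2 <- num_list[ceil_idx]

tail(cbind(x, floorr2, ceill2))
```

Values above 60 would index past the end of num_list and give NA, so the input range would need checking first if that can happen.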

use dplyr mutate() in programming

I am trying to use a variable as the name of the column created by mutate.
df <-data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(name){
df%>%mutate(name = ifelse(x <50, "small", "big"))
}
When I run
new(name = "newVar")
it doesn't work. I know mutate_() could help but I'm struggling in using it together with ifelse.
Any help would be appreciated.
Using dplyr 0.7.1 and its advances in NSE, you have to UQ the argument to mutate and then use := when assigning. There is lots of info on programming with dplyr and NSE here: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
I've changed the name of the function argument to myvar to avoid confusion. You could also use case_when from dplyr instead of ifelse if you have more categories to recode.
df <- data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(myvar){
df %>% mutate(UQ(myvar) := ifelse(x < 50, "small", "big"))
}
new(myvar = "newVar")
This returns
x y newVar
1 37 1.82669 small
2 63 -0.04333 big
3 46 0.20748 small
4 93 0.94169 big
5 83 -0.15678 big
6 14 -1.43567 small
7 61 0.35173 big
8 26 -0.71826 small
9 21 1.09237 small
10 90 1.99185 big
11 60 -1.01408 big
12 70 0.87534 big
13 55 0.85325 big
14 38 1.70972 small
15 6 0.74836 small
16 23 -0.08528 small
17 27 2.02613 small
18 76 -0.45648 big
19 97 1.20124 big
20 99 -0.34930 big
21 74 1.77341 big
22 72 -0.32862 big
23 64 -0.07994 big
24 53 -0.40116 big
25 16 -0.70226 small
26 8 0.78965 small
27 34 0.01871 small
28 24 1.95154 small
29 82 -0.70616 big
30 77 -0.40387 big
31 43 -0.88383 small
32 88 -0.21862 big
33 45 0.53409 small
34 29 -2.29234 small
35 54 1.00730 big
36 22 -0.62636 small
37 100 0.75193 big
38 52 -0.41389 big
39 36 0.19817 small
40 89 -0.49224 big
41 81 -1.51998 big
42 18 0.57047 small
43 78 -0.44445 big
44 49 -0.08845 small
45 20 0.14014 small
46 32 0.48094 small
47 1 -0.12224 small
48 66 0.48769 big
49 11 -0.49005 small
50 87 -0.25517 big
Following the dplyr programming vignette, define your function as follows:
new <- function(name)
{
nn <- enquo(name) %>% quo_name()
df %>% mutate( !!nn := ifelse(x <50, "small", "big"))
}
enquo takes its expression argument and quotes it, followed by quo_name converting it into a string. Since nn is now quoted, we need to tell mutate not to quote it a second time. That's what !! is for. Finally, := is a helper operator to make it valid R code. Note that with this definition, you can simply pass newVar instead of "newVar" to your function, maintaining dplyr style.
> new( newVar ) %>% head
x y newVar
1 94 -1.07642088 big
2 85 0.68746266 big
3 80 0.02630903 big
4 74 0.18323506 big
5 86 0.85086915 big
6 38 0.41882858 small
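If the column name arrives as a string, a shorter variant is to unquote the string directly with !!. The following is only a sketch, assuming a dplyr version with tidy evaluation (0.7 or later); the helper name add_size_label and its data argument are hypothetical, and passing the data frame in explicitly is a design choice made here so the function does not depend on a global df:

```r
library(dplyr)

# Hypothetical helper: takes the data frame and the new column name (a string).
add_size_label <- function(data, name) {
  # !! unquotes the string so that its value, not the literal word `name`,
  # becomes the column name; := is needed when the left-hand side is unquoted.
  data %>% mutate(!!name := ifelse(x < 50, "small", "big"))
}

res <- add_size_label(data.frame(x = c(10, 50, 90)), "newVar")
res
```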
Base R solution
df <- data.frame(x = sample(1:100, 50), y = rnorm(50))
new <- function(name){
  # label every row "small" first, then overwrite the rows where x >= 50
  df[, name] <- "small"
  df[df$x >= 50, name] <- "big"
  return(df)
}
I am using dplyr 0.5, so I just combine base R with mutate:
new <- function(Name){
  df <- mutate(df, ifelse(x < 50, "small", "big"))
  # rename the column that mutate just appended (always the last one)
  names(df)[ncol(df)] <- Name
  return(df)
}
new("newVar")

Having issues with adding two positions of array

A question regarding adding array elements. I have this code:
B[row][col] = B[row+1][col+1] + B[row][col+1];
Let's say row = 2 and col = 3. I don't quite understand what happens here. There is an assignment (=), so I'm guessing it assigns whatever is on the right-hand side, but I don't know how to evaluate it. In this example the right-hand side comes out to 13 for me, but that doesn't make sense. Would 13 be assigned to B[row][col]? The tracing program shows 2. I don't understand, please help!
I'm not entirely sure what it is you're asking but essentially you have a 2D array and the B[row][col] syntax is to access a specific "cell" within the 2D array. Think of it like a grid. So what you're doing with the assignment operator is taking the values in cells B[row+1][col+1] and B[row][col+1], adding them together, and assigning that resulting value to the cell B[row][col]. Does that make sense? Also it'll be good to make sure you don't get any index out of bounds exceptions doing this.
This does somewhat depend on the tool/language you are using. For instance, MATLAB starts indexing arrays at 1, so the first element of an array a is a(1), while languages like C/Java start indexing at 0, so the first element of an array a is a[0].
Let's assume that indexing is done like in C/Java, then consider a multidimensional array B
12 13 14 11
41 17 23 22
18 10 20 38
81 17 32 61
Then with row = 2 and col = 3 you will have that B[row][col] as the element that sits on the third row (remembering indexing starts at 0, so B[2] is the third row) and fourth column, marked here between * signs.
12 13 14 11
41 17 23 22
18 10 20 *38*
81 17 32 61
As for changing a value in the multidimensional array, it is done by assigning a new value to the index of the old value.
B[row][col] = B[row+1][col+1] + B[row][col+1];
With row=1 and col=0 we get
B[1][0] = B[2][1] + B[1][1];
B[1][0] = 10 + 17;
B[1][0] = 27;
Or:
 12  13  14  11          12  13  14  11
(41) 17  23  22         (27) 17  23  22
 18  10  20  38   ==>    18  10  20  38
 81  17  32  61          81  17  32  61

Multiple unions

I am trying to do unions on several lists (these are actually GRanges objects, not integer lists, but the principle is the same), basically one big union.
x<-sort(sample(1:20, 9))
y<-sort(sample(10:30, 9))
z<-sort(sample(20:40, 9))
mylists<-c("x","y","z")
emptyList<-list()
sapply(mylists,FUN=function(x){emptyList<-union(emptyList,get(x))})
That is just returning the list contents.
I need the equivalent of
union(x,union(y,z))
[1] 2 3 5 6 7 10 13 15 20 14 19 21 24 27 28 29 26 31 36 39
but written in an extensible and non-"variable explicit" form
A not necessarily memory efficient paradigm that will work with GRanges is
Reduce(union, list(x, y, z))
The argument might also be a GRangesList(x, y, z) for appropriate values of x etc.
x<-sort(sample(1:20, 9))
y<-sort(sample(10:30, 9))
z<-sort(sample(20:40, 9))
Both of the below produce the same output
unique(c(x,y,z))
[1] 1 2 4 6 7 8 11 15 17 14 16 18 21 23 26 28 29 20 22 25 31 32 35
union(x,union(y,z))
[1] 1 2 4 6 7 8 11 15 17 14 16 18 21 23 26 28 29 20 22 25 31 32 35
unique(unlist(mget(mylists, globalenv())))
will do the trick. (Possibly changing the environment given in the call to mget, as required.)
I think it would be cleaner to separate the "dereference" part from the n-ary union part, e.g.
dereflist <- function(l) lapply(l,get)
nunion <- function(l) Reduce(union,l)
But if you look at how union works, you'll see that you could also do
nunion <- function(l) unique(do.call(c,l))
which is faster in all the cases I've tested (much faster for long lists).
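A quick way to convince yourself that the two definitions agree on plain vectors is sketched below. The names nunion_reduce and nunion_unique are illustrative, and the seed is arbitrary. This check only covers atomic vectors; for GRanges objects union has its own semantics, so the unique(do.call(c, l)) shortcut may not carry over.

```r
set.seed(42)  # arbitrary seed so the run is repeatable
x <- sort(sample(1:20, 9))
y <- sort(sample(10:30, 9))
z <- sort(sample(20:40, 9))
lists <- list(x, y, z)

# n-ary union by folding the binary union over the list
nunion_reduce <- function(l) Reduce(union, l)

# n-ary union by concatenating everything and dropping duplicates
nunion_unique <- function(l) unique(do.call(c, l))

same_result <- identical(nunion_reduce(lists), nunion_unique(lists))
same_result
```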
This can be done by using the reduce function in the purrr package.
purrr::reduce(list(x, y, z),union)
OK, this works, but I am curious why sapply seems to have its own scope:
x<-sort(sample(1:20, 9))
y<-sort(sample(10:30, 9))
z<-sort(sample(20:40, 9))
mylists<-c("x","y","z")
emptyList<-vector()
for(f in mylists){emptyList<-union(emptyList,get(f))}

Resources