Subtracting list elements from another list in R - r

I have two lists and I want to subtract one list element wise with the other, in order to replicate a Matlab function bsxfun(#minus, lt, lt2). The two lists look something like the below (edit: now works without pracma package):
# Code
# First list
lt = c(list())
# I use these lines to pre-dim the list...
lt[[1]] = c(rep(list(1)))
lt[[2]] = c(rep(list(1)))
# ... such that I can add matrices it this way:
lt[[1]][[1]] = matrix(c(3),nrow=1, ncol=1,byrow=TRUE)
lt[[2]][[1]] = matrix(c(1),nrow=1, ncol=1, byrow=TRUE)
# Same with the second list:
lt2 = c(list())
lt2[[1]] = c(rep(list(1)))
lt2[[2]] = c(rep(list(1)))
lt2[[1]][[1]] = matrix(c(2,2,2),nrow=3, ncol=1,byrow=TRUE)
lt2[[2]][[1]] = matrix(c(1,1,1),nrow=3, ncol=1,byrow=TRUE)
Element wise subtraction would mean that that each row of an element of lt2 would be subtracted
by the respective element of the object lt, i.e., lt2[[1]][[1]] each row by 3, resulting in t(c(-1 -1 -1)).... and lt2[[2]][[1]] = t(c(0,0,0)) by 1 ... It is important to me that the list structure is maintained in the results.
Now I tried using lapply(lt2,"-",lt) but it does not work. Any suggestions?

I suspect you are looking for something like this skeleton code which subtracts 2 lists element-wise...
x <- list(1,2,3)
y <- list(4,5,6)
mapply('-', y, x, SIMPLIFY = FALSE)
but as noted, you need 2 identical lists (or at least R's recycling algorithms must make sense) as for example...
z <- list(4,5,6,7,8,9)
mapply('-',z,x,SIMPLIFY = FALSE)
You might be looking for something like this where you subtract a constant from each member of the list...
mapply('-',y,2, SIMPLIFY= FALSE)

I figured it out - I had another mistake in the question :/
Changing the second class as.numeric worked
lt3 = lapply(lt2[[1]],"-",as.numeric(lt[[1]]))

Related

Creating a simple for loop in R

I have a tibble called 'Volume' in which I store some data (10 columns - the first 2 columns are characters, 30 rows).
Now I want to calculate the relative Volume of every column that corresponds to Column 3 of my tibble.
My current solution looks like this:
rel.Volume_unmod = tibble(
"Volume_OD" = Volume[[3]] / Volume[[3]],
"Volume_Imp" = Volume[[4]] / Volume[[3]],
"Volume_OD_1" = Volume[[5]] / Volume[[3]],
"Volume_WS_1" = Volume[[6]] / Volume[[3]],
"Volume_OD_2" = Volume[[7]] / Volume[[3]],
"Volume_WS_2" = Volume[[8]] / Volume[[3]],
"Volume_OD_3" = Volume[[9]] / Volume[[3]],
"Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod
I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relative new to R so I it's not obvious to me. What I tried is something like this, but I can't actually run this:
rel.Volume = NULL
for(i in Volume[,3:10]){
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}
Mockup Data
Since you did not provide some data, I've followed the description you provided to create some mockup data. Here:
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)
Solution with Dplyr
library(dplyr)
# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols
# divide by Volumn_OD
rel.Volume_unmod <- Volume %>%
mutate(across(all_of(cols), ~ . / Volume_OD))
# result
rel.Volume_unmod
Explanation
I don't know the names of your columns. Probably, the names correspond to the names of the columns you intended to create in rel.Volume_unmod. Anyhow, to avoid any problem I renamed the columns (kinda brutally). You can do it with dplyr::rename if you wan to.
There are many ways to select the columns you want to mutate. mutate is a verb from dplyr that allows you to create new columns or perform operations or functions on columns.
across is an adverb from dplyr. Let's simplify by saying that it's a function that allows you to perform a function over multiple columns. In this case I want to perform a division by Volum_OD.
~ is a tidyverse way to create anonymous functions. ~ . / Volum_OD is equivalent to function(x) x / Volumn_OD
all_of is necessary because in this specific case I'm providing across with a vector of characters. Without it, it will work anyway, but you will receive a warning because it's ambiguous and it may work incorrectly in same cases.
More info
Check out this book to learn more about data manipulation with tidyverse (which dplyr is part of).
Solution with Base-R
rel.Volume_unmod <- Volume
# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols
# divide by columns 3
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[3])
rel.Volume_unmod
Explanation
lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
in this case rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] a time and applies a function.
the function is /. You usually see / used like this: A / B, but actually / is a Primitive function. You could write the same thing in this way:
`/`(A, B) # same as A / B
lapply can be provided with additional parameters that are passed directly to the function that is being applied over the list (in this case /). Therefore, we are writing rel.Volume_unmod[3] as additional parameter.
lapply always returns a list. But, since we are assigning the result of lapply to a "fraction of a dataframe", we will just edit the columns of the dataframe and, as a result, we will have a dataframe instead of a list. Let me rephrase in a more technical way. When you are assigning rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]. You are technically using this assigning function: [<-. This is a function that allows to edit the items in a list/vector/dataframe. Specifically, [<- allows you to assign new items without modifying the attributes of the list/vector/dataframe. As I said before, a dataframe is just a list with specific attributes. Then when you use [<- you modify the columns, but you leave the attributes (the class data.frame in this case) untouched. That's why the magic works.
Whithout a minimal working example it's hard to guess what the Variable Volume actually refers to. Apart from that there seems to be a problem with your for-loop:
for(i in Volume[,3:10]){
Assuming Volume refers to a data.frame or tibble, this causes the actual column-vectors with indices between 3 and 10 to be assigned to i successively. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
Also, two brackets are usually used with lists, not data.frames or tibbles. (You can, however, do so, because data.frames are special cases of lists.)
Last but not least, initialising the variable rel.Volume with NULL will result in an error, when trying to reassign to that variable, since you haven't told R, what rel.Volume should be.
Try this, if you like (thanks #Edo for example data):
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE),
Vol1 = rnorm(30),
Vol2 = rnorm(30),
Vol3 = rnorm(30))
rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.
for (i in 3:ncol(Volume)){ # ncol gives the total number of columns in data.frame
rel.Volume[i] = Volume[i]/Volume[3]
}
A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same result without a loop:
# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))
rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))
Since you said you were new to R, frankly I would recommend avoiding the Tidyverse-packages while you are still learing the basics. From my experience, in the long run you're better off learning base-R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse-functions later (but then, why would anybody? ;-) ).

How to concatenate NOT as character in R?

I want to concatenate iris$SepalLength, so I can use that in a function to get the Sepal Length column from iris data frame. But when I use paste function paste("iris$", colnames(iris[3])), the result is as characters (with quotes), as "iris$SepalLength". I need the result not as a character. I have tried noquotes(), as.datafram() etc but it doesn't work.
freq <- function(y) {
for (i in iris) {
count <-1
y <- paste0("iris$",colnames(iris[count]))
data.frame(as.list(y))
print(y)
span = seq(min(y),max(y), by = 1)
freq = cut(y, breaks = span, right = FALSE)
table(freq)
count = count +1
}
}
freq(1)
The crux of your problem isn't making that object not be a string, it's convincing R to do what you want with the string. You can do this with, e.g., eval(parse(text = foo)). Isolating out a small working example:
y <- "iris$Sepal.Length"
data.frame(as.list(y)) # does not display iris$Sepal.Length
data.frame(as.list(eval(parse(text = y)))) # DOES display iris.$Sepal.Length
That said, I wanted to point out some issues with your function:
The input variable appears to not do anything (because it is immediately overwritten), which may not have been intended.
The for loop seems broken, since it resets count to 1 on each pass, which I think you didn't mean. Relatedly, it iterates over all i in iris, but then it doesn't use i in any meaningful way other than to keep a count. Instead, you could do something like for(count in 1 : length(iris) which would establish the count variable and iterate it for you as well.
It's generally better to avoid for loops in R entirely; there's a host of families available for doing functions to (e.g.) every column of a data frame. As a very simple version of this, something like apply(iris, 2, table) will apply the table function along margin 2 (the columns) of iris and, in this case, place the results in a list. The idea would be to build your function to do what you want to a single vector, then pass each vector through the function with something from the apply() family. For instance:
cleantable <- function(x) {
myspan = seq(min(x), max(x)) # if unspecified, by = 1
myfreq = cut(x, breaks = myspan, right = FALSE)
table(myfreq)
}
apply(iris[1:4], 2, cleantable) # can only use first 4 columns since 5th isn't numeric
would do what I think you were trying to do on the first 4 columns of iris. This way of programming will be generally more readable and less prone to mistakes.

Expand grid in R with paste

I am trying to analyse a dataframe using hierarchical clustering hclust function in R.
I would like to pass in a vector of p values I'll write beforehand (maybe something like c(5/4, 3/2, 7/4, 9/4)) and be able to have these specified as the different p value options with Minkowski distance when I use expand.grid. Ideally, when hyperparams is viewed, it would also be clear which value of p has been used for each minkowski, i.e. they should be labelled. So for example, where (if you run my code for hyperparams) there would currently just be one minkowski under Dists, for each of the methods in Meths, there would be, if I supplied the p vector as c(5/4, 3/2, 7/4, 9/4), now instead 4 rows for Minkowski distance: minkowski, p=5/4, minkowski, p=3/2, minkowski, p=7/4, minkowski, p=9/4 (or looking something like that, making the p values clear). Any ideas?
(Note: no packages please, only base R!)
Edit: I worded it poorly before, now rewritten. Let's take the following example instead:
acc <- function(x){
first = sum(x)
second = sum(x^2)
return(list(First=first,Second=second))
}
iris0 <- iris
iris1 <- cbind(log(iris[,1:4]),iris[5])
iris2 <- cbind(sqrt(iris[,1:4]),iris[5])
Now the important bit:
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary"),
DS=c("iris0","iris1","iris2"))
Table <- Map(function(x, ds){acc(table(ds$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))},tests[[1]], tests[[2]])
This will work. But now if I want to include a term like "minkowski",p=3 in expand.grid, how would I do it?
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary","minkowski,p=3"),
DS=c("iris0","iris1","iris2"))
Table <- Map(function(x, ds){acc(table(ds$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))},tests[[1]], tests[[2]])
This gives an error.
In reality there should be no p argument unless the method="minkowski". I have tried to use strsplit to get the first part of the expression into ds, and a switch with strsplit to get the second part and then use parse (it would return NULL if the length of the strsplit was not 2 -- this should pass no argument, I think). The issue seems to be that strsplit is not strsplit(x,",") fails to evaluate the vectorized x but rather tries to evaluate the character x which is not a string. Can anyone suggest any workaround/fix or other method for including the minkowski,p=1.6 terms and the like?
We can create a 'p' value column
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary",
"minkowski3", "minkowski4", "minkowski5"),
DS=c("iris0","iris1","iris2"))
Suppose, we have another column of 'p' values in 'tests', the above solution can be changed to
tests$p <- as.list(args(dist))$p # default value
i1 <- grepl("minkowski", tests$Dists)
tests$Dists <- sub("[0-9.]+$", "", tests$Dists)
tests$p[i1] <- rep(3:5, length.out = sum(i1))
Map(function(x, ds, p){
dist1 <- dist(get(ds)[, 1:4], method = x, p = p)
ct <- cutree(hclust(dist1), 3)
acc(table(get(ds)$Species, ct))},
as.character(tests[[1]]), as.character(tests[[2]]), tests$p )

Make a new vector using elements of list and another vector interchengeably

I have a list with 20 elements each contains a vector of 2 numbers. I have also generated a sequence of numbers (20). Now I would like to construct 1 long vector that would first list the elements of intervals[[1]] and the first element of newvals[1], later intervals[[2]], newvals[2] etc etc
Help will be much appreciated. I think plyr package might be helpful although I am not sure how to structure it. help will be much appreciated!
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2]+0.1
newvals <- seq(1,length(intervals),1)
#### HERE I WOULD LIKE TO HAVE A VECTOR IN THE FOLLOWING PATTERN
####UP TO THE LAST ELEMENT OF THE LIST:
stringreclass <- c(intervals[[1]],newvals[1]), .... , intervals[[20]],newvals[20])

Nested lists in R: extract and combine values according to the name/position of the sublists

I store my data in a multilevel nested list. This list may look like this
my.list = list()
my.list[[1]] = list("naive" = list("a"=c(1,1)) )
my.list[[2]] = list("naive" = list("b"=c(2,1)) )
my.list[[3]] = list("naive" = list("c"=c(3,1)) )
my.list[[4]] = list("naive" = list("d"=c(4,1)) )
my.list[[5]] = list("naive" = list("e"=c(5,1)) )
Now I want to do some operations on the stored values, like selecting the first elements of the 2-d vectors and combine them in another vector. Of course this can be done like the following
foo = c(my.list[[1]][["naive"]][[1]][1],
my.list[[2]][["naive"]][[1]][1],
my.list[[3]][["naive"]][[1]][1],
my.list[[4]][["naive"]][[1]][1],
my.list[[5]][["naive"]][[1]][1])
But this is really awkward if I have many more than 5 first level sublists. And I have the same feeling about for-looping through K in my.list[[K]][["naive"]][[1]][1], K=1:5.
The suggestions provided for similar questions using neat mapply, Reduce, sapply cannot be applied directly due to presence of the "naive" second level sublist.
Can someone help we with a clever and instructive solution, where values stored according to such multilevel list structure can be concatenated, added to each other or both? In example above, the final desired result could be adding respective components for all vectors, and storing the result in a new 2-d vector, that in this case would be foo = c(15,5). Your answer would help me a lot.
Yeah, sapply will definitely do the trick for you in this situation:
sapply(my.list, function(x) x[["naive"]][[1]][1])
And you can generalize from here or sapply/lapply over the indices of the naive vector if you need to.

Resources