enumerate in R over dataframe rows


I'm trying to modify a function so that if I put in a dataframe, I get the rownumber and row output.
These functions, taken from "Zip or enumerate in R?", are a good starting point for me:
zip <- function(...) {
mapply(list, ..., SIMPLIFY = FALSE)
}
enumerate <- function(...) {
zip(k=seq_along(..1), ...)
}
I modified enumerate to work as I want when the input is a dataframe:
enumerate2 <- function(...){
mod <- ..1
if(is.data.frame(mod)){
mod = split(mod, seq(nrow(mod)))
}
zip(k = seq_along(mod), ...)
}
So for example:
g = data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
enumerate2(v = g)
This will enumerate the rows of a dataframe, so I can do:
for(i in enumerate2(v = g)){
"rowNumber = %s, rowValues = %s" %>% sprintf(i$k, list(i$v)) %>% print
}
The problem is I get a warning:
Warning message:
In mapply(list, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
Also, I'd rather the dataframe still be a dataframe so that I can do things like i$v$b to return the value of row i$k column b from the dataframe.
How can I get rid of the warning, and how can I keep the dataframe structure after split?
edit:
example 1 - data frame input
output:
enumerate2(v = data.frame(A = c(1, 2), B = c(3, 4)))
[[1]]
[[1]]$k
[1] 1
[[1]]$v
A B
1 1 3
[[2]]
[[2]]$k
[1] 2
[[2]]$v
A B
1 2 4
example 2 - list input
output:
enumerate2(v = LETTERS[1:2])
[[1]]
[[1]]$k
[1] 1
[[1]]$v
[1] "A"
[[2]]
[[2]]$k
[1] 2
[[2]]$v
[1] "B"
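A possible cause of the warning, sketched here as an assumption rather than a confirmed answer: enumerate2 stores the split rows in mod, but zip still receives the original `...`, so mapply recycles k (one entry per row) against the data frame's columns. The sketch below converts each data frame argument to a list of one-row data frames before zipping; enumerate_df is a hypothetical name, not part of the original code.

```r
zip <- function(...) {
  mapply(list, ..., SIMPLIFY = FALSE)
}

# enumerate_df is a hypothetical name for this sketch
enumerate_df <- function(...) {
  args <- list(...)
  # Turn every data frame argument into a list of one-row data frames
  args <- lapply(args, function(a) {
    if (is.data.frame(a)) unname(split(a, seq_len(nrow(a)))) else a
  })
  # Zip the index with the converted arguments, not with the original `...`
  do.call(zip, c(list(k = seq_along(args[[1]])), args))
}

g <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
out <- enumerate_df(v = g)
out[[2]]$k    # row number
out[[2]]$v$b  # value of column b in that row
```

Because each element of v is still a one-row data frame, i$v$b works inside a loop over the result, and vector input such as LETTERS[1:2] passes through unchanged.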

Related

Updating a list object in the global environment from within a function in R

I am trying to update a global list from inside a function.
Here is the code that does not work (can be sourced as a whole file):
require(rlang)
(my_list <- list(a = 1, b = "two", c = "set outside"))
print( paste("my_list$c is" , my_list$c) )
my_function <- function(x = 1, y = 2, parent_object_name = "my_list") {
z <- x + y # do some stuff (irrelevant here)
some_names <- "updated inside"
upper_env_object_name <- paste0(parent_object_name, "$c")
# browser()
# env_poke(env = env_tail(), upper_env_object_name, some_names) # does not work
# env_poke(env = env_parents()[[1]], upper_env_object_name, some_names) # does not work
env_poke(env = caller_env(), upper_env_object_name, some_names ) # creates `my_list$c` character vector
# force(env_poke(env = caller_env(), upper_env_object_name, some_names )) # creates `my_list$c` character vector
# browser()
# env_poke(env = caller_env(), paste0("as.list(",upper_env_object_name,")"), some_names) # creates as.list(my_list$c)` character vector
return(z)
}
my_function(x = 1, y = 2, parent_object_name = "my_list")
print(class(`my_list$c`))
print( `my_list$c`)
print( paste("my_list$c is" , my_list$c) )
I found this but it does not help:
Updating a nested list object in the global environment from within a function in R
Tried also with assign, and specifying the environment.
Background: I have some S3 subclasses and want to keep track of them in the parent class object, which is also a list. The subclass objects are created "on demand" and I want an overview of what was created. My workaround for now is to create a new vector in the global environment and update it with:
if (exists("global_names_list")) global_names_list <<- unique(rbind(global_names_list, some_names)) else global_names_list <<- some_names

Modify List in Global Environment
Subscript the environment like this:
f <- function(listname = "my_list", envir = .GlobalEnv) {
envir[[listname]]$c <- "some value"
}
# test
my_list <- list(a = 1, b = "two", c = "set outside")
f()
str(my_list)
## List of 3
## $ a: num 1
## $ b: chr "two"
## $ c: chr "some value"
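As a small illustration of why the attempts in the question create a new variable instead of updating the list: assign (like env_poke) treats the whole string as a variable name, so "my_list$c" becomes a separate binding. The sketch below uses a scratch environment e (a name chosen here for illustration) instead of the global environment.

```r
# Scratch environment standing in for the global environment
e <- new.env()
e$my_list <- list(a = 1, b = "two", c = "set outside")

# The string is used verbatim as a variable name; the list is untouched
assign("my_list$c", "updated inside", envir = e)
sort(ls(e))   # contains both "my_list" and "my_list$c"
e$my_list$c   # still "set outside"

# Subscripting the environment by the list's name updates the element itself
e[["my_list"]]$c <- "updated inside"
e$my_list$c   # now "updated inside"
```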
Functional Approach
Note that working via side effects, as the code above does, is not the usual style in R. An object-oriented style (using Reference Classes or another object-oriented framework) or a functional style is more common. Here is an example of the functional style:
g <- function(x) modifyList(x, list(c = "somevalue"))
# test
my_list <- list(a = 1, b = "two", c = "set outside")
my_list <- g(my_list)
str(my_list)
## List of 3
## $ a: num 1
## $ b: chr "two"
## $ c: chr "somevalue"
Example of object oriented processing
Regarding the background paragraph at the end of the question, here is an example in which a top object has properties a and last (both numeric) and a method add. Any number of sub-objects can have a property b and inherit the add method, which adds a in top to b in the current object. last holds the last sum calculated by any sub-object.
library(proto)
top <- proto(a = 1,
last = NULL,
add = function(.) { top$last <- .$a + .$b; .$last }
)
sub1 <- top$proto(b = 2)
sub1$add()
# [1] 3
top$last
# [1] 3
sub2 <- top$proto(b = 3)
sub2$add()
# [1] 4
top$last
# [1] 4
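Reference Classes, mentioned above as another object-oriented option, could be used in a similar way to keep track of which subclass objects were created. This is only a sketch under assumptions: Registry, created, register, and make_sub are hypothetical names, and the S3 objects are reduced to bare lists.

```r
library(methods)

# Hypothetical registry that remembers the names of created subclasses
Registry <- setRefClass(
  "Registry",
  fields = list(created = "character"),
  methods = list(
    register = function(name) {
      # Reference semantics: the field is modified in place
      created <<- unique(c(created, name))
      invisible(.self)
    }
  )
)

registry <- Registry$new()

# Hypothetical constructor: builds an S3 object and records its subclass name
make_sub <- function(name, registry) {
  registry$register(name)
  structure(list(name = name), class = c(name, "parent"))
}

s1 <- make_sub("sub1", registry)
s2 <- make_sub("sub2", registry)
s3 <- make_sub("sub1", registry)
registry$created
# [1] "sub1" "sub2"
```

The registry object is passed explicitly, so no global variable or `<<-` into the global environment is needed.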

This is probably not the best way, but it is a way:
my_list <- list(a = 1, b = "two", c = "set outside")
my_function <- function(x = 1, y = 2, parent_object_name = "my_list"){
z <- x+y
some_names <- "updated inside"
lst <- get(parent_object_name)
lst$c <- some_names
assign(parent_object_name, lst, envir = .GlobalEnv)
return(z)
}
my_function()
#> [1] 3
#check
my_list
#> $a
#> [1] 1
#>
#> $b
#> [1] "two"
#>
#> $c
#> [1] "updated inside"

Using lapply on a diverse list in R

I created a list containing sub-lists, each sub-list containing information for one task I want R to do.
df <- as.data.frame(matrix(1:6 , ncol =2 , nrow = 3))
colnames(df) <- c("Col1", "Col2")
myList <- list()
myList[["Dataset1"]] <- list()
myList[["Dataset1"]]["Function"] <- "mean"
myList[["Dataset1"]][["DataFrame"]] <- df
myList[["Dataset2"]] <- list()
myList[["Dataset2"]]["Function"] <- "lm"
myList[["Dataset2"]][["DataFrame"]] <- df*2
Now I want R to apply the Function to the DataFrame and store the results in a new list. How do I do that best?
So far I have had two ideas. First, I could use lapply to run through the list, each time accessing several items from the sub-list and supplying them to a new function:
myResult <- lapply(myList, FUN = myList[["Dataset1"]]["Function"](x) , x = as.matrix(myList[["Dataset1"]][["DataFrame"]]))
But I don't know how to tell R to cycle correctly through the sub-lists.
Second, I was hoping to be able to send the whole sublist to a function, but I could not get this to run either.
myFunction <- function(x){
TempData <- x[["DataFrame"]]
TempFunction <- x["Function"]
TempResult <- get(TempFunction)(TempData)
return(TempResult)
}
myResult <- lapply(myList, myFunction(x))
If someone could give me an idea of how to solve this, I'd be very happy.
Many thanks in advance!

This sounds more complicated than it should be, and it's not clear how you would apply mean() or lm(). Below is an example where you store the function as an object in the list:
myList[["Dataset1"]] <- list()
myList[["Dataset1"]][["Function"]] <- function(x)mean(as.matrix(x))
myList[["Dataset1"]][["DataFrame"]] <- df
myList[["Dataset2"]] <- list()
myList[["Dataset2"]][["Function"]] <- function(x){lm(Col2~Col1,data=x)}
myList[["Dataset2"]][["DataFrame"]] <- df*2
So you iterate through elements in the list and you get the result of function(dataframe) for each element:
lapply(myList,function(i)i$Function(i$DataFrame))
$Dataset1
[1] 3.5
$Dataset2
Call:
lm(formula = Col2 ~ Col1, data = x)
Coefficients:
(Intercept) Col1
6 1

A first thing you can do is use the match.fun function, which recovers a function from a string. Applied in a loop, it gives the following solution:
list<-list()
for (i in 1:length(myList)){
TempFunc <- match.fun(myList[[i]][["Function"]])
Results <- TempFunc(myList[[i]][["DataFrame"]])
list[i]<-Results
}
> list
[[1]]
[1] NA
[[2]]
(Intercept) Col2
-6 1
Note that the results are not meaningful, because neither mean nor lm is an appropriate function to apply directly to this data frame.
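For comparison, here is a minimal sketch of the same match.fun idea with lapply instead of a loop, using functions that do accept a data frame. The choice of colMeans and nrow and the name tasks are assumptions made for illustration; they are not from the question.

```r
df <- as.data.frame(matrix(1:6, ncol = 2, nrow = 3))
colnames(df) <- c("Col1", "Col2")

# Each task stores a function name (as a string) and the data to apply it to
tasks <- list(
  Dataset1 = list(Function = "colMeans", DataFrame = df),
  Dataset2 = list(Function = "nrow", DataFrame = df * 2)
)

# Look up each function by name and call it on the task's data frame
results <- lapply(tasks, function(task) {
  match.fun(task$Function)(task$DataFrame)
})
results$Dataset1  # column means: Col1 = 2, Col2 = 5
results$Dataset2  # 3 rows
```

lapply keeps the names of the task list, so the results can be looked up by data set name.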

The function call lets you build a call to a function from its name given as a character string, and you can evaluate this call with eval. Note, though, that neither of your function calls makes sense when applied to a data frame:
lapply(myList, function(sublist) call(sublist$Function, sublist[["DataFrame"]]))
#> $Dataset1
#> mean(list(Col1 = 1:3, Col2 = 4:6))
#> $Dataset2
#> lm(list(Col1 = c(2, 4, 6), Col2 = c(8, 10, 12)))
So the results are kind of useless for the examples used:
lapply(myList, function(sublist) eval(call(sublist$Function, sublist[["DataFrame"]])))
#> $Dataset1
#> [1] NA
#>
#> $Dataset2
#>
#> Call:
#> lm(formula = structure(list(Col1 = c(2, 4, 6), Col2 = c(8, 10,
#> 12)), class = "data.frame", row.names = c(NA, -3L)))
#>
#> Coefficients:
#> (Intercept) Col2
#> -6 1
#>
#>
#> Warning message:
#> In mean.default(list(Col1 = 1:3, Col2 = 4:6)) :
#> argument is not numeric or logical: returning NA
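A related base R option is do.call, which accepts the function name as a string together with a list of arguments. The sketch below assumes each task also stores its arguments in a hypothetical Args element, so that lm receives a formula as well as the data.

```r
df <- data.frame(Col1 = 1:3, Col2 = 4:6)

# Hypothetical task layout: function name plus a named list of arguments
task <- list(
  Function = "lm",
  Args = list(formula = Col2 ~ Col1, data = df)
)

# do.call looks up the function by name and passes the arguments
fit <- do.call(task$Function, task$Args)
coef(fit)  # intercept 3 and slope 1, since Col2 = Col1 + 3
```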

Creating mock data with natural decreasing numbers

I want to create random mock data that looks like this.
__ID__|__Amount__
1 20
1 14
1 9
1 3
2 11
2 5
2 2
Each ID starts from a random number, but the second number with the same ID should be less than the first, and the third less than the second. The maximum starting number should be 20.

You can just create the data first and then sort it as needed, using the tidyverse:
library(dplyr)
set.seed(0)
df <- data.frame(id = rep(1:3,10), amt = sample(1:20, 30, replace = TRUE))
df %>%
group_by(id) %>%
arrange(id, desc(amt))

This is a tricky one. If you want the Amount column to be truly random values, you can use a recursive function that calls sample repeatedly:
## Recursively sampling from a uniform distribution
recursive.sample <- function(start, end, length, results = NA, counter =0) {
## To enter the recursion, counter must be smaller than the requested length
## and the last result must be larger than the starting point (except the first time)
if(counter < length && ifelse(counter != 0, results[counter] > start, TRUE)){
## Increment the counter
counter <- counter + 1
## Sample between start and the last result or the start and the end of the vector
results[counter] <- ifelse(counter != 1, sample(start:results[counter-1], 1), sample(start:end, 1))
## Recursive call
return(recursive.sample(start = start, end = end, length = length, results = results, counter = counter))
} else {
## Exit the recursion
return(results)
}
}
## Example
set.seed(0)
recursive.sample(start = 1, end = 20, length = 3, results = NA, counter = 0)
#[1] 18 5 2
Alternatively (and way easier) you can use sort(sample()):
set.seed(0)
sort(sample(1:20, 3), decreasing = TRUE)
#[1] 18 7 6
Note that the results differ due to the lower probability of sampling higher values in the recursive function.
You can then easily create your table with your chosen function as follows:
set.seed(123)
## The ID column
ID <- c(rep(1, 4), rep(2,3))
## The Amount column
Amount <- c(recursive.sample(1, 20, 4, NA, 0), recursive.sample(1, 11, 3, NA, 0))
## The table
cbind(ID, Amount)
# ID Amount
#[1,] 1 18
#[2,] 1 5
#[3,] 1 2
#[4,] 1 2
#[5,] 2 10
#[6,] 2 3
#[7,] 2 3
Or, again, with the simple sort(sample()) function for a higher probability of picking larger numbers.
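A minimal sketch of that sort(sample()) variant, assuming base R's ave to apply it per ID (mock is a name chosen here for illustration). Sampling without replacement keeps the values within each ID strictly decreasing, provided no ID has more than 20 rows.

```r
set.seed(123)

## The ID column
ID <- c(rep(1, 4), rep(2, 3))

## The Amount column: for each ID, draw distinct values and sort them downwards
Amount <- ave(ID, ID, FUN = function(ids) {
  sort(sample(1:20, length(ids)), decreasing = TRUE)
})

## The table
mock <- data.frame(ID = ID, Amount = Amount)
mock
```

ave puts the values back in the original row positions, so the ID column does not need to be sorted.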

Two methods, one using dplyr and one using only base R functions. These are slightly different to the two previous solutions.
I used a sorted ID column, but this is not necessary.
Method 1
rm(list = ls())
library(dplyr)
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df %>% group_by(ID) %>%
mutate(Amount = sort(sample(1 : 20, n(), replace = TRUE), decreasing = TRUE))
Method 2
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
uniq_ID <- unique(df$ID)
index_lst <- lapply(uniq_ID, function(x) which(df$ID == x))
res <- lapply(index_lst, function(x) sort(sample(1 : 20, length(x)),
decreasing = TRUE))
df$Amount[unlist(index_lst)] <- unlist(res)
Method 2.5
This is more convoluted than the 2nd method.
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
tab <- as.data.frame(table(df$ID))
lapply(1 : nrow(tab), function(x) df$Amount[which(df$ID == tab$Var1[x])] <<-
sort(sample(1 : 20, tab$Freq[x]), decreasing = TRUE))

Find the indices of an element in a nested list?

I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
Is there a (loop-free) way to identify the positions of the elements? For example, if I want to replace the value of "C" with 5, wherever the element "C" is found, can I do something like:
Aindex <- find_index("A", mylist)
mylist[Aindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
This is an adaptation of my previous question, "update a node (of unknown depth) using xpath in R?", using R lists instead of XML.

One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|\\.)C$",names(tmp))] <- 5
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.

Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace){
if(is.list(x)){
n <- names(x) == find
x[n] <- replace
lapply(x, find_and_replace, find=find, replace=replace)
}else{
x
}
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55
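If the positions themselves are wanted, as with the find_index in the question, a similar recursion can collect index paths instead of replacing values. This is only a sketch: the name find_index comes from the question, but the implementation and its path argument are assumptions, and it uses an explicit loop for readability.

```r
# Returns a list of integer index paths to every element called `name`
find_index <- function(x, name, path = integer(0)) {
  out <- list()
  for (i in seq_along(x)) {
    here <- c(path, i)
    # Record the path when the element's name matches
    if (identical(names(x)[i], name)) out <- c(out, list(here))
    # Descend into nested lists
    if (is.list(x[[i]])) out <- c(out, find_index(x[[i]], name, here))
  }
  out
}

mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
idx <- find_index(mylist, "C")
idx[[1]]  # 3 1: third top-level element, first element inside it

# A path can be used directly for recursive indexing
mylist[[idx[[1]]]] <- 5
mylist$c$C
```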

This can now also be done using rrapply in the rrapply-package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3

access plyr id variables within functions

I regularly have the problem that I need to access the actual id variable when using d*ply or l*ply. A simple (yet nonsensical) example would be:
df1 <- data.frame( p = c("a", "a", "b", "b"), q = 1:4 )
df2 <- data.frame( m = c("a", "b" ), n = 1:2 )
d_ply( df1, "p", function(x){
actualId <- unique( x$p )
print( mean(x$q)^df2[ df2$m == actualId, "n" ] )
})
So in the case of d*ply functions I can help myself with unique( x$p ). But when it comes to l*ply, I have no idea how to access the name of the corresponding list element.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print( <missing code> )
})
# desired output
[1] "a"
[1] "b"
[1] "c"
Any suggestions? Anything I am ignoring?

One way I've gotten around this is to loop over the index (names) and do the subsetting within the function.
l <- list(a = 1, b = 2, c = 3)
l_ply(names(l), function(x){
print(x)
myl <- l[[x]]
print(myl)
})
myl will then be the same as
l_ply(l, function(myl) {
print(myl)
})

Here's one idea.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(eval(substitute(names(.data)[i], parent.frame())))
})
# [1] "a"
# [1] "b"
# [1] "c"
(Have a look at the final code block of l_ply to see where I got the names .data and i.)

I'm not sure there's a way to do that, because the only argument to your anonymous function is the list element's value, without its name:
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(class(x))
})
[1] "numeric"
[1] "numeric"
[1] "numeric"
But if you get back the results of your command as a list or a data frame, the names are preserved for you to use later:
llply( list(a = 1, b = 2, c = 3), function(x){
x
})
$a
[1] 1
$b
[1] 2
$c
[1] 3
Aside from Josh's solution, you can also pass both the names and the values of your list elements to a function with mapply or m*ply:
d <- list(a = 1, b = 2, c = 3)
myfunc <- function(value, name) {
print(as.character(name))
print(value)
}
mapply(myfunc, d, names(d))
m_ply(data.frame(value=unlist(d), name=names(d)), myfunc)
