Shuffling around elements in a list - r

My general question concerns shuffling elements around in a list efficiently.
Say I have a list:
region <- list(c(1,3,2,6),c(5,8,9),c(10,4,7))
and two constants:
value <- 2
swapin <- 5
I want to do two things: remove the element equal to value from whichever vector in the list contains it, and then append it to the vector whose first element equals swapin.
The result should look like:
region <- list(c(1,3,6),c(5,8,9,2),c(10,4,7))
For the first step, the only way I can think of doing it is doing something like this:
region <- lapply(1:length(region), function(x) region[[x]][region[[x]] != value])
but this seems inefficient. My actual data could involve a very large list, and this approach seems cumbersome. Is there an easy way to avoid looping over the indices?
For the second step, I can create an updated vector like this:
updated <- c(unlist(region[sapply(region, `[`, 1) == swapin]), value)
but I am stumped on how to replace the vector currently in the list, c(5,8,9), with the updated vector, c(5,8,9,2). Maybe I can just add the element some easier way?
Can anyone help, please?

Something like this will do the trick:
region <- list(c(1,3,2,6),c(5,8,9),c(10,4,7))
value <- 2
swapin <- 5
step1 = lapply(region, function(x) x[x != value])
step2 = lapply(step1, function(x){
  if(x[1]==swapin){
    return(c(x, value))
  } else {
    return(x)
  }
})
Instead of looping over region by feeding in its element indices, you can loop over region itself. This is how lapply is intended to be used: to apply a function to each element of a list. The second step replaces each element x with c(x, value), i.e. x with value appended, if the first element of x matches swapin, and leaves x unchanged otherwise.
Result:
> step2
[[1]]
[1] 1 3 6
[[2]]
[1] 5 8 9 2
[[3]]
[1] 10 4 7
You can also easily make it a convenience function for later use:
element_swap = function(list, value, swapin){
  step1 = lapply(list, function(x) x[x != value])
  step2 = lapply(step1, function(x){
    if(x[1]==swapin){
      return(c(x, value))
    } else {
      return(x)
    }
  })
  return(step2)
}
Result:
> element_swap(region, 1, 10)
[[1]]
[1] 3 2 6
[[2]]
[1] 5 8 9
[[3]]
[1] 10 4 7 1
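If the goal is to update only the matching vector rather than rebuild every element in the second step, one possible sketch is to locate the target by index and assign back to just that position. The names firsts and target are illustrative and not part of the answer above.

```r
region <- list(c(1, 3, 2, 6), c(5, 8, 9), c(10, 4, 7))
value  <- 2
swapin <- 5

# Step 1: drop `value` from every vector.
region <- lapply(region, function(x) x[x != value])

# Step 2: collect the first element of each vector (NA if a vector is empty)
# and find the position(s) whose first element equals `swapin`.
firsts <- vapply(region, function(x) x[1], numeric(1))
target <- which(firsts == swapin)   # which() ignores NA comparisons

# Append `value` only to the matching vector(s).
region[target] <- lapply(region[target], function(x) c(x, value))
```

This still uses lapply for the removal step, but the append touches only the elements that need it.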

Related

Conditionally add named elements to a list

I have a function to perform actions on a variable list of dataframes depending on user selections. The function mostly performs generic actions but there are a few actions that are dataframe specific.
My code runs fine if all dataframes are selected but I am unable to get it to work if not all dataframes are selected.
The following provides a minimal reproducible example:
# User switches.
df1Switch <- TRUE
df2Switch <- TRUE
df3Switch <- TRUE
# DF creation.
set.seed(1)
df <- data.frame(X=sample(1:10), Y=sample(11:20))
if (df1Switch) df1 <- df
if (df2Switch) df2 <- df
if (df3Switch) df3 <- df
# Function to do something.
fn_something <- function(file_list, file_names) {
df <- file_list
# Do lots of generic things.
df$Z <- df$X + df$Y
# Do a few specific things.
if (file_names == "Name1") df$X <- df$X + 1
else if (file_names == "Name2") df$X <- df$Z - 1
else if (file_names == "Name3") df$Y <- df$X + df$Y
return(df)
}
# Call function to do something.
file_list <- list(Name1=df1, Name2=df2, Name3=df3)
file_names <- names(file_list)
all_df <- do.call(rbind,mapply(fn_something, file_list, file_names,
SIMPLIFY=FALSE))
In this case the code runs fine as the user has selected to create all three dataframes. I use a named list so that the specific actions can be performed against the correct dataframes.
The output looks something like this (the actual numbers aren't important):
X Y Z
Name1.1 4 13 16
Name1.2 5 12 16
Name1.3 6 16 21
: : : :
Name2.1 15 13 16
: : : :
The problem arises if the user selects not to create some dataframes, e.g.:
# User switches.
df1Switch <- TRUE
df2Switch <- FALSE
df3Switch <- TRUE
Not surprisingly, in this case an object not found error results:
> # Call function to do something.
> file_list <- list(Name1=df1, Name2=df2, Name3=df3)
Error: object 'df2' not found
What I would like to do is conditionally specify the contents of file_list along the lines of this pseudo code:
file_list <- list(if (df1Switch) {Name1=df1}, if (df2Switch) {Name2=df2}, if (df3Switch) {Name3=df3})
I have come across list.foldLeft and the question "Conditionally merge list elements", but I don't know if this is suitable.
(I'll re-hash my comment:)
In general, I would encourage you to consider use of a list-of-dataframes instead of individual frames. My rationale for this:
assuming that each frame is structured (nearly) identically; and
assuming that what you do to one frame you will (or at least can) do to all frames; then
it is easier to list_of_frames <- lapply(list_of_frames, some_func) than it is to do something like:
for (nm in c("df1", "df2", "df3")) {
d <- get(nm)
d <- some_func(d)
assign(nm, d)
}
especially when dealing with non-global environments (i.e., doing this within a function).
To be clear, "easier" is subjective: though it does win code-golf, I find it much easier to read and understand that "I am running some_func on each element of list_of_frames and saving the result". (You can even save it to a new list-of-frames, thereby keeping the original frames untouched.)
You may also do things conditionally, as in
needs_work <- sapply(list_of_frames, some_checker_func) # returns logical
# or
needs_work <- c("df1", "df2") # names of elements of list_of_frames
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], some_func)
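As a rough, self-contained illustration of the conditional pattern above: the frames, the checker (largest X above 5) and add_double are all hypothetical stand-ins for list_of_frames, some_checker_func and some_func.

```r
# Toy list of frames; the contents are made up for the example.
list_of_frames <- list(
  df1 = data.frame(X = 1:3),
  df2 = data.frame(X = 4:6),
  df3 = data.frame(X = 7:9)
)

# Hypothetical stand-in for some_func: adds a doubled column.
add_double <- function(d) {
  d$X2 <- d$X * 2
  d
}

# Hypothetical stand-in for some_checker_func: one logical per frame.
needs_work <- vapply(list_of_frames, function(d) max(d$X) > 5, logical(1))

# Only the flagged frames are replaced; the others are left untouched.
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], add_double)
```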
Having said that ... the direct answer to your one liner:
c(if (df1Switch) list(Name1=df1), if (df2Switch) list(Name2=df2), if (df3Switch) list(Name3=df3))
This capitalizes on the fact that unstated else results in a NULL, and the NULL-compressing (dropping) characteristic of c(). You can see it in action with:
c(if (T) list(a=1), if (T) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $b
# [1] 2
# $d
# [1] 4
c(if (T) list(a=1), if (FALSE) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $d
# [1] 4
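Another way to get the same effect, sketched here under the assumption that the user switches can be kept in one named logical vector instead of three separate variables (switches and selected are illustrative names), is to build the list only from the selected names so that unselected frames are never referenced:

```r
set.seed(1)
df <- data.frame(X = sample(1:10), Y = sample(11:20))

# Assumed layout: one named logical per potential data frame.
switches <- c(Name1 = TRUE, Name2 = FALSE, Name3 = TRUE)

# Names the user actually selected.
selected <- names(switches)[switches]

# Create a frame only for each selected name, then name the list.
file_list <- setNames(lapply(selected, function(nm) df), selected)
file_names <- names(file_list)
```

file_list and file_names can then be passed to mapply exactly as in the question.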

Reference vector from data frame using custom function

I'm trying to call a vector "a" from a data frame "df" using a function. I know I could do this just fine with the following:
> df$a
[1] 1 2 3
But I'd like to use a function where both the data frame and vector names are input separately as arguments. This is the best that I've come up with:
show_vector <- function(data.set, column) {
data.set$column
}
But here's how it goes when I try it out:
> show_vector(df, a)
NULL
How could I change this function in order to successfully reference vector df$a where the names of both are input to a function as arguments?
It's actually possible to do this without passing the column name as a string (in other words, you can pass in the unquoted column name):
show_vector <- function(data.set, column) {
eval(substitute(column), envir = data.set)
}
Usage example:
df <- data.frame(a = 1:3, b = 4:6)
show_vector(df, b)
# [1] 4 5 6
I've wondered about this kind of thing a lot in the past and haven't found an easy fix. The best I've come up with is this:
df <- data.frame(c(1, 2, 3), c(4, 5, 6))
colnames(df) <- c("A", "B")
test <- function(dataframe, columnName) {
return(dataframe[, match(columnName, colnames(dataframe))])
}
test(df, "A")
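Your code would work with the column name in quotes, i.e. show_vector(df, "a"), as long as the function indexes with [ or [[ instead of $. The $ operator does not evaluate its argument, so data.set$column always looks for a column literally named "column".
Several other ways to do this: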
Using base functionality
func <- function(df, cname){
return(df[, grep(cname, colnames(df))])
}
Or even
func <- function(df, cname){
return(df[, cname])
}
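For reference, a small sketch of why data.set$column returns NULL and how [[ ]] avoids the problem. The function names show_vector_dollar and show_vector_brackets are made up for this illustration.

```r
df <- data.frame(a = 1:3, b = 4:6)

# `$` does not evaluate its right-hand side: this looks for a column
# literally called "column", which does not exist, so the result is NULL.
show_vector_dollar <- function(data.set, column) {
  data.set$column
}

# `[[` evaluates its argument, so a column name passed as a string works.
show_vector_brackets <- function(data.set, column) {
  data.set[[column]]
}

res_dollar   <- show_vector_dollar(df, "a")    # NULL
res_brackets <- show_vector_brackets(df, "a")  # the integer vector 1 2 3
```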
You can use substitute to capture the column name exactly as it was typed, then use as.character to convert it to a character string.
show_vector <- function(data.set, column) {
data.set[,as.character(substitute(column))]
}
Now let's take a look:
(dat=data.frame(a=1:3,b=4:6,c=10:12))
a b c
1 1 4 10
2 2 5 11
3 3 6 12
show_vector(dat,a)
[1] 1 2 3
show_vector(dat,"a")
[1] 1 2 3
It works.
We can also write a simpler version that only accepts a character string:
show_vector1 <- function(data.set, column) {
data.set[,column]
}
show_vector1(dat,"a")
[1] 1 2 3
However, this will not work if the column name is passed unquoted:
show_vector1(dat,a)
Error in `[.data.frame`(data.set, , column) : undefined columns selected

Regex in R lists to call specific function

It is of course possible to store functions in a list to call it.
It is also possible to name that list entry to have a better access to it later.
Now I need the list item name to be a regular expression like this:
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)
a <- funcList$"+12"(a,12) # this will fire function "lead"
a <- funcList$"-4"(a,-4) # this will fire function "lag"
a <- funcList$"^\\+[0-9]{1,3}$"(a,12) # this works of course but is not what I want...
Of course this is not working correctly and I am getting the error "Error: attempt to apply non-function" because it is not used as regex but as a normal string value.
Is it possible to do what I need?
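You could use the names of the list as patterns for grepl and pick the first entry whose pattern matches (lead and lag here are the dplyr functions):
library(dplyr)
funcList <- list("^\\+[0-9]{1,3}$"=lead, "^\\-[0-9]{1,3}$"=lag)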
f1 <- funcList[sapply(names(funcList), function(x) grepl(x,"+12"))][[1]]
f2 <- funcList[sapply(names(funcList), function(x) grepl(x,"-4"))][[1]]
> f1(seq(1,10))
[1] 2 3 4 5 6 7 8 9 10 NA
> f2(seq(1,10))
[1] NA 1 2 3 4 5 6 7 8 9
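The lookup above can be wrapped in a small helper. This is only a sketch: pick_function is a hypothetical name, and shift_lead / shift_lag are simplified base R stand-ins for dplyr's lead() and lag() (assuming n is at least 1 and smaller than the vector length) so that the example runs without extra packages.

```r
# Simplified stand-ins for dplyr::lead() / dplyr::lag(); assume 1 <= n < length(x).
shift_lead <- function(x, n = 1) c(tail(x, -n), rep(NA, n))
shift_lag  <- function(x, n = 1) c(rep(NA, n), head(x, -n))

funcList <- list("^\\+[0-9]{1,3}$" = shift_lead, "^-[0-9]{1,3}$" = shift_lag)

# Return the first function whose name, used as a regex, matches `key`.
pick_function <- function(key, funcs) {
  hits <- vapply(names(funcs), function(p) grepl(p, key), logical(1))
  if (!any(hits)) stop("no pattern matches key: ", key)
  funcs[[which(hits)[1]]]
}

led    <- pick_function("+12", funcList)(1:5, 2)  # 3 4 5 NA NA
lagged <- pick_function("-4", funcList)(1:5, 1)   # NA 1 2 3 4
```

Failing loudly when nothing matches avoids the "attempt to apply non-function" error from the question.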
I think you can map strings like "+4" and "-12" to lead/lag more straightforwardly like:
library(dplyr)  # provides %>%, mutate(), lead() and lag()
set.seed(123)
df = data.frame(
  x = sample(1:20, 10)
)
shifted = function(x, shift) {
  direction = substr(shift, 1, 1)
  amount = as.integer(substr(shift, 2, nchar(shift)))
  if (direction == "+") {
    return(lead(x, amount))
  } else {
    return(lag(x, amount))
  }
}
df %>%
  mutate(
    plus4 = shifted(x, "+4"),
    minus3 = shifted(x, "-3")
  )
You could use regex within the shifted function if you need to do more validation of the "+4" strings, but I prefer not to go for complicated regexes unless they're definitely needed.
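If that extra validation is wanted, a minimal sketch might look like the following. parse_shift is a hypothetical helper, and the pattern (a sign followed by one to three digits) is borrowed from the question's list names.

```r
# Validate a shift string such as "+4" or "-12" and split it into parts.
parse_shift <- function(shift) {
  if (!grepl("^[+-][0-9]{1,3}$", shift)) {
    stop("invalid shift string: ", shift)
  }
  list(
    direction = substr(shift, 1, 1),
    amount    = as.integer(substr(shift, 2, nchar(shift)))
  )
}

ok <- parse_shift("+4")   # direction "+", amount 4
```

shifted() could call this first and then branch on the direction as before.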

Don't understand how apply gets its parameters in r

I am struggling to make my apply() work: I have two dataframes:
from <- c(1,2,3)
to <- c(2,3,4)
df1 <- data.frame(from, to)
long <-c(9,9.2,9.4,9.6)
lat <- c(45,45.2,45.4,45.6)
id <- c(1,2,3,4)
df2 <- data.frame(long, lat, id)
Now I want something like this:
myFunction <- function(arg){
>>> How do I access arg$from and arg$to? <<<<
}
apply(df1,1,myFunction)
In myFunction I need to make some calculations and return a value for each from-to pair. I don't understand how to access parts of the arg, since arg[0] gives me numeric(0) and arg$from just crashes.
The problem is that apply(...) requires a matrix or array as the first argument. If you pass a data frame, it is coerced to a matrix, and each row then reaches your function as a plain (named) vector. R indexing is 1-based, so the first element is x[1], not x[0], and the upper-left element of a matrix is [1,1], not [0,0]. Also, the $ notation does not work on matrix columns or on atomic vectors.
So,
f <- function(x) {
  from <- x[1]   # or x["from"], since the row keeps its column names
  to <- x[2]     # or x["to"]
  # do stuff with from and to...
}
apply(df1, 1, f)
would work.
One other thing to watch out for is that if your dataframe has (other) columns that have character strings, the conversion will make everything character (including the numbers!). This is because, by definition, all elements of a matrix must have the same data type. Your example does not have that problem, though.
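A short sketch of both points, using made-up names (span, df_mixed, row_types, span_by_index): rows can be accessed by name inside apply, a character column turns every value in the row into a string, and iterating over row indices is one way to keep the original column types.

```r
df1 <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))

# Each row arrives as a named numeric vector, so names work as well as positions.
span <- apply(df1, 1, function(row) row[["to"]] - row[["from"]])

# With a character column present, the matrix conversion makes every value a string.
df_mixed <- data.frame(from = c(1, 2), label = c("a", "b"))
row_types <- apply(df_mixed, 1, function(row) class(row))

# Iterating over row indices keeps each column's own type.
span_by_index <- vapply(seq_len(nrow(df1)),
                        function(i) df1$to[i] - df1$from[i],
                        numeric(1))
```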
Try mapply(). It's a multivariate version of sapply(). For example:
> myFunction <- function(arg1, arg2){
+ return(sum(arg1, arg2))
+ }
>
> mapply(myFunction, df1$from, df1$to)
[1] 3 5 7
You can also use it to make a new variable in your data frame.
> df1$newvar <- mapply(myFunction, df1$from, df1$to)
> df1
from to newvar
1 1 2 3
2 2 3 5
3 3 4 7
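The same mapply pattern can look up values from df2 for each from-to pair. The calculation below (straight-line distance in degrees) is only a hypothetical placeholder for whatever the real computation is, and pair_distance is an illustrative name.

```r
df1 <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
df2 <- data.frame(long = c(9, 9.2, 9.4, 9.6),
                  lat  = c(45, 45.2, 45.4, 45.6),
                  id   = c(1, 2, 3, 4))

# Look up both endpoints by id in df2, then compute a placeholder distance.
pair_distance <- function(from_id, to_id) {
  a <- df2[match(from_id, df2$id), ]
  b <- df2[match(to_id, df2$id), ]
  sqrt((b$long - a$long)^2 + (b$lat - a$lat)^2)
}

# One value per row of df1, stored as a new column.
df1$dist <- mapply(pair_distance, df1$from, df1$to)
```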

How Can I vectorize this function to return an index vector?

I'm new to R and am trying to get a handle on the apply family of functions. Specifically, I am trying to write a higher-order function that will accept 2 character vectors, "host", and "guest" (which do not need to be the same length) and return me an index vector the same length as "host", with the resulting elements corresponding to their indices in guest (NA if not there).
host <- c("A","B","C","D")
guest <- c("D","C","A","F")
matchIndices <- function(x,y)
{
return(match(x,y))
}
This code returns 3 as expected:
matchIndices(host[1],guest)
This is the loop I'd like to be able to replace with a succinct apply function (sapply?)
for (i in 1:length(host))
{ idx <- matchIndices(host[i],guest);
cat(paste(idx,host[i],"\n",sep=";"))
}
This code "works" in that it produces the output below, but I really want the result to be a vector, and I have a hunch that one of the apply functions will do the trick. I'm just stuck on how to write it. Any help would be most appreciated. Thanks.
3;A;
NA;B;
2;C;
1;D;
host <- c("A","B","C","D")
guest <- c("D","C","A","F")
matchIndices <- function(x,y) {
return(match(x,y))
}
One (inefficient) way is to sapply over the host vector, passing in guest as an argument (note you could just simplify this to sapply(host, match, guest) but this illustrates a general way of approaching this sort of thing):
> sapply(host, matchIndices, guest)
A B C D
3 NA 2 1
However, this can be done directly using match as it accepts a vector first argument:
> match(host, guest)
[1] 3 NA 2 1
If you want a named vector as output,
> matched <- match(host, guest)
> names(matched) <- host
> matched
A B C D
3 NA 2 1
which could be wrapped into a function
matchIndices2 <- function(x, y) {
matched <- match(x, y)
names(matched) <- x
return(matched)
}
returning
> matchIndices2(host, guest)
A B C D
3 NA 2 1
If you really want the names and the matches stuck together into a vector of strings, then:
> paste(match(host, guest), host, sep = ";")
[1] "3;A" "NA;B" "2;C" "1;D"
If you want the output vector in the host;guestNum format, you can use do.call, paste and match as follows:
> do.call(paste, list(host, sapply(host, match, guest), sep = ';'))
[1] "A;3" "B;NA" "C;2" "D;1"
sapply(host , function(x) which(guest==x))
$A
[1] 3
$B
integer(0)
$C
[1] 2
$D
[1] 1
unlist(sapply(host , function(x) which(guest==x)))
A C D
3 2 1
paste(host, sapply(host , function(x) which(guest==x)), sep=":", collapse=" ")
[1] "A:3 B:integer(0) C:2 D:1"
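Note that the which()-based version returns integer(0) for hosts that are missing from guest, so unlist() drops them and the result no longer lines up with host. A possible sketch that keeps one entry per host, with NA for no match (idx is an illustrative name):

```r
host  <- c("A", "B", "C", "D")
guest <- c("D", "C", "A", "F")

# Take the first matching position, or NA when there is none,
# so the result always has the same length as host.
idx <- vapply(host, function(x) {
  hit <- which(guest == x)
  if (length(hit)) hit[1] else NA_integer_
}, integer(1))
```

This gives the same positions as match(host, guest), with the host values as names.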

Resources