Pasting as object names - r

I am trying to use paste0 with merge, so that I can merge a bunch of stuff in a loop. However, I'm having trouble calling specific columns from data.frames by name.
To illustrate, I'll use head.
Example:
df <- data.frame(x=1:10,y=1:10)
head(df)
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(get("df"))
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(df$x)
[1] 1 2 3 4 5 6
head(get("df$x"))
Error in get("df$x") : object 'df$x' not found
Is there a way to get a specific column?

The function get looks up an object by name in an environment. If you do not specify the environment, it starts from the calling environment, which at the top level is your global workspace. The string "df$x" is not the name of any object, so the lookup fails.
You need to coerce df into an environment using as.environment, and then call get using this environment, e.g.:
get("x", as.environment(get("df")))
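If all you need is a column from a data frame whose name was built with paste0, it may be simpler to fetch the data frame with get and then index it with [[. A minimal sketch, where df_name and col_name are made-up names used only for illustration:

```r
df <- data.frame(x = 1:10, y = 1:10)

# Build the object name and the column name as strings
df_name  <- paste0("d", "f")   # "df"
col_name <- "x"

# Fetch the data frame by name, then pull the column with [[ ]]
col_a <- get(df_name)[[col_name]]

# Equivalent lookup through an environment built from the data frame
col_b <- get(col_name, envir = as.environment(get(df_name)))

head(col_a)
```

Both forms should return the same vector; the [[ version avoids creating a temporary environment.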


How to assign values in one column to other columns in wide data using R

There is a wide data set, a simple example is
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(7,8,8,9,10,10),
"cx"=c(11,12,12,13,14,14))
I'm looking for a way to assign the values in "ax" to columns "bx" and "cx". Imagine we have thousands of columns that we intend to replace with "ax", so I want this to be done in an automated way using R. The expected output looks like
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(1,2,2,3,4,4),
"cx"=c(1,2,2,3,4,4))
I've thought of, and tried, using mutate_at and ends_with, but this has not worked for me. For example, I tried
df %>%
mutate_at(vars(ends_with("x")), labels = "ax")
and this prints an error. Not sure what's wrong or what's to be added to get this working, so I would like to request your help on this. Thank you very much!
A simple way using base R would be :
change_cols <- grep('x$', names(df))
df[change_cols] <- df$ax
df
# id ax bx cx
#1 1 1 1 1
#2 2 2 2 2
#3 3 2 2 2
#4 4 3 3 3
#5 5 4 4 4
#6 6 4 4 4
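Note that the pattern 'x$' also matches ax itself, which is harmless here because ax is simply overwritten with its own values. If you would rather leave the source column out of the selection, a small variation might look like this (target_cols is only an illustrative name):

```r
df <- data.frame(id = 1:6,
                 ax = c(1, 2, 2, 3, 4, 4),
                 bx = c(7, 8, 8, 9, 10, 10),
                 cx = c(11, 12, 12, 13, 14, 14))

# Columns whose names end in "x", excluding the source column "ax"
target_cols <- setdiff(grep("x$", names(df), value = TRUE), "ax")

# The vector df$ax is recycled across every selected column
df[target_cols] <- df$ax
df
```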
I would suggest this tidyverse approach using across() to select the range of variables you want:
library(tidyverse)
#Data
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(7,8,8,9,10,10),
"cx"=c(11,12,12,13,14,14))
#Mutate
df %>% mutate(across(c(bx:cx), ~ ax))
Output:
id ax bx cx
1 1 1 1 1
2 2 2 2 2
3 3 2 2 2
4 4 3 3 3
5 5 4 4 4
6 6 4 4 4
Another option with mutate_at()
df %>%
mutate_at(vars(matches("x$")), ~ax)
# id ax bx cx
# 1 1 1 1 1
# 2 2 2 2 2
# 3 3 2 2 2
# 4 4 3 3 3
# 5 5 4 4 4
# 6 6 4 4 4

Placing multiple outputs from each function call using apply into a row in a dataframe in R

I have a function that I repeat, changing the argument each time, using apply/sapply/lapply.
Works great.
I want to return a data set, where each row contains two (or more) variables from each iteration of the function.
Instead I get an unusable list.
do <-function(x){
a <- x+1
b <- x+2
cbind(a,b)
}
over <- 1:6
final <- lapply(over, do)
Any suggestions?
Without changing your function do, you can use sapply and transpose the result.
data.frame(t(sapply(over, do)))
# X1 X2
#1 2 3
#2 3 4
#3 4 5
#4 5 6
#5 6 7
#6 7 8
If you want to use do in current form with lapply, we can do
do.call(rbind.data.frame, lapply(over, do))
You could also try
as.data.frame(Reduce(rbind, final))
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
See ?Reduce and ?rbind for information about what they'll do.
You could also modify your final expression as
final <- as.data.frame(Reduce(rbind, lapply(over, do)))
#final
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
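If you are free to change the function, another possibility is to have it return a one-row data frame and bind the pieces with do.call(rbind, ...). This is only a sketch; do_df is a hypothetical variant, not something from the question:

```r
# Variant of do() that returns a one-row data frame with named columns
do_df <- function(x) {
  data.frame(a = x + 1, b = x + 2)
}

over <- 1:6

# Row-bind all the one-row data frames into a single data frame
final <- do.call(rbind, lapply(over, do_df))
final
```

This keeps the column names a and b, at the cost of building many small data frames, which may be slow for long inputs.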

purrr map instead of apply

I'm trying to incorporate more pipes in my code. Oftentimes, I have to break up pipes to use the apply function. Then I found purrr. However, it's not clear to me how exactly it works. Here is what I want, and what I've tried. The main problem is that I want a rowwise computation.
want:
apply(mtcars,1,function(x) which.max(x))
have:
mtcars %>% map_dbl(which.max)
If we need rowwise, then use pmap. According to ?pmap
... Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row. map_dfr(), pmap_dfr() and map2_dfc(), pmap_dfc() return data frames created by row-binding and column-binding respectively. ...
pmap_int(mtcars, ~ which.max(c(...)))
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 4 4 3
Also, in base R, this can be done easily and efficiently with max.col
max.col(mtcars, "first")
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 4 4 3
map works like lapply/sapply: it loops over the columns and applies the function to each column. So, it would be similar to
apply(mtcars, 2, which.max)
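To make the row-wise versus column-wise difference concrete, here is a small base R sketch on a made-up data frame (scores is not from the question):

```r
# Small illustrative data frame
scores <- data.frame(a = c(1, 5, 2),
                     b = c(4, 2, 9),
                     c = c(3, 3, 1))

# Row-wise: for each row, the index of the column holding the maximum
by_row <- unname(apply(scores, 1, which.max))

# Same row-wise result using max.col with first-match tie handling
by_row_fast <- max.col(scores, "first")

# Column-wise: for each column, the index of the row holding the maximum
# (this is the direction map() and sapply() iterate in)
by_col <- sapply(scores, which.max)
```

Here by_row and by_row_fast should agree, while by_col answers a different question altogether.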

How to BiCluster with constant values in columns - in R

My Problem in general:
I have a data frame in which I would like to find all bi-clusters with constant values in columns.
For Example the initial dataframe:
> df
v1 v2 v3
1 0 2 1
2 1 3 2
3 2 4 3
4 3 3 4
5 4 2 3
6 5 2 4
7 2 2 3
8 3 1 2
For example, I would like to find a cluster like this:
> cluster1
v1 v3
1 2 3
2 2 3
I tried the biclust package and tested several functions, but the result was never what I want to achieve.
I figured out that I might be able to use the BCPlaid function with fit.model = y ~ m, but this also seems to produce different results.
Is there an efficient way to achieve this?

Accessing list elements within mutate

I am trying to use the dplyr 'mutate' command to perform matching over a list of arrays, but am getting the error "Error: recursive indexing failed at level 2".
Here is an example:
templist=list();templist[["A"]]=c(6,9,8,1);templist[["B"]]=c(1,9,6,8);templist[["C"]]=c(8,1,9,6)
tempdat=data.frame(SYSTEM=c("A","A","A","B","B","B","C","C","C"),nums=c(1,8,9,1,8,9,1,8,9))
which provides
templist
$A
[1] 6 9 8 1
$B
[1] 1 9 6 8
$C
[1] 8 1 9 6
and
tempdat
SYSTEM nums
1 A 1
2 A 8
3 A 9
4 B 1
5 B 8
6 B 9
7 C 1
8 C 8
9 C 9
I then want to find the position of each number within the list element for its system. E.g.
tempdat %>% mutate(numids=match(nums,templist[[SYSTEM]]))
should yield
tempdat
SYSTEM nums numids
1 A 1 4
2 A 8 3
3 A 9 2
4 B 1 1
5 B 8 4
6 B 9 2
7 C 1 2
8 C 8 1
9 C 9 3
but I get the above noted error instead
(Error: recursive indexing failed at level 2)
Can anyone explain why this is failing? Or better yet, figure out a way to get this accomplished correctly?
I have a hunch that it could be done using a for loop to create separate data frames for each list and then use left_join to add the match indices from each system frame onto the original frame, but this seems like it will probably be very inefficient, inelegant, and clunky...
It fails because [[ on a list does not accept a vector as a set of elements to extract; given a vector, it indexes recursively. Inside mutate, SYSTEM is the whole column, i.e. a vector. A quick fix is to group the data frame by SYSTEM and pass unique(SYSTEM) to [[, so that within each group the index is a single value instead of a vector:
tempdat %>% group_by(SYSTEM) %>% mutate(numids=match(nums,templist[[unique(SYSTEM)]]))
# Source: local data frame [9 x 3]
# Groups: SYSTEM [3]
#
# SYSTEM nums numids
# (fctr) (dbl) (int)
# 1 A 1 4
# 2 A 8 3
# 3 A 9 2
# 4 B 1 1
# 5 B 8 4
# 6 B 9 2
# 7 C 1 2
# 8 C 8 1
# 9 C 9 3
If you check templist[[c("A", "B", "A")]], you will find that it throws exactly the same error as you have seen:
Error in templist[[c("A", "B", "A")]] : recursive indexing failed
at level 2
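If you would rather not group, the same lookup can be sketched in base R with mapply, which walks over the two columns in parallel so that [[ only ever sees one name at a time. This is an alternative to the grouped mutate above, not part of it:

```r
templist <- list(A = c(6, 9, 8, 1), B = c(1, 9, 6, 8), C = c(8, 1, 9, 6))
tempdat <- data.frame(SYSTEM = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                      nums = c(1, 8, 9, 1, 8, 9, 1, 8, 9))

# For each row, pick that row's system vector and find the number's position in it
tempdat$numids <- unname(mapply(
  function(sys, num) match(num, templist[[sys]]),
  as.character(tempdat$SYSTEM),  # as.character guards against a factor column
  tempdat$nums
))
tempdat
```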
