Build a data.frame and set its names in one statement? - r

I can convert a list into a data.frame with the do.call function:
z=list(c(1:3),c(5:7),c(7:9))
x=as.data.frame(do.call(rbind,z))
names(x)=c("one","two","three")
x
## one two three
## 1 1 2 3
## 2 5 6 7
## 3 7 8 9
I want to make this more concise by merging these two statements into one. Can I?
x=as.data.frame(do.call(rbind,z))
names(x)=c("one","two","three")

setNames is what you want. It is in the stats package, which is attached by default when R starts:
setNames(as.data.frame(do.call(rbind,z)), c('a','b','c'))
## a b c
## 1 1 2 3
## 2 5 6 7
## 3 7 8 9

An alternative is the structure() function, which is in base and more general:
structure(as.data.frame(do.call(rbind,z)), names=c('a','b','c'))
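A third single-statement option, offered here as an alternative rather than taken from the answers above, is to name the columns on the matrix itself through the dimnames argument of matrix() before converting. This sketch assumes every list element has the same length (3 here), since the list is flattened with unlist() and refilled row by row:

```r
z <- list(c(1:3), c(5:7), c(7:9))

# Flatten the list, refill it row by row into a 3-column matrix,
# name the columns via dimnames (NULL = no row names), then convert.
x <- as.data.frame(matrix(unlist(z), ncol = 3, byrow = TRUE,
                          dimnames = list(NULL, c("one", "two", "three"))))
x
```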

Related

Placing multiple outputs from each function call using apply into a row in a dataframe in R

I have a function that I call repeatedly, changing the argument each time, using apply/sapply/lapply.
It works great.
I want to return a data set, where each row contains two (or more) variables from each iteration of the function.
Instead I get an unusable list.
do <-function(x){
a <- x+1
b <- x+2
cbind(a,b)
}
over <- 1:6
final <- lapply(over, do)
Any suggestions?
Without changing your function do, you can use sapply and transpose it.
data.frame(t(sapply(over, do)))
# X1 X2
#1 2 3
#2 3 4
#3 4 5
#4 5 6
#5 6 7
#6 7 8
If you want to keep do in its current form and use lapply, you can do
do.call(rbind.data.frame, lapply(over, do))
You could also try
as.data.frame(Reduce(rbind, final))
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
See ?Reduce and ?rbind for information about what they'll do.
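The two stacking approaches differ in how often rbind is called: Reduce(rbind, final) folds pairwise (rbind(rbind(p1, p2), p3), ...), copying the growing result at every step, while do.call(rbind, final) makes a single call with all pieces as arguments, so it usually scales better for long lists. A minimal sketch, reusing the question's do() with over <- 1:6, that checks both give the same matrix:

```r
do <- function(x) {
  a <- x + 1
  b <- x + 2
  cbind(a, b)  # a 1 x 2 matrix with columns "a" and "b"
}
over <- 1:6
final <- lapply(over, do)

# Pairwise fold: rbind(rbind(rbind(final[[1]], final[[2]]), final[[3]]), ...)
folded <- Reduce(rbind, final)
# Single call: rbind(final[[1]], final[[2]], ..., final[[6]])
stacked <- do.call(rbind, final)

identical(folded, stacked)
result <- as.data.frame(stacked)
result
```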
You could also build final in a single step:
final <- as.data.frame(Reduce(rbind, lapply(over, do)))
#final
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8

Accessing list elements within mutate

I am trying to use the dplyr mutate command to perform matching over a list of vectors, but I am getting the error "Error: recursive indexing failed at level 2".
Here is an example:
templist=list();templist[["A"]]=c(6,9,8,1);templist[["B"]]=c(1,9,6,8);templist[["C"]]=c(8,1,9,6)
tempdat=data.frame(SYSTEM=c("A","A","A","B","B","B","C","C","C"),nums=c(1,8,9,1,8,9,1,8,9))
which provides
templist
$A
[1] 6 9 8 1
$B
[1] 1 9 6 8
$C
[1] 8 1 9 6
and
tempdat
SYSTEM nums
1 A 1
2 A 8
3 A 9
4 B 1
5 B 8
6 B 9
7 C 1
8 C 8
9 C 9
I then want to find the position of each number within the list element corresponding to its SYSTEM. E.g.
tempdat %>% mutate(numids=match(nums,templist[[SYSTEM]]))
should yield
tempdat
SYSTEM nums numids
1 A 1 4
2 A 8 3
3 A 9 2
4 B 1 1
5 B 8 4
6 B 9 2
7 C 1 2
8 C 8 1
9 C 9 3
but I get the error noted above instead (Error: recursive indexing failed at level 2).
Can anyone explain why this is failing? Or better yet, figure out a way to get this accomplished correctly?
I have a hunch that it could be done using a for loop to create a separate data frame for each list element and then using left_join to add the match indices from each system's frame onto the original frame, but this seems likely to be inefficient, inelegant, and clunky...
The reason it fails is that [[ on a list does not select several elements when given a vector; it treats the vector as a recursive path (templist[[c("A", "B")]] means templist[["A"]][["B"]]). Inside mutate, SYSTEM is the whole column, a vector of length 9, not a single value. A quick fix is to group the data frame by SYSTEM and pass unique(SYSTEM), so that within each group SYSTEM is a single value instead of a vector:
tempdat %>% group_by(SYSTEM) %>% mutate(numids=match(nums,templist[[unique(SYSTEM)]]))
# Source: local data frame [9 x 3]
# Groups: SYSTEM [3]
#
# SYSTEM nums numids
# (fctr) (dbl) (int)
# 1 A 1 4
# 2 A 8 3
# 3 A 9 2
# 4 B 1 1
# 5 B 8 4
# 6 B 9 2
# 7 C 1 2
# 8 C 8 1
# 9 C 9 3
If you check templist[[c("A", "B", "A")]], you will find that it throws exactly the same error as you have seen:
Error in templist[[c("A", "B", "A")]] : recursive indexing failed at level 2
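If you would rather not rely on grouping, a per-row lookup in base R also works. The sketch below is an alternative, not one of the answers above: it pairs each SYSTEM with its number via mapply, so [[ only ever sees one name. It wraps SYSTEM in as.character() because when SYSTEM is a factor (the data.frame() default before R 4.0.0), indexing a list with a factor uses the factor's integer codes rather than its labels, which only picks the right element when the levels happen to be in the same order as the list:

```r
templist <- list(A = c(6, 9, 8, 1), B = c(1, 9, 6, 8), C = c(8, 1, 9, 6))
tempdat <- data.frame(SYSTEM = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                      nums = c(1, 8, 9, 1, 8, 9, 1, 8, 9))

# For each row, fetch that row's vector from templist (a single name each time)
# and find the position of nums within it.
tempdat$numids <- mapply(function(sys, n) match(n, templist[[sys]]),
                         as.character(tempdat$SYSTEM), tempdat$nums,
                         USE.NAMES = FALSE)
tempdat
```

Because it calls the function once per row, this is likely slower than the grouped version on large data; it is mainly handy as a dependency-free check.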

Extract data from data.frame based on coordinates in another data.frame

Here is my problem. I have a really big data.frame with two columns: the first holds x coordinates (row numbers) and the second y coordinates (column numbers), for example:
x y
1 1
2 3
3 1
4 2
3 4
In another data.frame I have some numeric data:
a b c d
8 7 8 1
1 2 3 4
5 4 7 8
7 8 9 7
1 5 2 3
I would like to add a third column to the first data.frame containing the values from the second data.frame at the coordinates given in the first. So the result should look like this:
x y z
1 1 8
2 3 3
3 1 5
4 2 8
3 4 8
Since my data.frames are really big, for loops are too slow. I think there is a way to do this with the apply family, but I can't find how.
This is a simple indexing question. No need for external packages or *apply loops, just do
df1$z <- df2[as.matrix(df1)]
df1
# x y z
# 1 1 1 8
# 2 2 3 3
# 3 3 1 5
# 4 4 2 8
# 5 3 4 8
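This works because indexing with a two-column numeric matrix picks one element per row of the index matrix, taking the first column as the row number and the second as the column number. A minimal sketch with the question's data; building the index with cbind() (instead of as.matrix(df1)) keeps it correct even if df1 later gains extra columns, and converting df2 to a matrix once skips the data.frame method:

```r
df1 <- data.frame(x = c(1, 2, 3, 4, 3), y = c(1, 3, 1, 2, 4))
df2 <- data.frame(a = c(8, 1, 5, 7, 1), b = c(7, 2, 4, 8, 5),
                  c = c(8, 3, 7, 9, 2), d = c(1, 4, 8, 7, 3))

m <- as.matrix(df2)                 # numeric matrix with the same layout as df2
df1$z <- m[cbind(df1$x, df1$y)]     # element m[x[i], y[i]] for each row i
df1
```

A coordinate outside the matrix (for example a y of 5 here, since df2 has only 4 columns) makes this fail with a "subscript out of bounds" error rather than returning NA.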
Another base R solution (df1 holds the coordinates and df2 the numbers, both as data frames):
df1$z <- mapply(function(x,y) df2[x,y], df1$x, df1$y )
It works once the last y in the first data frame is corrected from 5 to 4; I assume that was a typo, since the second data frame has only 4 columns.
Here's how I would do this.
First, load data.table for fast merging and convert your data frames (I'll call them dt1 for the coordinates and vals for the values) to data.tables:
library(data.table)
dt1 <- data.table(dt1)
vals <- data.table(vals)
Second, put vals into a new data.table with coordinates:
vals_dt <- data.table(x = rep(seq_len(nrow(vals)), ncol(vals)),
                      y = rep(seq_len(ncol(vals)), each = nrow(vals)),
                      z = unlist(vals, use.names = FALSE),  # column-major order, matching x and y
                      key = c("x", "y"))
Now merge (note that setkey() reorders dt1 by x and y):
setkey(dt1, x, y)[vals_dt, z := i.z]
You can also use the data.table package and update df1 by reference:
library(data.table)
setDT(df1)[, z := df2[cbind(x, y)]][]
# x y z
# 1: 1 1 8
# 2: 2 3 3
# 3: 3 1 5
# 4: 4 2 8
# 5: 3 4 8

Pasting as object names

I am trying to use paste0 with merge so that I can merge a bunch of data frames in a loop. However, I'm having trouble referring to specific columns of data.frames by name.
To illustrate, I'll use head
Example:
df <- data.frame(x=1:10,y=1:10)
head(df)
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(get("df"))
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(df$x)
[1] 1 2 3 4 5 6
head(get("df$x"))
Error in get("df$x") : object 'df$x' not found
Is there a way to get a specific column?
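The function get looks up a single object by name in an environment. It does not parse its argument, so "df$x" is treated as the name of an object literally called df$x, which does not exist. If you do not specify the environment, get searches from the environment it is called from (your global workspace when used at top level).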
You can coerce df into an environment using as.environment, and then call get in that environment, e.g.:
get("x", as.environment(get("df")))
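In practice it is often simpler to let get() return the whole data frame and then extract the column with [[, which also accepts a string. Since both the object name and the column name are then plain strings, this fits a paste0 loop. A minimal sketch; obj_name and col_name are illustrative names, not from the question:

```r
df <- data.frame(x = 1:10, y = 1:10)

# get() fetches the data frame by name; [[ then selects a column by name.
obj_name <- paste0("d", "f")   # e.g. a name built inside a loop
col_name <- "x"
col1 <- get(obj_name)[[col_name]]

# The environment-based approach gives the same vector.
col2 <- get(col_name, envir = as.environment(get(obj_name)))

head(col1)
```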

R ffdf sorted data

I want to sort this ffdf:
z=as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9), x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
I need the data sorted by columns w and x. This would be a simple task with an ordinary data frame.
Use ffdforder from package ff: it returns an ff_vector that you can use to index your ffdf without RAM issues.
require(ff)
z=as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9), x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
idx <- ffdforder(z[c("w","x")])
zordered <- z[idx, ]
zordered
You can try something like this (note that z$w[] and z$x[] load those two columns into RAM):
require(ffbase)
z <- as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9),
x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
z[order(z$w[], z$x[]), ]
## w x y
## 2 1 1 2
## 3 2 3 3
## 9 2 34 9
## 8 3 45 8
## 1 4 12 1
## 4 5 5 4
## 5 7 65 5
## 6 8 3 6
## 10 9 11 10
## 7 65 2 7
You can use fforder to order your ffdf without loading the columns into RAM. Credit to @jwijffels:
z[fforder(z$w, z$x), ]
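For comparison with the in-RAM case the question mentions, a plain data.frame is sorted on w and then x with a single order() call; ffdforder and fforder compute the same ordering while keeping the data on disk. A minimal base R sketch (no ff involved), using the question's values:

```r
d <- data.frame(w = c(4, 1, 2, 5, 7, 8, 65, 3, 2, 9),
                x = c(12, 1, 3, 5, 65, 3, 2, 45, 34, 11),
                y = 1:10)

# order() sorts by w first and uses x to break ties (e.g. the two rows with w == 2).
d_sorted <- d[order(d$w, d$x), ]
d_sorted
```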

Resources