I wrote a function that does operations on a list. Now I am trying to bind the results into a data.frame, but nothing seems to work. Can someone explain how to fix this, but more importantly, why I am having this problem?
getVals<-function(x,y,z){
rbind(x,y,z)
}
ret<-lapply(1:3,function(x){getVals(x,x+1,x+2)})
as.data.frame(ret)
as.matrix(ret,ncol=3)
Desired output is:
1,2,3
2,3,4
3,4,5
You can get the result as a data frame by doing something like this:
as.data.frame(do.call(cbind, ret))
V1 V2 V3
x 1 2 3
y 2 3 4
z 3 4 5
ret is a list of arrays. There are several ways of working with these lists. I prefer to unlist, convert to a matrix and then to a data frame:
df<-data.frame(matrix(unlist(ret),ncol=3, byrow=TRUE))
df
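To see why the result needs reshaping, it can help to inspect one element of ret: each call to getVals returns a 3 x 1 matrix, so ret is a list of three one-column matrices. The following is a minimal sketch of one more option, not taken from the answers above: transpose each element into a row and stack the rows with do.call(rbind, ...). The names m and df are only illustrative.

```r
getVals <- function(x, y, z) rbind(x, y, z)
ret <- lapply(1:3, function(x) getVals(x, x + 1, x + 2))

# Each element is a 3 x 1 matrix with row names x, y, z
dim(ret[[1]])

# Transpose every element into a 1 x 3 row, then stack the rows
m <- do.call(rbind, lapply(ret, t))
df <- as.data.frame(m)
df
```

Here the columns keep the names x, y and z, and each row holds the values from one call.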
I have certain data in a list extracted from Bayesian processing of certain electrodes, and I want to populate a data frame from a loop. First, I have a list of 729 processing outcomes and an object elecs, which is basically a list of 729 pairs of electrodes (27*27), as you can see.
> head(elecs)
X Elec1 Elec2
1 1 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 5 1 5
6 6 1 6
The thing is I would like to fill dataf1 with the outcome of this loop which happens to be a dataframe of 4000 rows.
dataf1 <- data.frame('Elec1'=rep(NA,4000*729),'Elec2'=rep(NA,4000*729),'int'=rep(NA,4000*729))
for (i in nrow(elecs)){
Elec1=as.data.frame(rep(elecs[i,]$Elec1,4000))
Elec2=as.data.frame(rep(elecs[i,]$Elec2,4000))
post <- posterior_samples(bayeslist[[i]])
int <- as.data.frame(post$b_Intercept)
df <- cbind(Elec1,Elec2,int)
colnames(df) <- c('Elec1','Elec2','int')
dataf1[(1+(i-1)*4000):((1+(i-1)*4000)+3999),c('Elec1','Elec2','int')] <- df
}
Everything works perfectly fine until the last line in the loop:
dataf1[(1+(i-1)*4000):((1+(i-1)*4000)+3999),c('Elec1','Elec2','int')] <- df
And I don't know why exactly this is not working as expected and populating the dataf1 preinitialised dataframe.
Any insight, as always, will be highly appreciated.
I realised I was missing the 1: in the for loop, so it was a newbie typo. Apart from this, the code works, in case anyone is wondering.
for (i in nrow(elecs)){    # original: runs only once, with i equal to nrow(elecs)
for (i in 1:nrow(elecs)){  # fixed: iterates over every row
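The reason the original loop filled only one block is that nrow(elecs) is a single number, so the loop body runs exactly once, with i set to that number. A small sketch of the difference, using a made-up n in place of nrow(elecs) and illustrative variable names:

```r
n <- 5  # stands in for nrow(elecs)

# Looping over a single number: the body runs once, with i equal to n
visited_wrong <- c()
for (i in n) visited_wrong <- c(visited_wrong, i)

# Looping over a sequence: the body runs for 1, 2, ..., n
visited_right <- c()
for (i in seq_len(n)) visited_right <- c(visited_right, i)

visited_wrong
visited_right
```

seq_len(n) behaves like 1:n for positive n, but yields an empty sequence when n is 0, whereas 1:0 would iterate over 1 and 0.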
Given the below dataframe
df <- data.frame(cbind(seq(1:4),rep(letters[seq(1:3)],4)))
X1 X2
1 a
2 b
3 c
4 a
1 b
2 c
3 a
4 b
1 c
2 a
3 b
4 c
I would like to summarize unique X2s by X1. For example,
1 a,b,c
2 b,c,a
3 c,a,b
4 a,b,c
I am very close. I use the following code:
summary <- aggregate(df$X2, list(df$X1), FUN=unique)
which produces
Group.1 X
1 1,2,3
2 2,3,1
3 3,1,2
4 1,2,3
(the index of the list). What is the most efficient way to get my desired result?
I am certain there is an easy solution and I've tried searching, but I must not be using the correct search terms. Thank you in advance.
We can use toString to paste the elements
aggregate(X2~X1, unique(df), toString )
Or if we need to keep it as list
aggregate(X2~X1, transform(unique(df), X2 = as.character(X2)), list)
As the OP also mentioned the efficient approach
library(data.table)
unique(setDT(df))[, .(X2 = toString(X2)), by = X1]
Regarding the creation of the data.frame, it is easier, more compact and less error-prone to build it without wrapping cbind in data.frame. The main reason is that cbind produces a matrix, and a matrix can hold only a single class. So, if there is a single character column or element, all the elements are converted to character. In R versions before 4.0.0, data.frame defaulted to stringsAsFactors=TRUE, so those character columns were then converted to factor class.
df <- data.frame(X1= 1:4, X2= rep(letters[1:3],4), stringsAsFactors= FALSE)
The above code gets the intended output. Note that seq is not needed when we use :
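The coercion described above can be checked directly. This is a small sketch with illustrative names (m and good are not from the question): it builds the same values once through cbind and once through data.frame with named columns, then inspects the resulting types.

```r
# cbind on an integer and a character vector yields a character matrix
m <- cbind(1:4, rep(letters[1:3], 4))
is.character(m)
dim(m)

# Building the data frame column by column keeps each column's own type
good <- data.frame(X1 = 1:4, X2 = rep(letters[1:3], 4), stringsAsFactors = FALSE)
sapply(good, class)
```

In the matrix version the numbers have become strings, while in the data frame X1 stays integer and X2 stays character.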
My question is probably quite simple but I think my code could definitely be improved. Right now it's two for-loops, and I'm sure there's a way to do what I need in a single loop, but for the life of me I can't see what it is.
Having searched Stack, I found this excellent answer from Ananda where he was able to extract and keep columns within a range using lapply and for-loop methods. The structure of my data gets in the way, however, as I want to be able to pick specific columns to delete. My data structure looks like this:
1 AAAT_1 1 GROUP **** 1 -13.70 0
2 AAAT_2 51 GROUP **** 1 -9.21 0
3 AAAT_3 101 GROUP **** 1 -7.60 0
4 AAAT_4 151 GROUP **** 1 -6.28 0
It's an extract from some docking software and the only columns I want to keep are 2 (e.g. AAAT_1) and 7 (e.g. -13.70). The code I've used to do it, two for-loops:
for (i in 1:length(temp)) {
assign(temp[i], get(temp[i])[2:7])
}
....to keep the data from columns 2-7, followed by:
for (i in 1:length(temp)) {
assign(temp[i], get(temp[i])[-2:-5])
}
....to delete the rest of the columns I didn't need, where temp[i] is just a list of data frames the loop is acting on.
So, as you can see, it's just two loops doing similar actions. Surely there's a way to pick specific columns to keep/delete and do it all in one loop/lapply statement? Trying things like [2,7] in the get statement doesn't work; it appears to keep only column 7 and turns each data frame into 'Values' instead. I'm not sure what's going on, so any insight there would be wonderful but, either way, if anyone can turn this two-loop solution into one, it would be really appreciated. I definitely feel like I'm missing something really simple/obvious.
Cheers.
EDIT: Have taken into account the vectorised solutions from below to do the following instead. The names of raw imported data start with stuff like F0001, F0002, etc. hence the pattern to make the initial list.
lst <- mget(ls(pattern='^F\\d+'))
lst <- lapply(lst, "[", TRUE, c("V2","V7") )
lst <- lapply(seq_along(lst),
function(i,x) {assign(paste0(temp[i]),x[[i]], envir=.GlobalEnv)},
x=lst)
I know loops get a bad rap in R, was a natural solution to me as a CPP programmer but meh, this was far quicker. Initially, the only downside from the other example was that the assign command pasted a letter to each of the created tables in sequence 1,2,3,....,n when the list of raw imported data files weren't entirely in numerical order (i.e. 1,2,3,5,6,10,...etc.) so this didn't preserve that order. So I had to use a list of the files (our old friend temp) to name them correctly. Minor thing and the code isn't much shorter than two loops but it's most certainly faster.
So, in short, the above three lines add all the imported raw data to a list, keep only the columns I need then split the list up into separate dataframes whilst preserving the correct names. Cheers for the help!
If you have a data frame, you index rows and columns with
data.frame[row, column]
So, data.frame[2,7] will give you the value of the 2nd row in the 7th column. I guess you were looking for
temp <- temp[, c(2,7)]
or, if temp is a list of data frames
temp <- lapply(temp, function(x) x[, c(2,7)])
So, if you want to use a vector of numbers as column- or row-indices, create this vector with c(...). If I understand your example right, you don't need any loop-command, if you use lapply.
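To make the difference between the two index forms concrete, here is a short sketch on a toy data frame shaped like the docking output in the question. The object names d, frames and kept are made up for illustration.

```r
# Toy data frame with eight columns V1..V8, mimicking the question's layout
d <- data.frame(V1 = 1:4, V2 = paste0("AAAT_", 1:4), V3 = c(1, 51, 101, 151),
                V4 = "GROUP", V5 = "****", V6 = 1,
                V7 = c(-13.70, -9.21, -7.60, -6.28), V8 = 0)

d[2, 7]        # a single cell: row 2, column 7
d[, c(2, 7)]   # all rows, columns 2 and 7 only

# The same column selection applied to every data frame in a list
frames <- list(a = d, b = d)
kept <- lapply(frames, function(x) x[, c(2, 7)])
names(kept$a)
```

The first form returns one value, which matches the 'Values' behaviour described in the question, while the second keeps a two-column data frame.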
A for loop? Maybe I'm missing something, but why not use the solution proposed by @Daniel or a dplyr approach like this?
data
V1 V2 V3 V4 V5 V6 V7 V8
1 1 AAAT_1 1 GROUP **** 1 -13.70 0
2 2 AAAT_2 51 GROUP **** 1 -9.21 0
3 3 AAAT_3 101 GROUP **** 1 -7.60 0
4 4 AAAT_4 151 GROUP **** 1 -6.28 0
and here the code:
library(dplyr)
data <- select(data, V2, V7)
data
V2 V7
1 AAAT_1 -13.70
2 AAAT_2 -9.21
3 AAAT_3 -7.60
4 AAAT_4 -6.28
I have a list of data frames (let's call it "data") that I have generated, which looks something like this:
$"something.csv"
x y z
1 1 1 1
2 2 2 2
3 3 3 3
$"something else.csv"
x y z
1 1 1 1
2 2 2 2
3 3 3 3
I would like to output from the table "something.csv" one number within column x.
So far I have used:
data$"something.csv"$x[2]
This code works and I am happy that it does, but my problem is that I want to automate this process, so I have put all the table titles into a list (filename) which goes:
[1] "something.csv", "something else.csv"
So I made a for loop which should allow me to do so, but when I put in:
data$filename[1]$x[2]
it gives me back NULL.
When I print filename[1], I get [1] "something.csv" and if I type
data$"something.csv"$x[2]
I get the result I want. So if filename[1] = "something.csv", why does it not give me the same result?
I just want my code to output the second row of column x, and to automate this by using filename[i] in a for loop.
The way you have tried to approach the problem tries to find a column 'filename[1]' from the list, but it is not found. Hence, the NULL gets returned.
You need to use square brackets, and subset the object data. Here's an example:
# Generate data
data<-vector("list", 2)
names(data)<-c("something.csv", "something else.csv")
data[[1]]<-data.frame(x=1:3, y=1:3, z=1:3)
data[[2]]<-data.frame(x=1:3, y=1:3, z=1:3)
filename<-names(data)
# Subset the data
# The first data frame, notice the square brackets for subsetting lists!
data[[filename[1]]]
# column x
data[[filename[1]]]$x
# Second observation of x
data[[filename[1]]]$x[2]
The above subsets the list by the names of its elements. You can also use the number-based subsetting suggested by @Jeremy.
You can also use [ and [[ to get the same value as data$"something.csv"$x[2]. Try
data[[1]][2,1]
where [[1]] is the first list element and [2,1] is the data frame reference element
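Putting the two answers together, the automated version might look like the sketch below. The data are invented for illustration (the second data frame uses different values so the results are distinguishable), and second_x is a hypothetical name.

```r
data <- list("something.csv"      = data.frame(x = 1:3, y = 1:3, z = 1:3),
             "something else.csv" = data.frame(x = 4:6, y = 4:6, z = 4:6))
filename <- names(data)

# $ looks for an element literally named "filename", so this is NULL
is.null(data$filename)

# [[ evaluates its argument, so a variable holding the name works
second_x <- sapply(filename, function(f) data[[f]]$x[2])
second_x
```

sapply here replaces the explicit for loop; inside a for loop the equivalent expression would be data[[filename[i]]]$x[2].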
I have done a lot of googling but I didn't find a satisfactory solution to my problem.
Say we have data file as:
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
The first line is the header. The first column is the group id (the data have 3 groups: A, B, C) while the other columns are values.
I want to read this file in R so that I can apply different functions on the data.
For example I tried to read the file and tried to get column mean
dt<-read.table(file_name,head=T) #gives warnings
apply(dt,2,mean) #gives NA NA NA
I want to read this file and get the column means. Then I want to separate the data into 3 groups (according to Tag A, B, C) and calculate the column-wise mean for each group. Any help is appreciated.
apply(dt,2,mean) doesn't work because apply coerces the first argument to an array via as.matrix (as is stated in the first paragraph of the Details section of ?apply). Since the first column is character, all elements in the coerced matrix object will be character.
Try this instead:
sapply(dt[-1],mean) # works because data.frames are lists; dt[-1] drops the non-numeric Tag column
To calculate column means by groups:
# using base functions
grpMeans1 <- t(sapply(split(dt[,c("v1","v2","v3")], dt[,"Tag"]), colMeans))
# using plyr
library(plyr)
grpMeans2 <- ddply(dt, "Tag", function(x) colMeans(x[,c("v1","v2","v3")]))
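For completeness, here is a self-contained sketch that reads the sample data from the question inline and computes both the overall and the per-group column means with base aggregate. This is an alternative to the two approaches above rather than part of the original answer, and grpMeans3 is just an illustrative name.

```r
# Read the sample table from the question; text= stands in for the file
dt <- read.table(text = "Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1", header = TRUE)

# Overall column means, numeric columns only
colMeans(dt[, c("v1", "v2", "v3")])

# Column means per Tag group, one row per group
grpMeans3 <- aggregate(cbind(v1, v2, v3) ~ Tag, data = dt, FUN = mean)
grpMeans3
```

The formula interface keeps Tag as an ordinary column in the result, which can be convenient for later merging or plotting.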