I am trying to create a matrix that holds the number of occurrences of specific 'coordinates'. I am working in R.
To illustrate, this is (part of) my data:
pre = c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post = c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df = data.frame(pre,post)
I then define the output matrix with the possible coordinate dimensions (the range is 1-20 across all my data):
matrix = matrix(NA, nrow=20, ncol=20)
colnames(matrix) = seq(1,20,1)
rownames(matrix) = seq(1,20,1)
I then need a loop that runs through my data and stores how many times each pre-post combination occurs:
for (i in 1:40){matrix[df$post[i], df$pre[i]] = 1}
This works in the sense that the output now shows which 'coordinates' occurred in the data, but not how many times.
For example, I know that pre=4, post=4 occurred twice.
So the loop needs to remember that a combination has already occurred and add 1 to its count, but I don't know how to program this.
I hope somebody can be of help,
Anne
You could initialize the matrix with zeros instead of NA and then increment the matrix value like this:
pre = c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post = c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df = data.frame(pre,post)
matrix = matrix(0, nrow=20, ncol=20)
colnames(matrix) = seq(1,20,1)
rownames(matrix) = seq(1,20,1)
for (i in 1:40){matrix[df$post[i], df$pre[i]] = matrix[df$post[i], df$pre[i]] + 1}
By the way, setting the row and column names of the matrix is not needed unless you need them for other reasons.
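A minimal sketch of the same increment idea, assuming the `pre`/`post` vectors from the question. It loops over `seq_len(nrow(df))` instead of the hard-coded `1:40`, so it works for any number of rows, and it uses the illustrative name `counts` instead of `matrix` to avoid masking the `matrix()` function:

```r
pre  <- c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post <- c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df <- data.frame(pre, post)

# Start every cell at zero so it can be incremented
counts <- matrix(0, nrow = 20, ncol = 20, dimnames = list(1:20, 1:20))

# Rows are 'post' values, columns are 'pre' values, as in the question
for (i in seq_len(nrow(df))) {
  counts[df$post[i], df$pre[i]] <- counts[df$post[i], df$pre[i]] + 1
}

counts[4, 4]  # pre = 4, post = 4: returns 2
```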
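We can use table to do this. Convert the 'post' and 'pre' columns to factors with levels 1 to 20 and then use table. Putting post first gives post on the rows and pre on the columns, matching the layout of the question's matrix:
table(factor(df$post, levels = 1:20), factor(df$pre, levels = 1:20))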
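A small sketch of the table approach on the question's data. Naming the arguments (`post =`, `pre =`) labels the dimensions; the fixed `levels = 1:20` keeps values that never occur as all-zero rows and columns:

```r
pre  <- c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post <- c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df <- data.frame(pre, post)

# Rows: post, columns: pre; every level 1..20 is kept even if unobserved
counts <- table(post = factor(df$post, levels = 1:20),
                pre  = factor(df$pre,  levels = 1:20))

dim(counts)        # 20 20
counts["4", "4"]   # post = 4, pre = 4: returns 2
```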
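If we are using the already created 'matrix' (initialized with 0 rather than NA, and with row and column names 1 to 20), an option is the following. Listing post before pre puts post on the rows, as in the question:
out <- as.data.frame(table(df))
matrix[as.matrix(out[c("post", "pre")])] <- out$Freq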
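A sketch for checking the `as.data.frame(table(df))` route against a pre-created matrix, here called `m` for illustration. Character indexing only works because the matrix has row and column names. `table(df)` only covers values that actually occur in the data, so the matrix should start at 0 rather than NA for the remaining cells to read as zero counts:

```r
pre  <- c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post <- c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df <- data.frame(pre, post)

m <- matrix(0, nrow = 20, ncol = 20, dimnames = list(1:20, 1:20))

# One row per observed (pre, post) combination, with its count in Freq
out <- as.data.frame(table(df))

# A two-column character matrix indexes m by (row name, column name)
m[as.matrix(out[c("post", "pre")])] <- out$Freq

m["4", "4"]  # returns 2
```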
I am trying to split a large data frame into 78 separate data frames based on the value of its first column. The only way I can think of doing this is by applying the filter() function inside a for() loop:
for (i in 1:nrow(plantline))
{x1 = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
The issue is that I don't know how to create a new data frame (x2, x3, x4, ...) each time the loop runs.
Can someone tell me if that is possible or if I should be trying to do this some other way?
There must be many duplicates of this question.
split(rawdta.df, rawdta.df$Plant_Line)
will create a list of data.frames.
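However, depending on your use case, splitting the large data.frame into pieces might not be necessary, because grouped operations (for example aggregate() in base R or group_by() in dplyr) can work on the whole data frame at once.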
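A minimal sketch contrasting the two options. The data frame below is a hypothetical stand-in for `rawdta.df`, and the `height` column is invented for illustration:

```r
# Hypothetical stand-in for rawdta.df: 3 plant lines, 4 rows each
rawdta.df <- data.frame(Plant_Line = rep(c("A", "B", "C"), each = 4),
                        height = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33))

# Option 1: a named list with one data frame per plant line
by_line <- split(rawdta.df, rawdta.df$Plant_Line)
by_line$B

# Option 2: no split at all, just a per-group summary
means <- aggregate(height ~ Plant_Line, data = rawdta.df, FUN = mean)
means
```

Option 2 is usually enough when the goal is a per-line statistic rather than 78 separate objects.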
You could use split:
# split the data frame into a list of 78 data frames based on
# the value of the first column
lst = split(large_data_frame, large_data_frame$first_column)
# copy the data frames from the list into the global environment
# (not recommended, since 78 separate data frames are hard to
# work with)
list2env(lst, envir = .GlobalEnv)
The names of the data frames will be the values found in the first column.
It would be easier if we could see the data frames.
Nevertheless, here is a suggestion: you can create a list of data frames:
dataframes <- vector("list", nrow(plantline))
for (i in 1:nrow(plantline)){
dataframes[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])
}
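The same preallocated-list loop, sketched in base R with `[` subsetting in place of `dplyr::filter()` and with the list elements named after the plant lines. The small `rawdta.df` and `plantline` objects are hypothetical stand-ins for the question's data:

```r
# Hypothetical stand-ins for the question's objects
rawdta.df <- data.frame(Plant_Line = rep(c("A", "B", "C"), each = 2), value = 1:6)
plantline <- data.frame(Plant_Line = c("A", "B", "C"))

# One list slot per plant line, named after it
dataframes <- vector("list", nrow(plantline))
names(dataframes) <- plantline$Plant_Line

for (i in seq_len(nrow(plantline))) {
  # Keep only the rows belonging to plant line i
  dataframes[[i]] <- rawdta.df[rawdta.df$Plant_Line == plantline$Plant_Line[i], ]
}

dataframes[["B"]]
```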
You can use assign:
for (i in 1:nrow(plantline))
{assign(paste0("x", i), filter(rawdta.df, Plant_Line == plantline$Plant_Line[i]))}
Alternatively, you can save your results in a list:
X <- list()
for (i in 1:nrow(plantline))
{X[[i]] = filter(rawdta.df, Plant_Line == plantline$Plant_Line[i])}
This would be easier with sample data. My favorite would be by:
d <- data.frame(plantline = rep(LETTERS[1:3], 4),
x = 1:12,
stringsAsFactors = FALSE)
l <- by(d, d$plantline, data.frame)
print(l$A)
print(l$B)
Solution using plyr:
ma <- cbind(x = 1:10, y = (-4:5)^2, z = 1:2)
ma <- as.data.frame(ma)
library(plyr)
dlply(ma, "z") # you split ma by the column named z
I have a list of files on which I need to perform an analysis. I would like to store the result of each iteration as a new row in a data frame. Here is what I tried, but I got this error:
Error in `$<-.data.frame`(`*tmp*`, "c1", value = c(0, 64010, 0, 64010, : replacement has 2 rows, data has 65
Here is my code (this part of the code only counts the number of records in each file):
h <- data.frame(matrix(0, ncol = 2, nrow = 65))
colnames(h) <- c("c1","c2")
my_files <- list.files("C:/Users/....")
for (i in 1:length(my_files))
{
k <- length(readLines(my_files[i], skipNul=TRUE))
h$c1 <- rbind(h$c1, k)
}
length() returns a single number, but rbind(h$c1, k) binds it to the 65-element column c1 and produces a two-row matrix, which cannot be assigned back to a column of a 65-row data frame. One solution is to append a whole row instead, with NA in column c2, like so:
h <- rbind(h, c(k,NA))
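Note that `rbind(h, c(k, NA))` appends after the 65 placeholder rows created by `matrix(0, nrow = 65)`. A sketch of a preallocate-and-index alternative, assuming one row per file: size `h` to `length(my_files)` and write each count into row `i`. The temporary files are hypothetical stand-ins for the files in the question's folder, and `full.names = TRUE` is used so `readLines()` receives full paths regardless of the working directory:

```r
# Hypothetical stand-in files with 2, 5 and 1 lines
tmp_dir <- tempfile("files_")
dir.create(tmp_dir)
writeLines(c("a", "b"), file.path(tmp_dir, "f1.txt"))
writeLines(letters[1:5], file.path(tmp_dir, "f2.txt"))
writeLines("z", file.path(tmp_dir, "f3.txt"))

my_files <- list.files(tmp_dir, full.names = TRUE)

# One row per file, filled in by index instead of grown with rbind()
h <- data.frame(c1 = numeric(length(my_files)), c2 = NA)
for (i in seq_along(my_files)) {
  h$c1[i] <- length(readLines(my_files[i], skipNul = TRUE))
}
h
```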
It is often cleaner to avoid growing objects inside a for loop. Consider using one of the apply functions, such as lapply() or vapply().
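A sketch of the apply-style version, again using hypothetical temporary files in place of the question's folder. `vapply()` returns one integer per file, and `FUN.VALUE = integer(1)` makes it fail loudly if any call returns something else; the data frame is then built in one step:

```r
# Hypothetical stand-in files with 2 and 5 lines
tmp_dir <- tempfile("files_")
dir.create(tmp_dir)
writeLines(c("a", "b"), file.path(tmp_dir, "f1.txt"))
writeLines(letters[1:5], file.path(tmp_dir, "f2.txt"))
my_files <- list.files(tmp_dir, full.names = TRUE)

# Count the lines of each file
n_lines <- vapply(my_files,
                  function(f) length(readLines(f, skipNul = TRUE)),
                  FUN.VALUE = integer(1),
                  USE.NAMES = FALSE)

h <- data.frame(file = basename(my_files), c1 = n_lines)
h
```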
Here's my current problem. I have a list of data frames that consist of different values. I want to be able to iterate through the list of data frames, and select out specific columns of data for each data frame, based on the names of the columns I specify. I want to then assign those selected columns in a separate list of data frames.
I've used another list object consisting of the names of the columns I want to extract.
I've taken a stab at a few approaches, but I'm still at the head scratching stage. Help would be appreciated!
Here's some sample code I've cooked up below:
# Create sample data set of five data frames, 10 x 10
M1 <- data.frame(matrix(rnorm(5), nrow = 10, ncol = 10))
M2 <- data.frame(matrix(rnorm(10), nrow = 10, ncol = 10))
M3 <- data.frame(matrix(rnorm(15), nrow = 10, ncol = 10))
M4 <- data.frame(matrix(rnorm(20), nrow = 10, ncol = 10))
M5 <- data.frame(matrix(rnorm(25), nrow = 10, ncol = 10))
# Assign data frames to a list object
mlist<-list(M1, M2, M3, M4, M5)
# Creates a data frame object consisting of the different column names I want to extract later
df.names <- data.frame(One = c("X1", "X3", "X5"), Two = c("X2", "X4", "X6"))
# Converts df.names into a set of characters (not sure if this is needed but it has worked for me in the past)
df.char <- lapply(df.names, function(x) as.character(x[1:length(x)]))
# Creates variable m that will be used to iterate in the for loops below
m<-1:length(mlist)
# Creates list object to set aside selected columns from df.names
mlist.selected<-list()
# A for loop to iterate for each of the df.names elements, and for each dataframe in mlist. *Hopefully* select out the columns of interest labeled in df.names, place into another list object for safe keeping
for (i in 1:length(df.names))
{
for(j in m)
{
# This is the line of code I'm struggling with, and I know it doesn't work. :-(
mlist.selected[j]<-lapply(mlist, function(x) x[df.char[[i]]])
}
}
Using
mlist.selected[[j]] <- lapply(mlist, function(x) x[df.char[[i]]])
in your for loop will get you a bit closer. I'd suggest using a named list with
mlist.selected[[paste("m",j, names(df.names)[i], sep=".")]] <-
lapply(mlist, function(x) x[df.char[[i]]])
to get an even nicer output.
On inspection, this returns repeated lists, which I don't think you want. If I understand what you are trying to do, you can actually get rid of the inner (j) loop:
# create named list of the data.frames
mlist<-list("M1"=M1, "M2"=M2, "M3"=M3, "M4"=M4, "M5"=M5)
# run the loop
for (i in 1:length(df.names)) {
mlist.selected[[names(df.names)[i]]] <-
lapply(mlist, function(x) x[df.char[[i]]])
}
This returns a nicely named list to work with. For example, you can access the columns of M2 selected by df.names$Two with mlist.selected$Two$M2.
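The remaining loop can also be replaced by a nested `lapply()`: the outer call runs over the column-name sets, the inner one over the data frames. A sketch with randomly generated data frames standing in for M1 to M5 (`set.seed(1)` is only there for reproducibility):

```r
set.seed(1)
# Five 10 x 10 data frames with columns X1..X10, named M1..M5
mlist <- lapply(1:5, function(k) data.frame(matrix(rnorm(100), nrow = 10, ncol = 10)))
names(mlist) <- paste0("M", 1:5)

# Column-name sets to extract
df.char <- list(One = c("X1", "X3", "X5"), Two = c("X2", "X4", "X6"))

# For each column set, select those columns from every data frame
mlist.selected <- lapply(df.char, function(cols) lapply(mlist, function(x) x[cols]))

names(mlist.selected$Two$M2)  # "X2" "X4" "X6"
```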
If I have a list of data frames in R, such as:
x<-c(1:10)
y<-2*x
z<-3*x
df.list <- list(data.frame(x),data.frame(y),data.frame(z))
And I'd like to average a specific column (this is a simplified example) across all these data frames. Is there an easy way to do it?
The length of the list is known but dynamic (i.e. it can change depending on run conditions).
For example:
dfone<-data.frame(val = 1:10)
dftwo<-data.frame(val = 11:20)
dfthree<-data.frame(val = 21:30)
(Assume all the column names are val)
row, output
1, (1+11+21)/3 = 11
2, (2+12+22)/3 = 12
3, (3+13+23)/3 = 13
etc
So output[i,1] = (dfone[i,1]+dftwo[i,1]+dfthree[i,1])/3
To do this in a for loop would be trivial:
dfoutput <- data.frame(value = numeric(nrow(dfone)))
for (i in 1:nrow(dfone))
{
dfoutput[i,'value']=(dfone[i,'val']+dftwo[i,'val']+dfthree[i,'val'])/3
}
But I'm sure there must be a more elegant way?
Edited after the question turned out to be about something else. Does this answer your question?
dfs <- list(dfone, dftwo, dfthree)
#oneliner
res <- rowMeans(sapply(dfs,function(x){
return(x[,"val"])
}))
#in steps
#step one: extract wanted column from all data
#this returns a matrix with one val-column for each df in the list
step1 <- sapply(dfs,function(x){
return(x[,"val"])
})
#step two: calculate the rowmeans. this is self-explanatory
step2 <- rowMeans(step1)
# or an even shorter one-liner, thanks to @davidarenburg:
rowMeans(sapply(dfs, `[[`, "val"))
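Another option, sketched here with `Reduce()`, sums the `val` columns element-wise and divides by the number of data frames, so it does not need to build an intermediate matrix. It assumes every data frame has the same number of rows and a column named `val`:

```r
dfone   <- data.frame(val = 1:10)
dftwo   <- data.frame(val = 11:20)
dfthree <- data.frame(val = 21:30)
dfs <- list(dfone, dftwo, dfthree)

# Pull out each 'val' column, add them pairwise, then average
res <- Reduce(`+`, lapply(dfs, `[[`, "val")) / length(dfs)
res  # 11 12 13 ... 20
```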