How to store for loop results in a data frame? - r

I have a list of files on which I need to run an analysis. I would like to store the result of each iteration in a data frame as a new row. Here is what I tried, but I got this error:
Error in `$<-.data.frame`(`*tmp*`, "c1", value = c(0, 64010, 0, 64010, : replacement has 2 rows, data has 65
Here is my code (this part of the code only counts the number of records in each file):
h <- data.frame(matrix(0, ncol = 2, nrow = 65))
colnames(h) <- c("c1","c2")
my_files <- list.files("C:/Users/....")
for (i in 1:length(my_files))
{
k <- length(readLines(my_files[i], skipNul=TRUE))
h$c1 <- rbind(h$c1, k)
}

length() returns a single number, so you are trying to rbind a single value onto a two-column object. One solution is to add an NA for column c2 inside your loop, like so:
h <- rbind(h, c(k,NA))

Try to stay away from for loops. Consider using one of the apply functions, like lapply.
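As a minimal sketch of the apply-style approach suggested above: the temporary files and the column names file and n_lines below are made up for illustration and are not taken from the question. Note that list.files() returns bare file names unless full.names = TRUE is set, so readLines() may not find the files when the working directory is somewhere else.

```r
# Create two small example files in a temporary directory (hypothetical data).
tmp_dir <- tempfile("demo_files")
dir.create(tmp_dir)
writeLines(c("a", "b", "c"), file.path(tmp_dir, "first.txt"))
writeLines(c("a", "b"), file.path(tmp_dir, "second.txt"))

# full.names = TRUE gives paths that readLines() can open from any working directory.
my_files <- list.files(tmp_dir, full.names = TRUE)

# Count the lines of each file; vapply() checks that every result is one integer.
counts <- vapply(my_files,
                 function(f) length(readLines(f, skipNul = TRUE)),
                 integer(1))

# Build the data frame once, with one row per file.
h <- data.frame(file = basename(my_files),
                n_lines = unname(counts),
                stringsAsFactors = FALSE)
print(h)
```

Building the data frame in one step avoids growing it row by row inside a loop, and there is no need to know the number of files in advance.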

Related

How to count number of occurrences of data combinations and save in a matrix in R?

I have a problem in which I try to create a matrix with the number of occurrences of specific 'coordinates'. I am working in R.
To illustrate, this is (part of) my data:
pre = c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post = c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df = data.frame(pre,post)
I then define this output matrix with the possible coordinate dimensions (the range is 1-20 across all my data):
matrix = matrix(NA, nrow=20, ncol=20)
colnames(matrix) = seq(1,20,1)
rownames(matrix) = seq(1,20,1)
I then need a loop to run through my data and to store how many of the specific pre-post combinations exist within the data:
for (i in 1:40){matrix[df$post[i], df$pre[i]] = 1}
This works in the sense that the output now shows which 'coordinates' occurred in the data, but it doesn't say how many times.
For example, I know that pre=4, post=4 occurred twice.
So the loop needs to remember that a combination has already occurred and add 1 to its count each time, but I don't know how to program this.
I hope somebody can be of help,
Anne
You could initialize the matrix with zeros instead of NA and then increment the matrix value like this:
pre = c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post = c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)
df = data.frame(pre,post)
matrix = matrix(0, nrow=20, ncol=20)
colnames(matrix) = seq(1,20,1)
rownames(matrix) = seq(1,20,1)
for (i in 1:40){matrix[df$post[i], df$pre[i]] = matrix[df$post[i], df$pre[i]] + 1}
By the way, the setting of the matrix colnames and rownames is not needed if you don't need it for any other reasons.
We can use table to do this. Convert the 'pre' and 'post' columns to factors with levels specified as 1 to 20, and then use table:
table(factor(df$pre, levels = 1:20), factor(df$post, levels = 1:20))
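If we are using the already created 'matrix', an option is to index it by its row and column names (note that this puts 'pre' in the rows and 'post' in the columns):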
out <- as.data.frame(table(df))
matrix[as.matrix(out[1:2])] <- out$Freq
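As a rough check that the two approaches agree, the sketch below counts the combinations once with the incrementing loop and once with table, using the data from the question. The names counts_loop and counts_tab are introduced here for illustration. To match the loop's layout (post in the rows, pre in the columns), post is passed to table first.

```r
pre  <- c(3,1,3,2,2,4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8)
post <- c(4,3,5,3,4,6,5,6,5,4,5,6,6,5,6,5,7,6,7,7,7,4,8,4,8,8,4,4,8,3,9,8,6,9,8,8,9,7,9,9)

# Loop version: start from zeros and add 1 for every observed combination.
counts_loop <- matrix(0, nrow = 20, ncol = 20)
for (i in seq_along(pre)) {
  counts_loop[post[i], pre[i]] <- counts_loop[post[i], pre[i]] + 1
}

# table version: fixing the factor levels keeps the full 20 x 20 grid.
counts_tab <- table(factor(post, levels = 1:20), factor(pre, levels = 1:20))

# Both should hold the same counts, cell by cell.
all(as.vector(counts_tab) == as.vector(counts_loop))
counts_loop[4, 4]  # how often pre = 4, post = 4 occurred
```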

Re order issue in for loop R

I am looking to reorder my data into a new dataframe (a vector in the example below) in which the first observation comes first and the last observation comes second; both observations are then removed from the initial dataframe, and the process repeats.
data <- seq(1,12,1)
i <- 1
ii <- 1:length(data)
newData <- seq(1,12,1)
for (i in ii){
a <- 1
newData[i] <- data[a]
i <- i + 1
b <- as.numeric(length(data))
newData[i]<- data[b]
index <- c(a, b)
data <- data[-index]
i <- i + 1
}
I receive the error: "Error in newData[i] <- data[b] : replacement has length zero" and the loop stops at i = 8, and the list "data" is empty.
If I run the contents of the loop, but not the loop itself, I get my desired outcome both in this example and my task; but obviously I want to run the loop given the size of my data.
As MrFlick mentioned, you can't modify the index inside a for loop: any change you make to i is overwritten at the start of the next iteration. But given that you only need every second index, you can specify that in your loop definition by using
ii <- seq(1,length(data),2)
However, you don't need a for loop to rearrange the elements of your vector data. You only need an index vector of the form (first, last, second, second last, etc.):
m = matrix(c(1:6,12:7), ncol=2)
i = as.vector(t(m))
newdata = data[i]
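The index trick above is hard-coded for 12 elements. A possible generalization to any length is sketched below; the function name interleave_ends is made up for this example and is not from the original answer.

```r
# Reorder a vector as: first, last, second, second last, and so on.
interleave_ends <- function(x) {
  n <- length(x)
  half <- ceiling(n / 2)
  front <- seq_len(half)                  # 1, 2, 3, ...
  back <- rev(seq_len(n))[seq_len(half)]  # n, n - 1, n - 2, ...
  # rbind() stacks the two index vectors; as.vector() reads them column by
  # column, which interleaves them. For odd n the middle index appears twice,
  # so only the first n indices are kept.
  idx <- as.vector(rbind(front, back))[seq_len(n)]
  x[idx]
}

data <- seq(1, 12, 1)
interleave_ends(data)
```

For the 12-element example this gives the same order as data[i] in the answer above.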

Creating multiple dataframes in a loop in R

I am new to R and I don't know how to create multiple data frames in a loop. For example:
I have a data frame "Data" with 20 rows and 4 columns:
Data <- data.frame(matrix(NA, nrow = 20, ncol = 4))
names(Data) <- c("A","B","C","D")
I want to choose the rows of Data whose values in column T are closest to the elements of the vector X.
X = c(X1,X2,X3,X4,X5)
Finally, I want to assign them to separate data frames named after their associated X element:
for(i in 1:length(X)){
data_X[i] <- data.frame(matrix(NA))
data_X[i] <- subset(data2, 0 <= A-X[i] | A-X[i]< 0.000001 )
}
Thank you!
Since you didn't give us any numbers, it is difficult to say exactly what condition your for loop needs to check, so you will need to sort that part out yourself. Here is a basic example of what you could do. The important part that I think you are missing is that you need to use assign to send the created data frames to your global environment (or wherever else you want them to go). paste0 is a handy way to give each of them its own name. Take note that some of the data frames may be empty. It may be worthwhile to add an if statement that skips assigning the data frame if (nrow(data3)==0).
Data <- data.frame(matrix(sample(1:10, 80, replace = TRUE), nrow = 20, ncol = 4))
names(Data) <- c("A","B","C","D")
X = c(1:10)
for(i in 1:length(X)){
data2 <- Data
data3 <- subset(data2, A == X[i])
assign(paste0("SubsetData",i), data3, envir = .GlobalEnv)
}
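As an alternative to assign, the subsets can be kept together in one named list, which is often easier to loop over later. This is only a sketch under the same made-up data as the example above; the name subsets and the set.seed call are additions for illustration.

```r
set.seed(1)  # only to make the random example repeatable
Data <- data.frame(matrix(sample(1:10, 80, replace = TRUE), nrow = 20, ncol = 4))
names(Data) <- c("A", "B", "C", "D")
X <- 1:10

# One data frame per element of X, collected in a list instead of the global environment.
subsets <- lapply(X, function(x) Data[Data$A == x, , drop = FALSE])
names(subsets) <- paste0("SubsetData", X)

# Drop the empty data frames, as suggested in the answer above.
subsets <- Filter(function(d) nrow(d) > 0, subsets)

names(subsets)
```

Individual results are then available as, for example, subsets[["SubsetData3"]], provided that value occurred in column A.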

Compute 15 rows in parallel (through vectorization) and create df with them

I am creating 15 rows in a dataframe, like this. I cannot show my real code, but the create row function involves complex calculations that can be put in a function. Any ideas on how I can do this using lapply, apply, etc. to create all 15 in parallel and then concatenate all the rows into a dataframe? I think using lapply will work (i.e. put all rows in a list, then unlist and concatenate, but not exactly sure how to do it).
for( i in 1:15 ) {
row <- create_row()
# row is essentially a dataframe with 1 row
my_df <- rbind(my_df,row)
}
Something like this should work for you,
create_row <- function(){
rnorm(10, 0,1)
}
my_list <- vector(100, mode = "list")
my_list_2 <- lapply(my_list, function(x) create_row())
data.frame(t(sapply(my_list_2,c)))
The create_row function is just there to make the example reproducible. We predefine an empty list, fill it with the results of create_row(), and then convert the resulting list to a data frame.
Alternatively, predefine a matrix and use apply over the row margin, then use t (transpose) to get the output in the right orientation:
df <- data.frame(matrix(ncol = 10, nrow = 100))
t(apply(df, 1, function(x) create_row()))
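Since the question says each row is really a one-row data frame, another common pattern is to collect the rows in a list and bind them in one step with do.call(rbind, ...). The sketch below assumes a stand-in create_row that takes the row number and returns a one-row data frame; the columns id and value are invented for the example.

```r
# Hypothetical stand-in for the real, more complex row-building function.
create_row <- function(i) {
  data.frame(id = i, value = i^2)
}

# Build all 15 rows as a list of one-row data frames.
rows <- lapply(1:15, create_row)

# Bind them into a single data frame in one call.
my_df <- do.call(rbind, rows)
dim(my_df)
```

Note that lapply itself runs sequentially. If the rows really need to be computed in parallel, parallel::mclapply accepts the same first two arguments on Unix-like systems, though whether it helps depends on how expensive each row is.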

Selecting out specific columns in data frames embedded within a list

Here's my current problem. I have a list of data frames that consist of different values. I want to be able to iterate through the list of data frames, and select out specific columns of data for each data frame, based on the names of the columns I specify. I want to then assign those selected columns in a separate list of data frames.
I've used another list objects consisting of the names of the different columns I want to extract.
I've taken a stab at a few approaches, but I'm still at the head scratching stage. Help would be appreciated!
Here's some sample code I've cooked up below:
# Create sample data set of five data frames, 10 x 10
M1 <- data.frame(matrix(rnorm(5), nrow = 10, ncol = 10))
M2 <- data.frame(matrix(rnorm(10), nrow = 10, ncol = 10))
M3 <- data.frame(matrix(rnorm(15), nrow = 10, ncol = 10))
M4 <- data.frame(matrix(rnorm(20), nrow = 10, ncol = 10))
M5 <- data.frame(matrix(rnorm(25), nrow = 10, ncol = 10))
# Assign data frames to a list object
mlist<-list(M1, M2, M3, M4, M5)
# Creates a data frame object consisting of the different column names I want to extract later
df.names <- data.frame(One = c("X1", "X3", "X5"), Two = c("X2", "X4", "X6"))
# Converts df.names into a set of characters (not sure if this is needed but it has worked for me in the past)
df.char <- lapply(df.names, function(x) as.character(x[1:length(x)]))
# Creates variable m that will be used to iterate in the for loops below
m<-1:length(mlist)
# Creates list object to set aside selected columns from df.names
mlist.selected<-list()
# A for loop to iterate for each of the df.names elements, and for each dataframe in mlist. *Hopefully* select out the columns of interest labeled in df.names, place into another list object for safe keeping
for (i in 1:length(df.names))
{
for(j in m)
{
# This is the line of code I'm struggling with and I know it doesn't work. :-(
mlist.selected[j]<-lapply(mlist, function(x) x[df.char[[i]]])
}
}
Using
mlist.selected[[j]] <- lapply(mlist, function(x) x[df.char[[i]]])
in your for loop will get you a bit closer. I'd suggest using a named list with
mlist.selected[[paste("m",j, names(df.names)[i], sep=".")]] <-
lapply(mlist, function(x) x[df.char[[i]]])
to get an even nicer output.
On inspection, this returns repeated lists, which I don't think you want. If I understand what you are trying to do, you can actually get rid of the inner (j) loop:
# create named list of the data.frames
mlist<-list("M1"=M1, "M2"=M2, "M3"=M3, "M4"=M4, "M5"=M5)
# run the loop
for (i in 1:length(df.names)) {
mlist.selected[[paste(names(df.names)[i], sep=".")]] <-
lapply(mlist, function(x) x[df.char[[i]]])
}
This returns a nicely named list to work with. For example, you can access the columns selected from M2 for df.names$Two using mlist.selected$Two$M2.
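The remaining loop can also be written as a nested lapply, which builds the same two-level structure without pre-creating an empty list. This is a sketch with only two example data frames; the names cols and selected are introduced here and are not part of the original answer.

```r
set.seed(42)  # only to make the random example repeatable
mlist <- list(
  M1 = data.frame(matrix(rnorm(100), nrow = 10, ncol = 10)),
  M2 = data.frame(matrix(rnorm(100), nrow = 10, ncol = 10))
)

# Column names to extract, stored as a plain named list of character vectors.
cols <- list(One = c("X1", "X3", "X5"), Two = c("X2", "X4", "X6"))

# Outer lapply runs over the column sets, inner lapply over the data frames.
selected <- lapply(cols, function(cn) lapply(mlist, function(d) d[cn]))

names(selected$Two$M2)
```

Because both input lists are named, lapply carries the names through, so selected$Two$M2 works the same way as mlist.selected$Two$M2 above.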
