I have a tryCatch block that returns a data frame. In some cases it returns an empty data frame with 0 rows. Whether it is empty or not, I need to add some data to it (a few columns with one row). I found that when it returns an empty data frame, adding columns to it always gives me an error. For example:
library(dplyr)
library(tidyr)

dt <- data.frame()
for (a in 0:2) {
  table <- tryCatch(
    {
      mtcars %>%
        filter(am == a) %>%
        group_by(vs) %>%
        summarise(n = n()) %>%
        spread(vs, n)
    },
    error = function(e) {
      return(NULL)
    }
  )
  table$am = a
  dt <- bind_rows(dt, table)
}
Here is the error message:
Error in `$<-.data.frame`(`*tmp*`, "am", value = 2L) : replacement has 1 row, data has 0
Can anyone help me solve this issue? Thanks a lot.
One option could be to declare your data.frame with the proper columns but 0 rows. This gives you the flexibility to add a row with nrow(data) + 1 and assign values to the desired columns.
data = data.frame(ID = integer(), Result = integer())
data[nrow(data)+1,] = c(1, 5)
data
# ID Result
# 1 1 5
EDIT
The OP is facing a problem with the tryCatch block: when nothing matches the filter, the pipeline returns a zero-row data frame (or NULL if an error is actually caught), and table$am = a then fails because the replacement has 1 row while the data has 0.
The possible fix is to replace the lines

table$am = a
dt <- bind_rows(dt, table)

with

if (!is.null(table) && nrow(table) > 0) {
  table$am <- a
  dt <- bind_rows(dt, table)
} else {
  dt[nrow(dt) + 1, "am"] <- a
}
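Putting it together, a sketch of the full loop with that guard (it reuses the OP's pipeline and assumes dplyr and tidyr are loaded):

dt <- data.frame()
for (a in 0:2) {
  table <- tryCatch(
    mtcars %>%
      filter(am == a) %>%
      group_by(vs) %>%
      summarise(n = n()) %>%
      spread(vs, n),
    error = function(e) NULL
  )
  if (!is.null(table) && nrow(table) > 0) {
    # normal case: tag the summary with the current value of am and append it
    table$am <- a
    dt <- bind_rows(dt, table)
  } else {
    # empty or failed case: still record a row for this value of am
    dt[nrow(dt) + 1, "am"] <- a
  }
}
dt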
data <- data.frame(rbind(data, ID = 1))

You can't add entries to an empty data.frame like that. Try

data <- data.frame(ID = 1, Result = 5)
I'm attempting to loop through a list of data frames and, for the same column in each data frame, sum that column, divide it by the number of rows in that data frame, and print the result. I don't want to add a row or column to a new data frame; I just want it to print the result for each one. I also want it to print the number of rows in each data frame separately.
I created this list of dataframes by using this for loop:
library(dplyr)
library(stringr)

Coverages <- list('Cover 0', 'Cover 1', 'Cover 2', 'Cover 3')
DoublePostsLeftDFs <- c()
for (x in Coverages) {
  assign(paste("DoublePostsLeft", str_replace_all(x, " ", ""), sep = ""),
         DoublePostsLeft %>% filter(CoverageScheme == x))
  name <- paste("DoublePostsLeft", str_replace_all(x, " ", ""), sep = "")
  DoublePostsLeftDFs <- append(DoublePostsLeftDFs, name)
}
This successfully creates all the data frames I need, but I didn't know a better way to keep track of their names, which is where I suspect my problem comes from. Here is what I've attempted so far:
for (x in DoublePostsLeftDFs) {
  row_number <- nrow(x)
  average <- sum(x$desired_column) / nrow(x)
  print(row_number)
  print(average)
}
When I use that I get the error: Error: $ operator is invalid for atomic vectors
So then I tried this:
for (x in DoublePostsLeftDFs) {
  new <- as.data.frame(x)
  row_number <- nrow(new)
  average <- sum(new$desired_column) / nrow(new)
  print(row_number)
  print(average)
}
And all it did was print out:
[1] 1
[1] 0
for each dataframe in the list. I suspect it has something to do with how I created the list of the dataframes? Any help would be appreciated.
I don't think there is a need to create a list of data frames here. Is this what you want?
library(dplyr)

result <- DoublePostsLeft %>%
  group_by(CoverageScheme) %>%
  summarise(nrow = n(),
            average = mean(desired_column, na.rm = TRUE))
result
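If you do want to keep the separately named data frames, the problem in your loop is that x is just a character string holding the name, not the data frame itself, so x$desired_column fails. A minimal sketch that looks each one up by name with get() (desired_column stands in for whichever column you are averaging):

for (nm in DoublePostsLeftDFs) {
  df <- get(nm)   # fetch the data frame by its name
  cat(nm, "rows:", nrow(df),
      "average:", mean(df$desired_column, na.rm = TRUE), "\n")
}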
I have my code:
new_df = data.frame()
#G = 0
for (i in 1:nrow(furin_data)) {
  frac = furin_data[i, 3] / furin_data[i, 5]
  #print(frac)
  if (frac > 2 || frac < 0.5) {
    name = furin_data[i, 1]
    print(name)
    new_df = furin_data[i, ]
    #print(new_df)
    #G = G + 1
  }
  write.csv(new_df, "C:\\User\\Documents\\MyData.csv", row.names = FALSE)
}
It creates a new data file, but only the last row is written, not all of the rows that meet the condition. I cannot seem to figure out where the problem is.
That's because you're assigning the row to it, so every assignment overwrites the previous one. What you want is to append rows to it instead:
new_df[nrow(new_df)+1,] = furin_data[i,]
Another thing is that you created your new_df data frame without any columns, so none are assigned in the transfer. You should define it with the same column types and names as furin_data so those columns can be copied. An easy way of initializing it as empty but with the same structure is:
new_df = furin_data[FALSE, ]
But in R, writing a loop is usually not the best way to do things. R is a vectorized language, meaning it can perform an operation on a whole vector at once, which runs much faster. A conversion of your whole code to this style would be:
library(dplyr)

new_df <- furin_data %>%
  mutate(frac = .[[3]] / .[[5]]) %>%   # .[[3]] and .[[5]] give the column vectors, as in the loop
  subset(frac > 2 | frac < 0.5)

write.csv(new_df, "C:\\User\\Documents\\MyData.csv", row.names = FALSE)
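For comparison, a base R version of the same filter with no loop (it assumes, like the original code, that columns 3 and 5 are the ones being divided):

frac   <- furin_data[[3]] / furin_data[[5]]
new_df <- furin_data[which(frac > 2 | frac < 0.5), ]
write.csv(new_df, "C:\\User\\Documents\\MyData.csv", row.names = FALSE)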
I'm working on a shiny R app in which I need to parse csv files. From them, I build a dataframe. Then, I want to extract some rows from this dataframe and put them in another dataframe.
I found a way to do that using rbind, but it's pretty ugly and seems inadequate.
function(set) {   # set is the data.frame containing the data I want to extract
  newTable <- data.frame(
    name = character(1),
    value = numeric(1),
    columnC = character(1),
    stringsAsFactors = FALSE)
  threshold <- 0
  for (i in 1:nrow(set)) {
    value <- calculateValue(set$Value[[i]])
    if (value >= threshold) {
      name <- set[which(set$Name == "foo"), ]$Name
      columnC <- set[which(set$C == "bar"), ]$C
      v <- c(name, value, columnC)
      newTable <- rbind(newTable, v)
    }
  }
  newTable
}
If I don't initialize my dataframe values with character(1) or numeric(1), I get an error:
Warning: Error in data.frame: arguments imply differing number of rows: 0, 1
  75: stop
  74: data.frame
But then it leaves me with an empty row in my dataframe (empty strings for characters and 0s for numerics).
Since R is a cool language, I assume there's an easier and more efficient way to do this. Can anybody help me?
Rather than looping through each row, you can either subset
function(set, threshold) {
  set[calculateValue(set$Value) >= threshold, c("name", "value", "columnC")]
}
Or use dplyr to filter rows and select columns to get the subset you want.
library(tidyverse)
function(set, threshold) {
  set %>%
    filter(calculateValue(Value) >= threshold) %>%
    select(name, value, columnC)
}
Then assign the result to a new variable if you want a new dataframe
getValueOverThreshold <- function(set, threshold) {
  set %>%
    filter(calculateValue(Value) >= threshold) %>%
    select(name, value, columnC)
}
newDF <- getValueOverThreshold(set, 0)
You might want to check out https://r4ds.had.co.nz/transform.html
I have searched extensively but not found an answer to this question on Stack Overflow.
Let's say I have a data frame a.
I define:
a <- NULL
a <- as.data.frame(a)
If I wanted to add a column to this data frame as so:
a$col1 <- c(1,2,3)
I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "a", value = c(1, 2, 3)) :
replacement has 3 rows, data has 0
Why is the row dimension fixed but the column is not?
How do I change the number of rows in a data frame?
If I do this (inputting the data into a list first and then converting to a df), it works fine:
a <- NULL
a$col1 <- c(1,2,3)
a <- as.data.frame(a)
The row dimension is not fixed, but data.frames are stored as a list of vectors that are constrained to have the same length. You cannot add col1 to a because col1 has three values (rows) and a has zero, which would break that constraint. R does not, by default, auto-vivify rows when you attempt to extend a data.frame by adding a column that is longer than the data.frame. The second example works because col1 is the only vector in the data.frame, so the data.frame is initialized with three rows.
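A minimal illustration of that constraint, reusing a and col1 from the question:

a <- data.frame()                    # 0 rows, 0 columns
# a$col1 <- c(1, 2, 3)               # error: replacement has 3 rows, data has 0
a <- data.frame(col1 = c(1, 2, 3))   # works: every column has length 3
nrow(a)                              # 3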
If you want to automatically have the data.frame expand, you can use the following function:
cbind.all <- function(...) {
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}
This will fill the missing values with NA, and you would use it like cbind.all(df, a).
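For instance, a quick sketch with a zero-row data frame. Note that because the helper goes through as.matrix(), the result comes back as a matrix padded with NA, so wrap it in as.data.frame() if you need a data frame again:

df  <- data.frame(id = integer())   # zero-row data frame
out <- cbind.all(df, c(1, 2, 3))    # the shorter piece (here the empty df) is padded with NA
as.data.frame(out)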
You could also do something like this, where I read in data from multiple files, grab the column I want, and store it in the data frame. I check whether the data frame has anything in it; if it doesn't, I create a new one rather than getting the error about a mismatched number of rows:
readCounts = data.frame()
for (f in names(files)) {
  d = read.table(files[f], header = T, as.is = T)
  d2 = round(data.frame(d$NumReads))
  colnames(d2) = f
  if (ncol(readCounts) == 0) {
    readCounts = d2
    rownames(readCounts) = d$Name
  } else {
    readCounts = cbind(readCounts, d2)
  }
}
If you have an empty data frame, called for example df, another quite simple solution in my opinion is the following:
df[1, ] = NA               # add a temporary new row of NA values
df[, 'new_column'] = NA    # add the new column, called for example 'new_column'
df = df[0, ]               # delete the row of NAs
I hope this may help.
I am trying to compile data from several files using for loops in R. I would like to get all the data into one table. The following calculation is just an example.
library(reshape)
dat1 <- data.frame("Specimen" = paste("sp", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2), "Density_3" = rnorm(10,4,2))
dat2 <- data.frame("Specimen" = paste("fg", 1:10, sep=""), "Density_1" = rnorm(10,4,2), "Density_2" = rnorm(10,4,2))
dat <- c("dat1", "dat2")
for (i in 1:length(dat)) {
  data <- get(dat[i])
  melt.data <- melt(data, id = 1)
  assign(paste(dat[i], "tbl", sep = ""), cast(melt.data, ~ variable, mean))
}
rbind(dat1tbl, dat2tbl)
What is the smoothest way to add an extra column to dat2? I would like it to get the same column name ("Density_3" in this case) and be filled with zeros if it does not already exist. Assume that I have ~100 tables with the number of columns (Density_1, 2, 3, etc.) varying between 5 and 6.
I tried the following, but it didn't work:
if (names(data) %in% "Density_3" == FALSE) {
  dat.all$Density_3 <- 0
} else {
  dat.all$Density_3 <- dat.all$Density3
}
Another one: is there a smooth way to rbind() the tables? It seems that rbind(get(dat)) does not work.
After staring at this question for a while, I think its intent may have been obscured by the unnecessary get and assign manipulations. I think the answer is plyr::rbind.fill.
I would have constructed "dat" not as a character vector but as a list of two data frames, used aggregate(..., FUN = mean) (because I haven't gotten on the reshape2/plyr bus, except for melt and rbind.fill), and then called do.call(rbind.fill, ...) on the resulting list; a sketch of that approach follows the output below. At any rate, this is what I think you want. I do not think it is a good idea to add in zeros for what are really missing values.
> rbind.fill(dat1tbl, dat2tbl)
value Density_1 Density_2 Density_3
1 (all) 5.006709 4.088988 2.958971
2 (all) 4.178586 3.812362 NA
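A minimal sketch of that list-based idea (it assumes plyr is loaded for rbind.fill and uses colMeans in place of the melt/cast step to get the per-column means):

library(plyr)

dat <- list(dat1 = dat1, dat2 = dat2)            # a list of data frames instead of their names

tbls <- lapply(dat, function(d)
  as.data.frame(as.list(colMeans(d[, -1]))))     # mean of every Density_* column, Specimen dropped

do.call(rbind.fill, tbls)                        # Density_3 comes back as NA for dat2, not 0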