How to coerce a character column to a list column - r

I am trying to bind the rows of several data frames. After aggregation, some of the data frames have list columns, but in others the same column is character, and I can't find a way to bind them. I tried converting the character column with as.list(), but that didn't work.
library(dplyr)
df1 <- data.frame(a = c(1,2,3),stringsAsFactors = F)
df1$b <- list(c("1","2"),"4",c("5","6"))
> df1
a b
1 1 1, 2
2 2 4
3 3 5, 6
df2 <- data.frame(a=c(4,5),b=c("9","12"),stringsAsFactors = F)
> df2
a b
1 4 9
2 5 12
dplyr::bind_rows(df2,df1)
Error in bind_rows_(x, .id) :
Column `b` can't be converted from character to list

I don't know the dplyr library well, but base R's rbind() seems to work:
df1 <- data.frame(a = c(1,2,3),stringsAsFactors = F)
df1$b <- list(c("1","2"),"4",c("5","6"))
df2 <- data.frame(a=c(4,5),b=c("9","12"),stringsAsFactors = F)
result <- rbind(df1, df2)
class(result$a)
[1] "numeric"
class(result$b)
[1] "list"
If you want to get this working with bind_rows(), start with the error message: column b is character in one data frame and a list in the other, and dplyr won't combine those two types. Convert the character column to a list first and then call bind_rows(), e.g.
df2$b <- as.list(df2$b)
dplyr::bind_rows(df2,df1)
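To apply the same fix to any number of data frames, one option is to wrap every character column in a list before binding. A minimal sketch, assuming dplyr 1.0 or later (for across() and where()); chr_to_list is a hypothetical helper name:

```r
library(dplyr)

df1 <- data.frame(a = c(1, 2, 3))
df1$b <- list(c("1", "2"), "4", c("5", "6"))
df2 <- data.frame(a = c(4, 5), b = c("9", "12"), stringsAsFactors = FALSE)

# Hypothetical helper: wrap every character column in a list so that it
# matches a list column of the same name in the other data frames.
# Data frames without character columns pass through unchanged.
chr_to_list <- function(df) {
  mutate(df, across(where(is.character), as.list))
}

# bind_rows() accepts a list of data frames directly
combined <- bind_rows(lapply(list(df2, df1), chr_to_list))
combined
```

Because as.list() wraps each string in its own list element, every row keeps exactly one element and the row count is unchanged.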

Related

Mutating several columns of many dataframes with For loop or Apply

I'm trying to use a loop or an apply-family solution for the following problem. I have a few data frames such as:
df1 <- data.frame(a = c(1,2,3,NA,NA,NA,NA,NA,9,NA),b = c(1,2,3,4,NA,NA,NA,8,9,10),c = c(1,2,3,NA,NA,NA,7,8,NA,NA))
df2 <- data.frame(a = c(1,2,3,4,5,6,NA,NA,NA,10),b = c(1,2,3,4,NA,NA,NA,8,9,10),c = c(1,2,3,NA,NA,NA,7,8,NA,NA))
df5 <- data.frame(a = c(1,2,3,4,5,6,NA,NA,9,10),b = c(1,2,3,4,5,6,NA,8,9,10),c = c(1,2,3,NA,NA,NA,7,8,9,NA))
where I'm trying to use na.approx to fill in some NA gaps. What I had in mind is:
l <- c(1,2,5)
for (i in l){
df[[i]] <- df[[i]] %>% mutate(a = na.approx(a, na.rm = FALSE))
df[[i]] <- df[[i]] %>% mutate(b = na.approx(b, na.rm = FALSE))
df[[i]] <- df[[i]] %>% mutate(c = na.approx(c, na.rm = FALSE))
}
With this example I'm getting the following error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
and with my actual data I'm getting this error:
Error in `vectbl_as_col_location2()`:
! Can't extract columns past the end.
i Location 13101 doesn't exist.
i There are only 16 columns.
where "13101" would be part of a dataframe named "df13101".
When I check the class of the data frames, I get
[1] "data.frame"
for the example, but for my actual data frames I get
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
and all the variables I want to mutate are numeric (in both the example and the real data).
I need to understand how to properly refer to these data frames and what problems I could face because of the data class or the usage of mutate. I've tried using mapply, but I'm very new to R and I'm barely learning about the whole apply family.
Any help would be great, thanks for reading!
The code in the question has these problems:
df[[1]] is not the same as df1. The first one refers to the first column of df (which does not exist) and the second one is the valid input. Instead, if e is the environment where df1, etc. are located then we can refer to df1 as e[["df1"]] in terms of the string "df1".
There is no point in applying na.approx separately to each column since na.approx can handle an entire numeric data frame at once.
This may or may not be a problem for you, but note that the code overwrites df1, etc., so if you want to test it again after running it, you will need to recreate the original df1, etc. You may wish to use lists as shown in the second approach below instead.
Below we assume that the input data frames are in the global environment, i.e. sitting in your workspace. (Replace the e <- ... line with e <- environment() if the data frames are in the current, rather than global, environment. That would be the case if the data frames were defined only within a function and are referenced within that same function.)
e[[nm]] refers to the object whose name in environment e is given by the value of the character string held in the nm variable. We then apply na.approx to that and assign it back. Note that na.approx returns a matrix when applied to a data.frame so we use [] on the left hand side to insert the values from the matrix into the data frame.
library(zoo)
e <- .GlobalEnv
nms <- paste0("df", l)
for (nm in nms) e[[nm]][] <- na.approx(e[[nm]], na.rm = FALSE)
Alternately, put the data frames in a named list L:
L <- mget(nms) # nms defined above
for (nm in nms) L[[nm]][] <- na.approx(L[[nm]], na.rm = FALSE)
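The same idea can also be written in base R without dplyr: lapply() over a list of data frames, and lapply() again over each data frame's columns. A sketch assuming zoo is installed, using two of the question's data frames:

```r
library(zoo)

# Two of the question's data frames, stored in a named list
L <- list(
  df1 = data.frame(a = c(1, 2, 3, NA, NA, NA, NA, NA, 9, NA),
                   b = c(1, 2, 3, 4, NA, NA, NA, 8, 9, 10),
                   c = c(1, 2, 3, NA, NA, NA, 7, 8, NA, NA)),
  df2 = data.frame(a = c(1, 2, 3, 4, 5, 6, NA, NA, NA, 10),
                   b = c(1, 2, 3, 4, NA, NA, NA, 8, 9, 10),
                   c = c(1, 2, 3, NA, NA, NA, 7, 8, NA, NA))
)

# lapply() over a data frame visits its columns; d[] <- ... writes the
# interpolated columns back while keeping d a data.frame.
# na.rm = FALSE keeps leading/trailing NAs so lengths are unchanged.
L <- lapply(L, function(d) {
  d[] <- lapply(d, na.approx, na.rm = FALSE)
  d
})
L$df1
```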
It is easier to do this if the dataframes are stored in a list. You can then apply the function to each numeric column.
library(dplyr)
library(zoo)
l <- c(1,2,5)
list_of_data <- mget(paste0('df', l))
list_of_data <- purrr::map(list_of_data, ~.x %>%
mutate(across(where(is.numeric),
~na.approx(.x, na.rm = FALSE))))
list_of_data
#$df1
# a b c
#1 1 1 1
#2 2 2 2
#3 3 3 3
#4 4 4 4
#5 5 5 5
#6 6 6 6
#7 7 7 7
#8 8 8 8
#9 9 9 NA
#10 NA 10 NA
#$df2
# a b c
#1 1 1 1
#2 2 2 2
#3 3 3 3
#4 4 4 4
#...
#...
If you want the new values written back to the original data frames, use list2env:
list2env(list_of_data, .GlobalEnv)

How to modify a list of data.frame and then output the data.frame

I want to create a second column in each of a list of data.frames that is just a duplicate of the first column, and then output those data.frames:
store the data frames:
> FileList <- list(DF1, DF2)
Add another column to each data frame:
> ModifiedDataFrames <- lapply(1:length(FileList), function (x) {FileList[[x]]$Column2 == FileList[[x]]$Column1})
but ModifiedDataFrames[[1]] just returns a list which contains what I assume is the content from DF1$Column1
What am I missing here?
There are a few problems with your code. First, you are using the comparison operator == where you need assignment (<-), and second, your function does not return the modified data frame. Here is a possible solution:
df1 <- data.frame(Column1 = c(1:3))
df2 <- data.frame(Column1 = c(4:6))
FileList <- list(df1, df2)
ModifiedDataFrames <- lapply(FileList, function(x) {
x$Column2 <- x$Column1
return(x)
})
> ModifiedDataFrames
[[1]]
Column1 Column2
1 1 1
2 2 2
3 3 3
[[2]]
Column1 Column2
1 4 4
2 5 5
3 6 6
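To also get the modified data frames back out as separate objects, one option is to keep the list named and copy it back with list2env(). A sketch; DF1 and DF2 here are stand-ins for the question's data:

```r
# Stand-ins for the question's DF1 and DF2
DF1 <- data.frame(Column1 = 1:3)
DF2 <- data.frame(Column1 = 4:6)

# A named list keeps track of which data frame is which;
# lapply() preserves those names in its result
FileList <- list(DF1 = DF1, DF2 = DF2)

ModifiedDataFrames <- lapply(FileList, function(x) {
  x$Column2 <- x$Column1  # assignment with <-, not comparison with ==
  x                       # return the modified data frame
})

# Copy each list element back to an object of the same name
invisible(list2env(ModifiedDataFrames, envir = .GlobalEnv))
DF2
```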

dplyr::bind_rows(...) vs. do.call(rbind, ...) for lists of symbols

Suppose I have the below data frames and character vector of names:
x <- data.frame(val = 1)
y <- data.frame(val = 2)
nms <- c("x", "y")
I want to simply row bind the data frames together. I can do this with do.call and rbind without issue:
library(dplyr)
do.call(rbind, syms(nms))
# val
#1 1
#2 2
However, if I try dplyr::bind_rows I get a strange error telling me that argument 1 must be a data frame, even though it is a data frame:
bind_rows(syms(nms))
#Error: Argument 1 must be a data frame or a named atomic vector, not a data.frame
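I would appreciate it if someone could explain why this happens.
syms() returns a list of symbols, not data frames. do.call() splices those symbols into the call rbind(x, y), where they are evaluated, but bind_rows() receives the list of symbols itself. We can use mget to return the datasets in a list and then do the bind_rows: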
library(dplyr)
mget(nms) %>%
bind_rows
# val
#1 1
#2 2
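A quick way to see what bind_rows() is actually receiving is to inspect the elements of syms(nms). A sketch, assuming rlang (a dplyr dependency) is installed:

```r
library(dplyr)

x <- data.frame(val = 1)
y <- data.frame(val = 2)
nms <- c("x", "y")

s <- rlang::syms(nms)
class(s[[1]])  # "name": a symbol, not a data frame

# do.call() builds the call bind_rows(x, y), so the symbols are
# evaluated and each one is replaced by the data frame it names
res_symbols <- do.call(bind_rows, s)

# mget() looks the objects up by name and returns a named list of data frames
res_mget <- bind_rows(mget(nms))
res_mget
```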

Variable as a column name in data frame

Is there any way to use string stored in variable as a column name in a new data frame? The expected result should be:
col.name <- 'col1'
df <- data.frame(col.name=1:4)
print(df)
# Real output
col.name
1 1
2 2
3 3
4 4
# Expected output
col1
1 1
2 2
3 3
4 4
I'm aware that I can create data frame and then use names() to rename column or use df[, col.name] for existing object, but I'd like to know if there is any other solution which could be used during creating data frame.
You cannot pass a variable as the name of an argument like that: data.frame() takes the argument name col.name literally.
Instead what you can do is:
df <- data.frame(placeholder_name = 1:4)
names(df)[names(df) == "placeholder_name"] <- col.name
or use the default name, which for data.frame(1:4) is "X1.4":
df <- data.frame(1:4)
names(df)[names(df) == "X1.4"] <- col.name
or assign by position:
df <- data.frame(1:4)
names(df)[1] <- col.name
or if you only have one column just replace the entire names attribute:
df <- data.frame(1:4)
names(df) <- col.name
There's also the set_names function in the magrittr package that you can use to do this last solution in one step:
library(magrittr)
df <- set_names(data.frame(1:4), col.name)
But set_names is just an alias for:
df <- `names<-`(data.frame(1:4), col.name)
which is part of base R. Figuring out why this expression works and makes sense will be a good exercise.
In addition to ssdecontrol's answer, there is a second option.
You're looking for mget. First assign the column name to a variable, then assign the values to an object with that name. mget then looks up the object named by the string and returns a named list, which data.frame turns into a column with that name.
assign("col.name", "col1")
assign(col.name, 1:4)
df <- data.frame(mget(col.name))
print(df)
col1
1 1
2 2
3 3
4 4
I don't recommend you do this, but:
col.name <- 'col1'
eval(parse(text=paste0('data.frame(', col.name, '=1:4)')))
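If the goal is to set the name while creating the data frame, two further sketches: base R's setNames() applied to a list, and tibble()'s name unquoting with !! and :=, assuming the tibble package (installed alongside dplyr) is available:

```r
col.name <- "col1"

# Base R: build a list named by the variable, then convert it
df <- data.frame(setNames(list(1:4), col.name))

# tibble: !! unquotes the variable and := allows a computed name on the left
library(tibble)
tb <- tibble(!!col.name := 1:4)

df
tb
```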

rbind list of data frames with one column of characters and numerics

I have a list of two data frames with the same column names but different numbers of rows. rbind.fill can put them together into one big data frame, but the first column is numeric in df1 and character in df2, and when they are merged the character values all turn into numbers. I've searched around but couldn't fix the problem; any help would be appreciated. A small example:
station <- c(1:10)
value <- c(101:110)
df1 <- data.frame(cbind(station,value))
station <- c("a","b")
value <- c(101:102)
df2 <- data.frame(cbind(station,value))
data1 <- rbind.fill(df1,df2)
I would like the characters remain as characters, thanks.
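If the values turn into numbers, the column is not character but a factor; use str() to check this. cbind() first turns both columns into a character matrix, and data.frame() then converts them to factors (the default before R 4.0), so value is affected as well. This will get you going: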
df2$station <- as.character(df2$station)
EDIT: or keep R from converting strings to factors when you create the data frame:
df2 <- data.frame(cbind(station,value), stringsAsFactors = FALSE)
Console output:
> station <- c(1:10)
> value <- c(101:110)
> df1 <- data.frame(cbind(station,value))
> station <- c("a","b")
> value <- c(101:102)
> df2 <- data.frame(cbind(station,value))
> df2$station <- as.character(df2$station)
>
> library(plyr)
> data1 <- rbind.fill(df1,df2)
> data1
station value
1 1 101
2 2 102
3 3 103
4 4 104
5 5 105
6 6 106
7 7 107
8 8 108
9 9 109
10 10 110
11 a 1
12 b 2
You don't need rbind.fill here, because your data frames have the same column names; plain rbind works just fine.
You stated that you are using a list, so you might want to know this little trick with do.call:
library(plyr)  # provides rbind.fill
df1 <- data.frame(station = 1:10, value = 101:110)
df2 <- data.frame(station = c("a","b"), value = 101:102, stringsAsFactors = FALSE)
(data1 <- rbind.fill(df1,df2))
rbind(df1,df2)
dfl <- list(df1,df2)
do.call("rbind",dfl)
Note that I have "cleaned" your example code a bit: there is no need to wrap 1:10 in c() or to build the data frames with cbind().
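If the data frames already sit in a list and some character columns arrived as factors (the default before R 4.0), one option is to convert factor columns to character in every element before binding. A minimal sketch; factors_to_character is a hypothetical helper name, and df2's factor column is created explicitly so the example behaves the same on any R version:

```r
# df2's station column is a factor here, as it would be by default before R 4.0
df1 <- data.frame(station = 1:10, value = 101:110)
df2 <- data.frame(station = factor(c("a", "b")), value = 101:102)
dfl <- list(df1, df2)

# Hypothetical helper: turn every factor column of a data frame into character
factors_to_character <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  if (any(is_fac)) {
    d[is_fac] <- lapply(d[is_fac], as.character)
  }
  d
}

# rbind coerces station to the more general type (character)
data1 <- do.call(rbind, lapply(dfl, factors_to_character))
str(data1)
```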
