how to use bind_rows with tibble? - r

I am trying to use bind_rows and tibble from tidyverse, and getting unexpected results.
When I combine several data frames with bind_rows and then transform them to a tibble, the column names get messed up:
library(tidyr)
pred.models <- c('1.csv', '2.csv', '3.csv')
prediction.slides <- list()
for (modelid in pred.models){
tmp <- read.csv(modelid)
tmp[,'modelid'] <- modelid
prediction.slides[[length(prediction.slides)+1]] <- (tmp)
}
prediction.slides <- (bind_rows(prediction.slides))
typeof(prediction.slides)
# -> list
# now let's see what we got:
prediction.slides
# -> `bind_rows(prediction.slides)`$hash $class_prob $modelid
However, when I try the following:
pred.models <- c('1.csv', '2.csv', '3.csv')
prediction.slides <- list()
for (modelid in pred.models){
tmp <- read.csv(modelid)
tmp[,'modelid'] <- modelid
############################################ Changed here:
prediction.slides[[length(prediction.slides)+1]] <- tibble(tmp)
}
prediction.slides <- (bind_rows(prediction.slides))
I get the error "Error: Argument 1 can't be a list containing data frames" on the last line, which is very strange given that, according to the docs, bind_rows is meant for combining a list of data frames.
Any idea how to do it correctly and get a nice tibble as output?
UPD: the csv files look like the following:
hash,class_prob
1578d8,0.9451976000
1c7644,0.4519760001
dc7358,0.5197600012

The reason is that tibble() doesn't do what you think it does. You need as_tibble() instead. tibble() constructs a new tibble from the column vectors you give it, while as_tibble() converts an existing object, such as a data frame, into a tibble, which is what you want.
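A minimal sketch of that fix, assuming dplyr and tibble are installed. The in-memory data frames below are made-up stand-ins for the three CSV files, so the values are illustrative only; each piece is converted with as_tibble() before the pieces are stacked.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for the contents of 1.csv, 2.csv and 3.csv
raw <- list(
  "1.csv" = data.frame(hash = c("1578d8", "1c7644"), class_prob = c(0.95, 0.45),
                       stringsAsFactors = FALSE),
  "2.csv" = data.frame(hash = "dc7358", class_prob = 0.52, stringsAsFactors = FALSE),
  "3.csv" = data.frame(hash = "aa0001", class_prob = 0.10, stringsAsFactors = FALSE)
)

prediction.slides <- list()
for (modelid in names(raw)) {
  tmp <- raw[[modelid]]  # in the question this would be read.csv(modelid)
  tmp[, "modelid"] <- modelid
  # as_tibble() converts the existing data frame instead of building a new one
  prediction.slides[[length(prediction.slides) + 1]] <- as_tibble(tmp)
}

# Stack the list of tibbles into a single tibble
prediction.slides <- bind_rows(prediction.slides)
```

Since bind_rows() accepts plain data frames too, calling as_tibble() once on the combined result should work just as well.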

Related

Loop in R is only adding the first and last set of data to dataframe

I'm trying to loop through an API to get data for specific site codes and merge it into one dataframe. For some reason, the following code only returns the original dataframe (RoyalLondon_List) plus the last sensor (CLDP0004).
SiteCodes_all <- c('CLDP0002', 'CLDP0003', 'CLDP0004')
for(i in 1:length(SiteCodes_all)) {
allsites <- paste0(Base,Node,SiteCodes_all[i],'/',Pollutant,StartTime,EndTime,Averaging,Key)
temp_raw <- GET(allsites)
temp_list <- fromJSON(rawToChar(temp_raw$content))
df <- rbind(RoyalLondon_List, temp_list)
}
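Any help appreciated!
Your code overwrites df on every iteration with rbind(RoyalLondon_List, temp_list), so after the loop it holds only the original data plus the result of the last request.
Try this instead: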
df <- RoyalLondon_List
for(i in 1:length(SiteCodes_all)) {
allsites <- paste0(Base,Node,SiteCodes_all[i],'/',Pollutant,StartTime,EndTime,Averaging,Key)
temp_raw <- GET(allsites)
temp_list <- fromJSON(rawToChar(temp_raw$content))
df <- dplyr::bind_rows(df, temp_list)
}
dplyr::bind_rows() is a function in the dplyr package that combines multiple dataframes by appending the rows of one to the bottom of another. See the dplyr documentation for more information.
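An alternative that avoids growing df inside the loop is to collect one data frame per site in a list and bind once at the end. The sketch below assumes dplyr is installed; fetch_site() is a hypothetical stand-in for the GET()/fromJSON() calls, and the RoyalLondon_List contents (including the site code CLDP0001) are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the GET()/fromJSON() request in the question
fetch_site <- function(site_code) {
  data.frame(SiteCode = site_code, Value = c(1.5, 2.5), stringsAsFactors = FALSE)
}

SiteCodes_all <- c("CLDP0002", "CLDP0003", "CLDP0004")

# Invented starting data; the real RoyalLondon_List comes from the asker's session
RoyalLondon_List <- data.frame(SiteCode = "CLDP0001", Value = 0.5,
                               stringsAsFactors = FALSE)

# One data frame per site code, kept in a list
site_data <- lapply(SiteCodes_all, fetch_site)

# A single bind at the end: bind_rows() accepts data frames and lists of them
df <- bind_rows(RoyalLondon_List, site_data)
```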

How to name data frame in for loops using object?

I have an object that contains list of lab tests and based on the length of the object, I have created a FOR loop that processes scripts. During each loop, R should create a data frame using list in that object. Please see below.
adlb <- data.frame(subjid = c(1:20), aval = c(100:119))
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")
lab_list <- unique(filter(adlb, !is.na(aval))$paramcd)
for (i in 1:length(lab_list))
{
lab_name <- unlist(lab_list)[[i]]
print(lab_name)
??? <- adlb %>%
dplyr::filter(paramcd == lab_name) %>%
drop_na(aval)
}
When I run it, it should first create a data frame named ALT, followed by one named AST. What should I replace ??? with?
The only reason I prefer it this way is that it helps me review the data in question and debug scripts when needed.
Thank you in advance.
I tried lab_name[[i]] and few other options but it resulted in either error or incorrect data frame name.
I think this might help:
# example dataframes
df1 <- iris
df2 <- mtcars
df3 <- iris
#put them into list
mylist <- list(df1,df2,df3)
#give names to list
names(mylist) <- c("df_name1","df_name2","df_name3")
#put dataframes into global env
list2env(mylist ,.GlobalEnv)
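Applied to the adlb data from the question, the same idea might look like the base R sketch below. It uses split() to build the named list instead of a for loop, which is a substitution for the asker's loop rather than a direct fill-in for ???.

```r
adlb <- data.frame(subjid = 1:20, aval = 100:119)
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")

# Drop rows with missing aval, then split into one data frame per paramcd
clean <- adlb[!is.na(adlb$aval), ]
by_param <- split(clean, clean$paramcd)

# The list elements are named after the lab tests
names(by_param)

# Optional: copy the elements into the global environment as ALT and AST
list2env(by_param, envir = globalenv())
```

Keeping the pieces in by_param is usually enough for review and debugging, since by_param$ALT and by_param[["AST"]] can be inspected directly.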

Apply an `as.character()` function to a list of dataframes

So essentially I have a list of dataframes that I want to apply as.character() to.
To obtain the list of dataframes I have a list of files that I read in using a map() function and a read function that I created. I can't use map_df() because there are columns that are being read in as different data types. All of the files are the same and I know that I could hard code the data types in the read function if I wanted, but I want to avoid that if I can.
At this point I throw the list of dataframes in a for loop and apply another map() function to apply the as.character() function. This final list of dataframes is then compressed using bind_rows().
All in all, this seems like an extremely convoluted process, see code below.
audits <- list.files()
my_reader <- function(x) {
my_file <- read_xlsx(x)
}
audits <- map(audits, my_reader)
for (i in 1:length(audits)) {
audits[[i]] <- map_df(audits[[i]], as.character)
}
audits <- bind_rows(audits)
Does anybody have any ideas on how I can improve this? Ideally to the point where I can do everything in a single vectorised map() function?
For reproducibility you can use two iris datasets with one of the columns datatypes changed.
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
as.character works on a vector, whereas a data.frame is a list of vectors. One option is to use across, which needs only a single call to map:
library(dplyr)
library(purrr)
map_dfr(my_list, ~ .x %>%
mutate(across(everything(), as.character)))
I wanted to show a base R solution in case it helps anyone else. You can use rapply to recursively go through the list and apply a function. You can specify which classes to match and whether the result should replace the original elements or be returned as an unlisted or list object:
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
mylist2 <- rapply(my_list, classes = "ANY", f = as.character, how = "replace")
bigdf <- do.call(rbind, mylist2)
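To fold the reading and the conversion into one map() call, the reader function itself can do the conversion. The sketch below assumes dplyr and purrr are installed and uses two temporary CSV files as hypothetical stand-ins for the xlsx audits; read_as_character() is an illustrative name, not part of any package.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the audit files: two CSVs whose columns
# would otherwise be read in as different types
audit_dir <- tempfile("audits")
dir.create(audit_dir)
write.csv(data.frame(id = 1:2, score = c(1.5, 2.5)),
          file.path(audit_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = c("x3", "x4"), score = c("n/a", "3.5")),
          file.path(audit_dir, "b.csv"), row.names = FALSE)

# Read one file and convert every column to character in the same step
read_as_character <- function(path) {
  read.csv(path, stringsAsFactors = FALSE) %>%
    mutate(across(everything(), as.character))
}

# One map() call over the file paths, then a single bind
audits <- list.files(audit_dir, full.names = TRUE) %>%
  map(read_as_character) %>%
  bind_rows()
```

For the real xlsx files, read_xlsx() would take the place of read.csv() inside the reader function.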

rownames on multiple dataframe with for loop in R

I have several dataframes. I want the first column to become the row names of each one.
I can do it for one dataframe this way:
# Rename the row according the value in the 1st column
row.names(df1) <- df1[,1]
# Remove the 1st column
df1 <- df1[,-1]
But I want to do that on several dataframes. I tried several strategies, including assign and get, but with no success. Here are the two main approaches I tried:
# Getting a list of all my dataframes
my_df <- list.files(path="data")
# 1st strategy, adapting what works for 1 dataframe
for (i in 1:length(files_names)) {
rownames(get(my_df[i])) <- get(my_df[[i]])[,1] # The problem seems to be in this line
my_df[i] <- my_df[i][,-1]
}
# The error is: could not find function "get<-"
# 2nd strategy using assign()
for (i in 1:length(my_df)) {
assign(rownames(get(my_df[[i]])), get(my_df[[i]])[,1]) # The problem seems to be in this line
my_df[i] <- my_df[i][,-1]
}
# The error is : Error in assign(rownames(my_df[i]), get(my_df[[i]])[, 1]) : first argument incorrect
I really don't see what I missed. When I type get(my_df[i]) and get(my_df[[i]])[,1], it works alone in the console...
Thank you very much to those who can help me :)
You may write the code that you have in a function, read the data and pass every dataframe to the function.
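change_rownames <- function(df1) {
# Use the first column as row names, then drop it
row.names(df1) <- df1[,1]
# drop = FALSE keeps a data frame even if only one column remains
df1 <- df1[,-1, drop = FALSE]
df1
}
# full.names = TRUE returns paths that read.csv can open from any working directory
my_df <- list.files(path="data", full.names = TRUE)
list_data <- lapply(my_df, function(x) change_rownames(read.csv(x)))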
We can use a function such as lapply or purrr::map to loop over all the data.frames, then use tibble::column_to_rownames, which simplifies the procedure a lot. There is no need for an explicit for loop.
library(purrr)
library(dplyr)
library(tibble)
map(my_df, ~ .x %>% read.csv() %>% column_to_rownames(var = names(.)[1]))
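Both answers rely on the same idea: keep the data frames in a list and return a modified copy of each, rather than trying to assign into the result of get(). A base R sketch with made-up in-memory data frames in place of the files:

```r
# Move the first column into the row names and return the modified copy
change_rownames <- function(df) {
  row.names(df) <- df[[1]]
  # drop = FALSE keeps a data frame even when a single column is left
  df[, -1, drop = FALSE]
}

# Hypothetical data frames standing in for the files in "data"
dfs <- list(
  first  = data.frame(id = c("a", "b"), x = 1:2, y = 3:4, stringsAsFactors = FALSE),
  second = data.frame(id = c("c", "d"), x = 5:6, stringsAsFactors = FALSE)
)

# Apply the function to every element; the list names are preserved
dfs <- lapply(dfs, change_rownames)
```

Individual results are then available as dfs$first and dfs$second.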

input variable stored in list into loop in R

I'm sure there are much better ways of doing this, I'm open to suggestions.
I have these vectors:
vkt1 <- c("df1", "df2", "df3")
vector2 <- paste("sample", vkt1, sep="_")
The first vector contains a list of the names of dataframes stored in the environment. These are stored as strings, but I'd like to call them as variable names.
The second vector is just the first one with "sample" added at the beginning, equivalent to:
vector2 <- c('sample_df1', 'sample_df2', 'sample_df3')
These strings from vector2 would serve as the names of new data frames to be created.
Alrighty, so now I want to do something like this:
for (i in 1:length(vkt1)) { # meaning for i in 1,2,3
vector2[i] = data.frame(which(eval(parse(text = vkt1[i])) == "Some_String", arr.ind=TRUE))
addStyle(wb, vkt1[i], cols = 1:ncol(eval(parse(text = vkt1[i]))), rows = vector2[[i]][,1]+1, style = duppedStyle, gridExpand = TRUE)
}
It may look complicated, but the idea is to make a data frames named as the strings contained in vector2, being a subset of the data frames from vkt1 when "Some_String" is found.
Then, use that created data frame and add a style to the entire row when said string is present.
vector2[[i]][,1]+1 is intended to deploy as sample_df1[,1]+1 (in the first iteration)
Note that I'm using eval(parse(text = vkt1[i])) to get the variables from the strings of vkt1. So, say, eval(parse(text = vkt1[1])) is equal to df1 (the data frame, not the string)
Like this, the code gives the following error:
In file(filename, "r") :
cannot open file 'noCoinColor_Concat': No such file or directory
Been trying to get it working like so, but I'm beginning to feel this approach might be very wrong.
It is easier to manage code and data when you keep them in a list instead of separate dataframes.
You can use mget to fetch all the dataframes named in vkt1 as a list. Say you want to search for 'Some_String' in the first column of each dataframe; then you can do:
new_data <- lapply(mget(vkt1), function(df) df[df[[1]] == 'Some_String', ])
I haven't included the addStyle code here because I don't know from which package it is and what it does but you can easily include it in lapply's anonymous function.
Is it not easier to combine your data frames into a list and then use apply or map family functions to adjust your data frames?
library(dplyr)
data(mtcars)
df1 <- mtcars %>% filter(cyl == 4)
df2 <- mtcars %>% filter(cyl == 6)
df3 <- mtcars %>% filter(cyl == 8)
df_old_names <- c("df1", "df2", "df3")
df_new_names <- c("df_cyl_4", "df_cyl_6", "df_cyl_8")
df_list <- lapply(df_old_names, get)
names(df_list) <- df_new_names
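Putting the list approach together with the original goal (finding where "Some_String" occurs so those rows can be styled) might look like the base R sketch below. The three data frames are invented stand-ins, and the styling call itself is left out.

```r
# Hypothetical stand-ins for the data frames named in vkt1
df1 <- data.frame(a = c("x", "Some_String"), b = c("Some_String", "y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(a = c("x", "y"), b = c("z", "Some_String"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(a = c("x", "y"), b = c("z", "w"),
                  stringsAsFactors = FALSE)

vkt1 <- c("df1", "df2", "df3")

# mget() looks the names up and returns a list named df1, df2, df3;
# which(..., arr.ind = TRUE) gives a matrix with "row" and "col" columns
matches <- lapply(mget(vkt1), function(df) {
  which(df == "Some_String", arr.ind = TRUE)
})
names(matches) <- paste("sample", vkt1, sep = "_")

# Sheet rows to style for df1: add 1 to skip the header row
rows_df1 <- matches$sample_df1[, "row"] + 1
```

This replaces both eval(parse(text = ...)) and the assignment into vector2[i]; each element of matches can be passed to the styling step inside a loop or lapply over names(matches).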
