Extracting outputs of different lengths from lapply - R

Could anyone help me sort out this problem? I am using lapply in the following code provided by @Arun:
out <- lapply(1:length(f1), function(f.idx) {
  df1 <- read.delim(f1[f.idx], header = TRUE)
  df2 <- read.delim(f2[f.idx], header = TRUE)
  df3 <- read.delim(f3[f.idx], header = TRUE)
  idx.v <- get_idx(df1)
  result <- get_result(idx.v, df2, df3)
})
Now, out is a list of results from 110 files. These outputs are of different lengths, so I cannot use as.data.frame(do.call(rbind, out)).
Is there a way to save each element as a separate file in a loop-like manner, or do I have to do it manually (e.g., out[1], out[2], etc.)?

What are you trying to achieve with do.call(rbind, ...)? That is for combining data in a list, and I'm not sure why you would wrap it in as.data.frame. If you have a list of data.frames with the same columns but differing numbers of rows, and you essentially want to "stack" those data.frames on top of each other, then you should be able to use the following to get one big data.frame and then save that single object:
do.call(rbind, out)
It sounds like you have different data.frames in a list named "out" and are trying to save your data.frames as individual files in your working directory. If that is the case, try something like:
lapply(names(out),
       function(x) write.csv(out[[x]],
                             file = paste(x, ".csv", sep = "")))
If the names of the data.frames in the list are not unique, you might need to take a different approach.
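In particular, the list built in the question with lapply(1:length(f1), ...) has no names at all, so names(out) would be NULL. In that case you can index by position instead; a minimal sketch, with a hypothetical "result_" prefix for the output files:
# Index the unnamed list by position and write each element to its own numbered csv
lapply(seq_along(out), function(i) {
  write.csv(out[[i]], file = paste0("result_", i, ".csv"), row.names = FALSE)
})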
If you link to your earlier related questions, that might be better than simply mentioning who shared the code with you.

Related

Is there a way in R to read multiple excel files, change columns to character, and then merge them?

New-ish to R and I feel like this has a simple solution, but I can't figure it out.
I have 59 excel files that I want to combine. However, 4 of the columns have a mix of dates and NAs (depending on whether the study animal is a migrant or not), so R won't let me combine them because those columns are read as numeric in some files and as character in others. I was hoping to read all of the excel files into R, convert those 4 columns in each file to character, and then merge them all. I figured a loop could do this.
Everything I find online has me typing out the name of each file to read, which I don't really want to do for 59 files. And once I do have them read into R and those columns converted, can I merge them easily from within R? Sorry if this is simple, but I'm not sure what to do that would make this easier.
You can do this quickly using lapply. It was unclear exactly how you wanted to combine the files (a true merge by a common variable, appending the rows, or appending the columns), so all three are shown below. Either way, I do not believe you need to change anything to character for any of the approaches (2a-2c) to work:
library(readxl)
# 1. Read in all excel files in a folder, given a specific filepath
filepath <- "your/file/path/"
file_list <- list.files(path = filepath, pattern = "\\.xlsx$", full.names = TRUE)
df_list <- lapply(file_list, read_excel)
# 2a. Merge data (assuming a unique identifier, e.g. "studyid")
final_data <- Reduce(function(...) merge(..., by = "studyid", all = TRUE), df_list)
# 2b. If all files have the same columns, append the rows into one long dataset
final_data <- do.call(rbind, df_list)
# 2c. If you want to make a wide dataset (append all columns)
final_data <- do.call(cbind, df_list)
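If it turns out you do need those four mixed date/NA columns as character before stacking (as you originally planned), you can coerce them in each data frame first. A minimal sketch; the column names in date_cols are hypothetical placeholders for your four columns:
# Hypothetical names for the four mixed date/NA columns -- replace with your own
date_cols <- c("migration_start", "migration_end", "arrival_date", "departure_date")
df_list <- lapply(df_list, function(df) {
  found <- intersect(date_cols, names(df))   # only touch columns that exist in this file
  df[found] <- lapply(df[found], as.character)
  df
})
# Then stack the rows as in 2b
final_data <- do.call(rbind, df_list)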

When binding together a list of csv tables with purrr as a data frame, I want to include a tag column conditional on a string in each csv

I am binding together participant raw data from 33 csv files with the following code:
filenames <- list.files(pattern = '*.csv', recursive = TRUE)
result <- purrr::map_df(filenames, read.csv, .id = 'id')
This works great. Now I need to include a tag per participant (csv) in the final data frame to make clear which of several randomized conditions they were in.
I want to make it conditional on the first word in the first column of each .csv, as each participant got one of several randomized sequences of words.
I thought of something with ifelse(), but I'm not sure how to include this in the code above. I am a total R noob, any help is appreciated!
I think this should achieve what you're looking for:
# Read each file into a list, tag it with the first value of its first column, then bind
dfs <- lapply(filenames, read.csv)
dfs <- lapply(dfs, function(x) { x$tag <- x[1, 1]; x })
result <- do.call(rbind, dfs)
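If you would rather stay with purrr, as in your original map_df call, the same idea can be folded into the reading step. A sketch, assuming the tag really is just the first value of the first column of each file:
library(purrr)
result <- map_dfr(filenames, function(f) {
  df <- read.csv(f)
  df$tag <- df[1, 1]  # first word of the first column identifies the condition
  df
}, .id = "id")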

Loop to create one dataframe from multiple URLs

I have a character vector with multiple URLs that each host a csv of crime data for a certain year. Is there an easy way to create a loop that will read.csv and rbind all the dataframes without having to run read.csv 8 times over? The vector of URLs is below
urls <- c('https://opendata.arcgis.com/datasets/73cd2f2858714cd1a7e2859f8e6e4de4_33.csv',
'https://opendata.arcgis.com/datasets/fdacfbdda7654e06a161352247d3a2f0_34.csv',
'https://opendata.arcgis.com/datasets/9d5485ffae914c5f97047a7dd86e115b_35.csv',
'https://opendata.arcgis.com/datasets/010ac88c55b1409bb67c9270c8fc18b5_11.csv',
'https://opendata.arcgis.com/datasets/5fa2e43557f7484d89aac9e1e76158c9_10.csv',
'https://opendata.arcgis.com/datasets/6eaf3e9713de44d3aa103622d51053b5_9.csv',
'https://opendata.arcgis.com/datasets/35034fcb3b36499c84c94c069ab1a966_27.csv',
'https://opendata.arcgis.com/datasets/bda20763840448b58f8383bae800a843_26.csv'
)
The function map_dfr from the purrr package does exactly what you want. It applies a function to every element of an input (in this case urls) and binds together the result by row.
library(tidyverse)
map_dfr(urls, read_csv)
I used read_csv() instead of read.csv() out of personal preference but both will work.
In base R:
result <- lapply(urls, read.csv, stringsAsFactors = FALSE)
result <- do.call(rbind, result)
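If you also want to record which URL each row came from, you can tag each data frame before stacking. A small sketch along the same base-R lines (dfs, combined, and source_url are just illustrative names):
# Read each url, stamp every data frame with its source, then stack
dfs <- lapply(urls, read.csv, stringsAsFactors = FALSE)
dfs <- Map(function(df, u) { df$source_url <- u; df }, dfs, urls)
combined <- do.call(rbind, dfs)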
I usually take this approach when I want to keep all the csv files as separate objects, in case I need to do further analysis on each of them later. Otherwise, you don't need a for-loop.
for (i in 1:length(urls)) assign(paste0("mycsv-", i), read.csv(url(urls[i]), header = TRUE))
df.list <- mget(ls(pattern = "^mycsv-"))
# use plyr if the files have different column names and you need to know which row comes from which csv file
library(plyr)
df <- ldply(df.list) # you can remove the first (.id) column if you wish
# Alternative in base R instead of plyr:
# if the files have the same column names and you only want to rbind, you can do this:
df <- do.call("rbind", df.list)

R: Assigning variable names based on imported filenames

I have a list of filenames that were found by searching the working directory. I want to either make one data frame with multiple elements that I can select from, or multiple data frames. To select parts of the single data frame, or to pick among multiple data frames, I want to name them using part of the associated filename.
Currently, I set the filenames using list.files and build the data using lapply with read.csv:
filenames = list.files(recursive = TRUE, pattern = "*dat.csv", full.names = FALSE)
data = lapply(filenames, function(i){
  read.csv(i, stringsAsFactors = FALSE)
})
Can someone explain to me the best way to go about this data import and name assignment?
A good way to store this would be as a single, combined data frame with a column describing the original file, let's say type:
data_frames = lapply(filenames, function(i){
  ret <- read.csv(i, stringsAsFactors = FALSE)
  ret$type <- gsub("dat.csv$", "", i)
  ret
})
data = do.call(rbind, data_frames)
Or shorter, with plyr:
library(plyr)
data = ldply(filenames, read.csv, stringsAsFactors = FALSE, .id = "type")
data$type <- gsub("dat.csv$", "", data$type)
That way you could extract whatever subset you wanted with:
# to get all lines from, say, the AAAdat.csv file
subset(data, type == "AAA")
You could store each dataset as an individual variable with a name like AAA, but you shouldn't, because it's a bad idea to use your variable names to store information.
(Note that this assumes your datasets share most, or at least some, columns. If they have entirely different structures, this is not an appropriate approach).
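If the files really do have entirely different structures, a named list keeps them together without scattering them across individual variables, in the same spirit as above. A short sketch, assuming the same filenames vector and that one of the files is AAAdat.csv:
# Keep structurally different files together in one named list
data_list <- lapply(filenames, read.csv, stringsAsFactors = FALSE)
names(data_list) <- gsub("dat.csv$", "", filenames)
data_list[["AAA"]]   # all rows from the AAAdat.csv file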

Storing multiple data frames into one data structure - R

Is it possible to store multiple data frames in one data structure and process them later, one data frame at a time? For example:
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
... then I would like to add them to a data structure so that I can loop through it, retrieving each data frame one at a time and processing it, something like:
for (each data frame in the data structure)  # this gives df1, then df2
{
  write the data frame to a file
}
I cannot find any such data structure in R. Can anyone point me to any code that illustrates the same functionality?
Just put the data.frames in a list. A plus is that a list works really well with apply-style loops. For example, if you want to save the data.frames, you can use mapply:
l = list(df1, df2)
mapply(write.table, x = l, file = c("df1.txt", "df2.txt"))
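A small follow-up on the same idea: if you name the list, you can derive the output file names from it instead of typing them out (a sketch):
# Derive the output file names from the list's own names
l <- list(df1 = df1, df2 = df2)
mapply(write.table, x = l, file = paste0(names(l), ".txt"))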
If you like apply-style loops (and you will, trust me :)) please take a look at the epic plyr package. It might not be the fastest package (look at data.table for speed), but it drips with syntactic sugar.
Lists can be used to hold almost anything, including data.frames:
## Versatility of lists
l <- list(file(), new.env(), data.frame(a=1:4))
For writing out multiple data objects stored in a list, lapply() is your friend:
ll <- list(df1=df1, df2=df2)
## Write out as *.csv files
lapply(names(ll), function(X) write.csv(ll[[X]], file=paste0(X, ".csv")))
## Save in *.Rdata files
lapply(names(ll), function(X) {
  assign(X, ll[[X]])
  save(list = X, file = paste0(X, ".Rdata"))
})
What you are looking for is a list.
You can use a function like lapply to treat each of your data frames in the same manner separately. However, there might be cases where you need to pass your list of data frames to a function that handles the data frames in relation to each other. In this case lapply doesn't help you.
That's why it is important to note how you can access and iterate the data frames in your list. It's done like this:
mylist[[data_frame_index]][row, column]
Note the double brackets around your data frame index.
So for your example it would be
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
mylist<-list(df1,df2)
mylist[[1]][1,2] would return 4, whereas mylist[1][1,2] would not work, because mylist[1] is still a list (of length one) rather than a data frame. It took a while for me to find this, so I thought it might be helpful to post here.
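A quick way to see the difference in an interactive session:
str(mylist[1])    # a list of length 1 that still wraps the data frame ('[' keeps the list)
str(mylist[[1]])  # the data frame itself ('[[' extracts the element)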
