I have a list that contains sub-tables. I want to use purrr to export each table individually, named after the corresponding list item. In the example below, I would get three files, one per species, each file name ending with today's date.
library('purrr')
library('tidyverse')
mytest <- iris
mylist <- split(mytest,f = mytest$Species)
names(mylist)
# basically pseudo code for explanation purposes
write_excel_csv(mylist[1], names(mylist[1]))
I'm still learning how to use purrr effectively, so any explanation of why you did it a particular way would be great.
I realise I could write a for loop to iterate through the list, but I want to use this as a learning exercise to get started with purrr.
Thank you for your time
Map from base R will work fine for something like this:
Map(write.csv, mylist, sprintf("%s-%s.csv", names(mylist), Sys.Date()))
list.files(pattern = "*.csv")
# [1] "setosa-2017-02-13.csv" "versicolor-2017-02-13.csv" "virginica-2017-02-13.csv"
Alternatively, walk2 (and probably several other functions in purrr) could be used in this manner.
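For example, a minimal purrr sketch with walk2 (this assumes readr is loaded, e.g. via the tidyverse call above, and mirrors the file-naming pattern from the Map example):
library(purrr)
# walk2 loops over the list of tables and the vector of file names in parallel,
# calling write_excel_csv purely for its side effect (writing the files)
walk2(mylist,
      sprintf("%s-%s.csv", names(mylist), Sys.Date()),
      write_excel_csv)
walk2 returns its first input invisibly, which is convenient when all you care about is the side effect of writing the files.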
The result of running getOptionChain for each symbol is a list that includes both Calls and Puts.
I would like to subset the data and create a dataset that will include only the Puts.
This is the code I'm running to get the option chain. Now I need to subset and create a new dataset only for Puts.
library(quantmod)
Symbols <- c("AA", "AAL", "AAOI", "ABBV", "ABC", "ABNB")
Options.20221111 <- lapply(Symbols, getOptionChain)
names(Options.20221111) <- Symbols
What is the best approach to get the Puts alone?
When working with lists, lapply is your friend.
only_puts_list <- lapply(Options.20221111, function(x) x$puts)
This will create a list with only the puts in there.
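If you would rather end up with a single dataset than a list, one possible follow-up (a sketch, assuming each puts element is a data frame with compatible columns) is:
library(dplyr)
# stack the per-symbol put tables, keeping the ticker in a "symbol" column
only_puts <- bind_rows(only_puts_list, .id = "symbol")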
I've just started learning R, so forgive me for my ignorance! I'm reading in lots of .csv files, each of which corresponds to a different year (2010-2019). I then filter each file down based on a value within one of the columns (because the datasets are very large). Currently I am using the code below and repeating it for each year:
data_2010 <- data.table::fread("//Project/2010 data/2010 data.csv", select = c("date", "id", "type"))
data_b_2010 <- data_2010[which(data_2010$type=="ABC123")]
rm(data_2010)
What I would like to do is use a for loop to create a new object data_20xx for each year, and then read in the .csv file (applying the "type" filter) for each year too.
I think I know how to create the objects in a for loop, but I'm not entirely sure how to also read in the .csv files and change the file path string so it updates with each year (i.e. from "//Project/2010 data/2010 data.csv" to "//Project/2011 data/2011 data.csv").
Any help would be greatly appreciated!
Next time, please provide a reproducible example so we can help you.
I would use data.table, which contains specialized functions for exactly this kind of task.
library(data.table)
setwd("Project")
allfiles <- list.files(recursive = T, full.names = T)
allcsv <- allfiles[grepl("\\.csv$", allfiles)]
data_list <- list()
for(i in seq_along(allcsv)) {
  # print progress as the fraction of files read so far
  print(round(i / length(allcsv), 2))
  data_list[[i]] <- fread(allcsv[i])
}
data_list_filtered <- lapply(data_list, function(x) {
  y <- data.frame(x)
  return(y[which(y$type == "ABC123"), ])
})
result <- rbindlist(data_list_filtered)
First, list.files lists all the files under your working directory (with recursive = TRUE it also descends into subfolders, and full.names = TRUE keeps the relative paths).
Second, read each csv file into the data_list list using the fast and efficient fread function.
Third, do the filtering within a loop, as requested.
Fourth, use rbindlist from data.table to rbind all of these data.tables into one.
Finally, if you are not familiar with the data.table syntax, you can run setDF(result) to convert your results back to a data.frame.
I strongly encourage you to learn the data.table syntax as it is quite powerful and efficient for tabular data manipulations. These vignettes will get you started.
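As a rough illustration of that syntax, here is a sketch of the same task done the data.table way (assuming every file has the date, id and type columns from the original post):
library(data.table)
setwd("Project")
allcsv <- list.files(pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
# read only the needed columns from each file, filter, then stack everything
result <- rbindlist(
  lapply(allcsv, function(f) fread(f, select = c("date", "id", "type"))[type == "ABC123"]),
  use.names = TRUE, fill = TRUE
)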
I am new to R and need to create one dataframe from 80 .xlsx files that mostly share the same columns and are all in the same folder. I want to bind all these files efficiently in a manner that would work if I added or removed files from the folder later. I want to do this without converting the files to .csv, unless someone can show me how to that efficiently for large numbers of files within R itself.
I've previously been reading files individually using the read_excel function from the readxl package, and then binding them with rbind. This was fine for 10 files, but not for 80! I've experimented with many solutions offered online, but none of them seem to work, largely because they use functions other than read_excel or formats other than .xlsx. I haven't kept track of most of my failed attempts, so the only code I can offer is one alternative method I tried to adapt to read_excel from a read_csv example.
#Method 1
library(readxl)
library(purrr)
library(dplyr)
library(tidyverse)
file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
map(read_excel) %>%
reduce(rbind)
#Output
New names:
* `` -> ...2
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Any code on how to do this would be greatly appreciated. Sorry if anything is wrong about this post, it is my first one.
UPDATE:
Using the changes suggested by the answers, I'm now using the code:
file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
map_dfr(read_excel) %>%
reduce(bind_rows)
This output now is as follows:
New names:
* `` -> ...2
Error: Column `10.Alert.alone` can't be converted from numeric to character
This happens regardless of which type of bind() function I use in the reduce() slot. If anyone can help with this, please let me know!
You're on the right track here. But you need to use map_dfr instead of plain-vanilla map. map_dfr outputs a data frame (or actually tibble) for each iteration, and combines them via bind_rows.
This should work:
library(readxl)
library(tidyverse)
file.list <- list.files(pattern='\\.xlsx$')
alldata <- file.list %>%
map_dfr(~read_excel(.x))
Note that this assumes your files all have consistent column names and data types. If they don't, you may have to do some cleaning. (One trick I've used in complex cases is to add a %>% mutate_all(as.character) to the read_excel command inside the map function. That will turn everything into characters, and then you can convert the data types from there.)
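For example, that trick might look roughly like this (a sketch; type_convert from readr is one option, not the only one, for re-guessing the column types afterwards):
alldata <- file.list %>%
  map_dfr(~ read_excel(.x) %>% mutate_all(as.character)) %>%
  type_convert()  # re-guess column types from the all-character data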
This should get you there, or at least close:
library(data.table)
library(readxl)
#create files list
file.list <- list.files( pattern = ".*\\.xlsx$", full.names = TRUE )
#read files to list of data.frames
l <- lapply( file.list, readxl::read_excel )
#bind l together to one larger data.table, by columnname, fill missing with NA
dt <- data.table::rbindlist( l, use.names = TRUE, fill = TRUE )
Try using map_dfr.
alldata <- file.list %>%
map_dfr(read_excel)
I'm trying to update an xml file with new nodes using xml2. It's easy if I just write everything manually as text,
library(xml2)
oldXML <- read_xml("<Root><Trial><Number>3.14159 </Number><Adjective>Fast </Adjective></Trial></Root>")
but I'm developing an application that will run calculations and then put those values into the xml, so I need a mix of character and variables. It ends up looking like:
var1 <- 4.567
var2 <- "Slow"
newLine <- read_xml(paste0("<Trial><Number>",var1," </Number><Adjective>",var2," </Adjective></Trial>"))
xml_add_child(oldXML,newLine)
I suspect there's a much less kludgy way to do this than using paste0, but I can't get anything else to work. I'd like to be able to just instruct it to update the xml by reference to the dataframe, such that it can create new trials:
<Trial>
<Number>df$number[1]</Number>
<Adjective>df$adjective[1]</Adjective>
</Trial>
<Trial>
<Number>df$number[2]</Number>
<Adjective>df$adjective[2]</Adjective>
</Trial>
Is there any way to create new Trial nodes in approximately that fashion, or at least more naturally than using paste0 to insert variables? Is this something the XML package does better than xml2?
If you have your new values in a data.frame like this:
vars <- data.frame(Number = c(4.567, 3.211),
Adjective = c("Slow", "Slow"),
stringsAsFactors = FALSE)
you can convert it to a list of xml_document's as follows:
vars_xml <- lapply(purrr::transpose(vars),
function(x) {
as_xml_document(list(Trial = lapply(x, as.list)))
})
Then you can add the new nodes to the original xml:
for(trial in vars_xml) xml_add_child(oldXML, trial)
I don't know that this is better than your paste approach. Either way, you can wrap it in a function so you only have to write the ugly code once.
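A sketch of such a wrapper (the name add_trials is made up for illustration; it just packages the two steps above):
library(xml2)
add_trials <- function(doc, df) {
  # build one <Trial> node per row of df, then attach each to doc
  trials <- lapply(purrr::transpose(df),
                   function(x) as_xml_document(list(Trial = lapply(x, as.list))))
  for (trial in trials) xml_add_child(doc, trial)
  invisible(doc)  # xml2 modifies doc in place; return it invisibly for convenience
}
add_trials(oldXML, vars)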
Here's a solution that builds on #Ista's excellent answer. Basically, I've dropped the first lapply in favor of purrr::map (we could probably replace the second lapply with a map, but I couldn't find a more readable way to accomplish that).
library(purrr)
vars_xml <- transpose(vars) %>%
map(~as_xml_document(list(Trial = lapply(.x, as.list))))
I've noticed I encounter this task quite often when programming in R, yet I don't think I ever implement it very elegantly.
I get a list of file names, each containing a table or a simple vector. I want to read all the files into some construct (list of tables?) so I can later manipulate them in simple loops.
I know how to read each file into a table/vector, but I do not know how to put all these objects together in one structure (list?).
Anyway, I guess this is VERY routine so I'll be happy to hear about your tricks.
Do all the files have the same # of columns? If so, I think this should work to put them all into one dataframe.
library(plyr)
x <- c(FILENAMES)
df <- ldply(x, read.table, sep = "\t", header = T)
If they don't have all the same columns, then use llply() instead
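For example (a sketch using the same hypothetical file vector):
# llply returns a plain list of data frames, one per file, instead of row-binding them
df.list <- llply(x, read.table, sep = "\t", header = T)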
Or, without plyr:
filenames <- c("file1.txt", "file2.txt", "file3.txt")
mydata <- vector("list", length(filenames))
for (i in seq_along(filenames))
{
mydata[[i]] <- read.table(filenames[i])
}
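If the end goal is one table rather than a list, a minimal follow-up (assuming the files share the same columns) would be:
# stack the list of tables read in the loop into a single data frame
combined <- do.call(rbind, mydata)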
You can have a look at my answer here: Merge several data.frames into one data.frame with a loop.