I think I am missing something simple, but I am having trouble accessing the elements of a list inside lapply().
Problem: I have a number of files on an FTP server that I want to download and read. So I need to specify the location, download them and read them. I thought all of this could be handled best with a few lists, but I can't get it to work in my function.
I would like to start by calling lapply(lst, ...), because I need both the variable name (a) and the URL in the same function to download and name the files easily.
Code-example:
a <- "ftp://user:pass#url_A1"
b <- "ftp://user:pass#url_B1"
c <- "ftp://user:pass#url_C1"
d <- "ftp://user:pass#url_D1"
lst <- list(a, b, c, d)
names(lst) <- c("a", "b", "c", "d")
Desired goal:
print(lst[[1]]), ...., print(lst[[4]])
What I've tried:
lapply(lst,
function(x) print(x[[]])
)
# Error!
My real code looks something more like:
lapply(lst,
function(x) download.file(url = x[[]], # Error!
destfile = paste0(lok, paste0(names(x), ".csv")),
quiet = FALSE)
)
EDIT:
I know the x[[]] throws an error, it is just to illustrate what I would like to get.
Untested:
lapply(names(lst),function(x){
download.file(url = lst[[x]],
destfile = paste0(lok,paste0(x,".csv")),
quiet = FALSE)
})
This should work given lok is defined.
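As a small offline illustration of the same idea (iterate over the names so that both the name and the value are available), the sketch below only builds the destination paths and does not download anything. The URLs are placeholders, and lok is assumed to be a directory prefix; the value "downloads/" is made up for the example.

```r
# Placeholder URLs; nothing is downloaded in this sketch
lst <- list(a = "ftp://example.invalid/A1",
            b = "ftp://example.invalid/B1")
lok <- "downloads/"  # hypothetical target directory prefix

# Iterate over the names, then look each value up with [[ ]]
jobs <- lapply(names(lst), function(nm) {
  list(url = lst[[nm]], destfile = paste0(lok, nm, ".csv"))
})

# Map() walks over values and names in parallel and is an alternative
dest <- unlist(Map(function(url, nm) paste0(lok, nm, ".csv"),
                   lst, names(lst)))
```

Each element of jobs holds the pair that download.file() would need as url and destfile.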
I have a list problem again.
I have seven different dataframes stored inside a list. And six of these lists are together stored in another list. (Sounds complicated, I know :D)
So, e.g.
data(mtcars)
df1 <- tail(mtcars)
df2 <- mtcars[1:5, 2:10]
df3 <- mtcars
df4 <- head(mtcars)
lower_list1 <- list(df1, df2, df3, df4)
and then I have 5 other lower_lists (lower_list2, lower_list3, lower_list4)
that are stored in the list: upper_list
upper_list <- list(lower_list1, lower_list2, lower_list3, lower_list4)
and now I want to export all of these dataframes in own directories, that I created before with the help of a RegEx:
files <- str_extract(names(upper_list), pattern = "^([a-z])(_)([a-z])([1-9])")
for(i in 1:length(files)) {
dir.create(paste0("./Exports/Taxa-Tables/", files[i]))
}
what I tried so far is:
for (f in upper_list) { # f are the lists inside the upper list
lapply(seq_along(f),
function(i) write.table(f[[i]],
paste0("./parent/", str_extract(names(upper_list)[i],
pattern = "^([a-z])(_)([a-z])([1-9])")),
row.names = FALSE, sep = "\t"))
}
I think the problem is somewhere here:
str_extract(names(upper_list)[i]
# I am not sure if it is names(upper_list)[[i]] or names(upper_list[[f]], both times I get the error. With the example outside the loop it works, there I wrote
`names(upper_list)[[1]]` #' to get the first list of the upper_list
Maybe one could include a command to get the index of the list?
The error I get is:
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
cannot open file './parent/lower_list1/': Permission denied
If I try the command outside the loop for one of the lower_lists, it works. Do you have any idea how to solve this problem? I hope it is understandable. If not, I could try to upload images that support my description?
Looking forward to your helpful ideas :)
Kathrin
You can try the code below using the tidyverse package. Each lower list's data will be saved into its own directory, which I named d1 to d4.
library(tidyverse)
df <- tibble(upper = upper_list, dir_name = paste0("d", 1:4),
file_name = list(paste0("file_", 1:4, ".csv"))) %>%
mutate(dir_vec = map(dir_name, ~rep(.x, 4)),
path = map2(dir_vec, file_name, ~file.path(.x, .y)))
# create the 4 directories
walk(df$dir_name, dir.create)
# save the data stored in list into their directory
map2(df$upper, df$path, ~walk2(.x, .y, write.csv))
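For comparison, here is a base R sketch of the same export. The error message in the question shows write.table() being handed a path that ends in a directory, so the point of the sketch is that every path ends in a file name. The list names (set_a, set_b, first, second) and the output folder under tempdir() are made up for the example.

```r
# Toy nested list: named inner lists of data frames (names are made up)
upper_list <- list(
  set_a = list(first = head(mtcars, 3), second = tail(mtcars, 3)),
  set_b = list(first = mtcars[1:5, 2:4])
)

out_root <- file.path(tempdir(), "exports")  # hypothetical output root

for (dir_name in names(upper_list)) {
  dir_path <- file.path(out_root, dir_name)
  dir.create(dir_path, recursive = TRUE, showWarnings = FALSE)
  inner <- upper_list[[dir_name]]
  for (df_name in names(inner)) {
    # The path ends in a file name, not in the directory itself
    write.table(inner[[df_name]],
                file = file.path(dir_path, paste0(df_name, ".tsv")),
                row.names = FALSE, sep = "\t")
  }
}

written <- list.files(out_root, recursive = TRUE)
```

Looping over names() rather than over the elements keeps the directory name and the data frame name available at the point where the path is built.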
Say you have a number of objects in your R environment, such as:
a <- 4
b <- 3
c <- 2
aa <- 2
bb <- 6
cc <- 9
Now say you would like to remove the objects in your environment that are named with the letters 'a' or 'b'. This can be achieved with
rm(list = ls(pattern = "a"))
rm(list = ls(pattern = "b"))
However, imagine trying to solve this problem on a much bigger scale where you would like to remove all objects whose values appear in a list such as:
custom <- list("a", "b")
How do I apply this list as a 'looped' argument to the ls() function?
I have experimented with:
rm(lapply(custom, function(x) ls(pattern = x)))
But this does not seem to do anything.
This feels like quite a common problem, so I fear there is an answer to this issue elsewhere on stackoverflow. Unfortunately I could not find it.
Option 1: This is a vectorized approach. Paste the custom list together using an "or" regex separator (|) and pass it to pattern.
rm(list = ls(pattern = paste(custom, collapse = "|")))
Option 2: If you still want to use lapply(), you have to pass the envir argument to both ls() and rm(), because the function that lapply() calls runs in its own local environment. You also need to put rm() inside the lapply() call.
lapply(custom, function(x) {
e <- .GlobalEnv
rm(list = ls(pattern = x, envir = e), envir = e)
})
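To try Option 1 without touching real objects, the sketch below runs it inside a scratch environment. The environment e is an assumption made for the demonstration; in the question the objects live in the global environment.

```r
# Work in a scratch environment so no real objects are removed
e <- new.env()
for (nm in c("a", "b", "c", "aa", "bb", "cc")) assign(nm, 1, envir = e)

custom <- list("a", "b")
pattern <- paste(custom, collapse = "|")  # one regex matching either entry

rm(list = ls(pattern = pattern, envir = e), envir = e)
remaining <- ls(envir = e)
```

After the call, only the objects whose names contain neither letter are left in e.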
This can be done with a loop easily enough:
test.a <- 4
test.b <- 3
test.c <- 2
test.aa <- 2
test.bb <- 6
test.cc <- 9
custom <- c("test.a", "test.b")
for (x in custom) rm(list = ls(pattern = x))
NB: I added test. to the start of the object names to avoid messing with people's environments if they run this code. We don't want to inadvertently delete people's actual objects named a, b, etc.
Perhaps grep will solve your problem:
rm(list=grep("^[ab]$", ls(), value=T))
will remove objects a and b, but nothing else.
rm(list=grep("^[ab]", ls(), value=T))
will remove objects a, aa, b, and bb, but leave c and cc in the environment.
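The difference between the two patterns can be checked on a plain character vector before anything is removed. The vector objs below stands in for the output of ls() and is only an illustration.

```r
objs <- c("a", "aa", "b", "bb", "c", "cc")

# Anchored at both ends: the whole name must be a or b
exact <- grep("^[ab]$", objs, value = TRUE)

# Anchored at the start only: the name must begin with a or b
prefix <- grep("^[ab]", objs, value = TRUE)

# An unescaped dot matches any character, so "test.a" also matches "testXa"
dot_is_wildcard <- grepl("test.a", "testXa")
```

Passing the result of such a grep() call to rm(list = ...) only after inspecting it is a cheap safeguard against removing more than intended.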
I want to apply the read.table() function to a list of .txt files. These files are in my current directory.
my.txt.list <-
list("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
Before applying read.table() to the elements of this list, I want to check whether the data table (dt) has already been computed and is in a cache directory. Data tables from the cache directory are already in my environment, named file_name.dt:
R> ls()
"subject_test.dt" "subject_train.dt"
In this example, I only want to compute "X_test.txt" and "X_train.txt". I wrote a small function to test whether the dt has already been cached and to apply read.table() if not.
my.rt <- function(x,...){
# apply read.table to txt files if data table is not already cached
# x is a character vector
y <- strsplit(x,'.txt')
y <- paste(y,'.dt',sep = '')
if (y %in% ls() == FALSE){
rt <- read.table(x, header = F, sep = "", dec = '.')
}
}
This function works if I take one element this way :
subject_test.dt <- my.rt('subject_test.txt')
Now I want to sapply to my files list this way:
my.res <- sapply(my.txt.list,my.rt)
I get my.res as a list of df, but the issue is that the function computes all files and does not take already computed files into account.
I must be missing something, but I can't see why.
TY for suggestions.
I think it has to do with the use of strsplit in your example. strsplit returns a list.
What about this?
my.txt.files <- c("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
> ls()
[1] "subject_test.dt" "subject_train.dt"
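my.rt <- function(x){
# swap the .txt extension for .dt to get the name of the cached object
y <- sub(".txt", ".dt", x, fixed = TRUE)
# ls() inside a function lists the function's own variables by default,
# so look in the global environment explicitly
if (!(y %in% ls(envir = .GlobalEnv))) {
read.table(x, header = FALSE, sep = "", dec = '.') }
}
my.res <- sapply(my.txt.files, FUN = my.rt)
Note that I'm replacing .txt with .dt and doing a "not in" test against the global environment, because ls() called inside a function otherwise only sees that function's local variables. You will get NULL entries in the result list if a file is not processed.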
This is untested, but I think it should work...
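A minimal sketch of the scoping behaviour behind this, plus an alternative check with exists(). The environment cache and the helper is_cached() are hypothetical names introduced for the illustration; cache stands in for the workspace that holds the cached tables.

```r
# ls() without arguments reports the variables of the calling function
show_local <- function(x) {
  y <- paste0(x, ".dt")
  ls()
}
local_names <- show_local("subject_test.txt")

# A scratch environment stands in for the workspace with cached tables
cache <- new.env()
assign("subject_test.dt", data.frame(id = 1:3), envir = cache)

# Hypothetical helper: is there an object named <file>.dt in env?
is_cached <- function(file, env) {
  dt_name <- sub("\\.txt$", ".dt", file)
  exists(dt_name, envir = env, inherits = FALSE)
}

files <- c("subject_test.txt", "X_test.txt")
status <- vapply(files, is_cached, logical(1), env = cache)
to_read <- files[!status]
```

Filtering the file names first and then reading only to_read avoids the NULL entries that appear when the check sits inside the reading function.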
I am running the following code in order to open up a set of CSV files that have temperature vs. time data
temp = list.files(pattern="*.csv")
for (i in 1:length(temp))
{
assign(temp[i], read.csv(temp[i], header=FALSE, skip =20))
colnames(as.data.frame(temp[i])) <- c("Date","Unit","Temp")
}
the data in the data frames looks like this:
V1 V2 V3
1 6/30/13 10:00:01 AM C 32.5
2 6/30/13 10:20:01 AM C 32.5
3 6/30/13 10:40:01 AM C 33.5
4 6/30/13 11:00:01 AM C 34.5
5 6/30/13 11:20:01 AM C 37.0
6 6/30/13 11:40:01 AM C 35.5
I am just trying to assign column names but am getting the following error message:
Error in `colnames<-`(`*tmp*`, value = c("Date", "Unit", "Temp")) :
'names' attribute [3] must be the same length as the vector [1]
I think it may have something to do with how my loop is reading the csv files. They are all stored in the same directory in R.
Thanks for your help!
I'd take a slightly different approach which might be more understandable:
temp = list.files(pattern="*.csv")
for (i in 1:length(temp))
{
tmp <- read.csv(temp[i], header=FALSE, skip =20)
colnames(tmp) <- c("Date","Unit","Temp")
# Now what do you want to do?
# For instance, use the file name as the name of a list element containing the data?
}
Update:
temp = list.files(pattern="*.csv")
stations <- vector("list", length(temp))
for (i in 1:length(temp)) {
tmp <- read.csv(temp[i], header=FALSE, skip =20)
colnames(tmp) <- c("Date","Unit","Temp")
stations[[i]] <- tmp
}
names(stations) <- temp # optional; could process file names too like using basename
station1 <- stations[[1]] # etc station1 would be a data.frame
This 2nd part could be improved as well, depending upon how you plan to use the data, and how much of it there is. A good command to know is str(some object). It will really help you understand R's data structures.
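The same list can be built with lapply(). The sketch below is self-contained: it writes two tiny headerless CSV files into a temporary folder first, so the file names and contents are made up, and skip = 20 is left out because the demo files have no preamble. Note that list.files() takes a regular expression, so "\\.csv$" is used rather than a glob.

```r
# Create two tiny headerless CSV files in a temporary directory
csv_dir <- file.path(tempdir(), "stations_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("6/30/13 10:00:01 AM,C,32.5", "6/30/13 10:20:01 AM,C,33.5"),
           file.path(csv_dir, "site1.csv"))
writeLines("6/30/13 10:00:01 AM,C,30.0",
           file.path(csv_dir, "site2.csv"))

paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, collected in a list
stations <- lapply(paths, function(p) {
  d <- read.csv(p, header = FALSE)
  colnames(d) <- c("Date", "Unit", "Temp")
  d
})
names(stations) <- basename(paths)

# Compact overview of the resulting structure
str(stations, max.level = 1)
```

Individual data frames are then available as stations[["site1.csv"]] without creating one workspace object per file.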
Update #2:
Getting individual data frames into your workspace will be quite hard - someone more clever than I may know some tricks. Since you want to plot these, I'd first make names more like you want with:
names(stations) <- paste(basename(temp), 1:length(stations), sep = "_")
Then I would iterate over the list created above as follows, creating your plots as you go:
for (i in 1:length(stations)) {
tmp <- stations[[i]]
# tmp is a data frame with columns Date, Unit, Temp
# plot your data using the plot commands you like to use, for example
p <- qplot(x = Date, y = Temp, data = tmp, geom = "smooth", main = names(stations)[i])
print(p)
# this is approx code, you'll have to play with it, and watch out for Dates
# I recommend the package lubridate if you have any troubles parsing the dates
# qplot is in package ggplot2
}
And if you want to save them in a file, use this:
pdf("filename.pdf")
# then the plotting loop just above
dev.off()
A multipage pdf will be created. Good Luck!
It is usually not recommended practice to use the 'assign' statement in R. (I should really find some resources on why this is so.)
You can do what you are trying using a function like this:
read.a.file <- function (f, cnames, ...) {
my.df <- read.csv(f, ...)
colnames(my.df) <- cnames
## Here you can add more preprocessing of your files.
my.df # return the data frame; otherwise the function returns cnames
}
And loop over the list of files using this:
lapply(X=temp, FUN=read.a.file, cnames=c("Date", "Unit", "Temp"), skip=20, header=FALSE)
"read.csv" returns a data.frame so you don't need "as.data.frame" call;
You can use "col.names" argument to "read.csv" to assign column names;
I don't know what version of R you are using, but "colnames(as.data.frame(...)) <-" is just an incorrect call since it calls for "as.data.frame<-" function that does not exist, at least in version 2.14.
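These points can be checked in a few lines. In the sketch below, "station1.csv" is a made-up file name and the text argument stands in for a real file.

```r
# temp[i] in the question is a file name, i.e. a length-one character vector;
# converting it to a data frame gives one row and one column
one_name <- as.data.frame("station1.csv")
dims <- dim(one_name)

# col.names sets the column names while reading
d <- read.csv(text = "6/30/13 10:00:01 AM,C,32.5\n6/30/13 10:20:01 AM,C,32.5",
              header = FALSE,
              col.names = c("Date", "Unit", "Temp"))
```

A one-column object cannot take three column names, which matches the length mismatch reported in the error message.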
A short-term fix to your woes is the following, but you really need to read up more on using R as from what you did above I expect you'll get into another mess very quickly. Maybe start by never using assign.
lapply(list.files(pattern = "*.csv"), function (f) {
df = read.csv(f, header = F, skip = 20)
names(df) = c('Date', 'Unit', 'Temp')
df
}) -> your_list_of_data.frames
Although more likely you want this (edited to preserve file name info):
df = do.call(rbind,
lapply(list.files(pattern = "*.csv"), function(f)
cbind(f, read.csv(f, header = F, skip = 20))))
names(df) = c('Filename', 'Date', 'Unit', 'Temp')
At a glance, it appears that you are missing a set of subsetting brackets, [], around the elements of your temp list. Your names vector has three elements, but because you have temp[i] instead of temp[[i]], the for loop isn't actually accessing the elements of the list and so treats the object as having length one, as the error says.
The question says it all - I want to take a list object full of data.frames and write each data.frame to a separate .csv file where the name of the .csv file corresponds to the name of the list object.
Here's a reproducible example and the code I've written thus far.
df <- data.frame(
var1 = sample(1:10, 6, replace = TRUE)
, var2 = sample(LETTERS[1:2], 6, replace = TRUE)
, theday = c(1,1,2,2,3,3)
)
df.daily <- split(df, df$theday) #Split into separate days
lapply(df.daily, function(x){write.table(x, file = paste(names(x), ".csv", sep = ""), row.names = FALSE, sep = ",")})
And here is the top of the error message that R spits out
Error: Results must have one or more dimensions.
In addition: Warning messages:
1: In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
What am I missing here?
Try this:
sapply(names(df.daily),
function (x) write.table(df.daily[[x]], file=paste(x, "txt", sep=".") ) )
You should see the names ("1", "2", "3") spit out one by one, but the NULLs are the evidence that the side-effect of writing to disk files was done. (Edit: changed [] to [[]].)
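To see why names(x) inside the original lapply() call cannot work, compare the names of the list with the names of one of its elements. The sketch below rebuilds a smaller version of the example and writes to a temporary folder; out_dir is a made-up location.

```r
set.seed(1)  # only to make the toy data reproducible
df <- data.frame(var1 = sample(1:10, 6, replace = TRUE),
                 theday = c(1, 1, 2, 2, 3, 3))
df.daily <- split(df, df$theday)

list_names <- names(df.daily)        # names of the list elements
col_names  <- names(df.daily[[1]])   # column names of one data frame

out_dir <- file.path(tempdir(), "daily_demo")  # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# Loop over the list names so each file is named after its element
for (nm in list_names) {
  write.table(df.daily[[nm]],
              file = file.path(out_dir, paste0(nm, ".csv")),
              row.names = FALSE, sep = ",")
}
```

Inside lapply(df.daily, function(x) ...), x is a single data frame, so names(x) gives the column names and paste() produces one file name per column.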
You could use mapply:
mapply(
write.table,
x=df.daily, file=paste(names(df.daily), "txt", sep="."),
MoreArgs=list(row.names=FALSE, sep=",")
)
There is a thread about a similar problem on the plyr mailing list.
A couple of things:
laply performs operations on a list. What you're looking for is d_ply. And you don't have to break it up by day, you can let plyr do that for you. Also, I would not use names(x) as that returns all of the column names of a data.frame.
# each piece holds a single day, so take one value of theday for the file name
d_ply(df, .(theday), function(x) write.csv(x, file=paste(x$theday[1],".csv",sep=""),row.names=F))