I am trying to save streamflow data from USGS using the dataRetrieval package in R. It was working until now, but I am not sure what I changed that made it stop working. This is my code:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502","09511300","09498400","09498500","09489700")
i <- 1
n <- length(siteNumber)
for (i in n) {
Daily_Streamflow <- readNWISdv(siteNumber[i],parameterCd="00060", statCd="00003", "","")
name <- paste("DSF", siteNumber[i], sep = "_")
assign(name, value = Daily_Streamflow)
i <- i + 1
}
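Now it only saves the data for the last station as a data frame. Does someone know what I am doing wrong?
Read help("for"). A for() loop iterates over a sequence, and for (i in n) iterates over the single value n, which is why only the last station is read. Iterate over 1:n instead. You also do not need to increment the index explicitly (that is how, e.g., a while() loop works).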
for (i in 1:n) {
Daily_Streamflow <- readNWISdv(siteNumber[i],parameterCd="00060", statCd="00003", "","")
name <- paste("DSF", siteNumber[i], sep = "_")
assign(name, value = Daily_Streamflow)
}
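To see why the original loop only handled the last station, it may help to compare what `for (i in n)` and `for (i in seq_along(x))` actually iterate over. The sketch below uses a hypothetical stand-in, `fake_fetch()`, in place of `readNWISdv()` so that it runs without network access. Collecting the results in a named list is offered as one alternative to `assign()`, not as part of the answer above.

```r
siteNumber <- c("09498501", "09489500", "09489499")
n <- length(siteNumber)

# for (i in n) iterates over the single value n, so the body runs once
visited_wrong <- c()
for (i in n) visited_wrong <- c(visited_wrong, i)

# seq_along() yields 1, 2, ..., n (and nothing for an empty vector)
visited_right <- c()
for (i in seq_along(siteNumber)) visited_right <- c(visited_right, i)

# Hypothetical stand-in for readNWISdv(); returns a tiny data frame
fake_fetch <- function(site) data.frame(site_no = site, flow = NA_real_)

# One list entry per station, named like the DSF_* variables
streamflow <- lapply(siteNumber, fake_fetch)
names(streamflow) <- paste("DSF", siteNumber, sep = "_")
```

A single station is then available as `streamflow[["DSF_09489500"]]`, and the whole set can be processed with `lapply()`.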
Related
I am trying to write efficient code that opens data files each containing a list, extracts one element from the list, stores it in a data frame, and then deletes the object before opening the next file.
My idea is to do this with a loop. Unfortunately, I am quite new to loops and don't know how to write the code.
I have managed to open the data-sets using the following code:
for(i in 1995:2015){
objects = paste("C:/Users/...",i,"agg.rda", sep=" ")
load(objects)
}
The problem is that each data-set is extremely large and R cannot open all of them at once. Therefore, I am now trying to extract an element within each list called: tab_<<i value >>_agg[["A"]] (for example tab_1995_agg[["A"]]), then delete the object and iterate over each i (which are different years).
I have tried the following code, but it does not work:
for(i in unique(1995:2015)){
objects = paste("C:/Users/...",i,"agg.rda", sep=" ")
load(objects)
tmp = cat("tab",i,"_agg[[\"A\"]]" , sep = "")
y <- rbind(y, tmp)
rm(list=objects)
}
I apologize for any silly mistake (or question) and greatly appreciate any help.
Here’s a possible solution using a function to rename the object you’re loading in. I got loadRData from here. The loadRData function makes this a bit more approachable because you can load in the object with a different name.
Create some data for a reproducible example.
tab2000_agg <-
list(
A = 1:5,
b = 6:10
)
tab2001_agg <-
list(
A = 1:5,
d = 6:10
)
save(tab2000_agg, file = "2000_agg.rda")
save(tab2001_agg, file = "2001_agg.rda")
rm(tab2000_agg, tab2001_agg)
Using your loop idea.
loadRData <- function(fileName){
load(fileName)
get(ls()[ls() != "fileName"])
}
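y <- list()
for(i in 2000:2001){
  objects <- paste0(i, "_agg.rda")
  data_list <- loadRData(objects)
  tmp <- data_list[["A"]]
  # index by name: y[[2000]] would pad the list with 1999 NULL entries
  y[[as.character(i)]] <- tmp
  rm(data_list)
}
# one row per year; row names are the years
y <- do.call(rbind, y)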
You could also turn it into a function rather than use a loop.
# named get_element_A so it does not mask R's built-in getElement()
get_element_A <- function(year){
  objects <- paste0(year, "_agg.rda")
  data_list <- loadRData(objects)
  data_list[["A"]]
}
y <- lapply(2000:2001, get_element_A)
y <- do.call(rbind, y)
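Another way to avoid cluttering the workspace is to load each file into a throwaway environment, using the fact that `load()` returns the names of the objects it restored. This is a minimal sketch rather than a replacement for `loadRData`: the helper name `load_element` is made up here, and it assumes each .rda file holds exactly one object.

```r
# Save a small example list, as in the reproducible example above
tab2000_agg <- list(A = 1:5, b = 6:10)
rda_file <- file.path(tempdir(), "2000_agg.rda")
save(tab2000_agg, file = rda_file)
rm(tab2000_agg)

# Hypothetical helper: load into a temporary environment, return one element
load_element <- function(file, element = "A") {
  env <- new.env()
  obj_names <- load(file, envir = env)  # load() returns the restored names
  stopifnot(length(obj_names) == 1)     # assumes one object per file
  get(obj_names, envir = env)[[element]]
}

a_2000 <- load_element(rda_file)
```

Because the large object only ever lives inside `env`, it becomes eligible for garbage collection as soon as the function returns, so no explicit `rm()` is needed.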
Created on 2022-01-14 by the reprex package (v2.0.1)
I'm trying to find the number of complete rows (no NA values) in a bunch of CSV files. Every time I add a new line to my data frame using my for loop though, it outputs how I created that new line above the row. How do I stop this from happening/delete the repetitive label?
I have tried using removeWords and stop words.
complete <- function(directory, site.id = 1:332) {
for (i in site.id) {
path <- paste(getwd(), "/", directory, "/", sprintf("%03d", i), ".csv", sep = "")
dat <- read.csv(path)
DF <- data.frame(sum(!complete.cases(dat)), row.names = i)
print(DF)
}
}
I want the results to look like this:
1 1344
2 2611
3 1948
But they inevitably end up looking like this:
sum..complete.cases.dat..
1 1344
sum..complete.cases.dat..
2 2611
sum..complete.cases.dat..
3 1948
You need to initialize the data frame outside the loop. By creating it inside the loop, you get a new data frame on every iteration that exists only for that iteration, so the values are never stored together.
df <- data.frame(id = c())
Then when you add each element, direct it to the index of first column and ith row. The row names will automatically count up.
df[i,1] <- sum(!complete.cases(dat))
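So it would look like:
complete <- function(directory, site.id = 1:332) {
  df <- data.frame(id = c())  # initialize once, before the loop
  for (i in site.id) {
    path <- paste(getwd(), "/", directory, "/", sprintf("%03d", i), ".csv", sep = "")
    dat <- read.csv(path)
    # !complete.cases() counts rows with at least one NA; drop the ! to count complete rows
    df[i,1] <- sum(!complete.cases(dat))
  }
  print(df)  # print once, after all rows are filled in
}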
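As an alternative to growing a data frame inside the loop, the counts can be computed first and the data frame built once at the end. This is only a sketch, assuming the same zero-padded file naming as the question; the helper name `count_incomplete` and the temporary example files are invented for illustration. It keeps `!complete.cases()` from the question, which counts rows containing at least one NA.

```r
# Write two small example CSV files into a temporary directory
dir_path <- file.path(tempdir(), "specdata_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(x = c(1, NA, 3), y = c(1, 2, NA)),
          file.path(dir_path, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(NA, 2, 3)),
          file.path(dir_path, "002.csv"), row.names = FALSE)

# Hypothetical helper: one count per file, then a single data frame
count_incomplete <- function(directory, site.id) {
  counts <- vapply(site.id, function(i) {
    dat <- read.csv(file.path(directory, sprintf("%03d.csv", i)))
    sum(!complete.cases(dat))  # rows with at least one NA
  }, integer(1))
  data.frame(id = site.id, n = counts)
}

result <- count_incomplete(dir_path, 1:2)
```

Printing `result` once shows a single header row followed by one line per file, which avoids the repeated label from the question.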
I am trying to let the user define how many drugs' data they want to upload for a specific therapy. Based on that number, my function should let the user select a data file for each drug and store the data in variables such as drug_1_data, drug_2_data, etc.
I have written some code, but it doesn't work.
Could someone please help?
no_drugs <- readline("how many drugs for this therapy? Ans:")
i=0
while(i < no_drugs) {
i <- i+1
caption_to_add <- paste("drug",i, sep = "_")
mydata <- choose.files( caption = caption_to_add) # caption describes data for which drug
file_name <- noquote(paste("drug", i, "data", sep = "_")) # to create variable that will save uploaded .csv file
file_name <- read.csv(mydata[i],header=TRUE, sep = "\t")
}
In your example, mydata is a one-element character vector, so subsetting it with i greater than 1 returns NA. Furthermore, your first assignment sets file_name to a non-quoted character vector, but the next line overwrites it with the data (and every iteration of the loop discards the data read in the previous one). I think what you wanted was something more along the lines of:
file_name <- paste("drug", i, "data", sep = "_")
assign(file_name, read.delim(mydata, header=TRUE))
# I changed the function to read.delim since the separator is a tab
However, I would also recommend putting all the data in a list (that makes it easier to apply operations to several drug data frames at once), using something like this:
n_drugs <- as.numeric(readline("how many drugs for this therapy? Ans:"))
drugs <- vector("list", n_drugs)
for(i in seq_len(n_drugs)) {
  caption_to_add <- paste("drug",i, sep = "_")
  mydata <- choose.files( caption = caption_to_add)
  # [[i]] stores the whole data frame; drugs[i] <- ... would keep only its first column
  drugs[[i]] <- read.delim(mydata,header=TRUE)
}
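Once the data live in a list, the same operation can be applied to every drug at once. The following is a non-interactive sketch: the two temporary files stand in for the paths that `choose.files()` would return, and the column names `dose` and `response` are invented for illustration.

```r
# Create two small tab-separated files as stand-ins for user-selected files
paths <- file.path(tempdir(), c("drug_1.tsv", "drug_2.tsv"))
write.table(data.frame(dose = c(1, 2), response = c(10, 20)),
            paths[1], sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(dose = c(1, 2, 3), response = c(5, 6, 7)),
            paths[2], sep = "\t", row.names = FALSE, quote = FALSE)

# Read each file into one list element; [[ ]] stores the whole data frame
drugs <- vector("list", length(paths))
for (i in seq_along(paths)) {
  drugs[[i]] <- read.delim(paths[i], header = TRUE)
}
names(drugs) <- paste("drug", seq_along(paths), "data", sep = "_")

# Apply one operation to every data frame in the list
n_rows <- sapply(drugs, nrow)
```

Naming the list elements keeps the `drug_1_data` style labels from the question while still allowing `lapply()` or `sapply()` over all drugs.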
I'm trying to save a data frame after every iteration of this loop, appending the loop number to the data frame's name, so that I'm left with 5 data frames, all with different names.
In my actual code, all the data frames will be different but for simplicity I've just shown one data frame here.
I've supplied some test code below.
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
name <- paste("name", i, sep = "_")
name <- x
}
}
The example data frames created would be named:
testFunction()
name_1
name_2
name_3
name_4
name_5
However, I'm only getting the final data frame "name_5" to save when the loop completes. My issue is I don't know how to save the ith version of the data frame without escaping from the loop.
Any suggestions on how I can solve this?
***** EDIT *****
I have my for loop inside a function, which might be why assign() is not working. I've appended my example above to show this.
Inside your loop, use assign():
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x)
}
Edit:
As you now want to do this in a function, you would have to specify the environment to assign to. I suspect you want the global environment:
testFunction <- function() {
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
assign( paste("name", i, sep = "_") , x , envir = globalenv() )
}
}
Please be warned that it is not good practice to write a function that modifies the global environment as a side effect. You'd be better off just returning a named list of your data frames, e.g. like so:
testFunction_2 <- function() {
out_list <- vector(mode = "list", length = 5)
for (i in 1:5) {
x <- data.frame(c(1:10), c(1,2,3,4,5,6,7,8,9,10), c(10:19))
out_list[[i]] <- x
names(out_list)[i] <- paste("name", i, sep = "_")
}
return(out_list)
}
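For completeness, here is a sketch of how such a named list might be built more compactly and then used. The function name `make_frames` and the column names are illustrative only; `list2env()` is shown for the case where separate variables are really required.

```r
# Same idea as testFunction_2(), written with lapply() and setNames()
make_frames <- function(n = 5) {
  frames <- lapply(seq_len(n), function(i) {
    data.frame(a = 1:10, b = 1:10, c = 10:19)
  })
  setNames(frames, paste("name", seq_len(n), sep = "_"))
}

frames <- make_frames()

# Access a single data frame by name (frames$name_3 is equivalent)
third <- frames[["name_3"]]

# If separate variables are really needed, copy the list into an environment
env <- new.env()
list2env(frames, envir = env)
```

Passing `envir = globalenv()` to `list2env()` would create `name_1` to `name_5` in the workspace, but that call then sits with the caller rather than being hidden inside the function.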
I have several dataframes, and what I want to do is:
apply a function repeatedly to each dataframe
save the result of each repetition in a new dataframe, keeping the name of the original dataframe and adding something to differentiate it
Here is what I have tried so far:
# read all files to list
dataframes <- dir( pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv, header = T, sep =" ", dec=".", na.string = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10){
modified_list <- llply(list_dataframes, myfunction)
}
if (j <10){
num.char <- paste("n0", j, sep="")
} else num.char <- paste("n", j, sep="")
# save back data frames
for (i in 1:n)
write.table(file = paste( "newfile/_modified",num.char, ".csv", sep = ""),
modified_list[i], row.names = F)
What I want as a result is the modified dataframes (in this case, the 10 repetitions for each df in the list), whose names will contain:
the name of the original df
the new name
and the number of the iteration
Something like originaldfname_newname_n0
I cannot find where I'm messing up. Any help will be deeply appreciated.
Two major issues, I think:
the } (line 9 above) should be after your second for loop;
your last line should probably reference modified_list[[i]] instead of using the single-[ notation.
So your code should work (untested, slightly modified for style) as:
library(plyr)
# read all files to list
dataframes <- dir(pattern = ".txt")
list_dataframes <- llply(dataframes, read.csv,
header = T, sep = " ", dec=".", na.string = "nd")
n <- length(dataframes)
# apply myfunction 10 times
for (j in 1:10) {
modified_list <- llply(list_dataframes, myfunction)
# save back data frames
for (i in 1:n)
write.table(file = sprintf("newfile/%s_newname_%02d.csv", dataframes[i], j),
modified_list[[i]], row.names = FALSE)
}
If this were code golf, the last portion could be reduced a little with:
for (j in 1:10) {
mapply(function(df, nm) write.csv(file = sprintf('newfile/%s_newname_%02d.csv', nm, j),
df, row.names = FALSE),
llply(list_dataframes, myfunction), dataframes)
}
(This doesn't necessarily make the code any clearer, but it does shorten it a bit. Use it if at some point you prefer not to use for loops; the performance in this case will be almost identical.)
Note:
Please include required libraries, e.g., library(plyr).
Though lapply would have worked just fine, I kept the use of llply to match your example.
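One detail about the output names: `dataframes[i]` still carries the `.txt` extension, so the `sprintf()` calls above would produce names such as `file.txt_newname_01.csv`. The sketch below shows one way the names could be built without the extension and with the `n0` style counter from the question. The input file names are made up, and the `n` prefix in the format string is an assumption about the desired pattern.

```r
dataframes <- c("siteA.txt", "siteB.txt")   # example input names

# Strip the extension so output names do not contain ".txt"
base_names <- tools::file_path_sans_ext(dataframes)

# One output name per (file, repetition) pair; %02d zero-pads the counter
out_names <- sprintf("newfile/%s_newname_n%02d.csv",
                     rep(base_names, times = 2), rep(1:2, each = 2))
```

Inside the nested loops, the equivalent single call would be `sprintf("newfile/%s_newname_n%02d.csv", base_names[i], j)`.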