for loop with mutate only outputs last result - r

I'm sorry if it's a duplicate, and for the lack of reproducibility, I'd have to link you the files.
What I'm trying to do is this:
I have a data frame with coordinates and names, let's say
df <- tribble(
~Species, ~lat, ~lon,
"a",42.92991, 11.875801,
"b",42.92991, 11.875801,
"c",43.91278, 3.513611,
"d",43.60851, 3.871755,
"e",39.24373, 9.120478
)
I also have a folder with .tif rasters, such as
files <- list.files(path="~/world/", pattern="*.tif$", full.name=TRUE, all.files=TRUE)
Now for each iteration I'd like to:
create a new column on the data frame with the file name
insert in that column the extracted value for the corresponding lat and lon
I've tried using this for loop, and while it looks fine on paper, I don't understand why it only outputs the last result to funvar. It's like it overwrites the result instead of appending it.
If I use a similar loop with mutate and simpler objects, it appends them, so I'm not sure what the problem could be
for(i in files){
fraster<- raster(i)
fname<-gsub(".*//|[.].*", "", i)
funvar<-dplyr::mutate(fundata, !!fname:= raster::extract(fraster, coordinates(data.frame(lat,lon))))
}
Thanks!

The way I solved it is a bit of a hack, but it works. I explicitly assign the new column to the data frame, like this.
I'm still not sure why mutate doesn't do that by itself
for(i in files){
fraster<- raster(i)
fname<-gsub(".*//|[.].*", "", i)
funvar<-dplyr::mutate(fundata, !!fname:= raster::extract(fraster, coordinates(data.frame(lat,lon))))
fundata[fname] <- funvar[[fname]]
}

From the info you provide I cannot tell if this will work, but normally you would make a RasterStack and avoid the loop.
library(raster)
# NOTE the order of lon, lat
xy <- cbind(lon, lat)
s <- stack(files)
e <- raster::extract(s, xy)
If that is not possible, you can do something like this
fundata <- data.frame(xy)
for (f in files){
fraster<- raster(f)
fname <- gsub("\\.tif$", "", basename(f))
fundata[[fname]] <- raster::extract(fraster, xy)
}
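As for why the original loop only kept the last column: every iteration calls mutate on the unchanged fundata and assigns the result to funvar, so each pass starts again from the original data frame and replaces the previous funvar. Below is a minimal base R sketch of the same pattern. The column names and values are made up and stand in for the raster extraction.

```r
# Hypothetical stand-in for `fundata`; no rasters or dplyr needed
base <- data.frame(id = 1:3)
cols <- c("a", "b", "c")

# Pattern from the question: always start from `base`, store the result in `out`
for (nm in cols) {
  out <- base                        # previous columns are lost here
  out[[nm]] <- seq_len(nrow(base))
}
names(out)   # "id" "c"

# Accumulating version: feed the result back into the object being modified
acc <- base
for (nm in cols) {
  acc[[nm]] <- seq_len(nrow(acc))
}
names(acc)   # "id" "a" "b" "c"
```

With mutate the equivalent change would be to assign the result back to the same object that is passed in.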

Related

I want to create a data.frame with the values that I print of this loop in R

When I run this loop I can print the results, and I want to create a data frame with this data, but I can't. So far I have this:
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
for (i in 1:numfiles) {
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
print(d40)
print(filenames[i])
}
This is not the most efficient way to do this, but it takes advantage of what code you've already written. First, you'll create an empty data frame with the columns you want, but filled with NA. Then, in each iteration of the loop, you'll fill one row of the data frame.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Create an empty data.frame
df <- data.frame(filename = rep(NA, numfiles), d40 = rep(NA, numfiles))
for (i in 1:numfiles){
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
# Fill row i of the data frame
df[i,"filename"] = filenames[i]
df[i,"d40"] = d40
}
Hope that does it! Good luck :)
There are a lot of ways to do what you are asking. Also, without a reproducible example it is difficult to validate that the code will run. I couldn't tell what type of data was in each of your variables, so I just guessed that they were mostly characters with one numeric. You'll need to change the code if that's not true.
The following method is using base R (no other packages). It builds off of what you have done. There are other ways to do this using map, do.call, or apply. But it's important to be able to run through a loop.
As someone commented, your code is just re-writing itself every loop. Luckily you have the variable i that you can use to specify where things go.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Declare an empty dataframe for efficiency purposes
df <- data.frame(
filename = rep(NA_character_, numfiles),
d40 = rep(NA_real_, numfiles),
stringsAsFactors = FALSE
)
# Loop through the files and fill in the data
for (i in seq_len(numfiles)){
file <- read.table(filenames[i], header = TRUE)
# Keep the intermediate subsets as ordinary variables; only single values go into df
ts <- subset(file, file$name == "plantNutrientUptake")
tss <- subset(ts, ts$path == "//plants/nitrate")
tssc <- tss[, 2:3]
# Fill row i of each column
df$filename[i] <- filenames[i]
df$d40[i] <- tssc[41, 2]
}
You'll notice a few things about this code that are extra.
First, I'm declaring the variable type for each column explicitly. You can use rep(NA,numfiles), but that creates a logical column and leaves R to coerce it to whatever you assign later. This may not be a problem for you if all of your variables are obviously of the same type. But imagine a column you expect to be numeric that receives values like c("1","A","B"). R will not stop you: the first character value silently converts the whole column to character, and you only find out later when a calculation fails. Declaring the type up front documents what each column should hold and makes such a change easier to spot.
Next, I'm declaring the entire dataframe before entering the loop. When people tell you that loops in [modern] R are slow it is often because you are re-allocating memory every loop. By declaring the entire dataframe up front you speed up the loop significantly. This also allows you to reference any cell in the dataframe...which is exactly what you want to do in the loop.
Finally, I'm using the $ syntax to make things clear. Writing df[i,"d40"] <- d40 is the same as writing df$d40[i] <- d40. I just think the second form is clearer. This is a matter of personal preference.
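For comparison, here is a rough sketch of the lapply plus do.call route mentioned above. Everything in it is illustrative: the two demo files, their columns (name, time, value, path) and the helper read_one are made up so that the example runs on its own, and it picks row 3 rather than row 41 because the demo files are tiny.

```r
# Hypothetical demo data: two small files written to a temporary folder
tmp <- file.path(tempdir(), "d40_demo")
dir.create(tmp, showWarnings = FALSE)
for (k in 1:2) {
  demo <- data.frame(name = "plantNutrientUptake",
                     time = 0:2,
                     value = c(0, k, 2 * k),
                     path = "//plants/nitrate")
  write.table(demo, file.path(tmp, paste0("run", k, ".txt")), row.names = FALSE)
}

# Hypothetical helper: returns a one-row data frame for a single file
read_one <- function(f, row = 3) {
  dat <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  sel <- dat[dat$name == "plantNutrientUptake" & dat$path == "//plants/nitrate", ]
  data.frame(filename = basename(f), value = sel[row, "value"])
}

demo_files <- list.files(tmp, pattern = "^run[0-9]+\\.txt$", full.names = TRUE)

# One row per file, stacked into a single data frame
result <- do.call(rbind, lapply(demo_files, read_one))
result
```

Each call returns its own row, so nothing is overwritten between files and no empty data frame has to be declared first.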

Reading nodes from multiple html and storing result as a vector

I have a list of locally saved html files. I want to extract multiple nodes from each html file and save the results in a vector. Afterwards, I would like to combine them in a data frame. I have a piece of code for one node, which works (see below), but it seems quite long and inefficient if I apply it to ~20 variables. Also, there is something really strange about saving to the vector (XXX_name): it starts with the last observation and then continues with the first, second, and so on. Do you have any suggestions for simplifying the code or making it more efficient?
# Extracts name variable and stores in a vector
XXX_name <- c()
for (i in 1:216) {
XXX_name <- c(XXX_name, name)
mydata <- read_html(files[i], encoding = "latin-1")
reads_name <- html_nodes(mydata, 'h1')
name <- html_text(reads_name)
#print(i)
#print(name)
}
Many thanks!
You can put the workings inside a function, then apply that function to each of your variables with map.
First, create the function:
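read_names <- function(var, node) {
# Assumes rvest is loaded and `files` holds the paths to the saved html files
mydata <- read_html(files[var], encoding = "latin-1")
reads_name <- html_nodes(mydata, node)
# Return the text explicitly instead of ending on an assignment
html_text(reads_name)
}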
Then we create a df with all possible combinations of inputs and apply the function to that
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
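On the odd ordering in the question's loop: XXX_name is extended with name before name is assigned for the current file, so each pass appends the value from the previous pass, and the first pass appends whatever name still held from an earlier run (most likely the last file). A small base R sketch of that effect, using made-up strings instead of html files:

```r
items <- c("a", "b", "c")   # hypothetical stand-ins for the html files
name <- "stale"             # value left over from an earlier run

# Order used in the question: append first, compute afterwards
out <- c()
for (i in seq_along(items)) {
  out <- c(out, name)          # still holds the previous iteration's value
  name <- toupper(items[i])    # stands in for read_html + html_text
}
out    # "stale" "A" "B"

# Compute first, then store; vapply also avoids growing the vector by hand
fixed <- vapply(items, toupper, character(1), USE.NAMES = FALSE)
fixed  # "A" "B" "C"
```

Moving the append line to the end of the loop body would fix the order in the original code as well.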

R - SpatialPointsDataFrame from a list of SpatialPoints

How to create a SpatialPointsDataFrame from a list of SpatialPoints?
The following code creates a list containing SpatialPoints:
SP1 <- SpatialPoints(cbind(1,5))
SP2 <- SpatialPoints(cbind(2,4))
SP3 <- SpatialPoints(cbind(3,3))
SP.l<-list(SP1,SP2, SP3)
What I'm looking for is a way to extract the SpatialPoints from the list and create a SpatialPointsDataFrame out of it.
With the following code I can get single SpatialPoints out of the list:
coords_3 = SP.l[[3]]@coords
data_3 = as.data.frame(SP.l[[3]])
SPDF_3 <- SpatialPointsDataFrame(coords=coords_3, data=as.data.frame(data_3))
However, I'd like to receive all of them at once.
Maybe something like:
SP <- SpatialPoints(lapply(1:length(lidR.clip.SP.l), function(i) {
...
EDIT:
what was missing was:
SP.l <- do.call("rbind", SP.l)
That's what I was actually looking for.
thx!
There is no minimal working example, as hrbrmstr pointed out; you need to provide one. For now, I use sample data from the GISTools package to demonstrate one way. There is a data set called newhaven in the package, and breach is the data. I made a copy of it and created foo, whose class is SpatialPoints. I created two list elements from it.
Using your code, I looped through each list element and converted SpatialPoints to SpatialPointsDataFrame. I hope you can figure out how to apply the following code to your case.
library(GISTools)
data(newhaven)
foo <- breach
mylist <- list(foo1 = breach[1:10, ],
foo2 = breach[11:20, ])
lapply(1:length(mylist), function(x){
SpatialPointsDataFrame(coords = mylist[[x]]@coords,
data = as.data.frame(mylist[[x]]))
})
If you want to bind all SPDFs, then you can try the following.
do.call(rbind, lapply(1:length(mylist), function(x){
SpatialPointsDataFrame(coords = mylist[[x]]@coords,
data = as.data.frame(mylist[[x]]))
})
)

R: save each loop result into one data frame

I have written a loop in R (still learning). My purpose is to pick the max AvgConc and max Roll_TotDep from each file in the loop, and then have two data frames that each contain all the max values picked from the individual files. The code I wrote only saves the results of the last iteration (for a single file). Can someone point me in the right direction to revise my code, so I can append the result of each new iteration to the previous ones? Thanks!
data.folder <- "D:\\20150804"
files <- list.files(path=data.folder)
for (i in 1:length(files)) {
sub <- read.table(file.path(data.folder, files[i]), header=T)
max1Conc <- sub[which.max(sub$AvgConc),]
maxETD <- sub[which.max(sub$Roll_TotDep),]
write.csv(max1Conc, file= "max1Conc.csv", append=TRUE)
write.csv(maxETD, file= "maxETD.csv", append=TRUE)
}
The problem is that max1Conc and maxETD are overwritten on every iteration, so they only ever hold the row from the last file. In addition, write.csv ignores append=TRUE (with a warning) and rewrites the file each time.
To fix this:
max1Conc <- data.frame()
maxETD <- data.frame()
for (i in seq_along(files)) {
sub <- read.table(file.path(data.folder, files[i]), header=TRUE)
# Add this file's row to the rows collected so far
max1Conc <- rbind(max1Conc, sub[which.max(sub$AvgConc),])
maxETD <- rbind(maxETD, sub[which.max(sub$Roll_TotDep),])
}
# Write each result once, after the loop
write.csv(max1Conc, file="max1Conc.csv", row.names=FALSE)
write.csv(maxETD, file="maxETD.csv", row.names=FALSE)
The difference here is that the two variables you wish to write out (max1Conc and maxETD) start as empty data frames, each iteration uses rbind to add its row to what is already there, and the files are written once after the loop.
There are more idiomatic R ways of accomplishing your goal; personally, I suggest you look into learning the apply family of functions. (http://adv-r.had.co.nz/Functionals.html)
I can't directly test the whole thing because I don't have a directory with files like yours, but I tested the parts, and I think this should work as an apply-driven alternative. It starts with a pair of functions, one to ingest a file from your directory and the other to make a row out of the two max values from each of those files:
library(dplyr)
data.folder <- "D:\\20150804"
getfile <- function(filename) {
sub <- read.table(file.path(data.folder, filename), header=TRUE)
return(sub)
}
getmaxes <- function(df) {
rowi <- data.frame(AvConc.max = max(df[,"AvgConc"]), ETD.max = max(df[,"Roll_TotDep"]))
return(rowi)
}
Then it uses a couple of rounds of lapply (embedded in piping courtesy of dplyr) to a) build a list with each data set as an item, b) build a second list of one-row data frames with the maxes from each item in the first list, c) rbind those rows into one big data frame, and d) cbind the filenames to that data frame for reference.
dfmax <- lapply(as.list(list.files(path = data.folder)), getfile) %>%
lapply(., getmaxes) %>%
Reduce(function(...) rbind(...), .) %>%
data.frame(file = list.files(path = data.folder), .)

Create several data.frames via a for loop and name them accordingly

I want to apply a for-loop to every element of a list (station code of air quality stations) and create a single data.frame for each station with specific data.
My current code looks like this:
for (i in Stations))
{i_PM <- data.frame(PM2.5$DateTime,PM2.5$i)
colnames(i_PM)[1] <- "DateTime"
i_AOT <- subset(MOD2011, MOD2011$Station_ID==i)
i <- merge(i_PM, i_AOT, by="DateTime")}
Stations consists of 28 elements. The result should be a data.frame for every station with the columns DateTime, PM2.5 and several elements from MOD2011.
I just can't get it running as it's supposed to. I'm sure it's my fault, but I couldn't find a specific answer on the internet.
Can you show me my mistake?
Try assign:
for (i in Stations) {
# PM2.5$i looks for a column literally named "i"; [[i]] uses the value of i
dat <- data.frame(PM2.5$DateTime, PM2.5[[i]])
colnames(dat) <- c("DateTime", "PM2.5")
dat2 <- subset(MOD2011, MOD2011$Station_ID==i)
assign(paste(i, "_PM", sep=""), dat)
assign(paste(i, "_AOT", sep=""), dat2)
assign(i, merge(dat, dat2, by="DateTime"))
}
Note, however, that this is bad coding practice. You should reconsider your algorithm. For instance, use a list instead.
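A minimal sketch of the list-based alternative, assuming the station codes are character strings that match column names in PM2.5. The small data frames and the helper merge_station are invented here so that the example runs on its own.

```r
# Hypothetical stand-ins for the question's objects
Stations <- c("S1", "S2")
PM2.5 <- data.frame(DateTime = 1:3,
                    S1 = c(10, 12, 11),
                    S2 = c(20, 19, 21))
MOD2011 <- data.frame(DateTime = c(1, 2, 1, 3),
                      Station_ID = c("S1", "S1", "S2", "S2"),
                      AOT = c(0.1, 0.2, 0.3, 0.4))

# Hypothetical helper: builds the merged data frame for one station code
merge_station <- function(id) {
  pm <- data.frame(DateTime = PM2.5$DateTime, PM2.5 = PM2.5[[id]])
  aot <- MOD2011[MOD2011$Station_ID == id, ]
  merge(pm, aot, by = "DateTime")
}

# One list element per station, named by station code
by_station <- setNames(lapply(Stations, merge_station), Stations)
by_station[["S1"]]
```

Keeping the results in one named list means later steps can loop over by_station instead of looking up 28 separately named objects.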

Resources