Darksky api in R loop not working - r

I am a huge R fan, but it never seems to work out for me, I am trying to use an API to get weather data, but I cannot write the loop. I have all the codes in the right format, but when I import the file into r, the cells appear like
-33.86659241, 151.2081909, \"2014-10-01T02:00:00"\
and this is preventing me from running the code. So rather than using a loop I need to use a mailmerge to create 5000 lines of code. Any help would be really appreciated.
tmp <- get_forecast_for(-33.86659241, 151.2081909, "2014-10-01T02:00:00", add_headers=TRUE)
fdf <- as.data.frame(tmp)
fdf$ID <- "R_3nNli1Hj2mlvFVo"
fd <- rbind(fd,fdf)
Here is the code with loop -
df <- read.csv("~/Machine Learning/Darksky.csv", header=T,sep=",", fill = TRUE)
for(i in 1:length(df$DarkSky)){
fdf <- get_forecast_for(df$LocationLatitude[i], df$LocationLongitude[i], df$DarkSky[i], add_headers=TRUE)
fdf <- as.data.frame(fdf)
fdf <- fdf[1:2,]
fd <- rbind(fd,fdf)
}
I also wanted to rbind the retreived data onto a dataframe but it does not work. I also wanted to cbind the identifier, which would be the value in df$DarkSky[i], but it will not work.
CSV -
LocationLatitude LocationLongitude DarkSky
-33.86659241 151.2081909 "2014-10-01T02:00:00"
The get_forecast_for function takes three parameters, the latitude, longitude and the date, structured as above, I have the loop working for latitude and longitude, but the time/date is not working.

Related

R Text data scraper loop through dates

I'm doing a little project where there goal is to retrieve data in text format from a website. (http://regsho.finra.org/regsho-Index.html)
The website was nice enough to provide it online but they sorted the data over several days in different links
I thought about looping through the dates and store the data with the following code:
#Download the needed data
my_data <- c()
for (i in 01:13){
my_data <- read.delim(sprintf("http://regsho.finra.org/CNMSshvol202005%i.txt", i), header=TRUE, sep="|")
}
head(my_data)
The problem here is that in line
for (i in 01:13){ # The date in the website is 01-02-03 and the loop seems to ommit the 0
I've used the sprintf() method so I can have a variable in a string.
and this line the empty variable my_data always seems to be overwritten by the last data downloaded.
my_data <- read.delim(sprintf("http://regsho.finra.org/CNMSshvol202005%i.txt", i), header=TRUE, sep="|")
# the empty variable my_data always seems to be overwritten by the last data downloaded.
Could somebody reassure me if i'm going in the right direction because i'm starting to doubt myself here
Any help would be greatly appreciated!
Thanks in advance
This should give you a leading 0 without using an extra package:
sprintf("%02d", i)
i.e.
sprintf("http://regsho.finra.org/CNMSshvol202005%02d.txt", i)

Saving output of for-loop for every iteration

I am currently working on an imputation project where I need to evaluate my methods of imputation. I have my incomplete dataframe with NAs from which I calculate the missing rate for every column/variable. My second data frame contains the complete cases which I extracted from the first data frame. I now want to simulate the missingness structure of the real data in the frame containing the complete cases. the data frame with the generated NAs get stored in the object "result" as you can see in the code. If I now want to replicate this code and thus generate 100 different data frames like "result", how do I replicate and save them separately?
I'm a beginner and would be really thankful for your answers!
I tried to put my loop which generates the NAs in another loop which contains the replicate() command and counts from 1:100 and saves these 100 replicated data frames but it didn't work at all.
result = data.frame(res0=rep(NA, dim(comp_cas)[1]))
for (i in 1:length(Z32_miss_item$miss_per_item)) {
dat = comp_cas[,i]
missRate = Z32_miss_item$miss_per_item[i]
cat (i, " ", paste0(dat, collapse=",") ," ", missRate, "!\n")
df <- data.frame("res"= GenMiss(x=dat, missrate = missRate), stringsAsFactors = FALSE)
colnames(df) = gsub("res", paste0("Var", i), colnames(df))
result = cbind(result, df)
}
result = result[,-1]
I expect that every data frame of the 100 runs get saved in a separate .rda file in my project folder.
also, is imputation and the evaluation of fitness of the latter beginner stuff in r or at what level of proficiency am I if you take a look at the code that I posted?
It is difficult to guess what exactly you are doing without some dummy data. But it is fine to have loops within loops and to save data.frames. Firstly, I would avoid the replicate function here as it has a strange syntax and just stick with plain loops. Secondly, you must make sure that the loops have different indexes (i.e. for(i ... should be surrounded by, say, for(j ... since functions can loop outside their scope in R. Finally, use saveRDS rather than save, as you can then have each object (data.frame) saved in separate .rds files. The save function is designed for saving your whole workspace so that you can pick up where you left off.
fun <- function(i){
df <- data.frame(x=rnorm(5))
names(df) <- paste0("x",i)
df
}
for(j in 1:100){
res <- data.frame(id=1:5)
for(i in 1:10){
res <- cbind(res, fun(i))
}
saveRDS(res, sprintf("replication_%s.rds",j))
}

How to write a loop for creating cropped raster for every id of a shapefile with a raster base?

I'm still new to R and don't know how to create a loop for my workprocess to make it more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif), a shapefile for lakes and buffer with 10 iDs in a row of the table each.
In the script below I created a new raster file for all values of the lake and the buffer shape file with the data from the DEM raster. This works fine.
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_2m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want to have a raster file for every id of the lake and the buffer shapefile seperately, so 2x10 files.
I thought it's best to write a loop for this, but my skills are not enough so far to do this.
Also other questions didn't bring the solution so far. I tried to help me with this.
Alternatively I could use my end product tif from the script above and undo this in files for every ID.
I want to write the loop and not do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile of more values.
I found a solution now, by extracting data by the ID.
It creates a largelist with 11 elements and all values of each id, which is sufficient for my further work. You can also directly creat the mean, max, min, etc values of each element (so each ID).
k <- Barrow_DTLB$ID #k= number of rows
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
Maybe this solution is also helpful for a few, who already viewed the question.
I think this should work:
for (i in unique(raster_lake)){
r <- raster_lake
r[!(values(r) == i)] <- NA
r <- trim(r)
writeRaster(r, paste0("raster_lake_", i, ".tif"))
}

R - Creating subsets of several datasets in a loop

I have a quite big number of quite heavy datasets. I would like to extract a subset out of each of them and save it into different csv files (one for each dataset). These are the commands I would like to loop for all the files I have in the folder:
df <-read.csv("1985.csv",header=FALSE,stringsAsFactors=TRUE,sep="\t")
df_short <- df[df$V6=="OPP", ]
write.csv(df_short, file = "OPP_1985.csv",row.names=FALSE)
rm(df)
rm(df_short)
This is probably a very noob question, but I am struggling to understand how to do it, so I would appreciate a lot help with this!
EDIT:
Following #SimonShine's suggestion, I have run this code and it works!
You don't specify if you are trying to collect the subsets into one dataset, or if you are trying to make one file per subset. You refer to OPP_1985 that appears out of scope for the code you wrote. Did you mean to refer to df_short?
You could start by abstracting what you want to do with one datafile into a function, e.g.:
extract_and_save_from_dataset <- function(csvfile) {
df <- read.csv(csvfile, header=F, stringsAsFactors=T, sep="\t")
df_short <- df[df$V6 == "OPP",]
csvfile_short <- gsub(".csv", "_short.csv", csvfile)
write.csv(df_short, file=csvfile_short, row_names=F)
}
Assuming you have a collection of dataset filenames, you could apply this function multiple times:
# csvfiles <- c("OPP_1985.csv", "OPP_1986.csv", ...)
csvfiles <- list.files("/path/to/my/csvfiles")
for (csvfile in csvfiles) {
extract_and_save_from_dataset(csvfile)
}
The data.table approach is probably the fastest option, specially if you have a large dataset. The function fwrite{data.table} works in parallel using many CPUS, making it extremely fast.
Here is how you can divide your original data according to subgroups defined based on the values of df$V6 and save each subset into a separate .csv file.
library (data.table)
set(df)[, fwrite(.SD, paste0("output_", V6,".csv")), by = V6, .SDcols=names(df) ]
ps. The name of the files will be output_*.csv where * is the correspondent V6 value.

Create several data.frames via a for loop and name them accordingly

I want to apply a for-loop to every element of a list (station code of air quality stations) and create a single data.frame for each station with specific data.
My current code looks like this:
for (i in Stations))
{i_PM <- data.frame(PM2.5$DateTime,PM2.5$i)
colnames(i_PM)[1] <- "DateTime"
i_AOT <- subset(MOD2011, MOD2011$Station_ID==i)
i <- merge(i_PM, i_AOT, by="DateTime")}
Stations consists of 28 elements. The result should be a data.frame for every station with the colums DateTime, PM2.5 and several elements from MOD2011.
I just dont get it running as its supposed to be. Im sure its my fault, I couldnt find the specific answer via the internet.
Can you show me my mistake?
Try assign:
for (i in Stations)) {
dat <- data.frame(PM2.5$DateTime,PM2.5$i)
dat2 <- subset(MOD2011, MOD2011$Station_ID==i)
colnames(i_PM)[1] <- "DateTime"
assign(paste(i, "_PM", sep=""), dat)
assign(paste(i, "_AOT", sep=""), dat2)
assign(i, merge(dat, dat2, by="DateTime"))
}
Note, however, that this is bad coding practice. You should reconsider your algorithm. For instance, use a list instead.

Resources