Loop in R using changing variable to write and name files

I am trying to create a loop in R that reads daily values of a netcdf file I have imported and converts them into annual sums, then creates a raster for each year. I have converted the netcdf into an array - this is named Biased_corrected.array in my code below. I am not sure how to include the variable 'year' in my file names as it changes with each iteration of the loop. I have tried using paste but this seems to be where it fails. Any suggestions?
# read in file specifying which days correspond to years
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day)
YearsDays[1,2:3] #returns 1 and 366 (the days for year 1972)
YearsDays[2,2:3] #returns 367 and 731 (the days for year 1973)
YearsDays[1,1] #returns 1972
YearsDays[2,1] #returns 1973
counter <- 1
startyear <- YearsDays[1,1]
year <- startyear
while(year < 2021){
#set variables to loop through
startday <- YearsDays[counter,2]
endday <- YearsDays[counter,3]
BC_rain.slice <- Biased_corrected.array[,,startday:endday]
paste(year, "_Annual_rain") <- apply(BC_rain.slice, c(1,2), sum)
#save data in a raster
paste(year, "_rain_r") <- raster(t(paste(year, "_Annual_Rain"), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
# move on to next year
counter <- counter + 1
year <- 1971 + counter
}
EDIT: The working code for anyone interested:
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day)
for (idx in seq(nrow(YearsDays))){
#set variables to loop through
year <- YearsDays[idx,1]
startday <- YearsDays[idx,2]
endday <- YearsDays[idx,3]
BC_rain.slice <- Biased_corrected.array[,,startday:endday]
# compute the annual sum once, then store it under a year-specific name
# (paste0 avoids the space that paste would put inside the name)
annual_rain <- apply(BC_rain.slice, c(1,2), sum)
assign(paste0(year, "_Annual_rain"), annual_rain)
#save data in a raster
assign(paste0(year, "_rain_r"), raster(t(annual_rain), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84))
}

You can't use paste() on the left-hand side of an assignment to create a variable name. You can wrap it in assign() or eval(), but it is usually easier to keep all results in one container instead. Below is an example of what I believe you're trying to achieve, using a named list with one raster per year (an ordinary data frame column cannot hold raster objects). I have also replaced your while loop and counter with a for loop iterating over the rows of YearsDays:
YearsDays <- read.csv("Data\\Years.csv") # a df with 49 obs. of 3 variables (year, start day, and end day)
output <- vector("list", nrow(YearsDays))
names(output) <- YearsDays[,1] # one list element per year, named by year
for (idx in seq_len(nrow(YearsDays))){
#set variables to loop through
startday <- YearsDays[idx,2]
endday <- YearsDays[idx,3]
BC_rain.slice <- Biased_corrected.array[,,startday:endday]
annual_rain <- apply(BC_rain.slice, c(1,2), sum)
#save data in a raster
output[[idx]] <- raster(t(annual_rain), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
}
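One way to keep per-year results together and derive file names from the year is a named list. The following is only a minimal sketch: the toy array, the day ranges, and the file naming pattern are made up for illustration, and no raster is created. With real data each matrix would go through raster() and could then be written with writeRaster() from the raster package.

```r
# Toy stand-ins (assumptions): a 2 x 3 grid with 5 "days", split into two "years"
rain_array <- array(1, dim = c(2, 3, 5))
years_days <- data.frame(year = c(1972, 1973), start = c(1, 4), end = c(3, 5))

annual <- list()
for (idx in seq_len(nrow(years_days))) {
  year  <- years_days$year[idx]
  slice <- rain_array[, , years_days$start[idx]:years_days$end[idx], drop = FALSE]
  # sum over the day dimension, keeping the 2-D grid
  annual[[as.character(year)]] <- apply(slice, c(1, 2), sum)
}

# File names that change with the year (hypothetical naming pattern)
file_names <- paste0(names(annual), "_rain.tif")
```

Each result can then be looked up by year, e.g. annual[["1972"]], without creating a separate variable per year.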

How about replacing this part
paste(year, "_Annual_rain") <- apply(BC_rain.slice, c(1,2), sum)
#save data in a raster
paste(year, "_rain_r") <- raster(t(paste(year, "_Annual_Rain"), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)
with
# names that start with a digit must be wrapped in backticks to parse
txt <- paste0("`", year, "_Annual_rain` <- apply(BC_rain.slice, c(1,2), sum)")
eval(parse(text = txt))
# save data in a raster
txt <- paste0("`", year, "_rain_r` <- raster(t(`", year, "_Annual_rain`), xmn=min(x), xmx=max(x), ymn=min(y), ymx=max(y), crs=WGS84)")
eval(parse(text = txt))
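For comparison, the same kind of dynamically named variable can be created without building code as text. This is a small sketch using assign() and get() from base R; the matrix is a made-up stand-in for the annual sum.

```r
year <- 1972

# assign() accepts any string as a name, even one that starts with a digit
assign(paste0(year, "_Annual_rain"), matrix(1:4, nrow = 2))

# Such names are non-syntactic, so reading them back needs get() or backticks
total_via_get <- sum(get(paste0(year, "_Annual_rain")))
total_via_backticks <- sum(`1972_Annual_rain`)
```

Because every later access needs get() or backticks, a named list is often less awkward than variables named this way.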

Related

Plotting multiple graphs from a list

I have a time series data from 1990 to 1994 of 5 variables in 15 sheets. I read all these data to a list. I need to do a time series plot of all the 5 Variables for 15 companies in multiple graphs. How can this be done? I mean I basically need 5 figures each containing the time series plot of 15 companies of the respective variables.
With package ggplot2 this can be done as follows. I assume you have a list of 15 dataframes named df_list.
First, rbind them together with the name of the company as a new column. The companies' names are in this fake data case stored as the df's names.
all_df <- lapply(names(df_list), function(x){
DF <- df_list[[x]]
DF$Company <- x
DF
})
all_df <- do.call(rbind, all_df)
Then, reshape from wide to long format.
long_df <- reshape2::melt(all_df, id.vars = c("Company", "Date"))
Now, plot them. The graphs can be customized at will, there are many posts on it.
library(ggplot2)
ggplot(long_df, aes(x = Date, y = value, colour = Company)) +
geom_line() +
facet_wrap(~ variable)
Data creation code.
set.seed(1234)
Dates <- seq(as.Date("1990-01-01"), as.Date("1994-12-31"), by = "month")
n <- length(Dates)
df_list <- lapply(1:15, function(i){
tmp <- matrix(rnorm(5*n), ncol = 5)
tmp <- apply(tmp, 2, cumsum)
colnames(tmp) <- paste0("Var", 1:5)
tmp <- as.data.frame(tmp)
tmp$Date <- Dates
tmp
})
names(df_list) <- paste("Company", seq_along(df_list), sep = ".")
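The bind-then-reshape steps can also be checked on a tiny example using base R only. This is a sketch with made-up values and two variables instead of five; it produces the same Company / Date / variable / value layout that the ggplot call expects, without reshape2.

```r
# Two tiny data frames standing in for df_list (made-up values)
small_list <- list(
  Company.1 = data.frame(Date = as.Date("1990-01-01") + 0:1, Var1 = c(1, 2), Var2 = c(3, 4)),
  Company.2 = data.frame(Date = as.Date("1990-01-01") + 0:1, Var1 = c(5, 6), Var2 = c(7, 8))
)

# Add the company name as a column, then stack the data frames
bound <- do.call(rbind, lapply(names(small_list), function(nm) {
  cbind(Company = nm, small_list[[nm]])
}))

# Wide to long: one row per (Company, Date, variable)
value_cols <- c("Var1", "Var2")
long <- data.frame(
  Company  = rep(bound$Company, times = length(value_cols)),
  Date     = rep(bound$Date, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(bound)),
  value    = unlist(bound[value_cols], use.names = FALSE)
)
```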

Changing dataframe column names in R groups at a time

Suppose I have a data frame (DF) that looks like the following:
test <- c('Test1','Test2','Test3')
col.DF.names <- c('ID', 'year', 'car', 'age', 'year.1', 'car.1', 'age.1', 'year.2', 'car.2', 'age.2')
ID <- c('A','B','C')
year <- c(2001,2002,2003)
car <- c('acura','benz','lexus')
age <- c(55,16,20)
year.1 <- c(2011,2012,2013)
car.1 <- c('honda','gm','bmw')
age.1 <- c(43,21,34)
year.2 <- c(1961,1962,1963)
car.2 <- c('toyota','porsche','jeep')
age.2 <- c(33,56,42)
DF <- data.frame(ID, year, car, age, year.1, car.1, age.1, year.2, car.2, age.2)
I need the columns of data frame to lose the ".#" and instead have the Test# in front of it, so it looks something like this:
ID Test1.year Test1.car Test1.age Test2.year Test2.car Test2.age Test3.year Test3.car Test3.age
.... with all the data
Does anyone have a suggestion? Basically, starting at the second column, I'd like to add the test[1] name to 3 columns, then move to the next set of three columns and add test[2], and so on.
I know how to hard code it:
colnames(DF)[2:4] <- paste(test[1], colnames(DF)[2:4], sep = ".")
but this is a toy set, and I would like to automate it so that I'm not hard-coding the indices ([2:4], for example).
You could try:
colnames(DF)[-1] <- paste(sapply(test, rep, 3), colnames(DF)[-1], sep = ".")
That keeps the ".1" and ".2" suffixes, so the following may be better, since it reuses the first three (unsuffixed) names:
colnames(DF)[-1] <- paste(sapply(test, rep, 3), colnames(DF)[2:4], sep = ".")
or:
colnames(DF)[-1] <- paste(rep(test, each=3), colnames(DF)[2:4], sep = ".")
thanks to @thelatemail
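If the group size should not be hard-coded either, one possible variation is to strip the numeric suffix with sub() and infer the group size from the lengths. This is a sketch that assumes every test has the same number of columns and that the suffix is always a dot followed by digits.

```r
test <- c("Test1", "Test2", "Test3")
old_names <- c("ID", "year", "car", "age", "year.1", "car.1", "age.1",
               "year.2", "car.2", "age.2")

# Strip a trailing ".<digits>" suffix, leaving the base names
base_names <- sub("\\.[0-9]+$", "", old_names[-1])

# Group size is inferred from the counts rather than written as 3
per_group <- length(base_names) / length(test)
new_names <- c(old_names[1],
               paste(rep(test, each = per_group), base_names, sep = "."))
```

With a real data frame, the result would be applied with colnames(DF) <- new_names.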

Not able to append the return of a function when I read csv files using a for loop.

The issue I am facing is that I get an individual list for each .csv file I read, and the results are not appended to a single data frame or list. I am very new to R. Please help me out.
I am getting output as
amazon.csv 10.07
facebook.csv 54.67
Whereas I am expecting all the values in a single data frame with columns for company and CAGR.
preprocess <- function(x){
##flipping data to suit time series analysis
my.data <- x[nrow(x):1,]
#sort(x,)
## setting up date as date format
my.data$date <- as.Date(my.data$date)
##creating a new data frame to sort the data.
sorted.data <- my.data[order(my.data$date),]
#removing the last row as it contains stocks price at moment when i downloaded data
#sorted.data <- sorted.data[-nrow(sorted.data),]
#calculating length of the data frame
data.length <- length(sorted.data$date)
## extracting the first date
time.min <- sorted.data$date[1]
##extracting the last date
time.max <- sorted.data$date[data.length]
# creating a new data frame with all the dates in sequence
all.dates <- seq(time.min, time.max, by="day")
all.dates.frame <- data.frame(list(date=all.dates))
#Merging all dates data frame and sorted data frame; all the empty cells are assigned NA values
merged.data <- merge(all.dates.frame, sorted.data, all=T)
##Replacing all NA values with the values of the rows of previous day
final.data <- transform(merged.data, close = na.locf(close), open = na.locf(open), volume = na.locf(volume), high = na.locf(high), low =na.locf(low))
# write.csv(final.data, file = "C:/Users/rites/Downloads/stock prices", row.names = FALSE)
#
#return(final.data) #--> ##{remove comment for Code Check}
################################################################
######calculation of CAGR(Compound Annual Growth Rate ) #######
#### {((latest price/Oldest price)^1/#ofyears) - 1}*100 ########
################################################################
##Extracting closing price of the oldest date
old_closing_price <- final.data$close[1]
##extracting the closing price of the latest date
new_closing_price <- final.data$close[length(final.data$close)]
##extracting the starting year
start_date <- final.data$date[1]
start_year <- as.numeric(format(start_date, "%Y"))
##extracting the latest date
end_date <- final.data$date[length(final.data$date)]
end_year <- as.numeric(format(end_date, "%Y"))
CAGR_1 <- new_closing_price/old_closing_price
root <- 1/(end_year-start_year)
CAGR <- (((CAGR_1)^(root))-1)*100
return (CAGR)
}
temp = list.files(pattern="*.csv")
for (i in 1:length(temp))
assign(temp[i], preprocess (read.csv(temp[i])))
You need to create an empty data frame and append to it inside the loop. At the moment you're using assign, which creates separate variables rather than rows in a data frame. Try something like:
df <- data.frame()
for(i in seq_along(temp)){
# one CAGR value per file
preproc <- preprocess(read.csv(temp[i]))
df <- rbind(df, data.frame(company = temp[i],
value = preproc))
}
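Growing a data frame with rbind inside a loop works, but the same result can be built in one step with vapply(). The sketch below uses made-up in-memory price vectors and a hypothetical fake_growth() function in place of preprocess(read.csv(...)), so that it runs without any files.

```r
# Hypothetical stand-in for preprocess(read.csv(file)): one number per input
fake_growth <- function(prices) (prices[length(prices)] / prices[1] - 1) * 100

# Named list standing in for the CSV files (made-up prices)
inputs <- list("amazon.csv" = c(100, 110), "facebook.csv" = c(50, 75))

# vapply() checks that each call returns exactly one numeric value
values <- vapply(inputs, fake_growth, numeric(1))

result <- data.frame(company = names(values), CAGR = unname(values),
                     stringsAsFactors = FALSE)
```

With real files, the equivalent would be vapply over the file names, calling preprocess(read.csv(f)) for each f.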

Call a function for the number of items in a data frame and merge the results together

I have a function called getWeatherForMonth that takes a start date and end date and returns a data frame of the results for that month. I have another function, getWeatherForRange, that takes a data frame of date ranges. I need to call getWeatherForMonth for each row in "dates" and combine the results into one data frame. I was using mapply as shown below, but it's not combining the resulting data frames.
library(RJSONIO)
getWeatherForMonth <- function(start.date, end.date) {
url <- "http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key=PUT-YOUR-KEY-HERE&q=London&format=json&date=%s&enddate=%e&tp=24"
url <- gsub("%s", start.date, url)
url <- gsub("%e", end.date, url)
data <- fromJSON(url)
weather <- data$data$weather
GMT <- sapply(weather, function(x){as.character(x[1])})
Max.TemperatureC <- sapply(weather, function(x){as.numeric(x[3])})
Min.TemperatureC <- sapply(weather, function(x){as.numeric(x[4])})
Wind.SpeedKm.h <- sapply(weather, function(x){as.numeric(x$hourly[[1]]$windspeedKmph[1])})
Precipitationmm <- sapply(weather, function(x){as.numeric(x$hourly[[1]]$precipMM[1])})
DewPointC <-sapply(weather, function(x){as.numeric(x$hourly[[1]]$DewPointC[1])})
Wind.Chill <-sapply(weather, function(x){as.numeric(x$hourly[[1]]$WindChillC[1])})
Cloud.Cover <-sapply(weather, function(x){as.numeric(x$hourly[[1]]$cloudcover[1])})
Description <-sapply(weather, function(x){as.character(x$hourly[[1]]$weatherDesc[1])})
Humidity <- sapply(weather, function(x){as.numeric(x$hourly[[1]]$humidity[1])})
Feels.LikeC <- sapply(weather, function(x){as.numeric(x$hourly[[1]]$FeelsLikeC[1])})
df <- data.frame(GMT, Max.TemperatureC, Min.TemperatureC, Wind.SpeedKm.h, Precipitationmm, DewPointC, Wind.Chill, Cloud.Cover, Description, Humidity, Feels.LikeC)
return(df)
}
getWeatherForRange <- function(dates) {
df <- mapply(getWeatherForMonth, dates$start.date, dates$end.date)
return(df)
}
start.date <- seq(as.Date("2015-01-01"), length=12, by="1 month")
end.date <- seq(as.Date("2015-02-01"),length=12,by="months") - 1
dates.2015 <- data.frame(start.date, end.date)
data <- getWeatherForRange(dates.2015)
View(data)
The output looks like this:
[Screenshot of the current output]
Consider using Map() in your getWeatherForRange function. Map() is a wrapper for the non-simplifying version of mapply(), equivalent to mapply(..., SIMPLIFY=FALSE). By default, mapply() tries to simplify its result to a vector, matrix, or higher-dimensional array, but here you need each data frame (i.e., a list object) returned intact.
This updated function returns a list of data frames that you can then pass to do.call(rbind, ...), assuming the columns are consistent across data frames, to stack them into one final data frame.
getWeatherForRange <- function(dates) {
# EQUIVALENT LINES
dfList <- Map(getWeatherForMonth, dates$start.date, dates$end.date)
# dfList <- mapply(getWeatherForMonth, dates$start.date, dates$end.date, SIMPLIFY = FALSE)
return(dfList)
}
start.date <- seq(as.Date("2015-01-01"), length=12, by="1 month")
end.date <- seq(as.Date("2015-02-01"), length=12, by="months") - 1
dates <- data.frame(start.date, end.date)
datalist <- getWeatherForRange(dates) # DATAFRAME LIST
data <- do.call(rbind, datalist) # FINAL DATA FRAME
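The Map() plus do.call(rbind, ...) pattern can be tried without an API key by swapping in a stub. In this sketch fake_month() is a hypothetical replacement for getWeatherForMonth() that makes no network call and returns a one-row data frame with made-up columns.

```r
# Stub standing in for getWeatherForMonth(): no network call, made-up columns
fake_month <- function(start.date, end.date) {
  data.frame(start = format(start.date),
             days  = as.numeric(end.date) - as.numeric(start.date) + 1)
}

start.date <- seq(as.Date("2015-01-01"), length.out = 3, by = "1 month")
end.date   <- seq(as.Date("2015-02-01"), length.out = 3, by = "1 month") - 1

# Map() returns a list with one data frame per month
df_list <- Map(fake_month, start.date, end.date)

# do.call(rbind, ...) stacks them into a single data frame
stacked <- do.call(rbind, df_list)
```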

Creating a sequence of columns in a data frame based on an index for loop or using plyr in r

I wish to create 24 hourly data frames in which each data.frame contains hourly demand for a product as 1 column, and the next 8 columns contain hourly temperatures. For example, for the data.frame for 8am, the data.frame will contain a column for demand at 8am, then eight columns for temperature ranging from the most current hour to the 7 past hours. The additional complication is that for hours before 8AM i.e. "4AM", I have to get yesterday's temperatures. I am hitting my head against the wall trying to figure out how to do this with apply or plyr, or a vectorized function.
demand8AM Temp8AM Temp7AM Temp6AM...Temp1AM
Demand4AM Temp4AM Temp3AM Temp2AM Temp1AM Temp12AM Temp11pm(Lag) Temp10pm(Lag)
In my code Hours are numbers; 1 is 12AM etc.
Here is some simple code I created to create the dataset I am dealing with.
#Creating some Fake Data
require(plyr)
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:90
dates <-seq(as.Date("2012-01-01"), as.Date("2012-3-30"), by = "day")
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Temperature <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <-(rnorm(1,mean = 0, sd=1)+.75*myData$Temperature )
## ok, done with the fake data generation.
It looks as though you could benefit from utilizing a time series. Here's my interpretation of what you want (I used the "mean" function in rollapply), not what you asked for. I recommend you read over the xts and zoo packages.
library(xts) # provides xts() and the lag method used below
#create dummy time vector
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
#create dummy demand and temp.C
info <- data.frame(demand = sample(1:length(time_index), replace = T),
temp.C = sample (1:10))
#turn demand + temp.C into time series
eventdata <- xts(info, order.by = time_index)
x2 <- eventdata$temp.C
for (i in 1:8) {x2 <- cbind(x2, lag(eventdata$temp.C, i))}
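For a plain numeric vector, base R's embed() builds the same kind of "current value plus previous hours" layout in one call. This sketch uses made-up temperatures and hypothetical column names; it is an alternative to the xts loop, not a reproduction of it, and unlike lag() it drops the first rows that lack a full history rather than padding them with NA.

```r
# Made-up hourly temperatures, oldest first
temp <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)

# embed() puts the current value in column 1 and the previous 7 values
# in columns 2..8; the first 7 hours are dropped (incomplete history)
lags <- embed(temp, 8)
colnames(lags) <- c("Temp", paste0("Temp_lag", 1:7))
```

Because the series is continuous across midnight, hours early in the day pick up the previous day's temperatures automatically.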
