I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. This loop looks at each spatial cell and, based on the 365 temperature values in that year, computes the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.
for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b,c] # makes all temps into a vector
    for (d in 2:364) {
      if (data[d]>=Threshold & data[d+1]>=Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      } else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        numberdf[b,c]=numberdf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      } else {
        frequencydf[b,c]=frequencydf[b,c]
        numberdf[b,c]=numberdf[b,c]
        tempsdf[b,c]=tempsdf[b,c]
      }
    }
    durationdf[b,c]=frequencydf[b,c]/numberdf[b,c]
    tempsdf[b,c]=tempsdf[b,c]/frequencydf[b,c]
  }
}
Therefore, I am trying to work with apply functions to speed up the process. I think I am running into issues when attempting to analyze each spatial cell by the values in the 3rd (time) dimension of my array.
I am starting with the frequency parameter and trying to create the same data frame as above.
frequencylist <- Apply(year_array, fun = frequency.calc1, margins=c(1, 2))
frequencydf <- as.data.frame(frequencylist)
Using this function:
frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d]>=Threshold & data[d+1]>=Threshold) {
      frequency=frequency+1
    } else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
      frequency=frequency+1
    } else {
      frequency=frequency
    }
  }
  return(frequency) # return after the loop has finished, not inside it
}
I am very new to creating functions and using the Apply function so any advice would be appreciated!
For-loops and *apply functions run about the same speed. Your problem is all those "if"s.
First of all, you have two separate conditions both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like
frequency <- sum(data[1:(N-2)] >= Threshold & data[2:(N-1)] >= Threshold & data[3:N] < Threshold)
I haven't checked all the ">" vs "<" but you get the idea.
As a side note, NEVER hard-code the range of a loop. You can start with "2" since your conditionals reference "d-1" but let the maximum value be defined as something like length(data) - 1
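As a rough illustration of the vectorized idea, here is a minimal sketch on a single toy series. The data values and the threshold of 30 are made up for the example, and the names prev, cur, nxt, continues, and ends are my own. The three shifted slices line up day d-1, day d, and day d+1, so the two branches of the original loop become two logical vectors. Note the parentheses in 1:(N - 2): without them R evaluates (1:N) - 2.

```r
# Toy series and threshold (both hypothetical)
Threshold <- 30
data <- c(25, 31, 32, 33, 28, 31, 29, 30, 30, 27)

N <- length(data)
prev <- data[1:(N - 2)]  # day d - 1
cur  <- data[2:(N - 1)]  # day d, for d = 2 .. N - 1
nxt  <- data[3:N]        # day d + 1

# First branch of the loop: today and tomorrow are both at or above the threshold
continues <- cur >= Threshold & nxt >= Threshold
# Second branch: yesterday and today are hot, tomorrow is not (an event ends today)
ends <- prev >= Threshold & cur >= Threshold & nxt < Threshold

frequency <- sum(continues) + sum(ends)
number <- sum(ends)

# Reference loop with bounds taken from length(data) instead of hard-coded
freq_loop <- 0
for (d in 2:(length(data) - 1)) {
  if (data[d] >= Threshold & data[d + 1] >= Threshold) {
    freq_loop <- freq_loop + 1
  } else if (data[d - 1] >= Threshold & data[d] >= Threshold & data[d + 1] < Threshold) {
    freq_loop <- freq_loop + 1
  }
}
```

On this toy series both approaches count the same days, so the loop is only there as a cross-check.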
The solution used to simplify the process is shown below. Sum functions with conditionals were used in place of the if statements. This made the process incredibly efficient and did not use the apply function or an additional function.
for (b in 1:299) {
  for (c in 1:424) {
    data <- year[b,c]
    N <- length(data)
    prev <- data[1:(N-2)] # day d-1
    cur <- data[2:(N-1)] # day d
    nxt <- data[3:N] # day d+1
    ends <- prev >=Threshold & cur >=Threshold & nxt <Threshold
    continues <- cur >=Threshold & nxt >=Threshold
    frequency[b,c] <- sum(ends) + sum(continues)
    number[b,c] <- sum(ends)
    duration[b,c] <- frequency[b,c]/number[b,c]
    temps[b,c] <- sum(cur[ends | continues])
    temps[b,c] <- temps[b,c]/frequency[b,c]
  }
}
Thank you for your help, @Carl Witthoft
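If the data really is a 3D array (longitude x latitude x day), the two outer loops can also be handed to base R's apply. The following is only a sketch under that assumption: heat_stats is a hypothetical helper, and the 2 x 3 x 10 array and threshold of 30 are toy values standing in for the real 299 x 424 x 365 data.

```r
# Hypothetical per-cell helper: takes one cell's daily temperatures
heat_stats <- function(cell, Threshold) {
  N <- length(cell)
  prev <- cell[1:(N - 2)]
  cur  <- cell[2:(N - 1)]
  nxt  <- cell[3:N]
  continues <- cur >= Threshold & nxt >= Threshold
  ends <- prev >= Threshold & cur >= Threshold & nxt < Threshold
  frequency <- sum(continues) + sum(ends)
  number <- sum(ends)
  c(frequency = frequency,
    number = number,
    duration = frequency / number,                    # NaN when there are no events
    temps = sum(cur[continues | ends]) / frequency)   # mean temperature of counted days
}

# Toy array: 2 x 3 grid, 10 days, all cells at 20 degrees except one
year_array <- array(20, dim = c(2, 3, 10))
year_array[1, 1, ] <- c(25, 31, 32, 33, 28, 31, 29, 30, 30, 27)

# MARGIN = c(1, 2) passes the time series of each (lon, lat) cell to heat_stats
stats <- apply(year_array, c(1, 2), heat_stats, Threshold = 30)

frequency <- stats["frequency", , ]  # 2 x 3 matrix, one value per cell
duration  <- stats["duration", , ]
```

The result has one slice per statistic, so each of the four grids can be pulled out by name. As noted in the answer above, this is mainly a tidiness gain; the speedup comes from removing the inner loop, not from apply itself.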
Related
I am trying to define a function in RStudio that contains a for loop with a conditional inside it. Yesterday, with the help of another thread, I was able to devise this piece of code. The problem is that I want to sum the elements of the vector ma for any possible x, and I want that to happen inside the function l. This is a simpler case that I am trying to solve before adapting the original model. However, I do not know how to proceed.
ma<-rep(0,20)
l <- function(x, ma) {
  for(i in seq_along(ma)) {
    if(i %% 2 == 1) {
      ma[i] <- i + x
    } else {
      ma[i] <- 0
    }
  }
  return(ma)
}
My problem is that I would like to have the sum i+x+0+i+x+... for any possible x. In other words, I want a function of x that returns this sum.
Question:
Can someone explain to me how to implement such a function in R?
Thanks in advance!
I am going to update the original function:
Theta_alpha_s<-function(s,alpha,t,Basis){
  for (i in seq_along(Basis)){
    if(i%% 2==1) {Basis[i]=s*i^{-alpha-0.5}*sqrt(2)*cos(2*pi*i*t)}
    else{Basis[i]=s*i^{-alpha-0.5}*sqrt(2)*sin(2*pi*i*t)}
  }
  return(Basis)
}
If you don't want to change the values in Basis, you can create a new vector in the function (here result) that you will return:
l = function(s,alpha,t,Basis){
  idx = seq_along(Basis) # positions 1..length(Basis), as in the original loop
  is.odd = idx[idx %% 2 == 1]
  not.odd = idx[idx %% 2 == 0]
  result = rep(NA, length(Basis))
  result[is.odd] = s*is.odd^{-alpha-0.5}*sqrt(2)*cos(2*pi*is.odd*t)
  result[not.odd] = s*not.odd^{-alpha-0.5}*sqrt(2)*sin(2*pi*not.odd*t)
  #return(result)
  return(c(sum(result[is.odd]), sum(result[not.odd])))
}
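For the simpler ma case from the question, the same index-based idea gives the sum directly. This is only a sketch: sum_l and its argument n (the vector length, 20 in the question) are names I made up for illustration.

```r
# Hypothetical helper: sum of (i + x) over odd positions i in 1..n, zeros elsewhere
sum_l <- function(x, n = 20) {
  i <- seq_len(n)
  # The logical mask is TRUE (1) at odd positions and FALSE (0) at even ones
  sum((i + x) * (i %% 2 == 1))
}

sum_l(0)  # sum of the odd numbers 1, 3, ..., 19
sum_l(2)  # same, plus x added at each of the 10 odd positions

# Evaluate for several values of x at once
sums <- sapply(c(0, 2, 5), sum_l)
```

Because the loop only depends on the position i and on x, the vector ma does not need to be passed in at all; its length is enough.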
I'm new to R and got an assignment to do some basic research using R.
I have imported a csv file with wind direction and wind speed data, and I want to split the wind speed based on direction.
So I created this bit of R code:
north.ls = list()
east.ls = list()
south.ls = list()
west.ls = list()
i = as.integer(1)
print("start")
for (i in 1:length(DD)) {
  if ((DD[i] >= 315 & DD[i] <= 360) | (DD[i] >= 1 & DD[i] < 45)) {
    north.ls[[i]] = as.integer(FH[i])
    print("north")
  }
  if (DD[i] >= 45 & DD[i] < 135) {
    east.ls[[i]] = as.integer(FH[i])
    print("east")
  }
  if (DD[i] >= 135 & DD[i] < 225) {
    south.ls[[i]] = as.integer(FH[i])
    print("south")
  }
  if (DD[i] >= 225 & DD[i] < 315) {
    west.ls[[i]] = as.integer(FH[i])
    print("west")
  }
}
This works fine and puts the right speeds in the right lists, but every time a condition is not met the list still gets a NULL value, so I have a lot of NULL values in the lists. What is the problem and how can I fix it?
I hope you understand my explanation
thanks in advance
When you create a new item on a list at position [i] without items in previous positions, all those positions get NULLs.
Here's a slightly better way of producing what you're trying to do (I'm making some educated guesses about your data structure and your goals), without introducing these NULLs:
north.ls<-FH[(DD>=315 & DD <= 360) | (DD >= 1 & DD < 45)]
east.ls<-FH[DD>=45 & DD < 135]
south.ls<-FH[DD>=135 & DD < 225]
west.ls<-FH[DD>=225 & DD < 315]
This will give you four vectors that divide the data in FH into north, east, south, and west based on the data in DD. The length of each of the four vectors is NOT equal to the length of FH or DD (or each other), and there should be no NULLs introduced unless they're already in FH.
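A possible alternative, sketched here with made-up DD and FH values, is to label each observation with its sector once and then split. Shifting the direction by 45 degrees makes north a single contiguous range, so cut can handle all four sectors. Treating DD below 1 as "no direction" mirrors the DD >= 1 condition in the question and is an assumption about the data.

```r
# Toy data (hypothetical): direction in degrees and wind speed
DD <- c(350, 10, 90, 180, 270, 0, 44, 45)
FH <- c(5, 3, 7, 2, 9, 1, 4, 6)

# Rotate by 45 degrees so north (315..360 and 1..44) maps to 0..89
shifted <- (DD + 45) %% 360
sector <- cut(shifted,
              breaks = c(0, 90, 180, 270, 360),
              labels = c("north", "east", "south", "west"),
              right = FALSE)  # intervals are [lower, upper)

# Observations without a usable direction get no sector
sector[DD < 1] <- NA

# One vector of speeds per sector; NA sectors are dropped by split()
by_sector <- split(FH, sector)
```

Here by_sector$north, by_sector$east, and so on play the role of the four separate vectors, and none of them contains NULL placeholders.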
I've used switch for some easy conditionals where variables equal various values, but can't figure out how I would use it for less than or greater than conditionals such as
if (thedate >= as.Date("1981-01-20") & thedate < as.Date("1989-01-20")) {
  thepres <- "Reagan"}
if (thedate >= as.Date("1989-01-20") & thedate < as.Date("1993-01-20")) {
  thepres <- "George HW Bush"}
if (thedate >= as.Date("1993-01-20") & thedate < as.Date("2001-01-20")) {
  thepres <- "Clinton"}
if (thedate >= as.Date("2001-01-20") & thedate < as.Date("2009-01-20")) {
  thepres <- "George W Bush"}
if (thedate >= as.Date("2009-01-20")) {
  thepres <- "Obama"}
(I know those should be nested ifelse statements but I find more than 3 or 4 difficult to code & follow).
Is there some way to use switch for situations like this, or do I have to go the nested ifelse route? (Or just leave it wildly inefficient like this)
Thanks.
The function cut is pretty good for situations like this. (I didn't include all of the presidents, but hopefully you get the idea)
thedate <- as.Date("1982-02-01")
thepresident <- cut(thedate,
                    c(as.Date("1981-01-20"), as.Date("1989-01-20"), as.Date("1993-01-20")),
                    labels=c("Reagan", "George HW Bush"), right=FALSE)
Also, note that this returns a factor, so you may want to convert to a string.
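Another base R option for the same kind of range lookup is findInterval, which returns the index of the interval a value falls into and so gives a character result without going through a factor. This is a sketch, not part of the original answer: president_on, starts, and presidents are names I chose, and the start dates assume the January 20 inauguration dates are what the question intends.

```r
# Term start dates (inauguration days) and matching names, in ascending order
starts <- as.Date(c("1981-01-20", "1989-01-20", "1993-01-20",
                    "2001-01-20", "2009-01-20"))
presidents <- c("Reagan", "George HW Bush", "Clinton",
                "George W Bush", "Obama")

# Hypothetical helper: works on a single date or a vector of dates
president_on <- function(d) {
  # Index of the last start date that is <= d (0 if d is before all of them)
  idx <- findInterval(as.numeric(d), as.numeric(starts))
  idx[idx == 0] <- NA  # dates before the first term have no match
  presidents[idx]
}

thepres <- president_on(as.Date("1982-02-01"))
several <- president_on(as.Date(c("1989-01-19", "2001-01-20", "2015-06-01")))
```

As written, every date on or after the last start date maps to the last name, which matches the open-ended final condition in the question.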
Background
I'm developing a function that takes in a value for w between 1 and 3 and returns n values from one of 3 distributions.
The problem I am having is when n or w are not of length 1. So I've added 2 parameters nIsList and wIsList to create the functionality I want. The way I want this to work is as follows:
(Works as needed)
If nIsList ex( c(1,2,3) ) return a list equivalent to running consume(w,1), consume(w,2), consume(w,3)
(Works as needed)
If wIsList ex( c(1,2,3) ) return a list equivalent to running consume(1,n), consume(2,n), consume(3,n)
(Doesn't work as needed)
If nIsList ex(1,2,3) and wIsList ex(1,2,3)
return a list equivalent to running consume(1,1), consume(2,2), consume(3,3). Instead, I get a list equivalent to running [consume(1,1), consume(1,2), consume(1,3)], [consume(2,1), consume(2,2), consume(2,3)], [consume(3,1),consume(3,2), consume(3,3)]
I understand why I am getting the results I am getting. I just can't seem to figure out how to get the result I want. (As explained above)
Question
I want the function to provide a list for each element in w and n that is consume(w[i], n[i]) when wIsList & nIsList are True. Is there a way to do that using lapply?
The code:
library("triangle")
consume <- function(w, n=1, nIsList=F, wIsList=F){
  if(!nIsList & !wIsList){
    if(w==1){
      return(rtriangle(n,0.3,0.8))
    }else if(w==2){
      return(rtriangle(n,0.7,1))
    }else if(w==3){
      return(rtriangle(n,0.9,2,1.3))
    }
  }
  else if(nIsList & !wIsList){
    return(sapply(n, consume, w=w))
  }
  else if(nIsList & wIsList){
    return(lapply(n, consume, w=w, wIsList=T))
  }
  else if(!nIsList & wIsList){
    return(lapply(w, consume, n))
  }
}
Note: I am having trouble summarizing this question. If you have any suggestions for renaming it please let me know and I will do so.
Thanks to JPC's comment, using mapply does the trick. The new code is as follows:
consume <- function(w, n=1){
  nIsList <- length(n) > 1 # Change based on JPC's second comment
  wIsList <- length(w) > 1 # Change based on JPC's second comment
  if(!nIsList & !wIsList){
    if(w==1){
      return(rtriangle(n,0.3,0.8))
    }else if(w==2){
      return(rtriangle(n,0.7,1))
    }else if(w==3){
      return(rtriangle(n,0.9,2,1.3))
    }
  }
  else if(nIsList & !wIsList){
    return(sapply(n, consume, w=w))
  }
  else if(nIsList & wIsList){
    return(mapply(consume,w,n)) ## Updated portion
  }
  else if(!nIsList & wIsList){
    return(lapply(w, consume, n))
  }
}
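To see the element-wise pairing that mapply provides in isolation, here is a small self-contained sketch. It avoids the triangle package: draw is a hypothetical stand-in for the scalar case of consume that uses runif with made-up ranges, so only the pairing behavior carries over.

```r
# Hypothetical stand-in for the scalar case: n uniform draws from range number w
draw <- function(w, n = 1) {
  bounds <- list(c(0.3, 0.8), c(0.7, 1), c(0.9, 2))  # illustrative ranges only
  runif(n, min = bounds[[w]][1], max = bounds[[w]][2])
}

w <- c(1, 2, 3)
n <- c(1, 2, 3)

# mapply walks w and n in parallel: draw(w[1], n[1]), draw(w[2], n[2]), ...
# SIMPLIFY = FALSE always returns a list, even when all results have equal length
paired <- mapply(draw, w = w, n = n, SIMPLIFY = FALSE)

lengths(paired)  # one element per (w[i], n[i]) pair
```

Without SIMPLIFY = FALSE, mapply tries to collapse equal-length results into a vector or matrix, which may or may not be what the caller expects.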
Can anyone advise how I can turn the following statement into one that does the same thing but does NOT use ifelse?
finalmark <- ifelse(y>=50, 0.2*x+0.8*y, ifelse(y<50 & x>70, y+10, ifelse(y<50 & x<70, y)))
x=80
y=60
So the final code should give an answer of 64, selecting the first condition. I will then test it to ensure the other conditions give the correct result for varying values of x and y.
Thanks a lot.
This should work:
finalmark <- (x * 0.2 + y * 0.8) * (y >= 50) + (y + 10 * (x > 70)) * (y < 50)
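The one-liner works because R treats TRUE as 1 and FALSE as 0 in arithmetic, so each comparison switches its term on or off. A small sketch with made-up test values, wrapped in a hypothetical function final_mark, checks it against a nested ifelse that treats x = 70 as "no bonus":

```r
# Hypothetical wrapper around the arithmetic-mask expression
final_mark <- function(x, y) {
  (x * 0.2 + y * 0.8) * (y >= 50) +   # weighted mark, active only when y >= 50
    (y + 10 * (x > 70)) * (y < 50)    # y, plus 10 if x > 70, active only when y < 50
}

# One case per branch, plus the x = 70 edge case
x <- c(80, 80, 60, 70)
y <- c(60, 40, 40, 45)

masked <- final_mark(x, y)

# Reference: nested ifelse with a plain else for the last branch
nested <- ifelse(y >= 50, 0.2 * x + 0.8 * y, ifelse(x > 70, y + 10, y))
```

Since the expression is plain arithmetic, it is vectorized over x and y in the same way ifelse is.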
Something like this?
if(y>=50){
  0.2*x+0.8*y
}else{
  if(y<50 & x>70){
    y+10
  }else{
    if(y<50 & x<70){
      y
    }else{
      "OMG I did not expect this scenario"
    }
  }
}
try: y=45; x=70 to see why I have the last condition.
If y is a number then, once you've tested for y >= 50 and it fails, y must be less than 50, so don't keep testing for that. Similarly, once you've tested for x > 70 you don't need the last ifelse. You don't have a return value for x = 70. My guess is that you want to test for a <= or >= situation there.
ifelse(y>=50, 0.2*x+0.8*y, ifelse(x>70, y+10, y))
In scalar form, that's
if(y >= 50){
  0.2*x+0.8*y
}else if(x > 70){
  y+10
}else y
Given you seem to be having a hard time in general writing the logic I suggest you post a more complete question. It's possible (probable) that you're doing something here that you really don't want to do.
There are several approaches you can take. Below are a few examples of building a function 'f', so that 'f(x,y)' meets your criteria listed in the question using logic other than 'ifelse' statements.
Note: I'm also adding in one amendment to the original post, since 'x=70' would break the logic. I'm adding 'x>=70' to the second criterion.
Option 1: Use a standard 'if / else if / else' logic block. Personally, I like this option, because it's easily readable.
f <- function(x, y){
  if (y >= 50){
    return(0.2*x+0.8*y)
  } else if (y < 50 & x >= 70){
    return(y+10)
  } else {
    return(y)
  }
}
Option 2: Combine your two logical tests (there are really only two) into a string, and use a switch. Note that the final and unnamed option is treated as an 'else'.
f <- function(x, y){
  return(
    switch(paste(x >= 70, y >= 50, sep=""),
      TRUEFALSE = y + 10,
      FALSEFALSE = y,
      0.2*x+0.8*y
    )
  )
}
Option 3: Order your 'if' statements to reduce logical comparisons. This is the sort of thing to do if you have a large data set or very limited memory. This is slightly harder to troubleshoot, since you have to read the whole block to fully understand it. Option 1 is better if you don't have memory or cycle limitations.
f <- function(x, y){
  if (y >= 50){
    return(0.2*x+0.8*y)
  } else {
    if (x >= 70){
      return(y+10)
    } else {
      return(y)
    }
  }
}
There are other options, but these are the simplest that come readily to mind.