Use R switch for less than or greater than? - r

I've used switch for some easy conditionals where variables equal various values, but can't figure out how I would use it for less than or greater than conditionals such as
if (thedate >= as.Date("1981-01-20") & thedate < as.Date("1989-01-20")) {
  thepres <- "Reagan"
}
if (thedate >= as.Date("1989-01-20") & thedate < as.Date("1993-01-20")) {
  thepres <- "George HW Bush"
}
if (thedate >= as.Date("1993-01-20") & thedate < as.Date("2001-01-20")) {
  thepres <- "Clinton"
}
if (thedate >= as.Date("2001-01-20") & thedate < as.Date("2009-01-20")) {
  thepres <- "George W Bush"
}
if (thedate >= as.Date("2009-01-20")) {
  thepres <- "Obama"
}
(I know those should be nested ifelse statements but I find more than 3 or 4 difficult to code & follow).
Is there some way to use switch for situations like this, or do I have to go the nested ifelse route? (Or just leave it wildly inefficient like this)
Thanks.

The function cut is pretty good for situations like this. (I didn't include all of the presidents, but hopefully you get the idea)
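thedate <- as.Date("1982-02-01")
# right = FALSE makes each interval include its start date and exclude its end date
thepresident <- cut(thedate,
                    breaks = c(as.Date("1981-01-20"), as.Date("1989-01-20"), as.Date("1993-01-20")),
                    labels = c("Reagan", "George HW Bush"), right = FALSE)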
Also, note that cut returns a factor, so you may want to convert the result to a string with as.character().
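As a rough sketch, the same idea can be extended to all five presidents from the question. The helper name president_for and the closing break of 2017-01-20 (cut needs an upper bound for the last interval) are assumptions added for illustration and are not part of the original answer.

```r
# Break points: inauguration dates, plus an assumed closing bound for the last interval.
breaks <- as.Date(c("1981-01-20", "1989-01-20", "1993-01-20",
                    "2001-01-20", "2009-01-20", "2017-01-20"))
presidents <- c("Reagan", "George HW Bush", "Clinton", "George W Bush", "Obama")

# Hypothetical helper: works on a whole vector of dates at once.
president_for <- function(dates) {
  # right = FALSE makes each interval closed on the left: [start, end)
  as.character(cut(dates, breaks = breaks, labels = presidents, right = FALSE))
}

result <- president_for(as.Date(c("1982-02-01", "1995-06-15", "2009-01-20")))
print(result)
```

Dates that fall outside the break points come back as NA, so a missing result signals a date the table does not cover.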

Related

multiApply function for loop on a 3D array

I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. The loop looks at each spatial cell and, based on the 365 temperature values for that year, computes the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.
for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b, c, ] # all daily temperatures for this cell as a vector
    for (d in 2:364) {
      if (data[d] >= Threshold & data[d + 1] >= Threshold) {
        # hot day followed by another hot day
        frequencydf[b, c] <- frequencydf[b, c] + 1
        tempsdf[b, c] <- tempsdf[b, c] + data[d]
      } else if (data[d - 1] >= Threshold & data[d] >= Threshold & data[d + 1] < Threshold) {
        # last day of a heat event
        frequencydf[b, c] <- frequencydf[b, c] + 1
        numberdf[b, c] <- numberdf[b, c] + 1
        tempsdf[b, c] <- tempsdf[b, c] + data[d]
      }
      # otherwise the counters are left unchanged
    }
    durationdf[b, c] <- frequencydf[b, c] / numberdf[b, c]
    tempsdf[b, c] <- tempsdf[b, c] / frequencydf[b, c]
  }
}
Therefore, I am trying to use apply functions to speed up the process. I think I am running into issues when trying to analyze each spatial cell by its values along the third (time) dimension of my array.
I am starting with the frequency parameter and trying to create the same data frame as above.
frequencylist <- Apply(year_array, fun = frequency.calc1, margins=c(1, 2))
frequencydf <- as.data.frame(frequencylist)
Using this function:
frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d] >= Threshold & data[d + 1] >= Threshold) {
      frequency <- frequency + 1
    } else if (data[d - 1] >= Threshold & data[d] >= Threshold & data[d + 1] < Threshold) {
      frequency <- frequency + 1
    }
  }
  # return after the loop has finished, not inside it
  return(frequency)
}
I am very new to creating functions and using the Apply function so any advice would be appreciated!
For loops and *apply functions run at about the same speed. Your problem is all those "if"s.
First of all, you have two separate conditions both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like
frequency <- sum(data[1:(N-2)] >= Threshold & data[2:(N-1)] >= Threshold & data[3:N] < Threshold)
I haven't checked all the ">" vs "<" but you get the idea.
As a side note, NEVER hard-code the range of a loop. You can start with "2" since your conditionals reference "d-1" but let the maximum value be defined as something like length(data) - 1
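A minimal sketch of that advice, assuming the data contain no NAs: the two branches of the original loop are mutually exclusive, and together they count every day that is at or above the threshold and has at least one neighbouring day that is too. The helper name count_hot_days, the sample temperatures, and the tiny array are made up for illustration.

```r
# Hypothetical helper: count days at or above the threshold that have a hot neighbour.
count_hot_days <- function(data, threshold) {
  n <- length(data)
  if (n < 3) return(0L)          # need both a previous and a next day
  prev <- data[1:(n - 2)]        # day d - 1
  cur  <- data[2:(n - 1)]        # day d
  nxt  <- data[3:n]              # day d + 1
  sum(cur >= threshold & (prev >= threshold | nxt >= threshold))
}

temps <- c(25, 31, 32, 33, 20, 31, 20, 32, 32)   # made-up daily temperatures
hot_days <- count_hot_days(temps, threshold = 30)

# Made-up 2 x 2 x 9 array holding the same series in every cell.
year_array <- array(rep(temps, each = 4), dim = c(2, 2, 9))

# Base apply() over the first two dimensions hands each cell's time series to the helper.
frequency <- apply(year_array, c(1, 2), count_hot_days, threshold = 30)
print(frequency)
```

Because the range is derived from length(data), the same helper works for leap years or partial years without changes.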
The solution used to simplify the process is shown below. Sum functions with conditionals were used in place of the if statements. This made the process incredibly efficient and did not use the apply function or an additional function.
for (b in 1:299) {
  for (c in 1:424) {
    data <- year[b, c, ]
    N <- length(data)
    # shifted copies of equal length, aligned with the original loop over d = 2..(N-1)
    prev <- data[1:(N - 2)] # day d - 1
    cur <- data[2:(N - 1)]  # day d
    nxt <- data[3:N]        # day d + 1
    in_run <- cur >= Threshold & nxt >= Threshold                     # hot day followed by a hot day
    run_end <- prev >= Threshold & cur >= Threshold & nxt < Threshold # last day of a heat event
    frequency[b, c] <- sum(in_run) + sum(run_end)
    number[b, c] <- sum(run_end)
    duration[b, c] <- frequency[b, c] / number[b, c]
    temps[b, c] <- (sum(cur[in_run]) + sum(cur[run_end])) / frequency[b, c]
  }
}
Thank you for your help, @Carl Witthoft.

How to make a for with more than one else if in R?

I want to assign each point to a neighbourhood number based on its coordinates. However, every time I try to run the code below I get this error: the condition has length > 1
for (i in dados) {
if (dados[dados$latitude>="55.84" & dados$latitude<= "55.95" & dados$longitude>="-3.444" & dados$longitude<="-3.198", ]){
dados$neighbourhood <- 1
} else if (dados[dados$latitude>="55.84" & dados$latitude<= "55.95" & dados$longitude>="-3.198" & dados$longitude<="-3.058", ]){
dados$neighbourhood <- 2
} else if (dados[dados$latitude>="55.95" & dados$latitude<="56.01" & dados$longitude>="-3.444" & dados$longitude<="-3.198", ]){
dados$neighbourhood <- 3
} else if (dados[dados$latitude>="55.95" & dados$latitude<="56.01" & dados$longitude>="-3.189" & dados$longitude<="-3.058", ]){
dados$neigbourhod <- 4
}
}
Your help would be much appreciated thanks in advance.
Here is a vectorized way. Compare against numbers rather than strings, and use the fact that binary numbers are written as 0/1 times powers of two: each logical test contributes one binary digit. Adding 1 makes the result one-based.
i2 <- dados$latitude >= 55.95 & dados$latitude <= 56.01
i1 <- dados$longitude >= -3.198 & dados$longitude <= -3.058
dados$neighbourhood <- 1 + i1 + i2*2
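To make the encoding concrete, here is a small sketch on made-up coordinates. The sample points, the names north, east and inside, and the extra step that marks points outside the overall bounding box as NA are assumptions added for illustration.

```r
# Made-up sample points: one per neighbourhood, plus one outside the bounding box.
dados <- data.frame(
  latitude  = c(55.90, 55.90, 55.98, 55.98, 57.00),
  longitude = c(-3.300, -3.100, -3.300, -3.100, -3.100)
)

north <- dados$latitude >= 55.95 & dados$latitude <= 56.01     # contributes 2
east  <- dados$longitude >= -3.198 & dados$longitude <= -3.058 # contributes 1

# Logicals are coerced to 0/1 in arithmetic, so this yields 1, 2, 3 or 4.
dados$neighbourhood <- 1 + east + 2 * north

# Assumed extra step: points outside the overall box get NA instead of a number.
inside <- dados$latitude >= 55.84 & dados$latitude <= 56.01 &
  dados$longitude >= -3.444 & dados$longitude <= -3.058
dados$neighbourhood[!inside] <- NA

print(dados)
```

No loop is needed because every comparison operates on the whole column at once, which also avoids the "condition has length > 1" error.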
Often, you need to execute some statements only when a condition is met. You can use the following conditional constructs to do this.
if statement: executes a block of code if a specified condition is true
else statement: executes a block of code if that same condition is false
else if statement: specifies a new condition to test if the first condition is false
ifelse() function: checks the condition for every element of a vector
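The difference between the scalar constructs and ifelse() can be sketched as follows. The function name describe and the sample vector are made up for illustration.

```r
x <- c(-2, 0, 3)

# if / else if / else expects a single TRUE or FALSE, so it handles one value at a time.
describe <- function(v) {
  if (v > 0) {
    "positive"
  } else if (v < 0) {
    "negative"
  } else {
    "zero"
  }
}
one_by_one <- vapply(x, describe, character(1))

# ifelse() tests every element of the vector in a single call.
all_at_once <- ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))

print(one_by_one)
print(all_at_once)
```

Both approaches give the same labels here; passing the whole vector x directly to if is what triggers the "condition has length > 1" error.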

R: How to use current function output's within this function

I wrote an R function using if and else if. See the code below:
i_ti_12 <- function(x){
if (x <= 44)
{ti = exp(-13.2238 + 0.152568*x)}
else if (x >= 49)
{ti = -0.01245109 + 0.000315605*x}
else (x > 44 & x < 49)
{ti = (x-44)*(i_ti_12(49))/(49-44) + (49-x)*(i_ti_12(44))/(49-44)}
return(ti)
}
I want to use this function's output, i_ti_12(49) within this function, but the above code doesn't work. The output is:
> i_ti_12(49)
Error: C stack usage 7974292 is too close to the limit
The simple solution is to replace i_ti_12(49) with -0.01245109 + 0.000315605*49, but that is not a clean way to solve it and might not work in more complex cases.
So I would like to know whether there is a clean way to do this: that is, as in the simple example above, to write a conditional function that uses the output of one of its own branches. Any help is highly appreciated.
Your last else is followed by a condition, (x > 44 & x < 49), which is not valid: else takes no condition. R treats (x > 44 & x < 49) as the statement that the else executes, so the block ti = (x-44)*(i_ti_12(49))/(49-44) + (49-x)*(i_ti_12(44))/(49-44) sits outside your if-else structure and runs on every call.
As a result, calling i_ti_12(49) calls i_ti_12(49) again without ever reaching a terminating case, so the recursion never ends.
You can try the code below:
i_ti_12 <- function(x) {
  if (x <= 44) {
    ti <- exp(-13.2238 + 0.152568 * x)
  } else if (x >= 49) {
    ti <- -0.01245109 + 0.000315605 * x
  } else {
    # 44 < x < 49: interpolate linearly between the two end points
    ti <- (x - 44) * i_ti_12(49) / (49 - 44) + (49 - x) * i_ti_12(44) / (49 - 44)
  }
  return(ti)
}
such that
> i_ti_12(49)
[1] 0.003013555
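An alternative sketch that avoids recursion altogether is to give each branch its own small function and reuse those at the end points. The names ti_low, ti_high and i_ti_12_vec are hypothetical; the formulas are the ones from the question. Using ifelse also makes the function accept a whole vector of x values.

```r
# The two outer branches as standalone functions (hypothetical names).
ti_low  <- function(x) exp(-13.2238 + 0.152568 * x)   # used for x <= 44
ti_high <- function(x) -0.01245109 + 0.000315605 * x  # used for x >= 49

i_ti_12_vec <- function(x) {
  # Linear interpolation weight: 0 at x = 44, 1 at x = 49.
  w <- (x - 44) / (49 - 44)
  interp <- w * ti_high(49) + (1 - w) * ti_low(44)
  ifelse(x <= 44, ti_low(x), ifelse(x >= 49, ti_high(x), interp))
}

values <- i_ti_12_vec(c(40, 44, 46.5, 49, 60))
print(values)
```

Since the end-point values come from ti_low(44) and ti_high(49) directly, changing either formula automatically updates the interpolated middle section.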

Null objects in list when splitting data with R

I'm new to R and have an assignment to do some basic research using R.
I have imported a CSV file with wind direction and wind speed data, and I want to split the wind speeds by direction.
So I wrote this bit of R code:
north.ls = list()
east.ls = list()
south.ls = list()
west.ls = list()
i = as.integer(1)
print("start")
for (i in 1:length(DD)) {
  if ((DD[i] >= 315 & DD[i] <= 360) | (DD[i] >= 1 & DD[i] < 45)) {
    north.ls[[i]] = as.integer(FH[i])
    print("north")
  }
  if (DD[i] >= 45 & DD[i] < 135) {
    east.ls[[i]] = as.integer(FH[i])
    print("east")
  }
  if (DD[i] >= 135 & DD[i] < 225) {
    south.ls[[i]] = as.integer(FH[i])
    print("south")
  }
  if (DD[i] >= 225 & DD[i] < 315) {
    west.ls[[i]] = as.integer(FH[i])
    print("west")
  }
}
This works and puts the right speeds in the right lists, but whenever a condition is not met the list still gets a NULL entry, so I end up with a lot of NULL values in the lists. What is the problem and how can I fix it?
I hope my explanation is clear.
Thanks in advance.
When you create a new item on a list at position [i] without items in previous positions, all those positions get NULLs.
Here's a slightly better way of producing what you're trying to do (I'm making some educated guesses about your data structure and your goals), without introducing these NULLs:
north.ls <- FH[(DD >= 315 & DD <= 360) | (DD >= 1 & DD < 45)]
east.ls <- FH[DD >= 45 & DD < 135]
south.ls <- FH[DD >= 135 & DD < 225]
west.ls <- FH[DD >= 225 & DD < 315]
This will give you four vectors that divide the data in FH into north, east, south, and west based on the data in DD. The length of each of the four vectors is NOT equal to the length of FH or DD (or each other), and there should be no NULLs introduced unless they're already in FH.
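The following sketch first reproduces the NULL padding and then shows one possible alternative that labels every observation with a sector and splits once. The sample DD and FH values are made up, and shifting the directions by 45 degrees so that north becomes a single interval is an assumption of this sketch; unlike the code above, it also counts a direction of 0 as north.

```r
# Assigning to position 3 of an empty list pads positions 1 and 2 with NULL.
gappy <- list()
gappy[[3]] <- 7L
print(length(gappy))       # the list now has three elements
print(is.null(gappy[[1]])) # the first one is NULL

# Made-up wind directions (degrees) and speeds.
DD <- c(350, 10, 90, 180, 270, 44)
FH <- c(5, 6, 7, 8, 9, 10)

# Shift by 45 degrees so north maps to [0, 90), east to [90, 180), and so on.
sector <- cut((DD + 45) %% 360,
              breaks = c(0, 90, 180, 270, 360),
              labels = c("north", "east", "south", "west"),
              right = FALSE)

# split() returns a named list with one vector of speeds per sector.
by_sector <- split(FH, sector)
print(by_sector)
```

Keeping the four groups in one named list also makes it easy to apply the same summary to each, for example with sapply(by_sector, mean).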

How to get equivalent to R ifelse functionality, using logical operators

Can anyone advise how I can rewrite the following statement so that it does the same thing but does NOT use ifelse?
finalmark <- ifelse(y>=50, 0.2*x+0.8*y, ifelse(y<50 & x>70, y+10, ifelse(y<50 & x<70, y)))
x=80
y=60
So the final code should give an answer of 64, selecting the first condition. I will then test it to ensure the other conditions give the correct result for varying values of x and y.
Thanks a lot.
This should work:
finalmark <- (x * 0.2 + y * 0.8) * (y >= 50) + (y + 10 * (x > 70)) * (y < 50)
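This works because R coerces TRUE and FALSE to 1 and 0 in arithmetic, so each comparison switches its term on or off. A quick sketch, wrapping the expression in a hypothetical function final_mark, checks one case per branch:

```r
# Hypothetical wrapper around the expression above; works on vectors too.
final_mark <- function(x, y) {
  (0.2 * x + 0.8 * y) * (y >= 50) + (y + 10 * (x > 70)) * (y < 50)
}

# One case per branch: y >= 50; y < 50 with x > 70; y < 50 with x < 70.
marks <- final_mark(x = c(80, 80, 60), y = c(60, 40, 40))
print(marks)
```

The first element corresponds to the x = 80, y = 60 case from the question.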
Something like this?
if(y>=50){
0.2*x+0.8*y
}else{
if(y<50 & x>70){
y+10
}else{
if(y<50 & x<70){
y
}else{
"OMG I did not expect this scenario"
}
}
}
try: y=45; x=70 to see why I have the last condition.
If y is a number, then once you've tested for y >= 50 and it is false, y must be less than 50, so don't keep testing for that. Similarly, once you've tested x > 70, you don't need the last ifelse. You also don't return anything for x = 70; my guess is that you want a <= or >= test there.
ifelse(y>=50, 0.2*x+0.8*y, ifelse(x>70, y+10, y))
In scalar form, that's
if(y >= 50){
0.2*x+0.8*y
}else if(x > 70){
y+10
}else y
Given you seem to be having a hard time in general writing the logic I suggest you post a more complete question. It's possible (probable) that you're doing something here that you really don't want to do.
There are several approaches you can take. Below are a few examples of building a function 'f', so that 'f(x,y)' meets your criteria listed in the question using logic other than 'ifelse' statements.
Note: I'm also adding in one amendment to the original post, since 'x=70' would break the logic. I'm adding 'x>=70' to the second criterion.
Option 1: Use a standard 'if / else if / else' logic block. Personally, I like this option, because it's easily readable.
f <- function(x, y){
if (y>= 50){
return(0.2*x+0.8*y)
} else if (y < 50 & x >= 70){
return(y+10)
} else {
return(y)
}
}
Option 2: Combine your two logical tests (there are really only two) into a string, and use a switch. Note that the final and unnamed option is treated as an 'else'.
f <- function(x, y){
return(
switch(paste(x >= 70, y >= 50, sep=""),
TRUEFALSE = y + 10,
FALSEFALSE = y,
0.2*x+0.8*y
)
)
}
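As a sanity check, the switch version can be compared against a plain if chain on a small grid that includes the boundary values x = 70 and y = 50. The names f_if and f_switch are hypothetical restatements of Options 1 and 2, and the grid values are made up.

```r
# Compact restatement of the if / else if / else logic.
f_if <- function(x, y) {
  if (y >= 50) 0.2 * x + 0.8 * y else if (x >= 70) y + 10 else y
}

# The switch version: the key is "TRUEFALSE", "FALSEFALSE", "TRUETRUE" or "FALSETRUE".
f_switch <- function(x, y) {
  switch(paste(x >= 70, y >= 50, sep = ""),
         TRUEFALSE = y + 10,
         FALSEFALSE = y,
         0.2 * x + 0.8 * y)  # unnamed default covers both keys ending in TRUE
}

# switch() takes a single key, so apply both functions pair by pair with mapply().
grid <- expand.grid(x = c(60, 70, 80), y = c(40, 50, 60))
via_if <- mapply(f_if, grid$x, grid$y)
via_switch <- mapply(f_switch, grid$x, grid$y)
print(cbind(grid, via_if, via_switch))
```

Because switch dispatches on a single string, this form is scalar only; for whole vectors, the ifelse or logical-arithmetic versions are the more natural fit.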
Option 3: Order your 'if' statements to reduce logical comparisons. This is the sort of thing to do if you have a large data set or very limited memory. This is slightly harder to troubleshoot, since you have to read the whole block to fully understand it. Option 1 is better if you don't have memory or cycle limitations.
f <- function(x, y){
if (y >= 50){
return(0.2*x+0.8*y)
} else {
if (x >=70){
return(y+10)
} else {
return(y)
}
}
}
There are other options, but these are the simplest that come readily to mind.

Resources