Argument length zero if statement - r

I have a data frame named flow with over 17,000 entries, which contains daily water quality data for about 50 years. I have a column that holds the jday (day of the year) of each entry, but now I want to assign each entry a season from 1 to 4 (winter, spring, summer, fall). This is what I have so far:
for(i in flow){
  if (flow$jdays[i] <= 80 | flow$jdays[i] > 355){
    flow$season[i] <- 1
  } else if (flow$jdays[i] > 80 & flow$jdays[i] <= 172){
    flow$season[i] <- 2
  }
  else if(flow$jdays[i] > 172 & flow$jdays[i] <= 264){
    flow$season[i] <- 3
  }
  else{
    flow$season[i] <- 4
  }
}
I keep getting the following message:
Error in if (flow$jdays[i] <= 80 | flow$jdays[i] > 355) { :
argument is of length zero

This may be a better approach:
flow$season<-ifelse(flow$jdays<=80 | flow$jdays>355 ,1,
ifelse(flow$jdays<=172,2,
ifelse(flow$jdays<=264,3,4)))

This is in error:
for(i in flow){
Change to:
for(i in seq(nrow(flow))){
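To see why the original loop header misbehaves, here is a minimal sketch on a made-up, five-row stand-in for flow (the values are invented for illustration). A data frame is a list of columns, so for (i in flow) hands the loop a whole column at a time rather than a row index.

```r
# Toy stand-in for the asker's data frame (values are made up)
flow <- data.frame(jdays = c(10, 100, 200, 300, 360))

# Iterating over a data frame visits its columns, so `i` is an entire
# column here, not a row number.
for (i in flow) print(length(i))

# Iterating over row indices visits one row at a time.
flow$season <- NA_real_
for (i in seq_len(nrow(flow))) {
  jd <- flow$jdays[i]
  if (jd <= 80 || jd > 355) {
    flow$season[i] <- 1
  } else if (jd <= 172) {
    flow$season[i] <- 2
  } else if (jd <= 264) {
    flow$season[i] <- 3
  } else {
    flow$season[i] <- 4
  }
}
flow$season
```

seq_len(nrow(flow)) is used here instead of seq(nrow(flow)) only because it also behaves sensibly for a data frame with zero rows.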

A vectorized solution using ifelse:
transform(flow, season=
ifelse (jdays <= 80 | jdays>355, 1,
ifelse(jdays <= 172,2,
ifelse(jdays <= 264, 3, 4))))


How can you check whether a sequence is an 'almost increasing sequence' in R?

A sequence (e.g. c(1,2,3,4)) is almost increasing when we can remove exactly one element from the sequence and get a strictly increasing sequence (i.e. a0 < a1 < ... < an). I'm trying to find a way to check whether a sequence is almost increasing. If it is, I want to return TRUE; if it isn't I want to output FALSE. I've got this far:
solution <- function(sequence) {
sequence1 <- unlist(sequence)
if (length(sequence1) == 1) {
next
}
count <- 0
for (i in (length(sequence1) - 1)) {
if (sequence1[i + 1] > sequence1[i]) {
next
} else if (((sequence1[i + 2] > sequence1[i]) & count == 0) & i !=
length(sequence1)-1) {
sequence1 <- sequence1[- (i + 1)]
count <- count + 1
} else if ((sequence1[i + 1] > sequence1[i - 1]) & count == 0 & i != 1) {
sequence1 <- sequence1[-i]
count <- count + 1
} else {
return(FALSE)
}
}
return(TRUE)
}
I've used unlist() because codesignal, for some reason, doesn't allow you to refer to the function argument within the function. This works for some sequences: solution(c(4,1,5)) correctly returns TRUE. It doesn't work for others: solution(c(1, 1, 1, 2, 3)) incorrectly returns TRUE. solution(c(2,1,2,1)) correctly returns FALSE and yet solution(c(1,2,1,2)) incorrectly returns TRUE. I've lost my grip on what's going on. I wonder if anyone can spot anything?
Clarification: the basic idea of my code is to iterate through the sequence and, for each element, check whether its right neighbour is a bigger number. If it isn't, then we have two options: get rid of i or get rid of i+1, so I check those in turn. Since we can only make one change, I've added the condition that if count is 1, then we skip to the end. Also, if the index is 1 then we can't check i-1, and if the index is length(sequence)-1, then we can't check i+2, so I've added those conditions to make sure my code skips to the other option if appropriate.
Here is a solution which works for me. The idea is that diff(x) has a non-positive element for every step in x that fails to go up. For example, min(diff(x)) is positive if x is strictly increasing. If diff(x)[i] <= 0 for exactly one index i, we have to check whether removing either x[i] or x[i+1] makes the sequence strictly increasing. The following function passed all tests I tried:
check_almost <- function(x) {
if (length(x) < 2) {
return(TRUE)
}
d <- diff(x)
i <- which(d <= 0)
if (length(i) == 0) {
return(TRUE) # strictly increasing
} else if (length(i) > 1) {
return(FALSE)
}
return(i == 1 || # we can remove x[1]
i == length(d) || # we can remove x[length(x)]
d[i-1]+d[i] > 0 || # we can remove x[i]
d[i] + d[i+1] > 0) # we can remove x[i+1]
}
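One way to gain confidence in a function like this is to compare it against a slow but obvious reference. The sketch below is not part of the answer above; almost_brute is a hypothetical name, and it simply tries deleting each element in turn. Like check_almost, it treats sequences shorter than two elements as TRUE.

```r
# Hypothetical brute-force reference: delete each element in turn and
# test whether what remains is strictly increasing.
almost_brute <- function(x) {
  if (length(x) < 2) return(TRUE)
  for (k in seq_along(x)) {
    if (all(diff(x[-k]) > 0)) return(TRUE)
  }
  FALSE
}

almost_brute(c(4, 1, 5))        # TRUE
almost_brute(c(1, 1, 1, 2, 3))  # FALSE
almost_brute(c(1, 2, 1, 2))     # FALSE
almost_brute(c(2, 1, 2, 1))     # FALSE
```

Running both functions over many random short vectors and comparing the results is a cheap check; the brute-force version is quadratic, so it is only meant for testing.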

I am trying to use an if else loop to check multiple columns for data ranges, how do I make it so that it ignores NAs in my data

I am trying to check numbers in three columns in a data frame and if they're within a certain range, I want a certain output. I have this part of the code, but one of my tests wants to know if all three are negative, then I get a certain output. My issue is that some of the data in some of the columns are NA. I want to ignore the NAs in my logic. Is there a way to do this? A sample of my code is below.
if((DataWSGR$RouteType == 7 | DataWSGR$RouteType == 9) & (DataWSGR$SGR > 5 ) & (0 < DataWSGR$`30_Year_SGR` < 5) & (0 < DataWSGR$`20_Year_SGR` < 5) & (0 < DataWSGR$`10_Year_SGR` < 5)) {}
The 10, 20, and 30 year SGRs are the columns that will have NAs in them.
After fixing the range condition, I think you can just add | is.na(var) to the last three conditions:
if ((DataWSGR$RouteType == 7 |
  DataWSGR$RouteType == 9) &
  (DataWSGR$SGR > 5) &
  (DataWSGR$`30_Year_SGR` > 0 & DataWSGR$`30_Year_SGR` < 5 | is.na(DataWSGR$`30_Year_SGR`)) &
  (DataWSGR$`20_Year_SGR` > 0 & DataWSGR$`20_Year_SGR` < 5 | is.na(DataWSGR$`20_Year_SGR`)) &
  (DataWSGR$`10_Year_SGR` > 0 & DataWSGR$`10_Year_SGR` < 5 | is.na(DataWSGR$`10_Year_SGR`))) {
}
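If DataWSGR has more than one row, the condition above has length greater than one, which if() rejects (an error in R 4.2.0 and later, a warning in earlier versions).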
Here is a reproducible example for doing this in a for loop:
df <- data.frame(
route_type = c(7, 6, 9),
sgr = c(6, 3, 6),
sgr_30 = c(3, 1, NA),
sgr_20 = c(1, 1, NA),
sgr_10 = c(2, 1, NA)
)
for (i in 1:nrow(df)) {
if (
(df$route_type[i] == 7 | df$route_type[i] == 9) &
(df$sgr[i] > 5) &
(df$sgr_30[i] > 0 & df$sgr_30[i] < 5 | is.na(df$sgr_30[i])) &
(df$sgr_20[i] > 0 & df$sgr_20[i] < 5 | is.na(df$sgr_20[i])) &
(df$sgr_10[i] > 0 & df$sgr_10[i] < 5 | is.na(df$sgr_10[i]))
) {
print(paste("In range in row", i))
}
}
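The same check can also be written without a loop, since every operator involved is vectorized. This is a minimal sketch on the example data; in_range_or_na is a hypothetical helper name, not something from the question.

```r
df <- data.frame(
  route_type = c(7, 6, 9),
  sgr        = c(6, 3, 6),
  sgr_30     = c(3, 1, NA),
  sgr_20     = c(1, 1, NA),
  sgr_10     = c(2, 1, NA)
)

# Hypothetical helper: TRUE when x lies strictly between lo and hi,
# and also TRUE when x is missing.
in_range_or_na <- function(x, lo = 0, hi = 5) {
  is.na(x) | (x > lo & x < hi)
}

keep <- df$route_type %in% c(7, 9) &
  df$sgr > 5 &
  in_range_or_na(df$sgr_30) &
  in_range_or_na(df$sgr_20) &
  in_range_or_na(df$sgr_10)

which(keep)  # row numbers that satisfy every condition
```

keep is a logical vector with one entry per row, so it can be used directly for subsetting, for example df[keep, ].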

Problem with a loop in BMI calculator R language

I have a problem with a school task -> BMI calculator
Here is my code:
# Ranges
niedowaga <- seq(16.00, 18.40, 0.01)
norma <- seq(18.50, 24.90, 0.01)
nadwaga <- seq(25.00, 30.00, 0.01)
print(niedowaga)
print(norma)
print(nadwaga)
# Question
waga = as.integer(readline(prompt="Enter your weight: "))
wzrost = as.integer(readline(prompt="Enter your height in cm: "))
# Formula
bmi <- waga/wzrost**2 * 10000
# Round the BMI to one decimal place
bmi_round <-round(bmi, digits = 2)
# Print the rounded BMI
print(bmi_round)
# Check the BMI against the ranges
for(bmi_round in niedowaga) {
if(bmi == niedowaga) {
print("Niedowaga")
}
}
else {
if (bmi == norma) {
print("Norma")
}
}
else if (bmi == nadwaga) {
print("Nadwaga")
}
I have three sequence variables: niedowaga (underweight), norma (normal), and nadwaga (overweight).
I have calculated the BMI.
Now I need a loop that checks the computed BMI: bmi_round has to be matched against the three sequences niedowaga, norma, and nadwaga, and the output should say which one it falls into.
How can I do this?
Sorry for the language in the comments and variable names, it's Polish ;)
A loop is not needed for this:
waga = 30L #changed from readline
wzrost = 60L #changed from readline
bmi <- waga/wzrost**2 * 10000
ifelse(bmi >= 16 & bmi < 18.5, 'Niedowaga',
ifelse(bmi >= 18.5 & bmi < 25, 'Norma',
ifelse(bmi >=25 & bmi <= 30, 'Nadwaga', 'outside normal range')))
# or
dplyr::case_when(bmi >= 16 & bmi < 18.5 ~ 'Niedowaga',
bmi >= 18.5 & bmi < 25 ~ 'Norma',
bmi >=25 & bmi <= 30 ~ 'Nadwaga',
TRUE ~ 'outside normal range')
For your loop, there are several errors. It appears that you are trying to compare the bmi_round variable with every element of each sequence. Instead, your loop isn't really doing anything: bmi_round is overwritten with each element of niedowaga in turn and is never used in the comparison. Here is one way to change it:
bmi_round <- 23
for(nied in niedowaga){
if (bmi_round == nied) print("Niedowaga")
}
for (norm in norma){
if (bmi_round == norm) print("Norma")
}
# [1] "Norma"
for (nad in nadwaga){
if (bmi_round == nad) print("Nadwaga")
}
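Comparing a computed value with == against the elements of a seq() grid can miss because of floating-point rounding, so binning by interval is usually safer. The following is a minimal sketch using cut(), with made-up inputs and the same boundaries as the ifelse version; values outside 16 to 30 come back as NA.

```r
# Made-up inputs for illustration
waga <- 70      # weight in kg
wzrost <- 175   # height in cm
bmi <- waga / wzrost^2 * 10000

category <- cut(
  bmi,
  breaks = c(16, 18.5, 25, 30),
  labels = c("Niedowaga", "Norma", "Nadwaga"),
  right = FALSE,          # intervals are [16, 18.5), [18.5, 25), [25, 30]
  include.lowest = TRUE   # with right = FALSE this closes the last interval at 30
)
as.character(category)
```

cut() is vectorized, so the same call works unchanged if bmi is a whole column of values.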

Squeeze extreme ranges in a data.frame

I have a data.frame which contains 3 columns named start, end and width. Each row represents a segment over a 1D space with a start, an end and a width such that width = end - start + 1.
Here is an example
d = data.frame(
start = c(12, 50, 100, 130, 190),
end = c(16, 80, 102, 142, 201)
)
d$width = d$end - d$start + 1
print(d)
start end width
1 12 16 5
2 50 80 31
3 100 102 3
4 130 142 13
5 190 201 12
Consider two breakpoints and a factor of division
UpperPos = 112
LowerPos = 61
factor = 2
I would like to shrink the part of each segment that lies outside the two breakpoints by a factor of factor. If a segment overlaps a breakpoint, then only the part of the segment that is outside this breakpoint should be reduced in width. In addition, the width of each segment must be a multiple of 3 and must be non-zero.
Here is my current function that "squeeze" the segments
squeeze = function(d, factor, LowerPos, UpperPos)
{
for (row in 1:nrow(d))
{
if (d[row,]$end <= LowerPos | d[row,]$end >= UpperPos) # Complete squeeze
{
middlePos = round(d[row,]$start + d[row,]$width/2)
d[row,]$width = round(d[row,]$width / factor)
d[row,]$width = d[row,]$width - d[row,]$width %% 3 + 3
d[row,]$start = round(middlePos - d[row,]$width/2)
d[row,]$end = d[row,]$start + d[row,]$width -1
} else if (d[row,]$start <= LowerPos & d[row,]$end >= LowerPos) # Partial squeeze (Lower)
{
d[row,]$start = round(LowerPos - (LowerPos - d[row,]$start)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$start = d[row,]$start - add
}
} else if (d[row,]$start >= UpperPos & d[row,]$end <= UpperPos) # Partial squeeze (Upper)
{
d[row,]$end = round(UpperPos + (d[row,]$end - UpperPos)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$end = d[row,]$start + add
}
} else if (!(d[row,]$end < UpperPos & d[row,]$start > LowerPos) )
{
print(d)
print(paste("row is ",row))
print(paste("LowerPos is ",LowerPos))
print(paste("UpperPos is ",UpperPos))
stop("In MyRanges_squeeze: Should not run this line!")
}
}
return(d)
}
and it returns the expected output
squeeze(d, factor, LowerPos, UpperPos)
start end width
1 12 14 3
2 54 80 27
3 100 102 3
4 132 140 9
5 192 200 9
However, my function squeeze is way too slow. Can you help me to improve it?
Note that this answer only addresses how one may speed up your function, which is what you asked in your question, and not the validity of your logic with respect to your requirements.
As far as I can tell, all of your operations use vectorized operators. So, there is no need to loop over rows in squeeze. In the following, I have encapsulated all of your code that is within the if-else blocks as separate vectorized functions:
## This computes the case where d$end <= LowerPos | d$end >= UpperPos
f1 <- function(d, factor) {
middlePos = round(d$start + d$width/2)
d$width = round(d$width / factor)
d$width = d$width - d$width %% 3 + 3
d$start = round(middlePos - d$width/2)
d$end = d$start + d$width -1
d
}
## This is used below in f2; only rows whose width is not a multiple of 3 are adjusted
f4 <- function(d) {
needs = d$width %% 3 != 0
add = 3 - d$width %% 3
d$width[needs] = d$width[needs] + add[needs]
d$start[needs] = d$start[needs] - add[needs]
d
}
## This computes the case where d$start <= LowerPos & d$end >= LowerPos
f2 <- function(d, factor, LowerPos) {
d$start = round(LowerPos - (LowerPos - d$start)/factor)
d$width = d$end - d$start + 1
f4(d) # ifelse() cannot pick between two data frames row by row
}
## This is used below in f3; same row-wise guard as in f4
f5 <- function(d) {
needs = d$width %% 3 != 0
add = 3 - d$width %% 3
d$width[needs] = d$width[needs] + add[needs]
d$end[needs] = d$start[needs] + add[needs]
d
}
## This computes the case where d$start >= UpperPos & d$end <= UpperPos
f3 <- function(d, factor, UpperPos) {
d$end = round(UpperPos + (d$end - UpperPos)/factor)
d$width = d$end - d$start + 1
f5(d)
}
Now, in squeeze, we use f1, f2, and f3 to compute the squeeze for all three cases separately. We also include the case for no squeeze as just d. We then rbind them into one big data frame, dd, which consists of four blocks of nrow(d) rows each. All that remains is to pick, for each original row, the matching row from the block that corresponds to its case. For this, we compute an index vector ind holding the case (i.e., 1 to 4) for each row, using a series of ifelse calls. The value of ind[k] is the block to choose from, and k is the row to take within that block. We use this to subset dd to get the output.
squeeze <- function(d, factor, LowerPos, UpperPos) {
d1 <- f1(d, factor)
d2 <- f2(d, factor, LowerPos)
d3 <- f3(d, factor, UpperPos)
dd <- do.call(rbind,list(d1,d2,d3,d))
ind <- ifelse(d$end <= LowerPos | d$end >= UpperPos, 1,
ifelse(d$start <= LowerPos & d$end >= LowerPos, 2,
ifelse(d$start >= UpperPos & d$end <= UpperPos, 3, 4)))
dd[(ind-1) * nrow(d) + 1:nrow(d),]
}
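The block-indexing step on the last line of squeeze can be easier to follow on a tiny example. This is a sketch with two made-up one-column data frames, a and b, standing in for the per-case results.

```r
# Two candidate results with the same number of rows (made-up values)
a <- data.frame(v = c(1, 2, 3))
b <- data.frame(v = c(10, 20, 30))

# Stack them: rows 1-3 are block 1 (a), rows 4-6 are block 2 (b)
stacked <- rbind(a, b)

# For each original row, the block its result should come from
ind <- c(1, 2, 1)

# Row k of the output is row k of block ind[k]
picked <- stacked[(ind - 1) * nrow(a) + seq_len(nrow(a)), , drop = FALSE]
picked$v
```

Here the second row is taken from b and the other two from a, which is the same selection that squeeze performs across its four blocks.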
Using this version, the result is the same as yours:
out <- squeeze(d, factor, LowerPos, UpperPos)
## start end width
##1 12 14 3
##7 54 80 27
##18 100 102 3
##4 132 140 9
##5 192 200 9

Error when running R file from command line

I have an R file which imports a file, does some data manipulation, and performs a logistic regression model, and then saves those results to a txt file. However, when I run the file from the command line, I get the following error message and don't know what's going on.
anonymous#anonymous-Latitude-E6520:~/Downloads$ R --no-save < Auto_Model.r > out.txt
Warning message:
NAs introduced by coercion
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x > :
missing value where TRUE/FALSE needed
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous>
Execution halted
anonymous#anonymous-Latitude-E6520:~/Downloads$ R --no-save < Auto_Model.r
The R script which results in the error is below:
> ## IMPORT DATA:
> #setwd("~/Desktop")
> library(foreign)
> dat = read.csv("dat.csv", stringsAsFactors=FALSE)
>
> ## zipcode =
> dat$zipcode = as.character(dat$zipcode)
>
> bin.value = Vectorize(function(x) {
+ if (x == "\\N") NA
+ else if (x > 1 & x < 6999) "1:6999"
+ else if (x > 7000 & x < 9999) "7000:9999"
+ else if (x > 10000 & x < 14849) "10000:14849"
+ else if (x > 14850 & x < 19699) "14850:19699"
+ else if (x > 19700 & x < 29999) "19700:29999"
+ else if (x > 30000 & x < 31999) "30000:31999"
+ else if (x > 32000 & x < 34999) "32000:34999"
+ else if (x > 35000 & x < 42999) "35000:42999"
+ else if (x > 43000 & x < 49999) "43000:49999"
+ else if (x > 50000 & x < 59999) "50000:59999"
+ else if (x > 60000 & x < 69999) "60000:69999"
+ else if (x > 70000 & x < 79999) "70000:79999"
+ else if (x > 80000 & x < 89999) "80000:89999"
+ else if (x > 90000 & x < 96999) "90000:96999"
+ else if (x > 97000 & x < 99820) "97000:99820"
+ else NA
+ })
>
> dat$zipcode2 = as.character(bin.value(as.integer(dat$zipcode)))
Error in if (x == "\\N") NA else if (x > 1 & x < 6999) "1:6999" else if (x > :
missing value where TRUE/FALSE needed
Calls: bin.value -> do.call -> mapply -> .Call -> <Anonymous>
Execution halted
I assume something is wrong in how I am trying to manipulate the mode of the zipcode variable, but nothing I've tried seems to fix the issue.
> str(dat$zipcode)
int [1:12635] 76148 33825 61832 11368 98290 92078 44104 62052 55106 20861 ...
>
It seems to me that what you're trying to do is already done by function cut:
bin.value <- function(x){
cut(as.integer(x),
breaks= c(1,6999,9999,14849,19699,29999,31999,34999,42999,49999,59999,69999,79999,89999,96999,99820),
labels= c("1:6999", "7000:9999", "10000:14849", "14850:19699", "19700:29999", "30000:31999", "32000:34999", "35000:42999", "43000:49999", "50000:59999", "60000:69999", "70000:79999", "80000:89999", "90000:96999", "97000:99820"))
}
Otherwise your specific problem is caused by as.integer:
a <- c("\\N",sample(seq(0,100000,by=1),10))
a
[1] "\\N" "38987" "50403" "75683" "66706" "27924" "17216" "77539" "80658" "2335" "53010"
as.integer(a)
[1] NA 38987 50403 75683 66706 27924 17216 77539 80658 2335 53010
\\N is therefore converted straight away to NA, which your if chain only handles at the end; meanwhile, every if statement tries to compare a missing value with something, which yields NA rather than TRUE or FALSE.
as.integer(a)[1]=="\\N"
[1] NA # Instead of TRUE or FALSE
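If the if chain is kept rather than replaced by cut, one option is to test for missing values first with is.na(). A minimal sketch with only two of the bins follows; bin_small is a hypothetical name and the input values are made up.

```r
# Hypothetical two-bin version of the question's function. The NA test
# comes first, so later comparisons never see a missing value.
bin_small <- Vectorize(function(x) {
  if (is.na(x)) NA_character_
  else if (x > 1 & x < 6999) "1:6999"
  else if (x > 7000 & x < 9999) "7000:9999"
  else NA_character_
})

# "\\N" cannot be parsed as an integer, so it becomes NA (with a warning,
# silenced here)
z <- suppressWarnings(as.integer(c("\\N", "5000", "8000")))
res <- unname(bin_small(z))
res
```

Note that the strict inequalities copied from the question leave boundary values such as 6999 and 7000 unbinned.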
