R: Ignorance of logical condition

I am trying to find the k maximum values in a global data.frame and update a logical variable outlier in each iteration. This is the code I am using:
while(k > 0){
  #find the max value index
  index = which(data$value==max(data$value[which(data$brand==brand_rm &
                  data$outlier == FALSE)],na.rm = T))[1]
  #update outlier column
  data$outlier[index] <<- TRUE
  k = k - 1
}
However, it only works in the first iteration and ignores the data$outlier == FALSE condition in later iterations. Can you please explain why this happens and what I am doing wrong?

I found a better approach:
while(k > 0){
  #find the outlier index
  row_to_remove = which(data[,"value"]==max(data[data$outlier ==FALSE &
                  data$brand==brand_rm ,"value"]))
  data$outlier[row_to_remove] <<- TRUE
  k = k - 1
}

If I understand your question correctly, you don't even need a loop for that. See my reproducible example below.
data <- data.frame(value=11:20, outlier=FALSE)
k <- 2
kth_value <- sort(data$value, decreasing = TRUE)[k] #Get the kth highest value
data$outlier <- data$value >= kth_value #Compare to the kth highest value
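One likely reason the loop in the question keeps hitting the same row is that which(data$value == max(...))[1] searches every row of data, so a row that is already flagged (or belongs to another brand) with the same value can be returned again. The sketch below restricts the search to one brand and flags its k largest rows without a loop. The example data and the values of brand_rm and k are made up for illustration; only the column names come from the question.

```r
# Hypothetical example data; the column names follow the question.
data <- data.frame(
  brand   = c("a", "a", "b", "a", "b", "a"),
  value   = c(5, 9, 12, 9, 3, 1),
  outlier = FALSE
)
brand_rm <- "a"  # assumed brand of interest
k <- 2           # assumed number of rows to flag

# Row indices that belong to the brand of interest
candidates <- which(data$brand == brand_rm)

# Sort those rows by value, largest first, and keep at most k of them
top_k <- head(candidates[order(data$value[candidates], decreasing = TRUE)], k)

# Flag exactly those rows; ties are counted as separate rows
data$outlier[top_k] <- TRUE
data
```

Because the indices are taken from candidates only, rows of other brands are never flagged, and tied values are handled row by row rather than by value.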


R programming: How to set while loop condition based if all required values in vector have been copied from sample?

I am new to R and I'm trying to see how many iterations are needed to fill a vector with the numbers 1 to 55 (no duplicates) from a random sample using runif.
At the moment, the vector has a lot of duplicates in it and the number of iterations being returned is the size of the vector, so I'm not sure if my logic is correct.
The aim of the if statement is to check whether the value from the sample already exists in the vector and, if it does, to choose the next one. But I'm not sure if it's correct, since the next number could also already exist in the vector. Any help would be much appreciated.
numbers=as.integer(runif(800, min=1, max=55)) ## my sample from runif
i=sample(numbers, 1)
## setting up my vector to store 55 unique values (1 to 55)
p=rep(0,55)
## my counters
j=0
n=1
## my while loop
while (p[n] %in% 0){
  ## if the sample value already exists in the vector, choose the next value from the sample
  if (numbers[n] %in% p) {
    p[n]=numbers[n+1]
  }
  else {
    p[n] = numbers[n]
  }
  n = n + 1
  j = j + 1
}
I believe that the following is what you want. Instead of a while loop on p, the while loop should search for a new value in numbers.
set.seed(2021) # make the results reproducible
numbers <- sample(55, 800, TRUE)
## setting up my vector to store 55 unique values (1 to 55)
p <- integer(55)
# assign the elements of p one by one
for(j in seq_along(p)){
  ## if the sample value already exists in the vector,
  ## choose the next value from the sample
  n <- 1
  while (numbers[n] %in% p) {
    n <- n + 1
  }
  if(n <= length(numbers)){
    p[j] <- numbers[n]
  }
}
j
#[1] 55
length(unique(p)) == length(p)
#[1] TRUE
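To answer the "how many iterations" part directly, a loop-free sketch is possible with duplicated(). Note that as.integer(runif(800, min=1, max=55)) truncates toward zero and therefore never produces 55, which is why sample(55, ...) is used here as in the answer above. The name draws_needed is introduced only for this illustration.

```r
set.seed(2021)  # make the results reproducible
numbers <- sample(55, 800, replace = TRUE)

# Positions at which a value appears for the first time
first_seen <- which(!duplicated(numbers))

# Unique values in order of first appearance
p <- numbers[first_seen]

# If all 55 values occur, the last first appearance is the number of
# draws needed to complete the set; otherwise the sample was too short
draws_needed <- if (length(p) == 55) max(first_seen) else NA_integer_
draws_needed
```

This gives the same p as filling the vector one new value at a time, but it scans numbers only once.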

extract data one row below based on specific condition

I have a very large data set (~1 GB) and would like to extract summary data under this condition:
for loop:
if(a[i] == 999) then extract b[i+1]
else next
so that I can then call table(b) to find its distribution/composition. Column b is of class character and column a is of class integer.
My R code:
summary123 <- data.frame()
j = 1
k = 1
for(i in 1:nrow(df1)){
  if(df1$a[i] == 999 & i != nrow(df1)){
    j = i + 1
    summary123[k,1] <- df1$b[j]
    k = k + 1
  }
  else{
    next
  }
}
However, it is taking a long time, and I would like a faster R equivalent.
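Use lead from dplyr:
library(dplyr)
output <- lead(df1$b, 1)[which(df1$a == 999)]
lead pads the end of the vector with NA, so if the last row has a == 999, the last element of output is an NA that does not correspond to a real row. In that case, drop it:
if (isTRUE(df1$a[nrow(df1)] == 999)) output <- output[-length(output)]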
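The same lookup can be sketched in base R without dplyr, by indexing b one position past each match. The small df1 below is made-up example data; only the column names a and b come from the question.

```r
# Hypothetical example data with the column types described in the question
df1 <- data.frame(
  a = c(1L, 999L, 3L, 999L, 999L),
  b = c("u", "v", "w", "x", "y"),
  stringsAsFactors = FALSE
)

# Rows where a equals 999 (which() also drops NA comparisons)
hits <- which(df1$a == 999)

# A match in the last row has no following row, so discard it
hits <- hits[hits < nrow(df1)]

# Values of b one row below each match
next_b <- df1$b[hits + 1]
table(next_b)
```

Because everything is vectorised, this avoids growing a data frame row by row, which is the slow part of the original loop.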

Continuous Error in argument "x" is missing, with no default

I am trying to build a function that takes a numeric vector of homework scores (of length n), and an optional logical argument drop, to compute a single homework value. If drop = TRUE, the lowest HW score must be dropped.
Step 1: a function to get the average
get_average <- function(x,na.rm=TRUE) {
  if(na.rm==TRUE){
    x = remove_missing(x)
  }
  total <- 0
  for (n in 1:length(x)) {
    total = total + x[n]
  }
  return(total/length(x))
}
Step 2: put it all together
score_homework <- function(x,drop=TRUE) {
  if(drop==TRUE)
    x = drop_lowest(x)
  get_average(x)
}
However, I keep getting the error: Error in score_homework() : argument "x" is missing, with no default
I'm not sure this is what you want, but here goes.
First generate some dummy data:
# Set seed
set.seed(1234)
# Generate dummy homework data with <NA> values
homework <- c(rep(NA, 20), rnorm(n = 100, mean = 50, sd = 10))
# Have a quick look
hist(homework)
Then we write the function:
# Make function
homework_func <- function(data, drop = TRUE) {
  # Remove NA
  data <- data[!is.na(data)]
  # Calculate the average depending on whether 'drop' is T or F
  if(drop == TRUE) {
    data <- data[data > min(data)]
    mean(data)
  } else {
    mean(data)
  }
}
# Use function with 'drop = TRUE'
homework_func(data = homework, drop = TRUE)
#> [1] 48.65349
# Use function with 'drop = FALSE'
homework_func(data = homework, drop = FALSE)
#> [1] 48.43238
Here is a function to drop the lowest score that's less complicated than the version in the original post. I sort the scores in descending order in case there is a tie for the lowest score. In that case, we should only remove one instance of the lowest score. Also, you're really better off using R's mean() function than writing your own.
scores <- c(78,93,61,NA,61,83,92,95,NA,100)
removeMinScore <- function(x) {
  x <- x[order(-x)]  # order descending
  x <- x[!is.na(x)]  # remove NAs
  head(x, -1)        # return all but lowest score, removes only 1 tied value
}
That said, if you must write your own version of mean(), here is a simpler approach that takes advantage of existing R functions.
TIP: Since !is.na(x) returns a vector of TRUE and FALSE values, you can sum these to count the number of non-missing values in a vector.
mymean <- function(x) {sum(x, na.rm=TRUE) / sum(!is.na(x))}
The modified version of score_homework() would be:
score_homework <- function(x, drop = TRUE) {
  if (drop == TRUE) {
    mean(removeMinScore(x))
  } else {
    mean(x, na.rm = TRUE)
  }
}
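As a quick check, the pieces above can be put together in one self-contained sketch. The helper here uses head(x, -1) to drop the last element after sorting; the scores vector is the one from this answer. The error in the question appears whenever the function is called with no argument at all, as in score_homework(), so the calls below always pass x explicitly.

```r
scores <- c(78, 93, 61, NA, 61, 83, 92, 95, NA, 100)

# Drop NAs and exactly one instance of the lowest score
removeMinScore <- function(x) {
  x <- x[!is.na(x)]                 # remove NAs
  x <- sort(x, decreasing = TRUE)   # lowest score ends up last
  head(x, -1)                       # all but the last element
}

score_homework <- function(x, drop = TRUE) {
  if (drop) {
    mean(removeMinScore(x))
  } else {
    mean(x, na.rm = TRUE)
  }
}

with_drop    <- score_homework(scores)                # one 61 removed
without_drop <- score_homework(scores, drop = FALSE)  # all 8 scores kept
c(with_drop, without_drop)
# [1] 86.000 82.875
```

Only one of the two tied 61s is removed, so the dropped average is 602 / 7 = 86, while the full average is 663 / 8 = 82.875.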

In R: if statement gives 'missing value where TRUE/FALSE needed' error, but there are no missing values

I have written a function to identify peaks in a series of acceleration values. (I am aware of the quantmod package & findPeaks function, but it doesn't identify peaks according to my criteria.) I want to identify a peak as any value that follows three consecutive increases and precedes three consecutive decreases.
Here is my function... I apologise if it is very inelegant, but it's my first attempt at doing this. The vector x is a series of about 900-1200 acceleration values; e.g. 1.003841, 1.003570, 1.003428, 1.003261, 1.003033, 1.002630...
peakFinder <- function(x){
  diffs <- sign(diff(x))
  lags <- 1:length(diffs)
  frame <- data.frame(diffs, lags)
  frame$diffs <- ifelse(is.na(frame$diffs), 0, frame$diffs)
  pks <- 0
  for(l in frame$lags){
    if ((frame[l,1] == 1) & (frame[l+1,1] == 1) & (frame[l+2,1] == 1)
        & (frame[l+3,1] == -1) & (frame[l+4,1] == -1) & (frame[l+5,1] == -1)){
      pks <- c(pks, l+2)
    }
  }
  pks <- pks[-1]
  pks
}
The if statement keeps giving me the error "missing value where TRUE/FALSE needed". This is confusing because there are no missing values in either frame$diffs or frame$lags. I am probably making some other basic error, but I can't figure out what it is.
I would really appreciate some help!
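OK, the error occurs because the loop runs l all the way to the last row, so frame[l+1,1] through frame[l+5,1] index past the end of the data frame and return NA. I think a slightly simplified version that stops the loop early would be this:
x <- c(9,10,12,13,11,9,8,10,12,20,19,18,17) # peaks at 13 and 20
if (length(x) >= 7) # need at least 7 points: 3 increases and 3 decreases
{
  diffs <- sign(diff(x))
  for(i in 3:(length(diffs)-3))
  {
    if (all(diffs[(i-2):(i)]==+1) && all(diffs[(i+1):(i+3)] == -1))
    {
      print(paste("Peak at", x[i+1]))
    }
  }
}
When executed, this prints
[1] "Peak at 13"
[1] "Peak at 20"
So you can adapt it to your function.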
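One way to fold that loop back into a function that returns peak positions instead of printing them is sketched below. It assumes x contains no missing values; the early return for short input is an addition for this illustration.

```r
# Return the indices of x that follow three consecutive increases
# and precede three consecutive decreases (assumes x has no NAs)
peakFinder <- function(x) {
  d <- sign(diff(x))
  n <- length(d)
  if (n < 6) return(integer(0))  # too short to contain such a peak
  pks <- integer(0)
  # stop at n - 3 so that d[i + 3] never indexes past the end
  for (i in 3:(n - 3)) {
    if (all(d[(i - 2):i] == 1) && all(d[(i + 1):(i + 3)] == -1)) {
      pks <- c(pks, i + 1L)  # d[i] is the step from x[i] to x[i + 1]
    }
  }
  pks
}

x <- c(9, 10, 12, 13, 11, 9, 8, 10, 12, 20, 19, 18, 17)
peaks <- peakFinder(x)
x[peaks]
# [1] 13 20
```

The loop bounds are what prevent the "missing value where TRUE/FALSE needed" error: every index used inside the if stays within 1..n.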

Subtracting from random values in a weighted matrix in R

Thanks in advance for your help!
This question is related to one I posted before, but I think it deserves its own post because it is a separate challenge.
Last time I asked about randomly selecting values from a matrix after adding a vector. In that example, the matrix and the vector were both binary. Now I would like to change the values in a weighted matrix after adding a weighted vector. Here is some example code to play with.
require(gamlss.dist)
mat1<-matrix(c(0,0,0,0,1,0, 0,10,0,0,0,5, 0,0,0,0,1,0, 0,0,3,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,1,1,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,0,0,1,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1)) #rZIP is a function from gamlss.dist that randomly selects values from a zero-inflated distribution
vec1[ones]<-temp
The values in the vector are sampled from a zero-inflated distribution (thanks to this question). When I bind the vector to the matrix, I want to randomly select a non zero value from the same column, and subtract the vector value from it. I can see a further complication arising if the vector value is greater than the randomly selected value in the same column. In such an instance, it would simply set that value to zero.
Here is some modified code from the earlier question that does not work for this problem but maybe will be helpful.
foo <- function(mat, vec) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  cols <- which(vec != 0) #select matrix columns where the vector is not zero
  rows <- sapply(seq_along(cols),
                 function(x, mat, cols) {
                   ones <- which(mat[,cols[x]] != 0)
                   out <- if(length(ones) != 0) {
                     ones
                   } else {
                     sample(ones, 1)
                   }
                   out
                 }, mat = mat, cols = cols)
  ind <- (nr*(cols-1)) + rows #this line doesn't work b/c it is not binary
  mat[ind] <- 0 #here is where I would like to subtract the vector value
  mat <- rbind(mat, vec)
  rownames(mat) <- NULL
  mat
}
Any ideas? Thanks again for all of the fantastic help!
EDIT:
Thanks to help from bnaul down below, I am a lot closer to the answer, but we have run into the same problem we hit last time. The sample function doesn't work properly on columns where there is only one nonzero value. I have fixed this using Gavin Simpson's if else statement (which was the solution in the previous case). I've adjusted the matrix to have columns with only one nonzero value.
mat1<-matrix(c(0,0,0,0,1,0, 0,0,0,0,0,5, 0,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,0,0,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,1,0,0,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1))
vec1[ones]<-temp
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
  # all entries except the last one, which is the appended vector value
  nonzero = which(head(col,-1) != 0)
  # sample() misbehaves on a single value, so handle that case separately
  sample_ind = if(length(nonzero) == 1){
    nonzero
  } else{
    sample(nonzero, 1)
  }
  # take max of either 0 or the selected value minus the vector value
  col[sample_ind] = max(0, col[sample_ind] - tail(col,1))
  return(col)
})
Thanks again!
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
  nonzero = which(head(col,-1) != 0)
  sample_ind = sample(nonzero, 1)
  col[sample_ind] = max(0, col[sample_ind] - tail(col,1))
  return(col)
})
I made a couple of simplifications; hopefully they don't conflict with what you had in mind. First, I ignore the requirement that you only operate on the nonzero elements of the vector, since subtracting 0 from anything will not change it. Second, I bind the matrix and vector and then perform the operation column-wise on the result, since this is a bit easier than tracking the indices in two separate data structures and then combining them afterward.
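The single-value problem mentioned in the EDIT comes from sample() itself: when x is a single number of at least 1, sample(x, 1) draws from 1:x instead of returning x. A minimal sketch that guards against this, and also skips columns with no nonzero entries, might look like the following. The names safe_sample and subtract_from_random are hypothetical helpers, and the small matrix is made up so that the result is deterministic.

```r
# sample(x, 1) treats a single number x >= 1 as the range 1:x,
# so return a length-one vector unchanged instead of sampling
safe_sample <- function(x) if (length(x) == 1) x else sample(x, 1)

subtract_from_random <- function(mat, vec) {
  for (j in which(vec != 0)) {           # only columns with a nonzero vector value
    nonzero <- which(mat[, j] != 0)
    if (length(nonzero) == 0) next       # nothing to subtract from in this column
    i <- safe_sample(nonzero)
    mat[i, j] <- max(0, mat[i, j] - vec[j])  # floor the result at zero
  }
  unname(rbind(mat, vec))                # append the vector as the last row
}

# Made-up example: each column has exactly one nonzero entry
mat <- matrix(c(0, 5, 0,
                3, 0, 0), nrow = 3)
vec <- c(2, 10)
res <- subtract_from_random(mat, vec)
res
```

Here the 5 in column 1 becomes 3, the 3 in column 2 is floored at 0 because the vector value is larger, and vec is bound on as the last row.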
