ifelse behaviour with the which() function in R

I've been playing with some basic functions, and it seems rather strange how ifelse behaves when I use which() as the value to return when the condition is true, e.g.:
#I want to identify the location of all values above 6.5
#only if there are more than 90 values in the vector a:
set.seed(100)
a <- rnorm(100, mean=5, sd=1)
ifelse(length(a)>90, which(a>6.5), NA)
I get this output:
[1] 4
When in fact it should be the following:
[1] 4 15 25 40 44 47 65
How can I make ifelse return the correct values using which()?
It seems to output only the first value that matches the condition. Why does it do that?

You actually don't want to use ifelse in this case. As BondedDust pointed out, you should think of ifelse as a function that takes three vectors and picks values out of the last two based on the TRUE/FALSE values in the first. Or, as the documentation puts it:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
Since your test, length(a) > 90, is a logical vector of length one, the result also has length one, so you only ever see the first element of which(a > 6.5).
You probably simply wanted to use a regular if statement instead.
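For example (a minimal sketch, reusing the vector a from the question):
if (length(a) > 90) which(a > 6.5) else NA
# [1]  4 15 25 40 44 47 65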
One potential confusion with ifelse is that it does recycle arguments. Specifically, if we do
ifelse(rnorm(10) < 0,-1,1)
you'll note that the first argument is a logical vector of length 10, but the other two "vectors" are both of length one. R will simply recycle them as needed to match the length of the first argument, even when the shorter lengths don't divide evenly into it.
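Here is a small sketch of that recycling with made-up values; note that no warning is issued even though 3 does not divide 10:
# the 'yes' argument is recycled positionally, as rep(c(-1, -2, -3), length.out = 10)
ifelse(rnorm(10) < 0, c(-1, -2, -3), 1)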

R returns incorrect answers for logical operations?

I am trying to subset a dataframe based on values in a single column (Column_A), using code similar to the following:
new_df <- subset(df, df$Column_A<4)
I noticed that this code returns all rows where the value for Column_A is less than 4...as well as one row where the value is 12.4 (so, greater than 4).
I tried to look more closely at what R believes the value of this cell to be--df$Column_A[[2]] returned the expected value of 12.4.
I then tested several other variants of this logical operation, e.g. df$Column_A[[2]] < 12, df$Column_A[[2]] < 11, df$Column_A[[2]] < 10, df$Column_A[[2]] < 9...
The first three expressions returned the expected answer ("FALSE"). However, df$Column_A[[2]]<9 and all variants of this expression with lower values (e.g. <8, <7...) return the answer ("TRUE"). This is clearly incorrect.
I have no idea what is causing this and would really appreciate any insight.
This can happen if the class of the column is character; the comparison is then done on strings, after coercing the number to character:
"12.4" < 4
[1] TRUE
The remedy is to convert to numeric first and then subset:
df$Column_A <- as.numeric(df$Column_A)
subset(df, Column_A < 4)
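That lexicographic comparison also explains the pattern in the question: strings compare character by character, so only the first differing character matters (exact collation is locale-dependent, but ASCII digits order this way in common locales):
"12.4" < "12"   # FALSE: equal up to "12", and the longer string sorts after
"12.4" < "9"    # TRUE:  "1" sorts before "9"
"12.4" < "4"    # TRUE:  "1" sorts before "4", hence the stray row in the subset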

Why does min() not return the actual minimum value?

I'm trying to find the minimum value in a list of numerical values in a table using the function min() in R, but I've noticed that it sometimes doesn't return the actual lowest numerical value.
For instance, if the list consisted of 7.760, 12.015, 13.043, and 70.789, min(list) would return 12.015 as the minimum value and not 7.760. I also noticed that manually sorting the column from lowest to highest using the arrows at the top of the table gives the order 12.015, 13.043, 7.760, 70.789, as if the values were ranked by their first digit rather than as whole numbers.
Is there a way to fix it or a different function to use in this case?
Sounds like your numbers are actually character strings:
vec <- c(7.760, 12.015, 13.043, 70.789)
min(vec)
# [1] 7.76
vec <- as.character(vec)
min(vec)
# [1] "12.015"
Simply convert them to numeric with as.numeric(vec) and min() will behave as expected.
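The sorting behaviour you saw fits the same picture: character vectors sort lexicographically (and locale-dependently), while numeric vectors sort by value. A quick sketch:
vec <- c("7.760", "12.015", "13.043", "70.789")
sort(vec)              # "12.015" "13.043" "7.760" "70.789" in a C-like locale
sort(as.numeric(vec))  # 7.760 12.015 13.043 70.789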

How does the function argument work in R's 'combn'?

Despite reading the documentation, I'm struggling to understand how the function argument works in the combn utility.
I have a table with two columns of data. For each column, I want to calculate the ratio of every unique pair of values in that column. Let's focus on one column for simplicity:
V1
1 342.3
2 123.5
3 472.0
4 678.3
...
14 567.2
I can use the following to return all the unique combinations:
combn(table[,1], 2)
but of course this just returns each pair of values. I want to divide them to get a ratio, but can't seem to figure out how to set this up.
I understand that for something like outer you can just provide the operator as the argument, but how does this transfer to combn?
combn(table[,1], 2, FUN = "/")
# obviously not correct
The issue is that the function receives exactly one argument: a vector containing the elements of that particular combination. The / function requires two separate arguments, not a single vector of values. Instead you could write
combn(table[,1], 2, FUN = function(x) x[1]/x[2])
Here the anonymous function receives a single vector x, and we divide its first element by its second.
Other functions such as
combn(1:4, 2, FUN = sum)
work just fine because they expect to receive a single vector of values.
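For instance, applied to the first four sample values from the question (a quick sketch; output rounded for readability):
v <- c(342.3, 123.5, 472.0, 678.3)
round(combn(v, 2, FUN = function(x) x[1]/x[2]), 3)
# [1] 2.772 0.725 0.505 0.262 0.182 0.696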

Issue summing columns

I have a very large dataset and I'm trying to get the sums of values. The variables are binary with 0s and 1s.
Somehow, when I run a for loop
for (i in 7:39) {
  agegroup1[53640, i] <- sum(agegroup1[, i])
}
The loop runs, but every column except the first ends up containing nothing but NA. When I inspect the values I can see the 0s and 1s, and checking the class returns "integer", yet the sums still don't work.
Any advice?
cs <- colSums(agegroup1[, 7:39])
will give you the vector of column sums without looping (at the R level).
If you have any missing values (NAs) in agegroup1[, 7:39] then you may want to add na.rm = TRUE to the colSums() call (or even your sum() call).
You don't say what agegroup1 is or how many rows it has etc, but to finalise what your loop is doing, you then need
agegroup1[53640, 7:39] <- cs
What was in agegroup1[53640, ] before you started adding the column sums? NA? If so that would explain some behaviour.
We do really need more detail though...
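Putting that together, a minimal sketch with a made-up stand-in for agegroup1 (5 rows, 3 binary columns):
set.seed(42)
agegroup1 <- as.data.frame(matrix(rbinom(15, 1, 0.5), nrow = 5))
cs <- colSums(agegroup1, na.rm = TRUE)
agegroup1[nrow(agegroup1) + 1, ] <- cs   # append the column totals as a new row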
@Gavin Simpson provided a workable solution, but alternatively you could use apply, which applies a function over the row or column margin of a matrix or data frame.
x <- cbind(x1=1, x2=c(1:8), y=runif(8))
# If you wanted to sum the rows of columns 2 and 3
apply(x[,2:3], 1, sum, na.rm=TRUE)
# If you want to sum the columns of columns 2 and 3
apply(x[,2:3], 2, sum, na.rm=TRUE)

Two data formatting questions for R

I have two questions, both of which I believe are fairly simple, dealing with R.
I would like to create an IF statement that assigns NA to certain rows of a set of columns. I have tried the following command:
a[a[,21] == 0, 5:10] <- NA
the error says:
Error in `[<-.data.frame`(`*tmp*`, a[, 21] == 0, 5:10, value = NA) : missing values are not allowed in subscripted assignments of data frames
Essentially that code is supposed to take any 0 value in column 21 and replace the values in columns 5 to 10 of that row with NA. There are already NAs in column 21, but I am not sure whether that has an effect here?
I am not sure how to craft this next function at all. I need to manipulate data that contains positive and negative controls. However, when I manipulate the data, I don't want the positive and negative control values to be a part of the manipulation, but I do want them to remain in the columns because I have to use them later. Is there any way to temporarily ignore these values so they aren't included in the manipulation?
Here is some sample data:
L = c(2,1,4,3,1,4,2,4,5,1)
R = c(2,4,5,1,"Neg",2,"",1,2,1)
T = c(2,1,4,2,"CTRL",2,"PCTRL",2,1,4)
test <- data.frame(L=L,R=R,T=T)
I would like to be able to temporarily ignore these rows based on the strings "Neg", "CTRL", "" and "PCTRL" rather than on their position in the data frame, if possible. Notice that for the negative control, "Neg" and "CTRL" sit in separate columns of the same row, just as the positive control has a blank and "PCTRL" in separate columns of the same row. Is there any way to do this given these odd conditions?
Hope this was written clearly enough, and I thank anyone in advance for taking the time to help me!
Try this for subsetting your dataframe to those rows where R is not "Neg":
subset(test, R!="Neg")
For the NA problem: you probably already have NAs in your data frame, right? Try whether this works (%in% returns FALSE rather than NA for missing values, so the row subscript contains no NAs):
a[a[,21] %in% 0, 5:10] <- NA
Try instead:
a[which(a[,21] == 0), 5:10] <- NA
Explanation: the == operation returns NA values and the `[<-` function doesn't accept them. The which function returns a numeric vector and "throws away" the NAs. As an aside, the [ function (without the <-) would return rows that are entirely NA wherever the subscript is NA. This is considered a 'feature', but I find it an 'annoyance', so I typically use which for selection as well as for selective assignment.
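A quick sketch of how these subscripts differ:
x <- c(0, 7, NA, 0)
x == 0          # TRUE FALSE    NA TRUE  -- the NA propagates
x %in% 0        # TRUE FALSE FALSE TRUE  -- NA becomes FALSE
which(x == 0)   # 1 4                    -- numeric positions, NAs dropped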
For the first problem: if a[,21] is NA, do you also want to assign NA to columns 5:10? If so, map the NAs to 0 first so that they match the condition:
a[replace(a[,21],is.na(a[,21]),0)==0,5:10] <- NA
Otherwise, map the NAs to something nonzero so those rows are left alone ("1" is used here, but any nonzero value works):
a[replace(a[,21],is.na(a[,21]),1)==0,5:10] <- NA
As for the second problem,
subset(test, !(R %in% c("Neg","") | T %in% c("CTRL","PCTRL")))
This covers the case where the filtering conditions in R and T do not always coincide; if they always coincide, you can test just one of R or T (on the sample data, either way drops rows 5 and 7). Also, keep in mind that T stands for TRUE in S, S-PLUS, and R (it still does); you can reassign another value to T and things will be okay, but it's generally discouraged (the same goes for c, which people also like to assign to).
