R as.numeric in a formula - r

I have the following data frame:
id<-c(1,2,3,4)
x<-c(0,2,1,0)
A<-c(1,3,4,3)
df<-data.frame(id,x,A)
Now I want to create a variable called value such that, for each id, value equals A+1/x if x>0 and equals A if x=0.
To do this I typed:
value <- df$A + as.numeric(df$x > 0)*(1/df$x)
What I expect is a vector as follows:
[1] 1 3.5 5.0 3
However, what I get from the above command is:
[1] NaN 3.5 5.0 NaN
I wonder if anyone can help me with this problem.
Thanks in advance!

I think it would be simpler to use function ifelse in this case:
ifelse(df$x>0, df$A + (1/df$x), df$A)
[1] 1.0 3.5 5.0 3.0
See ?ifelse for more details.
With the command you were trying to apply, as.numeric(df$x>0) does give you a vector of 1s and 0s (so it was a good idea), but 1/df$x is Inf wherever df$x is 0, and in R 0*Inf is NaN rather than 0. That is why the first and last elements come out as NaN.
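To see where the NaN comes from, here is a small sketch using the vectors from the question. The intermediate names inv and mask are mine, introduced only for illustration:

```r
x <- c(0, 2, 1, 0)
A <- c(1, 3, 4, 3)

inv  <- 1 / x               # Inf where x is 0
mask <- as.numeric(x > 0)   # 0 where x is 0, 1 elsewhere
prod_term <- mask * inv     # 0 * Inf is NaN, so this is NaN where x is 0

# ifelse() takes A for those positions, so the Inf is never used
value <- ifelse(x > 0, A + 1 / x, A)
value
```

ifelse() still evaluates A + 1/x for every element, but it only picks that result where the test is TRUE, so the Inf at x == 0 is discarded.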

Related

Calculate a value in a column for each row

Here is the table that I am trying to manipulate:
colnames sampA sampB
#1 conA conB
#2 1.1 4.4
#3 2.2 5.5
#4 3.3 6.6
I want to calculate log2(x(1-x)) for each number in $sampB. Here is my code so far:
DF[-1,3] <- apply(DF[-1,]$sampB,1,function(x) log2(x(1-x)))
then I got the error message:
dim(X) must have a positive length
You shouldn't need apply(), as log2() is vectorized. Try this
x <- as.numeric(as.character(DF$sampB[-1]))
log2(x * (1 - x))
I took off the first element because I'm not really sure what that conB part is about (and now you have confirmed it in the comments). I also suspect that the column might be a factor (because of conB), so I wrapped the column in as.numeric(as.character(...)). That may not be necessary, but better safe than sorry.
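A minimal sketch of why the as.character() step matters, assuming the column really is a factor. The vector sampB below is a made-up stand-in for DF$sampB; I picked values between 0 and 1 so that x * (1 - x) is positive (with the values shown in the question it would be negative and log2() would return NaN):

```r
# Hypothetical stand-in for DF$sampB: a factor whose first element is a label
sampB <- factor(c("conB", "0.2", "0.5", "0.8"))

codes <- as.numeric(sampB[-1])             # integer level codes, not the values
x <- as.numeric(as.character(sampB[-1]))   # the actual numbers
res <- log2(x * (1 - x))                   # vectorized, no apply() needed

codes
x
res
```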

How to replace NaN and Inf using 1 of the 2 matrices value?

I have 2 large matrices with 100 columns and 10,000 rows. I am trying to divide a by b and calculate log2, and then abs().
a<-c(2,0,2,3,0,6)
b<-c(0,1,4,6,0,6)
a/b
Inf 0.0 0.5 0.5 NaN 1.0
I would like to use "2" instead of "Inf" and use "0" instead of NaN.
You could do:
res <- a/b
res[is.infinite(res)] <- a[is.infinite(res)]
res[is.nan(res)] <- 0
Although, note, the documentation to is.nan says:
Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
which of those two is not guaranteed and may depend on the R
platform (since compilers may re-order computations).
I get NaN though.
It looks to me that you just don't want to divide by 0 and when b==0 just return the corresponding value of a. If that's the case, no need to work on Inf and NaN: just replace the 0s in b with ones.
b[b==0]<-1
a/b
#[1] 2.0 0.0 0.5 0.5 0.0 1.0
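Since the question is about matrices, here is a sketch of the same idea on two small matrices. They are just the example vectors reshaped, not the real data, and b_safe is a name I made up so that b itself is left untouched:

```r
# Small stand-ins for the large matrices described in the question
a <- matrix(c(2, 0, 2, 3, 0, 6), nrow = 2)
b <- matrix(c(0, 1, 4, 6, 0, 6), nrow = 2)

b_safe <- b
b_safe[b_safe == 0] <- 1   # dividing by 1 returns the value of a unchanged
res <- a / b_safe
res
```

If the next step is abs(log2(res)), keep in mind that entries equal to 0 will become Inf, because log2(0) is -Inf.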

Is there any subtle difference between 'mean' and 'average' in R?

Is there any difference between what these two lines of code do:
mv_avg[i-2] <- (sum(file1$rtn[i-2]:file1$rtn[i+2])/5)
and
mv_avg[i-2] <- mean(file1$rtn[i-2]:file1$rtn[i+2])
I'm trying to calculate the moving average of first 5 elements in my dataset. I was running a for loop and the two lines are giving different outputs. Sorry for not providing the data and the rest of the code for you guys to execute and see (can't do that, some issues).
I just want to know if they both do the same thing or if there's a subtle difference between them both.
It's not an issue with mean or sum. The example below illustrates what's happening with your code:
x = seq(0.5,5,0.5)
i = 8
# Your code
x[i-2]:x[i+2]
[1] 3 4 5
# Index this way to get the five values for the moving average
x[(i-2):(i+2)]
[1] 3.0 3.5 4.0 4.5 5.0
x[i-2]=3 and x[i+2]=5, so x[i-2]:x[i+2] is equivalent to 3:5. You're seeing different results with mean and sum because your code is not returning 5 values. Therefore dividing the sum by 5 does not give you the average. In my example, sum(c(3,4,5))/5 != mean(c(3,4,5)).
@G.Grothendieck mentioned rollmean. Here's an example:
library(zoo)
rollmean(x, k=5, align="center")
[1] 1.5 2.0 2.5 3.0 3.5 4.0
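If you would rather not load zoo, the centred five-point window can also be written in base R with the corrected indexing. This is only a sketch; k, half and centres are names I chose for illustration:

```r
x <- seq(0.5, 5, 0.5)
k <- 5
half <- (k - 1) / 2

# Centres for which a full window of k values exists
centres <- (half + 1):(length(x) - half)

# Index with (i - half):(i + half), not x[i - half]:x[i + half]
mv_avg <- sapply(centres, function(i) mean(x[(i - half):(i + half)]))
mv_avg
```

Because each window now holds exactly k values, sum(...)/k and mean(...) give the same result.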

Coercing a character to a numeric in R

I'm a newbie to R and I've learnt that a character string like "12.5" can be coerced to a numeric using as.numeric() function, which gives me the following result.
> as.numeric("12.5")
[1] 12.5
But when I try the following, the result doesn't contain the fractional part.
> as.numeric("12.0")
[1] 12
Is there a way to keep the fractional part in the result?
Thanks in advance...
Is it really necessary to print whole numbers that way? If you're worried about how it will appear with other numbers, say in a vector or data frame, not to worry. If you have at least one decimal number in the vector, the whole number will appear as a decimal as well.
> as.numeric(c("12.0", "12.1"))
## [1] 12.0 12.1
> data.frame(x = as.numeric(c("12.0", "12.1")))
## x
## 1 12.0
## 2 12.1
If it's simply for appearance purposes, there are a few functions that can make 12.0 appear numeric. Keep in mind, however, that this does not coerce to numeric, even though it looks like it does.
> noquote("12.0")
## [1] 12.0
> cat("12.0")
## 12.0
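Going the other way, if you already have the number and only want it displayed with one decimal place, format() and sprintf() can do that. Both return a character string, so this is for display only. A small sketch (s1 and s2 are just illustrative names):

```r
x <- as.numeric("12.0")

s1 <- format(x, nsmall = 1)   # at least one digit after the decimal point
s2 <- sprintf("%.1f", x)      # exactly one digit after the decimal point

s1
s2
is.character(s1)              # TRUE: the result is text, not a number
```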

Delete rows with negative values

In R I am trying to delete rows within a dataframe (ants) which have a negative value under the column heading Turbidity. I have tried
ants<-ants[ants$Turbidity<0,]
but it returns the following error:
Warning message:
In Ops.factor(ants$Turbidity, 0) : < not meaningful for factors
Any ideas why this may be? Perhaps I need to make the negative values
NA before I then delete all NAs?
Any ideas much appreciated, thank you!
@Joris: the result is
str(ants$Turbidity)
num [1:291] 0 0 -0.1 -0.2 -0.2 -0.5 0.1 -0.4 0 -0.2 ...
Marek is right, it's a data problem. Now be careful if you use as.numeric(ants$Turbidity) directly, as that will always be positive. It gives the internal integer codes of the factor levels (1 to nlevels(ants$Turbidity)), not the original numeric values.
Try this :
tt <- as.numeric(as.character(ants$Turbidity))
which(is.na(tt))
It will give you a list of indices where the value was not numeric in the first place. This should enable you to first clean up your data.
eg:
> Turbidity <- factor(c(1,2,3,4,5,6,7,8,9,0,"a"))
> tt <- as.numeric(as.character(Turbidity))
Warning message:
NAs introduced by coercion
> which(is.na(tt))
[1] 11
You shouldn't use the as.numeric(as.character(...)) structure to convert problematic data, as it will generate NA's that will mess with the rest. Eg:
> Turbidity[tt > 5]
[1] 6 7 8 9 <NA>
Levels: 0 1 2 3 4 5 6 7 8 9 a
Always do summary(ants) after reading in data, and check if you get what you expect.
It will save you lots of problems. Numeric data is prone to magic conversion to character or factor types.
EDIT. I forgot about the as.character conversion (see Joris's comment).
The message means that ants$Turbidity is a factor. It will work when you do
ants <- ants[as.numeric(as.character(ants$Turbidity)) >= 0,]
or
ants <- subset(ants, as.numeric(as.character(Turbidity)) >= 0)
But the real problem is that your data are not prepared for analysis. Such a conversion should be done at the beginning. You should also be careful because there could be non-numeric values in the column.
This should also work using the tidyverse (assuming the column is already numeric).
ants %>% dplyr::filter(Turbidity >= 0)
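Putting the advice above together, here is a base R sketch of converting first and filtering afterwards. The data frame is a made-up stand-in for ants (the site column and the "n/a" entry are my assumptions, not something from the question):

```r
# Hypothetical stand-in for ants, with Turbidity read in as a factor
ants <- data.frame(
  site = 1:5,
  Turbidity = factor(c("0", "-0.1", "0.1", "n/a", "-0.5"))
)

# Convert once, up front; entries that are not numbers become NA
ants$Turbidity <- suppressWarnings(as.numeric(as.character(ants$Turbidity)))

# Look at the rows that failed to convert before deciding what to do with them
bad <- ants[is.na(ants$Turbidity), ]
bad

# Keep only rows with a non-missing, non-negative Turbidity
clean <- ants[!is.na(ants$Turbidity) & ants$Turbidity >= 0, ]
clean
```

The explicit !is.na() matters with bracket indexing: a comparison with NA yields NA, and indexing rows by NA produces a row full of NAs instead of dropping it.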
