Avoid NaN and Inf when dividing in R (using within formula) - r

I'm trying to add a column to my data set in R with the Within formula.
Data set name: Full_Stats
Objective: Add Minutespergoal column using within formula
Formula
Full_Stats2<-within(Full_Stats,
{Minutespergoal<-Minutes_played/Goal })
The formula works fine, but I'd like to avoid having NaN and Inf in the result. How could I fix this?
Please let me know if any question.
Thanks

NaN occurs by dividing zero by zero, and infinity occurs by dividing a non-zero number by zero. You can avoid these by making sure that your denominator Goal is never zero. Assuming you wanted to remove these values you could try:
Full_Stats2<-within(Full_Stats,
{Minutespergoal<-Minutes_played/Goal })[Goal != 0]

Related

R mean of one column based on another [duplicate]

I have a dataset named bwght which contains the variable cigs (cigarattes smoked per day)
When I calculate the mean of cigs in the dataset bwght using:
mean(bwght$cigs), I get a number 2.08.
Only 212 of the 1388 women in the sample smoke (and 1176 does not smoke):
summary(bwght$cigs>0) gives the result:
Mode FALSE TRUE NA's
logical 1176 212 0
I'm asked to find the average of cigs among the women who smoke (the 212).
I'm having a hard time finding the right syntax for excluding the non smokers = 0
I have tried:
mean(bwght$cigs| bwght$cigs>0)
mean(bwght$cigs>0 | bwght$cigs=TRUE)
if (bwght$cigs > 0){
sum(bwght$cigs)
}
x <-as.numeric(bwght$cigs, rm="0");
mean(x)
But nothing seems to work! Can anyone please help me??
If you want to exclude the non-smokers, you have a few options. The easiest is probably this:
mean(bwght[bwght$cigs>0,"cigs"])
With a data frame, the first variable is the row and the next is the column. So, you can subset using dataframe[1,2] to get the first row, second column. You can also use logic in the row selection. By using bwght$cigs>0 as the first element, you are subsetting to only have the rows where cigs is not zero.
Your other ones didn't work for the following reasons:
mean(bwght$cigs| bwght$cigs>0)
This is effectively a logical comparison. You're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean on it. I'm not totally sure, but I think R can't even take data typed as logical for the mean() function.
mean(bwght$cigs>0 | bwght$cigs=TRUE)
Same problem. You use the | sign, which returns a logical, and R is trying to take the mean of logicals.
if(bwght$cigs > 0){sum(bwght$cigs)}
By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In that example, you are using bwght$cigs > 0 as the if condition, which won't work because R will only look at the first element of the vector resulting from bwght$cigs > 0. R handles looping differently from SAS - check out functions like lapply, tapply, and so on.
x <-as.numeric(bwght$cigs, rm="0")
mean(x)
I honestly don't know what this would do. It might work if rm="0" didn't have quotes...?
mean(bwght[bwght$cigs>0,"cigs"])
I found the statement failed, returning "argument is not numeric or logical: returning NA"
Converting to matrix solved this:
mean(data.matrix(bwght[bwght$cigs>0,"cigs"]))

Octave: Values inside a matrix that are close

I have a vector that is being filled with random numbers within this range [0,1]. I want to somehow accept only the vectors, in which an element inside of it has a maximum deviation of 0,02 from its previous one and its next one.
For example I have the below vector [3,1]. This is acceptable, because the deviation of the 2nd element, between the first and the third element is not bigger than 0,02. Vector is not always consisted of 3 rows, it could be more.
**Vector**
0.32957
0.33097
0.33946
This is what i thought:
n=4
P=rand(1,n);
sort(P,"ascend");
for L=2:n
while P(L-1)-P(L)>0.02
P=rand(1,n);
endwhile
endfor
Vectorize this!
isvalid=~any(diff(sort(a))>0.02);
sort(a) : if its not sorted, sort
diff() : take the difference between adjacent elements
___ >0.02: Check if any of those differences is bigger than what you accept
~any(): if any is bigger, then return zero, "not valid".
From your code, it seems that there may be more to the question than what you ask, you seem to have the XY problem. You want to create a random vector that has the properties that you describe. You seem to be using uniform random numbers, so let me propose a way to generate your vector where your conditions are always true.
a(1)=rand(1); %or any other way to generate a first value.
length=100; %desired length.
a(2:length)=rand(length-1,1)*0.02; %generate random numbers never bigger than 0.02
a=cumsum(a); %cumulative sum
This ensures the vector is increasing in value, and never increasing more than 0.02

How to compute for the mean and sd

I need help on 4b please
‘Warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables namely, breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute for the mean and the standard deviation of the breaks variable for those observations with breaks value not exceeding 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is what I understood this problem and typed the code as in the last two lines. However, I wasn't able to run the last two lines while the first 3 lines ran successfully. Can anybody tell me what is the error here?
Thanks! :)
Another way to go about it:
This way you aren't generating a bunch of datasets and then working on remembering which is which. This is more a personal thing though.
data(warpbreaks)
mean(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
There are two problems with your code. The first is that you are comparing to 30, but you're looking at the entire data frame, rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that refers to the breaks being less than thirty.
But mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R will evaluate the inner expression as a vector of boolean TRUE/FALSE values indicating whether that break is less than 30.
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)

How to do a PCA with 0 (zero) values

I want to do a PCA in R with monthly rainfall values. Since there is no rain during winter, quite a few values in my columns are 0.
When I run the PCA, the following message appears in the console: Error in cov.wt(z) : 'x' must contain finite values only
I think what R is telling me here is that it does not like my 0 values.
So, I tried to change my 0 values to 'real numbers' by multiplying everything with 1.0000000001. But even if I do that and run R again with the new values, it pops up with the same message.
I read that I would need to either get rid of the rows with any missing values in them (which I can't) or use a PCA code that can deal with missing values by somehow imputing them. But my 0's are actual values, not missing values.
I find a lot of information on the web on how to deal with missing values or NA values but nothing on how to deal with zero values. Does anyone have any suggestions how I can do this? Many thanks for your help!
My guess is that "Error in cov.wt(z) : 'x' must contain finite values only" is complaining that some covariances are non-finite, i.e. NA/NaN. This can happen if you have variables that have a standard deviation of 0.
Example code:
latent = rnorm(10)
data = data.frame(rep(0,10), #10 0's
latent+rnorm(10), #latent and noise
latent+rnorm(10), #
latent+1.5*rnorm(10)) #
colnames(data) = c("zeros","var1","var2","var3")
library(psych)
principal(data) #error!
principal(data[-1]) #no errors

Conditional mean statement

I have a dataset named bwght which contains the variable cigs (cigarattes smoked per day)
When I calculate the mean of cigs in the dataset bwght using:
mean(bwght$cigs), I get a number 2.08.
Only 212 of the 1388 women in the sample smoke (and 1176 does not smoke):
summary(bwght$cigs>0) gives the result:
Mode FALSE TRUE NA's
logical 1176 212 0
I'm asked to find the average of cigs among the women who smoke (the 212).
I'm having a hard time finding the right syntax for excluding the non smokers = 0
I have tried:
mean(bwght$cigs| bwght$cigs>0)
mean(bwght$cigs>0 | bwght$cigs=TRUE)
if (bwght$cigs > 0){
sum(bwght$cigs)
}
x <-as.numeric(bwght$cigs, rm="0");
mean(x)
But nothing seems to work! Can anyone please help me??
If you want to exclude the non-smokers, you have a few options. The easiest is probably this:
mean(bwght[bwght$cigs>0,"cigs"])
With a data frame, the first variable is the row and the next is the column. So, you can subset using dataframe[1,2] to get the first row, second column. You can also use logic in the row selection. By using bwght$cigs>0 as the first element, you are subsetting to only have the rows where cigs is not zero.
Your other ones didn't work for the following reasons:
mean(bwght$cigs| bwght$cigs>0)
This is effectively a logical comparison. You're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean on it. I'm not totally sure, but I think R can't even take data typed as logical for the mean() function.
mean(bwght$cigs>0 | bwght$cigs=TRUE)
Same problem. You use the | sign, which returns a logical, and R is trying to take the mean of logicals.
if(bwght$cigs > 0){sum(bwght$cigs)}
By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In that example, you are using bwght$cigs > 0 as the if condition, which won't work because R will only look at the first element of the vector resulting from bwght$cigs > 0. R handles looping differently from SAS - check out functions like lapply, tapply, and so on.
x <-as.numeric(bwght$cigs, rm="0")
mean(x)
I honestly don't know what this would do. It might work if rm="0" didn't have quotes...?
mean(bwght[bwght$cigs>0,"cigs"])
I found the statement failed, returning "argument is not numeric or logical: returning NA"
Converting to matrix solved this:
mean(data.matrix(bwght[bwght$cigs>0,"cigs"]))

Resources