I have a time series in RStudio. Now I want to calculate the log() of this series. I tried the following:
i <- (x-y)
ii <- log(i)
But then I get the following: Warning message: In log(i): NaNs produced
To inspect this I used: table(is.nan(ii)) which gives me the following output:
FALSE TRUE
2480 1
So I assume that there is now 1 NaN in my time series. My question is: what code can I use so that R shows me for which observation a NaN was produced?
Here is a small data sample: i <- c(9,8,4,5,7,1,6,-1,8,4)
Btw, how do I type mathematical formulas on Stack Overflow, for example for log(x)? Many thanks!
As I said in my comment, to know which observation generated the NaN, you can use function which:
i <- c(9,8,4,5,7,1,6,-1,8,4)
which(is.nan(log(i))) # 8
Use the test to subset your original vector with the values that produce NaN:
> i <- c(9,8,4,5,-7,1,6,-1,8,4,Inf,-Inf,NA)
> i[which(is.nan(log(i)))]
[1] -7 -1 -Inf
Warning message:
In log(i) : NaNs produced
Here you see that -7, -1, and -Inf produced NaN.
Note that log(NA) is not NaN; it's NA, which is a different sort of non-number.
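If you also want to keep the series at its original length and avoid the warning entirely, one option (a sketch, not the only approach) is to turn non-positive values into NA before taking the log:

```r
i <- c(9, 8, 4, 5, 7, 1, 6, -1, 8, 4)

# Replace non-positive values with NA before log(), since log()
# is only defined for positive numbers; log(NA) is NA and
# produces no warning.
ii <- log(ifelse(i > 0, i, NA))

which(is.na(ii))  # 8 -- the offending observation
```

This keeps ii aligned with i, so positions still match the original observations.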
I have a data frame with 25 Variables. I want to remove the outliers from it.
I have searched the SO forum and found that people propose various custom solutions in different posts.
Is there some standard R function that removes outliers from the data?
Here are two functions I found while searching. How good are they, or is there some standard, better solution to achieve this in R, in any package?
Or a function to which I pass one column as an argument and which returns the data with outliers removed?
remove_outliers:
Link 1
Removing outliers - quick & dirty:
Link 2
EDIT
The data in my data frame contains continuous data from two sources, i.e. weather and ground. From weather, the predictors are temperature, humidity, wind, rain, and solar radiation; from ground they are groundwater and soil moisture. I want to find a relation between soil moisture and the other variables. I am analysing the data using different models. Now I want to see the results after removing the outliers from the data.
EDIT
I used and edited code from one of the tutorials I referenced above. It works fine when there are some outliers in the data, but it raises an error when there are none. How can I correct this?
Here is code:
outlier_rem <- Data_combined # data frame with 25 vars, a few have outliers
# removing outliers from the column
outliers <- boxplot(outlier_rem$var1, plot=FALSE)$out
#print(outliers)
#ol<-outlier_rem[which(outlier_rem$var1 %in% outliers),]
ol<-outlier_rem[-which(outlier_rem$var1 %in% outliers),]
dim(ol)
boxplot(ol)
Here is the error message when ol ends up with 0 rows:
> dim(ol)
[1] 0 25
> boxplot(ol)
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
  need finite 'ylim' values
I use Chebyshev's inequality as a criterion for dropping extreme values. It has the advantage that it holds for many probability distributions. The rule states that no more than 1/k^2 of the values can be more than k standard deviations away from the mean. For example:
> x <- rchisq(1000, 13)
>
> mean(x)
[1] 12.83906
> sd(x)
[1] 4.93234
>
> Ndesv <- 5
>
> x[x > (mean(x) + Ndesv * sd(x))]
[1] 38.7575
>
> Conf <- (1 - 1 / Ndesv^2)
> print(Conf)
[1] 0.96
>
Hope it helps you.
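Building on this, the rule can be wrapped in a small helper that simply keeps values within k standard deviations of the mean. It also behaves sensibly when there is nothing extreme to drop, which is the situation that broke the boxplot code in the question. The function name and the default k below are my own choices, not a standard API:

```r
# Drop values more than k standard deviations from the mean.
# By Chebyshev's inequality, at most 1/k^2 of the values can
# lie that far out, regardless of the distribution.
remove_extremes <- function(x, k = 5) {
  keep <- abs(x - mean(x, na.rm = TRUE)) <= k * sd(x, na.rm = TRUE)
  x[keep]
}

x <- rchisq(1000, 13)
length(remove_extremes(x))  # at most 1000; unchanged if nothing is extreme
```

Because the function returns x unchanged when no value exceeds the threshold, it never produces an empty result the way `x[-which(...)]` does with `integer(0)`.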
The gamma function should not take a negative integer as an argument. Look at the code below, where something strange happens. Is this a problem with R?
I was using function optim to optimize some function containing:
gamma(sum(alpha))
with respect to alpha. R returns negative alpha.
> gamma(sum(alpha))
[1] 3.753e+14
> sum(alpha)
[1] -3
> gamma(-3)
[1] NaN
Warning message:
In gamma(-3) : NaNs produced
Can somebody explain? Or any suggestion for the optimization?
Thanks!
The gamma function is "not defined" at negative integer argument values, so R returns Not a Number (NaN). The reason for the "strange" behaviour is the floating-point representation of numbers in R: when a number differs from the nearest integer only very slightly, R rounds it during printing (in fact, when you type alpha, R calls print(alpha)). Please see examples of this behaviour below.
gamma(-3)
# [1] NaN
# Warning message:
# In gamma(-3) : NaNs produced
x <- -c(1, 2, 3) / 2 - 1e-15
x
# [1] -0.5 -1.0 -1.5
sum(x)
# [1] -3
gamma(sum(x))
# [1] 5.361428e+13
curve(gamma, xlim = c(-3.5, -2.5))
The curve() call above draws a graph of the gamma function near -3, which shows how it diverges as the argument approaches a negative integer.
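As for the optimization itself, one common workaround (a sketch under the assumption that your real objective tolerates box constraints) is to keep optim away from negative arguments with method = "L-BFGS-B", and to use lgamma instead of gamma for numerical stability. The objective below is a stand-in, not the one from the question:

```r
# Stand-in objective: lgamma() of the parameter sum plus a
# quadratic penalty. lgamma() avoids the overflow that gamma()
# hits for large arguments.
f <- function(alpha) {
  lgamma(sum(alpha)) + sum(alpha^2)
}

# Box constraints keep every component of alpha strictly positive,
# so gamma/lgamma is never evaluated at a negative integer.
res <- optim(par = c(1, 1, 1), fn = f,
             method = "L-BFGS-B", lower = 1e-6)
res$par  # stays in the positive orthant
```

If the parameters must be positive by the model's definition, an alternative is to optimize over log(alpha) and exponentiate inside the objective, which removes the constraint altogether.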
I am doing logistic regression and want to remove outliers with the help of Cook's distance, so I was trying to cbind my dataset and the Cook's distance values.
I have removed missing values, so that's not an issue. I don't have an "(x observations deleted due to missingness)" line in my summary.
Following is my code:
fit<-glm(CHURN~CHILDREN+CREDITA+CREDITAA+CREDITB+ CREDITC+CREDITDE+CREDITGY+ CREDITZ+PRIZMRUR+
PRIZMUB+PRIZMTWN+REFURB+WEBCAP+TRUCK+RV+OCCPROF+OCCCLER+ OCCCRFT+OCCSTUD+OCCHMKR+
OCCRET+ OCCSELF+OWNRENT+MARRYUN+MARRYYES+MARRYNO+ MAILORD+MAILRES+MAILFLAG+TRAVEL+PCOWN+
CREDITCD+ NEWCELLY+NEWCELLN+INCMISS +MCYCLE+SETPRCM + REVENUE +MOU+RECCHRGE+
DIRECTAS+OVERAGE+ROAM+CHANGEM+CHANGER+DROPVCE+BLCKVCE+ UNANSVCE+CUSTCARE+THREEWAY+
MOUREC+OUTCALLS+INCALLS+PEAKVCE+OPEAKVCE+DROPBLK+ CALLFWDV+CALLWAIT+MONTHS+UNIQSUBS+
ACTVSUBS+PHONES+MODELS+EQPDAYS+AGE1+AGE2+REFER+INCOME+ CREDITAD+SETPRC,data = mydata1,
family = binomial(logit))
summary(fit)
cd <- cooks.distance(fit)
mydata2<-cbind(mydata1,cd)
I get the error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 40000, 38941
My dataset (mydata1) has 40000 rows and cd has 38941 values.
Why is this happening?
Building on what JDL suggested in the comments, this is "probably" due to missing or inappropriate data.
To explain, I have slightly altered the help example for the cooks.distance function by editing the yi variable to contain a single NA value.
xi <- 1:5
yi <- c(0,2,14,19,NA) # number of mice responding to dose xi
mi <- rep(40, 5) # number of mice exposed
glmI <- glm(cbind(yi, mi -yi) ~ xi, family = binomial)
summary(glmI)
If you run this, you can see that the code still works. However, if you run the next line of that help example, instead of getting 5 output values (the same length as xi and yi) you will get 4, due to the NA value in yi.
signif(cooks.distance(glmI), 3)
1 2 3 4
0.311 0.258 1.430 13.100
You might get similar problems if there are Infs or other impossible values that "break" the glm fit. Note that if you look at summary(glmI), it contains the line:
(1 observation deleted due to missingness)
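A practical fix for the row mismatch: cooks.distance returns a named vector whose names are the row labels of the observations actually used in the fit, so you can align it back against the full data instead of cbind-ing blindly. A sketch using the toy example above:

```r
# cooks.distance() returns a named vector; its names are the row
# labels of the observations actually used in the fit, so we can
# align it back against the full data frame.
xi <- 1:5
yi <- c(0, 2, 14, 19, NA)  # one NA, so glm() drops row 5
mi <- rep(40, 5)
glmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial)

cd <- cooks.distance(glmI)       # length 4, names "1".."4"
full <- data.frame(xi, yi)
full$cd <- NA_real_
full[names(cd), "cd"] <- cd      # row 5 stays NA
full
```

The result has one row per original observation, with NA in the cd column wherever the fit dropped a row, so downstream subsetting by Cook's distance still lines up with the full data.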
Can someone please explain what I am doing wrong here? I want to find a confidence interval for the average response of my variable list1. R has an example online using the faithful dataset, and it works fine. However, whenever I try to find a confidence/prediction interval, I ALWAYS get this error message. I have been at this for 5 hours and tried a million different things; nothing works.
> list1 <- c(1,2,3,4,5) #first data set
> list2 <- c(2,4,5,6,7) # second data set
> frame <- data.frame(list1,list2) # made a data.frame object
> reg <- lm(list1~list2,data=frame) # regression
> newD = data.frame(list1 = 2.3) #new data input for confidence/prediction interval estimation
> predict(reg,newdata=newD,interval="confidence")
fit lwr upr
1 0.7297297 -0.08625234 1.545712
2 2.3513514 1.88024388 2.822459
3 3.1621622 2.73210185 3.592222
4 3.9729730 3.45214407 4.493802
5 4.7837838 4.09033237 5.477235
Warning message:
'newdata' had 1 row but variables found have 5 rows #Why does this keep happening??
The problem is that you are trying to pass in a new independent variable for prediction, but the name of that predictor matches the dependent variable from the initial model. The formula syntax in the regression is y ~ x. When you use the predict() function, you can only pass new independent (x) variables. See the Details section of ?predict for more.
This however seems to work:
newD2 = data.frame(list2 = 2.3) #note the name is list2 and not list1
predict(reg, newdata = newD2, interval = "confidence")
fit lwr upr
1 0.972973 0.2194464 1.7265
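If you instead want an interval for a single new observation rather than for the mean response, the same call with interval = "prediction" gives the (wider) prediction interval. A minimal sketch with the same toy data:

```r
list1 <- c(1, 2, 3, 4, 5)
list2 <- c(2, 4, 5, 6, 7)
reg <- lm(list1 ~ list2)

newD2 <- data.frame(list2 = 2.3)

# "prediction" adds the residual variance for one new observation,
# so its interval is always wider than the "confidence" interval.
predict(reg, newdata = newD2, interval = "prediction")
```

The fitted value is identical in both cases; only the interval width changes.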
I'm trying to integrate the Poisson distribution (dpois) in R, but I get an incorrect answer (0 with absolute error < 0) and 21 warnings. I don't understand how R is digesting my simple meal and why it pukes out 21 warnings.
dpoisd1 <- function(x) {dpois(x, 0.0001)}
dpoisd1(1:20)
integrate(dpoisd1, lower = 1, upper = 20)
It yields "0 with absolute error < 0" and some 21 warnings. I would really appreciate it if someone could show me my mistake(s).
Use warnings() to have a look at the warnings:
warnings()
#Warning messages:
#1: In dpois(x, 1e-04) : non-integer x = 10.500000
#<snip>
The first parameter of dpois must be a non-negative integer (see help("dpois")), but integrate passes non-integer values to it. In fact, it is not clear what you want to calculate: you are trying to integrate a discrete density function. Possibly you want ppois, the cumulative distribution function.
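Concretely, for a discrete distribution you sum the probability mass instead of integrating it. Both forms below give P(1 <= X <= 20) for the rate used in the question:

```r
lambda <- 1e-4

# For a discrete distribution, sum the probability mass function
# over the integer support instead of integrating:
sum(dpois(1:20, lambda))

# Equivalently, take a difference of the cumulative distribution
# function (P(X <= 20) minus P(X <= 0)):
ppois(20, lambda) - ppois(0, lambda)
```

Both expressions agree, and neither triggers the non-integer warnings that integrate produced.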