I'm trying to run some diagnostics on a binary logistic regression model, specifically the marginal model plots. Unfortunately, I keep getting the "need finite 'xlim' values" error. The code below reproduces the issue. My model includes both numeric and categorical variables (which get converted to dummy variables in the model). I know this error can occur when all values are NA, but that isn't the case for any of my data, and I'm not sure what's going on.
set.seed(020275)
df <- data.frame(y=sample(c(0,1), 10, replace=TRUE),
cat=sample(c("Red", "Blue", "Green"), 10, replace=TRUE),
loc=sample(c("North", "South", "East", "West"), 10, replace=TRUE),
count=runif(10, 0, 10),
stringsAsFactors = FALSE)
glmModel <- glm(y ~ cat + loc + count, family=binomial(), data=df)
glmModel
library(car)
marginalModelPlots(glmModel)
I get the following error:
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Looking for some ideas/suggestions/guidance on how to deal with this.
It appears that character vectors (cat and loc in the example above) are not compatible with marginalModelPlots, at least in the version of the car package I'm currently using (2.1-1). I found I could use the terms argument to limit the plots to a subset of the variables while still including the Linear Predictor plot (as shown below).
marginalModelPlots(glmModel, terms= ~ count)
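Another option that may be worth trying is to convert the character columns to factors before fitting, since the documentation for marginalModelPlots describes drawing plots only for terms that are not factors. The following is a minimal sketch under that assumption; it has not been checked against car 2.1-1, and the plotting call is left as a comment so the snippet runs without car installed.

```r
set.seed(20275)
df <- data.frame(y = sample(c(0, 1), 10, replace = TRUE),
                 cat = sample(c("Red", "Blue", "Green"), 10, replace = TRUE),
                 loc = sample(c("North", "South", "East", "West"), 10, replace = TRUE),
                 count = runif(10, 0, 10),
                 stringsAsFactors = FALSE)

# Find the character columns and convert each one to a factor
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Refit the same model on the converted data
glmModel <- glm(y ~ cat + loc + count, family = binomial(), data = df)

# With car loaded, the plots can then be requested as before:
# marginalModelPlots(glmModel)
```

With only 10 rows the fit itself may produce convergence warnings; those are unrelated to the plotting error.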
I am working with R and trying to follow the code from a previous Stack Overflow post: Kullback-Leibler distance between 2 samples
In particular, I am trying to determine the "distance" between two datasets:
#load library
library(FNN)
library(dplyr)
#create two data sets
df = iris
data1 = sample_n(df, 20)
data2 = sample_n(df, 20)
#plot KL divergence
plot(KLx.dist(data1,data2))
However, this produces the following error:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Does anyone know why this error is being produced?
Thanks
According to the KLx.dist documentation, this function requires a data matrix as input. In the iris dataset, we therefore need to remove the Species column, which is a factor variable. Removing the Species column before sampling solves the problem:
data(iris)
library(FNN)
library(dplyr)
#create two data sets
df = iris[,1:4]
data1 = sample_n(df, 20)
data2 = sample_n(df, 20)
#plot KL divergence
plot(KLx.dist(data1,data2))
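To see why the Species column matters, it can help to look at what happens when a data frame with a factor column is converted to a matrix. This sketch uses only base R and assumes that KLx.dist needs purely numeric input, as the answer above suggests; how the function coerces its arguments internally has not been checked here.

```r
set.seed(1)
rows <- sample(nrow(iris), 20)

# With Species included, as.matrix() falls back to a character matrix
with_species <- as.matrix(iris[rows, ])

# With only the four measurement columns, the matrix stays numeric
numeric_only <- as.matrix(iris[rows, 1:4])

is.character(with_species)
is.numeric(numeric_only)
```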
Trying to plot some data in R - I am a basic user and teaching myself. However, whenever I try to plot, it fails, and I am not sure why.
> View(Pokemon_BST)
> Pokemon_BST <- read.csv("~/Documents/Pokemon/Pokemon_BST.csv")
> View(Pokemon_BST)
> plot("Type_ID", "Gender_ID")
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
5: In min(x) : no non-missing arguments to min; returning Inf
6: In max(x) : no non-missing arguments to max; returning -Inf
This is my code, but I thought it might be an issue with my .csv file? I have attributed numbers to the "Type_ID" and "Gender_ID" columns. Type_ID has values between 1-20; Gender_ID has 1 for male, 2 for female, and 3 for both. I should state that both ID columns are just made of numeric values. Nothing more.
I then tried using barplot function. This error occurred:
> barplot("Gender_ID", "Type_ID")
Error in width/2 : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(width) : argument is not numeric or logical: returning NA
There are no missing values, no characters within these columns, nothing that SHOULD cause an error according to my basic knowledge. I am just not sure what is going wrong.
It seems to me that you are giving the plot function the wrong inputs.
For the x and y axes, plot expects numeric vectors, and you are only providing a single string for each. The function does not know that "Type_ID" and "Gender_ID" come from the Pokemon_BST data frame.
To reach your data you must tell R which object it comes from. You do this by putting double square brackets after the data frame and writing the name of the column inside them; double brackets return the column itself as a vector, whereas single brackets return a one-column data frame.
Pokemon_BST <- read.csv("~/Documents/Pokemon/Pokemon_BST.csv")
View(Pokemon_BST)
# Refer to the columns inside the data frame
plot(Pokemon_BST[["Type_ID"]], Pokemon_BST[["Gender_ID"]])
# barplot expects bar heights, so tabulate the counts first
barplot(table(Pokemon_BST[["Gender_ID"]], Pokemon_BST[["Type_ID"]]))
See also here for an introduction to subsetting in R
The problem is how you're passing the values to the plot function. In your code above, "Gender_ID" is just some string and the plot function doesn't know what to do with that. One way to plot your values is to pass the vectors Pokemon_BST$Gender_ID and Pokemon_BST$Type_ID to the function.
Here's a sample dataframe with the plot you were intending.
Pokemon_BST <- data.frame(
Type_ID = sample(1:20, 10, replace = TRUE),
Gender_ID = sample(1:3, 10, replace = TRUE))
plot(Pokemon_BST$Gender_ID, Pokemon_BST$Type_ID)
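The warnings in the question can be reproduced without plotting anything. The sketch below uses a small made-up data frame (the values are hypothetical, not taken from the real CSV) to show what R does with a bare string and how the different ways of selecting a column differ.

```r
# Hypothetical stand-in for the real data
Pokemon_BST <- data.frame(Type_ID = c(3, 7, 12), Gender_ID = c(1, 2, 3))

# A bare string cannot be turned into a number, so the result is NA
string_as_number <- suppressWarnings(as.numeric("Type_ID"))
is.na(string_as_number)

# Single brackets keep the data frame wrapper;
# $ and [[ ]] return the column itself as a vector
class(Pokemon_BST["Type_ID"])
class(Pokemon_BST[["Type_ID"]])
class(Pokemon_BST$Type_ID)
```

Since every value plot receives is NA, it has no finite range to draw the axes from, which is what the "need finite 'xlim' values" message reports.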
I have asked this question elsewhere
I want to verify if my data follows a normal or any other type of distribution (like cauchy for example).
I really want to understand how to use qqplot =]
Even though the qqnorm works well:
qqnorm(data);qqline(data)
When I try the qqplot:
qqplot(data, "normal")
qqplot(data, "cauchy")
it generates an error:
Error in plot.window(...) : need finite 'ylim' values
In addition it creates the warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
You should read the documentation for qqplot. The second argument to qqplot should be another data vector, not a string. If you want to compare your data to a specific distribution, you can follow the technique used in qqnorm and generate a vector of quantiles for any distribution. Let's say x is the data we want to plot:
x <- rcauchy(5000)
Since x has 5000 elements, we want to generate 5000 evenly-spaced quantiles from our target distribution. First, let's try the normal distribution:
y.norm <- qnorm(ppoints(length(x)))
qqplot(x, y.norm)
Now let's try the same thing with the Cauchy distribution.
y.cauchy <- qcauchy(ppoints(length(x)))
qqplot(x, y.cauchy)
(Note that the Cauchy distribution in particular will not behave very well in QQ plots, so this may not actually help you with your real goal.)
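The same pattern can be wrapped in a small function so that any quantile function can be plugged in. This is only a sketch: qq_against is a hypothetical helper name, not part of base R, and it puts the theoretical quantiles on the x axis (as qqnorm does) rather than on the y axis as in the calls above.

```r
# Hypothetical helper: compare data x against any quantile function qfun.
# Extra arguments in ... are passed on to qfun (e.g. df for qt).
qq_against <- function(x, qfun, ..., plot.it = TRUE) {
  theoretical <- qfun(ppoints(length(x)), ...)
  qqplot(theoretical, x, plot.it = plot.it,
         xlab = "Theoretical quantiles", ylab = "Sample quantiles")
}

set.seed(42)
x <- rt(500, df = 3)

# Compute the sorted quantile pairs without drawing; drop plot.it to draw
res <- qq_against(x, qt, df = 3, plot.it = FALSE)
```

Calling qq_against(x, qnorm) or qq_against(x, qcauchy) would then correspond to the two comparisons shown above.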
While using R, I run into an error with the following code:
dmodel1 = list()
for (i in 1:12) {
  sun.st = i
  data1 = fdata(file1, nd = nd, sun.st)
  d1 = ffit(data1, order = 2)
  dmodel1[[i]] = d1
  fplot(d1, plot.year, label = colnames(data1)[2], ylab = 1)
  cat(colnames(data1)[2], "\n")
}
the error is:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
3: In Ops.factor(x[, 2], dataff1f) : - not meaningful for factors
4: In max(dataFF2[, 3], na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
So, would you please help me to overcome this?
Thanks
Bah
Can you reproduce the error if you use some random values in your variables? Without the data it is difficult to assess whether the data is somehow "corrupted" (or at least inadequate for what you want to do).
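One thing the warnings do hint at is that a column expected to be numeric was read in as a factor. The sketch below shows how such a column behaves and one common way to recover the numbers. The vector raw is a made-up stand-in; nothing here is known about what fdata or ffit actually do.

```r
# Hypothetical stand-in for a column that was read in as a factor
raw <- factor(c("10.5", "12.1", "n/a", "9.8"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(raw)

# Going through character recovers the numbers; unparseable entries become NA
values <- suppressWarnings(as.numeric(as.character(raw)))

class(raw)
values
```

Running sapply(data1, class) on the real data would show whether any column is affected in this way.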
I need to apply the smote-algorithm to a data set, but can't get it to work.
Example:
x <- c(12,13,14,16,20,25,30,50,75,71)
y <- c(0,0,1,1,1,1,1,1,1,1)
frame <- data.frame(x,y)
library(DMwR)
smotedobs <- SMOTE(y~ ., frame, perc.over=300)
This gives the following error:
Error in scale.default(T, T[i, ], ranges) : subscript out of bounds
In addition: Warning messages:
1: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
I would appreciate any kind of help or hints.
SMOTE has a bug on 32-bit Windows 7.
It assumes that the target variable given in the 'form' parameter is the last column in the dataset. The following code demonstrates this:
library(DMwR)
data(iris)
# data <- iris[, c(1, 2, 5)] # SMOTE work
data <- iris[, c(2, 5, 1)] # SMOTE bug
data$Species <- factor(ifelse(data$Species == "setosa", "rare", "common"))
head(data)
table(data$Species)
newData <- SMOTE(Species ~., data, perc.over=600, perc.under=100)
table(newData$Species)
It shows the following message:
Error in `colnames<-`(`*tmp*`, value = c("Sepal.Width", "Species", "Sepal.Length" :
'names' attribute [3] must be the same length as the vector [2]
On 64-bit Windows 7, the column-order problem does not occur.
I don't have the full answer. I can provide another clue though:
If you convert 'y' to a factor, SMOTE will return without error - but the synthesized observations have NA values for x.
There is a bug in the SMOTE code. It assumes the response variable it is given is already a factor, and it currently does not handle the edge case of a non-factor response. Make sure to convert the response to a factor before calling SMOTE.
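Putting the suggestions from these answers together, the data from the question could be prepared as follows before calling SMOTE. This is a sketch of the preparation step only: the SMOTE call is left as a comment because DMwR may not be installed, and, as noted above, with a single predictor the synthesized x values may still come back as NA.

```r
x <- c(12, 13, 14, 16, 20, 25, 30, 50, 75, 71)
y <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1)

# Store the response as a factor rather than a numeric vector
frame <- data.frame(x = x, y = factor(y))

# Keep the response as the last column, as one answer suggests SMOTE assumes
frame <- frame[, c(setdiff(names(frame), "y"), "y"), drop = FALSE]

str(frame)

# With DMwR loaded:
# smotedobs <- SMOTE(y ~ ., frame, perc.over = 300)
```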