error labelling axis of plot using Ecdf - r

I am attempting to plot a graph using the code below:
library(Hmisc)
Ecdf(ceac_primary,xlab="axis label",xlim=c(5000,50000),q=c(0.9,0.1),
ylab="Probability of Success",main="CEAC")
Where ceac_primary is a data frame with 1 variable of 90k observations.
When I include the 'xlab="axis label"' I keep getting the following error:
Error in Ecdf.default(v, group = group, weights = weights, normwt = normwt, :
formal argument "xlab" matched by multiple actual arguments
However if I exclude the x axis label part of the code, it plots the graph fine.
Is this a known problem, and if so, are there alternative ways to plot an x axis label?
Thanks

Digging around in the source code for Ecdf.data.frame (the method that is called when passing a data.frame to Ecdf), it looks like that function creates an object that is later passed to the xlab argument. Therefore, xlab is not expected as a user-supplied argument when running Ecdf with a data.frame. Here's the code that creates the object lab that gets passed to xlab within Ecdf.data.frame:
lab <- if (vnames == "names")
nam[j]
else label(v, units = TRUE, plot = TRUE, default = nam[j])
Ecdf is then called with xlab = lab, but any arguments in the ellipsis (...) of Ecdf.data.frame are passed along to Ecdf as well. Since xlab is not a formal argument of Ecdf.data.frame, an xlab that you supply lands in the ellipsis and reaches Ecdf a second time, which is why you get the error.
To get around it, try either of the following:
Convert your data.frame to a vector of the appropriate class (numeric, I presume), and then run
Ecdf(ceac_primary_Vec, xlab = "axis label")
Or, you can create a label for the one column in your data.frame using the label function in the Hmisc package. If that column is called myCol, you can run
label(ceac_primary$myCol) <- "axis label"
Ecdf(ceac_primary)
And that should get your axis label printing correctly.
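For what it's worth, the same error can be reproduced without Hmisc. Below is a minimal base R sketch of the mechanism; inner() and wrapper() are hypothetical stand-ins for Ecdf.default and Ecdf.data.frame, not the actual Hmisc code.

```r
# Hypothetical stand-in for Ecdf.default: xlab is a formal argument here
inner <- function(x, xlab = "x", ...) {
  paste("label:", xlab)
}

# Hypothetical stand-in for Ecdf.data.frame: it builds its own label and
# also forwards everything in ... to inner()
wrapper <- function(df, ...) {
  lab <- names(df)[1]
  inner(df[[1]], xlab = lab, ...)
}

df <- data.frame(cost = c(1, 2, 3))

# Works: xlab is supplied only once, by wrapper() itself
ok <- wrapper(df)

# Fails: the user's xlab travels through ... and meets wrapper()'s xlab
err <- tryCatch(
  wrapper(df, xlab = "axis label"),
  error = function(e) conditionMessage(e)
)
print(ok)
print(err)
```

The second call fails because inner() receives xlab twice, once from wrapper() and once through the ellipsis.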

Related

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied DBSCAN algorithm on built-in dataset iris in R. But I am getting error when tried to visualise the output using the plot( ).
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.
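If you would rather stay in base graphics, here is a rough sketch of a scatterplot matrix coloured by cluster. It assumes the dbscan package is installed and relies on dbscan::dbscan() returning an integer vector in db$cluster, with 0 marking noise points; the colour mapping is just one arbitrary choice.

```r
library(dbscan)

# call the function with its namespace so fpc::dbscan can never be picked up
db <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)

# shift cluster ids by one so that noise (cluster 0) maps to colour 1 (black)
cols <- db$cluster + 1L

# scatterplot matrix of the four measurements, coloured by cluster
pairs(iris[, 1:4], col = cols, pch = 19, main = "DBSCAN")
```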

Is there a way to remove points from a Mclust classification plot in R?

I am trying to plot the GMM of my dataset using the Mclust package in R. While the plotting is a success, I do not want points to show in the final plot, just the ellipses. For a reference, here is the plot I have obtained:
GMM Plot
But, I want the resulting plot to have only the ellipses, something like this:
GMM desired plot
I have been looking at the Mclust plot page in: https://rdrr.io/cran/mclust/man/plot.Mclust.html and looking at the arguments of the function, I see there is a scope of adding other graphical parameters. Looking at the documentation of the plot function, there is a parameter called type = 'n' which might help to do what I want but when I write it, it produces the following error:
Error in plot.default(data[, 1], data[, 2], type = "n", xlab = xlab, ylab = ylab, :
formal argument "type" matched by multiple actual arguments
For reference, this is the code I used for the first plot:
library(mclust)
Data1_2 <- Mclust(Data, G=15)
summary(Data1_2, parameters = TRUE, classification = TRUE)
plot(Data1_2, what="classification")
The code I tried using for getting the graph below is:
Data1_4 <- Mclust(Data, G=8)
summary(Data1_4, parameters = TRUE, classification = TRUE)
plot(Data1_4, what="classification", type = "n")
Any help on this matter will be appreciated. Thanks!
If you look at the source code of plot.Mclust, it calls plot.Mclust.classification, which in turn calls coordProj to draw the points and ellipses. Inside that function, the point size is controlled by the argument CEX and the point shape by PCH.
So for your purpose, do:
library(mclust)
clu = Mclust(iris[,1:4], G = 3, what="classification")
plot(clu,what="classification",CEX=0)

Trying to make a histogram with two variables and keep coming up with 'x' must be numeric

I am trying to make a good histogram with two variables, height and free throw percent. I've imported the data from Excel and used the hist function, but I keep getting 'x' must be numeric.
I can read the data and create a table.
I've tried hist(height$freethrow)
and hist(shortkings)
This is what my data looks like
Second part of my data
hist(shortkings)
Error in hist.default(shortkings) : 'x' must be numeric
hist(shortkings, xlab = Height, ylab = Freethrow, main = Freethrow)
Error in hist.default(shortkings, xlab = Height, ylab = Freethrow, main = Freethrow) :
'x' must be numeric
I would like to create a histogram that shows distribution.
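What happens if you run class(shortkings$Height)? Note that hist() needs a numeric vector, so passing the whole data frame (hist(shortkings)) will always fail.
If you see that the column is non-numeric, you can convert it as follows and re-run hist() on that column:
shortkings$Height <- as.numeric(as.character(shortkings$Height))
hist(shortkings$Height)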
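As a rough illustration, here is a self-contained sketch with made-up data. The values and the Freethrow column contents are invented for the example; the only assumption taken from the question is that the columns arrived as text after the Excel import.

```r
# Hypothetical data: numbers that were read in as character strings
shortkings <- data.frame(
  Height    = c("70", "72", "68", "75"),
  Freethrow = c("0.81", "0.77", "0.90", "0.65"),
  stringsAsFactors = FALSE
)

# hist() would reject this column because it is not numeric
print(class(shortkings$Height))

# convert the column, then plot that single column rather than the data frame
shortkings$Height <- as.numeric(shortkings$Height)
h <- hist(shortkings$Height, xlab = "Height", main = "Distribution of height")
```

Also note that xlab, ylab and main expect quoted strings; unquoted names such as xlab = Height are looked up as R objects.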

Multiple histograms with title and mean as a line?

I'm struggling with the histogram function in my exploratory analysis. I would like to run a couple of variables in my dataset through a histogram function and, for each, add a title and a line at the arithmetic mean. This is how far I've got (the main title is still missing):
histo.abline <-function(x){
hist(x)
abline(v = mean(x, na.rm = TRUE), col = "blue", lwd = 4)}
sapply(dataset[c(7:10)], histo.abline)
I tried to add a main argument to the histogram function, but it doesn't pick up the right variable name from my dataset. When I put main=x there, it returns NULL for each variable. colnames, names and other functions didn't work either. Could you help me?
You can try to do it with ggplot2:
library(ggplot2)
histo.abline <- function(dataset, colnum) {
  p <- ggplot(dataset, aes(dataset[, colnum])) +
    geom_histogram(bins = 5, fill = I("blue"), col = I("red"), alpha = I(.2)) +
    geom_vline(xintercept = mean(dataset[, colnum], na.rm = TRUE)) +
    xlab(as.character(names(dataset)[colnum]))
  return(p)
}
Since you have not provided data, let's work with mtcars and create a list of histograms:
dataset=mtcars
listOfHistograms<-lapply(3:7,function(x) histo.abline(dataset,x))
Your list now has 5 histograms. To plot the first one, for instance:
print(listOfHistograms[[1]])
More histogram options for ggplot here: https://www.r-bloggers.com/how-to-make-a-histogram-with-ggplot2/
hope this helps
EDIT: Multiple plots in one graph
One way to do it is with the cowplot library:
library(cowplot)
plot_grid(plotlist=listOfHistograms[1:4])
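If you would rather stay with base graphics, as in the question, one possible sketch is to loop over the column names instead of the columns themselves. sapply() over the data frame hands the function only the values, so the name is not available inside it. The two-argument version of histo.abline below is my own variation, and mtcars stands in for your data.

```r
# Variation on the question's function: the title is passed in explicitly
histo.abline <- function(x, title) {
  hist(x, main = title, xlab = title)
  abline(v = mean(x, na.rm = TRUE), col = "blue", lwd = 4)
}

dataset <- mtcars
vars <- names(dataset)[3:7]

# iterate over the names, so each call knows which variable it is plotting
invisible(lapply(vars, function(v) histo.abline(dataset[[v]], v)))
```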

Change x and y labels on a gbm partial plot

I am having trouble changing the x and y labels on a partial plot for a gbm model. I need to rename them for the journal article.
I read this in and create the plot as follows:
library(gbm)
final<- readRDS(final_gbm_model)
summary(final, n.trees=final$n.trees)
Here is the summary output:
                                var                               rel.inf
ProbMn50ppb                     ProbMn50ppb                     11.042750
ProbDOpt5ppm                    ProbDOpt5ppm                     7.585275
Ngw_1975                        Ngw_1975                         6.314080
PrecipMinusETin_1971_2000_GWRP  PrecipMinusETin_1971_2000_GWRP   4.988598
N_total                         N_total                          4.776950
DTW60YrJurgens                  DTW60YrJurgens                   4.415016
CVHM_TextZone                   CVHM_TextZone                    4.225048
RiverDist_NEAR                  RiverDist_NEAR                   4.165035
LateralPosition                 LateralPosition                  4.036406
CAML1990_natural_water          CAML1990_natural_water           3.720303
PctCoarseMFUpActLayer           PctCoarseMFUpActLayer            3.668184
BioClim_BIO12                   BioClim_BIO12                    3.561071
MFDTWSpr2000Faunt               MFDTWSpr2000Faunt                3.383900
PBot_krig                       PBot_krig                        3.362289
WaterUse2                       WaterUse2                        3.291040
AVG_CLAY                        AVG_CLAY                         3.280454
Age_yrs                         Age_yrs                          3.144734
MFVelSept2000                   MFVelSept2000                    3.064030
AVG_SILT                        AVG_SILT                         2.882709
ScreenLength                    ScreenLength                     2.683542
HydGrp_C                        HydGrp_C                         2.666106
AVG_POR                         AVG_POR                          2.563147
MFVelFeb2000                    MFVelFeb2000                     2.505106
HiWatTabDepMin                  HiWatTabDepMin                   2.421521
RechargeAnnualmmWolock          RechargeAnnualmmWolock           2.252706
I can create a partial dependence plot as follows:
plot(final,"ProbMn50ppb",n.trees=final$n.trees)
But if I try to set the label arguments I get the following error:
plot(final,"ProbMn50ppb",n.trees=final$n.trees,ylab="LNNO3")
Error in plot.default(X$X1, X$y, type = "l", xlab = x$var.names[i.var], :
formal argument "ylab" matched by multiple actual arguments
How can I change the y and x axis labels?
The plot.gbm function sets xlab and ylab itself when it calls the generic plot function, so any label you supply collides with its own. That means you cannot customize the labels in that mode. The authors did provide an alternative, though: with return.grid=TRUE, plot.gbm returns the underlying data instead of drawing a plot. You can then use that data for any plot, including ggplot2.
plotdata <- plot(gbm1, return.grid=TRUE)
plot(plotdata, type="l", ylab="ylab", xlab="xlab")
Example data from help(gbm)
You can also change the gbm object itself before plotting (or in a function):
your_gbm_obj$var.names[index] = "axis label"
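Here is a self-contained sketch of the return.grid route. It assumes the gbm package is installed; the model on mtcars and its tuning values are arbitrary choices made only so the example runs, and the axis labels are placeholders.

```r
library(gbm)

set.seed(1)
# Small illustrative model; the settings are arbitrary, not recommendations
fit <- gbm(mpg ~ wt + hp + disp, data = mtcars, distribution = "gaussian",
           n.trees = 50, n.minobsinnode = 3, bag.fraction = 0.8)

# Ask plot.gbm for the partial dependence data instead of a plot
grid <- plot(fit, i.var = "wt", n.trees = 50, return.grid = TRUE)

# First column holds the predictor values, second the partial dependence
plot(grid[[1]], grid[[2]], type = "l",
     xlab = "Weight (1000 lbs)", ylab = "Partial dependence of mpg")
```

Because the final call goes straight to the ordinary plot function, xlab and ylab are only supplied once and no longer clash.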
