Adding data labels to boxplot in R - r

Here's my code:
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- c(2,1,0,1,6,9,15,1,8,0,1,2,1,1,9,12,1)
bp <- plot(iFacVector,iTargetVector)
text(bp,tapply(iTargetVector,iFacVector,median),labels=tapply(iTargetVector,iFacVector,median),cex=.8)
I am getting the following (classic R) error:
Error in xy.coords(x, y, recycle = TRUE) :
(list) object cannot be coerced to type 'double'
The vectors I am passing are numeric so I don't know what the problem is. I have tried unlist() and as.vector(). I have also tried using bp$stats[3,] as the labels.

The help for text gives the arguments as
text(x, ...)
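so the first argument in your code, bp, is being treated as the x coordinate for where to place the text. Because bp is the list returned by the boxplot call, it cannot be coerced to numeric, hence the error. You can just leave off the bp: given a single numeric vector, text() uses it as the y values and the positions 1, 2, ... as the x values, which is where the boxes are drawn. You might also want to add pos=3 to place each label just above its median line.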
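Putting this together, here is a minimal sketch of the corrected call, using the vectors from the question. The name medians is introduced here only for illustration, and seq_along(medians) simply spells out the x positions that text() would otherwise infer.

```r
iFacVector <- as.factor(c(1,1,1,1,10,1,1,1,12,9,9,1,10,12,1,9,5))
iTargetVector <- c(2,1,0,1,6,9,15,1,8,0,1,2,1,1,9,12,1)

# plot() on a factor and a numeric vector draws one box per factor level
bp <- plot(iFacVector, iTargetVector)

# One median per factor level (illustrative helper variable)
medians <- tapply(iTargetVector, iFacVector, median)

# Boxes sit at x = 1, 2, ..., so label each one at its median, slightly above
text(seq_along(medians), medians, labels = medians, pos = 3, cex = 0.8)
```

The same medians are also available as bp$stats[3, ], so either can be used for the labels.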

Related

R : Plot many boxplot in the same graph using dataframe

Hello everyone, and thank you for helping me with R.
I have a matrix with 39 columns and 168 rows, which looks like this:
and I want to plot boxplots (one for each row) in the same graph.
After two hours of intense research on how to do that, I still have no clue.
What I tried (f is the result of read.csv):
boxplot(x = as.list(as.data.frame(f)))
qp <- boxplot(x = as.list(as.data.frame(f)))
rn <- as.numeric(rownames(f))
plot(qp,rn)
and I got:
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
I am not even sure that the resulting plot would have been what I wanted.
If you want one box per row, you need to transpose the matrix before passing it to boxplot:
boxplot(t(f))
This will then do the right thing — provided you’ve read in your data correctly. In your image, this isn’t the case: the header columns are (incorrectly) part of the data. Be sure to pass header = TRUE to your data reading function to fix this.
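As a small sketch of the transpose approach: the data frame f below is made-up stand-in data (5 rows by 8 columns of random numbers), not the asker's file, and is only there to make the example runnable.

```r
set.seed(1)

# Hypothetical stand-in for the data read with read.csv(..., header = TRUE)
f <- as.data.frame(matrix(rnorm(5 * 8), nrow = 5))

# t(f) is a matrix with one column per original row,
# and boxplot() draws one box per column of a matrix
bp <- boxplot(t(f), xlab = "Row of f", ylab = "Value")
```

With the real data, 168 boxes will be crowded, so a wide graphics device may be needed to read them.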

R qqplot argument "y" is missing error

I am relatively new to R and I am struggling with error messages related to qqplot. Some sample data are at the bottom. I am trying to do a qqplot on some azimuth data, i.e. compass directions. I've looked around here and at the ?qqplot R documentation, but I don't see a solution I can understand in either. I don't understand the syntax for the function or the format the data are supposed to be in, or probably both. First I tried loading the data as a single column of values, i.e. just the "Azimuth" column.
azimuth <- read.csv(file.choose(), header=TRUE)
qqplot(azimuth$Azimuth)
returns the following error,
Error in sort(y) : argument "y" is missing, with no default
Then I tried including the corresponding dip angles along with the azimuth data and received the same error. I also tried,
qqnorm(azimuth)
but this returned the following error,
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Dataframe "azimuth":
Azimuth Altitude
23.33211466 -6.561729793
31.51267873 4.801537153
29.04577711 5.24504954
23.63450905 14.03342708
29.12535459 7.224141678
20.76972007 47.95686329
54.89253987 4.837417689
56.57958227 13.12587996
13.09845182 -7.417776178
26.45155154 31.83546988
29.15718557 25.47767069
28.09084746 14.61603384
28.93436865 -1.641785416
28.77521371 17.30536039
29.58690392 -2.202076058
0.779859221 12.92044019
27.1359178 12.20305106
23.57084707 11.97925859
28.99803063 3.931326877
dput() version:
azimuth <-
structure(list(Azimuth = c(23.33211466, 31.51267873, 29.04577711,
23.63450905, 29.12535459, 20.76972007, 54.89253987, 56.57958227,
13.09845182, 26.45155154, 29.15718557, 28.09084746, 28.93436865,
28.77521371, 29.58690392, 0.779859221, 27.1359178, 23.57084707,
28.99803063), Altitude = c(-6.561729793, 4.801537153, 5.24504954,
14.03342708, 7.224141678, 47.95686329, 4.837417689, 13.12587996,
-7.417776178, 31.83546988, 25.47767069, 14.61603384, -1.641785416,
17.30536039, -2.202076058, 12.92044019, 12.20305106, 11.97925859,
3.931326877)), .Names = c("Azimuth", "Altitude"), class = "data.frame", row.names = c(NA, -19L))
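Try:
qqPlot
with a capital P. Note that qqPlot is not part of base R; it comes from add-on packages such as car.
Perhaps you want a normal Q-Q plot with a reference line.
Have you tried: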
qqnorm(azimuth$Azimuth);qqline(azimuth$Azimuth)
It seems that the qqplot function takes two input parameters, x and y as follows:
qqplot(x, y, plot.it = TRUE, xlab = "your x-axis label", ylab="your y-axis label", ...)
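When you made the call shown above, you supplied only one vector, so R complained that the y argument was missing. Check your input data set and decide what x and y should be for your call to qqplot. To compare a single sample against the normal distribution, use qqnorm on the numeric column instead; passing the whole data frame rather than a numeric column is what leads to the 'x' and 'y' lengths differ error.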
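To make the difference between the two functions concrete, here is a short sketch using only the first six rows of the data from the question. Comparing Azimuth with Altitude in the second call is just an illustration of the two-sample form, not a claim that this comparison is meaningful for the asker's analysis.

```r
# First six rows of the data frame from the question
azimuth <- data.frame(
  Azimuth  = c(23.33211466, 31.51267873, 29.04577711,
               23.63450905, 29.12535459, 20.76972007),
  Altitude = c(-6.561729793, 4.801537153, 5.24504954,
               14.03342708, 7.224141678, 47.95686329)
)

# One sample against the normal distribution: pass a numeric vector
qq_norm <- qqnorm(azimuth$Azimuth)
qqline(azimuth$Azimuth)

# Two samples against each other: qqplot needs both x and y
qq_two <- qqplot(azimuth$Azimuth, azimuth$Altitude,
                 xlab = "Azimuth quantiles", ylab = "Altitude quantiles")
```

Both functions return the plotted coordinates invisibly, which is handy for checking what was drawn.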

Cannot coerce type 'closure' to vector of type 'double' (polygon)

plot( dnorm , col='white')
polygon( dnorm, col='grey' )
returns the above error message, not on plot, but on polygon.
body(polygon) %>% grep(pattern='numeric') finds only one occurrence on line 4, which doesn't seem to have anything to do with this error. So I'm at a loss as to where to look for the source of the problem.
plot has a function method, whereas polygon does not. From ?plot:
x: the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.
Additionally, from ?plot.function, the S3 method to plot functions:
## S3 method for class 'function'
plot(x, y = 0, to = 1, from = y, xlim = NULL, ylab = NULL, ...)
This explains why you get a plot with values from 0 to 1 with plot when you pass dnorm as an argument.
Note functions like dnorm are also known as closures. This explains why you get that error with polygon. Since polygon does not accept functions as an argument, it tries to convert dnorm, a closure, to a vector, but that isn't a valid conversion.
The error in polygon is actually happening in the as.double call within xy.coords:
> polygon(dnorm)
Error in as.double(y) :
cannot coerce type 'closure' to vector of type 'double'
> traceback()
2: xy.coords(x, y)
1: polygon(dnorm)
Note as.double doesn't register in the trace stack because it is a primitive. By looking at the source of xy.coords, you can see where the error is happening. To semi-confirm:
> as.double(dnorm)
Error in as.double(dnorm) :
cannot coerce type 'closure' to vector of type 'double'
dnorm(-3:3) actually produces a numeric vector, which is why that works with polygon.
The call to plot will resolve to a variety of default methods for different types of objects. See methods(plot) for a list in your environment. For dnorm it is plot.function, which takes the function as an argument and provides a set of inputs into the function. Incidentally this will also work with rnorm because plot.function provides a default argument of n=101.
plot.function is a thin wrapper around curve, which is the more common way to plot a function:
curve(dnorm, col="grey")
polygon has no analogous method for different types of objects.
You need to call polygon( dnorm(-3:3) ), or use whatever the xlim limits are. polygon lacks a method for handling functions (although plot has one).
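As a sketch of how the shaded density could be drawn with explicit coordinates: the grid from -3 to 3 and the 101 points are arbitrary choices for illustration, not something taken from the question.

```r
# Evaluate the function ourselves; polygon() only accepts numeric coordinates
x <- seq(-3, 3, length.out = 101)
y <- dnorm(x)

# Set up the axes without drawing anything yet
plot(x, y, type = "n", ylab = "dnorm(x)")

# Close the shape along the x-axis so the area under the curve is filled
polygon(c(x[1], x, x[length(x)]), c(0, y, 0), col = "grey")
```

The extra first and last points at height 0 make the filled region end on the baseline instead of being closed by a chord between the two curve endpoints.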

How does the curve function in R work? - Example of curve function

How does the following code work? I found the example while reading the R help page ?curve, but I have not understood it.
for(ll in c("", "x", "y", "xy"))
curve(log(1+x), 1, 100, log = ll,
sub = paste("log= '", ll, "'", sep = ""))
In particular, I am used to numeric values in a for loop, as in
for(ll in 1:10)
But what is the following command saying:
for(ll in c("","x","y","xy"))
c("","x","y","xy") looks like a string vector. How does it work inside the curve function, both in log(1+x) (what is x here: the string "x" from c("","x","y","xy")?) and in log=ll?
Apparently, there are no answers on Stack Overflow about how the curve function in R works, and especially about the log argument, so this might be a good chance to delve into it a bit more (I liked the question, by the way).
First of all the easy part:
c("","x","y","xy") is a string vector or more formally a character vector.
for(ll in c("","x","y","xy")) will start a loop of 4 iterations, and ll will be '', 'x', 'y' and 'xy' in turn. Unfortunately, the way this example is built, each call to curve starts a new plot, so on a single-panel device you will only see the last one, which is for ll = 'xy'.
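A small sketch of the same loop that keeps all four plots visible. The par(mfrow = c(2, 2)) call and the seen vector are additions made here for illustration; the loop itself is the one from the question.

```r
# Split the device into a 2 x 2 grid so each iteration gets its own panel
op <- par(mfrow = c(2, 2))

seen <- character(0)  # records the values ll takes, for illustration only
for (ll in c("", "x", "y", "xy")) {
  seen <- c(seen, ll)
  curve(log(1 + x), 1, 100, log = ll,
        sub = paste("log= '", ll, "'", sep = ""))
}

par(op)  # restore the previous graphics settings
```

A for loop in R iterates over the elements of any vector, so a character vector works exactly like 1:10 does.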
Let's dive into the source code of the curve function to answer the rest:
First of all, what does the x represent in log(1+x)?
log(1+x) is an expression written as a function of x. x represents a vector of numbers that gets created inside the curve function in the following part (from the source code):
x <- exp(seq.int(log(from), log(to), length.out = n)) # if the log argument contains 'x', or
x <- seq.int(from, to, length.out = n) # if the log argument does not contain 'x'
# in our case from and to are 1 and 100 respectively
As long as the n argument is the default the x vector will contain 101 elements. Obviously the x in log(1+x) is totally different to the 'x' in the log argument.
As for y, it is always created as (from the source code):
y <- eval(expr, envir = ll, enclos = parent.frame()) # expr is log(1+x) here; this ll is a list local to curve that holds the x vector, not the loop variable ll
# i.e. you get a y value for each value in the x vector calculated just before
Second, what is the purpose of the log argument?
The log argument decides which of the x and y axes will be drawn on a log scale: the x-axis if the log argument is 'x', the y-axis if it is 'y', both axes if it is 'xy', and neither if it is ''.
Note that the log scaling of either axis is applied by the plot call inside curve; in that sense curve is essentially a wrapper around plot.
This is why, when the log argument contains 'x' (see above), the x values are generated as the exponential of an evenly spaced sequence on the log scale: the points then appear evenly spaced once plot draws the x-axis on a log scale.
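The difference between the two x grids can be checked directly, because curve invisibly returns the points it drew. The variable names lin and lg below are introduced only for this sketch.

```r
# Linear x-axis: x is an evenly spaced sequence from 1 to 100
lin <- curve(log(1 + x), 1, 100, log = "")

# Log x-axis: x is evenly spaced on the log scale instead
lg <- curve(log(1 + x), 1, 100, log = "x")

length(lin$x)           # number of evaluation points (the default n)
range(diff(lin$x))      # constant step on the original scale
range(diff(log(lg$x)))  # constant step on the log scale
```

In both cases the y values are just the expression evaluated on that grid, i.e. log(1 + x).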
P.S. The source code for the curve function can be seen by typing graphics::curve in the console.
I hope this makes a bit of sense now!

r - Add text to each lattice histogram with panel.text but has error "object x is missing"

In the following R code, I try to create 30 histograms for the variable allowed.clean by the factor zip_cpt (which has 30 levels).
For each of these histograms, I also want to add the mean and the sample size; they need to be calculated for each level of the factor zip_cpt. So I used panel.text to do this.
After I ran this code, an error message appeared inside each histogram, reading "Error using packet 21..."x" is missing, with..." (I cannot read the whole message because it is truncated in the panel). I guess there's something wrong with the object x. Is it because mean(x) and length(x) don't actually apply to the data at each level of the factor zip_cpt?
I appreciate any help!
histogram(~allowed.clean|zip_cpt,data=cpt.IC_CAB1,
          type='density',
          nint=100,
          breaks=NULL,
          layout=c(10,3),
          scales= list(y=list(relation="free"),
                       x=list(relation="free")),
          panel=function(x,...) {
            mean.values <-mean(x)
            sample.n <- length(x)
            panel.text(lab=paste("Sample size = ",sample.n))
            panel.text(lab=paste("Mean = ",mean.values))
            panel.histogram(x,col="pink", ...)
            panel.mathdensity(dmath=dnorm, col="black",args=list(mean=mean(x, na.rm = TRUE),sd=sd(x, na.rm = TRUE)), ...)})
A discussion I found online is helpful for adding customized text (e.g., basic statistics) on each of the histograms:
https://stat.ethz.ch/pipermail/r-help/2007-March/126842.html
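One likely cause of the "x is missing" message is that panel.text in the question is called with only a label and no coordinates. The following is a hedged sketch of one way to supply them, not the approach from the linked discussion. The data frame d and its columns value and group are made-up stand-ins for cpt.IC_CAB1, allowed.clean and zip_cpt, and placing the label at the top-left corner via current.panel.limits() is an assumption about the desired layout.

```r
library(lattice)

set.seed(42)
# Hypothetical stand-in data: three groups instead of the asker's 30 levels
d <- data.frame(
  value = rnorm(300, mean = 50, sd = 10),
  group = factor(rep(c("A", "B", "C"), each = 100))
)

p <- histogram(~ value | group, data = d, type = "density",
  panel = function(x, ...) {
    panel.histogram(x, col = "pink", ...)
    # x holds only the data for the current panel, so these are per-level values
    lab <- paste0("Sample size = ", length(x),
                  "\nMean = ", round(mean(x, na.rm = TRUE), 2))
    # panel.text needs explicit x and y positions; use the panel's top-left corner
    lim <- current.panel.limits()
    panel.text(lim$xlim[1], lim$ylim[2], labels = lab, adj = c(0, 1))
  })

print(p)
```

Because the panel function is called once per level, mean(x) and length(x) do apply to each level's data; only the missing coordinates need fixing.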
