Scatter plot text labels in R

Below is the code.
stripchart(Age~Smoke, data = survey_clean_data , pch=16 , col = "blue", method = "jitter" ,main = "AGE VS SMOKE",na.rm = T)
I want to add labels to it, as in the image below.
I tried several options, but the labels are written on top of each other.
means = c(paste("mean_Age =",mean(survey_clean_data[Smoke == "Heavy","Age"],na.rm =T)),
paste("mean_Age =",mean(survey_clean_data[Smoke == "Never","Age"],na.rm =T)),
paste("mean_Age =",mean(survey_clean_data[Smoke == "Regul","Age"],na.rm =T)),
paste("mean_Age =",mean(survey_clean_data[Smoke == "Occas","Age"],na.rm =T)))
text(50,survey_clean_data$Smoke,labels = means)
DATA: library(MASS) attach(survey)

There are a few problems with your code. The main one is that you pass text() four labels (the contents of means) but as many y-coordinates as there are data points, since you pass it survey_clean_data$Smoke. R recycles the shorter vector to match the longer one, so the labels are drawn many times over, on top of each other.
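To see why the labels pile up, the recycling can be imitated without plotting. This is only a sketch: the six-row factor and the label strings are made up for illustration, and rep_len() stands in for the recycling that text() applies to its labels argument.

```r
# Hypothetical labels and a short, made-up Smoke column
labels <- c("mean A", "mean B", "mean C", "mean D")
smoke  <- factor(c("Heavy", "Never", "Never", "Occas", "Regul", "Never"))

# A factor used as a coordinate is reduced to its integer codes
y_coords <- as.numeric(smoke)

# Four labels stretched over six points, as recycling would do
recycled <- rep_len(labels, length(y_coords))

# Labels that land on each y position; level 2 receives several different ones
overlap <- split(recycled, y_coords)
print(overlap)
```

With one label per group and one y-coordinate per group, nothing is recycled and each label is drawn once.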
Instead, you might do (data are artificial since you didn't provide any):
stripchart(Age~Smoke, data = survey_clean_data , pch=16 , col = "blue", method = "jitter" ,main = "AGE VS SMOKE",na.rm = T)
means <- aggregate(Age~Smoke, data = survey_clean_data, FUN = mean) # mean of each category
means$y <- 1:4 # add y-coordinates for each category
with(means, text(50, y, labels = sprintf('Mean Age = %0.1f', Age))) # plot one text label per category on top of the stripchart

The answer given by jdobres worked fine. Here is one more solution.
Add the ylim=c(0.8,4.2) parameter to the stripchart call. You can adjust the range anywhere between c(1,4) and c(0.8,4.2); the latter worked for me.
stripchart(Age~Smoke, data = survey_clean_data , pch=16 , col = 634, method = "jitter" ,main = "AGE VS SMOKE",na.rm = T,ylim=c(0.8,4.2))
With the line below you can adjust the vertical position of the text by changing the offset,
e.g. +0.1, -0.1, etc.
text(50,c(1:4)+0.1,means)
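Putting both answers together, a self-contained sketch might look like the following. It assumes the raw survey data from MASS rather than the asker's survey_clean_data (which is not shown), computes the group means with tapply(), and draws to a null PDF device so it also runs without a screen.

```r
library(MASS)  # provides the survey data set

# One mean per Smoke level, in factor-level order
group_means <- tapply(survey$Age, survey$Smoke, mean, na.rm = TRUE)

pdf(NULL)  # null device; drop this line to see the plot interactively
stripchart(Age ~ Smoke, data = survey, pch = 16, col = "blue",
           method = "jitter", main = "AGE VS SMOKE",
           ylim = c(0.8, 4.2))           # extra room above and below the groups
text(50, seq_along(group_means) + 0.1,   # one y position per group, nudged up
     labels = sprintf("Mean Age = %.1f", group_means))
invisible(dev.off())
```

Because the horizontal stripchart places the groups at y = 1, 2, 3, 4 in factor-level order, seq_along(group_means) lines each label up with its own group.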

Related

Homogenizing scale for density plot

I am making a series of plots from a point pattern (PPP) with the kernel density function. I would like the maximum plotted value to be 200 in all cases, with the heatmap colours adjusted accordingly (two of the images only go up to 100). I have not been able to find a solution to this problem using R base plot.
Microglia_Density <- density(Microglia_PPP, sigma =0.1, equal.ribbon = TRUE, col = topo.colors, main = "")
plot(Microglia_Density, main = "Microglia density")
Astrocytes_Density <- density(Astrocytes_PPP, sigma =0.1, equal.ribbon = TRUE, col = topo.colors, main = "")
plot(Astrocytes_Density, main = "Astrocytes density")
Neurons_Density <- density(Neurons_PPP, sigma =0.1, equal.ribbon = TRUE, col = topo.colors, main = "")
plot(Neurons_Density, main = "Neuronal density")
I would appreciate recommendations. Regards
Since we don’t have access to your data I simulate fake data in a square.
There are several options to do what you want. First you should know that
density() is a generic function, so when you invoke it on a ppp like
Microglia_PPP actually the function density.ppp() is invoked.
This function returns an im object (effectively a 2-d “image” of values).
You plot this with plot() which in turn calls plot.im(), so you should
read the help file of plot.im(), where it says that the argument col
controls the colours used in the plot. Either you can make a colour map
covering the range of values you are interested in and supply that, or if you
know that one of the images has the colour map you want to use you can save
it and reuse for the others:
library(spatstat)
set.seed(42)
Microglia_PPP <- runifpoint(100)
Neurons_PPP <- runifpoint(200)
Neurons_Density <- density(Neurons_PPP, sigma = 0.1)
Microglia_Density <- density(Microglia_PPP, sigma = 0.1)
my_colourmap <- plot(Neurons_Density, main = "Neuronal density", col = topo.colors)
plot(Microglia_Density, main = "Microglia density", col = my_colourmap)
Notice that the colour maps are the same, but they only cover the range from
approximately 80 to 310. Any values of the image outside this range will not
be plotted, so they appear white.
You can make a colour map first and then use it for all the plots
(see help(colourmap)):
my_colourmap <- colourmap(topo.colors(256), range = c(40,315))
plot(Neurons_Density, main = "Neuronal density", col = my_colourmap)
plot(Microglia_Density, main = "Microglia density", col = my_colourmap)
Finally another solution if you want the images side by side is to make them
an imlist (image list) and use plot.imlist() with equal.ribbon = TRUE:
density_list <- as.imlist(list(Neurons_Density, Microglia_Density))
plot(density_list, equal.ribbon = TRUE, main = "")
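If the range is not known in advance, it can be derived from the images themselves. The sketch below is one possible way to do that, not part of the original answer: it reads the pixel matrix stored in the v component of each im object, takes the overall minimum and maximum, and builds a single colour map from them. The list name images and the variable names are illustrative.

```r
library(spatstat)
set.seed(42)

# Simulated stand-ins for the real point patterns
images <- list(
  Neurons   = density(runifpoint(200), sigma = 0.1),
  Microglia = density(runifpoint(100), sigma = 0.1)
)

# Range of pixel values in each image, then the range across all of them
pixel_ranges <- sapply(images, function(img) range(img$v, na.rm = TRUE))
common_range <- range(pixel_ranges)

# One colour map that covers every image
shared_map <- colourmap(topo.colors(256), range = common_range)

pdf(NULL)  # null device; drop this line to see the plots interactively
for (nm in names(images)) {
  plot(images[[nm]], main = paste(nm, "density"), col = shared_map)
}
invisible(dev.off())
```

A fixed range such as c(0, 200) could be supplied instead, keeping in mind that pixel values outside the range of the colour map are not drawn.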

How do you change the order of explanatory and response variables in a mosaic plot? [duplicate]

My current plot:
My desired plot (never mind the variables)
Specifically: explanatory variables on the bottom with an x-axis, response variables on the right, relative frequency and the y-axis on the left. I'll attach my R code below.
mosaictable <- matrix (c (3, 9, 22, 21), byrow = T, ncol = 2)
rownames (mosaictable) = c ("White", "Blue ")
colnames (mosaictable) = c ("Captured", "Not Captured")
mosaicplot ((mosaictable), sub = "Pigeon Color", ylab = "Relative frequency",
col = c ("firebrick", "goldenrod1"), font = 2, main = "Mosaic Plot of Pigeon Color and Their Capture Rate"
)
axis (1)
axis (4)
This particular flavor of mosaic display where you have a "dependent" variable on the y-axis and want to add corresponding annotation, is sometimes also called a "spine plot". R implements this in the spineplot() function. Also plot(y ~ x) internally calls spineplot() when both y and x are categorical.
In your case, spineplot() does almost everything you want automatically provided that you supply it with a nicely formatted "table" object:
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
"Pigeon Color" = c("White", "Blue"),
"Relative Frequency" = c("Captured", "Not Captured")
)
tab
##             Relative Frequency
## Pigeon Color Captured Not Captured
##        White        3            9
##        Blue        22           21
And then you get:
spineplot(tab)
Personally, I would leave it at that. But if it is really important to switch the axis labels from left to right and vice versa, you can do so by first suppressing the default axes with axes = FALSE and then adding them manually afterwards. The coordinates for that need to be obtained from the marginal distribution of the first variable and the conditional distribution of the second variable given the first, respectively:
x <- prop.table(margin.table(tab, 1))
y <- prop.table(tab, 1)[2, ]
spineplot(tab, col = c("firebrick", "goldenrod1"), axes = FALSE)
axis(1, at = c(0, x[1]) + x/2, labels = rownames(tab), tick = FALSE)
axis(2)
axis(4, at = c(0, y[1]) + y/2, labels = colnames(tab), tick = FALSE)
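The label positions can be checked by hand. The sketch below recomputes them for the pigeon table without drawing anything; x_at and y_at are illustrative names for the values passed to at = above.

```r
tab <- as.table(matrix(c(3, 22, 9, 21), ncol = 2))
dimnames(tab) <- list(
  "Pigeon Color" = c("White", "Blue"),
  "Relative Frequency" = c("Captured", "Not Captured")
)

x <- prop.table(margin.table(tab, 1))  # bar widths: 12/55 and 43/55
y <- prop.table(tab, 1)[2, ]           # split of the last (Blue) bar: 22/43 and 21/43

# Each label sits at the start of its segment plus half the segment's size
x_at <- c(0, x[1]) + x / 2             # roughly 0.11 and 0.61
y_at <- c(0, y[1]) + y / 2             # roughly 0.26 and 0.76
print(as.numeric(x_at))
print(as.numeric(y_at))
```

The right-hand labels use the conditional distribution of the last bar because that bar is the one adjacent to the right axis. The x positions ignore the small gap spineplot() leaves between bars, so they are approximate.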

Calculate intersection point of two density curves in R

I have two vectors of 1000 values (a and b), from which I created density plots and histograms. I would like to retrieve the coordinates (or just the y value) where the two plots cross (it does not matter if it detects several crossings, I can discriminate them afterwards). Please find the data in the following link. Sample Data
xlim = c(min(c(a,b)), max(c(a,b)))
hist(a, breaks = 100,
freq = F,
xlim = xlim,
xlab = 'Test Subject',
main = 'Difference plots',
col = rgb(0.443137, 0.776471, 0.443137, 0.5),
border = rgb(0.443137, 0.776471, 0.443137, 0.5))
lines(density(a))
hist(b, breaks = 100,
freq = F,
col = rgb(0.529412, 0.807843, 0.921569, 0.5),
border = rgb(0.529412, 0.807843, 0.921569, 0.5),
add = T)
lines(density(b))
Using locator() is not optimal, since I need to retrieve this from several plots (but I will use that approach if nothing else is viable). Thanks for your help.
We calculate the density curves for both series, taking care to use the same range. Then, we compare whether the y-value for a is greater than b at each x-value. When the outcome of this comparison flips, we know the lines have crossed.
df <- merge(
as.data.frame(density(a, from = xlim[1], to = xlim[2])[c("x", "y")]),
as.data.frame(density(b, from = xlim[1], to = xlim[2])[c("x", "y")]),
by = "x", suffixes = c(".a", ".b")
)
df$comp <- as.numeric(df$y.a > df$y.b)
df$cross <- c(NA, diff(df$comp))
points(df[which(df$cross != 0), c("x", "y.a")])
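The same sign-change idea can be tried on simulated data, since the linked sample data is not available here. This is a minimal sketch under that assumption: a and b are drawn from two normal distributions, and the crossings are reported at the grid point just before each sign flip rather than interpolated.

```r
set.seed(1)
a <- rnorm(1000, mean = 0, sd = 1)  # simulated stand-ins for the real vectors
b <- rnorm(1000, mean = 2, sd = 1)

# Evaluate both densities on the same grid
rng <- range(c(a, b))
da <- density(a, from = rng[1], to = rng[2], n = 512)
db <- density(b, from = rng[1], to = rng[2], n = 512)

# A crossing lies between grid points where the sign of the difference changes
flip <- which(diff(sign(da$y - db$y)) != 0)
crossings <- data.frame(x = da$x[flip], y = da$y[flip])
print(crossings)
```

Several crossings may be reported, including spurious ones in the tails where both densities are close to zero; those can be filtered afterwards, for example by requiring a minimum y value.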

Panel functions in Lattice using differing data

I am working with a data frame called d in R. I want to plot a scatter plot using two of the columns, include a best-fit regression line, and also plot binned means.
I have calculated the centers of the bins and binned means, and included those as columns in the data frame.
I can make the scatter plot and regression line work, but cannot get the binned means to show up. Using the code below I get no errors, but the points drawn by panel.points do not appear.
scatter.Epsilon <- xyplot(Epsilon ~ data.subset.UpdatedVS30.091015,
data = d,
grid = TRUE,
scales = list(x = list(log = 10)),
xlab = "Vs30 (m/s)",
ylab = "Epsilon",
ylim = c(-4, 3),
xlim = c(10^2,10^3.4),
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(d$bin.ep[subscripts], d$means.ep[subscripts],
col = 'red')})
scatter.Epsilon
A simplified data set would be:
dist <- rnorm(10,4,100)
x <- seq(1,100)
bin <-rep(50,100)
mean <- rep(mean(dist),100)
d <- data.frame(x,dist,bin,mean)
where dist ~ x is the scatterplot component, and mean represents the binned mean for data points between 1-100, and bin is the bin's center (at 50). I want to add one point at (bin, mean) on top of dist ~ x. My real data set has multiple bins and means based on data.subset.UpdatedVS30.091015 that I want to add on top of Epsilon ~ data.subset.UpdatedVS30.091015.
I think you might be trying to do too much work in the call to panel.points. Using your example data, this code works fine:
scatter.Epsilon <- xyplot(dist ~ x,
data = d,
grid = TRUE,
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(bin,mean,col = 'red')})
and plots a red point right where it should be. Have you tried just
panel.points(bin.ep,means.ep,col='red')
There is no grouping variable in your formula, so no need for subscripts.
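A variant that does not depend on bin and mean existing as global variables is to keep the binned summaries in their own small data frame and reference it explicitly inside the panel function. This is a sketch with made-up data; the names bins, centre and avg are hypothetical.

```r
library(lattice)
set.seed(1)

# Raw data for the scatter plot
d <- data.frame(x = 1:100, y = rnorm(100, mean = 4, sd = 10))

# Separate summary table: one row per bin (a single bin here)
bins <- data.frame(centre = 50, avg = mean(d$y))

p <- xyplot(y ~ x, data = d, grid = TRUE,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                      # raw points
              panel.abline(lm(y ~ x), col = "black")       # regression line
              panel.points(bins$centre, bins$avg,          # binned means on top
                           col = "red", pch = 16)
            })
```

If the x axis is log-scaled through scales, as in the question, lattice passes log-transformed x values to the panel function, so the bin centres would probably need the same transform (for example log10(bins$centre)) to land inside the plotting region. That is a guess about the original data and has not been checked against it.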

How to add boxplots to scatterplot with jitter

I am using following commands to produce a scatterplot with jitter:
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
library(lattice)
stripplot(NUMS~GRP,data=ddf, jitter.data=T)
I want to add boxplots over these points (one for every group). I tried searching, but I could not find code that plots all points (not just outliers) with jitter. How can I solve this? Thanks for your help.
Here's one way using base graphics.
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
stripchart(NUMS ~ GRP, vertical = TRUE, data = ddf,
method = "jitter", add = TRUE, pch = 20, col = 'blue')
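In the base-graphics version the outliers are drawn twice, once by boxplot() and once by stripchart(). One possible refinement, sketched here with freshly simulated data, is to switch them off with outline = FALSE and set ylim from the full data so the jittered points beyond the whiskers are not clipped.

```r
set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP = sample(LETTERS[1:5], 500, replace = TRUE))

pdf(NULL)  # null device; drop this line to see the plot interactively
bp <- boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = "NUMS",
              outline = FALSE,           # let stripchart() draw the outliers
              ylim = range(ddf$NUMS))    # keep room for every point
stripchart(NUMS ~ GRP, data = ddf, vertical = TRUE,
           method = "jitter", add = TRUE, pch = 20, col = "blue")
invisible(dev.off())
```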
To do this in ggplot2, try:
library(ggplot2)
ggplot(ddf, aes(x=GRP, y=NUMS)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(position=position_jitter(width=.1, height=0))
Obviously you can adjust the width and height arguments of position_jitter() to your liking (although I'd recommend height=0 since height jittering will make your plot inaccurate).
I've written an R function called spreadPoints() within a package called basicPlotteR. The package can be installed directly into your R library using the following code:
install.packages("devtools")
library("devtools")
install_github("JosephCrispell/basicPlotteR")
For the example provided, I used the following code to generate the example figure below.
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5],500,replace=T))
boxplot(NUMS ~ GRP, data = ddf, lwd = 2, ylab = 'NUMS')
spreadPointsMultiple(data=ddf, responseColumn="NUMS", categoriesColumn="GRP",
col="blue", plotOutliers=TRUE)
It is a work in progress (the lack of formula as input is clunky!) but it provides a non-random method to spread points on the X axis that doubles as a violin like summary of the data. Take a look at the source code, if you're interested.
For a lattice solution:
library(lattice)
ddf = data.frame(NUMS = rnorm(500), GRP = sample(LETTERS[1:5], 500, replace = T))
bwplot(NUMS ~ GRP, ddf, panel = function(...) {
panel.bwplot(..., pch = "|")
panel.xyplot(..., jitter.x = TRUE)})
The default median dot symbol was changed to a line with pch = "|". Other properties of the box and whiskers can be adjusted with box.umbrella and box.rectangle through the trellis.par.set() function. The amount of jitter can be adjusted through the factor argument of panel.xyplot(), e.g. factor = 1.5; larger values spread the points further apart.
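These adjustments can be combined in one call. The sketch below passes the box settings through the par.settings argument of bwplot() instead of calling trellis.par.set(), so they apply to this plot only; the particular values (solid whiskers, thicker box, factor = 1.5) are arbitrary choices for illustration.

```r
library(lattice)
set.seed(1)
ddf <- data.frame(NUMS = rnorm(500),
                  GRP = sample(LETTERS[1:5], 500, replace = TRUE))

p <- bwplot(NUMS ~ GRP, data = ddf,
            par.settings = list(
              box.umbrella  = list(lty = 1),   # solid whiskers
              box.rectangle = list(lwd = 2)    # thicker box outline
            ),
            panel = function(...) {
              panel.bwplot(..., pch = "|", do.out = FALSE)       # no separate outlier symbols
              panel.xyplot(..., jitter.x = TRUE, factor = 1.5)   # all points, wider jitter
            })
```

Setting do.out = FALSE keeps panel.bwplot() from drawing outliers that panel.xyplot() already shows.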
