Display a specific value with boxplot - r

I have made a histogram and a boxplot on different data. I want to display a specific value on these graphs (a point with a label). Is this possible with boxplot and hist, or is it better to use the ggplot2 package? I don't really know how to use it.
Thanks in advance for your help on this probably very simple question!
The first plot represents the distribution of the share of couples without children in different localities, and the second the share of under-14-year-olds in the population of these localities.
I would like to indicate with a dot the value of a particular locality (present in the source data table).
hist(tableau.complet$proportion.coupleSenf,
main = "ex.main",
xlab = "ex.x",
ylab = "ex.y",
col = "brown")
boxplot(tableau.complet$part.moins.14,
main="ex..main",
ylab="ex.ylab",
col= "brown",
las =1 )
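Base graphics can do this without ggplot2: draw the plot first, then add the point with points() and its label with text(). A minimal sketch, assuming made-up data; the data frame tab, its columns and the locality name "Town_07" are hypothetical stand-ins for tableau.complet and its contents:

```r
# Hypothetical data standing in for tableau.complet
set.seed(1)
tab <- data.frame(
  locality      = sprintf("Town_%02d", 1:50),
  share_couples = runif(50, 10, 40),  # stand-in for proportion.coupleSenf
  share_under14 = runif(50, 5, 30)    # stand-in for part.moins.14
)

# Look up the values of the locality to highlight
target   <- "Town_07"
val_hist <- tab$share_couples[tab$locality == target]
val_box  <- tab$share_under14[tab$locality == target]

# Histogram: the value lives on the x axis, so mark it at y = 0
hist(tab$share_couples, main = "ex.main", xlab = "ex.x", ylab = "ex.y", col = "brown")
points(val_hist, 0, pch = 19, col = "blue")
text(val_hist, 0, labels = target, pos = 3, col = "blue")

# Boxplot: a single box is drawn at x = 1 and the value lives on the y axis
boxplot(tab$share_under14, main = "ex.main", ylab = "ex.ylab", col = "brown", las = 1)
points(1, val_box, pch = 19, col = "blue")
text(1, val_box, labels = target, pos = 4, col = "blue")
```

The pos argument of text() places the label above (3) or to the right (4) of the point so the two do not overlap.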

Related

How to create a scatterplot to visualize the relationships between two waves of a study in R?

I've been having trouble creating a scatterplot in R to compare waves 1 and 2 of the following study
gay <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/gay.csv")
I have tried the following code, but none of the alterations I could figure out on my own seem to work
wave1 <- filter(gay, gay$wave == 1)
wave2 <- filter(gay, gay$wave == 2)
pairs(gay, pch = 19)
pairs(wave1, pch = 19)
pairs(wave2, pch = 19)
plot(gay$ssm, gay$wave, pch = 19)
plot(wave1, gay$ssm, pch = 19)
Any ideas?
EDIT: None of this code actually throws errors, but it doesn't produce useful results either. The problem asks specifically for a scatterplot comparing the results of the first wave of the study to the results of the second. I think it wants me to plot ~ssm against ~treatment (~treatment being a variable with 3 possible values), but I could be completely wrong on this one.
Thank you all in advance for the insights on this question so far, they have been useful!
In case it helps, the full question for this problem is
## Most surveys find at least some outliers or individuals whose responses are substantially different from the rest of the data. In addition, some respondents may change their responses erratically over time. Create a scatter plot to visualize the relationships between wave 1 and each of the subsequent waves in study 2. Use only the control group. Interpret the results.
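One way to read the assignment is to put each control-group respondent's wave 1 answer on the x axis and the same respondent's wave 2 answer on the y axis. A minimal sketch on made-up data: the id column, the "No Contact" label for the control group and the 1 to 5 scale for ssm are all assumptions, and the real gay.csv may be laid out differently:

```r
# Hypothetical panel data: one row per respondent and wave
set.seed(42)
n <- 100
panel <- data.frame(
  id        = rep(1:n, times = 2),
  wave      = rep(1:2, each = n),
  treatment = rep(sample(c("No Contact", "Treated"), n, replace = TRUE), times = 2),
  ssm       = sample(1:5, 2 * n, replace = TRUE)
)

# Keep only the control group, then split by wave
control <- subset(panel, treatment == "No Contact")
w1 <- subset(control, wave == 1, select = c(id, ssm))
w2 <- subset(control, wave == 2, select = c(id, ssm))

# Pair each respondent's two answers; columns become ssm.w1 and ssm.w2
both <- merge(w1, w2, by = "id", suffixes = c(".w1", ".w2"))

# jitter() separates points that share the same integer answers
plot(jitter(both$ssm.w1), jitter(both$ssm.w2), pch = 19,
     xlab = "ssm, wave 1", ylab = "ssm, wave 2")
abline(0, 1, lty = 2)  # respondents on this line did not change their answer
```

Points far from the dashed line are respondents whose answers changed a lot between waves, which is what the exercise asks you to interpret.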

R: plot multiple lines in different colours from subset of database

I've created a database with six different countries and multiple GDP and inequality measures.
For starters, I want to plot the GDP growth of the countries in one plot. This works out perfectly fine:
plot(my_six_countries$Year, my_six_countries$GDP.growth.rate, main = "Development of GDP growth", xlab = "Year", ylab = "GDP growth", type = "l", col = 600)
However, I want the lines for the different countries to be displayed in different colours and not just 600. I spent virtually the whole day on this super nooby problem and I've tried all sorts of things, from creating a colour vector and subsetting manually to playing with ggplot, but I'm really stuck.
Any idea how the lines could be displayed in different colours?
Thank you so much!
I just wanted to say that I ended up using a way less elegant method - but it worked.
Firstly, I subsetted my countries.
c1 <- subset(countries,countries$Country=="c1")
c2 <- subset(countries,countries$Country=="c2")
c3 <- subset(countries,countries$Country=="c3")
Secondly, I plotted the lines one by one.
plot(c1$Year, c1$GDP, type = "l", bty="l", col="brown")
lines(c2$Year, c2$GDP, col="cornflowerblue")
lines(c3$Year, c3$GDP, col="darkblue")
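The same approach can be written as a loop so that it scales to any number of countries. A minimal sketch, assuming made-up data with the same column names (Country, Year, GDP) as the snippet above:

```r
# Hypothetical data with the column names used above
set.seed(3)
countries <- data.frame(
  Country = rep(c("c1", "c2", "c3"), each = 10),
  Year    = rep(2001:2010, times = 3),
  GDP     = rnorm(30, mean = 2)
)

# One colour per country, looked up by country name
ids  <- unique(countries$Country)
cols <- setNames(c("brown", "cornflowerblue", "darkblue"), ids)

# Empty plot spanning the full data range, then one line per country
plot(range(countries$Year), range(countries$GDP), type = "n", bty = "l",
     main = "Development of GDP growth", xlab = "Year", ylab = "GDP growth")
for (id in ids) {
  d <- countries[countries$Country == id, ]
  lines(d$Year, d$GDP, col = cols[[id]])
}
legend("topright", legend = ids, col = unname(cols), lty = 1, bty = "n")
```

Setting up the axes with type = "n" first avoids the problem of later lines falling outside the range of the first country.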

Having difficulty with my plot's colors and creating a legend

This is probably a basic question. I've produced a plot that displays the home ranges for different lemurs. Great! Hard part done. But they are all lime green. How can I choose a different colour for each of my 5 IDs? It seems like it should be simple, but I can't find anything online. Would anyone be able to suggest something?
I’ve pasted my code below
dd <- read.csv(file.choose(), header = T)
xy <- dd[,c("X","Y")]
id <- dd[,"ID"]
hr<- mcp(xy,id,percent=95)
plot(hr,
main="95% Minimum Convex Polygon",
xlab="X Coordinate",
ylab="Y Coordinate")
Once I have 5 separate colors for my 5 IDs (frodo, bilbo, merry, pippin, sam), it would also be great to create a legend displaying the colors and the related IDs. I was playing around with the following code
legend('topright', names(hr)[-1] ,
lty=1, col=c('red', 'blue', 'green',' brown'), bty='o', cex=1.5)
But that seems to just display a legend for the x,y coordinates, not the IDs displayed in the plot. Can anyone tell me what I'm doing wrong?
Edit: I got it! The "col=" argument doesn't work for polygons; it's "colpol=". Thanks for all the help.
The hr object has a class of "area" and "data.frame". There is an area method for plot. It has a colpol argument. See ?plot.area when adehabitat is loaded:
plot(hr, colpol=c('red', 'blue', 'green',' brown') )
Originally it was not clear that you wanted to color the 4 (not 5) areas produced. I thought you wanted the points colored by group, which is what the points() call below produces.
If you know that ID is already a factor, the factor call in that code is not needed. as.numeric applied to a factor turns it into an integer ranging from 1 to the number of levels, and that integer is used as an index into the vector of 5 colors. To see the names of all 657 available colors, just type colors(). Refer to ?colors for additional links on managing color palettes.
As pointed out, we don't have the data or the mcp function to see how the hr object gets plotted. If the plot method for that object does not assign individual colors to the points, then do this instead:
points(xy[,1], xy[,2],
col = c("red", "green", "blue", "orange", "sandybrown")[as.numeric(factor(dd[,"ID"]))]
)
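The factor-indexing trick also gives a matching legend, because levels() returns the IDs in the same order as the integer codes. A minimal sketch on made-up coordinates; the data frame below is hypothetical and only reuses the five ID names from the question:

```r
# Hypothetical stand-in for the lemur location data
set.seed(7)
ids <- c("frodo", "bilbo", "merry", "pippin", "sam")
dd  <- data.frame(ID = rep(ids, each = 20), X = rnorm(100), Y = rnorm(100))

# Factor levels are sorted alphabetically: bilbo, frodo, merry, pippin, sam
f   <- factor(dd$ID)
pal <- c("red", "green", "blue", "orange", "sandybrown")

# Integer codes 1..5 index into the palette, giving one colour per point
pt_col <- pal[as.integer(f)]

plot(dd$X, dd$Y, col = pt_col, pch = 19,
     xlab = "X Coordinate", ylab = "Y Coordinate")

# levels(f) and pal are in the same order, so the legend matches the points
legend("topright", legend = levels(f), col = pal[seq_along(levels(f))],
       pch = 19, bty = "o")
```

Note that the colours follow the alphabetical order of the levels, not the order in which the IDs first appear in the data.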
Is this what you are looking for?
plot(hr$X,hr$Y,main="95% Minimum Convex Polygon",xlab="X Coordinate",
ylab="Y Coordinate",
col = rainbow(length(hr$ID))[rank(hr$ID)],
pch=c(1:25)[as.numeric(factor(hr$ID))])
legend('topleft', unique(unlist(as.character(factor(hr$ID)))) ,lty=1,
col=rainbow(length(hr$ID))[ unique(unlist(rank(hr$ID)))],
pch=c(1:25)[unique(unlist(as.numeric(factor(hr$ID))))],
bty='o', cex=1.5)

Manually defining the colours of a wireframe

I am plotting some surfaces in R using the lattice package. I can't find a way to choose the colours of the surface.
Here is an example of how I plot each:
theseCol=heat.colors(150)
mm=paste("WB numbers where present\n(",nstoch," sims)",sep="")
WBnumbers=wbPrev_series
rownames(WBnumbers)=KList
colnames(WBnumbers)=iMwbList
wireframe(WBnumbers, zlim=c(0,max(wbPrev_series,na.rm=TRUE)), colorkey=FALSE,
col.regions=theseCol, scales = list(arrows = FALSE), drape = TRUE,
main=mm, zlab="", xlab="K", ylab="iMwb")
I would like the first surface to stay as it is, but the others to be coloured not by their own z levels but by the first surface's z levels. I tried multiple things, but wireframe always treats the colours I give as the possible range for the current variable.
Is there any way this could be done?
Thanks
Here is the answer Dave W. posted some years back on the R-help mailing list. You probably can google up the entire thread.
From: David Winsemius
Following the advice in help(wireframe), you need to look at the levelplot section for advice re: a proper specification to colorkey and follow the appropriate links in the help pages. Whether your data is a proper input to wireframe cannot be determined from the included information, although I suppose your reported success suggests it is.
This is an untested (since there was nothing to test) wild-assed guess after reading the material I pointed to:
wireframe(data.m, aspect = c(0.3), shade = TRUE, screen = list(z = 0, x = -45),
  light.source = c(0, 0, 10), distance = 0.2,
  zlab = "Freq", xlab = "base", ylab = "Fragment",
  # the original post passed an undefined x here; data.m is presumably what was meant
  col = level.colors(data.m, at = do.breaks(range(data.m), 30),
    col.regions = colorRampPalette(c("red", "white", "blue"))(30))
)
EDIT:
Per Josh's request, I played around a bit. The following will apply color shading (drape):
wireframe(dmat,drape=TRUE,col='black',col.regions = colorRampPalette(c("red", "white", "blue"))(30) )
Which sets the "drape" colors but not the gridlines themselves.
It's a darn shame that wireframe doesn't respect par(new=TRUE), because if it did we could slice the data matrix into z-ranges and overplot one color at a time.
I will have to check my "archive" of old experiments w/ R graphics when I get home, but I think I ended up using the scatterplot3d package to get data-dependent grid colors.
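For reference, the drape call from the edit above can be tried on a built-in dataset. A minimal sketch, using the volcano matrix from the datasets package as a stand-in for dmat (an assumption, since the original matrix is not shown). It only colours a surface by its own z values and does not solve the problem of colouring by another surface's z levels:

```r
library(lattice)

# 30-step red-white-blue palette, as in the answer above
pal <- colorRampPalette(c("red", "white", "blue"))(30)

# drape = TRUE colours the facets by z; col sets the grid line colour
p <- wireframe(volcano, drape = TRUE, col = "black", col.regions = pal,
               colorkey = FALSE, scales = list(arrows = FALSE),
               zlab = "", xlab = "row", ylab = "column")
print(p)  # lattice plots are objects and must be printed to be drawn
```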

R heat map: Ordering by value; label issues

I am looking to improve upon output I implemented in R based on Jeromy's answer here (thanks!). Mine is a 31x31 matrix with positive and negative values, and uses basically the same ggplot2 code:
library(ggplot2)
library(reshape)
z<-cor(insheet3,use="complete.obs",method="kendall")
zm<-melt(z)
ggplot(zm, aes(X1,X2, fill=value)) + geom_tile() +
scale_fill_gradient2(low = "blue", high = "dark violet")
I need to change three things:
1. Right now, the rows appear in reverse alphabetical order, which means no visible data trends. How can I influence the order of the rows and columns, such that either:
A. (Preferred:) The columns are ordered by correlation value (negative to positive or vice versa), as they are in the ellipse package output on that same page; or
B. The columns are manually ordered, so that I can group similar variables?
2. Along the bottom X-axis, my variable names are overlapping dramatically and are unreadable. They need to remain long (i.e., OrthoPhos, Ammonia, Residential...), so how can I rotate their labels 90 degrees?
3. Is there a way to remove the "X1" and "X2" labels along each axis?
Thank you!
Following what I'll call an extensive/religious R journey into correlation matrix possibilities, I wanted to share what I'm finally going to use. Also, thanks to the previous answerers; I've found that there are many "right" answers to this.
Since my reviewers insisted I include numbers and not just colors, and that I stay away from more "confusing" and "busy" output like correlogram, I finally found "image" and based my final output on this example. Thanks #Marcinthebox.
Also to appease StackOverflow, here is a link to the image, rather than the image itself.
Because some of these specifications took a while to figure out and were critical to the final output, here's my code, shortened as much as I could.
#Subsetting to only the vectors I want to see in the correlation, as ordered
insheet<-subset(insheet1,
select=c("Cond", "CL", "SO4", "TN", "TP", "OrthoPhos", "DO", ...., "Rural"))
#Defining "high" and "low" colors
library(colorspace)
mycolors<-diverge_hcl(8, h = c(8, 240), c = 80, l = c(50,100), power = 1)
#Correlating them into a matrix
sheet<-cor(insheet,use="complete.obs")
#Making it!
par(mar=c(5.5, 5.5, 2, 1)) #Widens margins for the axis labels; must be set before plotting
image(x=seq(dim(sheet)[2]), y=seq(dim(sheet)[2]), z=sheet, ann=FALSE,
col=mycolors, xlab="x column", ylab="y column", xaxt='n', yaxt='n')
text(expand.grid(x=seq(dim(sheet)[2]), y=seq(dim(sheet)[2])),
labels=round(c(sheet),2), cex=0.5)
#Axis labels come from insheet, the data frame that was correlated above
axis(1, 1:dim(insheet)[2], colnames(insheet), las=2)
axis(2, 1:dim(insheet)[2], colnames(insheet), las=2)
I was also able to for-loop this to output multiple .wmf files, once errors were suppressed. Too bad I couldn't visualize significant p-values as well... another time. Thanks!
I assume that you mean "clustering" for point 1?
For such tasks I prefer the heatmap.2() function from the gplots package, which offers various clustering options.
For point 2 and 3: The heatmap.2() function will also take care of the 90º rotation and the labels since it is using a data matrix as input instead of a data table.
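The three requested changes can also be made while staying in ggplot2. A minimal sketch under stated assumptions: the data are made up, the ordering comes from hierarchical clustering (one possible way of grouping similar variables, not the ellipse package's ordering), and the long format is built with as.data.frame(as.table()) so the columns are named Var1 and Var2 rather than X1 and X2. The plotting step only runs if ggplot2 is installed:

```r
# Hypothetical data with a few of the variable names from the question
set.seed(11)
dat <- as.data.frame(matrix(rnorm(200), ncol = 5,
  dimnames = list(NULL, c("OrthoPhos", "Ammonia", "Residential", "Cond", "Rural"))))
z <- cor(dat, method = "kendall")

# Point 1: choose an order (here: clustering on 1 - correlation)
# and impose it through the factor levels of both axes
ord <- rownames(z)[hclust(as.dist(1 - z))$order]
zm  <- as.data.frame(as.table(z), responseName = "value")
zm$Var1 <- factor(zm$Var1, levels = ord)
zm$Var2 <- factor(zm$Var2, levels = ord)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(zm, aes(Var1, Var2, fill = value)) +
    geom_tile() +
    scale_fill_gradient2(low = "blue", high = "darkviolet") +
    labs(x = NULL, y = NULL) +                    # point 3: drop axis titles
    theme(axis.text.x = element_text(angle = 90,  # point 2: rotate x labels
                                     hjust = 1, vjust = 0.5))
  print(p)
}
```

For a manual order (option B), replace ord with a character vector of the variable names in the order you want.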
