annotate a faceted geom_point - r

Using ggplot2, I would like to annotate my faceted geom_point plots. I am plotting some data per plant for 2 parameters, and I would like to annotate each facet with the population size of each plant in that plot. Below is an example similar to my data.
Let's subset the CO2 dataset to make the example more relevant. For each plant, I count the number of observations where uptake is above 20 and rename the column:
require(plyr)
require(dplyr)
require(ggplot2)
CO2_mod<-subset(CO2,uptake>20)
COUNT<-ddply(.data=CO2_mod,
.variables=.(Plant,Treatment),
.fun=nrow)  # one row count per Plant/Treatment group, independent of plyr/dplyr masking of count()
names(COUNT)[3] <- "PopSize"
Here is the code for faceted plots based on treatments:
p1<-ggplot(CO2_mod, aes(x=Plant, y=uptake))
p2<-p1+geom_point(aes())+
facet_grid(Treatment~., scales="free")
p2
Now I would like to annotate each faceted plot with the PopSize value per Plant and per Treatment from the COUNT df.
I have tried this code without success:
y<-max(CO2_mod$uptake)+1
COUNT<-mutate(COUNT,y=paste0(y))
p2<-p1+geom_point(aes())+
facet_grid(Treatment~., scales="free")+
geom_text(data=COUNT, aes(x=Plant, y=y, label=PopSize),
colour="black")
p2
The error says: Error: Discrete value supplied to continuous scale
What would be the right way to do this?
thanks!

Inspecting COUNT shows that y is a character vector:
str(COUNT)
# 'data.frame': 10 obs. of 4 variables:
# $ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 2 3 4 5 6 7 8 9 12
# $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 2 2 2 1 1 1 2
# $ PopSize : int 6 6 6 6 6 6 5 6 5 2
# $ y : chr "46.5" "46.5" "46.5" "46.5" ...
If we modify COUNT so that y is numeric (it was paste0() that turned it into a character vector):
COUNT<-mutate(COUNT,y=as.numeric(y))
then the geom_text() call above works, and each facet is annotated with the PopSize of every plant.
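A base-R sketch of the same fix, useful for checking the column types without plyr or dplyr. It assumes only the built-in CO2 dataset; `table()` stands in for `ddply()` here, which is a swapped-in technique rather than the asker's code. The line that matters is the one that keeps `y` numeric. The ggplot2 part only runs if the package is installed and builds the plot object without printing it.

```r
# Same subset as in the question
CO2_mod <- subset(CO2, uptake > 20)

# Count rows per Plant/Treatment; table() also lists empty combinations, so drop them
tab <- as.data.frame(table(Plant = CO2_mod$Plant, Treatment = CO2_mod$Treatment))
COUNT <- subset(tab, Freq > 0)
names(COUNT)[names(COUNT) == "Freq"] <- "PopSize"

# Keep y numeric: paste0() would make it character and break the continuous y scale
COUNT$y <- max(CO2_mod$uptake) + 1
str(COUNT)

# Optional: build (but do not print) the annotated plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(CO2_mod, aes(x = Plant, y = uptake)) +
    geom_point() +
    facet_grid(Treatment ~ ., scales = "free") +
    geom_text(data = COUNT, aes(x = Plant, y = y, label = PopSize), colour = "black")
}
```

Because COUNT carries the Treatment column, facet_grid() places each label in the matching facet.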

Related

ggplot2: reorder x axis label by factor of levels doesn't work

I want to draw a plot with x-axis labels equal to c("chr","lab1","lab3","lab10")
But when I set the levels for the x-axis, the reordering doesn't seem to work.
Can anyone help me solve the problem? I don't know why I cannot change the order of the x-axis labels.
library(ggplot2)
library(viridis)
library(dplyr)  # for filter(); without it stats::filter() is called instead
set.seed(123)
data.df=data.frame(gene=c("APC","NF2","APC","NF2","APC","NF2","APC","NF2"),
variable=c("lab1","lab1","lab3","lab3","lab10","lab10","chr","chr"),
value=c(sample.int(100, 6),5,22))
ggplot(data.df, aes(x=factor(variable, levels = c("chr",paste0("lab",c(1,3,10)))),
y=factor(gene, levels = c("NF2","APC")))) +
geom_tile(data=filter(data.df,variable != 'chr'),
aes(fill= value),colour='white') +
geom_point(data = filter(data.df,variable=="chr"),
aes(col=as.factor(value)))+
scale_fill_viridis_b()
===============
An update about the dataset: my entire dataset has 1254 obs. of 3 variables.
Setting the levels of variable seems to work for this example, but it doesn't work for my entire dataset.
> str(melt.df)
'data.frame': 1254 obs. of 3 variables:
$ gene : Factor w/ 57 levels "NF2","TNNI3",..: 29 16 39 49 18 31 9 20 52 46 ...
$ variable: Factor w/ 22 levels "chromosome_name",..: 2 2 2 2 2 2 2 2 2 2 ...
$ value : num 100 100 99.8 100 100 ...
Final update: I found a way to solve my problem by adding + scale_x_discrete(limits = c("chr",paste0("lab",c(1,3,10))))
Thanks to Ricardo Semião e Castro and TarJae for your help!
ggplot2 plots any factor variable in the order of its levels, and by default R sets the levels of a factor alphabetically, so "lab10" comes before "lab3" (strings are compared character by character, and "1" sorts before "3"). To correct this, reorder the levels of your variable:
data.df$variable <- factor(data.df$variable, levels = c("chr","lab1","lab3","lab10"))
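A minimal base-R sketch of why the default order is wrong and two ways to fix it. The `natural_levels()` helper is hypothetical, written for this example; it assumes labels of the form "lab<number>" and puts anything without a number (here "chr") first.

```r
variable <- c("lab1", "lab1", "lab3", "lab3", "lab10", "lab10", "chr", "chr")

# Default levels come from sorting the unique strings: "lab10" lands before "lab3"
default_levels <- levels(factor(variable))
print(default_levels)

# Option 1: spell out the order explicitly, as in the answer
explicit <- factor(variable, levels = c("chr", "lab1", "lab3", "lab10"))

# Option 2 (hypothetical helper): order by the numeric suffix, non-numeric labels first
natural_levels <- function(x) {
  u <- unique(x)
  num <- suppressWarnings(as.integer(sub("^lab", "", u)))  # NA for "chr"
  u[order(!is.na(num), num)]
}
natural <- factor(variable, levels = natural_levels(variable))
print(levels(natural))
```

The same level vector can be passed to scale_x_discrete(limits = ...), which is what the asker ended up using; limits fixes the axis order regardless of how the factor levels are set in each layer's data.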

How to change order for pyramid plots with ggplot2 to dataset order?

I have a dataset with climate suitability values (0-1) for tree species for both present and future.
I would like to visualise the data in a pyramid plot with the ggplot2 package, where present is displayed on the left side of the plot and future on the right side, and the tree species appear in the order given in my raw dataset.
b2010<-read.csv("csi_before2010_abund_order.csv",header=T,sep = ";")
str(b2010)
'data.frame': 20 obs. of 7 variables:
$ species: Factor w/ 10 levels "Acer platanoides",..: 9 9 7 7 8 8 6 6 5 5 ...
$ time : Factor w/ 2 levels "future","present": 2 1 2 1 2 1 2 1 2 1 ...
$ grid1 : num 0.6001 0.5945 0.6366 0.0424 0.6941 ...
$ grid2 : num 0.6399 0.5129 0.6981 0.0399 0.711 ...
$ grid3 : num 0.6698 0.5212 0.6863 0.0446 0.6795 ...
$ mean : num 0.6366 0.5429 0.6737 0.0423 0.6949 ...
$ group : Factor w/ 1 level "before 2010": 1 1 1 1 1 1 1 1 1 1 ...
b2010$mean = ifelse(b2010$time == "future", b2010$mean * -1,b2010$mean)
head(b2010)
species time grid1 grid2 grid3 mean group
1 Tilia europaea present 0.60009009 0.63990200 0.66975713 0.63658307 before 2010
2 Tilia europaea future 0.59452874 0.51294094 0.52115256 -0.54287408 before 2010
3 Sorbus intermedia present 0.63659602 0.69813931 0.68629903 0.67367812 before 2010
4 Sorbus intermedia future 0.04242327 0.03990654 0.04460707 -0.04231229 before 2010
5 Tilia cordata present 0.69414478 0.71097034 0.67950863 0.69487458 before 2010
6 Tilia cordata future 0.55790818 0.53918493 0.51979470 -0.53896260 before 2010
library(ggplot2)
library(ggpol)  # facet_share() comes from the ggpol package
ggplot(b2010, aes(x = factor(species), y = mean, fill = time)) +
geom_bar(stat = "identity") +
facet_share(~time, dir = "h", scales = "free", reverse_num = TRUE) +
coord_flip()
Now, future and present are in the wrong order and also the species are ordered alphabetically, even though they are clearly "factors" and should therefore be ordered according to my dataset. I would very much appreciate your help.
Thank you and kind regards
You are misunderstanding how factors work. Bars are plotted in the order given by levels(b2010$species), not in the order of the rows. To change this order, you have to reorder the levels manually, e.g.
b2010$species <- factor(b2010$species,
levels = c("Sorbus intermedia", "Tilia cordata"...))
These levels can also be derived from some statistic, e.g. mean. To do that, you could do something along the lines of
present <- b2010[b2010$time == "present", ]
myorder <- as.character(present$species[order(present$mean)])
b2010$species <- factor(b2010$species, levels = myorder)
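If the goal is simply the order in which species appear in the file, `unique()` returns exactly that order. A minimal sketch, rebuilt from the six rows printed by head(b2010) above (only the species, time and mean columns); the facet-side claim assumes facet_share() lays out panels in level order.

```r
# First six rows of b2010 as printed by head() in the question
b2010 <- data.frame(
  species = c("Tilia europaea", "Tilia europaea", "Sorbus intermedia",
              "Sorbus intermedia", "Tilia cordata", "Tilia cordata"),
  time = c("present", "future", "present", "future", "present", "future"),
  mean = c(0.63658307, -0.54287408, 0.67367812, -0.04231229, 0.69487458, -0.53896260)
)

# Keep species in order of first appearance in the data
b2010$species <- factor(b2010$species, levels = unique(b2010$species))

# Make "present" the first level so it becomes the first (left) panel
b2010$time <- factor(b2010$time, levels = c("present", "future"))

levels(b2010$species)
levels(b2010$time)
```

With coord_flip() the first level is drawn at the bottom of the axis, so rev(unique(...)) may be what you want if the first species in the file should appear at the top.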

Scatterplots in R using lattice and cloud, how to determine colors by factors?

I am still struggling with R plots and colors -- some results are as I expected, some not.
I have a 2-million point data set, generated by a simulation process. There are several variables in the dataset, but I am interested in three of them and in a factor that describes the class of each data point.
Here is a short snippet of code that reads the points and gets some basic statistics on them:
library(lattice)
library(plyr)
myData <- read.table("dados - b1000 n10000 var 0,2 - MAX40.txt",
col.names=c("Class","Thet1Thet2","Thet3Thet2","Thet3Thet1",
"K12","K23","delta","w_1","w_2","w_3"))
count (myData$Class)
That gives me
## x freq
## 1 A 8030
## 2 B 17247
## 3 C 4999
## 4 D 16495
## 5 E 1949884
## 6 N 3345
(the input file is quite large, cannot add it as a link)
I want to see these points in a 3D scatterplot, so I use the code
colors=c("red","green","blue","cyan","magenta","yellow")
# Let's try with a very small dot size, see if we can visualize the inners of the cube.
cloud(myData$delta ~ myData$K12 + myData$K23, xlab="K12", ylab="K23", zlab="delta",
cex=0.001,main="All Classes",col.point = colors[myData$Class])
Here is the result. As expected, points from class E are the vast majority, so I cannot see points of other classes. The problem is that I expected the points to be plotted in magenta (classes are A, B, C, D, E, N; colors are red, green, blue, cyan, magenta, yellow).
When I do the plot class by class it works as expected, see two examples:
data <- subset(myData, Class=="A")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class A",
col.point = colors[data$Class])
gives this:
And this snippet of code
data <- subset(myData, Class=="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class E",
col.point = colors[data$Class])
gives this:
This also seems as expected: a plot of points of all classes except E.
data <- subset(myData, Class!="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,
cex=0.01,main="All Classes (except E)",col.point = colors[data$Class])
The question is, why on the first plot the points are blue instead of magenta?
This question is somewhat similar to Color gradient for elevation data in a XYZ plot with R and Lattice, but now I am using factors to determine colors on the scatterplot.
I've also read Changing default colours of a lattice plot by factor -- grouping plots by a factor (using the parameter groups.factor=myData$Class) does not solve my problem, plots are still in blue but separated by class.
Edited to add more information: this fake data set can be used for tests.
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
# This is ugly but works!
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
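The ten assignment lines above can be collapsed into one expression that also creates Class as a factor from the start. A minimal sketch of that alternative (not the asker's code), assuming the same 10 x 10 x 10 grid; data.frame() is used instead of as.data.frame(cbind(...)) and recycles y and z the same way.

```r
num <- 10
# Same grid as in the question: x slowest, z fastest (data.frame recycles y and z)
data <- data.frame(x = rep(seq_len(num), each = num * num),
                   y = rep(seq_len(num), each = num),
                   z = seq_len(num))

# z = 1,2 -> "A"; 3,4 -> "B"; ...; 9,10 -> "E", created directly as a factor
data$Class <- factor(LETTERS[(data$z + 1) %/% 2])
str(data)
```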
When I plot it with
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",main="All Classes",
col.point = colors[data$Class])
I get the plot below. All points are in blue.
JeremyCG found the problem. Here is the complete code that works, for future reference.
library(lattice)
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
That showed the issue:
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: chr "A" "A" "B" "B" ...
Class must be a factor. This solved it:
data$Class <- as.factor(data$Class)
str(data)
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: Factor w/ 5 levels "A","B","C","D",..: 1 1 2 2 3 3 4 4 5 5 ...
Then plot it:
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",
pch=20,main="All Classes",col = colors[data$Class])
Here is the result:
Thanks @jeremycg!
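Why the factor matters: indexing an unnamed vector with character values looks the values up by name, finds none, and returns NA, which is consistent with lattice falling back to its default colour. Indexing with a factor uses the integer level codes instead. A minimal base-R sketch; the named palette `pal` is an illustrative alternative, not part of the original answer.

```r
colors <- c("red", "green", "blue", "cyan", "magenta", "yellow")
cls_chr <- c("A", "B", "E")

# Character index into an unnamed vector: no names match, so every entry is NA
by_chr <- colors[cls_chr]

# Factor index: R uses the level codes; levels are A, B, E here, so E gets code 3 (blue)
by_factor <- colors[factor(cls_chr)]

# Declaring every class as a level keeps the code-to-colour mapping stable
all_classes <- c("A", "B", "C", "D", "E", "N")
by_full_factor <- colors[factor(cls_chr, levels = all_classes)]

# Alternative: a named palette looked up by class name
pal <- setNames(colors, all_classes)
by_name <- unname(pal[cls_chr])
```

The second case shows a subtle trap: the colour a class receives depends on which levels the factor has, so subsets built with factor() on a character column can shift colours silently.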

ggplot2 time series with an ordered factor on the x-axis

I'd be extremely grateful for your assistance with the following issue.
I wish to create a representative time series for different subjects who have undertaken a test at discrete intervals. The data frame is called Hayling.Impulsivity. Here is a sample of the data in wide format:
Subject Baseline 2-weeks 6-weeks 3-months
1 1 15 23 5 NA
2 2 15 27 3 4
3 3 5 7 0 19
4 4 1 5 2 6
5 5 3 7 18 27
6 6 0 2 19 2
I then made Subject a factor:
Hayling.Impulsivity$Subject<-factor(Hayling.Impulsivity$Subject)
I then melted the data frame into long format using the reshape2 package:
Long.H.I.<-melt(Hayling.Impulsivity, id.vars="Subject", variable.name="Follow Up", value.name="Hayling AB Error Score")
I then ordered the measurement variables:
Long.H.I.$"Follow Up"<-factor(Long.H.I.$"Follow Up", levels=c("Baseline", "2-weeks", "6-weeks", "3-months"), ordered=TRUE)
Here's the structure of this data frame:
'data.frame': 52 obs. of 3 variables:
$ Subject : Factor w/ 13 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Follow Up : Ord.factor w/ 4 levels "Baseline"<"2-weeks"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ Hayling AB Error Score: num 15 15 5 1 3 0 3 0 0 33 ...
Now I try to construct the time series in ggplot:
ggplot(Long.H.I., aes("Follow Up", "Hayling AB Error Score", group=Subject, colour=Subject))+geom_line()
But all I get is an empty plot. I'm not permitted to post an image to show you but the x and y axes are labelled only with "Follow Up" and "Hayling AB Error Score" respectively. There are no actual scales / values / categories on either axis and no points have been plotted.
Where have I gone wrong?
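The problem is that aes() treats quoted names such as "Follow Up" as constant strings rather than as references to columns, so every row maps to the same single position and no visible lines are drawn. Column names with spaces are also awkward with aes_string(). You could replace the spaces with underscores and then label the x and y axes explicitly (backtick-quoted names such as `Follow Up` also work inside aes()). Code could look like: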
library(reshape2)  # melt() with variable.name / value.name
library(ggplot2)
Hayling.Impulsivity$Subject<-factor(Hayling.Impulsivity$Subject)
Long.H.I.<-melt(Hayling.Impulsivity, id.vars="Subject",
variable.name="Follow_Up", value.name="Hayling_AB_Error_Score")
# level names must match the column names exactly ("6-weeks", not "6-Weeks"), otherwise they become NA
Long.H.I.$Follow_Up <- factor(Long.H.I.$Follow_Up,
levels=c("Baseline","2-weeks","6-weeks","3-months"), ordered=TRUE)
ggplot(Long.H.I., aes(Follow_Up, Hayling_AB_Error_Score, group=Subject, colour=Subject))+
geom_line() +
labs(x="Follow Up", y="Hayling AB Error Score")
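A self-contained sketch of the same reshaping in base R, using the six sample rows shown in the question. The manual stacking replaces melt() so it runs without reshape2; it also shows how a level name that does not match the data exactly silently turns into NA.

```r
# Sample rows from the question, wide format (check.names = FALSE keeps "2-weeks" etc.)
wide <- data.frame(Subject = factor(1:6),
                   Baseline = c(15, 15, 5, 1, 3, 0),
                   "2-weeks" = c(23, 27, 7, 5, 7, 2),
                   "6-weeks" = c(5, 3, 0, 2, 18, 19),
                   "3-months" = c(NA, 4, 19, 6, 27, 2),
                   check.names = FALSE)
follow_up <- c("Baseline", "2-weeks", "6-weeks", "3-months")

# Stack the four measurement columns, one column after another
long <- data.frame(
  Subject = rep(wide$Subject, times = length(follow_up)),
  Follow_Up = factor(rep(follow_up, each = nrow(wide)),
                     levels = follow_up, ordered = TRUE),
  Hayling_AB_Error_Score = unlist(wide[follow_up], use.names = FALSE)
)

# A level that does not match the data exactly ("6-Weeks") becomes NA
typo <- factor(follow_up, levels = c("Baseline", "2-weeks", "6-Weeks", "3-months"))
print(anyNA(typo))

# Build (not print) the plot with unquoted, syntactic column names
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(Follow_Up, Hayling_AB_Error_Score,
                        group = Subject, colour = Subject)) +
    geom_line() +
    labs(x = "Follow Up", y = "Hayling AB Error Score")
}
```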

How do I get faceted barplot values to show as negative

I'm struggling a bit to make a vertically faceted barplot. I added a 'thus far' version of my work below. My main issue is that the negative values aren't showing as I'd expect. Shouldn't there be some line, or tick, indicating 0, with negative bars registering below it? The code below should be fully reproducible. You can see several negative values in the final data set I'm trying to plot. I'm getting a rather verbose error beginning with 'Mapping a variable to y and also using stat="bin".' I sense it's related to my issue, but I haven't been able to find or derive a concrete solution.
Also, as secondary points, if anyone has advice beyond the current snag: my goal end result would be to color the negative bars red and the positive ones green, to add the 'spdrNames' to the y axis, to label the bars with the actual value, and to remove the illegible values from the x axis.
require('ggplot2')
require('reshape')
require('tseries')
spdrTickers = c('XLY','XLP','XLE','XLF','XLV','XLI','XLB','XLK','XLU')
spdrNames = c('Consumer Discretionary','Consumer Staples', 'Energy',
'Financials','Health Care','Industrials','Materials','Technology',
'Utilities')
latestDate =Sys.Date()
dailyPrices = lapply(spdrTickers, function(ticker) get.hist.quote(instrument= ticker, start = "2012-01-01",
end = latestDate, quote="Close", provider = "yahoo", origin="1970-01-01", compression = "d", retclass="zoo"))
perf5Day = lapply(dailyPrices, function(x){(x-lag(x,k=-5))/lag(x,k=-5)})
perf20Day = lapply(dailyPrices, function(x){(x-lag(x,k=-20))/lag(x,k=-20)})
perf60Day = lapply(dailyPrices, function(x){(x-lag(x,k=-60))/lag(x,k=-60)})
names(perf5Day) = spdrTickers
names(perf20Day) = spdrTickers
names(perf60Day) = spdrTickers
perfsMerged = lapply(spdrTickers, function(spdr){merge(perf5Day[[spdr]],perf20Day[[spdr]],perf60Day[[spdr]])})
perfNames = c('1Week','1Month','3Month')
perfsMerged = lapply(perfsMerged, function(x){
names(x)=perfNames
return(x)
})
latestDataPoints = t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
names(latestDataPoints) = c('Ticker', '1Week','1Month','3Month')
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar() + coord_flip() + facet_grid(. ~ Period)
Yields this:
Somehow your Value column has been converted to a factor:
str(drm)
'data.frame': 27 obs. of 3 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7 9 ...
$ Period: Factor w/ 3 levels "1Week","1Month",..: 1 1 1 1 1 1 1 1 1 2 ...
$ Value : Factor w/ 27 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9 11 ...
This probably happens here:
latestDataPoints = data.frame(cbind(spdrTickers,latestDataPoints))
> str( latestDataPoints )
'data.frame': 9 obs. of 4 variables:
$ Ticker: Factor w/ 9 levels "XLB","XLE","XLF",..: 9 6 2 3 8 4 1 5 7
$ 1Week : Factor w/ 9 levels "0.0164396430248944",..: 2 4 5 1 8 3 7 6 9
$ 1Month: Factor w/ 9 levels "-0.00139291932675571",..: 2 3 1 5 8 4 6 7 9
$ 3Month: Factor w/ 9 levels "-0.0110357512357742",..: 3 2 1 5 9 6 7 8 4
Since just before that step you had a numeric matrix from: t(sapply(perfsMerged, function(x){return(x[nrow(x)])}))
Then doing this:
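# convert the factor columns back to numbers (as.character() first, otherwise you get level codes)
latestDataPoints[2:4] <- lapply( latestDataPoints[2:4], function(x)
as.numeric(as.character(x)) )
drm = melt(latestDataPoints, id.vars=c('Ticker'))
names(drm) = c('Ticker','Period','Value')
# stat = "identity" uses Value as the bar height instead of counting rows
p = ggplot(drm, aes(x=Ticker,y=Value)) + geom_bar(stat = "identity") + coord_flip() +
facet_grid(. ~ Period)
png();print(p);dev.off()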
Produces:
The construction data.frame(cbind(...)) is a real trap. I've seen it used by supposedly authoritative sources, and it is a recurrent source of puzzlement. I think R would be a safer language if the interpreter simply highlighted that combination in red (along with as.numeric() applied to factors). When you cbind a character vector to a numeric matrix, you get an all-character matrix, and data.frame() then turns those character columns into factors (under the stringsAsFactors = TRUE default of R versions before 4.0).
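A minimal base-R sketch of the trap and two ways around it. The return values in `perf` are hypothetical numbers standing in for latestDataPoints; the stacking step and the `Direction` column (for red/green fills, one of the asker's secondary goals) are illustrative additions, not part of the original answer.

```r
tickers <- c("XLY", "XLP", "XLE")
# Hypothetical returns standing in for the numeric matrix built with t(sapply(...))
perf <- matrix(c(0.012, -0.004, 0.031,
                 -0.020, 0.015, 0.007),
               nrow = 3, dimnames = list(NULL, c("1Week", "1Month")))

# The trap: cbind() of a character vector and a numeric matrix gives a character matrix
trap <- cbind(Ticker = tickers, perf)
typeof(trap)  # "character"

# Safer: let data.frame() combine them, so the numeric columns stay numeric
safe <- data.frame(Ticker = tickers, perf, check.names = FALSE)

# If the damage is already done, go through as.character() before as.numeric()
f <- factor(c("10", "2", "33"))
wrong <- as.numeric(f)                # level codes 1, 2, 3
right <- as.numeric(as.character(f))  # the actual numbers

# Long format plus a sign column, e.g. for red/green fills
long <- stack(safe[c("1Week", "1Month")])
long$Ticker <- rep(safe$Ticker, times = 2)
long$Direction <- ifelse(long$values < 0, "negative", "positive")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Ticker, y = values, fill = Direction)) +
    geom_col() +
    scale_fill_manual(values = c(negative = "red", positive = "green")) +
    coord_flip() +
    facet_grid(. ~ ind)
}
```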
