geom_vline with Character xintercept - r

I have some ggplot code that worked fine in 0.8.9 but not in 0.9.1.
I am going to plot the data in theDF and would like to draw a vertical line at xintercept="2010 Q1". theGrid is merely used to create theDF.
theGrid <- expand.grid(2009:2011, 1:4)
theDF <- data.frame(YrQtr=sprintf("%s Q%s", theGrid$Var1, theGrid$Var2),
Minutes=c(1000, 2200, 1450, 1825, 1970, 1770, 1640, 1920, 1790, 1800, 1750, 1600))
The code used is:
g <- ggplot(theDF, aes(x=YrQtr, y=Minutes)) +
geom_point() +
opts(axis.text.x=theme_text(angle=90))
g + geom_vline(data=data.frame(Vert="2010 Q2"), aes(xintercept=Vert))
Again, this worked fine in R 2.13.2 with ggplot2 0.8.9, but does not in R 2.14+ with ggplot2 0.9.1.
A workaround is:
g + geom_vline(data=data.frame(Vert=4), aes(xintercept=Vert))
But that is not a good solution for my problem.
Maybe messing around with scale_x_discrete might help?

You could try this:
g + geom_vline(aes(xintercept = which(levels(YrQtr) %in% '2010 Q1')))
This avoids the need to create a dummy data frame for the selected factor level. The which() command returns the index (or indices if the right side of the %in% operator is a vector) of the factor level[s] to associate with vlines.
One caution with this is that if some of the categories do not appear in your data and you use drop = TRUE in the scale, then the lines will not show up in the right place.

When you use a character vector, or a factor, for the x-axis in a plot, the default position given to each unique item is simply an integer, starting at 1. So, if your levels are c("A", "B", "C") then the x-axis locations are c(1, 2, 3). There is no such thing as a character location, just a character label. If you want a vertical line at A, put it at 1. If you want it halfway between A and B, put it at 1.5. Again, those are the defaults. If a particular plot does something else, you can easily work that out by putting lines at a few locations and seeing what happens.
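As a rough sketch of the point above, the position of a level can be looked up with match() and passed to geom_vline() as a plain number. This assumes a current ggplot2, where opts() and theme_text() from the question have been replaced by theme() and element_text(); the variable name pos is only for illustration.

```r
library(ggplot2)

# Rebuild the data from the question; factor() sorts the levels alphabetically
theGrid <- expand.grid(2009:2011, 1:4)
theDF <- data.frame(
  YrQtr = factor(sprintf("%s Q%s", theGrid$Var1, theGrid$Var2)),
  Minutes = c(1000, 2200, 1450, 1825, 1970, 1770,
              1640, 1920, 1790, 1800, 1750, 1600)
)

# Integer position of the level on the discrete x-axis
pos <- match("2010 Q1", levels(theDF$YrQtr))

p <- ggplot(theDF, aes(x = YrQtr, y = Minutes)) +
  geom_point() +
  geom_vline(xintercept = pos) +
  theme(axis.text.x = element_text(angle = 90))
p
```

The same caution about dropped or unused levels applies: the lookup is only right if the levels shown on the axis are the levels of the factor.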


Q: How combine two types of lines using ggplot?

I am trying to plot the following graph:
This plot was made using a command in R; however, I need to change the x-axis. As you can see, the x-axis starts at 0 and finishes at 46. I want the x-axis to start at 1972 and finish at 2018, i.e. seq(1972, 2018). The data used for this graph are the following:
For regime one
structure(c(0.996336942021931, 0.982749831853788, 0.25257000136794,
0.707797489518183, 0.339372705184362, 0.999209103898399, 0.348786927897612,
0.821500770877589, 0.569473419352121, 0.544946043345147, 0.15347485404411,
0.987921203799956, 0.00247541125926418, 0.999925918450173, 0.996940249283586,
0.0141234625702467, 0.105466117156579, 0.999992944275275, 0.991723355647765,
0.0958472062267191, 0.0362729940372193, 0.999999790503447, 0.0750715811130157,
0.999975836828039, 0.998991768987905, 0.327943641159186, 5.05723080618291e-05,
0.999999999869691, 0.995538324405397, 0.123355227931813, 0.999776636825943,
0.00875781169836433, 0.696284480883101, 0.854839147672286, 0.113243492249383,
0.00984853715078062, 0.442061195271808, 0.999959859676686, 0.0249739384218217,
0.715262186931097, 0.269481397703521, 0.708458897302807, 0.0444979324520481,
0.000133950914911277, 0.997976154782607, 0.191386380576805, 0.99775339928206,
0.97921531595208, 0.27690132186733, 0.671995422154737, 0.458800347851363,
0.999155966774432, 0.417000082142666, 0.838969001100901, 0.576424593247709,
0.439169303472056, 0.227227711549776, 0.978527102362448, 0.00408165810824898,
0.999955057843957, 0.994643622809094, 0.00847570472458959, 0.163000467960203,
0.999995704786608, 0.987482614312069, 0.0569007267419926, 0.0585312256476362,
0.999999671060746, 0.118213072794827, 0.99998536150034, 0.998897081324845,
0.212968271334585, 8.35316288758489e-05, 0.999999999920876, 0.993537683112221,
0.188538497918178, 0.999604116439039, 0.00905848219612739, 0.769430430615986,
0.794457999021984, 0.0665707154963958, 0.00776458004359329, 0.5668500474175,
0.999931021995446, 0.0265573724408095, 0.661699294173752, 0.296009575623967,
0.587638579198176, 0.0251758869152202, 0.000220356219397782,
0.997352716237698, 0.191386380576805), .Dim = c(46L, 2L))
for regime 2:
structure(c(0.00366305797806813, 0.0172501681462116, 0.74742999863206,
0.292202510481817, 0.660627294815638, 0.000790896101601132, 0.651213072102388,
0.178499229122411, 0.430526580647879, 0.455053956654853, 0.846525145955889,
0.0120787962000438, 0.997524588740736, 7.40815498269273e-05,
0.00305975071641352, 0.985876537429753, 0.894533882843421, 7.05572472485335e-06,
0.00827664435223535, 0.904152793773281, 0.963727005962781, 2.09496553467159e-07,
0.924928418886985, 2.41631719608902e-05, 0.00100823101209502,
0.672056358840815, 0.999949427691938, 1.30308744399533e-10, 0.00446167559460289,
0.876644772068187, 0.00022336317405711, 0.991242188301636, 0.303715519116899,
0.145160852327714, 0.886756507750617, 0.990151462849219, 0.557938804728191,
4.01403233139628e-05, 0.975026061578178, 0.284737813068903, 0.730518602296479,
0.291541102697193, 0.955502067547952, 0.999866049085089, 0.00202384521739295,
0.808613619423195, 0.00224660071793958, 0.0207846840479196, 0.72309867813267,
0.328004577845263, 0.541199652148637, 0.000844033225568314, 0.582999917857334,
0.161030998899099, 0.423575406752291, 0.560830696527944, 0.772772288450224,
0.0214728976375518, 0.995918341891751, 4.49421560426429e-05,
0.00535637719090558, 0.99152429527541, 0.836999532039797, 4.29521339242403e-06,
0.0125173856879312, 0.943099273258007, 0.941468774352364, 3.28939253926857e-07,
0.881786927205173, 1.46384996596921e-05, 0.00110291867515508,
0.787031728665414, 0.999916468371124, 7.91243531099699e-11, 0.00646231688777926,
0.811461502081822, 0.00039588356096145, 0.990941517803873, 0.230569569384014,
0.205542000978016, 0.933429284503604, 0.992235419956407, 0.4331499525825,
6.89780045536876e-05, 0.973442627559191, 0.338300705826248, 0.703990424376033,
0.412361420801824, 0.97482411308478, 0.999779643780602, 0.00264728376230197,
0.808613619423195), .Dim = c(46L, 2L))
I know that the red line can be plotted using geom_line, but I do not know how to plot the black bars. Maybe using geom_bar? Also, how can I merge the two plots?
Thanks for your help
It's actually plotted using base R (good old times), using your first dataset (regime one):
plot(Regime1[,1],type="h",xaxt="n",ylab="",cex.axis=0.6,xlab="",xlim=c(0,46))
lines(Regime1[,2],col="red")
mtext("Smoothed Probabilities",2,padj=-5,col="red",cex=0.7)
mtext("Fitted Probabilities",4,padj=1,cex=0.7)
axis(side=1,at=c(0,20,46),labels=c(1972,1992,2018))
Your xaxis values are actually 0:46, so you turn off the x-axis ticks using xaxt="n", then with axis(), you put it at 0,20,46 with the labels 1972...
It also depends on your plotting device, so you might have to change the padj parameter to adjust the axis labels. I guess you can check out posts like this for base R plotting functions.
In ggplot2, I guess you just create a data.frame with the Index as the years you need, and you call geom_segment() to plot the vertical lines :
library(ggplot2)
Regime1 = data.frame(Regime1)
colnames(Regime1) = c("Fitted","Smoothed")
Regime1$index = 1:nrow(Regime1)+1972
ggplot(Regime1,aes(x=index))+
geom_segment(aes(xend=index,y=0,yend=Fitted,col="Fitted")) +
geom_line(aes(y=Smoothed,col="Smoothed")) + theme_minimal() +
scale_color_manual(values=c("black","red"))
For a ggplot2 solution, you are going to need a data.frame or tibble with 4 columns (Regime, Year, Smoothed, and Fitted). Based on the data you provided, this would have 92 rows.
Now assuming you use those column names (and storing your data into the variable example.dat), a ggplot2 solution is
ggplot( example.dat, aes(x=Year) ) +
geom_line( aes(y=Smoothed), color="red" ) +
geom_linerange( aes(ymax=Fitted), ymin=0 ) +
facet_wrap( ~ Regime, ncol=1 )
Then you might need to adjust some of the scales to get the best plot.
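One way the four-column data frame described above might be assembled is sketched below. It assumes the two matrices are stored as Regime1 and Regime2, that column 1 holds the fitted and column 2 the smoothed probabilities (as in the other answer), and that the 46 rows cover 1973 to 2018. The helper to_long and the random stand-in matrices are hypothetical; substitute the real structure(...) objects from the question.

```r
# Hypothetical stand-ins for the two 46 x 2 matrices from the question
set.seed(1)
Regime1 <- matrix(runif(92), ncol = 2)
Regime2 <- 1 - Regime1

# Turn one matrix into long format with a Regime and a Year column
# (column order and first_year are assumptions, see above)
to_long <- function(m, regime, first_year = 1973) {
  data.frame(
    Regime   = regime,
    Year     = first_year + seq_len(nrow(m)) - 1,
    Fitted   = m[, 1],
    Smoothed = m[, 2]
  )
}

# Stack both regimes: 2 x 46 = 92 rows, 4 columns
example.dat <- rbind(to_long(Regime1, "Regime 1"),
                     to_long(Regime2, "Regime 2"))
```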

Adding Label *row number* into the Plot

How can I modify this code so that the plot shows, for
each point on the graph, its corresponding row number as a label?
inter <- seq(7.5, 21.5, 1)
LogDifference <- c("na",1.5,0.8,0.6,0.01,-0.57,-0.11,0.41,0.068,-0.19,-0.31,0.05,0.14,0.6,0.5)
S<-data.frame(inter,LogDifference)
plot(x = S$inter,S$LogDifference)
First of all, notice that your basic plot is not doing what you want.
The y values being plotted are the numbers 1 to 14. I think that you wanted
the numerical values that are in LogDifference. You can fix this by
first converting LogDifference to character (it is a factor in R versions before 4.0.0,
where data.frame() defaults to stringsAsFactors = TRUE), then converting
to numeric. I am just leaving out the "na".
After that, you can use text to place labels next to the points.
The full code is:
inter <- seq(7.5, 21.5, 1)
LogDifference <- c("na",1.5,0.8,0.6,0.01,-0.57,-0.11,0.41,0.068,
-0.19,-0.31,0.05,0.14,0.6,0.5)
S<-data.frame(inter,LogDifference)
plot(x = S$inter[-1], as.numeric(as.character(S$LogDifference[-1])))
text(x=inter[-1]+0.4, y=as.numeric(as.character(LogDifference[-1]))+0.05, labels=2:15)
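A variation on the code above, offered only as a sketch: if the missing value is stored as NA rather than the string "na", the column stays numeric and no conversion is needed. The helper variable ok is introduced here for illustration.

```r
inter <- seq(7.5, 21.5, 1)
# NA (not the string "na") keeps the vector numeric
LogDifference <- c(NA, 1.5, 0.8, 0.6, 0.01, -0.57, -0.11, 0.41, 0.068,
                   -0.19, -0.31, 0.05, 0.14, 0.6, 0.5)
S <- data.frame(inter, LogDifference)

# Rows that actually have a y value to plot
ok <- !is.na(S$LogDifference)

plot(S$inter[ok], S$LogDifference[ok], xlab = "inter", ylab = "LogDifference")
# pos = 4 places each row-number label to the right of its point
text(S$inter[ok], S$LogDifference[ok], labels = which(ok), pos = 4)
```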

Plot a table with box size changing

Does anyone have an idea how this kind of chart is plotted? It looks like a heat map, but instead of color, the size of each cell is used to indicate the magnitude. I want to plot a figure like this but I don't know how to do it. Can this be done in R or Matlab?
Try scatter:
scatter(x,y,sz,c,'s','filled');
where x and y are the positions of each square, sz is the size (must be a vector of the same length as x and y), and c is a length(x)-by-3 matrix with the RGB color value for each entry. The labels for the plot can be set with set(gca,properties) or xticklabels:
X=30;
Y=10;
[x,y]=meshgrid(1:X,1:Y);
x=reshape(x,[size(x,1)*size(x,2) 1]);
y=reshape(y,[size(y,1)*size(y,2) 1]);
sz=50;
sz=sz*(1+rand(size(x)));
c=[1*ones(length(x),1) repmat(rand(size(x)),[1 2])];
scatter(x,y,sz,c,'s','filled');
xlab={'ACC';'BLCA';etc}
xticks(1:X)
xticklabels(xlab)
set(get(gca,'XLabel'),'Rotation',90);
ylab={'RAPGEB6';etc}
yticks(1:Y)
yticklabels(ylab)
EDIT: yticks & co are only available in R2016b and later; on older versions you should use set instead:
set(gca,'XTick',1:X,'XTickLabel',xlab,'XTickLabelRotation',90) %rotation only available for >R2014b
set(gca,'YTick',1:Y,'YTickLabel',ylab)
In R, you can use ggplot2, which allows you to map your values (gene expression in your case?) onto the size variable. Here, I simulated data that resembles your data structure:
my_data <- matrix(rnorm(8*26,mean=0,sd=1), nrow=8, ncol=26,
dimnames = list(paste0("gene",1:8), LETTERS))
Then, you can process the data frame to be ready for ggplot2 data visualization:
library(reshape)
dat_m <- melt(my_data, varnames = c("gene", "cancer"))
Now, use ggplot2::geom_tile() to map the values onto the size variable. You may update additional features of the plot.
library(ggplot2)
ggplot(data=dat_m, aes(cancer, gene)) +
geom_tile(aes(size=value, fill="red"), color="white") +
scale_fill_discrete(guide=FALSE) + ##hide scale
scale_size_continuous(guide=FALSE) ##hide another scale
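One thing to be aware of with the code above: in geom_tile() the size aesthetic controls the width of the tile border, not the tile itself. A possible alternative, sketched here under the assumption that the magnitude (absolute value) is what should drive the box size, is to draw filled squares with geom_point(). The name dat_long is only for illustration, and the reshape is done in base R so the example needs nothing beyond ggplot2.

```r
library(ggplot2)

set.seed(42)
# Simulated matrix with the same shape as in the answer above
my_data <- matrix(rnorm(8 * 26), nrow = 8, ncol = 26,
                  dimnames = list(paste0("gene", 1:8), LETTERS))

# Long format: one row per (gene, cancer) cell
dat_long <- as.data.frame(as.table(my_data), responseName = "value")
names(dat_long)[1:2] <- c("gene", "cancer")

p <- ggplot(dat_long, aes(cancer, gene)) +
  # shape 15 is a filled square; its area follows abs(value)
  geom_point(aes(size = abs(value)), shape = 15, colour = "red") +
  scale_size_area(max_size = 8, guide = "none") +
  theme_minimal()
p
```

scale_size_area() maps a value of zero to zero area, which tends to suit magnitudes better than the default size scale.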
In R, the corrplot package can be used. Specifically, you have to use method = 'square' when creating the plot.
Try this as an example:
library(corrplot)
corrplot(cor(mtcars), method = 'square', col = 'red')

R: Create a more readable X-axis after binning data in ggplot2. Turn bins into whole numbers

I have a dummy variable call it "drink" and a corresponding age variable that represents a precise age estimate (several decimal points) for each person in a dataset. I want to first "bin" the age variable, extracting the mean value for each bin based on the "drink" dummy, and then graph the result. My code to do so looks like this:
df$bins <- cut(df$age, seq(from = 17, to = 31, by = .2), include.lowest = TRUE)
df.plot <- ddply(df, .(bins), summarise, avg.drink = mean(drinks_alcohol))
qplot(bins, avg.drink, data = df.plot)
This works well enough, but the x-axis in the graph is unreadable because there is one label for every bin. Is there a way to modify the x-axis to show, for example, ages 19-23 only, with the "ticks" still aligning with the correct bins? For example, in my current code there is a bin for (19, 19.2] and another bin for (20, 20.2]. I would want only the bins that start at whole numbers to be identified on the x-axis, labelled with the first number (19, 20), not the second (19.2, 20.2).
Is there any straightforward way to do this?
The most direct way to specify axis labels is with the appropriate scale function... in the case of factors on the x axis, scale_x_discrete. It will use whatever labels you give it with the labels argument, or you can give it a function that formats things as you like.
To "manually" specify the labels, you just need to create a vector of appropriate length. In this case, if your factor values are intervals beginning at seq(17, 31.8, by = 0.2) and you want to label the bins that begin with integers, then your labels vector will be
bin_starts = seq(17, 31.8, by = 0.2)
bin_labels = ifelse(bin_starts - trunc(bin_starts) < 0.0001, as.character(bin_starts), "")
(I use the a - b < 0.0001 in case of precision problems, though it shouldn't be a problem in this particular case).
A more robust solution would be to label the factor levels with the number at the start of the interval from the beginning. cut also has a labels argument.
my_breaks = seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
include.lowest = TRUE)
You could then fairly easily write a formatter (following templates from the scales package) to print only the ones you want:
int_only = function(x) {
# test if we can coerce to numeric, if not do nothing
if (any(is.na(as.numeric(x)))) return(x)
# otherwise convert to numeric and return integers and blanks as labels
x = as.numeric(x)
return(ifelse(x - trunc(x) < 1e-10, as.character(x), ""))
}
Then, using the nicely formatted data created above, you should be able to pass int_only as a formatter function to labels to get the labels you want. (Note: untested! necessary tweaks left as an exercise for the reader, though I'll gladly accept edits :) )
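A hedged sketch of how the pieces above might fit together follows. The data frame is simulated (the asker's df is not available), aggregate() stands in for the ddply call, and int_only is a slightly modified version that compares against round() rather than trunc(), so a value such as 17.9999999 would still count as a whole number.

```r
library(ggplot2)

# Hypothetical data standing in for the asker's df
set.seed(1)
df <- data.frame(age = runif(500, 17, 31),
                 drinks_alcohol = rbinom(500, 1, 0.5))

# Bin the ages and label each bin with the number at its start
my_breaks <- seq(17, 32, by = 0.2)
df$bins <- cut(df$age, breaks = my_breaks, labels = head(my_breaks, -1),
               include.lowest = TRUE)

# Mean of the dummy per bin (base-R stand-in for ddply)
df.plot <- aggregate(drinks_alcohol ~ bins, data = df, FUN = mean)

# Keep labels for whole numbers, blank out the rest;
# labels that are not numeric are returned unchanged
int_only <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  if (any(is.na(num))) return(x)
  ifelse(abs(num - round(num)) < 1e-8, as.character(num), "")
}

p <- ggplot(df.plot, aes(bins, drinks_alcohol)) +
  geom_point() +
  scale_x_discrete(labels = int_only)
p
```

Every bin keeps its tick position; only the label text is blanked for bins that do not start at a whole number.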

How to create a bwplot with date on x-axis

users
thanks to the reply of #McQueenDon on r-nabble
http://r.789695.n4.nabble.com/boxplot-with-x-axis-time-td4686787.html#a4687746
I managed to produce a boxplot::base of a single variable with the x-axis correctly formatted and spaced for the date of acquisition.
What if I would like to produce it with bwplot::lattice? I need this because I would like also to use a conditional factor.
Here is a reproducible example (thanks again to #McQueenDon )
data(iris)
pippo= stack(iris[,-5])
pippo$date= rep(c("2013/01/29", "2013/03/01", "2013/11/01",
"2013/12/01", "2014/02/01", "2014/07/02"), 100)
pippo$date= as.Date(pippo$date)
boxplot(pippo$values ~ pippo$date) ## NOT exactly what I want
bx<- boxplot(pippo$values ~ pippo$date, plot= F)
bxp(bx, at=sort(unique(pippo$date))) # this is what I was looking for !
require(lattice)
bwplot(values~date, pippo, horizontal=F) #dates looks not correctly spaced even though they are correctly ordered and formatted
# finally I would like to condition to the 'ind' variable
bwplot(values~date| ind, pippo, horizontal=F, layout= c(2,2))
Thanks
Giuseppe
How about
xyplot(values~date| ind, pippo, horizontal=F, layout= c(2,2),
panel=panel.bwplot, box.width=20)
Here we use xyplot with a custom panel= parameter rather than bwplot because bwplot converts the x to a factor first which renumbers all the levels with sequential integers; xyplot does not do this.
If you wanted to label the exact dates, you could try
dts<-unique(pippo$date)
xyplot(values~date| ind, pippo, horizontal=F, layout= c(2,2),
panel=panel.bwplot, box.width=20,
scales=list(x=list(at=dts)))
but that looks quite crowded in this particular example.
