I'm trying to facet wrap this scatter plot by the y-axis. For example, if the y-axis goes up to 1000, I would like to separate this graph into 4 panels: the first where the y-axis goes from 0-250, the next 251-500, the next 501-750, and the last 751-1000. Is this type of facet wrapping possible?
library(ggplot2)
A
nrow(A)
# 1000
ncol(A)
# 3
head(A)
# Track Base Location
# 1 1 A 1
# 2 1 C 2
# 3 1 G 3
# 4 1 G 4
# 5 1 A 5
# 6 1 A 6
p <- ggplot(data = A, aes(y=Track, x=Location)) +
geom_point(aes(colour=Base),shape=15,size=2)
print(p)
This is what I have right now. As you can see, it doesn't look aesthetically pleasing.
You can - you just have to make an indicator variable which shows which facet each point should belong in.
(Quick aside: what you have placed in your question is still not a reproducible example. What we are after is something we can copy-paste into our terminals that will still demonstrate your problem. For example, in my answer I've shown you a reproducible example where I only have 100 rows in A rather than your 1000, and my intervals are different to yours, but the key is that you can copy-paste it straight to your terminal and easily see how it will extend to your problem.)
e.g.
# here's a reproducible example, with 100 data points
A <- data.frame(Track=1, Location=1:100, Base=factor(sample(c('A', 'T', 'C', 'G'), 100, replace=T)))
library(ggplot2)
ggplot(transform(A, panel=cut(Location, seq(0, 100, by=25), include.lowest=T)),
aes(x=Track, y=Location)) +
geom_point(aes(colour=Base),shape=15,size=2) +
facet_wrap( ~ panel, scales='free')
Key points:
transform(A, panel=...) adds an extra column "panel" to your data frame (try running that on its own to see).
cut(Location, seq(0, 100, by=25), include.lowest=T) makes a factor that indicates which interval each Location is in. The intervals here are [0,25], (25,50], (50,75], (75,100] (include.lowest makes sure that 0 is included in the first interval). For the breaks you mentioned in your question you'd do something like seq(0, 1000, by=250).
facet_wrap(~ panel, ...) makes one facet per interval.
scales='free' in the facet_wrap lets the axis scales differ between panels (leave it out and every panel is plotted on a common scale; try it and you will see what I mean).
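To see what the indicator looks like for the 0-1000 case from the question, here is a minimal base-R sketch. The data are simulated, and dig.lab = 4 is an extra argument (not used in the answer above) added only so that the last interval label is not printed in scientific notation.

```r
# Simulated stand-in for the question's data: 1000 locations on one track
set.seed(42)
A <- data.frame(Track = 1,
                Location = 1:1000,
                Base = factor(sample(c("A", "T", "C", "G"), 1000, replace = TRUE)))

# Four intervals of width 250; include.lowest closes the first interval at 0
A$panel <- cut(A$Location, breaks = seq(0, 1000, by = 250),
               include.lowest = TRUE, dig.lab = 4)

# Each Location falls in exactly one interval, so each panel gets 250 points
panel_counts <- table(A$panel)
print(panel_counts)
```

Passing a data frame like this to the ggplot call above, with facet_wrap(~ panel, scales='free'), should then give four panels.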
Let's say I have a histogram with two overlapping groups. Here's a possible command from ggplot2 and a pretend output graph.
ggplot(data, aes(x=Variable1, fill=BinaryVariable)) + geom_histogram(position="identity")
So what I have is the frequency or count of each event. What I'd like to do instead is to get the difference between the two events in each bin. Is this possible? How?
For example, if we do RED minus BLUE:
Value at x=2 would be ~ -10
Value at x=4 would be ~ 40 - 200 = -160
Value at x=6 would be ~ 190 - 25 = 155
Value at x=8 would be ~ 10
I'd prefer to do this using ggplot2, but another way would be fine. My dataframe is set up with items like this toy example (dimensions are actually 25000 rows x 30 columns). EDITED: Here is example data to work with: GIST Example
ID Variable1 BinaryVariable
1 50 T
2 55 T
3 51 N
.. .. ..
1000 1001 T
1001 1944 T
1002 1042 N
As you can see from my example, I'm interested in a histogram to plot Variable1 (a continuous variable) separately for each BinaryVariable (T or N). But what I really want is the difference between their frequencies.
So, in order to do this we need to make sure that the "bins" we use for the histograms are the same for both levels of your indicator variable. Here's a somewhat naive solution (in base R):
df = data.frame(y = c(rnorm(50), rnorm(50, mean = 1)),
x = rep(c(0,1), each = 50))
#full hist
fullhist = hist(df$y, breaks = 20) #specify more breaks than probably necessary
#create histograms for 0 & 1 using breaks from full histogram
zerohist = with(subset(df, x == 0), hist(y, breaks = fullhist$breaks))
oneshist = with(subset(df, x == 1), hist(y, breaks = fullhist$breaks))
#combine the hists
combhist = fullhist
combhist$counts = zerohist$counts - oneshist$counts
plot(combhist)
So we specify how many breaks should be used (based on values from the histogram on the full data), and then we compute the differences in the counts at each of those breaks.
PS It might be helpful to examine what the non-graphical output of hist() is.
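As a sketch of what that non-graphical output looks like, the same computation can be run with plot = FALSE, which returns the histogram object without drawing it. The variable names below are illustrative, and the data are simulated the same way as above.

```r
set.seed(1)
y <- c(rnorm(50), rnorm(50, mean = 1))
x <- rep(c(0, 1), each = 50)

# plot = FALSE returns the histogram object without drawing anything
full <- hist(y, breaks = 20, plot = FALSE)
str(full[c("breaks", "counts", "mids")])

# Re-using full$breaks guarantees that both groups are binned identically
zero <- hist(y[x == 0], breaks = full$breaks, plot = FALSE)
ones <- hist(y[x == 1], breaks = full$breaks, plot = FALSE)

# One row per bin, with the difference in counts between the groups
bin_diff <- data.frame(mid = full$mids, diff = zero$counts - ones$counts)
print(bin_diff)
```

Because both groups share the breaks of the full histogram, the two sets of counts add up to the full counts bin by bin, which is a quick sanity check before plotting the difference.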
Here's a solution that uses ggplot as requested.
The key idea is to use ggplot_build to get the rectangles computed by stat_histogram. From that you can compute the differences in each bin and then create a new plot using geom_rect.
Set up and create a mock dataset with lognormal data
library(ggplot2)
library(data.table)
theme_set(theme_bw())
n1<-500
n2<-500
k1 <- exp(rnorm(n1,8,0.7))
k2 <- exp(rnorm(n2,10,1))
df <- data.table(k=c(k1,k2),label=c(rep('k1',n1),rep('k2',n2)))
Create the first plot
p <- ggplot(df, aes(x=k,group=label,color=label)) + geom_histogram(bins=40) + scale_x_log10()
Get the rectangles using ggplot_build
p_data <- as.data.table(ggplot_build(p)$data[[1]])[,.(count,xmin,xmax,group)]
p1_data <- p_data[group==1]
p2_data <- p_data[group==2]
Join on the x-coordinates to compute the differences. Note that the y-values aren't the counts, but the y-coordinates of the first plot.
newplot_data <- merge(p1_data, p2_data, by=c('xmin','xmax'), suffixes = c('.p1','.p2'))
newplot_data <- newplot_data[,diff:=count.p1 - count.p2]
setnames(newplot_data, old=c('count.p1','count.p2'), new=c('k1','k2'))
df2 <- melt(newplot_data,id.vars =c('xmin','xmax'),measure.vars=c('k1','diff','k2'))
Make the final plot
ggplot(df2, aes(xmin=xmin,xmax=xmax,ymax=value,ymin=0,group=variable,color=variable)) + geom_rect()
Of course the scales and legends still need to be fixed, but that's a different topic.
My question may be very simple, but I couldn't find the answer!
I have a matrix with 12 entries and I made a stacked barplot with the barplot function in R,
using this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each of the groups (A:F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x=2 instead of 0.
I know it's possible to choose the range of x by using xlim=c(2,25), but when I do that, part of my bars fall out of range and I get a picture like this:
What I want is to ignore the part of the bars that is smaller than 2, start the x-axis from 2, and show the rest of the bars instead of drawing them out of range.
Thank you in advance,
As already mentioned in the other post, your desired output is not entirely clear. Here is another option using ggplot2. I think the difficulty here is reshaping the data; after that, the plotting step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix to a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to get the desired behavior.
## I used cumsum to be sure to have a unique value for every row with Var2==1.
## maybe you should change this step if you want an alternate behavior
## (see other solution)
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option using lattice package
I like the formula notation in lattice and its flexibility for flipping coordinates for example:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
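The zooming done by coord_flip(ylim = ...) in the ggplot2 version above can be contrasted with setting limits on the scale itself. The following is a small sketch with made-up data; the data frame d and the limits 2 and 10 are assumptions for illustration only.

```r
library(ggplot2)

# Made-up values; bars start at 0 and one bar exceeds the upper limit
d <- data.frame(name = c("a", "b", "c"), value = c(3, 8, 12))
base <- ggplot(d, aes(x = name, y = value)) + geom_col()

# Coordinate limits only change the visible window; all bars are still drawn
zoomed <- base + coord_flip(ylim = c(2, 10))

# Scale limits treat values outside 2..10 as missing; since every bar starts
# at 0, ggplot2 drops the bars and warns when this plot is printed
censored <- base + scale_y_continuous(limits = c(2, 10)) + coord_flip()

print(zoomed)
```

This is why the coordinate-based zoom is the one to use when the bars should simply be cut off at x = 2 rather than removed.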
You do this by using the "add" and "offset" arguments to barplot(), along with setting axes and axisnames FALSE to avoid double-plotting: (I'm throwing in my color-blind color palette, as I'm red-green color-blind)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." nature methods 8.6 (2011): 441-441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, the "offset" argument is used for the first set of bars also, and you change xlim and use xaxp to adjust the x-axis numbering, and of course you must also adjust the height of the first row of bars to remove the excess offset:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
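One case worth checking with this approach is a bar whose first segment is shorter than the offset: the second segment then needs clipping as well, otherwise it is drawn too long. Below is a sketch with made-up values; visible() is a hypothetical helper written for this example, not part of barplot().

```r
# Made-up values for the two rows of the stacked barplot
first  <- c(1, 5, 2, 7)
second <- c(5, 3, 1, 4)
offset <- 2

# Length of the stacked segment [lo, hi] that lies at or above the offset
visible <- function(lo, hi, offset) pmax(hi, offset) - pmax(lo, offset)

h1 <- visible(0, first, offset)               # first row spans 0 .. first
h2 <- visible(first, first + second, offset)  # second row spans first .. first + second

barplot(h1, offset = offset, xlim = c(offset, 12), horiz = TRUE,
        col = "grey40", names.arg = LETTERS[1:4])
barplot(h2, offset = offset + h1, add = TRUE, axes = FALSE,
        axisnames = FALSE, horiz = TRUE, col = "grey80")
```

For the first bar here, the stack runs from 1 to 6, so only the part from 2 to 6 (length 4) should be visible, which is what h2 gives.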
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 2
2 b highlight_x1 3
3 c x1 4
4 d highlight_x1 6
5 e x1 2
6 f highlight_x1 6
7 a x2 6
8 b x2 4
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip (which in turn calls coord_cartesian) with the ylim=c(1,10) parameter to 'zoom in' on the data. It doesn't remove the data, it just ignores it (unlike setting the limits in the scale). Now, if you manually specify the colours:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
I would like to simplify the solution proposed by @tedtoal, which was the best one for me.
I wanted to create a barplot with a different color for each bar, without needing ggplot or lattice.
color_range<- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
barplot(c(1,6,2,6,1), col= color_range[1:length(c(1,6,2,6,1))])
I've seen similar questions asked, and this discussion about adding functionality to ggplot: Setting x/y lim in facet_grid. In my research I often want to produce several panel plots, say for different simulation trials, where the axis limits remain the same to highlight differences between the trials. This is especially useful when showing the plot panels in a presentation. In each panel plot I produce, the individual plots require independent y axes, as they're often weather variables: temperature, relative humidity, wind speed, etc. Using
ggplot() + ... + facet_wrap(~ ..., scales = 'free_y')
works great as I can easily produce plot panels of different weather variables.
When I compare between different plot panels, it's nice to have consistent axes. Unfortunately ggplot provides no way of setting the individual limits of each plot within a panel plot. It defaults to using the range of the given data. The Google Group discussion linked above discusses this shortcoming, but I was unable to find any updates as to whether this could be added. Is there a way to trick ggplot into setting the individual limits?
A first suggestion that somewhat sidesteps the solution I'm looking for is to combine all my data into one data table and use facet_grid on my variable and simulation
ggplot() + ... + facet_grid(variable~simulation, scales = 'free_y')
This produces a fine looking plot that displays the data in one figure, but can become unwieldy when considering many simulations.
To 'hack' the plotting into producing what I want, I first determined which limits I desired for each weather variable. These limits were found by looking at the greatest extents for all simulations of interest. Once determined I created a small data table with the same columns as my simulation data and appended it to the end. My simulation data had the structure
'year' 'month' 'variable' 'run' 'mean'
1973 1 'rhmax' 1 65.44
1973 2 'rhmax' 1 67.44
... ... ... ... ...
2011 12 'windmin' 200 0.4
So I created a new data table with the same columns
ylims.sims <- data.table(year = 1, month = 13,
variable = rep(c('rhmax','rhmin','sradmean','tmax','tmin','windmax','windmin'), each = 2),
run = 201, mean = c(20, 100, 0, 80, 100, 350, 25, 40, 12, 32, 0, 8, 0, 2))
Which gives
'year' 'month' 'variable' 'run' 'mean'
1 13 'rhmax' 201 20
1 13 'rhmax' 201 100
1 13 'rhmin' 201 0
1 13 'rhmin' 201 80
1 13 'sradmean' 201 100
1 13 'sradmean' 201 350
1 13 'tmax' 201 25
1 13 'tmax' 201 40
1 13 'tmin' 201 12
1 13 'tmin' 201 32
1 13 'windmax' 201 0
1 13 'windmax' 201 8
1 13 'windmin' 201 0
1 13 'windmin' 201 2
While the choice of year and run is arbitrary, the month needs to be a value outside 1:12. I then appended this to my simulation data
sim1data.ylims <- rbind(sim1data, ylims.sims)
ggplot() + geom_boxplot(data = sim1data.ylims, aes(x = factor(month), y = mean)) +
facet_wrap(~variable, scales = 'free_y') + xlab('month') +
xlim('1','2','3','4','5','6','7','8','9','10','11','12')
When I plot these data with the y limits, I limit the x-axis values to those in the original data. The appended data table with y limits has month values of 13. As ggplot still scales axes to the entire dataset, even when the axes are limited, this gives me the y limits I desire. Important to note that if there are data values greater than the limits you specify, this will not work.
Before: Notice the differences in the y limits for each weather variable between the panels.
After: Now the y limits remain consistent for each weather variable between the panels.
I hope to edit this post in the coming days and add a reproducible example for better explanation. Please comment if you've heard anything about adding this functionality to ggplot.
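In the meantime, here is a reproducible sketch of the same idea on simulated data. Note that it swaps in a variation rather than the exact method above: instead of appending dummy rows with month 13, the limits are passed to a geom_blank() layer, which draws nothing but still contributes to the y scale of each facet. The variable names and values are made up.

```r
library(ggplot2)

# Simulated stand-in for the simulation output (two variables, 240 rows each)
set.seed(1)
sim <- data.frame(
  month    = rep(1:12, times = 40),
  variable = rep(c("tmax", "windmax"), each = 240),
  mean     = c(rnorm(240, mean = 30, sd = 1), rnorm(240, mean = 4, sd = 0.5))
)

# Desired y limits per variable, two rows (lower, upper) each
ylims <- data.frame(
  variable = rep(c("tmax", "windmax"), each = 2),
  mean     = c(25, 40, 0, 8)
)

p <- ggplot(sim, aes(x = factor(month), y = mean)) +
  geom_boxplot() +
  # geom_blank draws nothing but still trains the y scale of each facet
  geom_blank(data = ylims, aes(y = mean), inherit.aes = FALSE) +
  facet_wrap(~ variable, scales = "free_y") +
  xlab("month")
print(p)
```

The same caveat applies as with the appended rows: if the data contain values beyond the chosen limits, the axis will extend to cover them.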
Assume the following frequency table in R, which comes out of a survey:
1 2 3 4 5 8
m 5 16 3 16 5 0
f 12 25 3 10 3 1
NA 1 0 0 0 0 0
The rows stand for the gender of the survey respondent (male/female/no answer). The columns represent the answers to a question on a 5-point scale (let's say: 1 = agree fully, 2 = agree somewhat, 3 = neither agree nor disagree, 4 = disagree somewhat, 5 = disagree fully, 8 = no answer).
The data is stored in a dataframe called "slm", the gender variable is called "sex", and the other variable is called "tv_serien".
My problem is that I can't find a (in my opinion) proper way to create a line chart where the x-axis represents the 5-point scale (plus the don't know answers) and the y-axis represents the frequencies for every point on the scale. Furthermore, I want to create two lines (one for males, one for females).
My solution so far is the following:
I create a plot without plotting the "content" and the x-axis:
plot(slm$tv_serien, xlim = c(1,6), ylim = c(0,100), type = "n", xaxt = "n")
The problem here is that it feels like cheating to specify xlim=c(1,6), because the raw scores of slm$tv_serien are 100 values. I also tried to plot the variable via plot(factor(slm$tv_serien)...), but then it would still create a metric scale from 1 to 8 (because the don't know answer is 8).
So my first question is how to tell R that it should take the six distinct values (1 to 5 and 8) and take that as the x-axis?
I create the new x axis with proper labels:
axis(1, 1:6, labels = c("1", "2", "3", "4", "5", "DK"))
At least that works pretty well. ;-)
Next I create the line for the males:
lines(1:5, table(slm$tv_serien[slm$sex == 1]), col = "blue")
The problem here is that there is no DK (=8) answer, so I manually have to specify x = 1:5 instead of 1:6 in the "normal" case. My question here is, how to tell R to also draw the line for nonexisting values? For example, what would have happened, if no male had answered with 3, but I want a continuous line?
At last I create the line for females, which works well:
lines(1:6, table(slm$tv_serien[slm$sex == 2]), col = "red")
To summarize:
How can I tell R to take the 6 distinct values of slm$tv_serien as the x axis?
How can I draw continuous lines even if the line contains "0"?
Thanks for your help!
PS: Attached you find the current plot for the above-mentioned functions.
PPS: I tried to make a list from "1." to "4." but it seems that every new list element started again with "1.". Sorry.
Edit: Response to OP's comment.
This directly creates a line chart of OP's data. Below this is the original answer using ggplot, which produces a far superior output.
Given the frequency table you provided,
df <- data.frame(t(freqTable)) # transpose (more suitable for plotting)
df <- cbind(Response=factor(rownames(df)),df) # add row names as first column (a factor, so as.numeric() gives positions 1..6)
plot(as.numeric(df$Response),df$f,type="b",col="red",
xaxt="n", ylab="Count",xlab="Response")
lines(as.numeric(df$Response),df$m,type="b",col="blue")
axis(1,at=c(1,2,3,4,5,6),labels=c("Str.Agr.","Sl.Agr","Neither","Sl.Disagr","Str.Disagr","NA"))
Produces this, which seems like what you were looking for.
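On the question of drawing a continuous line when some answers never occur: one option is to tabulate a factor with explicit levels, so that empty categories show up as zero counts. A minimal sketch with made-up responses (the vector male and its values are assumptions, not the OP's data):

```r
# All answer codes that can occur, including 8 = don't know
levels_used <- c(1:5, 8)

# Made-up responses in which nobody chose 3 or 8
male <- c(1, 2, 2, 4, 5, 2)

# Explicit levels keep the empty categories as zero counts
counts <- table(factor(male, levels = levels_used))
print(counts)

# Plot against positions 1..6 so the six categories are evenly spaced
plot(seq_along(levels_used), as.vector(counts), type = "b", col = "blue",
     xaxt = "n", xlab = "Response", ylab = "Count")
axis(1, at = seq_along(levels_used), labels = c(1:5, "DK"))
```

Using the positions 1..6 rather than the raw codes also answers the first question, since the x-axis then has exactly one tick per distinct answer.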
Original Answer:
Not quite what you asked for, but converting your frequency table to a data frame, df
df <- data.frame(freqTable)
df <- cbind(Gender=rownames(df),df) # append rownames (Gender)
df <- df[-3,] # drop unknown gender
df
# Gender X1 X2 X3 X4 X5 X8
# m m 5 16 3 16 5 0
# f f 12 25 3 10 3 1
library(ggplot2)
library(reshape2)
gg <- melt(df, id.vars="Gender") # long format: one row per Gender/answer pair
labels <- c("Agree\nFully","Somewhat\nAgree","Neither Agree\nnor Disagree","Somewhat\nDisagree","Disagree\nFully", "No Answer")
ggp <- ggplot(gg,aes(x=variable,y=value))
ggp <- ggp + geom_bar(aes(fill=Gender), position="dodge", stat="identity")
ggp <- ggp + scale_x_discrete(labels=labels)
ggp <- ggp + theme(axis.text.x = element_text(angle=90, vjust=0.5))
ggp <- ggp + labs(x="", y="Frequency")
ggp
Produces this:
Or, this, which is much better:
ggp + facet_grid(Gender~.)
I have data conditioned on two variables, one major condition and one minor condition. I want an xyplot (lattice) with points and lines (type='b') in one panel, so that the major condition determines the color and the minor condition is used for drawing the lines.
Here is an example that is representative of my problem (see the code below to produce the data frame). d is the major condition, and c is the minor condition.
> dat
x y c d
1 1 0.9645269 a A
2 2 1.4892217 a A
3 3 1.4848654 a A
....
10 10 2.4802803 a A
11 1 1.5606218 b A
12 2 1.5346806 b A
....
98 8 2.0381943 j B
99 9 2.0826099 j B
100 10 2.2799917 j B
The way to get the connecting lines to be conditioned on c is to use groups=c in the plot. Then the way to tell them apart is to use a formula conditioned on d:
xyplot(y~x|d, data=dat, type='b', groups=c)
However, I want the plots in the same panel. Removing the formula condition on d produces one panel, but when groups=d is specified, there are "retrace" lines drawn:
xyplot(y~x, data=dat, type='b', groups=d, auto.key=list(space='inside'))
What I want looks very like the above plot, only without these "retrace" lines.
It's possible to set the colors explicitly in this example, as I know that there are five lines of category 'A' followed by five of category 'B', but this won't easily work for my real problem. In addition, auto.key is useless when setting the colors this way:
xyplot(y~x, data=dat, type='b', groups=c, col=rep(5:6, each=5))
The data:
set.seed(1)
dat <- do.call(
rbind,
lapply(1:10,
function(x) {
firsthalf <- x < 6
data.frame(x=1:10, y=log(1:10 + rnorm(10, .25) + 2 * firsthalf),
c=letters[x],
d=LETTERS[2-firsthalf]
)
}
)
)
The default graphical parameters are taken from the superpose.symbol and superpose.line settings. One solution is to set them using the par.settings argument.
## I compute the color by group
col <-by(dat,dat$c,
FUN=function(x){
v <- ifelse(x$d=='A','darkgreen','orange')
v[1] ## I return one parameter , since I need one color
}
)
xyplot(y~x, data=dat, type='b', groups=c,
auto.key = list(text =levels(factor(dat$d)),points=F),
par.settings=
list(superpose.line = list(col = col), ## color of lines
superpose.symbol = list(col=col), ## colors of points
add.text = list(col=c('darkgreen','orange')))) ## color of text in the legend
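The colour-per-line step can also be written as a lookup table, which may be easier to extend to more than two major conditions. This is a sketch on a small made-up data frame with the same column roles; key, palette_d and line_cols are illustrative names.

```r
# Small stand-in with the same column roles: c = minor, d = major condition
dat <- data.frame(x = rep(1:2, times = 4),
                  c = rep(letters[1:4], each = 2),
                  d = rep(c("A", "B"), each = 4))

# One row per line (value of c), sorted by c to match the group level order
key <- unique(dat[order(dat$c), c("c", "d")])

# Named lookup from major condition to colour
palette_d <- c(A = "darkgreen", B = "orange")
line_cols <- unname(palette_d[as.character(key$d)])
print(data.frame(key, colour = line_cols))
```

A vector like line_cols could then be supplied as col in both superpose.line and superpose.symbol.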
Does it have to be lattice? In ggplot it is rather easy:
library(ggplot2)
ggplot(dat, aes(x=x,y=y,colour=d)) + geom_line(aes(group=c),size=0.8) + geom_point(shape=1)
This is a quick and dirty example. You can customize the colour of the lines, the legend , the axis, the background,...