Stacked/Dodged barplot using base R with x-axis is numerical data

I have looked at all the barplot questions on the site but still couldn't figure out what to do with my dataset. I don't know if this is a duplicate, but any help would be much appreciated.
My dataset
Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
I want to create a barplot with HC and Scenario on the x-axis and NPV1 and NPV2 as the bar heights, distinguished by different patterns. Each region's name should appear once, centred under its four scenarios.
Thanks a lot.
Expected output is something like this.

Further to my above comments, I'm quite unclear about how you'd like to visualise your data. What exactly would you like to show on the x axis?
As a start, perhaps you are after something like this?
library(tidyverse)
df %>%
    gather(key, val, -Region, -Scenario, -HC) %>%
    unite(x, Region, Scenario, HC) %>%
    ggplot(aes(x, val, fill = key)) +
    geom_col()
Here categories on the x-axis are of the form <Region>_<Scenario>_<HC>.
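The gather() step above stacks the two NPV columns into key/value pairs. In current tidyr releases gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the equivalent reshape, assuming tidyr 1.0.0 or later is installed and using the sample data from this answer, might look like this (the name `long` is only illustrative):

```r
library(tidyr)

# Sample data copied from the question
df <- read.table(text = "Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)

# Stack NPV1 and NPV2 into one value column; Region, Scenario and HC are kept
long <- pivot_longer(df, cols = c(NPV1, NPV2),
                     names_to = "key", values_to = "val")
head(long)
```

The result has one row per Region/Scenario/series combination, which is the shape the ggplot() calls in this answer expect.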
Update
To achieve a plot similar to the one you're showing you can do the following
library(tidyverse)
df %>%
    gather(key, val, -Region, -Scenario, -HC) %>%
    ggplot(aes(HC, val, fill = key)) +
    geom_col(position = "dodge2") +
    facet_wrap(~Region, nrow = 1, strip.position = "bottom") +
    theme_minimal() +
    theme(strip.placement = "outside")
Explanation: strip.position = "bottom" ensures that strip labels are at the bottom, and strip.placement = "outside" ensures that strip labels are below the axis labels (to be precise, between the axis labels and axis title).
Sample data
df <- read.table(text =
"Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)
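Since the question title asks for base R, here is a rough base-graphics sketch of a similar dodged plot. It is only one possible layout, not a reproduction of the expected figure: the hatch settings, the label line position, and the names `heights`, `mids` and `region_mid` are my own choices.

```r
# Sample data copied from the question
df <- read.table(text = "Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)

# barplot() wants a matrix: one row per series, one column per x position
heights <- t(as.matrix(df[, c("NPV1", "NPV2")]))

# beside = TRUE dodges the bars; density/angle draw hatch patterns
mids <- barplot(heights, beside = TRUE, names.arg = df$HC,
                density = c(15, 40), angle = c(45, 135), col = "black",
                legend.text = rownames(heights), ylab = "NPV")

# barplot() returns the bar midpoints (2 x 12 here); average them per region
region_mid <- tapply(colMeans(mids), df$Region, mean)

# Write each region name once, centred under its four HC groups
mtext(names(region_mid), side = 1, line = 2.5, at = as.numeric(region_mid))
```

Dropping beside = TRUE would give the stacked variant with the same midpoint logic, except that barplot() then returns a plain vector of midpoints rather than a matrix.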

Related

Problems adapting the y-axis to 2x2 ANOVA bargraph using R and ggplot [duplicate]

This question already has answers here:
geom_bar bars not displaying when specifying ylim
(4 answers)
Closed last year.
I am not a Pro R user but I already tried multiple things and can't find a solution to the problem.
I created a bar graph for 2x2 ANOVA including error bars, APA theme and custom colors based on this website: https://sakaluk.wordpress.com/2015/08/27/6-make-it-pretty-plotting-2-way-interactions-with-ggplot2/
It works nicely but the y-axis starts at 0 although my scale only ranges from 1 - 7. I am trying to adapt the axis but I get strange errors.
This is what I did:
# see https://sakaluk.wordpress.com/2015/08/27/6-make-it-pretty-plotting-2-way-interactions-with-ggplot2/
interactionMeans(anova.2)
plot(interactionMeans(anova.2))
#using ggplot
install.packages("ggplot2")
library(ggplot2)
# create factors with value
GIFTSTUDY1DATA$PRICE <- ifelse (Scenario == 3 | Scenario == 4, 1, -1 )
table(GIFTSTUDY1DATA$PRICE)
GIFTSTUDY1DATA$PRICE <- factor(GIFTSTUDY1DATA$PRICE, levels = c(-1, +1),
labels = c("2 expensive", "1 inexpensive"))
GIFTSTUDY1DATA$AFFECT <- ifelse (Scenario == 1 | Scenario == 3, -1, +1 )
table(GIFTSTUDY1DATA$AFFECT)
GIFTSTUDY1DATA$AFFECT <- factor(GIFTSTUDY1DATA$AFFECT,
levels = c(-1,1),
labels = c("poor", "rich"))
# get descriptives
dat2 <- describeBy(EVALUATION,list(GIFTSTUDY1DATA$PRICE,GIFTSTUDY1DATA$AFFECT),
mat=TRUE,digits=2)
dat2
names(dat2)[names(dat2) == 'group1'] = 'Price'
names(dat2)[names(dat2) == 'group2'] = 'Affect'
dat2$se = dat2$sd/sqrt(dat2$n)
# error bars +/- 1 SE
limits = aes(ymax = mean + se, ymin=mean - se)
dodge = position_dodge(width=0.9)
# set layout
apatheme=theme_light()+
  theme(panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),
        panel.border=element_blank(),
        axis.line=element_line(),
        text=element_text(family='Arial'))
#plot
p=ggplot(dat2, aes(x = Affect, y = mean, fill = Price))+
  geom_bar(stat='identity', position=dodge)+
  geom_errorbar(limits, position=dodge, width=0.15)+
  apatheme+
  ylab('mean gift evaluation')+
  scale_fill_manual(values=c("yellowgreen","skyblue4"))
p
Which gives me this figure:
https://i.stack.imgur.com/MwdVo.png
Now, if I try to change the y-axis using ylim or scale_y_continuous
p + ylim(1,7)
p + scale_y_continuous(limits = c(1,7))
I get a graph with the y-axis as wanted but no bars and a warning message stating
Removed 4 rows containing missing values (geom_bar).
https://i.stack.imgur.com/p66H8.png
Using
p + expand_limits(y=c(1,7))
p
changes the upper end of the y-axis but still includes the zero!
What am I doing wrong? Do I have to start all over without using geom_bar?
Thanks in advance.

While Magnus Nordmo's answer is helpful, I would like to add the reason why ggplot2 behaves this way.
Consider the following plot (friendly reminder that geom_col() is shorthand for geom_bar(stat = "identity")):
df <- data.frame(x = letters[1:7],
y = 1:7)
g <- ggplot(df, aes(x, y)) +
geom_col()
g
You can clearly see that the bars look like rectangles. Checking the underlying plot data confirms that each bar is parameterised as a rectangle by its xmin/xmax/ymin/ymax values:
> layer_data(g)
x y PANEL group ymin ymax xmin xmax colour fill size linetype alpha
1 1 1 1 1 0 1 0.55 1.45 NA grey35 0.5 1 NA
2 2 2 1 2 0 2 1.55 2.45 NA grey35 0.5 1 NA
3 3 3 1 3 0 3 2.55 3.45 NA grey35 0.5 1 NA
4 4 4 1 4 0 4 3.55 4.45 NA grey35 0.5 1 NA
5 5 5 1 5 0 5 4.55 5.45 NA grey35 0.5 1 NA
6 6 6 1 6 0 6 5.55 6.45 NA grey35 0.5 1 NA
7 7 7 1 7 0 7 6.55 7.45 NA grey35 0.5 1 NA
Now consider the following plot:
g2 <- ggplot(df, aes(x, y)) +
geom_col() +
scale_y_continuous(limits = c(1, 7))
This one is empty, and reflects the case you have posted. Inspecting the underlying data yields the following:
> layer_data(g2)
y x PANEL group ymin ymax xmin xmax colour fill size linetype alpha
1 1 1 1 1 NA 1 0.55 1.45 NA grey35 0.5 1 NA
2 2 2 1 2 NA 2 1.55 2.45 NA grey35 0.5 1 NA
3 3 3 1 3 NA 3 2.55 3.45 NA grey35 0.5 1 NA
4 4 4 1 4 NA 4 3.55 4.45 NA grey35 0.5 1 NA
5 5 5 1 5 NA 5 4.55 5.45 NA grey35 0.5 1 NA
6 6 6 1 6 NA 6 5.55 6.45 NA grey35 0.5 1 NA
7 7 7 1 7 NA 7 6.55 7.45 NA grey35 0.5 1 NA
You can see that the ymin column is replaced by NAs. This behaviour depends on the oob (out-of-bounds) argument of scale_y_continuous(), which defaults to the scales::censor() function. This censors (replaces with NA) any values that fall outside the axis limits, and that includes the 0 that should fill the ymin column. As a consequence, the rectangles can't be drawn.
There are two ways to work around this. One candidate is indeed as Magnus suggested to use the ylim argument in the coord_cartesian() function:
ggplot(df, aes(x, y)) +
geom_col() +
coord_cartesian(ylim = c(1, 7))
Specifying the limits inside a coord_* function causes the graphical objects to be clipped. You can see this in action when you turn the clipping off:
ggplot(df, aes(x, y)) +
geom_col() +
coord_cartesian(ylim = c(1, 7), clip = "off")
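To check that coord_cartesian() only zooms and leaves the data alone, the built layer data can be compared with the scale-limit version. This is a small sketch along the lines of the layer_data() calls above; the object names `g_zoom`, `g_scale`, `ymin_zoom` and `ymin_scale` are made up for illustration.

```r
library(ggplot2)

df <- data.frame(x = letters[1:7],
                 y = 1:7)

# Limits set on the coord: the view is zoomed, the data are untouched
g_zoom <- ggplot(df, aes(x, y)) +
  geom_col() +
  coord_cartesian(ylim = c(1, 7))

# Limits set on the scale: out-of-bounds values are censored to NA
g_scale <- ggplot(df, aes(x, y)) +
  geom_col() +
  scale_y_continuous(limits = c(1, 7))

ymin_zoom  <- layer_data(g_zoom)$ymin   # the bars still start at 0
ymin_scale <- layer_data(g_scale)$ymin  # the 0 baseline has been replaced by NA
ymin_zoom
ymin_scale
```

With the coord version the baseline of 0 is still present in the data, so the bars are drawn and merely cut off at the lower edge of the panel.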
The other option is to use an alternative oob argument in the scale_y_continuous, for example scales::squish:
g3 <- ggplot(df, aes(x, y)) +
geom_col() +
scale_y_continuous(limits = c(1, 7),
oob = scales::squish)
g3
This replaces any value outside the limits with the nearest limit, e.g. the ymin of 0 becomes 1:
> layer_data(g3)
y x PANEL group ymin ymax xmin xmax colour fill size linetype alpha
1 1 1 1 1 1 1 0.55 1.45 NA grey35 0.5 1 NA
2 2 2 1 2 1 2 1.55 2.45 NA grey35 0.5 1 NA
3 3 3 1 3 1 3 2.55 3.45 NA grey35 0.5 1 NA
4 4 4 1 4 1 4 3.55 4.45 NA grey35 0.5 1 NA
5 5 5 1 5 1 5 4.55 5.45 NA grey35 0.5 1 NA
6 6 6 1 6 1 6 5.55 6.45 NA grey35 0.5 1 NA
7 7 7 1 7 1 7 6.55 7.45 NA grey35 0.5 1 NA
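The two out-of-bounds handlers can also be tried directly on a plain vector, which makes the difference easy to see without building a plot. A minimal sketch, assuming the scales package is installed; the vector `vals` is an arbitrary example.

```r
library(scales)

vals <- c(0, 1, 5, 8)

# Default behaviour of continuous scales: values outside the range become NA
censored <- censor(vals, range = c(1, 7))
censored   # NA 1 5 NA

# Alternative: values outside the range are moved to the nearest limit
squished <- squish(vals, range = c(1, 7))
squished   # 1 1 5 7
```

Note that squish changes the values themselves, so a bar with a ymin of 0 is drawn as if it started at 1.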
Another thing you could do is provide a custom function to the oob argument that simply returns its input. Since clipping is on by default, this reflects the coord_cartesian(ylim = c(1,7)) case:
ggplot(df, aes(x, y)) +
geom_col() +
scale_y_continuous(limits = c(1, 7),
oob = function(x, ...){x})
I hope this clarified what is going on here.

I have encountered a similar problem which was solved by replacing
scale_y_continuous(limits = c()) with
coord_cartesian(ylim = c())
I think this might work for you.
Here is an example:
library(tidyverse)
ggplot(mtcars,aes(factor(am),hp)) +
geom_bar(stat = "identity") +
coord_cartesian(ylim = c(1000,3000))
Also see link:
Google R Discussion

Remove Column Names in R

I am relatively new to R and am struggling to remove the column names for this graph. Here is a small sample of the 4417 row data which contains 3 trials and 8 tests. I have used row.names=FALSE, which doesn't remove their names from the graph.
Test TestNumber Display Trial TrueValue Subject Response
Vertical Distance, Aligned 1 1 B 0.6 1 0.6
Vertical Distance, Aligned 1 1 B 0.6 2 0.55
Vertical Distance, Aligned 1 1 B 0.6 3 0.7
Vertical Distance, Aligned 1 1 B 0.6 4 0.6
Vertical Distance, Aligned 1 1 B 0.6 5 0.65
Vertical Distance, Aligned 1 1 B 0.6 6 0.6
Vertical Distance, Aligned 1 1 B 0.6 7 0.5
Vertical Distance, Aligned 1 1 B 0.6 8 0.65
Vertical Distance, Aligned 1 1 B 0.6 9 0.5
ggplot(ds, aes(x=factor(Response),
               y=TrueValue,
               row.names=FALSE,
               color=Trial,sd(x))) +
  geom_boxplot(notch=FALSE) +
  scale_y_continuous("Response") +
  scale_x_discrete('Trial') +
  theme_bw() +
  theme(axis.text.x=element_text(angle = -90, hjust = 0)) +
  theme(text=element_text(size=10, family="Arial")) +
  ggtitle('Trial Median Comparison \n to Look for Over Estimation')

Plotting several X,Y column pairs as data series, while excluding (0,0) points

I'm trying to plot three data series in a single plot. The X and Y coordinates of each series are in separate columns in my data frame:
X1 Y1 X2 Y2 X3 Y3
1 0 1 0 2 0 3
2 1 2 1 3 1 4
3 2 3 2 4 2 5
4 3 4 3 5 3 6
5 4 5 4 6 4 7
6 5 6 5 7 5 8
7 6 7 6 8 6 9
8 0 0 7 9 7 8
9 0 0 8 8 0 0
10 0 0 9 7 0 0
Since the trailing (0,0) data points of each series are invalid, only this subset of points should eventually be plotted:
X1 Y1 X2 Y2 X3 Y3
1 0 1 0 2 0 3
2 1 2 1 3 1 4
3 2 3 2 4 2 5
4 3 4 3 5 3 6
5 4 5 4 6 4 7
6 5 6 5 7 5 8
7 6 7 6 8 6 9
8 7 9 7 8
9 8 8
10 9 7
Additionally, the X-axis of the first series should be inverted:
Even without cleaning up the data frame first, I struggled to plot the column pairs as individual series in ggplot2 (see 'legend').
require(ggplot2)
report <- function(df){
plot = ggplot(data=df, aes(x=-X1, y=Y1, size=3)) + #inverted X-axis of series 1
layer(geom="point") +
geom_point(aes(X2, Y2, colour="red", size=2)) +
geom_point(aes(X3, Y3, colour="blue", size=1)) +
xlab("X") + ylab("Y")
print(plot)
}
X1 = c(0,1,2,3,4,5,6,0,0,0)
Y1 = c(1,2,3,4,5,6,7,0,0,0)
X2 = c(0,1,2,3,4,5,6,7,8,9)
Y2 = c(2,3,4,5,6,7,8,9,8,7)
X3 = c(0,1,2,3,4,5,6,7,0,0)
Y3 = c(3,4,5,6,7,8,9,8,0,0)
df <- data.frame(X1,Y1,X2,Y2,X3,Y3)
colnames(df) <- c("X1","Y1","X2","Y2","X3","Y3")
report(df)
What would be the best way to get rid of the invalid (0,0) data points in each series, and how should I plot them properly?

I think you actually want to transform your data.frame in order to make your ggplot call more concise. Here is an updated version that plots your data correctly, using the dplyr package to transform the data.
In response to a comment requesting additional information on dplyr: it provides the %>% operator, which simply passes the value on its left into the function on its right as the first argument. This allows for much more readable R code. The mutate function adds the Series variable, set manually from the knowledge of which points belong to which series. The filter function then removes the (0,0) points that you indicated were not wanted. You can inspect df after these operations to see the final output. I hope this helps in interpreting the code below. Also here is a link to the dplyr page.
library(dplyr)
df <- rbind.data.frame(
data.frame(X=-X1, Y=Y1),
data.frame(X=X2, Y=Y2),
data.frame(X=X3, Y=Y3))
df <- df %>%
mutate(Series=rep(c('S1', 'S2', 'S3'), each=10)) %>%
filter(!(X == 0 & Y == 0))
png('foo.png')
ggplot(df) + geom_point(aes(x=X, y=Y, color=Series, size=Series))
dev.off()
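For readers who want to see what the mutate() and filter() steps above do without installing dplyr, here is a rough base-R equivalent built on the vectors from the question. It is a sketch only; the names `long` and `valid` are illustrative.

```r
# Vectors copied from the question
X1 <- c(0,1,2,3,4,5,6,0,0,0)
Y1 <- c(1,2,3,4,5,6,7,0,0,0)
X2 <- c(0,1,2,3,4,5,6,7,8,9)
Y2 <- c(2,3,4,5,6,7,8,9,8,7)
X3 <- c(0,1,2,3,4,5,6,7,0,0)
Y3 <- c(3,4,5,6,7,8,9,8,0,0)

# Stack the three X/Y pairs; series 1 gets its X values negated
long <- rbind(data.frame(X = -X1, Y = Y1),
              data.frame(X = X2,  Y = Y2),
              data.frame(X = X3,  Y = Y3))

# Counterpart of mutate(): label each block of 10 rows with its series
long$Series <- rep(c("S1", "S2", "S3"), each = 10)

# Counterpart of filter(): drop rows where both coordinates are zero
valid <- long[!(long$X == 0 & long$Y == 0), ]

# Number of points left per series
table(valid$Series)
```

Be aware that this rule, like the filter() call, would also drop a genuine measurement at (0,0). None of the three series in the question contains one, so it is safe here.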
Also, if you want to set the values of color and size manually, and add the lines as in your ideal example plot, here is a more complex ggplot command:
ggplot(df, aes(x=X, y=Y, color=Series, size=Series)) +
geom_point() + geom_line(size=1) + theme_bw() +
scale_color_manual(values=c('black', 'red', 'blue')) +
scale_size_manual(values=seq(4,2,-1))

ggplot2 expecting square matrix even though matrix is not symmetric

Hi, I am trying to plot a heat map in ggplot2, using a matrix with 9 rows and 10 columns.
I melt the matrix using the "as.matrix" notation in reshape2 and get the following output:
A1 = melt(as.matrix(A))
Var1 Var2 value
1 1 X0.05 8.690705e-01
2 2 X0.05 1.930320e-01
3 3 X0.05 1.474900e-02
4 4 X0.05 3.498176e-04
5 5 X0.05 2.451419e-06
6 6 X0.05 4.946808e-09
7 7 X0.05 2.832895e-12
8 8 X0.05 4.563140e-16
9 9 X0.05 2.055474e-20
10 1 X0.1 5.906241e-01
11 2 X0.1 7.416265e-01
12 3 X0.1 2.311771e-01
13 4 X0.1 3.892639e-02
14 5 X0.1 3.361408e-03
15 6 X0.1 1.445629e-04
16 7 X0.1 3.043528e-06
17 8 X0.1 3.103555e-08
18 9 X0.1 1.522292e-10
The output is correct with each column represented by 9 values
I then rescale by value and get the following output
A2 = ddply(A1, .(Var2), transform, rescale = rescale(value))
Var1 Var2 value rescale
1 1 X0.05 8.690705e-01 1.000000e+00
2 2 X0.05 1.930320e-01 2.221132e-01
3 3 X0.05 1.474900e-02 1.697101e-02
4 4 X0.05 3.498176e-04 4.025192e-04
5 5 X0.05 2.451419e-06 2.820737e-06
6 6 X0.05 4.946808e-09 5.692068e-09
7 7 X0.05 2.832895e-12 3.259684e-12
8 8 X0.05 4.563140e-16 5.250361e-16
9 9 X0.05 2.055474e-20 0.000000e+00
10 1 X0.1 5.906241e-01 7.963902e-01
11 2 X0.1 7.416265e-01 1.000000e+00
12 3 X0.1 2.311771e-01 3.117163e-01
13 4 X0.1 3.892639e-02 5.248786e-02
14 5 X0.1 3.361408e-03 4.532480e-03
15 6 X0.1 1.445629e-04 1.949266e-04
16 7 X0.1 3.043528e-06 4.103651e-06
17 8 X0.1 3.103555e-08 4.164269e-08
18 9 X0.1 1.522292e-10 0.000000e+00
Everything still looks fine and when I plot the heat map the output is correct, so far so good
ggplot(A2, aes(Var2, Var1)) + geom_tile(aes(fill = rescale), colour = "white") + scale_fill_gradient(low = "light blue", high = "dark blue")
The problem comes up when I add custom labels, where the y axis goes from 1 to 9 (displaying the number of heterozygote individuals) and the x-axis goes from 0.05 to 0.5 (displaying the minor allele frequency)
x = c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50)
y = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
ggplot(A4, aes(Var2, Var1)) + geom_tile(aes(fill = rescale), colour = "white") + scale_fill_gradient(low = "light blue", high = "dark blue") + scale_x_discrete(labels= x, expression("Minor allele frequency")) + scale_y_discrete(labels= y, expression("Number of Heterozygotes"))
However this time the y axis is all messed up
It seems to me that ggplot automatically assumes a 10x10 matrix and tries to add the missing labels. I tried to find an option in reshape2 where I could declare the shape of the matrix, but I was unable to find a solution. Has anyone come across this problem? Any help would be much appreciated, thanks in advance.

Here is one approach. You can change tick mark labels with scale_x_discrete. As for y, I converted Var1 to factor.
ggplot(mydf, aes(x = Var2, y = as.factor(Var1), fill = rescale)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "light blue", high = "dark blue") +
scale_x_discrete(breaks=c("X0.05", "X0.1"), labels=c("0.05", "0.1")) +
xlab("Minor allele frequency") +
ylab("Number of Heterozygotes")
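The reason the factor conversion matters is that melt() leaves Var1 as an integer column, so a discrete y scale has no categories to attach labels to. The following self-contained sketch illustrates the idea on a placeholder 9 x 10 matrix. The matrix values are made up purely for illustration, and the names `A`, `freqs` and `long` are assumptions, not the asker's objects.

```r
library(ggplot2)

# Placeholder 9 x 10 matrix; the values are NOT real data
freqs <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50)
A <- matrix(seq_len(90), nrow = 9,
            dimnames = list(NULL, paste0("X", freqs)))

# Long format with an integer Var1, mimicking the melt() output in the question
long <- data.frame(Var1  = as.vector(row(A)),
                   Var2  = colnames(A)[as.vector(col(A))],
                   value = as.vector(A))

# factor(Var1) makes y discrete; the labels function strips the leading "X"
p <- ggplot(long, aes(x = Var2, y = factor(Var1), fill = value)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "light blue", high = "dark blue") +
  scale_x_discrete(labels = function(l) sub("^X", "", l)) +
  xlab("Minor allele frequency") +
  ylab("Number of Heterozygotes")
p
```

Passing a function to labels avoids typing out all ten breaks by hand, which is handy when the column names follow a regular pattern as they do here.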

Why doesn't qplot plot lines in multiple series for this data file?

It's my first day learning R and ggplot. I've followed some tutorials and would like plots like the one generated by the following command:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
It looks like the figure on this page:
http://www.r-bloggers.com/quick-introduction-to-ggplot2/
I had a handmade test data file I created, which looks like this:
site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7
but when I try to read and plot it with:
test <- read.table('test.data')
qplot(temp, humidity, data = test, color=site, geom = c("point", "line"))
the lines on the plot aren't separate series, but link together:
http://imgur.com/weRaX
What am I doing wrong?
Thanks.

You need to tell ggplot2 how to group the data into separate lines. It's not a mind reader! ;)
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7",sep = "",header = TRUE)
qplot(temp, humidity, data = dat, group = site,color=site, geom = c("point", "line"))
Note that you probably also wanted to do color = factor(site) in order to force a discrete color scale, rather than a continuous one.
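As far as I know, qplot() has been deprecated in recent ggplot2 releases, so the same plot may be easier to maintain when written with ggplot(). In this sketch the colour is mapped to factor(site), which gives a discrete colour scale and also takes care of the grouping, so no explicit group argument is needed.

```r
library(ggplot2)

# Test data copied from the question
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7", header = TRUE)

# A discrete colour mapping splits the data into one line per site
p <- ggplot(dat, aes(temp, humidity, colour = factor(site))) +
  geom_point() +
  geom_line()
p
```

If site should stay numeric for other reasons, adding group = site inside aes() has the same effect on the lines while keeping the continuous colour scale.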
