Here is some code which reproduces my issue:
library(ggplot2)
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g1")
g1 # First print
y <- 20:1
g2 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g2")
g2
g1 # Second print
When you run the code above, the first time you print g1 you get a bar plot starting at (factor 1, y = 1) and ending at (factor 20, y = 20).
After creating g2, if you print g1 again, it looks the same as g2, except for the title, which is unchanged.
I'm really puzzled; any help would be much appreciated!
ggplot works best when you pull data from a data.frame rather than the global environment. If you did
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g1")
y <- 20:1
g2 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g2")
everything would work fine.
The "problem" is that ggplot doesn't actually "build" the plot until you print it. And when you are linking to variable names with aes(), it just tracks the variable name, not the value. So it uses whatever the current value is when the plot prints. When we "trap" data inside a data.frame, we are capturing the current value of the variable so that we can use that later.
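A minimal sketch of the difference, assuming a recent ggplot2 and using geom_col(), the shorthand for geom_bar(stat = "identity"). layer_data() builds the plot the same way printing does, so it shows which y values each version actually uses after y has been reassigned. The helper bar_heights() is a hypothetical name, not part of ggplot2.

```r
library(ggplot2)

x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0, 19), 1))

# Bare variables in aes(): the plot stores the expressions x, y, id
# (plus the environment to look them up in), not their current values
g_lazy <- ggplot() +
  geom_col(aes(x = x, y = y, fill = id), width = 0.5)

# data.frame(): the current values are copied into the plot object right away
g_frozen <- ggplot(data.frame(x, y, id)) +
  geom_col(aes(x = x, y = y, fill = id), width = 0.5)

y <- 20:1  # reassign the global variable after both plots exist

# Hypothetical helper: build a plot (as print() does) and return the
# bar heights ordered by their x position
bar_heights <- function(plot) {
  d <- layer_data(plot)
  d$y[order(as.numeric(d$x))]
}

bar_heights(g_lazy)    # uses the new y, i.e. the values 20 down to 1
bar_heights(g_frozen)  # still the y that existed when the plot was created
```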
I'm really new to R, and I'm trying to plot air pollution data (NOx) from 5 different locations (monthly averages for every location from 01-1996 to 12-2019). Each line should represent a different location.
I've created a ggplot, but I find it really unclear. What tips do you have for making the plot easier to read? (It will be no bigger than A4, because it will be included in my work and printed.) I would also like more years labelled on the x axis (1996, 1997, 1998, and so on).
library(readr)
library(ggplot2)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
ggplot() +
geom_line(data = ALIBA, aes(x = START_TIME, y = VALUE), color = "blue") +
geom_line(data = BMISA, aes(x = START_TIME, y = VALUE), color = "red") +
geom_line(data = CCBDA, aes(x = START_TIME, y = VALUE), color = "yellow") +
geom_line(data = TKARA, aes(x = START_TIME, y = VALUE), color = "green") +
geom_line(data = UULKA, aes(x = START_TIME, y = VALUE), color = "pink")
All CSV files have this format:
START_TIME,VALUE
1996-01-01T00:00:00Z,61.3049451304964
1996-02-01T00:00:00Z,47.7234010245664
1996-03-01T00:00:00Z,33.083512309072
1996-04-01T00:00:00Z,47.771166691758
1996-05-01T00:00:00Z,24.7022422574005
1996-06-01T00:00:00Z,25.4495954480684
1996-07-01T00:00:00Z,23.301224242488
...
Thanks
First, I would combine all the data sets into one data frame, with a column recording which location each row comes from:
library(readr)
library(ggplot2)
ALIBA <- read_csv("ALIBA_Praha/NOx/all_sorted.csv")
ALIBA$Location <- "ALIBA"
BMISA <- read_csv("BMISA_Mikulov/NOx/all_sorted.csv")
BMISA$Location <- "BMISA"
CCBDA <- read_csv("CCBDA_CB/NOx/all_sorted.csv")
CCBDA$Location <- "CCBDA"
TKARA <- read_csv("TKARA_Karvina/NOx/all_sorted.csv")
TKARA$Location <- "TKARA"
UULKA <- read_csv("UULKA_UnL/NOx/all_sorted.csv")
UULKA$Location <- "UULKA"
df <- rbind(ALIBA, BMISA, CCBDA, TKARA, UULKA)
ggplot(data = df, aes(x = START_TIME, y = VALUE, color = Location)) +
  geom_line(linewidth = 1) + # play with the stroke thickness (use size = 1 in ggplot2 before 3.4.0)
  scale_color_brewer(palette = "Set1") # many other palettes are available, see ?scale_color_brewer
How would you like to add more years? In the same graph (everything will be tiny) or in separate "windows" (= facets, better)?
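For the facet option, a minimal sketch is below. Because the CSV files aren't available here, it assumes a made-up stand-in data frame with the same columns (START_TIME, VALUE) plus the Location column added above; the values are random and purely illustrative. It also assumes read_csv() parses START_TIME as a date-time (POSIXct), which is why scale_x_datetime() with date_breaks = "1 year" is used to get one label per year.

```r
library(ggplot2)

# Hypothetical stand-in for the five CSV files: random monthly values 1996-2019
set.seed(1)
months <- seq(as.POSIXct("1996-01-01", tz = "UTC"), by = "month", length.out = 288)
df <- data.frame(
  START_TIME = rep(months, times = 5),
  VALUE = runif(5 * 288, 10, 80),
  Location = rep(c("ALIBA", "BMISA", "CCBDA", "TKARA", "UULKA"), each = 288)
)

p <- ggplot(df, aes(x = START_TIME, y = VALUE, colour = Location)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~ Location, ncol = 1) +                 # one panel per station
  scale_x_datetime(date_breaks = "1 year", date_labels = "%Y") +
  scale_colour_brewer(palette = "Set1") +
  labs(x = NULL, y = "NOx (monthly mean)") +
  theme_bw(base_size = 9) +                          # smaller text for print
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))  # fit 24 year labels
p
```

For printing, ggsave("nox.pdf", p, width = 21, height = 29.7, units = "cm") saves it at A4 size (the file name is only an example).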
I need to add the result of an operation (a division) in the top-right corner of every facet: for each of A, B and C, the value of z divided by the value of x.
For A the result is about 0.17 (400/2300), for B about 0.136 (30/220) and for C about 0.105 (2/19).
I was going to use annotate(), but I read that geom_text() is the better choice when using facets.
library(ggplot2)
dt <- data.frame(va = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                 vb = rep(letters[24:26], 3),
                 value = c(23*100, 13*100, 4*100, 22*10, 12*10, 3*10, 19, 8, 2))
ggplot(data = dt, aes(x = vb, y = value)) +
  geom_col(position = "dodge") +
  facet_wrap(~va, scales = "free_y") +
  geom_text(aes(label = value))
You can make a new dataframe in which you calculate the result you want to display, as well as the x, y position. Then you can change the dataframe used by the geom_text layer with the data argument.
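library(dplyr)
library(tidyr)
library(ggplot2)
# One row per facet (va), with columns x, y and z holding the bar heights
dt_calc <- dt %>%
  pivot_wider(names_from = vb, values_from = value) %>%
  mutate(result = z / x, xpos = "z", ypos = pmax(x, y, z))
ggplot(data = dt, aes(x = vb, y = value)) +
  geom_col(position = "dodge") +
  facet_wrap( ~ va, scales = "free_y") +
  geom_text(aes(x = xpos, y = ypos, label = round(result, 2)), data = dt_calc)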
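If the label should sit in the top-right corner of each panel regardless of the bar heights, a common alternative is to place it at x = Inf, y = Inf and pull it back inside the panel with hjust/vjust. A sketch under that approach; the 1.1 and 1.5 justification values are arbitrary and meant to be tuned by eye.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dt <- data.frame(va = rep(c("A", "B", "C"), each = 3),
                 vb = rep(c("x", "y", "z"), times = 3),
                 value = c(2300, 1300, 400, 220, 120, 30, 19, 8, 2))

# One row per facet with the z / x ratio to display
dt_calc <- dt %>%
  pivot_wider(names_from = vb, values_from = value) %>%
  mutate(result = z / x)

p <- ggplot(dt, aes(x = vb, y = value)) +
  geom_col() +
  facet_wrap(~ va, scales = "free_y") +
  # x = Inf, y = Inf anchors the label at the top-right edge of each panel;
  # hjust/vjust greater than 1 push the text back inside the panel
  geom_text(data = dt_calc,
            aes(x = Inf, y = Inf, label = round(result, 3)),
            hjust = 1.1, vjust = 1.5)
p
```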
Oh wise ones: I've got a question about using geom_linerange(). Below is what I hope is a workable example illustrating my problem.
library(ggplot2)
library(reshape2)  # for melt()
b <- c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
test <- data.frame(a = c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10),
                   b = b, c = b - 15)
testMelt <- melt(
  test,
  id.vars = "a",
  measure.vars = c("b", "c")
)
p <- ggplot(
  aes(
    x = factor(a),
    y = value,
    fill = variable
  ),
  data = testMelt) +
  geom_boxplot() +
  stat_smooth(aes(group = variable, x = factor(a), y = value, fill = factor(variable)), data = testMelt)
My actual dataset is much larger, and the boxplots are a bit overwhelming. I think what I want is to use geom_linerange() somehow to show the range of the data for "b" and "c" at each value of "a".
The best I've come up with is:
p<- p+ geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable))
I can assume the "c" values are always equal to or less than "b", but if the range is smaller, this "covers it up". Can I jitter the lines somehow? Is there a better solution?
In your geom_linerange() call, add the argument position = position_dodge(width = 0.3). You can adjust the width to change the separation between the vertical lines.
My understanding of the question is that you want the line range to reflect the range for the combination a:b:c.
geom_linerange(aes(as.factor(a), ymin = min(value), ymax = value, color = variable)) sets the minimum to the minimum of the whole data set, because min(value) is evaluated over all rows rather than per group (hence all the lines appear with the same minimum value).
A couple of solutions.
Calculate the minima and maxima yourself
library(plyr)  # for ddply() and summarize()
test_range <- ddply(testMelt, .(a, variable), summarize,
                    val_min = min(value), val_max = max(value))
then run
ggplot(data = testMelt) +
geom_boxplot(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable))) +
geom_linerange(data = test_range, aes(x = as.factor(a), ymin = val_min,
ymax = val_max, color = variable),
position = position_dodge(width = 0.3))
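Since plyr is retired, here is a sketch of the same per-group minima and maxima with dplyr and tidyr instead. This is a swapped-in approach rather than the code above: pivot_longer() stands in for reshape2::melt(), and the plot keeps only the dodged line ranges.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

b <- c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
test <- data.frame(a = c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10),
                   b = b, c = b - 15)

# Long format (one row per a / variable / value), in place of reshape2::melt()
testLong <- test %>%
  pivot_longer(cols = c("b", "c"), names_to = "variable", values_to = "value")

# Minimum and maximum per combination of a and variable (replaces the ddply() call)
test_range <- testLong %>%
  group_by(a, variable) %>%
  summarise(val_min = min(value), val_max = max(value), .groups = "drop")

# Dodged line ranges so a shorter "c" range is not hidden behind the "b" range
p <- ggplot(test_range, aes(x = factor(a), ymin = val_min, ymax = val_max,
                            colour = variable)) +
  geom_linerange(position = position_dodge(width = 0.3))
p
```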
Or, as an alternative to boxplots / line ranges, use a violin plot:
ggplot(data = testMelt) +
geom_violin(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable)))
I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you call plot(lm(y ~ x)) in base R, the first graph that pops up is Residuals vs Fitted, the second is the normal Q-Q plot, and the third is Scale-Location. Each of these automatically labels the most extreme points (by default the three most extreme, using their observation names) with an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Update: uses scale_size_area() in place of scale_area(), which is deprecated.
You might be able to take something from this to suit your needs.
library(ggplot2)
library(gridExtra)  # for grid.arrange()
# Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified <- fortify(m1)
names(df.fortified) # Names of the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme <- ifelse(abs(df.fortified$.stdresid) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
p <- ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
  geom_point() +
  geom_text(data = df.fortified[df.fortified$extreme == 1, ],
            aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
p
plot1 <- ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
  geom_point() + geom_smooth(se = FALSE)
# size is mapped only in geom_point(), so the smoother is not affected by it
plot2 <- ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
  geom_point(aes(size = .cooksd)) + scale_size_area("Cook's distance") +
  geom_smooth(se = FALSE)
grid.arrange(plot1, plot2)
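To get closer to the base plot.lm() behaviour, which by default labels the three most extreme points (id.n = 3) with their observation names, a sketch could compute the diagnostics directly with base R and flag the top three by absolute residual. The column names, id_n and diag_df are illustrative choices, and set.seed() is only there to make the example reproducible.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)

# Residual diagnostics computed with base R instead of fortify()
diag_df <- data.frame(
  obs = seq_len(nrow(df)),   # observation number, used as the label
  fitted = fitted(m1),
  resid = residuals(m1),
  stdresid = rstandard(m1)
)

# Roughly mimic plot.lm's id.n = 3: flag the three largest |residuals|
id_n <- 3
diag_df$extreme <- rank(-abs(diag_df$resid), ties.method = "first") <= id_n

p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_text(data = subset(diag_df, extreme),
            aes(label = obs), hjust = -0.3, size = 3)
p
```

To label everything beyond a fixed threshold instead, the rank rule can be swapped for something like abs(diag_df$stdresid) > 1.5.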
I'm using ggplot2 to show lines and points on a plot. What I am trying to do is to have the lines all the same color, and then to show the points colored by an attribute. My code is as follows:
# Data frame
dfDemo <- structure(list(Y = c(0.906231077471568, 0.569073561538186,
0.0783433165521566, 0.724580209473378, 0.359136092118470, 0.871301974471722,
0.400628333618918, 1.41778205350433, 0.932081770977729, 0.198188442350644
), X = c(0.208755495088456, 0.147750173706688, 0.0205864576474412,
0.162635017485883, 0.118877260137735, 0.186538613831806, 0.137831912094464,
0.293293029083812, 0.219247919537514, 0.0323148791663826), Z = c(11112951L,
11713300L, 14331476L, 11539301L, 12233602L, 15764099L, 10191778L,
12070774L, 11836422L, 15148685L)), .Names = c("Y", "X", "Z"
), row.names = c(NA, 10L), class = "data.frame")
# Variables
X = array(0.1,100)
Y = seq(length=100, from=0, by=0.01)
# make data frame
dfAll <- data.frame()
# make data frames using loop
for (x in c(1:10)){
# spacemate calc
Floors = array(x,100)
# include label
Label = paste(' ', toString(x), sep="")
df1 <- data.frame(X = X * x, Y = Y, Label)
# merge df1 to cumulative df, dfAll
dfAll <- rbind(dfAll, df1)
}
# plot
pl <- ggplot(dfAll, aes(x = X, y = Y, group = Label, colour = 'Measures')) + geom_line()
# add points to plot
pl + geom_point(data=dfDemo, aes(x = X, y = Y)) + theme(legend.position = "none")
This almost works, but I am unable to color the points by Z when I do this. I can plot the points separately, colored by Z using the following code:
ggplot(dfDemo, aes(x = X, y = Y, colour = Z)) + geom_point()
However, if I use similar code after plotting the lines:
pl + geom_point(data=dfDemo, aes(x = X, y = Y, colour = Z)) + theme(legend.position = "none")
I get the following error:
Error: Continuous variable () supplied to discrete scale_hue.
I don't understand how to add the points to the chart so that I can colour them by a value. I'd appreciate any suggestions on how to solve this.
The issue is that two colour scales collide: colour = 'Measures' inside aes() in the ggplot() call creates a discrete colour scale, and colour = Z in geom_point() then tries to put a continuous variable on that same scale. If you want the lines in one colour and the points in different colours, remove the colour mapping from the ggplot() call and set the colour in geom_line() outside aes(), so it is a fixed setting rather than a mapping. Outside aes() a plain string such as "red" is already used literally, so it doesn't need I(). Since dfDemo has no Label column, inherit.aes = FALSE also stops geom_point() from inheriting group = Label.
pl <- ggplot(dfAll, aes(x = X, y = Y, group = Label)) +
  geom_line(colour = "red")
pl + geom_point(data = dfDemo, aes(x = X, y = Y, colour = Z), inherit.aes = FALSE) +
  theme(legend.position = "none")
HTH
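A self-contained sketch of the same idea, assuming a random stand-in for dfDemo (same X/Y/Z columns, made-up values) and building dfAll in one step rather than growing it with rbind() inside a loop. The lines get a fixed colour outside aes(), so only the points' colour = Z creates a (continuous) colour scale; scale_colour_gradient() and its end colours are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for dfDemo: 10 points with a continuous attribute Z
set.seed(42)
dfDemo <- data.frame(X = runif(10, 0, 0.3), Y = runif(10, 0, 1.4),
                     Z = round(runif(10, 1e7, 1.6e7)))

# The ten vertical reference lines from the question, built without a loop
dfAll <- data.frame(
  X = rep(0.1 * (1:10), each = 100),
  Y = rep(seq(0, by = 0.01, length.out = 100), times = 10),
  Label = factor(rep(1:10, each = 100))
)

p <- ggplot(dfAll, aes(x = X, y = Y, group = Label)) +
  geom_line(colour = "grey50") +                 # fixed colour: no scale involved
  geom_point(data = dfDemo, aes(x = X, y = Y, colour = Z),
             inherit.aes = FALSE, size = 2) +    # Z mapped to a continuous scale
  scale_colour_gradient(low = "blue", high = "red")
p
```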