Plotting variable means for each level of the independent variable in R

Given the following code and data frame:
require(data.table)
require(ggplot2)
dat1 <- fread('J S1 S2 S3 S4 Z
1 4 5 3 2 0
1 6 5 6 5 1
2 3 5 8 9 0
2 12 11 34 44 1
3 11 23 23 22 0
3 12 15 22 21 1')
temp <- melt(dat1, id.vars = c("J", "Z"))
ggplot(temp, aes(x = J, y = value, color = variable, shape = as.factor(Z))) +
geom_point()
I'd like to plot, in the same graph, the mean of the values (S1, S2, S3, S4) for each level of J. That is, for S1 I should get 3 points in my graph: 5, 7.5 and 11.5. For S2, another 3 points, and so on.
I'm trying this:
ggplot(temp, aes(x = J, y = mean(value), color = variable, shape = as.factor(Z))) +
geom_point()
I get only one point for each full set of data. But I'd like to get in the same graph the mean of S1 for each level of J (1,2,3), the mean of S2 for each level of J, the mean of S3 for each level of J, and the mean of S4 for each level of J.

You need to compute the means as rows in your data before plotting.
Please let me know if this makes sense or if you want something different.
You can do:
library(data.table)
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable)]
ggplot(temp1, aes(x = J, y = value, color=factor(variable))) +
geom_point()
OR you can do :
ggplot(temp1, aes(x = variable, y = value, color=factor(J))) +
geom_point()
EDIT, after OP's request:
To take the Z variable into account, you need to summarize the data by Z as well, as below, and then plot:
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable,Z)]
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point()
The plot now contains three categorical variables: variable, J and Z. You can swap their roles to see what fits your needs; remember to wrap them in factor() when mapping them to shape or color. If you want separate panels for Z = 0 and Z = 1, use facet_wrap, as below:
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point() + facet_wrap(~Z)
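As an alternative to pre-aggregating, ggplot2 can compute the means itself. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (where stat_summary() takes the fun argument; older versions call it fun.y). The raw points are drawn faintly and the mean for each level of J is overlaid as a larger diamond. The means table is computed only as a cross-check and is not needed for the plot.

```r
library(data.table)
library(ggplot2)

dat1 <- fread('J S1 S2 S3 S4 Z
1 4 5 3 2 0
1 6 5 6 5 1
2 3 5 8 9 0
2 12 11 34 44 1
3 11 23 23 22 0
3 12 15 22 21 1')
temp <- melt(dat1, id.vars = c("J", "Z"))

# Cross-check: means per J and variable, computed explicitly
means <- temp[, .(value = mean(value)), by = .(J, variable)]

# stat_summary() applies mean() to y at each x within each colour group
p <- ggplot(temp, aes(x = J, y = value, color = variable)) +
  geom_point(alpha = 0.4) +
  stat_summary(fun = mean, geom = "point", size = 4, shape = 18)
p
```

This keeps the original observations visible next to their means, at the cost of less control over the summarised data than an explicit aggregation gives.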

Related

Trouble graphing two columns on one graph in R

I just started learning R. I melted my dataframe and used ggplot to get this graph. There's supposed to be two lines on the same graph, but the lines connecting seem random.
Correct points plotted, but wrong lines.
# Melted my data to create new dataframe
AvgSleep2_DF <- melt(AvgSleep_DF , id.vars = 'SleepDay_Date',
variable.name = 'series')
# Plotting
ggplot(AvgSleep2_DF, aes(SleepDay_Date, value, colour = series)) +
geom_point(aes(colour = series)) +
geom_line(aes(colour = series))
With or without the aes(colour = series) in the geom_line results in the same graph. What am I doing wrong here?
The following might explain what geom_line() does when you specify aesthetics in the ggplot() call.
I assign a deliberate colour column that differs from the series specification!
df <- data.frame(
x = c(1,2,3,4,5)
, y = c(2,2,3,4,2)
, colour = factor(c(rep(1,3), rep(2,2)))
, series = c(1,1,2,3,3)
)
df
x y colour series
1 1 2 1 1
2 2 2 1 1
3 3 3 1 2
4 4 4 2 3
5 5 2 2 3
Inheritance in ggplot will look for aesthetics defined in an upper layer.
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) + # setting the size to stress point layer call
geom_line() # geom_line will "inherit" a "grouping" from the colour set above
This gives you one line per colour level: points 1 to 3 are joined, and points 4 and 5 are joined.
We can instead control the grouping of each line (segment) explicitly:
ggplot(data = df, aes(x = x, y = y, colour = colour)) +
geom_point(size = 3) +
geom_line(aes(group = series)) # defining specific grouping
Note: As I defined a separate "group" in the series column for the 3rd point, it is depicted - in this case - as a single point "line".

Drawing a multiple line ggplot figure

I am working on a figure that should contain 3 different lines on the same graph. The data frame I am working on is the following:
I would like to use ind (my data point) on the x-axis and then draw 3 different lines using the data from the columns med, b and c.
I only managed to draw one line.
Could you please help me? The code I am using now is:
ggplot(data=f, aes(x=ind, y=med, group=1)) +
geom_line(aes())+ geom_line(colour = "darkGrey", size = 3) +
theme_bw() +
theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank())
The key is to gather the columns in question into a single new variable. This happens in the gather() step in the code below. The rest is pretty much boilerplate ggplot2.
library(ggplot2)
library(tidyr)
xy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
ind = 1:10)
# gather a, b and c into one key column and one value column
xy <- gather(xy, key = myvariable, value = myvalue, a, b, c)
ggplot(xy, aes(x = ind, y = myvalue, color = myvariable)) +
theme_bw() +
geom_line()
With melt and ggplot:
df$ind <- 1:nrow(df)
head(df)
a b med c ind
1 -87.21893 -84.72439 -75.78069 -70.87261 1
2 -107.29747 -70.38214 -84.96422 -73.87297 2
3 -106.13149 -105.12869 -75.09039 -62.61283 3
4 -93.66255 -97.55444 -85.01982 -56.49110 4
5 -88.73919 -95.80307 -77.11830 -47.72991 5
6 -86.27068 -83.24604 -86.86626 -91.32508 6
df <- melt(df, id='ind')
ggplot(df, aes(ind, value, group=variable, col=variable)) + geom_line(lwd=2)
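In current tidyr, gather() is superseded by pivot_longer(). Here is a minimal sketch of the same reshaping for the three columns named in the question. It assumes a data frame f with columns ind, med, b and c; the values are random placeholders, not the asker's data, and the names series, value and long are illustrative.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Placeholder data with the column names from the question
f <- data.frame(ind = 1:10, med = rnorm(10), b = rnorm(10), c = rnorm(10))

# One row per (ind, series) pair; column names are passed as strings
long <- pivot_longer(f, cols = c("med", "b", "c"),
                     names_to = "series", values_to = "value")

# Mapping colour to series draws one line per original column
p <- ggplot(long, aes(x = ind, y = value, colour = series)) +
  geom_line() +
  theme_bw()
p
```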

Colouring a specific point in a scatterplot in R

I'm somewhat new to R and ggplot2. I've been trying to create a scatterplot graph that has one specific point coloured. For example, here is my basic data frame
manager Confirmed Overturned keeping Stands total
A.J. Hinch 11 24 0 14 49
Angel Hernandez 0 1 0 0 1
Bill Miller 3 1 0 4 8
Bob Melvin 6 16 0 6 28
Brad Ausmus 3 11 0 13 27
With this I can create a simple scatterplot using this code,
p <- ggplot(data = Outcome, aes(x = Overturned, y = total))
p + geom_point()
I know how to add general colour, and add a colour scale, but I don't know how to colour just one point. For example, let's say I wanted to colour A.J. Hinch blue, and make every other point a different colour (probably grey or black), how would I do that?
Here is a link to the graph I want to create in Tableau.
https://public.tableau.com/profile/julien1554#!/vizhome/ManagerChallenges2014-2015/Sheet1
All help is appreciated, thanks.
You would just add another scatter plot layer to your plot. Here is the code that I used. Hope it helps!
> df = as.data.frame(cbind(Overturned = c(24,1,1,16,11), total = c(49,1,8,28,27)))
> library(ggplot2)
> p <- ggplot(data = df, aes(x = Overturned, y = total)) # creates the graph
> p + geom_point(data = df, color = "gray") + # creates main scatter plot with gray points
geom_point(data = df[1,], color = "blue") # colors A.J. Hinch's point blue
Note that I'm just using the last name because when I read your data from the clipboard it thought the first names were row labels.
Outcome$color_me <- ifelse(Outcome$manager == "Hinch", "color_me", "normal")
textdf <- Outcome[Outcome$manager == "Hinch", ]
mycolors <- c("color_me" = "blue", "normal" = "grey50")
ggplot(data = Outcome, aes(x = Overturned, y = total)) +
geom_point(size = 3, aes(colour = color_me))
or with the manually defined color:
ggplot(data = Outcome, aes(x = Overturned, y = total)) +
geom_point(size = 3, aes(colour = color_me)) +
scale_color_manual("Status", values = mycolors)

Simple df plotting in R

I am a new R user and I have a data frame with three columns: car, var and val. It has about 90 rows, and I want to plot the two columns var and val. My data frame looks like this:
car var val
a kl -14
b km -1
c kn -3
d ko -20
I tried plot(data$var, data$val), but I want something like this, with var on the x-axis and val on the y-axis. How can I do this with ggplot?
You can make a plot similar to the one you posted using geom_line. You need the aesthetic group = 1 because the x-axis data are discrete and each group has only a single observation.
library(ggplot2)
df <- read.table(header = TRUE, text = "
car var val
a kl -14
b km -1
c kn -3
d ko -20
")
ggplot(df, aes(x = var, y = val, group = 1)) +
geom_line(colour = "green")
Given that the x-axis data are discrete, it probably makes more sense to use a geom_bar to get a bar plot.
ggplot(df, aes(x = var, y = val, group = 1)) +
geom_bar(stat = "identity")

Plotting lines and the group aesthetic in ggplot2

This question follows on from an earlier question and its answers.
First some toy data:
df = read.table(text =
"School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", sep = "", header = TRUE)
The original question asked how to plot Value-Year lines for each School. The answers more or less correspond to p1 and p2 below. But also consider p3.
library(ggplot2)
(p1 <- ggplot(data = df, aes(x = Year, y = Value, colour = School)) +
geom_line() + geom_point())
(p2 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +
geom_line(aes(group = School)) + geom_point())
(p3 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +
geom_line() + geom_point())
Both p1 and p2 do the job. The difference between p1 and p2 is that p1 treats Year as numeric whereas p2 treats Year as a factor. Also, p2 contains a group aesthetic in geom_line. But when the group aesthetic is dropped as in p3, the lines are not drawn.
The question is: Why is the group aesthetic necessary when the x-axis variable is a factor but the group aesthetic is not needed when the x-axis variable is numeric?
In the words of Hadley himself:
The important thing [for a line graph with a factor on the horizontal axis] is to manually specify the grouping. By
default ggplot2 uses the combination of all categorical variables in
the plot to group geoms - that doesn't work for this plot because you
get an individual line for each point. Manually specify group = 1
indicates you want a single line connecting all the points.
You can actually group the points in very different ways as demonstrated by koshke here
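The default grouping described in the quote can be inspected directly. This is a sketch using ggplot2's layer_data() helper, which returns the computed data for a layer, including its group column. The toy data from the question is repeated so the block runs on its own, and n_groups is a hypothetical helper name.

```r
library(ggplot2)

df <- read.table(text = "School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

# p1: numeric x, so School is the only discrete variable
p1 <- ggplot(df, aes(x = Year, y = Value, colour = School)) +
  geom_line()
# p2: factor x, with the grouping stated explicitly
p2 <- ggplot(df, aes(x = factor(Year), y = Value, colour = School)) +
  geom_line(aes(group = School))
# p3: factor x, grouping left to the default
p3 <- ggplot(df, aes(x = factor(Year), y = Value, colour = School)) +
  geom_line()

# Count the distinct groups ggplot2 assigns in the first layer
n_groups <- function(p) length(unique(layer_data(p, 1)$group))
groups <- c(p1 = n_groups(p1), p2 = n_groups(p2), p3 = n_groups(p3))
groups
```

With Year as a factor and no explicit group, every School and Year combination becomes its own group, so each "line" has a single point and nothing is drawn.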
