ggplot2 interaction plot error - r

I am trying to create an interaction plot, and R is throwing the error geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?, and I don't understand why. Below is my data frame:
topPagesCount DIRTY_INDUSTRY IND_DIRTY_HETEROGENEITY
1 10 1.4444444 1.1727001
2 831 1.4444444 1.1727001
3 1 0.8218391 0.4599108
4 0 0.8218391 0.4599108
5 0 0.8821549 0.4870270
6 30 0.8190476 0.6582197
7 26 0.8218391 0.4599108
8 0 1.4444444 1.1727001
9 7 0.8821549 0.4870270
10 398 0.8218391 0.4599108
and below is my code:
greatDF$DIRTY_INDUSTRY_fac <- as.factor (greatDF$DIRTY_INDUSTRY)
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac))+
stat_summary(fun.y=mean, geom="point")+
stat_summary(fun.y=mean, geom="line")
I don't see any reason for the error, because there is clearly more than one value of my response variable topPagesCount for the interaction term DIRTY_INDUSTRY:IND_DIRTY_HETEROGENEITY. Am I right? Maybe I am misunderstanding something...
thanks,

The reason this happens, as @Troy points out, is that the grouping gives geom_line() or geom_path() nothing to draw. In the posted data every DIRTY_INDUSTRY_fac group has a single IND_DIRTY_HETEROGENEITY value, so after stat_summary() each group is reduced to one point. There are no points to be connected with lines at all!
That's why everything is OK when you remove the last line. Note that this "error" is not an actual error: the legend is plotted as intended, and there is simply not a single line that should be plotted according to your aesthetics and stats.
How to fix this? Well, that depends on what you are trying to achieve, as usual. Note the difference between your code and mine:
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac)) +
geom_line(size=1.4) +
geom_point(size=5, shape=10) +
stat_summary(fun.y=mean, geom="point", size=5)
Is my guess correct? You may see this question for more insights on the topic.
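To see why no line can be drawn, it may help to count how many distinct x values each group has. The sketch below rebuilds only the ten rows shown in the question (so greatDF here is a stand-in for the full data, which may differ) and uses base R only.

```r
# Stand-in for the asker's data: only the ten rows posted in the question.
greatDF <- data.frame(
  topPagesCount = c(10, 831, 1, 0, 0, 30, 26, 0, 7, 398),
  DIRTY_INDUSTRY = c(1.4444444, 1.4444444, 0.8218391, 0.8218391, 0.8821549,
                     0.8190476, 0.8218391, 1.4444444, 0.8821549, 0.8218391),
  IND_DIRTY_HETEROGENEITY = c(1.1727001, 1.1727001, 0.4599108, 0.4599108,
                              0.4870270, 0.6582197, 0.4599108, 1.1727001,
                              0.4870270, 0.4599108)
)
greatDF$DIRTY_INDUSTRY_fac <- as.factor(greatDF$DIRTY_INDUSTRY)

# Count the distinct x values inside each colour/group level.
x_per_group <- tapply(
  greatDF$IND_DIRTY_HETEROGENEITY,
  greatDF$DIRTY_INDUSTRY_fac,
  function(v) length(unique(v))
)
print(x_per_group)
```

If every count is 1, each group collapses to a single summarised point, and a line geom has nothing to connect within that group.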

Related

ggplot doesn't plot the order of the data.frame

If I have a head(df) like:
feature Comparison Primary diff key
1 work 15.441176 20.588235 5.1470588 1
2 employee 22.794118 19.117647 -3.6764706 2
3 good 11.029412 11.764706 0.7352941 3
4 improve 8.088235 10.294118 2.2058824 4
5 career 2.941176 8.823529 5.8823529 5
6 manager 2.941176 8.823529 5.8823529 6
and I'm trying to plot something with:
p = ggplot(x, aes(x = feature,size=8)) + geom_point(aes(y = Primary)) +
geom_point(aes(y=Comparison)) + coord_flip()
ggplotly(p)
Is there something I'm missing that causes p not to plot the order of the data above? the first five on the plot are
work
train
time
skill
people
But according to the df, it should be work, employee, good, improve, career.
ggplot uses the "levels" of a factor to determine the order in which values appear in the plot. If you run levels(x$feature) in the console, I bet the list you see has the same order as what appears in the plot.
To have them show up in the order you want, you just have to override the "levels" for the feature column.
# list every value of feature: any value left out of levels becomes NA
x$feature = factor(x$feature, levels = c("work",
"employee",
"good",
"improve",
"career",
"manager"))

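When the data frame is already in the desired order, typing the levels by hand can be avoided. The following is a minimal sketch using base R only; the six-row x is a hypothetical stand-in built from the head() output in the question, not the asker's full data.

```r
# Hypothetical stand-in: only the six features shown by head() in the question.
x <- data.frame(
  feature = c("work", "employee", "good", "improve", "career", "manager"),
  stringsAsFactors = TRUE  # mimic a factor column with default (sorted) levels
)
print(levels(x$feature))  # default level order, not the row order

# Re-level in order of first appearance, so the plot follows the row order.
x$feature <- factor(x$feature, levels = unique(as.character(x$feature)))
print(levels(x$feature))
```

Because unique() keeps values in order of first appearance, no value is dropped to NA, which is the risk with a hand-typed list.
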
Changing the xlim of numeric value causing error ggplot R

I have a grouped barplot produced using ggplot in R with the following code
ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "Area",
labels = c("Everywhere", "New York")) +
xlab("Reasons") +
ylab("Proportion of total complaints") +
coord_flip() +
ggtitle("Comparison between NY and all areas")
mTogether is created using the following code
mTogether <- melt(together, id.vars = 'USuniquNegR')
The Data Frame together is made up of
USperReasons USperReasonsNY USuniquNegR
1 0.198343304187759 0.191304347826087 Late Flight
2 0.35987114588127 0.321739130434783 Customer Service Issue
3 0.0667280257708237 0.11304347826087 Lost Luggage
4 0.0547630004601933 0.00869565217391304 Flight Booking Problems
5 0.109065807639208 0.121739130434783 Can't Tell
6 0.00460193281178095 0 Damaged Luggage
7 0.0846755637367694 0.0782608695652174 Cancelled Flight
8 0.0455591348366314 0.0521739130434783 Bad Flight
9 0.0225494707777266 0.0347826086956522 longlines
10 0.0538426138978371 0.0782608695652174 Flight Attendant Complaints
Together can be generated by the following
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
where
USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage","Flight Booking Problems","Can't Tell","Damaged Luggage","Cancelled Flight","Bad Flight","longlines","Flight Attendant Complaints")
The problem is when I try to change the xlim of the ggplot using
+ xlim(0, 1)
I just seem to get an error:
Discrete value supplied to continuous scale
I can't understand why this happens but I need to resolve it because currently the x axis starts below 0 and is very highly packed:
image of ggplot output
The problem is that you are cbind()ing your column vectors together, which converts the numbers to characters. Fix that and the rest should fix itself.
together<-data.frame(USperReasons,USperReasonsNY,USuniquNegR)
You need to remove the cbind from
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
because str(together) shows that all three columns are factors (or character vectors in R 4.0 and later, where stringsAsFactors defaults to FALSE).
With
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
the plot looks reasonable to me (without having to use ylim or xlim).
So, the error was not within ggplot2 but in data preparation.
Therefore, please, provide a full working example which can be copied, pasted and run when asking a question next time. Thank you.
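The coercion described in the answers above can be checked directly in base R. This is a small sketch using the first three values from the question; the names m, bad and good are made up for illustration.

```r
# First three values from the question's vectors.
USperReasons <- c(0.19834, 0.35987, 0.06672)
USuniquNegR  <- c("Late Flight", "Customer Service Issue", "Lost Luggage")

# cbind() builds a matrix, and a matrix can hold only one type,
# so the numbers are coerced to character alongside the labels.
m <- cbind(USperReasons, USuniquNegR)
print(typeof(m))

bad  <- data.frame(m)                          # numbers are no longer numeric
good <- data.frame(USperReasons, USuniquNegR)  # each column keeps its own type

print(is.numeric(bad$USperReasons))
print(is.numeric(good$USperReasons))
```

Running str() on both data frames shows the same difference column by column.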

vertical line chart - change line plotting direction to top-down in R

I am looking for a way to connect data points from top to bottom in order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the normal settings the line connects the points from left to right, so the points are connected in the wrong order.
With the data below, the line should be connected from (6,1) to (4,2), then (5,3), etc. Ideally the ranking scale should also be inverted so that rank one is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched the x- and y-coordinates: geom_line() connects observations in order of the x variable, so mapping rank to x makes the line follow the ranking. The next command (coord_flip()) flips the x- and y-axes, and the last one (scale_x_reverse()) reverses the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.

What is happening with my geom_line() in ggplot2?

I am no expert in R, but I have used ggplot2 many times and never had any problems. Still, this time I am not able to plot lines in my graph and I have no idea why (it should be something really simple though).
For instance for:
def.percent period
1 5.0657339 1984-1985
2 3.9164528 1985-1986
3 -1.756613 1986-1987
4 2.8184863 1987-1988
5 -2.606311 1988-1989
I have the code:
ggplot(plot.tab, aes(x=period, y=def.percent)) + geom_line() + geom_point() + ggtitle("Deforestation rates within Properties")
But when I run it, it just plots the points without a line. It also gives me this message:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
It's not really an error, but I cannot figure out how to plot the lines... Any ideas?
Your x axis (period) is a factor rather than numeric, so ggplot2 treats each period as its own group and doesn't connect them. You can fix this by setting group = 1 in the aesthetics, which tells ggplot2 to group them all together into a single line:
ggplot(plot.tab, aes(x = period, y = def.percent, group = 1)) +
geom_line() +
geom_point() +
ggtitle("Deforestation rates within Properties")

gg plot error bars positioning

I am trying to place an error bar above each bar of a bar plot. I have six bars in three groups, and the error bars are being positioned with respect to each group, but I want each error bar positioned above its own bar. Here's what my data looks like:
NewData
Group Session HeartRate StdError n sd
1 Novices one 71.89276 1.821146 29 9.807170
2 Experts one 66.40705 1.923901 26 9.810008
3 Novices two 71.38609 1.571261 29 8.011889
4 Experts two 67.79910 1.788151 26 9.117818
5 Novices three 71.79759 1.941730 29 10.456534
6 Experts three 67.04564 1.938620 26 9.885061
And here is my code:
plot_2 = ggplot(NewData, aes(x=Session, y=HeartRate, fill=Group)) +
theme_bw() +
geom_bar(position="dodge",stat="identity")+
scale_x_discrete(limits=c("one","two","three")) +
coord_cartesian(ylim = c(50, 80)) +
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position="dodge",width=.25)
Here's the output: http://i.imgur.com/BrLB6Px.png
Any help would be appreciated. Thanks!
OK, I found a solution. I'm not really sure how or why it works, but here's my new code:
dodge <- position_dodge(width=0.9)
plot_2 = ggplot(NewData, aes(x=Session, y=HeartRate, fill=Group)) +
geom_bar(position=dodge, stat="identity")+
scale_x_discrete(limits=c("one","two","three")) +
coord_cartesian(ylim = c(50, 80)) +
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position=dodge,width=.25)
And here's the desired result: http://i.imgur.com/PodCeh5.png
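A likely explanation, offered as a sketch rather than a definitive account: with position="dodge", each layer dodges by its own width, so the error bars (width .25) are spread far less than the bars (default width 0.9). Passing the same position_dodge(width = 0.9) object to both layers makes them dodge by the same amount. The example below retypes NewData from the question and assumes ggplot2 is installed.

```r
library(ggplot2)

# Data retyped from the question (the n and sd columns are not needed here).
NewData <- data.frame(
  Group     = rep(c("Novices", "Experts"), times = 3),
  Session   = rep(c("one", "two", "three"), each = 2),
  HeartRate = c(71.89276, 66.40705, 71.38609, 67.79910, 71.79759, 67.04564),
  StdError  = c(1.821146, 1.923901, 1.571261, 1.788151, 1.941730, 1.938620)
)

# One dodge specification shared by both layers.
dodge <- position_dodge(width = 0.9)

plot_2 <- ggplot(NewData, aes(x = Session, y = HeartRate, fill = Group)) +
  geom_bar(stat = "identity", position = dodge) +
  geom_errorbar(aes(ymin = HeartRate - StdError, ymax = HeartRate + StdError),
                position = dodge, width = 0.25) +
  scale_x_discrete(limits = c("one", "two", "three")) +
  coord_cartesian(ylim = c(50, 80))

# Horizontal centres of the bars and of the error bars after dodging.
bar_x <- sort(as.numeric(layer_data(plot_2, 1)$x))
err_x <- sort(as.numeric(layer_data(plot_2, 2)$x))
print(bar_x)
print(err_x)
```

If the two printed vectors match, every error bar is centred on its own bar.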
It's kind of hard to tell what it is you want to see as output, but from what I gather, perhaps this will do what you want?
geom_errorbar(aes(ymin=HeartRate,ymax=HeartRate+StdError*2),position="dodge",width=.25)
If you want to move the error bars along the x-axis, you need to modify the position argument. It takes a position name or a position object such as position_dodge(), not an arithmetic expression on a column, so you might try something like...
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position=position_dodge(width=0.9),width=.25)
