R: ggplot2 - plots don't look right - vertical instead of diagonal lines - r

I'd like to create a plot for two categorical variables, so I created two dummy data sets. Although they contain the same items, the plots look totally different. The same happens with my real data.
I also tried it with ordered rows; same result.
Please see my code below, the three plots (plot 2 is what I want) and a plot of my real data (anonymized) that shows the problem. I don't understand why there are those vertical lines.
Thank you in advance
library(ggplot2)
library(dplyr)
dat1 <- data.frame(
sex = factor(c("Male","Female","Male","Female")), levels=c("Female","Male"),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(17.42, 16.81, 16.24, 13.53)
)
dat1
#plot1: shows vertical lines although it should look like plot 2
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1")
#different approach for plot1: sort the rows first
dat1 <- arrange(dat1, sex, time)
dat1
#rows are now ordered like I wanted them to be
#still looks like plot1
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1 ordered")
dat2 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
dat2
#plot2: looks the way I'd like it to
ggplot(data=dat2, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 2")
Here is also an outline of the data plot that has those vertical lines in it:
library(ggplot2)
library(dplyr)
mydata2
# ####Output (fictitious data, but the same problem as with my real data: there are still vertical lines, but there should only be 2 lines like in plot 2)
# group NM score
# 1 KG S 2537.94514
# 2 EG S 1766.39019
# 3 KG S 3970.91802
# 4 KG S 4089.14939
# 5 KG S 2795.42964
# 6 EG S 2286.60411
# 7 KG S 4027.22993
# 8 KG S 1030.18328
# 9 EG S 719.73679
# 10 EG S 724.93663
# 11 EG S 2929.03717
# 12 EG S 521.55736
# 13 KG S 1435.85625
# 14 EG S 1496.39471
# 15 EG S 3521.25827
# 16 KG S 2138.17928
# 17 EG S 1233.86267
# 18 KG S 591.33086
# 19 EG S 2171.97341
# 20 EG S 3871.92536
# 21 EG S 468.10133
# 22 KG S 2419.67419
# 23 KG S 1338.29305
# 24 KG S 1629.33862
# 25 EG S 560.39680
# 26 EG S 546.22468
# 27 KG S 3398.94647
# 28 KG S 1117.72716
# 29 EG S 2794.90527
# 30 EG S 3606.77693
# 31 KG S 3558.67156
# 32 KG S 196.64992
# 33 EG S 2174.69930
# 34 EG S 3444.10732
# 35 KG S 670.60907
# 36 EG S 3719.20997
# 37 KG S 65.76227
# 38 EG S 3420.12225
# 39 KG S 1405.83738
# 40 KG S 2859.33873
# 41 EG T 1296.75111
# 42 EG T 436.53580
# 43 KG T 213.09334
# 44 EG T 2073.70465
# 45 KG T 1679.98816
# 46 EG T 1599.26738
# 47 EG T 777.65179
# 48 EG T 1738.45395
# 49 KG T 3269.54120
# 50 EG T 3506.07302
# 51 EG T 1764.61915
# 52 EG T 493.47846
# 53 KG T 1729.02949
# 54 EG T 1454.57702
# 55 EG T 2577.32018
# 56 EG T 295.08653
# 57 EG T 3811.24064
# 58 KG T 2320.35879
# 59 EG T 1285.65291
# 60 KG T 3600.26095
# 61 EG T 3738.89452
# 62 KG T 3472.53512
# 63 KG T 1203.33462
# 64 EG T 1809.41229
# 65 EG T 3536.17972
# 66 EG T 2637.59869
# 67 KG T 1279.44567
# 68 KG T 1141.81247
# 69 KG T 3951.54206
# 70 KG T 1940.11505
# 71 KG T 192.74602
# 72 KG T 1235.81839
# 73 EG T 1907.09384
# 74 KG T 1772.86806
# 75 KG T 997.92437
# 76 KG T 217.81433
# 77 KG T 3595.69359
# 78 EG T 910.07955
# ####End of output
ggplot(data=mydata2, aes(x=group, y=score, group=NM, shape=NM, colour=NM)) +
geom_line(aes(linetype=NM), size=1) + # Set linetype by NM
geom_point(size=3, fill="white") + # Use larger points, fill with white
expand_limits(y=0) + # Set y range to include 0
scale_colour_hue(name="Sex of participant", # Set legend title
l=30) + # Use darker colors (lightness=30)
scale_shape_manual(name="Sex of participant",
values=c(22,21)) + # Use points with a fill color
scale_linetype_discrete(name="Sex of participant") +
xlab("Group") + ylab("Score") + # Set axis labels
ggtitle("Data") + # Set title
theme_bw() +
theme(legend.position=c(.7, .4)) # Position legend inside; this must go after theme_bw
[Images: Plot 1, Plot 1 ordered, Plot 2, and the data plot that looks wrong]

The dat2 that makes the plot you want looks like this:
> dat2
sex time total_bill
1 Female Lunch 13.53 # female has lunch and dinner
2 Female Dinner 16.81
3 Male Lunch 16.24 # male has lunch and dinner
4 Male Dinner 17.42
However, your dat1 looks like this before and after the arrange:
before
sex levels time total_bill
1 Male Female Lunch 17.42
2 Female Male Dinner 16.81 # female only has dinner
3 Male Female Lunch 16.24 # male only has lunch
4 Female Male Dinner 13.53
after
sex levels time total_bill
1 Female Male Dinner 16.81 # female only has dinner
2 Female Male Dinner 13.53
3 Male Female Lunch 17.42 # male only has lunch
4 Male Female Lunch 16.24
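In both versions, females only have dinner and males only have lunch. The levels column has the same problem (it exists as a separate column because the closing parenthesis of factor() comes before levels=c("Female","Male") instead of after it).
So with group=time, each line connects the two dinner points (both at Female) or the two lunch points (both at Male), which gives vertical lines, rather than a lunch line and a dinner line that each run from Female to Male.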
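One way to catch this before plotting (a suggestion, not part of the original answer) is to cross-tabulate the x variable against the grouping variable: for lines like plot 2, every time value needs a point at both sexes. The sketch below rebuilds dat1 with levels= moved inside factor(), which does not change the sex/time pairing, and compares it with dat2.

```r
# Rebuild the two example data sets from the question
dat1 <- data.frame(
  sex = factor(c("Male", "Female", "Male", "Female"), levels = c("Female", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(17.42, 16.81, 16.24, 13.53)
)
dat2 <- data.frame(
  sex = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Rows = x positions (sex), columns = line groups (time).
# A 0 means that line has no point at that x position.
tab1 <- table(dat1$sex, dat1$time)
tab2 <- table(dat2$sex, dat2$time)
tab1
tab2
```

In tab1, Lunch is 0 for Female and Dinner is 0 for Male, so each line can only run vertically; in tab2 every cell is 1, so each line gets one point per sex.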
For your updated question, I ran your code with the columns named like this:
# group NM sex score
# 1 KG S 2537.945
# 2 EG S 1766.390
# 3 KG S 3970.918
and got this. What is the issue?
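If the goal for the real data is two lines as in plot 2, the same rule applies: each NM line needs exactly one point per group value. With many raw scores per group/NM pair, geom_line() joins all the scores stacked above each x position, which is where the vertical lines come from. Assuming a group mean is the summary you want (an assumption; a median would work the same way), one sketch is to aggregate first with dplyr. The six rows below are copied from the mydata2 output in the question, so the resulting numbers are only illustrative.

```r
library(dplyr)
library(ggplot2)

# A few rows copied from the mydata2 output above
mydata2 <- data.frame(
  group = c("KG", "EG", "KG", "EG", "KG", "EG"),
  NM = c("S", "S", "S", "T", "T", "T"),
  score = c(2537.94514, 1766.39019, 3970.91802, 1296.75111, 213.09334, 436.53580)
)

# Collapse to one row per group/NM combination (the mean score)
mydata2_means <- mydata2 %>%
  group_by(group, NM) %>%
  summarise(score = mean(score), .groups = "drop")

# Each NM line now has exactly one point at KG and one at EG
p <- ggplot(mydata2_means, aes(x = group, y = score, group = NM, colour = NM)) +
  geom_line() +
  geom_point() +
  ggtitle("Mean score per group")
```

The rest of the styling from the question (scale_shape_manual(), theme_bw(), and so on) can be added to p unchanged.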

Related

Connect the red points with a line in ggplot

Please help me.
I made a plot comprising some red and blue points using ggplot.
Now I want to connect the red points to each other with a line, and the blue points to each other with another line.
This is my code:
m <- as.factor(c(7,"12 PCA", 21, "24 PCA", "31 PCA", 38, 70))
## Then we plot the points
ggplot(pH, aes(x= m, y=All))+ ylim(60,100)+
scale_x_discrete(limits=c(7,"12 PCA", 21, "24 PCA", "31 PCA", 38, 70))+
geom_point(data=pH, aes(y=All), colour = 'red', size =1)+
geom_point(data=pH, aes(y=Test), colour = 'blue', size=1)
And this is my plot
How can I do that?
Thanks
I think it's generally best not to work with independent vectors of data when possible, and instead to place them in a single frame. In this case, one column indicates which "group" each point belongs to.
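# c() on two factors returns a factor only in R >= 4.1.0; in older R use c(as.character(m), as.character(m))
dat <- data.frame(m=c(m,m), All=c(94,95,96,95,94,95,96, 74,67,74,67,68,73,74), grp=c(rep("red",7), rep("blue",7)))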
dat
# m All grp
# 1 7 94 red
# 2 12 PCA 95 red
# 3 21 96 red
# 4 24 PCA 95 red
# 5 31 PCA 94 red
# 6 38 95 red
# 7 70 96 red
# 8 7 74 blue
# 9 12 PCA 67 blue
# 10 21 74 blue
# 11 24 PCA 67 blue
# 12 31 PCA 68 blue
# 13 38 73 blue
# 14 70 74 blue
Plot code:
library(ggplot2)
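ggplot(dat, aes(m, All, group=grp, color=grp)) +
geom_point() +
geom_line() +
scale_x_discrete(limits=c("7", "12 PCA", "21", "24 PCA", "31 PCA", "38", "70")) + # keep the x order from the question
scale_color_manual(values = c(blue = "blue", red = "red"))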
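If pH already has All and Test as two columns (as the question's code suggests), one option is to reshape it into this long format with tidyr::pivot_longer() instead of building the frame by hand. In this sketch pH is a hypothetical stand-in that borrows the values from the example above, and the factor levels are set explicitly so the x axis keeps the order from the question.

```r
library(ggplot2)
library(tidyr)

x_order <- c("7", "12 PCA", "21", "24 PCA", "31 PCA", "38", "70")

# Hypothetical stand-in for the question's pH data frame
pH <- data.frame(
  m = factor(x_order, levels = x_order),
  All = c(94, 95, 96, 95, 94, 95, 96),
  Test = c(74, 67, 74, 67, 68, 73, 74)
)

# Wide -> long: one row per (m, series) pair; "series" becomes the grouping column
pH_long <- pivot_longer(pH, cols = c(All, Test), names_to = "series", values_to = "value")

p <- ggplot(pH_long, aes(x = m, y = value, group = series, colour = series)) +
  geom_point(size = 1) +
  geom_line() +
  scale_colour_manual(values = c(All = "red", Test = "blue")) +
  ylim(60, 100)
```

Because the colour is mapped to a column rather than hard-coded in each geom_point(), geom_line() knows which points belong together.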

Making multi-line plots in R using ggplot2

I would like to compile some data into a ggplot() line plot of different colors.
It's rainfall in various places over 100 days, and the data is quite different between locations which is giving me fits.
I've tried using different suggestions from this forum and they don't seem to be working well for this data. Sample data:
Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8
My code so far is:
ggplot(Rain) +
geom_line(aes(x=Time,y=Location1,col="red")) +
geom_line(aes(x=Time,y=Location2,col="blue")) +
geom_line(aes(x=Time,y=Location3,col="green")) +
scale_color_manual(labels = c("Location 1","Location 2","Location 3"),
values = c("red","blue","green")) +
xlab("Time (Days)") + ylab("Rainfall (Inches)") + labs(color="Locations") +
ggtitle("Rainfall Over 100 Days In Three Locations")
So far it gives me everything I want, but for some reason the colors are wrong when I plot it, i.e. it plots location 1 in green although I told it red in my first geom_line.
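library(tidyr)
library(ggplot2)
# Reshape wide -> long: one row per Time/Place pair, so each location becomes one group and one colour
df_long <- gather(data = df1, Place, Rain, -Time)
ggplot(df_long) +
geom_line(aes(x=Time, y=Rain, color=Place))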
Data:
df1 <- read.table(text="Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8",
header=TRUE, stringsAsFactors=FALSE)
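As for why the original colours come out wrong: aes(col="red") does not set a colour, it maps every point of that line to the category label "red". scale_color_manual() then assigns its unnamed values to the labels in sorted order ("blue", "green", "red"), so the line labelled "red" (Location1) gets the third value, green. If you prefer to keep the wide format, a sketch of one fix is to map each line to a descriptive label and give scale_colour_manual() a named vector:

```r
library(ggplot2)

df1 <- read.table(text = "Time Location1 Location2 Location3
0 48 99.2966479761526 2
1 51 98.7287820735946 4
2 58 98.4803262236528 4.82842712474619
3 43 97.8941490454599 5.46410161513775
4 47 96.6091435402632 6
5 47 95.207282404881 6.47213595499958
6 41 94.8696538619697 6.89897948556636
7 34 94.6514389757067 7.29150262212918
8 40 93.7297335476615 7.65685424949238
9 57 93.2440731907263 8", header = TRUE)

# Map each line to a label, then pick the colour for each label by name
p <- ggplot(df1, aes(x = Time)) +
  geom_line(aes(y = Location1, colour = "Location 1")) +
  geom_line(aes(y = Location2, colour = "Location 2")) +
  geom_line(aes(y = Location3, colour = "Location 3")) +
  scale_colour_manual(values = c("Location 1" = "red",
                                 "Location 2" = "blue",
                                 "Location 3" = "green")) +
  labs(x = "Time (Days)", y = "Rainfall (Inches)", colour = "Locations",
       title = "Rainfall Over 100 Days In Three Locations")
```

With named values the order of the labels no longer matters, and the legend text comes from the labels themselves, so the separate labels= argument is not needed.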

how to add regression lines for each factor on a plot

I've created a model and I'm trying to add curves that fit the two parts of the data, insulation and no insulation. I was thinking about using the insulation coefficient as a true/false term, but I'm not sure how to translate that into code. Entries 1:56 are "w/o" and 57:101 are "w/". I'm not sure how to include the data I'm using, but here are the head and tail:
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
1 8 2003 476 21 a 33.32 69 -8 22.66667 1 w/o
2 9 2003 1052 30 e 112.33 73 -1 35.05172 2 w/o
3 10 2003 981 28 a 24.98 60 -6 35.05172 3 w/o
4 11 2003 1094 32 a 73.51 53 2 34.18750 4 w/o
5 12 2003 1409 32 a 93.23 44 6 44.03125 5 w/o
6 1 2004 1083 32 a 72.84 34 3 33.84375 6 w/o
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
96 7 2011 551 29 e 55.56 72 0 19.00000 96 w/
97 8 2011 552 27 a 61.17 78 1 20.44444 97 w/
98 9 2011 666 34 e 73.87 71 -2 19.58824 98 w/
99 10 2011 416 27 a 48.03 64 0 15.40741 99 w/
100 11 2011 653 31 e 72.80 53 1 21.06452 100 w/
101 12 2011 751 33 a 83.94 45 2 22.75758 101 w/
bill$id <- seq(1:101)
bill$insulation <- as.factor(ifelse(bill$id > 56, c("w/"), c("w/o")))
m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data=bill)
with(bill, plot(kWhd.1 ~ avgT, xlab="Average Temperature (F)",
ylab="Daily Energy Use (kWh/d)", col=insulation))
no_ins <- data.frame(bill$avgT[1:56], bill$insulation[1:56])
curve(predict(m1, no_ins=x), add=TRUE, col="red")
ins <- data.frame(bill$avgT[57:101], bill$insulation[57:101])
curve(predict(m1, ins=x), add=TRUE, lty=2)
legend("topright", inset=0.01, pch=21, col=c("red", "black"),
legend=c("No Insulation", "Insulation"))
ggplot2 makes this a lot easier than base plotting. Something like this should work:
ggplot(bill, aes(x = avgT, y = kWhd.1, color = insulation)) +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
geom_point()
In base R, I'd create a data frame with the points you want to predict at, something like
pred_data = expand.grid(
avgT = seq(min(bill$avgT), max(bill$avgT), length.out = 100),
insulation = c("w/", "w/o")
)
pred_data$prediction = predict(m1, newdata = pred_data)
And then use lines() to add the predictions to your plot. My base graphics are pretty rusty, so I'll leave that to you (or another answerer) if you want it.
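A sketch of that last lines() step follows. Since bill isn't available, it uses the built-in mtcars data as a stand-in, with horsepower in the role of avgT and the transmission type (am) in the role of insulation; the model has the same shape as m1. The key points are that the prediction grid varies the predictor (not the response) and that each factor level is drawn as its own line over sorted x values.

```r
# Stand-in for bill: mtcars, with transmission type as the two-level factor
mt <- mtcars
mt$am_f <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Same model shape as m1: quadratic in the predictor plus a factor shift
fit <- lm(mpg ~ hp + am_f + I(hp^2), data = mt)

# Prediction grid: a sorted sequence of the predictor, crossed with each factor level
pred_data <- expand.grid(
  hp = seq(min(mt$hp), max(mt$hp), length.out = 100),
  am_f = levels(mt$am_f)
)
pred_data$prediction <- predict(fit, newdata = pred_data)

# Points coloured by factor level (palette colours 1 and 2)
plot(mpg ~ hp, data = mt, col = as.integer(mt$am_f),
     xlab = "Horsepower", ylab = "Miles per gallon")

# One fitted curve per level, in the same colour as its points
for (i in seq_along(levels(mt$am_f))) {
  sub <- pred_data[pred_data$am_f == levels(mt$am_f)[i], ]
  lines(sub$hp, sub$prediction, col = i, lty = i)
}
legend("topright", inset = 0.01, legend = levels(mt$am_f),
       col = 1:2, lty = 1:2, pch = 1)
```

Because m1 has no interaction term, the two curves are vertically shifted copies of each other; the geom_smooth() version instead fits a separate quadratic for each group.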
In base R, it's important to order the x-values. Since this has to be done for each factor level, we can do it with by(), resulting in a list L.
Since your example data is not complete, here's an example with iris where we consider Species as the "factor".
L <- by(iris, iris$Species, function(x) x[order(x$Petal.Length), ])
Now we can do the plot and add loess predictions as lines with a sapply.
with(iris, plot(Sepal.Width ~ Petal.Length, col=Species))
sapply(seq(L), function(x)
lines(L[[x]]$Petal.Length,
predict(loess(Sepal.Width ~ Petal.Length, L[[x]], span=1.1)), # span=1.1 for smoothing
col=x))

R ggplot ordering bars within groups

I'm attempting to format a grouped bar plot in R with ggplot such that bars are in decreasing order per group. This is my current plot:
based on this data frame:
> top_categories
Category Count Community
1 Singer-Songwriters 151 1
2 Adult Alternative 147 1
3 Dance Pop 95 1
4 Folk 89 1
5 Adult Contemporary 88 1
6 Pop Rap 473 2
7 Gangsta & Hardcore 413 2
8 Soul 175 2
9 East Coast 170 2
10 West Coast 135 2
11 Album-Oriented Rock (AOR) 253 3
12 Singer-Songwriters 217 3
13 Soft Rock 196 3
14 Folk 145 3
15 Adult Contemporary 106 3
16 Soul 278 4
17 Blues 137 4
18 Funk 119 4
19 Quiet Storm 76 4
20 Dance Pop 74 4
21 Indie & Lo-Fi 235 5
22 Indie Rock 234 5
23 Adult Alternative 114 5
24 Alternative Rock 49 5
25 Singer-Songwriters 47 5
created with this code:
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Category,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill="none")
Based on suggestions from related posts, I've attempted to use the reorder function and turn the Count into a factor, both with results that seem to break the ordering of the bars vs. the text or rescale the plot in a nonsensical way such as this (with factors):
Any tips on how I might accomplish this in-group ordering? Thanks!
When you group by Category, the bars are ordered according to the order of appearance of the categories in the data frame. This works fine for communities 1 and 2, as your rows are already ordered by decreasing Count. But in community 3, "Singer-Songwriters" is the first category occurring in the data frame, so it is put first.
Grouping instead by an Id variable resolves the problem:
top_categories$Id <- rep(1:5, 5) # position within each community; assumes rows are already sorted by decreasing Count
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Id,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill="none")
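The Id above relies on the rows already being sorted within each community. A sketch of a variant that does not depend on row order computes the rank inside each community with dplyr (rank(-Count) gives 1 to the largest count; ties.method = "first" breaks ties by row order). It also uses geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

top_categories <- data.frame(
  Category = c("Singer-Songwriters", "Adult Alternative", "Dance Pop", "Folk", "Adult Contemporary",
               "Pop Rap", "Gangsta & Hardcore", "Soul", "East Coast", "West Coast",
               "Album-Oriented Rock (AOR)", "Singer-Songwriters", "Soft Rock", "Folk", "Adult Contemporary",
               "Soul", "Blues", "Funk", "Quiet Storm", "Dance Pop",
               "Indie & Lo-Fi", "Indie Rock", "Adult Alternative", "Alternative Rock", "Singer-Songwriters"),
  Count = c(151, 147, 95, 89, 88, 473, 413, 175, 170, 135, 253, 217, 196, 145, 106,
            278, 137, 119, 76, 74, 235, 234, 114, 49, 47),
  Community = rep(1:5, each = 5)
)

# Rank bars within each community by decreasing Count, independent of row order
ranked <- top_categories %>%
  group_by(Community) %>%
  mutate(Id = rank(-Count, ties.method = "first")) %>%
  ungroup()

p <- ggplot(ranked, aes(x = Community, y = Count, group = Id, label = Category)) +
  geom_col(colour = "black", fill = "#9C27B0", position = "dodge") +
  geom_text(angle = 90, position = position_dodge(width = 0.9), hjust = -0.05) +
  ggtitle("Number of Products in each Category in Each Community")
```

With this data the computed Id matches rep(1:5, 5), but it would still be correct if the rows were shuffled.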

ggplot create map with arrows

I have a data frame like this
id lon lat
1 A -69.5 -58.5
2 A -69.5 -58.5
3 A -69.5 -57.5
4 A -68.5 -57.5
5 A -68.5 -57.5
6 A -68.5 -57.5
7 A -66.5 -57.5
8 A -68.5 -56.5
9 A -68.5 -56.5
10 A -67.5 -56.5
11 A -65.5 -56.5
12 A -65.5 -56.5
13 A -65.5 -55.5
14 A -62.5 -54.5
15 B -177 -52.5
16 B -178 -50.5
17 B -179 -48.5
18 B 179 -47.5
19 B 178 -46.5
20 B 177 -46.5
and I want to produce a map of the positions of A and B, linked by directed lines. However, when ids cross the Pacific (lon=-180 -> lon=+180), I get an arrow crossing the whole figure, as shown below.
This is the code I am using
worldmap = map_data("world")
ggplot(test, aes(x = lon, y=lat, colour = factor(id))) +
geom_polygon(data=worldmap,center=180,aes(x=long, y=lat, group=group), fill="black",colour="black") +
xlab("") +ylab("")+theme(axis.text=element_blank(),axis.ticks=element_blank())+ theme(panel.background = element_rect(fill = 'white', colour = 'black') ,panel.grid.major = element_blank(),panel.grid.minor = element_blank())+
geom_path(size =2,arrow = arrow(angle=30,length = unit(0.6, "inches")))
How can I fix it?
Thanks
I guess that depends on what you think the "right" thing to do is. I decided to break up the paths that cross the globe into two segments by adding points at the edge of the map, and then creating a "sequence" indicator so ggplot knows which points to connect. Here's the transformation for your sample data:
test2 <- do.call(rbind, lapply(split(test, test$id), function(x) {
cp <- cumsum(c(FALSE, abs(diff(x$lon)) > 250)) # a jump of more than 250 degrees means the track crossed the date line
xx<-split(x, cp)
xx<-Map(cbind, xx, seq=seq_along(xx))
Reduce(function(a,b) {
lasta<-a[nrow(a),]
firstb<-b[1,]
lasta$lon <- 180*sign(lasta$lon)
firstb$lon <- 180*sign(firstb$lon)
lasta$lat <- mean(c(lasta$lat, firstb$lat)) # latitude halfway between the two points
firstb$lat <- lasta$lat
rbind(a,lasta, firstb,b)
}, xx)
}))
tail(test2)
# id lon lat seq
# B.17 B -179 -48.5 1
# B.171 B -180 -48.0 1
# B.18 B 180 -48.0 2
# B.181 B 179 -47.5 2
# B.19 B 178 -46.5 2
# B.20 B 177 -46.5 2
Here you can see that we've broken the B line up into two sequences. Then, if we use a group aesthetic
geom_path(aes(group=interaction(id, seq)), ...)
then R will only connect points that are in the same id/seq group. This prevents the line from running across the whole map. However, because we are drawing two lines for that id rather than one, there's no way to turn off the arrowhead for just one of the segments. You might want to find another way to indicate start/end.
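Putting the pieces together, a runnable sketch of the final plot might look like this. It wraps the transformation in a helper called split_at_dateline (a hypothetical name), compares longitudes with abs(diff()) so crossings in either direction are caught, and averages the two latitudes with mean(c(...)). It leaves out the world map layer so it runs without the maps package, and uses linewidth, which assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

test <- data.frame(
  id = rep(c("A", "B"), c(14, 6)),
  lon = c(-69.5, -69.5, -69.5, -68.5, -68.5, -68.5, -66.5, -68.5, -68.5, -67.5,
          -65.5, -65.5, -65.5, -62.5, -177, -178, -179, 179, 178, 177),
  lat = c(-58.5, -58.5, -57.5, -57.5, -57.5, -57.5, -57.5, -56.5, -56.5, -56.5,
          -56.5, -56.5, -55.5, -54.5, -52.5, -50.5, -48.5, -47.5, -46.5, -46.5)
)

# Hypothetical helper: split one track wherever it jumps across the date line,
# and add an edge point (+/-180) at the end/start of each new piece
split_at_dateline <- function(x) {
  cp <- cumsum(c(FALSE, abs(diff(x$lon)) > 250))
  xx <- Map(cbind, split(x, cp), seq = seq_along(unique(cp)))
  Reduce(function(a, b) {
    lasta <- a[nrow(a), ]
    firstb <- b[1, ]
    lasta$lon <- 180 * sign(lasta$lon)
    firstb$lon <- 180 * sign(firstb$lon)
    lasta$lat <- mean(c(lasta$lat, firstb$lat))
    firstb$lat <- lasta$lat
    rbind(a, lasta, firstb, b)
  }, xx)
}

test2 <- do.call(rbind, lapply(split(test, test$id), split_at_dateline))

# group = interaction(id, seq): points are only joined within one id/seq piece
p <- ggplot(test2, aes(x = lon, y = lat, colour = factor(id))) +
  geom_path(aes(group = interaction(id, seq)), linewidth = 1,
            arrow = arrow(angle = 30, length = unit(0.2, "inches"))) +
  xlab("") + ylab("")
```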
