ggplot2 geom_area overlay area plots in front of each other - r

I am trying to make an area plot in which the different areas are overlaid on one another rather than stacked.
I have a dataframe that looks like this:
r variable value
1 45.0 Cat 1 4.057250e+03
2 52.5 Cat 1 3.537323e+03
3 56.1 Cat 1 3.429861e+03
4 57.3 Cat 1 3.395330e+03
5 57.6 Cat 1 3.389983e+03
6 45.0 Cat 2 4.545455e-03
7 52.5 Cat 2 4.509400e+01
8 56.1 Cat 2 3.525753e+02
9 57.3 Cat 2 4.185094e+02
10 57.6 Cat 2 4.336622e+02
11 45.0 Cat 3 4.074432e+03
12 52.5 Cat 3 3.630504e+03
13 56.1 Cat 3 3.919076e+03
14 57.3 Cat 3 3.957039e+03
15 57.6 Cat 3 3.970083e+03
16 45.0 Cat 4 1.718182e+01
17 52.5 Cat 4 9.318133e+01
18 56.1 Cat 4 4.892154e+02
19 57.3 Cat 4 5.617087e+02
20 57.6 Cat 4 5.801001e+02
I am trying to get area plots for each category. My code for that is:
p <- ggplot(reshaped_data, aes(r, value))
p <- p + labs(x = "X Axis", y = "Y Axis") + ggtitle(title)
p <- p + geom_area(aes(colour = variable, fill= variable), position = 'stack')
p
And the result I am getting looks like this:
How can I make it so that the area graphs aren't stacked on each other, but the smallest are overlaid in front of the bigger ones?
Thanks

Using tidyverse:
library(forcats)
p + geom_area(aes(colour = variable,
fill= fct_reorder(variable, value, .desc = TRUE)), position = 'identity')
Remove .desc = TRUE if it does the opposite of what you want.

As Nathan wrote, you have to use geom_area(position = "identity", ...)
But before that, you should reorder the levels of variable so that the largest category comes first:
df$variable <- factor(df$variable, unique(df[order(df$value, decreasing = TRUE), "variable"]))
or
df$variable <- reorder(df$variable, df$value, function(x) -max(x) )
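To see what the reordering does, here is a minimal sketch on a subset of the question's data (two r values per category). The data frame name df follows the answer above; the plotting call is guarded so the snippet still runs where ggplot2 is not installed.

```r
# Subset of the question's data: two r values per category
df <- data.frame(
  r        = rep(c(45.0, 57.6), times = 4),
  variable = rep(c("Cat 1", "Cat 2", "Cat 3", "Cat 4"), each = 2),
  value    = c(4057.250, 3389.983, 0.004545455, 433.6622,
               4074.432, 3970.083, 17.18182, 580.1001)
)

# Order the levels by each category's peak value, largest first
df$variable <- reorder(df$variable, df$value, function(x) -max(x))
print(levels(df$variable))

# Optional plot; skipped when ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(r, value, fill = variable)) +
    geom_area(position = "identity")
}
```

The levels come out as Cat 3, Cat 1, Cat 4, Cat 2. With position = "identity" the areas should be drawn in that order, so the smaller categories end up in front of the larger ones.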

Related

time series aesthetics with ggplot2

Hello, I have tried to graph the following time series:
fecha importaciones
1 Ene\n1994 171.0
2 Feb\n1994 170.7
3 Mar\n1994 183.7
4 Abr\n1994 214.6
5 May\n1994 227.2
6 Jun\n1994 221.1
7 Jul\n1994 216.4
8 Ago\n1994 235.3
9 Sep\n1994 227.0
10 Oct\n1994 216.0
11 Nov\n1994 221.5
12 Dic\n1994 270.9
13 Ene\n1995 250.4
14 Feb\n1995 259.6
15 Mar\n1995 258.2
16 Abr\n1995 232.9
17 May\n1995 335.0
18 Jun\n1995 295.2
19 Jul\n1995 302.5
20 Ago\n1995 283.3
21 Sep\n1995 264.4
22 Oct\n1995 277.6
23 Nov\n1995 289.1
24 Dic\n1995 280.5
25 Ene\n1996 252.4
26 Feb\n1996 250.1
.
.
.
320 Ago\n2020 794.6
321 Sep\n2020 938.2
322 Oct\n2020 966.3
323 Nov\n2020 958.9
324 Dic\n2020 1059.2
325 Ene\n2021 1056.2
326 Feb\n2021 982.5
I graphed it with Office Calc, but I am now trying to plot it in R with ggplot:
ggplot(datos, aes(x = fecha, y = importaciones)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
I have tried everything I could think of, but the plot does not come out correctly. Could someone guide me?
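Convert the x-axis variable to the Date class.
library(ggplot2)
# fecha holds Spanish month abbreviations, so parsing may depend on a Spanish LC_TIME locale
datos$fecha <- lubridate::dmy(paste0(1, datos$fecha))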
ggplot(datos, aes(x = fecha, y = importaciones, group = 1)) +
geom_line(size = 1) +
scale_color_manual(values=c("#00AFBB", "#E7B800"))+
theme_minimal()
You can use scale_x_date to change the breaks and display format of dates on x-axis.
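If the Spanish month abbreviations do not parse under the current locale, one locale-independent option is to map them to month numbers by hand. The sketch below is base R only; the lookup table meses and the helper parse_fecha are hypothetical names introduced for illustration, and the helper assumes every value starts with a three-letter month and ends with a four-digit year, as in the question's fecha column.

```r
# Spanish month abbreviations as they appear in the question's data
meses <- c(Ene = 1L, Feb = 2L, Mar = 3L, Abr = 4L, May = 5L, Jun = 6L,
           Jul = 7L, Ago = 8L, Sep = 9L, Oct = 10L, Nov = 11L, Dic = 12L)

# Hypothetical helper: "Ene\n1994" -> Date for the first day of that month
parse_fecha <- function(x) {
  mes  <- substr(x, 1, 3)                     # first three characters: month
  anio <- substr(x, nchar(x) - 3, nchar(x))   # last four characters: year
  as.Date(sprintf("%s-%02d-01", anio, unname(meses[mes])))
}

fechas <- parse_fecha(c("Ene\n1994", "Dic\n1994", "Ago\n2020"))
print(fechas)
```

Once the column is a Date, something like scale_x_date(date_breaks = "2 years", date_labels = "%Y") can be added to the plot to control the breaks and label format.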

Remove link between time series and add minor date tick on x_axis in ggplot

I am trying to plot a time series composed of weekly averages. Here is the plot that I have obtained:
[weekly averages A]
[1]: https://i.stack.imgur.com/XMGMs.png
As you can see, the time series does not cover every year completely, so where there is no data ggplot connects two subsequent years. I think I have to group the data in some way, but I do not understand how. Here is the code:
df4 <- data.frame(df$Date, df$A)
colnames(df4)<- c("date","A")
df4$date <- as.Date(df4$date,"%Y/%m/%d")
df4$week_day <- as.numeric(format(df4$date, format='%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
group_by(endofweek) %>%
summarise_all(list(mean=mean), na.rm=TRUE) %>%
na.omit()
g1 = ggplot() +
geom_step(data=week_aveA, aes(group = 1, x = (endofweek), y = (A_mean)), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="year", labels=date_format("%Y")) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
Here is an extract (the first years) of the dataset:
endofweek date_mean A_mean week_day_mean
1 20/03/2010 17/03/2010 939,2533437 3
2 27/03/2010 24/03/2010 867,3620121 3
3 03/04/2010 31/03/2010 1426,791222 3
4 10/04/2010 07/04/2010 358,5698314 3
5 17/04/2010 13/04/2010 301,1815352 2
6 24/04/2010 21/04/2010 273,4922895 3,333333333
7 01/05/2010 28/04/2010 128,5989633 3
8 08/05/2010 05/05/2010 447,8858881 3
9 15/05/2010 12/05/2010 387,9828891 3
10 22/05/2010 19/05/2010 138,0770986 3
11 29/05/2010 26/05/2010 370,2147933 3
12 05/06/2010 02/06/2010 139,0451791 3
13 12/06/2010 09/06/2010 217,1286356 3
14 19/06/2010 16/06/2010 72,36972411 3
15 26/06/2010 23/06/2010 282,2911902 3
16 03/07/2010 30/06/2010 324,3215936 3
17 10/07/2010 07/07/2010 210,568691 3
18 17/07/2010 14/07/2010 91,76930829 3
19 24/07/2010 21/07/2010 36,4211218 3,666666667
20 31/07/2010 28/07/2010 37,53981103 3
21 07/08/2010 04/08/2010 91,33282642 3
22 14/08/2010 11/08/2010 28,38587352 3
23 21/08/2010 18/08/2010 58,72836406 3
24 28/08/2010 24/08/2010 102,1050612 2,5
25 04/09/2010 02/09/2010 13,45357513 4,5
26 11/09/2010 08/09/2010 51,24017212 3
27 18/09/2010 15/09/2010 159,7395663 3
28 25/09/2010 21/09/2010 62,71136678 2
29 02/04/2011 31/03/2011 1484,661164 4
30 09/04/2011 06/04/2011 656,1827964 3
31 16/04/2011 13/04/2011 315,3097313 3
32 23/04/2011 20/04/2011 293,2904042 3
33 30/04/2011 26/04/2011 255,7517519 2,4
34 07/05/2011 04/05/2011 360,7035289 3
35 14/05/2011 11/05/2011 342,0902797 3
36 21/05/2011 18/05/2011 386,1380421 3
37 28/05/2011 24/05/2011 418,9624807 2,833333333
38 04/06/2011 01/06/2011 112,7568 3
39 11/06/2011 08/06/2011 85,17855619 3,2
40 18/06/2011 15/06/2011 351,8714638 3
41 25/06/2011 22/06/2011 139,7936898 3
42 02/07/2011 29/06/2011 68,57716191 3,6
43 09/07/2011 06/07/2011 62,31823822 3
44 16/07/2011 13/07/2011 80,7328917 3
45 23/07/2011 20/07/2011 114,9475331 3
46 30/07/2011 27/07/2011 90,13118758 3
47 06/08/2011 03/08/2011 43,29372258 3
48 13/08/2011 10/08/2011 49,39935204 3
49 20/08/2011 16/08/2011 133,746822 2
50 03/09/2011 31/08/2011 76,03928942 3
51 10/09/2011 05/09/2011 27,99834637 1
52 24/03/2012 23/03/2012 366,2625797 5,5
53 31/03/2012 28/03/2012 878,8535513 3
54 07/04/2012 04/04/2012 1029,909052 3
55 14/04/2012 11/04/2012 892,9163416 3
56 21/04/2012 18/04/2012 534,8278693 3
57 28/04/2012 25/04/2012 255,1177585 3
58 05/05/2012 02/05/2012 564,5280546 3
59 12/05/2012 09/05/2012 767,5018168 3
60 19/05/2012 16/05/2012 516,2680148 3
61 26/05/2012 23/05/2012 241,2113073 3
62 02/06/2012 30/05/2012 863,6123397 3
63 09/06/2012 06/06/2012 201,2019288 3
64 16/06/2012 13/06/2012 222,9955486 3
65 23/06/2012 20/06/2012 91,14166632 3
66 30/06/2012 27/06/2012 26,93145693 3
67 07/07/2012 04/07/2012 67,32183278 3
68 14/07/2012 11/07/2012 46,25297513 3
69 21/07/2012 18/07/2012 81,34359825 3,666666667
70 28/07/2012 25/07/2012 49,59130851 3
71 04/08/2012 01/08/2012 44,13438077 3
72 11/08/2012 08/08/2012 30,15773151 3
73 18/08/2012 15/08/2012 57,47256772 3
74 25/08/2012 22/08/2012 31,9109555 3
75 01/09/2012 29/08/2012 52,71058484 3
76 08/09/2012 04/09/2012 24,52495229 2
77 06/04/2013 01/04/2013 1344,388042 1,5
78 13/04/2013 10/04/2013 1304,838687 3
79 20/04/2013 17/04/2013 892,620141 3
80 27/04/2013 24/04/2013 400,1720434 3
81 04/05/2013 01/05/2013 424,8473083 3
82 11/05/2013 08/05/2013 269,2380208 3
83 18/05/2013 15/05/2013 238,9993749 3
84 25/05/2013 22/05/2013 128,4096151 3
85 01/06/2013 29/05/2013 158,5576121 3
86 08/06/2013 05/06/2013 175,2036942 3
87 15/06/2013 12/06/2013 79,20250839 3
88 22/06/2013 19/06/2013 126,9065428 3
89 29/06/2013 26/06/2013 133,7480108 3
90 06/07/2013 03/07/2013 218,0092943 3
91 13/07/2013 10/07/2013 54,08460936 3
92 20/07/2013 17/07/2013 91,54285041 3
93 27/07/2013 24/07/2013 44,64567928 3
94 03/08/2013 31/07/2013 229,5067999 3
95 10/08/2013 07/08/2013 49,70729373 3
96 17/08/2013 14/08/2013 53,38618335 3
97 24/08/2013 21/08/2013 217,2800997 3
98 31/08/2013 28/08/2013 49,43590136 3
99 07/09/2013 04/09/2013 64,88783029 3
100 14/09/2013 11/09/2013 11,04300773 3
So in the end I have one main question: how can I eliminate the connection between the years? And an aesthetic question: how can I add minor ticks on the x-axis? At least one every 6 months, just to make the plot easier to read.
Thanks in advance for any suggestion!
Edit
This is the code I tried after the suggestion; maybe I mistyped some part of it.
library(tidyverse)
library(dplyr)
library(lubridate)
df4 <- data.frame(df$Date, df$A)
colnames(df4)<- c("date","A")
df4$date <- as.Date(df4$date,"%Y/%m/%d")
df4$week_day <- as.numeric(format(df4$date, format='%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
group_by(endofweek) %>%
summarise_all(list(mean=mean), na.rm=TRUE) %>%
na.omit()
week_aveA$endofweek <- as.Date(week_aveA$endofweek,"%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
week_aveA$year <- format(week_aveA$endofweek, "%Y")
library(ggplot2)
library(methods)
library(scales)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
You have to group by year:
Add a variable with the year to your dataset.
Map the year variable to the group aesthetic.
For the ticks, increase the number of breaks. If you want ticks without labels, you can use a custom labelling function to drop the unwanted labels. The approach below sets the breaks to "6 month" and replaces the mid-year labels with an empty string:
week_aveA$endofweek <- as.Date(week_aveA$endofweek,"%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
week_aveA$year <- format(week_aveA$endofweek, "%Y")
library(ggplot2)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
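Grouping by calendar year works here because every gap in the data falls across a year boundary. If that were not guaranteed, one alternative (not part of the answer above) is to start a new group wherever two consecutive dates are further apart than some threshold. A minimal base R sketch, using five consecutive week-ending dates from the question's extract and an assumed 30-day threshold:

```r
# Rows 27-31 of the question's extract, converted to ISO dates
endofweek <- as.Date(c("2010-09-18", "2010-09-25",
                       "2011-04-02", "2011-04-09", "2011-04-16"))

# Days between consecutive observations
gap_days <- as.numeric(diff(endofweek))

# Start a new segment whenever the gap exceeds 30 days
# (the threshold is an assumption; choose one that suits the data)
segment <- cumsum(c(TRUE, gap_days > 30))
print(segment)
```

Mapping segment to the group aesthetic (group = segment) would then play the same role as group = year in the code above.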

How to use geom_errorbar with facet_wrap in ggplot2

I am facing a problem adding error bars to my plots. I have a data frame like this:
> str(bank1)
'data.frame': 24 obs. of 4 variables:
$ site : Factor w/ 12 levels "BED","BEU","EB",..: 8 9 10 3 11 1 6 7 5 4 ...
$ canopy : Factor w/ 3 levels "M_Closed","M_Open",..: 3 3 3 3 2 2 2 2 1 1 ...
$ variable: Factor w/ 2 levels "depth5","depth10": 1 1 1 1 1 1 1 1 1 1 ...
$ value : int 200 319 103 437 33 51 165 38 26 29 ...
I plot it like this:
gs1 <- ggplot(bank1, aes(x = canopy, y= value , fill = variable)) +
geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
This gives a plot like this:
My problem is when I want to add the error bars (standard deviation), the code does not run. I use this code:
bank2 <- bank1
bank2.mean = ddply(bank2, .(canopy, variable), summarize,
plant.mean = mean(value), plant.sd = sd(value))
gs1 <- ggplot(bank1, aes(x = canopy, y= value , fill = variable)) +
geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
geom_errorbar(aes(ymin=plant.mean-plant.sd, ymax = plant.mean +
plant.sd), width = 0.5)+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
I searched for help here, here, here and here but I did not understand how to proceed.
Kindly help!
Here I reproduce an example:
> set.seed(1)
> Data1 <- data.frame(
+ site= c("KOA","KOB","KOO","EB","PNS","BED","KB","KER","KAU","KAD","RO","BEU"),
+ variable = sample(c("depth5", "depth10"), 12, replace = TRUE),
+ canopy=sample(c("open", "M_open", "M_closed"), 12, replace = TRUE),
+ value=sample(c(100,500,50,20,112,200,230,250,300,150,160,400))
+ )
> Data1
site variable canopy value
1 KOA depth5 M_closed 20
2 KOB depth5 M_open 112
3 KOO depth10 M_closed 100
4 EB depth10 M_open 400
5 PNS depth5 M_closed 230
6 BED depth10 M_closed 50
7 KB depth10 M_open 250
8 KER depth10 M_closed 200
9 KAU depth10 M_closed 500
10 KAD depth5 open 150
11 RO depth5 M_open 300
12 BEU depth5 open 160
> gs1 <- ggplot(Data1, aes(x = canopy, y= value , fill = variable)) +
+ geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
+ xlab("Canopy cover")+ylab("Seed Bank")+
+ facet_wrap(~variable,nrow=1)
> gs1
> Data2 <- Data1
> data2.mean = ddply(Data2, .(canopy, variable), summarize,
+ plant.mean = mean(value), plant.sd = sd(value))
> gs1 <- ggplot(Data2, aes(x = canopy, y= value , fill = variable)) +
+ geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
+ geom_errorbar(aes(ymin=plant.mean-plant.sd, ymax = plant.mean +
+ plant.sd), width = 0.5)+
+ xlab("Canopy cover")+ylab("Seed Bank")+
+ facet_wrap(~variable,nrow=1)
> gs1
Error in FUN(X[[i]], ...) : object 'plant.mean' not found
I get the same error with my original data
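The solution to my problem is below, and it works the way I wanted. The original code failed because plant.mean and plant.sd exist only in the summary data frame, while ggplot() was given the raw data. You need these packages: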
library(ggplot2)
library(dplyr)
My data frame bank1 was piped into a new data frame, cleandata, which holds the mean, SD, SE and count for each canopy/variable group:
cleandata <- bank1 %>%
group_by(canopy, variable) %>%
summarise(mean.value = mean(value),
sd.value = sd(value), count = n(),
se.mean = sd.value/sqrt(count))
The summarized results look like this:
> head(cleandata)
# A tibble: 6 x 6
# Groups: canopy [3]
canopy variable mean.value sd.value count se.mean
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Open depth5 265. 145. 4 72.4
2 Open depth10 20.5 12.8 4 6.41
3 M_Open depth5 71.8 62.6 4 31.3
4 M_Open depth10 6.5 4.20 4 2.10
5 M_Closed depth5 20 8.98 4 4.49
6 M_Closed depth10 0.5 1 4 0.5
Finally, the plotting was done with this piece of code:
gs1 <- ggplot(cleandata, aes(x=canopy, y=mean.value)) +
geom_bar(stat = "identity", color = "black", position = position_dodge())+
geom_errorbar(aes(ymin = mean.value - sd.value, ymax = mean.value + sd.value),
width=0.2)+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
This gives a graph with error bars (standard deviation) as given below
Problem solved! Cheers!
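For readers without dplyr, the same idea can be sketched in base R: compute the per-group summary first, then hand that summary (not the raw data) to the plot. The toy data frame below is hypothetical and is not the asker's bank1; the plotting call is guarded so the snippet runs even where ggplot2 is not installed.

```r
# Hypothetical toy data: two observations per canopy/variable combination
toy <- data.frame(
  canopy   = rep(c("Open", "M_Open"), each = 4),
  variable = rep(c("depth5", "depth10"), times = 4),
  value    = c(10, 2, 14, 4, 6, 1, 8, 3)
)

# Per-group mean and standard deviation; both calls return rows in the same order
stats <- aggregate(value ~ canopy + variable, data = toy, FUN = mean)
names(stats)[names(stats) == "value"] <- "mean.value"
stats$sd.value <- aggregate(value ~ canopy + variable, data = toy, FUN = sd)$value
print(stats)

# Plot from the summary so the error bar columns can be found
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = canopy, y = mean.value)) +
    geom_col(fill = "darkgray") +
    geom_errorbar(aes(ymin = mean.value - sd.value,
                      ymax = mean.value + sd.value), width = 0.2) +
    facet_wrap(~variable, nrow = 1)
}
```

The key point is that every column named inside aes() must exist in the data frame that the layer actually receives.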

R: ggplot2 - plots doesn't look right - vertical instead of diagonal lines

I'd like to create a plot for two categorical variables. I created two dummy data sets, but although they contain the same items, the plots look totally different. The same happens with my real data.
I also tried ordering the columns, with the same result.
Below are my code, the three plots (plot 2 is how it should look) and a plot of my real (anonymized) data to show the problem. I don't understand why there are vertical lines.
Thank you in advance
library(ggplot2)
library(dplyr)
dat1 <- data.frame(
sex = factor(c("Male","Female","Male","Female")), levels=c("Female","Male"),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(17.42, 16.81, 16.24, 13.53)
)
dat1
#plot1: shows horizontal lines although it should look like the plot 2
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1")
#different approach for plot1
arrange(dat1 , sex, time)
dat1
#has ordered columns like I wanted it to be
#still looks like plot1
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 1 ordered")
dat2 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
dat2
#plot2: look like I'd like to have it this way
ggplot(data=dat2, aes(x=sex, y=total_bill, group=time, colour=time)) +
geom_line() +
geom_point() +
ggtitle("Plot 2")
Also an outline of the data plot which has those vertical lines in it
library(ggplot2)
library(dplyr)
mydata2
# ####Output (fictive data but same problem as with my real data, still vertical lines in it but should only have 2 lines like in plot2)
# group NM score
# 1 KG S 2537.94514
# 2 EG S 1766.39019
# 3 KG S 3970.91802
# 4 KG S 4089.14939
# 5 KG S 2795.42964
# 6 EG S 2286.60411
# 7 KG S 4027.22993
# 8 KG S 1030.18328
# 9 EG S 719.73679
# 10 EG S 724.93663
# 11 EG S 2929.03717
# 12 EG S 521.55736
# 13 KG S 1435.85625
# 14 EG S 1496.39471
# 15 EG S 3521.25827
# 16 KG S 2138.17928
# 17 EG S 1233.86267
# 18 KG S 591.33086
# 19 EG S 2171.97341
# 20 EG S 3871.92536
# 21 EG S 468.10133
# 22 KG S 2419.67419
# 23 KG S 1338.29305
# 24 KG S 1629.33862
# 25 EG S 560.39680
# 26 EG S 546.22468
# 27 KG S 3398.94647
# 28 KG S 1117.72716
# 29 EG S 2794.90527
# 30 EG S 3606.77693
# 31 KG S 3558.67156
# 32 KG S 196.64992
# 33 EG S 2174.69930
# 34 EG S 3444.10732
# 35 KG S 670.60907
# 36 EG S 3719.20997
# 37 KG S 65.76227
# 38 EG S 3420.12225
# 39 KG S 1405.83738
# 40 KG S 2859.33873
# 41 EG T 1296.75111
# 42 EG T 436.53580
# 43 KG T 213.09334
# 44 EG T 2073.70465
# 45 KG T 1679.98816
# 46 EG T 1599.26738
# 47 EG T 777.65179
# 48 EG T 1738.45395
# 49 KG T 3269.54120
# 50 EG T 3506.07302
# 51 EG T 1764.61915
# 52 EG T 493.47846
# 53 KG T 1729.02949
# 54 EG T 1454.57702
# 55 EG T 2577.32018
# 56 EG T 295.08653
# 57 EG T 3811.24064
# 58 KG T 2320.35879
# 59 EG T 1285.65291
# 60 KG T 3600.26095
# 61 EG T 3738.89452
# 62 KG T 3472.53512
# 63 KG T 1203.33462
# 64 EG T 1809.41229
# 65 EG T 3536.17972
# 66 EG T 2637.59869
# 67 KG T 1279.44567
# 68 KG T 1141.81247
# 69 KG T 3951.54206
# 70 KG T 1940.11505
# 71 KG T 192.74602
# 72 KG T 1235.81839
# 73 EG T 1907.09384
# 74 KG T 1772.86806
# 75 KG T 997.92437
# 76 KG T 217.81433
# 77 KG T 3595.69359
# 78 EG T 910.07955
# ####End of output
ggplot(data=mydata2, aes(x=group, y=score, group=NM, shape=NM, colour=NM)) +
geom_line(aes(linetype=NM), size=1) + # Set linetype by sex
geom_point(size=3, fill="white") + # Use larger points, fill with white
expand_limits(y=0) + # Set y range to include 0
scale_colour_hue(name="Sex of participant", # Set legend title
l=30) + # Use darker colors (lightness=30)
scale_shape_manual(name="Sex of participant",
values=c(22,21)) + # Use points with a fill color
scale_linetype_discrete(name="Sex of participant") +
xlab("Group") + ylab("Score") + # Set axis labels
ggtitle("Data") + # Set title
theme_bw() +
theme(legend.position=c(.7, .4)) # Position legend inside
# This must go after theme_bw
[Figures: Plot 1, Plot 1 ordered, Plot 2, and the data plot that looks wrong]
The dat2 that produces the plot you want looks like this:
> dat2
sex time total_bill
1 Female Lunch 13.53 # female has lunch and dinner
2 Female Dinner 16.81
3 Male Lunch 16.24 # male has lunch and dinner
4 Male Dinner 17.42
However, your dat1 before and after the arrange looks like this:
before
sex levels time total_bill
1 Male Female Lunch 17.42
2 Female Male Dinner 16.81 # female only has dinner
3 Male Female Lunch 16.24 # male only has lunch
4 Female Male Dinner 13.53
after
sex levels time total_bill
1 Female Male Dinner 16.81 # female only has dinner
2 Female Male Dinner 13.53
3 Male Female Lunch 17.42 # male only has lunch
4 Male Female Lunch 16.24
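In both versions of dat1, females only have dinner and males only have lunch. The extra levels column is a related problem: the closing parenthesis of factor() comes before levels=, so levels became a column of the data frame instead of an argument to factor().
So each line in your plots is drawn between two points at the same x position (for example, the two female dinner points), which gives a vertical line, rather than between a female point and a male point for the same meal.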
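A quick way to check for this kind of problem before plotting is to cross-tabulate the two categorical variables. The sketch below rebuilds dat1 and dat2 exactly as in the question and counts the sex/time combinations; the names combos1 and combos2 are introduced here for illustration.

```r
# dat1 as written in the question, including the misplaced parenthesis
dat1 <- data.frame(
  sex = factor(c("Male", "Female", "Male", "Female")), levels = c("Female", "Male"),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(17.42, 16.81, 16.24, 13.53)
)

# dat2 as written in the question
dat2 <- data.frame(
  sex = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

print(names(dat1))                        # shows the unintended "levels" column
combos1 <- table(dat1$sex, dat1$time)     # some sex/time cells are empty
combos2 <- table(dat2$sex, dat2$time)     # every sex/time cell has one row
print(combos1)
print(combos2)
```

When group = time is used with sex on the x-axis, each line needs one point per sex within each time; the table for dat1 shows that this is not the case.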
For your updated question, I ran your code with the column names like this:
# group NM sex score
# 1 KG S 2537.945
# 2 EG S 1766.390
# 3 KG S 3970.918
and got this. What is the issue?

ggplot create map with arrows

I have a data frame like this
id lon lat
1 A -69.5 -58.5
2 A -69.5 -58.5
3 A -69.5 -57.5
4 A -68.5 -57.5
5 A -68.5 -57.5
6 A -68.5 -57.5
7 A -66.5 -57.5
8 A -68.5 -56.5
9 A -68.5 -56.5
10 A -67.5 -56.5
11 A -65.5 -56.5
12 A -65.5 -56.5
13 A -65.5 -55.5
14 A -62.5 -54.5
15 B -177 -52.5
16 B -178 -50.5
17 B -179 -48.5
18 B 179 -47.5
19 B 178 -46.5
20 B 177 -46.5
and I want to produce a map of the positions of A and B, linked by oriented lines. However, when an id crosses the Pacific (lon=-180 -> lon=+180), I get an arrow crossing the whole figure, as shown below.
This is the code I am using
worldmap = map_data("world")
ggplot(test, aes(x = lon, y=lat, colour = factor(id))) +
geom_polygon(data=worldmap,center=180,aes(x=long, y=lat, group=group), fill="black",colour="black") +
xlab("") +ylab("")+theme(axis.text=element_blank(),axis.ticks=element_blank())+ theme(panel.background = element_rect(fill = 'white', colour = 'black') ,panel.grid.major = element_blank(),panel.grid.minor = element_blank())+
geom_path(size =2,arrow = arrow(angle=30,length = unit(0.6, "inches")))
How can I fix it?
Thanks
I guess that depends on what you think the "right" thing to do is. I decided to break up the paths that cross the edge of the map into two segments by adding points at the edge, and then creating a "sequence" indicator so ggplot knows which points to connect. Here's the transformation for your sample data:
test2 <- do.call(rbind, lapply(split(test, test$id), function(x) {
  # start a new piece wherever the longitude jumps across the map edge (either direction)
  cp <- cumsum(c(FALSE, abs(diff(x$lon)) > 250))
  xx <- split(x, cp)
  xx <- Map(cbind, xx, seq = seq_along(xx))
  Reduce(function(a, b) {
    lasta <- a[nrow(a), ]
    firstb <- b[1, ]
    # add one point on the map edge to each side of the break
    lasta$lon <- 180 * sign(lasta$lon)
    firstb$lon <- 180 * sign(firstb$lon)
    # mean() needs a single vector; mean(x, y) would treat y as the trim argument
    lasta$lat <- mean(c(lasta$lat, firstb$lat))
    firstb$lat <- lasta$lat
    rbind(a, lasta, firstb, b)
  }, xx)
}))
tail(test2)
# id lon lat seq
# B.17 B -179 -48.5 1
# B.171 B -180 -48.0 1
# B.18 B 180 -48.0 2
# B.181 B 179 -47.5 2
# B.19 B 178 -46.5 2
# B.20 B 177 -46.5 2
Here you can see that the B line has been broken up into two sequences. Then, if we use a group aesthetic
geom_path(aes(group=interaction(id, seq)), ...)
ggplot will only connect points that are in the same id/seq group. This prevents the line from going across the whole map. However, because we are now drawing two lines for that id rather than one, there's no way to turn off the arrowhead for just one of the segments within a single geom_path layer. You might want to find another way to indicate start/end.
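One possible workaround, not part of the answer above, is to draw two path layers: one without arrows for every piece except the last piece of each id, and one with arrows for the last piece only. The sketch below uses a hand-written, abridged stand-in for test2 with illustrative coordinates; the flag is_last is a name introduced here, and the plotting call is guarded so the snippet runs even where ggplot2 is not installed.

```r
# Abridged stand-in for test2 (illustrative values only)
test2 <- data.frame(
  id  = c("A", "A", "B", "B", "B", "B"),
  lon = c(-69.5, -68.5, -179, -180, 180, 179),
  lat = c(-58.5, -57.5, -48.5, -48.0, -48.0, -47.5),
  seq = c(1, 1, 1, 1, 2, 2)
)

# TRUE for rows that belong to the final piece of their id
is_last <- test2$seq == ave(test2$seq, test2$id, FUN = max)
print(is_last)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(test2, aes(lon, lat, colour = id,
                         group = interaction(id, seq))) +
    # earlier pieces: plain lines, no arrowhead
    geom_path(data = test2[!is_last, ]) +
    # final piece of each id: line with an arrowhead
    geom_path(data = test2[is_last, ],
              arrow = arrow(angle = 30, length = unit(0.2, "inches")))
}
```

This keeps a single arrowhead per id at the end of its track, at the cost of splitting the drawing into two layers.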
