How to reverse coordinates on a line graph in ggplot2 (R)

I'm working on a data visualization project and am making some line graphs. This is my data set:
groupA <- read.csv("afcongroupA.csv", header=T, row.names=NULL)
groupA
Date Team Position
1 1/12 South Africa 56
2 1/12 Angola 85
3 1/12 Morocco 61
4 1/12 Cape Verde Islands 58
5 4/12 South Africa 71
6 4/12 Angola 78
7 4/12 Morocco 62
8 4/12 Cape Verde Islands 76
9 8/12 South Africa 67
10 8/12 Angola 85
11 8/12 Morocco 68
12 8/12 Cape Verde Islands 78
13 12/12 South Africa 87
14 12/12 Angola 84
15 12/12 Morocco 72
16 12/12 Cape Verde Islands 69
I then plotted them on a line graph to show the rise or decline in position standings:
groupA$Date <- factor(groupA$Date, levels=groupA$Date[!duplicated(groupA$Date)])
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) + geom_line()
What I want to do is reverse the y-axis so that the largest number is at the bottom. I tried this bit of code:
groupA <- coord_flip() + scale_x_reverse()
But I get this error message:
Error in coord_flip() + scale_x_reverse() :
non-numeric argument to binary operator
I'm using R 2.15.2 on a Mac running OS X.

Because your Date column is a factor, scale_x_reverse() won't work. One solution is to reverse the order of the factor levels in the data frame:
groupA$Date <- factor(groupA$Date, levels=rev(unique(groupA$Date)))
Then use your code to make the plot and flip the axes:
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) +
geom_line()+coord_flip()
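If the goal is only to put the largest Position at the bottom, reversing the numeric y scale may be enough, with no flip at all. The line in the question also fails for a separate reason: it adds coord_flip() and scale_x_reverse() to each other, with no plot object on the left. Below is a minimal sketch, assuming ggplot2 is installed and using a hand-typed subset of the question's data (two teams only):

```r
library(ggplot2)

# A subset of the question's data, typed in by hand for illustration
groupA <- data.frame(
  Date = rep(c("1/12", "4/12", "8/12", "12/12"), each = 2),
  Team = rep(c("South Africa", "Angola"), times = 4),
  Position = c(56, 85, 71, 78, 67, 85, 87, 84)
)

# Keep the dates in order of appearance rather than alphabetical order
groupA$Date <- factor(groupA$Date, levels = unique(groupA$Date))

# Position is numeric, so scale_y_reverse() can invert the y-axis directly
p <- ggplot(groupA, aes(x = Date, y = Position, colour = Team, group = Team)) +
  geom_line() +
  scale_y_reverse()
print(p)
```

Here the Date axis stays horizontal and only the direction of the Position axis changes.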

Related

How to plot multiple variables on the same plot in ggplot in geom_line after melting?

I have a tibble that I melted into long format with melt, using id.vars=Country, so the original column names now appear as values in the column variable. The goal is to plot the values of AGR_LogLabProd, MIN_LogLabProd and MAN_LogLabProd by year on the same x-axis.
CHN4
Country Year variable value
---------------------------
1 CHN 1958 AGR_LogLabProd 14.81782
2 CHN 1959 AGR_LogLabProd 14.61870
3 CHN 1960 AGR_LogLabProd 14.41969
4 CHN 1961 AGR_LogLabProd 14.28257
5 CHN 1958 MIN_LogLabProd 13.67850
6 CHN 1959 MIN_LogLabProd 14.24685
7 CHN 1960 MIN_LogLabProd 14.57734
8 CHN 1961 MIN_LogLabProd 14.59046
9 CHN 1958 MAN_LogLabProd 13.29359
10 CHN 1959 MAN_LogLabProd 13.86194
11 CHN 1960 MAN_LogLabProd 14.19243
12 CHN 1961 MAN_LogLabProd 14.20556
I use ggplot(CHN4, aes(x=Year, y=value)) + geom_line(), but it gives me a strange plot (shown in the attached image) rather than a separate line for each level of the variable column, as I expected. Any clue as to what is going wrong?
This is a pretty common problem. You need to include a grouping variable. If you want to use color for every different level, you would use
library(ggplot2)
ggplot(CHN4, aes(x=Year, y=value, color = variable)) +
geom_line()
but if you don't care for colors, you can do
library(ggplot2)
ggplot(CHN4, aes(x=Year, y=value, group = variable)) +
geom_line()

How to plot correct colors in R maps library

I am trying to plot specific colors for specific countries using R maps library. I can fill in the colors but they are not correctly associated with their respective countries. I wonder if someone could have a clue why?
My data frame is «filld» and has 3 columns: the first is the countries names, the second is just some numeric data, and the 3rd is the color:
countries toplot color
1 Argentina -1 red
2 Armenia -1 red
3 Australia -1 red
4 Bahrain -1 red
5 Botswana -1 red
6 Belgium -1 red
7 Bulgaria -1 red
8 Canada -1 red
9 Chile -1 red
10 Taiwan -1 red
11 Croatia -1 red
12 Czech Republic -1 red
13 UK:Great Britain -1 red
14 Egypt -1 red
15 Denmark -1 red
16 Finland 0 yellow
17 France 0 yellow
18 Georgia 0 yellow
19 Germany 0 yellow
20 China:Hong Kong 0 yellow
21 Hungary 0 yellow
22 Indonesia 0 yellow
23 Iran 0 yellow
24 Ireland 0 yellow
25 Israel 0 yellow
26 Italy 0 yellow
27 Japan 0 yellow
28 Jordan 0 yellow
29 Kazakhstan 1 darkgreen
30 Korea 1 darkgreen
31 Kuwait 1 darkgreen
32 Lebanon 1 darkgreen
33 Lithuania 1 darkgreen
34 Malaysia 1 darkgreen
35 Malta 1 darkgreen
36 Morocco 1 darkgreen
37 Netherlands 1 darkgreen
38 New Zealand 1 darkgreen
39 UK:Northern Ireland 1 darkgreen
40 Norway 1 darkgreen
41 Oman 1 darkgreen
42 Palestine 1 darkgreen
43 Poland 1 darkgreen
44 Portugal 1 darkgreen
45 Qatar 1 darkgreen
46 Russia 1 darkgreen
47 Saudi Arabia 0 yellow
48 Serbia 0 yellow
49 Singapore 0 yellow
50 Slovak Republic 0 yellow
51 Slovenia -1 red
52 South Africa -1 red
53 Spain -1 red
54 Sweden -1 red
55 Thailand 1 darkgreen
56 Turkey 1 darkgreen
57 United Arab Emirates 0 yellow
58 USA 1 darkgreen
This is the code I am using:
library(maps) # Provides functions that let us plot the maps
library(mapdata) # Contains the hi-resolution points that mark out the countries.
map('world', filld$countries, fill=T, border="darkgray", col=filld$color)
map('world', col="darkgray", add=T)
But these are the colors I am getting:
Australia should be filled in red, but is green; Spain should be filled in red, but is yellow; France should be filled in yellow, but is darkgreen; etc.
Some countries are ok though, e.g. the USA should be and is darkgreen.
Any comments will be appreciated. Thanks!
I'm not entirely sure what creates the problem, but plotting the world first and then filling by color does the trick.
map('world', col='darkgray')
for (color in unique(filld$color)) {
map('world', regions=filld$countries[which(filld$color==color)], fill=T, border="darkgray", col=color,add=T)
}
The cause of the original problem is that
map('world', filld$countries, fill=T, border="darkgray", col=filld$color)
does not return a set of polygons with exactly the same length as the colour vector. If a country consists of several polygons (e.g. islands), these are all separate. Japan, to give just one example that appears in your data, consists of 34 polygons:
z <- map('world',region='japan')
z$names
So the colours are no longer correctly aligned.
You could simply add the option exact=TRUE, but then only the main polygon of each country would be coloured (the one which fits the name exactly), and that isn't even defined for all countries.
For choropleths with the 'maps' package, your best solution is to use match.map(), which gives consecutive numbers to the chosen regions (all polygons):
sel_c <- match.map("world",filld$countries)
map('world',col=filld$col[sel_c],border="darkgrey",fill=TRUE)
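The mismatch and the match.map() fix can be checked without drawing the full data set. This is a rough sketch, assuming the maps package is installed. The three-row filld table is a hypothetical stand-in for the real one, and the exact polygon count depends on the version of the world database:

```r
library(maps)

# Hypothetical three-row lookup table in the same shape as filld
filld <- data.frame(
  countries = c("Japan", "France", "Australia"),
  color = c("yellow", "yellow", "red"),
  stringsAsFactors = FALSE
)

# All polygons of the selected countries, without drawing anything
polys <- map("world", regions = filld$countries, plot = FALSE, fill = TRUE)
length(polys$names)  # more polygons than rows in filld

# match.map() returns, for every polygon in the database, the index of the
# matching entry in filld$countries (NA where nothing matches)
sel_c <- match.map("world", filld$countries)

# Indexing the colour vector by sel_c gives one colour per polygon
map("world", col = filld$color[sel_c], border = "darkgrey", fill = TRUE)
```

Polygons whose index is NA receive an NA colour and are left unfilled.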
Following @Richard's suggestion:
library(maps)
library(ggplot2)
map <- map_data("world")
map <- subset(map, region!="Antarctica")
map <- spTransform(map, CRS("+proj=robin")) #Not working, don't know why...
TimssCountries<-ggplot() +
geom_polygon(data = map, aes(x=long, y = lat, group = group), fill = NA, colour="darkgray", size=0.25)+
geom_map(data=filld,map=map,aes(map_id=country, x=lon, y=lat), fill = "filld$color", colour = "gray") +
coord_equal()
TimssCountries
But I don't know how to add a color legend with ggplot to get an effect similar to this other map:
Thanks!...
Just for reference, in case someone is looking for a similar solution: scale_fill_identity plots the colors in the correct order. The full code is:
TimssDif<-TimssCountries +
geom_map(data = data, map = map, aes(map_id = country, fill = color), colour="darkgray") +
theme(legend.title = element_blank()) + # omit the legend title (which would read 'color')
scale_fill_identity("Title legend", labels = c("Below mean", "At mean", "Above mean"), breaks = plotclr, guide = "legend")
TimssDif + theme(legend.position = "bottom")

R Merging Boxplots

I am trying to use R to show a merged boxplot. I am sure this is easy; I am just missing something:
boxplot(WHO$Male, WHO$Female, ylim=c(0,100))
boxplot(WHO$Female ~ WHO$Year, ylim=c(0,100))
boxplot(WHO$Male ~ WHO$Year, ylim=c(0,100))
All three work, but when I try:
boxplot(WHO$Male ~ WHO$Year, WHO$Female ~ WHO$Year, ylim=c(0,100))
It returns:
Error in as.data.frame.default(data) :
cannot coerce class "formula" to a data.frame
Note that Year contains only three values: 1990, 2000 and 2010.
> head(WHO)
Year WHO.region Country Male Female
1 1990 Africa Algeria 66 68
2 1990 Africa Angola 39 43
3 1990 Africa Benin 45 50
4 1990 Africa Botswana 63 66
5 1990 Africa Burkina Faso 45 49
6 1990 Africa Burundi 47 50
The reshape2 package can help with this kind of reshaping. There was also a very similar question, "Plot multiple boxplot in one graph", which may be helpful.
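One possible way to get both sexes into a single boxplot call is to stack Male and Female into one column and pass a single two-factor formula. The sketch below uses base R only rather than reshape2. The WHO values and the column names Sex and value are made up for illustration:

```r
# Made-up values in the same shape as the WHO data frame from the question
WHO <- data.frame(
  Year   = rep(c(1990, 2000, 2010), each = 4),
  Male   = c(66, 39, 45, 63, 68, 44, 52, 49, 71, 50, 57, 53),
  Female = c(68, 43, 50, 66, 71, 47, 55, 51, 73, 53, 60, 54)
)

# Stack Male and Female into one value column with a Sex label
long <- data.frame(
  Year  = rep(WHO$Year, times = 2),
  Sex   = rep(c("Male", "Female"), each = nrow(WHO)),
  value = c(WHO$Male, WHO$Female)
)

# One box per Sex-by-Year combination in a single panel;
# the two colours alternate between the two sexes
boxplot(value ~ Sex + Year, data = long, ylim = c(0, 100),
        col = c("pink", "lightblue"))
```

The original call failed because boxplot() accepts only one formula; its second positional argument is treated as data, which is why R tried to coerce a formula to a data frame.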

How to drop unused factors in faceted R ggplot boxplot?

Below is some example code I use to make some boxplots:
stest <- read.table(text=" site year conc
south 2001 5.3
south 2001 4.67
south 2001 4.98
south 2002 5.76
south 2002 5.93
north 2001 4.64
north 2001 6.32
north 2003 11.5
north 2003 6.3
north 2004 9.6
north 2004 56.11
north 2004 63.55
north 2004 61.35
north 2005 67.11
north 2006 39.17
north 2006 43.51
north 2006 76.21
north 2006 158.89
north 2006 122.27
", header=TRUE)
require(ggplot2)
ggplot(stest, aes(x=year, y=conc)) +
geom_boxplot(horizontal=TRUE) +
facet_wrap(~site, ncol=1) +
coord_flip() +
scale_y_log10()
Which results in this:
I tried everything I could think of but cannot make a plot where the south facet only contains years where data is displayed (2001 and 2002). Is what I am trying to do possible?
Here is a link (DEAD) to the screenshot showing what I want to achieve:
Use the scales='free_x' argument to facet_wrap. But I suspect you'll need to do more than that to get the plot you're looking for: specifically, use aes(x=factor(year), y=conc) in your initial ggplot call.
A simple way to circumvent your problem (with a fairly good result): generate the two boxplots separately and then join them using the grid.arrange command of the gridExtra package.
library(ggplot2)
library(gridExtra)
# One plot per site, so each keeps only the years that have data
p1 <- ggplot(subset(stest,site=="north"), aes(x=factor(year), y=conc)) +
geom_boxplot() + coord_flip() + scale_y_log10(name="")
p2 <- ggplot(subset(stest,site=="south"), aes(x=factor(year), y=conc)) +
geom_boxplot() + coord_flip() +
scale_y_log10(name="X Title",breaks=seq(4,6,by=.5))
# Stack the two plots in a single column
grid.arrange(p1, p2, ncol=1)
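A single faceted plot may also be possible. This is a hedged sketch, assuming ggplot2 3.3.0 or later, where geom_boxplot() accepts the categorical variable on the y-axis so that coord_flip() is not needed. The data is a shortened, hand-typed version of stest:

```r
library(ggplot2)

# A shortened version of the stest data from the question
stest <- data.frame(
  site = c("south", "south", "south", "south",
           "north", "north", "north", "north"),
  year = c(2001, 2001, 2002, 2002, 2001, 2003, 2004, 2006),
  conc = c(5.3, 4.67, 5.76, 5.93, 4.64, 11.5, 56.11, 43.51)
)

# Mapping the factor to y gives horizontal boxes without coord_flip();
# scales = "free_y" lets each facet keep only its own years
p <- ggplot(stest, aes(x = conc, y = factor(year))) +
  geom_boxplot() +
  facet_wrap(~site, ncol = 1, scales = "free_y") +
  scale_x_log10()
print(p)
```

The concentration axis stays shared between the two facets because only the y scale is freed.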

ggplot2 + Date structure using scale X

I really need help here because I am way beyond lost.
I am trying to create a line chart showing several teams' performance over a year. I divided the year into quarters (1/1/12, 4/1/12, 8/1/12, 12/1/12) and loaded the CSV file into R as a data frame.
Month Team Position
1 1/1/12 South Africa 56
2 1/1/12 Angola 85
3 1/1/12 Morocco 61
4 1/1/12 Cape Verde Islands 58
5 4/1/12 South Africa 71
6 4/1/12 Angola 78
7 4/1/12 Morocco 62
8 4/1/12 Cape Verde Islands 76
9 8/1/12 South Africa 67
10 8/1/12 Angola 85
11 8/1/12 Morocco 68
12 8/1/12 Cape Verde Islands 78
13 12/1/12 South Africa 87
14 12/1/12 Angola 84
15 12/1/12 Morocco 72
16 12/1/12 Cape Verde Islands 69
When I try using ggplot2 to generate the graph the fourth quarter 12/1/12 inexplicably moves to the second spot.
ggplot(groupA, aes(x=Month, y=Position, colour=Team, group=Team)) + geom_line()
I then put this plot into a variable GA in order to try to use scale_x to format the date:
GA + scale_x_date(labels = date_format("%m/%d"))
But I keep getting this Error:
Error in structure(list(call = match.call(), aesthetics = aesthetics, :
could not find function "date_format"
And if I run this code:
GA + scale_x_date()
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
I am using a Mac OS X running R 2.15.2
Please help.
It's because df$Month (assuming your data.frame is df) is a factor whose levels are in this order:
> levels(df$Month)
# [1] "1/1/12" "12/1/12" "4/1/12" "8/1/12"
The solution is to re-order the levels of your factor.
df$Month <- factor(df$Month, levels=df$Month[!duplicated(df$Month)])
> levels(df$Month)
# [1] "1/1/12" "4/1/12" "8/1/12" "12/1/12"
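The effect of the levels argument can be seen in isolation with base R. A small sketch, where x is a hypothetical stand-in for the Month column:

```r
# The four date labels from the question, in the order they appear in the file
month <- c("1/1/12", "4/1/12", "8/1/12", "12/1/12")
x <- rep(month, each = 4)

# Default: levels are sorted as text, not as dates
levels(factor(x))

# Fixed: levels follow first appearance in the data
fixed <- factor(x, levels = unique(x))
levels(fixed)
```

unique(x) and x[!duplicated(x)] return the same values in the same order, so either form works here.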
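Edit: Alternate solution using strptime
# You could convert Month to class Date first (scale_x_date only accepts Date objects):
df$Month <- as.Date(strptime(df$Month, '%m/%d/%y'))
Then your code should work, provided the scales package is loaded with library(scales), since that is where date_format() comes from.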
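For reference, the whole date-based route might look like the sketch below. It assumes a reasonably recent ggplot2, where scale_x_date() takes a date_labels format string so date_format() is not needed, and it uses a hand-typed two-team subset of the data:

```r
library(ggplot2)

# A subset of the question's data, typed in by hand for illustration
df <- data.frame(
  Month = rep(c("1/1/12", "4/1/12", "8/1/12", "12/1/12"), each = 2),
  Team = rep(c("South Africa", "Angola"), times = 4),
  Position = c(56, 85, 71, 78, 67, 85, 87, 84)
)

# Parse month/day/two-digit-year strings into Date objects
df$Month <- as.Date(df$Month, format = "%m/%d/%y")

# A real date axis orders the points chronologically and spaces them by time
p <- ggplot(df, aes(x = Month, y = Position, colour = Team, group = Team)) +
  geom_line() +
  scale_x_date(date_labels = "%m/%d", breaks = unique(df$Month))
print(p)
```

Unlike the factor approach, this spaces the four dates according to the actual time between them.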

Resources