Reorder Bar Chart Output in ggplot2 when using as.factor (R)

I was hoping someone could help.
I have a DF as follows:
Year Winner
1930 Uruguay
1934 Italy
1938 Italy
1950 Uruguay
1954 Germany FR
1958 Brazil
1962 Brazil
1966 England
1970 Brazil
....
and so on
What I want to do is create a bar chart with ggplot2, but reorder it so that the country with the most wins comes first.
The code I've used to generate my current graph is:
ggplot(data, aes(x=as.factor(Winner), fill=as.factor(Winner) )) +
geom_bar() +
theme(legend.position = "none")
I know there's something about reorder but I can't get it to work with the as.factor argument.
Thanks

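I got around this problem using forcats, whose fct_infreq() orders factor levels by frequency, most frequent first:
library(forcats)
library(ggplot2)
ggplot(data, aes(x = fct_infreq(Winner), fill = as.factor(Winner))) +
geom_bar() +
xlab("Winner") +
theme(legend.position = "none")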
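If you would rather not depend on forcats, base R's reorder() can be wrapped around the as.factor() call. This is a minimal sketch, assuming the nine rows shown in the question stand in for the full data (the names wc, Winner_ordered and winner_levels are hypothetical). reorder() sorts the levels by a per-level summary; summarising with the negative count puts the most frequent winner first.

```r
library(ggplot2)

# Stand-in data: the rows shown in the question (the real data has more years)
wc <- data.frame(
  Year = c(1930, 1934, 1938, 1950, 1954, 1958, 1962, 1966, 1970),
  Winner = c("Uruguay", "Italy", "Italy", "Uruguay", "Germany FR",
             "Brazil", "Brazil", "England", "Brazil")
)

# reorder() sorts levels in ascending order of FUN's result per level,
# so -length() (negative number of wins) puts the most frequent winner first
wc$Winner_ordered <- reorder(as.factor(wc$Winner), wc$Winner,
                             FUN = function(v) -length(v))
winner_levels <- levels(wc$Winner_ordered)

p <- ggplot(wc, aes(x = Winner_ordered, fill = Winner_ordered)) +
  geom_bar() +
  xlab("Winner") +
  theme(legend.position = "none")
print(p)
```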

Related

Specify the colour of ggpairs plot using a variable but not plot that variable

I have a dataset from the World Bank with some continuous and categorical variables.
> head(nationsCombImputed)
iso3c iso2c country year.x life_expect population birth_rate neonat_mortal_rate region
1 ABW AW Aruba 2014 75.45 103441 10.1 2.4 Latin America & Caribbean
2 AFG AF Afghanistan 2014 60.37 31627506 34.2 36.1 South Asia
3 AGO AO Angola 2014 52.27 24227524 45.5 49.6 Sub-Saharan Africa
4 ALB AL Albania 2014 77.83 2893654 13.4 6.5 Europe & Central Asia
5 AND AD Andorra 2014 70.07 72786 20.9 1.5 Europe & Central Asia
6 ARE AE United Arab Emirates 2014 77.37 9086139 10.8 3.6 Middle East & North Africa
income gdp_percap.x log_pop
1 High income 47008.83 5.014693
2 Low income 1942.48 7.500065
3 Lower middle income 7327.38 7.384309
4 Upper middle income 11307.55 6.461447
5 High income 30482.64 4.862048
6 High income 67239.00 6.958379
I wish to use ggpairs to plot some of the continuous variables (life_expect, birth_rate, neonat_mortal_rate, gdp_percap.x) in a scatter plot matrix, coloured by the categorical region variable from the data. I have tried a number of different ways, but I cannot colour the continuous variables without also plotting the categorical variable.
ggpairs(nationsCombImputed[,c(2,5,7,8,9,11)],
title="Scatterplot of Variables",
mapping = ggplot2::aes(color = region),
labeller = "iso2c")
But I get this error
Error in stop_if_high_cardinality(data, columns,
cardinality_threshold) : Column 'iso2c' has more levels (211) than
the threshold (15) allowed. Please remove the column or increase the
'cardinality_threshold' parameter. Increasing the
cardinality_threshold may produce long processing times
Ultimately, I would just like a 4x4 scatter plot matrix of the continuous variables, coloured by region, with the data points labelled by the iso2c code in column 2.
Is this possible in ggpairs?
Well, yes, it is possible! As per @Robin Gertenbach's suggestion, I added the columns argument to my code and this worked great; see below.
library(GGally)
# columns 5, 7, 8, 11 = life_expect, birth_rate, neonat_mortal_rate, gdp_percap.x
ggpairs(nationsCombImputed,
title = "Scatterplot of Variables",
columns = c(5,7,8,11),
mapping = ggplot2::aes(colour = region))
I still wish to add data point labels to the scatter plot using the iso2c column, but I am struggling with this; any pointers would be greatly appreciated.
As mentioned in the comment, you can get ggpairs to colour by a variable without plotting it by specifying the numeric indices of the columns you do want to plot with columns = c(5,7,8,11).
To get a text scatter plot, define a panel function (e.g. textscatter), supply it via lower = list(continuous = textscatter) in the ggpairs call, and specify the labels in the aesthetics.
library(GGally)
library(ggplot2)
# Panel function: GGally calls it with the data and the mapping for each panel
textscatter <- function(data, mapping, ...) {
ggplot(data, mapping) + geom_text(...)
}
ggpairs(
nationsCombImputed,
title = "Scatterplot of Variables",
columns = c(5,7,8,11),
mapping = ggplot2::aes(colour = region, label = iso2c),
lower = list(continuous = textscatter)
)
Of course, you can also put the label aesthetic definition into textscatter instead of the ggpairs mapping.
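A sketch of that variant, assuming the GGally and ggplot2 packages and using only the six rows from the head() output above as stand-in data (nations and textscatter_labelled are hypothetical names). Six rows are too few for the default correlation panels in the upper triangle, so the sketch pulls out a single lower panel with pm[2, 1] rather than printing the whole matrix; with the full data you would simply print pm.

```r
library(GGally)
library(ggplot2)

# Stand-in data: the six rows shown by head(nationsCombImputed)
nations <- data.frame(
  iso2c = c("AW", "AF", "AO", "AL", "AD", "AE"),
  life_expect = c(75.45, 60.37, 52.27, 77.83, 70.07, 77.37),
  birth_rate = c(10.1, 34.2, 45.5, 13.4, 20.9, 10.8),
  neonat_mortal_rate = c(2.4, 36.1, 49.6, 6.5, 1.5, 3.6),
  region = c("Latin America & Caribbean", "South Asia", "Sub-Saharan Africa",
             "Europe & Central Asia", "Europe & Central Asia",
             "Middle East & North Africa"),
  gdp_percap.x = c(47008.83, 1942.48, 7327.38, 11307.55, 30482.64, 67239.00)
)

# Hypothetical panel function: the label aesthetic is set here, so the
# ggpairs() mapping only needs the colour
textscatter_labelled <- function(data, mapping, ...) {
  ggplot(data, mapping) + geom_text(aes(label = iso2c), ...)
}

pm <- ggpairs(
  nations,
  title = "Scatterplot of Variables",
  columns = c("life_expect", "birth_rate", "neonat_mortal_rate", "gdp_percap.x"),
  mapping = aes(colour = region),
  lower = list(continuous = textscatter_labelled)
)

# Row 2, column 1: birth_rate against life_expect, drawn as text labels
lower_panel <- pm[2, 1]
print(lower_panel)
```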

Fill geom_area (ggplot2) with a gradient

I am having some trouble applying a gradient fill to my area plot.
The data is as below:
> df
year annual
1 1960 0.0100
2 1961 -0.2700
3 1962 -0.3450
4 1963 -0.6508
5 1964 -0.9458
6 1965 -0.2458
7 1966 0.9492
8 1967 0.5383
9 1968 0.6275
10 1969 0.0000
I've set up a colorRampPalette for the gradient, and I know this works.
spi.cols <- colorRampPalette(c("darkred","red","yellow","white","green","blue","darkblue"),space="rgb")
In the plot, I want the fill colours to follow the values in the annual column, so that it is easy to tell which values fall within certain boundaries. Right now, the plot seems to treat every value it fills as zero, and so fills everything in a single colour.
ggplot(df, aes(x = year)) +
geom_polygon(aes(y = annual, fill = annual)) +
theme_classic() +
scale_fill_gradientn(colours = spi.cols(12), limits = c(-2.5, 2.5), guide = "legend")
I have also specified the breaks I'd like in my gradient, but I'm not sure how to use them. I tried passing them to the values argument of scale_fill_gradientn, but this was unsuccessful.
spi.breaks <- c(-2.5,-2,-1.6,-1.3,-0.8,-0.5,0.5,0.8,1.3,1.6,2,2.5)
Any help would be much appreciated
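The thread has no answer here; the following is one common workaround, sketched under stated assumptions rather than taken from the original. A single polygon is drawn with a single fill colour, which is why everything comes out one colour. The sketch interpolates annual onto a fine grid of years with approx() (the 0.01 step and the name dense are arbitrary choices) and draws one thin column per grid point, each filled by its own value. For the custom breaks, the values argument of scale_fill_gradientn() expects positions between 0 and 1, one per colour, so spi.breaks is rescaled from the -2.5 to 2.5 limits with scales::rescale(); this lines up because spi.cols(12) yields exactly one colour per break.

```r
library(ggplot2)
library(scales)

# The ten rows shown in the question
df <- data.frame(
  year = 1960:1969,
  annual = c(0.0100, -0.2700, -0.3450, -0.6508, -0.9458,
             -0.2458, 0.9492, 0.5383, 0.6275, 0.0000)
)

spi.cols <- colorRampPalette(c("darkred", "red", "yellow", "white",
                               "green", "blue", "darkblue"), space = "rgb")
spi.breaks <- c(-2.5, -2, -1.6, -1.3, -0.8, -0.5, 0.5, 0.8, 1.3, 1.6, 2, 2.5)

# Linearly interpolate the series onto a fine grid of years
step <- 0.01
dense <- as.data.frame(approx(df$year, df$annual,
                              xout = seq(min(df$year), max(df$year), by = step)))
names(dense) <- c("year", "annual")

# One thin column per grid point, so each slice gets its own fill colour
p <- ggplot(dense, aes(x = year, y = annual, fill = annual)) +
  geom_col(width = step) +
  theme_classic() +
  scale_fill_gradientn(
    colours = spi.cols(12),
    # values: where each colour sits on a 0-1 scale spanning the limits
    values = rescale(spi.breaks, from = c(-2.5, 2.5)),
    limits = c(-2.5, 2.5)
  )
print(p)
```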

R ggplot2: show the names of the largest values on the x axis

Hi there, can anybody help me? I have a big data frame with two columns, country_dest and SumTotal (a value), and I'm trying to use the qplot function:
qplot(country_dest, SumTotal, data=Africa)
Brunei 58
Aruba 73
Cuba 95
Nicaragua 97
Turkmenistan 99
Saint Lucia 102
Honduras 153
Barbados 161
Haiti 165
Montenegro 175
I would like to draw a plot with only the names of the countries with the highest SumTotal (for example, the top 6 or 7) on the x axis. Is this possible?
Thank you in advance!
Using ggplot, just reorder by SumTotal:
library(ggplot2)
ggplot(data = Africa, aes(x = reorder(country_dest, -SumTotal), y = SumTotal)) + geom_bar(stat = "identity")
If you only want, say, the top 5, use arrange() and then subset:
library(dplyr)
Africa.ordered <- arrange(Africa, desc(SumTotal))
Africa.top5 <- Africa.ordered[1:5, ]
and then draw your plot with Africa.top5 in place of Africa.
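Putting both steps together, here is a minimal sketch that uses the ten rows from the question as data (n_top and Africa.top are hypothetical names; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity")):

```r
library(dplyr)
library(ggplot2)

# The ten rows shown in the question
Africa <- data.frame(
  country_dest = c("Brunei", "Aruba", "Cuba", "Nicaragua", "Turkmenistan",
                   "Saint Lucia", "Honduras", "Barbados", "Haiti", "Montenegro"),
  SumTotal = c(58, 73, 95, 97, 99, 102, 153, 161, 165, 175)
)

n_top <- 5  # how many countries to show on the x axis

# Sort by SumTotal, largest first, and keep the first n_top rows
Africa.top <- Africa %>%
  arrange(desc(SumTotal)) %>%
  head(n_top)

# reorder(..., -SumTotal) orders the bars from largest to smallest
p <- ggplot(Africa.top, aes(x = reorder(country_dest, -SumTotal), y = SumTotal)) +
  geom_col() +
  xlab("Destination country")
print(p)
```

Changing n_top to 6 or 7 gives the variants mentioned in the question.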

Group Data Plot- Nothing works from past questions

I have the following data
Geography Population.Estimate Energy.Consump Employed Year
1 Alameda County, California 1513228 3038.53227 676598 2010
2 Alpine County, California 1163 17.14083 387 2010
3 Amador County, California 37862 140.65325 15103 2011
4 Butte County, California 219973 722.73871 90130 2011
5 Calaveras County, California 45457 198.95724 17085 2012
6 Colusa County, California 21483 63.77387 9489 2012
This is just part of the data from 58 counties.
I want to make a box plot with population on the x axis and energy consumption on the y axis for the years 2010, 2011 and 2012. I tried a lot of things, but it just doesn't work. Please help me with the plots. I used qplot as well as ggplot. Nothing seems to work on this data :(
I tried this
qplot(factor(Year),data=Population,geom="bar",fill=Population.Estimate,weight=Energy_Consump,position="dodge", main = "Effect of Energy", xlab="Population",ylab="Energy")
I tried this too
ggplot(Population)+ geom_bar(aes(x=Housing.Units,y=Energy.Consump, fill=factor(Year)),stat="identity")
I am struggling to get it right. I tried other examples on Stack Overflow, since I am fairly new to R, but nothing seems to work.
Is this what you want?
library(ggplot2)
ggplot(data=Population, aes(x=Population.Estimate, y=Energy.Consump, fill=as.factor(Year))) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) + # black bar outlines; bars sharing an x value are placed side by side
xlab("Population") + ylab("Energy Consumption")
Or this?
ggplot(data=Population, aes(x=Year, y=Population.Estimate, fill=Geography)) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) +
xlab("Year") + ylab("Population Estimate")
Given the large gaps in scale, if you want both population and energy consumption on the same graph, IMO energy consumption per capita is better suited.
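A sketch of the per-capita idea, assuming the six rows shown in the question as data (Energy.PerCapita is a hypothetical new column, in the units of Energy.Consump per person). Since the question asks for a box plot per year, the sketch puts the year on the x axis; with the full 58-county data, each box would summarise the counties reported for that year.

```r
library(ggplot2)

# The six rows shown in the question (the full data has 58 counties)
Population <- data.frame(
  Geography = c("Alameda County, California", "Alpine County, California",
                "Amador County, California", "Butte County, California",
                "Calaveras County, California", "Colusa County, California"),
  Population.Estimate = c(1513228, 1163, 37862, 219973, 45457, 21483),
  Energy.Consump = c(3038.53227, 17.14083, 140.65325, 722.73871, 198.95724, 63.77387),
  Employed = c(676598, 387, 15103, 90130, 17085, 9489),
  Year = c(2010, 2010, 2011, 2011, 2012, 2012)
)

# Dividing by population puts counties of very different sizes on one scale
Population$Energy.PerCapita <- Population$Energy.Consump / Population$Population.Estimate

# factor(Year) makes the year a discrete axis with one box per year
p <- ggplot(Population, aes(x = factor(Year), y = Energy.PerCapita)) +
  geom_boxplot() +
  xlab("Year") + ylab("Energy consumption per capita")
print(p)
```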

R: Plot lines separately by one variable, colored by another

I'm sure this has been done many times, but clearly I'm not searching using the correct terms.
I have some time series data in R with columns like this:
country year deaths region global.region
1 Afghanistan 2006 0.095830775 Asia & Pacific Global South
2 Afghanistan 1994 0.127597064 Asia & Pacific Global South
3 Algeria 2000 0.003278038 Arab States Global South
4 Algeria 2001 0.003230578 Arab States Global South
5 Algeria 1998 0.006746176 Arab States Global South
6 Algeria 1999 0.019952364 Arab States Global South
...
Basically, I want to plot all the lines by country, but I want them coloured (and labelled in the legend) by region. I'm hoping to look at some regional trends in the data without trying to build an average model (partly because I want to see outliers, and partly because many of the countries have missing data, so a good regional model might be difficult for me to make at this point and, at best, misleading).
So in the end I'll have, for example, separate lines for Burkina Faso, Algeria, and Cote d'Ivoire plotted, but they'll all be orange. And I'll have separate lines for Afghanistan, Pakistan, and Iran, but they'll all be blue.
I'd prefer to do this with ggplot2, since that's the plotting library I am learning at the moment, but maybe there's a standard way of doing this in R that works across most plotting libraries?
Edit: Final solution: the group aesthetic (thanks @baptiste).
qplot(data=df, x=year, y=deaths, color=region, group=country) +
geom_line() +
xlab('Year') + ylab('Deaths per 100,000') + ggtitle('Deaths per 100,000 by country (WHO)')
Which makes:
[Figure: Deaths per 100,000 by country (WHO); one line per country, coloured by region]
Slightly different from your desired result, but here it goes:
ggplot(df, aes(x = year, y = deaths)) +
geom_line(aes(color = country, linetype = region))
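The group-aesthetic solution from the question can also be written with ggplot() instead of qplot() (qplot() is deprecated in recent ggplot2 releases). A minimal sketch, assuming the six rows shown in the question as data: group = country draws one line per country, while colour = region gives every country in a region the same colour.

```r
library(ggplot2)

# The six rows shown in the question
df <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Algeria", "Algeria", "Algeria", "Algeria"),
  year = c(2006, 1994, 2000, 2001, 1998, 1999),
  deaths = c(0.095830775, 0.127597064, 0.003278038, 0.003230578,
             0.006746176, 0.019952364),
  region = c("Asia & Pacific", "Asia & Pacific", "Arab States",
             "Arab States", "Arab States", "Arab States")
)

# group decides which points are joined; colour only decides the line colour
p <- ggplot(df, aes(x = year, y = deaths, colour = region, group = country)) +
  geom_line() +
  labs(x = "Year", y = "Deaths per 100,000", colour = "Region",
       title = "Deaths per 100,000 by country (WHO)")
print(p)
```

geom_line() connects points in order of x within each group, so the unsorted years in the data are not a problem.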
