Group Data Plot- Nothing works from past questions - r

I have the following data
Geography Population.Estimate Energy.Consump Employed Year
1 Alameda County, California 1513228 3038.53227 676598 2010
2 Alpine County, California 1163 17.14083 387 2010
3 Amador County, California 37862 140.65325 15103 2011
4 Butte County, California 219973 722.73871 90130 2011
5 Calaveras County, California 45457 198.95724 17085 2012
6 Colusa County, California 21483 63.77387 9489 2012
This is just part of the data from 58 counties.
I want to make a box plot with population on the x axis and energy consumption on the y axis for the years 2010, 2011, and 2012. I have tried many things with both qplot and ggplot, but nothing works on this data. Please help me with the plots.
I tried this
qplot(factor(Year),data=Population,geom="bar",fill=Population.Estimate,weight=Energy_Consump,position="dodge", main = "Effect of Energy", xlab="Population",ylab="Energy")
I tried this too
ggplot(Population)+ geom_bar(aes(x=Housing.Units,y=Energy.Consump, fill=factor(Year)),stat="identity")
I am fairly new to R and struggling to get this right. I tried other examples from Stack Overflow, but none of them worked.

Is this what you want?
ggplot(data=Population, aes(x=Population.Estimate, y=Energy.Consump, fill=as.factor(Year))) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) + # black outlines, bars dodged by year
xlab("Population") + ylab("Energy Consumption")
Not this?
ggplot(data=Population, aes(x=Year, y=Population.Estimate, fill=Geography)) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) +
xlab("Year") + ylab("Population")
Given the large gaps in scale, if you want both population and energy consumption on the same graph, energy consumption per capita is better suited, IMO.
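A minimal sketch of the per-capita idea, assuming the six sample rows from the question stand in for the full 58-county data. The column name `Energy.Per.Capita` is made up for this illustration, and the units of `Energy.Consump` are not stated in the question.

```r
library(ggplot2)

# The six sample rows shown in the question; the real data has 58 counties.
Population <- data.frame(
  Geography = c("Alameda County, California", "Alpine County, California",
                "Amador County, California", "Butte County, California",
                "Calaveras County, California", "Colusa County, California"),
  Population.Estimate = c(1513228, 1163, 37862, 219973, 45457, 21483),
  Energy.Consump = c(3038.53227, 17.14083, 140.65325,
                     722.73871, 198.95724, 63.77387),
  Employed = c(676598, 387, 15103, 90130, 17085, 9489),
  Year = c(2010, 2010, 2011, 2011, 2012, 2012)
)

# Hypothetical derived column: energy consumption divided by population.
Population$Energy.Per.Capita <-
  Population$Energy.Consump / Population$Population.Estimate

# One box per year; Year is converted to a factor so it is treated as discrete.
p <- ggplot(Population, aes(x = factor(Year), y = Energy.Per.Capita)) +
  geom_boxplot() +
  xlab("Year") + ylab("Energy consumption per capita")
print(p)
```

With only two counties per year in this sample the boxes are not very informative, but with all 58 counties each box would summarise the spread across counties for that year.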

Related

Making a spiral column chart with ggplot?

I am trying to make a spiral side-by-side bar chart like the following examples:
The previous one is from Spiral barplot using ggplot & coord_polar (Condegram), but I couldn't make it work for my data.
Using ggplot, this is as far as I have gotten:
library(tidyr) # gather() and %>%
library(ggplot2)
data <- read.table(text="year group1 group2
1973 25939 27147
1978 21086 23108
1989 28401 24010
1995 34601 25457
2000 38672 28894
2007 40874 34926
2009 43892 38169
2013 48028 39270
2014 47289 39948
2015 48261 41913
2016 49814 42373
2017 50346 42818",header=TRUE)
data$year <- as.character(data$year)
data <- data %>% gather(group, value, group1:group2)
ggplot(data)+
geom_bar(aes(x=year, y=value, fill=group), stat="identity", position = "dodge") +
coord_polar()
This produces an ugly spiral bar chart.
I'm not sure how to make the bottoms square and add the space the white spiral needs. Any help and explanation would be greatly appreciated!

Reorder Bar Chart Output in ggplot2 when using as.factor

I was hoping someone could help.
I have a DF as follows:
Year Winner
1930 Uruguay
1934 Italy
1938 Italy
1950 Uruguay
1954 Germany FR
1958 Brazil
1962 Brazil
1966 England
1970 Brazil
....
and so on
What I want to do is create a bar chart with ggplot2, but reorder it so the country with the most wins comes first.
The code I've used to generate my current graph is:
ggplot(data, aes(x=as.factor(Winner), fill=as.factor(Winner) )) +
geom_bar() +
theme(legend.position = "none")
I know there's something about reorder but I can't get it to work with the as.factor argument.
Thanks
I got around this problem using forcats:
library(forcats) # provides fct_infreq()
ggplot(data, aes(fct_infreq(Winner), fill=as.factor(Winner))) +
geom_bar()+
theme(legend.position = "none")
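For readers without forcats, a similar ordering can be sketched in base R by setting the factor levels from a sorted frequency table. This is an alternative to `fct_infreq()`, not what the answer above uses, and the vector `winners` only contains the sample rows shown in the question.

```r
# Winners from the sample rows in the question (the full list is longer).
winners <- c("Uruguay", "Italy", "Italy", "Uruguay", "Germany FR",
             "Brazil", "Brazil", "England", "Brazil")

# Count wins per country and sort from most to fewest.
counts <- sort(table(winners), decreasing = TRUE)

# Use the sorted names as the factor levels, so plots follow that order.
winner_f <- factor(winners, levels = names(counts))
print(levels(winner_f))
```

Mapping `winner_f` to the x aesthetic in the `ggplot()` call would then draw the bars in order of frequency, because ggplot2 follows the factor's level order.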

Create an Index Chart in R - relative starting point

I need to look at relative change in 2 groups of data which have very different scales.
I think the way forward is to set the first value in each group to 100% and express each later value as a proportion of it. I can then create a line chart to show the relative movement.
I would call this an index chart so may have missed existing questions.
However I don't know how to set my data up in R to do this.
My aggregated data is below. I want each group's 1999 value to be 100% and the subsequent years to be a percentage of that.
> Totals
year fips Emissions
1 1999 06037 6109.6900
2 2002 06037 7188.6802
3 2005 06037 7304.1149
4 2008 06037 6421.0170
5 1999 24510 403.7700
6 2002 24510 192.0078
7 2005 24510 185.4144
8 2008 24510 138.2402
I'm probably going to want to add a bar chart behind it to show weighting too as relative change is much more dramatic for smaller data. Tips on that are appreciated too but I've not searched for that yet as the above is the primary issue IMO.
Appreciate your help.
James
For example with dplyr:
library(dplyr)
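dat <-
Totals %>% # the aggregated data frame from the question
group_by(fips) %>%
mutate(ind = Emissions / first(Emissions)) # first() is the 1999 value, as rows are sorted by year within fips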
And using ggplot2 to plot a line chart:
library(ggplot2)
ggplot(dat, aes(x = year, y = ind, color = as.factor(fips))) +
geom_line()
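A variation on the calculation above, as a sketch: it picks the base year explicitly instead of relying on row order, and scales to 100 as the question asks. The data frame is rebuilt from the values in the question, with `fips` stored as character to keep the leading zero. The names `indexed` and `index` are made up for this example.

```r
library(dplyr)

# Aggregated data copied from the question.
Totals <- data.frame(
  year = c(1999, 2002, 2005, 2008, 1999, 2002, 2005, 2008),
  fips = rep(c("06037", "24510"), each = 4),
  Emissions = c(6109.6900, 7188.6802, 7304.1149, 6421.0170,
                403.7700, 192.0078, 185.4144, 138.2402)
)

# Within each fips, divide by that group's 1999 value and scale to 100.
indexed <- Totals %>%
  group_by(fips) %>%
  mutate(index = 100 * Emissions / Emissions[year == 1999]) %>%
  ungroup()

print(indexed)
```

This assumes every group has exactly one row for 1999; a group without a base-year row would make `mutate()` fail rather than silently index against the wrong year.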

R: Plot lines separately by one variable, colored by another

I'm sure this has been done many times, but clearly I'm not searching using the correct terms.
I have some time series data in R with columns like this:
country year deaths region global.region
1 Afghanistan 2006 0.095830775 Asia & Pacific Global South
2 Afghanistan 1994 0.127597064 Asia & Pacific Global South
3 Algeria 2000 0.003278038 Arab States Global South
4 Algeria 2001 0.003230578 Arab States Global South
5 Algeria 1998 0.006746176 Arab States Global South
6 Algeria 1999 0.019952364 Arab States Global South
...
Basically, I want to plot all the lines by country, but I want them colored (and labeled in the legend) by region. I'm hoping to look at some regional trends in the data without trying to build an average model. That is partly because I want to see outliers, and partly because a lot of the countries have missing data, so a regional model would be difficult for me to make at this point and might be misleading.
So in the end I'll have, for example, separate lines for Burkina Faso, Algeria, and Cote d'Ivoire plotted, but they'll all be orange. And I'll have separate lines for Afghanistan, Pakistan, and Iran, but they'll all be blue.
It is preferable that it's done with ggplot2 since that's the plotting library I am learning at the moment. But maybe there's a standard way of doing this in R that works across all (most) plot libraries?
Edit: Final solution: Group aesthetic. (Thanks @baptiste)
qplot(data=df, x=year, y=deaths, color=region, group=country) +
geom_line() +
xlab('Year') + ylab('Deaths per 100,000') + ggtitle('Deaths per 100,000 by country (WHO)')
Slightly different from your desired result, but here it goes:
ggplot(df, aes(x = year, y = deaths)) +
geom_line(aes(color = country, linetype = region))
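The group aesthetic from the question's edit can also be written with `ggplot()` directly. The sketch below uses only the six sample rows shown in the question, so it draws just two lines; with the full data, every country would get its own line and countries in the same region would share a color.

```r
library(ggplot2)

# Sample rows from the question (global.region omitted, as it is not plotted).
df <- data.frame(
  country = c("Afghanistan", "Afghanistan",
              "Algeria", "Algeria", "Algeria", "Algeria"),
  year = c(2006, 1994, 2000, 2001, 1998, 1999),
  deaths = c(0.095830775, 0.127597064, 0.003278038,
             0.003230578, 0.006746176, 0.019952364),
  region = c("Asia & Pacific", "Asia & Pacific", rep("Arab States", 4))
)

# group decides which points are joined into one line;
# color decides the line color and the legend entries.
p <- ggplot(df, aes(x = year, y = deaths, group = country, color = region)) +
  geom_line() +
  labs(x = "Year", y = "Deaths per 100,000",
       title = "Deaths per 100,000 by country (WHO)")
print(p)
```

Keeping `group` and `color` as separate aesthetics is what lets the number of lines differ from the number of legend entries.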

Trying to plot temperature and count data on same plot using xyplot?

I am using xyplot from lattice to make a plot that shows temperature change over time alongside count data. I am not sure whether ggplot2 would be better. My data is arranged like this:
Year (1998 1998 1999 2000 2001 2001 2002)
Low (2.777778 8.333330 10.555556 4.444444 26.388889 15.555556 12.500000)
Geese (2 14 10 16 7 10 15)
State (Arkansas California California California California Florida California)
I am stuck at this part of the code:
xyplot(c(geese,low)~year,subset=state=="California", par.settings=bwtheme, auto.key=TRUE)
The plot draws geese and low (temperature) with the same type of point, and if I add a line there is no separation between the two. Any help with this would be awesome.
To plot multiple series on the same plot, use + rather than c() to specify multiple y values. For example:
xyplot(geese + low ~year, subset=state=="California", auto.key=TRUE, type="b")
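A self-contained version of that call, as a sketch: the data frame name `birds` is made up, and the columns use the capitalized names from the data listing in the question, so adjust them if your columns are lowercase.

```r
library(lattice)

# Values copied from the question's data listing.
birds <- data.frame(
  Year = c(1998, 1998, 1999, 2000, 2001, 2001, 2002),
  Low = c(2.777778, 8.333330, 10.555556, 4.444444,
          26.388889, 15.555556, 12.500000),
  Geese = c(2, 14, 10, 16, 7, 10, 15),
  State = c("Arkansas", "California", "California", "California",
            "California", "Florida", "California")
)

# Geese + Low on the left of ~ plots two separate series against Year;
# type = "b" draws both points and lines, auto.key adds a legend.
p <- xyplot(Geese + Low ~ Year, data = birds,
            subset = State == "California",
            auto.key = TRUE, type = "b")
print(p)
```

Both series share one y axis here, so a count and a temperature are drawn on the same scale.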
