scaling x and y axis (geom_bar) - r

I am having trouble plotting what is seemingly a simple plot.
x <- read_excel("Desktop/Book1.xlsx",
                col_types = c("numeric", "numeric", "numeric"))
x1 <- gather(hospitals, key = "sector", value = "count", 2:3)
p <- ggplot(data = x1, aes(x = Years, y = count, fill = sector)) +
  geom_col(position="stack", stat="identity", width = 5, colour="black") +
  geom_text(aes(label=count), vjust=1, color="white", size=2) +
  guides(fill=FALSE) +
  scale_fill_grey() +
  theme_bw(base_size = 12)
p
The data is:
1 1946 Public hospitals 35
2 1984 Public hospitals 41
3 2000 Public hospitals 65
4 2001 Public hospitals 67
5 2002 Public hospitals 66
6 2003 Public hospitals 76
7 2004 Public hospitals 77
8 2005 Public hospitals 85
9 2006 Public hospitals 90
10 2007 Public hospitals 94
11 2008 Public hospitals 97
12 2009 Public hospitals 102
13 2010 Public hospitals 102
14 1946 Private hospitals NA
15 1984 Private hospitals 139
16 2000 Private hospitals 325
17 2001 Private hospitals 336
18 2002 Private hospitals 343
19 2003 Private hospitals 364
20 2004 Private hospitals 376
21 2005 Private hospitals 376
22 2006 Private hospitals 353
23 2007 Private hospitals 355
24 2008 Private hospitals 365
25 2009 Private hospitals 370
26 2010 Private hospitals 376
and I end up with this result!
First, how can I modify the x axis so that the bars are separated and shown only for the years I have data for? Can the stretch of the x axis around 1960 be omitted and the bars squeezed together to save space?
Second, how can the y axis be fixed? Some bars are taller than their values!

x1 %>%
  ggplot(aes(x = as.character(Years), y = count, fill = sector)) +
  geom_col(position = "stack", colour = "black") +
  # label colour is picked per row of x1, the data being plotted
  geom_text(aes(label = count), vjust = 1, size = 2,
            color = ifelse(x1$sector != "Public hospitals", "white", "black")) +
  guides(fill = "none") +
  scale_x_discrete(name = "Year") +
  scale_fill_grey() +
  theme_bw(base_size = 12)
Edit: Upon reconsideration I realized I did not properly stack the positioning of the text. It happens to look ok with this data, but that's just coincidence. To get the right positioning for the text, one approach is manual: we could sum up the cumulative height for each year:
x1 %>%
  group_by(Years) %>%
  mutate(cuml_count = cumsum(count)) %>%
  ungroup() %>% ....
  geom_text(aes(label = count, y = cuml_count), vjust = 1, size = 2,
            color = ifelse(x1$sector != "Public hospitals", "white", "black")) +
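The cumulative label heights can also be computed without dplyr. The following is a minimal sketch in base R, assuming a small hand-typed subset of the data above (`demo` is a hypothetical name) and assuming the rows are ordered so that the bottom segment of each stack comes first:

```r
# Hypothetical subset of the long-format data shown in the question
demo <- data.frame(
  Years  = c(1984, 2000, 1984, 2000),
  sector = c("Public hospitals", "Public hospitals",
             "Private hospitals", "Private hospitals"),
  count  = c(41, 65, 139, 325)
)

# Running total of count within each year, in row order.
# This is the height of the top edge of each stacked segment.
demo$cuml_count <- ave(demo$count, demo$Years, FUN = cumsum)

print(demo)
```

A year with a missing count (such as 1946 for private hospitals) gives NA for that segment and for any later segment in the same year, so those labels would not be drawn.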

Related

CREATE A TIME SERIES PLOT in r with ggplot

I have problems with coding of BIG DATA.
view(data)
Year Month Deaths
1998 1 200
1998 2 40
1998 3 185
1998 4 402
1998 5 20
1998 6 48
1998 7 290
1998 8 15
1998 9 252
1998 10 409
1998 11 233
1998 12 122
My data goes until 2014. I would like to create a time series plot. The x-axis should show only some of the years, in 5-year steps. The y-axis should show the monthly deaths for all of those years. How can I code that?
I am not sure if it is right because I didn't have any data. I have this from a programming book
data$date = as.Date(paste(data$Year, data$Month, 1), format = "%Y %m %d")
ggplot(data, aes(x = date, y = Deaths)) +
  geom_line() +
  ggtitle("Time series") +
  xlab("Year") +
  ylab("Deaths")
Update: if you want yearly breaks with monthly minor breaks, you can use
scale_x_date(date_breaks = "year", date_labels = "%Y", date_minor_breaks = "month")
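For the 5-year steps asked about in the question, the break positions can be computed explicitly and handed to scale_x_date(). This is only a sketch: the data frame below is a placeholder in which the 1998 values from the question are repeated for every later year, and `first_year`, `last_year` and `year_breaks` are names introduced here for illustration.

```r
# Placeholder data: the 1998 values from the question repeated for 1998-2014
data <- data.frame(
  Year   = rep(1998:2014, each = 12),
  Month  = rep(1:12, times = 17),
  Deaths = rep(c(200, 40, 185, 402, 20, 48, 290, 15, 252, 409, 233, 122),
               times = 17)
)

# First day of each month as a Date
data$date <- as.Date(paste(data$Year, data$Month, 1, sep = "-"),
                     format = "%Y-%m-%d")

# Breaks on 1 January of every year that is a multiple of 5,
# covering the full range of the data
first_year  <- 5 * floor(min(data$Year) / 5)
last_year   <- 5 * ceiling(max(data$Year) / 5)
year_breaks <- seq(as.Date(paste0(first_year, "-01-01")),
                   as.Date(paste0(last_year, "-01-01")),
                   by = "5 years")

# Plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = date, y = Deaths)) +
    geom_line() +
    scale_x_date(breaks = year_breaks, date_labels = "%Y") +
    labs(title = "Time series", x = "Year", y = "Deaths")
}
```

A shorter alternative is scale_x_date(date_breaks = "5 years", date_labels = "%Y"), in which case ggplot2 chooses where the breaks start.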

Graph to visualize mean group wise and pareto chart in R language

I have a dataset containing the regions of a country, the states in each region, and the sales in each state. I want to visualize the mean of that dataset by region, and also draw a Pareto chart to see which states contribute most to the overall regional sales. How can I do this in R? Please help, as I'm new to R.
#dput for dataset
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA,'ANDHRA PRADESH','KARNATAKA,'KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- C(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
The dataset somewhat looks like this
Region   State           Gdp
South    Tamil Nadu      89
South    Telangana       109
South    Karnataka       92
South    Andhra Pradesh  56
South    Kerala          43
Central  Madhya Pradesh  103
Central  Chattisgarh     26
Central  Orissa          41
North    Delhi           126
North    Punjab          56
North    Haryana         64
North    Uttarakhand     98
East     Assam           26
East     Mizoram         16
East     West Bengal     61
East     Bihar           40
West     Gujarat         61
West     Rajasthan       101
West     Maharashtra     191
West     Goa             38
You did not provide a desired output, so here is my guess at it..
library(data.table)
library(ggplot2)
# setDT(DT) #not needed if your data is already in data.table format
# Order decreasing Gdp
setorder(DT, -Gdp)
# Data wrangling
DT[, `:=`(meanGdp_region = mean(Gdp),
cumGdp = cumsum(Gdp)), by = Region]
DT[, State_f := factor(State, levels = State)]
# Plot
ggplot(data = DT, aes(x = State_f)) +
geom_col(aes(y = Gdp)) +
geom_line(aes(y = cumGdp, group = 1), color = "red") +
geom_hline(aes(yintercept = meanGdp_region), color = "blue") +
facet_wrap(~Region, nrow = 1, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(x = "")
sample data used
# Sample data (comma-separated so that state names with spaces stay in one column)
DT <- fread("Region,State,Gdp
South,Tamil Nadu,89
South,Telangana,109
South,Karnataka,92
South,Andhra Pradesh,56
South,Kerala,43
Central,Madhya Pradesh,103
Central,Chattisgarh,26
Central,Orissa,41
North,Delhi,126
North,Punjab,56
North,Haryana,64
North,Uttarakhand,98
East,Assam,26
East,Mizoram,16
East,West Bengal,61
East,Bihar,40
West,Gujarat,61
West,Rajasthan,101
West,Maharashtra,191
West,Goa,38")
Another output guess:
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA','ANDHRA PRADESH','KARNATAKA','KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- c(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
library(dplyr)
library(forcats)
library(ggplot2)
df <- data.frame(Region, State, sales)
df2 <- df %>%
  arrange(desc(sales)) %>%
  mutate(cumulative = cumsum(sales)) %>%
  # fix the bar order to the sorted row order
  mutate(State = fct_inorder(State))
ggplot(df2, aes(x=State)) +
geom_bar(aes(y=sales), fill='blue', stat="identity") +
geom_point(aes(y=cumulative), color = rgb(0, 1, 0), pch=16, size=1) +
geom_path(aes(y=cumulative, group=1), colour="slateblue1", lty=3, size=0.9) +
theme(axis.text.x = element_text(angle=90, vjust=0.6)) +
labs(title = "Pareto Plot", x = 'State', y = 'Count')
It's great that you want to explore R. I found a few mistakes: the vectors in the question will not work as written, because the closing ' is missing in a few places and you should use c instead of C. In the code below I colour the states by region, which differs from the previous answer, so you can choose what works for you.
library(ggplot2)
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA','ANDHRA PRADESH','KARNATAKA','KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- c(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
myDf <- data.frame(Region, State, sales, stringsAsFactors = FALSE)
str(myDf)
myDf <- myDf[order(myDf$sales, decreasing=TRUE), ]
myDf$State <- factor(myDf$State , levels=myDf$State)
myDf$cumulative <- cumsum(myDf$sales)
ggplot(myDf, aes(x = State)) +
geom_bar(aes(y = sales, fill = Region), stat = "identity") +
geom_point(aes(y = cumulative), color = rgb(0, 1, 0), pch = 16, size = 1) +
geom_path(aes(y = cumulative, group = 1), colour = "slateblue1", lty = 3, size = 0.9) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6)) +
labs(title = "Pareto Plot", x = 'States', y = 'Sales')
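The two quantities behind these plots, the mean per region and the cumulative share used in a Pareto chart, can be checked in base R before any plotting. A minimal sketch using the corrected vectors from the answers above; `region_means`, `pareto` and `cum_pct` are names introduced here for illustration:

```r
Region <- c('South','South','South','South','South','Central','Central','Central',
            'North','North','North','North','East','East','East','East',
            'West','West','West','West')
State <- c('TAMIL NADU','TELANGANA','ANDHRA PRADESH','KARNATAKA','KERALA',
           'MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND',
           'HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT',
           'RAJASTHAN','MAHARASHTRA','GOA')
sales <- c(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
myDf <- data.frame(Region, State, sales, stringsAsFactors = FALSE)

# Mean sales per region
region_means <- tapply(myDf$sales, myDf$Region, mean)

# Pareto ordering: largest state first, then cumulative share of the total
pareto <- myDf[order(myDf$sales, decreasing = TRUE), ]
pareto$cum_pct <- 100 * cumsum(pareto$sales) / sum(pareto$sales)

print(round(region_means, 2))
print(head(pareto, 5))
```

Plotting `cum_pct` instead of the raw cumulative sum puts the line on a 0 to 100 scale, which is the usual convention for Pareto charts.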

How to create cumulative precipitation vs. temperature graph in a single plot

I have historical data for precip vs. annual temperature. I want to plot them into cool & wet, warm and wet, cool and dry, warm and dry years. Can someone help me with this?
Year Precip annual temperature
1987 821 8.5
1988 441 8
1989 574 7.9
1990 721 12.4
1991 669 10.8
1992 830 10
1993 1105 7.8
1994 772 8
1995 678 6.7
1996 834 8
1997 700 11
1998 786 11.2
1999 612 12
2000 758 10.6
2001 833 11
2002 622 10.6
2003 656 10.7
2004 799 9.9
2005 647 10.8
2006 764 12
2007 952 12.5
2008 943 10.86
2009 610 12.8
2010 766 11
2011 717 11.3
2012 602 9.5
2013 834 10.6
2014 758 11
2015 841 11
2016 630 11.5
2017 737 11.2
Average 742.32 10.36
As Majid suggested, you need to give more detail so you can get better answers. At least, try to use dput() with your dataframe, so we can get a reproducible copy of it. Copying and pasting into Excel is not appropriate for this kind of question.
In any case, that graph can easily be done using the ggplot2 package. You plot each year at its X and Y coordinates and then manually add the lines and the titles for each category. You do need to establish the boundaries between cool/warm and dry/wet, of course.
library(ggplot2)
rain <- read.csv('~/data/rain.csv')
limit_humid <- 800
limit_warm <- 9.5
ggplot(rain, aes(x = temp, y = precip)) +
geom_text(aes(label = year)) +
geom_vline(xintercept = limit_warm) +
geom_hline(yintercept = limit_humid) +
annotate('text', label = 'bold("Cool and wet")', size = 4, parse = T,
x = min(rain$temp), y = max(rain$precip)) +
annotate('text', label = 'bold("Warm and wet")', size = 4, parse = T,
x = max(rain$temp), y = max(rain$precip)) +
annotate('text', label = 'bold("Cool and dry")', size = 4, parse = T,
x = min(rain$temp), y = min(rain$precip)) +
annotate('text', label = 'bold("Warm and dry")', size = 4, parse = T,
x = max(rain$temp), y = min(rain$precip)) +
theme_classic() +
labs(x = 'Average Temperature (°C)',
y = 'Cumulative precipitation (mm)')
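The same thresholds can also be used to label each year with its category, which is handy for colouring points or counting years per quadrant. A minimal sketch in base R, using four rows copied from the question's table; treating values exactly on a threshold as warm or wet is an assumption made here, and `category` is a column name introduced for illustration:

```r
# Four rows copied from the table in the question
rain <- data.frame(
  year   = c(1987, 1988, 1990, 2007),
  precip = c(821, 441, 721, 952),
  temp   = c(8.5, 8, 12.4, 12.5)
)

limit_humid <- 800   # precipitation threshold from the answer above (mm)
limit_warm  <- 9.5   # temperature threshold from the answer above (degrees C)

# Assumption: a value equal to the threshold counts as warm / wet
rain$category <- paste(
  ifelse(rain$temp   >= limit_warm,  "Warm", "Cool"),
  "and",
  ifelse(rain$precip >= limit_humid, "wet",  "dry")
)

print(rain)
```

With this column in place, aes(colour = category) in the ggplot call would colour the year labels by quadrant.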

Line chart issues - plot looks "funny" (ggplot2)

I have a large dataframe (CO2_df) with many years for many countries, and tried to plot a graph with ggplot2. This graph will have 6 curves + an aggregate curve. However, my graph looks pretty "funny" and I have no idea why.
The data looks like this (excerpt):
x y x1 x2 x4 x6
1553 1993 0.00000 CO2 Austria 6 6 - Other Sector
1554 2006 0.00000 CO2 Austria 6 6 - Other Sector
1555 2015 0.00000 CO2 Austria 6 6 - Other Sector
2243 1998 12.07760 CO2 Austria 5 5 - Waste management
2400 1992 11.12720 CO2 Austria 5 5 - Waste management
2401 1995 11.11040 CO2 Austria 5 5 - Waste management
2402 2006 10.26000 CO2 Austria 5 5 - Waste management
2489 1998 0.00000 CO2 Austria 6 6 - Other Sector
I have used this code:
ggplot(data=CO2_df, aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")
scale_color_brewer(palette="Dark2")
CO2_df %>%
group_by(x) %>%
mutate(sum.y = sum(y)) %>%
ggplot(aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")+
scale_color_brewer(palette="Dark2")+
geom_line(aes(y = sum.y), color = "black")
My questions
1) Why does it look like this and how can I solve it?
2) I have no idea why the value on the y axis are close to zero. They are not...
3) How can I add an entry to the legend for the aggregate line?
Thank you for any sort of help!
Nordsee
What about something like this:
CO2_df %>%                                   # data
  group_by(x, x6) %>%                        # group by year and sector
  summarise(y = sum(y)) %>%                  # sum per group
  ggplot(aes(x = x, y = y)) +                # plot
  geom_line(aes(group = x6, color = x6)) +
  # here you can put a summary line, like sum, or mean, and so on
  # (the argument is `fun` in ggplot2 >= 3.3.0; older versions call it `fun.y`)
  stat_summary(fun = sum, na.rm = TRUE, color = 'black', geom = 'line') +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("C02 Emissions") +
  labs(colour = "Sectors") +
  scale_color_brewer(palette = "Dark2")
I used modified data to check the behaviour: the sectors share the same years and have very different values, so the effect is easy to see:
CO2_df <- read.table(text ="
x y x1 x2 x4 x6
1553 1993 20 CO2 'Austria' 6 '6 - Other Sector'
1554 1994 23 CO2 'Austria' 6 '6 - Other Sector'
1555 1995 43 CO2 'Austria' 6 '6 - Other Sector'
2243 1993 12.07760 CO2 'Austria' 5 '5 - Waste management'
2400 1994 11.12720 CO2 'Austria' 5 '5 - Waste management'
2401 1995 11.11040 CO2 'Austria' 5 '5 - Waste management'
2402 1996 10.26000 CO2 'Austria' 5 '5 - Waste management'
2489 1996 50 CO2 'Austria' 6 '6 - Other Sector'", header = T)
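For the third question (a legend entry for the aggregate line), one possible approach is to compute the yearly total as ordinary rows and give it its own label in the grouping column, so that it is drawn and listed like any other sector. A minimal sketch in base R using the modified values above, retyped as a data frame; `totals`, `plot_df` and the label "Total" are names introduced here for illustration:

```r
# The modified data from above, reduced to the three columns that are plotted
CO2_df <- data.frame(
  x  = c(1993, 1994, 1995, 1993, 1994, 1995, 1996, 1996),
  y  = c(20, 23, 43, 12.0776, 11.1272, 11.1104, 10.26, 50),
  x6 = c(rep("6 - Other Sector", 3),
         rep("5 - Waste management", 4),
         "6 - Other Sector")
)

# One total per year across all sectors
totals <- aggregate(y ~ x, data = CO2_df, FUN = sum)
totals$x6 <- "Total"

# Sector rows and total rows in one data frame
plot_df <- rbind(CO2_df[, c("x", "y", "x6")],
                 totals[, c("x", "y", "x6")])

print(totals)
```

Passing `plot_df` to ggplot() with aes(colour = x6, group = x6) should then show "Total" in the legend alongside the sectors.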

Drawing colored US State map with cut_number() in R

I have a dataframe called "drawdata":
GeoName Ranking
1 Alabama 15
2 Alaska 2
3 Arizona 28
4 Arkansas 12
5 California 19
6 Colorado 7
7 Connecticut 42
8 Delaware 37
9 District of Columbia 9
10 Florida 38
11 Georgia 11
12 Hawaii 48
13 Idaho 10
14 Illinois 16
15 Indiana 26
16 Iowa 34
17 Kansas 27
18 Kentucky 20
19 Louisiana 4
20 Maine 51
21 Maryland 30
22 Massachusetts 39
23 Michigan 14
24 Minnesota 23
25 Mississippi 41
26 Missouri 32
27 Montana 25
28 Nebraska 21
29 Nevada 45
30 New Hampshire 47
31 New Jersey 33
32 New Mexico 5
33 New York 44
34 North Carolina 13
35 North Dakota 31
36 Ohio 35
37 Oklahoma 6
38 Oregon 18
39 Pennsylvania 40
40 Rhode Island 49
41 South Carolina 29
42 South Dakota 46
43 Tennessee 43
44 Texas 3
45 Utah 17
46 Vermont 50
47 Virginia 8
48 Washington 24
49 West Virginia 22
50 Wisconsin 36
51 Wyoming 1
And I want to draw a US State map with different colors for each ranking. The code I have is:
names(drawdata) = c('region','value')
drawdata[,1] = tolower(drawdata[,1])
states = data.frame(state.center, state.abb)
states_map = map_data("state")
df = merge(drawdata, states_map, by = "region")
df$num = 49
p1 = ggplot(data = df, aes(x = long, y = lat, group = group))
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
p1 = p1 + geom_path(colour = 'gray', linestyle = 2)
p1 = p1 + scale_fill_brewer('', palette = 'PuRd')
p1 = p1 + coord_map()
p1 = p1 + scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL)
p1 = p1 + theme(legend.position="none")
p1 = p1 + geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
p1
This perfectly works if 'num', or the number of colors to fill, is small. However, when I set 'num=49', then it produces an error:
Error in cut.default(x, breaks(x, "n", n), include.lowest = TRUE, ...) :
'breaks' are not unique
When I alter the code from
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
to
p1 = p1 + geom_polygon(aes(fill = cut_number(unique(value), num[1])))
then it gives me a different error:
Error: Aesthetics must either be length one, or the same length as the data
Problems:cut_number(unique(value), num[1])
I want a map where each of the 49 states has a different color reflecting its 'Ranking'. Any help is much appreciated!
Brewer palettes deliberately have small maximums (generally < 12) since it's pretty much impossible for humans to map the subtle differences to the discrete values you have. You can achieve what you're looking for by "faking" it with scale_fill_gradient2 (NOTE: I deliberately left the legend in as you should too):
library(ggplot2)
names(drawdata) <- c('region','value')
drawdata[,1] <- tolower(drawdata[,1])
states <- data.frame(state.center, state.abb)
states <- states[!(states$state.abb %in% c("AK", "HI")),] # they aren't part of states_map
states_map <- map_data("state")
p1 <- ggplot()
# borders
p1 <- p1 + geom_map(data=states_map, map=states_map,
aes(x=long, y=lat, map_id=region),
color="white", size=0.15)
# fills
p1 <- p1 + geom_map(data=drawdata, map=states_map,
aes(fill=value, map_id=region),
color="white", size=0.15)
# labels
p1 <- p1 + geom_text(data=states,
aes(x=x, y=y, label=state.abb, group=NULL), size=2)
# decent projection
p1 <- p1 + coord_map("albers", lat0=39, lat1=45)
p1 <- p1 + scale_fill_gradient2(low="#f7f4f9", mid="#df65b0", high="#67001f")
# better theme
p1 <- p1 + labs(x=NULL, y=NULL)
p1 <- p1 + theme_bw()
p1 <- p1 + theme(panel.grid=element_blank())
p1 <- p1 + theme(panel.border=element_blank())
p1 <- p1 + theme(axis.ticks=element_blank())
p1 <- p1 + theme(axis.text=element_blank())
p1
You can get an even better result with scale_fill_distiller, which does a lot behind the scenes to let you use a Color Brewer palette with continuous data (I'd argue you do not have continuous data, though):
p1 <- p1 + scale_fill_distiller(palette="PuRd")
I'd strongly suggest continuing to use cut like you had originally and having a max of 9 breaks to fit into the Color Brewer palette you're trying to work with. In reality, folks are still going to need a table to really grok the rankings (never assume Americans know either state shapes, locations or even the two-letter abbreviations for them), so I'd also pretty much just suggest using an actual table with full names at least with this choropleth if not in place of it.
Note also that the way you're trying to build the map deliberately excluded Alaska, Hawaii and the District of Columbia. You'll need to use a real shapefile and something like I cover here to get them to show up nicely.
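The "'breaks' are not unique" error in the question most likely arises because the binning is done after the merge, when every state's value is repeated once per polygon vertex and many quantiles coincide. Binning once per state, before merging, avoids that. A minimal sketch in base R with at most 9 bins, assuming a hypothetical stand-in for drawdata (placeholder region names, rankings 1 to 51); `n_bins` and `bin` are names introduced here:

```r
# Hypothetical stand-in for drawdata: one row per state, rankings 1..51
drawdata <- data.frame(region = paste0("state_", 1:51), value = 1:51)

# Nine roughly equal-sized groups, computed on one row per state
n_bins <- 9
breaks <- quantile(drawdata$value, probs = seq(0, 1, length.out = n_bins + 1))
drawdata$bin <- cut(drawdata$value, breaks = breaks, include.lowest = TRUE)

print(table(drawdata$bin))
```

After merging with the map data, aes(fill = bin) together with scale_fill_brewer(palette = 'PuRd') stays within the 9 colours that palette provides.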
If you want different colors for each state, using a gradient, you can work with scale_fill_gradient. Here is one version, using green and red at the ends of the gradient, so that each state is on that scale.
ggplot(data = df, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = value)) +
geom_path(colour = 'gray', linetype = 2) +
scale_fill_gradient(low = "green", high = "red") +
coord_map() +
scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL) +
theme(legend.position="none") +
geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)

Resources