Modify x axis label for each facet - r

I have this mosaic plot.
I'd like each facet to show x-axis labels only for the individuals (regions) that actually appear in that facet.
For example, you can see that the last facet has only 7 bars, so I'd like it to show x-axis labels for those 7 bars only.
Hope I have been clear enough.
Here's my code and data:
library(ggplot2)
library(ggmosaic)  # geom_mosaic() and product()
p <- ggplot(data = newdata) +
  geom_mosaic(aes(weight = frequency, x = product(region), fill = factor(categ)), na.rm = TRUE) +
  facet_grid(~cutt) + theme(axis.text.x = element_text(angle = 90, hjust = .1)) +
  guides(fill = guide_legend(title = "Type of Crime", reverse = TRUE))
head(newdata)
region categ frequency median_income cutt vec
1 alabama burglary 0.25773 42917 39k-51k 0
2 alabama larceny 0.67646 42917 39k-51k 0
3 alabama motor_veichle_theft 0.06581 42917 39k-51k 0
4 arizona burglary 0.20239 50036 39k-51k 0
5 arizona larceny 0.71590 50036 39k-51k 0
6 arizona motor_veichle_theft 0.08171 50036 39k-51k 0
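No answer was captured for this question. Below is a minimal sketch of one common approach. It rests on two assumptions: first, that a plain stacked bar chart (geom_col()) is an acceptable stand-in for geom_mosaic(); second, that the toy rows resemble newdata (the colorado rows and the "51k-60k" band are made-up placeholders). With scales = "free_x", each facet keeps only the x values that occur in its own rows, and space = "free_x" sizes each panel in proportion to its number of bars. I have not checked how geom_mosaic() itself behaves with free scales.

```r
# Toy data shaped like newdata; the colorado rows and the "51k-60k" band are hypothetical
newdata <- data.frame(
  region    = rep(c("alabama", "arizona", "colorado"), each = 3),
  categ     = rep(c("burglary", "larceny", "motor_veichle_theft"), times = 3),
  frequency = c(0.25773, 0.67646, 0.06581,
                0.20239, 0.71590, 0.08171,
                0.20000, 0.70000, 0.10000),  # colorado values are made up
  cutt      = rep(c("39k-51k", "39k-51k", "51k-60k"), each = 3)
)

# The regions each facet should label on its x-axis
regions_per_facet <- lapply(split(newdata$region, newdata$cutt), unique)
print(regions_per_facet)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(newdata, aes(x = region, y = frequency, fill = factor(categ))) +
    geom_col() +
    # free_x drops regions with no rows in a facet; space = "free_x" sizes panels by bar count
    facet_grid(~cutt, scales = "free_x", space = "free_x") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    guides(fill = guide_legend(title = "Type of Crime", reverse = TRUE))
  print(p)
}
```

If you need to keep geom_mosaic(), the first thing I would test is the same facet_grid(~cutt, scales = "free_x", space = "free_x") call on your original plot.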

Related

How do I fill certain counties on a US map in R?

I am trying to construct a map of the eastern US with the counties lying in Appalachia highlighted as a certain color, while non-Appalachian counties are left white. I have constructed a county map of the eastern US using the following code:
library(usmap)
library(maps)
library(ggplot2)
us.counties = map_data('county')
head(us.counties)
#> long lat group order region subregion
#> 1 -86.50517 32.34920 1 1 alabama autauga
#> 2 -86.53382 32.35493 1 2 alabama autauga
#> 3 -86.54527 32.36639 1 3 alabama autauga
#> 4 -86.55673 32.37785 1 4 alabama autauga
#> 5 -86.57966 32.38357 1 5 alabama autauga
#> 6 -86.59111 32.37785 1 6 alabama autauga
plot_usmap("counties",
include = c(.east_north_central, .east_south_central, .south_atlantic,
.south_region, .northeast_region),
exclude = c('TX', 'AR', 'LA', 'OK'))
This returned a map of the eastern US with the county boundaries outlined.
I also have the following data frame, appalachian.counties, which lists every US county in Appalachia by name together with the state it is in:
> head(appalachian.counties)
region subregion
1 alabama bibb
2 alabama blount
3 alabama calhoun
4 alabama chambers
5 alabama cherokee
6 alabama chilton
I would like to construct a map that looks like the blank map above, but with the Appalachian counties listed in appalachian.counties filled in blue, and the Appalachian counties in Kentucky specifically filled in red. Is this possible?
You could try this:
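library(ggplot2)  # map_data() and the plotting functions
library(maps)     # county polygons used by map_data()
us.counties <- map_data('county')
# Outline every county in the states that contain Appalachian counties
states <- us.counties[us.counties$region %in% appalachian.counties$region,]
# Match on "state county" so same-named counties in different states are not confused
app <- us.counties[paste(us.counties$region, us.counties$subregion) %in%
                   paste(appalachian.counties$region, appalachian.counties$subregion),]
ken <- app[app$region == "kentucky",]
ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "gray75") +
  geom_polygon(fill = "blue", data = app, colour = "white") +  # all Appalachian counties
  geom_polygon(fill = "red", data = ken, colour = "white") +   # Kentucky ones drawn on top
  coord_equal() +
  theme_void()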
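An alternative sketch, in case you prefer a single polygon layer with a legend: classify each county once into a category column and map that column to fill with scale_fill_manual(). The helper classify_counties() and the small demo/app_demo tables are hypothetical, made up for illustration. The real appalachian.counties table is used instead when it exists in your session.

```r
# Hypothetical helper: label each county row "Kentucky Appalachia", "Appalachia" or "Other"
classify_counties <- function(counties, app) {
  key     <- paste(counties$region, counties$subregion)  # "state county" composite key
  app_key <- paste(app$region, app$subregion)
  in_app  <- key %in% app_key
  ifelse(in_app & counties$region == "kentucky", "Kentucky Appalachia",
         ifelse(in_app, "Appalachia", "Other"))
}

# Toy tables for illustration only (app_demo is a tiny subset, not the full list)
demo <- data.frame(region    = c("alabama", "alabama", "kentucky", "ohio"),
                   subregion = c("autauga", "bibb", "bell", "franklin"))
app_demo <- data.frame(region = c("alabama", "kentucky"), subregion = c("bibb", "bell"))
demo$area <- classify_counties(demo, app_demo)
print(demo)

if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("maps", quietly = TRUE)) {
  library(ggplot2)
  app <- if (exists("appalachian.counties")) appalachian.counties else app_demo
  counties <- map_data("county")
  counties <- counties[counties$region %in% app$region, ]  # same state subset as the answer
  counties$area <- classify_counties(counties, app)
  p <- ggplot(counties, aes(long, lat, group = group, fill = area)) +
    geom_polygon(colour = "gray75") +
    scale_fill_manual(values = c("Appalachia" = "blue",
                                 "Kentucky Appalachia" = "red",
                                 "Other" = "white")) +
    coord_quickmap() +
    theme_void()
  print(p)
}
```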

ggplot2 barplot two data frames comparison

I want to create a barplot in R that compares the same value across two different data frames.
My data looks like this:
First DF:
2004 2005 2006 unit region
1 1500 1000 2000 X region1
2 1000 2500 2800 Y region1
3 2000 2050 1900 X region2
4 2200 2100 2000 Y region2
etc.
Second DF:
2004 2005 2006 unit region
1 5 10 12 PP region1
2 3 5 8 SS region1
3 8 12 11 PP region2
4 7 5 5 SS region2
etc.
What I want is a visual comparison of:
Barplot (clustered): region1 unit X from the first DF against the same region1 unit PP from the second DF, for the years 2004, 2005 and 2006.
Line chart: the same data as above.
Barplot (clustered): a set of 10 regions with unit Y from the first DF against the same 10 regions with unit SS from the second DF, for the years 2004, 2005 and 2006.
If anyone can help, I would much appreciate it; I've been trying to do this for the entire day without being able to move ahead.
Thanks !!!
df1 <- read.table(text = "
2004 2005 2006 unit region
1500 1000 2000 X region1
1000 2500 2800 Y region1
2000 2050 1900 X region2
2200 2100 2000 Y region2", header = TRUE, check.names = FALSE)  # keep "2004" etc. as column names
df2 <- read.table(text = "
2004 2005 2006 unit region
5 10 12 PP region1
3 5 8 SS region1
8 12 11 PP region2
7 5 5 SS region2", header = TRUE, check.names = FALSE)
# Barplot (clustered) - region1 unit X with the same region1 from second DF unit PP. Years (2004, 2005, 2006)
df1$df <- 1
df2$df <- 2
library(reshape2)
library(ggplot2)
df <- rbind(df1, df2)
df <- melt(df, id.vars = c("region", "unit", "df"))
# Keep only region1, unit X (first DF) and unit PP (second DF)
r1 <- df[df$region == "region1" & df$unit %in% c("X", "PP"), ]
ggplot(r1, aes(variable, value)) +
  geom_bar(aes(fill = factor(df)), position = "dodge", stat = "identity")
# Line chart the same data as above
# (lines use colour rather than fill; group joins each DF's points across the years)
ggplot(r1, aes(variable, value)) +
  geom_line(aes(colour = factor(df), group = factor(df)))
# Barplot (clustered) - a set of 10 regions with unit Y with the same 10 regions from second table unit SS. Years (2004, 2005, 2006) I would like to have a barplot (clustered barplot).
cat("For this one you'd need to provide a suitable example, as the current example has only 2 regions")
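For the third request, here is a sketch that does not depend on how many regions there are. It picks the regions in a vector, keeps unit Y from the first table and unit SS from the second, reshapes to long format in base R, and dodges the bars by source within one facet per region. The regions vector and the to_long() helper are my own placeholders; substitute your 10 region names.

```r
df1 <- read.table(text = "
2004 2005 2006 unit region
1500 1000 2000 X region1
1000 2500 2800 Y region1
2000 2050 1900 X region2
2200 2100 2000 Y region2", header = TRUE, check.names = FALSE)
df2 <- read.table(text = "
2004 2005 2006 unit region
5 10 12 PP region1
3 5 8 SS region1
8 12 11 PP region2
7 5 5 SS region2", header = TRUE, check.names = FALSE)

years   <- c("2004", "2005", "2006")
regions <- c("region1", "region2")  # placeholder: put your 10 regions here

sel1 <- df1[df1$unit == "Y"  & df1$region %in% regions, ]
sel2 <- df2[df2$unit == "SS" & df2$region %in% regions, ]

# Hypothetical helper: one row per region/year, tagged with the table it came from
to_long <- function(d, source) {
  do.call(rbind, lapply(years, function(yr)
    data.frame(region = d$region, source = source, year = yr, value = d[[yr]])))
}
long <- rbind(to_long(sel1, "df1"), to_long(sel2, "df2"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = year, y = value, fill = source)) +
    geom_col(position = "dodge") +  # clustered bars: one bar per source within each year
    facet_wrap(~region)
  print(p)
}
```

Because the two tables are on very different scales (thousands versus single digits), the df2 bars will be barely visible on a shared axis; dividing each series by, say, its 2004 value before plotting is one way to make them comparable.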

Sorting Y Axis Values ggplot

I'm trying to create a dotplot where countries are listed on my Y axis from A-Z top to bottom. The medal count will be the X axis for each of the four plots, one each for gold, silver, bronze, and total. Of course, ggplot prefers to plot countries from Z-A and despite reading all about the problem, I haven't resolved the issue. I appreciate any straightforward help on both the coding and comprehension fronts.
library(reshape2)  # melt()
library(ggplot2)
mdat <- melt(raw, value.name = "Count", variable.name = "Place", id.var = "Country")
mdat[, "Place"] <- factor(mdat[, "Place"], levels=c("Gold", "Silver", "Bronze", "Total"))
##I know my problem is likely on or around the above line ##
plot1 <- ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point() +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33", "#999999", "#CC6600", "#000000"))
print(plot1)
Country Place Count
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
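You have to sort the levels of Country before you plot: ggplot2 draws the first level of a discrete y-axis at the bottom, so for A-Z to read from top to bottom the levels must be in reverse alphabetical order. Also, there is no Total level in the data you provided. The following approach should give you the desired result: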
Reading the data (including a Total level for the Place variable):
mdat <- read.table(text="Country Place Count
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
Algeria Total 10
Argentina Total 12
Armenia Total 3", header=TRUE)
Sorting the levels of the Country variable:
mdat$Country <- factor(mdat$Country,levels=sort(unique(mdat$Country),decreasing=TRUE))
Getting your Place variable in the correct order (the facet and colour order follow the factor levels):
mdat$Place <- factor(mdat$Place, levels = c("Gold", "Silver", "Bronze", "Total"))
Creating the plot:
library(ggplot2)
ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point(size=4) +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33","#999999","#CC6600","#000000"))
which gives the following plot:
As you melted your data already, I suspect that there is no Total variable in the raw data frame. You can calculate it before melting with:
raw$Total <- rowSums(raw[, c("Gold", "Silver", "Bronze")])
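Putting the pieces together, here is a sketch that starts from a wide raw table. It assumes raw has one row per country and columns named Gold, Silver and Bronze (the values are the ones from the question). It computes Total before reshaping, reshapes with base R instead of melt(), and sets both factor orders explicitly.

```r
# Wide table assumed to look like `raw`: one row per country, one column per medal
raw <- data.frame(Country = c("Algeria", "Argentina", "Armenia"),
                  Gold = c(4, 5, 1), Silver = c(2, 5, 2), Bronze = c(4, 2, 0))
raw$Total <- rowSums(raw[, c("Gold", "Silver", "Bronze")])

# Wide -> long in base R: columns are stacked in the order given by `places`
places <- c("Gold", "Silver", "Bronze", "Total")
mdat <- data.frame(
  Country = rep(raw$Country, times = length(places)),
  Place   = rep(places, each = nrow(raw)),
  Count   = unlist(raw[, places], use.names = FALSE)
)

# Facet and colour order follow the levels of Place
mdat$Place <- factor(mdat$Place, levels = places)
# First level is drawn at the bottom, so reverse alphabetical puts Algeria on top
mdat$Country <- factor(mdat$Country, levels = rev(sort(unique(mdat$Country))))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
    geom_point(size = 4) +
    facet_grid(. ~ Place) + theme_bw() +
    scale_colour_manual(values = c("#FFCC33", "#999999", "#CC6600", "#000000"))
  print(p)
}
```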

Change colour scheme for ggplot geom_polygon in R

I'm creating a map using the maps library and ggplot's geom_polygon. I'd simply like to change the default blue, red, purple colour scheme to something else. I'm extremely new to ggplot so please forgive if I'm just not using the right data types. Here's what the data I'm using looks like:
> head(m)
region long lat group order subregion Group.1 debt.to.income.ratio.mean ratio total
17 alabama -87.46201 30.38968 1 1 <NA> alabama 12.4059 20.51282 39
18 alabama -87.48493 30.37249 1 2 <NA> alabama 12.4059 20.51282 39
19 alabama -87.52503 30.37249 1 3 <NA> alabama 12.4059 20.51282 39
20 alabama -87.53076 30.33239 1 4 <NA> alabama 12.4059 20.51282 39
21 alabama -87.57087 30.32665 1 5 <NA> alabama 12.4059 20.51282 39
22 alabama -87.58806 30.32665 1 6 <NA> alabama 12.4059 20.51282 39
> head(v)
Group.1 debt.to.income.ratio.mean ratio region total
alabama alabama 12.40590 20.51282 alabama 39
alaska alaska 11.05333 33.33333 alaska 6
arizona arizona 11.62867 25.55556 arizona 90
arkansas arkansas 11.90300 5.00000 arkansas 20
california california 11.00183 32.59587 california 678
colorado colorado 11.55424 30.43478 colorado 92
Here's the code:
library(ggplot2)
library(maps)
states <- map_data("state")
m <- merge(states, v, by="region")
m <- m[order(m$order),]
p<-qplot(long, lat, data=m, group=group, fill=ratio, geom="polygon")
I've tried the below and more:
cols <- c("8" = "red","4" = "blue","6" = "darkgreen", "10" = "orange")
p + scale_colour_manual(values = cols)
p + scale_colour_brewer(palette="Set1")
p + scale_color_manual(values=c("#CC6666", "#9999CC"))
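The problem is that you are adding colour scales, but the plot maps ratio to the fill aesthetic, so the scale_colour_*() calls have nothing to act on. Since ratio is continuous, use a continuous fill scale: scale_fill_gradient() for a two-colour gradient and scale_fill_gradient2() for a diverging three-colour gradient (low, mid, high):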
p + scale_fill_gradient(low = "pink", high = "green") #UGLY COLORS!!!
I was getting errors from scale_fill_brewer() complaining that a continuous variable was supplied when a discrete variable was expected (Brewer fill scales are discrete). One easy fix is to create discrete bins with cut() and then use that as the fill aesthetic:
m$breaks <- cut(m$ratio, 5) #Change to number of bins you want
p <- qplot(long, lat, data = m, group = group, fill = breaks, geom = "polygon")
p + scale_fill_brewer(palette = "Blues")
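Here is a sketch of both options, using the six rows of v shown in the question. Cutting v before the merge at fixed, readable break points (0, 10, 20, 30, 40 are my assumption; pick your own) gives a discrete variable for scale_fill_brewer(). Alternatively, scale_fill_distiller() applies a Brewer palette to the continuous ratio directly, with no binning. The map part runs only if ggplot2 and maps are installed.

```r
# The six rows of v shown in the question (only the columns needed here)
v <- data.frame(region = c("alabama", "alaska", "arizona", "arkansas", "california", "colorado"),
                ratio  = c(20.51282, 33.33333, 25.55556, 5.00000, 32.59587, 30.43478))

# Discrete bins with readable edges instead of the computed edges of cut(x, 5)
v$breaks <- cut(v$ratio, breaks = c(0, 10, 20, 30, 40))
print(table(v$breaks))

if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("maps", quietly = TRUE)) {
  library(ggplot2)
  states <- map_data("state")
  m <- merge(states, v, by = "region")  # breaks is carried along by the merge
  m <- m[order(m$order), ]
  base <- ggplot(m, aes(long, lat, group = group)) + coord_quickmap()
  # Discrete: Brewer palette on the bins; drop = FALSE keeps empty bins in the legend
  print(base + geom_polygon(aes(fill = breaks)) +
          scale_fill_brewer(palette = "Blues", drop = FALSE))
  # Continuous: Brewer palette interpolated over ratio, no binning needed
  print(base + geom_polygon(aes(fill = ratio)) +
          scale_fill_distiller(palette = "Blues", direction = 1))
}
```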

Time Series in R with ggplot2

I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time-series plot when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value, group = 1)) + geom_line() + xlab("") + ylab("")  # group = 1: variable is a factor
However, is it possible to generate a time-series plot from the first object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_x_date() et al.
I am guessing that with "time series plot" you mean you want to get a bar chart instead of a line chart?
In that case, you only need to modify your code slightly to pass the correct parameters to geom_bar(). By default geom_bar() uses stat = "count" (stat_bin in older ggplot2 versions), which counts the number of rows in each category on the x-scale. With your data you want to plot the values themselves, so override this behaviour with stat = "identity" (geom_col() is a shorthand for the same thing).
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note the stat = "identity" argument passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
This produces the following bar plot:
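To answer the original question directly: ggplot2 needs one column for the x values, so the year columns still have to become rows, but that does not require the reshape package. Here is a minimal base-R sketch, assuming the wide table looks exactly like the one in the question:

```r
# The wide table from the question: one row, one column per year
wide <- read.table(text = "
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9", header = TRUE, check.names = FALSE)

# Wide -> long: the years come from the column names, the values from the single row
year_cols <- names(wide)[-1]
ts_long <- data.frame(
  Area  = wide$Area[1],
  year  = as.integer(year_cols),
  value = as.numeric(unlist(wide[1, year_cols]))
)
print(ts_long)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # year is numeric, so geom_line() joins the points without a group aesthetic
  print(ggplot(ts_long, aes(year, value)) + geom_line() + xlab("") + ylab(""))
}
```

With melt() the year column becomes a factor, and a line drawn over a factor x-axis needs group = 1; keeping the year numeric as above avoids that. For a quick base-R plot, plot(ts(ts_long$value, start = 1998)) also works.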
