Graph Creation in R

I am trying to calculate the city-wise spend on each product on a yearly basis, and also to show it graphically. However, I am not able to get the graphs to appear in R.
Top_11 <- aggregate(Ca_spend["Amount"],
by = Ca_spend[c("City","Product","Month_Year")],
FUN="sum")
A <- ggplot(Top_11,aes(x=City,Month_Year,y=Amount))
A <-geom_bar(stat="identity",position='dodge',fill="firebrick1",colour="black")
A <- A+facet_grid(.~Type)
This is the code I am using. I am trying to plot City, Product and Year on the same graph.
Variables: City, Product, Month_Year, Amount
Sample observation: New York, Gold, 2004, $50,0000 (sample data type)

I'd try this:
library(ggplot2)
ggplot(Top_11, aes(x = City, fill = Product, y = Amount)) +
geom_col() +
facet_wrap(~Month_Year)
For your 5 rows of sample data, that gives the graph below. You can play around with which variable is mapped to fill (fill color), x (x-axis) and facet_wrap (small multiples). I see in your code you tried facet_grid(.~Type), but that won't work unless you have a column named Type.
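Two further things in the question's code keep the graph from appearing: the line `A <-geom_bar(...)` replaces the plot object instead of adding a layer to it (layers are added with `+`), and in a script a ggplot object is only drawn when it is printed. Below is a minimal sketch of the whole pipeline. It assumes a small, made-up `Ca_spend` data frame that uses the question's column names; the cities and amounts are hypothetical.

```r
library(ggplot2)

# Hypothetical sample data using the question's column names
Ca_spend <- data.frame(
  City       = c("New York", "New York", "New York", "Boston", "Boston", "New York", "Boston"),
  Product    = c("Gold", "Gold", "Silver", "Gold", "Gold", "Gold", "Silver"),
  Month_Year = c(2004, 2004, 2004, 2004, 2005, 2005, 2005),
  Amount     = c(500, 100, 200, 300, 150, 250, 100)
)

# Total spend per city, product and year (same result as the question's aggregate() call)
Top_11 <- aggregate(Amount ~ City + Product + Month_Year, data = Ca_spend, FUN = sum)

# Build the plot by adding layers with `+`; assigning geom_bar() to A on its own
# would replace the ggplot object instead of extending it
A <- ggplot(Top_11, aes(x = City, y = Amount, fill = Product)) +
  geom_col(position = "dodge", colour = "black") +  # geom_col() = geom_bar(stat = "identity")
  facet_wrap(~ Month_Year)

# In a script or inside a function, a ggplot object is only drawn when printed
print(A)
```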

Related

Time Series w/ multiple variables, groups

I'm trying to plot a time series in ggplot for certain export markets, for example's sake, Japan. I want to focus on a few different export items (e.g. pork, beef, wheat, etc.) by exporter (e.g. US, EU, Australia, etc.). I'd like to set up the data so that I can use facet_wrap to show a graph for each of those goods in one image (representing the Japanese market), with all relevant exporters. I've been trying to use geom_line, but I have no idea how to arrange the data so that I can use facet_wrap, ggplot, etc.
You need two columns that specify the exporter and the product, in long format (so each row is a unique combination of product, exporter and date, plus the importing country if you have several markets). A reproducible example of this is shown below.
Then, the key plot element is facet_grid(exporter~product).
library(ggplot2)
# Long format: one row per date x exporter x product combination (36 rows)
export_data.df <- expand.grid(
Date = as.Date(c("1999/1/1", "1999/1/2", "1999/1/3", "1999/1/4")),
exporter = c("Japan", "USA", "NZ"),
product = c("Pork", "Beef", "Chicken")
)
export_data.df$value <- runif(nrow(export_data.df))
ggplot(export_data.df) +
geom_line(mapping = aes(x = Date, y = value)) +
facet_grid(exporter~product)
Output of above code
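If the data currently has one column per exporter (wide format), those columns have to be stacked into the long layout described above before ggplot can split it by product. Below is a dependency-free sketch in base R. It assumes a hypothetical `wide` data frame with made-up values and exporter columns `USA`, `EU` and `Australia`, and it draws one panel per good with all exporters in each, as the question asks.

```r
library(ggplot2)

# Hypothetical wide data: one column of values per exporter
wide <- data.frame(
  Date      = rep(as.Date(c("1999-01-01", "1999-02-01", "1999-03-01")), times = 2),
  product   = rep(c("Pork", "Beef"), each = 3),
  USA       = c(5, 6, 7, 3, 4, 4),
  EU        = c(2, 3, 3, 6, 5, 7),
  Australia = c(1, 1, 2, 8, 9, 9)
)
exporters <- c("USA", "EU", "Australia")

# Stack the exporter columns: one row per date, product and exporter.
# unlist() concatenates the columns in order, matching rep(..., each = nrow(wide))
long <- data.frame(
  Date     = rep(wide$Date, times = length(exporters)),
  product  = rep(wide$product, times = length(exporters)),
  exporter = rep(exporters, each = nrow(wide)),
  value    = unlist(wide[exporters], use.names = FALSE)
)

# One panel per good, one coloured line per exporter
p <- ggplot(long, aes(x = Date, y = value, colour = exporter)) +
  geom_line() +
  facet_wrap(~ product)
print(p)
```

If you already use tidyr, `tidyr::pivot_longer(wide, cols = c(USA, EU, Australia), names_to = "exporter", values_to = "value")` does the same reshaping in one call.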

R ggplot2: visualize a categorical variable whose levels appear more than once

I am trying to visualize some tennis data with ggplot2 in R.
Here are my data:
Year<-c(1999:2020)
Player <- rep("Federer",22)
Rank <-
c("Q1","3R","3R","4R","4R","W","SF","W","W","SF","F","W","SF","SF","SF","SF","3R",
"SF","W","W","4R","SF")
data <- data.frame(Year, Player, Rank)
data$Rank <- factor(data$Rank, levels = unique(data$Rank))
What I want is a diagram that looks like a bar plot but is not actually a bar plot: the years 1999 to 2020 on the x-axis, each matched to its Rank level.
My problem is that Rank, which I converted to a categorical variable, has levels that appear more than once over time, and this makes things difficult for me.
I am looking to do something like the following picture from Wikipedia, with a specific color for every level of the Rank variable.
The Australian Open results are what I want to visualize.
Maybe something like this, using geom_tile() to make a heatmap instead of a bar plot:
library(ggplot2)
library(ggthemes)
ggplot(data, aes(x = factor(Year), y = Player, fill = Rank)) +
geom_tile() + scale_fill_economist()
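Repeated levels are not a problem for geom_tile(), because each year gets its own tile. Two details may still be worth adjusting. First, `levels = unique(data$Rank)` orders the legend by first appearance (Q1, 3R, 4R, W, SF, F) rather than by how far the player got. Second, `scale_fill_manual()` gives each level a colour you choose, without needing ggthemes. Below is a sketch. The round order is an assumption and the colours are arbitrary, hypothetical choices.

```r
library(ggplot2)

Year   <- 1999:2020
Player <- rep("Federer", 22)
Rank   <- c("Q1", "3R", "3R", "4R", "4R", "W", "SF", "W", "W", "SF", "F", "W",
            "SF", "SF", "SF", "SF", "3R", "SF", "W", "W", "4R", "SF")

# Order the levels by round reached (assumed order), not by first appearance
round_order <- c("Q1", "3R", "4R", "SF", "F", "W")
data <- data.frame(Year, Player, Rank = factor(Rank, levels = round_order))

# Hypothetical colour choice, one per level
round_colours <- c(Q1 = "grey80", `3R` = "lightyellow", `4R` = "khaki",
                   SF = "orange", F = "tomato", W = "forestgreen")

p <- ggplot(data, aes(x = factor(Year), y = Player, fill = Rank)) +
  geom_tile(colour = "white") +           # one tile per year
  geom_text(aes(label = Rank)) +          # print the result inside each tile
  scale_fill_manual(values = round_colours) +
  labs(x = "Year", y = NULL)
print(p)
```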

How to make a geom_line with 3 different parameters

So I have a data set, laid out horizontally in Excel, that lists each DJ's rank, the year they received that rank, and the name of the DJ.
When I plot the data I'm currently working with, it ends up displaying a line chart with a vertical line from 1 to 5 for each year, and I'm not sure what to do from here.
library(ggplot2)
library(plyr)
DJMAG <- DJMAG_MOdified
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
ggplot(Top, aes(Top$Year)) +
geom_line(aes(y = as.numeric(Top$Rank), color = "Hardwell")) + xlab("2004 to 2018") + ylab("Rank")
There are no error messages. What I'm trying to show with this data is how each DJ, with their own line (Year on the x-axis), rose or fell in the rankings from 2004 to 2017, with the top-5 ranks (1 to 5) on an inverted y-axis.
So I took the liberty of coming up with some example data.
DJMAG_MOdified <- data.frame(Year=rep(2004:2018,3),
Rank=runif(45,0,1),
DJ=rep(c("A","B","C"),each=15),
Other=runif(45,0,1))
I purposely added the Other column so that we still need to subset the data, as you have done.
Instead of your method which was:
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
It would be simpler to do it in one line, without needing to rename the columns:
Top <- DJMAG_MOdified[,c("Year","Rank","DJ")]
As for the plot, I am thinking maybe this is what you are looking for, where each DJ is represented by a different coloured line?
ggplot(Top, aes(x=Year,y=as.numeric(Rank))) +
geom_line(aes(col = DJ)) +
xlab("2004 to 2018") +
ylab("Rank")
I didn't understand where the color = "Hardwell" part of your code came from...
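The vertical lines in the question's plot are probably caused by `color = "Hardwell"`. Mapping a colour to a single constant puts every row in one group, and geom_line() then joins all the points in order of Year, including the several points that share a year, which produces vertical segments. Mapping colour to the `DJ` column gives each DJ its own line, and `scale_y_reverse()` gives the inverted rank axis the question asks for. Below is a sketch with hypothetical whole-number ranks, in which five made-up DJs are reshuffled into ranks 1 to 5 every year.

```r
library(ggplot2)

set.seed(42)
years <- 2004:2017
djs   <- c("DJ A", "DJ B", "DJ C", "DJ D", "DJ E")  # hypothetical names

# Hypothetical data: each year the five DJs take ranks 1 to 5 in random order
Top <- do.call(rbind, lapply(years, function(y) {
  data.frame(Year = y, Rank = sample(1:5), DJ = djs)
}))

p <- ggplot(Top, aes(x = Year, y = Rank, colour = DJ)) +
  geom_line() +                    # one line per DJ, because colour = DJ defines the groups
  geom_point() +
  scale_y_reverse(breaks = 1:5) +  # inverted axis: rank 1 at the top
  labs(x = "Year", y = "Rank")
print(p)
```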

Remove NA values for bar plot in ggplot2.

The issue is pretty straightforward.
I am trying to generate a bar plot containing GDP per capita for several countries. The data is incomplete as some of the values are missing.
For instance, for year 1960, I only have data for 3 countries.
When I plot the data, the graph that is returned includes NAs, which I would like to exclude from the plot.
Please see an example below:
GDP <- ggplot(data, aes(country,1960)) + geom_bar(stat = 'identity', na.rm = T)
Please note that in this case, each column represents a year. The first column of the data frame consists of the name of the countries.
If it helps, please see the attached screenshot.
Thanks.
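Two things in the snippet above are worth checking. First, in `aes(country,1960)` the `1960` is the number 1960, not the column, so every bar gets the same constant height. A column whose name starts with a digit has to be written with backticks. Second, the simplest way to keep countries without a value off the plot is to drop their rows before plotting. Below is a sketch that assumes a hypothetical wide data frame with made-up GDP values. Note that `read.csv()` normally renames a `1960` column to `X1960` unless `check.names = FALSE` is used.

```r
library(ggplot2)

# Hypothetical wide data: country names, then one GDP-per-capita column per year
data <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  `1960`  = c(1200, NA, 850, NA),
  `1961`  = c(1250, 400, 900, NA),
  check.names = FALSE
)

# Keep only the countries that have a 1960 value
data_1960 <- data[!is.na(data$`1960`), ]

# Backticks make ggplot2 use the column named 1960 rather than the number 1960
GDP <- ggplot(data_1960, aes(x = country, y = `1960`)) +
  geom_col()
print(GDP)
```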

Why do geom_line() and geom_freqpoly() give back different graphs?

I am trying to get my head around ggplot2, which creates beautiful graphs, as you probably all know :)
I have a dataset with some transactions of sold houses in it (courtesy of: http://support.spatialkey.com/spatialkey-sample-csv-data/ )
I would like a line chart that plots the cities on the x-axis, with 4 lines showing the number of transactions in my data file per city for each of the 4 home types. It doesn't sound too hard, and I found two ways to do it:
1. using an intermediate table that does the counts, and geom_line() to plot the results
2. using geom_freqpoly() on my raw data frame
The basic charts look the same. However, chart no. 2 seems to be missing points for all the 0 counts. For example, for the cities to the right of SACRAMENTO there is no data for Condo, Multi-Family or Unknown, and Unknown seems to be missing from this graph completely.
I personally like the syntax of method 2 more than that of method 1 (probably a personal thing).
So my question is: am I doing something wrong, or is there a way to have the 0 counts plotted in method 2 as well?
# line chart example
# setup the libraries
library(RCurl) # so we can download a dataset
library(ggplot2) # so we can make nice plots
library(gridExtra) # so we can put plots on a grid
# get the data in from the web straight into a dataframe (all data is from: http://support.spatialkey.com/spatialkey-sample-csv-data/)
data <- read.csv(text=getURL('http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv'))
# create a data frame that counts the number of trx per city/type combination
df_city_type<-data.frame(table(data$city,data$type))
# correct the column names in the dataframe
names(df_city_type)<-c('city','type','qty')
# alternative 1: create a ggplot with a geom_line on the calculated values, to show the nr. of trx per city (on the x axis) with a different colored line for each type
cline1<-ggplot(df_city_type,aes(x=city,y=qty,group=type,color=type)) + geom_line() + theme(axis.text.x=element_text(angle=90,hjust=0))
# alternative 2: create a ggplot with a geom_freqpoly on the source data, to show the nr. of trx per city (on the x axis) with a different colored line for each type
c_line <- ggplot(na.omit(data),aes(city,group=type,color=type))
cline2<- c_line + geom_freqpoly() + theme(axis.text.x=element_text(angle=90,hjust=0))
# plot the two graphs in rows to compare, see that right of SACRAMENTO we miss two lines in plot 2, while they are in plot 1 (and we want them)
myplot<-grid.arrange(cline1,cline2)
As @joran pointed out, this gives a "similar" plot when using "continuous" values:
ggplot(data, aes(x=as.numeric(factor(city)), group=type, colour=type)) +
geom_freqpoly(binwidth=1)
However, this is not exactly the same (compare the start of the graph), because the bins are shifted: instead of binning from 1 to 39 with a binwidth of 1, for some reason it starts at 0.5 and goes up to 39.5.
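A likely reason the zero counts disappear in method 2 is that a stat counting the raw rows only sees the city/type combinations that actually occur, so it has nothing to draw for the empty ones. Method 1 keeps them because `table()` cross-tabulates every combination of the two columns, including the empty ones. Below is a small sketch using a hypothetical miniature version of the transactions data; the rows are made up.

```r
library(ggplot2)

# Hypothetical miniature version of the transactions data
data <- data.frame(
  city = c("SACRAMENTO", "SACRAMENTO", "ELK GROVE", "ELK GROVE", "WILTON"),
  type = c("Residential", "Condo", "Residential", "Residential", "Residential")
)

# table() counts every city x type combination, including those with no rows
df_city_type <- as.data.frame(table(city = data$city, type = data$type),
                              responseName = "qty")

# Combinations with no transactions are kept with qty = 0
print(df_city_type[df_city_type$qty == 0, ])

cline1 <- ggplot(df_city_type, aes(x = city, y = qty, group = type, colour = type)) +
  geom_line()
print(cline1)
```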
