Scaling GGplot based on other dataset

I have a graph showing total sales per state. I have another data frame with population data, among other measures, per city, which can be rolled up to state. Given the large variance in population between states, I want to scale the sales by each state's share of the total population, i.e. to see whether particular states actually buy more or less relative to their size. Any ideas where to start?
Very basic code I am using to start with:
State_Sales_Summary_Plot <- ggplot(
data=Customers_DF_Clean,
aes(x=State.Code, y=Total.Spent)
) +
geom_bar(stat="identity")
Population version of the code:
Population_Per_State <- ggplot(
data=SSC_IndexData_Education_and_Occupation_DF_Clean,
aes(x=State.Name, y=Population)
) +
geom_bar(stat="identity")
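One possible starting point, as a minimal sketch rather than a definitive answer: roll both data frames up to one row per state, join them, and plot a ratio instead of the raw total. The toy values below are made up, and the sketch assumes both data frames share a State.Code key. In the question the population data uses State.Name, so a lookup between names and codes would be needed first. The column names Spend.Per.Capita, Pop.Share, Sales.Share and Sales.vs.Pop are hypothetical.

```r
library(ggplot2)

# Toy stand-ins for the two data frames (values are made up for illustration)
sales <- data.frame(
  State.Code  = c("AA", "AA", "BB", "CC"),
  Total.Spent = c(100, 300, 200, 50)
)
population <- data.frame(
  State.Code = c("AA", "AA", "BB", "CC"),  # assumed shared key (hypothetical)
  Population = c(5000, 3000, 6000, 500)    # one row per city
)

# Roll both up to one row per state
sales_by_state <- aggregate(Total.Spent ~ State.Code, data = sales, FUN = sum)
pop_by_state   <- aggregate(Population ~ State.Code, data = population, FUN = sum)

# Join on the state key and compute the scaled measures
scaled <- merge(sales_by_state, pop_by_state, by = "State.Code")
scaled$Spend.Per.Capita <- scaled$Total.Spent / scaled$Population
scaled$Pop.Share        <- scaled$Population / sum(scaled$Population)
scaled$Sales.Share      <- scaled$Total.Spent / sum(scaled$Total.Spent)
# Values above 1 mean the state buys more than its population share suggests
scaled$Sales.vs.Pop     <- scaled$Sales.Share / scaled$Pop.Share

p <- ggplot(scaled, aes(x = State.Code, y = Spend.Per.Capita)) +
  geom_col()
```

Printing p draws the per-capita chart; swapping y to Sales.vs.Pop shows sales share relative to population share instead.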

Related

Add points to geom_density_ridges for groups with small number of observations

I am loving using geom_density_ridges(), with individual points also included for each group. However, some groups have small sample sizes (e.g. n=1 or 2) precluding the generation of the density ridges. For these groups, I'd like to be able to plot the locations of the existing observations - even though no probability density function will be shown.
In this example, I'd like to be able to plot the 2 data points for May on the appropriate line.
library(tidyverse)
library(ggridges)
data("lincoln_weather")
#pull weather from all months that are NOT May
lincoln_weather_nomay<-lincoln_weather[which(lincoln_weather$Month!="May"),]
#pull weather just from May
lincoln_weather_may<-lincoln_weather[which(lincoln_weather$Month=="May"),]
#recombine, keeping only the first two rows for the May dataset
new_weather<-rbind(lincoln_weather_nomay,lincoln_weather_may[c(1:2),])
ggplot( new_weather, aes(x=`Min Temperature [F]`,y=Month,fill=Month))+
geom_density_ridges(alpha = 0.5,jittered_points = TRUE, point_alpha=1,point_shape=21) +
labs(x="Average temperature (F)",y='')+
guides(fill="none",color="none")
How can I add the points for the May observations to the appropriate location (i.e. the May slot) and at the appropriate location along the x-axis?

Simply add a separate geom_point() call to the function, in which you subset the data to include only observations for the previously-unplotted categories. You can apply any of the usual customizations to either 'match' the points plotted for the other categories, or to make these points 'stand out'.
ggplot( new_weather, aes(x=`Min Temperature [F]`,y=Month,fill=Month))+
geom_density_ridges(alpha = 0.5,jittered_points = TRUE, point_alpha=1,point_shape=21) +
geom_point(data=subset(new_weather, Month %in% c("May")),
aes(),shape=13)+
labs(x="Average temperature (F)",y='')+
guides(fill="none",color="none")

Graph Creation in r

I am trying to calculate the city-wise spend on each product on a yearly basis. I also want a graphical representation, but I am not able to get the graphs in R.
Top_11 <- aggregate(Ca_spend["Amount"],
by = Ca_spend[c("City","Product","Month_Year")],
FUN="sum")
A <- ggplot(Top_11,aes(x=City,Month_Year,y=Amount))
A <-geom_bar(stat="identity",position='dodge',fill="firebrick1",colour="black")
A <- A+facet_grid(.~Type)
This is the code I am using. I am trying to plot City, Product and Year on the same graph.
Variables: City, Product, Month_Year, Amount
Sample observation: New York, Gold, 2004, $50,0000

I'd try this:
ggplot(Top_11,aes(x=City, fill = Product, y=Amount)) +
geom_col() +
facet_wrap(~Month_Year)
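For your 5 rows of sample data, that gives the graph below. You can play around with which variable goes to fill (fill color), x (x-axis), and facet_wrap (for small multiples). I see in your code you tried facet_grid(.~Type), but that won't work unless you have a column named Type. Note also that the second line of your code assigns geom_bar() to A instead of adding it to the plot (A <- A + geom_bar(...)), so the ggplot object is overwritten.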

R: relative frequency categorical data in ggplot2

I'm working in RStudio.
With ggplot2, I'm trying to form a plot where I have frequencies of a categorical variable (number of shares purchased), per category (there are 5 categories). For example, members of category A might buy 1 share more frequently than members of category D.
I now have a count plot. However, because one category is much bigger than the others, you don't get a good idea about the n shares in the other categories.
The code of the count plot is as follows:
#ABS. DISTRIBUTION SHARES/CATEGORY
ggplot(dat, aes(x=Number_share, fill=category)) +
geom_histogram(binwidth=.5, alpha=.5, position="dodge")
This results in this graph: https://imgur.com/a/e4k94
Therefore, I am planning to make a plot where, instead of an absolute count, you have a distribution relative to their category.
I calculated the relative frequencies of each category:
library(MASS)
categories = dat$category
categories.freq = table(categories)
categories.relfreq = categories.freq / nrow(dat)
cbind(categories.relfreq)
categories.relfreq
Beauvent 1 0.002708692
Beauvent 2 0.015020931
E&B 0.037182960
Ecopower 1 0.042107855
Ecopower 2 0.029549372
Ecopower 3 0.873183945
I don't know how to make a plot where the frequency of a share number acquisition is relative to the category, instead of absolute. Can anybody help me with this?

I think what you are looking for is this
ggplot(dat, aes(x=Number_share, fill=category)) +
geom_bar(position="fill")
This will stack the categories on top of each other, and position="fill" rescales every bar to a height of 1, so the y-axis shows each category's share within each value of Number_share rather than the raw counts.
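If the goal is instead the share of each share number within its own category (so that every category sums to 1), one option is to compute the proportions first and plot them with geom_col(). This is only a sketch: the toy dat below is made up, and only the column names category and Number_share come from the question.

```r
library(ggplot2)

# Toy stand-in for dat (made-up values); column names follow the question
dat <- data.frame(
  category     = c(rep("Ecopower 3", 8), rep("E&B", 4)),
  Number_share = c(1, 1, 1, 1, 1, 2, 2, 3, 1, 2, 2, 2)
)

# Cross-tabulate, then divide each row (one row per category) by its own total
rel <- as.data.frame(
  prop.table(
    table(category = dat$category, Number_share = dat$Number_share),
    margin = 1
  )
)

# Freq now holds the within-category proportion
p <- ggplot(rel, aes(x = Number_share, y = Freq, fill = category)) +
  geom_col(position = "dodge") +
  labs(y = "Share of members within category")
```

Because each category is normalised separately, a very large category no longer dwarfs the small ones.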

I found that this problem is very similar: Histogram with weights in R
Basically, it's because a histogram uses counts on the y-axis by default, while I want densities: hist(freq=FALSE) in base R or, in the case of ggplot, geom_histogram(aes(y = ..density..)).

ggplot or plot random sample of data

Is there a way to take a random sample of the data I want to plot in one swoop?
ggplot()+
labs(y="Monthly Pay $", title="Monthly Pay per Pay Instance by Employee")+
geom_line(aes(y=MonthlyPay, x=PayInstance, group=EMPLID),
data = linetechanalysisTermed)
I have 7,759 in this population and my plot is completely unreadable (I thought about altering the axis, but I don't think it will help with such a large population).
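A minimal sketch of one way to do this, assuming each line is one EMPLID: sample the employee IDs rather than the rows, so every sampled employee keeps a complete line. The toy data frame and the sample size of 25 are made up; only the column names come from the question.

```r
library(ggplot2)

set.seed(42)  # makes the random sample reproducible

# Toy stand-in for linetechanalysisTermed (made-up values)
linetechanalysisTermed <- data.frame(
  EMPLID      = rep(1:200, each = 6),
  PayInstance = rep(1:6, times = 200),
  MonthlyPay  = runif(1200, min = 2000, max = 8000)
)

# Sample whole employees, not individual rows
sampled_ids <- sample(unique(linetechanalysisTermed$EMPLID), size = 25)
plot_data   <- subset(linetechanalysisTermed, EMPLID %in% sampled_ids)

p <- ggplot(plot_data, aes(x = PayInstance, y = MonthlyPay, group = EMPLID)) +
  geom_line() +
  labs(y = "Monthly Pay $", title = "Monthly Pay per Pay Instance by Employee")
```

The subset() call can also be placed directly in the data argument of geom_line() to keep it all in one expression.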

Label stacked bar chart with variable other than plotted Y

I'm working on some fish electroshocking data and looking at fish species abundance per transect in a river. Essentially, I have an abundance of different species per transect that I'm plotting in a stacked bar chart. What I would like to do is label the top of each bar, or the space underneath the x-axis tick mark, with N = Total Preds for that particular transect. The abundance being plotted is the number of that particular species divided by the total number of fish (preds) that were caught at that transect. I am having trouble figuring out a way to do this, since I don't want to label the plot with the actual y-value that is being plotted.
Excuse the crude code. I am newer to R and not super familiar with generating random datasets. The following is what I came up with. Obviously in my real data the abundance % per transect always adds up to 100 %, but the idea is to be able to label the graph with TotalPreds for a transect.
#random data
Transect<-c(1:20)
Habitat<-c("Sand","Gravel")
Species<-c("Smallmouth","Darter","Rock Bass","Chub")
Abund<-runif(20,0.0,100.0)
TotalPreds<-sample(1:139,20,replace=TRUE)
data<-data.frame(Transect,Habitat,Species,Abund,TotalPreds)
#Generate plot
AbundChart<-ggplot(data=data,aes(x=Transect,y=Abund,fill=Species))
AbundChart+labs(title="Shocking Fish Abundance")+theme_bw()+
scale_y_continuous("Relative Abundance (%)",expand=c(0.02,0),
breaks=seq(0,100,by=20),labels=seq(0,100,by=20))+
scale_x_discrete("Transect",expand=c(0.03,0))+
theme(plot.title=element_text(face='bold',vjust=2,size=25))+
theme(legend.title=element_text(vjust=5,size=15))+
geom_bar(stat="identity",colour="black")+
facet_grid(~Habitat,labeller=label_both,scales="free_x")
I get this plot that I would like to label with TotalPreds as described previously.
Again my plot would have bars that reached 100% for abundance, and in my real data transects 1-10 are gravel and 11-20 are sand. Excuse my poor sample dataset.
Update:
My actual data looks like this:
Variable in this case is the fish species and value is the abundance of that species at that particular electroshocking transect. Total_Preds is repeated when the data moves to a new species, because total preds is indicative of the total preds caught at that particular transect (i.e. each transect only has 1 total preds value). Maybe the melt function wasn't the right way to analyze this, but I have like 17 fish species that were caught at different rates across these 20 transects. I guess habitat type is singular to a transect as well, with 1-10 being gravel and 11-20 being sand, and that is repeated in my dataset across fish species as well.

Edited in response to the update: you should be able to create a new data frame containing the TotalPreds data (not repeated) and use that in geom_text. I can't test this without data, but maybe:
# select non-repeated half of melted data for use in geom_text
textlabels <- data[c(1:19),]
#Generate plot
AbundChart<-ggplot(data=data,aes(x=Transect,y=Abund,fill=Species))
AbundChart+labs(title="Shocking Fish Abundance")+theme_bw()+
scale_y_continuous("Relative Abundance (%)",expand=c(0.02,0),breaks=seq(0,100,by=20),labels=seq(0,100,by=20))+
scale_x_discrete("Transect",expand=c(0.03,0))+
theme(plot.title=element_text(face='bold',vjust=2,size=25))+
theme(legend.title=element_text(vjust=5,size=15))+
geom_bar(stat="identity",colour="black")+
facet_grid(~Habitat,labeller=label_both,scales="free_x") +
geom_text(data = textlabels, aes(x = Transect_ID, y = value, vjust = -0.5,label = TotalPreds))
You might have to play around with different values for vjust to get the labels where you want them.
See the geom_text help page for more info.
Hope that edit works with your data.
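A self-contained sketch of the same idea on made-up data, assuming TotalPreds is constant within a transect: build a label data frame with one row per transect and pass it to geom_text() with inherit.aes = FALSE so the labels do not pick up the fill mapping. The data frame names fish and labels are hypothetical.

```r
library(ggplot2)

# Toy data (made up): abundance sums to 100 within each transect
fish <- data.frame(
  Transect   = factor(rep(1:4, each = 2)),
  Species    = rep(c("Smallmouth", "Darter"), times = 4),
  Abund      = c(60, 40, 25, 75, 50, 50, 10, 90),
  TotalPreds = rep(c(12, 40, 7, 99), each = 2)
)

# One row per transect, so each label is drawn only once
labels <- unique(fish[, c("Transect", "TotalPreds")])

p <- ggplot(fish, aes(x = Transect, y = Abund, fill = Species)) +
  geom_col(colour = "black") +
  # Place the label just above the 100% mark instead of at a plotted y-value
  geom_text(
    data = labels,
    aes(x = Transect, y = 100, label = paste0("N = ", TotalPreds)),
    vjust = -0.5,
    inherit.aes = FALSE
  )
```

Using y = 0 with a positive vjust puts the labels near the axis instead.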
