I have a data set, laid out horizontally in Excel, that lists DJs by rank, the year they received that rank, and the name of the DJ.
When I plot the data, I get a line chart with a vertical line from 1 to 5 for each year, and I'm not sure what to do from here.
library(ggplot2)
library(plyr)
DJMAG <- DJMAG_MOdified
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
ggplot(Top, aes(Top$Year)) +
geom_line(aes(y = as.numeric(Top$Rank), color = "Hardwell")) + xlab("2004 to 2018") + ylab("Rank")
There are no error messages, but what I'm trying to show is how each DJ, drawn as their own line, rose or fell in the rankings from 2004 to 2017 (x = Year), with the top 5 ranks (1-5) on an inverted y-axis.
So I took the liberty of coming up with some example data.
DJMAG_MOdified <- data.frame(Year=rep(2004:2018,3),
Rank=runif(45,0,1),
DJ=rep(c("A","B","C"),each=15),
Other=runif(45,0,1))
I purposefully added the Other column, so we still subset it as you have done.
Instead of your method which was:
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
It would be preferable to do it in one line, where you don't need to rename the columns, as follows:
Top <- DJMAG_MOdified[,c("Year","Rank","DJ")]
As for the plot, I am thinking maybe this is what you are looking for, where each DJ is represented by a different coloured line?
ggplot(Top, aes(x=Year,y=as.numeric(Rank))) +
geom_line(aes(col = DJ)) +
xlab("2004 to 2018") +
ylab("Rank")
I didn't understand where the color = "Hardwell" part of your code came from...
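For the inverted y-axis mentioned in the question, one option is scale_y_reverse(). This is only a minimal sketch, assuming Rank holds integer ranks from 1 to 5; the data is made up and the names top and p are just for illustration:

```r
library(ggplot2)

# Hypothetical data: integer ranks 1-5 for three made-up DJs
set.seed(1)
top <- data.frame(
  Year = rep(2004:2018, 3),
  Rank = sample(1:5, 45, replace = TRUE),
  DJ   = rep(c("A", "B", "C"), each = 15)
)

# One line per DJ; scale_y_reverse() puts rank 1 at the top of the axis
p <- ggplot(top, aes(x = Year, y = Rank, colour = DJ)) +
  geom_line() +
  scale_y_reverse(breaks = 1:5) +
  labs(x = "Year", y = "Rank")

p
```

Note that the columns are referenced by bare name inside aes() rather than as Top$Year, which lets ggplot2 split the data by DJ correctly.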
I'm trying to put ActivityDate on the X Axis, and Calories on the Y Axis, relating to how 33 different users ranged in their calorie burnings daily. I'm new to ggplot and visualizations as you can tell, so I'd appreciate the most basic solution that I can understand. Thank you so much.
I tried several iterations of this code, and none of them produced quite the visualization I wanted. Here are a few of my attempts and my thoughts on each:
##first and foremost:
install.packages("tidyverse")
install.packages("here")
library(tidyverse)
library(here)
Attempt 1 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=Id, color=ActivityDate))
##Not probably the best for stakeholders, but if I could maybe have the bars a little closer together that might help, so I tried to identify the unique IDs. Perhaps the reason why they are so small is that they appear in long number format, and are not sequential, so it could be adding the extra space and making the bars so small because of the spaces of empty sequential numbers.
Attempt 2 Bar Graph
UId <- unique("Id")
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=UId, color=ActivityDate))
##Facepalm, definitely not what I was looking for at all, but that was my effort to solve the above problem.
Attempt 3 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=ActivityDate, fill=Id)) + theme(axis.text.x = element_text(angle=45))
##The fill function does not work, and on the y-axis if you will, I don't know what "count" is referring to in this case, so could be useful except for those two issues.
##Finally, I switch to a line graph
Attempt 4 Line Graph
ggplot(data=trimmed_dactivity) + geom_line(mapping=aes(x=ActivityDate, y=Calories)) + theme(axis.text.x = element_text(angle=45))
##Now what I get is separate lines going up and down, and what I want is 33 separate lines representing unique Id numbers to travel along the x axis for time, and rise in the y axis for calories. Of course I'm not sure how to do that...
Any help with what I'm missing on this journey here?
what I want is 33 separate lines representing unique Id numbers…
It sounds like you want a spaghetti plot. To make one, map Id to color (or to group if you don’t want each id to be colored differently).
library(ggplot2)
ggplot(fakedata, aes(ActivityDate, Calories)) +
geom_line(aes(color = factor(Id)), show.legend = FALSE)
Example data:
set.seed(13)
fakedata <- expand.grid(
Id = 1:33,
ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))
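If you would rather not color each Id differently, the group variant mentioned above could look like the following. It is a sketch on the same fake data; the name p_group and the alpha value are my own choices, not anything from the question:

```r
library(ggplot2)

# Same fake data as above: 33 ids observed on 10 consecutive days
set.seed(13)
fakedata <- expand.grid(
  Id = 1:33,
  ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))

# group = Id draws one line per user without assigning 33 colors;
# a low alpha keeps overlapping lines readable
p_group <- ggplot(fakedata, aes(ActivityDate, Calories, group = Id)) +
  geom_line(alpha = 0.4)

p_group
```

Without color or group, geom_line() connects every observation in date order as a single series, which is why Attempt 4 in the question shows lines jumping up and down.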
I'm going crazy here, please help me!
I'm new to R. I have a graph in which I'm trying to plot steps taken against the time needed to fall asleep (in minutes). I decided to put user ID on the x-axis and give each of the other two variables a vertical axis of its own.
The result is as follows:
I'm not happy with many things: the scaling of the line plot and the scale of the secondary axis, the width of the columns in geom_col, and the axis labels for the user IDs, which have 10 digits each and show up in scientific notation (as a power of ten).
Can you please help me with everything I mentioned, especially the scaling of the secondary axis?
I've searched and searched and can't do it.
The code is this one:
ggplot(data = sleep_steps) +
  geom_col(mapping = aes(x = Id, y = AVGSteps), fill = 'cyan') +
  geom_line(mapping = aes(x = Id, y = AVGMinToFallAsleep)) +
  labs(title = "Relationship between Steps and Time to Fall Asleep") +
  scale_y_continuous(sec.axis = sec_axis(~ . - 8*60*60, name = "Minutes to Fall Asleep"))
And the table is like this:
head(sleep_steps)
Id AVGSteps AVGKcal AVGMinToFallAsleep AVGTotalMinAsleep
1 1503960366 12116.742 1816.419 22.92000 360.2800
2 1644430081 7282.967 2811.300 52.00000 294.0000
3 1844505072 2580.065 1573.484 309.00000 652.0000
4 1927972279 916.129 2172.806 20.80000 417.0000
5 2026352035 5566.871 1540.645 31.46429 506.1786
6 2347167796 9519.667 2043.444 44.53333 446.8000
I'm clueless. Since it is not a percentage nor is a datetime variable, I'm not sure what to do. I've tried to change the trans argument in sec_axis function but no success. The structure of the data frame is all num.
Thank you!
You need Id as a factor to start because they are individuals, not actual numbers.
Insert this before the plot:
sleep_steps$Id <- as.factor(sleep_steps$Id)
Without the code for your data to check, I would also say that you need another fill colour for your second scale, but you are using geom_line which is not normally how you would plot individuals because they are not connected. You may need to reconsider that. Normally you would plot all your data with boxplots which would show the averages and the quartiles etc.
If you are looking for an actual RELATIONSHIP between the two variables, then you need to look into a linear-model (lm) plot.
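On the secondary-axis scaling specifically, a common approach is to multiply the second variable by a constant so it fits the primary axis, then divide by the same constant inside sec_axis() so the labels read in the original units. The sketch below uses the six rows printed in the question; the scaling factor k and the use of points instead of a line are my own assumptions, not the only way to do it:

```r
library(ggplot2)

# The six rows shown in the question, with Id stored as a factor
sleep_steps <- data.frame(
  Id = factor(c("1503960366", "1644430081", "1844505072",
                "1927972279", "2026352035", "2347167796")),
  AVGSteps = c(12116.742, 7282.967, 2580.065, 916.129, 5566.871, 9519.667),
  AVGMinToFallAsleep = c(22.92, 52, 309, 20.8, 31.46429, 44.53333)
)

# Assumed scaling: stretch minutes so their maximum matches the maximum of steps
k <- max(sleep_steps$AVGSteps) / max(sleep_steps$AVGMinToFallAsleep)

p <- ggplot(sleep_steps, aes(x = Id)) +
  geom_col(aes(y = AVGSteps), fill = "cyan") +
  # points rather than a line, because the users are not connected
  geom_point(aes(y = AVGMinToFallAsleep * k)) +
  scale_y_continuous(
    name = "Average steps",
    # undo the scaling so the right-hand axis shows minutes
    sec.axis = sec_axis(~ . / k, name = "Minutes to fall asleep")
  ) +
  labs(title = "Relationship between Steps and Time to Fall Asleep")

p
```

Because Id is a factor here, the bars sit next to each other at full width and the labels print as plain 10-digit ids instead of scientific notation.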
I am trying to calculate the city-wise spend on each product on a yearly basis. I also want a graphical representation, but I am not able to get the graphs to work in R.
Top_11 <- aggregate(Ca_spend["Amount"],
by = Ca_spend[c("City","Product","Month_Year")],
FUN="sum")
A <- ggplot(Top_11,aes(x=City,Month_Year,y=Amount))
A <-geom_bar(stat="identity",position='dodge',fill="firebrick1",colour="black")
A <- A+facet_grid(.~Type)
This is the code I am using. I am trying to plot City, Product, and Year on the same graph.
Variables: City, Product, Month_Year, Amount
Sample observation: New York, Gold, 2004, $50,0000
I'd try this:
ggplot(Top_11,aes(x=City, fill = Product, y=Amount)) +
geom_col() +
facet_wrap(~Month_Year)
For your 5 rows of sample data, that gives the graph below. You can play around with which variable goes to fill (fill color), x (x-axis), and facet_wrap (for small multiples). I see in your code you tried facet_grid(.~Type), but that won't work unless you have a column named Type.
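In case it helps to see the whole thing end to end, here is a small self-contained sketch. The rows in Ca_spend are invented for illustration (they are not your data), and I've used dodged bars since that is what your original code asked for:

```r
library(ggplot2)

# Hypothetical spend data; note the two New York / Gold / 2004 rows
Ca_spend <- data.frame(
  City       = c("New York", "New York", "New York", "Boston", "Boston", "Boston"),
  Product    = c("Gold", "Gold", "Silver", "Gold", "Gold", "Silver"),
  Month_Year = c(2004, 2004, 2004, 2004, 2005, 2005),
  Amount     = c(500, 250, 300, 200, 150, 100)
)

# Total spend per City / Product / Month_Year combination
Top_11 <- aggregate(Amount ~ City + Product + Month_Year, data = Ca_spend, FUN = sum)

# Build the plot with +, so every layer is added to the same object
p <- ggplot(Top_11, aes(x = City, y = Amount, fill = Product)) +
  geom_col(position = "dodge") +
  facet_wrap(~Month_Year)

p
```

The original code assigned geom_bar(...) to A on its own line, which replaced the ggplot object instead of adding a layer to it; chaining everything with + avoids that.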
I have a grouped barplot produced using ggplot in R with the following code
ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "Area",
labels = c("Everywhere", "New York")) +
xlab("Reasons") +
ylab("Proportion of total complaints") +
coord_flip() +
ggtitle("Comparison between NY and all areas")
mTogether is created using the following code
mTogether <- melt(together, id.vars = 'USuniquNegR')
The Data Frame together is made up of
USperReasons USperReasonsNY USuniquNegR
1 0.198343304187759 0.191304347826087 Late Flight
2 0.35987114588127 0.321739130434783 Customer Service Issue
3 0.0667280257708237 0.11304347826087 Lost Luggage
4 0.0547630004601933 0.00869565217391304 Flight Booking Problems
5 0.109065807639208 0.121739130434783 Can't Tell
6 0.00460193281178095 0 Damaged Luggage
7 0.0846755637367694 0.0782608695652174 Cancelled Flight
8 0.0455591348366314 0.0521739130434783 Bad Flight
9 0.0225494707777266 0.0347826086956522 longlines
10 0.0538426138978371 0.0782608695652174 Flight Attendant Complaints
The data frame together can be generated with the following:
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
where
USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage","Flight Booking Problems","Can't Tell","Damaged Luggage","Cancelled Flight","Bad Flight","longlines","Flight Attendant Complaints")
The problem is that when I try to change the xlim of the ggplot using
+ xlim(0, 1)
I just seem to get an error:
Discrete value supplied to continuous scale
I can't understand why this happens, but I need to resolve it because currently the x axis starts below 0 and is very densely packed.
The problem is that you are cbind()ing your column vectors together, which converts the numbers to characters. Fix that and the rest should fix itself.
together<-data.frame(USperReasons,USperReasonsNY,USuniquNegR)
You need to remove the cbind from
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
because str(together) shows that all three columns are factors.
With
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
the plot looks reasonable to me (without having to use ylim or xlim).
So, the error was not within ggplot2 but in data preparation.
Therefore, please, provide a full working example which can be copied, pasted and run when asking a question next time. Thank you.
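To see the coercion for yourself, here is a short base R sketch using two of the values from the question (the names m, bad and good are just for illustration). One caveat: in R 4.0 and later the columns come out as character rather than factor by default, but either way they are no longer numeric:

```r
# Two values from the question are enough to show the effect
USperReasons <- c(0.19834, 0.35987)
USuniquNegR  <- c("Late Flight", "Customer Service Issue")

# cbind() builds a matrix, and a matrix can hold only one type,
# so the numbers are silently converted to character strings
m <- cbind(USperReasons, USuniquNegR)
is.character(m)

bad  <- data.frame(m)                          # numbers arrive as text (or factors)
good <- data.frame(USperReasons, USuniquNegR)  # each column keeps its own type

is.numeric(bad$USperReasons)
is.numeric(good$USperReasons)
```

Also worth knowing: with coord_flip() the horizontal axis displays the y aesthetic, so if limits are still wanted after fixing the data, they would go through ylim(0, 1) rather than xlim(0, 1), because x is the discrete Reasons variable.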
I have data in percentages. I would like to use ggplot to create a graph, but I cannot get it to work like I would like. Since the data is very skewed a simple stacked column doesn't work well because the really small values don't show up. Here is a sample set:
Actual Predicted
a 0.5 5
b 9.5 5
c 90 90
On the left is an excel plot and on the right is R-ggplot
The problem is that in R the columns do not stack up to be even.
Here is my R code:
a = c("a","b","c","a","b","c")
b = c("Actual","Actual","Actual","Predicted","Predicted","Predicted")
c = c(0.5,2.5,97,0.2,2.2,97.6)
c = c+1
dat = data.frame(Type=a, Case=b, Percentage=c)
ggplot(dat, aes(x=Case, y=Percentage, fill=Type)) + geom_bar(stat="identity") + scale_y_log10()
*In both Excel and R I do a +1 to deal with numbers 0-1, so the y-axis is off slightly
If I use:
ggplot(dat, aes(x=Case, y=Percentage, fill=Type)) + geom_bar(stat="identity",position = "fill") + scale_y_log10()
The total heights match, but the two blue portions do not match in size (they are both 90%).
Just because two sets of numbers add up to the same value (103 in this case) doesn't mean the sum of the logs will add up to the same value! When you stack the bars without "fill" you get them different heights because the sums of the logs of the values are different. When you then scale it all to the same height you have to squash the blue boxes down by different rates and so they look different.
The Excel bar chart is deliberately misleading. The left red bar is the same size as the blue bar above it but represents a value of about a tenth of the blue bar. You can't make a bar chart of proportions on a log scale - it's just wrong.
There is a brilliant way to show small numbers without losing them or misrepresenting them. It's an amazing visualisation technique called 'writing the numbers in a table'.
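The point about sums of logs is easy to check numerically. A quick base R sketch with the shifted values from the question's code (the variable names are mine):

```r
# Values from the question's code, after the +1 shift
actual    <- c(0.5, 2.5, 97) + 1
predicted <- c(0.2, 2.2, 97.6) + 1

# Both groups have the same total on the original scale
sum(actual)
sum(predicted)

# On a log axis the stacked segments add up as sums of logs,
# and those totals are not equal, so the bars differ in height
log_height_actual    <- sum(log10(actual))
log_height_predicted <- sum(log10(predicted))
log_height_actual - log_height_predicted
```

The difference is nonzero, which is why the stacked bars come out at different heights until position = "fill" rescales them.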
I managed to get it to work like Excel. As Spacedman said, the plot is visually misleading, but it is numerically correct. The reason is that we instinctively compare the drawn height of each bar segment, when numerically you need to read the y-axis values where the segment starts and ends. It's similar to bar charts that don't have a y-axis minimum of zero. Here is an example.
I am not sure if I will use the method for visualizing my data, but I had to figure it out.
Here is the result:
Here is the code (I might clean it up as a function that can be called when you assign the y values in ggplot).
a = c("a","b","c","a","b","c")
b = c("Actual","Actual","Actual","Predicted","Predicted","Predicted")
c = c(0.5,9.5,90,5,5,90)
c = c+1
dat = data.frame(Type=a, Case=b, Percentage=c, Cumsum_L=c, Cumsum=c, Norm=c)
for(i in 1:length(dat$Percentage)){
  cumsum=0
  for(j in 1:i){
    if(dat$Case[j]==dat$Case[i]){
      cumsum=cumsum+(dat$Percentage[j])
    }
  }
  dat$Cumsum_L[i]=cumsum-dat$Percentage[i]
  dat$Cumsum[i]=cumsum
  if(dat$Cumsum_L[i]==0){
    dat$Cumsum_L[i]=1
  }
  dat$Norm[i] = log(dat$Cumsum[i])-log(dat$Cumsum_L[i])
}
intervals = seq(from = 0, to = 100, by = 10)
intervals_log = log(intervals)
intervals_log[1]=0
ggplot(dat, aes(x=Case, y=Norm, fill=Type)) + geom_bar(stat="identity") +
  scale_y_continuous(name="Percent", breaks = intervals_log, labels=intervals)
*I also need to fix the end points +1 kinda thing.
**I also might be butchering maths.