I have the following dataset:-
The images provide details to the dataset. They are the sales of a company and the type column has two entries-Store and Online.
I am supposed to create a line graph and show the sales of the company as two different lines on the same line graph for both store and online sales. However, I am getting erroneous results and cannot understand how to bifurcate the data into two types and then create the graph.
The code I have used thus far is just what I wrote to try and understand what results in what as I am a beginner at R. The following is the code which gives an erroneous line graph:-
figure <-ggplot(Amazing_retail.df, aes(year_month,Sales))+
geom_line(color='blue')+
xlab("Year_Month")+
ylab("Total Sales")+
ggtitle("Monthwise Sales for the Years 2010-11")
figure
Thus, what can I do to get the appropriate result, i.e. the online and store sales for the company as two lines on the same graph?
[1]: https://i.stack.imgur.com/zjUbj.jpg
library(ggplot2)
# Toy data standing in for the real data set (one row per month and sales type)
Amazing_retail.df <- data.frame( year_month = rep(1:12, 2),
sales_type = c(rep("online", 12), rep("shop", 12)),
Sales = sample(100000:200000, 24))
# Mapping color and linetype to sales_type draws one line per sales type
ggplot(Amazing_retail.df, aes(x = year_month, y = Sales)) +
geom_line(aes(color = sales_type, linetype = sales_type)) +
scale_color_manual(values = c("red", "blue")) +
xlab("Year_Month")+
ylab("Total Sales")+
ggtitle("Monthwise Sales for the Years 2010-11")
I would like to create an interactive histogram with dates on the x-axis.
I have used ggplot+ggplotly.
I've read that I need to pass the proper information using the "text=as.character(mydates)" option and sometimes "tooltip=mytext".
This trick works for other kinds of plots, but there is a problem with histograms: instead of getting a single bar with a single value, I get many sub-bars stacked.
I guess the reason is passing "text=as.character(fechas)" produces many values instead of just the class value defining that bar.
How can I solve this problem?
I have tried filtering the data myself, but I don't know how to make my parameters match the ones used by the histogram, such as where the dates start for each bar.
library(lubridate)
library(ggplot2)
library(plotly) # ggplotly() is provided by the plotly package
Ejemplo <- data.frame(fechas = dmy("1-1-20")+sample(1:100,100, replace=T),
valores=runif(100))
dibujo <- ggplot(Ejemplo, aes(x=fechas, text=as.character(fechas))) +
theme_bw() + geom_histogram(binwidth=7, fill="darkblue",color="black") +
labs(x="Fecha", y="Nº casos") +
theme(axis.text.x=element_text(angle=60, hjust=1)) +
scale_x_date(date_breaks = "weeks", date_labels = "%d-%m-%Y",
limits=c(dmy("1-1-20"), dmy("1-4-20")))
ggplotly(dibujo)
ggplotly(dibujo, tooltip = "text")
As you can see, the bars are not regular histogram bars but something complex.
Using just ggplot instead of ggplotly shows the same problem, though then you wouldn't need the extra "text" parameter.
Presently, feeding as.character(fechas) to the text = ... argument inside of aes() will display the relative counts of distinct dates within each bin. Note the height of the first bar is simply a count of the total number of dates between 6th of January and the 13th of January.
After a thorough reading of your question, it appears you want the maximum date within each weekly interval. In other words, one date should hover over each bar. If you're partial to converting ggplot objects into plotly objects, then I would advise pre-processing the data frame before feeding it to the ggplot() function. First, group by week. Second, pull the desired date by each weekly interval to show as text (i.e., end date). Next, feed this new data frame to ggplot(), but now layer on geom_col(). This will achieve similar output since you're grouping by weekly intervals.
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = T),
valores = runif(100))
Ejemplo_stat <- Ejemplo %>%
arrange(fechas) %>%
filter(fechas >= ymd("2020-01-01"), fechas <= ymd("2020-04-01")) %>% # specify the limits manually
mutate(week = week(fechas)) %>% # create a week variable
group_by(week) %>% # group by week
summarize(total_days = n(), # number of observations (rows) in each week
last_date = max(fechas)) # pull the maximum date within each weekly interval
dibujo <- ggplot(Ejemplo_stat, aes(x = factor(week), y = total_days, text = as.character(last_date))) +
geom_col(fill = "darkblue", color = "black") +
labs(x = "Fecha", y = "Nº casos") +
theme_bw() +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
scale_x_discrete(labels = function(x) paste("Week", x))
ggplotly(dibujo) # default tooltip shows every mapped aesthetic (week, count, and end date)
ggplotly(dibujo, tooltip = "text") # only the end date is revealed
The "end date" is displayed once you hover over each bar, as requested. Note, the value "2020-01-12" is not the last day of the second week. It is the last date observed in the second weekly interval.
The benefit of the preprocessing approach is that you can modify the grouped data frame as needed. For example, feel free to limit the date range to a smaller (or larger) subset of weeks, or start your weeks on a different day of the week (e.g., Sunday). Furthermore, if you want more text to display, you could also show the number of observations next to each bar, or even display the date ranges for each week.
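The date-range idea mentioned above could look roughly like this. It is only a sketch under a few assumptions: weeks are defined with lubridate's floor_date() and start on Monday (week_start = 1) rather than using week(), and the names Ejemplo_rango, week_start, week_end, n_obs and etiqueta are made up for illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = TRUE),
                      valores = runif(100))

Ejemplo_rango <- Ejemplo %>%
  # assumption: a week is the Monday-to-Sunday interval containing the date
  mutate(week_start = floor_date(fechas, unit = "week", week_start = 1)) %>%
  group_by(week_start) %>%
  summarize(n_obs = n(), .groups = "drop") %>%           # rows per week
  mutate(week_end = week_start + days(6),
         # tooltip text: date range plus the count for that week
         etiqueta = paste0(week_start, " to ", week_end, ": ", n_obs, " cases"))

dibujo_rango <- ggplot(Ejemplo_rango, aes(x = week_start, y = n_obs, text = etiqueta)) +
  geom_col(fill = "darkblue", color = "black") +
  labs(x = "Fecha", y = "Nº casos") +
  theme_bw()
```

Passing the result to plotly::ggplotly(dibujo_rango, tooltip = "text") should then show one label per bar.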
I'm a massive RStudio novice, so I have scoured the related questions etc., but I am still having trouble organising my graph properly. I cannot get my graph to show dates in the correct, chronological order. I'm wondering if someone could have a look at my code and data and see what I'm doing wrong (explained very simply please, I am a novice).
I am currently reading in a CSV file, which is set up like this:
AdD = date sample taken, AdT = time sample taken, AdV = concentration value. These are water samples, and the data consist of only two samples across the two months (one per month)
and I get the graph:
The graph shows the 5th month first on the x axis, when I want it in chronological order (i.e. April, the 4th month, should appear first).
My code is as follows (please disregard the geom_hline and axis elements blank - this is one of 6 graphs in a facet and those aren't relevant to the date problem I think/hope) :
F1ambH <- read_csv("data 1 Amb.csv")
f1ambH <- ggplot(data=F1ambH, aes(x=AhD, y=AhV))+ geom_point() +theme_bw()+labs(y= "Concentration (µg/L)", x = "Sample Date")
f1ambH <- f1ambH + geom_hline(yintercept=1.1, linetype="dashed", color="steelblue")+ theme(axis.title.x = element_blank())+ theme(axis.title.y = element_text(face = "bold", size = 11))
f1ambH
I have also tried mutating the data like this:
F1ambH <- read_csv("data 1 Amb.csv") %>% mutate(dates = dmy(AhD))
f1ambH <- ggplot(data=F1ambH, aes(x=dates, y=AhV))+ geom_point() +theme_bw()+labs(y= "Concentration (µg/L)", x = "Sample Date")
which produces this graph:
This shows the dates correctly, but the two points on the graph don't have a corresponding x axis tick, which I need (and I feel like I've exhausted my options in trying to fix it).
So if I can fix either problem, that would be amazing.
EDIT:
Using the +scale_x_date(breaks=unique(F1ambH$dates)) as suggested by the first comment seems to solve my problem, but the points are now at the opposite side of the graph and look horrendous, is there a way to clean it up?
Use your second solution, but use
+scale_x_date(breaks=unique(F1ambH$dates))
to specify where you want the breakpoints.
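A minimal, self-contained sketch of this suggestion follows. The toy data frame and its values are made up; only the column names AhD and AhV come from the question. It assumes the dates are stored as day-month-year text, and it pads the axis with expand so that the two points do not sit at the panel edges (the 10 days of padding is an arbitrary choice).

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for the CSV: two samples, one per month
F1ambH <- data.frame(AhD = c("14/04/2021", "12/05/2021"),
                     AhV = c(0.8, 1.3))
F1ambH$dates <- dmy(F1ambH$AhD)  # parse the text into Date so the axis is chronological

f1ambH <- ggplot(F1ambH, aes(x = dates, y = AhV)) +
  geom_point() +
  scale_x_date(breaks = unique(F1ambH$dates),   # one tick per sampling date
               date_labels = "%d %b %Y",
               expand = expansion(add = 10)) +  # pad the axis by 10 days on each side
  theme_bw() +
  labs(y = "Concentration (µg/L)", x = "Sample Date")
f1ambH
```

Because the x variable is a real Date, April is drawn before May regardless of the row order in the file.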
If you convert the x variable to a factor before plotting, the axis keeps the order of the factor levels (here, the order in which the dates first appear in the data):
F1ambH$AhD <- factor(F1ambH$AhD, levels = unique(F1ambH$AhD), ordered = TRUE)
f1ambH <- ggplot(data = F1ambH, aes(x = AhD, y = AhV)) + geom_point() + theme_bw() +
labs(y = "Concentration (µg/L)", x = "Sample Date")
f1ambH <- f1ambH + geom_hline(yintercept = 1.1, linetype = "dashed", color = "steelblue") +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_text(face = "bold", size = 11))
f1ambH
You can even set any order you prefer:
F1ambH$AhD <- factor(F1ambH$AhD,levels=c(your preference order),ordered=TRUE)
I have a ggplot with the mean IMDb movie rating per year plotted, and I want to add a ribbon-like layer that shows the standard error for each point but is continuous (if that's even possible).
ggplot(data = avg_imdb_movie_year, aes( x = startYear, y = avg_rating)) +
geom_point() +
geom_ribbon(aes(x = start_Year, y = standard_error, xmin = min(xx), xmax = max(xx)))
xx is a sequence corresponding to the years of the movies. standard_error is simply calculated as sd(average_rating) [that is, the difference from the mean for each data point].
I think I am doing something completely wrong. If my data is discrete, is there a way to draw a ribbon-like standard error band around the mean points?
In addition, I have a question about adding layers that use a different data frame. For example, I want to add another geom_point() layer to this ggplot, where the data would be the average rating of awarded movies per year, but I run into an error:
ggplot(data = avg_imdb_movie_year, aes( x = startYear, y = avg_rating)) +
geom_point() +
geom_point(aes(x = avg_awarded_moves_year$year_film,
y = avg_awarded_moves_year$average_per_year))
Error message: Error: Aesthetics must be either length 1 or the same as the data (138): x and y
I realise that it's because there are fewer years (rows) in the awarded_movies table, but I don't know how to add another plot from a different dataset to an existing ggplot. Does anyone have any ideas?
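One possible approach, sketched under assumptions rather than offered as a definitive fix: geom_ribbon() takes ymin and ymax, so the band can be drawn as mean ± standard error, and a layer can read from a second data frame through its own data argument instead of indexing with $ inside aes(). Both toy data frames and all their values are invented; only the column names follow the question.

```r
library(ggplot2)

set.seed(42)
# Toy stand-ins for the two tables in the question (values are made up)
avg_imdb_movie_year <- data.frame(startYear = 2000:2019,
                                  avg_rating = runif(20, 5.5, 7),
                                  standard_error = runif(20, 0.1, 0.4))
# Deliberately fewer rows than the first table, as in the question
avg_awarded_moves_year <- data.frame(year_film = seq(2000, 2018, by = 2),
                                     average_per_year = runif(10, 7, 8.5))

p <- ggplot(avg_imdb_movie_year, aes(x = startYear, y = avg_rating)) +
  # band from mean - SE to mean + SE, drawn first so the points sit on top
  geom_ribbon(aes(ymin = avg_rating - standard_error,
                  ymax = avg_rating + standard_error),
              fill = "grey70", alpha = 0.5) +
  geom_point() +
  # second data set: give the layer its own data and its own x/y mapping
  geom_point(data = avg_awarded_moves_year,
             aes(x = year_film, y = average_per_year),
             color = "red")
p
```

Since each layer carries its own data frame, the two tables no longer need the same number of rows.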
I have a data frame that contains 4 variables: an ID number (chr), a degree type (factor w/ 2 levels of Grad and Undergrad), a degree year (chr with year), and Employment Record Type (factor w/ 6 levels).
I would like to display this data as a count of the unique ID numbers by year as a stacked area plot of the 6 Employment Record Types. So, count of # of ID numbers on the y-axis, degree year on the x-axis, the value of x being number of IDs for that year, and the fill will handle the Record Type. I am using ggplot2 in RStudio.
I used the following code, but the y axis does not count distinct IDs:
ggplot(AlumJobStatusCopy, aes(x=Degree.Year, y=Entity.ID,
fill=Employment.Data.Type)) + geom_freqpoly() +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
I also tried setting y = Entity.ID to y = ..count.. and that did not work either. I have searched for solutions as it seems to be a problem with how I am writing the aes code.
I also tried the following code based on examples of similar plots:
ggplot(AlumJobStatusCopy, aes(interval)) +
geom_area(aes(x=Degree.Year, y = Entity.ID,
fill = Employment.Data.Type)) +
scale_fill_brewer(palette="Blues",
breaks=rev(levels(AlumJobStatusCopy$Employment.Data.Type)))
This does not even seem to work. I've read the documentation and am at my wit's end.
EDIT:
After figuring out the answer to the problem, I realized that I was not actually using the correct values for my Year variable. A count tells me nothing as I am trying to display the rise in a lack of records and the decline in current records.
My Dataset:
Year, int, 1960-2015
Current Record, num: % of total records that are current
No Record, num: % of total records that are not current
Ergo each Year value has two corresponding percent values. I am now using 2 lines instead of an area plot since the y axis has distinct values instead of a count function, but I would still like the area under the curves filled. I tried using melt to convert the data from wide to long, but was still unable to fill both lines. Filling is just for aesthetic purposes, as I would like to use a gradient for each, with one fill being slightly lighter than the other.
Here is my current code:
ggplot(Alum, aes(Year)) +
geom_line(aes(y = Percent.Records, colour = "Percent.Records")) +
geom_line(aes(y = Percent.No.Records, colour = "Percent.No.Records")) +
scale_y_continuous(labels = percent) + ylab('Percent of Total Records') +
ggtitle("Active, Living Alumni Employment Record") +
scale_x_continuous(breaks=seq(1960, 2014, by=5))
I cannot post an image yet.
I think you're missing a step where you summarize the data to get the quantities to plot on the y-axis. Here's an example with some toy data similar to how you describe yours:
# Make toy data with three levels of employment type
set.seed(1)
df <- data.frame(Entity.ID = rep(LETTERS[1:10], 3), Degree.Year = rep(seq(1990, 1992), each=10),
Degree.Type = sample(c("grad", "undergrad"), 30, replace=TRUE),
Employment.Data.Type = sample(as.character(1:3), 30, replace=TRUE))
# Here's the part you're missing, where you summarize for plotting
library(dplyr)
dfsum <- df %>%
group_by(Degree.Year, Employment.Data.Type) %>%
tally()
# Now plot that, using the sums as your y values
library(ggplot2)
ggplot(dfsum, aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
The result could use some fine-tuning, but I think it's what you mean. Here, the bars are of equal height because each year in the toy data includes an equal number of IDs; if the count of IDs varied, so would the total bar height.
If you don't want to add objects to your workspace, just do the summing in the call to ggplot():
ggplot(tally(group_by(df, Degree.Year, Employment.Data.Type)),
aes(x = Degree.Year, y = n, fill = Employment.Data.Type)) +
geom_bar(stat="identity") + labs(fill="Employment")
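For the two-line version described in the EDIT, one way to fill under both curves is to reshape the data to long format and draw geom_area() with position = "identity", so the areas overlap instead of stacking. This is only a sketch: the Alum data frame below is invented, tidyr::pivot_longer() is used in place of melt, and the names Alum_long, Record.Type and Percent as well as the two blues are arbitrary choices.

```r
library(ggplot2)
library(tidyr)
library(scales)

# Invented data using the column names from the question
Alum <- data.frame(Year = 1960:2015)
Alum$Percent.Records <- seq(0.80, 0.30, length.out = nrow(Alum))
Alum$Percent.No.Records <- 1 - Alum$Percent.Records

# Wide to long: one row per year and record type
Alum_long <- pivot_longer(Alum,
                          cols = c(Percent.Records, Percent.No.Records),
                          names_to = "Record.Type",
                          values_to = "Percent")

ggplot(Alum_long, aes(x = Year, y = Percent, fill = Record.Type)) +
  # identity position keeps each area at its own height instead of stacking
  geom_area(position = "identity", alpha = 0.4) +
  scale_fill_manual(values = c(Percent.Records = "steelblue4",
                               Percent.No.Records = "lightsteelblue")) +
  scale_y_continuous(labels = percent) +
  scale_x_continuous(breaks = seq(1960, 2015, by = 5)) +
  ylab("Percent of Total Records") +
  ggtitle("Active, Living Alumni Employment Record")
```

The alpha value keeps both areas visible where they overlap; dropping position = "identity" would stack them instead.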
Using the ggplot function, it is possible to group/color by a column of interest and plot the data based on that as follows:
ggplot(inputDataFrame, aes(as.numeric(interestingColumn) , group = AnotherColumn)) +
coord_cartesian(xlim = c(0,400)) + geom_line(stat='ecdf')
How can I also add the curve for the whole data in "interestingColumn", regardless of the "group" criteria, so that I can compare the whole data and its subgroups in one plot?
For instance, running the above code, I get the figure as follows, with the cumulative values for each product separately. How can I add a curve to that plot which shows the consumption of all products regardless of the product group?
Thanks.
You can add a geom_line without the color aesthetics and a geom_line with the color aesthetics. Also see below how to create a reproducible example.
# create your reproducible example...
library(ggplot2)
set.seed(1)
inputDataFrame <- data.frame(interestingColumn = rnorm(100, 200, 80),
AnotherColumn = factor(rbinom(100, 4, .3)))
# plotting
ggplot(inputDataFrame, aes(as.numeric(interestingColumn))) +
coord_cartesian(xlim = c(0,400)) +
geom_line(stat='ecdf') +
geom_line(aes(color=AnotherColumn), stat='ecdf')