I am trying to emulate a ggplot of multiple lines which works as follows:
library(ggplot2)
set.seed(45)
df <- data.frame(x=c(1,2,3,4,5,1,2,3,4,5,3,4,5), val=sample(1:100, 13),
                 variable=rep(paste0("category", 1:3), times=c(5,5,3)))
ggplot(data = df, aes(x=x, y=val)) + geom_line(aes(colour=variable))
I can get this simple example to work, but when I follow the same steps on a much larger data set the plot comes out wrong.
ncurrencies = 6
dates = c(BTC$Date, BCH$Date, LTC$Date, ETH$Date, XRP$Date, XVG$Date)
opens = c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open)
categories = rep(paste0("categories", 1:ncurrencies),
                 times=c(nrow(BTC), nrow(BCH), nrow(LTC), nrow(ETH), nrow(XRP), nrow(XVG)))
df = data.frame(dates, opens, categories)
# Plot - Not correct.
ggplot(data=df, aes(x=dates, y=opens)) +
geom_line(aes(colour=categories))
As you can see, the different points are discretised and the y-axis is strange. I am guessing this is a rookie error but I have been going round in circles for a while. Can anyone see it?
P.S. I don't think I can upload the data here as it would be too much code. However, the dataframe is in the same format as the practice example and the categories match up correctly to the x and y data. Therefore I believe it is the way I am defining ggplot - I am relatively new to R.
Thank you Markus and Jan, yes you are correct. df$opens was a factor and changing it to a numeric solved the problem.
opens = as.numeric(c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open))
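One caveat, shown as a small sketch with made-up values: when a column really is a factor, calling `as.numeric()` on it directly returns the internal level codes rather than the printed numbers. Converting through `as.character()` first recovers the values. The vector `f` below is a hypothetical stand-in for one `Open` column.

```r
# Hypothetical stand-in for a price column that was read in as a factor
f <- factor(c("6500.1", "120.5", "6500.1", "0.35"))

codes  <- as.numeric(f)                # internal level codes, not prices
prices <- as.numeric(as.character(f))  # the numbers that were printed

print(codes)
print(prices)
```

Running `str(df)` before plotting is a quick way to see whether a column has ended up as a factor.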
I'm trying to make a stack of histograms (or a ridgeplot) so I can compare distributions at certain timepoints in my observations.
I used this source for the histogram, and this for the ridge plots.
However, I cannot figure out how to set up my code to make a stacked histogram of length (L) by week, so that I can see the L distributions at different weeks. I have tried the fill option in ggplot (which in the example seems to produce automatic color differences for the weeks because it is inside aes()?) and other "stacks" using the y= argument, but haven't had much success, I think because of the way my data is set up. If anyone can help me figure out how to make multiple histograms by week, that would be useful!
Thanks!
#fake data
t = c((rep.int(7,10)), (rep.int(14,20)), rep.int(21,30), rep.int(28,20), (rep.int(31, 20)), (rep.int(36,10)))
L = rnorm(length(t), mean=10, sd=2) # one length per time point (t has 110 entries, not 100)
fake = data.frame(L, t)
#subset data into weeks for convenience
dayofweek = seq(7,120,7)
fake2 = as.data.frame(subset(fake, t %in% dayofweek))
fake2$week <- floor(fake2$t/7)
#Plots, basic code
ggplot(fake2, aes(x=L, fill=week)) +
geom_histogram()
I tried facet_grid before, but for some reason facet_wrap actually at least separated the graphs correctly, AND magically made the color fill work:
ggplot(fake2, aes(x=L, fill = week)) +
geom_histogram()+
facet_wrap(.~week)
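A possible reason the fill did nothing in the first attempt: `week` is numeric, and ggplot2 does not split a histogram into groups by a continuous variable. The sketch below, which rebuilds the fake data with a seed so it is self-contained, converts `week` to a factor and stacks one panel per week. The seed, the `bins = 20` setting and the single-column layout are assumptions added for illustration.

```r
library(ggplot2)

set.seed(1)  # seed added only to make this sketch reproducible
t <- c(rep.int(7, 10), rep.int(14, 20), rep.int(21, 30),
       rep.int(28, 20), rep.int(31, 20), rep.int(36, 10))
L <- rnorm(length(t), mean = 10, sd = 2)
fake <- data.frame(L, t)

# keep only days that fall on a whole week, as in the question
fake2 <- subset(fake, t %in% seq(7, 120, 7))
fake2$week <- factor(fake2$t / 7)  # discrete, so fill can group by it

p <- ggplot(fake2, aes(x = L, fill = week)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ week, ncol = 1)  # one panel per week, stacked vertically
print(p)
```

With `ncol = 1` the panels share the x-axis, which makes the weekly distributions easier to compare by eye.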
I want to create a clustered Bar chart in R using 2 numeric variables, e.g:
Movie Genre (X-axis) and Gross$ + Budget$ should be Y-axis
It's a very straightforward chart to create in Excel. However, in R, I have put Genre in my X-axis and Gross$ in Y-axis.
My question is: where in my code do I need to put the second numeric variable, i.e. Budget$, so that it appears beside Gross$ in the chart?
Here is my Code:
ggplot(data=HW, aes(reorder(x=HW$Genre,-HW$Gross...US, sum),
y=HW$Gross...US))+
geom_col()
P.S. In aes I have just put reorder to sort the categories.
Appreciate help!
Could you give us some data so we can recreate it?
I think you are looking for geom_bar() and one of its options, position="dodge", which tells ggplot to put the bars side by side. But without knowing your data and its structure I can't further help you.
Melting the dataset should help in this case. A dummy-data based example below:
Data
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = TRUE)],
                 Gross...US = rnorm(100, 1e6, sd=1e5),
                 Budget...US = rnorm(100, 1e5, sd=1e4))
Code
library(tidyverse)
library(reshape2)
HW %>%
  melt(id.vars = "Genre") %>%  # long format: one row per Genre/variable pair
  ggplot(aes(Genre, value, fill=variable)) + geom_col(position = 'dodge')
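As an alternative sketch, not the answerer's code: `tidyr::pivot_longer()` does the same wide-to-long reshaping as `melt()`. With many rows per genre, dodged columns within a group can be drawn over one another, so this version first sums each measure per genre. It assumes the column names of the dummy `HW` data; summing is just one choice of summary, and the seed is added for reproducibility.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # seed added for reproducibility of the dummy data
HW <- data.frame(Genre = letters[sample(1:6, 100, replace = TRUE)],
                 Gross...US = rnorm(100, 1e6, sd = 1e5),
                 Budget...US = rnorm(100, 1e5, sd = 1e4))

# one row per genre, each measure summed
totals <- aggregate(cbind(Gross...US, Budget...US) ~ Genre,
                    data = HW, FUN = sum)

# wide -> long: one row per (Genre, variable) pair
long <- pivot_longer(totals, cols = c(Gross...US, Budget...US),
                     names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = Genre, y = value, fill = variable)) +
  geom_col(position = "dodge")
print(p)
```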
I feel like I am asking a totally silly question, but I can't force ggplot to show the legend for line colours.
The thing is that I have two data frames with the same data, just the first data.frame represents new data (plus additional numbers) and the second represents the old data. I am trying to compare new and old data, thus to understand which is which I have to see the legend. I have tried to use scale_colour_manual, but it still doesn't appear.
I have read a number of answers to similar questions and none of them worked or led to a better result. You can see a simple example of my problem below:
rm(list = ls())
library(ggplot2)
xnew<-3:10
y<-5:12
xold<-4:11
years<-2000:2007
xfact<-rep("x", times=8)
yfact<-rep("y", times=8)
Newdata<-data.frame(indicator=c(xfact,yfact),Years=c(years,years), data=c(xnew,y))
Olddata<-data.frame(indicator=xfact,Years=c(years), data=xold)
graph<-ggplot(mapping=aes(Years, data, group=1)) +
geom_line(,Newdata[Newdata=="x",], size=1.5, colour="lightblue")+
geom_line(,Olddata[Olddata=="x",], size=1.5, colour="orange")+
ggtitle("OLD vs NEW")+
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
The resulting plot has no legend.
Thanks for all the help I have already found on this website and thank you in advance for helping to solve this problem.
Legends are created in ggplot by mapping aesthetics to a single variable. Your mistake is that you're trying to set colors manually in each layer.
Newdata$type <- "New"
Olddata$type <- "Old"
all_data <- rbind(Newdata,Olddata)
ggplot(data = all_data[all_data$indicator == 'x',],aes(x = Years,y = data,colour = type)) +
geom_line() +
ggtitle("OLD vs NEW") +
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
There are countless examples illustrating this basic technique in ggplot here.
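If the two data frames are to stay separate, another common pattern is to put a constant label inside `aes()` in each layer. The colour is then mapped rather than set, so `scale_colour_manual()` has something to build a legend from. This is a minimal sketch that assumes only the "x" indicator is plotted, as in the question, and uses simplified data frames.

```r
library(ggplot2)

years <- 2000:2007
Newdata <- data.frame(Years = years, data = 3:10)  # "x" series, new values
Olddata <- data.frame(Years = years, data = 4:11)  # "x" series, old values

p <- ggplot(mapping = aes(x = Years, y = data)) +
  # the strings "New" and "Old" are labels, matched by name in the scale below
  geom_line(data = Newdata, aes(colour = "New")) +
  geom_line(data = Olddata, aes(colour = "Old")) +
  ggtitle("OLD vs NEW") +
  scale_colour_manual(name = "Legend",
                      values = c("New" = "lightblue", "Old" = "orange"))
print(p)
```

Combining the data into one frame, as in the answer above, scales better when there are more than a couple of series.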
I want to put labels of the percentages on my stacked bar plot. However, I only want to label the largest 3 percentages for each bar. I went through a lot of helpful posts on SO (for example: 1, 2, 3), and here is what I've accomplished so far:
library(ggplot2)
groups<-factor(rep(c("1","2","3","4","5","6","Missing"),4))
site<-c(rep("Site1",7),rep("Site2",7),rep("Site3",7),rep("Site4",7))
counts<-c(7554,6982, 6296,16152,6416,2301,0,
20704,10385,22041,27596,4648, 1325,0,
17200, 11950,11836,12303, 2817,911,1,
2580,2620,2828,2839,507,152,2)
tapply(counts,site,sum)
tot<-c(rep(45701,7),rep(86699,7), rep(57018,7), rep(11528,7))
prop<-sprintf("%.1f%%", counts/tot*100)
data<-data.frame(groups,site,counts,prop)
ggplot(data, aes(x=site, y=counts,fill=groups)) + geom_bar()+
stat_bin(geom = "text",aes(y=counts,label = prop),vjust = 1) +
scale_y_continuous(labels = percent)
I wanted to insert my output image here but don't seem to have enough reputation...But the code above should be able to produce the plot.
So how can I only label the largest 3 percentages on each bar? Also, for the legend, is it possible for me to change the order of the categories? For example put "Missing" at the first. This is not a big issue here but for my real data set, the order of the categories in the legend really bothers me.
I'm new on this site, so if there's anything that's not clear about my question, please let me know and I will fix it. I appreciate any answer/comments! Thank you!
I did this in a sort of hacky manner. It isn't that elegant.
Anyways, I used the plyr package, since the split-apply-combine strategy seemed to be the way to go here.
I recreated your data frame with a variable perc that represents the percentage for each site. Then, for each site, I just kept the 3 largest values for prop and replaced the rest with "".
# I added some variables, and added stringsAsFactors=FALSE
data <- data.frame(groups, site, counts, tot, perc=counts/tot,
prop, stringsAsFactors=FALSE)
# Load plyr
library(plyr)
# Split on the site variable, and keep all the other variables (is there an
# option to keep all variables in the final result?)
data2 <- ddply(data, ~site, summarize,
groups=groups,
counts=counts,
perc=perc,
prop=ifelse(perc %in% sort(perc, decreasing=TRUE)[1:3], prop, ""))
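# I changed some of the plotting parameters
# (geom_col/geom_text replace geom_bar/stat_bin, which current ggplot2 rejects here)
library(scales)  # for percent
ggplot(data2, aes(x=site, y=perc, fill=groups)) + geom_col() +
  geom_text(aes(label = prop), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)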
EDIT: Looks like your scales are wrong in your original plotting code. It gave me results with 7500000% on the y axis, which seemed a little off to me...
EDIT: I fixed up the code.
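The legend-order part of the question is not covered above. A sketch of the usual approach: ggplot2 orders a discrete legend by the levels of the factor, so reordering the levels of `groups` moves "Missing" to the top. The level order chosen here is only an example.

```r
groups <- factor(rep(c("1", "2", "3", "4", "5", "6", "Missing"), 4))

# put "Missing" first; the remaining levels keep their relative order
groups <- relevel(groups, ref = "Missing")
print(levels(groups))
```

Rebuilding the data frame with this reordered factor before plotting should change the legend order, and the stacking order of the bars along with it.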
I'm still pretty new to R, and have come up against a plotting problem I can't find an answer to.
I've got a data frame that looks like this (though a lot bigger):
df <- data.frame(Treatment= rep(c("A", "B", "C"), each = 6),
LocA=sample(1:100, 18),
LocB=sample(1:100, 18),
LocC=sample(1:100, 18))
And I want dot plots that look like this one produced in Excel. It's exactly the formatting I want: a dotplot for each of the treatments side-by-side for each location, with data for multiple locations together on one graph. (Profuse apologies for not being able to post the image here; posting images requires a 10 reputation.)
It's no problem to make a plot for each location, with the dots color-coded, and so on:
ggplot(data = df, aes(x=Treatment, y=LocA, color = Treatment)) + geom_point()
but I can't figure out how to add locations B and C to the same graph.
Any advice would be much appreciated!
As a couple of people have mentioned, you need to "melt" the data, getting it into a "long" form.
library(reshape2)
df_melted <- melt(df, id.vars=c("Treatment"))
colnames(df_melted)[2] <- "Location"
In ggplot jargon, having different groups like treatment side-by-side is achieved through "dodging". Usually for things like barplots you can just say position="dodge" but geom_point seems to require a bit more manual specification:
ggplot(data=df_melted, aes(x=Location, y=value, color=Treatment)) +
geom_point(position=position_dodge(width=0.3))
You need to reshape the data. Here is an example using reshape2:
library(reshape2)
dat.m <- melt(dat, id.vars='Treatment')  # dat is the data frame called df in the question
library(ggplot2)
ggplot(data = dat.m,
aes(x=Treatment, y=value,shape = Treatment,color=Treatment)) +
geom_point()+facet_grid(~variable)
Since you want a dotplot, I also propose a lattice solution. I think it is more suitable in this case.
library(lattice)
dotplot(value~Treatment|variable,
groups = Treatment, data=dat.m,
pch=c(25,19),
par.strip.text=list(cex=3),
cex=2)