I have a dataframe with 12 columns (one for each month of the year) and an id. Each record holds the transaction amounts (in dollars) a customer has made over the last twelve months. I want to plot these columns as a series, and I want to do so for every customer in the dataframe. The x-axis will be the month index and the y-axis the dollar value. So basically I need one line per customer, all on the same graph.
Code for generating random data
a <- data.frame(id = seq(1,1000,1))
b <- data.frame(replicate(12,sample(1000:100000,1000,rep=TRUE)))
df <- cbind(a,b)
This is what I tried, but it's not what I want:
library(reshape2)
library(ggplot2)
df_lg <- melt(df, id = 'id') # convert from wide to tall
ggplot(data=df_lg,
aes(x=variable, y=value, colour=variable)) +
geom_line()
Any ideas how to do this?
Just add group to your aesthetics: both colour and group should be the id variable you want in the legend.
ggplot(data=df_lg,
aes(x=variable, y=value, colour=id, group = id)) +
geom_line()
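With many customers, a numeric id mapped to colour produces a continuous colour bar rather than a readable legend. The following is a minimal sketch of one alternative, assuming it is acceptable to drop the colour mapping, keep only group = id, and use transparency. The seed, the reduced customer count (50) and the month column are assumptions added for illustration, not part of the original answer.

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- cbind(
  data.frame(id = 1:50),  # assumption: 50 customers keep the plot legible
  data.frame(replicate(12, sample(1000:100000, 50, replace = TRUE)))
)

# Wide to long: one row per (id, month column)
df_lg <- melt(df, id.vars = "id")

# data.frame() names the columns X1..X12; turn them into a numeric month index
df_lg$month <- as.integer(sub("^X", "", df_lg$variable))

# One line per customer via the group aesthetic; alpha reduces overplotting
p <- ggplot(df_lg, aes(x = month, y = value, group = id)) +
  geom_line(alpha = 0.3) +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month index", y = "Transaction amount (dollars)")
print(p)
```

Using a numeric month on the x-axis also makes the ordering of the months explicit instead of relying on factor levels.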
I am trying to plot values on the y axis against years on the x axis with ggplot2.
This is the dataset: https://drive.google.com/file/d/1nJYtXPrxD0xvq6rBz2NXlm4Epi52rceM/view?usp=sharing
I want to plot the values of specific countries.
It won't work by just specifying year as the x axis and a country's values on the y axis. I'm reading I need to melt the data frame, so I did that, but it's now in a format that doesn't seem convenient to get the job done.
I'm assuming I haven't correctly melted, but I have a hard time finding what I need to specifically do.
What I did beforehand is manually transpose the data and make the years a column, as well as all the countries.
This is the dataset transposed:
https://drive.google.com/file/d/131wNlubMqVEG9tID7qp-Wr8TLli9KO2Q/view?usp=sharing
Here's how I melted:
inv_melt.data <- melt(investments_t.data, id.vars="Year")
ggplot() +
geom_line(aes(x=Year, y=value), data = inv_melt.data)
The plot shows the aggregated values of all countries per year, but I want them per country in such a manner that I can also select to plot certain countries only.
How do I utilize melt in such a manner? Could someone walk me through this?
There is no column named "Year" in the linked data set; there is one column per year. So it needs to be melted by "country", and the "variable" column then cleaned up with sub.
inv_melt.data <- reshape2::melt(investments_t.data, id.vars="country")
inv_melt.data$variable <- as.integer(sub("^X", "", inv_melt.data$variable))
ggplot(inv_melt.data, aes(variable, value, color = country)) +
geom_line(show.legend = FALSE)
Edit.
The following code keeps only the countries that have no missing values.
i <- sapply(investments_t.data[-1], function(x) sum(is.na(x)) == 0)
i <- c(1, which(i))
inv_melt.data <- reshape2::melt(investments_t.data[i], id.vars = "Year")
ggplot(inv_melt.data, aes(Year, value, color = variable)) +
geom_line(show.legend = FALSE)
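To plot only certain countries, one option is to filter the melted data before passing it to ggplot. This is a rough sketch on a small made-up table: the investments data frame, its values and the country names are hypothetical stand-ins for the linked data set, which is assumed to have one row per country and one X-prefixed column per year.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in for the linked data: one row per country, one column per year
investments <- data.frame(
  country = c("Austria", "Belgium", "Chile"),
  X2000 = c(1.0, 2.0, 3.0),
  X2001 = c(1.5, 2.5, 2.0),
  X2002 = c(2.0, 2.0, 4.0)
)

# Long format: one row per (country, year)
inv_long <- melt(investments, id.vars = "country")
inv_long$year <- as.integer(sub("^X", "", inv_long$variable))

# Keep only the countries of interest
wanted <- c("Austria", "Chile")
inv_sel <- subset(inv_long, country %in% wanted)

p <- ggplot(inv_sel, aes(year, value, colour = country)) +
  geom_line()
print(p)
```

Changing the wanted vector is then enough to switch which countries are drawn.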
I have a ggplot graph and I want to draw two lines on it (from different columns, but for the same date). What I get are two lines that are stacked on each other, but I want to have the same y-axis, ordered correctly, with the lines overlapping each other.
This is the data I'm trying to plot:
final_table:
Month a b
1 2018-04 758519.397875 2404429.258675
2 2018-05 964792.603725 1995902.14473
3 2018-06 703170.240575 1294997.84319
This is my code:
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
And the output I get (notice the y-axis is totally wrong and unordered).
I guess that your date variable is not in the right format. E.g. if you run
class(final_table$Month)
This should yield "Date". So you need to get it into the right format. Here's an example with your numbers.
Month <- as.character(c("2018-04", "2018-05", "2018-06")) #or convert it to character after
a <- c(758519.397875, 964792.603725, 703170.240575)
b <- c(2404429.258675, 1995902.14473, 1294997.84319)
final_table <- data.frame(Month, a, b)
#your Month variable is messed up, you actually need the day!
final_table$Month <- as.Date(paste(final_table$Month,"-01",sep=""))
library(reshape) #need to load that for melt
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
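An unordered y-axis can also mean that the value columns were read in as character or factor rather than numeric. This is only a guess about the original data, so the sketch below builds a hypothetical all-character version of the table, inspects the column classes, and converts them before melting.

```r
# Hypothetical reproduction: every column stored as character
final_table <- data.frame(
  Month = c("2018-04", "2018-05", "2018-06"),
  a = c("758519.397875", "964792.603725", "703170.240575"),
  b = c("2404429.258675", "1995902.14473", "1294997.84319"),
  stringsAsFactors = FALSE
)

# Inspect the class of each column before plotting
print(sapply(final_table, class))

# Convert the values to numeric and the month to a Date (first day of the month)
final_table$a <- as.numeric(final_table$a)
final_table$b <- as.numeric(final_table$b)
final_table$Month <- as.Date(paste0(final_table$Month, "-01"))

str(final_table)
```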
I would like to create a stacked bar graph however my output shows overlaid bars instead of stacked. How can I rectify this?
#Create data
date <- as.Date(rep(c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016"),2))
sales <- c(23,52,73,82,12,67,34,23,45,43)*1000
geo <- c(rep("Western Territory",5), rep("Eastern Territory",5))
data <- data.frame(date, sales, geo)
#Plot
library(ggplot2)
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
stat_summary(fun.y=sum, geom="bar") +
ggtitle("TITLE")
Plot output:
As you can see from the summarized table below, it confirms the bars are not stacked:
>#Verify plot is correct
>ddply(data, c("date"), summarize, total=sum(sales))
date total
1 0001-01-20 90000
2 0002-01-20 86000
3 0003-01-20 96000
4 0004-01-20 127000
5 0005-01-20 55000
Thanks!
You have to include position="stack" in your stat_summary (note that from ggplot2 3.3.0 onwards, fun.y is deprecated in favour of fun):
stat_summary(position="stack",fun.y=sum, geom="bar")
Alternatively, since your data are already summarized, you could use geom_col (the shorthand for geom_bar(stat = "identity")):
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
geom_col() +
scale_x_date(date_labels = "%b-%d")
Note that I changed the date formatting (by adding format = "%m/%d/%Y" to the as.Date call) and explicitly set the axis label formatting.
If your actual data have more than one entry per period, you can always summarise first, then pass that into ggplot instead of the raw data.
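A minimal sketch of the summarise-first approach, assuming base R's aggregate() is an acceptable way to compute the totals. It reuses the question's data with the date format fixed; the totals name is introduced here for illustration.

```r
library(ggplot2)

# Question's data, with an explicit format so the dates parse as month/day/year
date <- as.Date(rep(c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016"), 2),
                format = "%m/%d/%Y")
sales <- c(23, 52, 73, 82, 12, 67, 34, 23, 45, 43) * 1000
geo <- c(rep("Western Territory", 5), rep("Eastern Territory", 5))
data <- data.frame(date, sales, geo)

# One row per (date, geo); a no-op for this data, but it collapses
# repeated entries per period in a larger data set
totals <- aggregate(sales ~ date + geo, data = data, FUN = sum)

p <- ggplot(totals, aes(x = date, y = sales, fill = geo)) +
  geom_col() +  # geom_col stacks bars by default
  scale_x_date(date_labels = "%b-%d")
print(p)
```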
I have a task and I need to plot a graph using ggplot2.
I have a vector of ratings (Samsung S4 ratings from its users).
I generate this data using this:
TestRate<- data.frame (rating=sample (x =1:5, size=100, replace=T ), month= sample(x=1:12,size=100,rep=T) )
Now I need to plot a graph where the x-axis shows dates (months in our example data) and there are 5 different lines, one for each rating (1, 2, 3, 4, 5). Each line shows the count of that rating for the corresponding month.
How can I plot this in ggplot2?
First you need to count the number of elements per (rating, month) pair:
library(data.table)
setDT(TestRate)[,count:=.N,by=list(month, rating)]
And then you can plot the result:
ggplot(TestRate, aes(month, count, color=as.factor(rating))) + geom_line()
If your data.table is not set (so to speak), you can use dplyr (and rename the legend while you are at it).
library(dplyr)
df <- TestRate %>% group_by(rating, month) %>% summarise(count = n())
ggplot(df, aes(x=month, y=count, color=as.factor(rating))) + geom_line() + labs(color = "Rating")
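Both approaches above only produce rows for (rating, month) pairs that actually occur, so a rating with no reviews in some month has no point there and its line jumps across the gap. The sketch below is one possible variant, assuming zero counts should be drawn explicitly. It uses base R's table(), which counts every combination of the factor levels; the seed and the counts name are assumptions.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added for reproducibility
TestRate <- data.frame(rating = sample(1:5, 100, replace = TRUE),
                       month = sample(1:12, 100, replace = TRUE))

# table() counts every (rating, month) combination, including those with zero rows
counts <- as.data.frame(
  table(rating = factor(TestRate$rating, levels = 1:5),
        month = factor(TestRate$month, levels = 1:12)),
  responseName = "count"
)

# month comes back as a factor; convert it to a number for a continuous axis
counts$month <- as.integer(as.character(counts$month))

p <- ggplot(counts, aes(month, count, colour = rating)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12) +
  labs(colour = "Rating")
print(p)
```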
I have a time series with multiple days of data. In between each day there's one period with no data points. How can I omit these periods when plotting the time series using ggplot2?
An artificial example is shown below. How can I get rid of the two periods where there's no data?
code:
Time = Sys.time()+(seq(1,100)*60+c(rep(1,100)*3600*24, rep(2, 100)*3600*24, rep(3, 100)*3600*24))
Value = rnorm(length(Time))
g <- ggplot()
g <- g + geom_line (aes(x=Time, y=Value))
g
First, create a grouping variable. Here, two groups are different if the time difference is larger than 1 minute:
Group <- c(0, cumsum(diff(Time) > 1))
Now three distinct panels could be created using facet_grid and the argument scales = "free_x":
library(ggplot2)
g <- ggplot(data.frame(Time, Value, Group)) +
geom_line (aes(x=Time, y=Value)) +
facet_grid(~ Group, scales = "free_x")
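The comparison diff(Time) > 1 depends on the units R picks automatically for the time differences. A slightly more defensive sketch, assuming a gap of more than 5 minutes should start a new block, converts the differences to minutes explicitly. The 5-minute threshold and the gap_mins name are assumptions chosen for this example.

```r
library(ggplot2)

# Same construction as in the question: three days, 100 one-minute steps each
Time <- Sys.time() + (seq(1, 100) * 60 +
  c(rep(1, 100), rep(2, 100), rep(3, 100)) * 3600 * 24)
Value <- rnorm(length(Time))

# Gap between consecutive observations, explicitly in minutes
gap_mins <- as.numeric(diff(Time), units = "mins")

# Assumption: any gap above 5 minutes starts a new block
Group <- c(0, cumsum(gap_mins > 5))
print(table(Group))

g <- ggplot(data.frame(Time, Value, Group)) +
  geom_line(aes(x = Time, y = Value)) +
  facet_grid(~ Group, scales = "free_x")
print(g)
```

The threshold only needs to sit between the regular sampling interval and the shortest gap that should count as a break.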
The problem is: how does ggplot2 know you have missing values? I see two options:
Pad out your time series with NA values
Add an additional variable representing a "group". For example,
dd = data.frame(Time, Value)
##type contains three distinct values
dd$type = factor(cumsum(c(0, as.numeric(diff(dd$Time) - 1))))
##Plot, but use the group aesthetic
ggplot(dd, aes(x=Time, y=Value)) +
geom_line (aes(group=type))
csgillespie mentioned padding with NA, but a simpler method is to overwrite one value at the start of each block with NA:
Value[seq(1,length(Value)-1,by=100)]=NA
where the -1 avoids a warning.