I have a dataset like the following one, and I have about 1 million rows like this:
orderid prodid priceperitem date category_eng
3010419 2 62420 18.90 2014-10-09 roll toliet paper
I am currently drawing scatterplots of these products, with priceperitem on the y-axis and date on the x-axis. I have also ranked the products by the coefficient of variation (CV) of their prices over time, and summarized the results in a second dataset like this:
prodid mean count sd cv
424657 12.7124 5541.0000 10.239 80.54999886
158726193 23.7751 1231.0000 17.7567 74.68621596
And I have used the following code to get the scatterplots of many products at the same time:
ggplot(Roll50.last, aes(x=date, y=priceperitem)) + geom_point() + facet_wrap(~prodid)
But I want to order the panels by each product's CV, which I have summarized in another data.frame. Is there a way to order the facet panels by a value stored in a different data frame?
Below is a small sample dataset. The idea is to compute each product's price CV (s.d. / mean) and then plot the products' scatterplots in order of CV, from highest to lowest.
#generate reproducible example
date = seq(as.Date("2011-3-1"), as.Date("2011-6-8"), by="days")
id = rep(c(1,2,3,4),each=25)
set.seed(01)
id = sample(id)
price = rep(c(1:20), each = 5)
price = sample(price)
data = data.frame(date, id, price)
You can turn prodid into a factor and set the order of the factor categories to be based on the size of the coefficient of variation. For example, let's assume your data frame with the cv values is called cv. Then, to order prodid by the values of cv:
Roll50.last$prodid = factor(Roll50.last$prodid,
levels = cv$prodid[order(cv$cv, decreasing=TRUE)])
Now, when you plot the data, the prodid facets will be ordered by decreasing size of cv.
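As a minimal sketch of the whole workflow, here is the same idea applied to the small sample data from the question. The columns `id` and `price` come from that sample; the name `cv_tbl` for the summary table is a placeholder I made up, and the CV is computed with base `aggregate()` only for illustration.

```r
library(ggplot2)

# Sample data from the question
set.seed(1)
date  <- seq(as.Date("2011-03-01"), as.Date("2011-06-08"), by = "days")
id    <- sample(rep(c(1, 2, 3, 4), each = 25))
price <- sample(rep(1:20, each = 5))
data  <- data.frame(date, id, price)

# Per-product coefficient of variation: sd / mean
# (cv_tbl is a hypothetical name for the summary data frame)
cv_tbl <- aggregate(price ~ id, data = data,
                    FUN = function(x) sd(x) / mean(x))
names(cv_tbl)[2] <- "cv"

# Factor levels sorted by decreasing CV control the facet order
data$id <- factor(data$id,
                  levels = cv_tbl$id[order(cv_tbl$cv, decreasing = TRUE)])

p <- ggplot(data, aes(x = date, y = price)) +
  geom_point() +
  facet_wrap(~id)
```

Printing `p` draws the panels with the highest-CV product first, because `facet_wrap()` follows the factor level order.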
I have time series data where measurements of 7 variables (Var1:Var7) were taken on 15 individuals (denoted by a unique ID). These individuals were sampled from 3 different Locations. Note that the number of observations is different for each individual. I believe the individuals within each Location will be more similar to each other than individuals in other Locations, both in value and trend. For each Variable within each Location, I want to plot the average time series (to get an idea of what the group looks like as a whole) up to the point where Time is the same for each individual (so the length of the x-axis will only be as long as the shortest individual).
How can I do this and add error bars for each Time point to see how much variation exists between individuals?
Here is some sample data:
set.seed(123)
ID = factor(letters[seq(15)])
Time = c(1000,1200,1234,980,1300,1020,1180,1908,1303,
1045,1373,1111,1097,1167,1423)
df <- data.frame(ID = rep(ID, Time), Time = sequence(Time))
df$Location = rep(c("NY","WA","MA"), c(5714,7829,4798))
df[paste0('Var', c(1:7))] <- rnorm(sum(Time))
In your sample data all seven variables hold identical values (a single rnorm() vector is recycled across the columns), so I regenerated each one independently:
for(i in 1:7) df[paste0('Var', i)] <- rnorm(sum(Time))
Then the following code gives a time-series plot of each of the 7 variables, averaged across individuals within each of the three locations.
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
  pivot_longer(cols = Var1:Var7, names_to = "Variable") %>%
  group_by(Location, Variable, Time) %>%
  summarise(mval = mean(value), .groups = "drop") %>%
  ggplot(aes(y = mval, x = Time, color = Variable)) +
  geom_line() +
  facet_grid(~Location)  # optionally add scales = "free"
I'm not sure if this is what you had in mind though.
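The code above leaves out two parts of the question: truncating each Location at its shortest individual, and showing between-individual variation. These could be sketched roughly as follows. This is only one possible reading of the question. The helper column `len`, the standard-error definition `sd / sqrt(n)`, and the use of `geom_ribbon()` in place of literal error bars are my own assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question, with independent draws per variable
set.seed(123)
ID   <- factor(letters[seq(15)])
Time <- c(1000, 1200, 1234, 980, 1300, 1020, 1180, 1908, 1303,
          1045, 1373, 1111, 1097, 1167, 1423)
df <- data.frame(ID = rep(ID, Time), Time = sequence(Time))
df$Location <- rep(c("NY", "WA", "MA"), c(5714, 7829, 4798))
for (i in 1:7) df[paste0("Var", i)] <- rnorm(sum(Time))

summary_df <- df %>%
  pivot_longer(cols = Var1:Var7, names_to = "Variable") %>%
  # Length of each individual's series (len is a hypothetical helper column)
  group_by(Location, ID) %>%
  mutate(len = max(Time)) %>%
  # Within each Location, keep only time points every individual reached
  group_by(Location) %>%
  filter(Time <= min(len)) %>%
  # Mean and standard error across individuals at each time point
  group_by(Location, Variable, Time) %>%
  summarise(mval = mean(value),
            se   = sd(value) / sqrt(n()),
            .groups = "drop")

p <- ggplot(summary_df, aes(x = Time, y = mval)) +
  geom_ribbon(aes(ymin = mval - se, ymax = mval + se), alpha = 0.3) +
  geom_line() +
  facet_grid(Variable ~ Location, scales = "free_x")
```

With roughly a thousand time points per panel, a ribbon tends to be easier to read than individual bars. Swapping `geom_ribbon()` for `geom_errorbar()` with the same `ymin`/`ymax` aesthetics should give classic error bars if those are preferred.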
In the time series data created below, individuals (denoted by a unique ID) were sampled from 2 populations (NC and SC). All individuals have the same number of observations. I want to average the data at each time point across all individuals that belong to the same State (the average line), and then plot the average lines of the two states against each other. The sample data:
library(tidyverse)
set.seed(123)
ID <- rep(1:10, each = 500)
Time = rep(c(1:500),10)
Location = rep(c("NC","SC"), each = 2500)
Var <- rnorm(5000)
data <- data.frame(
ID = factor(ID),
Time = Time,
State = Location,
Variable = Var
)
I would recommend getting familiar with the various dplyr functions, specifically group_by and summarise. You may want to read through Introduction to dplyr or go through this series of blog posts.
In short, we group the data by the Time and State variables and then summarize each group with an average (i.e., mean(Variable)). To plot the data, we put Time on the x-axis, the newly created avg_var on the y-axis, and use State for color. These are assigned as the chart's aesthetics (i.e., aes(...)). Finally, we add the line geom with geom_line() to render the lines.
data %>%
group_by(Time, State) %>%
summarise(avg_var = mean(Variable)) %>%
ggplot(aes(x = Time, y = avg_var, color = State)) +
geom_line()
I am trying to calculate the city-wise spend on each product on a yearly basis. I also want a graphical representation, but I am not able to get the graphs in R.
Top_11 <- aggregate(Ca_spend["Amount"],
by = Ca_spend[c("City","Product","Month_Year")],
FUN="sum")
A <- ggplot(Top_11,aes(x=City,Month_Year,y=Amount))
A <-geom_bar(stat="identity",position='dodge',fill="firebrick1",colour="black")
A <- A+facet_grid(.~Type)
This is the code I am using. I am trying to plot City, Product, and Year on the same graph.
Variables: City, Product, Month_Year, Amount
Sample observation: New York, Gold, 2004, $50,0000
I'd try this:
ggplot(Top_11,aes(x=City, fill = Product, y=Amount)) +
geom_col() +
facet_wrap(~Month_Year)
For your 5 rows of sample data, that gives the graph below. You can play around with which variable goes to fill (fill color), x (x-axis), and facet_wrap (for small multiples). I see in your code you tried facet_grid(.~Type), but that won't work unless you have a column named Type.
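For completeness, here is a hedged sketch of the step-by-step style from the question, run on a tiny made-up `Ca_spend` (the values are placeholders, not the asker's data). My assumption is that the original attempt drew nothing because `A <- geom_bar(...)` replaced the plot object with a bare layer instead of adding the layer to it.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; values are illustrative only
Ca_spend <- data.frame(
  City       = c("New York", "New York", "New York", "Boston", "Boston", "Boston"),
  Product    = c("Gold", "Gold", "Silver", "Gold", "Gold", "Silver"),
  Month_Year = c(2004, 2004, 2004, 2004, 2005, 2005),
  Amount     = c(500, 100, 300, 200, 150, 400)
)

# Total spend per City / Product / year
Top_11 <- aggregate(Ca_spend["Amount"],
                    by = Ca_spend[c("City", "Product", "Month_Year")],
                    FUN = sum)

# Build the plot incrementally: each step adds to A rather than replacing it
A <- ggplot(Top_11, aes(x = City, y = Amount, fill = Product))
A <- A + geom_col(position = "dodge", colour = "black")
A <- A + facet_wrap(~Month_Year)
```

Typing `A` at the console (or calling `print(A)` in a script) then draws one panel per year, with dodged bars per product inside each city.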
I would like to plot a geom_point with ggplot2 to represent y = sum of amount ($) per nationality, x = number of times per nationality and colors to differentiate nationalities.
I grouped top 20 nationalities using this:
DF$groupnat <- fct_lump(DF$customer_country, 20)
With that, the top 20 countries keep their own level and the rest are grouped into the level Other.
I know how to do each step individually, but when I try to combine everything in one graph something fails: I am not able to sum the amount per nationality, nor to count the number of observations per nationality.
Here is an example of the Data set I'm using:
dataframe -> DF
The idea is to plot a geom_point graph involving 3 variables: x = number of observations per nationality (make a count for each nationality of the variable groupnat), y= total amount (sum of amount per nationality (groupnat)) and differentiate each nationality with a different color.
Thanks in advance!
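One way this could be approached, as a rough sketch: summarise to one row per nationality first, then plot the summary. The toy `DF` below is invented for illustration, `amount` is an assumed column name (the question does not show the real one), and `nat_summary`, `n_obs` and `total_amount` are names I made up. The sketch keeps the top 2 countries instead of 20 only because the toy data is tiny.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical stand-in for DF; `amount` is an assumed column name
DF <- data.frame(
  customer_country = c("ES", "ES", "FR", "FR", "FR", "DE", "IT"),
  amount           = c(10, 20, 5, 5, 15, 40, 8)
)

# Keep the 2 most frequent countries here (20 in the question), lump the rest
DF$groupnat <- fct_lump(DF$customer_country, n = 2)

# One row per nationality: number of observations and total amount
nat_summary <- DF %>%
  group_by(groupnat) %>%
  summarise(n_obs = n(), total_amount = sum(amount), .groups = "drop")

p <- ggplot(nat_summary, aes(x = n_obs, y = total_amount, colour = groupnat)) +
  geom_point(size = 3)
```

The key design choice is that `geom_point()` receives the summarised table, so each nationality contributes exactly one point.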
Hi, I am trying to draw multiple plots split by ID and DAY. Every ID has data for several days, so each ID should get one plot per day. I tried a lattice plot as shown below, but conditioning on both DAY and ID is the problem.
library("lattice")
# require("lattice") - you do not need this line
xyplot(IPRED + PRED + DV ~ TIME | ID, data = df, type = c("l","l","p"), col = c("blue","black","red"),
       distribute.type = TRUE, xlab = "Time (h)", ylab = "conc", layout = c(0,4))
Columns: ID, DAY, TIME, DV, IPRED, PRED
Not too sure what your ultimate goal is, but this may be of some assistance. facet_wrap from the ggplot2 package lets you split the plots by multiple variables.
library(ggplot2)
data(iris)
iris$Day<-rep(weekdays(Sys.Date()+0:4),each=10)
ggplot(data=iris,aes(x=Sepal.Width,y=Sepal.Length))+
geom_point(aes(colour=Species))+
facet_wrap(~Day+Species,nrow=5)
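Applied to the columns listed in the question (ID, DAY, TIME, DV, IPRED, PRED), the same idea might look like the sketch below. The data are simulated purely for illustration, and the colours mirror the lattice call: IPRED in blue, PRED in black, DV as red points.

```r
library(ggplot2)

# Hypothetical data with the question's columns; values are simulated
set.seed(42)
df <- expand.grid(TIME = c(0.5, 1, 2, 4, 8), DAY = c(1, 2), ID = c(101, 102))
df$PRED  <- 10 * exp(-0.3 * df$TIME)
df$IPRED <- df$PRED * ifelse(df$ID == 101, 1.2, 0.8)
df$DV    <- df$IPRED + rnorm(nrow(df), sd = 0.3)

# One panel per ID/DAY combination
p <- ggplot(df, aes(x = TIME)) +
  geom_line(aes(y = IPRED), colour = "blue") +
  geom_line(aes(y = PRED), colour = "black") +
  geom_point(aes(y = DV), colour = "red") +
  facet_wrap(~ID + DAY, labeller = label_both) +
  labs(x = "Time (h)", y = "conc")
```

`label_both` prints the variable name next to its value in each strip, which helps tell the ID and DAY labels apart.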