I have the following data and am trying to plot it using hPlot:
hPlot(Numbers~Date, data = df, group='Type', type = "line", radius=6)
Date Type Numbers
2014-01-05 Type-1 16
2014-01-12 Type-1 82
2014-01-12 Type-2 2
2014-01-19 Type-1 177
2014-01-26 Type-1 270
2014-01-26 Type-2 3
2014-02-02 Type-1 381
2014-02-09 Type-1 461
2014-02-09 Type-2 4
I am getting each date repeated on the x-axis, as shown in the figure below. I also tried unique and as.character, but then the x-data no longer corresponds to the y-data.
Here is a solution to your problem. You can view the chart along with the code here. I am posting it here as well:
# don't switch to scientific notation, since we want date to be
# represented in milliseconds
options(scipen = 13)
dat = transform(df, Date2 = as.numeric(as.POSIXct(Date))*1000)
library(rCharts)
h1 <- hPlot(Numbers ~ Date2, data = dat,
group = 'Type',
type = "line",
radius=6
)
h1$xAxis(type = 'datetime', labels = list(
format = '{value:%Y-%m-%d}'
))
h1
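The key step above is turning each date into milliseconds since the Unix epoch, which is the unit a 'datetime' axis expects. A minimal base-R sketch of just that conversion follows; it assumes the dates should be read as midnight UTC, whereas the answer's code uses the session's local time zone.

```r
# Assumption: dates are interpreted as midnight UTC. The answer's code calls
# as.POSIXct() without tz, so it uses the local time zone instead.
dates <- c("2014-01-05", "2014-01-12")

# Seconds since 1970-01-01, scaled to milliseconds
ms <- as.numeric(as.POSIXct(dates, tz = "UTC")) * 1000

# Print without scientific notation to inspect the values
format(ms, scientific = FALSE)
```

If the points appear shifted by a few hours on the chart, the time zone used in the conversion is the first thing to check.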
I have a simple dataset that contains three columns of hourly observations over the course of a few days.
The data looks something like...
Time Fast Standard Slow
Aug 02 2020 18:00:00 100 200 300
Aug 02 2020 19:00:00 50 100 150
Aug 02 2020 18:00:00 100 200 300
Aug 03 2020 12:00:00 50 100 150
Aug 03 2020 11:00:00 40 50 70
I start by loading up the CSV:
library(tidyverse)
# Link source
if (!exists("gasprices")) { # Check if the object is not already loaded
  if (file.exists("./datafiles/gasprices.rdata")) {
    load("./datafiles/gasprices.rdata")
  } else {
    gasprices <- read.csv("./datafiles/gasprices.csv")
  }
}
But when I go to plot one of the lines, I get a blank plot. I think R is showing every row, when what I really need is three overall change-over-time lines for the three variables (fast, standard, slow). My ideal outcome would show three lines of different colors changing over time in the x axis.
# Plot
g <- ggplot(gasprices, aes(x=Time, y=Fast)) +
geom_line(color = "#00AFBB", size = 2) +
xlab("") +
theme_light()
g
Any help would be greatly appreciated. Thank you,
It's likely to do with the column data-types. Try running the below for your dataframe, what do you get?
lapply(gasprices, class)
Try setting the datatype to a datetime before plotting:
gasprices$Time<- as.POSIXct(gasprices$Time, format = "%b %e %Y %H:%M:%S")
Have a look at this page for details about providing the format used to parse the datetime.
Let me know how it goes!
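As a quick check of the parsing step, here is a small sketch using two timestamps in the layout shown in the question. It assumes an English locale (because %b matches month abbreviations in the current locale) and uses UTC as an arbitrary time zone.

```r
# Two timestamps in the layout from the question
time_strings <- c("Aug 02 2020 18:00:00", "Aug 03 2020 12:00:00")

# Assumption: English locale for %b, and UTC as the time zone
parsed <- as.POSIXct(time_strings, format = "%b %d %Y %H:%M:%S", tz = "UTC")

class(parsed)  # should include "POSIXct" rather than "character"
anyNA(parsed)  # TRUE would mean the format string did not match the input
```

If anyNA() returns TRUE for the real data, the format string or the locale does not match the values in the Time column.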
What you should do is use the tidyverse package to unpivot your data.
require(ggplot2)
require(tidyr)
require(dplyr)
Let's create a data frame with the same structure:
Data <- data.frame ( time = c(1,2,3), fast = c(100, 105, 110), slow = c(50, 70, 90), standart = c(94, 95, 96))
time fast slow standart
1 1 100 50 94
2 2 105 70 95
3 3 110 90 96
Now we unpivot the data.
# Store the result so it can be passed to ggplot() below
UnpivotData <- Data %>%
tidyr::gather(key = 'Speed Type', value = 'Speed Value', -time)
UnpivotData
time Speed Type Speed Value
1 fast 100
2 fast 105
3 fast 110
1 slow 50
2 slow 70
3 slow 90
1 standart 94
2 standart 95
3 standart 96
ggplot2::ggplot(data = UnpivotData, mapping = ggplot2::aes(x = time, y = `Speed Value`, color = `Speed Type`)) +
ggplot2::geom_line()
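In current versions of tidyr, gather() is superseded by pivot_longer(). The same reshaping could be sketched as follows; the column names speed_type and speed_value are illustrative choices, not names from the answer.

```r
library(tidyr)

# Same structure as the example data frame above
Data <- data.frame(time = c(1, 2, 3),
                   fast = c(100, 105, 110),
                   slow = c(50, 70, 90),
                   standart = c(94, 95, 96))

# Every column except time becomes a (speed_type, speed_value) pair
long <- pivot_longer(Data, cols = -time,
                     names_to = "speed_type", values_to = "speed_value")
long
```

The result has one row per time and speed type, so it can be plotted with color = speed_type in the same way as the gathered data.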
One option is to convert your dates to numeric values before plotting lines with ggplot(). Perhaps this thread will help you. After doing this, you must provide new axis tick labels to your plot, for example:
plot + scale_x_discrete(labels= df$Time)
Here is a full example with date-to-numeric along with assigning axis tick labels:
library(ggplot2)
library(reshape2)
# Make data frame
Lines <-"Time Fast Standard Slow
Aug 02 2020 18:00:00 100 200 300
Aug 02 2020 19:00:00 50 100 150
Aug 02 2020 20:00:00 100 200 300
Aug 03 2020 12:00:00 50 100 150
Aug 03 2020 11:00:00 40 50 70"
df <- read.csv(text = gsub(" +", ",", readLines(textConnection(Lines))),
check.names = FALSE)
# Convert date string to proper format
df$Time <- as.POSIXct(df$Time, format = "%b %d %Y %H:%M:%S")
# Reshape data for easier plotting. melt() is from the reshape2 package.
df <- melt(df, id = "Time")
# Plot
ggplot(data = df, aes(x = as.numeric(Time), y = value, color = variable)) +
geom_line() +
scale_x_continuous(breaks = as.numeric(df$Time), labels = as.character(df$Time)) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1))
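ggplot2 can also map a POSIXct column to the x axis directly, in which case scale_x_datetime() formats the tick labels. The following is an alternative sketch, not the method used above; the data frame df_long is hypothetical and only mirrors the shape of the melted data.

```r
library(ggplot2)

# Hypothetical long-format data mirroring the melted data frame
df_long <- data.frame(
  Time = as.POSIXct(rep(c("2020-08-02 18:00:00", "2020-08-02 19:00:00",
                          "2020-08-02 20:00:00"), times = 2), tz = "UTC"),
  variable = rep(c("Fast", "Slow"), each = 3),
  value = c(100, 50, 100, 300, 150, 300)
)

# Time stays POSIXct; the datetime scale takes care of the labels
p <- ggplot(df_long, aes(x = Time, y = value, color = variable)) +
  geom_line() +
  scale_x_datetime(date_labels = "%b %d %H:%M") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

This keeps the axis continuous in time, so unevenly spaced observations are drawn at their true positions.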
I'm trying to plot a data frame that has "Date" as the x-axis, and stock price as the y-axis, and I have four different stocks to be plotted. I'm very confused by the ggplot documentation, and haven't found an easy solution to this. Here is the data frame:
appleData <- read.csv("AAPL.csv", header = TRUE)
microsoftData <- read.csv("MSFT.csv", header = TRUE)
googleData <- read.csv("GOOG.csv", header = TRUE)
amazonData <- read.csv("AMZN.csv", header = TRUE)
names(appleData) <- c("Date", "AAPL")
names(microsoftData) <- c("Date", "MSFT")
names(googleData) <- c("Date", "GOOG")
names(amazonData) <- c("Date", "AMZN")
mergedData1 <- merge(appleData, microsoftData, by = "Date")
mergedData2 <- merge(googleData, amazonData, by = "Date")
totalData <- merge(mergedData1, mergedData2, by = "Date")
totalData
The dataframe is called "totalData", and when I use ggplot(totalData) I get a blank plot. What I need help with specifically is plotting all four stocks onto the same plot, and also rescaling the prices so that they all begin at $100 (so they are on the same scale). Thank you in advance.
I found your question a little difficult to help with because you didn't provide the data you are using. Check out this amazing reference on how to ask really good questions that get answered quickly! How to make a great R reproducible example?
I hope the code below helps you get started on answering your question.
One of the main things I did was convert your data from an untidy "wide" data frame to a "tidy" long data frame using the gather function from tidyr. I highly recommend that you check out this excellent tutorial http://garrettgman.github.io/tidying/ that goes into the basics of tidying. Once your data is "tidy" you will find that many tools work much more easily!
Good Luck!
library(dplyr)
library(tidyr)
library(ggplot2)
# create sample data frame with random numbers
set.seed(123)
total_data <- data.frame(date = seq.Date(from = as.Date("2018-01-01"),
to = as.Date("2018-01-31"), by = "day"),
AAPL = sample(100:1000, 31),
MSFT = sample(100:1000, 31),
GOOG = sample(100:1000, 31),
AMZN = sample(100:1000, 31))
head(total_data)
#> date AAPL MSFT GOOG AMZN
#> 1 2018-01-01 359 912 445 691
#> 2 2018-01-02 809 721 346 388
#> 3 2018-01-03 467 815 832 268
#> 4 2018-01-04 892 122 502 802
#> 5 2018-01-05 943 528 826 183
#> 6 2018-01-06 140 779 827 518
# convert your wide data frame to a tidy long data frame
total_data <- gather(total_data, company, value, -date)
# plot using ggplot2
total_data %>%
ggplot(aes(x = date, y = value, color = company)) +
geom_line()
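The question also asks to rescale the prices so that every stock begins at 100. One way to sketch this is to divide each company's values by its first value; the data frame prices and the column name indexed below are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical prices for two tickers over five days
set.seed(123)
prices <- data.frame(date = seq.Date(as.Date("2018-01-01"), by = "day", length.out = 5),
                     AAPL = sample(100:1000, 5),
                     MSFT = sample(100:1000, 5))

indexed <- prices %>%
  gather(company, value, -date) %>%          # wide to long, as in the answer
  group_by(company) %>%
  arrange(date, .by_group = TRUE) %>%        # make sure the first row is the earliest date
  mutate(indexed = value / first(value) * 100) %>%
  ungroup()
indexed
```

Mapping y = indexed instead of y = value in the ggplot call then puts all companies on a common scale starting at 100.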
I have a web visits over time chart which plots daily traffic from 2014 until now, and looks like this:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)
As you can see it's not a great graph, what would make a bit more sense is to break it down by month as opposed to day. However when I try this code:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)+
scale_x_date(date_breaks = "1 month", minor_breaks = "1 week", labels = date_format("%B"))
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
The date field Post_Day is POSIXct. Page_Views is numeric. Data looks like:
Post_Title Post_Day Page_Views
Title 1 2016-05-15 139
Title 2 2016-05-15 61
Title 3 2016-05-15 79
Title 4 2016-05-16 125
Title 5 2016-05-17 374
Title 6 2016-05-17 39
Title 7 2016-05-17 464
Title 8 2016-05-17 319
Title 9 2016-05-18 84
Title 10 2016-05-18 64
Title 11 2016-05-19 433
Title 12 2016-05-19 418
Title 13 2016-05-19 124
Title 14 2016-05-19 422
I'm looking to change the X axis from a daily granularity into monthly.
The sample data set shown in the question has multiple data points per day. So, it needs to be aggregated day-wise anyway. For the aggregation by day or month, data.table and lubridate are used.
Create sample data
As no reproducible example is supplied, a sample data set is created:
library(data.table)
n_rows <- 5000L
n_days <- 365L*3L
set.seed(123L)
DT <- data.table(Post_Title = paste("Title", 1:n_rows),
Post_Day = as.Date("2014-01-01") + sample(0:n_days, n_rows, replace = TRUE),
Page_Views = round(abs(rnorm(n_rows, 500, 200))))[order(Post_Day)]
DT
Post_Title Post_Day Page_Views
1: Title 74 2014-01-01 536
2: Title 478 2014-01-01 465
3: Title 3934 2014-01-01 289
4: Title 4136 2014-01-01 555
5: Title 740 2014-01-02 442
---
4996: Title 1478 2016-12-31 586
4997: Title 2251 2016-12-31 467
4998: Title 2647 2016-12-31 468
4999: Title 3243 2016-12-31 498
5000: Title 4302 2016-12-31 309
Plot raw data
Without aggregation the data can be plotted by
library(ggplot2)
ggplot(DT) + aes(Post_Day, Page_Views) + geom_line()
Aggregated by day
ggplot(DT[, .(Page_Views = sum(Page_Views)), by = Post_Day]) +
aes(Post_Day, Page_Views) + geom_line()
To aggregate day-wise the grouping parameter by of data.table is used and sum() as aggregation function. The aggregation is reducing the number of data points from 5000 to 1087. Hence, the plot looks less convoluted.
Aggregated by month
ggplot(DT[, .(Page_Views = sum(Page_Views)),
by = .(Post_Month = lubridate::floor_date(Post_Day, "month"))]) +
aes(Post_Month, Page_Views) + geom_line()
In order to aggregate by month, the grouping parameter by is used again, but this time Post_Day is mapped to the first day of the respective month. So, 2014-03-26 becomes a Post_Month of 2014-03-01, which keeps the date class of Post_Day (Date in this sample data; a POSIXct column would stay POSIXct). By this, the x-axis remains continuous with a date scale. This avoids the trouble of converting Post_Day to a factor, e.g., "2014-03" using format(Post_Day, "%Y-%m"), where the x-axis would become discrete.
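The error in the question arises because scale_x_date() accepts Date objects only, while Post_Day there is POSIXct. A small sketch of the two classes involved follows; the timestamp is a hypothetical value.

```r
library(lubridate)

# Hypothetical POSIXct timestamp standing in for one Post_Day value
post_day_ct <- as.POSIXct("2014-03-26 14:30:00", tz = "UTC")

# floor_date() keeps the class of its input
month_ct   <- floor_date(post_day_ct, "month")           # POSIXct: pair with scale_x_datetime()
month_date <- floor_date(as.Date(post_day_ct), "month")  # Date: pair with scale_x_date()

class(month_ct)
class(month_date)
```

So either convert the column with as.Date() before using scale_x_date(), or keep it as POSIXct and use scale_x_datetime() instead.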
APRA$month <- as.factor(strftime(APRA$Post_Day, "%m"))
APRA <- APRA[order(as.numeric(APRA$month)),]
This adds a month column to your data.
# Sum the page views within each month; x is the subset for one month
z <- lapply(split(APRA, APRA$month), function(x) sum(as.numeric(x$Page_Views)))
z <- data.frame(Page_Views = unlist(z), month = names(z))
This creates a data frame z holding the total page views for each month.
Now plot it:
# group = 1 is needed so geom_line() connects points across the discrete month axis
ggplot(z, aes(x = month, y = Page_Views, group = 1)) + geom_line()
Please let me know if this is what you were looking for. Also I haven't compiled it, please tell if it throws some error.
I am trying to recreate a bar graph that I created in Excel using data that lists inventory and sales throughout the year. Here is my graph in Excel:
Note: Average sales rate is total sales / total inventory for the 13 months in the bar graph.
I am doing this through R and the ggplot package. I am quite new at this but this was what I managed so far:
library(lubridate)
library(ggplot2)
library(scales)
library(reshape2)
COdata <- read.csv("C:/.../CenterOne.csv")
# Grab related data
# VIN refers to a unique inventory identifier for the item
# First Launch Date is what I use to count my inventory for the month
# Sale Date is what I use to count my sales for the month
DFtest <- COdata[, c("VIN", "First.Launch.Date", "Sale.Date")]
Here is a snapshot of what the data looks like:
> head(DFtest)
VIN First.Launch.Date Sale.Date
1 4T1BF1FK4CU048373 22/04/2015 0:00
2 2T3KF4DVXCW108677 16/03/2015 0:00
3 4T1BF1FKXCU035935 19/03/2015 0:00 20/03/2015 0:00
4 JTDKN3DU3B1465796 16/04/2015 0:00
5 2T3YK4DV8CW015050
6 4T1BF1FK5CU599556 30/04/2015 0:00
I convert the dates to a proper format removing the hours/seconds and breaking them up into monthly intervals:
DFtest$First.Launch.Date <- as.Date(DFtest$First.Launch.Date, format = "%d/%m/%Y")
DFtest$Sale.Date <- as.Date(DFtest$Sale.Date, format = "%d/%m/%Y")
DFtest$month.listings <- as.Date(cut(DFtest$First.Launch.Date, breaks = "month"))
DFtest$month.sales <- as.Date(cut(DFtest$Sale.Date, breaks = "month"))
> head(DFtest)
VIN First.Launch.Date Sale.Date month.listings month.sales
1 4T1BF1FK4CU048373 2015-04-22 <NA> 2015-04-01 <NA>
2 2T3KF4DVXCW108677 2015-03-16 <NA> 2015-03-01 <NA>
3 4T1BF1FKXCU035935 2015-03-19 2015-03-20 2015-03-01 2015-03-01
4 JTDKN3DU3B1465796 2015-04-16 <NA> 2015-04-01 <NA>
5 2T3YK4DV8CW015050 <NA> <NA> <NA> <NA>
6 4T1BF1FK5CU599556 2015-04-30 <NA> 2015-04-01 <NA>
Avg line graph - my attempt at creating one
DF_Listings = data.frame(table(format(DFtest$month.listings)))
DF_Sales = data.frame(table(format(DFtest$month.sales)))
DF_Merge <- merge(DF_Listings, DF_Sales, by = "Var1", all = TRUE)
> head(DF_Listings)
Var1 Freq
1 2014-12-01 77
2 2015-01-01 886
3 2015-02-01 930
4 2015-03-01 1167
5 2015-04-01 1105
6 2015-05-01 1279
DF_Merge$Avg <- DF_Merge$Freq.y / DF_Merge$Freq.x
> head(DF_Merge)
Var1 Freq.x Freq.y Avg
1 2014-12-01 77 NA NA
2 2015-01-01 886 277 0.3126411
3 2015-02-01 930 383 0.4118280
4 2015-03-01 1167 510 0.4370180
5 2015-04-01 1105 309 0.2796380
6 2015-05-01 1279 319 0.2494136
ggplot(DF_Merge, aes(x=Var1, y=Avg, group = 1)) +
stat_smooth(aes(x = seq(length(unique(Var1)))),
se = F, method = "lm", formula = y ~ poly(x, 11))
Bar Graph
dfm <- melt(DFtest[ , c("VIN", "First.Launch.Date", "Sale.Date")], id.vars = 1)
dfm$value <- as.Date(cut(dfm$value, breaks = "month"))
ggplot(dfm, aes(x= value, width = 0.4)) +
geom_bar(aes(fill = variable), position = "dodge") +
scale_x_date(date_breaks = "months", labels = date_format("%m-%Y")) +
theme(axis.text.x=element_text(hjust = 0.5)) +
xlab("Date") + ylab("")
So I managed to make some of the plots which brings me to several questions:
How would I combine them all into a single graph using ggplot?
Notice how my bar graph has blanks for the first and last month? How do I remove that (precisely, how do I remove 11-2014 and 01-2016 from the x-axis)?
In my bar graph, December 2014 had no sales and as a result, the inventory bar takes up a larger space. How do I reduce its size to fit with the rest of the graph?
What could I do to change the x-axis from using dates as numbers (i.e. 12-2014) to using month-year in words (i.e. December-2014). I've tried using as.yearmon but that doesn't work with the scale_x_date portion of my ggplot function.
There's also the issue with the average sales rate line which I can safely assume I would be using geom_hline() but I am not sure how to approach this.
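For the month-in-words labels and the average line, a possible ggplot2 sketch is shown below. It uses the five complete months from the DF_Merge printout, plots only the rate (not the bars), and the data frame and column names are illustrative. Month names produced by %B depend on the locale.

```r
library(ggplot2)

# Monthly counts copied from the DF_Merge printout (complete months only)
rates <- data.frame(
  month    = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01",
                       "2015-04-01", "2015-05-01")),
  listings = c(886, 930, 1167, 1105, 1279),
  sales    = c(277, 383, 510, 309, 319)
)
rates$rate <- rates$sales / rates$listings

# Average sales rate as defined in the note: total sales / total inventory
avg_rate <- sum(rates$sales) / sum(rates$listings)

p <- ggplot(rates, aes(x = month, y = rate)) +
  geom_line() +
  geom_hline(yintercept = avg_rate, linetype = "dashed") +      # horizontal average line
  scale_x_date(date_breaks = "1 month", date_labels = "%B-%Y")  # e.g. January-2015
p
```

Because month stays a Date, scale_x_date() can be used directly and no as.yearmon conversion is needed for the labels.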
Using mtoto's suggestion of utilizing googleVis, I took a crack at recreating the graph:
# Testing Google Vis
mytest <- DF_Merge
library(zoo)
library(plyr) # to rename columns
library(googleVis)
mytest$Var1 <- as.yearmon(mytest$Var1)
mytest$Var1 <- as.factor(mytest$Var1) # googleVis cannot understand yearmon "class" so change it to factor
# Rename columns to ensure comprehension
mytest <- rename(mytest, c("Var1"="Date", "Freq.x"="Listings", "Freq.y"="Sales", "Avg"="Sales Rate"))
# Prepare for values to be displayed right on the plot
mytest$Listings.annotation <- mytest$Listings
mytest$Sales.annotation <- mytest$Sales
mytest$`Sales Rate.annotation` <- percent(mytest$`Sales Rate`) #Googlevis automatically understands that .annotation is used to display values in the graph
# Create average rate line
mytest$`Sales Rate` <- as.numeric(mytest$`Sales Rate`)
mytest$AvgRate <- (sum(mytest$Sales) / sum(mytest$Listings))
mytest <- rename(mytest, c("AvgRate"="Average Sales Rate"))
# Create the annotation for the average line
mytest$`Average Sales Rate.annotation` <- mytest$`Average Sales Rate`
x = nrow(mytest) - 1
mytest$`Average Sales Rate.annotation`[1:x] = "" # Ensures only the last row in this column has a value
mytest$`Average Sales Rate.annotation` <- as.numeric(mytest$`Average Sales Rate.annotation`, na.rm = TRUE)
mytest$`Average Sales Rate.annotation`[nrow(mytest)] <- percent(mytest$`Average Sales Rate.annotation`[nrow(mytest)]) # Transforms only the last row to a proper percentage!
# Plot the graph
column <- gvisComboChart(mytest, xvar= "Date",
yvar=c("Listings", "Listings.annotation", "Sales", "Sales.annotation", "Sales Rate", "Sales Rate.annotation", "Average Sales Rate",
"Average Sales Rate.annotation"),
options=list(seriesType="bars",
series="[{type: 'bars', targetAxisIndex:0, color:'orange'},
{type: 'bars', targetAxisIndex:0, color:'green'},
{type: 'line', targetAxisIndex:1, color:'red'},
{type: 'line', targetAxisIndex:1, color:'purple', lineDashStyle:[2,2,20,2,20,2]}]",
vAxes="[{format:'decimal', textPosition: 'out', viewWindow:{min:0, max:200}},
{format:'percent', textPosition: 'out', viewWindow:{min:0, max:1}}]",
hAxes="[{textPosition: 'out'}]",
legend = "bottom",
curveType="function",
width=1500,
height=800))
plot(column)
The variables could have been named better but I was able to get what I was looking for with my final result:
I have data in R like the following:
bag_id location_type event_ts
1 155 transfer 2012-01-02 15:57:54
2 155 sorter 2012-01-02 17:06:05
3 305 arrival 2012-01-01 07:20:16
4 692 arrival 2012-03-29 09:47:52
10 748 transfer 2012-01-08 17:26:02
11 748 sorter 2012-01-08 17:30:02
12 993 arrival 2012-01-23 08:58:54
13 1019 arrival 2012-01-09 07:17:02
14 1019 sorter 2012-01-09 07:33:15
15 1154 transfer 2012-01-12 21:07:50
where class(event_ts) is "POSIXct".
I wanted to find the density of bags at each location in different times. So, I used the function density like the following:
adj<-.00001
dSorter<-density(as.numeric(Data$event_ts[which(Data$location_type=="sorter")]),n=length(Data$event_ts[which(Data$location_type=="sorter")]),adjust = adj)
StartTime<-as.POSIXct(strptime("2012-06-01", "%Y-%m-%d"), tz="UTC") # want to zoom & see part of data
EndTime<-as.POSIXct(strptime("2012-06-3", "%Y-%m-%d"), tz="UTC")
Range<-range(as.numeric(c(StartTime,EndTime)))
lablist.x<-substr(seq(StartTime,EndTime,by="hour"),start=6, stop=13) # want to have time labels for my plot
plot(dSorter, main="Sorter",xlim=Range, xaxt = "n")
axis(1, at=as.numeric(seq(StartTime,EndTime,by="hour")), labels =F)
text(1:49,par("usr")[3] - 0.25, labels=lablist.x,srt = 45,adj=1, , xpd = TRUE) #want to rotate the labels
The last command does not work, and I do not know how to determine the "x" values to pass to the function "text".
Thank you in advance for any kind of comments and guidance.
Best,
Shima.
The problem is solved.
I found the coordinates that I have to use in the function "text" with the command par("usr"),
and changed the last line of my code to the following:
text(c(as.numeric(seq(StartTime,EndTime,by="hour"))),par("usr")[3]-(par("usr")[4]-par("usr")[3])/20, labels=lablist.x,srt = 45,adj=1, xpd = TRUE)
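The fix works because text() expects positions in the plot's user coordinates: the x values are the numeric tick times, and the y value is placed slightly below the bottom edge given by par("usr")[3]. A self-contained sketch of the same idea with simulated event times follows; all data and variable names here are hypothetical.

```r
# Hypothetical event times standing in for the sorter timestamps
set.seed(1)
start_time <- as.POSIXct("2012-06-01", tz = "UTC")
end_time   <- as.POSIXct("2012-06-03", tz = "UTC")
events     <- start_time + runif(200, 0, 2 * 24 * 3600)

# Tick positions every 6 hours and their text labels
ticks      <- seq(start_time, end_time, by = "6 hours")
tick_labels <- format(ticks, "%m-%d %H:%M")

d <- density(as.numeric(events))
plot(d, main = "Sorter", xlim = as.numeric(c(start_time, end_time)),
     xaxt = "n", xlab = "")
axis(1, at = as.numeric(ticks), labels = FALSE)

# par("usr") is c(x_min, x_max, y_min, y_max) in user coordinates
usr <- par("usr")
text(x = as.numeric(ticks),
     y = usr[3] - (usr[4] - usr[3]) / 20,  # 5% of the plot height below the axis
     labels = tick_labels, srt = 45, adj = 1, xpd = TRUE)
```

Using a fraction of the plot height for the offset, rather than a fixed number such as 0.25, keeps the labels near the axis regardless of the density's y range.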