Multi-line Time Series Chart in ggplot2 - r

I have a data frame with two columns, 'host' and 'date', which describes a series of cyber attacks against a number of different servers on specific dates over a seven-month period.
Here's what the data looks like:
> china_atks %>% head(100)
host date
1 groucho-oregon 2013-03-03
2 groucho-oregon 2013-03-03
...
46 groucho-singapore 2013-03-03
48 groucho-singapore 2013-03-04
...
Here 'groucho-oregon', 'groucho-singapore', etc., are the hostnames of the servers targeted by an attack.
There are around 190,000 records, spanning 03/03/2013 to 08/09/2013, e.g.
> unique(china_atks$date)
[1] "2013-03-03" "2013-03-04" "2013-03-05" "2013-03-06" "2013-03-07" "2013-03-08" "2013-03-09"
[8] "2013-03-10" "2013-03-11" "2013-03-12" "2013-03-13" "2013-03-14" "2013-03-15" "2013-03-16"
[15] "2013-03-17" "2013-03-18" "2013-03-19" "2013-03-20" "2013-03-21" "2013-03-22" "2013-03-23"
...
I'd like to create a multi-line time series chart that visualises how many attacks each individual server received each day over the range of dates, but I can't figure out how to pass the data to ggplot to achieve this. There are nine unique hostnames, and so the chart would show nine lines.
Thanks!

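Here's one way to do this.
First, count the number of attacks per host and date.
library(plyr)
df <- plyr::count(china_atks, c("host", "date"))
Then do the plotting. Map host to group (and colour) so that ggplot draws one line per host; with group=1 all hosts would be joined into a single line. Converting date to Date gives a proper time axis.
library(ggplot2)
ggplot(data = df, aes(x = as.Date(date), y = freq, colour = host, group = host)) +
geom_line()
Data (a small sample named da, in the shape the count step returns; it can be plotted in place of df):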
da <- structure(list(host = structure(1:4, .Label = c("groucho-eu",
"groucho-oregon", "groucho-singapore", "groucho-tokyo"), class = "factor"),
date = structure(c(1L, 1L, 1L, 1L), .Label = "2013-03-03", class = "factor"),
freq = c(1L, 4L, 2L, 1L)), .Names = c("host", "date", "freq"
), row.names = c(NA, -4L), class = "data.frame")
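If you'd rather not load plyr just for the counting step, base R's table() can produce the same per-host, per-day counts. This is a minimal sketch on a small made-up stand-in for china_atks (the hostnames come from the question, the rows are invented); it converts date to a real Date and maps both colour and group to host so that ggplot draws one line per host.

```r
library(ggplot2)

# Hypothetical stand-in for china_atks: one row per attack (host, date)
china_atks <- data.frame(
  host = c("groucho-oregon", "groucho-oregon", "groucho-singapore",
           "groucho-oregon", "groucho-singapore", "groucho-singapore"),
  date = c("2013-03-03", "2013-03-03", "2013-03-03",
           "2013-03-04", "2013-03-04", "2013-03-04"),
  stringsAsFactors = FALSE
)

# Count attacks per host and day; as.data.frame() on a table returns one row
# per host/date combination, with zero counts for combinations never observed
counts <- as.data.frame(table(host = china_atks$host, date = china_atks$date),
                        responseName = "freq")
counts$date <- as.Date(as.character(counts$date))

# One line per host: the group (and colour) aesthetic splits the data
p <- ggplot(counts, aes(x = date, y = freq, colour = host, group = host)) +
  geom_line() +
  labs(x = "Date", y = "Attacks per day", colour = "Host")
print(p)
```

A side effect of the table() route is that a host with no attacks on a day that appears in the data gets an explicit 0, so its line drops to zero instead of skipping that day.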

The ggplot2 library can also compute statistics, so another option is to let ggplot do the counting on the raw data. This draws one line per group (host):
ggplot(china_atks, aes(x = as.Date(date), colour = host, group = host)) + geom_line(stat = "count")
Note: make sure host is a factor (or character) so that the lines get discrete colours.

Related

How to make a multi-line highchart graph in R?

I'm trying to graph multiple data frame columns in R
(like this: Graphing multiple variables in R).
bid ask date
1 20.12 20.14 2014-10-31
2 20.09 20.12 2014-11-03
3 20.03 20.06 2014-11-04
4 19.86 19.89 2014-11-05
This is my data.
I can make a single-line graph like this:
`data%>% select(bid,ask,date) %>% hchart(type='line', hcaes(x='date', y='bid'))`
I want to add the ask line to this graph.
One way is to reshape (gather) the values to plot into long format and then add a group aesthetic to the hchart call:
library(dplyr)
library(tidyr)
library(highcharter)
data %>% select(bid, ask, date) %>%
gather("key", "value", bid, ask) %>%
hchart(type = 'line', hcaes(x = date, y = value, group = key))
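Alternatively, reshape with reshape2::melt and pass the long data to hchart. Converting date to Date first keeps the x axis in chronological order:
library(reshape2)
library(highcharter)
df$date <- as.Date(df$date, format = "%d/%m/%Y")
df_m <- melt(df, id = "date")
hchart(df_m, "line", hcaes(x = date, y = value, group = variable))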
Here is the data
df = structure(list(bid = c(20.12, 20.09, 20.03, 19.86), ask = c(20.14,
20.12, 20.06, 19.89), date = structure(c(4L, 1L, 2L, 3L), .Label = c("03/11/2014",
"04/11/2014", "05/11/2014", "31/10/2014"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
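The date column in the data above is a factor of dd/mm/yyyy labels, and its levels sort alphabetically ("03/11/2014" before "31/10/2014") rather than chronologically. Here is a base-R sketch, with no tidyr or reshape2, that parses the dates and reshapes bid and ask into long format; the names df_long, values and ind are just what stack() produces, not something from the answers.

```r
# The question's data, with dates stored as dd/mm/yyyy factor labels
df <- data.frame(
  bid  = c(20.12, 20.09, 20.03, 19.86),
  ask  = c(20.14, 20.12, 20.06, 19.89),
  date = factor(c("31/10/2014", "03/11/2014", "04/11/2014", "05/11/2014"))
)

# Parse the labels as real dates so the x axis sorts chronologically
df$date <- as.Date(as.character(df$date), format = "%d/%m/%Y")

# Reshape to long format with base R: stack() puts the bid and ask values in
# one column ('values') and the source column name in another ('ind');
# the dates are repeated once for each stacked column
df_long <- data.frame(date = rep(df$date, times = 2),
                      stack(df[, c("bid", "ask")]))
print(df_long)
```

df_long can then be passed to hchart(df_long, "line", hcaes(x = date, y = values, group = ind)), in the same way as the long data in the answers above.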

ggplot r: How to Highlight the Data from a Year [duplicate]

This question already has answers here:
geom_smooth on a subset of data
(3 answers)
Data: height was recorded daily.
I want to plot the height of my plants (Plant A1 to Z50)
in separate plots, and I want to highlight the current year.
So I made a subset for each plant and a subset for the current year (2018).
Now I need a plot with the full record and the highlighted data from 2018.
dput(Plant)
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("Plant A1", "Plant B1", "Plant C1"), class = "factor"),
Date = structure(c(1L, 4L, 5L, 7L, 1L, 4L, 6L, 1L, 2L, 3L
), .Label = c(" 2001-01-01", " 2001-01-02", " 2001-01-03",
" 2002-01-01", " 2002-02-01", " 2019-01-01", " 2019-12-31"
), class = "factor"), Height_cm = c(91, 106.1, 107.4, 145.9,
169.1, 192.1, 217.4, 139.8, 140.3, 140.3)), .Names = c("Name",
"Date", "Height_cm"), class = "data.frame", row.names = c(NA,
-10L))
library(dplyr)
library(ggplot2)
Plant$Date <- as.Date(trimws(Plant$Date)) # Date is stored as a factor with leading spaces
Plant_A1 <- filter(Plant, Name == "Plant A1")
Current_Year <- 2018
Plant_A1_Subset <- filter(Plant_A1, format(Date, '%Y') == Current_Year)
ggplot(data=Plant_A1, aes(x=Plant_A1$Date, y=Plant_A1$Height_cm)) +
geom_point() +
geom_smooth(method="loess", level=0.95, span=1/2, color="red") +
labs(x="Date", y="Height cm")
Now I don't know how to add my new 2018 subset (Plant_A1_Subset) to this graph.
As noted, this question is a duplicate, and the linked question has an answer.
That said, here's probably the most common way of handling your problem.
In ggplot2, every layer inherits the aesthetics passed to aes() in the ggplot(aes(...)) call, so they apply to all later layers unless a layer overrides them. We can therefore solve your problem by adding an extra aesthetic inside the aes of geom_point. Below is a simple way to achieve what you are looking for.
Specify the aes argument in individual calls
This is probably the most intuitive method. aes controls the plotted parameters, so if you want to colour only certain points, one way is to give geom_point its own aes, separate from geom_smooth.
library(ggplot2)
library(lubridate) #for month(), year(), day() functions
Plant_A1$Date <- as.Date(trimws(Plant_A1$Date)) #year() needs a real Date, not a factor
current_year <- 2018
ggplot(data = Plant_A1, aes(x = Date, y = Height_cm)) +
#Note here, colour set in geom_point only
geom_point(aes(col = ifelse(year(Date) == current_year, "Yes", "No"))) +
geom_smooth(method="loess", level=0.95,
span=1/2, color="red") +
labs(x="Date", y="Height cm",
col = "Current year?") #Specify legend title for colour
Note that I have relied on the inheritance of aes here. Simply put, aes looks up the names within data, and if it finds them, it uses those columns as variables. So there is no need to write data$... inside aes.
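The asker's literal question was how to put Plant_A1_Subset into the same graph. Another common approach, in the spirit of the linked duplicate, is to give a second layer its own data argument. This is a sketch on made-up monthly heights (the dates and values are invented purely for illustration); the column names follow the question's dput.

```r
library(ggplot2)

# Hypothetical data: one measurement per month over 2017 and 2018
Plant_A1 <- data.frame(
  Date = seq(as.Date("2017-01-01"), by = "month", length.out = 24)
)
Plant_A1$Height_cm <- 100 + 2 * seq_len(24)

current_year <- 2018
# Subset holding only the rows from the year to highlight
Plant_A1_Subset <- Plant_A1[format(Plant_A1$Date, "%Y") == as.character(current_year), ]

p <- ggplot(Plant_A1, aes(x = Date, y = Height_cm)) +
  geom_point(colour = "grey60") +                        # full record
  geom_point(data = Plant_A1_Subset, colour = "red") +   # current year, drawn on top
  geom_smooth(method = "loess", formula = y ~ x, span = 1/2, colour = "black") +
  labs(x = "Date", y = "Height (cm)")
print(p)
```

The second geom_point still inherits x and y from ggplot(aes(...)); only its data is replaced, so the subset is drawn on the same axes as the full record.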

Lattice xyplot() Adding a different mean trend line to each panel?

I have a simple trellis scatterplot with two panels: male/female. ID is a unique number for each participant, and var1 is the total test time. mean.values is a vector of two numbers (the mean for each gender).
There's no point including a best-fit line, so what I want is to draw a horizontal line at the mean in each panel. The two panels have different means, say male = 1 minute, female = 2 minutes.
xyplot(var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values)
})
At the moment both lines appear in each panel. I want only one line in each.
Does anyone know a way to do this?
I have tried a number of different ways, including the long addLine function, which just doesn't work for me. I just want to define which panel I'm looking at. I've looked at ?panel.number but I'm not sure how that works, as it reports that I don't have a current row (current.row(prefix)).
There must be a simple way of doing this?
[EDIT - Here's the actual data I'm using]
I've tried to simplify the DF
library(lattice)
dput(head(DF))
structure(list(ID = 1:6, Var1 = c(2333858, 4220644,
2941774, 2368496, 3165740, 3630300), mean = c(2412976, 2412976,
2412976, 2412976, 2412976, 2412976), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = c(NA, 6L), class = "data.frame")
dput(tail(DF))
structure(list(ID = 161:166, Var1= c(2825246, 3552170,
3688882, 2487760, 3849108, 3085342), mean = c(3689805, 3689805,
3689805, 3689805, 3689805, 3689805), Gender = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), .Names = c("ID",
"Var1", "mean", "Gender"), row.names = 109:114, class = "data.frame")
The plot I'm using:
xyplot((Var1/1000) ~ ID|Gender, data=DF,
group = Gender,scales=list(x=list(at=NULL)),
panel=function(...) {
panel.xyplot(...)
panel.abline(h=mean.values) })
This draws both lines in each panel.
[EDIT - This is the code for the addLine function, which appears in many posts but doesn't seem to work for me]
addLine <- function(a = NULL, b = NULL, v = NULL, h = NULL, ..., once = FALSE) {
  tcL <- trellis.currentLayout()
  k <- 0
  for (i in 1:nrow(tcL))
    for (j in 1:ncol(tcL))
      if (tcL[i, j] > 0) {
        k <- k + 1
        trellis.focus("panel", j, i, highlight = FALSE)
        if (once) panel.abline(a = a[k], b = b[k], v = v[k], h = h[k], ...)
        else panel.abline(a = a, b = b, v = v, h = h, ...)
        trellis.unfocus()
      }
}
and then calling it after printing the trellis plot (mean.values being a vector of two numbers: the mean for female and the mean for male):
addLine(v=(mean.values), once=TRUE)
Update - I managed to do it in ggplot2.
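Make the ggplot using facet_wrap, then:
hline.data <- data.frame(z = c(2413, 3690), Gender = c("Female","Male"))
This creates a 2 x 2 data frame holding the two means and their Gender. The Gender values have to match the values of the faceting variable in the plotted data, so that each line lands in its own panel.
myplot <- myplot + geom_hline(aes(yintercept = z), data = hline.data)
This adds one mean line to each facet.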
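A self-contained sketch of the facet approach, on made-up data coded "1"/"2" like the question's dput. Computing the line data with aggregate() (rather than typing the means in) guarantees that its Gender values match the faceting variable; hline.data and z are the names used in the update, the rest is illustrative.

```r
library(ggplot2)

# Illustrative data (made up): Gender coded "1"/"2" as in the question's dput
DF <- data.frame(
  ID     = 1:10,
  Gender = factor(rep(c("1", "2"), each = 5)),
  Var1   = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# One row per facet. The Gender values must match DF$Gender exactly;
# otherwise the lines end up in extra panels of their own
hline.data <- aggregate(Var1 ~ Gender, data = DF, FUN = mean)
names(hline.data) <- c("Gender", "z")

p <- ggplot(DF, aes(x = ID, y = Var1)) +
  geom_point() +
  facet_wrap(~ Gender) +
  geom_hline(aes(yintercept = z), data = hline.data, linetype = "dashed")
print(p)
```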
If you just want to plot the mean of the values you are already drawing in each panel, you can skip the mean.values variable and just do:
xyplot(Var1 ~ ID|Gender, data=DF,
group = Gender,
panel=function(x,y,...) {
panel.xyplot(x,y,...)
panel.abline(h=mean(y))
}
)
For example, with this sample data:
DF<-data.frame(
ID=1:10,
Gender=rep(c("M","F"), each=5),
Var1=c(5,6,7,6,5,8,9,10,8,9)
)
each panel gets a horizontal line at its own mean.
I believe lattice has a specific panel function for this, panel.average().
Try replacing panel.abline(h=mean.values) with panel.average(...).
If that doesn't solve the problem, we might need more information; try using dput() on your data (e.g., dput(DF), or some representative subset).
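If you do want to keep a precomputed mean.values vector, the question's idea of asking which panel is being drawn can work from inside the panel function: lattice's packet.number() returns the index of the packet currently being drawn, which, with a single conditioning factor, is the index of its level. A minimal sketch on made-up data, assuming mean.values is ordered like levels(DF$Gender):

```r
library(lattice)

# Small illustrative data set (made up): two Gender levels with different means
DF <- data.frame(
  ID     = 1:10,
  Gender = factor(rep(c("F", "M"), each = 5)),
  Var1   = c(5, 6, 7, 6, 5, 8, 9, 10, 8, 9)
)

# Precomputed means, one per Gender level, in the order of levels(DF$Gender)
mean.values <- tapply(DF$Var1, DF$Gender, mean)

p <- xyplot(Var1 ~ ID | Gender, data = DF,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              # packet.number() identifies the packet being drawn, so only
              # this level's mean is used for the horizontal line
              panel.abline(h = mean.values[packet.number()], lty = 2)
            })
print(p)
```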

changing strip's color in lattice multipanel plot with 2 (or possibly more) factors

I've searched quite extensively through the forum and the web, but I couldn't find anyone who had already presented my case, so here is the question.
My goal: how can I extend the example presented here to the case where I have more than one conditioning factor?
I've tried several ways of modifying the which.panel argument of the strip.default function, but I couldn't solve my problem.
This is the code I'm using at the moment (with comments):
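# require() and install.packages() take one package name here; a second
# unnamed argument would be read as a library path, so loop instead
for (pkg in c("plyr", "lattice")) {
if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(plyr)
library(lattice)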
# dataframe structure (8 obs. of 6 variables)
data2 <- structure(list(
COD = structure(c(1L, 1L, 1L, 1L, 2L, 2L,2L, 2L),
.Label = c("A", "B"), class = "factor"),
SPEC = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
.Label = c("15/25-(15/06)", "15/26-(22/06)"), class = "factor"),
DATE = structure(c(16589, 16590, 16589, 16590, 16589, 16590, 16589, 16590), class = "Date"),
PM.BDG = c(1111.25, 1111.25, 1141.29, 1141.29, 671.26, 671.26, 707.99, 707.99),
PM = c(1033.14, 1038.4, 1181.48, 1181.48, 616.39, 616.39, 641.55, 641.55),
DELTA.PM = c(-78.12, -72.85, 40.19, 40.19, -54.87, -54.87, -66.44, -66.44)),
.Names = c("COD", "SPEC", "DATE", "PM.BDG", "PM", "DELTA.PM"),
row.names = c(NA, 8L), class = "data.frame")
# create a dataframe with a vector of colors
# based on the value of DELTA.PM for the last
# date available for each combination of COD and SPEC.
# Each color will be used for a specific panel: it will be
# forestgreen if DELTA.PM is zero or higher, red otherwise.
listaPM <- ddply(data2, .(COD,SPEC), summarize, ifelse(DELTA.PM[DATE=="2015-06-04"]<0, "red", "forestgreen"))
names(listaPM) <- c("COD","SPEC","COLOR")
# set a personalized strip, with bg color based on listaPM$COLOR
# and text based on listaPM$COD and listaPM$SPEC
myStripStylePM <- function(which.panel, factor.levels, ...) {
panel.rect(0, 0, 1, 1,
col = listaPM[which.panel,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[which.panel,1],listaPM[which.panel,2], sep=" - "),
col = "white")}
# prepare a xyplot function to plot that will be used later with dlply.
# Here I want to plot the values of PM.BDG and PM over time (DATE),
# conditioning them on the SPEC (week) and COD (code) factors.
graficoPM <- function(df) {
xyplot (PM.BDG + PM ~ DATE | SPEC + COD,
data=df,
type=c("l","g"),
col=c("black", "red"),
abline=c(h=0,v=0),
strip = myStripStylePM
)}
# create a trellis object that has a list of plots,
# based on different COD (codes)
grafico.PM <- dlply(data2, .(data2$COD), graficoPM)
# graphic output, 1st row should be COD "A",
# 2nd row should be COD "B", each panel is a different SPEC (week)
par(mfrow=c(2,1))
print(grafico.PM[[1]], position=c(0,0.5,1,1), more=TRUE)
print(grafico.PM[[2]], position=c(0,0,1,0.5))
As you can see, the first row of plots is correct: the text of the first strip is "A" (the first COD), the weeks (SPEC) are shown, and the colour shows whether PM is above or below PM.BDG on the last date of the plot.
In contrast, the second row of plots just repeats the scheme of the first row: COD is always "A", and the second strip's background in the second row is green, even though the red PM line is clearly well below the black PM.BDG line.
Although I'd like to keep my code, I'm pretty sure my goal could be achieved with a different strategy. If you can find a better way to use my dataframe, I'll be happy to study the code and see if it works with my data.
The problem is matching up the current panel's data with the listaPM data. Because you subset the data differently in each call, it's difficult to use which.panel to match up the two data sets.
There is an undocumented feature which allows you to get the conditioning variable names to make the matching more robust. Here's how you would use it in your case.
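myStripStylePM <- function(which.panel, factor.levels, ...) {
# levels of each conditioning variable (SPEC, COD) in the trellis object
cp <- dimnames(trellis.last.object())
# turn the linear packet number into one level index per conditioning variable
ci <- arrayInd(packet.number(), .dim=sapply(cp, length))
# current level of each conditioning variable, named by variable
cv <- mapply(function(a,b) a[b], cp, as.vector(ci))
# the row of listaPM whose COD and SPEC both match those levels
idx<-which(apply(mapply(function(n, v) listaPM[, n] == v, names(cv), cv),1,all))
stopifnot(length(idx)==1)
panel.rect(0, 0, 1, 1,
col = listaPM[idx,3],
border = 1)
panel.text(x = 0.5, y = 0.5,
font=2,
lab = paste(listaPM[idx,1],listaPM[idx,2], sep=" - "),
col = "white")
}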
When run with the rest of your code, each strip gets the label and colour of its own COD/SPEC combination.
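As a different strategy (the question explicitly invites one), the dlply split can be dropped and everything drawn in a single xyplot. A strip function then receives which.panel, which documented lattice behaviour fills with the current level index of every conditioning variable, here SPEC first and COD second, following the formula. This is a sketch under those assumptions; colours and stripByCell are hypothetical names, and the data are the question's data2 rebuilt by hand (DATE 16589/16590 are 2015-06-03/04).

```r
library(lattice)

# The question's data2, rebuilt by hand
data2 <- data.frame(
  COD      = factor(rep(c("A", "B"), each = 4)),
  SPEC     = factor(rep(rep(c("15/25-(15/06)", "15/26-(22/06)"), each = 2), times = 2)),
  DATE     = rep(as.Date(c("2015-06-03", "2015-06-04")), times = 4),
  PM.BDG   = c(1111.25, 1111.25, 1141.29, 1141.29, 671.26, 671.26, 707.99, 707.99),
  PM       = c(1033.14, 1038.4, 1181.48, 1181.48, 616.39, 616.39, 641.55, 641.55),
  DELTA.PM = c(-78.12, -72.85, 40.19, 40.19, -54.87, -54.87, -66.44, -66.44)
)

# One colour per (SPEC, COD) cell, from DELTA.PM on the last date;
# the result is a character matrix with SPEC rows and COD columns
last    <- data2[data2$DATE == max(data2$DATE), ]
colours <- tapply(last$DELTA.PM, list(last$SPEC, last$COD),
                  function(d) if (d < 0) "red" else "forestgreen")

# which.panel[1] indexes the SPEC level, which.panel[2] the COD level.
# Any bg lattice supplies is replaced before calling strip.default()
stripByCell <- function(which.panel, ...) {
  spec <- levels(data2$SPEC)[which.panel[1]]
  cod  <- levels(data2$COD)[which.panel[2]]
  dots <- list(...)
  dots$bg <- colours[spec, cod]
  do.call(strip.default, c(list(which.panel = which.panel), dots))
}

p <- xyplot(PM.BDG + PM ~ DATE | SPEC + COD, data = data2,
            type = c("l", "g"), col = c("black", "red"),
            as.table = TRUE,      # draw COD "A" in the top row
            strip = stripByCell)
print(p)
```

Because the colour is looked up by level names rather than by panel position, the same strip function works however many COD values there are.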

Temporal density plot in R

I have irregularly measured observations of a phenomenon, each with a timestamp:
2013-01-03 00:04:23
2013-01-03 00:02:04
2013-01-02 23:45:16
2013-01-02 23:35:16
2013-01-02 23:31:56
2013-01-02 23:31:30
2013-01-02 23:29:18
2013-01-02 23:28:43
...
Now I would like to plot these points on the x axis and apply a kernel density function to them, so I can visually explore temporal density using various bandwidths. The example I had in mind does not label the x axis; I would like labels for particular days (January 1st, January 5th, etc.).
It is important, however, that the measurement points themselves are visible in the plot.
#dput
df <- structure(list(V1 = structure(c(2L, 2L, 1L, 3L, 1L, 4L, 5L, 4L), .Label = c("2013-01-02", "2013-01-03", "2013-01-04", "2013-01-05", "2013-01-11"), class = "factor"), V2 = structure(c(1L, 3L, 8L, 4L, 7L, 6L, 5L, 2L), .Label = c(" 04:04:23", " 06:28:43", " 10:02:04", " 11:35:16", " 14:29:18", " 17:31:30", " 23:31:56", " 23:45:16"), class = "factor")), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -8L))
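I'd use ggplot, since it gives fine-grained control over the plot, with separate layers for the measurements and for the density itself. First combine the date (V1) and time (V2) columns into one POSIXct column; trimws() removes the leading space stored in V2.
df$tcol <- as.POSIXct(paste(df$V1, trimws(df$V2)), format = "%Y-%m-%d %H:%M:%S")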
library(ggplot2)
measurements <- geom_point(aes(x=tcol, y=0), shape=15, color='blue', size=5)
kde <- geom_density(aes(x=tcol), bw="nrd0")
ggplot(df) + measurements + kde
Now, if you want to further adjust the x-axis labels (since you want each separate day marked), you can use the scales package.
We are going to use scale_x_date, but that only accepts 'Date' values, so tcol has to be converted first:
library(scales)
df$tcol <- as.Date(df$tcol)
xlabel <- scale_x_date(labels = date_format("%m-%d"), breaks = date_breaks("1 day"))
ggplot(df) + xlabel + measurements + kde
Please note that converting to Date drops the time of day, so each measurement is placed at the start of its day.
Hopefully this helps you move forward.
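To keep the time of day and still get one labelled break per day, one option is to leave tcol as POSIXct and use ggplot2's scale_x_datetime with date_breaks and date_labels. A sketch using the question's dput data, written out as plain strings; the UTC time zone is an assumption.

```r
library(ggplot2)

# The question's dput data: V1 holds the date, V2 the time (with a leading space)
df <- data.frame(
  V1 = c("2013-01-03", "2013-01-03", "2013-01-02", "2013-01-04",
         "2013-01-02", "2013-01-05", "2013-01-11", "2013-01-05"),
  V2 = c(" 04:04:23", " 10:02:04", " 23:45:16", " 11:35:16",
         " 23:31:56", " 17:31:30", " 14:29:18", " 06:28:43")
)

# Combine into POSIXct, keeping the time of day (time zone assumed to be UTC)
df$tcol <- as.POSIXct(paste(df$V1, trimws(df$V2)),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

p <- ggplot(df, aes(x = tcol)) +
  geom_density(bw = "nrd0") +                                   # kernel density
  geom_point(aes(y = 0), shape = 15, colour = "blue", size = 3) + # the measurements
  # Date-time scale: one break per day, labelled like "Jan 03"
  scale_x_datetime(date_breaks = "1 day", date_labels = "%b %d")
print(p)
```

To explore other bandwidths, change bw (or add adjust) in geom_density.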
Convert your values to POSIXct, convert that to numeric (i.e., seconds in UNIX time), and then apply your kernel density function. If z is your vector of timestamps:
z2 <- as.POSIXct(z, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
plot(density(as.numeric(z2)))
It would then be relatively easy to add a labeled x-axis with axis.
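A base-graphics sketch of that last step, using the timestamps listed in the question: the default axis is suppressed, rug() marks the individual measurements so they stay visible, and axis.POSIXct() labels the numeric x axis in date-time units. The adjust value is just a starting point for trying different bandwidths.

```r
# The question's timestamps as strings
z <- c("2013-01-03 00:04:23", "2013-01-03 00:02:04", "2013-01-02 23:45:16",
       "2013-01-02 23:35:16", "2013-01-02 23:31:56", "2013-01-02 23:31:30",
       "2013-01-02 23:29:18", "2013-01-02 23:28:43")
z2 <- as.POSIXct(z, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

# Kernel density on seconds since the epoch; 'adjust' scales the bandwidth
d <- density(as.numeric(z2), adjust = 1)
plot(d, xaxt = "n", main = "Temporal density", xlab = "")
rug(as.numeric(z2))   # show the individual measurements along the x axis

# Label the x axis in date-time units (POSIXct values are seconds,
# so they line up with the numeric density axis)
axis.POSIXct(1, x = z2)
```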
