I have two variables (V1, V2) measured on the same subject (id) at two time points (timepoint). I want a scatterplot with arrows showing how each subject's values moved from T1 to T2.
In my example, some subjects have no change in either V1 or V2. Ideally these would be shown as a single dot (subject 1, for example), but two overlapping dots for the two visits are also OK. Other subjects show a decrease in V1 or V2 (subject 2, for example); these are drawn with red arrows. A third group shows an increase in V1 or V2 (subjects 6 and 7); these are drawn in green.
However, what I really need is for all arrows to point from T1 to T2; that is, the green arrows should reverse direction.
The dataset can be generated by:
datatest <- data.frame(timepoint =rep(seq(2,1),8),
id = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
V1= c( 30.29, 30.29, 21.60, 31.43, 20.75,20.75, 21.60, 30.03, 21.60, 31.30, 31.60, 21.72, 31.6, 20.02, 11.60, 20.16),
V2=c(40, 40, 30.78, 41.63, 40.41, 40.41,30.78, 40.97, 20.78, 40.84, 41.85, 41.85, 40.78, 31.79,20.78, 30.23))
which looks like this:
timepoint id V1 V2
1 2 1 30.29 40.00
2 1 1 30.29 40.00
3 2 2 21.60 30.78
4 1 2 31.43 41.63
5 2 3 20.75 40.41
6 1 3 20.75 40.41
7 2 4 21.60 30.78
8 1 4 30.03 40.97
9 2 5 21.60 20.78
10 1 5 31.30 40.84
11 2 6 31.60 41.85
12 1 6 21.72 41.85
13 2 7 31.60 40.78
14 1 7 20.02 31.79
15 2 8 11.60 20.78
16 1 8 20.16 30.23
To generate the (wrong) plot I currently have, please run the code below:
library(ggplot2)
library(lemon)
ggplot(datatest, aes(V1,V2,color=as.factor(timepoint),group=id)) +ggtitle("V2 vs V1 from T1 to T2")+
geom_pointline(linesize=1, size=2, distance=4, arrow = arrow(angle = 30, length = unit(0.1, "inches"), ends = "first", type = "open") )+
scale_x_continuous(limits = c(0,33), breaks=seq(0,30,10), expand = c(0, 0)) +
scale_y_continuous(limits = c(0,43), breaks=seq(0,44,10),expand = c(0, 0))+
scale_color_manual(values=c("green","red"))+labs(color = "Timepoint")
The plot currently looks like this:
Thank you!
Would this get you closer?
library(dplyr)
library(tidyr)
library(ggplot2)
data <- data.frame(timepoint =rep(seq(2,1),8),
id = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
V1= c( 30.29, 30.29, 21.60, 31.43, 20.75,20.75, 21.60, 30.03, 21.60, 31.30, 31.60, 21.72, 31.6, 20.02, 11.60, 20.16),
V2=c(40, 40, 30.78, 41.63, 40.41, 40.41,30.78, 40.97, 20.78, 40.84, 41.85, 41.85, 40.78, 31.79,20.78, 30.23))
data <- data %>%
mutate(row_id = paste0("T", timepoint)) %>%
pivot_wider(id_cols = id,
names_from = row_id,
values_from = c(V1, V2)) %>%
mutate(colour = ifelse((V1_T1 > V1_T2) | (V2_T1 > V2_T2), "red", "green"))
ggplot(data = data) +
geom_point(aes(x = V1_T1, y = V2_T1)) +
geom_point(aes(x = V1_T2, y = V2_T2)) +
geom_segment(aes(x = V1_T1, xend = V1_T2, y = V2_T1 , yend = V2_T2, colour = colour),
arrow = arrow(length = unit(0.3,"cm"))) +
# treat the "red"/"green" strings in `colour` as literal colours
scale_colour_identity() +
scale_x_continuous(
limits = c(0, 33),
breaks = seq(0, 30, 10),
expand = c(0, 0)
) +
scale_y_continuous(
limits = c(0, 43),
breaks = seq(0, 44, 10),
expand = c(0, 0)
)
You can filter the data object to remove the subjects whose V1 and V2 do not change, so that no zero-length lines are drawn.
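A base-R sketch of that preprocessing, assuming the datatest data frame from the question: it reshapes to one row per subject with merge() instead of pivot_wider(), drops subjects with no change in V1 or V2, and recomputes the red/green colour. The names t1, t2, wide and changed are illustrative only.

```r
# Rebuild the question's data (timepoint alternates 2, 1 within each id)
datatest <- data.frame(
  timepoint = rep(seq(2, 1), 8),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8),
  V1 = c(30.29, 30.29, 21.60, 31.43, 20.75, 20.75, 21.60, 30.03,
         21.60, 31.30, 31.60, 21.72, 31.6, 20.02, 11.60, 20.16),
  V2 = c(40, 40, 30.78, 41.63, 40.41, 40.41, 30.78, 40.97,
         20.78, 40.84, 41.85, 41.85, 40.78, 31.79, 20.78, 30.23))

# One row per subject with columns V1_T1, V2_T1, V1_T2, V2_T2
t1 <- datatest[datatest$timepoint == 1, c("id", "V1", "V2")]
t2 <- datatest[datatest$timepoint == 2, c("id", "V1", "V2")]
wide <- merge(t1, t2, by = "id", suffixes = c("_T1", "_T2"))

# Keep only subjects whose position actually changed (no zero-length arrows)
changed <- wide[wide$V1_T1 != wide$V1_T2 | wide$V2_T1 != wide$V2_T2, ]

# Red if either variable decreased from T1 to T2, green otherwise
changed$colour <- ifelse(changed$V1_T1 > changed$V1_T2 |
                           changed$V2_T1 > changed$V2_T2, "red", "green")
changed[, c("id", "colour")]
```

One way to use it: pass changed as the data of the geom_segment() layer only (geom_segment(data = changed, ...)), while wide still feeds the geom_point() layers, so unchanged subjects stay visible as single dots.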
This input data is from dput:
structure(list(Player = c("deGrom", "deGrom", "deGrom", "deGrom",
"deGrom", "deGrom", "deGrom", "Wheeler", "Wheeler", "Wheeler",
"Wheeler", "Wheeler", "Wheeler", "Syndergaard", "Syndergaard",
"Syndergaard", "Syndergaard", "Matz", "Matz", "Matz", "Matz",
"Matz", "Stroman", "Stroman"), GSc = c(66, 70, 77, 77, 79, 78,
79, 76, 70, 64, 70, 62, 70, 69, 73, 81, 62, 68, 62, 69, 68, 70,
63, 75)), row.names = c(NA, -24L), class = c("tbl_df", "tbl",
"data.frame"))
I have a data frame MetsGS3 with the data above.
I want to use ggplot to create a line chart with a different color line for each of the five players. The x-axis will contain the numbers 2, 4, 6, 8, 10, 12, and the y-axis the game scores (GSc). I want the x-axis label to be Player and the y-axis label to be Game Score.
This code does not work, and I need help getting it to work. I know it is missing elements.
ggplot(MetsGS, aes(x=MetsGS$Player, y=GSc, colour = MetsGS$Player) + geom_line(size=1.2) + ggtitle("Mets Game Score Game Scores")
The last time I ran the above ggplot code in RStudio I got this error:
"Error: Incomplete expression: ggplot(MetsGS, aes(x=MetsGS$Player, y=GSc, colour = MetsGS$Player) + geom_line(size=1.2) + ggtitle("Mets Game Score Game Scores")"
Thanks in advance,
Howard
I think some data is missing from your dataset: I can't see how you define x as a number between 2 and 12.
So I assumed that, for each player, each row containing the player's name corresponds to a different game, and I created a new column with dplyr like this (I called your data frame d):
library(dplyr)
d %>% group_by(Player) %>% mutate(Number = seq_along(Player)*2)
# A tibble: 24 x 3
# Groups: Player [5]
Player GSc Number
<chr> <dbl> <dbl>
1 deGrom 66 2
2 deGrom 70 4
3 deGrom 77 6
4 deGrom 77 8
5 deGrom 79 10
6 deGrom 78 12
7 deGrom 79 14
8 Wheeler 76 2
9 Wheeler 70 4
10 Wheeler 64 6
# … with 14 more rows
and plot it like this:
library(ggplot2)
library(dplyr)
d %>% group_by(Player) %>% mutate(Number = seq_along(Player)*2) %>%
ggplot(., aes(x=Number, y=GSc, colour = Player)) +
geom_line(size=1.2) +
ggtitle("Mets Game Score Game Scores")+
scale_x_continuous(breaks = seq(2,14, by = 2))
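If you would rather not depend on dplyr for that step, the same game-number column can be built in base R with ave(). This sketch rebuilds d from the dput() output above as a plain data frame and assumes, as above, that rows are already in game order within each player.

```r
# Rebuild the question's data as a plain data frame
d <- data.frame(
  Player = rep(c("deGrom", "Wheeler", "Syndergaard", "Matz", "Stroman"),
               times = c(7, 6, 4, 5, 2)),
  GSc = c(66, 70, 77, 77, 79, 78, 79, 76, 70, 64, 70, 62,
          70, 69, 73, 81, 62, 68, 62, 69, 68, 70, 63, 75))

# Within each player, number the rows 1, 2, 3, ... and double them: 2, 4, 6, ...
d$Number <- ave(seq_along(d$Player), d$Player, FUN = seq_along) * 2

d[d$Player == "Stroman", ]
```

The resulting d can be passed to ggplot(d, aes(x = Number, y = GSc, colour = Player)) with the same layers as above.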
Does this look like what you are looking for? If not, can you clarify your question?
I have a problem most likely with a simple solution. I have two data frames. The first is a simple edge list with weights that looks like this:
head(merge_allwinsloss_df)
winner loser weight
1 CAL HAW 20
2 TENN APP 7
3 LOU CHAR 56
4 CMU PRE 46
5 WAKE TULN 4
6 CIN UTM 21
and the second is a file that provides groupings (in the form of college football conferences) that looks like this:
short conference
1 TEM AAC
2 USF AAC
3 UCF AAC
4 CIN AAC
5 ECU AAC
6 CONN AAC
What I'd like to do is create a plot (preferably using ggplot) of a directed graph (from winner to loser) with the edges weighted by weight and colored when both teams are in the same conference, and with the nodes color-coded by conference. The code below is a "start", but I'm not really getting anywhere.
ggplot(data = merge_allwinsloss_df, aes(from_id = winner, to_id = loser)) +
geom_net(aes(color = all_teams_by_conference_df), layout.alg = "fruchtermanreingold",
size = 2, labelon = TRUE, vjust = -0.6, ecolour = "grey80",
directed = TRUE, fontsize = 3, ealpha = 0.5) +
scale_color_brewer("Conference",
palette = "Paired") +
xlim(c(-0.05, 1.05)) +
theme_net() +
theme(legend.position = "bottom")
I melted the data, but that caused a lot of other problems, mostly either losing the mapping or being unable to tag the teams in merge_allwinsloss_df by conference properly. I'm sorry if this isn't very clear. I've been searching for help and racking my brain for days, so any help would be greatly appreciated. Thanks in advance.
UPDATE: Here is a minimal example.
#Create a list of CFB winners and losers with weight given by point differential
merge_allwinsloss_ALT_df <- data.frame(matrix(c("CAL","HAW", 12, "TENN", "APP", 7, "LOU", "CHAR", 56,
"CMU", "HAW", 0, "WVU", "APP", 20 , "ARK", "TENN", 6, "CMU", "WVU", 7,
"WVU", "JMU", 15, "IND", "MIN", 3, "IND", "HAW", 14, "FSU", "TCU", 2,
"TCU", "ARK", 14),
nrow=12,ncol=3,byrow=TRUE))
colnames(merge_allwinsloss_ALT_df) <- c("winner", "loser", "weight")
merge_allwinsloss_ALT_df
#Create a list of CFB teams with conference associations
all_teams_by_conference_ALT_df<- data.frame(matrix(c("CAL","PAC", "HAW", "MAC", "TENN", "SEC",
"APP", "SUN BELT", "LOU", "ACC", "CHAR", "FCS",
"CMU", "MAC", "WVU", "BIG 12", "ARK", "SEC", "JMU", "FCS",
"IND", "BIG 10", "MIN", "BIG 10", "FSU", "ACC", "TCU",
"BIG 12"),
nrow=14,ncol=2,byrow=TRUE))
colnames(all_teams_by_conference_ALT_df) <- c("team", "conference")
all_teams_by_conference_ALT_df
# (attempt to) Plot the two data files using the first as the nodes and the second as a reference file for coloring by conference.
ggplot(data = merge_allwinsloss_ALT_df, aes(from_id = winner, to_id = loser)) +
geom_net(aes(color = all_teams_by_conference_ALT_df), layout.alg = "fruchtermanreingold",
size = 2, labelon = TRUE, vjust = -0.6, ecolour = "grey80",
directed = TRUE, fontsize = 3, ealpha = 0.5) +
scale_color_brewer("Conference",
palette = "Paired") +
xlim(c(-0.05, 1.05)) +
theme_net() +
theme(legend.position = "bottom")
I realize that something is off here, but I just can't figure it out. Moreover, I'd like to set it up so that (a) all edges between teams in the same conference that have played each other share a common color and (b) the edges are weighted using the weight column in merge_allwinsloss_ALT_df.
Thank you for the help!
You need to join the two tables together so that it's all in one data frame.
To add the conference of the winner, you'd do it as follows:
df1 <- merge(merge_allwinsloss_ALT_df,all_teams_by_conference_ALT_df,
by.x="winner",by.y="team",all.x=TRUE)
To capture both the winning and the losing team's conference, I'd rename df1$conference to "conference_winner" and then perform the same merge again, this time on df1 with by.x="loser".
Also, I'd suggest using shorter names for your data frames; there's no need to type merge_allwinsloss_ALT_df over and over. Moreover, merge is a function, so using it in a name compounds the problem by creating confusion (see above, where my code reads merge(merge...) due to your naming convention).
After that you can just map color and/or fill to conference_winner or conference_loser.
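Putting both merges together, a minimal base-R sketch might look like this. It uses shorter, hypothetical stand-in names (games and teams) for the two minimal-example data frames and stores weight as a number; note that building a data frame from a character matrix(), as in the question, makes weight a character column, so it would need as.numeric() first. The same_conference column is an added helper that could be mapped to edge colour.

```r
# Short stand-in names for the two minimal-example data frames
games <- data.frame(
  winner = c("CAL", "TENN", "LOU", "CMU", "WVU", "ARK",
             "CMU", "WVU", "IND", "IND", "FSU", "TCU"),
  loser  = c("HAW", "APP", "CHAR", "HAW", "APP", "TENN",
             "WVU", "JMU", "MIN", "HAW", "TCU", "ARK"),
  weight = c(12, 7, 56, 0, 20, 6, 7, 15, 3, 14, 2, 14),
  stringsAsFactors = FALSE)
teams <- data.frame(
  team = c("CAL", "HAW", "TENN", "APP", "LOU", "CHAR", "CMU",
           "WVU", "ARK", "JMU", "IND", "MIN", "FSU", "TCU"),
  conference = c("PAC", "MAC", "SEC", "SUN BELT", "ACC", "FCS", "MAC",
                 "BIG 12", "SEC", "FCS", "BIG 10", "BIG 10", "ACC", "BIG 12"),
  stringsAsFactors = FALSE)

# 1) Attach the winner's conference, then rename the new column
games_conf <- merge(games, teams, by.x = "winner", by.y = "team", all.x = TRUE)
names(games_conf)[names(games_conf) == "conference"] <- "conference_winner"

# 2) Attach the loser's conference the same way
games_conf <- merge(games_conf, teams, by.x = "loser", by.y = "team", all.x = TRUE)
names(games_conf)[names(games_conf) == "conference"] <- "conference_loser"

# 3) Flag games played between two teams of the same conference
games_conf$same_conference <-
  games_conf$conference_winner == games_conf$conference_loser

games_conf[games_conf$same_conference,
           c("winner", "loser", "weight", "conference_winner")]
```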
I have a table whose header spans two levels (year, then Reported/Detected). How can I draw a 3D graph of this table, or more generally, what would be a good way to plot tables with such nested headers? Kindly suggest alternative ways to achieve this, if any.
Crime Table:
                                            year
                          2014                2015                2016
                  Reported  Detected  Reported  Detected  Reported  Detected
Murder                 221       208       178       172        26        20
Murder(Gain)            20        16        11         9         1         1
Dacoity                 51        45        44        36         5         1
Robbery                538       316       351       201        23        10
Chain Snatching        528       394       342       229        23         0
Code:
library(tables)
#CLASS 1 CRIMES 2014
c14 <- structure(list(`Reported` = c(221, 20, 51,
538, 528), `Detected` = c(208, 16, 45, 316, 394)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching"), class = "data.frame")
c14
#CLASS 1 CRIMES 2015
c15 <- structure(list(`Reported` = c(178, 11, 44,
351, 342), `Detected` = c(172, 9,
36, 201, 229)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c15
#CLASS 1 CRIMES 31-01-2016
c16 <- structure(list(`Reported` = c(26, 1, 5,
23, 23), `Detected` = c(20, 1,
1, 10, 0)), .Names = c("Reported",
"Detected"), row.names = c("Murder", "Murder(Gain)", "Dacoity",
"Robbery", "Chain Snatching"), class = "data.frame")
c16
# rbind with rownames as a column
st <- rbind(
data.frame(c14, year = '2014', what = factor(rownames(c14), levels = rownames(c14)),
row.names= NULL, check.names = FALSE),
data.frame(c15,year = '2015',what = factor(rownames(c15), levels = rownames(c15)),
row.names = NULL,check.names = FALSE),
data.frame(c16,year = '2016',what = factor(rownames(c16), levels = rownames(c16)),
row.names = NULL,check.names = FALSE)
)
crimetable <- tabular(Heading()*what ~ year*(`Reported` +`Detected`)*Heading()*(identity),data=st)
crimetable
As I hate 3D plots for 3-way tables and I like ggplot2, I suggest this:
Gather your data into "long" format:
library(tidyr)
st_long = gather(st, type, count, -c(year, what))
head(st_long, 3)
# year what type count
# 1 2014 Murder Reported 221
# 2 2014 Murder(Gain) Reported 20
# 3 2014 Dacoity Reported 51
As you can see, both the Detected and Reported columns are now combined into a single column called type. This is useful for ggplot2, which can easily create facets from it. Facets are separate panels within the plot that share the same aesthetic mappings but each show a different group of data:
library(ggplot2)
ggplot(st_long, aes(year, count, group = what, color = what)) +
geom_line() +
facet_wrap(~ type)
(I am not saying that line plot is the only/best plot here, but it is often used when comparing frequencies across different time-points.)
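To see exactly what gather() does here, the same reshape can be sketched in base R by stacking the Reported and Detected columns. This assumes st has the year, what, Reported and Detected columns built above; it is rebuilt inline so the snippet stands alone.

```r
crimes <- c("Murder", "Murder(Gain)", "Dacoity", "Robbery", "Chain Snatching")
st <- data.frame(
  year = rep(c("2014", "2015", "2016"), each = 5),
  what = factor(rep(crimes, 3), levels = crimes),
  Reported = c(221, 20, 51, 538, 528, 178, 11, 44, 351, 342, 26, 1, 5, 23, 23),
  Detected = c(208, 16, 45, 316, 394, 172, 9, 36, 201, 229, 20, 1, 1, 10, 0),
  stringsAsFactors = FALSE)

# Stack the two measure columns: all Reported rows first, then all Detected rows,
# repeating the identifier columns (year, what) for each block
st_long <- data.frame(
  year  = rep(st$year, 2),
  what  = rep(st$what, 2),
  type  = rep(c("Reported", "Detected"), each = nrow(st)),
  count = c(st$Reported, st$Detected),
  stringsAsFactors = FALSE)

head(st_long, 3)
```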
I'm using Paul Bleicher's Calendar Heatmap to visualize some events over time, and I'm interested in adding black-and-white fill patterns instead of (or on top of) the color coding, to make the Calendar Heatmap more readable when printed in black and white.
Here is an example of the Calendar Heatmap in color,
and here is how it looks in black and white;
it gets very difficult to distinguish between the individual levels in black and white.
Is there an easy way to get R to add some kind of pattern to the 6 levels instead of color?
Code to reproduce the Calendar Heatmap in color.
source("http://blog.revolution-computing.com/downloads/calendarHeat.R")
stock <- "MSFT"
start.date <- "2012-01-12"
end.date <- Sys.Date()
quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=", stock, "&a=", substr(start.date,6,7), "&b=", substr(start.date, 9, 10), "&c=", substr(start.date, 1,4), "&d=", substr(end.date,6,7), "&e=", substr(end.date, 9, 10), "&f=", substr(end.date, 1,4), "&g=d&ignore=.csv", sep="")
stock.data <- read.csv(quote, as.is=TRUE)
# convert the continuous var to a categorical var
stock.data$by <- cut(stock.data$Adj.Close, breaks = 6, labels = FALSE)
calendarHeat(stock.data$Date, stock.data$by, varname="MSFT Adjusted Close")
Update (02-13-2013 03:52:11Z): what I mean by adding a pattern.
I envision adding a pattern to the individual day-boxes in the Calendar Heatmap, the same way patterns are added to the individual slices of the pie chart on the right (B) in this plot,
found here, or something like the states in this plot.
I answered this question before it got a bounty. It seems the OP found my previous answer a bit complicated, so I organized the code in a single gist here; you just need to download the file and source it.
I created a new function, extra.calendarHeat, which extends the first one to draw a heatmap of a double time series (dat, value1, value2). I added these new parameters:
pch.symbol : vector of symbols, default 15:20
cex.symbol : cex of the symbols, default 2
col.symbol : color of the symbols, default #00000044
pvalues : values mapped to the symbols
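The parameters above suggest that each day's symbol is picked from pch.symbol according to pvalues. The gist's actual mapping is not shown here, so the following is only an assumed illustration of the general idea in base R: bin the values into as many levels as there are symbols with cut(), then use each level as an index into the symbol vector. The values in x are made up for illustration.

```r
# Hypothetical daily values to encode with symbols (e.g. prices or volumes)
x <- c(26.5, 27.0, 29.8, 31.2, 30.0, 32.5)

# Six equal-width bins, returned as integer codes 1..6
level <- cut(x, breaks = 6, labels = FALSE)

# One plotting symbol per level, as with the default pch.symbol = 15:20
symbols <- 15:20
pch_for_day <- symbols[level]

level        # bin of each day
pch_for_day  # symbol drawn for each day
```

Because the level is just an index, swapping in another symbol vector (for example c(3, 4, 8, 9) or LETTERS) changes the pattern without touching the binning.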
Here are some examples:
## I am using the same data
stock <- "MSFT"
start.date <- "2012-01-12"
end.date <- Sys.Date()
quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
stock,
"&a=", substr(start.date,6,7),
"&b=", substr(start.date, 9, 10),
"&c=", substr(start.date, 1,4),
"&d=", substr(end.date,6,7),
"&e=", substr(end.date, 9, 10),
"&f=", substr(end.date, 1,4),
"&g=d&ignore=.csv", sep="")
stock.data <- read.csv(quote, as.is=TRUE)
p1 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
pvalues = stock.data$Volume,
varname="W&B MSFT Adjusted Close\nVolume as no border symbol")
## multiply symbols
p2 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
pvalues = stock.data$Volume,
varname="W&B MSFT Adjusted Close\nblack Volume as multiply symbol",
pch.symbol = c(3,4,8,9),
col.symbol='black')
## circles symbols
p3 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
pvalues = stock.data$Volume,
varname="W&B MSFT Adjusted Close \n blue Volume as circles",
pch.symbol = c(1,10,13,16,18),
col.symbol='blue')
## triangles symbols
p4 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
pvalues = stock.data$Volume,
varname="W&B MSFT Adjusted Close \n red Volume as triangles",
pch.symbol = c(2,6,17,24,25),
col.symbol='red')
## symbols are LETTERS
p5 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
varname="MSFT Adjusted Close",
pch.symbol = LETTERS,
col.symbol='black')
## symbols are lowercase letters
p6 <- extra.calendarHeat(dates= stock.data$Date, values = stock.data$Adj.Close,
pvalues = stock.data$Volume,
varname="MSFT Adjusted Close \n Volume as LETTERS symbols",
pch.symbol = letters,
color='r2b')
You can use panel.levelplot.points from latticeExtra to add a pattern. I think the question as asked is a bit specific, so I will try to generalize it: the idea is to give the steps to transform a time series into a calendar heatmap with two encodings (fill color and a shape). We can also imagine multiple time series (Close/Open). For example, you can get something like this
or like this, using a ggplot2 theme:
Given a single time series (dat, value), the function calendarHeat transforms the data like this:
date.seq value dotw woty yr month seq
1 2012-01-01 NA 0 2 2012 1 1
2 2012-01-02 NA 1 2 2012 1 2
3 2012-01-03 NA 2 2 2012 1 3
4 2012-01-04 NA 3 2 2012 1 4
5 2012-01-05 NA 4 2 2012 1 5
6 2012-01-06 NA 5 2 2012 1 6
So I assume that the data are formatted like this; if not, I extracted the data-transformation part of calendarHeat into a function (see this gist):
dat <- transformdata(stock.data$Date, stock.data$by)
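For readers without the gist, a minimal base-R sketch of such a transformation might look like this. The function name to_calendar_fields is hypothetical, not the gist's transformdata(); consistent with the table above, it assumes the calendar spans whole years, that dotw is the day of the week (0 = Sunday) and that woty is the Sunday-based week number (format code %U) plus one.

```r
# Hypothetical helper: expand dated values to one row per calendar day,
# with the fields a calendar heatmap needs
to_calendar_fields <- function(dates, values) {
  dates <- as.Date(dates)
  yrs <- as.numeric(format(range(dates), "%Y"))
  # cover whole years, so days without data get NA values
  date.seq <- seq(as.Date(paste0(yrs[1], "-01-01")),
                  as.Date(paste0(yrs[2], "-12-31")), by = "day")
  data.frame(
    date.seq = date.seq,
    value = values[match(date.seq, dates)],
    dotw  = as.numeric(format(date.seq, "%w")),      # 0 = Sunday
    woty  = as.numeric(format(date.seq, "%U")) + 1,  # Sunday-based week + 1
    yr    = factor(format(date.seq, "%Y")),
    month = as.numeric(format(date.seq, "%m")),
    seq   = seq_along(date.seq))
}

cal <- to_calendar_fields(c("2012-01-12", "2012-01-13"), c(3, 5))
head(cal)
```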
Then the calendar is essentially a levelplot with custom scales, a custom theme and a custom panel function.
library(latticeExtra)
levelplot(value~woty*dotw | yr, data=dat, border = "black",
layout = c(1, nyr%%7),
col.regions = (calendar.pal(ncolors)),
aspect='iso',
between = list(x=0, y=c(1,1)),
strip=TRUE,
panel = function(...) {
panel.levelplot(...)
calendar.division(...)
panel.levelplot.points(...,na.rm=T,
col='blue',alpha=0.5,
## you can play with cex and pch here to get the pattern you
## like
cex = dat$value/max(dat$value, na.rm=T)*3,
pch=ifelse(is.na(dat$value),NA,20),
type = c("p"))
},
scales= scales,
xlim =extendrange(dat$woty,f=0.01),
ylim=extendrange(dat$dotw,f=0.1),
cuts= ncolors - 1,
colorkey= list(col = calendar.pal(ncolors), width = 0.6, height = 0.5),
subscripts=TRUE,
par.settings = calendar.theme)
Where the scales are:
scales = list(
x = list( at= c(seq(2.9, 52, by=4.42)),
labels = month.abb,
alternating = c(1, rep(0, (nyr-1))),
tck=0,
cex =1),
y=list(
at = c(0, 1, 2, 3, 4, 5, 6),
labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday"),
alternating = 1,
cex =1,
tck=0))
And the theme is set as:
calendar.theme <- list(
xlab=NULL,ylab=NULL,
strip.background = list(col = "transparent"),
strip.border = list(col = "transparent"),
axis.line = list(col="transparent"),
par.strip.text=list(cex=2))
The panel function uses a function called calendar.division. In the original code, drawing the grid divisions (the black month contours) is very long and is done with the grid package the hard way (focusing on each panel, etc.). I changed it a little, and now I call it from the lattice panel function as calendar.division.
We can use ggplot2's scale_shape_manual to get us shapes that appear close to shading, and we can plot these over the grey heatmap.
Note: This was adapted from @Jay's comments on the original blog post for the calendar heatmap
# PACKAGES
library(ggplot2)
library(data.table)
# Transform data
stock.data <- transform(stock.data,
week = as.POSIXlt(Date)$yday %/% 7 + 1,
month = as.POSIXlt(Date)$mon + 1,
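# spell out the weekday names: base R's weekdays() needs a Date or POSIXt, not a number
wday = factor(as.POSIXlt(Date)$wday, levels=0:6, labels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), ordered=TRUE),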
year = as.POSIXlt(Date)$year + 1900)
# find when the months change
# Not used, but could be
stock.data$mchng <- as.logical(c(0, diff(stock.data$month)))
# we need dummy data for Sunday / Saturday to be included.
# These added rows will not be plotted due to their NA values
dummy <- as.data.frame(stock.data[1:2, ])
dummy[, -which(names(dummy) %in% c("wday", "year"))] <- NA
dummy[, "wday"] <- c("Saturday", "Sunday")
dummy[, "mchng"] <- TRUE
rbind(dummy, stock.data) -> stock.data
# convert the continuous var to a categorical var
stock.data$Adj.Disc <- cut(stock.data$Adj.Close, breaks = 6, labels = FALSE)
# vals is the greyscale tones used for the outer monthly borders
vals <- gray(c(.2, .5))
# PLOT
# Expected warning due to dummy variable with NA's:
# Warning message:
# Removed 2 rows containing missing values (geom_point).
ggplot(stock.data) +
aes(week, wday, fill=as.factor(Adj.Disc),
shape=as.factor(Adj.Disc), color=as.factor(month %% 2)) +
geom_tile(linetype=1, size=1.8) +
geom_tile(linetype=6, size=0.4, color="white") +
scale_color_manual(values=vals) +
geom_point(aes(alpha=0.2), color="black") +
scale_fill_grey(start=0, end=0.9) + scale_shape_manual(values=c(2, 3, 4, 12, 14, 8)) +
theme(legend.position="none") + labs(y="Day of the Week") + facet_wrap(~ year, ncol = 1)