dumbbell plot in R - r

Hi, I have a series of time intervals stored in a data frame df:
replicateID timeA timeB mean
1 60 80 70
2 10 70 40
3 25 35 30
I am trying to plot a dumbbell:
library(ggplot2)
devtools::install_github("hrbrmstr/ggalt")
library(ggalt)
library(dplyr)
df <- arrange(df, timeA)
#calculate mean middle point between two values
df$mean <- rowMeans(df[2:3])
#add factor levels
df <- mutate(df, rep=factor(replicateID, levels=rev(replicateID)))
gg <- ggplot(df, aes(x=timeA, xend=timeB, y=rep))
gg <- gg + geom_dumbbell(colour="#a3c4dc",
point.colour.l="#0e668b",
point.colour.r="#0000ff",
point.size.l=2.5,
point.size.r=2.5)
gg <- gg + geom_point(aes(y = df$mean), color = "red", linetype = "dotted")
The dumbbell plot itself is drawn correctly. However, I would also like to display the midpoint of each pair of values on the graph and connect all the midpoints with a line.
I tried to do that by adding geom_point, but this doesn't work.
Any suggestions?

First of all, the parameter/aesthetic names of geom_dumbbell have changed in newer versions of ggalt, so don't get confused by the names used below. If you haven't updated ggalt, you'll have to keep using your parameter names. The geom_point and geom_path layers work the same way in either case:
#data
df=structure(list(replicateID = c(2, 3, 1), timeA = c(10, 25, 60
), timeB = c(70, 35, 80), mean = c(40, 30, 70), rep = structure(3:1, .Label = c("1",
"3", "2"), class = "factor"), time_mean = c(40, 30, 70)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), groups = structure(list(
rep = structure(1:3, .Label = c("1", "3", "2"), class = "factor"),
.rows = list(3L, 2L, 1L)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
# rebuild the base plot from the data above
gg <- ggplot(df, aes(x=timeA, xend=timeB, y=rep))
gg <- gg + geom_dumbbell(colour="#a3c4dc",
colour_x = "#0e668b",
colour_xend="#0000ff",
size_x=2.5,
size_xend=2.5,
linetype = "dotted")
gg <- gg + geom_point(aes(x = mean), color = "red") #draws the points
gg+geom_path(aes(x=mean,group=1)) #draws the line
You could also try geom_line with the same parameters; note that geom_line connects the points in order of their x values, whereas geom_path follows the row order of the data. Here you would get a connection from ID 1-3, which might not be what you're looking for. I can't tell from your question.
PS: In the future, please consider posting the output of dput(df), as this is easier for others to read into an R session.
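If ggalt is not installed, the same kind of figure can be approximated with plain ggplot2. The following is only a sketch under that assumption: it draws the dumbbell with geom_segment and geom_point instead of geom_dumbbell, recomputes the midpoints from the question's three rows, and connects them with geom_path in row order. The axis labels are illustrative choices, not taken from the post.

```r
library(ggplot2)

# Data from the question; the midpoint is recomputed here
df <- data.frame(
  replicateID = c(1, 2, 3),
  timeA = c(60, 10, 25),
  timeB = c(80, 70, 35)
)
df$mean <- rowMeans(df[, c("timeA", "timeB")])

# Sort rows by timeA and fix the y-axis order to match the row order
df <- df[order(df$timeA), ]
df$rep <- factor(df$replicateID, levels = rev(df$replicateID))

p <- ggplot(df, aes(y = rep)) +
  # the bar of each dumbbell
  geom_segment(aes(x = timeA, xend = timeB, yend = rep), colour = "#a3c4dc") +
  # the two end points
  geom_point(aes(x = timeA), colour = "#0e668b", size = 2.5) +
  geom_point(aes(x = timeB), colour = "#0000ff", size = 2.5) +
  # the midpoints, connected in row order
  geom_point(aes(x = mean), colour = "red") +
  geom_path(aes(x = mean, group = 1), linetype = "dotted") +
  labs(x = "time", y = "replicate")
p
```

Because geom_path follows the row order, sorting the data frame before plotting decides which midpoints are joined.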

Using the dumbbell R package
##Load some libraries
library(tidyverse)
library(ggplot2)
library(rlang)
library(utils)
library(data.table)
library(dumbbell)
## reformat the data
df2<-df %>% mutate("key"="Time") %>% mutate("diff"=timeA-timeB)
df3<-df2%>% arrange(desc(timeA))
df2$replicateID<-factor(df2$replicateID, df3$replicateID)
##Plot
dumbbell::dumbbell(df2,key="key",id="replicateID", column1 = "timeA", column2="timeB", lab1 = "timeA", lab2="timeB", pt_val = 1, delt=1, textsize = 3) + geom_point(aes(x = df2$mean, y=df2$replicateID), color = "red") +
geom_path(aes(x=df2$mean,y=df2$replicateID,group=1))
I don't have enough points to post an image, so here is the link to the plot:
dumbbell R package

Related

How to generate sub-plots with grouped categorical variable sorted by a numeric variable in ggplot?

I have a dataframe text with the count n of each word appearing in each file (file_num = 1, 2, or 3). I would like to use ggplot to generate three subplots, one for each value of file_num, with word on the y-axis and the frequency n on the x-axis. I want each subplot to be sorted by increasing or decreasing value of n within each file_num. I have tried many different ways to solve this seemingly trivial issue but have not been successful.
Here is dput of my test data:
structure(list(file_num = c("1", "1", "1", "1", "2", "2", "2",
"2", "2", "3", "3", "3", "3", "3"), word = c("test", "quality",
"page", "limit", "information", "limit", "test", "instruments",
"quality", "limit", "test", "effective", "page", "system"), n = c(5,
35, 55, 75, 20, 30, 40, 60, 70, 101, 201, 301, 401, 501)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
cols = list(file_num = structure(list(), class = c("collector_character",
"collector")), word = structure(list(), class = c("collector_character",
"collector")), n = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Here is what I have tried:
library(tidytext)
library(stringr)
library(pdftools)
library(dplyr)
library(purrr)
library(ggplot2)
library(forcats)
text %>% group_by(file_num) %>% arrange(file_num, desc(n)) %>%
ggplot(.,aes(factor(word,levels = unique(word)), n, fill = file_num)) +
geom_bar(stat = "identity", position = "dodge") +
scale_x_discrete("Word") +
scale_y_continuous("n") + coord_flip() +
facet_grid(rows = vars(file_num), scales = "free")
Here is the plot that is generated using the above code on dataframe text created using the dput data. It shows the desired result (word sorted with increasing value of n) for file_num = 1, but not for file_num = 2 or 3:
Thanks to @Tjebo for pointing me in the right direction. Here is a working solution based on ggplot. It does require saving the modified dataframe text before using it in ggplot.
Let me know if there is a way to pipe the modified dataframe directly into ggplot.
text1 <- text %>% ungroup %>% arrange(file_num, n) %>%
mutate(order = row_number()) # create variable order
ggplot(text1,aes(order, n, fill = file_num)) +
geom_bar(stat = "identity", show.legend = FALSE) +
scale_x_continuous(
name = "Word",
breaks = text1$order,
labels = text1$word,
expand = c(0,0)) +
facet_grid(file_num ~ ., scales = "free") +
coord_flip()
Output plot:
You could achieve this "ordered per facet" quite simply with the ggcharts package, using the following code on your data:
library(ggcharts)
chart <- bar_chart(data = text, x = word, y = n,
fill = file_num,
facet = file_num,
horizontal = TRUE
)
chart
This yields the following graph:
Please, let me know whether this is what you want.
Update:
The object created by bar_chart is of class ggplot, as can be seen below:
class(chart)
[1] "gg" "ggplot"
This means that one can use the ggplot2 functions to alter the graph, e.g.:
chart +
guides(fill = "none") + ## remove legend
ggtitle("My new title") + ## add title
theme_linedraw() +
theme(strip.background = element_rect(colour = "red", size = 2))
yielding the following pic (for illustration only) :
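Another way to get bars ordered within each facet, and to pipe the data straight into ggplot, is sketched below with reorder_within() and scale_y_reordered() from tidytext (which the question already loads). This is an alternative to the approaches above rather than anything the original posters used, and it assumes ggplot2 3.3.0 or later so that geom_col can draw horizontal bars without coord_flip.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Test data from the question, rebuilt as a plain data frame
text <- data.frame(
  file_num = c("1", "1", "1", "1", "2", "2", "2", "2", "2",
               "3", "3", "3", "3", "3"),
  word = c("test", "quality", "page", "limit", "information", "limit",
           "test", "instruments", "quality", "limit", "test", "effective",
           "page", "system"),
  n = c(5, 35, 55, 75, 20, 30, 40, 60, 70, 101, 201, 301, 401, 501)
)

p <- text %>%
  # make each word unique per file and order it by n within that file
  mutate(word = reorder_within(word, n, file_num)) %>%
  ggplot(aes(x = n, y = word, fill = file_num)) +
  geom_col(show.legend = FALSE) +
  # strip the per-file suffix from the axis labels again
  scale_y_reordered() +
  facet_grid(rows = vars(file_num), scales = "free_y") +
  labs(x = "n", y = "Word")
p
```

No intermediate data frame is needed here, because the reordering happens inside the pipe before ggplot() is called.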

r does not allow the x axis to display the title (now with added data)

The question was how to get R to display titles on the x- and y-axes when the plot is rotated. mtext was not allowing this to happen. The question then became how to do this with the data at hand.
Here is my edited code and data.
Small segment of my Data:
library(ggplot2)
x <- structure(list(
CS1 = c(51.176802507837, 11.289327763008, 10.8584547767754, 5.37665764546685, 6.47159365761892),
CS2 = c(34.9956506731101, 45.7147446193383, 23.788413903316, 42.4969135802469, 18.8998879103283),
CS3 = c(3.59556251631428, 5.59228312932411, 11.7117536894149, 15.7240944017563, 9.72486977228754),
CS4 = c(0.830633241559198, 2.57358541893362, 3.05352639873916, 7.01238591916558, 2.98276253547777),
CS5 = c(6.6094547746612, 7.67873290538655, 9.93544994944388, 8.49609094535301, 6.71423210935406)),
class = c("tbl_df", "tbl", "data.frame"))
Now some code to make a ggplot.
xplot<-ggplot(x, aes(y = test, y = CS2, group = test))+
geom_boxplot()+
labs(y = "Intensity",
x = "Variable")+
scale_x_discrete()
xplot
Try using ggplot from the tidyverse.
Now that you have some data:
library(tidyverse)
x <-structure(list(
CS1 = c(51.176802507837, 11.289327763008, 10.8584547767754, 5.37665764546685, 6.47159365761892),
CS2 = c(34.9956506731101, 45.7147446193383, 23.788413903316, 42.4969135802469, 18.8998879103283),
CS3 = c(3.59556251631428, 5.59228312932411, 11.7117536894149, 15.7240944017563, 9.72486977228754),
CS4 = c(0.830633241559198, 2.57358541893362, 3.05352639873916, 7.01238591916558, 2.98276253547777),
CS5 = c(6.6094547746612, 7.67873290538655, 9.93544994944388, 8.49609094535301, 6.71423210935406)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame")
)
Now gather that data into two columns
x1 <- gather(x, test, values, CS1:CS5)
Now plot
xplot<-ggplot(x1, aes(x = test, y = values, group = test))+
geom_boxplot()+
labs(y = "Intensity",
x = "Variable")
xplot + coord_flip()
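The reshaping step can also be written with tidyr's pivot_longer(), which newer tidyr versions recommend over gather(). The sketch below assumes tidyr 1.0.0 or later and uses a shortened copy of the data (three columns, values rounded from the question) purely for illustration.

```r
library(tidyr)

# Shortened, rounded copy of the question's data (illustration only)
x <- data.frame(
  CS1 = c(51.2, 11.3, 10.9, 5.4, 6.5),
  CS2 = c(35.0, 45.7, 23.8, 42.5, 18.9),
  CS3 = c(3.6, 5.6, 11.7, 15.7, 9.7)
)

# One row per (column name, value) pair: the long format ggplot expects
x1 <- pivot_longer(x, cols = CS1:CS3, names_to = "test", values_to = "values")
head(x1)
```

The resulting x1 has the same two columns (test, values) that the gather() call produces, so the plotting code stays unchanged.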

Heatmaps for a matrix with ones and zeros using R

Below is my sample data. Basically, it's a matrix with person names as row names and some columns for each of these rows. All I have in the data is zeros and ones. I would like to visualize it using a heatmap (red for 0s and green for 1s, or any other color coding). How do I accomplish this using R? You can show me using any example dataset with just ones and zeros (binary values).
Just another approach using ggplot
library(ggplot2)
library(reshape2)
library(plyr)
library(scales)
df <- structure(list(people = structure(c(2L, 1L), .Label = c("Dwayne", "LeBron"), class = "factor"),
G = c(1L, 0L),
MIN = c(1L, 0L),
PTS = c(0L, 1L),
FGM = c(0L,0L),
FGA = c(0L,0L),
FGP = c(1L,1L)),
.Names = c("people", "G", "MIN", "PTS", "FGM", "FGA", "FGP"),
class = "data.frame",
row.names = c(NA, -2L))
df.m <- melt(df)
df1.m <- ddply(df.m, .(variable), transform, rescale = value)
p <- ggplot(df1.m, aes(variable, people)) +
geom_tile(aes(fill = rescale), colour = "black")
# assign the result, otherwise the colour scale is lost when p is printed
p <- p + scale_fill_gradient(low = "green", high = "red")
print(p)
Adapted from this tutorial
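Since the data contain only 0 and 1, a discrete fill scale may be closer to what the question asks for (red for 0, green for 1). The sketch below is one possible variant, not the tutorial's method: it starts from a matrix with person names as row names, reshapes it with base R, and maps factor(value) to fixed colours with scale_fill_manual.

```r
library(ggplot2)

# Binary matrix with person names as row names (values from the answer above)
m <- matrix(
  c(1, 1, 0, 0, 0, 1,
    0, 0, 1, 0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("LeBron", "Dwayne"),
                  c("G", "MIN", "PTS", "FGM", "FGA", "FGP"))
)

# Long format: one row per (person, variable) cell
long <- as.data.frame(as.table(m), responseName = "value")
names(long)[1:2] <- c("people", "variable")

p <- ggplot(long, aes(x = variable, y = people, fill = factor(value))) +
  geom_tile(colour = "black") +
  # fixed colours per value instead of a continuous gradient
  scale_fill_manual(values = c("0" = "red", "1" = "green"), name = "value")
p
```

Treating the values as a factor gives a two-entry legend rather than a colour bar.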
With highcharter:
library(highcharter)
library(tidyr)
library(dplyr)
df<-data.frame(row=c("Dwayne","James"),G=c(1,0),MIN=c(1,0),PTS=c(0,1),FGM=c(0,0),FGA=c(0,0),FGP=c(1,1))
rownames(df)<-c("Dwayne","James")
df$row<-rownames(df)
data<-df%>%
tidyr::gather(row,value)%>%
setNames(c("name","variable","value"))
hchart(data, "heatmap", hcaes(x = variable, y = name, value = value)) %>%
hc_colorAxis(stops = color_stops(2, c("red","green")))
UPDATE:
You can add hc_size(height = 800) to set the height to 800, or compute the height from the data like this:
x<-50
hg<-length(unique(data$name))*x+100
hchart(data, "heatmap", hcaes(x = variable, y = name, value = value)) %>%
hc_colorAxis(stops = color_stops(2, c("red","green")))%>%
hc_size(height = hg)
Here each row in the dataset makes the chart taller by 50 points. You can change that amount via x.
This answer uses plotly, so I'm adding it as a separate answer. It uses the same data as the other answer.
library(plotly)
# drop the name column from the x labels so they line up with the columns of z
p <- plot_ly(x = colnames(df)[-1], y = df[,1], z = as.matrix(df[-1]), colors = colorRamp(c("green", "red")), type = "heatmap")
p
This is much simpler than the ggplot2 approach in terms of getting the output.
Hope this helps!

Unexpected theme change in ggplot2

I'm getting unexpected behavior in the look of ggplot2. When I plot large amounts of data, it appears the default theme changes from theme_grey to something like theme_bw. I can reproduce this on the particular dataset I'm working on, but cannot reproduce it on simulated data.
At any rate, here's the code:
ggplot(df2, aes(x = Sequence, y = y, color = as.factor(group))) +
geom_point(shape=19, alpha = 0.8)
nrow(df2)
[1] 4330
results in:
Now, if I take a subset of the data:
df3 <- slice(df2, 1:10)
ggplot(df3, aes(x = Sequence, y = y, color = as.factor(group))) +
geom_point(shape=19, alpha = 0.8)
results in:
I have tried:
uninstalling/reinstalling ggplot2
manually specifying a theme
unload all packages except ggplot2
working outside of a project
Sample of 5 obs:
> dput(df2[1:5, ])
structure(list(Sequence = c("1", "2", "3", "4", "5"), group = c(0,
0, 0, 0, 0), y = c(7711.945, 7695.075, 3432.585, 8081.19, 7344.455
)), .Names = c("Sequence", "group", "y"), row.names = c(NA, 5L
), class = "data.frame")
Your input for 'x' is currently stored as a factor (I'm guessing). The following code will reproduce the issue you're having and the final line of converting the x to numeric fixes the issue.
# make some test input
n <- 5000
df <- data.frame(x = factor(1:n), y = rnorm(n), group = sample(0:1, n, replace = TRUE))
library(ggplot2)
# Using the x "as is" which is currently a factor
ggplot(df, aes(x = x, y =y, color = as.factor(group))) + geom_point(shape = 19, alpha = 0.8)
# Converting to numeric we see the desired result
ggplot(df, aes(x = as.numeric(x), y =y, color = as.factor(group))) + geom_point(shape = 19, alpha = 0.8)
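One caveat when converting a factor, shown here as a small sketch with made-up values: as.numeric() on a factor returns the internal level codes, which only equal the original numbers when the levels happen to be 1, 2, 3, ... as in the simulated data above. Converting through as.character() first recovers the actual values. A character column, such as Sequence in the dput output, can be passed to as.numeric() directly.

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "2", "33"))

codes  <- as.numeric(f)                # level codes, not the labels
values <- as.numeric(as.character(f))  # the numbers the labels stand for

codes   # 1 2 3
values  # 10 2 33
```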

Overlaying plots with a horizontal date in R

I am trying to overlay two plots using ggplot2. I can graph them individually, but I want to overlay them to show a comparison. They have the same y axis: the y axis is a score from 0 to 100, and the x axis is a specific date in the month (from a range of 3 weeks).
Here is what I have tried:
data <- read.table(text = Level5avg, header = TRUE)
data2 <- read.table(text = Level6avg, header = TRUE)
colnames(data) = c("x","y")
colnames(data2) = c("x","y")
ggplot(rbind(data.frame(data2, group="a"), data.frame(data, group="b")), aes(x=x,y=y)) +
stat_density2d(geom="tile", aes(fill = group, alpha=..density..), contour=FALSE) + scale_fill_manual(values=c("b"="#FF0000", "a"="#00FF00")) + geom_point() + theme_minimal()
When I do this, I get a strange graph that has several dots, but I'm not sure if my code is right, since I can't distinguish the data. I want to add 3 more (small) datasets to the plot, if it is possible. If it is possible, how do I make it into a line graph in order to distinguish the datasets?
Note: I was under the impression ggplot would work for my purposes because of this post (and several other posts on this site advised using ggplot as opposed to Lattice). I'm not sure if what I want is possible, so I came here.
Data sets:
dput(data)
structure(list(x = structure(1:6, .Label = c("10/27/2015", "10/28/2015",
"10/29/2015", "10/30/2015", "10/31/2015", "11/1/2015"), class = "factor"),
y = c(0, 12.5, 0, 0, 11, 43)), .Names = c("x", "y"), class = "data.frame",
row.names = c(NA, -6L))
dput(data2)
structure(list(x = structure(1:3, .Label = c("10/28/2015", "10/31/2015",
"11/1/2015"), class = "factor"), y = c(0, 0, 41.5)), .Names = c("x",
"y"), class = "data.frame", row.names = c(NA, -3L))
I've now managed to get my overlay, but is there a way to organize the horizontal axis? The dates have no order.
It seems to me that the answer that you are basing your plots on uses density plots that are not useful for your data. If you are just looking for some line plots with points, you could do the following (note I created a dataframe outside of the ggplot() call to make it look a little cleaner):
data$group <- "b"
data2$group <- "a"
df <- rbind(data2,data)
df$x <- as.Date(df$x,"%m/%d/%Y")
ggplot(df,aes(x=x,y=y,group=group,color=group)) + geom_line() +
geom_point() + theme_minimal()
Note that by converting the date, the dates end up in the right order all on their own.
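To see why the conversion matters, here is a small sketch with made-up dates (not taken from the question): as text, the values sort character by character, while Date objects sort chronologically, and ggplot then uses a continuous date axis.

```r
# Hypothetical dates in the same month/day/year format as the question
x <- c("10/27/2015", "11/1/2015", "9/30/2015")

# Parse the text into Date objects
d <- as.Date(x, format = "%m/%d/%Y")

sort(x)  # sorted as text, so "9/30/2015" does not come first
sort(d)  # sorted chronologically, starting with 2015-09-30
```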
