R multiple line plot using a dataframe [duplicate] - r

This question already has answers here:
Plot multiple lines (data series) each with unique color in R
(10 answers)
Closed 6 years ago.
I can't seem to figure out how to create a graph for multiple line plots.
This is my dataframe:
topics before_event after_event current
1 1 0.057 0.044 0.064
2 2 0.059 0.055 0.052
3 3 0.058 0.037 0.044
4 4 0.036 0.055 0.044
5 5 0.075 0.064 0.066
6 6 0.047 0.045 0.045
7 7 0.043 0.043 0.041
8 8 0.042 0.041 0.046
9 9 0.049 0.046 0.039
10 10 0.043 0.060 0.045
11 11 0.054 0.054 0.062
12 12 0.065 0.056 0.068
13 13 0.042 0.045 0.048
14 14 0.067 0.054 0.055
15 15 0.049 0.052 0.053
The variables in the dataframe are all numeric vectors, example:
topics <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
I know I should be using 'ggplot2' and 'reshape' but I can't seem to work out the correct code to represent topics on the x-axis, the scale 0-1 on the y-axis, and each var (before_event, after_event, current) as three individual lines.
Any help would be really appreciated!

We can use matplot from base R:
matplot(df1[,1], df1[-1], type = 'l', lty = 1, xlab = "topics", ylab = "event", col = 2:4)
# the legend uses lty rather than pch so that it matches the drawn lines
legend("topright", legend = names(df1)[-1], lty = 1, col = 2:4)

We can use ggplot2 with geom_line after melting the data to long format:
library(ggplot2)
library(reshape2) # provides melt()
topics <- seq(1,15,1)
# random stand-ins for the three series in the question
before_event <- runif(15, min=0.042, max=0.070)
after_event <- runif(15, min=0.040, max=0.065)
current <- runif(15, min=0.041, max=0.066)
df <- data.frame(topics,before_event,after_event,current) #create data frame from the above vectors
df.m <- melt(df, id.vars="topics") # melt the dataframe using topics as id
# plot the lines using ggplot and geom_line
ggplot(data = df.m,
       aes(x = topics, y = value, group = variable, color = variable)) +
  geom_line(linewidth = 2) # use size = 2 on ggplot2 older than 3.4.0
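Neither answer above uses the data from the question or the requested 0-1 y-axis. The following is a minimal sketch of that, assuming tidyr (1.0.0 or later) is installed; `df1`, `df_long` and `p` are names chosen here for illustration, and `pivot_longer` is swapped in for `melt`.

```r
library(ggplot2)
library(tidyr)

# The data frame from the question, typed in by hand
df1 <- data.frame(
  topics       = 1:15,
  before_event = c(0.057, 0.059, 0.058, 0.036, 0.075, 0.047, 0.043, 0.042,
                   0.049, 0.043, 0.054, 0.065, 0.042, 0.067, 0.049),
  after_event  = c(0.044, 0.055, 0.037, 0.055, 0.064, 0.045, 0.043, 0.041,
                   0.046, 0.060, 0.054, 0.056, 0.045, 0.054, 0.052),
  current      = c(0.064, 0.052, 0.044, 0.044, 0.066, 0.045, 0.041, 0.046,
                   0.039, 0.045, 0.062, 0.068, 0.048, 0.055, 0.053)
)

# One row per (topic, series) pair: 15 topics x 3 series
df_long <- pivot_longer(df1, cols = -topics,
                        names_to = "variable", values_to = "value")

# One coloured line per series, with the y-axis fixed to 0-1
p <- ggplot(df_long, aes(x = topics, y = value, colour = variable)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 1))
p
```

Because all values lie below 0.08, a 0-1 axis squeezes the three lines toward the bottom of the panel; dropping the `scale_y_continuous` line lets ggplot2 choose limits that fit the data.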

Related

How to make a stacked Sankey diagram using ggplot in R? [duplicate]

This question already has an answer here:
reshape wide to long using data.table with multiple columns
(1 answer)
Closed last month.
I have this data and I want to create a stacked Sankey Diagram using ggplot. I want to try and recreate it and look like the following picture. What's the best way to go about it?
Risk Factors for Stroke 1990 1995 2000 2005 2010
Obesity 0.001 0.013 0.043 0.077 0.115
Diabetes 0.359 0.316 0.26 0.187 0.092
Smoking 0.171 0.156 0.142 0.128 0.116
Hypercholesterolemia 0.161 0.104 0.045 0.001 0.001
Hypertension 0.654 0.633 0.602 0.561 0.509
I want to recreate this diagram with the data
I tried this so far but I don't think that will make my data the way I want it to.
D2 <- Datatable1 %>% make_long(`Risk Factors for Stroke in Blacks`, `1990`, `1995`, `2000`, `2005`, `2010`)
D2
This looks close enough to get you started:
library(data.table)
library(ggplot2)
library(ggalluvial)
# read sample data
DT <- fread('"Risk Factors for Stroke" 1990 1995 2000 2005 2010
Obesity 0.001 0.013 0.043 0.077 0.115
Diabetes 0.359 0.316 0.26 0.187 0.092
Smoking 0.171 0.156 0.142 0.128 0.116
Hypercholesterolemia 0.161 0.104 0.045 0.001 0.001
Hypertension 0.654 0.633 0.602 0.561 0.509', header = TRUE)
# create workable column-names
setnames(DT, janitor::make_clean_names(names(DT)))
# melt to long format
DT.melt <- melt(DT, id.vars = "risk_factors_for_stroke")
# create variable for sorting the risks by value
DT.melt[order(-value, variable), id := factor(rowid(variable))]
# create plot
ggplot(data = DT.melt,
aes(x = variable, y = value,
stratum = id,
alluvium = risk_factors_for_stroke,
fill = risk_factors_for_stroke,
colour = id,
label = value)) +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "white") +
geom_stratum(color = "white", width = 0.7) +
geom_text(position = position_stack(vjust = 0.5), colour = "white")

Graphing using r and points (pch)

What is wrong with my R code? I don't want to see the bar graph; I want to see only the plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
and I want the legend according to pch and not in the form of line.
The bar plot appears because you are passing the Sites column, which is a factor, to plot(); for a factor, plot() draws a bar chart of the level counts. To plot only the points:
library(reshape2)
library(ggplot2)
# plots 6 categorical variables (one x position per row of Graphdata)
Graphdata.m = melt(t(Graphdata[-1]))
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
Graphdata.1 = t(Graphdata[-1])
colnames(Graphdata.1) = c("A","B","C","D","A","B")
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
EDIT:
In base R:
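# x positions come from the Sites column (A=1, ..., D=4), not from the row index
x <- as.integer(factor(Graphdata$Sites, levels=c("A","B","C","D")))
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",type="n",xaxt = 'n')
axis(1,at=1:4,tick=TRUE,labels=c("A","B","C","D"))
points(x,Graphdata$A,pch=1,col="blue",lwd=2)
points(x,Graphdata$B,pch=2,col="green",lwd=2)
points(x,Graphdata$C,pch=3,col="red",lwd=2)
points(x,Graphdata$D,pch=4,col="orange",lwd=2)
# legend keyed by plotting character instead of by line
legend("topright",legend=c("A","B","C","D"),pch=1:4,col=c("blue","green","red","orange"))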

Marginal densities (or bar plots) on facets in ggplot2

My problem is the following. I have the table below:
0 1-5 6-10 11-15 16-20 21-26 27-29
a 0.019 0.300 0.296 0.211 0.117 0.042 0.014
b 0.058 0.448 0.308 0.120 0.042 0.019 0.005
c 0.026 0.277 0.316 0.187 0.105 0.068 0.020
d 0.054 0.297 0.378 0.108 0.108 0.041 0.014
e 0.004 0.252 0.358 0.216 0.102 0.053 0.015
f 0.032 0.097 0.312 0.280 0.161 0.065 0.054
g 0.113 0.500 0.233 0.094 0.043 0.014 0.003
h 0.328 0.460 0.129 0.050 0.020 0.010 0.003
representing some marginal frequencies (by row) for each subgroups of my data (a to h).
My dataset is actually in the long format (very long, counting more than 100 thousand entries), with the first 6 rows as you see below:
RX_SUMM_SURG_PRIM_SITE Nodes.Examined.Class
1 Wedge Resection 1-5
2 Segmental Resection 1-5
3 Lobectomy w/mediastinal LNdissection 6-10
4 Lobectomy w/mediastinal LNdissection 6-10
5 Lobectomy w/mediastinal LNdissection 1-5
6 Lobectomy w/mediastinal LNdissection 11-15
When I plot a barplot by group (the table above is simply the cross tabulation of these two covariates with the row marginal probabilities taken) here's what happens:
The code I have for this is
ggplot(data.ln.red, aes(x=Nodes.Examined.Class))+geom_bar(aes(x=Nodes.Examined.Class, group=RX_SUMM_SURG_PRIM_SITE))+
facet_grid(RX_SUMM_SURG_PRIM_SITE~.)
Actually I would be very happy only with the marginal frequencies (i.e. the ones in the table) on each y-axis of the facets of the plot (instead of the counts).
Can anybody help me with this?
Thanks for all your help!
EM
geom_bar calculates both counts and proportions of observations. You can access the calculated proportions with ..prop.. (the old way), stat(prop) (ggplot2 3.0.0 and later), or after_stat(prop) (ggplot2 3.3.0 and later, the currently recommended form). Use this as your y aesthetic.
You can also get rid of the aes you have in geom_bar, as it just repeats what is already covered by ggplot and facet_grid.
It looks like your counts/proportions are going to vary widely between groups, so I'm adding free y-scaling to the faceting.
Here's an example of a similar plot with the iris data, which you can model your code on:
library(tidyverse)
ggplot(iris, aes(x = Sepal.Length, y = after_stat(prop))) +
  geom_bar() +
  facet_grid(Species ~ ., scales = "free_y")
Created on 2018-04-06 by the reprex package (v0.2.0).
Edit: the calculated prop variable is proportions within each group, not proportions across all groups, so it works differently when x is a factor. For categorical x, prop treats x as the group; to override this, include group = 0 or some other dummy value in your aes. Sorry I missed that the first time!
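The note about categorical x can be made concrete with a small sketch. The data frame `toy` below is a hypothetical stand-in for the asker's long-format data (its column names and values are invented here), and `group = 1` plays the role of the dummy group value; `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the long-format data: one row per observation
toy <- data.frame(
  site  = c("Wedge", "Wedge", "Wedge",
            "Lobectomy", "Lobectomy", "Lobectomy", "Lobectomy"),
  nodes = factor(c("1-5", "1-5", "6-10", "1-5", "6-10", "6-10", "11-15"),
                 levels = c("1-5", "6-10", "11-15"))
)

# group = 1 puts all bars of a facet into one group, so prop is the share
# of each category within that facet rather than 1 for every bar
p <- ggplot(toy, aes(x = nodes, y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(site ~ ., scales = "free_y")
p

# Row-wise marginal frequencies, to compare against the bar heights
row_props <- prop.table(table(toy$site, toy$nodes), margin = 1)
```

Each row of `row_props` sums to 1, which is the same normalisation as the table of marginal frequencies in the question.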

Reorder not working in ggplot with multiple facets

I have the data below:
LETTER ID NUMBER
1 A 805qhau1hbnm1 0.001
2 A 47s11wwxy8x7c 0.521
3 A 92g6022uvxtmf 0.036
4 A 92pkgg5y0gvkk 0.002
5 B gxx44abszy02j 0.066
6 B agupupsu0gq26 0.001
7 B 92g6022uvxtmf 0.003
8 B 92g6022uvxtmf 0.003
9 B agupupsu0gq26 0.004
10 B dwvprfgxafqct 0.058
11 B 92pkgg5y0gvkk 0.161
12 B 2264vrpp4b02v 0.444
13 B 92g6022uvxtmf 0.084
14 B 1ypga6ay26dyk 0.018
15 B 9tkrv34jdmvtk 0.414
16 B agupupsu0gq26 0.001
17 B agupupsu0gq26 0.002
18 B gxx44abszy02j 0.065
19 B 0mtz8hnvvm63r 0.012
20 B 9ta79k8xtyzdy 0.006
21 B 92g6022uvxtmf 0.014
22 A 47s11wwxy8x7c 0.539
23 A 92g6022uvxtmf 0.028
24 A 92pkgg5y0gvkk 0.003
25 A 92pkgg5y0gvkk 0.002
26 A 805qhau1hbnm1 0.001
27 A fmubqnkxnj16f 0.451
28 B 448pxv1p0ffjp 0.040
29 B 3cj2kj0rx311k 0.012
30 B 9ta79k8xtyzdy 0.006
31 B gxx44abszy02j 0.064
32 B agupupsu0gq26 0.002
33 B agupupsu0gq26 0.001
34 A 92pkgg5y0gvkk 0.002
35 A 65a353h1x9yfd 0.055
36 B dbrx980zu7bmk 0.009
And I have the ggplot code below:
l_myPlot <- ggplot(data=l_data, aes(x=reorder(x=ID, X=NUMBER, sum, order=T), y = NUMBER))+
geom_bar(stat='identity' )+
facet_wrap(~ LETTER, scales="free_x")+
theme(axis.text.x=element_text(angle=90, hjust=1))+
scale_y_continuous()
As you can see, I am reordering the x axis based on the addition of the number column. The issue is that the sorting is not being done properly. The A facet ID 92...vkk should be the second bar in the order, not the fourth.
My approach is to use factors to order a new identifier created out of LETTER and ID and then use scale_x_discrete(labels =) to alter the x axis labels.
library(ggplot2)
library(dplyr)
# summarise the data: one row per LETTER/ID pair with the total of NUMBER
ld <- l_data %>% group_by(LETTER, ID) %>% summarise(sum = sum(NUMBER)) %>% ungroup()
# Sort in correct order
ld <- ld[with(ld, order(LETTER, sum)) ,]
# Factor in the sorted order
ld$new_ID <- factor(paste(ld$LETTER, ld$ID),
levels = paste(ld$LETTER, ld$ID))
# Plot
l_myPlot <- ggplot() +
geom_bar( data = ld,
aes(x = new_ID,
y = sum ),
stat = 'identity' ) +
facet_wrap( ~ LETTER
, scales = "free_x"
) +
scale_x_discrete(labels=ld$ID) +
theme ( axis.text.x = element_text( angle = 90, hjust = 1 ) ) +
scale_y_continuous()
l_myPlot
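Passing a plain vector to `labels` relies on it lining up with the breaks of each free-x facet. A hedged alternative is to pass a labelling function that strips the LETTER prefix from each break, so the label always comes from the break itself. In the sketch below, `strip_letter` is a helper name made up for illustration, and `ld` is a four-row miniature of the summarised data rather than the full table.

```r
library(ggplot2)

# Miniature of the summarised data: one row per LETTER/ID pair
ld <- data.frame(
  LETTER = c("A", "A", "B", "B"),
  ID     = c("92g6022uvxtmf", "92pkgg5y0gvkk", "92g6022uvxtmf", "agupupsu0gq26"),
  sum    = c(0.064, 0.009, 0.104, 0.011)
)

# Sort within LETTER by sum and freeze that order in the factor levels
ld <- ld[order(ld$LETTER, ld$sum), ]
ld$new_ID <- factor(paste(ld$LETTER, ld$ID), levels = paste(ld$LETTER, ld$ID))

# Hypothetical helper: drop everything up to and including the first space,
# which leaves the original ID
strip_letter <- function(x) sub("^[^ ]+ ", "", x)

p <- ggplot(ld, aes(x = new_ID, y = sum)) +
  geom_col() +
  facet_wrap(~ LETTER, scales = "free_x") +
  scale_x_discrete(labels = strip_letter) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
```

This assumes LETTER never contains a space; if it might, a different separator in `paste()` and in the pattern would be needed.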

Faceted bar charts from multiple columns in ggplot2

I have some data containing resistance data against three different antibiotics, each stored as a column.
Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32
...
I've plotted barcharts for each individual antibiotic thus:
ggplot(data, aes(factor(MIC1))) + geom_bar()
(the values are discrete powers of 2, e.g. 0.008, 0.016, 0.032; plotting them as a factor spaces the bars evenly. If there is a more elegant way of doing this, please let me know!)
What I'd really like to do is to have a faceted stack of the three graphs with a shared x-axis.
Is there an easier way to do this than to recode the variables like this:
isolate antibiotic MIC
1 1 0.008
1 2 0.064
1 3 0.064
2 1 0.016
2 2 0.250
2 3 0.500
3 1 0.064
3 2 0.125
3 3 32
...
and then doing it this way?
ggplot(data, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
Thanks in advance.
Reshaping your data frame to look like your second example is pretty easy using reshape2, to the point where I'm not sure it could really get much easier. Are there any specific problems you would have with this solution?
df = read.table(text="Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32", header=TRUE)
library(reshape2)
library(ggplot2)
df_melted = melt(df, id.vars="Isolate", variable.name="antibiotic", value.name="MIC")
ggplot(df_melted, aes(factor(MIC))) + geom_bar() + facet_grid(antibiotic ~ .)
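The same reshape can also be written with tidyr instead of reshape2. This is a sketch of that alternative, assuming tidyr 1.0.0 or later is installed; `df_long` and `p` are names chosen here, and the data is the three-row sample from the question.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text = "Isolate MIC1 MIC2 MIC3
1 0.008 0.064 0.064
2 0.016 0.250 0.500
3 0.064 0.125 32", header = TRUE)

# One row per (Isolate, antibiotic) pair; the MIC columns become values
df_long <- pivot_longer(df, cols = -Isolate,
                        names_to = "antibiotic", values_to = "MIC")

# One facet row per antibiotic, sharing the x-axis of MIC levels
p <- ggplot(df_long, aes(factor(MIC))) +
  geom_bar() +
  facet_grid(antibiotic ~ .)
p
```

The result has the same three columns as `df_melted`, so the plotting call is unchanged apart from the data frame name.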

Resources