Graphing using r and points (pch) - r

What is wrong with my R code? I don't want to see the bar graph; I only want to see the plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
I also want the legend to show the plotting characters (pch) rather than lines.

The barplot is happening because you're plotting the factor of your Sites column. To only plot the points:
library(reshape2)
library(ggplot2)
# plots 6 categorical variables (one per row of the data)
Graphdata.m = melt(t(Graphdata[-1]))
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
Graphdata.1 = t(Graphdata[-1])
colnames(Graphdata.1) = c("A","B","C","D","A","B")
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
EDIT:
In base R:
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",lwd=2,type="n",xaxt = 'n')
axis(1,at=c(1,2,3,4),tick=T,labels=c("A","B","C","D"))
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=1,col="green",lwd=2)
points(Graphdata$C,pch=1,col="red",lwd=2)
points(Graphdata$D,pch=1,col="orange",lwd=2)
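The base R version above does not yet produce the symbol-based legend the question asks for, and `points()` with a single vector plots against the row index rather than the site. The following is a minimal sketch of one way to handle both, assuming the six sample rows shown in the question; the helper names `x`, `series`, `cols` and `pchs` are illustrative and not from the original post.

```r
# Sample rows copied from the question
Graphdata <- data.frame(
  Sites = c("A", "B", "C", "D", "A", "B"),
  A = c(2.052, 0.549, 0.391, 0.510, 0.214, 1.016),
  B = c(2.268, 0.664, 0.482, 0.636, 0.239, 1.362),
  C = c(1.828, 0.621, 0.400, 0.497, 0.215, 0.978),
  D = c(1.474, 1.921, 0.382, 0.476, 0.211, 0.876)
)

# x position of each row, derived from its site label: A = 1, ..., D = 4
x <- as.integer(factor(Graphdata$Sites))

series <- c("A", "B", "C", "D")              # value columns to draw
cols   <- c("blue", "green", "red", "orange")
pchs   <- 1:4                                # one plotting character per column

# Empty frame first, then a custom x axis labelled with the site names
plot(NA, xlim = c(1, 4), ylim = c(-1, 2.5),
     xlab = "sites", ylab = "density frequency", xaxt = "n")
axis(1, at = 1:4, labels = levels(factor(Graphdata$Sites)))

for (i in seq_along(series)) {
  points(x, Graphdata[[series[i]]], pch = pchs[i], col = cols[i], lwd = 2)
}

# Passing pch (and no lwd/lty) makes the legend show symbols instead of lines
lg <- legend("topright", legend = series, pch = pchs, col = cols)
```

Keeping the colours and symbols in vectors that are reused by both `points()` and `legend()` avoids the mismatch between plot and legend colours seen in the question.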

Related

How to make a stacked Sankey diagram using ggplot in R? [duplicate]

This question already has an answer here:
reshape wide to long using data.table with multiple columns
(1 answer)
Closed last month.
I have this data and I want to create a stacked Sankey diagram using ggplot that looks like the following picture. What's the best way to go about it?
Risk Factors for Stroke 1990 1995 2000 2005 2010
Obesity 0.001 0.013 0.043 0.077 0.115
Diabetes 0.359 0.316 0.26 0.187 0.092
Smoking 0.171 0.156 0.142 0.128 0.116
Hypercholesterolemia 0.161 0.104 0.045 0.001 0.001
Hypertension 0.654 0.633 0.602 0.561 0.509
I tried this so far but I don't think that will make my data the way I want it to.
D2 <- Datatable1 %>% make_long(`Risk Factors for Stroke in Blacks`, `1990`, `1995`, `2000`, `2005`, `2010`)
D2
This looks close enough to get you started:
library(data.table)
library(ggplot2)
library(ggalluvial)
# read sample data
DT <- fread('"Risk Factors for Stroke" 1990 1995 2000 2005 2010
Obesity 0.001 0.013 0.043 0.077 0.115
Diabetes 0.359 0.316 0.26 0.187 0.092
Smoking 0.171 0.156 0.142 0.128 0.116
Hypercholesterolemia 0.161 0.104 0.045 0.001 0.001
Hypertension 0.654 0.633 0.602 0.561 0.509', header = TRUE)
# create workable column-names
setnames(DT, janitor::make_clean_names(names(DT)))
# melt to long format
DT.melt <- melt(DT, id.vars = "risk_factors_for_stroke")
# create variable for sorting the risks by value
DT.melt[order(-value, variable), id := factor(rowid(variable))]
# create plot
ggplot(data = DT.melt,
aes(x = variable, y = value,
stratum = id,
alluvium = risk_factors_for_stroke,
fill = risk_factors_for_stroke,
colour = id,
label = value)) +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "white") +
geom_stratum(color = "white", width = 0.7) +
geom_text(position = position_stack(vjust = 0.5), colour = "white")

Marginal densities (or bar plots) on facets in ggplot2

my problem is the following: I have this table below
0 1-5 6-10 11-15 16-20 21-26 27-29
a 0.019 0.300 0.296 0.211 0.117 0.042 0.014
b 0.058 0.448 0.308 0.120 0.042 0.019 0.005
c 0.026 0.277 0.316 0.187 0.105 0.068 0.020
d 0.054 0.297 0.378 0.108 0.108 0.041 0.014
e 0.004 0.252 0.358 0.216 0.102 0.053 0.015
f 0.032 0.097 0.312 0.280 0.161 0.065 0.054
g 0.113 0.500 0.233 0.094 0.043 0.014 0.003
h 0.328 0.460 0.129 0.050 0.020 0.010 0.003
representing some marginal frequencies (by row) for each subgroups of my data (a to h).
My dataset is actually in the long format (very long, counting more than 100 thousand entries), with the first 6 rows as you see below:
RX_SUMM_SURG_PRIM_SITE Nodes.Examined.Class
1 Wedge Resection 1-5
2 Segmental Resection 1-5
3 Lobectomy w/mediastinal LNdissection 6-10
4 Lobectomy w/mediastinal LNdissection 6-10
5 Lobectomy w/mediastinal LNdissection 1-5
6 Lobectomy w/mediastinal LNdissection 11-15
When I plot a barplot by group (the table above is simply the cross tabulation of these two covariates, with the row marginal probabilities taken), here's what happens:
The code I have for this is
ggplot(data.ln.red, aes(x=Nodes.Examined.Class))+geom_bar(aes(x=Nodes.Examined.Class, group=RX_SUMM_SURG_PRIM_SITE))+
facet_grid(RX_SUMM_SURG_PRIM_SITE~.)
Actually, I would be happy with just the marginal frequencies (i.e. the ones in the table) on the y-axis of each facet, instead of the counts.
Can anybody help me with this?
Thanks for all your help!
EM
geom_bar calculates both counts and proportions of observations. You can access these calculated proportions with ..prop.. (the old way), stat(prop) (ggplot2 3.0.0), or after_stat(prop) (ggplot2 3.3.0 and later). Use this as your y aesthetic.
You can also get rid of the aes you have in geom_bar, as this is just a repeat of what you've already covered by ggplot and facet_grid.
It looks like your counts/proportions are going to vary widely between groups, so I'm adding free y-scaling to the faceting.
Here's an example of a similar plot with the iris data, which you can model your code off of:
library(tidyverse)
ggplot(iris, aes(x = Sepal.Length, y = after_stat(prop))) +
geom_bar() +
facet_grid(Species ~ ., scales = "free_y")
Created on 2018-04-06 by the reprex package (v0.2.0).
Edit: the calculated prop variable is proportions within each group, not proportions across all groups, so it works differently when x is a factor. For categorical x, prop treats x as the group; to override this, include group = 0 or some other dummy value in your aes. Sorry I missed that the first time!
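The point about categorical x can be illustrated with a short sketch. It assumes ggplot2 3.3.0 or later (for `after_stat()`) and uses the `mpg` data set bundled with ggplot2 as a stand-in for the asker's data: `class` plays the role of Nodes.Examined.Class and `drv` the role of the faceting variable.

```r
library(ggplot2)

# group = 1 puts all bars of a panel into one group, so prop is computed
# across the categories of x instead of separately within each category
p <- ggplot(mpg, aes(x = class, y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(drv ~ ., scales = "free_y")

# Inspect the values computed by the stat: one row per bar
d <- layer_data(p)

# Proportions are computed per facet panel, so each panel sums to 1
panel_totals <- tapply(d$prop, d$PANEL, sum)
```

Because the stat runs separately for each facet panel, the bar heights in each row are the row-wise marginal frequencies, which is the quantity shown in the table in the question.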

Removing outliers from facet_wrap boxplots in ggplot

How can I change the y axis to exclude outliers (not just hide them but scale the y axis so as not to include them) for geom_boxplot with multiple individual boxplots using facet_wrap? An example of my dataset is:
Pop. grp1 grp2 grp3 grp4 grp5 grp6 grp7 grp8
a 0.00652 1.27 0.169 0.859 0.388 0.521 3.58 0.0912
a 0.0133 0.136 0.154 0.167 0.845 0.159 0.561 0.108
a 0.0270 1.60 0.119 0.515 0.0386 0.0145 0.884 0.0155
b 0.00846 0.331 0.100 0.897 0.330 2.52 0.663 0.0338
b 0.0154 0.0997 0.122 0.0873 0.905 0.136 0.413 0.139
b 0.0353 0.536 0.171 0.471 0.0280 0.00608 0.414 0.00973
where I'd like to make a boxplot for each column showing populations a and b.
I've melted the data by population and then used geom_boxplot + facet_wrap but some outliers are so far above the whiskers that the boxes themselves barely show. The code I've used is:
wc.m <- melt(w_c_diff_ab, id.var="Pop.")
p.wc <- ggplot(data = wc.m, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Pop.))
p.wc + facet_wrap( ~ variable, scales="free") + scale_fill_manual(values=c("skyblue", "violetred1"))
but I'm struggling to remove outliers as I'm not sure how to calculate limits for the y axes on a per-boxplot basis.
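One possible approach, offered as a sketch rather than as an answer from the original thread: compute the box and whisker values per group up front with `boxplot.stats()` and draw them with `geom_boxplot(stat = "identity")`. Since the outliers never reach ggplot, the free y scales of each facet only span the whiskers. The data below is simulated as a hypothetical stand-in for `wc.m`, and `box_stats` is an illustrative name.

```r
library(ggplot2)

# Hypothetical long-format data standing in for the melted wc.m
set.seed(1)
wc.m <- data.frame(
  Pop.     = rep(c("a", "b"), each = 30, times = 2),
  variable = rep(c("grp1", "grp2"), each = 60),
  value    = c(rexp(60, rate = 50), rexp(60, rate = 1))
)

# boxplot.stats()$stats returns: lower whisker, lower hinge, median,
# upper hinge, upper whisker. Compute it for every Pop. x variable group.
box_stats <- do.call(rbind, lapply(
  split(wc.m, list(wc.m$Pop., wc.m$variable), drop = TRUE),
  function(d) {
    s <- boxplot.stats(d$value)$stats
    data.frame(Pop. = d$Pop.[1], variable = d$variable[1],
               ymin = s[1], lower = s[2], middle = s[3],
               upper = s[4], ymax = s[5])
  }
))

# Draw the precomputed boxes; no outlier points are passed to the plot
p <- ggplot(box_stats, aes(x = Pop., fill = Pop.)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax), stat = "identity") +
  facet_wrap(~ variable, scales = "free_y") +
  scale_fill_manual(values = c("skyblue", "violetred1"))
```

Note that `boxplot.stats()` uses Tukey hinges, so the box edges can differ slightly from the quantile-based hinges that `geom_boxplot()` computes by default.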

multiple columns plot with correlation value in ggplot2

Hi, I have a dataframe df as below.
I would like to make a facet plot that shows the relation between columns A & B, A & C, A & D, B & C and C & D, and overlay a regression line and Pearson's correlation coefficient on each panel.
I have been trying to make a facet plot showing the relation between each of these column pairs but could not figure out exactly how.
Any help would be appreciated. I could not find an existing answer on SO for plotting relations among columns like this.
df<- read.table(text =c("A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
"), header = T)
df
pl<-ggplot(data=df) + geom_point(aes(x=A,y=B,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=D,size=10)) +
geom_smooth(method = "lm", se=FALSE, color="black")
pl
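A minimal sketch of one way to get such a facet plot, not taken from the original thread: stack the requested column pairs into a long data frame with one panel label per pair, then facet on that label. The names `pairs`, `long` and `labs` are illustrative.

```r
library(ggplot2)

df <- read.table(text = "A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204", header = TRUE)

# Column pairs requested in the question
pairs <- list(c("A", "B"), c("A", "C"), c("A", "D"), c("B", "C"), c("C", "D"))

# Long format: one block of rows per pair, with x and y taken from the pair
long <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(pair = paste(p[1], "vs", p[2]), x = df[[p[1]]], y = df[[p[2]]])
}))

# Pearson correlation per pair, formatted as a panel label
labs <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(pair  = paste(p[1], "vs", p[2]),
             label = sprintf("r = %.2f", cor(df[[p[1]]], df[[p[2]]])))
}))

p <- ggplot(long, aes(x, y)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # x = -Inf, y = Inf pins the label to the top-left corner of each panel
  geom_text(data = labs, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ pair, scales = "free")
```

`cor()` uses the Pearson method by default. Free scales are used because the columns have quite different ranges.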

R multiple line plot using a dataframe [duplicate]

This question already has answers here:
Plot multiple lines (data series) each with unique color in R
(10 answers)
Closed 6 years ago.
I can't seem to figure out how to create a graph for multiple line plots.
This is my dataframe:
topics before_event after_event current
1 1 0.057 0.044 0.064
2 2 0.059 0.055 0.052
3 3 0.058 0.037 0.044
4 4 0.036 0.055 0.044
5 5 0.075 0.064 0.066
6 6 0.047 0.045 0.045
7 7 0.043 0.043 0.041
8 8 0.042 0.041 0.046
9 9 0.049 0.046 0.039
10 10 0.043 0.060 0.045
11 11 0.054 0.054 0.062
12 12 0.065 0.056 0.068
13 13 0.042 0.045 0.048
14 14 0.067 0.054 0.055
15 15 0.049 0.052 0.053
The variables in the dataframe are all numeric vectors, example:
topics <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
I know I should be using 'ggplot2' and 'reshape' but I can't seem to work out the correct code to represent topics on the x-axis, the scale 0-1 on the y-axis, and each var (before_event, after_event, current) as three individual lines.
Any help would be really appreciated!
We can use matplot from base R:
matplot(df1[,1], df1[-1], type = 'l', xlab = "topics", ylab = "event", col = 2:4, lty = 1:3)
legend("topright", legend = names(df1)[-1], lty = 1:3, col = 2:4)
We can also use ggplot and geom_line:
library(ggplot2)
library(reshape2) # provides melt()
topics <- seq(1,15,1)
before_event <- runif(15, min=0.042, max=0.070)
after_event <- runif(15, min=0.040, max=0.065)
current <- runif(15, min=0.041, max=0.066)
df <- data.frame(topics,before_event,after_event,current) #create data frame from the above vectors
df.m <- melt(df, id.vars="topics") # melt the dataframe using topics as id
# plot the lines using ggplot and geom_line
ggplot(data = df.m,
aes(x = topics, y = value, group = variable, color = variable)) +
geom_line(size = 2)
