I have the data below and want to create a stacked Sankey diagram with ggplot that looks like the following picture. What's the best way to go about it?
Risk Factors for Stroke   1990   1995   2000   2005   2010
Obesity                   0.001  0.013  0.043  0.077  0.115
Diabetes                  0.359  0.316  0.26   0.187  0.092
Smoking                   0.171  0.156  0.142  0.128  0.116
Hypercholesterolemia      0.161  0.104  0.045  0.001  0.001
Hypertension              0.654  0.633  0.602  0.561  0.509
This is what I have tried so far, but I don't think it reshapes my data the way I need:
D2 <- Datatable1 %>% make_long(`Risk Factors for Stroke in Blacks`, `1990`, `1995`, `2000`, `2005`, `2010`)
D2
This looks close enough to get you started:
library(data.table)
library(ggplot2)
library(ggalluvial)
# read sample data
DT <- fread('"Risk Factors for Stroke" 1990 1995 2000 2005 2010
Obesity 0.001 0.013 0.043 0.077 0.115
Diabetes 0.359 0.316 0.26 0.187 0.092
Smoking 0.171 0.156 0.142 0.128 0.116
Hypercholesterolemia 0.161 0.104 0.045 0.001 0.001
Hypertension 0.654 0.633 0.602 0.561 0.509', header = TRUE)
# create workable column-names
setnames(DT, janitor::make_clean_names(names(DT)))
# melt to long format
DT.melt <- melt(DT, id.vars = "risk_factors_for_stroke")
# create variable for sorting the risks by value within each year
DT.melt[order(-value, variable), id := factor(rowid(variable))]
# create plot
ggplot(data = DT.melt,
       aes(x = variable, y = value,
           stratum = id,
           alluvium = risk_factors_for_stroke,
           fill = risk_factors_for_stroke,
           colour = id,
           label = value)) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "white") +
  geom_stratum(color = "white", width = 0.7) +
  geom_text(position = position_stack(vjust = 0.5), colour = "white")
What is wrong with my R code? I don't want the bar graph; I want to see only the points, drawn with plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
I also want the legend to show the plotting characters (pch) rather than lines.
The bar plot appears because the first call plots your Sites column, which is a factor, and plot() draws a bar chart for a factor. To plot only the points:
library(reshape2)
library(ggplot2)
# plots 6 categorical variables (one x position per row of Graphdata shown)
Graphdata.m = melt(t(Graphdata[-1]))
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
Graphdata.1 = t(Graphdata[-1])
colnames(Graphdata.1) = c("A","B","C","D","A","B")
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
EDIT:
In base R:
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",lwd=2,type="n",xaxt = 'n')
axis(1,at=c(1,2,3,4),tick=T,labels=c("A","B","C","D"))
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=1,col="green",lwd=2)
points(Graphdata$C,pch=1,col="red",lwd=2)
points(Graphdata$D,pch=1,col="orange",lwd=2)
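The question also asked for a legend keyed by plotting character rather than by line. A minimal base R sketch of that part follows. It assumes one pch value per series (the question's 1 to 4) and uses only the six rows shown in the question; the points are plotted against row position, and the axis label "row" is a choice made for this sketch.

```r
# First six rows from the question; the remaining rows are elided there.
Graphdata <- data.frame(
  Sites = factor(c("A", "B", "C", "D", "A", "B")),
  A = c(2.052, 0.549, 0.391, 0.510, 0.214, 1.016),
  B = c(2.268, 0.664, 0.482, 0.636, 0.239, 1.362),
  C = c(1.828, 0.621, 0.400, 0.497, 0.215, 0.978),
  D = c(1.474, 1.921, 0.382, 0.476, 0.211, 0.876)
)

series <- c("A", "B", "C", "D")
cols   <- c("blue", "green", "red", "orange")
pchs   <- 1:4

# Empty frame first, so no bar chart is drawn for the factor column
plot(1, type = "n", xlim = c(1, nrow(Graphdata)), ylim = c(-1, 2.5),
     xlab = "row", ylab = "density frequency")

# One points() call per series, each with its own symbol and colour
for (i in seq_along(series)) {
  points(Graphdata[[series[i]]], pch = pchs[i], col = cols[i], lwd = 2)
}

# Passing pch (and no lty/lwd) makes the legend show symbols instead of lines
lg <- legend("topright", legend = series, pch = pchs, col = cols, pt.lwd = 2)
```

Keeping the colours and symbols in vectors that are shared by points() and legend() avoids the two getting out of step.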
Hi, I have a dataframe df as below.
I would like to make a facet plot that shows the relation between columns A & B, A & C, A & D, B & C and C & D, and overlay a regression line and Pearson's correlation coefficient on each panel.
I am trying to make a facet plot showing the relation between each pair of columns, but could not figure out exactly how.
Any help would be appreciated. I could not find an existing answer on SO about plotting columns against each other.
df<- read.table(text =c("A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
"), header = T)
df
pl<-ggplot(data=df) + geom_point(aes(x=A,y=B,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=D,size=10)) +
geom_smooth(method = "lm", se=FALSE, color="black")
pl
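One possible approach, as a sketch rather than a definitive answer: stack the requested column pairs into a long data frame with one row per point per pair, compute Pearson's r for each pair, then facet on the pair. The names `pairs`, `long`, `labels` and the "A vs B" panel titles are made up for this example.

```r
library(ggplot2)

df <- read.table(text = "A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
", header = TRUE)

# The five pairs listed in the question
pairs <- list(c("A", "B"), c("A", "C"), c("A", "D"), c("B", "C"), c("C", "D"))

# Long format: one block of rows per pair, with generic x and y columns
long <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(pair = paste(p[1], "vs", p[2]), x = df[[p[1]]], y = df[[p[2]]])
}))

# Pearson correlation per pair, formatted as a panel label
labels <- do.call(rbind, lapply(split(long, long$pair), function(d) {
  data.frame(pair = d$pair[1], r = cor(d$x, d$y, method = "pearson"))
}))
labels$text <- sprintf("r = %.2f", labels$r)

p <- ggplot(long, aes(x, y)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  # -Inf / Inf pin the label to the top-left corner of each panel
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = text),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ pair, scales = "free")
p
```

Free scales are used because the columns cover quite different ranges; drop `scales = "free"` if the panels should share axes.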
The project: I am currently trying to build a Shiny App that allows users to generate charts where they can modify time period examined, select variables to look at by time period and country of interest.
I have built the app with no issues so far, but I am struggling to add confidence intervals. I believe this is because the data has been melted from a wide to a long format. See a random sample of the data below:
country_name  year  variable       value  variable_low           value_low  variable_high           value_high
Uruguay       2002  v2x_delibdem   0.851  v2x_partipdem_codelow  0.724      v2x_partipdem_codehigh  0.796
Pakistan      2014  v2x_delibdem   0.248  v2x_egaldem_codelow    0.119      v2x_libdem_codehigh     0.312
Costa Rica    1992  v2x_delibdem   0.864  v2x_polyarchy_codelow  0.882      v2x_partipdem_codehigh  0.691
Botswana      2005  v2x_delibdem   0.527  v2x_libdem_codelow     0.518      v2x_libdem_codehigh     0.626
Brazil        1979  v2x_egaldem    0.116  v2x_partipdem_codelow  0.147      v2x_delibdem_codehigh   0.105
Uruguay       1975  v2x_egaldem    0.207  v2x_egaldem_codelow    0.178      v2x_polyarchy_codehigh  0.117
Niger         1970  v2x_libdem     0.154  v2x_egaldem_codelow    0.149      v2x_libdem_codehigh     0.176
Romania       2009  v2x_partipdem  0.478  v2x_partipdem_codelow  0.422      v2x_polyarchy_codehigh  0.717
Thailand      1997  v2x_partipdem  0.338  v2x_polyarchy_codelow  0.481      v2x_polyarchy_codehigh  0.617
Peru          1975  v2x_polyarchy  0.104  v2x_egaldem_codelow    0.076      v2x_partipdem_codehigh  0.078
value_low and value_high are the melted values for variable_low and variable_high. They are the lower and upper bounds of the confidence intervals.
When I run this ggplot2 code:
library(ggplot2)
library(dplyr)         # filter()
library(RColorBrewer)  # brewer.pal()
library(scales)        # pretty_breaks()
myColors <- brewer.pal(5,"Set1")
names(myColors) <- levels(dat$variable)
colScale <- scale_colour_manual(name = "Variable",values = myColors)
ggplot(filter(dat, country_name == "Argentina", variable %in%c("v2x_delibdem","v2x_egaldem"))) +
  geom_line(aes(x=year, y=value, group = variable, color = variable), size = 1)+
  geom_point(aes(x=year, y=value, group = variable, color = variable), size = 3)+
  geom_ribbon(aes(x=year, ymin=value_low, ymax=value_high, group = variable, color = variable), linetype=2, alpha=0.1)+
  colScale +
  scale_x_continuous(breaks=pretty_breaks(n=10),limits=c(1970,2015))+
  scale_y_continuous("Index Score", breaks = scales::pretty_breaks(n = 10),limits=c(0, 1))+
  theme(text = element_text(size=15))
This is the result:
Clearly my approach isn't working. I was wondering if anyone has experience adding confidence interval ribbons around a geom_line chart that plots multiple variables.
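In the sample above, the name in variable_low or variable_high often differs from the one in variable on the same row (for example v2x_delibdem next to v2x_partipdem_codelow), so the bounds appear not to belong to the value they sit beside. A hedged sketch of one way to keep them aligned is to build the long data one index at a time, taking the value and both bounds from the same wide row. It assumes the wide data has columns named `<index>`, `<index>_codelow` and `<index>_codehigh`; the `wide` data frame and its numbers are invented for illustration, and `melt_with_bounds` is a hypothetical helper.

```r
library(ggplot2)

# Invented wide data, only to make the sketch runnable
delib <- c(0.60, 0.62, 0.65, 0.63, 0.66)
egal  <- c(0.40, 0.42, 0.41, 0.45, 0.47)
wide <- data.frame(
  country_name = "Argentina",
  year = 2000:2004,
  v2x_delibdem = delib,
  v2x_delibdem_codelow = delib - 0.05,
  v2x_delibdem_codehigh = delib + 0.05,
  v2x_egaldem = egal,
  v2x_egaldem_codelow = egal - 0.04,
  v2x_egaldem_codehigh = egal + 0.04
)

# Hypothetical helper: one block of long rows per index, bounds kept with their value
melt_with_bounds <- function(wide, vars) {
  do.call(rbind, lapply(vars, function(v) {
    data.frame(
      country_name = wide$country_name,
      year = wide$year,
      variable = v,
      value = wide[[v]],
      value_low = wide[[paste0(v, "_codelow")]],
      value_high = wide[[paste0(v, "_codehigh")]]
    )
  }))
}

long <- melt_with_bounds(wide, c("v2x_delibdem", "v2x_egaldem"))

# fill (not colour) shades the ribbon; colour only draws its outline
p <- ggplot(long, aes(x = year, group = variable)) +
  geom_ribbon(aes(ymin = value_low, ymax = value_high, fill = variable), alpha = 0.2) +
  geom_line(aes(y = value, colour = variable)) +
  geom_point(aes(y = value, colour = variable))
p
```

With the bounds aligned this way, each ribbon should enclose its own line.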
I can't seem to figure out how to create a graph for multiple line plots.
This is my dataframe:
topics before_event after_event current
1 1 0.057 0.044 0.064
2 2 0.059 0.055 0.052
3 3 0.058 0.037 0.044
4 4 0.036 0.055 0.044
5 5 0.075 0.064 0.066
6 6 0.047 0.045 0.045
7 7 0.043 0.043 0.041
8 8 0.042 0.041 0.046
9 9 0.049 0.046 0.039
10 10 0.043 0.060 0.045
11 11 0.054 0.054 0.062
12 12 0.065 0.056 0.068
13 13 0.042 0.045 0.048
14 14 0.067 0.054 0.055
15 15 0.049 0.052 0.053
The variables in the dataframe are all numeric vectors, example:
topics <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
I know I should be using 'ggplot2' and 'reshape' but I can't seem to work out the correct code to represent topics on the x-axis, the scale 0-1 on the y-axis, and each var (before_event, after_event, current) as three individual lines.
Any help would be really appreciated!
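We can use matplot from base R (df1 is the data frame from the question). Because type = 'l' draws lines, the legend uses lty rather than pch:
matplot(df1[,1], df1[-1], type = 'l', lty = 1:3, col = 2:4, xlab = "topics", ylab = "event")
legend("topright", legend = names(df1)[-1], lty = 1:3, col = 2:4)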
We can also use ggplot and geom_line:
library(ggplot2)
library(reshape2) # provides melt()
# example data in the same shape as the question's data frame
topics <- seq(1,15,1)
before_event <- runif(15, min=0.042, max=0.070)
after_event <- runif(15, min=0.040, max=0.065)
current <- runif(15, min=0.041, max=0.066)
df <- data.frame(topics,before_event,after_event,current) #create data frame from the above vectors
df.m <- melt(df, id.vars="topics") # melt the dataframe using topics as id
# plot the lines using ggplot and geom_line
ggplot(data = df.m,
       aes(x = topics, y = value, group = variable, color = variable)) +
  geom_line(size = 2)
I am looking for a simpler way to compute, for each pair of variables, the share of rows in which both are missing, and to draw a heatmap of the result, similar to a correlation matrix. The data I have is as below:
set.seed(123)
data = data.frame(id = 1:1000, age_missing = sample(c(0,1),1000, replace = TRUE), salary_missing = sample(c(0,1),1000, replace = TRUE),
address_missing = sample(c(0,1),1000, replace = TRUE),
gender_missing =sample(c(0,1),1000, replace = TRUE) )
The ideal output is
var1    var2     Missing Percent
age     age      0.5
age     gender   0.05
age     address  0.08
gender  gender   0.15
gender  age      0.05
Maybe something along the lines of
dd <- as.matrix(data[,2:5])
crossprod(dd) / nrow(dd)
which yields
                age_missing salary_missing address_missing gender_missing
age_missing           0.493          0.231           0.251          0.244
salary_missing        0.231          0.497           0.248          0.271
address_missing       0.251          0.248           0.494          0.247
gender_missing        0.244          0.271           0.247          0.506
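To get from that matrix to the var1 / var2 / percent layout and a heatmap, one option is sketched below. It relies on as.data.frame(as.table(...)) to unroll the matrix into long form. The names `overlap`, `long` and `missing_percent` are chosen for this example only.

```r
library(ggplot2)

# Data exactly as generated in the question
set.seed(123)
data <- data.frame(
  id = 1:1000,
  age_missing = sample(c(0, 1), 1000, replace = TRUE),
  salary_missing = sample(c(0, 1), 1000, replace = TRUE),
  address_missing = sample(c(0, 1), 1000, replace = TRUE),
  gender_missing = sample(c(0, 1), 1000, replace = TRUE)
)

dd <- as.matrix(data[, 2:5])
# Entry (i, j) is the share of rows where both variable i and variable j are 1
overlap <- crossprod(dd) / nrow(dd)

# Unroll the 4 x 4 matrix into one row per (var1, var2) pair
long <- as.data.frame(as.table(overlap), responseName = "missing_percent")
names(long)[1:2] <- c("var1", "var2")
# Drop the "_missing" suffix so labels read age, salary, ...
long$var1 <- sub("_missing$", "", long$var1)
long$var2 <- sub("_missing$", "", long$var2)

p <- ggplot(long, aes(var1, var2, fill = missing_percent)) +
  geom_tile() +
  geom_text(aes(label = round(missing_percent, 2)), colour = "white")
p
```

The diagonal holds each variable's own missing share, since a 0/1 indicator multiplied by itself is unchanged.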