I am trying to plot a stacked bar chart with multiple facets using the code below:
dat <- read.csv(file="./fig1.csv", header=TRUE)
dat2 <- melt(dat, id.var = c("id", "col1", "label"))
ggplot(dat2, aes(x=id, y=value, fill = variable)) +
geom_bar(stat="identity") +
scale_x_discrete(limits=dat2$label) +
facet_grid(. ~ col1) +
geom_col(position = position_stack(reverse = TRUE))
and here is a minimized example of what my data looks like:
id label col1 col2 col3 col4 col5
1 3 1 0.2 0.1 0.1 0.1
2 3 1 0.2 0.1 0.2 0.1
3 4 1 0.2 0.2 0.2 0.1
4 4 1 0.1 0.1 0.2 0.1
5 7 2 0.1 0.1 0.1 0.2
6 8 2 0.2 0.1 0.1 0.1
7 9 2 0.2 0.1 0.2 0.1
8 9 2 0.2 0.2 0.2 0.1
9 9 2 0.1 0.1 0.2 0.1
The problem is that the labels do not show up as I expect. The labels for the facet where col1 is 1 get repeated for the facet where col1 is 2, which means the labels (7,8,9,9,9) are ignored. Also, when consecutive labels are the same, they appear only once. For instance, after the first label, 3, appears, the second label, which is again 3, is ignored. Does anyone know how I can show the labels exactly as listed in the label column?
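One possible approach, sketched under assumptions rather than tested against the original fig1.csv: map x to factor(id) so every bar keeps its own position, give each facet its own x scale with scales = "free_x", and pass the label column to scale_x_discrete(labels = ...) as a vector named by id. The limits argument controls which x values appear on the axis rather than the text of the ticks, which would explain the behaviour described. The data frame is typed in from the example above, and the names value_cols and axis_labels are invented for illustration.

```r
library(ggplot2)

# Example data typed in from the question (assumed to match fig1.csv)
dat <- data.frame(
  id    = 1:9,
  label = c(3, 3, 4, 4, 7, 8, 9, 9, 9),
  col1  = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  col2  = c(0.2, 0.2, 0.2, 0.1, 0.1, 0.2, 0.2, 0.2, 0.1),
  col3  = c(0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1),
  col4  = c(0.1, 0.2, 0.2, 0.2, 0.1, 0.1, 0.2, 0.2, 0.2),
  col5  = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1)
)

# Long format built in base R (same shape as the melt() result in the question)
value_cols <- c("col2", "col3", "col4", "col5")
dat2 <- data.frame(
  id       = rep(dat$id, times = length(value_cols)),
  col1     = rep(dat$col1, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(dat)),
  value    = unlist(dat[value_cols], use.names = FALSE)
)

# Tick text for each bar: names are the ids, values are the labels
axis_labels <- setNames(as.character(dat$label), dat$id)

p <- ggplot(dat2, aes(x = factor(id), y = value, fill = variable)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_x_discrete(labels = axis_labels) +
  facet_grid(. ~ col1, scales = "free_x", space = "free_x") +
  labs(x = "label")
p
```

Because the x positions are the unique ids, repeated labels such as 3, 3 or 9, 9, 9 each get their own tick, and each facet shows only its own bars.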
I have code where I create factors and then want, for each one, a summary, a table of proportions and an unalikeability calculation:
myvars <- names(Diab[c(17:33)])
Diab[myvars] <- lapply(Diab[myvars], ordered, levels = c("No","Down","Steady","Up"), labels = c("No","Down","Steady","Up"))
summary(Diab$metformin)
round(prop.table(summary(Diab$metformin)),3)
unalike(Diab$metformin)
summary(Diab$repaglinide)
round(prop.table(summary(Diab$repaglinide)),3)
unalike(Diab$repaglinide)
.....
where
myvars
[1] "metformin" "repaglinide" "nateglinide"
[4] "chlorpropamide" "glimepiride" "glipizide"
[7] "glyburide" "tolbutamide" "pioglitazone"
[10] "rosiglitazone" "acarbose" "miglitol"
[13] "tolazamide" "glyburide_metformin" "glipizide_metformin"
[16] "glimepiride_pioglitazone" "insulin"
Instead of coding summary(), round(prop.table()) and unalike() for each of myvars, how can I do this in a loop?
I know I can call summary(Diab[myvars]), but that puts the output in columns, and I want to keep the output in rows as follows:
summary(Diab$metformin)
No Down Steady Up
22057 162 5310 275
round(prop.table(summary(Diab$metformin)),3)
No Down Steady Up
0.793 0.006 0.191 0.010
unalike(Diab$metformin)
0.3340651
Thank you in advance for your solutions.
Consider reshaping your wide data to long format and then running table (equivalent to summary.factor here) and prop.table. Doing so avoids any need for looping. I am not familiar with the definition of unalike (possibly from the ragree package), but it appears you can pass it a data frame with named arguments.
Diab_long <- reshape(Diab[myvars], varying = myvars, times = myvars,
v.names = "value", timevar = "metric", ids = NULL,
new.row.names = seq_len(nrow(Diab) * length(myvars)), direction = "long")
tbl <- table(Diab_long)
prop.table(tbl, margin = 1)
ragree::unalike(Diab_long, ...)
To demonstrate with seeded, random data:
Data
set.seed(22620)
lvls <- c("No","Down","Steady","Up")
# DATA FRAME OF ALL FACTORS
Diab <- setNames(data.frame(replicate(17, factor(sample(lvls, 10, replace=TRUE),
levels = c("No","Down","Steady","Up")))),
c("metformin", "repaglinide", "nateglinide",
"chlorpropamide", "glimepiride", "glipizide",
"glyburide", "tolbutamide", "pioglitazone",
"rosiglitazone", "acarbose", "miglitol",
"tolazamide", "glyburide_metformin",
"glipizide_metformin",
"glimepiride_pioglitazone", "insulin"))
# RESHAPE TO LONG
Diab_long <- reshape(Diab, varying = names(Diab), times = names(Diab),
v.names = "value", timevar = "metric", ids = NULL,
new.row.names = 1:1E4, direction = "long")
Output (does not include unalike)
tbl <- table(Diab_long)
tbl
# value
# metric Down No Steady Up
# acarbose 1 2 2 5
# chlorpropamide 4 4 1 1
# glimepiride 6 3 0 1
# glimepiride_pioglitazone 4 0 2 4
# glipizide 4 4 2 0
# glipizide_metformin 2 3 3 2
# glyburide 3 2 3 2
# glyburide_metformin 1 3 6 0
# insulin 1 1 5 3
# metformin 2 2 4 2
# miglitol 1 3 5 1
# nateglinide 6 3 1 0
# pioglitazone 1 4 3 2
# repaglinide 1 4 2 3
# rosiglitazone 1 7 1 1
# tolazamide 2 4 1 3
# tolbutamide 3 3 2 2
ptbl <- prop.table(tbl, margin = 1)
ptbl
# value
# metric Down No Steady Up
# acarbose 0.1 0.2 0.2 0.5
# chlorpropamide 0.4 0.4 0.1 0.1
# glimepiride 0.6 0.3 0.0 0.1
# glimepiride_pioglitazone 0.4 0.0 0.2 0.4
# glipizide 0.4 0.4 0.2 0.0
# glipizide_metformin 0.2 0.3 0.3 0.2
# glyburide 0.3 0.2 0.3 0.2
# glyburide_metformin 0.1 0.3 0.6 0.0
# insulin 0.1 0.1 0.5 0.3
# metformin 0.2 0.2 0.4 0.2
# miglitol 0.1 0.3 0.5 0.1
# nateglinide 0.6 0.3 0.1 0.0
# pioglitazone 0.1 0.4 0.3 0.2
# repaglinide 0.1 0.4 0.2 0.3
# rosiglitazone 0.1 0.7 0.1 0.1
# tolazamide 0.2 0.4 0.1 0.3
# tolbutamide 0.3 0.3 0.2 0.2
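If the per-variable layout from the question is preferred, a plain loop over the columns is also possible. The sketch below is an alternative to the reshape approach, not part of it. Because the origin of unalike is uncertain, it uses a hypothetical stand-in, unalike_sketch, defined as 1 - sum(p^2) over the category proportions. That formula reproduces the 0.3340651 shown for metformin when applied to the counts in the question, but it is an assumption that the packaged function computes the same quantity. The helper describe_factor is likewise invented for illustration.

```r
# Hypothetical stand-in for unalike(): 1 minus the sum of squared proportions
unalike_sketch <- function(x) {
  p <- prop.table(table(x))
  1 - sum(p^2)
}

# Summarise one factor column: counts, rounded proportions, unalikeability
describe_factor <- function(x) {
  counts <- table(x)
  list(
    summary = counts,
    prop    = round(prop.table(counts), 3),
    unalike = unalike_sketch(x)
  )
}

# Rebuild metformin from the counts printed in the question
lvls <- c("No", "Down", "Steady", "Up")
metformin <- factor(rep(lvls, times = c(22057, 162, 5310, 275)), levels = lvls)
demo <- data.frame(metformin = metformin)  # stand-in for Diab[myvars]

# One list of results per column, each printed in rows
results <- lapply(demo, describe_factor)
results$metformin
```

With the real data, lapply(Diab[myvars], describe_factor) would return one such list per drug, named after the column.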
I plotted values (both positive and negative) with geom_bar in ggplot. The x-axis of the resulting plot shows the negative values running from -0.15 to -1, but I need them to run from -1 to 0.
Could you help me fix it?
The data frame looks like:
id value33333
<dbl> <chr>
1 -0.6
2 -0.8
3 -1
4 -0.2
5 -1
6 0.4
7 -1
8 -1
9 -0.6
10 0.1
11 -0.6
12 -1
13 0.1
14 0.15
15 0.5
16 0.4
17 -0.95
18 0.5
19 -0.6
20 0.05
I need to plot value33333 on x-axis and percent on y axis.
Thanks a lot!
ggplot(data = value33333) + geom_bar(mapping = aes(x = value33333, y = ..prop.., group = 1), stat = "count") +
scale_y_continuous(labels = scales::percent_format()) + theme_bw()
Using xlim(-1.1,0) (-1.1 to include the last bar) works without errors.
head(value33333)
interviewer internalID value
1 Nuriya 3 -0.6
2 Nuriya 5 -0.8
3 Nuriya 7 -1.0
4 Nuriya 9 -0.2
5 Nuriya 11 -1.0
6 Nuriya 13 0.4
ggplot(data = value33333) +
geom_bar(aes(x = value, y = ..prop.., group = 1), stat = "count") +
scale_y_continuous(labels = scales::percent_format()) + theme_bw() +
xlim(-1.1,0)
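Two caveats about the approach above, offered as a sketch rather than a correction. First, xlim() drops rows outside the limits before the proportions are computed, so the bars become percentages of the retained values only; coord_cartesian(xlim = ...) zooms without dropping anything. Second, ggplot2 3.3.0 and later offer after_stat(prop) as the replacement for the ..prop.. notation. The data frame df below is typed in from the question. Its header lists the value column as <chr>; if that is accurate, the column needs as.numeric() before it can be used as a continuous axis, and the conversion is included here on that assumption.

```r
library(ggplot2)

# Values typed in from the question, stored as character as its header suggests
df <- data.frame(
  id = 1:20,
  value33333 = c("-0.6", "-0.8", "-1", "-0.2", "-1", "0.4", "-1", "-1",
                 "-0.6", "0.1", "-0.6", "-1", "0.1", "0.15", "0.5", "0.4",
                 "-0.95", "0.5", "-0.6", "0.05")
)
df$value <- as.numeric(df$value33333)  # convert text to numbers

p <- ggplot(df) +
  geom_bar(aes(x = value, y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = scales::percent_format()) +
  coord_cartesian(xlim = c(-1.1, 0)) +  # zoom only; every row still counts
  theme_bw()
p
```

Which behaviour is wanted depends on the question being asked: percentages of all responses (coord_cartesian) or percentages of the negative responses only (xlim).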
I have looked at all the barplot questions on the site but still couldn't figure out what to do with my dataset. I don't know if this is a duplicate, but any help would be much appreciated.
My dataset
Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
I want to create a barplot where HC and Scenario are on the x-axis, NPV1 and NPV2 give the bar heights, and the two are distinguished by different patterns. The region should appear as a common label centred under each group of 4 scenarios.
Thanks a lot.
Expected output is something like this.
Further to my above comments, I'm quite unclear about how you'd like to visualise your data. What exactly would you like to show on the x axis?
As a start, perhaps you are after something like this?
library(tidyverse)
df %>%
gather(key, val, -Region, -Scenario, -HC) %>%
unite(x, Region, Scenario, HC) %>%
ggplot(aes(x, val, fill = key)) +
geom_col()
Here categories on the x-axis are of the form <Region>_<Scenario>_<HC>.
Update
To achieve a plot similar to the one you're showing you can do the following
library(tidyverse)
df %>%
gather(key, val, -Region, -Scenario, -HC) %>%
ggplot(aes(HC, val, fill = key)) +
geom_col(position = "dodge2") +
facet_wrap(~Region, nrow = 1, strip.position = "bottom") +
theme_minimal() +
theme(strip.placement = "outside")
Explanation: strip.position = "bottom" ensures that strip labels are at the bottom, and strip.placement = "outside" ensures that strip labels are below the axis labels (to be precise, between the axis labels and axis title).
Sample data
df <- read.table(text =
"Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = T)
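With newer tidyr versions, the reshaping step in the updated plot can also be written with pivot_longer(), which the tidyr documentation recommends over gather() for new code. The sketch below assumes tidyr 1.0.0 or later and the sample data above. Wrapping HC in factor() and setting the axis title with labs() are added choices so that the bars are treated as discrete categories; neither is part of the original answer.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text =
"Region Scenario HC NPV1 NPV2
C 1 0.1 10 5
C 2 0.2 8 4
C 3 0.3 7 3
C 4 0.4 6 2
N 1 0.1 10 5
N 2 0.2 8 4
N 3 0.3 7 3
N 4 0.4 6 2
W 1 0.1 10 5
W 2 0.2 8 4
W 3 0.3 7 3
W 4 0.4 6 2
", header = TRUE)

# One row per Region/Scenario/HC/measure instead of one column per measure
long <- pivot_longer(df, cols = c(NPV1, NPV2),
                     names_to = "key", values_to = "val")

p <- ggplot(long, aes(factor(HC), val, fill = key)) +
  geom_col(position = "dodge2") +
  facet_wrap(~Region, nrow = 1, strip.position = "bottom") +
  labs(x = "HC") +
  theme_minimal() +
  theme(strip.placement = "outside")
p
```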
I am relatively new to R and am struggling to remove the column names from this graph. Here is a small sample of the 4417-row data set, which contains 3 trials and 8 tests. I have used row.names=FALSE, but that doesn't remove the names from the graph.
Test TestNumber Display Trial TrueValue Subject Response
Vertical Distance, Aligned 1 1 B 0.6 1 0.6
Vertical Distance, Aligned 1 1 B 0.6 2 0.55
Vertical Distance, Aligned 1 1 B 0.6 3 0.7
Vertical Distance, Aligned 1 1 B 0.6 4 0.6
Vertical Distance, Aligned 1 1 B 0.6 5 0.65
Vertical Distance, Aligned 1 1 B 0.6 6 0.6
Vertical Distance, Aligned 1 1 B 0.6 7 0.5
Vertical Distance, Aligned 1 1 B 0.6 8 0.65
Vertical Distance, Aligned 1 1 B 0.6 9 0.5
ggplot(ds, aes(x=factor(Response),
y=TrueValue,
row.names=FALSE,
color=Trial,sd(x))) +
geom_boxplot(notch=FALSE) +
scale_y_continuous("Response") +
scale_x_discrete('Trial') +
theme_bw() +
theme(axis.text.x=element_text(angle = -90, hjust = 0)) +
theme(text=element_text(size=10, family="Arial")) +
ggtitle('Trial Median Comparison \n to Look for Over Estimation')
I divided the value of X into 5 boxes and calculated its joint probabilities.
In the example below, since there are lots of 2s in X, in the end I only have 4 boxes.
Example:
X <-c(1,2,2,2,2,3,4,5,6,7)
Y <-c(0,1,1,1,0,1,0,1,0,1)
qX=quantile(X, 1:4/5) # find quantiles 20%,40%,60%,80%
qY=c(0,1)
dX=findInterval(X,qX,rightmost.closed=TRUE)
dY=findInterval(Y,qY+0.001,rightmost.closed=TRUE)
pXY=xtabs(~dX+dY)/10 # joint distribution
rownames(pXY) <- paste("box",1:dim(pXY)[1],sep="")
> pXY
dY
dX 0 1
box1 0.1 0.0
box2 0.1 0.4
box3 0.1 0.1
box4 0.1 0.1
Now I want to add one more column for the range of X in each box.
The desired table will be:
box1 [1,1] 0.1 0.0
box2 [2,3] 0.1 0.4
box3 [4,5] 0.1 0.1
box4 [6,7] 0.1 0.1
The output of xtabs or table is somewhat messy to add to. I would convert to matrix:
pXY2 <- pXY; class(pXY2) <- "matrix"
data.frame(r=t(sapply(split(X,dX),range)),pXY2)
# r.1 r.2 X0 X1
# 0 1 1 0.1 0.0
# 2 2 3 0.1 0.4
# 3 4 5 0.1 0.1
# 4 6 7 0.1 0.1
Given the cutpoints used to make dX, the values of the boxes really are 0,2,3,4, not 1,2,3,4.
If you want to print the range with special formatting, consider writing a custom function:
brackem <- function(x) paste0("[",x[1],",",x[2],"]")
data.frame(r=tapply(X,dX,function(z)brackem(range(z))),pXY2)
# r X0 X1
# 0 [1,1] 0.1 0.0
# 2 [2,3] 0.1 0.4
# 3 [4,5] 0.1 0.1
# 4 [6,7] 0.1 0.1
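The remark above that the boxes are really 0, 2, 3, 4 can be checked directly: the 20% and 40% quantiles of X coincide, so no value can fall into interval 1. The sketch below repeats the question's binning step and builds the bracketed range per box; the name box_ranges is made up for illustration.

```r
X <- c(1, 2, 2, 2, 2, 3, 4, 5, 6, 7)

# Cutpoints at the 20%, 40%, 60% and 80% quantiles, as in the question
qX <- quantile(X, 1:4 / 5)
qX

# Interval index of each X value; a duplicated cutpoint leaves index 1 unused
dX <- findInterval(X, qX, rightmost.closed = TRUE)
sort(unique(dX))

# Observed range of X within each box that actually occurs
box_ranges <- tapply(X, dX, function(z) paste0("[", min(z), ",", max(z), "]"))
box_ranges
```

If five non-empty boxes are required, the cutpoints would have to be chosen differently, since the repeated 2s make two of the quantile cutpoints identical.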