Barplots for all levels (even those with no values) in R

I'm graphing values associated with a range of factors I got from cut:
a b
1 (25,30] 10
2 (30,35] 313
3 (35,40] 904
4 (40,45] 809
5 (45,50] 608
6 (50,55] 514
7 (55,60] 227
8 (60,65] 323
9 (65,70] 23
10 (70,75] 5
11 (75,80] 1
I graph it with:
plt_tmp = barplot(agg$b)
axis(1, agg$a, at=plt_tmp,las=2)
This would be fine, but the full set of levels (i.e. levels(agg$a)) is actually:
[1] "(0,5]" "(5,10]" "(10,15]" "(15,20]" "(20,25]" "(25,30]" "(30,35]" "(35,40]" "(40,45]"
[10] "(45,50]" "(50,55]" "(55,60]" "(60,65]" "(65,70]" "(70,75]" "(75,80]" "(80,85]" "(85,90]"
[19] "(90,95]" "(95,100]"
And I was hoping I could graph all of them, including the missing ones, as 0 values. How would I go about doing this? Any help would be very much appreciated!

You need to merge all your levels back against your base data, and then pass that data to barplot. With a simplified example:
agg <- data.frame(a=factor(c(1,2,4), levels=1:5), b=c(10,1,20))
with(merge(agg, list(a=levels(agg$a)), all=TRUE), barplot(b, names.arg=a) )
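If you'd rather skip the merge, a minimal base R sketch is to look each level up with match and fill the gaps with 0. It reuses the toy agg from above; heights and mids are just illustrative names, not anything from the question.

```r
# Toy data: a factor with 5 levels, only 3 of which have a value
agg <- data.frame(a = factor(c(1, 2, 4), levels = 1:5), b = c(10, 1, 20))

# Look up every level among the observed rows; unmatched levels give NA
heights <- agg$b[match(levels(agg$a), as.character(agg$a))]
heights[is.na(heights)] <- 0        # draw missing levels as zero-height bars
names(heights) <- levels(agg$a)     # barplot uses the names as bar labels

mids <- barplot(heights, las = 2)   # one bar per level, in level order
```

Because heights is built in the order of levels(agg$a), the bars come out in level order no matter how the rows of agg are sorted.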

Use the ggplot2 package for your bar chart. The drop argument of the discrete scale controls whether unused levels are plotted:
library(ggplot2)
ggplot(agg, aes(x=a, y=b)) +
geom_bar(stat="identity") +
scale_x_discrete(drop=FALSE)

Related

how to add regression lines for each factor on a plot

I've created a model and I'm trying to add curves that fit the two parts of the data, insulation and no insulation. I was thinking about using the insulation coefficient as a true/false term, but I'm not sure how to translate that into code. Entries 1:56 are "w/o" and 57:101 are "w/". I'm not sure how to include the data I'm using but here's the head and tail:
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
1 8 2003 476 21 a 33.32 69 -8 22.66667 1 w/o
2 9 2003 1052 30 e 112.33 73 -1 35.05172 2 w/o
3 10 2003 981 28 a 24.98 60 -6 35.05172 3 w/o
4 11 2003 1094 32 a 73.51 53 2 34.18750 4 w/o
5 12 2003 1409 32 a 93.23 44 6 44.03125 5 w/o
6 1 2004 1083 32 a 72.84 34 3 33.84375 6 w/o
month year kwh days est cost avgT dT.yr kWhd.1 id insulation
96 7 2011 551 29 e 55.56 72 0 19.00000 96 w/
97 8 2011 552 27 a 61.17 78 1 20.44444 97 w/
98 9 2011 666 34 e 73.87 71 -2 19.58824 98 w/
99 10 2011 416 27 a 48.03 64 0 15.40741 99 w/
100 11 2011 653 31 e 72.80 53 1 21.06452 100 w/
101 12 2011 751 33 a 83.94 45 2 22.75758 101 w/
bill$id <- seq(1:101)
bill$insulation <- as.factor(ifelse(bill$id > 56, c("w/"), c("w/o")))
m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data=bill)
with(bill, plot(kWhd.1 ~ avgT, xlab="Average Temperature (F)",
ylab="Daily Energy Use (kWh/d)", col=insulation))
no_ins <- data.frame(bill$avgT[1:56], bill$insulation[1:56])
curve(predict(m1, no_ins=x), add=TRUE, col="red")
ins <- data.frame(bill$avgT[57:101], bill$insulation[57:101])
curve(predict(m1, ins=x), add=TRUE, lty=2)
legend("topright", inset=0.01, pch=21, col=c("red", "black"),
legend=c("No Insulation", "Insulation"))
ggplot2 makes this a lot easier than base plotting. Something like this should work:
ggplot(bill, aes(x = avgT, y = kWhd.1, color = insulation)) +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
geom_point()
In base, I'd create a data frame with the points you want to predict on. It has to contain the model's predictors (avgT and insulation), not the response, something like
pred_data = expand.grid(
avgT = seq(min(bill$avgT), max(bill$avgT), length.out = 100),
insulation = c("w/", "w/o")
)
pred_data$prediction = predict(m1, newdata = pred_data)
And then use lines to add the predictions to your plot. My base graphics is pretty rusty, so I'll leave that to you (or another answerer) if you want it.
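For what it's worth, here is a rough sketch of that last step with lines. The real bill data isn't shown in full, so the data frame below is a synthetic stand-in with made-up numbers (same column names and the same 56/45 split); only the model formula comes from the question. The prediction grid uses the model's predictors, avgT and insulation.

```r
# Synthetic stand-in for `bill` (assumption: invented values, not the asker's data)
set.seed(1)
bill <- data.frame(avgT = runif(101, 30, 80))
bill$insulation <- factor(ifelse(seq_len(101) > 56, "w/", "w/o"))
bill$kWhd.1 <- 60 - 1.2 * bill$avgT + 0.01 * bill$avgT^2 -
  8 * (bill$insulation == "w/") + rnorm(101, sd = 2)

m1 <- lm(kWhd.1 ~ avgT + insulation + I(avgT^2), data = bill)

# Evenly spaced temperatures for each insulation level
pred_data <- expand.grid(
  avgT = seq(min(bill$avgT), max(bill$avgT), length.out = 100),
  insulation = levels(bill$insulation)
)
pred_data$prediction <- predict(m1, newdata = pred_data)

# Scatter plot, then one fitted curve per insulation level
plot(bill$avgT, bill$kWhd.1, col = as.integer(bill$insulation),
     xlab = "Average Temperature (F)", ylab = "Daily Energy Use (kWh/d)")
for (i in seq_along(levels(bill$insulation))) {
  lev <- levels(bill$insulation)[i]
  with(pred_data[pred_data$insulation == lev, ],
       lines(avgT, prediction, col = i))
}
legend("topright", legend = levels(bill$insulation), col = 1:2, lty = 1)
```

The grid is already sorted by avgT within each level, so lines draws smooth curves without any extra ordering step.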
In base R it's important to order the x-values. Since this is to be done on multiple factors, we can do this with by, resulting in a list L.
Since your example data is not complete, here's an example with iris where we consider Species as the "factor".
L <- by(iris, iris$Species, function(x) x[order(x$Petal.Length), ])
Now we can do the plot and add loess predictions as lines with a sapply.
with(iris, plot(Sepal.Width ~ Petal.Length, col=Species))
sapply(seq(L), function(x)
lines(L[[x]]$Petal.Length,
predict(loess(Sepal.Width ~ Petal.Length, L[[x]], span=1.1)), # span=1.1 for smoothing
col=x))

R ggplot ordering bars within groups

I'm attempting to format a grouped bar plot in R with ggplot such that bars are in decreasing order per group. This is my current plot:
based on this data frame:
> top_categories
Category Count Community
1 Singer-Songwriters 151 1
2 Adult Alternative 147 1
3 Dance Pop 95 1
4 Folk 89 1
5 Adult Contemporary 88 1
6 Pop Rap 473 2
7 Gangsta & Hardcore 413 2
8 Soul 175 2
9 East Coast 170 2
10 West Coast 135 2
11 Album-Oriented Rock (AOR) 253 3
12 Singer-Songwriters 217 3
13 Soft Rock 196 3
14 Folk 145 3
15 Adult Contemporary 106 3
16 Soul 278 4
17 Blues 137 4
18 Funk 119 4
19 Quiet Storm 76 4
20 Dance Pop 74 4
21 Indie & Lo-Fi 235 5
22 Indie Rock 234 5
23 Adult Alternative 114 5
24 Alternative Rock 49 5
25 Singer-Songwriters 47 5
created with this code:
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Category,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill=FALSE)
Based on suggestions from related posts, I've attempted to use the reorder function and turn the Count into a factor, both with results that seem to break the ordering of the bars vs. the text or rescale the plot in a nonsensical way such as this (with factors):
Any tips on how I might accomplish this in-group ordering? Thanks!
When you group by Category, the bars are ordered according to the order of appearance of Categories in the dataframe. This works fine for Community 1 and 2 as your rows are already ordered by decreasing Count. But in Community 3, as Category "Singer-Songwriters" is the first occurring Category in the dataframe, it is put first.
Grouping instead by an Id variable resolves the problem:
top_categories$Id=rep(c(1:5),5)
ggplot(
top_categories,
aes(
x=Community,
y=Count,
group=Id,
label=Category
)
) +
geom_bar(
stat="identity",
color="black",
fill="#9C27B0",
position="dodge"
) +
geom_text(
angle=90,
position=position_dodge(width=0.9),
hjust=-0.05
) +
ggtitle("Number of Products in each Category in Each Community") +
guides(fill=FALSE)
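Note that rep(c(1:5),5) relies on the rows already being sorted by decreasing Count within each Community, with exactly five rows per group. If that isn't guaranteed, one possible sketch is to compute the within-group rank with ave instead. The rows below are a shuffled subset of the question's data, used only for illustration.

```r
# A shuffled subset of top_categories (Communities 1 and 2 only)
top_categories <- data.frame(
  Category  = c("Folk", "Singer-Songwriters", "Dance Pop",
                "Soul", "Pop Rap", "East Coast"),
  Count     = c(89, 151, 95, 175, 473, 170),
  Community = c(1, 1, 1, 2, 2, 2)
)

# Rank within each Community: 1 = largest Count, ties broken by row order
top_categories$Id <- ave(
  -top_categories$Count, top_categories$Community,
  FUN = function(x) rank(x, ties.method = "first")
)

top_categories
```

Id can then be used as the group aesthetic exactly as in the plot above, regardless of row order or group size.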

what is the best way to visualize a table on graph in r

I have the following table(data frame):
week24 week25 week26
under 0.5m 1824 1878 1955
0.5 to 1m 170 205 211
1to3 117 109 124
3to6 19 19 25
6to10 9 8 8
10to15 4 3 5
15to30 9 13 9
above 30m 19 32 28
I am looking for the best way to visualize it on a graph, with the row names (under 0.5m through above 30m) on the x axis.
I have already tried barplot(), but the results are not that good.
How can I make it more informative?
It's not clear to me what exactly you are trying to obtain. Maybe just adding a legend will make your graph much more descriptive. Here's a simple data frame to show what I mean:
df <- data.frame(Z=c(1,2,3),Y=c(2,3,1))
row.names(df) <- c("Cat1","Cat2","Cat3")
barplot(as.matrix(df),
legend.text = row.names(df),
args.legend = list(x = "right"),
col = c("blue","green","red"))
If you want a better choice of colours, check this chart: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
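Assuming the goal is to have the size classes on the x axis with one bar per week, a sketch with the question's own numbers could look like this. The name tab is hypothetical, and the log y axis is my own choice (the first row dwarfs the others); drop log = "y" if you want raw heights.

```r
# The table from the question, entered as a matrix (tab is a made-up name)
tab <- matrix(
  c(1824, 1878, 1955,
     170,  205,  211,
     117,  109,  124,
      19,   19,   25,
       9,    8,    8,
       4,    3,    5,
       9,   13,    9,
      19,   32,   28),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("under 0.5m", "0.5 to 1m", "1to3", "3to6",
      "6to10", "10to15", "15to30", "above 30m"),
    c("week24", "week25", "week26")
  )
)

# t() puts the weeks in rows, so each size class becomes one group of bars
mids <- barplot(t(tab), beside = TRUE, log = "y", las = 2,
                legend.text = TRUE, args.legend = list(x = "topright"),
                ylab = "Value (log scale)")
```

With beside = TRUE the three weeks sit next to each other within each size class, which makes week-to-week changes easier to compare than stacked bars.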

ggplot each group consists of only one observation

I'm trying to make a plot similar to this answer: https://stackoverflow.com/a/4877936/651779
My data frame looks like this:
df2 <- read.table(text='measurements samples value
1 4hours sham1 6
2 1day sham1 175
3 3days sham1 417
4 7days sham1 163
5 14days sham1 37
6 90days sham1 134
7 4hours sham2 8
8 1day sham2 402
9 3days sham2 482
10 7days sham2 67
11 14days sham2 16
12 90days sham2 31
13 4hours sham3 185
14 1day sham3 402
15 3days sham3 482
16 7days sham3 85
17 14days sham3 29
18 90days sham3 10',header=T)
And plot it with
ggplot(df2, aes(measurements, value)) + geom_line(aes(colour = samples))
No lines show in the plot, and I get the message
geom_path: Each group consist of only one observation.
Do you need to adjust the group aesthetic?
I don't see where what I'm doing is different from the answer I linked above. What should I change to make this work?
Add group = samples to the aes of geom_line. This is necessary because you want one line per sample rather than one per data point.
ggplot(df2, aes(measurements, value)) +
geom_line(aes(colour = samples, group = samples))
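One more thing worth checking, on the assumption that you want the x axis in time order: measurements is text, so a discrete axis is typically sorted alphabetically (14days, 1day, 3days, ...) and the lines would connect the time points in that order. Converting the column to a factor with explicit levels keeps them chronological. A sketch on a few rows of df2; time_order is a name I made up.

```r
# A few rows of the question's df2, enough to show the ordering issue
df2 <- data.frame(
  measurements = c("4hours", "1day", "3days", "7days", "14days", "90days"),
  samples      = "sham1",
  value        = c(6, 175, 417, 163, 37, 134)
)

# Chronological order of the time points, spelled out by hand
time_order <- c("4hours", "1day", "3days", "7days", "14days", "90days")
df2$measurements <- factor(df2$measurements, levels = time_order)

levels(df2$measurements)
```

The ggplot call stays the same afterwards; the discrete x axis follows the factor levels.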

Split data based on column values and create scatter plot.

I need to make a scatter plot for days vs age for the f group (sex=1) and make another scatter plot for days vs age for the m group (sex=2) using R.
days age sex
306 74 1
455 67 2
1000 55 1
505 65 1
399 54 2
495 66 2
...
How do I extract the data by sex? I know after that to use plot() function to create a scatter plot.
Thank you!
You can do this with the traditional R graphics functions like:
plot(age ~ days, Data[Data$sex == 1, ])
plot(age ~ days, Data[Data$sex == 2, ])
If you prefer to color the points rather than separate the plots (which might be easier to understand) you can do:
plot(age ~ days, Data, col=Data$sex)
However, this kind of plot would be especially easy (and better looking) using ggplot2:
library(ggplot2)
ggplot(Data, aes(x=days, y=age)) + geom_point() + facet_wrap(~sex)
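If you want the two subsets as separate objects, split does the extraction in one step. A small sketch using the six rows shown in the question (the real Data has more rows; by_sex is an illustrative name):

```r
# The six rows shown in the question
Data <- data.frame(
  days = c(306, 455, 1000, 505, 399, 495),
  age  = c(74, 67, 55, 65, 54, 66),
  sex  = c(1, 2, 1, 1, 2, 2)
)

# One data frame per value of sex, in a named list
by_sex <- split(Data, Data$sex)

# Draw the two scatter plots side by side
op <- par(mfrow = c(1, 2))
for (s in names(by_sex)) {
  plot(age ~ days, data = by_sex[[s]], main = paste("sex =", s))
}
par(op)  # restore the previous plotting layout
```

by_sex[["1"]] and by_sex[["2"]] are ordinary data frames, so they can be passed to any other function as well.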
spread (from the tidyr package) splits data by column values. This is also called converting data from "long" to "wide".
I haven't tested this, but something like
spread(data, sex, age)
should get you
days 1 2
306 74 NA
455 NA 67
1000 55 NA
505 65 NA
399 NA 54
495 NA 66
