I have the following data frame:
Value Phase
22 1
23 1
40 1
19 2
17 2
16 2
12 3
13 3
14 3
9 4
7 4
6 4
I want to see how the sum of Value changes from one phase to the next. The Phase column can range from 1 to 5. Going from phase 1 to phase 2 to phase 3 and so on, I want to see whether the sum of Value for each phase decreases or increases. I want to use the base plotting system. How can I plot the graph so that the change from phase to phase is clear?
Here is how to do a line + scatter plot of the sums of Value for each value in Phase. First you need to aggregate the data by Phase. I'm providing both a base R solution (as you requested) and a ggplot solution.
df <- read.table(text = "Value Phase
22 1
23 1
40 1
19 2
17 2
16 2
12 3
13 3
14 3
9 4
7 4
6 4", header = TRUE)
sums <- aggregate(Value ~ Phase, df, sum, na.rm = TRUE)
png("sums.png", height = 540, width = 540)
plot(sums$Phase, sums$Value, xlab = "Phase", ylab = "Sum of Value")
lines(sums$Phase, sums$Value, type = "l")
dev.off()
# ggplot method
library(ggplot2)
ggplot(sums, aes(x = Phase, y = Value)) + geom_point() + geom_line()
ggsave("sums-ggplot.png")
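To make the direction of change between consecutive phases explicit, one option is to compute the differences of the sums with diff() and colour the bars accordingly. This is only a minimal base R sketch; the column names change and direction and the colour choices are made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  Value = c(22, 23, 40, 19, 17, 16, 12, 13, 14, 9, 7, 6),
  Phase = rep(1:4, each = 3)
)

# Sum of Value per Phase
sums <- aggregate(Value ~ Phase, data = df, FUN = sum)

# Change relative to the previous phase (NA for the first phase)
sums$change <- c(NA, diff(sums$Value))
sums$direction <- ifelse(is.na(sums$change), "start",
                         ifelse(sums$change < 0, "decrease", "increase"))
print(sums)

# Bars coloured by the direction of change (illustrative colours)
cols <- c(start = "grey70", decrease = "firebrick", increase = "forestgreen")
barplot(sums$Value, names.arg = sums$Phase, col = cols[sums$direction],
        xlab = "Phase", ylab = "Sum of Value")
```

With the example data every phase after the first is labelled as a decrease, so the bars after phase 1 share one colour.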
I have an x by y by z table (see image below). I want to create 15 bar plots in one output, arranged 5 high and 3 across. For each bar plot, I want the x axis to show the labels "Caught", "Deterred" and "Successful", and the y axis to show their frequencies. The 5 rows of bar plots should correspond to the values of "Threshold", i.e. 20, 40, 60, 80 or 90. The 3 columns of bar plots should correspond to "Model", i.e. Model_1, Model_2 or Model_3. Thus, for example, the bar plot in row 1, column 1 of the 5x3 grid should show the "Caught", "Deterred" and "Successful" frequencies for Model_1 with a Threshold of 20.
Apologies for my poor explanation. Please let me know if anything needs clarifying. Thank you!
Here's a ggplot2 option that makes your "5 tall, 3 wide" simple with faceting:
First, fake data:
set.seed(42)
n <- 500
Models <- table(
Threshold = sample(c(20, 40, 60, 80, 90), size = n, replace = TRUE),
Outcome = sample(c("Caught", "Deferred", "Successful"), size = n, replace = TRUE),
Model = sample(c("Model_1", "Model_2", "Model_3"), size = n, replace = TRUE)
)
Models
# , , Model = Model_1
# Outcome
# Threshold Caught Deferred Successful
# 20 14 15 14
# 40 7 10 15
# 60 16 13 12
# 80 7 11 4
# 90 16 10 10
# , , Model = Model_2
# Outcome
# Threshold Caught Deferred Successful
# 20 14 11 15
# 40 5 12 10
# 60 11 8 13
# 80 7 15 6
# 90 12 13 11
# , , Model = Model_3
# Outcome
# Threshold Caught Deferred Successful
# 20 14 7 10
# 40 14 13 9
# 60 6 12 13
# 80 20 4 12
# 90 10 8 11
One good thing about something created with table is that as.data.frame gives us something easy to work with for ggplot2's "long data" preference:
head(as.data.frame(Models))
# Threshold Outcome Model Freq
# 1 20 Caught Model_1 14
# 2 40 Caught Model_1 7
# 3 60 Caught Model_1 16
# 4 80 Caught Model_1 7
# 5 90 Caught Model_1 16
# 6 20 Deferred Model_1 15
The plot:
library(ggplot2)
ggplot(as.data.frame(Models), aes(Outcome, Freq)) +
geom_bar(stat = "identity") +
facet_grid(Threshold ~ Model)
Note: if your data is naturally an array and not a table, we can easily accommodate that: as.table(ary) converts from an array to a table (which is really just an array underneath):
set.seed(42)
ary <- array(sample(20, size=2*3*3, replace=TRUE), dim = c(2,3,3))
dimnames(ary) <- list(Threshold=c(20,40), Outcome=c("C","D","S"), Model=1:3)
ary
# , , Model = 1
# Outcome
# Threshold C D S
# 20 19 6 13
# 40 19 17 11
# , , Model = 2
# Outcome
# Threshold C D S
# 20 15 14 10
# 40 3 15 15
# , , Model = 3
# Outcome
# Threshold C D S
# 20 19 10 20
# 40 6 19 3
as.table(ary)
### same output as `ary`
head(as.data.frame(as.table(ary)))
# Threshold Outcome Model Freq
# 1 20 C 1 19
# 2 40 C 1 19
# 3 20 D 1 6
# 4 40 D 1 17
# 5 20 S 1 13
# 6 40 S 1 11
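For comparison, a similar 5 x 3 grid can be approximated in base graphics by looping over the array and letting par(mfrow = ...) lay out the panels. This is a rough sketch rather than a replacement for faceting; the counts are randomly generated and the object name counts is hypothetical.

```r
# Hypothetical counts: 5 thresholds x 3 outcomes x 3 models
set.seed(1)
counts <- array(
  sample(20, 5 * 3 * 3, replace = TRUE),
  dim = c(5, 3, 3),
  dimnames = list(
    Threshold = c("20", "40", "60", "80", "90"),
    Outcome   = c("Caught", "Deterred", "Successful"),
    Model     = c("Model_1", "Model_2", "Model_3")
  )
)

# 5 rows and 3 columns of panels, filled row by row
op <- par(mfrow = c(5, 3), mar = c(2, 2, 2, 1))
for (th in dimnames(counts)$Threshold) {
  for (m in dimnames(counts)$Model) {
    # One bar per outcome; a shared ylim keeps the panels comparable
    barplot(counts[th, , m], main = paste(m, "/ threshold", th),
            ylim = c(0, max(counts)))
  }
}
par(op)  # restore the previous graphics settings
```

Looping over thresholds in the outer loop puts one threshold per row and one model per column, matching the layout that facet_grid(Threshold ~ Model) produces.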
I would like to plot the regression line from a glm model (written below). Ideally I'd like to plot it over the observed data, but I haven't been able to adapt the code I've found elsewhere (e.g. predict.glm, Plot predicted probabilities and confidence intervals in r).
Here is a subset of the data :
Pos Tot Age
<int> <int> <int>
1 1 11 1
2 0 1 1
3 3 3 1
4 1 2 1
5 5 7 1
47 13 16 4
48 9 9 4
49 9 10 4
50 14 14 4
158 1 3 2
159 3 5 2
160 0 7 2
161 9 12 2
162 0 2 2
209 0 1 3
210 1 2 3
211 1 1 3
212 2 2 3
Each row represents a unique location. I removed location column to de-identify.
Here is my model:
model1 <- glm(cbind(Tot - Pos, Pos) ~ -1+Age,
family = binomial(link = "log"), data = data.frame)
My goal is to plot the predicted probabilities of different glm models for visual comparison...but for now I can't even figure out how to plot my simplest model.
Edit
Because the response is a two-column matrix, I don't think there is a way to graph in ggplot. Can someone confirm?
I had tried to plot in ggplot, but due to the model response being a two-column matrix, the aesthetics of the plot and of the model did not match:
ggplot(data.frame, aes(x = Age, y = Pos/Tot)) +
geom_jitter(width = 0.05, height = 0.05) +
geom_smooth(method = glm, formula = cbind(Tot-Pos, Pos) ~ -1+Age, se = FALSE)
which returns a scatter plot of the observed values but also gives me the error message:
Warning message:
Computation failed in `stat_smooth()`:
object 'Tot' not found
So I'm now trying to figure out how to plot using the predict function, which I've never done before.
This is what I have so far, adapting from here:
newdata<-data.frame(Age = 1:4)
plot(1:4, predict(model1, newdata, type="link"))
How do I add 95% confidence intervals and transform the data back to a probability scale of 0-1 on the y-axis?
Thanks very much
Here's how to generate the predictions:
pd = data.frame(Age = 1:4)
# use type = "response" for probability-scale predictions
preds = predict(model1, newdata = pd, type = "response", se.fit = TRUE)
pd$fit = preds$fit
pd$se = preds$se.fit
And then plot (dd is the data frame defined further down, and ggplot2 must be loaded):
ggplot(dd, aes(x = Age, y = Pos / Tot)) +
geom_point(position = position_jitter(width = 0.05, height = 0.05)) +
geom_ribbon(data = pd, aes(y = fit, ymin = fit - 1.96 * se, ymax = fit + 1.96 * se),
fill = "blue", alpha = 0.3) +
geom_line(data = pd, aes(y = fit))
From the plot, we can see that the model and plot are somewhat contradictory. This is because your model is specified as predicting the probability (Tot - Pos) / Tot (the first column of the cbind() response counts as "successes"), but your plot is showing the complement, Pos / Tot. I'd recommend changing one to match the other.
Using this data:
dd = read.table(header = TRUE, text = "Pos Tot Age
1 1 11 1
2 0 1 1
3 3 3 1
4 1 2 1
5 5 7 1
47 13 16 4
48 9 9 4
49 9 10 4
50 14 14 4
158 1 3 2
159 3 5 2
160 0 7 2
161 9 12 2
162 0 2 2
209 0 1 3
210 1 2 3
211 1 1 3
212 2 2 3")
And the model from your question:
model1 <- glm(cbind(Tot - Pos, Pos) ~ -1+Age,
family = binomial(link = "log"), data = dd)
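An alternative that may behave better near the boundaries is to build the interval on the link scale and back-transform it with the inverse link, instead of adding and subtracting standard errors on the response scale. The sketch below assumes a Wald-type interval (fit plus or minus 1.96 standard errors) is acceptable; the column names lower and upper are made up for illustration.

```r
# Data and model as given above
dd <- read.table(header = TRUE, text = "Pos Tot Age
1 1 11 1
2 0 1 1
3 3 3 1
4 1 2 1
5 5 7 1
47 13 16 4
48 9 9 4
49 9 10 4
50 14 14 4
158 1 3 2
159 3 5 2
160 0 7 2
161 9 12 2
162 0 2 2
209 0 1 3
210 1 2 3
211 1 1 3
212 2 2 3")
model1 <- glm(cbind(Tot - Pos, Pos) ~ -1 + Age,
              family = binomial(link = "log"), data = dd)

pd <- data.frame(Age = 1:4)
# Predict on the link (log) scale, together with standard errors
link_pred <- predict(model1, newdata = pd, type = "link", se.fit = TRUE)

# Back-transform with the model's inverse link (exp() for a log link)
inv <- family(model1)$linkinv
pd$fit   <- inv(link_pred$fit)
pd$lower <- inv(link_pred$fit - 1.96 * link_pred$se.fit)
pd$upper <- inv(link_pred$fit + 1.96 * link_pred$se.fit)
print(pd)
```

Because the bounds are back-transformed, the lower limit cannot drop below 0. With a log link the upper limit is not guaranteed to stay below 1, so it is worth checking the printed values before plotting them with geom_ribbon().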
I have the following dataframe t :
name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35
The total is identical for all the entries with the same name.
I want to plot a stacked barplot with type on the x axis and count of name normalized by the total on the y axis.
I plotted the non normalized plot by the following :
ggplot(t, aes(type,fill= name))+geom_bar() + geom_bar(position="fill")
How can I plot the normalized barplot? That is, for type = 1 the y-axis value would be 2/20 for a, 1/25 for b, and 1/35 for c.
My try which did not work:
ggplot(t, aes(type, ..count../t$total[1],fill= name))+geom_bar() + geom_bar(position="fill")
Read in the data
d <- read.table(header = TRUE, text =
'name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35')
It's a bad idea to call it t, since that is the name of the transpose function.
Calculate the fractions
library(dplyr)
d2 <- d %>%
group_by(name, type) %>%
summarize(frac = n() / first(total))
This is much easier to do using the dplyr package.
Make the plot
ggplot(d2, aes(type, frac, fill = name)) +
geom_bar(stat = 'identity')
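If you would rather avoid the dplyr dependency, the same fractions can probably be computed with aggregate() from base R. This is a sketch under the assumption stated in the question, namely that total is constant within each name; the anonymous function counts the rows in each (name, type) group and divides by the first total it sees.

```r
d <- read.table(header = TRUE, text = "name type total
a 1 20
a 1 20
a 3 20
a 2 20
a 3 20
b 1 25
b 2 25
c 5 35
c 5 35
c 6 35
c 1 35")

# Rows per (name, type) group divided by that name's total
d2 <- aggregate(total ~ name + type, data = d,
                FUN = function(x) length(x) / x[1])
names(d2)[3] <- "frac"  # the aggregated column is a fraction, not a total
print(d2)
```

The resulting d2 has the same name, type and frac columns as the dplyr version, so the ggplot call above should work on it unchanged.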
I have a data frame that looks like the one below. I have three variables per observation, and I would like to create a bar graph per observation showing these three variables. However, ggplot2 doesn't appear to have a way to specify multiple columns from the same data frame. What is the correct way to graph this data?
Aiming for something similar to the image below from Wikimedia (with a graph for each observation). Source: https://commons.wikimedia.org/wiki/File:Article_count_(en-de-fr).png
x English German French
Sample 1 5 10 14
Sample 2 4 4 14
Sample 3 5 10 53
I don't know why there are 2 rows per x-value; this makes no sense. What do you want to plot? The sum per A, B, C? The mean?
Assuming you want to take the mean, just do:
dat <- read.table(textConnection(
"x A B C
1 5 10 14
1 4 4 14
2 5 10 14
2 4 4 14
3 5 10 14
3 4 4 14
"), header=TRUE)
dat <- aggregate(. ~ x, data=dat, mean) # instead of mean you can take your function
library(reshape2)
dat_molten <- melt(dat,"x")
library(ggplot2)
ggplot(dat_molten, aes(x=variable, y=value)) +
geom_bar(stat="identity") +
facet_grid(.~x)
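The key step here is the wide-to-long conversion. As a sketch of the same idea without reshape2, base R's stack() can do it for the data shown in the question (one row per sample, so no averaging is needed). The object names wide and long are hypothetical.

```r
# Data as shown in the question, one row per sample
wide <- data.frame(
  x       = c("Sample 1", "Sample 2", "Sample 3"),
  English = c(5, 4, 5),
  German  = c(10, 4, 10),
  French  = c(14, 14, 53)
)

# stack() puts the three language columns on top of each other:
# 'values' holds the numbers, 'ind' holds the source column name
stacked <- stack(wide[, c("English", "German", "French")])
long <- data.frame(
  x        = rep(wide$x, times = 3),  # sample labels repeat once per language
  variable = stacked$ind,
  value    = stacked$values
)
print(long)
```

The long data frame has the same x, variable and value layout that melt() produces, so it can be passed to the ggplot call above in place of dat_molten.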
It's my first day learning R and ggplot. I've followed some tutorials and would like to make plots like the one generated by the following command:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
It looks like the figure on this page:
http://www.r-bloggers.com/quick-introduction-to-ggplot2/
I had a handmade test data file I created, which looks like this:
site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7
but when I try to read and plot it with:
test <- read.table('test.data')
qplot(temp, humidity, data = test, color=site, geom = c("point", "line"))
the lines on the plot aren't separate series, but link together:
http://imgur.com/weRaX
What am I doing wrong?
Thanks.
You need to tell ggplot2 how to group the data into separate lines. It's not a mind reader! ;)
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7",sep = "",header = TRUE)
qplot(temp, humidity, data = dat, group = site,color=site, geom = c("point", "line"))
Note that you probably also wanted to do color = factor(site) in order to force a discrete color scale, rather than a continuous one.
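The grouping idea is not specific to ggplot2. As a rough base-graphics sketch of the same principle, you can split the data by site and draw one line per piece; converting site to a factor first makes it an explicitly discrete variable. The name by_site is made up for illustration.

```r
dat <- data.frame(
  site     = rep(1:2, each = 4),
  temp     = c(1, 2, 12, 14, 1, 3, 4, 8),
  humidity = c(3, 4.5, 8, 10, 5, 9, 6, 7)
)
dat$site <- factor(dat$site)  # treat site as discrete, not continuous

# One data frame per site, so each site gets its own line
by_site <- split(dat, dat$site)

# Points for all observations, coloured by site
plot(dat$temp, dat$humidity, col = as.integer(dat$site), pch = 19,
     xlab = "temp", ylab = "humidity")
# A separate lines() call per site keeps the series from joining up
for (i in seq_along(by_site)) {
  lines(by_site[[i]]$temp, by_site[[i]]$humidity, col = i)
}
legend("bottomright", legend = names(by_site), col = seq_along(by_site),
       lty = 1, pch = 19, title = "site")
```

Drawing all rows with a single lines() call would connect the last point of site 1 to the first point of site 2, which is the base-graphics analogue of the ungrouped plot in the question.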