Remove a specific x value from ggplot [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to remove the points 4 and 5 from the x-axis of the plot I generated using ggplot. Currently my x-values only include 0, 1, 2, 3, and 6.
Here is the my.data data frame:
x y
1 2 0.1250000
2 0 0.3750000
3 0 0.3500000
4 0 0.6060606
5 1 0.7000000
6 0 0.6000000
7 0 0.4500000
8 6 0.9500000
9 0 0.7000000
10 3 0.5000000
11 0 0.6000000
12 3 0.1250000
13 0 0.3750000
14 0 0.3333333
15 1 0.6818182
16 0 0.0000000
17 2 0.5000000
Code:
ggplot(my.data, aes(x,y)) + geom_point()+geom_smooth()
Here is the plot that is generated:
Thanks!

For example (using mtcars), this zooms in on the axis range, i.e. the stats are not influenced by any reduction of the data:
ggplot(mtcars, aes(mpg,qsec)) + geom_point()+geom_smooth() + coord_cartesian(xlim = c(10, 25))
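To make the difference concrete, here is a minimal sketch using the question's my.data and mtcars (the break values are taken from the question's stated x values). The first call only changes which tick labels are drawn; the second shows that limits set on the scale, unlike coord_cartesian(), drop out-of-range points before the smooth is fitted:
# If the goal is only to drop the tick labels at 4 and 5 (the data have
# no points there), setting the axis breaks explicitly is enough:
ggplot(my.data, aes(x, y)) + geom_point() + geom_smooth() +
  scale_x_continuous(breaks = c(0, 1, 2, 3, 6))

# By contrast, limits on the scale remove out-of-range points before the
# stats are computed, so geom_smooth() is refit on the reduced data:
ggplot(mtcars, aes(mpg, qsec)) + geom_point() + geom_smooth() +
  scale_x_continuous(limits = c(10, 25))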

Related

Calculating sample proportions in R using the table function [duplicate]

This question already has answers here:
How does the `prop.table()` function work in r?
(3 answers)
Closed 2 years ago.
I've got a model with 9 covariates, and below is an example of one of the tables used to tabulate the "yes" (1) and "no" (0) responses of the dataset:
table(wbca1$y,wbca1$Adhes)
And the output appears as follows
How can I code this so that I get the sample proportions for each covariate, i.e. a new table with 10 columns, each giving the proportion of "yes" (1) responses?
Thank you in advance
Something like this:
set.seed(111)
x = sample(1:9,100,replace=TRUE)
y = sample(0:1,100,replace=TRUE)
prop.table(table(y,x),margin=2)
x
y 1 2 3 4 5 6 7
0 0.4444444 0.2857143 0.6923077 0.4666667 0.5000000 0.4615385 0.6666667
1 0.5555556 0.7142857 0.3076923 0.5333333 0.5000000 0.5384615 0.3333333
x
y 8 9
0 0.3636364 0.4615385
1 0.6363636 0.5384615
Or you can simply do:
tab = table(y,x)
tab[2,]/colSums(tab)
1 2 3 4 5 6 7 8
0.5555556 0.7142857 0.3076923 0.5333333 0.5000000 0.5384615 0.3333333 0.6363636
9
0.5384615
Using the tidyverse (grouping by x so the proportions are computed within each level, matching prop.table(table(y, x), margin = 2) above):
library(dplyr)
tibble(x, y) %>%
  count(x, y) %>%
  group_by(x) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
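If a wide layout with one column per covariate level is what you are after, a sketch along these lines (building on the simulated x and y above; tidyr is assumed to be available) keeps only the "yes" proportions and reshapes them:
library(dplyr)
library(tidyr)

tibble(x, y) %>%
  count(x, y) %>%
  group_by(x) %>%
  mutate(prop = n / sum(n)) %>%   # proportion within each level of x
  ungroup() %>%
  filter(y == 1) %>%              # keep only the "yes" (1) proportions
  select(x, prop) %>%
  pivot_wider(names_from = x, values_from = prop)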

Standard deviation for a subset [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 7 years ago.
I am trying to calculate the mean and standard deviation for a variable within a subset. The code works fine for mean but not for sd. I have included a sample of the data; orf1 came from the subset. Any help?
mean(Stocking.Density2012,na.rm=TRUE,data=orf1)
[1] 13.72386
> sd(Stocking.Density2012,na.rm=TRUE,data=orf1)
Error in sd(Stocking.Density2012, na.rm = TRUE, data = orf1) :
unused argument (data = orf1)
Region Stocking.Density2012
1 12
8 7
2 12
8 17
1 34
3 24
1 16
2 5
1 5
4 11
1 5
3 3
7 3
5 13
1 18
4 15
2 18
1 10
6 5
1 10
5 46
1 19
3 12
1 15
6 4
1 4
7 8
1 8
8 12
data is neither an argument to mean nor to sd, so Stocking.Density2012 must be in the enclosing environment. Perhaps you attached it.
mean doesn't give an error because it has a ... argument, which sd does not.
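For what it's worth, here is a minimal sketch of computing the statistics directly from the orf1 data frame (column names as shown in the question), without relying on an attached copy:
# Pass the column itself rather than a bare name plus data=:
sd(orf1$Stocking.Density2012, na.rm = TRUE)

# Or evaluate inside the data frame:
with(orf1, sd(Stocking.Density2012, na.rm = TRUE))

# Mean and sd per Region, if that is the subset of interest:
aggregate(Stocking.Density2012 ~ Region, data = orf1,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))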

How to find LC50 using r? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have run a cadmium exposure (46 h) test and now I want to find the LC50 value (lethal concentration) and its 95% confidence limits (upper and lower) using R.
Here are my data:
Conc. mg/L Dead Live
C1 0 10
C2 0 10
C3 0 10
2 0 10
2 0 10
2 0 10
4 0 10
4 0 10
4 0 10
8 0 10
8 0 10
8 0 10
16 1 9
16 1 9
16 8 8
32 1 9
32 2 8
32 4 6
64 8 2
64 2 8
64 5 5
128 10 0
128 8 2
128 10 0
256 10 0
256 10 0
256 10 0
From here, it seems that LC50 is the minimum concentration at which 50% or more of organisms die. You could aggregate your data to compute the proportion of organisms that died at each concentration level:
# Numeric concentration
dat$Conc.mg.L <- as.character(dat$Conc.mg.L)
dat$Conc.mg.L[dat$Conc.mg.L %in% c("C1", "C2", "C3")] <- 0
dat$Conc.mg.L <- as.numeric(dat$Conc.mg.L)
# Determine LC50
(agg <- tapply(dat$Dead / (dat$Dead+dat$Live), dat$Conc.mg.L, mean))
# 0 2 4 8 16 32 64 128 256
# 0.0000000 0.0000000 0.0000000 0.0000000 0.2333333 0.2333333 0.5000000 0.9333333 1.0000000
as.numeric(names(agg)[min(which(agg >= 0.5))])
# [1] 64
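The question also asks for 95% confidence limits, which the simple threshold above does not provide. A common way to get both is to fit a dose-response model; here is a rough sketch (assuming the cleaned dat from above, and assuming a log10 concentration scale is appropriate) using a binomial GLM with a probit link and MASS::dose.p:
library(MASS)

# Probit model on log10 concentration (controls shifted by +1 to avoid log(0),
# which is an assumption of this sketch)
dat$logConc <- log10(dat$Conc.mg.L + 1)
fit <- glm(cbind(Dead, Live) ~ logConc,
           family = binomial(link = "probit"), data = dat)

lc50 <- dose.p(fit, p = 0.5)            # estimate and SE on the log10 scale
est  <- as.numeric(lc50)
se   <- as.numeric(attr(lc50, "SE"))

# Back-transform the estimate and approximate 95% limits
10^c(LC50 = est, lower = est - 1.96 * se, upper = est + 1.96 * se) - 1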

Plotting points with size adapting to number of data points (cex)

I have the following data from a tree analysis:
train = sample(1:nrow(dd),1010)
yhat1 <- predict(tree.model1,newdata=dd[-train,])
v10.test <- dd$v10[-train]
dd is my data.frame, v10 is the (discrete) response variable that varies between 1 and 10, and train is a sample drawn from my dataframe.
I want to plot the predictions yhat1 against the actual test values v10.test, with the point size reflecting how many test values are assigned to each yhat1 prediction.
Thus:
plot(yhat1, v10.test, cex = ???)
The values for cex that I need can be drawn from the table object, but I don't know how. Any ideas?
table(yhat1, v10.test)
v10.test
yhat1 0 1 2 3 4 5 6 7 8 9 10
2.99479166666667 17 26 7 21 10 8 7 7 8 3 6
4.36725663716814 8 15 21 14 14 14 13 12 4 5 4
4.75 1 1 3 1 0 2 2 2 1 1 0
4.82710280373832 6 10 5 11 7 11 11 18 22 3 2
5.73684210526316 1 5 1 9 7 13 10 7 12 7 12
6.68 0 1 0 1 0 3 1 1 0 0 1
6.92045454545455 0 2 3 2 5 5 4 7 6 9 6
The symbols function may be preferable to using plot and cex when you want the size of points to depend on an additional variable. Note that you will generally get the best representation when using the square root of the variable to determine size (so that the area is proportional).
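A rough sketch of that approach, assuming yhat1 and v10.test as in the question (the inches scaling is arbitrary):
tab <- table(yhat1, v10.test)
# One point per (prediction, observation) combination, in column-major order
# to match as.vector(tab)
grid <- expand.grid(yhat1 = as.numeric(rownames(tab)),
                    v10   = as.numeric(colnames(tab)))
symbols(grid$yhat1, grid$v10,
        circles = sqrt(as.vector(tab)),   # sqrt so circle area ~ count
        inches = 0.2,
        xlab = "yhat1", ylab = "v10.test")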
I played around a bit more and it turns out my main problem was not with the table but with the standard settings for pch and the standard size of the points, which made the resulting graph impossible to interpret.
So a way of doing it simply is
plot(yhat1, v10.test, pch = 20, cex = table(yhat1, v10.test)/10)
That does the trick (and shows how poor the data fit is)

How do I remove an extra line in a chart with ggplot

I'm a new R user and I am trying to chart an interaction between 2 continuous variables and a categorical variable.
Using interaction.plot:
interaction.plot(nonconform, trans, employdisc, type="b", col=(1:3) ,
leg.bty="o", leg.bg="beige", lwd=2, pch=c(18,24,22),
xlab="Nonconformity",
ylab="Discrimination",
main="Interaction Plot")
I get this result:
interaction plot
When I attempt to do the same thing with ggplot
ggplot(data=NTDS.zip, aes(x=nonconform, y=employdisc, colour=factor(trans), group=trans)) +
  stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line")
I get this result:
ggplot chart
There is an extra line (in grey) that I can't get rid of. It is likely representing missing data, but I haven't found a way to remove that line from the chart. Every discussion I found talked about suppressing warnings due to missing data, but nothing about extra lines in a chart.
Any thoughts?
Update
After reading the R Graphics Cookbook I tried another method.
The book's method involved summarizing the data first.
library(plyr)  # ddply() comes from plyr
tg <- ddply(ntds.new, c("trans", "nonconform"), summarize, empdisc=mean(employdisc))
and then plotting the chart.
I tried 2 types (colour and linetype)
ggplot(tg, aes(x=nonconform, y=empdisc, colour=trans))+geom_line()
ggplot(tg, aes(x=nonconform, y=empdisc, linetype=trans))+geom_line()
The plot with the colour statement has the extra line, while the plot with linetype does not.
the data for this was:
trans nonconform empdisc
1 1 0 1.104046
2 1 1 1.472050
3 1 2 1.930070
4 1 3 2.247706
5 1 4 3.407407
6 1 NA 7.250000
7 2 0 3.427230
8 2 1 3.929707
9 2 2 4.062275
10 2 3 4.373853
11 2 4 4.470149
12 2 NA 5.294118
13 3 0 1.309524
14 3 1 1.968310
15 3 2 2.366589
16 3 3 3.815000
17 3 4 3.560606
18 3 NA 6.000000
19 4 0 2.661290
20 4 1 3.208861
21 4 2 3.033195
22 4 3 3.322176
23 4 4 3.755906
24 4 NA 6.625000
25 NA 0 4.000000
26 NA 1 4.166667
27 NA 2 2.500000
28 NA 3 6.666667
29 NA 4 5.400000
30 NA NA 2.000000
I went back and deleted the 10 rows with missing values in either the trans or nonconform columns.
trans nonconform empdisc
1 1 0 1.104046
2 1 1 1.472050
3 1 2 1.930070
4 1 3 2.247706
5 1 4 3.407407
6 2 0 3.427230
7 2 1 3.929707
8 2 2 4.062275
9 2 3 4.373853
10 2 4 4.470149
11 3 0 1.309524
12 3 1 1.968310
13 3 2 2.366589
14 3 3 3.815000
15 3 4 3.560606
16 4 0 2.661290
17 4 1 3.208861
18 4 2 3.033195
19 4 3 3.322176
20 4 4 3.755906
This solved my initial problem but this solution seems more complicated than it should be, and I'm curious as to why the plot with "colour" was affected and the one with "linetype" wasn't.
If we look at your data in table tg, there are NA values for the variable trans.
When you use trans (as a factor) for the colours of the lines, those NA values are also plotted, because the default action of colour scales for NA levels is to draw them in grey (na.value="grey50"). For linetype scales the default action for NA levels is to draw a blank line (na.value="blank"), so you don't see it.
There are a couple of ways to solve the problem. First, you can add scale_color_discrete() and set na.value= to NA.
ggplot(tg, aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
geom_line()+
scale_color_discrete(na.value=NA)
Another solution is to subset your data to remove the NA values before plotting. This can also be done inside the ggplot() call.
ggplot(tg[complete.cases(tg),], aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
geom_line()
