I have a simple data frame called msq:
sex wing index
1 h 54 67.4
2 m 60.5 67.9
3 m 60 64.5
4 m 59 66.6
5 m 63.5 63.3
6 m 63 66.7
7 m 61.5 71.8
8 m 62 67.9
9 m 63 67.8
10 m 62.5 72.7
11 m 61.5 70.3
12 h 54.5 70.7
13 m 60 61.1
14 m 63.5 50.9
15 m 63 72.1
My intention is to make a boxplot with ggplot for which I use this code that works fine:
ggplot(msq, aes("index", index)) + geom_boxplot(aes(group = "sex"))
and then to plot an outlier (a value of 73.9) that should stand alone at the top of the graph. The problem is that if I include it in the data set, the boxplot "absorbs" it and just draws a longer whisker. I have looked at Hmisc and at stat_summary, but I can't see a clear way to do this.
thank you.
You could use geom_point to add points to a plot generated with ggplot2.
library(ggplot2)
ggplot(msq, aes(sex, index)) + # Note. I modified the aes call
geom_boxplot() +
geom_point(aes(y = 73.9)) # add points
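Because geom_point(aes(y = 73.9)) inherits the x mapping from the plot, it draws the point once per row and in every sex group. If the extra value belongs to a single group, one option is to keep it in its own small data frame so that the boxplot statistics never see it. This is only a sketch: the outlier data frame is a name introduced here, and assigning the 73.9 value to group "m" is an assumption, since the question does not say which group it belongs to.

```r
library(ggplot2)

# Data copied from the question
msq <- data.frame(
  sex   = c("h", "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "h", "m", "m", "m"),
  wing  = c(54, 60.5, 60, 59, 63.5, 63, 61.5, 62, 63, 62.5, 61.5, 54.5, 60, 63.5, 63),
  index = c(67.4, 67.9, 64.5, 66.6, 63.3, 66.7, 71.8, 67.9, 67.8, 72.7,
            70.3, 70.7, 61.1, 50.9, 72.1)
)

# Hypothetical: the stand-alone value is assumed to belong to group "m"
outlier <- data.frame(sex = "m", index = 73.9)

p <- ggplot(msq, aes(sex, index)) +
  geom_boxplot() +                                        # boxes use msq only
  geom_point(data = outlier, colour = "red", size = 3)    # extra point, drawn once
print(p)
```

The boxes and whiskers are computed from msq alone, so the extra point cannot lengthen a whisker.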
I have a data frame with the two columns bloodlevel and sex (F & M only), with 14 male and 11 female.
bloodlevel sex
1 14.9 M
2 12.9 M
3 14.7 M
4 14.7 M
5 14.8 M
6 14.7 M
7 13.9 M
8 14.1 M
9 16.1 M
10 16.1 M
11 15.3 M
12 12.8 M
13 14.0 M
14 14.9 M
15 11.2 F
16 14.5 F
17 12.1 F
18 14.8 F
19 15.2 F
20 11.2 F
21 15.0 F
22 13.2 F
23 14.4 F
24 14.7 F
25 13.2 F
I am trying to create two histograms that differentiate females' and males' blood levels with facet_wrap.
I have tried
ggplot(Physiology, aes(x=sex, y=bloodlevel))+
geom_histogram(binwidth=5, fill="white", color="black")+
facet_wrap(~Physiology)+
xlab("sex")
but I’m getting the error
Error in `combine_vars()`:
! At least one layer must contain all faceting variables: `Physiology`.
* Plot is missing `Physiology`
* Layer 1 is missing `Physiology`
I am trying to facet the plot by sex, with one histogram for each.
Is this what you're trying?
library(ggplot2)
# mock data standing in for the real measurements
df <- data.frame(bloodlevel = sample(12:16, 25, TRUE),
                 sex = sample(c("M", "F"), 25, TRUE))
ggplot(df, aes(x = bloodlevel)) +
  geom_histogram() +
  facet_wrap(~sex)
Next time please provide a working code sample for us to use (copying the table you printed doesn't do the trick).
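Applied to the data in the question, the same idea might look like the sketch below. Two things change relative to the failing call: a histogram maps only x (the counts are computed for you), and facet_wrap() takes a column name (sex), not the name of the data frame. The binwidth of 0.5 is an assumption chosen to suit values between 11 and 16; the original binwidth of 5 would put almost everything in one bar.

```r
library(ggplot2)

# Data copied from the question: 14 males followed by 11 females
Physiology <- data.frame(
  bloodlevel = c(14.9, 12.9, 14.7, 14.7, 14.8, 14.7, 13.9, 14.1, 16.1, 16.1,
                 15.3, 12.8, 14.0, 14.9,
                 11.2, 14.5, 12.1, 14.8, 15.2, 11.2, 15.0, 13.2, 14.4, 14.7, 13.2),
  sex = rep(c("M", "F"), times = c(14, 11))
)

p <- ggplot(Physiology, aes(x = bloodlevel)) +
  geom_histogram(binwidth = 0.5, fill = "white", color = "black") +  # binwidth is an assumption
  facet_wrap(~sex) +                                                 # one panel per sex
  xlab("blood level")
print(p)
```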
I am trying to visualize my PCA analysis using ggplot but the output plot only shows 16 out of my 24 samples.
The data frame I created with my PCA data has 24 observations of 24 variables (24 samples, 24 PCs), but ggplot is only plotting 16 of the 24. Here is my code and a mock data frame.
ggplot(data) +
aes(x=PC1, y=PC2) +
geom_point(size=3) +
coord_fixed() +
theme_bw()
Data frame
PC1 PC2
<dbl> <dbl>
1 -40.8 -20.6
2 -40.6 -19.0
3 -40.8 -20.6
4 8.01 -38.1
5 8.52 -36.3
6 8.01 -38.1
7 -39.7 -6.11
8 -38.1 -5.76
9 -39.7 -6.11
10 18.3 -33.9
11 17.9 -33.3
12 18.3 -33.9
13 -32.9 11.2
14 -31.7 9.49
15 -32.9 11.2
16 50.9 -4.98
17 49.4 -5.64
18 50.9 -4.98
19 -38.7 56.9
20 -38.0 54.9
21 -38.7 56.9
22 74.8 36.3
23 72.8 34.1
24 74.8 36.3
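Several samples in your data have identical coordinates (rows 1 and 3, for example), so their points are drawn exactly on top of each other. You could use geom_count to count overlapping points and scale_size_area to scale the size of the points, like this: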
library(ggplot2)
ggplot(data) +
aes(x=PC1, y=PC2) +
geom_count() +
coord_fixed() +
theme_bw() +
scale_size_area(breaks = c(1,2))
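To confirm that overlap explains the 16 visible points, the duplicated coordinates can be counted directly. A small sketch using the mock data from the question; n_dup and n_unique are names introduced here for illustration.

```r
# Mock data copied from the question
data <- data.frame(
  PC1 = c(-40.8, -40.6, -40.8, 8.01, 8.52, 8.01, -39.7, -38.1, -39.7,
          18.3, 17.9, 18.3, -32.9, -31.7, -32.9, 50.9, 49.4, 50.9,
          -38.7, -38.0, -38.7, 74.8, 72.8, 74.8),
  PC2 = c(-20.6, -19.0, -20.6, -38.1, -36.3, -38.1, -6.11, -5.76, -6.11,
          -33.9, -33.3, -33.9, 11.2, 9.49, 11.2, -4.98, -5.64, -4.98,
          56.9, 54.9, 56.9, 36.3, 34.1, 36.3)
)

n_dup    <- sum(duplicated(data))   # rows that repeat an earlier (PC1, PC2) pair
n_unique <- nrow(unique(data))      # distinct positions, i.e. visible points
n_dup      # 8
n_unique   # 16
```

With 8 repeated positions out of 24 rows, only 16 distinct points can appear, which matches the plot.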
Created on 2023-01-31 with reprex v2.0.2
When I was trying to plot a line, the x-axis came out in a different order from the data. This is my data:
Month num temp
1 2016-1-1 61 4.5
2 2016-2-1 50 3.8
3 2016-3-1 51 5.3
4 2016-4-1 48 6.5
5 2016-5-1 49 11.3
6 2016-6-1 48 13.9
7 2016-7-1 50 15.3
8 2016-8-1 48 15.5
9 2016-9-1 52 14.6
10 2016-10-1 54 9.8
11 2016-11-1 69 4.9
12 2016-12-1 80 5.9
13 2017-1-1 59 3.8
14 2017-2-1 52 5.2
15 2017-3-1 51 7.3
16 2017-4-1 47 8.0
17 2017-5-1 50 12.1
18 2017-6-1 47 14.4
and my code was:
ggplot(data=trendsData,aes(x=Month, y=temp,group=1))+geom_line()+theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
but the months on the x-axis came out in the wrong order.
Could anyone help with this? Thanks!
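R can only sort those dates correctly when it knows that they are in fact dates. As it stands, Month is not stored as a Date (it is text or a factor), so the axis is ordered as text rather than chronologically.
ymd() from the package lubridate is nice for that.
library(lubridate)
trendsData$Month <- ymd(trendsData$Month)
Then your plot should be fine.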
EDIT:
If you want more date points to show on the x axis, you can use scale_x_date() like so:
+ scale_x_date( breaks=trendsData$Month )
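Putting the pieces together, a self-contained sketch might look like this. It uses base R's as.Date() in place of ymd(), which is a swapped-in alternative and not what the answer above uses. The date_breaks and date_labels settings are one possible choice, not the only one.

```r
library(ggplot2)

# Data copied from the question; Month starts out as text
trendsData <- data.frame(
  Month = c(paste0("2016-", 1:12, "-1"), paste0("2017-", 1:6, "-1")),
  num   = c(61, 50, 51, 48, 49, 48, 50, 48, 52, 54, 69, 80, 59, 52, 51, 47, 50, 47),
  temp  = c(4.5, 3.8, 5.3, 6.5, 11.3, 13.9, 15.3, 15.5, 14.6, 9.8, 4.9, 5.9,
            3.8, 5.2, 7.3, 8.0, 12.1, 14.4)
)

# Convert the text to real dates (base R alternative to lubridate::ymd)
trendsData$Month <- as.Date(trendsData$Month, format = "%Y-%m-%d")

p <- ggplot(trendsData, aes(x = Month, y = temp)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +  # one tick per month
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
print(p)
```

Once Month is a Date, group = 1 is no longer needed, because the x-axis is continuous and not a set of discrete labels.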
Hi, I am trying to fit a non-parametric regression smoother to the difference between the control and treatment groups, so as to determine the effectiveness of the appetite suppressant over time. Then I need to use my model to estimate the difference between the treatment and control groups at t=0 and t=50.
I want to use a P-spline smoother, but I do not have enough background in it.
This is my data :
t
0 1 3 7 8 10 14 15 17 21 22 24 28 29 31 35 36 38 42 43 45 49 50 52 56 57 59 63 64 70 73 77 80 84 87 91 94 98 105
con
20.5 19.399 22.25 17.949 19.899 21.449 16.899 21.5 22.8 24.699 26.2 28.5 24.35 24.399 26.6 26.2 26.649 29.25 27.55 29.6 24.899 27.6 28.1 27.85 26.899 27.8 30.25 27.6 27.449 27.199 27.8 28.199 28 27.3 27.899 28.699 27.6 28.6 27.5
trt
21.3 16.35 19.25 16.6 14.75 18.149 14.649 16.7 15.05 15.5 13.949 16.949 15.6 14.699 14.15 14.899 12.449 14.85 16.75 14.3 16 16.85 15.65 17.149 18.05 15.699 18.25 18.149 16.149 16.899 18.95 22 23.6 23.75 27.149 28.449 25.85 29.7 29.449
where:
t - the time in days since the experiment started.
con - the median food intake of the control group.
trt - the median food intake of the treatment group.
Can anybody help please?
This is only to give you a start. The mgcv package implements various regression spline bases, including P-splines (penalized B-splines with a difference penalty).
First, you need to set up your data:
dat <- data.frame(time = rep(t, 2), y = c(con, trt),
grp = gl(2, 39, labels = c("con", "trt")))
Then call gam for non-parametric regression:
library(mgcv) # no need to install; it comes with R
fit <- gam(y ~ s(time, bs = 'ps', by = grp) + grp, data = dat)
See "mgcv: how to specify interaction between smooth and factor?" for how the interaction is specified. bs = 'ps' selects the P-spline basis. By default the basis dimension is k = 10, with evenly spaced knots. You can change k if you want.
For more about P-splines in mgcv, see "mgcv: how to extract knots, basis, coefficients and predictions for P-splines in adaptive smooth?".
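To get the estimates the question asks for, one option is to predict both group curves at the times of interest and subtract them. The following is a minimal sketch built on the model above. It assumes that "difference" means treatment minus control, and newd and est_diff are names introduced here for illustration.

```r
library(mgcv)

# Data copied from the question
t <- c(0, 1, 3, 7, 8, 10, 14, 15, 17, 21, 22, 24, 28, 29, 31, 35, 36, 38, 42, 43,
       45, 49, 50, 52, 56, 57, 59, 63, 64, 70, 73, 77, 80, 84, 87, 91, 94, 98, 105)
con <- c(20.5, 19.399, 22.25, 17.949, 19.899, 21.449, 16.899, 21.5, 22.8, 24.699,
         26.2, 28.5, 24.35, 24.399, 26.6, 26.2, 26.649, 29.25, 27.55, 29.6,
         24.899, 27.6, 28.1, 27.85, 26.899, 27.8, 30.25, 27.6, 27.449, 27.199,
         27.8, 28.199, 28, 27.3, 27.899, 28.699, 27.6, 28.6, 27.5)
trt <- c(21.3, 16.35, 19.25, 16.6, 14.75, 18.149, 14.649, 16.7, 15.05, 15.5,
         13.949, 16.949, 15.6, 14.699, 14.15, 14.899, 12.449, 14.85, 16.75, 14.3,
         16, 16.85, 15.65, 17.149, 18.05, 15.699, 18.25, 18.149, 16.149, 16.899,
         18.95, 22, 23.6, 23.75, 27.149, 28.449, 25.85, 29.7, 29.449)

dat <- data.frame(time = rep(t, 2), y = c(con, trt),
                  grp = gl(2, 39, labels = c("con", "trt")))
fit <- gam(y ~ s(time, bs = "ps", by = grp) + grp, data = dat)

# Predict each group's curve at t = 0 and t = 50
newd <- expand.grid(time = c(0, 50), grp = levels(dat$grp))
newd$pred <- as.numeric(predict(fit, newdata = newd))

# Assumed sign convention: treatment minus control
est_diff <- newd$pred[newd$grp == "trt"] - newd$pred[newd$grp == "con"]
names(est_diff) <- c("t=0", "t=50")
est_diff
```

An alternative with the same spirit would be to smooth the pointwise differences trt - con directly with a single s(t, bs = "ps") term; which is preferable depends on how the uncertainty should be reported.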
I am trying to write a script to get some specific values for the equation 25a+20b=1600 with a in the range between 24:60 and b in 20:50
I need to get the pairs of a and b satisfying the equation.
My first problem was how to define a and b with a single digit decimal place (a=24.0,24.1,24.2...etc.) but I overcame that defining a<-c(240:600)/10, so my first question is: Is there any direct method to do that?
Next, I wrote a couple of nested loops, and each time the equation is satisfied I get the pair in a vector. I want to use rbind() to attach this vector to a matrix or a data frame, but it fails silently, with no error or warning: the result only ever holds the first vector.
Here is my code, can someone help me define where the problem is?
solve_ms <- function() {
index<-1
sol<-data.frame()
temp<-vector("numeric")
a<-c(240:600)/10
b<-c(200:500)/10
for (i in 1:length(a)){
for (j in 1:length(b)) {
c <- 25*a[i]+20*b[j]
if(c == 1600) {
temp<-c(a[i], b[j])
if(index == 1) {
sol<-temp
index<-0
}
else rbind(sol,temp)
}
}
}
return(sol)
}
I found out where the problem in my code is: I was calling rbind without assigning its return value back to the data frame. I had to write {sol<-rbind(sol,temp)} and then it works.
I will check other suggestions as well.. thanks.
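For reference, a corrected version of the loop might look like the sketch below. It assigns the result of rbind() back to sol, and it replaces the exact == 1600 test with a tolerance, which is a deliberate change from the original code because values such as 24.1 are not stored exactly in floating point. The tol argument and its default are assumptions made for this sketch. On the first question, seq(24, 60, by = 0.1) is the more direct way to build the grid.

```r
solve_ms <- function(tol = 1e-8) {        # tol is an assumed tolerance
  a <- (240:600) / 10                     # same grid as seq(24, 60, by = 0.1)
  b <- (200:500) / 10
  sol <- NULL
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      # compare with a tolerance instead of ==
      if (abs(25 * a[i] + 20 * b[j] - 1600) < tol) {
        sol <- rbind(sol, c(a = a[i], b = b[j]))   # keep the result of rbind()
      }
    }
  }
  sol
}

res <- solve_ms()
head(res)
nrow(res)
```

Growing an object with rbind() inside a loop is slow for large grids, so the vectorised answers are preferable beyond small problems like this one.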
Try this instead:
#define a function
fun <- function(a,b) (25*a+20*b) == 1600
Since floating point precision could be an issue:
#alternative function
fun <- function(a,b,tol=.Machine$double.eps ^ 0.5) abs(25*a+20*b-1600) < tol
#create all possible combinations
paras <- expand.grid(a=c(240:600)/10, b=20:50)
paras[fun(paras$a,paras$b),]
a b
241 48.0 20
594 47.2 21
947 46.4 22
1300 45.6 23
1653 44.8 24
2006 44.0 25
2359 43.2 26
2712 42.4 27
3065 41.6 28
3418 40.8 29
3771 40.0 30
4124 39.2 31
4477 38.4 32
4830 37.6 33
5183 36.8 34
5536 36.0 35
5889 35.2 36
6242 34.4 37
6595 33.6 38
6948 32.8 39
7301 32.0 40
7654 31.2 41
8007 30.4 42
8360 29.6 43
8713 28.8 44
9066 28.0 45
9419 27.2 46
9772 26.4 47
10125 25.6 48
10478 24.8 49
10831 24.0 50
If the problem is really this simple, i.e. finding the solutions of a linear equation in two variables, you can always rearrange the equation to write b in terms of a, i.e. b = (1600-25*a)/20, compute b for every value of a, and then keep only the combinations whose b is in the allowed set,
e.g.
a = c(240:600)/10
b = 20:50
RESULTS <- data.frame(a, b = (1600 - 25 * a)/20)[((1600 - 25 * a)/20) %in% b, ]
RESULTS
## a b
## 1 24.0 50
## 9 24.8 49
## 17 25.6 48
## 25 26.4 47
## 33 27.2 46
## 41 28.0 45
## 49 28.8 44
## 57 29.6 43
## 65 30.4 42
## 73 31.2 41
## 81 32.0 40
## 97 33.6 38
## 105 34.4 37
## 121 36.0 35
## 137 37.6 33
## 145 38.4 32
## 161 40.0 30
## 177 41.6 28
## 185 42.4 27
## 193 43.2 26
## 201 44.0 25
## 209 44.8 24
## 217 45.6 23
## 225 46.4 22
## 233 47.2 21
## 241 48.0 20
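Note that the output above has 26 rows, while the expand.grid approach with a tolerance finds 31 pairs for the same grid: pairs such as a = 32.8, b = 39 are missing. This is consistent with floating-point error, since %in% tests for exact equality and a computed b can differ from the integer in its last bits. A hedged sketch of the same rearrangement with a tolerance follows. The 1e-8 threshold and the names b_calc, is_whole, in_range and results are assumptions introduced here.

```r
a <- (240:600) / 10
b_calc <- (1600 - 25 * a) / 20            # b implied by each a

# Keep b values that are whole numbers up to a small tolerance and lie in 20..50
is_whole <- abs(b_calc - round(b_calc)) < 1e-8
in_range <- round(b_calc) >= 20 & round(b_calc) <= 50

results <- data.frame(a = a, b = round(b_calc))[is_whole & in_range, ]
nrow(results)   # 31
```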