ggplot facet with two variables - r

I have a data frame with the two columns bloodlevel and sex (F & M only), with 14 male and 11 female.
bloodlevel sex
1 14.9 M
2 12.9 M
3 14.7 M
4 14.7 M
5 14.8 M
6 14.7 M
7 13.9 M
8 14.1 M
9 16.1 M
10 16.1 M
11 15.3 M
12 12.8 M
13 14.0 M
14 14.9 M
15 11.2 F
16 14.5 F
17 12.1 F
18 14.8 F
19 15.2 F
20 11.2 F
21 15.0 F
22 13.2 F
23 14.4 F
24 14.7 F
25 13.2 F
I am trying to create two histograms that differentiate females' and males' blood levels with facet_wrap.
I have tried
ggplot(Physiology, aes(x=sex, y=bloodlevel))+
geom_histogram(binwidth=5, fill="white", color="black")+
facet_wrap(~Physiology)+
xlab("sex")
but I’m getting the error
Error in `combine_vars()`:
! At least one layer must contain all faceting variables: `Physiology`.
* Plot is missing `Physiology`
* Layer 1 is missing `Physiology`
I am trying to facet by that variable, with a plot like this:

Is this what you're trying?
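library(ggplot2)
library(magrittr) # provides %>%
# random stand-in data, since the question has no copyable sample
df <- data.frame(bloodlevel = sample(12:16, 25, TRUE),
sex = sample(c("M", "F"), 25, TRUE))
df %>% ggplot(aes(x = bloodlevel)) + geom_histogram() +
facet_wrap(~sex)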
Next time, please provide a working code sample for us to use (copying the table you printed doesn't do the trick).
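For reference, the error in the question comes from facet_wrap(~Physiology): Physiology is the name of the data frame, not a column in it, so ggplot2 cannot find a faceting variable by that name. A histogram also takes only an x aesthetic. Here is a minimal sketch using the values posted in the question (the binwidth of 1 and the axis label are assumptions chosen for illustration):

```r
library(ggplot2)

# Rebuild the data frame from the values printed in the question
Physiology <- data.frame(
  bloodlevel = c(14.9, 12.9, 14.7, 14.7, 14.8, 14.7, 13.9, 14.1, 16.1, 16.1,
                 15.3, 12.8, 14.0, 14.9,
                 11.2, 14.5, 12.1, 14.8, 15.2, 11.2, 15.0, 13.2, 14.4, 14.7, 13.2),
  sex = rep(c("M", "F"), times = c(14, 11))
)

# Histogram of bloodlevel, one panel per value of the sex column
p <- ggplot(Physiology, aes(x = bloodlevel)) +
  geom_histogram(binwidth = 1, fill = "white", color = "black") +
  facet_wrap(~sex) +
  xlab("blood level")
p
```

The original binwidth of 5 would put nearly all of these values into one or two bars, so a smaller value is likely closer to what was intended.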

Related

How to calculate House distance with euclidean distance between two set of points (coordinates) with R

I am trying to calculate the Euclidean distance between house a and x, b and x, ... from a table. This is what my data looks like:
df <- data.frame(house=c(letters[1:10],"x"),long=c(11,15,19,18,16,23,25,21,23,29,19),
lat=c(26,29,28,30,26,25,22,24,25,24,25),
location=(c(rep("city", 5),rep("district", 5), "null")))
I have tried to calculate with euclid formula:
euclid<- function(x1,x2, y1,y2) {
euclid= sqrt((x1-x2)^2+(y1-y2)^2)
return(euclid)
}
I am looking for this output:
House long lat **Distance to X**
h 21 24 2.24
c 19 28 3
e 16 26 3.16
f 23 25 4
i 23 25 4
d 18 30 5.1
b 15 29 5.66
g 25 22 6.71
a 11 26 8.06
j 29 24 10.05
How would I apply the formula to each pair of long and lat values?
There's also the dist() function. Note the rownames step is there to make the output more readable:
rownames(df) <- df[['house']]
dist(df[, c('long', 'lat')])
# added round(..., 1) to make this output
a b c d e f g h i j
b 5.0
c 8.2 4.1
d 8.1 3.2 2.2
e 5.0 3.2 3.6 4.5
f 12.0 8.9 5.0 7.1 7.1
g 14.6 12.2 8.5 10.6 9.8 3.6
h 10.2 7.8 4.5 6.7 5.4 2.2 4.5
i 12.0 8.9 5.0 7.1 7.1 0.0 3.6 2.2
j 18.1 14.9 10.8 12.5 13.2 6.1 4.5 8.0 6.1
x 8.1 5.7 3.0 5.1 3.2 4.0 6.7 2.2 4.0 10.0
To get your intended output, you can convert the dist class to a matrix and subset:
as.matrix(dist(df[, c('long', 'lat')]))[11, -11]
a b c d e f g h i j
8.1 5.7 3.0 5.1 3.2 4.0 6.7 2.2 4.0 10.0
df$distance_to_x <- as.matrix(dist(df[, c('long', 'lat')]))[11, ]
df
house long lat location distance_to_x
a a 11 26 city 8.062258
b b 15 29 city 5.656854
c c 19 28 city 3.000000
d d 18 30 city 5.099020
e e 16 26 city 3.162278
f f 23 25 district 4.000000
g g 25 22 district 6.708204
h h 21 24 district 2.236068
i i 23 25 district 4.000000
j j 29 24 district 10.049876
x x 19 25 null 0.000000
And if you want to use your own function, as @nicola suggested, with() can be helpful as well:
with(df, euclid(long, long[house =='x'], lat, lat[house == 'x']))
Besides the dist() approach by @Cole, you can also use outer(), i.e.,
# form complex-valued coordinates
z <- with(df,long + 1i*lat)
# calculate distance between complex numbers
df$distance2x <- as.numeric(abs(outer(z,z,"-"))[which(df$house == "x"),])
such that
> df
house long lat location distance2x
1 a 11 26 city 8.062258
2 b 15 29 city 5.656854
3 c 19 28 city 3.000000
4 d 18 30 city 5.099020
5 e 16 26 city 3.162278
6 f 23 25 district 4.000000
7 g 25 22 district 6.708204
8 h 21 24 district 2.236068
9 i 23 25 district 4.000000
10 j 29 24 district 10.049876
11 x 19 25 null 0.000000
Note: the idea is to form complex-valued coordinates and use abs() over the difference between two houses
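None of the approaches above drops house x or sorts the rows the way the expected output in the question does. Here is a small sketch of that last step, reusing the question's euclid() formula. The names is_x, others, and result are introduced here for illustration only:

```r
df <- data.frame(house = c(letters[1:10], "x"),
                 long = c(11, 15, 19, 18, 16, 23, 25, 21, 23, 29, 19),
                 lat = c(26, 29, 28, 30, 26, 25, 22, 24, 25, 24, 25))

# The formula from the question; it is already vectorized, so no loop is needed
euclid <- function(x1, x2, y1, y2) sqrt((x1 - x2)^2 + (y1 - y2)^2)

is_x <- df$house == "x"
others <- df[!is_x, ]

# Distance from every other house to house x, rounded to two decimals
others$distance_to_x <- round(
  euclid(others$long, df$long[is_x], others$lat, df$lat[is_x]), 2)

# Sort ascending by distance
result <- others[order(others$distance_to_x), ]
result
```

Because arithmetic in R is vectorized, passing whole columns for x1 and y1 and a single value for x2 and y2 computes all ten distances in one call.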

R ordering splitting up ordering of entries with different number of digits to left of decimal

When I create a simple data frame,
dd <- data.frame(x = c('a','b','c','d','e','f','g','h','i','j','k','l','m'),z = c(11.2, 1.1, 911, 2,34453,11.2,106.45,44,22,12,1,19,19.1))
> dd
x z
1 a 11.20
2 b 1.10
3 c 911.00
4 d 2.00
5 e 34453.00
6 f 11.20
7 g 106.45
8 h 44.00
9 i 22.00
10 j 12.00
11 k 1.00
12 l 19.00
13 m 19.10
I am able to order the rows by the z column,
> dd[order(dd$z),]
x z
11 k 1.00
2 b 1.10
4 d 2.00
1 a 11.20
6 f 11.20
10 j 12.00
12 l 19.00
13 m 19.10
9 i 22.00
8 h 44.00
7 g 106.45
3 c 911.00
5 e 34453.00
, but when reading from a data frame which is from a 46 X ~5000 .csv file I get a result which seemingly orders the values with two digits to the left of the decimal, then the ones with a single digit to the left of the decimal. How do I order in strictly ascending order?
1940 11.8
1976 11.9
1921 12.1
1916 12.4
1967 12.5
1917 12.6
1918 12.6
1975 13.0
1919 13.8
1952 14.3
1930 7.9
1920 8.3
1963 8.4
1950 8.5
1927 8.6
1926 8.7
1960 8.7
1915 8.8
It looks like your column was read in as strings instead of as numbers.
Simply convert to numeric:
dd[["z"]] <- as.numeric(as.character(dd[["z"]])) # as.character() also covers the case where z was read in as a factor
If you get a warning about NAs introduced by coercion, then you have some sloppy data.
Check which are NA, then check the raw data:
index.to.NAs <- which(is.na(dd[["z"]]))
rawData <- readLines("path/to.file.csv")
rawData[index.to.NAs]
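A small sketch of why the ordering looks split, using a few values in the style of the question (z_chr and z_fac are illustrative names, not from the original data). Text is compared character by character, so "11.8" sorts before "7.9". If the column is a factor, as.numeric() on its own returns the internal level codes rather than the values:

```r
# Values in the style of the question, stored as text
z_chr <- c("11.8", "7.9", "12.1", "8.3")

text_order <- order(z_chr)                 # compares strings, not magnitudes
numeric_order <- order(as.numeric(z_chr))  # compares the numbers themselves

# A factor column needs as.character() first
z_fac <- factor(z_chr)
codes <- as.numeric(z_fac)                 # level codes, not the values
values <- as.numeric(as.character(z_fac))  # the actual numbers

z_chr[numeric_order]
```

Checking class(dd[["z"]]) right after reading the file shows which case applies.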

R program - getting particular values depending on another column

So I have data regarding Id number and time
Id number Time(hr)
1 5
2 6.1
3 7.2
4 8.3
5 9.6
6 10.9
7 13
8 15.1
9 17.2
10 19.3
11 21.4
12 23.5
13 25.6
14 27.1
15 28.6
16 30.1
17 31.8
18 33.5
19 35.2
20 36.9
21 38.6
22 40.3
23 42
24 43.7
25 45.4
I want this output
Time Id number
10 5
20 10
30 16
40 22
So I want the time to be in 10-hour intervals, with the ID that corresponds to each of those hours. I tried data <- data2[seq(0, nrow(data2), by=5), ], but instead of the Time being in 10-hour intervals, the ID number is in intervals of 10, which is not what I want. So far I'm getting this output:
Id.number Time..s.
10 19.3
20 36.9
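You can use the %% (modulo) operator. Note that this keeps only rows whose Time is an exact multiple of 10, and none of the times posted in the question are.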
data[data$Time %% 10 == 0, ]
I use cut() and cumsum(table()) but I don't quite get the answer you are expecting. How exactly are you calculating this?
# first load the data
v.txt <- '1 5
2 6.1
3 7.2
4 8.3
5 9.6
6 10.9
7 13
8 15.1
9 17.2
10 19.3
11 21.4
12 23.5
13 25.6
14 27.1
15 28.6
16 30.1
17 31.8
18 33.5
19 35.2
20 36.9
21 38.6
22 40.3
23 42
24 43.7
25 45.4'
# load in the data... awkwardly...
v <- as.data.frame(matrix(as.numeric(unlist(strsplit(strsplit(v.txt, '\n')[[1]], ' +'))), byrow=TRUE, ncol=2))
names(v) <- c('Id', 'Time') # the columns are otherwise named V1 and V2
tens <- seq(from=0, by=10, to=50)
v$cut <- cut(v$Time, tens, labels=tens[-1])
# cumulative row count up to each 10-hour mark (equals the last Id at or before it, since Ids run 1..25)
v2 <- as.data.frame(cumsum(table(v$cut)))
names(v2) <- 'Id'
v2$Time <- rownames(v2)
rownames(v2) <- 1:nrow(v2)
v2 <- v2[,c(2,1)]
rm(v, v.txt, tens) # not needed anymore
v2 # the answer... but doesn't quite match your expected answer...
Time Id
1 10 5
2 20 10
3 30 15
4 40 21
5 50 25
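One reading of the expected output in the question (Ids 5, 10, 16, 22) is that each 10-hour mark should get the Id whose time is nearest to it, rather than the last Id at or before it. That reading is an assumption, since the question does not say how the values were chosen. A sketch under that assumption, with targets, nearest, and result as illustrative names:

```r
data2 <- data.frame(
  Id = 1:25,
  Time = c(5, 6.1, 7.2, 8.3, 9.6, 10.9, 13, 15.1, 17.2, 19.3, 21.4, 23.5,
           25.6, 27.1, 28.6, 30.1, 31.8, 33.5, 35.2, 36.9, 38.6, 40.3, 42,
           43.7, 45.4)
)

# The 10-hour marks of interest
targets <- seq(10, 40, by = 10)

# For each mark, the row index of the observation with the closest Time
nearest <- sapply(targets, function(t) which.min(abs(data2$Time - t)))

result <- data.frame(Time = targets, Id = data2$Id[nearest])
result
```

With the posted data this selects Ids 5, 10, 16, and 22, which matches the output the question asks for.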

Plotting a new point value in a boxplot. R and ggplot2

I have a simple data frame called msq:
sex wing index
1 h 54 67.4
2 m 60.5 67.9
3 m 60 64.5
4 m 59 66.6
5 m 63.5 63.3
6 m 63 66.7
7 m 61.5 71.8
8 m 62 67.9
9 m 63 67.8
10 m 62.5 72.7
11 m 61.5 70.3
12 h 54.5 70.7
13 m 60 61.1
14 m 63.5 50.9
15 m 63 72.1
My intention is to make a boxplot with ggplot for which I use this code that works fine:
ggplot(msq, aes("index",index))+ geom_boxplot (aes(group="sex"))
and then to plot an outlier that should stand alone up in the graph (a value 73.9). The problem is that if I include it in the data set, the boxplot "absorbs" it making the error line longer... I have been looking in Hmisc and to stat_summary but I can't get any clear idea.
thank you.
You could use geom_point to add points to a plot generated with ggplot2.
library(ggplot2)
ggplot(msq, aes(sex, index)) + # Note. I modified the aes call
geom_boxplot() +
geom_point(aes(y = 73.9)) # add points
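Note that geom_point(aes(y = 73.9)) inherits x = sex, so the point is drawn in both the h and the m column. If the extra value belongs to one group only, annotate() can place a single point without adding it to the data the boxplot is computed from. A sketch, assuming (hypothetically) that the 73.9 observation belongs to the m group; the colour and size are arbitrary:

```r
library(ggplot2)

# The data frame from the question (sex and index columns only)
msq <- data.frame(
  sex = c("h", rep("m", 10), "h", rep("m", 3)),
  index = c(67.4, 67.9, 64.5, 66.6, 63.3, 66.7, 71.8, 67.9, 67.8, 72.7,
            70.3, 70.7, 61.1, 50.9, 72.1)
)

# Boxplot from the data, plus one stand-alone point that is not part of it
p <- ggplot(msq, aes(sex, index)) +
  geom_boxplot() +
  annotate("point", x = "m", y = 73.9, colour = "red", size = 3)
p
```

Since 73.9 is never in msq, the box and whiskers are unaffected by it.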

Customizing x-axis of graph

I am using scale_x_discrete() to customize ticks and labels of x-axis.
However, as figure shows, the lines cut the right-side y-axis, which doesn't look good to me. Could you please help me to fix this. The data (temp) is also shown below.
> a = ggplot(data = temp, aes(b, c, group=a,shape=a,colour=a), ordered=TRUE) + geom_line() + geom_point()
> a
> b = a + scale_x_discrete(breaks = c("2","4","8","16","32","64","128"), labels=c("2","4","8","16","32","64","128"))
> temp
a b c
1 One 2 5.1
2 One 4 6.6
3 One 8 7.7
4 One 16 8.4
5 One 32 16.1
6 One 64 38.0
7 One 128 49.2
8 Two 2 5.9
9 Two 4 7.7
10 Two 8 9.2
11 Two 16 10.3
12 Two 32 16.8
13 Two 64 32.4
14 Two 128 45.7
15 Three 2 4.7
16 Three 4 7.0
17 Three 8 8.5
18 Three 16 9.6
19 Three 32 14.8
20 Three 64 31.0
21 Three 128 34.5
22 Four 2 4.3
23 Four 4 6.9
24 Four 8 8.3
25 Four 16 9.1
26 Four 32 14.0
27 Four 64 23.8
Why are you using a discrete scale for something that appears to be continuous?
If you replace scale_x_discrete with scale_x_continuous then this should work as you wish.
b <- a + scale_x_continuous(breaks = 2^(1:7))
b
You might be interested in a transformation to base 2, given the way your data for b appear only to be integer powers of 2.
a + scale_x_continuous(breaks = 2^(1:7), trans = 'log2')
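One caveat: scale_x_continuous() needs b to be numeric. If b is stored as a factor or as text, which would explain why scale_x_discrete() was accepted in the first place, it has to be converted before plotting. A sketch with a few rows in the shape of the question's data; storing b as a factor here is an assumption:

```r
library(ggplot2)

# A few rows shaped like the question's data; b is a factor in this sketch
temp <- data.frame(a = rep(c("One", "Two"), each = 3),
                   b = factor(rep(c(2, 4, 8), times = 2)),
                   c = c(5.1, 6.6, 7.7, 5.9, 7.7, 9.2))

# Go through as.character() to get the labels, not the factor's integer codes
temp$b <- as.numeric(as.character(temp$b))

p <- ggplot(temp, aes(b, c, group = a, shape = a, colour = a)) +
  geom_line() + geom_point() +
  scale_x_continuous(breaks = 2^(1:3))
p
```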
There is also the "expand" argument from the ggplot website. Adjust the numbers to whatever look you are trying to achieve
a + scale_x_discrete(breaks = c("2","4","8","16","32","64","128"),
labels=c("2","4","8","16","32","64","128"),
expand = c(.1,.1))
