ggplot not showing all my data points in plot [duplicate] - r

This question already has answers here:
Visualizing two or more data points where they overlap (ggplot R)
I am trying to visualize my PCA results using ggplot, but the output plot only shows 16 of my 24 samples.
The data frame I created with my PCA data has 24 observations of 24 variables (24 samples, 24 principal components), but ggplot only plots 16 of the 24. Here is my code and a mock data frame.
ggplot(data) +
aes(x=PC1, y=PC2) +
geom_point(size=3) +
coord_fixed() +
theme_bw()
Data frame
PC1 PC2
<dbl> <dbl>
1 -40.8 -20.6
2 -40.6 -19.0
3 -40.8 -20.6
4 8.01 -38.1
5 8.52 -36.3
6 8.01 -38.1
7 -39.7 -6.11
8 -38.1 -5.76
9 -39.7 -6.11
10 18.3 -33.9
11 17.9 -33.3
12 18.3 -33.9
13 -32.9 11.2
14 -31.7 9.49
15 -32.9 11.2
16 50.9 -4.98
17 49.4 -5.64
18 50.9 -4.98
19 -38.7 56.9
20 -38.0 54.9
21 -38.7 56.9
22 74.8 36.3
23 72.8 34.1
24 74.8 36.3

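In your data (as printed), every third row (rows 3, 6, 9, ..., 24) has exactly the same PC1 and PC2 values as the row two above it, so 8 points are drawn exactly on top of others and only 16 distinct positions are visible. You could use geom_count to count overlapping points and scale_size_area to scale the size of the points by that count, like this: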
library(ggplot2)
ggplot(data) +
aes(x=PC1, y=PC2) +
geom_count() +
coord_fixed() +
theme_bw() +
scale_size_area(breaks = c(1,2))
Created on 2023-01-31 with reprex v2.0.2
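To confirm that the missing points are exact overlaps rather than dropped rows, you can count repeated coordinate pairs in base R. This is a minimal sketch that retypes the mock data from the values as printed (the tibble print rounds them, so the real scores may differ slightly); the names pca_scores, is_repeat, n_visible and per_position are illustrative.

```r
# Mock PCA scores typed in from the question's printout (rounded values)
pca_scores <- data.frame(
  PC1 = c(-40.8, -40.6, -40.8, 8.01, 8.52, 8.01, -39.7, -38.1, -39.7,
          18.3, 17.9, 18.3, -32.9, -31.7, -32.9, 50.9, 49.4, 50.9,
          -38.7, -38.0, -38.7, 74.8, 72.8, 74.8),
  PC2 = c(-20.6, -19.0, -20.6, -38.1, -36.3, -38.1, -6.11, -5.76, -6.11,
          -33.9, -33.3, -33.9, 11.2, 9.49, 11.2, -4.98, -5.64, -4.98,
          56.9, 54.9, 56.9, 36.3, 34.1, 36.3)
)

# TRUE for every row whose (PC1, PC2) pair already appeared in an earlier row
is_repeat <- duplicated(pca_scores)

# Number of distinct positions, i.e. the dots you can actually see
n_visible <- nrow(unique(pca_scores))

# Samples per position; this is the count that geom_count() maps to point size
per_position <- table(paste(pca_scores$PC1, pca_scores$PC2))

cat(nrow(pca_scores), "samples,", n_visible, "visible positions,",
    sum(is_repeat), "hidden under another point\n")
```

If you would rather see every sample as its own dot, geom_jitter() (or position_jitter()) nudges overlapping points apart instead of sizing them by count.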

Related

ggplot facet with two variables

I have a data frame with two columns, bloodlevel and sex (F and M only), with 14 males and 11 females.
bloodlevel sex
1 14.9 M
2 12.9 M
3 14.7 M
4 14.7 M
5 14.8 M
6 14.7 M
7 13.9 M
8 14.1 M
9 16.1 M
10 16.1 M
11 15.3 M
12 12.8 M
13 14.0 M
14 14.9 M
15 11.2 F
16 14.5 F
17 12.1 F
18 14.8 F
19 15.2 F
20 11.2 F
21 15.0 F
22 13.2 F
23 14.4 F
24 14.7 F
25 13.2 F
I am trying to create two histograms that differentiate females' and males' blood levels with facet_wrap.
I have tried
ggplot(Physiology, aes(x=sex, y=bloodlevel))+
geom_histogram(binwidth=5, fill="white", color="black")+
facet_wrap(~Physiology)+
xlab("sex")
but I’m getting the error
Error in `combine_vars()`:
! At least one layer must contain all faceting variables: `Physiology`.
* Plot is missing `Physiology`
* Layer 1 is missing `Physiology`
I am trying to facet the histogram by sex to get a plot like this (image not included):
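Is this what you're trying? The formula in facet_wrap() must name a column of the data (here sex), not the data frame itself, and geom_histogram() takes only an x aesthetic:
library(ggplot2)
df <- data.frame(bloodlevel = sample(12:16, 25, TRUE),
sex = sample(c("M", "F"), 25, TRUE))
ggplot(df, aes(x = bloodlevel)) +
geom_histogram() +
facet_wrap(~sex)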
Next time, please provide a working code sample we can run (copying the table you printed doesn't do the trick; pasting the output of dput(Physiology) does).
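If ggplot2 isn't required, base graphics can draw the same pair of histograms. This sketch swaps in a different technique (split() plus hist() inside par(mfrow = c(1, 2))), rebuilding the question's 25 rows; the bin width of 0.5 is an arbitrary choice. For comparison, binwidth = 5 in the question's code is wider than the whole range of the data (about 11 to 16), so it would put almost every value into one or two bins.

```r
# The 25 rows printed in the question: 14 males followed by 11 females
Physiology <- data.frame(
  bloodlevel = c(14.9, 12.9, 14.7, 14.7, 14.8, 14.7, 13.9, 14.1, 16.1, 16.1,
                 15.3, 12.8, 14.0, 14.9,
                 11.2, 14.5, 12.1, 14.8, 15.2, 11.2, 15.0, 13.2, 14.4, 14.7, 13.2),
  sex = c(rep("M", 14), rep("F", 11))
)

# One numeric vector of blood levels per sex (list names are "F" and "M")
by_sex <- split(Physiology$bloodlevel, Physiology$sex)

# Shared breaks so the bins line up between the two panels
brks <- seq(11, 16.5, by = 0.5)

# Two panels side by side, then restore the previous layout
op <- par(mfrow = c(1, 2))
for (s in names(by_sex)) {
  hist(by_sex[[s]], breaks = brks, main = paste("sex =", s), xlab = "bloodlevel")
}
par(op)
```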

ggplot x-axis order different from the data

When I tried to plot a line, the x-axis came out in a different order from my data. This is my data:
Month num temp
1 2016-1-1 61 4.5
2 2016-2-1 50 3.8
3 2016-3-1 51 5.3
4 2016-4-1 48 6.5
5 2016-5-1 49 11.3
6 2016-6-1 48 13.9
7 2016-7-1 50 15.3
8 2016-8-1 48 15.5
9 2016-9-1 52 14.6
10 2016-10-1 54 9.8
11 2016-11-1 69 4.9
12 2016-12-1 80 5.9
13 2017-1-1 59 3.8
14 2017-2-1 52 5.2
15 2017-3-1 51 7.3
16 2017-4-1 47 8.0
17 2017-5-1 50 12.1
18 2017-6-1 47 14.4
and my code was:
ggplot(data = trendsData, aes(x = Month, y = temp, group = 1)) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
but the months came out in the wrong order along the x-axis (screenshot not included).
Could anyone help me fix the ordering? Thanks!
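R can only sort those dates correctly when it knows that they are in fact dates. While Month is a character column, ggplot treats it as a discrete variable and orders the values alphabetically, so "2016-10-1" comes before "2016-2-1".
ymd() from the lubridate package is nice for converting them:
library(lubridate)
trendsData$Month <- ymd(trendsData$Month)
Then your plot should be fine.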
EDIT:
If you want a labelled tick for every month on the x-axis, you can use scale_x_date() like so:
+ scale_x_date(breaks = trendsData$Month)
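A base R sketch of why the conversion matters, using the first year of Month values from the question. as.Date() here is a swapped-in base alternative to lubridate::ymd(), and the names month_chr, alpha_order, month_date and date_order are illustrative.

```r
# Month values from the question's first year, as character strings
month_chr <- c("2016-1-1", "2016-2-1", "2016-3-1", "2016-4-1", "2016-5-1",
               "2016-6-1", "2016-7-1", "2016-8-1", "2016-9-1", "2016-10-1",
               "2016-11-1", "2016-12-1")

# Character sort is alphabetical, so "2016-10-1" is placed before "2016-2-1"
alpha_order <- sort(month_chr)

# Parse the strings as Dates (single-digit months and days are accepted)
month_date <- as.Date(month_chr, format = "%Y-%m-%d")

# Dates sort chronologically; format as year-month for a compact comparison
date_order <- format(sort(month_date), "%Y-%m")

print(alpha_order)
print(date_order)
```

Once the column holds Dates, ggplot2 uses a continuous date scale, so the months plot in calendar order and the group = 1 workaround is no longer needed for geom_line().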

A barplot with two different variables in R

I would like to plot the following data on the same barplot. It is a length-frequency barplot showing the males and females of a population by length class:
I am new to this and I don't know how to put my data here, but here is an example:
Lengthclass Both Males Females
60 7 5 2
70 10 5 5
80 11 6 5
90 4 2 2
100 3 3 0
110 3 0 3
120 1 1 0
130 0 0 0
140 1 0 1
150 2 0 2
If I use barplot(), it does not give me all three variables on the same plot.
I need a graph that looks like this (image not included), but in R.
Thank you :)
classes <- levels(cut(60:100, breaks = c(60,70,80,90,100),
right =FALSE))
my.df <- data.frame(lengthclass = classes,
both = c(7,10,11,4),
male = c(5,5,6,2),
female = c(2,5,5,2))
barplot(t(as.matrix(my.df[, 2:4])),
beside = TRUE,
names.arg = my.df$lengthclass,
legend.text = TRUE,
ylim = c(0,12),
ylab = "number of individuals",
xlab = "Length class (cm)")
Your barplot is known as a "grouped barplot" (in contrast to a "stacked barplot").
Arrange your data in a matrix and use beside=TRUE in your call to barplot(). Here is an example using a built-in dataset:
> VADeaths
Rural Male Rural Female Urban Male Urban Female
50-54 11.7 8.7 15.4 8.4
55-59 18.1 11.7 24.3 13.6
60-64 26.9 20.3 37.0 19.3
65-69 41.0 30.9 54.6 35.1
70-74 66.0 54.3 71.1 50.0
> barplot(VADeaths,beside=TRUE)
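Applied to the full table from the question, the matrix approach might look like this sketch (the names length_counts and bar_x are illustrative). Each row of the matrix becomes one bar within every length class, and the row names are passed to legend.text for the legend.

```r
# All ten length classes from the question: one row per variable, one column per class
length_counts <- rbind(
  Both    = c(7, 10, 11, 4, 3, 3, 1, 0, 1, 2),
  Males   = c(5, 5, 6, 2, 3, 0, 1, 0, 0, 0),
  Females = c(2, 5, 5, 2, 0, 3, 0, 0, 1, 2)
)
colnames(length_counts) <- seq(60, 150, by = 10)

# Sanity check on the input: Both should equal Males + Females in every class
stopifnot(all(length_counts["Both", ] ==
                colSums(length_counts[c("Males", "Females"), ])))

# beside = TRUE puts the three bars of a class next to each other (grouped, not stacked);
# the return value is a 3 x 10 matrix of bar midpoints, one column per length class
bar_x <- barplot(length_counts, beside = TRUE,
                 legend.text = rownames(length_counts),
                 ylab = "number of individuals", xlab = "Length class (cm)")
```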

making a fully labelled scatter plot using R

Can someone show me how to make a fully labelled scatter plot of 2 variables, with axis labels that include units (such as "cm") and a chart title? For example, how would I make such a plot of age and height from the following data in R?
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes
With base graphics:
df <- read.table(header = TRUE, text = "
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes")
lab <- sprintf("%.1fcm, %dyr", df$Height, df$Age)
plot(Age ~ Height, data = df, main = "The Title", pch = 20, xlab = "Height in cm", ylab = "Age in years")
text(x = df$Height, y = df$Age, labels = lab, cex = .7, col = rgb(0, 0, 0, .5), pos = 4)
And with the help of wordcloud::textplot():
if (!require(wordcloud)) {
install.packages("wordcloud")
library(wordcloud)
}
plot(Age ~ Height, data = df, main = "The Title", pch = 20, xlab = "Height in cm", ylab = "Age in years", type = "n")
textplot(x = df$Height, y = df$Age, words = lab, cex = .5, new = FALSE, show.lines = TRUE)
You can use the ggplot2 package. For example:
library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))+
geom_point() +
geom_text()
This snippet takes the mtcars dataset and maps the wt column to x, the mpg column to y and the row names to the labels. geom_point() draws a scatterplot from x and y, and geom_text() places each label at its x, y coordinates.
Check out the help entry on geom_text to see the formatting options.
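The snippet above labels the points but not the axes or the plot. A sketch applying the same idea to the question's variables, with labs() supplying the title and unit-bearing axis labels, might look like this. It uses only the first six rows of the question's data to stay short; the name obs and the title text are illustrative, and the units (cm, years) come from the question's wording rather than from the data.

```r
library(ggplot2)

# First six rows of the question's data (units assumed from the question: cm and years)
obs <- data.frame(
  Age    = c(18L, 17L, 16L, 18L, 6L, 17L),
  Height = c(3.3, 3.4, 2.9, 3.6, 0.7, 1.3)
)

# Label text for each point, built before plotting
obs$lab <- sprintf("%.1f cm, %d yr", obs$Height, obs$Age)

p <- ggplot(obs, aes(x = Height, y = Age, label = lab)) +
  geom_point() +
  # Left-align each label and nudge it slightly to the right of its point
  geom_text(hjust = 0, nudge_x = 0.05, size = 3) +
  # Chart title and axis labels with units
  labs(title = "Age vs. height", x = "Height (cm)", y = "Age (years)")

print(p)
```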
Examples taken from ggplot2 documentation, page 98
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text()
# Change size of the label
p + geom_text(size=10)
p <- p + geom_point()
# Set aesthetics to fixed value
p + geom_text()
p + geom_point() + geom_text(hjust=0, vjust=0)
p + geom_point() + geom_text(angle = 45)
# Add aesthetic mappings
p + geom_text(aes(colour=factor(cyl)))
p + geom_text(aes(colour=factor(cyl))) + scale_colour_discrete(l=40)
p + geom_text(aes(size=wt))
p + geom_text(aes(size=wt)) + scale_size(range=c(3,6))
# You can display expressions by setting parse = TRUE. The
# details of the display are described in ?plotmath, but note that
# geom_text uses strings, not expressions.
p + geom_text(aes(label = paste(wt, "^(", cyl, ")", sep = "")),
parse = TRUE)
# Add an annotation not from a variable source
c <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
c + geom_text(data = NULL, x = 5, y = 30, label = "plot mpg vs. wt")
# Or, you can use annotate
c + annotate("text", label = "plot mpg vs. wt", x = 2, y = 15, size = 8, colour = "red")
# Use qplot instead (note: qplot() is deprecated as of ggplot2 3.4.0)
qplot(wt, mpg, data = mtcars, label = rownames(mtcars),
geom=c("point", "text"))
qplot(wt, mpg, data = mtcars, label = rownames(mtcars), size = wt) +
geom_text(colour = "red")
# You can specify family, fontface and lineheight
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text(fontface=3)
p + geom_text(aes(fontface=am+1))
p + geom_text(aes(family=c("serif", "mono")[am+1]))

Plotting a new point value in a boxplot. R and ggplot2

I have a simple data frame called msq:
sex wing index
1 h 54 67.4
2 m 60.5 67.9
3 m 60 64.5
4 m 59 66.6
5 m 63.5 63.3
6 m 63 66.7
7 m 61.5 71.8
8 m 62 67.9
9 m 63 67.8
10 m 62.5 72.7
11 m 61.5 70.3
12 h 54.5 70.7
13 m 60 61.1
14 m 63.5 50.9
15 m 63 72.1
My intention is to make a boxplot with ggplot, using this code, which works fine:
ggplot(msq, aes("index", index)) + geom_boxplot(aes(group = "sex"))
Then I want to plot an outlier (a value of 73.9) that should stand alone above the box. The problem is that if I include it in the data set, the boxplot "absorbs" it, making the whisker longer. I have looked into Hmisc and stat_summary, but I can't get a clear idea of how to do it.
Thank you.
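You could use geom_point to add the value as its own layer on top of a plot generated with ggplot2. Because the value is not part of the data that geom_boxplot summarizes, it does not lengthen the whiskers.
library(ggplot2)
ggplot(msq, aes(sex, index)) + # Note: I modified the aes call
geom_boxplot() +
geom_point(aes(y = 73.9)) # add points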
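A constant inside aes() is repeated for every row, so the point appears above each box. To show it above one box only, the extra value needs its own data. This is a base R sketch of that idea (boxplot() plus points(), swapped in for ggplot2); the question doesn't say which sex the 73.9 belongs to, so sex = "m" in the hypothetical extra data frame is an assumption.

```r
# msq as printed in the question
msq <- data.frame(
  sex   = c("h", rep("m", 10), "h", rep("m", 3)),
  wing  = c(54, 60.5, 60, 59, 63.5, 63, 61.5, 62, 63, 62.5, 61.5, 54.5, 60, 63.5, 63),
  index = c(67.4, 67.9, 64.5, 66.6, 63.3, 66.7, 71.8, 67.9, 67.8, 72.7,
            70.3, 70.7, 61.1, 50.9, 72.1)
)

# Hypothetical: the extra value is assigned to group "m"
extra <- data.frame(sex = "m", index = 73.9)

# Boxes and whiskers are computed from msq only, so the extra value cannot stretch them;
# ylim is widened so the extra point fits inside the plot region
bp <- boxplot(index ~ sex, data = msq, ylim = range(msq$index, extra$index))

# Boxes sit at x = 1, 2, ... in the order of bp$names; draw the extra value at its group
points(match(extra$sex, bp$names), extra$index, pch = 19, col = "red")
```

In ggplot2 the same idea is to give the point layer the one-row frame, e.g. geom_point(data = extra) after geom_boxplot() in a plot built with aes(sex, index).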
