Removing Labels from Legend in ggplot2 - r

I have the data frame below that I have graphed as shown. How can I limit the values shown in the legend to only the first three? In other words, I want it to only show "A", "B", and "C".
graph_table <- read.table(header=TRUE, text="
names freq rank percs sums sums_str
1 A 1208 'Top 3' 46.1 61.1 61.1%
2 B 289 'Top 3' 11.0 61.1 61.1%
3 C 105 'Top 3' 4.0 61.1 61.1%
4 D 388 D 14.8 14.8 14.8%
5 E 173 E 6.6 6.6 6.6%
6 F 102 F 3.9 3.9 3.9%
7 G 70 G 2.7 2.7 2.7%
8 H 54 H 2.1 2.1 2.1%
9 I 44 I 1.7 1.7 1.7%
10 J 32 J 1.2 1.2 1.2%
11 K 24 K 0.9 0.9 0.9%
12 L 20 L 0.8 0.8 0.8%
13 M 20 M 0.8 0.8 0.8%
14 N 18 N 0.7 0.7 0.7%
15 O 13 O 0.5 0.5 0.5%
16 P 10 P 0.4 0.4 0.4%
17 Q 10 Q 0.4 0.4 0.4%
18 R 10 R 0.4 0.4 0.4%
19 S 7 S 0.3 0.3 0.3%
20 T 5 T 0.2 0.2 0.2%
21 U 5 U 0.2 0.2 0.2%
22 V 5 V 0.2 0.2 0.2%
23 W 3 W 0.1 0.1 0.1%")
library(ggplot2)
p <- ggplot(graph_table[1:10,], aes(x=rank, y=percs,
fill=names))+geom_bar(stat="identity")
p <- p+geom_text(aes(label=sums_str, y=(sums+4)), size=4)
p

I was confused at first, but you want the legend to show only the top three names, so the other names don't need legend entries. Pass the names you want to keep as breaks to the fill scale:
p <- ggplot(graph_table[1:10,], aes(x=rank, y=percs,
fill=names))+geom_bar(stat="identity")
p <- p+geom_text(aes(label=sums_str, y=(sums+4)), size=4)
p + scale_fill_discrete(breaks=c("A","B","C"))
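If you would rather not hard-code the three names, the breaks can be derived from the data. The following is a minimal sketch, assuming a shortened stand-in for graph_table (only five rows and three columns are reproduced here) and assuming the names to keep are exactly those whose rank is "Top 3":

```r
library(ggplot2)

# Shortened, hypothetical stand-in for the question's graph_table.
graph_table <- data.frame(
  names = c("A", "B", "C", "D", "E"),
  rank  = c("Top 3", "Top 3", "Top 3", "D", "E"),
  percs = c(46.1, 11.0, 4.0, 14.8, 6.6)
)

# Legend entries to keep: the names that make up the "Top 3" bar.
top3 <- as.character(graph_table$names[graph_table$rank == "Top 3"])

# All names still receive a fill colour; only the listed breaks
# are drawn in the legend.
p <- ggplot(graph_table, aes(x = rank, y = percs, fill = names)) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(breaks = top3)
p
```

The breaks argument only controls which keys appear in the legend, so the bars for the remaining names are still drawn and coloured.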

Related

How to calculate the Euclidean distance between houses from two sets of coordinates in R

I am trying to calculate the Euclidean distance between house a and x, b and x, and so on, from a table. This is what my data looks like:
df <- data.frame(house=c(letters[1:10],"x"),long=c(11,15,19,18,16,23,25,21,23,29,19),
lat=c(26,29,28,30,26,25,22,24,25,24,25),
location=(c(rep("city", 5),rep("district", 5), "null")))
I have tried to calculate it with the Euclidean distance formula:
euclid<- function(x1,x2, y1,y2) {
euclid= sqrt((x1-x2)^2+(y1-y2)^2)
return(euclid)
}
I am looking for this output:
House long lat **Distance to X**
h 21 24 2.24
c 19 28 3
e 16 26 3.16
f 23 25 4
i 23 25 4
d 18 30 5.1
b 15 29 5.66
g 25 22 6.71
a 11 26 8.06
j 29 24 10.05
How would I loop the formula to the long and lat values?
There's also the dist() function. Note the rownames step is there to make the output more readable:
rownames(df) <- df[['house']]
dist(df[, c('long', 'lat')])
# added round(..., 1) to make this output
a b c d e f g h i j
b 5.0
c 8.2 4.1
d 8.1 3.2 2.2
e 5.0 3.2 3.6 4.5
f 12.0 8.9 5.0 7.1 7.1
g 14.6 12.2 8.5 10.6 9.8 3.6
h 10.2 7.8 4.5 6.7 5.4 2.2 4.5
i 12.0 8.9 5.0 7.1 7.1 0.0 3.6 2.2
j 18.1 14.9 10.8 12.5 13.2 6.1 4.5 8.0 6.1
x 8.1 5.7 3.0 5.1 3.2 4.0 6.7 2.2 4.0 10.0
To get your intended output, you can convert the dist class to a matrix and subset:
as.matrix(dist(df[, c('long', 'lat')]))[11, -11]
a b c d e f g h i j
8.1 5.7 3.0 5.1 3.2 4.0 6.7 2.2 4.0 10.0
df$distance_to_x <- as.matrix(dist(df[, c('long', 'lat')]))[11, ]
df
house long lat location distance_to_x
a a 11 26 city 8.062258
b b 15 29 city 5.656854
c c 19 28 city 3.000000
d d 18 30 city 5.099020
e e 16 26 city 3.162278
f f 23 25 district 4.000000
g g 25 22 district 6.708204
h h 21 24 district 2.236068
i i 23 25 district 4.000000
j j 29 24 district 10.049876
x x 19 25 null 0.000000
If you want to use your own function, as @nicola suggested, with() can be helpful as well:
with(df, euclid(long, long[house =='x'], lat, lat[house == 'x']))
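To get from there to the sorted table shown in the question, the result can be stored, the reference row dropped, and the rows ordered by distance. This is a minimal sketch using the question's data and function; the names ref, result and distance_to_x are just illustrative choices:

```r
# Same sample data as in the question (location column omitted).
df <- data.frame(
  house = c(letters[1:10], "x"),
  long  = c(11, 15, 19, 18, 16, 23, 25, 21, 23, 29, 19),
  lat   = c(26, 29, 28, 30, 26, 25, 22, 24, 25, 24, 25)
)

euclid <- function(x1, x2, y1, y2) sqrt((x1 - x2)^2 + (y1 - y2)^2)

# Coordinates of the reference house "x".
ref <- df[df$house == "x", ]

# Arithmetic in R is vectorized, so no explicit loop is needed:
# the single reference coordinate is recycled against every row.
df$distance_to_x <- euclid(df$long, ref$long, df$lat, ref$lat)

# Drop the reference row, sort nearest first, and round for display.
result <- df[df$house != "x", ]
result <- result[order(result$distance_to_x), ]
result$distance_to_x <- round(result$distance_to_x, 2)
result
```

The first row of result is house h at a distance of 2.24, matching the expected output in the question.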
Besides the dist() approach by @Cole, you can also use outer(), i.e.,
# form complex-valued coordinates
z <- with(df,long + 1i*lat)
# calculate distance between complex numbers
df$distance2x <- as.numeric(abs(outer(z,z,"-"))[which(df$house == "x"),])
such that
> df
house long lat location distance2x
1 a 11 26 city 8.062258
2 b 15 29 city 5.656854
3 c 19 28 city 3.000000
4 d 18 30 city 5.099020
5 e 16 26 city 3.162278
6 f 23 25 district 4.000000
7 g 25 22 district 6.708204
8 h 21 24 district 2.236068
9 i 23 25 district 4.000000
10 j 29 24 district 10.049876
11 x 19 25 null 0.000000
Note: the idea is to form complex-valued coordinates and use abs() over the difference between two houses
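The reason this works is that the modulus of the difference of two complex numbers equals the Euclidean distance between the corresponding points. A small sketch with three of the houses (a, b and x) checks this, and also shows that when only distances to one house are needed, the full outer() matrix could be skipped by subtracting the reference point directly:

```r
# Coordinates of houses a, b and x from the question.
long <- c(11, 15, 19)
lat  <- c(26, 29, 25)

# One complex number per house: real part = long, imaginary part = lat.
z <- complex(real = long, imaginary = lat)

# Distance of every house to the third one (house x).
d_complex <- Mod(z - z[3])

# The same distances from the explicit formula, for comparison.
d_formula <- sqrt((long - long[3])^2 + (lat - lat[3])^2)

all.equal(d_complex, d_formula)
```

Mod() and abs() return the same value for complex input, so either can be used here.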

Does ggplot2 exclude some data?

I want to create some basic grouped bar plots with ggplot2, but it seems to exclude some data. When I review my input data everything is there, but some bars are missing, and the error bars are affected as well. I tried converting the columns to different variable types, regrouping, reloading, and saving everything as .csv and loading it afresh... I just don't know what is wrong.
Here is my code:
library(ggplot2)
limits <- aes(ymax = DataCm$mean + DataCm$sd,
ymin = DataCm$mean - DataCm$sd)
p <- ggplot(data = DataCm, aes(x = factor(DataCm$Zeit), y = factor(DataCm$mean)
) )
p + geom_bar(stat = "identity",
position = position_dodge(0.9),fill =DataCm$group) +
geom_errorbar(limits, position = position_dodge(0.9),
width = 0.25) +
labs(x = "Time [min]", y = "Individuals per foodsource")
This is DataCm:
Zeit mean sd group
1 30 0.1 0.3162278 1
2 60 0.0 0.0000000 2
3 90 0.1 0.3162278 3
4 120 0.0 0.0000000 4
5 150 0.1 0.3162278 5
6 180 0.1 0.3162278 6
7 240 0.3 0.6749486 1
8 300 0.3 0.6749486 2
9 360 0.3 0.6749486 3
10 30 0.1 0.3162278 4
11 60 0.1 0.3162278 5
12 90 0.2 0.4216370 6
13 120 0.3 0.4830459 1
14 150 0.3 0.4830459 2
15 180 0.4 0.5163978 3
16 240 0.3 0.4830459 4
17 300 0.4 0.5163978 5
18 360 0.4 0.5163978 6
19 30 1.2 1.1352924 1
20 60 1.8 1.6865481 2
21 90 2.2 2.0976177 3
22 120 2.2 2.0976177 4
23 150 2.0 1.8856181 5
24 180 2.3 1.9465068 6
25 240 2.4 2.0655911 1
26 300 2.1 1.8529256 2
27 360 2.0 2.1602469 3
28 30 0.2 0.4216370 4
29 60 0.1 0.3162278 5
30 90 0.1 0.3162278 6
31 120 0.1 0.3162278 1
32 150 0.0 0.0000000 2
33 180 0.1 0.3162278 3
34 240 0.1 0.3162278 4
35 300 0.1 0.3162278 5
36 360 0.1 0.3162278 6
37 30 1.3 1.5670212 1
38 60 1.5 1.5811388 2
39 90 1.5 1.7159384 3
40 120 1.5 1.9002924 4
41 150 1.9 2.1317703 5
42 180 1.9 2.1317703 6
43 240 2.2 2.3475756 1
44 300 2.4 2.3190036 2
45 360 2.2 2.1499354 3
46 30 2.1 2.1317703 4
47 60 3.0 2.2110832 5
48 90 3.3 2.1628171 6
49 120 3.2 2.1499354 1
50 150 3.4 2.6331224 2
51 180 3.5 2.4152295 3
52 240 3.7 2.6267851 4
53 300 3.7 2.4060110 5
54 360 3.8 2.6583203 6
The output is:
Maybe you can help me. Thanks in advance!
Best wishes,
Benjamin
Solved it:
I reshaped everything in Excel and exported it another way. The group variable was also not the way I wanted it. Now it is fixed, but I can't really tell you why.
Your data looks malformed. I guess you wanted to have 6 different group values for each time point, but now the group variable just loops over, and you have:
1 30 0.1 0.3162278 1
...
10 30 0.1 0.3162278 4
...
19 30 1.2 1.1352924 1
...
28 30 0.2 0.4216370 4
geom_bar then probably omits rows that have identical mean and time. Although I am not sure why it chooses to do so, you should solve the group problem first anyway.
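One way to check for the group problem is to tabulate time against group: in a layout suited to dodged bars, every combination should occur exactly once. The sketch below assumes a hypothetical tidy version of the data with an explicit series column (that column name and the shortened values are illustrative, not taken from the original file). It also maps a numeric mean to y and puts the fill inside aes(), which is a guess at what the plot was meant to look like rather than a confirmed diagnosis:

```r
library(ggplot2)

# Hypothetical tidy layout: one row per time point and series.
DataCm <- data.frame(
  Zeit   = rep(c(30, 60, 90), times = 2),
  series = rep(c("A", "B"), each = 3),
  mean   = c(0.1, 0.0, 0.1, 1.2, 1.8, 2.2),
  sd     = c(0.32, 0.00, 0.32, 1.14, 1.69, 2.10)
)

# Every Zeit/series combination should appear exactly once;
# counts above 1 mean several bars would share one dodge position.
combos <- table(DataCm$Zeit, DataCm$series)
stopifnot(all(combos == 1))

# Refer to columns by name inside aes() rather than via DataCm$...,
# and keep y numeric so the bar heights are on a continuous scale.
p <- ggplot(DataCm, aes(x = factor(Zeit), y = mean, fill = series)) +
  geom_col(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(0.9), width = 0.25) +
  labs(x = "Time [min]", y = "Individuals per food source")
p
```

Running table(DataCm$Zeit, DataCm$group) on the data from the question would show counts of 3 for some combinations and 0 for others, which is consistent with the recycled group index described above.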

R grouping values in data table for pie chart

I have some statistical data about process quality, presented as a table (result >> % of all cases):
# (df <- read.csv(...)
detection_quality_algo1_pupil <- table(df$pupeuclid1)
detection_quality_algo1_pupil_percent = round(
detection_quality_algo1_pupil[names(detection_quality_algo1_pupil)]
/ nrow(df)
* 100
, digits = 1)
0 - 16.4%
1 - 50.6%
2 - 12.0%
3 - 2.4%
etc.
> detection_quality_algo1_pupil_percent
0 1 2 3 4 5 10 11 12 13 16 17 20 21 22 23 24 25 27 29 30 31 32 33
16.4 50.6 12.0 2.4 0.5 0.6 0.9 0.6 0.3 0.1 0.3 0.1 0.1 0.1 0.1 0.3 0.3 0.1 0.1 0.3 0.1 0.3 0.1 0.1
37 40 43 45 50 53 54 55 56 59 102 104 106 107 112 114 131 132 134 136 138 139 141 142
0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.4 0.1 0.3 0.1 0.1 0.3 0.1 0.1
145 149 150 151 152 153 154 155 156 157 158 160 161 164 166 167 168 169 170 171 173 175 187 191
0.3 0.6 0.1 0.3 0.1 0.5 0.3 0.1 0.1 0.4 0.1 0.1 0.4 0.1 0.1 0.3 0.3 0.3 0.1 0.3 0.1 0.1 0.1 0.1
194 208
0.1 0.1
> pie(detection_quality_algo1_pupil_percent)
My goal is to group all results with a value > 3 into one big group named "> 3" and show the results on a pie chart.
I think it's about applying some filter to the source table...
How can I do this?
Try:
x <- rep(0:5,c(20,50,20,4,4,2))
pie(table(x)) # 3 small groups
pie(table(cut(x, c(-Inf,0:2,Inf),labels=0:3))) # 1 group representing the 3 small groups
And, as @sebpardo notes, pie charts are terrible. Use a barplot instead:
barplot(table(cut(x, c(-Inf,0:2,Inf),labels=0:3)))
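Applied to percentages like those in the question, the same idea could look as follows. This is a sketch under assumptions: pupeuclid1 here is a small made-up stand-in for df$pupeuclid1, and the label "> 3" is taken from the question's wording:

```r
# Made-up stand-in for df$pupeuclid1 (86 values in total).
pupeuclid1 <- c(rep(0, 16), rep(1, 50), rep(2, 12), rep(3, 2),
                4, 5, 10, 131, 149, 208)

# Collapse everything above 3 into a single "> 3" category.
collapsed <- ifelse(pupeuclid1 > 3, "> 3", as.character(pupeuclid1))

# Fix the level order so "> 3" is shown last instead of first.
collapsed <- factor(collapsed, levels = c("0", "1", "2", "3", "> 3"))

# Percentage of all cases per category, rounded to one decimal.
pct <- round(100 * prop.table(table(collapsed)), 1)
pct

barplot(pct, ylab = "% of all cases")
```

Collapsing the raw values before tabulating avoids having to add up already rounded percentages afterwards.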
You could try adding a new 'collapsed' column to your data frame using mutate, e.g. (group stands for whichever column holds the values):
library(dplyr)
df <- mutate(df, new_group = ifelse(group > 3, ">3", group))
I agree with @sebpardo's suggestion in the comment above that there are better ways to visualize this data than a pie chart. Even the help page advises against them (see ?pie):
"Pie charts are a very bad way of displaying information. [...]"

making a fully labelled scatter plot using R

Can someone show me how to make a fully labelled scatter plot for two variables, with axis labels that include units (such as "cm") and a chart title? For example, how would I make a fully labelled scatter plot with all of these features for age and height, using the following data in R?
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes
With base graphics:
df <- read.table(header=T, sep=" ", text="
Yes Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes")
attach(df)
lab <- sprintf("%.1fcm, %dyr", Height, Age)
plot(Age ~ Height, main="The Title", pch=20, xlab="Height in cm", ylab="Age in years")
text(y=Age, x=Height, labels=lab, cex=.7, col=rgb(0,0,0,.5), pos=4)
detach(df)
And with the help of wordcloud::textplot():
if (!require(wordcloud)) {
install.packages("wordcloud")
library(wordcloud)
}
plot(Age ~ Height, main="The Title", pch=20, xlab="Height in cm", ylab="Age in years", type="n")
textplot(y=Age, x=Height, words=lab, cex=.5, new=F, show.lines=T)
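If per-point labels are not needed, the title and axis labels alone cover what the question asks for, and passing the data frame through the data argument avoids attach(). A minimal sketch, assuming the first five rows of the data and assuming the units (years, cm) used in the answer above; the name obs is arbitrary:

```r
# First five rows of the question's data (Age and Height only).
obs <- data.frame(
  Age    = c(18, 17, 16, 18, 6),
  Height = c(3.3, 3.4, 2.9, 3.6, 0.7)
)

# main sets the title; xlab and ylab set the axis labels with units.
plot(Height ~ Age, data = obs,
     main = "Height versus age",
     xlab = "Age (years)",
     ylab = "Height (cm)",
     pch  = 20)
```

The formula Height ~ Age puts Age on the x axis and Height on the y axis; swap the two sides to match the orientation used above.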
You can use the ggplot2 package. For example:
library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))+
geom_point() +
geom_text()
What that code snippet is doing is taking the 'mtcars' dataset, assigning the x variable as the wt column, the y variable as the mpg column, and the labels as the rownames. geom_point adds a scatterplot based on the above x,y, and geom_text places the labels at the x,y coordinates.
Check out the help entry on geom_text to see the formatting options.
Examples taken from ggplot2 documentation, page 98
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text()
# Change size of the label
p + geom_text(size=10)
p <- p + geom_point()
# Set aesthetics to fixed value
p + geom_text()
p + geom_point() + geom_text(hjust=0, vjust=0)
p + geom_point() + geom_text(angle = 45)
# Add aesthetic mappings
p + geom_text(aes(colour=factor(cyl)))
p + geom_text(aes(colour=factor(cyl))) + scale_colour_discrete(l=40)
p + geom_text(aes(size=wt))
p + geom_text(aes(size=wt)) + scale_size(range=c(3,6))
# You can display expressions by setting parse = TRUE. The
# details of the display are described in ?plotmath, but note that
# geom_text uses strings, not expressions.
p + geom_text(aes(label = paste(wt, "^(", cyl, ")", sep = "")),
parse = TRUE)
# Add an annotation not from a variable source
c <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
c + geom_text(data = NULL, x = 5, y = 30, label = "plot mpg vs. wt")
# Or, you can use annotate
c + annotate("text", label = "plot mpg vs. wt", x = 2, y = 15, size = 8, colour = "red")
# Use qplot instead
qplot(wt, mpg, data = mtcars, label = rownames(mtcars),
geom=c("point", "text"))
qplot(wt, mpg, data = mtcars, label = rownames(mtcars), size = wt) +
geom_text(colour = "red")
# You can specify family, fontface and lineheight
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text(fontface=3)
p + geom_text(aes(fontface=am+1))
p + geom_text(aes(family=c("serif", "mono")[am+1]))
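None of the examples above set a title or axis labels. In ggplot2 these can be added with labs(). The following sketch uses the first five rows of the question's Age and Height columns; the units in the labels are assumptions, since the question does not state them:

```r
library(ggplot2)

# First five rows of the question's data (Age and Height only).
obs <- data.frame(
  Age    = c(18, 17, 16, 18, 6),
  Height = c(3.3, 3.4, 2.9, 3.6, 0.7)
)

# labs() sets the chart title and both axis labels in one call.
p <- ggplot(obs, aes(x = Age, y = Height)) +
  geom_point() +
  labs(title = "Height versus age",
       x = "Age (years)",
       y = "Height (cm)")
p
```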

Why does the length function not work correctly in R?

The following R code lists the cars whose Type is Small. But the length function returns 6 instead of 13. Why is that?
> fuel.frame[fuel.frame$Type=="Small",]
row.names Weight Disp. Mileage Fuel Type
1 Eagle.Summit.4 30 0.97 33 3.030303 Small
2 Ford.Escort.4 28 114.00 33 3.030303 Small
3 Ford.Festiva.4 23 0.81 37 2.702703 Small
4 Honda.Civic.4 27 0.91 32 3.125000 Small
5 Mazda.Protege.4 29 113.00 32 3.125000 Small
6 Mercury.Tracer.4 27 0.97 26 3.846154 Small
7 Nissan.Sentra.4 27 0.97 33 3.030303 Small
8 Pontiac.LeMans.4 28 0.98 28 3.571429 Small
9 Subaru.Loyale.4 27 109.00 25 4.000000 Small
10 Subaru.Justy.3 24 0.73 34 2.941176 Small
11 Toyota.Corolla.4 28 0.97 29 3.448276 Small
12 Toyota.Tercel.4 25 0.89 35 2.857143 Small
13 Volkswagen.Jetta.4 28 109.00 26 3.846154 Small
> length(fuel.frame[fuel.frame$Type=="Small",])
[1] 6
length gives in this case the number of columns in the data frame. You can instead use nrow or ncol to get the number of rows or number of columns respectively:
nrow(fuel.frame[fuel.frame$Type=="Small",])
Another example using iris dataset:
> d = head(iris)
> d
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> nrow(d)
[1] 6
> ncol(d)
[1] 5
> dim(d)
[1] 6 5
I thought it might help to explain why you're getting this result. You're asking for the length of the data.frame, not of a vector. A data.frame is a list of columns, and since this one has 6 columns, length returns 6.
This asks for the vector specifically:
length(fuel.frame$Type[fuel.frame$Type=="Small"])
and so does this:
length(fuel.frame[fuel.frame$Type=="Small",][,1])
or use nrow instead of length as already suggested.
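The difference can be reproduced with the built-in iris dataset. This short sketch also shows a third option, summing the logical condition directly, which counts matching rows without subsetting the data frame first:

```r
# Subset of a data frame: still a data frame with all 5 columns.
small <- iris[iris$Species == "setosa", ]

n_cols  <- length(small)   # counts columns, not rows
n_rows  <- nrow(small)     # counts the rows that matched
n_match <- sum(iris$Species == "setosa")  # TRUE counts as 1

c(n_cols, n_rows, n_match)
```

Here length() gives the number of columns, while nrow() and sum() both give the number of setosa rows.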
