Plot multiple lines from dataframe in R

I have some data in a single dataframe. It represents several days' worth of data broken down by age within each day. What I'm looking to do is plot the Value (data points) for each age (y axis) by day (x axis). The frame is set up like this:
Age day Value
1 13 15 139
2 14 15 198
3 15 15 287
4 16 15 404
5 17 15 439
6 18 15 323
7 19 15 255
8 13 16 135
9 14 16 202
10 15 16 309
11 16 16 380
12 17 16 451
13 18 16 366
14 19 16 256
15 13 17 117
16 14 17 208
17 15 17 303
18 16 17 392
19 17 17 410
20 18 17 359
21 19 17 246
Thus, 13 would plot from 139 to 135 to 117 over the three day period. I'm trying to use ggplot2, and am having trouble with the syntax. The end result should plot lines with different color by age.
So far I've tried this:
ggplot(d, aes(x=day, y=Age, color=Value, group=Age)) + geom_line()
But this yields an empty plot and this error message: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
What am I missing?

I'm not quite sure from your wording what you're after, but I think it's this:
ggplot(df, aes(day, Value, group=factor(Age), color=factor(Age))) + geom_line()
This plots Value against day, with a separate line for each Age.
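As a self-contained sketch of that answer, the snippet below rebuilds the data from the question and draws one line per Age. The data frame name `d` is taken from the question (the answer calls it `df`), and the legend title set with labs() is an addition for readability, not part of the original answer. Compared with the question's attempt, Value moves to the y axis and Age becomes the grouping and color variable.

```r
library(ggplot2)

# Data copied from the question; `d` is the name used there.
d <- data.frame(
  Age   = rep(13:19, times = 3),
  day   = rep(15:17, each = 7),
  Value = c(139, 198, 287, 404, 439, 323, 255,
            135, 202, 309, 380, 451, 366, 256,
            117, 208, 303, 392, 410, 359, 246)
)

# Value on the y axis, one line (and one color) per Age.
p <- ggplot(d, aes(x = day, y = Value,
                   group = factor(Age), color = factor(Age))) +
  geom_line() +
  labs(color = "Age")  # legend title; an addition for readability
print(p)
```

Wrapping Age in factor() gives a discrete color scale, so each age gets its own legend entry instead of a continuous gradient.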

Related

Stop Geom-Point from Re-ordering Y axis in R

I am trying to plot points on top of existing lines in ggplot. If I plot just the lines, the y axis is in the order that I'd like it to be in. However, when I try to add the points, the y axis order changes and I can't figure out why.
Here's the code I've been using to produce the plot:
ggplot(sincevax_reshape, aes(x=value,y=factor(record_id), group=factor(record_id))) +
xlab("Days Since Vaccine") + ylab("Participant ID") +
geom_line(data=sincevax_reshape[sincevax_reshape$variable=="Sample",], aes(x=value, y=factor(record_id), group=factor(record_id)), color="darkgrey", size=2) +
geom_point(aes(x=value, y=factor(record_id)))
And here's some reproducible data to play around with:
record_id variable value
6 10 Sample -182
7 11 Sample -233
14 21 Sample -189
16 23 Sample -232
17 24 Sample -214
18 24 Sample 20
19 24 Sample 102
1110 10 Today 177
1111 11 Today 118
1112 13 Today 115
1113 14 Today 62
1114 15 Today 111
1115 16 Today 211
1116 18 Today 120
1117 20 Today 97
1118 21 Today 134
1119 22 Today 15
1120 23 Today 90
1121 24 Today 107

How to use NSE inside fct_reorder() in ggplot2

I would like to know how to use NSE (Non-Standard Evaluation) expression in fct_reorder() in ggplot2 to replicate charts for different data frames.
This is an example of data frame that I use to draw a chart:
travel_time_br30 travel_time_br30_int time_reduction shift not_shift total
1 0-30 0 10 2780 3268 6048
2 0-30 0 20 2779 3269 6048
3 0-30 0 30 2984 3064 6048
4 0-30 0 40 3211 2837 6048
5 30-60 30 10 2139 2007 4146
6 30-60 30 20 2159 1987 4146
7 30-60 30 30 2363 1783 4146
8 30-60 30 40 2478 1668 4146
9 60-90 60 10 764 658 1422
10 60-90 60 20 721 701 1422
11 60-90 60 30 782 640 1422
12 60-90 60 40 801 621 1422
13 90-120 90 10 296 224 520
14 90-120 90 20 302 218 520
15 90-120 90 30 317 203 520
16 90-120 90 40 314 206 520
17 120-150 120 10 12 10 22
18 120-150 120 20 10 12 22
19 120-150 120 30 10 12 22
20 120-150 120 40 13 9 22
21 150-180 150 10 35 21 56
22 150-180 150 20 40 16 56
23 150-180 150 30 40 16 56
24 150-180 150 40 35 21 56
share
1 45.96561
2 45.94907
3 49.33862
4 53.09193
5 51.59190
6 52.07429
7 56.99469
8 59.76845
9 53.72714
10 50.70323
11 54.99297
12 56.32911
13 56.92308
14 58.07692
15 60.96154
16 60.38462
17 54.54545
18 45.45455
19 45.45455
20 59.09091
21 62.50000
22 71.42857
23 71.42857
24 62.50000
These are the scripts to draw a chart from above data frame:
g.var <- "travel_time_br30"
go.var <- "travel_time_br30_int"
test %>% ggplot(.,aes_(x=as.name(x.var),y=as.name("share"),group=as.name(g.var))) +
geom_line(size=1.4, aes(
color=fct_reorder(travel_time_br30,order(travel_time_br30_int))))
As I have several data frames which have different fields, such as access_time_br30 and access_time_br30_int instead of travel_time_br30 and travel_time_br30_int, I set two variables (g.var and go.var) so that the same scripts can easily produce multiple charts.
As I need to reorder the factor group numerically, in particular ordering travel_time_br30 by travel_time_br30_int, I am using the fct_reorder() function inside ggplot(., aes_(...)). However, if I use aes_ with fct_reorder() in geom_line(), as in the following script, it returns an error saying Error: `f` must be a factor (or character vector).
geom_line(size=1.4, aes_(color=fct_reorder(as.name(g.var),order(as.name(go.var)))))
fct_reorder() does not seem to have an NSE version like fct_reorder_().
Is it impossible to use both aes_ and fct_reorder() in a sequence of scripts, or are there any other solutions?
Based on my novice working knowledge of tidy-eval, you could transform your factor order in mutate() before passing the data into ggplot() and achieve your result.
Sorry, I couldn't easily read in your table above because of the line wrapping, so I made a new example based on mtcars that I think captures your intent (let me know if it doesn't).
library(dplyr)    # mutate(), mutate_at(), %>%
library(forcats)  # fct_reorder()
library(ggplot2)
mtcars2 <- mutate(mtcars,
  gear_int = 6 - gear,
  gear_intrev = rev(gear_int)) %>%
  mutate_at(vars(cyl, gear), as.factor)
library(rlang)
gg_reorder <- function(data, col_var, col_order) {
eq_var <- sym(col_var) # sym is flexible and my novice preference
eq_ord <- sym(col_order)
data %>% mutate(!!quo_name(eq_var) := fct_reorder(!!eq_var, !!eq_ord) ) %>%
ggplot(aes_(~mpg, ~hp, color = eq_var)) +
geom_line()
}
And now put it to use plotting...
gg_reorder(mtcars2, "gear", "gear_int")
gg_reorder(mtcars2, "gear", "gear_intrev")
I didn't specify all of the aes_() variables as strings but you could pass those as text and use the as.name() pattern. If you want more tidy-eval patterns Edwin Thoen wrote up a bunch of common cases.
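The same idea can also be sketched with the `.data` pronoun, which lets column names be passed as strings without aes_(). This assumes a reasonably recent dplyr and ggplot2 (3.0.0 or later). The helper names reorder_group() and plot_share() and the column grp_ordered are hypothetical, the x variable is assumed to be time_reduction (the question never defines x.var), and the data is a small subset of the question's table.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical helper: add a factor column `grp_ordered` whose levels follow
# the numeric column named in `go_var` instead of alphabetical order.
reorder_group <- function(data, g_var, go_var) {
  data %>%
    mutate(grp_ordered = fct_reorder(as.character(.data[[g_var]]),
                                     .data[[go_var]]))
}

# Hypothetical helper: reorder first, then plot with column names as strings.
plot_share <- function(data, x_var, g_var, go_var) {
  reorder_group(data, g_var, go_var) %>%
    ggplot(aes(x = .data[[x_var]], y = share,
               group = grp_ordered, color = grp_ordered)) +
    geom_line()
}

# Subset of the question's table (two rows per group).
test <- data.frame(
  travel_time_br30     = rep(c("0-30", "30-60", "90-120", "120-150"), each = 2),
  travel_time_br30_int = rep(c(0, 30, 90, 120), each = 2),
  time_reduction       = rep(c(10, 20), times = 4),
  share = c(45.96561, 45.94907, 51.59190, 52.07429,
            56.92308, 58.07692, 54.54545, 45.45455)
)

ordered <- reorder_group(test, "travel_time_br30", "travel_time_br30_int")
p <- plot_share(test, "time_reduction", "travel_time_br30", "travel_time_br30_int")
print(p)
```

For a data frame with access_time_br30 and access_time_br30_int, only the two string arguments would change.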

Plot histogram by first sorting data and then dividing x values into bins in R

I have a dataset in a given format:
USER.ID avgfrequency
1 3 3.7821782
2 7 14.7500000
3 9 13.4761905
4 13 5.1967213
5 16 6.7812500
6 26 41.7500000
7 49 13.6666667
8 50 7.0000000
9 51 1.0000000
10 52 17.7500000
11 69 4.5000000
12 75 9.9500000
13 91 84.2000000
14 98 8.0185185
15 138 14.2000000
16 139 34.7500000
17 149 7.6666667
18 155 35.3333333
19 167 24.0000000
20 170 7.3529412
21 171 4.4210526
22 175 6.5781250
23 176 19.2857143
24 177 10.4864865
25 178 28.0000000
26 180 4.8461538
27 183 25.5000000
28 184 13.0000000
29 210 32.0000000
30 215 13.4615385
31 220 11.3611111
32 223 26.2500000
I want to first sort the dataset by avgfrequency and then I want to plot count of USER.ID's that fall under different bin categories.
I want to divide avgfrequency into different bin categories of width 10.
I am trying to sort data using:
user_avgfrequency <- user_avgfrequency[order(user_avgfrequency[,1]), ]
but getting an error.
df <- data.frame(USER.ID=c(3,7,9,13,16,26,49,50,51,52,69,75,91,98,138,139,149,155,167,170,171,175,176,177,178,180,183,184,210,215,220,223), avgfrequency=c(3.7821782,14.7500000,13.4761905,5.1967213,6.7812500,41.7500000,13.6666667,7.0000000,1.0000000,17.7500000,4.5000000,9.9500000,84.2000000,8.0185185,14.2000000,34.7500000,7.6666667,35.3333333,24.0000000,7.3529412,4.4210526,6.5781250,19.2857143,10.4864865,28.0000000,4.8461538,25.5000000,13.0000000,32.0000000,13.4615385,11.3611111,26.2500000))
# Bin edges of width 10, from 0 up to the next multiple of 10 above the maximum
breaks <- seq(0, ceiling(max(df$avgfrequency)/10)*10, 10)
# One colour per bin
cols <- colorRampPalette(c('blue','green','red'))(length(breaks)-1)
hist(df$avgfrequency, breaks, col=cols, axes=FALSE, xlab='Average Frequency', ylab='Count')
# Draw the axes manually so the ticks fall on the bin edges and on integer counts
axis(1, breaks)
axis(2, 0:max(tabulate(cut(df$avgfrequency, breaks))))
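For the sorting part of the question, here is a minimal base R sketch. It assumes the goal is to order rows by the avgfrequency column; the question's attempt orders by column 1, which is USER.ID. hist() bins the values itself, so sorting is not required for the plot, but the sorted frame and the per-bin counts can be obtained directly. The names sorted and bin_counts are illustrative.

```r
# Data copied from the question (trailing zeros dropped).
df <- data.frame(
  USER.ID = c(3, 7, 9, 13, 16, 26, 49, 50, 51, 52, 69, 75, 91, 98, 138, 139,
              149, 155, 167, 170, 171, 175, 176, 177, 178, 180, 183, 184,
              210, 215, 220, 223),
  avgfrequency = c(3.7821782, 14.75, 13.4761905, 5.1967213, 6.78125, 41.75,
                   13.6666667, 7, 1, 17.75, 4.5, 9.95, 84.2, 8.0185185, 14.2,
                   34.75, 7.6666667, 35.3333333, 24, 7.3529412, 4.4210526,
                   6.578125, 19.2857143, 10.4864865, 28, 4.8461538, 25.5, 13,
                   32, 13.4615385, 11.3611111, 26.25)
)

# Sort rows by avgfrequency (not by the first column, which is USER.ID).
sorted <- df[order(df$avgfrequency), ]

# Bins of width 10, then the number of users falling into each bin.
breaks <- seq(0, ceiling(max(df$avgfrequency) / 10) * 10, by = 10)
bin_counts <- table(cut(sorted$avgfrequency, breaks))

# Bar chart of the counts per bin.
barplot(bin_counts, xlab = "Average frequency bin", ylab = "Number of users")
```

cut() uses right-closed intervals by default, so a value of exactly 10 falls in the first bin.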

ggplot with data frame columns

I am totally lost with ggplot. I've tried various solutions, but none worked. Using the numbers below, I want to create a line graph with three lines, one each for df$c, df$d, and df$e, with df$a on the x-axis and the cumulative probability on the y-axis, where 95=100%.
a b c d e
1 0 18 0.047368421 0.036842105 0.005263158
2 1 20 0.047368421 0.036842105 0.010526316
13 2 26 0.052631579 0.031578947 0.026315789
20 3 35 0.084210526 0.036842105 0.031578947
22 4 41 0.068421053 0.052631579 0.047368421
24 5 88 0.131578947 0.068421053 0.131578947
26 7 90 0.131578947 0.068421053 0.136842105
27 8 93 0.126315789 0.068421053 0.147368421
28 9 96 0.126315789 0.073684211 0.152631579
3 10 115 0.105263158 0.078947368 0.210526316
4 11 116 0.105263158 0.084210526 0.210526316
5 12 120 0.094736842 0.084210526 0.226315789
6 13 128 0.105263158 0.073684211 0.247368421
7 14 129 0.100000000 0.073684211 0.252631579
8 15 154 0.031578947 0.042105263 0.368421053
9 16 155 0.031578947 0.036842105 0.373684211
10 17 158 0.036842105 0.036842105 0.378947368
11 18 161 0.036842105 0.031578947 0.389473684
12 19 163 0.026315789 0.031578947 0.400000000
14 20 169 0.026315789 0.021052632 0.421052632
15 21 171 0.015789474 0.021052632 0.431578947
16 22 174 0.010526316 0.021052632 0.442105263
17 24 176 0.010526316 0.021052632 0.447368421
18 25 186 0.005263158 0.005263158 0.484210526
19 26 187 0.005263158 0.000000000 0.489473684
21 35 188 0.005263158 0.005263158 0.489473684
23 40 189 0.005263158 0.000000000 0.494736842
25 60 190 0.000000000 0.000000000 0.500000000
I was somewhat successful using base R:
plot(df$a, df$c, type="l",col="red")
lines(df$a, df$d, col="green")
lines(df$a, df$e, col="blue")
You first need to melt your data so that one column designates which variable each value comes from (call it variable) and another column holds the actual value (call it value). Study the example below to fully understand what happens to the variables from the original data.frame that you want to keep constant.
library(reshape2)
# xy is the data frame from the question (called df there)
xymelt <- melt(xy, id.vars = "a")
library(ggplot2)
ggplot(xymelt, aes(x = a, y = value, color = variable)) +
theme_bw() +
geom_line()
ggplot(xymelt, aes(x = a, y = value)) +
theme_bw() +
geom_line() +
facet_wrap(~ variable)
This code also draws the column called "b" from your data, which you did not ask for. You can remove it prior to melting, after melting, prior to plotting... or plot it.
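One way to keep only the three requested columns is to name them in melt()'s measure.vars argument. The sketch below uses the first five rows of the question's data (ordered by a) and assumes the data frame is called df as in the question; the name long is illustrative.

```r
library(reshape2)
library(ggplot2)

# First five rows of the question's data, ordered by a.
df <- data.frame(
  a = 0:4,
  b = c(18, 20, 26, 35, 41),
  c = c(0.047368421, 0.047368421, 0.052631579, 0.084210526, 0.068421053),
  d = c(0.036842105, 0.036842105, 0.031578947, 0.036842105, 0.052631579),
  e = c(0.005263158, 0.010526316, 0.026315789, 0.031578947, 0.047368421)
)

# Melt only c, d and e; column b is left out of the long data.
long <- melt(df, id.vars = "a", measure.vars = c("c", "d", "e"))

# One line per original column.
p <- ggplot(long, aes(x = a, y = value, color = variable)) +
  theme_bw() +
  geom_line()
print(p)
```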

How to make a pie chart that labels only the top n entries

I haven't used pie charts much in R. Is there a way to make a pie chart and only show the top 10 names with percentages?
For example, here's a simple version of my data:
> data
count METRIC_ID
1 8 71
2 2 1035
3 5 1219
4 4 1277
5 1 1322
6 3 1444
7 5 1462
8 17 1720
9 6 2019
10 2 2040
11 1 2413
12 11 2489
13 24 2610
14 29 2737
15 1 2907
16 1 2930
17 2 2992
18 1 2994
19 2 3020
20 4 3045
21 35 3222
22 2 3245
23 5 3306
24 2 3348
25 2 3355
26 2 3381
27 3 3383
28 4 3389
29 6 3404
30 1 3443
31 22 3465
32 3 3558
33 15 3600
34 3 3730
35 6 3750
36 1 3863
37 1 3908
38 5 3913
39 3 3968
40 9 3972
41 2 3978
42 5 4077
43 4 4086
44 3 4124
45 2 4165
46 3 4205
47 8 4206
48 4 4210
49 12 4222
50 4 4228
and I want to see the count of each METRIC_ID's distribution:
pie(data$count, data$METRIC_ID)
But this chart labels every single METRIC_ID, and with over 100 METRIC_IDs it looks like a mess. How can I label only the top n (for example, n=5) METRIC_IDs on the graph, and show the count for those n METRIC_IDs only?
Thank you for your help!!!
To suppress plotting of some labels, set them to NA. Try this:
labls <- data$METRIC_ID
labls[data$count < 3] <- NA
# Pass the labels directly: paste() would turn NA into the string "NA", which pie() would then draw
pie(data$count, labls)
Simply subset your data before creating the pie chart. I'd do something like:
1. Sort your dataset using order.
2. Select the first ten rows.
3. Create the pie chart from the resulting data.
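The three steps above can be sketched as follows. The example uses a handful of rows from the question's data with n = 3, and it labels each slice with its share of the total count in this example data; the names n, top and slice_labels are illustrative.

```r
# A few rows from the question's data.
data <- data.frame(
  count     = c(8, 17, 11, 24, 29, 35, 2, 1),
  METRIC_ID = c(71, 1720, 2489, 2610, 2737, 3222, 1035, 1322)
)

n <- 3

# 1. Sort by count, largest first; 2. keep the first n rows.
top <- head(data[order(-data$count), ], n)

# Labels: METRIC_ID plus its percentage of the total count in `data`.
slice_labels <- sprintf("%s (%.0f%%)", top$METRIC_ID,
                        100 * top$count / sum(data$count))

# 3. Pie chart of the top n entries only.
pie(top$count, labels = slice_labels)
```

Because the remaining rows are dropped, the slices show the top n relative to each other, while the percentages in the labels still refer to the full total.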
Pie charts are not the best way to visualize your data; a search for "pie chart problems" turns up plenty of discussion. I'd go for something like:
library(ggplot2)
# dat is the data frame from the question (called data there)
dat = dat[order(-dat$count),]
dat = within(dat, {METRIC_ID = factor(METRIC_ID, levels = METRIC_ID)})
ggplot(dat, aes(x = METRIC_ID, y = count)) + geom_point()
Here I just plot all the data, which I think still leads to a readable graph. This graph is more formally known as a dotplot, and is heavily used in Cleveland's graphics book. Here the height is linked to count, which is much easier to interpret than linking count to the fraction of the area of a circle, as in the case of the pie chart.
Find a better type of chart for your data.
Here is a possibility to create the chart you want:
data2 <- data[data$count %in% tail(sort(data$count),5),]
pie(data2$count, data2$METRIC_ID)
Slightly better:
data3 <- data2
data3$METRIC_ID <- as.character(data3$METRIC_ID)
data3 <- rbind(data3,data.frame(count=sum(data[! data$count %in% tail(sort(data$count),5),"count"]),METRIC_ID="others"))
pie(data3$count, data3$METRIC_ID)
