I would like to know how to use NSE (Non-Standard Evaluation) expression in fct_reorder() in ggplot2 to replicate charts for different data frames.
This is an example of data frame that I use to draw a chart:
travel_time_br30 travel_time_br30_int time_reduction shift not_shift total
1 0-30 0 10 2780 3268 6048
2 0-30 0 20 2779 3269 6048
3 0-30 0 30 2984 3064 6048
4 0-30 0 40 3211 2837 6048
5 30-60 30 10 2139 2007 4146
6 30-60 30 20 2159 1987 4146
7 30-60 30 30 2363 1783 4146
8 30-60 30 40 2478 1668 4146
9 60-90 60 10 764 658 1422
10 60-90 60 20 721 701 1422
11 60-90 60 30 782 640 1422
12 60-90 60 40 801 621 1422
13 90-120 90 10 296 224 520
14 90-120 90 20 302 218 520
15 90-120 90 30 317 203 520
16 90-120 90 40 314 206 520
17 120-150 120 10 12 10 22
18 120-150 120 20 10 12 22
19 120-150 120 30 10 12 22
20 120-150 120 40 13 9 22
21 150-180 150 10 35 21 56
22 150-180 150 20 40 16 56
23 150-180 150 30 40 16 56
24 150-180 150 40 35 21 56
share
1 45.96561
2 45.94907
3 49.33862
4 53.09193
5 51.59190
6 52.07429
7 56.99469
8 59.76845
9 53.72714
10 50.70323
11 54.99297
12 56.32911
13 56.92308
14 58.07692
15 60.96154
16 60.38462
17 54.54545
18 45.45455
19 45.45455
20 59.09091
21 62.50000
22 71.42857
23 71.42857
24 62.50000
These are the scripts to draw a chart from above data frame:
g.var <- "travel_time_br30"
go.var <- "travel_time_br30_int"
test %>% ggplot(.,aes_(x=as.name(x.var),y=as.name("share"),group=as.name(g.var))) +
geom_line(size=1.4, aes(
color=fct_reorder(travel_time_br30,order(travel_time_br30_int))))
As I have several data frames which has different fields such as access_time_br30, access_time_br30_int instead of travel_time_br30 and travel_time_br30_int in the data frame, I set two variables (g.var and go.var) to easily replicate multiple chars in the same scripts.
As I need to reorder the factor group numerically, in particular, changing order of travel_time_br30 by travel_time_br30_int, I am using fct_reorder function in ggplot2(., aes_(...)). However, if I use aes_ with fct_reorder() in geom_line() as shown as an example in the following script, it returns an error saying Error:fmust be a factor (or character vector).
geom_line(size=1.4, aes_(color=fct_reorder(as.name(g.var),order(as.name(go.var)))))
Fct_reorder() does not seem to have an NSE version like fct_reorder_().
Is it impossible to use both aes_ and fct_reorder() in a sequence of scripts or are there any other solutions?
Based on my novice working knowledge of tidy-eval, you could transform your factor order in mutate() before passing the data into ggplot() and acheive your result.
Sorry I couldn't easily read in your table above, because of the line return so I made a new example off of mtcars that I think captures your intent. (let me know if it doesn't)
mtcars2 <- mutate(mtcars,
gear_int = 6 - gear,
gear_intrev = rev(gear_int)) %>%
mutate_at(vars(cyl, gear), as.factor)
library(rlang)
gg_reorder <- function(data, col_var, col_order) {
eq_var <- sym(col_var) # sym is flexible and my novice preference
eq_ord <- sym(col_order)
data %>% mutate(!!quo_name(eq_var) := fct_reorder(!!eq_var, !!eq_ord) ) %>%
ggplot(aes_(~mpg, ~hp, color = eq_var)) +
geom_line()
}
And now put it to use plotting...
gg_reorder(mtcars2, "gear", "gear_int")
gg_reorder(mtcars2, "gear", "gear_intrev")
I didn't specify all of the aes_() variables as strings but you could pass those as text and use the as.name() pattern. If you want more tidy-eval patterns Edwin Thoen wrote up a bunch of common cases.
Related
I am using a code based on Deseq2. One of my goals is to plot a heatmap of data.
heatmap.data <- counts(dds)[topGenes,]
The error I am getting is
Error in counts(dds)[topGenes, ]: subscript out of bounds
the first few line sof my counts(dds) function looks like this.
99h1 99h2 99h3 99h4 wth1 wth2
ENSDARG00000000002 243 196 187 117 91 96
ENSDARG00000000018 42 55 53 32 48 48
ENSDARG00000000019 91 91 108 64 95 94
ENSDARG00000000068 3 10 10 10 30 21
ENSDARG00000000069 55 47 43 53 51 30
ENSDARG00000000086 46 26 36 18 37 29
ENSDARG00000000103 301 289 289 199 347 386
ENSDARG00000000151 18 19 17 14 22 19
ENSDARG00000000161 16 17 9 19 10 20
ENSDARG00000000175 10 9 10 6 16 12
ENSDARG00000000183 12 8 15 11 8 9
ENSDARG00000000189 16 17 13 10 13 21
ENSDARG00000000212 227 208 259 234 78 69
ENSDARG00000000229 68 72 95 44 71 64
ENSDARG00000000241 71 92 67 76 88 74
ENSDARG00000000324 11 9 6 2 8 9
ENSDARG00000000370 12 5 7 8 0 5
ENSDARG00000000394 390 356 339 283 313 286
ENSDARG00000000423 0 0 2 2 7 1
ENSDARG00000000442 1 1 0 0 1 1
ENSDARG00000000472 16 8 3 5 7 8
ENSDARG00000000476 2 1 2 4 6 3
ENSDARG00000000489 221 203 169 144 84 114
ENSDARG00000000503 133 118 139 89 91 112
ENSDARG00000000529 31 25 17 26 15 24
ENSDARG00000000540 25 17 17 10 28 19
ENSDARG00000000542 15 9 9 6 15 12
How do I ensure all the elements of the top genes are present in it?
When I try to see 20 top genes in the dataset. it looks like a list of genes
6339" "12416" "1241" "3025" "12791" "846" "15090"
[8] "6529" "14564" "4863" "12777" "1122" "7454" "13716"
[15] "5790" "3328" "1231" "13734" "2797" "9072" with the column head V1.
I have used both
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = TRUE)
and
topGenes <- read.table("E://mir99h50 Cheng data//topGenesresordered.txt",header = FALSE)
to see if the out of bounds error is removed. However it was of no use. I guess the V1 head is causing the issue.
The top genes function has been generated using the above code snippet.
resordered <- res[order(res$padj),]
#Reorder gene list by increasing pAdj
resordered <- as.data.frame(res[order(res$padj),])
#Filter for genes that are differentially expressed with an FDR < 0.01
ii <- which(res$padj < 0.01)
length(ii)
# Use the rownames() function to get the top 20 differentially expressed genes from our results table
topGenes <- rownames(resordered[1:20,])
topGenes
# Get the counts from the DESeqDataSet using the counts() function
heatmap.data <- counts(dds)[topGenes,]
Perhaps this will do what you want?
counts_dds <- counts(dds)
topgenes <- c("ENSDARG00000000002", "ENSDARG00000000489", "ENSDARG00000000503",
"ENSDARG00000000540", "ENSDARG00000000529", "ENSDARG00000000542")
heatmap.data <- counts_dds[rownames(counts_dds) %in% topgenes,]
If you provide more information it will be easier to advise you on how to fix your problem.
I have a dataset in a given format:
USER.ID avgfrequency
1 3 3.7821782
2 7 14.7500000
3 9 13.4761905
4 13 5.1967213
5 16 6.7812500
6 26 41.7500000
7 49 13.6666667
8 50 7.0000000
9 51 1.0000000
10 52 17.7500000
11 69 4.5000000
12 75 9.9500000
13 91 84.2000000
14 98 8.0185185
15 138 14.2000000
16 139 34.7500000
17 149 7.6666667
18 155 35.3333333
19 167 24.0000000
20 170 7.3529412
21 171 4.4210526
22 175 6.5781250
23 176 19.2857143
24 177 10.4864865
25 178 28.0000000
26 180 4.8461538
27 183 25.5000000
28 184 13.0000000
29 210 32.0000000
30 215 13.4615385
31 220 11.3611111
32 223 26.2500000
I want to first sort the dataset by avgfrequency and then I want to plot count of USER.ID's that fall under different bin categories.
I want to divide avgfrequency into different bin categories of width 10.
I am trying to sort data using:
user_avgfrequency <- user_avgfrequency[order(user_avgfrequency[,1]), ]
but getting an error.
df <- data.frame(USER.ID=c(3,7,9,13,16,26,49,50,51,52,69,75,91,98,138,139,149,155,167,170,171,175,176,177,178,180,183,184,210,215,220,223), avgfrequency=c(3.7821782,14.7500000,13.4761905,5.1967213,6.7812500,41.7500000,13.6666667,7.0000000,1.0000000,17.7500000,4.5000000,9.9500000,84.2000000,8.0185185,14.2000000,34.7500000,7.6666667,35.3333333,24.0000000,7.3529412,4.4210526,6.5781250,19.2857143,10.4864865,28.0000000,4.8461538,25.5000000,13.0000000,32.0000000,13.4615385,11.3611111,26.2500000) );
breaks <- seq(0,ceiling(max(df$avgfrequency)/10)*10,10);
cols <- colorRampPalette(c('blue','green','red'))(length(breaks)-1);
hist(df$avgfrequency,breaks,col=cols,axes=F,xlab='Average Frequency',ylab='Count');
axis(1,breaks);
axis(2,0:max(tabulate(cut(df$avgfrequency,breaks))));
I am using the package trees found here, by #jbaums and explained in this post.
My data are the following:
the tree is composed by
the trunk
Trunk
[1] 13.60415
and the branches
Tree
TreeBranchLength TreeBranchID
1 10.004269 1
2 7.994269 2
3 9.028834 11
4 10.817401 12
5 8.551311 111
6 10.599798 112
7 11.073243 121
8 13.367392 122
9 9.625431 1111
10 10.793569 1112
11 9.896499 11121
12 8.687741 11122
13 7.791180 1211
14 12.506105 1212
15 6.768478 1221
16 10.441796 1222
17 10.751892 1121
18 9.458651 1122
19 10.768509 11221
20 10.150673 11222
21 12.377448 111211
22 12.235136 111212
23 9.074079 11211
24 9.996334 11212
25 9.807019 112221
26 10.895809 112222
27 6.741274 1122211
28 15.841272 1122212
29 5.753920 11222111
30 8.846389 11222112
31 11.925961 112111
32 9.780776 112112
33 8.207965 12221
34 10.079375 12222
the 50 squirrel populations -
Populations
PopulationPositionOnBranch PopulationBranchID ID
1 10.6321655 112111 1
2 1.0644897 1 2
3 3.9315473 1 3
4 1.0310244 0 4
5 9.1768846 0 5
6 13.4267181 0 6
7 7.9461528 0 7
8 6.0533401 121 8
9 2.1227425 121 9
10 1.8256787 121 10
11 4.7332588 11222112 11
12 4.4837432 11222112 12
13 4.6200834 11222112 13
14 2.5622276 1221 14
15 1.2446683 1221 15
16 7.0674052 111 16
17 1.3854674 111 17
18 4.8735635 111 18
19 9.5007998 1222 19
20 6.6373468 1222 20
21 12.6757728 122 21
22 4.2685465 122 22
23 3.9806540 2 23
24 3.1025403 2 24
25 3.9119065 11122 25
26 1.5527653 11122 26
27 1.6687957 11122 27
28 8.0697456 1122 28
29 6.7871391 1122 29
30 9.8050713 111212 30
31 8.5226920 111212 31
32 3.6113379 111212 32
33 7.3184965 111211 33
34 8.6142984 111211 34
35 1.3550870 1211 35
36 8.3650639 12 36
37 4.6411446 112112 37
38 3.2985541 112112 38
39 12.2344148 1212 39
40 9.0290776 1212 40
41 1.3900249 1121 41
42 0.9261425 1122212 42
43 15.2522199 1122212 43
44 4.0253771 12222 44
45 8.7507678 11222 45
46 4.6289841 1122211 46
47 9.1799522 112 47
48 5.1293838 12221 48
49 1.1543080 12221 49
50 10.1014837 112222 50
the code to produce the plot
g <- germinate(list(trunk.height=Trunk,
branches=Tree$TreeBranchID,
lengths=Tree$TreeBranchLength),
left='1', right='2', angle=30))
xy <- squirrels(g, Populations$PopulationBranchID, pos=Populations$PopulationPositionOnBranch,
left='1', right='2', pch=21, bg='white', cex=3, lwd=2)
text(xy$x, xy$y, labels=seq_len(nrow(xy)), font=1)
, which produces
As you can see on the plot bellow population 43 (blue arrow) is out of the tree.. It seems that the length of the branches on the plot do not correspond to the data. For example the branch (left green arrow) on which are populations 38 and 37 is longer than the one where population 43 is (right green arrow), that is not the case in the data. What am I doing wrong? Have I understood correctly how to use trees?
On studying the germinate function it seems to me that the Tree values that you are passing to it needs to be sorted on TreeBranchId field in the ascending order.
The BranchID: 1122212 where you have placed 43 is not the actual 1122212 branch.
Due to the order in which you have fed the values in the Tree, the function is somehow messing the location of branch.
I was curious to see if I increase the length of Branch ID: 1122212, will it change the branch where 43 is placed, and guess what? it didn't. The branch which actually showed an increase in length was the branch where you have placed 37 and 38.
So this hint pointed out that something was wrong with germinate function. On further debugging I was able to make it work using the below code.
Tree<-read.csv("treeBranch.csv")
Tree<-Tree[order(Tree$TreeBranchID),]
g <- germinate(list(trunk.height=15,
branches=Tree$TreeBranchID,
lengths=Tree$TreeBranchLength),
left='1', right='2', angle=30)
xy <- squirrels(g, Populations$PopulationBranchID,pos=Populations$PopulationPositionOnBranch,
left='1', right='2', pch=21, bg='white', cex=3, lwd=2)
text(xy$x, xy$y, labels=seq_len(nrow(xy)), font=1)
I have following data and trying change CCG and Pract to numbers so I can use stan or Winbugs...when I try to change it seems its changing the order of the data..
I want to change CCG and Pract to numbers without changing the order of the data...I tried hard but I couldn't do it.
I am struggling with this basic issue than writing Bugs codes....please help..
I have the following data
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
I tried to change to integer using as.integer() and I am getting I am getting..
CCG pract Deno Numer Points Excep
1 20 6621 160 144 41 36
2 20 6594 130 117 41 18
3 20 6698 179 164 41 36
4 20 6640 57 46 41 25
5 20 6633 214 191 41 62
6 20 6605 137 119 41 62
By checking Deno and Numer it is clear the order of the data has been changed...Why CCG is not starting from 1?
I want
CCG pract Deno Numer Points Excep
1 01C N81049 49 46 4 4
2 01C N81022 28 26 4 23
3 01C N81632 66 64 4 4
4 01C N81069 15 14 4 3
5 01C N81062 98 89 4 9
6 01C N81033 31 28 4 9
change to something like this
CCG pract Deno Numer Points Excep
1 1 1 49 46 4 4
2 1 1 28 26 4 23
3 1 1 66 64 4 4
4 1 1 15 14 4 3
5 1 1 98 89 4 9
6 1 1 31 28 4 9
Please help me..
In R, factors are internally represented as integers, linking to a table of the factor levels. AFAIK, these internal integers are assigned based on a lexicographic order of the factor levels, so 57 gets a higher code than 238.
as.integer() will extract this internal integer coding. As you found out, this is not very useful. (I honestly don't understand why R does this when applying as.integer() to factors that have integers as factor levels.)
Solution: first convert to character, then to integer. as.integer(as.character(Deno))
I haven't been using pie graph a lot in r, is there a way to make a pie graph and only show the top 10 names with percentage?
For example, here's a simple version of my data:
> data
count METRIC_ID
1 8 71
2 2 1035
3 5 1219
4 4 1277
5 1 1322
6 3 1444
7 5 1462
8 17 1720
9 6 2019
10 2 2040
11 1 2413
12 11 2489
13 24 2610
14 29 2737
15 1 2907
16 1 2930
17 2 2992
18 1 2994
19 2 3020
20 4 3045
21 35 3222
22 2 3245
23 5 3306
24 2 3348
25 2 3355
26 2 3381
27 3 3383
28 4 3389
29 6 3404
30 1 3443
31 22 3465
32 3 3558
33 15 3600
34 3 3730
35 6 3750
36 1 3863
37 1 3908
38 5 3913
39 3 3968
40 9 3972
41 2 3978
42 5 4077
43 4 4086
44 3 4124
45 2 4165
46 3 4205
47 8 4206
48 4 4210
49 12 4222
50 4 4228
and I want to see the count of each METRIC_ID's distribution:
pie(data$count, data$METRIC_ID)
But this Chart marks every single METRIC_ID on the graph, when I have over 100 METRIC_ID, it looks like a mess. How can I only mark the top n (for example, n=5) METRIC_ID on the graph, and show the count of that n METRIC_ID only?
Thank you for your help!!!
To suppress plotting of some labels, set them to NA. Try this:
labls <- data$METRIC_ID
labls[data$count < 3] <- NA
pie(data$count, paste(labls))
Simply subset your data before creating the piechart. I'd do somehting like:
Sort your datasets using order.
Select the first ten rows.
Create the pie chart from the resulting data.
Pie charts are not the best way to visualize your data, just google pie chart problems, e.g. this link. I'd go for something like:
library(ggplot2)
dat = dat[order(-dat$count),]
dat = within(dat, {METRIC_ID = factor(METRIC_ID, levels = METRIC_ID)})
ggplot(dat, aes(x = METRIC_ID, y = count)) + geom_point()
Here I just plot all the data, which I think still leads to a readable graph. This graph is more formally known as a dotplot, and is heavily used in the graphics book of Cleveland. Here the height is linked to count, which is much easier to interpret that linking count to the fraction of the area of a circle, as in the case of the piechart.
Find a better type of chart for your data.
Here is a possibility to create the chart you want:
data2 <- data[data$count %in% tail(sort(data$count),5),]
pie(data2$count, data2$METRIC_ID)
Slightly better:
data3 <- data2
data3$METRIC_ID <- as.character(data3$METRIC_ID)
data3 <- rbind(data3,data.frame(count=sum(data[! data$count %in% tail(sort(data$count),5),"count"]),METRIC_ID="others"))
pie(data3$count, data3$METRIC_ID)