R | ggplot2: unordered stacked bar graph [duplicate]

I have a data set that looks like this:
samp.data <- structure(list(Track = c(1,1,1,1,1,1,1,1,2,2,2),
Base = c("A","C","B","A","D","D","C","A","A","B","B"),
Length = c(1,1,1,1,2,3,1,1,1,1,1)),
.Names = c("Track", "Base", "Length"), class = "data.frame",row.names = c(NA, 11L))
# Track Base Length
# 1 1 A 1
# 2 1 C 1
# 3 1 B 1
# 4 1 A 1
# 5 1 D 2
# 6 1 D 3
# 7 1 C 1
# 8 1 A 1
# 9 2 A 1
# 10 2 B 1
# 11 2 B 1
I am trying to plot an unordered stacked bar, with Tracks on the x axis and Length on the y axis. In other words, the bar graph wouldn't group the A bases together and plot it as one length of 1+1+1+1=4. It would plot each base in order. First it would plot the A base of length 1 in Track 1, C base of length 1 above that, B base of length 1 above that, A base of length 1 above that, D base of length 2 above that, and so on.
Below is a crude ASCII diagram of what I am trying to describe:
    |      C
 L  |      Y
 e  |      Y        Key
 n  |      R        A = Red
 g  |  B   B        B = Blue
 t  |  B   G        C = Green
 h  |  R   R        D = Yellow
    ----------
       2   1
       Track
Sorry if the explanation is a little confusing. Thank you for your help!
Edit: This question is different from the possible duplicate, because I would like to ungroup the stacked sections.

Just use geom_bar(stat='identity'), set your x to Track and your y to Length, and it all works out.
Note: I converted your Base to a factor (makes sense), as well as your Track (this also makes sense to me, but if you wish to keep it numeric that's fine; you may then wish to add + scale_x_discrete() in order to have your tracks show up as whole numbers on the x axis).
library(ggplot2)
samp.data$Base <- factor(samp.data$Base)
samp.data$Track <- factor(samp.data$Track)
ggplot(samp.data, aes(x = Track, y = Length, fill = Base)) +
  geom_bar(stat = 'identity') +
  scale_fill_manual(values = c('red', 'blue', 'green', 'yellow'))
The last line sets the colours as you please.
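If you wish to reverse the x axis order (so that your track 2 appears first) while Track is a factor, add + scale_x_discrete(limits = rev(levels(samp.data$Track))). + scale_x_reverse() is a continuous scale, so it only applies if Track is kept numeric.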
I do not know what you mean by "ungroup the base" in your question, but say you wanted to draw an outline around each "chunk" of DNA you could add (e.g.) colour="black" in the geom_bar (e.g. in track 1, there is a D of length 2 immediately followed by a D of length 3 so it's drawn as a big D of length 5 - adding colour="black" outlines the 2-chunk separately to the 3-chunk though they still have the same colour).
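Depending on the ggplot2 version, stacked bars may come out grouped by fill level rather than in row order, so the approach above is not guaranteed to keep the segments ungrouped. A minimal sketch that sidesteps this, assuming it is acceptable to compute the segment positions by hand: take a running total of Length within each Track and draw the segments with geom_rect(). The names samp, ymin and ymax and the bar half-width of 0.45 are illustrative choices, not part of the original answer.

```r
# Same values as samp.data in the question; Track is kept numeric here.
samp <- data.frame(
  Track  = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  Base   = c("A", "C", "B", "A", "D", "D", "C", "A", "A", "B", "B"),
  Length = c(1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1)
)

# Top edge of each segment: running total of Length within its Track,
# taken in row order, so equal bases are never merged or regrouped.
samp$ymax <- ave(samp$Length, samp$Track, FUN = cumsum)
samp$ymin <- samp$ymax - samp$Length

# Draw only if ggplot2 is installed (assumption: it may not be).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(samp) +
    geom_rect(aes(xmin = Track - 0.45, xmax = Track + 0.45,
                  ymin = ymin, ymax = ymax, fill = Base),
              colour = "black") +
    scale_x_continuous(breaks = unique(samp$Track)) +
    scale_fill_manual(values = c(A = "red", B = "blue",
                                 C = "green", D = "yellow")) +
    labs(x = "Track", y = "Length")
  print(p)
}
```

Each row keeps its own rectangle, so the two consecutive D segments in track 1 stay separate pieces of lengths 2 and 3.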

Related

How can I fix colors for the numbers in a matrix

I am trying to make multiple visualizations in which each number in a matrix must get a certain (fixed) color in an image.
Because I cannot find a way to assign a color to a fixed number, this is causing me more trouble than I expected.
The problem shows in the following examples.
Say we define the following colors to be associated with the following numbers:
cols <- c(
'0' = "#FFFFFF",
'1' = "#99FF66",
'2' = "#66FF33",
'3' = "#33CC00",
'4' = "#009900"
)
Now if we visualise the following matrix, all seems good:
d<-read.table(text="
0 1 0 3
3 2 1 4
4 1 0 2
3 3 0 1")
image(as.matrix(d), col=cols)
However, if I visualise the following matrix, the problem becomes clear:
d<-read.table(text="
1 1 1 3
3 2 1 4
4 1 2 2
3 3 2 1")
image(as.matrix(d), col=cols)
We should be skipping white ("#FFFFFF") as the number 0 is not present. However, R uses white ("#FFFFFF") anyway and associates it with the number 1, skipping "#009900" instead.
For the consistency of my visualizations it is rather important that colors remain associated with the same numbers for all images, so how can I implement this?
Remove the color values that are not present in your matrix:
image(as.matrix(d), col=cols[names(cols)%in%unlist(d)])
unlist works only on lists, as the name suggests.
If d is already a matrix, simply use c(d).
Thanks to Andre's advice I can solve it in a rather neat fashion
d<-as.matrix(read.table(text="
1 1 1 3
3 2 1 4
4 1 1 2
3 3 1 1"))
cols <- c(
'0' = "#FFFFFF",
'1' = "#99FF66",
'2' = "#66FF33",
'3' = "#33CC00",
'4' = "#009900"
)
image(as.matrix(d), col= cols[ names(cols) %in% d ])
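Subsetting the palette relies on the values that are present being evenly spaced, because by default image() cuts the data range into equal-width intervals, one per colour. A hedged alternative, assuming the matrices only ever hold the integers 0 to 4: pass explicit breaks so that each value always lands in the same bin, whatever else is in the matrix. The names brks and used are illustrative.

```r
cols <- c("0" = "#FFFFFF", "1" = "#99FF66", "2" = "#66FF33",
          "3" = "#33CC00", "4" = "#009900")

# One bin per integer value: breaks at -0.5, 0.5, ..., 4.5
# (image() needs exactly one more break than colours).
brks <- seq(-0.5, 4.5, by = 1)

d <- as.matrix(read.table(text = "
1 1 1 3
3 2 1 4
4 1 2 2
3 3 2 1"))

# With explicit breaks the colour of a value no longer depends on
# which other values happen to occur in the matrix.
image(d, col = unname(cols), breaks = brks)

# Sanity check of the mapping implied by the breaks:
# which palette entries do the values in d actually use?
used <- cols[findInterval(sort(unique(c(d))), brks)]
used
```

Here used should contain the entries named 1 to 4 and not the white reserved for 0.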

Identify the largest point within the radius of another point?

Use this example data to see what I mean
tag <- as.character(c(1,2,3,4,5,6,7,8,9,10))
species <- c("A","A","A","A","B","B","B","C","C","D")
size <- c(0.10,0.20,0.25,0.30,0.30,0.15,0.15,0.20,0.15,0.15)
radius <- (size*40)
x <- c(9,4,25,14,28,19,9,22,10,2)
y <- c(36,7,15,16,22,24,39,20,34,9)
data <- data.frame(tag, species, size, radius, x, y)
# Plot the points with ggplot2 (part of the tidyverse)
library(ggplot2)
ggplot(data, aes(x, y)) +
  geom_point(aes(colour = species, size = size))
Now that you can see the plot, what I want to do is for each individual “species A” point, I’d like to identify the largest point within a radius of size*40.
For example, in the bottom left of the plot you can see that species A (tag 2) would produce a radius large enough to contain the close species D point.
However, the species A point on the far right-hand-side of the plot (tag 3) would produce a radius large enough to contain both of the close species B and species C points, in which case I’d want some sort of output that identifies the largest individual within the species A radius.
I’d like to know what I can run (if anything) on this data set to find the largest “within radius” point for each species A point and get an output like this:
Species A point ---- Largest point within radius
Species A tag 1 ----- Species C tag 9
Species A tag 2 ----- Species D tag 10
Species A tag 3 ----- Species B tag 5
Species A tag 4 ----- Species C tag 8
I've used spatstat and the CTFS package to make some plots in the past, but I can't figure out how to "find largest neighbor within radius". Perhaps I can tackle this in ArcMap? Also, this is just a small example dataset. Realistically, I will want to find the "largest neighbor within radius" for thousands of points.
Any help or feedback would be greatly appreciated.
The following finds, for every point, the largest point of a different species within its radius and reports the species and tag of both.
all_df <- data  # avoid working on a variable called data
res_df <- data.frame()
for (j in seq_len(nrow(all_df))) {
  # candidates: every point of a different species
  df <- all_df[all_df$species != all_df$species[j], ]
  # index of points within the radius of point j
  ind <- which((df$x - all_df$x[j])^2 + (df$y - all_df$y[j])^2 < all_df$radius[j]^2)
  # skip points that have no neighbour inside their radius
  if (length(ind) == 0) next
  # find the max `size` among those points
  max_size <- max(df$size[ind])
  # all positions in ind with max_size
  max_inds <- which(df$size[ind] == max_size)
  # pick the last one if there is more than one with max_size
  new_ind <- ind[max_inds[length(max_inds)]]
  # collect the result in a data.frame
  res_df <- rbind(res_df, data.frame(org_sp = all_df$species[j],
                                     org_tag = all_df$tag[j],
                                     res_sp = df$species[new_ind],
                                     res_tag = df$tag[new_ind]))
}
res_df
# org_sp org_tag res_sp res_tag
# 1 A 1 C 9
# 2 A 2 D 10
# 3 A 3 B 5
# 4 A 4 C 8
# 5 B 5 A 3
# 6 B 6 C 8
# 7 B 7 C 9
# 8 C 8 B 5
# 9 C 9 B 7
# 10 D 10 A 2
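Since the question mentions thousands of points, here is a minimal sketch of the same search without an explicit loop over rows, assuming the full pairwise distance matrix fits in memory (reasonable for a few thousand points, not for much more). It uses only base R; the names pts, candidate, size_mat and best are illustrative, and ties are broken by taking the last point, as in the loop above.

```r
pts <- data.frame(
  tag     = as.character(1:10),
  species = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D"),
  size    = c(0.10, 0.20, 0.25, 0.30, 0.30, 0.15, 0.15, 0.20, 0.15, 0.15),
  x       = c(9, 4, 25, 14, 28, 19, 9, 22, 10, 2),
  y       = c(36, 7, 15, 16, 22, 24, 39, 20, 34, 9)
)
pts$radius <- pts$size * 40
n <- nrow(pts)

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts[, c("x", "y")]))

# candidate[i, j] is TRUE when point j is of a different species and lies
# inside the radius of point i (the radius vector recycles down the rows)
candidate <- d < pts$radius & outer(pts$species, pts$species, "!=")

# size_mat[i, j] holds the size of point j; non-candidates are blanked out
size_mat <- matrix(pts$size, n, n, byrow = TRUE)
size_mat[!candidate] <- NA

# For each row: index of the largest candidate, last one on ties,
# or NA when no point lies within the radius
best <- apply(size_mat, 1, function(s) {
  if (all(is.na(s))) return(NA_integer_)
  max(which(s == max(s, na.rm = TRUE)))
})

res <- data.frame(org_sp = pts$species, org_tag = pts$tag,
                  res_sp = pts$species[best], res_tag = pts$tag[best])
res
```

For much larger data sets a spatial index would probably be a better fit than a dense distance matrix.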

How can I create a ggplot based on equal values in one column in a loop?

I would like to write code for several stacked ggplots using the following data.frame. Each plot should refer to a specific net number.
(The original data.frame contains 20 volume classes (VCL) for every Net number, here displayed are only the values for Net 1).
It should be possible to apply the code to other data.frames with a different amount of nets.
> f.perr.p
VCl simtype Net perr
1 V1 F 1 2.413043e-03
3 V1 F 3 1.000000e-03
5 V1 F 5 2.173913e-04
14 V2 F 1 2.673913e-03
16 V2 F 3 1.130435e-03
18 V2 F 5 3.043478e-04
...
261 V1 nF 1 4.195652e-03
263 V1 nF 3 4.152174e-03
265 V1 nF 5 3.760870e-03
274 V2 nF 1 4.260870e-03
276 V2 nF 3 4.304348e-03
278 V2 nF 5 4.021739e-03
...
So, my approach would be to create a for-loop, telling R to create a plot with the rows that share the same value in column "Net", then doing the same for the next net number, until I have a stacked plot of, in this particular case, 3 plots referring to Net 1, Net 2 and Net 3.
I started with this, but I have no clue how to proceed:
x<-which(f.perr.p$VCl == "V1" & f.perr.p$simtype == "F") # to identify how many plots should be created
> x
[1] 1 2 3
# VCl = x-axis,
# simtype = colorfill,
# Net= plot.nr,
# perr=y-axis
op <- par(mfrow = c(length(x), 1))
for (i in 1:length(x)) {
  ggplot(f.perr.p, aes(factor(VCl), perr, fill = simtype)) +
    geom_bar(stat = "identity", position = "dodge", colour = "black")
}
par(op)
I know that this code can't work at all, but I cannot figure out how to make clear that with each run of the loop only the rows with a specific net number should be considered.
What do I have to include and where?
I hope I expressed my problem in a clear way. Your help is very much appreciated, thank you for your time!
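One possible direction, offered as a sketch rather than a definitive answer: par(mfrow = ...) only affects base graphics, so it does not arrange ggplot2 output. Splitting the data by Net gives one subset per net regardless of how many nets there are, and facet_wrap() stacks one panel per net in a single column. The data frame below is a hypothetical stand-in built only from the rows shown in the question; by_net, plots and p are illustrative names.

```r
# Hypothetical stand-in for f.perr.p (only the rows visible in the question)
f.perr.p <- data.frame(
  VCl     = rep(rep(c("V1", "V2"), each = 3), times = 2),
  simtype = rep(c("F", "nF"), each = 6),
  Net     = rep(c(1, 3, 5), times = 4),
  perr    = c(2.413043e-03, 1.000000e-03, 2.173913e-04,
              2.673913e-03, 1.130435e-03, 3.043478e-04,
              4.195652e-03, 4.152174e-03, 3.760870e-03,
              4.260870e-03, 4.304348e-03, 4.021739e-03)
)

# One data subset per net number, however many nets there are
by_net <- split(f.perr.p, f.perr.p$Net)

# Plot only if ggplot2 is installed (assumption: it may not be)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Loop-style variant: one separate plot object per net
  plots <- lapply(by_net, function(d) {
    ggplot(d, aes(factor(VCl), perr, fill = simtype)) +
      geom_bar(stat = "identity", position = "dodge", colour = "black") +
      ggtitle(paste("Net", d$Net[1]))
  })

  # Facet variant: a single plot with one panel per net, stacked vertically
  p <- ggplot(f.perr.p, aes(factor(VCl), perr, fill = simtype)) +
    geom_bar(stat = "identity", position = "dodge", colour = "black") +
    facet_wrap(~ Net, ncol = 1)
  print(p)
}
```

Inside a for-loop a ggplot object is not drawn automatically, so each plot in the loop-style variant would need an explicit print().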

Stata - Preserving encoded variable and stacked graphing

These data represent ice cream preferences where individuals can change these preferences over time
id time flavor_str flavor_enc
1 1 C 1
1 2 C 1
1 3 V 2
2 1 S 3
2 2 V 2
2 3 C 1
3 1 V 2
4 1 C 1
4 2 V 2
Note: flavor_enc is showing a number, but in Stata it would show the string name in blue, which represents the number
Two issues.
When I create a variable from the encoded variable, for example
g initial_pref = 0
replace initial_pref = flavor_enc if time == 1
OR
bysort id: egen max_pref = max(flavor_enc)
the new variable (initial_pref or max_pref) takes on the encoded numeric value; however, I would like to keep it in the same format as flavor_enc.
I then want to create a stacked bar chart with flavor on the x-axis and frequency on the y-axis. The chart would have one piece of the bar that represents the number of times a given flavor was someone's initial preference, a second piece that represents the number of times that flavor was someone's second preference (they switched from their initial, 0 otherwise), and a last piece representing the number of times a flavor was their third preference.
For these data the chart would use these inputs.
C as initial = 2
V as initial = 1
S as initial = 1
C as second = 0
V as second = 3
S as second = 0
C as third = 1
V as third = 0
S as third = 0
I tried graph bar with the stacking option but that did not work. I also could see how to do this outside of Stata but was hoping Stata had the functionality.
The wording is not completely clear to me, but I believe the first issue can be managed with clonevar:
clonevar initial_pref2 = flavor_enc
replace initial_pref2 = 0 if time != 1
Regarding your latest comment (and edit), if you want to compute the maximum and still use clonevar, it is possible:
clonevar max_pref2 = flavor_enc
bysort id (max_pref2): replace max_pref2 = max_pref2[_N]
If you have missings in flavor_enc, adjustments are necessary.
An alternative solution involves extracting the data attributes from the original variable using extended macro functions (help extended_fcn), and assigning them to the new variable.
One way to tackle the graph issue is as follows:
clear
set more off
*----- example data -----
input ///
id time str1 flavor_str flavor
1 1 C 1
1 2 C 1
1 3 V 2
2 3 C 1
2 1 S 3
2 2 V 2
3 1 V 2
4 2 V 2
4 1 C 1
end
drop flavor_str
sort id time
list, sepby(id)
*----- bar graph -----
quietly tabulate time, gen(tt)
collapse (sum) tt*, by(flavor)
label define lblflavor 1 "flavor 1" 2 "flavor 2" 3 "flavor 3"
label values flavor lblflavor
graph bar (asis) tt*, over(flavor) stack ///
ylabel(none) blabel(bar, position(center)) legend(off)
But for sure there is a better way. I seldom use these so my experience is minimal.
I can't say much about its appropriateness except that for this example, it seems like an awful waste of space.
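The chart inputs listed in the question count each flavor by preference order (first, second, third distinct flavor per person) rather than by time period. Since the question mentions that this could also be done outside Stata, here is a minimal base R sketch of that tabulation, assuming a person never returns to a flavor they already had (repeats are simply dropped). The names prefs, first_seen, rank and counts are illustrative.

```r
prefs <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 4, 4),
  time   = c(1, 2, 3, 1, 2, 3, 1, 1, 2),
  flavor = c("C", "C", "V", "S", "V", "C", "V", "C", "V")
)
prefs <- prefs[order(prefs$id, prefs$time), ]

# Keep the first time each person chose each flavor; repeated choices drop out
first_seen <- prefs[!duplicated(prefs[, c("id", "flavor")]), ]

# Preference order within each person: 1 = initial, 2 = second, 3 = third
first_seen$rank <- ave(first_seen$time, first_seen$id, FUN = seq_along)

# Rows are flavors, columns are preference ranks
counts <- table(flavor = first_seen$flavor, rank = first_seen$rank)
counts

# One bar per flavor, stacked by preference rank
barplot(t(counts), xlab = "flavor", ylab = "frequency",
        legend.text = c("initial", "second", "third"))
```

For the example data this reproduces the inputs given in the question: C is the initial preference twice, V is the second preference three times, and C is the third preference once.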

Function that group values of a list (in R)

I am trying to construct a function which shouldn't be hard in terms of programming, but I am having some difficulty conceptualizing it. I hope you'll be able to understand my problem better than I do!
I'd like a function that takes a single list of vectors as argument. Something like
arg1 = list(c(1,2), c(2,3), c(5,6), c(1,3), c(4,6), c(6,7), c(7,5), c(5,8))
The function should output a matrix with two columns (or a list of two vectors or something like that) where one column contains letters and the other numbers. One can think of the argument as a list of the positions/values that should be placed in the same group. If in the list there is the vector c(5,6), then the output should contain somewhere the same letters next to the values 5 and 6 in the number column. If there are the three following vectors c(1,2), c(2,3) and c(1,3), then the output should contain somewhere the same letters next to the value 1, 2 and 3 in the number column.
Therefore if we enter the object arg1 in the function it should return:
myFun(arg1)
number_column letters_column
1 A
2 A
3 A
5 B
6 B
7 B
4 C
6 C
5 D
8 D
(The order is not important. The letter E should not be used before the letter D has been used.)
Therefore the function has constructed 2 groups of 3 (A:[1,2,3] and B:[5,6,7]) and 2 groups of 2 (C:[4,6] and D:[5,8]). Note that one position or number can be in several groups.
Please let me know if something is unclear in my question! Thanks!
As I wrote in the comments, it appears that you want a data frame that lists the maximal cliques of a graph given a list of vectors that define the edges.
library(igraph)
## create a matrix where each row is an edge
argmatrix <- do.call(rbind, arg1)
## create an igraph object from the matrix of edges
## (graph_from_edgelist is the current name of graph.edgelist)
gph <- graph_from_edgelist(argmatrix, directed = FALSE)
## returns a list of the maximal cliques of the graph
## (max_cliques is the current name of maximal.cliques)
mxc <- max_cliques(gph)
## creates a data frame of the output
dat <- data.frame(number_column = unlist(mxc),
                  group_column = rep.int(seq_along(mxc), times = sapply(mxc, length)))
## converts group numbers to letters
## ONLY USE if max(dat$group_column) <= 26
dat$group_column <- LETTERS[dat$group_column]
# number_column group_column
# 1 5 A
# 2 8 A
# 3 5 B
# 4 6 B
# 5 7 B
# 6 4 C
# 7 6 C
# 8 3 D
# 9 1 D
# 10 2 D
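If igraph is not available, the same maximal cliques can be found in base R with the Bron-Kerbosch algorithm. The following is a minimal sketch of the basic version without pivoting, assuming the values are positive integers that can be used directly as vertex numbers; bron_kerbosch, adj and cliques are names chosen here, not part of the answer above. An integer that appears in no pair would come out as a group of its own.

```r
arg1 <- list(c(1, 2), c(2, 3), c(5, 6), c(1, 3),
             c(4, 6), c(6, 7), c(7, 5), c(5, 8))

# Symmetric adjacency matrix built from the pairs
edges <- do.call(rbind, arg1)
n <- max(edges)
adj <- matrix(FALSE, n, n)
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE

# Bron-Kerbosch without pivoting:
# R = clique built so far, P = vertices that may still extend it,
# X = vertices already handled (used to reject non-maximal cliques)
cliques <- list()
bron_kerbosch <- function(R, P, X) {
  if (length(P) == 0 && length(X) == 0) {
    cliques[[length(cliques) + 1]] <<- sort(R)
    return(invisible(NULL))
  }
  for (v in P) {
    nb <- which(adj[v, ])
    bron_kerbosch(c(R, v), intersect(P, nb), intersect(X, nb))
    P <- setdiff(P, v)
    X <- c(X, v)
  }
  invisible(NULL)
}
bron_kerbosch(integer(0), seq_len(n), integer(0))

# One row per (number, group) pair; letters are assigned in discovery order
dat <- data.frame(
  number_column  = unlist(cliques),
  letters_column = rep(LETTERS[seq_along(cliques)], times = lengths(cliques))
)
dat
```

As with the igraph version, the letter assignment only works for up to 26 groups.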
