Hello, how can I neatly show which points correspond to which sample type ("soy yoghurt", "oat yoghurt" or "activia")?
Here is the code I used to generate the plot:
color_type = rep("white", length(sample_type))
color_type[soy_yoghurt] = "brown"
color_type[oat_yoghurt] = "blue"
color_type[activia] = "gold"
pch_type = rep(NA, length(sample_type))
pch_type[all_yoghurt] = 21 # Circle symbols
# run the NMDS algorithm (metaMDS() comes from the vegan package)
library(vegan)
mds = metaMDS(bray_dist)
#plot the results
par(mar=c(5,5,2,2), xpd = TRUE)
plot(main= "Ordination of milk-products",
mds$points[,1], mds$points[,2], cex = 3, pch = pch_type,
col = "black", bg = color_type, xlab = "NMDS1", ylab = "NMDS2")
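One common way to do this is a legend() whose pt.bg colours match the bg colours used for the points. The sketch below is self-contained, so it uses random coordinates (`pts`) and a hypothetical `sample_type` vector in place of mds$points and the real sample labels; the name-based colour lookup (`type_cols`) is also an illustrative choice, not part of the original code.

```r
set.seed(42)
# Hypothetical stand-ins for the real data: 15 samples, 5 of each product,
# with random 2-D coordinates in place of mds$points
sample_type <- rep(c("soy yoghurt", "oat yoghurt", "activia"), each = 5)
pts <- matrix(rnorm(30), ncol = 2)

# One fill colour per product, looked up by product name
type_cols <- c("soy yoghurt" = "brown", "oat yoghurt" = "blue", "activia" = "gold")
point_bg <- type_cols[sample_type]

par(mar = c(5, 5, 2, 2), xpd = TRUE)
plot(pts[, 1], pts[, 2], main = "Ordination of milk-products",
     pch = 21, cex = 3, col = "black", bg = point_bg,
     xlab = "NMDS1", ylab = "NMDS2")

# Same symbol (pch = 21) and the same fill colours as the points
lg <- legend("topright", legend = names(type_cols), pch = 21,
             pt.bg = type_cols, col = "black", pt.cex = 2, bty = "n")
```

Because xpd = TRUE is already set, a negative inset (for example inset = c(-0.3, 0) together with a larger right margin) can move the legend outside the plotting region if it overlaps the points.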
I am trying to find the duplicated entries across the elements of a list in R.
Here is my example list:
examplelist <- list(a = c("blue", "red", "yellow"),
b = c("red", "black", "green"),
c = c("black", "green", "brown"))
What I would like to get as a result:
duplicates: c("red", "black", "green")
vector of all entries, without double entries: c("blue", "red", "yellow", "black", "green", "brown")
The only function I found was duplicated(), but it compares the list elements as whole vectors rather than the individual entries.
Thank you for your help :)
You can unlist first:
unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
# b1 c1 c2
# "red" "black" "green"
unlisted[!duplicated(unlisted)]
# a1 a2 a3 b2 b3 c3
# "blue" "red" "yellow" "black" "green" "brown"
If you only want the vector (without the names), use unname:
unlisted <- unname(unlist(examplelist))
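A small variation on the same idea: unique() gives the de-duplicated vector directly, and wrapping the duplicated entries in unique() ensures a value that appears three or more times is only reported once.

```r
examplelist <- list(a = c("blue", "red", "yellow"),
                    b = c("red", "black", "green"),
                    c = c("black", "green", "brown"))

values <- unname(unlist(examplelist))

# unique() is equivalent to values[!duplicated(values)]
all_values <- unique(values)

# unique() here guards against values that occur more than twice
dup_values <- unique(values[duplicated(values)])

all_values  # "blue" "red" "yellow" "black" "green" "brown"
dup_values  # "red" "black" "green"
```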
I am trying to plot survival curves with ten different colours for ten lines; they show the survival probability for each of ten regions.
The problem is that the default palette has only 8 colours, so even with col=1:10 the last two colours repeat, which makes the result hard to interpret.
Can I use the rainbow() function here?
library(survival)
pdf("E:/survplot.pdf", width=10, height=10)
plot(survfit(survobj ~ strata(region), data=rfdata, type="kaplan-meier", conf.int=FALSE),
     mark.time=FALSE, col=1:10, xlim=c(0,400), ylim=c(0,1),
     ylab="Probability", xlab="Days on Lot", main="Survival Probability")
legend("topright", legend = c(1:10), lty = 1, col = 1:10, title = "Regions")
dev.off()
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
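rainbow(10) returns ten distinct colours, so passing that vector to both plot() and legend() avoids recycling the 8-colour palette. The sketch below is only an illustration: `rfdata`, `days` and `status` are simulated stand-ins for the real data, and the legend labels are taken from names(fit$strata) so they follow the same order as the curves.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(1)
# Hypothetical stand-in for rfdata: 10 regions, 30 animals each, all events observed
rfdata <- data.frame(region = rep(1:10, each = 30))
rfdata$days <- rexp(nrow(rfdata), rate = rfdata$region / 1000)
rfdata$status <- 1

fit <- survfit(Surv(days, status) ~ region, data = rfdata)

# Ten distinct hues, one per stratum, instead of indices into palette()
cols <- rainbow(10)

plot(fit, col = cols, mark.time = FALSE, conf.int = FALSE,
     ylab = "Probability", xlab = "Days on Lot", main = "Survival Probability")
# Strata names look like "region=1"; strip the prefix for the legend labels
legend("topright", legend = sub("region=", "", names(fit$strata)),
       lty = 1, col = cols, title = "Regions")
```

Alternatively, palette(rainbow(10)) replaces the active palette so that col = 1:10 works as written, and hcl.colors(10) (R 3.6.0 and later) offers palettes that are often easier to tell apart than rainbow().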
I have a matrix in the following format:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "blue" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[2,] "green" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[3,] "yellow" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[4,] "red" "red" "blue" "blue" "blue" "red" "green" "blue" "blue"
[5,] "blue" "red" "green" "blue" "blue" "red" "green" "blue" "blue"
[6,] "green" "red" "green" "blue" "blue" "red" "green" "blue" "blue"
...
How can I quickly calculate the most frequent colour and its count for each row?
For instance, for row 1 the result would be "blue, 6". I am currently doing this with an apply() call that calls table().
However, my matrix has 1.9 million rows so it takes too long. How can I vectorize this?
How many different values can each cell of the matrix take? Is it just the four in your example? If so, something like the following may be faster:
dat <- structure(c("blue", "green", "yellow", "red", "blue", "green",
"red", "red", "red", "red", "red", "red", "red", "red", "blue",
"blue", "blue", "blue", "green", "green", "red", "blue", "blue",
"blue", "blue", "blue", "blue", "red", "blue", "blue", "blue",
"blue", "blue", "blue", "blue", "red", "red", "red", "red", "red",
"red", "blue", "green", "green", "green", "green", "green", "green",
"blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue",
"blue", "blue", "blue", "blue", "blue", "blue", "green"), .Dim = c(7L,
9L))
values <- c("blue", "red", "green", "yellow")
counts <- vapply(values, function(value) rowSums(dat == value),
numeric(nrow(dat))) # Thanks to @RichardScriven for the improvement :)
counts
# blue red green yellow
# [1,] 6 2 1 0
# [2,] 5 2 2 0
# [3,] 5 2 1 1
# [4,] 5 3 1 0
# [5,] 5 2 2 0
# [6,] 4 2 3 0
# [7,] 4 4 1 0
# ties.method = "first" makes ties deterministic (the default, "random", is not)
max.value.col <- max.col(counts, ties.method = "first")
max.value <- colnames(counts)[max.value.col]
max.counts <- counts[cbind(1:nrow(counts), max.value.col)]
paste(max.value, max.counts, sep = ", ")
# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4" "blue, 4"
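If the set of colours is not known in advance, it can be taken from the data itself with unique(). This sketch reuses the six-row `mat` from the question; the only change from the approach above is that `values` is computed rather than hard-coded.

```r
mat <- matrix(c("blue","red","blue","blue","blue","red","green","blue","blue",
                "green","red","blue","blue","blue","red","green","blue","blue",
                "yellow","red","blue","blue","blue","red","green","blue","blue",
                "red","red","blue","blue","blue","red","green","blue","blue",
                "blue","red","green","blue","blue","red","green","blue","blue",
                "green","red","green","blue","blue","red","green","blue","blue"),
              ncol = 9, byrow = TRUE)

# Every distinct colour that occurs anywhere in the matrix
values <- unique(as.vector(mat))

# One column of counts per colour; rowSums() on a logical matrix is vectorised
counts <- vapply(values, function(v) rowSums(mat == v), numeric(nrow(mat)))

# Deterministic tie-breaking: the first colour in `values` wins
best <- max.col(counts, ties.method = "first")
result <- paste(colnames(counts)[best],
                counts[cbind(seq_len(nrow(counts)), best)], sep = ", ")
result
```

The loop in vapply() runs once per distinct colour rather than once per row, so it stays cheap for 1.9 million rows as long as the number of colours is small.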
If you want the names of all tied colours, the following works, though it may be slow (I am not sure how apply() performs in this case):
max.value.all.cols <- counts == counts[cbind(1:nrow(counts), max.value.col)]
paste(
apply(max.value.all.cols, 1, function(r) paste(colnames(counts)[r], collapse = ", ")),
max.counts, sep = ", ")
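One way to collect tied colours without calling a function on every row is to find all maximal cells at once with which(arr.ind = TRUE) and then group only those matches by row. This is a sketch on a small hand-built `counts` matrix (three rows, the last one tied between blue and red), not a benchmarked replacement.

```r
# Small counts matrix in the same layout as above (rows x colours)
counts <- matrix(c(6, 5, 4,   # blue
                   2, 2, 4,   # red
                   1, 2, 1,   # green
                   0, 0, 0),  # yellow
                 nrow = 3,
                 dimnames = list(NULL, c("blue", "red", "green", "yellow")))

# Row maxima via max.col(), then a logical matrix marking every tied maximum
row_max <- counts[cbind(seq_len(nrow(counts)), max.col(counts, ties.method = "first"))]
is_max <- counts == row_max  # row_max recycles down each column, i.e. per row

# (row, col) positions of all maxima; group the colour names by row
idx <- which(is_max, arr.ind = TRUE)
tied <- tapply(colnames(counts)[idx[, "col"]],
               factor(idx[, "row"], levels = seq_len(nrow(counts))),
               paste, collapse = ", ")

tie_result <- paste(tied, row_max, sep = ", ")
tie_result
```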
Here's a data.table solution. It uses data.table's fast .N to count how often each colour occurs in each row.
library(data.table)
flip <- data.table(t(mat))
tally <- lapply(names(flip),
function(x) {
setnames(flip[, .N, by=eval(x)][order(-N)][1,],
c('clr', 'N')) } )
do.call(rbind, tally)
# clr N
# 1: blue 6
# 2: blue 5
# 3: blue 5
# 4: blue 5
# 5: blue 5
# 6: blue 4
I transpose the matrix, then count the values in each column (i.e. in each row of the original matrix). The setnames() step is only needed so that the results can be conveniently collapsed with rbind; if you are happy to get the results as a list, you can skip it.
I used the same data as others:
mat <-
matrix(c( "blue","red","blue","blue","blue","red","green","blue","blue",
"green","red","blue","blue","blue","red","green","blue","blue",
"yellow","red","blue","blue","blue","red","green","blue","blue",
"red","red","blue","blue","blue","red","green","blue","blue",
"blue","red","green","blue","blue","red","green","blue","blue",
"green","red","green","blue","blue","red","green","blue","blue"),
ncol = 9, byrow = TRUE)
I am plotting boxplots of fish biomass by reef name, ordered by median biomass. Every reef (site) is either inside or outside an MPA, i.e. MPA == "1" or MPA == "0". Currently all the boxes are green.
How can I show MPA == "0" sites in blue and MPA == "1" sites in green, for example, while keeping the boxes ordered by biomass?
MPA <- factor(Fish$MPA)
bymedian <- with(Fish, reorder(ReefName, log10(Biomassm+1), median))
boxplot(log10(Biomassm+1) ~ bymedian, data = Fish,
xlab = "ReefName", ylab = "Biomassm",
main = "Biomassm in Caribbean", varwidth = TRUE,
col=(c("darkgreen")), las=3, cex.axis=0.3)
Thank you
It might be a better idea to use the ggplot2 package for this. Your code would then look like this:
library(ggplot2)
ggplot(data=Fish, aes(x=reorder(ReefName, log10(Biomassm+1), median), y=Biomassm, fill=factor(MPA))) +
geom_boxplot() +
scale_y_log10("Biomassm") +
xlab("ReefName") +
scale_fill_manual(values=c("blue", "green")) +
ggtitle("Biomassm in Caribbean")
Here's a set of boxplots coloured depending on the value of MPA:
# generate some data
set.seed(1)
X = matrix(rnorm(100), ncol=10)
# order by median
X = X[,order(apply(X, 2, median))]
# some fake MPA values
MPA = round(runif(n=10, min=0, max=1))
# generate boxplots and check if MPA==1
boxplot(X, col=ifelse(test=MPA==1, yes='green', no='blue'))
# add legend
legend(x='bottomleft', fill=c('green','blue'), legend=c('MPA=1', 'MPA=0'), inset=c(0.01))
The output of ifelse is a vector of colours according to the MPA values and these are used to colour the boxes:
[1] "blue" "blue" "green" "blue" "blue" "green" "green" "blue" "blue" "green"
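In the question's data, MPA is stored per observation rather than per box, so the colours have to be looked up in the same order as the reordered reef levels. The sketch below simulates a hypothetical `Fish` data frame (8 reefs, 10 surveys each) to show that step with match(); the column names follow the question, but the values are invented.

```r
set.seed(1)
# Hypothetical stand-in for Fish: each reef is either inside (1) or outside (0) an MPA
reefs <- data.frame(ReefName = paste0("Reef", 1:8), MPA = rep(c(0, 1), 4))
Fish <- reefs[rep(1:8, each = 10), ]
Fish$Biomassm <- rexp(nrow(Fish), rate = 1 / (50 + 100 * Fish$MPA))

# Order reefs by median log biomass (median must be passed inside reorder())
bymedian <- with(Fish, reorder(ReefName, log10(Biomassm + 1), median))

# One MPA value per reef, in the same order as the boxes
mpa_by_reef <- Fish$MPA[match(levels(bymedian), Fish$ReefName)]
box_cols <- ifelse(mpa_by_reef == 1, "green", "blue")

boxplot(log10(Biomassm + 1) ~ bymedian, data = Fish, col = box_cols, las = 3,
        xlab = "ReefName", ylab = "log10(Biomassm + 1)",
        main = "Biomassm in Caribbean")
legend("topleft", fill = c("green", "blue"), legend = c("MPA=1", "MPA=0"))
```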