Perhaps a dumb question, but I cannot find an answer.
If I make a mosaic plot with the vcd package like this:
library(vcd)
test<-matrix(c(65,31,495,651), ncol=2,byrow=T)
colnames(test)<-c("2010", "2011")
rownames(test)<-c("yes", "no")
mosaic(test, shade=T, legend=T)
it works like a charm, except that the axis titles above the years and next to the outputs (yes/no) are shown as "A" and "B".
I would like to name these "Years" and "Outputs", but I cannot find a parameter for this.
How can I do this? Thanks in advance.
You can specify the dimnames this way (row names first, then column names):
dimnames(test) <- list(Outputs=rownames(test), Years=colnames(test))
mosaic(test, shade=T, legend=T)
In fact, mosaic is better suited to contingency tables, where the labels are determined by the table function:
color <- sample(c("red","blue"),10,replace=TRUE)
color2 <- sample(c("yellow","green"),10,replace=TRUE)
tab <- table(color,color2)
mosaic(tab, shade=T)
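As a minimal sketch of the first suggestion, the dimension names can also be set when the matrix is built. The names "Outputs" and "Years" are the ones asked for in the question; the mosaic() call is guarded so the snippet also runs where vcd is not installed.

```r
# Build the 2x2 table with named dimensions: rows first, then columns
test <- matrix(c(65, 31, 495, 651), ncol = 2, byrow = TRUE,
               dimnames = list(Outputs = c("yes", "no"),
                               Years   = c("2010", "2011")))

# The names of the dimnames list are what replace the default "A" and "B"
names(dimnames(test))

if (requireNamespace("vcd", quietly = TRUE)) {
  vcd::mosaic(test, shade = TRUE, legend = TRUE)
}
```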
I am using the R programming language and I am new to the GGally library. I followed some basic tutorials online and ran the following code:
#load libraries
library(GGally)
library(survival)
library(plotly)
I changed some of the data types:
#manipulate the data
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
Now I visualize:
#make the plots
#I dont know why, but this comes out messy
ggparcoord(data, groupColumn = "sex")
#Cleaner
ggparcoord(data)
Both ggparcoord() calls ran successfully, but the first plot came out pretty messy (the axis labels seem to have been corrupted). Is there a way to fix the labels?
In the second graph, it is difficult to tell how the factor variables are labelled on their respective axes (e.g. for the "sex" column, is "male" the bottom point or is "female" the bottom point?). Does anyone know if there is a way to fix this?
Finally, is there a way to use the "ggplotly()" function on "GGally" objects?
e.g.
a = ggparcoord(data)
ggplotly(a)
Thanks
It looks like your data columns get converted to factors when you add the groupColumn. To prevent that, you could exclude the groupColumn from the columns to be plotted, as in the code below.
BTW: I'm not sure about the general case, but at least for ggparcoord, ggplotly works.
library(GGally)
library(survival)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
#plot every column except the grouping column "sex"
ggparcoord(data, seq(ncol(data))[!names(data) %in% "sex"], groupColumn = "sex")
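To make the column selection above easier to follow, here is a small sketch on a toy data frame (the data frame `df` and its values are made up for illustration; they stand in for the lung data). The GGally and plotly calls are guarded so the snippet still runs where those packages are not installed.

```r
# Toy data frame (hypothetical), standing in for the lung data
df <- data.frame(age = c(60, 72, 55, 68),
                 wt  = c(70, 82, 64, 90),
                 sex = factor(c("1", "2", "2", "1")))

# Indices of every column except the grouping column
cols <- seq(ncol(df))[!names(df) %in% "sex"]
cols

if (requireNamespace("GGally", quietly = TRUE)) {
  p <- GGally::ggparcoord(df, columns = cols, groupColumn = "sex")
  # ggparcoord() returns a ggplot object, which ggplotly() can convert
  if (requireNamespace("plotly", quietly = TRUE)) {
    pw <- plotly::ggplotly(p)
  }
}
```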
I want to perform a PCA analysis in adegenet starting from a genepop file without defined populations.
I imported the data like this:
datapop <- read.genepop('tous.gen', ncode=3, quiet = FALSE)
it works, and I can perform a PCA after scaling the data.
But I would like to plot the results / individuals on the PCA axes according to their population of origin using s.class. I have a vcf file with a three-letter code for each individual. I imported it into R:
pops_list <- read.csv('liste_pops.csv', header=FALSE)
but now how can I use it to define population levels in the genind object datapop?
I tried something like this:
setPop(datapop, formula = NULL)
setPop(datapop) <- pops_list
but it doesn't work; even the first line fails with this message:
"Error: formula must be a valid formula object."
And then how should I use it in s.class?
thanks
Didier
Without a working example it is hard to tell, but perhaps you can find the solution to your problem here: How to add strata information to a genind
Either way, from your examples and given how the setPop method works, your line setPop(datapop, formula = NULL) would not work because you would not be defining anything. You would actually have to do:
setPop(datapop) <- pops_list
while also making sure that pops_list is a factor with the appropriate format.
I know this is a bit late, but the way to do this is to add pops_list as the strata and then use setPop() to select a certain column:
strata(datapop) <- pops_list
setPop(datapop) <- ~myPop # set the population to the column called "myPop" in the data frame
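A rough sketch of the preparation step this relies on, under the assumption that the CSV holds one population code per individual, in the same order as the individuals in datapop. The inline text and the codes "ABC"/"XYZ" are made up to stand in for liste_pops.csv, and `pca1` is a hypothetical name for the PCA result; the adegenet calls are left commented out because they need the genind object from the question.

```r
# Stand-in for read.csv('liste_pops.csv', header = FALSE): one code per individual
pops_list <- read.csv(text = "ABC\nABC\nXYZ\nXYZ", header = FALSE)

# With header = FALSE the column is called V1; rename it so ~myPop can find it
names(pops_list) <- "myPop"
pop_factor <- factor(pops_list$myPop)
levels(pop_factor)

# Not run here, since it needs the genind object from the question:
# strata(datapop) <- pops_list      # one row per individual, same order
# setPop(datapop) <- ~myPop
# s.class(pca1$li, pop(datapop))    # pca1: hypothetical dudi.pca() result
```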
I am fairly new to R (coming from a Stata background) and I am finding it difficult to deal with some arguments when plotting with ggplot2. Please consider the following:
library(ggplot2)
library(scales)  # provides percent
test <- data.frame(
  time=c(1,2,3,1,2,3,1,2,3),
  experiment=c(2,1,2,1,1,2,1,2,2)
)
test$time2 <- factor(test$time,
  levels=c("1","2","3"),
  labels=c("R1", "R2", "R3")
)
test$experiment2 <- factor(test$experiment,
  levels=c(1,2),
  labels=c("Yes", "No")
)
ggplot(test, aes(time2, ..count../3))+
  geom_bar(aes(fill=experiment2))+
  scale_y_continuous(labels=percent)
The above is just a silly example I made up to ask how to use "n" (the number of observations) properly. If you run the code above you will see that it draws a stacked bar plot (in percentages). However, to make it I had to manually write: ..count../3
What I would like to find out is how to replace that "3" with a generic argument in R. I could not find anything on the Internet, and I tentatively tried "N" and "n" to no avail. Thanks a lot for your help; the move from Stata to R is exciting but not as easy as one would think.
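One possible approach, offered as a sketch rather than the definitive answer: because every bar should sum to 100%, the divisor does not need to be supplied at all. prop.table() gives the shares per time point in base R, and geom_bar(position = "fill") rescales each bar in the plot. The variable name `shares` is introduced here for illustration; the plotting part is guarded so the snippet runs without ggplot2 and scales installed.

```r
test <- data.frame(
  time       = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  experiment = c(2, 1, 2, 1, 1, 2, 1, 2, 2)
)
test$time2 <- factor(test$time, levels = 1:3, labels = c("R1", "R2", "R3"))
test$experiment2 <- factor(test$experiment, levels = 1:2, labels = c("Yes", "No"))

# Shares within each time point, computed without hard-coding n
shares <- prop.table(table(test$time2, test$experiment2), margin = 1)
shares

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(test, aes(time2, fill = experiment2)) +
    geom_bar(position = "fill") +   # rescales each bar to sum to 1
    scale_y_continuous(labels = scales::percent)
  print(p)
}
```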
Here I have code that draws a simple phylogenetic tree from Newick format:
library(ape)
t<-read.tree(text="(F:4,( (D:2,E:2):1,(C:2,(B:1,A:1):1):1):1);")
plot(t,use.edge.length=TRUE)
The branches are drawn with the correct lengths, but I want every branch to have a label showing its length.
edit:
i want my plot to look like this:
I searched the documentation, but I cannot find a method to display the branch lengths in R. How can I do this?
You can do it by extracting edge lengths and using edgelabels().
# Load package
library(ape)
# Create data
t <- read.tree(text="(F:4,((D:2,E:2):1,(C:2,(B:1,A:1):1):1):1);")
plot(t)
edgelabels(t$edge.length, bg="black", col="white", font=2)
Here is a way you can get the plot you want:
t$tip.label <- c("F\n4", "D\n2", "E\n2", "C\n2", "B\n1", "A\n1")
plot(t,show.node.label=TRUE, show.tip.label=TRUE)
However, I don't know of a graceful way to extract the lengths without doing it manually.
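One way the lengths could be pulled out programmatically, as a sketch: in an ape "phylo" object the tips are numbered 1 to the number of tips, and the second column of the edge matrix holds the node each edge ends at, so match() can look up the terminal edge of every tip. The names `tree` and `tip_len` are introduced here for illustration.

```r
library(ape)

tree <- read.tree(text = "(F:4,((D:2,E:2):1,(C:2,(B:1,A:1):1):1):1);")

# Tips are numbered 1..Ntip in the second column of tree$edge,
# so match() finds the edge that ends at each tip
tip_len <- tree$edge.length[match(seq_along(tree$tip.label), tree$edge[, 2])]

# Append each tip's branch length to its label, then plot
tree$tip.label <- paste0(tree$tip.label, "\n", tip_len)
plot(tree, show.tip.label = TRUE)
```

This only covers the terminal branches; internal branch lengths would still need edgelabels() as in the other answer.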
I am trying to create a circular phylogenetic tree. I have this piece of code:
fit<- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus= 3
color=c('red','blue','green')
color_list=rep(color,nclus/length(color))
clus=cutree(fit,nclus)
plot(as.phylo(fit),type='fan',tip.color=color_list[clus],label.offset=0.2,no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is the result:
I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that?
Thanks!
When you say "color branches" I assume you mean color the edges. This seems to work, but I have to think there's a better way.
Using the built-in mtcars dataset here, since you did not provide your data.
library(ape)  # provides as.phylo() and the "fan" plot type
plot.fan <- function(hc, nclus=3) {
  palette <- c('red','blue','green','orange','black')[1:nclus]
  clus <- cutree(hc,nclus)
  X <- as.phylo(hc)
  # last edge that leads to a tip of each cluster
  edge.clus <- sapply(1:nclus,function(i)max(which(X$edge[,2] %in% which(clus==i))))
  ord <- order(edge.clus)
  # run lengths of consecutive edges assigned to each cluster
  edge.clus <- c(min(edge.clus),diff(sort(edge.clus)))
  edge.clus <- rep(ord,edge.clus)
  plot(X,type='fan',
       tip.color=palette[clus],edge.color=palette[edge.clus],
       label.offset=0.2,no.margin=TRUE, cex=0.70)
}
fit <- hclust(dist(mtcars[,c("mpg","hp","wt","disp")]))
plot.fan(fit,3); plot.fan(fit,5)
Regarding "label the nodes", if you mean label the tips, it looks like you've already done that. If you want different labels, unfortunately, unlike plot.hclust(...) the labels=... argument is rejected. You could experiment with the tiplabels(....) function, but it does not seem to work very well with type="fan". The labels come from the row names of Data, so your best bet IMO is to change the row names prior to clustering.
If you actually mean label the nodes (the connection points between the edges), have a look at nodelabels(...). I don't provide a working example because I can't imagine what labels you would put there.