This is my first question here so please excuse any mistakes I (may or may not) make.
The premise:
I have a vegetation dataset containing paired data on different plots for old and new observations. I used the 'openxlsx' package to load my data and the 'vegan' package to run an NMDS as follows:
library(openxlsx)
library(vegan)
mydata <- read.xlsx("mydata.xlsx")  # hypothetical file path
mydataMDS <- metaMDS(mydata, k=2, trymax=500)
The result is then used in a model via the envfit() function, which fits the environmental variables onto the ordination:
myenvdata <- read.xlsx("myenvdata.xlsx")  # hypothetical file path
mydataMDS_fit <- envfit(mydataMDS, myenvdata, perm=10000, na.rm=TRUE)
plot(mydataMDS, display="sites")
plot(mydataMDS_fit, p.max=0.01, axis=TRUE)
Now I have a plot of my NMDS ordination ("mydataMDS"), including the environmental vectors calculated by envfit ("mydataMDS_fit").
The problem:
I want to colour and connect certain points within this plot. Since "mydata" contains observations of the same plots at different times, I want to colour all points of old observations in one colour and all new ones in another. I've read about adding a column to group old and new observations, but since I'm working with a model there are no columns to add to. How can I edit my datasets ("mydata", "mydataMDS", "myenvdata", "mydataMDS_fit") so that old and new plots are shown in two different colours (one for old, one for new), with the paired observations connected by lines? Or is it possible to re-colour the points directly in the graphical output by checking whether each observation is old or new?
(Sorry, I feel like my explanation is quite complicated, but I still hope someone may be able to help)
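The grouping does not have to live inside the model: a separate vector with one entry per site (row of mydata), in the same row order, is enough to colour and connect the points. A minimal base-R sketch under that assumption; `period` and `plot_id` are hypothetical names, and the random matrix only stands in for `scores(mydataMDS, display = "sites")` from vegan.

```r
# Stand-in for scores(mydataMDS, display = "sites"): a 10 x 2 matrix of
# hypothetical NMDS site scores (5 plots, each observed once "old", once "new").
set.seed(1)
sc <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("NMDS1", "NMDS2")))

# Hypothetical grouping vectors, one entry per row of 'mydata', same row order.
plot_id <- rep(1:5, times = 2)
period  <- factor(rep(c("old", "new"), each = 5), levels = c("old", "new"))

# One colour per period, looked up for every site
period_cols <- c(old = "steelblue", new = "firebrick")
pt_col <- period_cols[as.character(period)]

# Pair each old row with the new row of the same plot
old_rows <- which(period == "old")
new_rows <- which(period == "new")
new_rows <- new_rows[match(plot_id[old_rows], plot_id[new_rows])]

plot(sc, type = "n", xlab = "NMDS1", ylab = "NMDS2")
segments(sc[old_rows, 1], sc[old_rows, 2], sc[new_rows, 1], sc[new_rows, 2],
         col = "grey60")
points(sc, col = pt_col, pch = 19)
legend("topright", legend = names(period_cols), col = period_cols, pch = 19)
# With the real objects, plot(mydataMDS_fit, p.max = 0.01) would then add
# the envfit arrows on top (plot.envfit draws onto the existing plot by default).
```

Drawing the segments before the points keeps the coloured points on top of the connecting lines.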
I am currently using RStudio to generate a 3D plot for my PCA using data imported from SPSS.
Currently, I have 10 treatment groups, each with 5 subjects. I would like to create a 3D plot where each treatment group is represented by a color, and each subject in the same treatment group has the same color.
It is also vital that none of these colors are repeated.
I am able to generate the 3D plot; however, two of the treatment groups use the same color.
Can anyone help me rectify this issue so there would be no repeating colors for different treatment groups?
Here's the code that I'm using.
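library(rgl)  # provides plot3d(), text3d() and lines3d()
db = file.choose()
hpca = read.table(db, header=TRUE, fileEncoding="UTF-8-BOM")  # avoids the "ï.." prefix on the first column name
pc <- princomp(hpca[,2:7], cor=TRUE, scores=TRUE)
plot3d(pc$scores[,1:3], col=hpca$group, size = 6)
text3d(pc$scores[,1:3],texts=hpca$tag)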
text3d(pc$loadings[,1:3], texts=rownames(pc$loadings), col="red")
coords <- NULL
for (i in 1:nrow(pc$loadings)) {
coords <- rbind(coords, rbind(c(0,0,0),pc$loadings[i,1:3]))
}
lines3d(coords, col="red", lwd=4)
P.S. I am completely new to R programming and most of this code is copied from an online guide, so it would be extremely helpful if you could show me exactly where to make the changes.
Many thanks in advance!
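You are setting the colour to be the group value. That only works by accident: group values are often factors, factors are stored as integers, and an integer colour is used as an index into palette(). The default palette has only eight colours, so with ten groups the indices most likely wrap around and some groups end up sharing a colour.
It is better to compute the colour explicitly. For example, you can get a vector of 10 colours using
cols <- rainbow(10)
and then use it in place of your existing plot3d() line:
plot3d(pc$scores[,1:3], col=cols[as.numeric(factor(hpca$group))], size = 6)
factor() turns the group column into a factor with 10 levels, and as.numeric() gives the codes 1 to 10, whether the groups are stored as numbers or as text.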
There are several functions in base R to select colours (see ?rainbow) and others in contributed packages for different palettes.
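A self-contained sketch of that approach, assuming hpca has a group column followed by six numeric columns (simulated here, so the values are hypothetical). hcl.colors() (base R 3.6 and later) is used instead of rainbow() only as an example of a palette with evenly spaced hues; the 3D call is left as a comment so the sketch runs without rgl.

```r
# Hypothetical stand-in for 'hpca': 10 treatment groups x 5 subjects,
# a group column plus six numeric measurements (columns 2:7).
set.seed(42)
hpca <- data.frame(group = rep(paste0("T", 1:10), each = 5),
                   matrix(rnorm(50 * 6), ncol = 6))

pc <- princomp(hpca[, 2:7], cor = TRUE, scores = TRUE)

# One distinct colour per treatment group; factor() maps each group to a code 1..10
grp <- factor(hpca$group)
group_cols <- hcl.colors(nlevels(grp), palette = "Dark 3")
point_cols <- group_cols[as.integer(grp)]

# 2D check of the colouring; with rgl loaded, the 3D equivalent would be
# plot3d(pc$scores[, 1:3], col = point_cols, size = 6)
plot(pc$scores[, 1:2], col = point_cols, pch = 19)
legend("topright", legend = levels(grp), col = group_cols, pch = 19, cex = 0.7)
```

Because the colour vector has exactly one entry per factor level, no two groups can share a colour, however many groups there are.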
I have a pond data set with several study types. The category column lists each study type, followed by that type's individual values. I can make a histogram for each type when I split the data into individual sheets, but after digging around for a while I cannot find how to make the same histograms for the study types from the overall data set.
[Image: piece of the data sheet I am working with]
As the sheet shows, there are multiple study types, each with its own data.
Basically, I want to take each individual study type and its num_divided values and make a histogram for each type. My end goal is one image with the 9 histograms stacked above one another, all sharing the same x-axis values and each labelled with its study type on the left-hand side.
The trouble is that when I make the histograms from the separate sheets, I cannot combine them into the stacked image I want. Apologies in advance if this lacks some information, and thanks to anyone who offers advice.
ggplot2 makes this straightforward.
You didn't give reproducible data but it's easy to make some. Here are 9 studies each with 100 values:
set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100), num_divided = rnorm(900))
What you want is a facetted plot.
library(ggplot2)
ggplot(dat, aes(x = num_divided)) + geom_histogram() + facet_grid(study ~ .)
If you don't know much about ggplot2, a good starting point is the R Cookbook.
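To match the layout described in the question (one shared x-axis, study names on the left), facet_grid() can move the panel labels with switch = "y". A minimal sketch on the same simulated dat; with real data, the study-type column is assumed to be named study. The binwidth is an arbitrary choice, and the theme() line needs a reasonably recent ggplot2.

```r
library(ggplot2)

# Same simulated data as above: 9 studies x 100 values
set.seed(111)
dat <- data.frame(study = rep(letters[1:9], each = 100), num_divided = rnorm(900))

p <- ggplot(dat, aes(x = num_divided)) +
  geom_histogram(binwidth = 0.25) +
  # one row of panels per study; switch = "y" puts the study labels on the left
  facet_grid(study ~ ., switch = "y") +
  # keep the left-hand labels horizontal so they stay readable
  theme(strip.text.y.left = element_text(angle = 0))

print(p)
```

facet_grid() uses fixed scales by default, so all nine panels share the same x-axis, which is what makes the stacked histograms directly comparable.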
I am doing a CCA on a table of species abundance data and environmental parameters, very similar to the doubs data in the ade4 package.
library(vegan)
cil.cca.r <- cca(cil ~ ., data = envm.r)
Where cil is a table with abundance data with 16 observations (sample sites) and 76 variables (abundances) and envm.r is my table with 16 observations (sample sites) and 11 variables (environmental parameters).
My problem is that I have a lot of species and I want the names of the species in the plot to be more readable.
Or, if this doesn't work, I would need to find out which species, so to speak, "prefer" the environmental parameter PK/ml. How can I get that information when the species names are not readable?
The command I started with is the following:
plot(cil.cca.r,scaling=1,display=c("sp","lc","cn"),main="Triplot CCA spe ~envm.r - scaling 1",repel = TRUE )
Next I also tried to separate the commands to make it more readable.
plot(cil.cca.rh, type="n",xlim=c(-2,2))
text(cil.cca.rh, "species", col="red", cex=0.6)
text(cil.cca.rh, dis="cn",cex=0.7,adj=0.8,col="blue")
Trying out different parameters did not get me anywhere.
Then I found the function orditorp()
plot(cil.cca.rh, type="n",xlim=c(-2,2))
orditorp(cil.cca.rh, "species", col="red", cex=0.6)
orditorp(cil.cca.rh,display="sites",cex=0.8,adj=0.1,col="darkgreen",air=0.7)
[Image: plot with orditorp]
This makes the plot look nicer, but I still don't get the information about my species, since many of them are only shown as points.
I hoped for a solution like the ggrepel package (e.g. its geom_text_repel() for ggplot2), but this doesn't work for a CCA:
Error "ggplot2 doesn't know how to deal with data of class cca"
I also hoped for a function like fviz_pca_ind() from the factoextra package, but for CCA data.
But I couldn't find anything and would be very grateful for solutions, either for plotting the labels better or for getting the species names together with their locations!
Of course, I could give the species shorter names or numbers instead of names, but it would be nicer to see at a glance which species it is...
Or perhaps by zooming in on part of the graphic?
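A minimal sketch of three options, using vegan's built-in varespec/varechem data as a stand-in for cil and envm.r, with the K variable standing in for PK/ml (both assumptions for illustration). ordipointlabel() is vegan's own label-placement helper; the ggplot2 route works because scores() returns a plain matrix that can become a data frame, and it only runs if ggplot2 and ggrepel are installed; the last part ranks species by projecting their scores onto one environmental arrow. Both score types in that projection use the same scaling (2, an arbitrary choice here) so their directions are comparable, and the ranking is a rough reading of the triplot rather than a statistical test.

```r
library(vegan)

# Stand-in data shipped with vegan: varespec (species) and varechem (environment).
# 'K' plays the role of PK/ml purely for illustration.
data(varespec)
data(varechem)
mod <- cca(varespec ~ N + P + K + Ca, data = varechem)

# 1) Base graphics: ordipointlabel() shifts species labels to reduce overlap
ordipointlabel(mod, display = "species")

# 2) ggplot2 cannot plot a 'cca' object, but it can plot a data frame of scores
sp <- scores(mod, display = "species", scaling = 1, choices = 1:2)
sp_df <- data.frame(sp, species = rownames(sp))
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  p <- ggplot2::ggplot(sp_df, ggplot2::aes(x = CCA1, y = CCA2, label = species)) +
    ggplot2::geom_point(colour = "red", size = 1) +
    ggrepel::geom_text_repel(size = 3, max.overlaps = Inf)
  print(p)
}

# 3) Rough ranking of species along one arrow: project species scores onto
#    the biplot arrow of "K" (same scaling for both score types)
sp2  <- scores(mod, display = "species", scaling = 2, choices = 1:2)
arr  <- scores(mod, display = "bp", scaling = 2, choices = 1:2)["K", ]
proj <- drop(sp2 %*% arr) / sqrt(sum(arr^2))
head(sort(proj, decreasing = TRUE), 10)  # species furthest in the direction of K
```

With the real model, replacing mod with cil.cca.r and "K" with the row name of PK/ml in the biplot scores would give the corresponding list of species names, which can be read without any labels on the plot.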
I would like to plot multiple lines on the same plot, without using ggplot.
I have scores for different individuals across a set time period and wish to plot a line between yearly scores for each individual. Data is organised with each row representing an individual and each column an observed value in a given year.
Currently I am using a for loop, but I am aware that loops are often inefficient in R, and I would like to know whether there are more suitable approaches within base R.
I will be working with up to 100,000 individuals.
Thanks.
Code:
df <- data.frame(matrix(runif(40, 0, 100), nrow = 4))  # 4 individuals (rows) x 10 years (columns)
Years <- seq(1, 10, 1)
plot(1, type = "n", xlab = "Year", ylab = "Score", xlim = c(1, 10), ylim = c(0, 100))
for (x in 1:4) { lines(Years, unlist(df[x, ])) }  # one line per individual
Efficiency is not much of a consideration when plotting since plotting to a device is a slow operation in itself. You can use matplot (which uses a loop internally). It's basically a more sophisticated version of your code wrapped in a function.
matplot(Years, t(df), xlab="Year", ylab="Score", type = "l")
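If the loop itself is the concern, lines() can draw every individual in a single call: it starts a new line at each NA, so all series can be concatenated with an NA after each one. A sketch on simulated data in the question's layout (individuals in rows, years in columns); the number of individuals and the semi-transparent colour are illustrative choices, the latter to keep many overlapping lines readable.

```r
# Hypothetical data in the question's layout: one row per individual,
# one column per year (here 1,000 individuals x 10 years).
set.seed(1)
n_ind <- 1000
years <- 1:10
score_mat <- matrix(runif(n_ind * length(years), 0, 100), nrow = n_ind)

# Build one long x/y pair: each individual's series followed by NA.
# t() puts one individual per column; rbind(..., NA) appends the separator row,
# and as.vector() reads column by column, i.e. individual by individual.
x_all <- rep(c(years, NA), times = n_ind)
y_all <- as.vector(rbind(t(score_mat), NA))

plot(range(years), c(0, 100), type = "n", xlab = "Year", ylab = "Score")
lines(x_all, y_all, col = adjustcolor("black", alpha.f = 0.1))
```

This replaces n_ind calls to lines() with one, although, as noted above, drawing to the device is still likely to dominate the run time.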
I ran a PCA on a set of 45,000 genes in 5 different samples, and when I draw a biplot, all I see is a mass of text (corresponding to the observation names); I cannot see the location of my samples. Is there a way to plot only the location of the samples, and not the observations, in a biplot?
Using built-in data from R:
usa <- USArrests
pca1 <- prcomp(usa)
biplot(pca1)
This generates a biplot where the state names (observations) overlap the arrows of the variables Murder, Assault, UrbanPop and Rape (which correspond to my samples). Is it possible to plot only the variables (samples), and not the states (observation names)?
biplot.default() uses text() to write the name of each observation. As it doesn't use points(), you would need to modify the source if you want only points (and not the labels) to be plotted.
However, you could "hack" it by doing something like:
biplot(pca1, xlabs = rep(".", nrow(usa)))
I hope this is what you're looking for!
Edit: If this is not satisfactory, you can modify the source (shown by running stats:::biplot.default) to use points() instead.
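Another option is to skip biplot() and draw only the variable arrows from the loadings stored in pca1$rotation. A minimal base-R sketch on the same USArrests example; the axis limits and label offset are arbitrary choices, and the arrows use the raw loadings, so their lengths are not rescaled the way biplot() rescales them.

```r
pca1 <- prcomp(USArrests)

# Loadings of each variable (the four USArrests columns) on PC1 and PC2
ld <- pca1$rotation[, 1:2]
lim <- c(-1, 1) * max(abs(ld)) * 1.3

plot(0, 0, type = "n", xlim = lim, ylim = lim, xlab = "PC1", ylab = "PC2")
abline(h = 0, v = 0, col = "grey80")
arrows(0, 0, ld[, 1], ld[, 2], length = 0.1, col = "red")
# Labels slightly beyond the arrow tips
text(ld * 1.15, labels = rownames(ld), col = "red")
```

For the gene data, where the samples are the columns passed to prcomp(), the rows of the rotation matrix are the samples, so this draws one arrow per sample and no observation labels at all.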