How to use corrplot with is.corr=FALSE - r

I previously made a beautiful functional and perfect actual corrolation plot with corrplot (my plot). Now I have to get the underlying data in the same look. So my goal is to have triangular similarity matrixes in the same colours as my corrolation plot. Imagine it like the conditional formatting in excel.
My Data: my Data from excel
Link to CSV Data file
it is loaded in as a csv and it can read the csv perfectly
My Code:corrplot(Phylogeny, is.corr=FALSE,method="number", cl.lim=c(0,1))
The error it throws me: Error in if (any(corr < cl.lim[1]) || any(corr > cl.lim[2])) { : Missing value, where TRUE/FALSE is required
i made sure all colums are numeric
i made sure to fill the missing bits with NA's (because that was a problem somwhere before)
i made sure all my values are between 0 and 1 like i want the limit to be (in between it told me that my values are not within the limit, when i tried around with some stuff)
the error does not change when i change the limit
the error does not change when i take the is.corr=FALSE out (default=TRUE)
i played around with corrplot.mixed and its still not working
have been referencing information from Corrplot Intro
I have looked into the condformat function but i am not really sure if it can do a filling of each cell with one colour according to the overall gradient like i used for my corrolation plot.
What am I missing here that it does not want to give me my table back with pretty colours?

I had the same error, but I was able to fix it by converting my data.frame to a matrix. I ended up with corrplot(as.matrix(df), is.corr = FALSE).

If I am understanding correctly, your posted data are already a correlation matrix - although not a fully symmetrical one of the sort that would be produced with the call cor on raw data.
In that case, the problem is just that you have variable names (Species) as a column in your data. Change this column to row names, drop the variable names, and call corrplot as user9536160 suggests:
# read in your data
phyl <- as.data.frame(read_csv("Phylogeny.csv"))
# name rows and drop variable names in the df itself
row.names(phyl) <- phyl$Species
phyl <- phyl %>%
select(-Species)
# call corrplot
corrplot(as.matrix(phyl), is.corr = FALSE)
The result:

Related

How to fix corrplot formatting issue in R

When I run the following chunk of code, I get a very small and unuseful correlation plot(see photo). I think it is because the column labels are too large. Is there any suggestion on how to fix this issue without modifying column names (maybe indicating only substring of names for example...)
numeric.var <- sapply(df_train, is.numeric)
corr.matrix <- cor(df_train[,numeric.var])
corrplot(corr.matrix, main="\n\nCorrelation Plot for Numerical Variables",
method="number")

R - [DESeq2] - Making DESeq Dataset object from csv of already normalized counts

I'm trying to use DESeq2's PCAPlot function in a meta-analysis of data.
Most of the files I have received are raw counts pre-normalization. I'm then running DESeq2 to normalize them, then running PCAPlot.
One of the files I received does not have raw counts or even the FASTQ files, just the data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
Consensus in vignettes and other comments seems to be that objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format the same between plots. Ultimately, I just used a workaround to get the plots looking the same via ggfortify.
If anyone is curious, I just ended up doing this. Note, the "names" file is just organized like the meta file for colData for building a DESeq object from DESeqDataSetFrom Matrix, but I changed the name of the design column from "conditions" to "group" so it would match the output of PCAplot. Should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)

'x' must be numeric R error when reading from file

I am trying to do Hartigan's diptest in R, however, I get the following error: 'x' must be numeric.
Apologies for such a basic question, but how do I ensure that the data that I load is numeric?
If I make up a set of values as follows (code below), the diptest works without problems:
library(diptest)
x = c(1,2,3,4,4,4,4,4,4,4,5,6,7,8,9,9,9,9,9,9,9,9,9)
hist(x)
dip.test(x)
But for example, when the same values are saved in an Excel file/tab delimited .txt file (saved as one column of values), and imported into R, when I run the diptest the 'x' must be numeric error occurs.
x <- read.csv("x.csv") # comes back in as a data frame
hist(x)
dip.test(x)
Is there a way to check what format the imported data from an Excel/.txt file is in R, and subsequently change it to be numeric? Thanks again.
Any help will be much appreciated thank you.
Here's what's happening. If you run the code that you know works, it's working because the data class is numeric as it should be. When you read it back in it's a data.frame, however. So you need to point to the numeric element of the data.frame:
library(diptest)
x = c(1,2,3,4,4,4,4,4,4,4,5,6,7,8,9,9,9,9,9,9,9,9,9)
write.csv(x, "x.csv", row.names=F)
x <- read.csv("x.csv") # comes back in as a data frame
hist(x$x)
dip.test(x$x)
Hartigans' dip test for unimodality / multimodality
data: x$x
D = 0.15217, p-value = 2.216e-05
alternative hypothesis: non-unimodal, i.e., at least bimodal
If you were to save the file to a .RDS instead of .csv then you could avoid this problem.
You could also check if your data frame contains any non-numeric characters as follows:
which(!grepl('^[0-9]',your_data_frame[[1]]))

read.csv() R x must be numeric

I am trying to read data out of a csv-file.
The data consists of small integer numbers (53, 98 ...)
The csv was made with OpenOffice, the data stood there in the first column
one number in each row.
reading data was simple (no problem at all):
BirthNumbers <- read.csv(“/Users/.../RawData.csv”, header=FALSE)
Now I try to calculate mean(BirthNumbers) (for example),
but it is not possible, the error message:
x is not numeric
Where is my mistake?
Thanks for all help
Norbert
It's probably being read in as characters.
Try mean(as.numeric(BirthNumbers))
As per https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html (see Value section), read.csv returns a data frame.
You should be calling mean on the column of the data frame. Since you have no headers (given your header = FALSE), most likely the column is called V1 (verify by doing head(BirthNumbers) or colnames(BirthNumbers)), so you should do mean(BirthNumbers$V1).

R: Issues with line graph of germination trough time

I'm still in the process of learning R using Swirl and RStudio, and a goal I've set for myself is to recreate this graph. I have a small dataset that I will link below (it's saved as a plain text CSV file that I import into R with headings enabled).
If I try to plot that dataset without changing anything, I get this, which is obviously not the goal.
At first I thought the problem would be in the class of my imported dataset, defined as kt. After class(kt) turned out to be data.frame I figured that wasn't the problem. Should I be trying to rewrite the table to something that R can plot instantly, or should I be trying to extract each species individually, plot them separately and then combining the different plots into one graph? Perhaps there is something wrong with my dates, I know that R handles dates in a specific way. Maybe these solutions are not even needed and I'm just doing something stupidly simple wrong, but I can't find it myself.
Your help is much appreciated.
Dataset:
Species,week 0,week 1,week 2,week 3,week 4,week 5,week 6,week 7,week 8,week 9,week 10,week 11,week 12,week 13,week 14,week 15,week 16,week 17,week 18
Caesalpinia coriaria,0.0%,24.0%,28.0%,28.0%,32.0%,37.0%,40.0%,46.0%,52.0%,56.0%,63.0%,64.0%,68.0%,71.0%,72.0%,,,,
Coccoloba swartzii,0.0%,0.0%,1.0%,10.0%,19.0%,31.0%,33.0%,39.0%,43.0%,48.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,55.0%,
Cordia dentata,0.0%,5.0%,18.0%,21.0%,24.0%,26.0%,27.0%,30.0%,32.0%,32.0%,32.0%,32.0%,32.0%,32.0%,33.0%,33.0%,33.0%,34.0%,35.0%
Guaiacum officinale,0.0%,0.0%,0.0%,0.0%,4.0%,5.0%,5.0%,5.0%,7.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,,
Randia aculeata,0.0%,0.0%,0.0%,4.0%,13.0%,14.0%,18.0%,19.0%,21.0%,21.0%,21.0%,21.0%,21.0%,22.0%,22.0%,22.0%,22.0%,,
Schoepfia schreberi,0.0%,0.0%,0.0%,0.0%,0.0%,0.0%,1.0%,4.0%,8.0%,11.0%,13.0%,21.0%,21.0%,24.0%,24.0%,25.0%,27.0%,,
Prosopis juliflora,0.0%,7.5%,31.3%,34.2%,,,,,,,,,,,,,,,
Something like this??
# get rid of "%" signs
df <- data.frame(sapply(df,function(x)gsub("%","",x,fixed=T)))
# convert cols 2:20 to numeric
df[,2:20] <- sapply(df[,2:20],function(x)as.numeric(as.character(x)))
library(reshape2)
library(ggplot2)
gg <- melt(df,id="Species")
ggplot(gg,aes(x=variable,y=value,color=Species,group=Species)) +
geom_line()+
theme_bw()+
theme(legend.position="bottom", legend.title=element_blank())
There are lots of problems here.
First, if your dataset really has those % signs, then R interprets the data as character and imports it as factors. So first we have to get rid of the % (using gsub(...), and then we have to convert what's left to numeric. With factors, you have to convert to character first, then numeric, so: as.numeric(as.character(...)). All of this could have been avoided if you exported the data without the % signs!!!
Plotting multiple curves with different colors is something the ggplot package was designed for (among many other things), so we use that. ggplot prefers data in "long" format - all the data in one column, with a second column distinguishing different datasets. Your data is in "wide" format - data in different columns. So we convert to long using melt(...) from the reshape2 package. The result, gg has three columns: Species, variable and value. value contains the actual data and variable contains the week number.
So now we create a ggplot object, setting the x-axis to the variable column, the y-axis to the value column, with color mapped to Species, and we tell ggplot to plot lines (using geom_line(...)).
The rest is to position the legend at the bottom, and turn off some of the ggplot default formatting.

Resources