vegan doesn't recognise column names, but it also does recognise them - r

I'm trying to run an RDA analysis using the R package 'vegan'.
If I go with the below, using column headers, I get the error "object "width" not recognised", which suggests that I haven't imported the column headers properly.
rda(width+height~age+weight, data=mydata)
But if I go with the below, it works, so it obviously recognises the headers:
rda(mydata$width+mydata$height~mydata$age+mydata$weight)
Similarly, other packages recognise the headers; for example, the below works fine.
ggplot(mydata,aes(height, width))
Presumably it's an issue with the use of "data=mydata". I'm baffled and feel it's probably something super simple I'm overlooking, but I have tried and tried to no avail. It was going fine the other day.

As per the help page of rda, the left-hand side of your formula should be a community data matrix, and data should specify a data frame containing the variables on the right-hand side of the formula.
As such, when you pass column names to the left-hand side of the formula (e.g. your first line of code), rda is not looking within mydata for those columns and so fails.
In your second line of code you specify where width and height are found, so it is able to run.
You could run it like this:
response <- data.frame(height=mydata$height,width=mydata$width)
rda(response ~ age + weight, data=mydata)
Have a look through the documentation for cca/rda and you'll see example code - try to get your data into the same format as the examples.
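The lookup rule described above can be illustrated in base R. This is only a sketch of the scoping behaviour, not vegan's internal code; the toy values in mydata are made up for illustration.

```r
# Toy data frame standing in for mydata (values are made up)
mydata <- data.frame(width  = c(1.2, 2.3, 3.1, 4.8),
                     height = c(2.0, 2.9, 4.2, 5.1),
                     age    = c(3, 5, 7, 9),
                     weight = c(10, 14, 19, 25))

f   <- width + height ~ age + weight
lhs <- f[[2]]  # the unevaluated left-hand side: width + height

# Evaluating the left-hand side without the data frame fails,
# because "width" and "height" only exist as columns of mydata
outside <- tryCatch(eval(lhs, environment(f)),
                    error = function(e) conditionMessage(e))

# Evaluating it inside the data frame succeeds
inside <- eval(lhs, mydata)

# Building the response as its own object makes it visible
# to the formula without relying on data=
response <- mydata[, c("width", "height")]
```

Here outside holds an error message while inside holds the computed values, which mirrors why the mydata$ version of the call runs and the bare column names do not.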

Related

How to get R to read my first column as a "header"?

I want to calculate diversity indices of different sampling sites in R. I have sites in the first row and the different species in the first column. However, R is reading the first column as normal data (not as a header so to speak).
Pics:
https://imgur.com/a/iBsFtbe
Code:
Macro <- read.csv("C:\\Users\\Carly\\OneDrive\\Desktop\\Ecology Projects\\Macroinvertebrates & Water Quality\\Macro_RData\\Macroinvert\\MacroR\\MacroCSV.csv", header = TRUE)
You need to add row.names = 1 to your command. This will indicate that row names are stored in column number 1.
Macro <- read.csv("<...>/MacroCSV.csv", header = TRUE, row.names = 1)
I sense that you are frustrated. As r2evans said, it is easier for people to help you if you provide them with the data in text form and not with screenshots - because we can't recreate the problem or try to solve it by loading a screenshot into R.
CSV files are just text, so you can open them with a text editor such as Notepad and copy and paste the contents here. You don't need the whole text - the columns and lines needed to reproduce the problem are enough. This was what we were looking for:
Site,Aeshnidae,Amnicolidae,Ancylidae,Asellidae
AN0119A,0,0,0,6,0
AN0143,0,0,0,0,0
Programming is very frustrating for many people when they start out; don't let this discourage you!
It looks like your data is in the wrong orientation for analysis in vegan - your species are the rows, and sites are columns. From your pics, it looks like you've spotted this issue and tried transposing, but are having issues with the placement of the headers.
Try reading your csv in, and specifying that the first column should be row names:
MacroDataDataFinal <- read.csv("Path/to/file.csv",
                               row.names = 1)
Then transpose the data:
MacroDataDataFinal_transposed <- t(MacroDataDataFinal)
Then try running the specaccum function:
library(vegan)
speccurve <- specaccum(comm = MacroDataDataFinal_transposed,
                       method = "random",
                       permutations = 1000)
Hopefully this will work. If you get any errors please let us know the code you typed, and the precise error message.
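The read-then-transpose steps above can be tried on a small inline table first. This is a minimal sketch using base R only; the species counts and the site names SiteA, SiteB and SiteC are hypothetical stand-ins for the real file.

```r
# Hypothetical CSV content: species in rows, sites in columns
csv_text <- "Species,SiteA,SiteB,SiteC
Aeshnidae,0,2,1
Asellidae,6,0,3
Ancylidae,0,1,0"

# row.names = 1 turns the first column into row names instead of data
macro <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# t() swaps rows and columns, giving sites in rows and species in columns
macro_t <- t(macro)

print(macro_t)
```

After the transpose, the site names are the row names and the species names are the column names, which is the orientation described above for vegan.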

How to plot a histogram of a specific data frame column in R

I am super new to coding with R; I'm taking it as part of a bachelor's degree program. I am super stuck on something I feel should be basic, but I cannot get my code to work and I am not sure why. The prompt is:
"In this problem we will be using the mpg data set, to get access to the data set you need to load the tidyverse library.
Complete the following steps:
Create a histogram for the cty column with 10 bins"
and for my code I have:
library(tidyverse)
print(mpg)
df <- mpg[ , c("city")]
histo <- ggplot(data = df, aes(x=median)) + geom_histogram(bins=10)
print(histo)
The first print was just to make sure the data loaded correctly, which it did. I am not sure about the second print function, the histo one. I've gotten various error messages or bugs, so I've just been moving stuff around and trying different commands to get it to work. I'm following the steps previously outlined in our reading, but I cannot seem to get this to work. Any help would be appreciated.
I have tried removing the print(histo) call and just leaving the ggplot, but that gives me a blank white box instead of a plot, or no plot is printed.
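Two things in the code above look inconsistent with the prompt: the mpg data set has a column named cty (not city), and aes(x = median) refers to a name that is not a column of the data. A minimal sketch of what the prompt seems to ask for, assuming ggplot2 is installed (it is attached by library(tidyverse)):

```r
library(ggplot2)  # provides ggplot(), geom_histogram() and the mpg data set

# Map the cty column directly; no intermediate data frame is needed
histo <- ggplot(data = mpg, aes(x = cty)) +
  geom_histogram(bins = 10)

# Printing the plot object is what draws it
print(histo)
```

This is one plausible reading of the assignment, not the only valid solution.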

Expecting numeric in B2 / R2C2: got a date in R

I am reading in a data set from Excel that has dates in it. When I run my code it gives me this warning: "Expecting numeric in B2 / R2C2: got a date"
All of my dates are messed up. How do I solve this?
It helps us to help you if you show the exact code that you used, including any packages used.
That warning looks like it comes from the readxl package (but could be a different package).
Basically, when functions like read_excel or even read.table are not told specifically what type of data is in each column, R will read several rows at the top of the file and make an educated guess as to what type of data is in each column; then it will start over and read the data based on those guesses.
Your warning means that there was a cell that your R function was expecting to be a number (based either on the educated guess, or because you told it to expect a number) and instead it saw a date, so it gives a warning to let you know that there was a potential problem. Note that a warning means the code continued to run; there may just be some values that don't match what you were expecting. An error would have stopped the code from running and not returned anything.
To fix the problem you can either explicitly tell your R function what type of data is in each column (exactly how depends on the function), or you can fix your Excel file so that it is clear what each type of data is (remember, just because something looks like a date in Excel does not mean that Excel realizes it is a date or tells other programs that it is a date).
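As a sketch of the first option, here is how column types can be declared up front with base R's read.csv and its colClasses argument. The column names id and visit_date and the values are hypothetical. With readxl, the corresponding argument of read_excel is col_types; the file name in the commented line is a placeholder.

```r
# Hypothetical data: an integer column and a date column
csv_text <- "id,visit_date
1,2021-03-04
2,2021-03-05"

# Declare the type of every column instead of letting R guess
dat <- read.csv(text = csv_text, colClasses = c("integer", "Date"))

str(dat)

# With readxl the same idea would look roughly like this
# (placeholder file name, not run here):
# dat <- readxl::read_excel("my_file.xlsx", col_types = c("numeric", "date"))
```

Declaring the types removes the guessing step, so a mismatch shows up where you declared it rather than as a surprise in the result.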

Metafor measure argument error

I have calculated the effect size and pooled SE in the way that I wanted. The only thing left is to draw a forest plot and let metafor calculate the summary effect size. I have over 30 .csv data files to plot separately. When I do that with the following data (below), it plots and calculates the summary effect smoothly.
DeltaPI Spooled
-75.35224985 7.618629848
-51.85221078 7.513461236
-37.77455275 7.164279414
The line I use is:
meta1<-rma(yi=mydata$DeltaPI, sei=mydata$Spooled)
forest(meta1,slab=paste(mydata$Study,mydata$Genotype..Experimental.),showweight=TRUE,alim=c(-100,25),at=c(-100,-50,0,25),xlab="Percentage Change of PI Score",cex=0.7,cex.lab=1,col="red")
However, when I try to do the same thing with some of my other .csv files, rma gives an error and asks for the 'measure' argument to plot the output. Since the measure is already the DeltaPI I calculated manually, I don't want to use that argument.
Weirdly, even if I replace the data in the files that don't work with the data that works properly (the 3 data rows above), it still gives the same error, although the same data works properly in some other .csv file.
So I'm not clear why I am getting the error and what is the solution.
Any comment would be appreciated!
My guess is that this has nothing to do with the plotting, but occurs when the rma() command is run. And it sounds to me that there are issues with how variables are named in the data that you are reading in. Now you are reading in data from .csv files, but this is probably what is happening:
> library(metafor)
> dat <- data.frame(DeltaP1 = c(.2,.4), Spooled=c(.1,.1))
> rma(dat$DeltaPI, sei=dat$Spooled)
Error in rma(dat$DeltaPI, sei = dat$Spooled) :
  Specify the desired outcome measure via the 'measure' argument.
So, in essence, you should carefully check the variable names.
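One way to check the names before calling rma() is sketched below in base R. The data frame reuses the deliberately misspelled DeltaP1 column from the example above; needed and missing_cols are illustrative helper names.

```r
# Same toy data as above: note the typo DeltaP1 (digit one) instead of DeltaPI
dat <- data.frame(DeltaP1 = c(0.2, 0.4), Spooled = c(0.1, 0.1))

# Columns the analysis expects to find
needed <- c("DeltaPI", "Spooled")

# Any expected column that is absent from the data
missing_cols <- setdiff(needed, names(dat))
print(missing_cols)

# A column that does not exist comes back as NULL rather than an error,
# so the function receives no effect sizes at all
print(is.null(dat$DeltaPI))
```

Running names(mydata) right after read.csv() on each of the failing files would show whether a header was altered on import.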

R: partimat function doesn't recognize my classes

I am a relatively novice R user and am attempting to use the partimat() function in the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried inputting the arguments in multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of the partimat.default function (viewable with getAnywhere(partimat.default)) contains:
if (nlevels(grouping) < 2)
    stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
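The effect of the factor conversion on that nlevels() check can be seen with a small sketch. The data frame here is a made-up stand-in for sources1 with a character class column in position 2.

```r
# Made-up stand-in for sources1: column 2 holds the classes as plain text
sources1 <- data.frame(id = 1:4,
                       group = c("A", "A", "B", "B"),
                       stringsAsFactors = FALSE)

# A character column has no levels, so the check in partimat.default would fail
levels_before <- nlevels(sources1[, 2])
print(levels_before)  # 0

# After conversion the two classes are recognised as levels
sources1[, 2] <- as.factor(sources1[, 2])
levels_after <- nlevels(sources1[, 2])
print(levels_after)   # 2
```

If summary(sources1[,2]) reports a length and "character" class rather than counts per class, the column is in the first state shown here.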
Or, in method 2, try removing the "sources1$" from each of your variable names in the formula, since you already specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the data frame twice, and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH
