How to use Chao1? (BEGINNER) - r

I am using R for the first time. I am trying to use the Chao1 function to estimate the diversity of my dataset. I have 20 columns, one for each species, and 8 rows (nine if you include the header), one for each plot. Each cell has a number, which is the number of individuals of that species found in that plot. For example, in my Excel file, cell A2 has the value "8", which means that 8 individuals of Species1 were found in the first plot.
I have installed the fossil and vegan packages, where I believe the chao1 function is located. Both are loaded from my library. I have imported my dataset as "speciesabund". I am now trying to run chao1. According to the documentation (https://artax.karlin.mff.cuni.cz/r-help/library/fossil/html/chao1.html), I'm supposed to type
chao1(x, taxa.row = TRUE)
I assumed "x" was meant to represent my dataset, so I tried
chao1(speciesabund, taxa.row = TRUE)
instead. It did not work and returned "Error: Unsupported use of matrix or array for column indexing." I assume this means that I need to do something more to my data before using the chao1 function; is that correct? If so, how do I do this?
Thank you so much for your help! I am using this for the first time, so I'm sorry if my question is dumb.
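A possible fix, sketched under some assumptions: as far as I know, that error message comes from a tibble (for example, data imported from Excel with readxl) being indexed like a matrix, so converting the data to a plain numeric matrix first may help. Since the plots are in rows and the species in columns, taxa.row = FALSE would be the matching setting when calling chao1() on the converted matrix. The data below is a made-up stand-in for speciesabund, and the estimate is computed by hand with the bias-corrected Chao1 formula in base R, so it does not depend on how fossil pools the plots.

```r
# Hypothetical stand-in for the imported data: 3 plots (rows) x 4 species (columns).
speciesabund <- data.frame(
  Species1 = c(8, 0, 1),
  Species2 = c(1, 0, 0),
  Species3 = c(2, 2, 0),
  Species4 = c(0, 0, 1)
)

# Convert to a plain numeric matrix (a tibble or data frame is not one).
abund <- as.matrix(speciesabund)

# Pool all plots: total number of individuals per species.
totals <- colSums(abund)

s_obs <- sum(totals > 0)   # number of species observed
f1 <- sum(totals == 1)     # species seen exactly once (singletons)
f2 <- sum(totals == 2)     # species seen exactly twice (doubletons)

# Bias-corrected Chao1 richness estimate.
chao1_est <- s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
chao1_est
```

With the real data, the equivalent call would presumably be chao1(as.matrix(speciesabund), taxa.row = FALSE), but check the help page for how the function treats a matrix with several plots.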

Related

Making a histogram

This sounds pretty basic, but every time I try to make a histogram, R says that x needs to be numeric. I've looked everywhere but can't find an answer relating to my problem. My data has 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations, and I'm trying to make a histogram of nipper length by location.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(Length of Nipper ~ Estuary Location) :
'x' must be numeric
instead of the expected histogram.
hist() expects a single numeric vector, so it can't take a formula with more than one variable.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
crabs.data <- read.table("whatever you're calling it")
# name the columns as you have it; "Location" below stands for one estuary's name
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`)))
hist(Estuary1)
Repeat the last two lines for your other two estuaries. You may not need the unlist() command, depending on your table. I've tended to need it for Excel files, but I don't know what format your table is in (that would have been helpful).
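The subsetting idea can also be written without repeating lines, for instance with split(). This is only a sketch: the data frame below is randomly generated to mimic the shape described in the question (3 estuaries, 80 crabs each), and the estuary names are invented.

```r
set.seed(1)

# Made-up data standing in for the pasted table: 3 estuaries x 80 crabs.
Crabs.data <- data.frame(
  `Estuary Location` = rep(c("North", "Middle", "South"), each = 80),
  `Length of Nipper` = rnorm(240, mean = 10, sd = 2),
  check.names = FALSE   # keep the column names with spaces
)

# One numeric vector of nipper lengths per estuary.
by_estuary <- split(Crabs.data$`Length of Nipper`,
                    Crabs.data$`Estuary Location`)

# Draw the three histograms side by side.
old_par <- par(mfrow = c(1, 3))
for (loc in names(by_estuary)) {
  hist(by_estuary[[loc]], main = loc, xlab = "Length of Nipper")
}
par(old_par)
```

Each element of by_estuary is a plain numeric vector, which is what hist() needs.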

Generating 3.000.000 strings of length 11 in R

Apparently if I try this:
# first grab the package
install.packages("stringi")
library(stringi)
# and then try to generate some serious dummy data
my_try <- as.vector(sample(1111111111:99999999999,3000000,replace=T))
R will say NOPE, sorry:
Error: cannot allocate vector of size 736.8 Gb
Should I buy more RAM*?
*this is a joke, but I seriously appreciate any help!
EDIT:
The desired output is a dataframe of 20 variables, and 3x10^6 rows. Some columns/variables should be strings, some integers. All in lengths ranging from 2 to 12.
The error isn't coming from sampling 3 million values; it comes from trying to create the population 1111111111:99999999999, roughly 99 billion values, from which to sample. If you want 11-digit values, that is, the range 11111111111:99999999999, sample from the range 1:88888888889 and add 11111111110 using
sample(88888888889, 3000000,replace=TRUE) + 11111111110
There's no need for as.vector(); the result is already a vector.
P.S. I believe in R-devel the range 1111111111:99999999999 will be stored much more efficiently (basically just the limits), but I don't know if sample() will be modified to work with it that way.
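To get from the sampled numbers to 11-character strings, one option is sprintf(). This is a minimal sketch, assuming the goal is 11-digit numeric strings; the variable names are made up, and "%.0f" is used because converting large round doubles with as.character() may produce scientific notation.

```r
set.seed(42)
n <- 3000000

# Sample offsets from 1..88888888889, then shift into 11111111111..99999999999.
ids <- sample(88888888889, n, replace = TRUE) + 11111111110

# Format as plain digit strings, without scientific notation.
id_strings <- sprintf("%.0f", ids)

head(id_strings)
```

The values are stored as doubles because they exceed the integer range, which is fine here since every whole number of this size is represented exactly.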

adehabitat compana() doesn't work or returns lambda=NaN

I'm trying to run a compositional analysis of habitat use with the compana() function from the adehabitatHS package (I use adehabitat because I can't install adehabitatHS).
compana() needs two matrices: one of habitat use and one of available habitat.
When I run the function it never finishes, so I have to abort the RStudio session.
I read that one problem could be 0-values for some habitat types for some animals in the 'available' matrix, while other animals have positive values for the same habitat. As other people have done, I replaced the 0-values with small values (0.001) and ran compana() again. It worked, BUT lambda came back as NaN.
The problem is similar to the one described here:
adehabitatHS compana test returns lambda = NaN?
They said they resolved it by using counts (integers) rather than proportions as the 'used' habitat matrix.
I also tried this approach, but nothing changed: it freezes when there are 0-values in the available matrix, or returns NaN for lambda if I replace the 0-values with small values.
I checked all the matrices and they are OK, so I'm going crazy.
I have 6 animals and 21 habitat types.
Can you help me resolve this BIG problem?
PARTIALLY SOLVED: I asked some researchers, and they told me that the number of habitats shouldn't be higher than the number of animals.
Indeed, I merged some habitats so as to have six animals and six habitats, and now the function works when I replace 0-values in the 'available' matrix with small values (e.g. 0.001).
Unfortunately this is not what I wanted, because I needed values (rankings, log-ratios, etc.) for each habitat type (originally there were 21).
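A hedged illustration of why the number of habitats might matter. It assumes (this is not confirmed anywhere in the thread) that lambda is computed as a ratio of determinants of sums-of-squares matrices built from the log-ratio differences. With 6 animals and 21 habitats there are 20 log-ratios, so the mean-centred 20 x 20 matrix can have rank at most 5 and is singular. The data here is random and only mimics those dimensions.

```r
set.seed(1)
n_animals <- 6
n_logratios <- 20   # 21 habitat types give 20 log-ratios

# Random stand-in for the matrix of log-ratio differences (animals in rows).
d <- matrix(rnorm(n_animals * n_logratios), nrow = n_animals)

# Mean-centred sums-of-squares-and-cross-products matrix (20 x 20).
centered <- scale(d, center = TRUE, scale = FALSE)
sscp <- crossprod(centered)

# The rank cannot exceed n_animals - 1, far below the 20 needed for a
# non-zero determinant.
sscp_rank <- qr(sscp)$rank
sscp_rank
```

If the statistic really is a determinant ratio, a singular matrix like this would make it undefined, which would be consistent with the advice to keep the number of habitats below the number of animals.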

How to compute for the mean and sd

I need help on 4b please
‘Warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables namely, breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute for the mean and the standard deviation of the breaks variable for those observations with breaks value not exceeding 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is how I understood the problem, and I typed the last two lines accordingly. However, the last two lines did not run, while the first three ran successfully. Can anybody tell me what the error is here?
Thanks! :)
Another way to go about it:
This way you aren't generating a bunch of datasets and then working on remembering which is which. This is more a personal thing though.
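data(warpbreaks)
# AM.warpbreaks must exist before it can be indexed
AM.warpbreaks <- subset(warpbreaks, wool == "A" & tension == "M")
mean(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])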
There are two problems with your code. The first is that you are comparing the entire data frame to 30, rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that tests whether each breaks value is at most 30.
The second is that mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R evaluates the inner expression to a vector of TRUE/FALSE values indicating whether each breaks value is at most 30, so mean() returns the proportion of TRUE values.
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)
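To make the difference between the two expressions concrete, here is a small sketch on the built-in data. The variable names (am, prop_low, mean_low, sd_low) are made up for illustration.

```r
data(warpbreaks)
am <- subset(warpbreaks, wool == "A" & tension == "M")

# Mean of a logical vector: the share of observations with breaks <= 30.
prop_low <- mean(am$breaks <= 30)

# Logical indexing first, then mean/sd of the qualifying breaks values.
mean_low <- mean(am$breaks[am$breaks <= 30])
sd_low <- sd(am$breaks[am$breaks <= 30])

c(proportion = prop_low, mean = mean_low, sd = sd_low)
```

The first number is always between 0 and 1; the other two are on the scale of the breaks variable and are what the exercise asks for.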

R - How to completely detach a subset plm.dim from a parent plm.dim object?

I want to be able to completely detach a subset (created by tapply) of a dataframe from its parent dataframe. Basically I want R to forget the existing relation and consider the subset dataframe in its own right.
Following the solution proposed in the comments, I find that it does not work for my data. The reason might be that my real dataset is a plm.dim object with an assigned index. I tried this at home on the example dataset and it worked fine. In my real data, however, the problem is not solved.
Here's the output of my actual data (original 37 firms)
sum(tapply(p.data$abs_pb_t,p.data$Rfirm,sum)==0)
[1] 7
s.data <- droplevels(p.data[tapply(p.data$abs_pb_t,p.data$ID,sum)!=0,])
sum(tapply(s.data$abs_pb_t,s.data$Rfirm,sum)==0)
[1] 8
Not only is the problem not solved, but for some reason I get an extra zero count, even though I explicitly ask to keep only the groups whose sum differs from zero.
Unfortunately, I cannot recreate the same problem with a simple example. For that example, as I said, droplevels() works just fine.
A simple reproducible example explains:
library(plm)
dad<-cbind(as.data.frame(matrix(seq(1:40),8,5)),factors = c("q","w","e","r"), year = c("1991","1992", "1993","1994"))
dad<-plm.data(dad,index=c("factors","year"))
kid<-dad[tapply(dad$V5,dad$factors,sum)<=70,]
tapply(kid$V1,kid$factors,mean)
kid<-droplevels(dad[tapply(dad$V5,dad$factors,sum)<=70,])
tapply(kid$V1,kid$factors,mean)
So I create a dad and a kid data frame based on some tapply condition (I'm sure this extends more generally).
The result of the tapply on the kid is the following:
e q r w
7 NA 8 NA
Clearly R has not forgotten the dad, and it reports NA for two of the factor levels. In itself this is not much of a problem, but in my real dataset, with many more variables and more subsetting to do, I'd like a cleaner cut so that searching through the kid(s) is easier. In other words, I don't want the initial factors q w e r to be remembered. The desired output would thus be:
e r
7 8
So, can anyone think of a reason why what works perfectly in a small data frame would work differently in a larger one? For p.data, N = 592, T = 16 and n = 37. I find that when I run two identical tapply calls, one on s.data and one on p.data, all values are different. So not only have the zeros not disappeared, literally every sum has changed in s.data, which should not be the case. Maybe that gives a clue as to where I go wrong...
And potentially it could solve the mystery of the factors that refuse to drop as well.
Thanks
Simon
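One possible cause, offered as an assumption since the thread has no confirmed answer: tapply() returns one value per group, so using its result as a row index recycles that short logical vector across the rows instead of matching each row to its own group. In the toy example this goes unnoticed, but it would explain why every sum changes in the real data. A sketch with a plain data frame (not a plm.dim object) that uses ave() to get one group sum per row:

```r
# Toy data with the same V1, V5 and factors columns as the example above.
dad <- data.frame(
  V1 = 1:8,
  V5 = 33:40,
  factors = factor(rep(c("q", "w", "e", "r"), times = 2))
)

# ave() repeats each group's sum on every row of that group,
# so the condition lines up with the rows.
group_sum <- ave(dad$V5, dad$factors, FUN = sum)

# Keep the rows of qualifying groups and drop the unused factor levels.
kid <- droplevels(dad[group_sum <= 70, ])

tapply(kid$V1, kid$factors, mean)
```

Here only group q has a V5 sum of at most 70, so kid keeps just those rows and a single factor level. Whether droplevels() behaves the same way on a plm.dim object is something I have not checked.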
