How to compute the mean and sd - r

I need help with 4b, please.
‘warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables, namely breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute the mean and the standard deviation of the breaks variable for those observations with a breaks value not exceeding 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is how I understood the problem, so I typed the code in the last two lines. However, I wasn't able to run the last two lines, while the first three lines ran successfully. Can anybody tell me what the error is here?
Thanks! :)

Another way to go about it:
This way you aren't generating a bunch of datasets and then having to remember which is which. This is more of a personal preference, though.
data(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool == "A" & tension == "M")
mean(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks <= 30), "breaks"])

There are two problems with your code. The first is that you are comparing to 30, but you're looking at the entire data frame, rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that tests whether each breaks value is less than or equal to 30.
But mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R evaluates the inner expression as a vector of logical TRUE/FALSE values indicating whether each breaks value is at most 30, and the mean of that vector is the proportion of TRUE values, not the mean number of breaks.
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)
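To see the second problem concretely, here is a minimal sketch using only base R and the built-in warpbreaks data. It contrasts the mean of a logical vector (a proportion) with the mean of the values that vector selects. The names keep, prop.le.30 and breaks.le.30 are just illustrative variables.

```r
data(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool == "A" & tension == "M")

# Logical vector: TRUE where breaks <= 30, one entry per row of AM.warpbreaks
keep <- AM.warpbreaks$breaks <= 30

# mean() of a logical vector treats TRUE as 1 and FALSE as 0,
# so this is the PROPORTION of rows with breaks <= 30, not their mean
prop.le.30 <- mean(keep)

# Using the logical vector as an index selects the qualifying breaks values
breaks.le.30 <- AM.warpbreaks$breaks[keep]
mean(breaks.le.30)
sd(breaks.le.30)
```

Indexing with keep gives the same values as subset(AM.warpbreaks, breaks <= 30)$breaks, so either approach answers the question.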

Related

Making a histogram

This sounds pretty basic, but every time I try to make a histogram, R tells me that x needs to be numeric. I've been looking everywhere but can't find an answer that relates to my problem. I have data with 240 observations of 5 variables:
Nipper length
Number of Whiskers
Crab Carapace
Sex
Estuary location
There are 3 locations, and I'm trying to make a histogram of nipper length for each location.
I've tried making new factors and levels, with the 80 observations in each location, but it's not working.
Crabs.data <-read.table(pipe("pbpaste"),header = FALSE)##Mac
names(Crabs.data)<-c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
Crabs.data<-Crabs.data[,-1]
attach(Crabs.data)
hist(`Length of Nipper`~`Estuary Location`)
Error in hist.default(`Length of Nipper` ~ `Estuary Location`) :
  'x' must be numeric
I get this error instead of the correct result.
hist() doesn't accept more than one variable: it has no formula method, so the formula is passed to hist.default() as x, which must be numeric.
I think you'd have the best luck subsetting the data, that is, making a vector of nipper lengths for all crabs in a given estuary.
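crabs.data <- read.table("whatever you're calling it")
names(crabs.data) <- c("Crab Identification","Estuary Location","Sex","Crab Carapace","Length of Nipper","Number of Whiskers")
# replace "Location" with the name of one of your estuaries
Estuary1 <- as.vector(unlist(subset(crabs.data, `Estuary Location` == "Location", select = `Length of Nipper`)))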
hist(Estuary1)
Repeat the last two lines for your other two estuaries. The unlist() call is there because subset(..., select = ...) returns a one-column data frame rather than a vector, and hist() needs a plain numeric vector. I don't know what format your table is in (that would have been helpful), but the same applies either way.
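Because hist() has no formula interface, one way to get all three histograms at once is to split() the nipper lengths by estuary and loop over the pieces. The sketch below uses simulated stand-in data: the column names estuary and nipper and the values are hypothetical, not the asker's measurements. With the real data you would split crabs.data$`Length of Nipper` by crabs.data$`Estuary Location` instead.

```r
# Simulated stand-in for the crab data (hypothetical values, NOT real measurements)
set.seed(1)
crabs <- data.frame(
  estuary = rep(c("E1", "E2", "E3"), each = 80),
  nipper  = rnorm(240, mean = 20, sd = 3)
)

# split() returns a named list with one numeric vector per estuary
by.estuary <- split(crabs$nipper, crabs$estuary)

# Draw the three histograms side by side, then restore the old layout
op <- par(mfrow = c(1, 3))
for (loc in names(by.estuary)) {
  hist(by.estuary[[loc]], main = paste("Estuary", loc), xlab = "Length of nipper")
}
par(op)
```

Each element of by.estuary is a numeric vector, so it is a valid x for hist(). If a single formula call is preferred, the lattice package (shipped with R) provides histogram(), which accepts a conditioning formula such as ~ nipper | estuary.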

Finding the first significant figure of difference between two very similar values

I'm trying to reproduce the computations that led to a data set data.ref. I'd like to test how well my current implementation does by comparing the reference data to my computed results, data.my. Since each column of the data should have comparable magnitudes within the column, but not necessarily between columns, I've been looking at
(data.ref - data.my) / data.ref
to put errors on a comparable scale. However, since the data is ultimately going to be rounded off, what I'd really like is a quick and dirty check of how many significant figures' worth of agreement the data has. That is, since I expect data.ref and data.my to be quite close to each other, I'd like to answer the question: what is the first significant figure at which each pair of corresponding entries differs?
Is there an R function that does this?
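ceiling(log10(abs(data.ref - data.my))) seems to do the trick: for each pair it returns the smallest power of ten that is at least as large as the absolute difference (-Inf where the two values are identical).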
Example:
> data.my <- c(20, 30, 32, 32.01, 32.012)
> data.ref <- rep(32, length(data.my))
> ceiling(log10(abs(data.my - data.ref)))
[1] 2 1 -Inf -2 -1
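The exponent above is absolute, so it is not comparable between columns of different magnitude. A sketch that works on the relative scale from the question, (data.ref - data.my) / data.ref, takes -log10 of the absolute relative error, which approximates the number of significant figures on which each pair agrees. The helper name agreeing_sig_figs is hypothetical (not a base R function), and the count is only approximate near digit boundaries: for example, 1.00 and 0.99 have a relative error of 0.01 yet share no leading digit.

```r
# Hypothetical helper (not a base R function): approximate count of
# significant figures on which `my` agrees with `ref`, via the relative error
agreeing_sig_figs <- function(ref, my) {
  rel.err <- abs((ref - my) / ref)   # same scale as (data.ref - data.my) / data.ref
  -log10(rel.err)                    # Inf where the values are identical
}

data.my  <- c(20, 30, 32, 32.01, 32.012)
data.ref <- rep(32, length(data.my))
digits <- agreeing_sig_figs(data.ref, data.my)
floor(digits)   # 0 1 Inf 3 3
```

Reference values of exactly zero make the relative error undefined (NaN or -Inf here), so such entries need separate handling.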

R: how to divide a vector of values into a fixed number of groups, based on smallest distance?

I think I have a rather simple problem, but I can't figure out the best approach. I have a vector with 30 different values. I need to divide the vector into 10 groups in such a way that the mean within-group variance is as small as possible. The size of the groups is not important; it can be anything between 1 and 21.
Example. Let's say I have vector of six values, that I have to split into three groups:
Myvector <- c(0.88,0.79,0.78,0.62,0.60,0.58)
Obviously the solution would be:
Group1 <-c(0.88)
Group2 <-c(0.79,0.78)
Group3 <-c(0.62,0.60,0.58)
Is there a function that gives the same outcome as the example and that I can use for my vector with 30 values?
Many thanks in advance.
It sounds like you want to do k-means clustering. Something like this would work
kmeans(Myvector, centers = 3, algorithm = "Lloyd")
Note that I changed the default algorithm to match your desired output. If you read the ?kmeans help page, you will see that there are several algorithms for computing the clusters, because it's not a trivial computational problem. They do not necessarily guarantee an optimal solution, and because the starting centers are chosen at random, results can vary between runs (set.seed() or a larger nstart helps).
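For one-dimensional data like this, an exact optimum is computable. The best partition into k groups under the total within-group sum of squares (the criterion kmeans() minimizes) always consists of runs of consecutive values once the vector is sorted, so dynamic programming over the sorted vector can find it. The function below is a hypothetical, unoptimized sketch of that idea, not part of base R; the CRAN package Ckmeans.1d.dp implements the same approach efficiently.

```r
# Hypothetical helper (not base R): exact split of a numeric vector into k
# groups of consecutive sorted values, minimising total within-group SS
optimal_groups_1d <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  # within-group sum of squares of x[i..j]
  ss <- function(i, j) { v <- x[i:j]; sum((v - mean(v))^2) }
  cost <- matrix(Inf, nrow = k, ncol = n)  # cost[g, j]: best SS for x[1..j] in g groups
  cut  <- matrix(0L, nrow = k, ncol = n)   # start index of the last group
  for (j in 1:n) { cost[1, j] <- ss(1, j); cut[1, j] <- 1L }
  if (k > 1) for (g in 2:k) for (j in g:n) {
    for (i in g:j) {                       # try x[i..j] as the last group
      cand <- cost[g - 1, i - 1] + ss(i, j)
      if (cand < cost[g, j]) { cost[g, j] <- cand; cut[g, j] <- i }
    }
  }
  # walk back through the stored cut points to recover the groups
  groups <- vector("list", k)
  j <- n
  for (g in k:1) {
    i <- cut[g, j]
    groups[[g]] <- x[i:j]
    j <- i - 1
  }
  groups
}

Myvector <- c(0.88, 0.79, 0.78, 0.62, 0.60, 0.58)
groups <- optimal_groups_1d(Myvector, 3)
groups   # groups come back in ascending order of value
```

Because ss() is recomputed inside the loops, the cost grows roughly with k times n cubed; that is fine for 30 values and 10 groups but not for large vectors.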

Attributing row name of irregular number of rows (populations)

The GENELAND tutorial gives the following to assign population names to a dataset of populations of 60 individuals each:
pop.mbrship1<-rep(c(1,2,3), each=60)
However, my dataset comprises 10 populations of irregular sizes, to which I would give the names 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and the individuals (one row each) are distributed as follows:
1:24,25:39,40:58,59:79,80:103,104:126,127:147,148:171,172:191,192:214
I'd be tempted to use each population's size as the number of repeats, which would make it
pop.mbrship1<-rep[c(1,2,3,4,5,6,7,8,9,10), each=c(24,15,19,21,24,23,21,24,20,23)]
Or try their distribution...
pop.mbrship1<-rep[c(1,2,3,4,5,6,7,8,9,10),
c(1:24,25:39,40:58,59:79,80:103,104:126,127:147,148:171,172:191,192:214)]
In both cases, R gives me Error: unexpected '>' in ">"
I'm sure I'm really close to having it work, but I've spent a shameful amount of time on this and I'd definitely need a hand. Thanks a lot!
I'm looking at the GENELAND tutorial, and I see that they have > at the beginning of the lines that you're copying/editing.
You are copying everything, including the console prompt >. All you need to copy/paste is:
# replicates each element 60 times
pop.mbrship1 <- rep(c(1,2,3),each=60)
# replicates each element, respectively
pop.mbrship2 <- rep(c(1,2,3),times=c(60,40,30))
Your answer is what Henrik said above, without the preceding >:
pop.mbrship1 <- rep(c(1,2,3,4,5,6,7,8,9,10), c(24,15,19,21,24,23,21,24,20,23))
# same as
pop.mbrship1 <- rep(c(1,2,3,4,5,6,7,8,9,10), times = c(24,15,19,21,24,23,21,24,20,23))
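Rather than typing the group sizes by hand, they can be derived from the row ranges in the question with diff(). A minimal sketch, assuming every population occupies one contiguous block of rows as listed (the names ends, sizes and pop.mbrship are illustrative):

```r
# Last row index of each population, taken from the ranges 1:24, 25:39, ..., 192:214
ends  <- c(24, 39, 58, 79, 103, 126, 147, 171, 191, 214)

# Size of each population = gap between consecutive end rows
sizes <- diff(c(0, ends))   # 24 15 19 21 24 23 21 24 20 23

# times = sizes repeats label i exactly sizes[i] times
pop.mbrship <- rep(1:10, times = sizes)
length(pop.mbrship)         # 214, one label per row
table(pop.mbrship)
```

The difference from the tutorial line is the argument: each = 60 repeats every element the same number of times, whereas times = a vector repeats element i as often as its i-th entry, which is what irregular population sizes need.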

How to extract the Mean Square of each group of entries?

Sorry, I am still weak in R but very interested in it!
Description of my data: I have raw data collected from a lattice design (4 reps, 44 blocks, 5 plots per block). 220 entries were used, classified into three groups (FS = 200 entries, PC = 6 entries and TC = 14 entries).
I would like to get the simple mean and the Mean Square of each group (FS, PC and TC) and the Mean Square of the error.
I look forward to your kind help,
Thx
I think you could go a long way with the aggregate function, like
aggregate(Data$Values, list(Data$Groups), FUN=mean)
for your mean etc.
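As a sketch of the "etc.", aggregate() can return several statistics per group at once, and a one-way ANOVA table gives a Mean Sq for the groups and for the residuals (error). The data frame below is simulated stand-in data (hypothetical values, NOT the asker's trial), reusing the placeholder names Data, Values and Groups from the line above. A model for the actual lattice design would also need terms for replicates and blocks; this sketch leaves them out.

```r
# Simulated stand-in data (hypothetical, NOT the real lattice trial)
set.seed(42)
Data <- data.frame(
  Groups = factor(rep(c("FS", "PC", "TC"), times = c(200, 6, 14))),
  Values = rnorm(220, mean = 10, sd = 2)
)

# Mean and variance per group in one call; var() is the within-group
# sum of squares divided by n - 1 degrees of freedom
stats <- aggregate(Values ~ Groups, data = Data,
                   FUN = function(v) c(mean = mean(v), var = var(v)))
stats

# One-way ANOVA: "Mean Sq" for Groups and for Residuals (the error term).
# NOTE: ignores the reps/blocks of the lattice design.
fit <- lm(Values ~ Groups, data = Data)
tab <- anova(fit)
tab[["Mean Sq"]]
```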
