How to extract the mean square of each group of entries? - r

Sorry, I am very weak in R but very interested in it!
Description of my data: I have raw data collected from a lattice design (4 reps, 44 blocks, 5 plots per block). 220 entries were used, classified into three groups (FS = 200 entries, PC = 6 entries, TC = 14 entries).
I would like to get the simple mean and the mean square of each group (FS, PC, and TC), as well as the mean square of the error.
Looking forward to your kind help,
Thanks

I think you could go a long way with the aggregate function, for example
aggregate(Data$Values, list(Data$Groups), FUN=mean)
for your group means.
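For the mean squares, fitting the design with aov() and reading the ANOVA table could be a starting point. A minimal sketch, assuming your data frame Data has columns Values, Groups, Rep, and Block (these column names are my assumption, not from your post); for a proper lattice analysis you may need to nest blocks within reps, e.g. Rep/Block:
# Hypothetical column names; adjust to match your data frame
fit <- aov(Values ~ Rep + Block + Groups, data = Data)
summary(fit)  # the "Mean Sq" column gives the mean square for each term,
              # and the Residuals row gives the mean square of the error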

Related

Inner Products in Principal Component Analysis in R

For this, I am using the banknote data in R given by data(banknote), which shows measurements of 200 Swiss banknotes. My data matrix is called X, and I have performed PCA by pca.banknote<-prcomp(X).
I am trying to show that the inner product between each observation X[i,] and Principal Component Loading 3 given by pca.banknote$rot[,3] is the same as the 3rd PC scores given by pca.banknote$x[,3].
I have attempted:
all.equal(as.matrix(X) %*% pca.banknote$rot[,3], as.matrix(pca.banknote$x[,3]), check.attributes=FALSE)
but this simply gives a mean difference of 1, i.e. they are not equal.
Do I need to change the format of one of these to a vector/data frame etc. for this to work? Or any ideas at all as to where the issue is?
Any feedback would be much appreciated. Thanks.
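One likely cause: prcomp() centers each column by default, so the scores are inner products of the loadings with the centered rows, not the raw ones. A minimal sketch of the check with centering applied (assuming X is the raw data matrix and pca.banknote the prcomp fit):
# Center the columns the same way prcomp() does before comparing
Xc <- scale(as.matrix(X), center = TRUE, scale = FALSE)
all.equal(Xc %*% pca.banknote$rotation[, 3],
          as.matrix(pca.banknote$x[, 3]),
          check.attributes = FALSE)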

How to compute the mean and sd

I need help with 4b, please.
‘warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables, namely breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute the mean and the standard deviation of the breaks variable for those observations whose breaks value does not exceed 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is how I understood the problem, and I typed the code in the last two lines. However, the last two lines fail to run, while the first three lines run successfully. Can anybody tell me what the error is?
Thanks! :)
Another way to go about it:
This way you aren't generating a bunch of datasets and then having to remember which is which. This is more of a personal preference, though.
data(warpbreaks)
mean(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
There are two problems with your code. The first is that you are comparing to 30, but you're looking at the entire data frame, rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that refers to the breaks being at most 30.
But mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R evaluates the inner expression as a vector of logical TRUE/FALSE values indicating whether each break count is at most 30, so the mean is the proportion of TRUE values, not the mean of the break counts themselves.
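To see the difference on a small made-up vector (not the real data):
x <- c(26, 30, 54, 25)   # hypothetical break counts
x <= 30                  # TRUE TRUE FALSE TRUE
mean(x <= 30)            # 0.75: the proportion of values at most 30
mean(x[x <= 30])         # 27: the mean of the values at most 30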
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)

R: how to divide a vector of values into a fixed number of groups, based on smallest distance?

I think I have a rather simple problem, but I can't figure out the best approach. I have a vector with 30 different values. I need to divide the vector into 10 groups in such a way that the mean within-group variance is as small as possible. The size of the groups is not important; it can be anything between 1 and 21.
Example: let's say I have a vector of six values that I have to split into three groups:
Myvector <- c(0.88,0.79,0.78,0.62,0.60,0.58)
Obviously the solution would be:
Group1 <-c(0.88)
Group2 <-c(0.79,0.78)
Group3 <-c(0.62,0.60,0.58)
Is there a function that gives the same outcome as the example and that I can use for my vector with 30 values?
Many thanks in advance.
It sounds like you want to do k-means clustering. Something like this would work:
kmeans(Myvector, 3, algorithm="Lloyd")
Note that I changed the default algorithm to match your desired output. If you read the ?kmeans help page, you will see that there are several algorithms for computing the clusters, because it is not a trivial computational problem; they do not necessarily guarantee optimality.
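A minimal sketch with the example vector; split() on the cluster assignments recovers the groups as a list (cluster labels are arbitrary, so the order may differ):
Myvector <- c(0.88, 0.79, 0.78, 0.62, 0.60, 0.58)
km <- kmeans(Myvector, centers = 3, algorithm = "Lloyd")
split(Myvector, km$cluster)  # three groups matching Group1..Group3 above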

Interpolation along stream network (QGIS, GRASS, PostGIS, R..)

I would like to inter-/extrapolate values (concentrations) along a stream network. In theory the best match so far would be the rtop package in R, but somehow there is a bug and I cannot run the example data. Does anyone have another ready-made suggestion using any kind of open-source program?
Nevertheless, I tried to solve the problem in R, but I ran into several problems.
My data frame (I also have shapefiles, the stream network, and catchment areas) has these columns:
StartID | EndID | Discharge | Length | Value
First of all, I would like to do inverse distance weighted (IDW) interpolation: find the segments where I have observations and interpolate between the observations for the NA values, depending on their distance to the observations.
Secondly, I would also like to take the discharge into account. When two streams join, the stream with the higher discharge should have more influence on the concentration in the next segment.
I am able to look for NA values, check whether there are observations upstream or downstream of the segment, weight them by discharge, and take the mean:
for (i in 1:nrow(DF)) {
  if (is.na(DF[i, "Value"])) {
    # upstream neighbours: their EndID matches this segment's StartID
    a <- merge(DF[i, ], DF, by.x = "StartID", by.y = "EndID", all.x = TRUE)
    a <- a[complete.cases(a$Value.y), ]
    # downstream neighbours: their StartID matches this segment's EndID
    b <- merge(DF[i, ], DF, by.x = "EndID", by.y = "StartID", all.x = TRUE)
    b <- b[complete.cases(b$Value.y), ]
    # discharge-weighted means up- and downstream, then their average
    up   <- sum(a$Discharge.y * a$Value.y) / sum(a$Discharge.y)
    down <- sum(b$Discharge.y * b$Value.y) / sum(b$Discharge.y)
    DF[i, "Value"] <- mean(c(up, down), na.rm = TRUE)
  }
}
But I think it would be better to look for the observations closest to each other and interpolate for the NA values between them. There I really got stuck. I am not hoping for ready-to-use scripts, but I would be glad to get some feedback and directions.
Thanks a lot, Celia
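Not a ready-to-use script, but for interpolating by distance along a single, ordered stream path, base R's approx() may be a useful building block: linear interpolation between the two nearest observations is algebraically the same as inverse-distance weighting over those two neighbours. A sketch with hypothetical data:
# Hypothetical single path, segments ordered from upstream to downstream
path <- data.frame(Length = c(1.0, 2.0, 1.5, 2.0, 1.0),
                   Value  = c(0.8,  NA,  NA, 0.5,  NA))
path$dist <- cumsum(path$Length)  # position of each segment along the path
obs <- !is.na(path$Value)
# Linear interpolation by distance; rule = 2 repeats the nearest observed
# value beyond the ends (a crude stand-in for extrapolation)
path$Value <- approx(path$dist[obs], path$Value[obs],
                     xout = path$dist, rule = 2)$y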

Dealing with a data table with redundant rows

The title is not precisely stated, but I could not come up with other words to summarize what exactly I am going to ask.
I have a table of the following form:
value (0 < v < 1) | # of events
0.5677            | 100000
0.5688            | 5000
0.1111            | 6000
...               | ...
0.5688            | 200000
0.1111            | 35000
Here are some of the things I would like to do with this table: draw the histogram, compute the mean value, fit the distribution, etc. So far, I could only figure out how to do this with plain vectors like
v <- c(0.5677, ..., 0.5688, ..., 0.1111, ...)
but not with tables.
Since the number of possible values is huge (the variable is almost continuous), I guess building a new table would not be that effective, so I would very much prefer to do this without modifying the original table or making another one. But if it has to be done that way, it's okay. Thanks in advance.
Appendix: What I want to figure out is how to treat this table as a usual data vector:
If I had the following vector representing the exact same data as above:
v = (0.5677, ..., 0.5677, 0.5688, ..., 0.5688, 0.1111, ..., 0.1111)
where 0.5677 is repeated 100000 times, 0.5688 is repeated 5000 + 200000 times, and 0.1111 is repeated 6000 + 35000 times,
then we would just need to apply basic functions like plot, mean, etc. to get what I want. I hope this makes my question clearer.
Your data consist of a value and a count for that value, so you are looking for functions that use the count to weight the value. Type ?weighted.mean for a function that computes the mean of weighted (grouped) data. For density plots, use the weights= argument of the density() function. For the histogram, use cut() to combine the values into a small number of groups and then aggregate() to sum the counts for all the values in each group. You will also find a variety of weighted statistical measures in the Hmisc package (wtd.mean, wtd.var, wtd.quantile, etc.).
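A minimal sketch of those pieces, assuming the table is a data frame tab with columns value and n (hypothetical names):
tab <- data.frame(value = c(0.5677, 0.5688, 0.1111, 0.5688, 0.1111),
                  n     = c(100000, 5000, 6000, 200000, 35000))
weighted.mean(tab$value, tab$n)                         # mean, weighted by count
plot(density(tab$value, weights = tab$n / sum(tab$n)))  # weighted density
# Histogram-style summary: bin the values, then sum the counts per bin
tab$bin <- cut(tab$value, breaks = seq(0, 1, by = 0.1))
counts <- aggregate(n ~ bin, data = tab, FUN = sum)
barplot(counts$n, names.arg = as.character(counts$bin))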
