performing a calculation with a `paste`d vector reference - r

So I have some lidar data that I want to calculate some metrics for (I'll attach a link to the data in a comment).
I also have ground plots that I have extracted the lidar points around, so that I have a couple hundred points per plot (19 plots). Each point has X, Y, Z, height above ground, and the associated plot.
I need to calculate a bunch of metrics on the plot level, so I created plotsgrouped with split(plotpts, plotpts$AssocPlot).
So now I have a data frame with a "page" for each plot, so I can calculate all my metrics by the "plot page". This works just dandy for individual plots, but I want to automate it. (yes, I know there's only 19 plots, but it's the principle of it, darn it! :-P)
So far, I've got a for loop going that calculates the metrics and puts the results in a data frame called Results. I pulled the names of the groups into a list called groups as well.
for(i in 1:length(groups)){
Results$Plot[i] <- groups[i]
Results$Mean[i] <- mean(plotsgrouped$PLT01$Z)
Results$Std.Dev.[i] <- sd(plotsgrouped$PLT01$Z)
Results$Max[i] <- max(plotsgrouped$PLT01$Z)
Results$75%Avg.[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .75)])
Results$50%Avg.[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .50)])
...
and so on.
The problem arises when I try to do something like:
Results$mean[i] <- mean(paste("plotsgrouped", groups[i],"Z", sep="$")). mean() doesn't recognize the paste as a reference to the vector plotsgrouped$PLT27$Z, and instead fails. I've deduced that it's because it sees the quotes and thinks, "Oh, you're just some text, I can't get the mean of you." or something to that effect.
Btw, groups is a list of the 19 plot names: PLT01-PLT27 (non-consecutive sometimes) and FTWR, so I can't simply put a sequence for the numeric part of the name.
Anyone have an easier way to iterate across my test plots and get arbitrary metrics?
I feel like I have all the right pieces, but just don't know how they go together to give me what I want.
Also, if anyone can come up with a better title for the question, feel free to post it or change it or whatever.

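Try indexing the list by name with [[ ]] instead of pasting a string together:
for(i in seq_along(groups)) {
Results$Plot[i] <- groups[[i]] # character name of the group; [[ ]] works whether groups is a list or a vector
tempZ <- plotsgrouped[[groups[[i]]]][["Z"]] # Z values for this plot
Results$Mean[i] <- mean(tempZ)
Results$Std.Dev.[i] <- sd(tempZ)
Results$Max[i] <- max(tempZ)
# column names that start with a digit must be wrapped in backticks
Results$`75%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .75)])
Results$`50%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .50)])
}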
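As a loop-free alternative, the same kind of metrics can be computed by applying one function to every element of the split list. This is only a sketch: the toy plotpts data and the metrics() helper are made up for illustration, and only a few of the metrics are shown.

```r
# Toy stand-in for the real lidar points (hypothetical values)
plotpts <- data.frame(
  AssocPlot = rep(c("PLT01", "PLT27", "FTWR"), each = 4),
  Z = c(1, 2, 3, 4, 10, 20, 30, 40, 5, 5, 6, 8),
  stringsAsFactors = FALSE
)
plotsgrouped <- split(plotpts, plotpts$AssocPlot)

# Hypothetical helper: all metrics for one vector of heights
metrics <- function(z) {
  c(Mean    = mean(z),
    Std.Dev = sd(z),
    Max     = max(z),
    Avg75   = mean(z[z <= quantile(z, 0.75)]))
}

# One column per plot from sapply(); transpose to get one row per plot
Results <- t(sapply(plotsgrouped, function(p) metrics(p$Z)))
Results <- data.frame(Plot = rownames(Results), Results, row.names = NULL)
print(Results)
```

Adding another metric then only means adding one more entry to metrics().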

Related

I want to use heatmap in my code but i am getting error

heatmap(Web_Data$Timeinpage)
str(Web_Data)
heat = c(t(as.matrix(Web_Data$Timeinpage[,-1])))
heatmap(heat)
A few items to note here:
1) By wrapping the expression in c(), as in c(t(as.matrix(Web_Data$Timeinpage[,-1]))), you are creating a single vector and not a matrix. You can see this by running the following: is.matrix(c(t(as.matrix(Web_Data$Timeinpage[,-1])))). heatmap (I believe) is checking for a matrix because...
2) You need to provide a matrix with at least two rows and two columns for this function to work. Currently, you are only giving it one vector: time. You will need to provide some other feature of interest to have it work correctly, such as Continent.
3) If you intend to plot ONLY one field, you may consider doing as suggested here and use the image() function. (I included an example below).
4) I find the heatmap function somewhat dated in look. You may want to consider other popular functions, such as ggplot's geom_tile. (see here).
Below is an example code that should produce an output:
#fake data
Web_Data <- data.frame("Timeinpage" = c(123,321,432,555,332,1221,2,43,0, NA,10, 44),
OTHER = rep(c("good", "bad"), 6) )
#a numeric matrix with TWO columns from my data frame (heatmap needs numeric input, so OTHER is converted to integer codes). Notice the c() is removed and I am not transposing. Also removing the , from [,-1]
heat <- cbind(Timeinpage = Web_Data$Timeinpage[-1], OTHER = as.numeric(factor(Web_Data$OTHER[-1])))
#output
heatmap(heat)
#one row
heat2 <- as.matrix(sort(Web_Data$Timeinpage[-1])) #sorting as well
#output
image(heat2)

How to do box plots on a range of variables

I have a data matrix with approximately one hundred variables and I want to do box plots of these variables. Doing them one by one is possible, but tedious. The code I use for my box plots is:
boxplot(myVar ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T,las=2, ylab='Counts', at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
I started doing them one by one, but realized there must be better options. As far as I can tell, the boxplot call takes only one variable at a time (I may be wrong), so I am looking for a way to get it done in one go. A for loop? Next, I would like to print the name of the current variable (= the colName) on each plot in order to keep them apart.
Appreciate suggestions.
Thank you.
jd
Why not try the following:
data(something)
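panel.bxp <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr = usr)) # save the plot coordinates and restore them on exit
par(usr = c(0, 2, usr[3:4])) # reset the x range so the box is centred in the panel
boxplot(x, add=TRUE)
}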
Then, to run the function, you can try something like the following:
pairs(something, diag.panel = panel.bxp, text.panel = function(...){})
EDIT: There is also a nice link to an article here on R-bloggers which you might want to have a look at.
Being very new to R, I've tried to follow my 'old' thinking - making a for-loop. Here is what I came up with. Probably very primitive, and therefore, I'd appreciate comments/suggestions. Anyway: the loop:
for (i in 1:ncol(final)) {
#print(i)
c <- colnames(final)[i]
#print(c)
b <- final[,i]
#b <- t(b)
#dim(b)
#print(b)
exp <- data.frame(Group,Trt,Time,b)
#dim(exp)
#print(exp)
boxplot(b ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T, las=2, ylab='Counts',main=c, at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
}
The loop runs through the data matrix 'final', (48rows x 67cols). Picks up the column header, c, which is used in the boxplot call as main title. Picks up the data column, b. Sets up the experiment using the Group, Trt, and Time factors established outside the loop, and calls the boxplot.
This seems to do what I want. Oddly, RStudio does not allow more than 25 (approx.) plots to be stored in the plots pane, so I have to run this loop in a couple of rounds.
Anyway, sorry for answering my own question. Better solutions are greatly appreciated since my way is pretty amateurish, I suspect.
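One possible way around the plot-history limit is to send the plots to a multi-page PDF instead of the plots pane. The sketch below assumes toy stand-ins for final, Group and Trt (Time is left out to keep it short), and the output file name is arbitrary.

```r
set.seed(42)
# Hypothetical stand-ins for the factors and the data matrix
Group <- factor(rep(c("ctrl", "treated"), each = 6))
Trt   <- factor(rep(c("a", "b"), times = 6))
final <- matrix(rnorm(24), ncol = 2,
                dimnames = list(NULL, c("var1", "var2")))

out_file <- file.path(tempdir(), "boxplots.pdf")
pdf(out_file)                        # every plot becomes one page of the PDF
for (nm in colnames(final)) {
  boxplot(final[, nm] ~ Group * Trt,
          col = c("red", "blue"), las = 2,
          ylab = "Counts", main = nm)  # column name as the plot title
}
dev.off()                            # close the file so it can be opened
```

Looping over colnames(final) rather than 1:ncol(final) also gives the title for free.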

automation with for loop and while statements in R

Most of the time there is more than one way to implement a solution to a specific problem. Hence, there are bad solutions and good solutions. I consider robust implementations to be the ones that use for loops and while statements, lists, or any other functions and built-in types that make our lives easier.
I am looking forward to seeing and understanding some examples of high-level programming in R.
Assume a task like the following.
#IMPORT DATASET
Dataset <- read.table("blablabla\\dataset.txt", header=T, dec=".")
#TRAINING OF MODEL
Modeltrain <- lm(temperature~latitude+sea.distance+altitude, data=Dataset)
#COEFFICIENT VALUES FOR INDEPENDENT VARIABLES
Intercept <- summary(Modeltrain)$coefficients[1]
Latitude <- summary(Modeltrain)$coefficients[2]
Sea.distance <- summary(Modeltrain)$coefficients[3]
Altitude <- summary(Modeltrain)$coefficients[4]
#ASK FOR USER INPUT AND CALCULATE y
i <- 1
while (i == 1){
#LATITUDE (Xlat)
cat("Input latitude value please: ")
Xlat <- readLines(con="stdin", 1)
Xlat <- as.numeric(Xlat)
cat(Xlat, "is the latitude value. \n")
#LONGITUDE (Xlong)
#CALCULATE DISTANCE FROM SEA (Xdifs)
cat("Input longitude value please: ")
Xlong <- readLines(con="stdin", 1)
Xlong <- as.numeric(Xlong)
#cat(Xlong, "\n")
Xdifs <- min(4-Xlong, Xlat)
cat(Xdifs, "is the calculated distance from sea value. \n")
#ALTITUDE (Xalt)
cat("Input altitude value please: ")
Xalt <- readLines(con="stdin", 1)
Xalt <- as.numeric(Xalt)
cat(Xalt, "is the altitude value. \n")
y = Intercept + Latitude*Xlat + Sea.distance*Xdifs + Altitude*Xalt
cat(y, "is the predicted temperature value.\n")
}
First of all, I would like to ask how to replace blablabla\\dataset.txt with an absolute path in a way that keeps the script functional on other operating systems too.
The second question is how to automate the above process so that it includes additional X variables as well, without having to add them manually in the script.
I understand the latter question probably means rewriting the whole thing, therefore I don't expect a full answer. As I said before, I am more interested in understanding how it could be done so that I can do it myself.
Thanks.
p.s. please don't ask for a reproducible example i can't provide much else.
For the first question, you may want to look at the file.path command. For the second, I would approach this by defining, outside the while loop, two lists, one to store the prompts (e.g. list(lat="Please enter Latitude")) and another, with identical names, to store the input values. Then another loop inside the while iterates through the names of the first list, produces the relevant prompt, and stores the response in the named slot in the second list.
If your users are happy interacting with R in such a way, then you're lucky. Else, as @Roland suggests, delegate the UI to some other technology.
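A rough sketch of both suggestions follows. The directory name "data", the prompt texts and the ask_inputs() helper are made up for illustration; the reader function is passed in as an argument only so the loop can be tried without a terminal.

```r
# file.path() joins path components with a separator that works on every OS
dataset_path <- file.path("data", "dataset.txt")

# One prompt per variable; adding a variable means adding one entry here
prompts <- list(
  latitude = "Input latitude value please: ",
  altitude = "Input altitude value please: "
)

# Hypothetical helper: ask each prompt in turn and store the numeric reply
# under the same name. read_fun defaults to reading one line from stdin.
ask_inputs <- function(prompts,
                       read_fun = function() readLines(con = "stdin", n = 1)) {
  inputs <- vector("list", length(prompts))
  names(inputs) <- names(prompts)
  for (nm in names(prompts)) {
    cat(prompts[[nm]])
    inputs[[nm]] <- as.numeric(read_fun())
  }
  inputs
}
```

If the names of the prompts match the predictor names of the model, the returned list could then be turned into a one-row data frame with as.data.frame() and handed to predict() as newdata, instead of multiplying the coefficients by hand.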

How to do for loops without overwriting?

I have a large data.frame called rain with information on many species measured in different plots at different times (censuses), from which I want to extract some information. This data frame has many columns. In dataF2 I want to keep the same structure, but extract from rain only the rows for the penultimate census (Census.No is one of the columns of rain) in each plot (Plot.Code is another one). In idx3 I have the number of the penultimate census for each plot.
It's easy to do it for one plot
data1<- rain[Plot.Code==idx3[1,1] & Census.No==idx3[1,2],]
I've been trying to do for loops in R.. but I keep overwriting my data.frame and ending up just with the last loop.
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
summary (dataF2)
for (i in 1:length (idx3[,1])){
dataF2<- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
}
Here I want to extract from a data frame the information for the penultimate census in each plot (idx3 records which census was the penultimate one in each plot).
I've tried many things, like:
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
for (i in 1:length (idx3[,1])){
data1<- rainfor[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
dataF2<- rbind (data1[i])
}
But nothing worked. My problem is that it keeps overwriting dataF2!
Cheers!!!
Your clarifications in the comments helped somewhat, but reproducible examples are always better. Let's start at the beginning:
dataF2<- data.frame(nrow= nrow (rain), ncol = ncol (rain))
This is wrong. I think that you're trying to create an empty data frame with the same dimensions as your data frame rain. If you examine dataF2 you'll see that this is far from what you have done with this line. If you read the documentation for the function ?data.frame it will become clear that there are no arguments called nrow and ncol. What you probably intended was something like this:
dataF2 <- rain
dataF2[] <- NA
Inside your for loop you are overwriting your entire data frame because....you are overwriting your entire data frame.
dataF2<- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
This assigns something to dataF2, replacing it completely. If you want to assign to just a single row of dataF2 you need to assign to that specific row:
dataF2[i,] <- rain[Plot.Code==idx3[i,1] & Census.No==idx3[i,2],]
I can't absolutely assure that this will work correctly, since you haven't provided a sufficiently detailed example, so I'm not sure that all the dimensions will coincide properly when you index on i. But this is the basic idea.
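If each plot/census combination can match more than one row, a common pattern is to collect the pieces in a list and bind them together once after the loop. The sketch below uses made-up rain and idx3 data, since the real ones were not posted.

```r
# Hypothetical stand-ins for the real data
rain <- data.frame(
  Plot.Code = c("A", "A", "A", "B", "B"),
  Census.No = c(1, 2, 3, 1, 2),
  Count     = c(5, 7, 9, 2, 4),
  stringsAsFactors = FALSE
)
idx3 <- data.frame(
  Plot.Code = c("A", "B"),
  Census.No = c(2, 1),          # penultimate census for each plot
  stringsAsFactors = FALSE
)

pieces <- vector("list", nrow(idx3))   # one slot per plot
for (i in seq_len(nrow(idx3))) {
  # store each subset in its own slot instead of overwriting one object
  pieces[[i]] <- rain[rain$Plot.Code == idx3[i, 1] &
                      rain$Census.No == idx3[i, 2], ]
}
dataF2 <- do.call(rbind, pieces)       # stack all subsets into one data frame
print(dataF2)
```

Because every piece is a subset of rain, dataF2 ends up with the same columns as rain without being pre-allocated.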

Calling a specific column in a subset of data that has been binned and stored in a list

I have a very large data set that I have binned, and stored each bin (subset) as a list so that I can easily call any given subset. My problem is in calling for a specific column within a subset.
For example my data (which has diameters and strengths as the columns), is broken up into 20 bins, by diameter. I manually binned the data, like so:
subset.1 <- subset(mydata, Diameter <= 0.01)
Similar commands were used, to make 20 bins. Then I stored the names (subset.1 through subset.20) into a list:
diameter.bin<-list(subset.1, ... , subset.20)
I can successfully call each diameter bin using:
diameter.bin[x]
Now, if I only want to see the strength values for a given diameter bin, I can use the original name (that is stored in the list):
subset.x$Strength
But I cannot get this information using the list call:
diameter.bin[x]$Strength
This command returns NULL
Note that when I call any subset (either by diameter.bin[x], subset.x or even subset.x$Strength) my column headers do show up. When I use:
names(subset.1)
This returns "Diameter" and "Strength"
But when I use:
names(diameter.bin[1])
This returns NULL.
I'm assuming that the column header is part of the problem, but I'm not sure how to fix it, other than take the headers off of the original data file. I would prefer not to do this if at all possible.
The end goal is to look at the distribution of strength values for each diameter bin, so I will be doing things like drawing histograms, calculating parameters etc. I was hoping to do something along these lines to produce the histograms:
n=length(diameter.bin)
for(i in (1:n))
{
hist(diameter.bin[i]$Strength)
}
And do something similar to this to store median values for each bin in a new vector.
Any tips are greatly appreciated, as right now I'm doing it all 1 bin at a time, and I know a loop (or something similar) would really speed up my analysis.
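You need two square brackets. Single brackets, as in diameter.bin[x], return a list of length one that still wraps the data frame, whereas diameter.bin[[x]] returns the data frame itself, so $Strength works on it. Here is a reproducible example demonstrating the issue: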
> diam <- data.frame(x=rnorm(5), y=rnorm(5))
>
> diam.l <- list(diam, diam)
> diam.l[1]$x
NULL
> diam.l[[1]]$x
[1] -0.5389441 -0.5155441 -1.2437108 -2.0044323 -0.6914124
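With double brackets the histogram loop and the per-bin medians from the question could look roughly like this. The data and the three break points are invented for the sketch, and split() with cut() is used here in place of the twenty manual subset() calls.

```r
set.seed(1)
# Hypothetical data with the same two columns as in the question
mydata <- data.frame(Diameter = runif(100, 0, 0.03),
                     Strength = rnorm(100, mean = 50, sd = 5))

# Build the list of bins in one step; list names are the interval labels
breaks <- c(0, 0.01, 0.02, 0.03)
diameter.bin <- split(mydata, cut(mydata$Diameter, breaks))

# One histogram per bin; [[i]] extracts the data frame itself
for (i in seq_along(diameter.bin)) {
  hist(diameter.bin[[i]]$Strength, main = names(diameter.bin)[i])
}

# Median strength of every bin, returned as a named vector
bin.medians <- sapply(diameter.bin, function(b) median(b$Strength))
print(bin.medians)
```

sapply() hands each data frame to the function directly, so no bracket indexing is needed there.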
