R: Selection for x,y coordinates with conditions

I am having difficulty solving a problem concerning the selection of values in a data frame. Here is the situation:
- I have a data frame containing these variables: x-coordinate, y-coordinate, diameter, G value, H value, quality value and ecological value. Each row corresponds to one individual (trees, in this exercise).
- I need to find the individual with the best quality value; this I can do.
- Then I have to find a second tree with a good quality value that lies within 10 meters of the reference tree (the one with the best quality value).
- This selection has to be repeated from every selected tree, each time 10 meters further.
This should give me a selection of x-y coordinates that are 10 meters apart and have good quality values.
Now, here is what I tried:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24] #I have looked before for the best quality value of the sample, which is 24
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,0) # I decided not to accept a quality value of 15 or less
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,0) # -11 meters from the reference coordinates: the next selected tree has to lie in between
+ return(c(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[1] 0 0 0 0 0 205550 205550 0 205600 205600 0 0 0 0 0 0 0
[18] 604100 0 604150 604100 0
The problem here is that the x and y coordinates cannot be clearly told apart in this output.
I tried this:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24]
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate," ")
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate," ")
+ return(list(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[[1]]
[1] " " " " " " " " " " "205550" "205550" " " "205600" "205600" " "
[[2]]
[1] " " " " " " " " " " " " "604100" " " "604150" "604100" " "
>
Here the x and y coordinates are easier to tell apart.
The first question is simple: is it possible for this function to return the values in a form like x,y or x y (without any 0, "" or blank entries)? Or should I use another R function to obtain this result?
The second question is harder: how can I tell R to repeat this function starting from the coordinates it finds in this first attempt, and for the whole data set?

Thank you very much for your answer; it helps me a lot! The first part of my problem is solved, but the second seems to have a bug somewhere. Also, I don't clearly see where the function tells R to go 10 meters further each time (in fact, every 50 meters according to my data; see below). Thanks anyway, it's a good start, and I will keep working on this problem :)
PS: I understand it is difficult without the data. Unfortunately, I cannot share them online, but here is a part of it:
ID Bezeichnung x.Koordinate y.Koordinate Q N hdom V Mittelstamm Fi Ta Foe Lae ueN Bu Es Ei Ah ueL Struktur
1 10,809 62 205450 603950 8 1067 21 64 10 NA NA NA NA NA 100 NA NA NA NA NA
2 10,810 63 205450 604000 16 1333 22 128 12 NA NA NA NA NA 75 NA NA 25 NA NA
3 10,811 56 205500 604050 20 800 22 160 18 NA NA NA NA NA 60 NA NA NA 40 NA
4 10,812 55 205500 604000 12 1033 20 97 12 33 NA NA NA NA 67 NA NA NA NA NA
5 10,813 54 205500 603950 20 500 56 0 23 NA NA NA NA NA 100 NA NA NA NA NA
6 10,814 46 205550 604050 16 567 32 215 19 75 NA NA NA NA 25 NA NA NA NA NA
7 10,815 47 205550 604100 16 233 26 174 30 NA 25 NA NA NA 50 NA NA NA 25 NA
8 10,816 48 205550 604150 0 1167 16 0 0 NA NA NA NA NA NA NA NA NA NA NA
9 10,817 43 205600 604150 24 633 33 366 22 83 17 NA NA NA NA NA NA NA NA NA
10 10,818 42 205600 604100 16 1500 33 282 12 NA NA NA NA NA NA NA NA 75 25 NA
Here is the result with your answer for the second problem:
> Arbres<-kk(x.Koordinate, y.Koordinate, data=data)
> for (i in 1:length(Arbres[,1])
+ kk(Arbres(i,1),Arbres[i,2])
Error: unexpected symbol in:
"for (i in 1:length(Arbres[,1])
kk"
Sorry, I just renamed it "Arbres".
Thanks again,
C.

It's a bit hard for me to try this out without your dataset, or a small example of it, but I think the following should work for your first question.
The first time you call the function, pass the x and y coordinates of the tree with quality 24 as x and y:
> kk<- function(x, y)
+ {
+ coordx<-x
+ coordy<-y
+ x1 <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,NA)
+ y1 <- ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,NA)
+ return(matrix(c(x1,y1),nrow=length(x1), ncol=2, dimnames=list(NULL, c("x","y"))))
+ }
That should give you a matrix with two columns, corresponding to the x and y coordinates, with NA wherever the condition is not met.
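For the first question (output with no 0, "" or blank entries at all), one option is to drop the incomplete rows of that matrix with complete.cases(). This is a minimal sketch: the matrix below is made-up sample output standing in for what kk() returns, not real data.

```r
# Made-up example of kk()'s output: NA where a condition was not met
res <- matrix(c(NA, 205550, 205600, 205600,   # x column
                NA, NA,     604150, 604100),  # y column
              ncol = 2, dimnames = list(NULL, c("x", "y")))

# Keep only the rows where both x and y passed their conditions
res_clean <- res[complete.cases(res), , drop = FALSE]
res_clean
```

drop = FALSE keeps the result a two-column matrix even when only one row survives.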
The second question is more difficult because as your output already showed there are multiple trees that meet the criteria you've set. If you want all of these checked again you can use the output of your function in a loop. Something like this:
Tree1_friends<-kk(data$x.Koordinate[data$Q==24], data$y.Koordinate[data$Q==24])
for (i in 1:length(Tree1_friends[,1]))
print(kk(Tree1_friends[i,1],Tree1_friends[i,2]))
Note that this code only prints the results; with a suitable assignment (for example, storing each result in a list) you could save them as well.
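The stepping the asker describes (start at the best tree, move to a good tree within a given distance, repeat from there) can also be written as a greedy walk instead of nested calls to kk(). This is only a sketch under stated assumptions: select_chain(), radius (50 m, following the asker's follow-up) and min_q (quality above 15) are hypothetical names and choices, distance is taken as straight-line Euclidean distance, and the ten rows are the coordinates and Q values from the sample data in the question.

```r
# Hypothetical helper: starting from the best tree, repeatedly move to the
# best-quality unselected tree within `radius` metres of the current one.
select_chain <- function(df, radius = 50, min_q = 15) {
  current <- which.max(df$Q)            # start at the highest quality value
  chain <- current
  repeat {
    # Euclidean distance from the current tree to every tree
    d <- sqrt((df$x.Koordinate - df$x.Koordinate[current])^2 +
              (df$y.Koordinate - df$y.Koordinate[current])^2)
    # candidates: close enough, good enough, and not selected yet
    cand <- which(d <= radius & df$Q > min_q & !(seq_len(nrow(df)) %in% chain))
    if (length(cand) == 0) break        # no eligible tree left within reach
    current <- cand[which.max(df$Q[cand])]
    chain <- c(chain, current)
  }
  df[chain, c("x.Koordinate", "y.Koordinate", "Q")]
}

# Coordinates and Q values of the ten sample rows shown in the question
trees <- data.frame(
  x.Koordinate = c(205450, 205450, 205500, 205500, 205500,
                   205550, 205550, 205550, 205600, 205600),
  y.Koordinate = c(603950, 604000, 604050, 604000, 603950,
                   604050, 604100, 604150, 604150, 604100),
  Q = c(8, 16, 20, 12, 20, 16, 16, 0, 24, 16)
)
select_chain(trees)
```

With these rows the walk visits rows 9, 10, 7, 6 and 3 and then stops, because no unselected tree with Q above 15 lies within 50 m of row 3. Being greedy, it can dead-end; it does not search for the longest possible chain.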

Related

How to declare a global variable within a for loop, and why is the else statement read as unexpected?

I have datasets that have sulfate and nitrate columns in them. Depending on what the user chooses, either the sulfate mean or the nitrate mean is returned. I have a for loop, and within it an if/else statement to sort this out. The following error arises when evaluating data.frame(datada,vec1):
"Error in data.frame(datada, vec1) : object 'datada' not found"
Also, the else statement is considered unexpected. The following error is given:
"Error: unexpected 'else' in " else"
complete <- function(directory,pollutant = "sulfate", id = 1:332) {
datada <- id
filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)
vec <- numeric()
vec1 <- numeric()
vec2 <- numeric()
for(i in datada) {
if (pollutant == "sulfate"){
data <- read.csv(filelist[i])
vec1<- c(vec1, colMeans(data$sulfate,na.rm = TRUE )
}
data.frame(datada,vec1) #datada is not "found"
else (pollutant == "nitrate"){ #else is "unexpected"
data <- read.csv(filelist[i])
vec2<- c(vec2, colMeans(data$sulfate,na.rm = TRUE )
}
data.frame(datada,vec2)
}
Here is what one dataset looks like:
Date sulfate nitrate ID
1 2001-01-01 NA NA 2
2 2001-01-02 NA NA 2
3 2001-01-03 NA NA 2
4 2001-01-04 NA NA 2
5 2001-01-05 NA NA 2
6 2001-01-06 NA NA 2
7 2001-01-07 NA NA 2
8 2001-01-08 NA NA 2
9 2001-01-09 NA NA 2
10 2001-01-10 NA NA 2
11 2001-01-11 NA NA 2
12 2001-01-12 NA NA 2
13 2001-01-13 NA NA 2
14 2001-01-14 NA NA 2
15 2001-01-15 NA NA 2
16 2001-01-16 NA NA 2
17 2001-01-17 NA NA 2
18 2001-01-18 NA NA 2
19 2001-01-19 2.30 0.699 2
20 2001-01-20 NA NA 2
21 2001-01-21 NA NA 2
22 2001-01-22 NA NA 2
23 2001-01-23 NA NA 2
24 2001-01-24 NA NA 2
25 2001-01-25 2.19 4.970 2
It's expected to return something like this:
datada vec
1 1 117
2 3 243
3 5 402
4 7 442
5 9 275
(generated by data.frame(datada,vec1))
Unless you want to manipulate environment objects, the easiest thing to do is to declare your variable outside the function and use the <<- form of assignment inside the function.
datada <- NULL
...
complete <- function(directory,pollutant = "sulfate", id = 1:332) {
datada <<- id
...
}
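A self-contained illustration of that <<- assignment: because the function is defined at the top level, <<- finds the datada that already exists in the global environment and overwrites it, instead of creating a local copy. set_ids() is a hypothetical name used only for this demo.

```r
datada <- NULL                      # declared outside the function

set_ids <- function(id) {
  datada <<- id                     # modifies the global datada, not a local copy
  invisible(NULL)
}

set_ids(c(1, 3, 5))
datada
```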
I have no idea why datada is not found - when I tried a simplified version of the function on my system it seems to work fine.
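As for the else: an else must come directly after the closing brace of the if block. It is unexpected here because you placed data.frame(datada,vec1) between them; if you move that line inside the {}, that error should go away. Note also that else does not take a condition, so the second branch has to be written as else if (pollutant == "nitrate") {.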
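A minimal sketch of the layout R accepts: each branch follows the previous closing brace directly, keeping } else { on one line (which is required at the top level of a script), and a second condition goes in else if. pick_column() is a hypothetical example function, not part of the original code.

```r
pick_column <- function(pollutant) {
  if (pollutant == "sulfate") {
    "sulfate"
  } else if (pollutant == "nitrate") {   # "else (cond)" is not valid R; use "else if"
    "nitrate"
  } else {
    stop("Unknown pollutant")
  }
}
pick_column("nitrate")
```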
But generally speaking your code is unnecessarily complex, plus it doesn't actually return anything.
Try something like this:
complete <- function(directory,pollutant = "sulfate", id = 1:332) {
filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
if (!(pollutant %in% c("sulfate","nitrate"))) stop("Unknown pollutant")
lapply(filelist[id], function(x) {
data<-read.csv(x)
mean(data[[pollutant]], na.rm=TRUE)
})
}
This will output a list where each element is the mean of the chosen pollutant for one of the selected files. You could replace lapply with sapply, which will simplify the result to a numeric vector instead of a list.
(note I couldn't test it because I don't have the dataset, so there may be some errors here)
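If the goal is the two-column data frame shown in the question (one row per id), a sketch along the same lines could build it directly and return it as the function's last value. Assumptions: pollutant_means() is a hypothetical name, each monitor is one CSV file in the directory with sulfate and nitrate columns as in the sample, and the two tiny files written to a temporary directory are made-up demo data.

```r
pollutant_means <- function(directory, pollutant = "sulfate", id = 1:332) {
  if (!(pollutant %in% c("sulfate", "nitrate"))) stop("Unknown pollutant")
  filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  # one mean per selected file; vapply guarantees a plain numeric vector
  means <- vapply(filelist[id], function(f) {
    dat <- read.csv(f)
    mean(dat[[pollutant]], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(id = id, mean = means)   # returned explicitly as the last value
}

# Made-up demo data: two small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2001-01-01", "2001-01-02"),
                     sulfate = c(2, 4), nitrate = c(1, NA), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2001-01-01", "2001-01-02"),
                     sulfate = c(NA, 5), nitrate = c(0.5, 1.5), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutant_means(demo_dir, "sulfate", id = 1:2)
```

Returning the data frame as the last expression avoids the need for a global variable altogether.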

Looping through a vector, creating a new variable which is the first vector minus 2 other vectors when none of them are NA

Assuming the following dataset:
Company Sales COGS Staff
A 100 50 25
B 200 NA 100
C NA 50 25
D 75 50 25
E 125 100 NA
I would like to create a new variable called profit which is Sales- COGS -Staff, if neither of those variables is NA. The desired output would be as follows:
Company Sales COGS Staff Profit
A 100 50 25 25
B 200 NA 100 NA
C NA 50 25 NA
D 75 50 25 0
E 125 100 NA NA
I started with something like:
# Creating the profit column (should be unnecessary right?)
df$Profit <- NA
# For each row in the sales column/vector
for(i in df$Sales){
# If all are not NA
if(!is.na(df$Sales) & !is.na(df$COGS) & !is.na(df$Staff)){
# Do calculation for profit
df$Profit <- df$Sales - (df$COGS + df$Staff)
# If calculation not possible
} else {
df$Profit <- NA
}}
Which does not give an error, but it makes R go a bit haywire. Is there a more efficient way to do this?
It is as simple as this:
df$Sales-df$COGS-df$Staff
[1] 25 NA NA 0 NA
If any of Sales, COGS or Staff is NA, the result becomes NA. Arithmetic operators propagate NA, just as sum() does with its default na.rm = FALSE.
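A quick self-contained check of that behaviour, using the example data from the question:

```r
df <- data.frame(Company = c("A", "B", "C", "D", "E"),
                 Sales   = c(100, 200, NA, 75, 125),
                 COGS    = c(50, NA, 50, 50, 100),
                 Staff   = c(25, 100, 25, 25, NA))

# Vectorised: a row's result is NA as soon as one operand is NA
df$Profit <- df$Sales - df$COGS - df$Staff

# By contrast, sum() only skips NA when asked to
sum(c(125, 100, NA))                 # NA
sum(c(125, 100, NA), na.rm = TRUE)   # 225
```

No loop and no pre-created Profit column are needed; the whole column is computed in one vectorised step.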
This seems like a job for within().
df <- within(df, Profit <- Sales - COGS - Staff)
df
# Company Sales COGS Staff Profit
#1 A 100 50 25 25
#2 B 200 NA 100 NA
#3 C NA 50 25 NA
#4 D 75 50 25 0
#5 E 125 100 NA NA
DATA.
df <- read.table(text = "
Company Sales COGS Staff
A 100 50 25
B 200 NA 100
C NA 50 25
D 75 50 25
E 125 100 NA
", header = TRUE)
We create a logical index with rowSums that is TRUE for rows with no NA in any of the selected columns; for those rows we do the subtraction and assign the result to 'Profit' (the column is created first so that the partial assignment has the right length):
i1 <- !rowSums(is.na(df1[c("Sales", "COGS", "Staff")]))
df1$Profit <- NA_real_
df1$Profit[i1] <- with(df1, (Sales-COGS-Staff)[i1])
df1
# Company Sales COGS Staff Profit
#1 A 100 50 25 25
#2 B 200 NA 100 NA
#3 C NA 50 25 NA
#4 D 75 50 25 0
#5 E 125 100 NA NA
NOTE: This is a general way to exclude rows that contain NA, so the calculation is done on only a subset of rows instead of the whole dataset.
But any value minus NA is NA anyway, so simply using
df1$Profit <- with(df1, (Sales - COGS - Staff))
should also work
Or another option if there are many columns,
rowSums(df1[-1] * c(1, -1, -1)[col(df1[-1])])

R - enter basic formula

I am new to R and struggling to understand its quirks. I'm trying to do something which should be really simple, but is turning out to be apparently very complicated.
I am used to Excel, SQL and Minitab, where you can enter a value in one column which includes references to other columns and parameters. However, R doesn't seem to be allowing me to do this.
I have a table with (currently) four columns:
Date Pallets Lt Tt
1 28/12/2011 491 NA NA
2 29/12/2011 385 NA 0.787890411
3 30/12/2011 662 NA NA
4 31/12/2011 28 NA NA
5 01/01/2012 46 NA NA
6 02/01/2012 403 NA NA
7 03/01/2012 282 NA NA
8 04/01/2012 315 NA NA
9 05/01/2012 327 NA NA
10 06/01/2012 458 NA NA
and have a parameter, beta, to which I have assigned the value 0.0002.
All I want to do is assign a formula to rows 3:10 which is:
$\mathit{Tt}_t = \beta\,(\mathit{Pallets}_t - \mathit{Pallets}_{t-1}) + (1-\beta)\,\mathit{Tt}_{t-1}$
I thought that the appropriate code might be:
Table[3:10,4]<-beta*(Table[3:10,"Pallets"]-Table[2:9,"Pallets"])+(1-beta)*Table[2:9,"Tt"]
However, this doesn't work. The first time I enter this formula, it generates:
Date Pallets Lt Tt
1 28/12/2011 491 NA NA
2 29/12/2011 385 NA 0.7878904
3 30/12/2011 662 NA 0.8431328
4 31/12/2011 28 NA NA
5 01/01/2012 46 NA NA
6 02/01/2012 403 NA NA
7 03/01/2012 282 NA NA
8 04/01/2012 315 NA NA
9 05/01/2012 327 NA NA
10 06/01/2012 458 NA NA
So it's generated the correct answer for the second item in the series, but not for any of the subsequent values.
It seems as though R doesn't automatically update each row, and the relationship to each other row, when you enter a formula, as Excel does. Having said that, Excel actually would require me to enter the formula in cell [4,Tt], and then drag this down to all of the other cells. Perhaps R is the same, and there is an equivalent to "dragging down" which I need to do?
Finally, I also noticed that when I change the value of the beta parameter, e.g. with beta<-0.5, and then print the Table values again, they are unchanged: the table hasn't updated even though I have changed the parameter.
I appreciate that these are basic questions, but I am very new to R.
In R, the computations are not made "cell by cell", but are vectorised - in your example, R takes the vectors Table[3:10,"Pallets"], Table[2:9,"Pallets"] and Table[2:9,"Tt"] as they are at the moment, computes the resulting vector, and finally assigns it to Table[3:10,4].
If you want to make some computations "cell by cell", you have to use a for loop:
beta <- 0.5
df <- data.frame(v1 = 1:12, v2 = 0)
for (i in 3:10) {
df[i, "v2"] <- beta * (df[i, "v1"] - df[i-1, "v1"]) + (1 - beta) * df[i-1, "v2"]
}
df
v1 v2
1 1 0.0000000
2 2 0.0000000
3 3 0.5000000
4 4 0.7500000
5 5 0.8750000
6 6 0.9375000
7 7 0.9687500
8 8 0.9843750
9 9 0.9921875
10 10 0.9960938
11 11 0.0000000
12 12 0.0000000
As for your second question: R never updates values on its own (think of Excel with calculation set to manual), so you need to repeat the computation after changing beta.
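To make that repetition painless, one option is to wrap the loop in a function and simply call it again whenever beta changes. compute_tt() is a hypothetical name; the Pallets values and the starting Tt of 0.787890411 in row 2 are taken from the question's table.

```r
# Hypothetical helper: recompute Tt for rows 3..n from Pallets and beta
compute_tt <- function(pallets, tt_start, beta) {
  tt <- rep(NA_real_, length(pallets))
  tt[2] <- tt_start                      # known value in row 2
  for (i in 3:length(pallets)) {
    tt[i] <- beta * (pallets[i] - pallets[i - 1]) + (1 - beta) * tt[i - 1]
  }
  tt
}

pallets <- c(491, 385, 662, 28, 46, 403, 282, 315, 327, 458)
tt_small <- compute_tt(pallets, 0.787890411, beta = 0.0002)
tt_half  <- compute_tt(pallets, 0.787890411, beta = 0.5)   # just call again after changing beta
```

With beta = 0.0002 the third value comes out at about 0.8431, matching the one row the vectorised attempt in the question got right.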
Although it's generally bad design, you can also iterate over the rows in a loop:
Table$temp <- c(0,diff(Table$Pallets,1))   # change in Pallets from the previous row
prevTt = 0
for (i in seq_len(nrow(Table)))
{
Table$Tt[i] = Table$temp[i] * beta + (1-beta)*prevTt   # use the i-th difference only
prevTt = Table$Tt[i]
}
Table$temp <- NULL
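Not taken from either answer: the recurrence is a first-order recursive filter, so base R's stats::filter() with method = "recursive" can compute it without an explicit loop. A sketch, assuming (as in the question) that only rows 3 to n are recomputed from the known row-2 value; the values are from the question's table.

```r
beta <- 0.0002
pallets <- c(491, 385, 662, 28, 46, 403, 282, 315, 327, 458)
tt_start <- 0.787890411                       # known Tt in row 2

# Input term for rows 3..n: beta * (Pallets[t] - Pallets[t-1])
x <- beta * diff(pallets)[-1]

# method = "recursive" computes y[i] = x[i] + (1 - beta) * y[i-1], with y[0] = init
tt_rest <- as.numeric(stats::filter(x, filter = 1 - beta,
                                    method = "recursive", init = tt_start))
Tt <- c(NA, tt_start, tt_rest)
```

Like the loop, this is a one-off computation: it has to be rerun after beta changes.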

R: tapply(x,y,sum) returns NA instead of 0

I have a data set that contains occurrences of events over multiple years, regions, quarters, and types. Sample:
REGION Prov Year Quarter Type Hit Miss
xxx yy 2008 4 Snow 1 0
xxx yy 2009 2 Rain 0 1
I have variables defined to examine the columns of interest:
syno.h <- data$Hit
quarter.number<-data$Quarter
syno.wrng<- data$Type
I wanted to get the number of Hits per type and quarter for all of the data. Given that the Hits are either 0 or 1, a simple sum() with tapply was my first attempt.
tapply(syno.h, list(syno.wrng, quarter.number), sum)
this returned:
1 2 3 4
ARCO NA NA NA 0
BLSN 0 NA 15 74
BLZD 4 NA 17 54
FZDZ NA NA 0 1
FZRA 26 0 143 194
RAIN 106 126 137 124
SNOW 43 2 215 381
SNSQ 0 NA 18 53
WATCHSNSQ NA NA NA 0
WATCHWSTM 0 NA NA NA
WCHL NA NA NA 1
WIND 47 38 155 167
WIND-SUETES 27 6 37 56
WIND-WRECK 34 14 44 58
WTSM 0 1 7 18
For some of the types that have no occurrences in a given quarter, tapply returns NA instead of zero. I have checked the data a number of times, and I am confident that it is clean. The values that aren't NA are also correct.
If I check the type/quarter combinations that return NA with tapply using just sum() I get values I expect:
> sum(syno.h[quarter.number==3&syno.wrng=="BLSN"])
[1] 15
> sum(syno.h[quarter.number==1&syno.wrng=="BLSN"])
[1] 0
> sum(syno.h[quarter.number==2&syno.wrng=="BLSN"])
[1] 0
> sum(syno.h[quarter.number==2&syno.wrng=="ARCO"])
[1] 0
It seems that my issue is with how I use tapply with sum, and not with the data itself.
Does anyone have any suggestions on what the issue may be?
Thanks in advance
I have two potential solutions for you, depending on exactly what you are looking for. If you are only interested in the number of positive Hits per Type and Quarter and don't need a record of combinations with no Hits, you can get an answer with
aggregate(data[["Hit"]], by = data[c("Type","Quarter")], FUN = sum)
If it is important to keep a record of the ones where there are no hits as well, you can use
dataHit <- data[data[["Hit"]] == 1, ]
dataHit[["Type"]] <- factor(dataHit[["Type"]], levels = sort(unique(data[["Type"]])))
dataHit[["Quarter"]] <- factor(dataHit[["Quarter"]], levels = sort(unique(data[["Quarter"]])))
table(dataHit[["Type"]], dataHit[["Quarter"]])
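The NA cells are most likely type/quarter combinations with no rows at all: tapply() only calls sum() on groups that exist and fills empty cells with its default argument, which is NA (the manual checks return 0 because sum() of an empty vector is 0). Since R 3.4.0, tapply() accepts default = 0, and xtabs() weighted by Hit also reports 0 for empty cells. The small data set below is made up for illustration.

```r
# Made-up example: e.g. no WIND rows in quarter 1, no RAIN rows in quarter 4
events <- data.frame(Type    = c("SNOW", "RAIN", "SNOW", "WIND"),
                     Quarter = c(4, 2, 1, 3),
                     Hit     = c(1, 0, 1, 1))

# Empty Type/Quarter cells become 0 instead of NA
hits <- tapply(events$Hit, list(events$Type, events$Quarter), sum, default = 0)

# Alternative: a contingency table weighted by Hit; empty cells are 0 as well
hits_xt <- xtabs(Hit ~ Type + Quarter, data = events)
```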

dplyr "not a promise" error

I have a panel dataset for which I have created lagged variables using the lag() function.
When I try to calculate the delta for each timepoint, using the mutate command below, I get the error message "Error: not a promise"
> kw.lags[,c("imps", "lag1_imps", "lag2_imps")]
Source: local data frame [157,737 x 3]
Groups:
imps lag1_imps lag2_imps
1 65 NA NA
2 79 65 NA
3 62 79 65
4 69 62 79
5 1 NA NA
6 2 NA NA
7 2 2 NA
8 1 2 2
9 2 1 2
10 5 NA NA
.. ... ... ...
> kw.deltas <- mutate(kw.lags,
+ d1_imps = imps - lag1_imps,
+ d2_imps = imps - lag2_imps,
+ d3_imps = imps - lag3_imps,
+ )
Error: not a promise
You have a comma after the last line in your mutate statement. Try to remove that, and see if it fixes the error.
