dplyr "not a promise" error - r

I have a panel dataset for which I have created lagged variables using the lag() function.
When I try to calculate the delta for each time point using the mutate() call below, I get the error message "Error: not a promise":
> kw.lags[,c("imps", "lag1_imps", "lag2_imps")]
Source: local data frame [157,737 x 3]
Groups:
imps lag1_imps lag2_imps
1 65 NA NA
2 79 65 NA
3 62 79 65
4 69 62 79
5 1 NA NA
6 2 NA NA
7 2 2 NA
8 1 2 2
9 2 1 2
10 5 NA NA
.. ... ... ...
> kw.deltas <- mutate(kw.lags,
+ d1_imps = imps - lag1_imps,
+ d2_imps = imps - lag2_imps,
+ d3_imps = imps - lag3_imps,
+ )
Error: not a promise

You have a comma after the last argument in your mutate() call, which leaves an empty final argument. Remove that comma and see whether it fixes the error.
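A minimal sketch of the point, using made-up toy data in place of kw.lags. Base R's list() is used to show what an empty trailing argument looks like, and transform() stands in for mutate() only so the example needs no packages; both substitutions are assumptions for illustration. Whether dplyr itself raises an error for a trailing comma seems to depend on the version, so treat this as a demonstration of the cause rather than a reproduction of the exact message.

```r
# Hypothetical toy data standing in for kw.lags (values taken from the printout above)
kw <- data.frame(
  imps      = c(65, 79, 62, 69),
  lag1_imps = c(NA, 65, 79, 62),
  lag2_imps = c(NA, NA, 65, 79)
)

# A trailing comma leaves an empty final argument; base R's list() refuses it
empty_arg <- tryCatch(list(1, ), error = function(e) conditionMessage(e))
print(empty_arg)

# The same deltas with no trailing comma after the last argument
# (transform() is used only to keep the sketch dependency-free)
kw.deltas <- transform(kw,
  d1_imps = imps - lag1_imps,
  d2_imps = imps - lag2_imps
)
print(kw.deltas)
```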

Related

Why is this error happening with the imputation function in R?

I am trying to do an imputation based on this example: impute example
data(airquality)
summary(airquality)
airq = airquality
ind = sample(nrow(airq), 10)
airq$Wind[ind] = NA
airq$Wind = cut(airq$Wind, c(0,8,16,24))
summary(airq)
imp = impute(airq, classes = list(integer = imputeMean(), factor = imputeMode()),
dummy.classes = "integer")
which gives me a warning:
Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
However, when I try looking at the returned dataframe, I get:
> head(imp, 10)
Error in x[..., drop = drop] : incorrect number of dimensions
> head(imp$data, 10)
NULL
and desc gives:
> imp$desc
NULL
I had initially done the above using my actual data, and was getting these errors, so I tried the above example for a sanity check.
I've tried this on Windows, both from RStudio and from the command line, with the same results on the example and on my actual data. I also tried R versions 3.6.3 and 4.0.3, again with the same results.
I've also tried this on two fresh installs on Ubuntu, with the same results.
Interestingly, when I call names(), the dummy variables are not there:
> names(imp)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
str(imp) gives:
> str(imp)
Classes ‘impute’ and 'data.frame': 153 obs. of 6 variables:
$ Ozone : num 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: num 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : Factor w/ 3 levels "(0,8]","(8,16]",..: 1 1 2 NA 2 2 2 2 3 2 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "imputed")= int [1:54] 5 NA NA NA NA NA NA NA NA NA ...
and looking at one of the columns on which imputation should have taken place:
> head(imp$Solar.R)
[1] 190 118 149 313 NA NA
(with my actual data, every NA was replaced with 0, even though it should have been replaced with the column mean)
UPDATE: I just tested this on my local machine running macOS and got the exact same error.
I figured it out. The impute() I was calling was the one from the Hmisc package. Using the one from mlr did the trick:
> imp$desc
Imputation description
Target:
Features: 6; Imputed: 6
impute.new.levels: TRUE
recode.factor.levels: TRUE
dummy.type: factor
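One way to avoid picking up the wrong impute() is to qualify the call with its package name. The sketch below rebuilds the example data and calls mlr::impute() explicitly; it assumes mlr is installed (the call is skipped otherwise), and the set.seed() line is an addition made only so the sketch is reproducible.

```r
# Rebuild the example data from the question
data(airquality)
airq <- airquality
set.seed(1)  # assumption: added only for reproducibility
ind <- sample(nrow(airq), 10)
airq$Wind[ind] <- NA
airq$Wind <- cut(airq$Wind, c(0, 8, 16, 24))

# Qualify every call with mlr:: so that an attached Hmisc cannot mask it
if (requireNamespace("mlr", quietly = TRUE)) {
  imp <- mlr::impute(airq,
    classes = list(integer = mlr::imputeMean(), factor = mlr::imputeMode()),
    dummy.classes = "integer"
  )
  print(head(imp$data, 10))
  print(imp$desc)
}
```

Running find("impute") in the session should also list which attached packages define a function of that name.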

R: Selection for x,y coordinates with conditions

I am having difficulty selecting values in a data frame. Here is the situation:
- I have a data frame containing these variables: x-coordinate, y-coordinate, diameter, G value, H value, quality value, ecological value. Each row corresponds to one individual (the individuals are trees in this exercise).
I need to find the individual with the best quality value; this I can do.
But then I have to find a second tree with a good quality value that lies within 10 meters of the reference tree (the one with the best quality value).
This selection has to be repeated for every tree selected, moving 10 meters further each time.
This should give me a selection of x-y coordinates that are separated by 10 meters and have a good quality value.
Here is what I tried:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24] # I looked up the best quality value in the sample beforehand; it is 24
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,0) # I decided not to accept a quality value of 15 or less
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,0) # the next tree selected has to lie within 11 meters of the reference coordinates
+ return(c(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[1] 0 0 0 0 0 205550 205550 0 205600 205600 0 0 0 0 0 0 0
[18] 604100 0 604150 604100 0
The problem here is that the x coordinates cannot be clearly told apart from the y coordinates.
So I tried this:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24]
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate," ")
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate," ")
+ return(list(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[[1]]
[1] " " " " " " " " " " "205550" "205550" " " "205600" "205600" " "
[[2]]
[1] " " " " " " " " " " " " "604100" " " "604150" "604100" " "
>
Here the two parts, for the x and y coordinates, are easier to tell apart.
The first question is simple: can this function return the values in a form like x,y or x y (without any 0, « », or space)? Or should I use another R function to obtain this result?
The second question is more complex: how can I tell R to repeat this function starting from the coordinates it finds in this first attempt, over the whole data set?
Thank you very much for your answer. It helps me a lot! The first part of my problem is solved, but the second seems to have a bug somewhere. I also don't clearly see where the function tells R to move 10 meters further each time (in fact 50 meters, according to my data; see below). But thank you anyway, it's a good start, and I will continue working on this problem :)
PS: I understand it is difficult without the data. Unfortunately, I cannot publish them online. However, I can show you part of them:
ID Bezeichnung x.Koordinate y.Koordinate Q N hdom V Mittelstamm Fi Ta Foe Lae ueN Bu Es Ei Ah ueL Struktur
1 10,809 62 205450 603950 8 1067 21 64 10 NA NA NA NA NA 100 NA NA NA NA NA
2 10,810 63 205450 604000 16 1333 22 128 12 NA NA NA NA NA 75 NA NA 25 NA NA
3 10,811 56 205500 604050 20 800 22 160 18 NA NA NA NA NA 60 NA NA NA 40 NA
4 10,812 55 205500 604000 12 1033 20 97 12 33 NA NA NA NA 67 NA NA NA NA NA
5 10,813 54 205500 603950 20 500 56 0 23 NA NA NA NA NA 100 NA NA NA NA NA
6 10,814 46 205550 604050 16 567 32 215 19 75 NA NA NA NA 25 NA NA NA NA NA
7 10,815 47 205550 604100 16 233 26 174 30 NA 25 NA NA NA 50 NA NA NA 25 NA
8 10,816 48 205550 604150 0 1167 16 0 0 NA NA NA NA NA NA NA NA NA NA NA
9 10,817 43 205600 604150 24 633 33 366 22 83 17 NA NA NA NA NA NA NA NA NA
10 10,818 42 205600 604100 16 1500 33 282 12 NA NA NA NA NA NA NA NA 75 25 NA
Here is the result with your answer for the second problem:
> Arbres<-kk(x.Koordinate, y.Koordinate, data=data)
> for (i in 1:length(Arbres[,1])
+ kk(Arbres(i,1),Arbres[i,2])
Error: unexpected symbol in:
"for (i in 1:length(Arbres[,1])
kk"
Sorry, I just renamed it "Arbre".
Thanks again,
C.
It's a bit hard for me to try this out without your dataset, or a small sample of it, but I think the following should work for your first question.
The first time you call the function, pass the x and y coordinates of the tree that has quality 24 as x and y:
> kk<- function(x, y)
+ {
+ coordx<-x
+ coordy<-y
+ x1 <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,NA)
+ y1 <- ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,NA)
+ return(matrix(c(x1,y1),nrow=length(x1), ncol=2, dimnames=list(NULL, c("x","y"))))
+ }
That should give you a matrix with two columns corresponding to the x and y coordinates, with an NA wherever the condition is not met.
The second question is more difficult because, as your output already showed, multiple trees meet the criteria you've set. If you want all of them checked again, you can use the output of your function in a loop. Something like this:
Tree1_friends<-kk(data$x.Koordinate[data$Q==24], data$y.Koordinate[data$Q==24])
for (i in 1:length(Tree1_friends[,1]))
print(kk(Tree1_friends[i,1],Tree1_friends[i,2]))
Note that this code only prints the results, but with a suitable assignment strategy you can probably save them as well.
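One possible way to save the results instead of printing them is to collect each call's matrix in a list. The sketch below uses the ten sample rows posted in the question as toy data; the names trees, friends and results are made up for illustration, and kk() is given an explicit data argument d as an assumption so that it does not depend on a global variable. The selection rule itself is unchanged, so this is only a starting point and not a full 10-meter search.

```r
# Toy data: x, y and Q taken from the ten sample rows shown in the question
trees <- data.frame(
  x.Koordinate = c(205450, 205450, 205500, 205500, 205500,
                   205550, 205550, 205550, 205600, 205600),
  y.Koordinate = c(603950, 604000, 604050, 604000, 603950,
                   604050, 604100, 604150, 604150, 604100),
  Q = c(8, 16, 20, 12, 20, 16, 16, 0, 24, 16)
)

# Same rule as kk() above, with the data frame passed in as d
kk <- function(x, y, d) {
  x1 <- ifelse(d$x.Koordinate > x - 11 & d$Q > 15, d$x.Koordinate, NA)
  y1 <- ifelse(d$y.Koordinate > y - 11 & d$Q > 15, d$y.Koordinate, NA)
  cbind(x = x1, y = y1)
}

# Start from the tree with the best quality value
best <- which.max(trees$Q)
friends <- kk(trees$x.Koordinate[best], trees$y.Koordinate[best], trees)

# Keep only rows where both coordinates met the condition (no NA left)
friends <- friends[complete.cases(friends), , drop = FALSE]

# Store one result matrix per selected tree instead of printing it
results <- lapply(seq_len(nrow(friends)), function(i) {
  kk(friends[i, "x"], friends[i, "y"], trees)
})
```

Each element of results can then be inspected with results[[i]], or the elements can be stacked with do.call(rbind, results).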

R: tapply(x,y,sum) returns NA instead of 0

I have a data set that contains occurrences of events over multiple years, regions, quarters, and types. Sample:
REGION Prov Year Quarter Type Hit Miss
xxx yy 2008 4 Snow 1 0
xxx yy 2009 2 Rain 0 1
I have variables defined to examine the columns of interest:
syno.h <- data$Hit
quarter.number<-data$Quarter
syno.wrng<- data$Type
I wanted to get the number of hits per type and quarter across all of the data. Since the hits are either 0 or 1, a simple sum() via tapply was my first attempt.
tapply(syno.h, list(syno.wrng, quarter.number), sum)
This returned:
1 2 3 4
ARCO NA NA NA 0
BLSN 0 NA 15 74
BLZD 4 NA 17 54
FZDZ NA NA 0 1
FZRA 26 0 143 194
RAIN 106 126 137 124
SNOW 43 2 215 381
SNSQ 0 NA 18 53
WATCHSNSQ NA NA NA 0
WATCHWSTM 0 NA NA NA
WCHL NA NA NA 1
WIND 47 38 155 167
WIND-SUETES 27 6 37 56
WIND-WRECK 34 14 44 58
WTSM 0 1 7 18
For some of the types that have no occurrences in a given quarter, tapply returns NA instead of zero. I have checked the data a number of times, and I am confident that it is clean. The values that aren't NA are also correct.
If I check the type/quarter combinations that return NA with tapply using just sum(), I get the values I expect:
> sum(syno.h[quarter.number==3&syno.wrng=="BLSN"])
[1] 15
> sum(syno.h[quarter.number==1&syno.wrng=="BLSN"])
[1] 0
> sum(syno.h[quarter.number==2&syno.wrng=="BLSN"])
[1] 0
> sum(syno.h[quarter.number==2&syno.wrng=="ARCO"])
[1] 0
It seems that my issue is with how I use tapply with sum, and not with the data itself.
Does anyone have any suggestions on what the issue may be?
Thanks in advance
I have two potential solutions for you, depending on exactly what you are looking for. If you are only interested in the number of positive hits per type and quarter and don't need a record of the combinations where no hits exist, you can get an answer with
aggregate(data[["Hit"]], by = data[c("Type","Quarter")], FUN = sum)
If it is important to keep a record of the ones where there are no hits as well, you can use
dataHit <- data[data[["Hit"]] == 1, ]
dataHit[["Type"]] <- factor(dataHit[["Type"]], levels = levels(factor(data[["Type"]])))
dataHit[["Quarter"]] <- factor(dataHit[["Quarter"]], levels = levels(factor(data[["Quarter"]])))
table(dataHit[["Type"]], dataHit[["Quarter"]])
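As background on where the NAs come from: tapply() fills any type/quarter cell that contains no rows at all with NA, whereas sum() on an empty selection returns 0, which would explain the difference seen in the question. The sketch below uses made-up toy data (the data frame d is hypothetical) and assumes R 3.4.0 or later for the default argument of tapply(); the last lines show a fallback that does not need it.

```r
# Hypothetical toy data: no RAIN rows in quarter 1, no SNOW rows in quarter 2
d <- data.frame(
  Type    = c("SNOW", "SNOW", "RAIN"),
  Quarter = c(1, 1, 2),
  Hit     = c(1, 0, 1)
)

# Empty cells come back as NA
with_na <- tapply(d$Hit, list(d$Type, d$Quarter), sum)

# Ask tapply() to fill empty cells with 0 instead (R >= 3.4.0)
with_zero <- tapply(d$Hit, list(d$Type, d$Quarter), sum, default = 0)

# Fallback for older R versions: replace the NAs afterwards
patched <- with_na
patched[is.na(patched)] <- 0
```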

Issue with NA values when removing rows from data frame in R

This is my data frame:
ID <- c('TZ1','TZ2','TZ3','TZ4')
hr <- c(56,32,38,NA)
cr <- c(1,4,5,2)
data <- data.frame(ID,hr,cr)
ID hr cr
1 TZ1 56 1
2 TZ2 32 4
3 TZ3 38 5
4 TZ4 NA 2
I want to remove the rows where data$hr = 56. This is what I want the end product to be:
ID hr cr
2 TZ2 32 4
3 TZ3 38 5
4 TZ4 NA 2
This is what I thought would work:
data = data[data$hr !=56,]
However the resulting data frame looks like this:
ID hr cr
2 TZ2 32 4
3 TZ3 38 5
NA <NA> NA NA
How can I modify my code to account for the NA value so this doesn't happen? Thank you for your help, I can't figure it out.
EDIT: I also want to keep the NA value in the data frame.
The issue is that == and != return NA when one side is NA, and indexing rows with an NA produces a row filled with NAs. One way to get a logical index containing only TRUE/FALSE values is to include is.na() in the comparison.
data[!(data$hr==56 & !is.na(data$hr)),]
# ID hr cr
#2 TZ2 32 4
#3 TZ3 38 5
#4 TZ4 NA 2
We could also apply the reverse logic:
subset(data, hr!=56|is.na(hr))
# ID hr cr
#2 TZ2 32 4
#3 TZ3 38 5
#4 TZ4 NA 2
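Another option, offered as a sketch and not as part of the original answer, is %in%, which never returns NA and so keeps the NA row without an explicit is.na() check. The variable names idx and kept are made up for illustration.

```r
# Data frame from the question
ID <- c("TZ1", "TZ2", "TZ3", "TZ4")
hr <- c(56, 32, 38, NA)
cr <- c(1, 4, 5, 2)
data <- data.frame(ID, hr, cr)

# The comparison itself yields NA for the missing value
idx <- data$hr != 56
print(idx)

# %in% returns only TRUE or FALSE, so negating it keeps the NA row
kept <- data[!(data$hr %in% 56), ]
print(kept)
```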

Subsetting rows by passing an argument to a function

I have the following data frame, which I imported into R using read.table() (I wrapped read.table() in read_data(), a function I created that also throws messages if the file name is not written appropriately):
> raw_data <- read_data("n44.txt")
[1] #### Reading txt file ####
> head(raw_data)
subject block trial_num soa target_identity prime_type target_type congruency prime_exposure target_exposure button_pressed rt ac
1 99 1 1 200 82 9 1 9 0 36 1 1253 1
2 99 1 2 102 95 2 1 2 75 36 1 1895 1
3 99 1 3 68 257 2 2 1 75 36 2 1049 1
4 99 1 4 68 62 9 1 9 0 36 1 1732 1
5 99 1 5 34 482 9 3 9 0 36 3 765 1
6 99 1 6 68 63 9 1 9 0 36 1 2027 1
Then I'm using raw_data within the early_prep() function I created (I copied only the relevant part of the function):
early_prep <- function(file_name, keep_rows = NULL, id = NULL){
if (is.null(id)) {
# Stops running the function
stop("~~~~~~~~~~~ id is missing. Please provide name of id column ~~~~~~~~~~~")
}
# Call read_data() function
raw_data <- read_data(file_name)
if (!is.null(keep_rows)) {
raw_data <- raw_data[keep_rows, ]
# Print to console
print("#### Deleting unnecesarry rows in raw_data ####", quote = FALSE)
}
print(dim(raw_data))
print(head(raw_data))
return(raw_data)
}
My problem is with raw_data <- raw_data[keep_rows, ].
When I enter keep_rows = "raw_data$block > 1" this is what I get:
> x1 <- early_prep(file_name = "n44.txt", keep_rows = "raw_data$block > 1", id = "subject")
[1] #### Reading txt file ####
[1] #### Deleting unnecesarry rows in raw_data ####
[1] 1 13
subject block trial_num soa target_identity prime_type target_type congruency prime_exposure target_exposure button_pressed rt ac
NA NA NA NA NA NA NA NA NA NA NA NA NA NA
How can I solve this so it will only delete the rows I want?
Any help will be greatly appreciated
Best,
Ayala
The problem is that you pass the condition as a string and not as a real condition, so R never evaluates it; it treats the string as a row name, finds no such row, and returns a single row of NAs.
If you still want to pass it as a string, you need to parse and eval it in the right place, for example:
cond = eval(parse(text=keep_rows))
raw_data = raw_data[cond,]
This should work, I think
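Here is a small self-contained sketch of that idea, using a made-up data frame (demo) and stripped-down hypothetical functions in place of early_prep(). The string refers to raw_data, so it has to be evaluated inside the function, where that name exists. The second function shows an alternative that avoids parsing text by passing a function; that variant is a suggestion and not part of the answer above.

```r
# Hypothetical stand-in for the imported data
demo <- data.frame(subject = 99, block = c(1, 1, 2, 2),
                   rt = c(1253, 1895, 1049, 1732))

# Indexing with the raw string looks up a row name and yields one NA row
na_row <- demo["raw_data$block > 1", ]

# Parse and evaluate the string inside the function, where raw_data is visible
keep_by_string <- function(raw_data, keep_rows = NULL) {
  if (!is.null(keep_rows)) {
    cond <- eval(parse(text = keep_rows))
    raw_data <- raw_data[cond, ]
  }
  raw_data
}
out <- keep_by_string(demo, keep_rows = "raw_data$block > 1")

# Alternative without text parsing: pass a function that builds the condition
keep_by_fun <- function(raw_data, keep_fun = NULL) {
  if (!is.null(keep_fun)) {
    raw_data <- raw_data[keep_fun(raw_data), ]
  }
  raw_data
}
out2 <- keep_by_fun(demo, keep_fun = function(d) d$block > 1)
```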
