I am at an intermediate level in Stata and feel comfortable working there. I usually find a way to do even the hardest tasks in Stata instead of R. However, I must present this one in R, so this time I cannot avoid it (even though Stata always feels simpler to me).
I want to translate this code:
gen new_variable = 0
replace new_variable = 1 if old_variable != old_variable[_n-1]
According to this website (https://www.matthieugomez.com/statar/manipulate-data.html), I should use the dplyr library, specifically the ifelse and Reduce functions, which I tried with the following code:
database$new_variable <- mutate(database$new_variable = ifelse(database$old_variable != Reduce(sum, database$old_variable, accumulate = TRUE), 1, database$new_variable))
However, it is not working. I know this code may be quite messy, but I'm so used to Stata.
The question is: how can I translate that code from Stata to R with the dplyr library? (A simpler approach would be great too.)
Based on the information you gave, try this:
library(dplyr)
new_var <- 0
# lag() returns NA for the first row, so new_variable is NA there
database <- database %>%
  mutate(new_variable = ifelse(old_variable != lag(old_variable), 1, new_var))
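One caveat with lag(): it returns NA for the first row, so a plain != comparison gives NA there, whereas Stata (as far as I know) treats old_variable[_n-1] in the first observation as missing and flags that row with 1. Here is a minimal sketch that mimics this, assuming old_variable itself has no missing values. The toy data frame and the helper column prev are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for `database`
database <- data.frame(old_variable = c("a", "a", "b", "b", "c"))

database <- database %>%
  mutate(
    prev = lag(old_variable),  # NA in the first row
    # flag the first row and every row that differs from the previous one
    new_variable = as.integer(is.na(prev) | old_variable != prev)
  ) %>%
  select(-prev)
```

If you would rather keep NA (or 0) in the first row, drop the is.na(prev) part or handle it separately.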
I would like to convert this data frame
data <- data.frame(color=c("red","red","red","green","green","green","blue","blue","blue"),object=c("box","chair","table","box","chair","table","box","chair","table"),units=c(1:9),price=c(11.5,12.5,13.5,14.5,15.5,16.5,17.5,18.5,19.5))
to this other one
output <- data.frame(color=c("red","green","blue"),units_box=c(1,4,7),price_box=c(11.5,14.5,17.5), units_chair=c(2,5,8),price_chair=c(12.5,15.5,18.5),units_table=c(3,6,9),price_table=c(13.5,16.5,19.5))
Therefore, I am using reshape2::melt and reshape2::dcast to build a user-defined function as the following
fun<-function(df,var,group){
r<-reshape2::melt(df,id.vars=var)
r<-reshape2::dcast(r,var~group)
return(r)
}
When I use the function as follows
fun(data,color,object)
I get the following error message
Error in melt_check(data, id.vars, measure.vars, variable.name,
value.name) : object 'color' not found
Do you know how I can solve it? I think the problem is that I should pass the variables to reshape2::melt as quoted strings, but I do not know how.
Note 1: I would like to keep the original number format of the variables (i.e. units without decimals and price with one decimal).
Note 2: My real code (this is just a simplified example) is much longer and involves dplyr functions (including enquo() and UQ()). Therefore, solutions for this case should be compatible with dplyr.
Note 3: I do not use tidyr (although I am a big fan of the whole tidyverse) because the current tidyr still uses the old language for functions, and I share the script with other people who might not be willing to use the development version of tidyr.
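We can use dcast from data.table, which accepts several value.var columns at once. Each color/object pair occurs only once here, so no aggregation function is needed:
library(data.table)
# result columns are named units_box, units_chair, ..., price_table
dcast(setDT(data), color ~ object, value.var = c("units", "price"))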
I solved the issue myself (although I do not fully understand the reasons behind it).
The main problem, as I suspected, was that passing the arguments of the user-defined function on to melt and dcast causes some kind of conflict, maybe due to the lack of quotes (?).
Anyway, I renamed the variables using dplyr::rename so that the names no longer depend on variables but are fixed strings. Here is the final code I am using:
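library(dplyr)  # provides %>%, enquo() and UQ()
fun <- function(df, var, group) {
  enquo_var <- enquo(var)
  enquo_group <- enquo(group)
  r <- df %>%
    # use the first column as the id and melt everything else
    reshape2::melt(., id.vars = 1, variable.name = "parameter") %>%
    # give the id column the fixed name "var" so the formula can refer to it
    dplyr::rename(var = UQ(enquo_var)) %>%
    reshape2::dcast(data = ., formula = var ~ parameter, value.var = "value")
  return(r)
}
funx <- fun(data, color, object)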
Although I found a solution to my particular problem, I would very much appreciate it if someone could explain the reasons behind it.
PS: I hope anyway that the new version of tidyr is ready soon to make such tasks easier. Thanks #hadley for your fantastic work.
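As for the reasons, as far as I can tell: in the original function, melt(df, id.vars = var) makes R evaluate var, which stands for the bare name color. No object called color exists, hence "object 'color' not found". The formula var ~ group is also taken literally, so dcast looks for columns actually named var and group. Below is a minimal sketch of one way around both points: capture the argument names as strings and build the formula from them. It uses base R's deparse(substitute()) and is only one possible approach, not necessarily the best fit for your longer pipeline:

```r
library(reshape2)

data <- data.frame(
  color  = rep(c("red", "green", "blue"), each = 3),
  object = rep(c("box", "chair", "table"), times = 3),
  units  = 1:9,
  price  = seq(11.5, 19.5, by = 1)
)

fun <- function(df, var, group) {
  # turn the unquoted arguments into strings, e.g. "color" and "object"
  var   <- deparse(substitute(var))
  group <- deparse(substitute(group))
  long  <- melt(df, id.vars = c(var, group))
  # build the formula from the strings, e.g. color ~ variable + object
  f <- as.formula(paste(var, "~ variable +", group))
  dcast(long, f, value.var = "value")
}

out <- fun(data, color, object)
```

Inside a dplyr-based function, rlang::as_name(enquo(var)) should give the same string as deparse(substitute(var)), so this ought to combine with enquo() as well.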
I have a data frame and I'm trying to obtain all strings in there that begin with "RLF" and put them in a list. I tried using the dlply function in plyr, but I couldn't get the syntax quite right.
dlply(.data = unformatted_table,.variables = 1:ncol(unformatted_table), .fun = strsplit("RLF") ,.inform = TRUE)
I looked around a lot but couldn't apply the solutions I found to my problem. Also, please let me know if there's a better way than using dlply.
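For what it's worth, a base R sketch without plyr might look like the following. It assumes "begin with RLF" means whole cell values starting with those letters, and the small unformatted_table below is invented purely for illustration:

```r
# Hypothetical stand-in for the real unformatted_table
unformatted_table <- data.frame(
  a = c("RLF123", "abc"),
  b = c("xyz", "RLF9"),
  stringsAsFactors = FALSE
)

# flatten every column into one character vector (column by column)
values <- unlist(lapply(unformatted_table, as.character), use.names = FALSE)

# keep the values that start with "RLF"; grepl() returns FALSE for NA
rlf_strings <- as.list(values[grepl("^RLF", values)])
```

If the strings can appear in the middle of a cell instead, the pattern and the extraction step would need to change (for example with regmatches()).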
I want to pass variables to 'summarize' using a non-standard evaluation approach (see http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions).
My script is as follows:
library(dplyr)
library(pryr)
x2<-data.frame(x=runif(1000,1,10),y=rnorm(1:1000))
y2<-group_by(x2,x)
field2<-"x"
z<-substitute(summarize(y2,check=sum(x)),list(x=as.name(field2)))
eval(quote(z),parent.frame())
But the output is not a data frame as I expected; instead, the unevaluated expression is printed:
>eval(quote(z),parent.frame())
summarize(y2, check = sum(x))
I am a little bit confused with non-standard-evaluation although I have looked through a number of examples.
Could you specify what is wrong with my approach?
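One likely culprit is the quote(): quote(z) produces the symbol z, and evaluating that symbol simply returns the value of z, which is the stored call, so the call gets printed instead of run. Evaluating z itself should run the call. A minimal sketch with a small made-up data set (the placeholder name v is hypothetical, chosen so it cannot be confused with a column):

```r
library(dplyr)

x2 <- data.frame(x = rep(1:3, each = 2), y = c(1, 2, 3, 4, 5, 6))
y2 <- group_by(x2, x)
field2 <- "y"

# build the call summarize(y2, check = sum(y)) without evaluating it
z <- substitute(summarize(y2, check = sum(v)), list(v = as.name(field2)))

# evaluate the call itself, not the symbol `z`
res <- eval(z)
```

Here res should be a data frame with one row per value of x.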
I think this is a simple syntax question, but it's messing with my brain:
data <- data.frame(y=c(1,1,0,NA,1,1),
iso3=c(rep("USA",3),rep("RUS",3)),
year=rep(1999:2001,2))
I simply want to summarize y by year:
summarized <- by(data$y,data$year,sum)
but without losing the information for 1999, as happens above. I think this could be done with sum(, na.rm = TRUE), but if I try that in the code above, sum wants an argument. How can I change the settings of sum and still pass it to by as the function to apply? I'm very grateful for any hints or how-tos!
P.S.: While I'm grateful for any solution, it would be great if you could give me one specific to the 'wrapped functions' problem above, as it's not the first time I have run into it and I would like to understand it.
Try
by(data$y,data$year,sum, na.rm=TRUE)
If we are using dplyr
library(dplyr)
data %>%
group_by(year) %>%
summarise(Sum= sum(y, na.rm=TRUE))
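On the 'wrapped functions' part: by() has a ... argument that it passes on to FUN, which is why na.rm = TRUE can simply be appended to the call. When a wrapper has no such pass-through, or you want to be explicit, an anonymous function does the same job. A short sketch using the data from the question (the names via_dots and via_wrapper are just for illustration):

```r
data <- data.frame(y = c(1, 1, 0, NA, 1, 1),
                   iso3 = c(rep("USA", 3), rep("RUS", 3)),
                   year = rep(1999:2001, 2))

# extra arguments after FUN are forwarded to sum() through `...`
via_dots <- by(data$y, data$year, sum, na.rm = TRUE)

# equivalent: wrap sum() in an anonymous function that fixes na.rm
via_wrapper <- by(data$y, data$year, function(v) sum(v, na.rm = TRUE))
```

The same pattern applies to sapply, tapply, aggregate and similar functions.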