I want to substitute a variable for a string in the %like% function from the DescTools package. My goal is to then have a loop where the variable changes value so I get different results on each iteration.
I've tried a few ways but can't get it working.
Here is my sample code:
library(DescTools)
library(dplyr)
x <- c(1,2,3,4,5,6)
y <- c("a","b","c","a","a","a")
df <- data.frame(x = x, y = y)
df
Here is what I get if I search for "a" in the y column. This is the desired output.
df %>% filter(y %like% "%a%")
# desired output
> df %>% filter(y %like% "%a%")
x y
1 1 a
2 4 a
3 5 a
4 6 a
Now I want to create a variable which will hold the value I want to search
# create a variable which will hold the value I'm looking for
let <- '"%a%"'
If I use that variable in place of the string, I get either no result or the wrong result.
Is there any way for me to use a variable instead of a string?
#not working
df %>% filter(y %like% let)
> df %>% filter(y %like% let)
[1] x y
<0 rows> (or 0-length row.names)
#not working
df %>% filter(y %like% cat(let))
> df %>% filter(y %like% cat(let))
"%a%" x y
1 1 a
2 2 b
3 3 c
4 4 a
5 5 a
6 6 a
Option 1: Evaluate the variable.
df %>% filter(y %like% eval(parse(text = let)))
Option 2: Take advantage of the filter_ function in dplyr.
df %>% filter_(paste0("y %like% ", let))
Edit: actually, the comments are better answers because they're less convoluted: the quote level was the problem.
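Following the comments, a minimal sketch of the fix: store the pattern itself, without an extra layer of quotes, and %like% accepts the variable directly.

```r
library(DescTools)
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c("a", "b", "c", "a", "a", "a"))

# store the pattern itself, not a quoted string containing quote characters
let <- "%a%"
df %>% filter(y %like% let)   # same result as y %like% "%a%"

# the variable can then change value inside a loop
for (pat in c("%a%", "%b%")) {
  print(df %>% filter(y %like% pat))
}
```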
A small sample of my dataset looks something like this:
x <- c(1,2,3,4,1,7,1)
y <- c("A","b","a","F","A",".A.","B")
data <- cbind(x,y)
My goal is to first group rows that have the same number together, and then group rows with the same name together (A, a, and .A. are considered the same name in my case).
In other words, the final output should look something like this:
xnew <- c(1,1,3,7,1,2,4)
ynew <- c("A","A","a",".A.","B","b","F")
datanew <- cbind(xnew,ynew)
Currently, I am only able to group by number in the column labelled x. I am unable to group by name yet. I would appreciate any help given.
Note: I need an automated solution as my raw dataset contains over 10,000 lines for the x and y columns.
Assuming what you have is a dataframe (data <- data.frame(x, y)) and not the matrix that cbind generates, you could combine different values into one using fct_collapse, then arrange the data by this new column (z) and the x value.
library(dplyr)
library(forcats)
data %>%
  mutate(z = fct_collapse(y,
                          "A" = c('A', '.A.', 'a'),
                          "B" = c('B', 'b'))) %>%
  arrange(z, x) %>%
  select(-z) -> result
result
# x y
#1 1 A
#2 1 A
#3 3 a
#4 7 .A.
#5 1 B
#6 2 b
#7 4 F
Or you can remove all the punctuation from the y column, convert it to upper or lower case, and then arrange.
data %>%
  mutate(z = toupper(gsub("[[:punct:]]", "", y))) %>%
  arrange(z, x) %>%
  select(-z) -> result
result
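To see what that normalisation step does on its own, here is a quick sketch applying it to the question's sample y vector:

```r
y <- c("A", "b", "a", "F", "A", ".A.", "B")

# strip punctuation (".A." becomes "A"), then upper-case everything
toupper(gsub("[[:punct:]]", "", y))
# [1] "A" "B" "A" "F" "A" "A" "B"
```

After this, A, a, and .A. all collapse to the same key, so arrange() groups them together.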
library(dplyr)
data %>%
  as.data.frame() %>%
  group_by(x, y) %>%
  summarise(records = n()) %>%
  arrange(x, y)
According to your question it's just a matter of ordering data.
result <- data[order(data$x, data$y),]
or, considering that you want to collate A, a, and .A.:
result <- data[order(data$x, toupper(gsub("[^A-Za-z]","",data$y))),]
I'm attempting to replace empty values in column z based on the values in column x.
I've used filter() to narrow down to the rows of importance and applied mutate() afterwards, but the mutated values are not replaced in the original dataframe. I can store the result as a new dataframe, but merging afterwards would be a considerable headache, as this is happening across dozens of conditionals.
make dummy data
xx <- data.frame(x = c(1,2,3), y = c("a","","c"), z=c(5,5,""))
xx %>% filter(x == 3) %>%            # filter to value of interest
  filter(z == "") %>%                # filter to empty values to be replaced
  mutate(z = replace(z, z == "", 5)) # mutate to replace the empty value
if i do:
xx <- xx %>% filter(x == 3) %>%      # filter to value of interest
  filter(z == "") %>%                # filter to empty values to be replaced
  mutate(z = replace(z, z == "", 5)) # mutate to replace the empty value
then only the single row is stored...
I'm looking for a way to keep all of the other dataframe data but replace the mutated data.
Feels like it should be a quick fix, but been stuck on it for a while..
You can use an ifelse() statement within dplyr::mutate().
df <- data.frame(x = sample(1:10, 100, T),
                 y = sample(c(NA, 1:5), 100, T))
df %>% mutate(y=ifelse(is.na(y),x,y))
x y
1 7 7
2 10 3
3 7 1
4 7 1
5 10 4
6 3 3
...
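Applied to the original xx example, the same ifelse() pattern can do the conditional replacement in one mutate(), without filtering first, so all other rows are kept (a sketch; the replacement value 5 and the x == 3 condition are taken from the question):

```r
library(dplyr)

xx <- data.frame(x = c(1, 2, 3), y = c("a", "", "c"), z = c(5, 5, ""),
                 stringsAsFactors = FALSE)  # avoid factor coercion on older R

# replace z only where x == 3 and z is empty; all other rows pass through
xx <- xx %>% mutate(z = ifelse(x == 3 & z == "", 5, z))
xx$z
# [1] "5" "5" "5"   (z stays character, since it started as character)
```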
I am having some trouble using the pipe operator (%>%) with the unique function.
df = data.frame(
a = c(1,2,3,1),
b = 'a')
unique(df$a) # no problem here
df %>% unique(.$a) # not working here
# I got "Error: argument 'incomparables != FALSE' is not used (yet)"
Any idea?
As other answers mention: df %>% unique(.$a) is equivalent to df %>% unique(., .$a).
To force the dots to be explicit you can do:
df %>% {unique(.$a)}
# [1] 1 2 3
An alternative option from magrittr
df %$% unique(a)
# [1] 1 2 3
Or possibly stating the obvious:
df$a %>% unique()
# [1] 1 2 3
What is happening is that %>% takes the object on the left hand side and feeds it into the first argument of the function by default, and then will feed in other arguments as provided. Here is an example:
df = data.frame(
a = c(1,2,3,1),
b = 'a')
MyFun <- function(x, y = FALSE) {
  return(match.call())
}
> df %>% MyFun(.$a)
MyFun(x = ., y = .$a)
As you can see, %>% is matching df to x and .$a to y.
So for unique your code is being interpreted as:
unique(x=df, incomparables=.$a)
which explains the error. For your case you need to pull out a before you run unique. If you want to stick with %>% you can use df %>% .$a %>% unique(), but obviously there are lots of other ways to do that.
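One of those other ways, for the record, uses dplyr::pull() to extract the column before piping it into unique() (a sketch on the question's data):

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3, 1),
                 b = 'a')

# pull() extracts the column as a plain vector, which unique() then deduplicates
df %>% pull(a) %>% unique()
# [1] 1 2 3
```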
I'd like to use $ at the end of a magrittr/tidyverse pipeline. $ works directly next to tidyverse functions like read_csv and filter, but as soon as I create a pipeline with %>% it raises an error. Here is a simple reproducible example.
# Load libraries and create a dummy data file
library(dplyr)
library(readr)
write_csv(data_frame(x=c(0,1), y=c(0,2)), 'tmp.csv')
# This works
y <- read_csv('tmp.csv')$y
str(y)
# This also works
df_y <- read_csv('tmp.csv')
y <- filter(df_y, y > 0)$y
str(y)
# This does not work
y <- read_csv('tmp.csv') %>% filter(y > 0)$y
My questions are:
1) What are the underlying explanations/mechanics for why using $ at the end of a pipeline does not work?
2) What's a best practice way for what I am trying to accomplish? Specifically, to get a vector as the end result of a pipeline?
It does not work because the pipe thinks the function is $, not filter, and tries to run:
"$"(., filter(y > 0), y)
which, of course, makes no sense.
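You can verify this by inspecting the parse tree (a quick sketch): $ binds tighter than %>%, so the pipe's right-hand side is the call filter(y > 0)$y, whose function is $.

```r
e <- quote(DF %>% filter(y > 0)$y)

# the right-hand side of the pipe is a call to `$`, not to filter()
as.list(e[[3]])
# [[1]]
# `$`
#
# [[2]]
# filter(y > 0)
#
# [[3]]
# y
```

Magrittr then inserts the left-hand side as the first argument of that `$` call, producing the nonsensical expression shown above.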
Suppose DF is as shown below. Then any of the subsequent lines of code work as expected:
DF <- data.frame(y = seq(-3, 3))
DF %>% filter(y > 0) %>% "$"(y)
## [1] 1 2 3
DF %>% { filter(., y > 0)$y }
## [1] 1 2 3
DF %>% filter(y > 0) %>% "[["("y")
## [1] 1 2 3
library(magrittr) # supplies extract2 as an alias for [[
DF %>% filter(y > 0) %>% extract2("y")
## [1] 1 2 3
question 1: I think the problem is grouping. Enclose most of that statement in parentheses, and it produces the same result as your first two approaches:
y <- (read_csv('tmp.csv') %>% filter(y > 0))$y
question 2: the newish function dplyr::pull() is my preference for pulling out a single vector, instead of returning an entire data.frame.
read_csv('tmp.csv') %>%
  filter(y > 0) %>%
  dplyr::pull(y)
The older way was to treat the data.frame as a list, and pull out a single element. The dot on the last line is magrittr syntax for the output of a pipe.
read_csv('tmp.csv') %>%
  filter(y > 0) %>%
  .[["y"]]
I am trying to convert the following idiom to use it in a magrittr functional sequence:
x[!is.na(x)]
x is any vector.
Update:
x %>% extract(!is.na(.))
That one is close, but the operations ! and is.na are still not used in a functional sequence. I'm looking for something like:
x %>% extract(x %>% is.na %>% `!`)
All operations should be separated.
Using dplyr you could do:
x <- c(1,NA,NA,2,NA,3)
library(dplyr)
data.frame(x) %>% filter(!is.na(.))
Which gives:
# x
#1 1
#2 2
#3 3
Or, as mentioned by Khashaa in the comments:
library(magrittr)
x %>% extract(!is.na(.))
Which gives:
#[1] 1 2 3
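If adding purrr is acceptable, its discard() helper expresses the same idea as a single step in a functional sequence (a sketch; discard() drops the elements for which the predicate returns TRUE):

```r
library(purrr)  # purrr re-exports the magrittr pipe

x <- c(1, NA, NA, 2, NA, 3)

# discard every element where is.na() is TRUE
x %>% discard(is.na)
# [1] 1 2 3
```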