R error "expecting a single value": what does it mean?

I'm doing a simple operation using dplyr in R and got an 'expecting a single value' error:
test <- data.frame(a = rep("item", 3), b = c("step1", "step2", "step3"))
test %>% group_by(a) %>% summarize(seq = paste0(b))
I've seen similar threads but those use cases were more complex, and I couldn't figure out why these 2 lines don't work.

Since you only have one group ("item"), paste0 receives the three values of b as a vector and returns a vector of three strings, but summarize expects a single value per group. You need to collapse the paste0 output into a single string like this:
library(dplyr)
test <- data.frame(a=rep("item",3), b=c("step1","step2","step3"))
test %>% group_by(a) %>% summarize(seq = paste0(b, collapse = ""))
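To see the difference outside of dplyr, here is a minimal base R sketch that compares paste0 with and without collapse on the same three values. The variable names uncollapsed and collapsed are illustrative only.

```r
# The values that summarize() passes to paste0() for the single group "item"
b <- c("step1", "step2", "step3")

# Without collapse: one output string per input element
uncollapsed <- paste0(b)
print(length(uncollapsed))

# With collapse: all elements are joined into one string
collapsed <- paste0(b, collapse = "")
print(length(collapsed))
print(collapsed)
```

The first length is 3 and the second is 1, which is the shape summarize needs for one group.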

Related

In an R list, how to set sub-list names

How can I set list names? Here is the code below.
Currently, split_data contains two sub-lists, [[1]] and [[2]]. How can I set names for them separately?
I want to set the name 'A' for [[1]] and 'B' for [[2]], so that I can retrieve data using split_data['A']...
Can anyone help with this? Thanks.
For instance, with ma <- list(a=c('a1','a2'),b=c('b1','b2')) I can use ma["a"] to get the sub-list.
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
                        sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
Others have shown you in the comments how to get what you want using split() instead of group_split(). That seems like the easiest solution.
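The comments themselves are not reproduced here, so the following is only a sketch of what a split()-based version could look like, using the same test_data as in the question.

```r
test_data <- data.frame(category = c('A','B','A','B','A','B','A','B'),
                        sales = c(1,2,4,5,8,1,4,6))

# split() names each piece after the value of the splitting variable
split_data <- split(test_data, test_data$category)

print(names(split_data))
print(split_data[["A"]]$sales)
```

Note that split() returns plain data frames rather than the tibbles that group_split() produces.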
However, if you're stuck with the existing code, here's an alternative that keeps your current code, and adds the names.
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
                        sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
names(split_data) <- test_data %>% group_by(category) %>% group_keys() %>% apply(1, paste, collapse = ".")
The idea is to use group_by to split the data in the same way group_split does, then extract the keys as a tibble. This has one row per group, with each grouping variable in its own column, so I combine them by pasting the columns together with a dot as separator. The last expression in the pipe is equivalent to apply(keys, 1, f), where f is function(row) paste(row, collapse = "."). It applies f to each row of the tibble, producing a single name per group.
This should work even if the split happens on multiple variables, and produces names similar to those produced by split().
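As a sketch of the multi-variable case, the example below splits on two columns. The data frame sales_data and its region column are made up for illustration; only the group_by / group_keys / apply pattern comes from the answer.

```r
library(dplyr)

# Hypothetical data with two grouping variables
sales_data <- data.frame(
  region   = c("N", "S", "N", "S"),
  category = c("A", "A", "B", "B"),
  sales    = 1:4
)

parts <- sales_data %>% group_split(region, category)

# One row of keys per group, in the same order as group_split() returns them
keys <- sales_data %>% group_by(region, category) %>% group_keys()

# Paste the key columns of each row together with "." as separator
key_names <- apply(keys, 1, paste, collapse = ".")
names(parts) <- key_names

print(names(parts))
print(parts[["N.A"]])
```

The keys here are all character columns. With numeric keys, apply() converts the tibble to a character matrix first, so the formatting of the numbers in the names may need checking.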

How to interpret column length error from dplyr::mutate?

I'm trying to apply a function (more complex than the one used below, but I was trying to simplify) across two vectors. However, I receive the following error:
mutate_impl(.data, dots) :
Column `diff` must be length 2 (the group size) or one, not 777
I think I may be getting this error because the difference between rows results in one row less than the original dataframe, per some posts that I read. However, when I followed that advice and tried to add a vector to add 0/NA on the final line I received another error. Did I at least identify the source of the error correctly? Ideas? Thank you.
Original code:
diff_df <- DF %>%
  group_by(DF$var1, DF$var2) %>%
  mutate(diff = map2(DF$duration, lead(DF$duration), `-`)) %>%
  as.data.frame()
We don't need map2 to get the difference between 'duration' and the lead of 'duration', because subtraction is vectorized. map2 would loop over each element of 'duration' and the corresponding element of lead(duration), which is unnecessary.
DF %>%
  group_by(var1, var2) %>%
  mutate(diff = duration - lead(duration))
NOTE: Extracting the column with DF$duration after group_by breaks the grouping and returns the full column of the dataset, which is presumably why the error reports length 777 rather than the group size. Also, inside the pipe there is no need for dataset$columnname; use columnname on its own. (In certain situations, when we want the full column for some comparison, dataset$columnname can be used.)
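A small self-contained sketch of the grouped version follows. The original DF was not posted, so the data here is hypothetical and chosen only to make the group boundaries visible.

```r
library(dplyr)

# Hypothetical data: two groups of two rows each
DF <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("p", "p", "q", "q"),
  duration = c(10, 7, 5, 1)
)

diff_df <- DF %>%
  group_by(var1, var2) %>%
  # Bare column names are evaluated within each group
  mutate(diff = duration - lead(duration)) %>%
  ungroup() %>%
  as.data.frame()

print(diff_df$diff)
```

The last row of each group gets NA because lead() has no following value inside that group, so the result keeps the same number of rows as the input.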

Select and left join: Error: could not find function, Syntax issue? [duplicate]

R: How to check if multiple columns' values exist in a list

I have a dataframe with columns containing words that make up an ngram. I would like to sum up the number of stopwords in each ngram and add this column to the dataframe but I can't think of an elegant way to do it with multiple values for n (4-grams, 5-grams etc. . .).
So far I have been doing the following:
mutate(Bigram_Counts_By_Company,
       stopword_count = (word1 %in% stop_words$word) %>% as.integer() +
         (word2 %in% stop_words$word) %>% as.integer())
Now this works, but I'd much rather write a general function that does the same with all columns starting with "word".
What I'd like to do:
mutate(Web_Bigram_Counts_By_Company,
       stopword_count = select(Web_Bigram_Counts_By_Company, starts_with("word")) %in% stop_words$word)
select(Web_Bigram_Counts_By_Company, starts_with("word")) works great to select the columns whose names start with 'word', but when I use it in the call to mutate I get this error: Column 'stopword_count' must be length 360463 (the number of rows) or one, not 2
Is this just a simple R fundamentals error or am I going about this wrong?
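No answer is recorded for this question here, so what follows is only one possible approach, sketched under assumptions. The length of 2 in the error is consistent with %in% treating the selected data frame as a list of its two columns rather than testing each cell. The sketch tests each "word" column separately with across() (this assumes dplyr 1.0 or later) and adds up the results per row. The names ngrams and stop_list are hypothetical stand-ins for the original data frame and for stop_words$word.

```r
library(dplyr)

# Hypothetical stand-ins for the original data and stop word list
stop_list <- c("the", "of", "a")
ngrams <- data.frame(
  word1 = c("the", "bank", "a"),
  word2 = c("bank", "of", "the")
)

counted <- ngrams %>%
  mutate(
    # across() yields one logical column per "word" column;
    # rowSums() then counts the TRUE values in each row
    stopword_count = rowSums(across(starts_with("word"), ~ .x %in% stop_list))
  )

print(counted)
```

Because the columns are picked with starts_with("word"), the same call should cover 4-grams or 5-grams without listing word3, word4 and so on by hand.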

column name with brackets or other punctuations for dplyr group_by

I have an imported data frame that has column names with various punctuations including parentheses, e.g. BILLNG.STATUS.(COMPLETED./.INCOMPLTE) .
I was trying to use group_by from dplyr to do some summarizing, something like
df <- df %>% group_by(ORDER.NO, BILLNG.STATUS.(COMPLETED./.INCOMPLTE))
which brings the error Error in mutate_impl(.data, dots) :
could not find function "BILLNG.STATUS."
Short of changing the column names, is there a way to handle such column names directly in group_by ?
I think you can make this work if you enclose the "illegal" column names in backticks. For example, let's say I start with this data frame (called dat):
  BILLING.STATUS.(COMPLETED./.INCOMPLETE) ORDER.VALUE.(USD)
1                                       A        0.01544196
2                                       A        0.95522706
3                                       B        1.13479303
4                                       B        1.22848285
Then I can summarise it like this:
dat %>% group_by(`BILLING.STATUS.(COMPLETED./.INCOMPLETE)`) %>%
  summarise(count = n(),
            mean = mean(`ORDER.VALUE.(USD)`))
Giving:
  BILLING.STATUS.(COMPLETED./.INCOMPLETE) count      mean
1                                       A     2 0.4853345
2                                       B     2 1.1816379
Backticks also come in handy for referring to or creating variable names with whitespace. You can find a number of questions related to dplyr and backticks on SO, and there's also some discussion of backticks in the help for Quotes.
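For anyone who wants to run the example, here is a sketch that rebuilds the small data frame shown above. The check.names = FALSE argument is my addition: without it, data.frame() rewrites non-syntactic column names, so the parentheses and slash would not survive.

```r
library(dplyr)

# check.names = FALSE keeps the punctuation in the column names as written
dat <- data.frame(
  `BILLING.STATUS.(COMPLETED./.INCOMPLETE)` = c("A", "A", "B", "B"),
  `ORDER.VALUE.(USD)` = c(0.01544196, 0.95522706, 1.13479303, 1.22848285),
  check.names = FALSE
)

# Backticks let the non-syntactic names be used like ordinary column names
status_summary <- dat %>%
  group_by(`BILLING.STATUS.(COMPLETED./.INCOMPLETE)`) %>%
  summarise(count = n(),
            mean = mean(`ORDER.VALUE.(USD)`))

print(status_summary)
```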
I'm just using this not-an-answer as a counter-example or illustration of the limitations of the backtick method. (It was the first stratagem I tried. Perhaps it is the fact that two language operations ("(" and "/") are being handled adjacently that makes this fail.)
names(iris)[5] <- "Specie(/)s"
library(dplyr)
by_species <- iris %>% group_by(`Specie(/)s`)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
#Error: cannot modify grouping variable
Tried a variety of other language-oriented efforts with quote, as.name and substitute that also failed. (I wish there were a mechanism to request that this sink to the bottom of the answers.)
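summarise_each() and funs() have since been deprecated in dplyr. As a sketch only, and assuming dplyr 1.0 or later, the same summary can be written with across(), which leaves the grouping column out of the columns it summarises. The name iris_renamed is used here just to avoid overwriting the built-in iris.

```r
library(dplyr)

# Work on a copy so the built-in iris data set stays untouched
iris_renamed <- iris
names(iris_renamed)[5] <- "Specie(/)s"

by_species <- iris_renamed %>% group_by(`Specie(/)s`)

# across(everything(), ...) applies to the non-grouping columns only
species_means <- by_species %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(species_means)
```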
