using grep with count_if (EXPSS package in R)

I'm trying to count instances where a certain string appears in a dataframe (this will be a sub-string, i.e. "blue" will appear within a larger block of text), and then summarize those counts by another field. Here's the code:
totals_by_county <- county_data %>%
group_by(county_data$county)%>%
summarise(number_occurences = count(grepl('blue', county_data$color,ignore.case = TRUE)))
totals_by_county
And I get this error:
no applicable method for 'summarise_' applied to an object of class "logical"
Is there a way to do this in the method I'm trying to use above? Thanks in advance!

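Refer to the columns by bare name inside group_by() and summarise() (not county_data$...), and count the matches with sum(), since grepl() returns a logical vector: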
totals_by_county <- county_data %>%
group_by(county) %>%
summarise(number_occurences = sum(grepl('blue', color, ignore.case = TRUE)))
or, with count_if from expss:
totals_by_county <- county_data %>%
group_by(county) %>%
summarise(number_occurences = count_if(perl('blue', ignore.case = TRUE), color))
UPDATE with reproducible example:
library(dplyr)
library(expss)
county_data = data.frame(
county = c("A", "A", "A", "B", "B"),
color = c("blue-blue", "red", "orange-blue", "yellow", "green"),
stringsAsFactors = FALSE)
county_data %>%
group_by(county) %>%
summarise(number_occurences = count_if(perl('blue', ignore.case = TRUE), color))
# A tibble: 2 x 2
#   county number_occurences
#   <chr>              <int>
# 1 A                      2
# 2 B                      0
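For comparison, here is a minimal base R sketch of the same count, assuming the toy data from the reproducible example. It only illustrates why sum() works here: grepl() yields one TRUE/FALSE per row, and summing a logical vector counts the TRUE values. The variable name has_blue is made up for this illustration.

```r
# Same toy data as in the reproducible example, base R only.
county_data <- data.frame(
  county = c("A", "A", "A", "B", "B"),
  color = c("blue-blue", "red", "orange-blue", "yellow", "green"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row, not one per match inside a string,
# so "blue-blue" counts once.
has_blue <- grepl("blue", county_data$color, ignore.case = TRUE)

# Summing a logical vector counts its TRUE values; tapply() does that per county.
totals_by_county <- tapply(has_blue, county_data$county, sum)
totals_by_county
```

Note that this counts rows containing the pattern, not the total number of occurrences of the pattern within each string.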

Related

Vectorization to extract and bind very nested data

I have some very nested data. Within my list-column-dataframes, there are some pieces I need to put together and I've done so in a single instance to get my desired dataframe:
a <- df[[2]][["result"]]@data
b <- df[[2]][["result"]]@coords
desired_df <- cbind(a, b)
My original Large list has 171 elements, meaning I have 1:171 (3.3 GB) to go inside those square brackets and would ideally end up with 171 desired dataframes (which I would then bind all together).
I haven't needed to write a loop in 10 years, but I don't see a tidyverse way to deal with this. I also no longer know how to write loops. There are definitely some elements in there that are junk and will fail.

You haven't provided a minimal example of the data, so I've assumed it looks something like this:
base_data <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
var2 = c( 2, 4, 8))
base_data2 = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
rownames(base_data2) = c("d", "e", "f")
methods::setClass(
"weird_object",
slots = c(data = "data.frame", coords = "matrix"),
prototype = list(data = base_data, coords = base_data2)
)
df <- list(
list(
result = new("weird_object")
),list(
result = new("weird_object")
),list(
result = new("weird_object")
),list(
result = new("weird_object")
)
)
And if I had such a list with these objects, then I could do
library(tidyverse)
df %>%
map(. %>% {
list(data = .$result@data,
coords = .$result@coords)
}) %>%
enframe() %>%
unnest_wider(value)
But the selecting / hoisting function might fail, so one can wrap it in purrr::possibly() and choose a reasonable default:
df %>%
map(possibly(. %>% {
list(data = .$result@data,
coords = .$result@coords)
},
otherwise = list(data = NA, coords = NA))) %>%
enframe() %>%
unnest_wider(value)
Hopefully, this could be a step forward.
Next step is probably something resembling this:
df %>%
map(. %>% {
list(data = .$result@data,
coords = .$result@coords)
}) %>%
enframe() %>%
unnest_wider(value) %>%
mutate(coords = coords %>% map(. %>% as_tibble(rownames = "rowid"))) %>%
unnest(cols = c(data, coords)) %>%
#' rotating the thing now
pivot_longer(cols = c(group, rowid),
names_to = "var_name",
values_to = "var") %>%
select(-var_name) %>%
pivot_longer(cols = c(var1, var2, V1, V2, V3),
names_to = "var_name") %>%
pivot_wider(names_from = var, values_from = value) %>%
identity()

If I understand your data structure, which I probably don't, you could do:
library(tidyverse)
# Create dummy data
df <- mtcars
df$mpg <- list(result = I(list('test')))
df$mpg$result <- list("@data" = I(list('your data')))
df <- df %>% select(mpg, cyl)
df1 <- df
df2 <- df
# Pull data you're interested in.
# The index is 1 here, instead of 2, because it's fake data and not your data.
# Assuming the @ is not unique, and is just parsed from JSON or some other format.
dont_at_me <- function(x){
a <- x[["result"]][["@data"]]
a
}
# Get a list of all of your data.frames
all_dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
# Vectorize
purrr::map(all_dfs, ~dont_at_me(.))
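Since the question mentions no longer knowing how to write the loop, here is a rough base R sketch of the same idea without the tidyverse. It assumes each element holds an S4 object under "result" with data and coords slots, as in the question; the class definition, make_result() and extract_one() are hypothetical stand-ins invented for this example, and the junk element is simulated with a plain string.

```r
library(methods)

# Hypothetical stand-in for the real objects: an S4 class with two slots.
setClass("weird_object", representation(data = "data.frame", coords = "matrix"))

make_result <- function() {
  new("weird_object",
      data = data.frame(group = c("a", "b"), var1 = c(3, 1)),
      coords = matrix(1:4, nrow = 2, dimnames = list(NULL, c("x", "y"))))
}

# Three good elements and one junk element that has no slots at all.
df <- list(list(result = make_result()),
           list(result = "junk"),
           list(result = make_result()),
           list(result = make_result()))

# Bind the two slots of one element; return NULL when extraction fails.
extract_one <- function(el) {
  tryCatch(
    cbind(el[["result"]]@data, el[["result"]]@coords),
    error = function(e) NULL
  )
}

pieces <- lapply(df, extract_one)
ok <- !vapply(pieces, is.null, logical(1))
failed <- which(!ok)  # indices of elements that could not be extracted

# Stack only the successful pieces into one data frame.
desired_df <- do.call(rbind, pieces[ok])
```

Keeping the failed indices makes it possible to inspect the junk elements afterwards instead of silently dropping them.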

Using ifelse() conditionals to create cell_spec() tooltips in KableExtra

I would like to use a conditional to add specific tooltips to certain cells in Kable. I have tried the following in an example below:
col1 <- c("A", "A*","B**")
col2 <- c("A**", "B", "C")
col3 <- c("A*", "B*", "C*")
Test <- data.frame(col1,col2,col3)
Test
Test %>%
mutate_at(vars("col1":"col3"), ~ cell_spec(
., "html",
tooltip = ifelse(. =="A*"|.=="B*"|.=="C*"|.=="D*", "Satisfactory to 22\u00B0C",
ifelse(. == "A**"|.=="B**"|.=="C**"|.=="D**","Satisfactory to 48\u00B0C", )))) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(full_width = FALSE,
bootstrap_options = c("striped","responsive", "hover"))
I would like to add the tooltip "Satisfactory to 22\u00B0C" to every observation with one asterisk (e.g. A*, B*, C*), and "Satisfactory to 48\u00B0C" to every observation with two asterisks (A**, B**, C**). I would also like to leave the other data alone. Currently, I can only get this to work if I include a tooltip for all of the FALSE observations at the end of the ifelse statement. I tried setting the "else" argument to NULL, but this did not work. Any help would be greatly appreciated, as I am very rusty when it comes to conditionals.

Sure, you can just use an empty string "" as the final "no" value:
library(kableExtra, include.only = NULL)
library(dplyr, include.only = "%>%")
col1 <- c("A", "A*","B**")
col2 <- c("A**", "B", "C")
col3 <- c("A*", "B*", "C*")
Test <- data.frame(col1,col2,col3)
Test
#> col1 col2 col3
#> 1 A A** A*
#> 2 A* B B*
#> 3 B** C C*
Test %>%
dplyr::mutate_at(
.vars = dplyr::vars("col1":"col3"),
.funs = ~kableExtra::cell_spec(
x = .,
format = "html",
tooltip = ifelse(test = . =="A*"|.=="B*"|.=="C*"|.=="D*",
yes = "Satisfactory to 22\u00B0C",
no = ifelse(test = . == "A**"|.=="B**"|.=="C**"|.=="D**",
yes = "Satisfactory to 48\u00B0C",
no = "")))) %>%
kableExtra::kable(format = "html", escape = FALSE) %>%
kableExtra::kable_styling(full_width = FALSE,
bootstrap_options = c("striped","responsive", "hover"))
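As a possible simplification, the tooltip text could be derived from the trailing asterisks instead of listing every value. This is only a sketch in base R; tooltip_for() is a hypothetical helper name, and it assumes the asterisks always appear at the end of the cell value.

```r
# Hypothetical helper: choose the tooltip text from the trailing asterisks.
tooltip_for <- function(x) {
  ifelse(grepl("\\*\\*$", x), "Satisfactory to 48\u00B0C",    # ends in "**"
         ifelse(grepl("\\*$", x), "Satisfactory to 22\u00B0C", # ends in one "*"
                ""))                                           # no asterisk
}

tooltip_for(c("A", "A*", "B**"))
```

The result could then be passed as the tooltip argument of cell_spec() in place of the nested equality tests.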

set names with magrittr where both name and value are variable of data.frame?

Let's say I have the following data:
> data.frame(value = 1:2, name = c("a", "b"))
value name
1 1 a
2 2 b
Goal:
Can I give it as input to the pipe operator and "send" it to setNames (or magrittr::set_names)?
What I have tried:
library(magrittr)
data.frame(value = 1:2, name = c("a", "b")) %>%
setNames(object = .$value, nm = .$name)
That doesn't work, I guess, because the pipe wants to hand over the whole data.frame and use it as the first argument. That got me interested in whether I can skip this behaviour and use two subsets instead.
(So that data.frame(value = 1:2, name = c("a", "b")) %>% is fixed and not replaced by a variable).
Desired Output:
How it would look without the pipe operator:
> a <- data.frame(value = 1:2, name = c("a", "b"))
> setNames(object = a$value, nm = a$name)
a b
1 2

For this case, we can simply wrap it inside {}, which stops the pipe from also inserting the data.frame as the first argument:
library(dplyr)
data.frame(value = 1:2, name = c("a", "b")) %>%
{ setNames(object = .$value, nm = .$name)}
With tidyverse, there is also deframe, which turns a two-column data.frame (names first, values second) into a named vector:
library(tibble)
data.frame(value = 1:2, name = c("a", "b")) %>%
select(2:1) %>%
deframe
#a b
#1 2
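Another option, sketched here in base R, is with(), which evaluates an expression using the columns of the data.frame as variables. The variable named is made up for this illustration.

```r
a <- data.frame(value = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# with() looks up `value` and `name` as columns of `a`.
named <- with(a, setNames(value, name))
named

# With magrittr attached, the same call fits a pipe, because the data.frame
# is passed as the first argument of with():
#   a %>% with(setNames(value, name))
```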

Superscripting a variable over another when building tables in R and knitr

I am trying to build a table, and one of my variables should have another variable superscripted after it. I can find several related answers here on SO, but they all involve fixed values that need to be superscripted, instead of vectors as in my case.
Also most examples involve plot legends and not tables like in my case (Although I don't think that makes much of a difference).
Example data:
library(tidyverse)
library(knitr)
df <- crossing(
X = seq(1:2),
Y = c("A", "B"))
df
# A tibble: 4 x 2
X Y
<int> <chr>
1 1 A
2 1 B
3 2 A
4 2 B
I would like to mutate a new variable that is just X with Y values superscripted after it.
Here is what I have tried (Doesn't work):
df %>% mutate(
New = paste0(X, "^Y")) %>%
kable()
df %>% mutate(
New = paste0(X, ^{Y})) %>%
kable()
df %>% mutate(
New = paste0(X, bquote(^~{.Y}~))) %>%
kable()
Any help appreciated.

You could use tableHTML:
df <- data.frame(
X = seq(1:2),
Y = c("A", "B"))
library(dplyr)
library(tableHTML)
You can slightly modify X with the HTML tag <sup> to display Y as a superscript:
df %>%
mutate(X = paste0(X, "<sup>", Y, "</sup>")) %>%
select(X) %>%
tableHTML(rownames = FALSE,
escape = FALSE,
widths = 50)
Edit
As pointed out by Steen, this also works with knitr:
df %>%
mutate(X = paste0(X, "<sup>", Y, "</sup>")) %>%
select(X) %>%
knitr::kable(escape = FALSE)

Is it for a pdf output?
Because in this case the following could work:
library(tidyverse)
library(knitr)
df <- crossing(
X = seq(1:2),
Y = c("A", "B"))
df %>% mutate(
New = paste0(X, "\\textsuperscript{", Y, "}")) %>%
kable(escape = FALSE)
Using escape = FALSE to add LaTeX inside the table.

dplyr: Build set of items in list column

I want a column that tracks which items are included in a set based on a predicate. It seems like I should be able to do this with some combination of the purrr accumulate function and the dplyr lead/lag and union/setdiff functions.
This is probably best expressed as a reprex:
input_df <- dplyr::data_frame(user = c("1", "1", "1", "1"),
item = c("a", "b", "a", "a"),
include = c(TRUE, TRUE, FALSE, TRUE))
output_df <- dplyr::data_frame(user = c("1", "1", "1", "1"),
set = list(
c("a"),
c("a", "b"),
c("b"),
c("a", "b")))
Edit: I'm very close. I need to find a way of finding the "bag difference" (instead of the set difference) between vectors in case a user includes, excludes and then re-includes an item.
numbered_input_df <- input_df %>%
mutate(id = row_number())
include_df <- numbered_input_df %>%
filter(include == TRUE) %>%
mutate(include_set = purrr::accumulate(item, c)) %>%
select(user, id, include_set)
exclude_df <- numbered_input_df %>%
filter(include == FALSE) %>%
mutate(exclude_set = purrr::accumulate(item, c)) %>%
select(user, id, exclude_set)
numbered_input_df %>%
left_join(include_df) %>%
left_join(exclude_df) %>%
fill(include_set, exclude_set) %>%
mutate(set = map2(include_set, exclude_set, ~.x[! .x %in% .y]))

Define Update which takes the union or setdiff of the basket with the ith item and use Reduce to apply it to each i. Use ave to do all that by user. No packages are used.
Update <- function(basket, i) with(input_df[i, ],
(if (include) union else setdiff)(basket, item)
)
n <- nrow(input_df)
reduce_user <- function(ix) Reduce(Update, init = NULL, ix, accumulate = TRUE)[-1]
transform(input_df["user"], set = I(ave(as.list(1:n), user, FUN = reduce_user)))
giving:
user set
1 1 a
2 1 a, b
3 1 b
4 1 b, a
Alternately, translating the above to dplyr and purrr and making use of Update from above we get the code below.
library(dplyr)
library(purrr)
input_df %>%
mutate(ix = 1:n()) %>%
group_by(user) %>%
mutate(set = accumulate(ix, Update, .init = NULL)[-1]) %>%
ungroup %>%
select(user, set)
(Note that the only use of purrr is accumulate and that could easily be replaced with Reduce if you want to reduce dependencies.)
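For reference, the same Reduce idea can be written so that it does not read input_df from the global environment. This is a sketch for a single user's rows, using the data from the question; running_set() and step() are hypothetical names.

```r
input_df <- data.frame(
  user = c("1", "1", "1", "1"),
  item = c("a", "b", "a", "a"),
  include = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Running set: add the i-th item when include[i] is TRUE, drop it otherwise.
running_set <- function(items, include) {
  step <- function(basket, i) {
    if (include[i]) union(basket, items[i]) else setdiff(basket, items[i])
  }
  # accumulate = TRUE keeps every intermediate basket; [-1] drops the empty start.
  Reduce(step, seq_along(items), accumulate = TRUE, init = character(0))[-1]
}

sets <- running_set(input_df$item, input_df$include)
```

For several users, the function could be applied to each user's rows separately, for example after split() on the user column.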
