I am using googledrive to get information about a list of files on a shared drive, and I would like to unnest(drive_resource) into columns so I can explore the data.
When I do so, I receive an error. It appears to be related to the class of the nested list I am trying to unnest into columns. Any suggestions?
library(dplyr)
library(tidyr)
library(googledrive)
df <- drive_find(team_drive = "my_team_drive")
unnest(df, drive_resource)
Error in as_tibble.dribble(output, .name_repair = "minimal") :
unused argument (.name_repair = "minimal")
It turns out there is a bug fix in the development version for the .name_repair issue.
A quick-and-dirty solution appears below; improvements are welcome.
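library(purrr)   # map_dfr()
library(tibble)  # enframe()
# Flatten each file's drive_resource into one wide row, then re-attach name and id
df2 <- df$drive_resource %>%
  map_dfr(~ unlist(.x) %>% enframe() %>% spread(name, value)) %>%
  bind_cols(select(df, name:id))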
If you only want the top level of that list object, the approach below is simpler. It is especially useful for team drives with many users, because flattening everything turns the list of permissionIds into as many columns as you have users. Afterwards, call unnest_wider() on each nested column (parents, spaces, lastModifyingUser, capabilities, permissionIds, exportLinks, imageMediaMetadata, videoMediaMetadata) that you want to inspect.
The code below is courtesy of @JennyBryan:
df2 <- df %>%
select(drive_resource) %>%
unnest_wider(drive_resource)
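To see what unnest_wider() does without access to a shared drive, here is a minimal sketch on a made-up tibble. The toy data and the field values are assumptions for illustration only; a real dribble has many more fields.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for a dribble: one list-column of per-file metadata
toy <- tibble(
  name = c("a.txt", "b.csv"),
  drive_resource = list(
    list(mimeType = "text/plain", size = "10", parents = list("p1")),
    list(mimeType = "text/csv", size = "20", parents = list("p1", "p2"))
  )
)

# Only the top level is spread into columns; `parents` remains a list-column
wide <- unnest_wider(toy, drive_resource)
names(wide)
```

Nested fields such as parents stay as list-columns until you unnest them explicitly, which keeps the number of columns manageable.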
I am working with an Arrow dataset to reduce RAM usage, but I have run into the following problem.
I need to remove duplicate rows. With dplyr I can do this using distinct(), but that function isn't supported in Arrow.
Any ideas?
Following the recommendations, I wrote the following code
Sales_2021 <- Sales_2021 %>%
group_by(`Cust-Item-Loc`) %>%
arrange(desc(SBINDT)) %>%
distinct(`Cust-Item-Loc`, .keep_all = TRUE) %>%
collect()
and got this error message:
Error: `distinct()` with `.keep_all = TRUE` not supported in Arrow
How can I keep only the first row of each group?
The suggestion to use filter(!duplicated()) does not work either.
Sales_2021 <- Sales_2021 %>%
group_by(`Cust-Item-Loc`) %>%
arrange(desc(SBINDT)) %>%
filter(!duplicated(`Cust-Item-Loc`)) %>%
collect()
Error message:
Error: Filter expression not supported for Arrow Datasets: !duplicated(`Cust-Item-Loc`)
Call collect() first to pull data into R.
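As the error message suggests, one option is to pull the data into R first and deduplicate there. The sketch below uses a small made-up tibble in place of the collected result, so the data and the qty column are assumptions. With a real Arrow dataset you would presumably narrow the columns and rows before collect() to keep memory use down.

```r
library(dplyr)

# Hypothetical stand-in for the result of `Sales_2021 %>% collect()`
sales <- tibble(
  `Cust-Item-Loc` = c("A", "A", "B"),
  SBINDT = as.Date(c("2021-01-01", "2021-03-01", "2021-02-01")),
  qty = c(1, 2, 3)
)

# Sort newest first, then keep the first row seen for each key
latest <- sales %>%
  arrange(desc(SBINDT)) %>%
  distinct(`Cust-Item-Loc`, .keep_all = TRUE)
```

This trades some of Arrow's memory savings for the full dplyr verb set, so it only helps if the collected data fits in RAM.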
I have a tibble tb to which I want to apply two tidyverse functions: select to remove two columns and drop_na to remove the NAs from a third column, like so:
tb %>%
select(-col1, -col2) %>%
drop_na(col3)
However, Autocomplete for the column names only works in the select function, not in drop_na. If I switch the functions, Autocomplete works for drop_na and select.
tb %>%
drop_na(col3) %>%
select(-col1, -col2)
It seems therefore that drop_na must be the first function for Autocomplete to work. Is that a bug or a feature?
I can't reproduce your problem; it works for me:
library(tidyverse)
data <- as_tibble(data.frame(X = rnorm(100),
Y = rnorm(100)))
data %>%
select(X) %>%
drop
Subsetting and then binding works as expected
var <- c("wt", "mpg")
mtcars %>% select(!!!var) -> df1
mtcars %>% select(!!!var) -> df2
bind_rows(df1, df2)
But if we skip intermediate steps
bind_rows(
mtcars %>% select(!!!var),
mtcars %>% select(!!!var)
)
it fails with Error: only lists can be spliced
This is a bug in rlang that has to do with value splicing. All functions taking dots support splicing, even if they are not quoting their input. This is handy because you don't have to use do.call() with these functions when you have a list of arguments, you can just splice the list.
The mechanism is a bit different for technical reasons. There's currently a bug and value-splicing instead of call-splicing is used within the select() call. This should be fixed shortly.
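To illustrate the point about splicing a list of arguments instead of using do.call(), here is a small sketch. The list dfs is made up for the example. It also shows all_of(), a tidyselect helper for selecting columns from a character vector, which may be a way to avoid !!! inside select() altogether.

```r
library(dplyr)

var <- c("wt", "mpg")

# A list of data frames to combine (hypothetical example data)
dfs <- list(head(mtcars, 2), tail(mtcars, 2))

# Splicing the list into the dots is equivalent to do.call()
spliced <- bind_rows(!!!dfs)
via_do_call <- do.call(bind_rows, dfs)

# Selecting by character vector without !!!, inside a single bind_rows() call
combined <- bind_rows(
  mtcars %>% select(all_of(var)),
  mtcars %>% select(all_of(var))
)
```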
I never use !! or !!! because there is often something that goes wrong.
Instead, I use UQ. I don't know if it's good practice, but it works.
bind_rows(
UQ(mtcars %>% select(var)),
UQ(mtcars %>% select(var))
)
I am having a hard time manipulating a tibble output that I receive after piping (using the dplyr pipe %>%) a data frame through a series of steps. The code below returns a 2 x 3 tibble output:
sr_df %>% group_by(ResolutionViolated) %>% tally() %>% arrange(desc(n)) %>% mutate(total = sum(n))
This gives me a count of service requests that are and aren't violated (or simply put, late). This is well and good, but I want to be able to manipulate this same tibble further without having to save the tibble as an object.
Why? Because this way, I can filter my data frame (sr_df) before these piping operations by company/account, priority, and other factors. I am able to filter with an if function, but that filter has no effect on a tibble object that has already been created and saved. So I am looking to do something like this:
sr_df %>% group_by(ResolutionViolated) %>% tally() %>% arrange(desc(n)) %>% mutate(total = sum(n)) %>% round(tibble[1,2]/tibble$total*100, digits = 2)
I am an R and Coding Noob. Don't hold back - I just want to learn; learn quick and learn right. Any answers are appreciated. Thank you!
I have looked at this: R: Further subset a selection using the pipe %>% and placeholder
but I don't think I get it.
In your case, you can further manipulate the tibble you have generated using dplyr functions.
Note the existence of mutate_at and summarize_at, which let you transform a set of columns and allow you to select them by column position.
Using . as a placeholder for the tibble you are currently manipulating, and calling an anonymous function inside mutate_at, will give you the result you expect.
sr_df %>%
group_by(ResolutionViolated) %>%
tally() %>%
arrange(desc(n)) %>%
mutate(total = sum(n)) %>%
mutate_at(.vars = c(1, 2),
.funs = function(column) round(column / .$total * 100, digits = 2))
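If all you need is the percentage for each row, a plain mutate() may be enough, since columns created earlier in the pipe can be referenced by name. A minimal sketch, assuming a made-up sr_df with a logical ResolutionViolated column and a new, hypothetical pct column:

```r
library(dplyr)

# Hypothetical stand-in for the real service request data
sr_df <- tibble(ResolutionViolated = c(TRUE, FALSE, FALSE, FALSE))

result <- sr_df %>%
  count(ResolutionViolated, sort = TRUE) %>%  # same as group_by + tally + arrange(desc(n))
  mutate(
    total = sum(n),
    pct = round(n / total * 100, digits = 2)  # share of each group in percent
  )
```

Any filter() on sr_df placed before count() flows through to the percentages, because nothing is saved as an intermediate object.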
I can summarise a data frame with dplyr like this:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg))
To convert the output back to class data.frame, my current approach is this:
as.data.frame(mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)))
Is there any way to get dplyr to output a class data.frame without having to use as.data.frame?
As was pointed out in the comments you might not need to convert it since it might be good enough that it inherits from data frame. If that is not good enough then this still uses as.data.frame but is slightly more elegant:
mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
ungroup %>%
as.data.frame()
ADDED I just read in the comments that the reason you want this is to avoid the truncation of printed output. In that case just define this option, possibly in your .Rprofile file:
options(dplyr.print_max = Inf)
(Note that you can still hit the maximum defined by the "max.print" option associated with print so you would need to set that one too if it's also too low for you.)
Update: Changed %.% to %>% to reflect changes in dplyr.
In addition to what G. Grothendieck mentioned above, you can convert it into a new dataframe:
new_summary <- mtcars %>%
group_by(cyl) %>%
summarise(mean(mpg)) %>%
as.data.frame()
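A quick way to check both points (the summary already inherits from data.frame, and as.data.frame() strips the tibble classes) is to inspect the classes directly. The column name mean_mpg is added here only to make the output easier to read.

```r
library(dplyr)

summary_tbl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# The tibble already counts as a data frame for most purposes
is_df <- inherits(summary_tbl, "data.frame")

# After conversion, only the base class remains
summary_df <- as.data.frame(summary_tbl)
class(summary_df)
```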