Fixing a multiple warning "unknown column" - r

I have a persistent multiple warning of "unknown column" for all types of commands (from str(x) to installing updates on packages), and I am not sure how to debug or fix it.
The warning "unknown column" is clearly related to a variable in a tbl_df that I renamed, but the warning comes up in all kinds of commands seemingly unrelated to the tbl_df (e.g., installing updates on a package, str(x) where x is simply a character vector).

This is an issue with the Diagnostics tool in RStudio (the tool that shows warnings and possible mistakes in your code). It was partially fixed at this commit by #kevin-ushey, available in RStudio v1.1.103 or later. That fix was only partial: the warnings still appeared, albeit less frequently. The issue was later reported with a reproducible example at https://github.com/rstudio/rstudio/issues/7372 and was fixed by a pull request in RStudio v1.4.
Update to the latest RStudio release to fix this issue. Alternatively, there are several workarounds; choose the one you prefer:
1. Disable the code diagnostics for all files in Preferences/Code/Diagnostics.
2. Disable all diagnostics for a specific file. Add at the beginning of the opened file(s):
# !diagnostics off
Then save the files and the warnings should stop appearing.
3. Disable the diagnostics for the variables that cause the warning. Add at the beginning of the opened file(s):
# !diagnostics suppress=<comma-separated list of variables>
Then save the files and the warnings should stop appearing.
The warnings appear because the diagnostics tool in RStudio parses the source code to detect errors, and when it performs the diagnostic checks it accesses columns in your tibble that are not initialized, giving the warning we see. The warnings are not caused by the unrelated commands you run; they appear whenever the RStudio diagnostics are executed (when a file is saved, then modified, when you run something...).

I have been encountering the same problem, and although I don't know why it occurs, I have been able to pin down when it occurs, and thus prevent it from happening.
The issue seems to be with adding in a new column, derived from indexing, in a base R data frame vs. in a tibble data frame. Take this example, where you add a new column (age) to a base R data frame:
base_df <- data.frame(id = c(1:3), name = c("mary", "jill","steve"))
base_df$age[base_df$name == "mary"] <- 47
That works without returning a warning. But when the same is done with a tibble, it throws a warning (which, I think, is what then causes the weird, seemingly unprovoked, multiple warning issue):
library(tibble)
tibble_df <- tibble(id = c(1:3), name = c("mary", "jill","steve"))
tibble_df$age[tibble_df$name == "mary"] <- 47
Warning message:
Unknown column 'age'
There are surely better ways of avoiding this, but I have found that first creating a vector of NAs does the job:
tibble_df$age <- NA
tibble_df$age[tibble_df$name == "mary"] <- 47
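One of the "better ways" alluded to above might be to build the whole column in a single assignment, so that no uninitialised column is ever indexed. This is only a minimal sketch using base ifelse(); the NA_real_ fill value is my own choice to keep the column numeric.

```r
library(tibble)

tibble_df <- tibble(id = c(1:3), name = c("mary", "jill", "steve"))

# Create the full column at once: 47 for "mary", NA for everyone else.
# No partially indexed assignment into a column that does not exist yet.
tibble_df$age <- ifelse(tibble_df$name == "mary", 47, NA_real_)

tibble_df
```

Because the column is created in full before anything reads it, the partial-assignment step that triggers the warning never happens.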

I have faced this issue when using the "dplyr" package.
For those facing this problem after using the "group_by" function in the "dplyr" library:
I have found that ungrouping the variables solves the unknown column warning problem. Sometimes I have had to iterate through the ungrouping several times until the problem is resolved.
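As a rough illustration of the ungrouping step (the data frame and column names here are made up for the example), dplyr's is_grouped_df() can be used to check whether grouping is still attached:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(id = c(1, 1, 2, 3), value = c(10, 20, 30, 40))

grouped <- df %>% group_by(id)
was_grouped <- is_grouped_df(grouped)

# ungroup() drops the grouping metadata but keeps the rows and columns
ungrouped <- grouped %>% ungroup()
still_grouped <- is_grouped_df(ungrouped)
```

Checking with is_grouped_df() after each step may help confirm whether another ungroup() is actually needed.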

Converting the class into data.frame solved the problem for me:
library(dplyr)
df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
dfTbl <- df %>%
  group_by(id) %>%
  summarize(n = n())
class(dfTbl) # [1] "tbl_df" "tbl" "data.frame"
dfTbl <- as.data.frame(dfTbl)
class(dfTbl) # [1] "data.frame"
Borrowed the partial script from #adts

I had this problem when dealing with tibble and lapply functions together. The tibble seemed to save things as a list inside the dataframe.
I solved it by using unlist before adding the results of an lapply function to the tibble.
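A small sketch of what this might look like (the column names and the squaring function are invented for illustration): lapply() returns a list, so assigning its result directly creates a list-column, while unlist() flattens it into an ordinary vector first.

```r
library(tibble)

tbl <- tibble(x = 1:3)

# lapply() always returns a list, one element per input value
squares <- lapply(tbl$x, function(v) v^2)

# Assigning the list directly produces a list-column
tbl$sq_list <- squares

# Flattening first produces a plain numeric column
tbl$sq <- unlist(squares)

str(tbl)
```

If each result has length one, sapply() or vapply() would avoid the intermediate list altogether.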

I ran into this problem too, except through a tibble created using a dplyr block. Here's a slight modification of sabre's code to show how I came to the same error.
library(dplyr)
df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
t <- df %>%
  group_by(id) %>%
  summarize(n = n())
t
str(t)
t$newvar[t$id == 1] <- 0

I know this is an old thread, but I just encountered the same problem when loading a spatial vector in GeoPackage format with the sf package. Using as_tibble=FALSE worked for me. The file was loaded as an sp object but everything still worked fine. As mentioned by #sabre, forcing an object into a tibble seems to cause the problem when indexing a column that is no longer there.

Let's say I wanted to select the following column(s)
best.columns = 'id'
For me the following gave the warning:
df%>% select_(one_of(best.columns))
The following worked as expected, although, as far as I know dplyr, the two should be identical:
df%>% select_(.dots = best.columns)
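Note that select_() is deprecated in current dplyr. A possible modern equivalent, sketched here with a made-up data frame, is select() together with all_of():

```r
library(dplyr)

# Hypothetical data frame standing in for df
df <- data.frame(id = 1:3, name = c("mary", "jill", "steve"))

best.columns <- "id"

# all_of() tells select() that best.columns is an external character vector
selected <- df %>% select(all_of(best.columns))

names(selected)
```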

I get these warnings when I rename a column using dplyr::rename after reading it using the readr package.
The old name of the column is not renamed in the spec attribute, so removing the spec attribute makes the warnings go away. Removing the "spec_tbl_df" class also seems like a good idea.
attr(dat, "spec") <- NULL
class(dat) <- setdiff(class(dat), "spec_tbl_df")

Building on the answer by #stok ( https://stackoverflow.com/a/47848259/7733418 ), who found this problem when using group_by (which also converts your data.frame to a tibble), and solved it in the same way.
For me the problem was ultimately due to the use of slice().
slice() converted my data.frame to a tibble, causing this error.
Checking the class of your data.frame and re-converting it to a data.frame whenever a function converts it to a tibble could solve this issue.

Related

Radlibrary for Facebook Ads as_tibble function not working "Column `percentage` not found in `.data`"

I'm using the Radlibrary package in R and have used it several times. Now I want to update my data on Facebook Ads but when running the as_tibble function to convert the data I have in class paginated_adlib_data_response I'm being met with the error message: Error: Problem with `mutate()` input `percentage`. x Column `percentage` not found in `.data`
The last time I used the API and the Radlibrary package was back in May. I don't know whether the dplyr package has changed and is now causing the problem, or whether Facebook has changed its data format. The problem only arises for demographic and regional data; the 'ad' part of the data still works fine with the as_tibble function.
Does anyone know the answer to this or perhaps know another way of converting the "paginated_adlib_data_response" into a data.frame format or something similar?
My code looks like this:
query_dem <- adlib_build_query(ad_reached_countries = 'DK',
                               ad_active_status = 'ALL',
                               search_page_ids = ID_media$page_id,
                               fields = "demographic_data")
result_dem <- adlib_get_paginated(query_dem, max_gets = 100)
tibble_dem <- as_tibble(result_dem, type = "demographic") # This is where the error is produced
Best,
Mads

Why won't replace_na actually replace the missing values using dplyr and piping?

I have been struggling a lot recently with the replace_na() function when cleaning my data. I have two complementary variables and I want to use one variable (varname2) to supply the missing values for the other (varname1). I've been trying the following:
df %>%
  replace_na(varname1 = varname2)
In response I keep getting the error:
> df <- df %>%
+ replace_na(varname1 = varname2)
Error: 1 components of `...` were not used.
We detected these problematic arguments:
* `varname1`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
Suggestions for an efficient way to fix this?
I found a blog response elsewhere in which Hadley himself said they wanted to move away from replace_na() toward a more SQL-adjacent function, coalesce(). The solution involves both across() and coalesce().
Here's an example of what I just did in my work:
df %>%
mutate(across(varname1, coalesce, varname2))
It seems to have worked like a charm.
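For a single pair of columns, a simpler form without across() may also do the job. This is a sketch with invented data; coalesce() takes the first non-missing value at each position:

```r
library(dplyr)

# Hypothetical data: varname2 supplies values where varname1 is missing
df <- data.frame(varname1 = c(1, NA, 3), varname2 = c(10, 20, 30))

filled <- df %>%
  mutate(varname1 = coalesce(varname1, varname2))

filled
```

Only the missing entry in varname1 is replaced; existing values are left alone.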

Issue with summary() function in R

I am new to programming and trying to learn R using swirl.
In one of the exercises I was told to use the summary function on a dataset. However I encountered a discrepancy in the way the summary was printed:
Instead of summarizing the categorical variable values, it says something about length, class and mode.
I went around searching for why this might be happening to no avail, but I did manage to find what the output is supposed to look like:
Any help would be greatly appreciated!
This behaviour is due to the option stringsAsFactors, which is FALSE by default since R 4.0.0. Previously it was TRUE by default:
From R 4 news: "now uses a `stringsAsFactors = FALSE' default, and hence by default no longer converts strings to factors in calls to data.frame() and read.table()."
A way to return to the previous behaviour with the same code is to run options(stringsAsFactors=TRUE) before building dataframes. However, there is a warning saying this option will eventually be removed, as explained here.
For your new code, you can use the stringsAsFactors parameter, for instance data.frame(..., stringsAsFactors=TRUE).
If you already have dataframes and you want to convert them, you could use this function to convert all character variables (you will have to adapt if only some variables need conversion):
to.factors <- function(df) {
  i <- sapply(df, is.character)
  df[i] <- lapply(df[i], as.factor)
  df
}
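To see the effect on summary(), here is a short usage sketch with an invented data frame (the helper is repeated so the snippet runs on its own):

```r
# Same helper as above, repeated so this snippet is self-contained
to.factors <- function(df) {
  i <- sapply(df, is.character)
  df[i] <- lapply(df[i], as.factor)
  df
}

# Hypothetical example data with one character column
pets <- data.frame(kind = c("cat", "dog", "cat"),
                   age = c(3, 5, 2),
                   stringsAsFactors = FALSE)

before <- summary(pets$kind)   # reports Length, Class and Mode
pets_f <- to.factors(pets)
after <- summary(pets_f$kind)  # reports a count per level

before
after
```

Numeric columns such as age are left untouched, since only character columns are selected for conversion.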

tidyselect changes how to refer an external vector of variable names in selecting functions in R

I have started to receive a warning when using selecting functions within tidyverse packages.
Example:
library(dplyr)
set.seed(123)
df = data.frame(
  "id" = c(rep("G1", 3), rep("G2", 4), rep("G3", 3)),
  "total" = sample.int(n = 10),
  "C1" = sample.int(n = 10),
  "C2" = sample.int(n = 10),
  "C3" = sample.int(n = 10))
cols.to.sum = c("C1", "C2")
df.selected = df %>%
  dplyr::select(total, cols.to.sum)
Giving:
Note: Using an external vector in selections is ambiguous.
i Use `all_of(cols.to.sum)` instead of `cols.to.sum` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
It does not warn if I refactor to:
df.selected = df %>%
dplyr::select(total, all_of(cols.to.sum))
This behaviour changed between tidyselect_0.2.5 and tidyselect_1.0.0. There was no warning until now.
The documentation about this change (https://tidyselect.r-lib.org/reference/faq-external-vector.html) states that this is just a warning for now, but that it will turn into an error in the future.
My question is how to deal with such a change in existing code.
Should I refactor every single line of code that uses this selection method to add all_of() around the external vector reference? That sounds hard to accomplish when there might be hundreds of places in the code where a selection has been made this way (it also affects other functions, such as summarise_at).
Would the only alternative be to stick to tidyselect_0.2.5 to keep running code working?
What is the way to go on changes like this in a package regarding the existing code?
Thanks
If "should" is the operative word in your first question, then it might just be a matter of ensuring that none of your columns are named cols.to.sum. So long as this is the case, the benefits of using all_of() are not going to be relevant to your use case and you can keep selecting as usual.
If you don't want to stick to using an older version of tidyselect, the suppress library might be helpful
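The ambiguity that all_of() resolves can be sketched as follows. The data frame and the name vars are invented for illustration: when a column happens to share its name with the external vector, a bare reference picks the column, while all_of() always uses the vector's contents.

```r
library(dplyr)

# Hypothetical data frame that happens to contain a column called "vars"
df <- data.frame(vars = 1:3, C1 = 4:6, C2 = 7:9)

# External character vector with the same name as that column
vars <- c("C1", "C2")

# Bare name: the column in the data takes precedence over the vector
ambiguous <- df %>% select(vars)

# all_of(): unambiguously refers to the external vector
explicit <- df %>% select(all_of(vars))

names(ambiguous)
names(explicit)
```

So if none of your data frames can ever contain a column named like the vector, the two forms select the same thing, which is the point made above.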

Error in scoreItems function in psych package: Columns not found

A student of mine asked me the following question:
I was working through this exercise. Whenever I try this function from the psych package:
scoreItems(meta.bfi[,v$big5], ccases[,meta.bfi$name])
It comes up with this error:
Error: Columns `2`, `3`, `4`, `5`, `1`, `6`, `7`, `8`, `9`, `10`, `13`,
`14`, `15`, `11`, `12`, `16`, `17`, `18`, `19`,
`20`, `21`, `23`, `24`, `22`, `25` not found
What is causing it?
It seems that scoreItems is incompatible with tibbles (at least as of version 1.8.4).
If you convert the key to a pure data.frame, this should fix the problem:
meta.bfi <- as.data.frame(meta.bfi)
That said, scoreItems does expect row names, so in the future you may need to add row.names again if tibble completely removes row.names from its functionality (currently, they are deprecated).
Further background
The exercise is based on ProjectTemplate, and in version 0.8.2, ProjectTemplate began defaulting to converting data.frames to tibbles (https://tibble.tidyverse.org/). While tibbles are similar to data.frames, they are not identical, and they do cause issues with some functions that expect a pure data.frame.
There is discussion about whether this conversion to tibbles will be optional in a future release of ProjectTemplate:
https://github.com/KentonWhite/ProjectTemplate/issues/271
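One well-known difference that can trip up functions written for plain data.frames is single-column indexing. The sketch below uses invented data and is not a claim about what scoreItems does internally; it only shows the kind of mismatch that can occur.

```r
library(tibble)

# Hypothetical example data
base_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(base_df)

# A data.frame drops to a plain vector when one column is selected with [ , ]
base_col <- base_df[, "x"]

# A tibble stays a tibble in the same situation
tbl_col <- tbl[, "x"]

# Converting back restores the plain data.frame behaviour
back <- as.data.frame(tbl)
class(back)
```

Code that assumes the vector result will behave differently when handed a tibble, which is why converting with as.data.frame() is often the quickest fix.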
