summarize_at() for columns with a slash not working - r

I have a given df which has some column names that include a "/" (e.g. "Province/State" and "Country/Region").
I want to first group the df by "Country/Region" and then summarize it like this:
confirmed_by_country <- confirmed %>%
group_by("Country/Region") %>%
summarize_at(vars(-Lat, -Long, -"Province/State"), sum)
When I try to run this code it tells me that the column "Province/State" does not exist. I was warned about using this problem but still can't figure out what I am doing wrong.
I am also confused why I am only getting this error for "Province/State" and not "Country/Region" in the group_by() function.
Does anyone have an idea what the problem might be? Thanks!

Somehow it made a difference whether I imported the data with read.csv() or read_csv().
It didn't work with read.csv() even if I used backticks but it did when I used read_csv() and backticks. (The column names were also different depending on which one I used.)
If anyone knows why that is, I'm interested!
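A likely explanation, sketched below with made-up data: read.csv() runs column names through make.names() by default (check.names = TRUE), so "Province/State" becomes Province.State, while read_csv() keeps the original names. Separately, group_by("Country/Region") does not raise an error because a quoted string is treated as a constant value to group by, not as a column name, so backticks are needed there too. The data frame and the day1/day2 columns here are hypothetical stand-ins for the real data.

```r
library(dplyr)

# What read.csv() does to such names by default (check.names = TRUE)
make.names("Province/State")   # "Province.State"

# Hypothetical stand-in for the imported data, keeping the original names
confirmed <- data.frame(
  `Province/State` = c("A", "B", "C"),
  `Country/Region` = c("X", "X", "Y"),
  Lat  = c(1.0, 2.0, 3.0),
  Long = c(4.0, 5.0, 6.0),
  day1 = c(1, 2, 4),
  day2 = c(10, 20, 40),
  check.names = FALSE            # keep "/" in the column names
)

# Backticks refer to the non-syntactic column names
confirmed_by_country <- confirmed %>%
  group_by(`Country/Region`) %>%
  summarize_at(vars(-Lat, -Long, -`Province/State`), sum)

confirmed_by_country
```

With read.csv(), either pass check.names = FALSE or use the converted names (Province.State, Country.Region) instead.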

Related

Unused arguments while renaming columns from dataframe

Hi all,
I am trying to rename the columns from the data frame (protein_df). As from here, the columns 'id' and 'Intensity' are shown to be located inside the data frame. However, the error message indicates that the argument to rename columns is unused. Does anyone have an idea of how this could happen?
Thanks!
When you have that type of "unexplainable" error for dplyr functions, it is usually because there is a conflict between different libraries. So use dplyr::rename and it should be good.
It's best to post your code as something that's copy/pastable text, you can format using backticks.
That error message means that the first arguments in rename() don't exist. I'm not sure if this is your goal, but my best guess is that you have the rename arguments backwards. Judging from the first print of your dataframe head(protein_df), id and intensity are already the column names, so they need to go first in your rename():
protein_df %>%
rename(Intensity = Protien_intensity,
id = Protien_group_IDs)
You can still pipe in the rename() bit to your read_tsv and save it to that df.
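This may happen when you load dplyr together with another package that also exports a rename() function (plyr is a common one), because the function loaded later masks the other. I figured out the solution for this: adding the dplyr:: prefix will help.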
protein_df %>% dplyr::rename(Intensity = Protien_intensity, id = Protein_group_IDs)
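For reference, a minimal sketch of the dplyr::rename() argument order, which is new_name = old_name. The data frame and the new names protein_id and protein_intensity are made up for illustration and are not the asker's actual columns.

```r
library(dplyr)

# Hypothetical data frame with the two columns mentioned in the question
protein_df <- data.frame(id = 1:3, Intensity = c(10.5, 20.1, 30.7))

# dplyr::rename() takes new_name = old_name; the explicit namespace
# avoids picking up a rename() from another attached package
renamed <- protein_df %>%
  dplyr::rename(protein_id = id, protein_intensity = Intensity)

names(renamed)
```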

Novice Trying To Make Sense Of Another Person's R Code

I'm an undergrad trying to essentially recreate someone else's research, and cannot for the life of me right now make sense of this following line of code:
temp_data[, fips := paste0(sprintf("%02d", STATEFP), sprintf("%03d", COUNTYFP))]
temp_data was fread from a csv, and is a "data.table" "data.frame" which I'm reading as either or...
The error message that started all of this:
Error in paste0("%02d", STATEFP) : object 'STATEFP' not found
I've looked into both paste0 and sprintf, and am currently thinking that the line of code is trying to create STATEFP and COUNTYFP from temp_data using paste0 after sprintf interprets the fips code however it needs to...
Here is what the temp_data looks like:
screenshot
Any suggestions that can help me to figure out what's going on here would be greatly appreciated. I'm using R 4.0.1 on/with x86_64-apple-darwin17.0 if that helps any.
Thank you for the screenshot that was very helpful.
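sprintf returns a character vector built from a format string and the values passed to it; here "%02d" and "%03d" zero-pad integers to two and three digits. Inside temp_data[...], data.table looks for STATEFP and COUNTYFP first among the columns of temp_data and then among variables defined earlier in the code, and fips := ... stores the pasted result as a new column called fips. So the line does not create STATEFP and COUNTYFP; it expects them to exist already. The error means neither a column nor a variable named STATEFP was found, so check the exact column names with names(temp_data).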
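Here is a small sketch of what that line does when the columns are present. The two rows of data are invented for illustration; only the column names STATEFP and COUNTYFP and the fips line itself come from the question.

```r
library(data.table)

# Hypothetical table with integer state and county codes
temp_data <- data.table(STATEFP = c(6L, 36L), COUNTYFP = c(37L, 61L))

# Zero-pad the state code to 2 digits and the county code to 3 digits,
# glue them together, and add the result as a new column by reference
temp_data[, fips := paste0(sprintf("%02d", STATEFP), sprintf("%03d", COUNTYFP))]

temp_data$fips   # "06037" "36061"
```

If names(temp_data) shows the columns under different names (different case, for example), the same line fails with "object 'STATEFP' not found".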

How to Create Excel Pivot Table to R

I want to create a pivot table from my data set in excel to R. I have been following this tutorial on how to do this: http://excel2r.com/pivot-tables-in-r-basic-pivot-table-columns-and-metrics/ . I have used the codes mentioned in this tutorial by replacing it with my own data variables, but I keep getting an error message noting: Error: select() doesn't handle lists.
What does this error message mean and how can I fix it?
The R-Script I have been using from the tutorial is:
library(dplyr)
library(tidyr)
pivot <- df %>%
select(Product.Category, Region, Customer.Segment, Sales)%>%
group_by(Product.Category, Region, Customer.Segment) %>%
summarise(TotalSales = sum(Sales))
Thank you in advance for the help!
Judging by your error message, "select() doesn't handle lists.", I suppose that your object called df isn't a data frame.
Maybe you have a data frame inside a list.
Try this in your R console:
class(df)
If the class is list, you need to take the data frame out of the list. You can do this by position, probably the first one: df[[1]]
The functions you are using generally work only on data frames (and tibbles, which are another type of data frame).
I hope it works for you.
And next time, try to make a reproducible example.
You could at least print your original data frame before trying to use these functions; that way I could help you more efficiently.
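A minimal sketch of that situation, assuming the data frame really is wrapped in a list. The object name sales_list and all the values are invented; only the column names come from the question.

```r
library(dplyr)

# Hypothetical case: the data frame sits inside a list
sales_list <- list(
  data.frame(
    Product.Category = c("Furniture", "Furniture", "Technology"),
    Region           = c("West", "West", "East"),
    Customer.Segment = c("Consumer", "Consumer", "Corporate"),
    Sales            = c(100, 50, 200)
  )
)

class(sales_list)      # "list", so select() cannot work on it directly

# Take the data frame out of the list by position
df <- sales_list[[1]]
class(df)              # "data.frame"

pivot <- df %>%
  select(Product.Category, Region, Customer.Segment, Sales) %>%
  group_by(Product.Category, Region, Customer.Segment) %>%
  summarise(TotalSales = sum(Sales), .groups = "drop")

pivot
```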

Why the "separate" and "unite" functions don't work in dplyr

I used the functions separate and unite to clean some data, but they don't seem to work.
I've been trying to separate a column string into two columns using dplyr. The function is quite easy and I don't know why it does not work.
The variable (column) I want to separate is season which contains values of “MAD_S1, KGA_S1” etc. (thousands of records, but there are 6 categories, all separated by the “_S1”; raw data has been inspected and all follow the same syntax). Therefore, I applied
separate(six_sites_spp, season, c("code_loc","season1"), sep = "_")
I have tried more explicit script such as:
separate(six_sites_spp,
col = "season",
into = c("code_loc", "season1"),
sep = "_")
but nothing either.
I have updated the dplyr versions, and tried several things. If I use unite instead to merge two columns, it does not work either. I resolved this by using the classic paste function, but not for the splitting; I do however want to know why dplyr does not work (this is a great package and for some reason other commands are not working either).
Would anyone be able to provide feedback on this, please? Is it possibly a “bug” or something within my system (Windows 10, HP envi)? Do I need another package simultaneously (I also use tidyr in the same script)? Is there a version mismatch (my R version is 3.5.1 (2018-07-02))? When I run the code it does something internally, as I see it runs the commands, but the output is the same data frame (i.e. no new variables code_loc, season1).
Many thanks in advance.
*there are no error messages
Since you mention no error message, I assume the function works properly but you simply fail to save the output.
Usually dplyr flows like this:
library(dplyr)
library(tidyr) # separate() and unite() come from tidyr, not dplyr
six_sites_spp %>%
separate(season, c("code_loc", "season1"), sep = "_") %>%
{.} -> six_sites_spp # This saves the changed data frame under the old name
Alternatively, this works as well:
six_sites_spp <- separate(six_sites_spp, season, c("code_loc", "season1"), sep = "_")
Naturally you could also save the changed data frame under a new name to preserve the original data.
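To make the point concrete, here is a small self-contained sketch with invented data. Only the column name season and the value pattern ("MAD_S1", "KGA_S1") come from the question; the count column is hypothetical.

```r
library(tidyr)

# Hypothetical sample in the shape described in the question
six_sites_spp <- data.frame(
  season = c("MAD_S1", "KGA_S1"),
  count  = c(4, 7)
)

# Without the assignment the result is only printed and then lost;
# assigning it keeps the new columns
six_sites_spp <- separate(six_sites_spp, season,
                          into = c("code_loc", "season1"), sep = "_")
names(six_sites_spp)   # "code_loc" "season1" "count"

# unite() follows the same rule: assign the result to keep it
restored <- unite(six_sites_spp, "season", code_loc, season1, sep = "_")
restored$season        # "MAD_S1" "KGA_S1"
```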

plyr rename function not working

I can't figure out why this version of plyr's rename function isn't working.
I have a dataframe where I have a single column that ends up being named seq(var_slcut_trucknumber_min, var_slcut_trucknumber_max) because I made it like this:
df_metbal_slcut <- as.data.frame(seq(var_slcut_trucknumber_min,var_slcut_trucknumber_max))
The terms var_slcut_trucknumber_min and var_slcut_trucknumber_max are defined as the min and max of another column.
However, when trying to rename it by the following code,
var_temp <- names(df_metbal_slcut)
df_metbal_slcut <- rename(df_metbal_slcut, c(var_temp="trucknumber"))
I get an error as follows:
The following `from` values were not present in `x`: var_temp
I don't understand why. I know that I can easily do this as colnames(df_metbal_slcut)[1] <- "trucknumber", but I'm an R n00b, and I was looking at a data manipulation tutorial that said that learning plyr was the way to go, so here I am stuck on this.
Try this instead:
df_metbal_slcut <- rename(df_metbal_slcut, setNames("trucknumber",var_temp))
The reason it wasn't working was that c(var_temp = "trucknumber") creates a named vector with the name var_temp, which is not what you were intending. When creating named objects using the tag = value syntax, R won't evaluate variables. It assumes that you literally want the name to be var_temp.
More broadly, it might make sense to name the column more sensibly when initially creating the data frame again using setNames.
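The difference between the two named vectors can be checked directly. In this sketch the min and max values (1 and 5) are made up, since the original column they were derived from is not shown.

```r
# Hypothetical min and max values
var_slcut_trucknumber_min <- 1
var_slcut_trucknumber_max <- 5

df_metbal_slcut <- as.data.frame(
  seq(var_slcut_trucknumber_min, var_slcut_trucknumber_max)
)
var_temp <- names(df_metbal_slcut)

# tag = value uses the literal tag, not the contents of the variable
names(c(var_temp = "trucknumber"))          # "var_temp"

# setNames() evaluates var_temp, so the name is the actual column name
names(setNames("trucknumber", var_temp))

# plyr::rename() expects a named vector of the form c(old_name = "new_name")
renamed <- plyr::rename(df_metbal_slcut, setNames("trucknumber", var_temp))
names(renamed)                              # "trucknumber"

# Naming the column when the data frame is created avoids the rename entirely
df_direct <- data.frame(
  trucknumber = seq(var_slcut_trucknumber_min, var_slcut_trucknumber_max)
)
```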
