Why the "separate" and "unite" functions do not work in dplyr

I used the functions separate and unite to clean some data, but they don't seem to work.
I've been trying to separate a column string into two columns using dplyr. The function is quite easy and I don't know why it does not work.
The variable (column) I want to separate is season which contains values of “MAD_S1, KGA_S1” etc. (thousands of records, but there are 6 categories, all separated by the “_S1”; raw data has been inspected and all follow the same syntax). Therefore, I applied
separate(six_sites_spp, season, c("code_loc","season1"), sep = "_")
I have also tried a more explicit call such as:
separate(six_sites_spp,
col = "season",
into = c("code_loc", "season1"),
sep = "_")
but that changes nothing either.
I have updated the dplyr versions, and tried several things. If I use unite instead to merge two columns, it does not work either. I resolved this by using the classic paste function, but not for the splitting; I do however want to know why dplyr does not work (this is a great package and for some reason other commands are not working either).
Would anyone be able to provide feedback on this, please? Is it a possible “bug” or something within my system (Windows10, HP envi)? Do I need another package simultaneously (I also use tidyr in the same script)? Is there any version mismatch (my R version is 3.5.1 (2018-07-02))? When I run the code it does something internally, as I see it runs the commands, but the output is the same data frame (i.e. no new variables code_loc, season1).
Many thanks in advance.
*There are no error messages.

Since you mention no error message, I assume the function works properly but you are simply not saving the output.
A typical dplyr pipeline looks like this:
library(dplyr)
library(tidyr) # separate() and unite() are tidyr functions
six_sites_spp %>%
  separate(season, c("code_loc", "season1"), sep = "_") %>%
  {.} -> six_sites_spp # This saves the changed data frame under the old name
Alternatively, this works as well:
six_sites_spp <- separate(six_sites_spp, season, c("code_loc", "season1"), sep = "_")
Naturally you could also save the changed data frame under a new name to preserve the original data.
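
To see the difference between calling the function and saving its result, here is a minimal sketch. It assumes tidyr is installed and uses a two-row stand-in for six_sites_spp; the sample values are made up for illustration.

```r
library(tidyr)  # separate() and unite() are exported by tidyr, not dplyr

# Hypothetical stand-in for the real data frame
six_sites_spp <- data.frame(season = c("MAD_S1", "KGA_S1"),
                            stringsAsFactors = FALSE)

# Calling separate() without assigning: the result is printed, then discarded
separate(six_sites_spp, season, c("code_loc", "season1"), sep = "_")
names_before <- names(six_sites_spp)  # the original data frame is unchanged

# Assigning the result keeps the new columns
six_sites_spp <- separate(six_sites_spp, season,
                          c("code_loc", "season1"), sep = "_")
names_after <- names(six_sites_spp)
```

The same applies to unite(): like most tidyverse verbs it returns a modified copy and never changes its input in place.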

Related

summarize_at() for columns with a slash in the name not working

I have a given df which has some column names that include a "/" (e.g. "Province/State" and "Country/Region").
I want to first group the df by "Country/Region" and then summarize it like this:
confirmed_by_country <- confirmed %>%
group_by("Country/Region") %>%
summarize_at(vars(-Lat, -Long, -"Province/State"), sum)
When I try to run this code it tells me that the column "Province/State" does not exist. I was warned that column names like these can cause problems, but I still can't figure out what I am doing wrong.
I am also confused why I am only getting this error for "Province/State" and not for "Country/Region" in the group_by() function.
Does anyone have an idea what the problem might be? Thanks!

Somehow it made a difference whether I imported the data with read.csv() or read_csv().
It didn't work with read.csv() even if I used backticks but it did when I used read_csv() and backticks. (The column names were also different depending on which one I used.)
If anyone knows why that is, I'm interested!
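
A likely explanation, sketched with base R only: read.csv() defaults to check.names = TRUE, which passes the header through make.names() and turns "/" into ".", whereas readr's read_csv() keeps the header as written. The one-row CSV below is made up for illustration.

```r
# Hypothetical one-row sample with the same header as in the question
csv_text <- "Province/State,Country/Region,Lat,Long\nHubei,China,30.9,112.2\n"

# Default behaviour: names are made syntactic, so "/" becomes "."
names_default <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as it appears in the file
confirmed <- read.csv(text = csv_text, check.names = FALSE)
names_kept <- names(confirmed)

# Non-syntactic names such as Country/Region must be wrapped in backticks
country <- confirmed$`Country/Region`
```

This would also explain the missing error in group_by(): a quoted string there is most likely treated as a constant value rather than a column reference, so nothing is looked up and nothing fails.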

Is there a way to delete existing variables in my R environment via the terminal? [duplicate]

I would like to remove some data from the workspace. I know the "Clear All" button will remove all data. However, I would like to remove just certain data.
For example, I have these data frames in the data section:
data
data_1
data_2
data_3
I would like to remove data_1, data_2 and data_3, while keeping data.
I tried data_1 <- data_2 <- data_3 <- NULL, which does remove the data (I think), but still keeps it in the workspace area, so it is not fully what I would like to do.

You'll find the answer by typing ?rm
rm(data_1, data_2, data_3)
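
A small sketch of why assigning NULL is not enough: the name stays bound (to NULL) and so still appears in the workspace, while rm() removes the binding itself. The values are placeholders.

```r
data   <- 0
data_1 <- 1
data_2 <- 2

# Assigning NULL empties the object but keeps its name in the workspace
data_1 <- NULL
listed_after_null <- "data_1" %in% ls()

# rm() removes the names themselves
rm(data_1, data_2)
listed_after_rm <- c("data_1", "data_2") %in% ls()
data_kept <- "data" %in% ls()
```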

A useful way to remove a whole set of similarly named objects:
rm(list = ls()[grep("^tmp", ls())])
thereby removing all objects whose name begins with the string "tmp".
Edit: Following Gsee's comment, making use of the pattern argument:
rm(list = ls(pattern = "^tmp"))
Edit: Answering Rafael's comment, one way to retain only a subset of objects is to name the data you want to retain with a specific pattern. For example, if you wanted to remove all objects whose names do not start with paper, you would issue the following command:
rm(list = grep("^paper", ls(), value = TRUE, invert = TRUE))

The following command removes every object in the workspace, including hidden ones whose names start with a dot:
rm(list=ls(all=TRUE))

In RStudio, ensure the Environment tab is in Grid (not List) mode.
Tick the object(s) you want to remove from the environment.
Click the broom icon.

You can use the apropos() function, which finds objects by partial name.
rm(list = apropos("data_"))

Use the following command
remove(list=c("data_1", "data_2", "data_3"))

If you want to keep just one of a group of variables, you can build the list of names to delete with setdiff() and pass it to rm(). Here everything apart from "data" is removed:
0->data
1->data_1
2->data_2
3->data_3
#check variables in workspace
ls()
rm(list=setdiff(ls(), "data"))
#check remaining variables in workspace after deletion
ls()
#note: if you just use rm(list) then R will attempt to remove the "list" variable.
list=setdiff(ls(), "data")
rm(list)
ls()

paste0("data_", seq(1, 3, 1))
# builds the object names data_1, data_2, data_3
rm(list = paste0("data_", seq(1, 3, 1)))
# the line above removes data_1 through data_3

If you're using RStudio, please consider never using the rm(list = ls()) approach!* Instead, you should build your workflow around frequently employing the Ctrl+Shift+F10 shortcut to restart your R session. This is the fastest way to both nuke the current set of user-defined variables AND to clear loaded packages, devices, etc. The reproducibility of your work will increase markedly by adopting this habit.
See this excellent thread on RStudio Community (h/t @kierisi) for a more thorough discussion (the main gist is captured by what I've stated already).
I must admit my own first few years of R coding featured script after script starting with the rm "trick" -- I'm writing this answer as advice to anyone else who may be starting out their R careers.
*of course there are legitimate uses for this -- much like attach -- but beginning users will be much better served (IMO) crossing that bridge at a later date.

To clear all data:
Click on Misc > Remove all objects.
You're good to go.
To clear the console:
Click on Edit > Clear console.
No need for any code.

Adding one more way, using ls() and remove()
ls() returns a vector of character strings giving the names of the objects in the specified environment.
Create a list of objects you want to remove from the environment using ls() and then use remove() to remove it.
remove(list = ls()[ls() != "data"])

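You can also use the tidyverse (str_subset() comes from stringr and %>% from magrittr, so load them first, e.g. with library(tidyverse)):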
# to remove specific objects(s)
rm(list = ls() %>% str_subset("xxx"))
# or to keep specific object(s)
rm(list = setdiff(ls(), ls() %>% str_subset("xxx")))

Maybe this can help as well
remove(list = c(ls()[!ls() %in% c("what", "to", "keep", "here")] ) )

Unused arguments while renaming columns from dataframe

Hi all,
I am trying to rename the columns from the data frame (protein_df). As from here, the columns 'id' and 'Intensity' are shown to be located inside the data frame. However, the error message indicates that the argument to rename columns is unused. Does anyone have an idea of how this could happen?
Thanks!

When you have that type of "unexplainable" error for dplyr functions, it is usually because there is a conflict between different libraries. So use dplyr::rename and it should be good.
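
One way to check for such a conflict, as a rough sketch: it assumes dplyr is installed, and the two-column data frame and its column names are hypothetical stand-ins for the asker's protein_df.

```r
library(dplyr)

# Which attached packages define rename()? The first one listed is the one
# a bare rename() call will use.
find("rename")

# Namespace of the rename() that a bare call would resolve to right now
rename_source <- environmentName(environment(rename))

# Hypothetical stand-in data frame; the real column names may differ
protein_df <- data.frame(Protein_intensity = c(1.5, 2.5),
                         Protein_group_IDs = c(10, 11))

# Calling through the namespace sidesteps any masking (new_name = old_name)
renamed <- dplyr::rename(protein_df,
                         Intensity = Protein_intensity,
                         id = Protein_group_IDs)
```

If rename_source reports a package other than dplyr (plyr is a common one), that package was attached later and is masking dplyr's version.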

It's best to post your code as something that's copy/pastable text, you can format using backticks.
That error message means that the first arguments in rename() don't exist. I'm not sure if this is your goal, but my best guess is that you have the rename arguments backwards. Judging from the first print of your dataframe head(protien_df), id and intensity are already the column names, so they need to go first in your rename():
protein_df %>%
rename(Intensity = Protien_intensity,
id = Protien_group_IDs)
You can still pipe in the rename() bit to your read_tsv and save it to that df.

This can happen when dplyr is loaded together with another package that defines its own rename(). Prefixing the call with dplyr:: will help.
protein_df %>% dplyr::rename(Intensity = Protien_intensity, id = Protein_group_IDs)

How to Create Excel Pivot Table to R

I want to recreate in R a pivot table that I built from my data set in Excel. I have been following this tutorial on how to do this: http://excel2r.com/pivot-tables-in-r-basic-pivot-table-columns-and-metrics/ . I used the code from the tutorial, replacing its variables with my own, but I keep getting the error message: Error: select() doesn't handle lists.
What does this error message mean and how can I fix it?
The R-Script I have been using from the tutorial is:
library(dplyr)
library(tidyr)
pivot <- df %>%
select(Product.Category, Region, Customer.Segment, Sales)%>%
group_by(Product.Category, Region, Customer.Segment) %>%
summarise(TotalSales = sum(Sales))
Thank you in advance for the help!

Judging by your error message, "select() doesn't handle lists.", I suppose that your object called df isn't a data frame.
Maybe you have a data frame inside a list.
Try this in your R console:
class(df)
If the class is list, you need to extract the data frame from the list. You can do this by position, probably the first one: df[[1]]
The functions you are using generally work only on data frames (and tibbles, which are another type of data frame).
Like this example:
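
A minimal sketch of that situation, using a made-up list that wraps a small data frame; the column names follow the question, the values are invented.

```r
# Hypothetical: a data frame that ended up wrapped inside a list
df <- list(data.frame(Region = c("East", "West"),
                      Sales  = c(10, 20)))

class_before <- class(df)   # a plain list, which select() cannot handle

# Extract the data frame by position
df <- df[[1]]

class_after <- class(df)
total_sales <- sum(df$Sales)
```

After the extraction, select(), group_by() and summarise() can be applied to df as in the tutorial.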
I hope it works for you.
And next time, try to provide a reproducible example.
You could at least print your original data frame before applying these functions; that way I could help you more efficiently.

plyr rename function not working

I can't figure out why this version of plyr's rename function isn't working.
I have a dataframe where I have a single column that ends up being named seq(var_slcut_trucknumber_min, var_slcut_trucknumber_max) because I made it like this:
df_metbal_slcut <- as.data.frame(seq(var_slcut_trucknumber_min,var_slcut_trucknumber_max))
The terms var_slcut_trucknumber_min and var_slcut_trucknumber_max are defined as the min and max of another column.
However, when trying to rename it by the following code,
var_temp <- names(df_metbal_slcut)
df_metbal_slcut <- rename(df_metbal_slcut, c(var_temp="trucknumber"))
I get an error as follows:
The following `from` values were not present in `x`: var_temp
I don't understand why. I know that I can easily do this as colnames(df_metbal_slcut)[1] <- "trucknumber", but I'm an R n00b, and I was looking at a data manipulation tutorial that said that learning plyr was the way to go, so here I am stuck on this.

Try this instead:
df_metbal_slcut <- rename(df_metbal_slcut, setNames("trucknumber",var_temp))
The reason it wasn't working was that c(var_temp = "trucknumber") creates a named vector with the name var_temp, which is not what you were intending. When creating named objects using the tag = value syntax, R won't evaluate variables. It assumes that you literally want the name to be var_temp.
More broadly, it might make sense to give the column a sensible name when the data frame is first created, again using setNames.
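
The difference can be seen with base R alone. In this sketch the string stored in var_temp is a hypothetical stand-in for the awkward auto-generated column name, and the sequence bounds are placeholders.

```r
# Hypothetical stand-in for the auto-generated column name
var_temp <- "seq(var_min, var_max)"

# tag = value syntax: the tag is taken literally, the variable is not evaluated
literal_name <- names(c(var_temp = "trucknumber"))

# setNames() evaluates its second argument, so the stored string becomes the name
evaluated_name <- names(setNames("trucknumber", var_temp))

# Naming the column at creation time avoids the rename altogether
df_metbal_slcut <- setNames(as.data.frame(seq(1, 5)), "trucknumber")
```

Writing data.frame(trucknumber = seq(1, 5)) would give the same result as the last line.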
