How to replace entries in a column in R

I'm brand new to R, and to coding, and am playing around with a dataset. I have what I think should be a really straightforward problem, but I can't figure it out and haven't found any other code that works.
I have a tibble with several columns. In column "RelationshipTypeCd" there are some values of "PT." I would like to change all of these to "PT" (essentially removing the period).
I am working in RStudio, and have loaded the tidyverse.
Thanks!

You could use sub here:
dt$RelationshipTypeCd <- sub("^PT\\.$", "PT", dt$RelationshipTypeCd)
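
To see how the pattern is interpreted, here is a small sketch on a made-up vector (the values in x are hypothetical, not from the asker's data). It compares an anchored regular expression with a literal match using fixed = TRUE. Combining a regex pattern with fixed = TRUE makes sub() search for the pattern's characters literally, so nothing gets replaced.

```r
# Hypothetical sample values for illustration only.
x <- c("PT.", "PT", "OPT.", "PTA")

# Anchored regex: replaces only entries that are exactly "PT."
regex_result <- sub("^PT\\.$", "PT", x)

# Literal match: replaces the first "PT." found anywhere in each entry
fixed_result <- sub("PT.", "PT", x, fixed = TRUE)

# Regex syntax combined with fixed = TRUE: the literal text "^PT\.$"
# never occurs, so the vector comes back unchanged
mixed_result <- sub("^PT\\.$", "PT", x, fixed = TRUE)

print(regex_result)
print(fixed_result)
print(mixed_result)
```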

dt$RelationshipTypeCd <- ifelse(stringr::str_detect(dt$RelationshipTypeCd, '^PT\\.$'), 'PT', dt$RelationshipTypeCd)

Thanks for the replies.
I eventually found a solution:
tibble %>%
mutate(RelationshipTypeCd = replace(RelationshipTypeCd, RelationshipTypeCd == "PT.", "PT"))
I'm not really sure I understand all the arguments there, however. It appears to be using replace inside of mutate, and I don't know why the column "RelationshipTypeCd" is listed twice in the arguments for replace.
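
On why the column appears twice: base R's replace(x, list, values) takes the vector to modify, an index saying which elements to change, and the new value. The first RelationshipTypeCd is the vector, and RelationshipTypeCd == "PT." builds the logical index from that same vector. A minimal sketch on a made-up vector (the values are hypothetical):

```r
# Hypothetical values standing in for the RelationshipTypeCd column.
codes <- c("PT.", "SP", "PT.", "CH")

# The logical index marks which elements equal "PT."
is_pt_dot <- codes == "PT."

# replace() returns a copy of codes with the marked elements set to "PT"
cleaned <- replace(codes, is_pt_dot, "PT")

# The same thing written as a subset assignment on a copy
cleaned2 <- codes
cleaned2[cleaned2 == "PT."] <- "PT"

print(cleaned)
```

Inside mutate(), the result of replace() is simply assigned back to the column of the same name.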

Related

summarize_at() for columns with a slash not working

I have a given df which has some column names that include a "/" (e.g. "Province/State" and "Country/Region").
I want to first group the df by "Country/Region" and then summarize it like this:
confirmed_by_country <- confirmed %>%
group_by("Country/Region") %>%
summarize_at(vars(-Lat, -Long, -"Province/State"), sum)
When I try to run this code it tells me that the column "Province/State" does not exist. I was warned about this problem but still can't figure out what I am doing wrong.
I am also confused about why I only get this error for "Province/State" and not for "Country/Region" in the group_by() function.
Does anyone have an idea what the problem might be? Thanks!
Somehow it made a difference whether I imported the data with read.csv() or read_csv().
It didn't work with read.csv() even if I used backticks but it did when I used read_csv() and backticks. (The column names were also different depending on which one I used.)
If anyone knows why that is, I'm interested!
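
A likely explanation: read.csv() defaults to check.names = TRUE, which rewrites names that are not syntactically valid, so "Province/State" becomes "Province.State". readr's read_csv() keeps the names as written, which is why backticks work there. The group_by("Country/Region") call raises no error because a quoted string is treated as a constant value rather than a column, so it groups everything together; backticks are needed there too. A small base R sketch with made-up data (csv_text and its values are hypothetical):

```r
# Hypothetical two-row CSV with the same style of column names.
csv_text <- "Province/State,Country/Region,Lat,Long\nHubei,China,30.9,112.2\nOntario,Canada,51.2,-85.3\n"

# Default behaviour: invalid names are repaired, "/" becomes "."
repaired <- read.csv(text = csv_text)
print(names(repaired))

# check.names = FALSE keeps the original names
kept <- read.csv(text = csv_text, check.names = FALSE)
print(names(kept))

# Non-syntactic names must be wrapped in backticks when used as symbols
countries <- kept$`Country/Region`
print(countries)
```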

Unused arguments while renaming columns from dataframe

Hi all,
I am trying to rename the columns from the data frame (protein_df). As from here, the columns 'id' and 'Intensity' are shown to be located inside the data frame. However, the error message indicates that the argument to rename columns is unused. Does anyone have an idea of how this could happen?
Thanks!
When you have that type of "unexplainable" error for dplyr functions, it is usually because there is a conflict between different libraries. So use dplyr::rename and it should be good.
It's best to post your code as copy/pastable text; you can format it using backticks.
That error message means that the first arguments in rename() don't exist. I'm not sure if this is your goal, but my best guess is that you have the rename arguments backwards. Judging from the first print of your dataframe head(protein_df), id and intensity are already the column names, so they need to go first in your rename():
protein_df %>%
rename(Intensity = Protien_intensity,
id = Protien_group_IDs)
You can still pipe in the rename() bit to your read_tsv and save it to that df.
This can happen when dplyr is loaded alongside another package that defines its own rename(). Adding the dplyr:: prefix fixes it:
protein_df %>% dplyr::rename(Intensity = Protien_intensity, id = Protein_group_IDs)
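
A rough way to check for this kind of masking is to ask R which package the bare name rename currently resolves to. The sketch below assumes dplyr is installed and uses a made-up data frame, since the real one was not posted as text (the column names are hypothetical). If a package loaded after dplyr exports its own rename() with a different argument list (plyr is one example), the first call would report that package instead.

```r
library(dplyr)

# Hypothetical stand-in for protein_df; the real column names may differ.
protein_df <- data.frame(
  Protein_group_IDs = c("P1", "P2"),
  Protein_intensity = c(10.5, 20.1)
)

# Reports the package that the unqualified rename() comes from
print(environmentName(environment(rename)))

# The namespace prefix always calls dplyr's version: new_name = old_name
renamed <- dplyr::rename(protein_df,
                         id = Protein_group_IDs,
                         Intensity = Protein_intensity)
print(names(renamed))
```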

Why the "separate" and "unite" functions don't work in dplyr

I used the functions separate and unite to clean some data, but they don't seem to work.
I've been trying to separate a column string into two columns using dplyr. The function is quite easy and I don't know why it does not work.
The variable (column) I want to separate is season which contains values of “MAD_S1, KGA_S1” etc. (thousands of records, but there are 6 categories, all separated by the “_S1”; raw data has been inspected and all follow the same syntax). Therefore, I applied
separate(six_sites_spp, season, c("code_loc","season1"), sep = "_")
I have tried more explicit script such as:
separate(six_sites_spp,
col = "season",
into = c("code_loc", "season1"),
sep = "_")
but nothing either.
I have updated the dplyr versions, and tried several things. If I use unite instead to merge two columns, it does not work either. I resolved this by using the classic paste function, but not for the splitting; I do however want to know why dplyr does not work (this is a great package and for some reason other commands are not working either).
Would anyone be able to provide feedback on this, please? Is it a possible “bug” or something within my system (Windows10, HP envi)? Do I need another package simultaneously (I also use tidyr in the same script)? Is there a version mismatch (my R version is 3.5.1, 2018-07-02)? When I run the code it does something internally, as I see it runs the commands, but the output is the same data frame (i.e. no new variables code_loc, season1).
Many thanks in advance.
*there are no error messages
Since you mention no error message, I assume the function works properly but you simply fail to save the output.
Usually dplyr flows like this:
library(dplyr)
library(tidyr)  # separate() and unite() come from tidyr, not dplyr
six_sites_spp %>%
  separate(season, c("code_loc", "season1"), sep = "_") %>%
  {.} -> six_sites_spp  # This saves the changed data frame under the old name
Alternatively, this works as well:
six_sites_spp <- separate(six_sites_spp, season, c("code_loc", "season1"), sep = "_")
Naturally you could also save the changed data frame under a new name to preserve the original data.
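
A minimal sketch of the difference, assuming tidyr is installed and using a made-up two-row version of the data (the count column and its values are hypothetical). Calling separate() without assigning the result only returns the new data frame; the original object stays as it was.

```r
library(tidyr)

# Hypothetical miniature of six_sites_spp.
six_sites_spp <- data.frame(
  season = c("MAD_S1", "KGA_S1"),
  count = c(3, 5),
  stringsAsFactors = FALSE
)

# Result is returned but not stored, so six_sites_spp is untouched
separate(six_sites_spp, season, c("code_loc", "season1"), sep = "_")
names_before <- names(six_sites_spp)

# Assigning the result back is what makes the change stick
six_sites_spp <- separate(six_sites_spp, season,
                          c("code_loc", "season1"), sep = "_")
names_after <- names(six_sites_spp)

print(names_before)
print(names_after)
```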

Difficulties adding data to R dataset

I'm not too advanced with R so any help would be appreciated. I am trying to add values to columns in my dataset and my dataset is called 'katie'.
For example, in the column 'word' I'd like to select instances where 'SUBJECTED' is written and then post 'middle' in the column 'pre.environment', on the same line as 'SUBJECTED' is written. Is there something that I am doing wrong? With this code, the initial line definitely works (as I can see how many "SUBJECTED" items are recognized in the column 'word') but nothing happens when I enter the second line of code.
x <- grep("SUBJECTED", katie$word)
katie[x, ]$pre.environment <- "middle"
I hope this example is sufficient. Thanks in advance for your help.
If I understand your question correctly, try the following code:
katie$pre.environment <- ifelse(grepl("SUBJECTED", katie$word),
yes = "middle",
no = katie$pre.environment)
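
Another option is to assign directly through a logical index. The sketch below uses a made-up version of katie and assumes pre.environment is a character column. That is only a guess without seeing the data: if the column is a factor, assigning a value that is not among its levels produces NA with a warning, so it would need as.character() first.

```r
# Hypothetical miniature of the katie dataset with character columns.
katie <- data.frame(
  word = c("SUBJECTED", "OTHER", "SUBJECTED"),
  pre.environment = c("initial", "final", "initial"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose word contains "SUBJECTED"
hits <- grepl("SUBJECTED", katie$word)

# Overwrite pre.environment only on those rows
katie$pre.environment[hits] <- "middle"

print(katie)
```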

R: avoid repeating $

I'm new here and new to R and I think I have a simple question but don't know how to name it so I can't find any help by searching the web.
I have a data set and want to form a new Data set with several variables from the first one.
The working code looks like this:
em.table2 <- data.frame(em.table$item1,em.table$item2,...[here are some more]...,em.table$item22)
To keep it simpler, I want to get rid of the "em.table$" construction in front of every variable... unfortunately I don't know the function to do so...
I tried it like this, but it didn't work (and is a pretty embarrassing try, I guess):
em.table2 <- data.frame(em.table$(item1,item2,item3,item4))
Anyone here to help? Thanks a lot!
Instead of the $ operator, try the following:
em.table2 <- em.table[,c("item1","item2","item3","item4")]
Try with
em.table2 <- with(em.table, data.frame(item1, item2, item3, item4))
But if you just want to subset the data, there are better solutions.
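
For plain subsetting, one possible sketch is below. It assumes the columns really are named item1 through item22, which is a guess since part of the original list was elided, and it uses a made-up em.table with three extra hypothetical columns.

```r
# Hypothetical em.table: 22 item columns plus three others.
em.table <- as.data.frame(matrix(1:50, nrow = 2))
names(em.table) <- c(paste0("item", 1:22), "age", "sex", "id")

# Build the column names programmatically instead of typing each one
wanted <- paste0("item", 1:22)
em.table2 <- em.table[, wanted]

# subset() accepts a range of adjacent columns by name
em.table3 <- subset(em.table, select = item1:item22)

print(dim(em.table2))
```

The paste0() form works wherever the columns sit in the data frame, while the item1:item22 range relies on them being adjacent.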
