How to remove rows containing a specific string from a data frame in R

I have a data frame with three columns. The first column, Species_Name, contains all the species names. I want to remove the rows whose species is undetermined, such as "Salmonella sp", and keep only the rows with a fully determined name, such as Salmonella enterica or Salmonella bongori. I tried the following code, but it is not working. Any suggestions would be appreciated.
dfcox1 <- dfcox1 %>%
filter(Species_Name != "Salmonella sp")

Welcome to Stack Overflow! Please provide a reproducible example so that it is easier for other people to help you; this is especially easy to do in R.
If you want to remove rows of a data frame whose value matches a regular expression (e.g. a value ending in sp), you can do so as follows:
library(dplyr)
# "sp$" is anchored to the end of the string; a bare "sp" would match anywhere
iris %>%
  dplyr::filter(!stringr::str_detect(Species, "sp$"))
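As a rough illustration on data shaped like the question's, here is a base R sketch of the same idea. The data frame contents and the Count column are made up for the example, and the pattern assumes undetermined names end in " sp" or " sp."; an exact comparison with != only drops values that equal the whole string, so a trailing dot or extra text would slip through.

```r
# Hypothetical stand-in for dfcox1; only Species_Name comes from the question.
dfcox1 <- data.frame(
  Species_Name = c("Salmonella enterica", "Salmonella sp",
                   "Salmonella bongori", "Salmonella sp."),
  Count = c(12, 3, 7, 1),
  stringsAsFactors = FALSE
)

# TRUE for names that end in " sp" or " sp." (assumed spelling of "undetermined")
undetermined <- grepl(" sp\\.?$", dfcox1$Species_Name)

# Keep only the rows with a fully determined species name
determined <- dfcox1[!undetermined, , drop = FALSE]
print(determined)
```

If the real data uses another marker for undetermined species (for example "spp."), the pattern would need to be adjusted accordingly.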

Related

Can not merge data with another dataset

I have a dataset named list that looks like this:
df
200000
5666666
The dataset continues for 5551 rows.
Another dataset also has 5551 observations. I want to merge the list dataset with this other dataset, but they have no variable in common. Only the row names are the same.
I tried:
merge(list,df,by="rownames")
The error message says that by must specify a valid column name.
I also tried merge_all, but that did not work either.
Could someone please help?
It's good practice to be more precise with the naming of your dataframe variables. I wouldn't use list but something like df_description. Either way, merging by rownames can be achieved by using by = "row.names" or by = 0. You can read more on merge() in the documentation (under "Details").
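A minimal sketch of merging on row names, using two small made-up data frames (the names df_description and df_other and their contents are placeholders, not the asker's data):

```r
# Two hypothetical data frames that share row names but no columns
df_description <- data.frame(value = c(200000, 5666666),
                             row.names = c("a", "b"))
df_other <- data.frame(score = c(1.5, 2.5),
                       row.names = c("a", "b"))

# by = "row.names" (or by = 0) tells merge() to match on the row names
merged <- merge(df_description, df_other, by = "row.names")
print(merged)
```

Note that merge() puts the matched row names into a new first column called Row.names instead of keeping them as the row names of the result.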

How to traverse two columns and fill in another column at the same index with a string in R?

I have a data frame with a column of strings that are the question bodies of a survey, and a separate data frame in which those question bodies are matched to a question number. I want to go through the original data frame's column, check whether each value matches any value in the other data frame, and, if it does, store the associated question number in a column of the original data frame. I am having a lot of trouble figuring this out. I have looked into using apply() or something similar, but I can't quite get it to work. Any help would be greatly appreciated.
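If df1 is the first data frame, df2 the second, and question_body is the name of the question-string column in both, then:
library(dplyr)
# by takes the column name as a string; select() then drops the question text
left_join(df1, df2, by = "question_body") %>% select(-question_body)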
Of course, it would be easier to give you an accurate answer if you provided some actual examples of your data structure.
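For comparison, here is a base R sketch of the same lookup using match(). The data frames survey and lookup, and the column names question_body and question_number, are assumptions made up for the example, since the question does not show the real structure.

```r
# Hypothetical survey responses with the question text in each row
survey <- data.frame(
  question_body = c("How old are you?", "Where do you live?",
                    "Unlisted question"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table mapping question text to a question number
lookup <- data.frame(
  question_body = c("Where do you live?", "How old are you?"),
  question_number = c(2L, 1L),
  stringsAsFactors = FALSE
)

# match() returns, for each survey row, the position of its text in the
# lookup table (NA when there is no match); use it to index the numbers
idx <- match(survey$question_body, lookup$question_body)
survey$question_number <- lookup$question_number[idx]
print(survey)
```

Rows whose question text has no counterpart in the lookup table end up with NA, which is the same behaviour as a left join.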

How do I make a new data frame with variables in two columns matching each other, using DPLYR?

this is my very first time posting a question here. If anything I'm asking about is vague or unclear / I forgot to add extra information for context, feel free to let me know, thank you.
MY QUESTION:
I just made a data frame with multiple columns. How do I create a new data frame that keeps only the rows where two columns hold the same value and excludes all rows where they don't match (while keeping any other columns I want from the original)?
SCREENSHOTS OF MY CURRENT DATA FRAME: ONE, TWO (This isn't the entire data frame since the list is huge, just parts of it.) Notice how each state has multiple 'counties' under it.
THIS IS AN EXAMPLE OF WHAT I WANT MY FINAL DATA FRAME TO LOOK LIKE. In my new data frame, I want to exclude all rows where the Location name does not match the State name (so I will get rid of all counties and anything that isn't the State name).
e.g. I want a new data frame that keeps rows where California = California, while excluding rows whose values don't match, such as California = San Juan County.
I want to code all of this using DPLYR.
Thank you!
If I understand your somewhat vague question correctly:
library(dplyr)
df %>% filter(column1 == column2)
This assumes there are no NAs in your numeric columns; if there are, replace them with 0 before running the code below:
library(dplyr)
new_df <- df %>%
  # keep only the rows where the location is the state itself
  filter(any_drinking.state == any_drinking.location) %>%
  mutate(both_sexes_2012 = any_drinking.females_2012 + any_drinking.males_2012,
         diff = any_drinking.males_2012 - any_drinking.females_2012) %>%
  rename(females_2012 = any_drinking.females_2012,
         males_2012 = any_drinking.males_2012,
         state = any_drinking.state,
         location = any_drinking.location)
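To make the row filter concrete, here is a small base R sketch on made-up data. The data frame drinking, its column names, and all the numbers are placeholders for illustration; they are not the asker's real values.

```r
# Hypothetical data: each state has one state-level row plus county rows
drinking <- data.frame(
  state = c("California", "California", "Utah", "Utah"),
  location = c("California", "San Diego County", "Utah", "San Juan County"),
  females_2012 = c(50.1, 52.3, 30.2, 28.4),
  males_2012 = c(60.4, 61.0, 40.6, 39.9),
  stringsAsFactors = FALSE
)

# Keep only rows where the location name equals the state name;
# which() drops comparisons that evaluate to NA
state_rows <- drinking[which(drinking$state == drinking$location), ]

# Add a derived column on the filtered rows
state_rows$diff <- state_rows$males_2012 - state_rows$females_2012
print(state_rows)
```

This is the base R counterpart of filter(state == location) followed by mutate(); the dplyr version reads more naturally inside a longer pipeline.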

Extract data between two pattern occurrences in dataframe column R

I am using R to perform some data manipulation. I want to extract all rows between 2 occurrences of a pattern. I have attached the dataframe image.
I want to extract all rows starting from 'edu-hist-mark' to 'objectives-mark' using "mark" as a pattern. But I am not sure how to achieve that. Appreciate any help.
Thanks.
EDIT:
After some manipulation, here is the data frame:
Data <- data.frame(
  class_name = c("edu-hist-mark", "date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif", "role", "company",
                 "objectives-mark", "additional-info-hobby-mark", "nominal"),
  text_val = c("EDUCATION AND QUALIFICATIONS:", 2000, 2003,
               "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science,Computer Science", 1999, 2000,
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY", "Bachelor of Science",
               "Mechanical Engineering", "Enterprise Solution Architect",
               "Liaison Technologies", "SUMMARY:,PUBLICATIONS:",
               "Abhay Daftari"))
In code below, I find the indices of the instances where your first column contains the pattern, "mark", and then subset the dataset to find all rows between the first and the second instance of that pattern. If there are more than two instances of that pattern, you can change the index to reflect how the data should be subsetted. Hope this helps!
# row indices where class_name contains "mark"
mark_rows <- which(grepl("mark", Data$class_name))
# all rows from the first match through the second match
Data[mark_rows[1]:mark_rows[2], ]
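The same idea can be wrapped in a small helper that also guards against too few matches. This is only a sketch: the function name rows_between and its arguments are hypothetical, and the data frame resume is a shortened, made-up version of the one in the question.

```r
# Return the rows between the from-th and to-th match of a pattern
# (hypothetical helper, not part of any package)
rows_between <- function(data, column, pattern, from = 1, to = 2) {
  hits <- grep(pattern, data[[column]])
  if (length(hits) < max(from, to)) {
    stop("pattern matched fewer rows than requested")
  }
  data[hits[from]:hits[to], , drop = FALSE]
}

# Shortened, made-up version of the question's data
resume <- data.frame(
  class_name = c("edu-hist-mark", "date", "qualif",
                 "objectives-mark", "nominal"),
  text_val = c("EDUCATION", "2000", "MSc", "SUMMARY", "Name"),
  stringsAsFactors = FALSE
)

section <- rows_between(resume, "class_name", "mark")
print(section)
```

Changing from and to selects a different pair of matches, for example from = 2, to = 3 when the section of interest starts at the second marker.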

Modifying All Values in Data Frame Column With tolower()

I have a data frame and the first column (called company) is filled with various company names, but the case isn't uniform. One might be phillips while another is Phillips, and a third phiLlips. I want to make them standardized - all lower case, so I tried mutate(), with replace() and tolower():
library(readr)
library(dplyr)
library(tidyr)
refine_original <- read_csv("~/Desktop/refine_original.csv")
# cleaning up the company names to match (case)
refine_original %>% mutate(company=replace(company, length(company),
tolower(company)))
print(refine_original$company)
When I print the column contents, though, the values remain unchanged; it looks like my changes did not stick to the data frame. Any input or help would be much appreciated, thanks!
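One likely explanation, offered as a sketch and not a confirmed answer: mutate() returns a modified copy and does not change the data frame in place, so the result has to be assigned back. In addition, tolower() can be applied to the whole column directly, without replace(). The example below uses base R and a small made-up data frame in place of refine_original.csv.

```r
# Hypothetical stand-in for the data read from refine_original.csv
refine_original <- data.frame(
  company = c("phillips", "Phillips", "phiLlips"),
  stringsAsFactors = FALSE
)

# tolower() is vectorised, so it converts every value in the column;
# assigning the result back is what makes the change persist
refine_original$company <- tolower(refine_original$company)

# With dplyr loaded, the equivalent would be:
# refine_original <- refine_original %>% mutate(company = tolower(company))

print(refine_original$company)
```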
