Extract data between two pattern occurrences in a dataframe column in R

I am using R to perform some data manipulation. I want to extract all rows between two occurrences of a pattern. I have attached the dataframe image.
I want to extract all rows from 'edu-hist-mark' to 'objectives-mark', using "mark" as the pattern, but I am not sure how to achieve that. I'd appreciate any help.
Thanks.
EDIT:
After some manipulation, here is the data frame:
Data<- data.frame(class_name = c("edu-hist-mark","date","date","educational","qualif","date","date","educational","qualif","role","company","objectives-mark","additional-info-hobby-mark","nominal"),
text_val=c("EDUCATION AND QUALIFICATIONS:",2000,2003,"ILLINOIS INSTITUTE OF TECHNOLOGY","Master of Science,Computer Science",1999,2000,"MAHARASHTRA INSTITUTE OF TECHNOLOGY","Bachelor of Science","Mechanical Engineering","Enterprise Solution Architect","Liaison Technologies","SUMMARY:,PUBLICATIONS:","Abhay Daftari"))

In the code below, I find the indices of the rows where your first column contains the pattern "mark", and then subset the dataset to keep all rows between the first and second instances of that pattern. If there are more than two instances, you can change the indices to control how the data is subsetted. Hope this helps!
idx <- grep("mark", Data$class_name)  # rows whose class_name contains "mark"
Data[idx[1]:idx[2], ]  # from the first match to the second, inclusive
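A minimal sketch that generalises the same idea, assuming you may want a different pair of matches or want to drop the "mark" rows themselves. The helper name rows_between and its from, to and inclusive arguments are hypothetical, not part of the original answer.

```r
# Sample data from the question
Data <- data.frame(
  class_name = c("edu-hist-mark", "date", "date", "educational", "qualif",
                 "date", "date", "educational", "qualif", "role", "company",
                 "objectives-mark", "additional-info-hobby-mark", "nominal"),
  text_val = c("EDUCATION AND QUALIFICATIONS:", 2000, 2003,
               "ILLINOIS INSTITUTE OF TECHNOLOGY",
               "Master of Science,Computer Science", 1999, 2000,
               "MAHARASHTRA INSTITUTE OF TECHNOLOGY", "Bachelor of Science",
               "Mechanical Engineering", "Enterprise Solution Architect",
               "Liaison Technologies", "SUMMARY:,PUBLICATIONS:", "Abhay Daftari"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows between the `from`-th and `to`-th rows whose
# `col` value matches `pattern`. With inclusive = FALSE the two matching
# rows themselves are dropped.
rows_between <- function(df, col, pattern, from = 1, to = from + 1,
                         inclusive = TRUE) {
  idx <- grep(pattern, df[[col]])  # row numbers of every match
  if (length(idx) < max(from, to)) {
    stop("fewer than ", max(from, to), " rows match '", pattern, "'")
  }
  start <- idx[from]
  end <- idx[to]
  if (!inclusive) {
    start <- start + 1
    end <- end - 1
  }
  if (start > end) {
    return(df[0, , drop = FALSE])  # nothing lies strictly between the matches
  }
  df[start:end, , drop = FALSE]
}

rows_between(Data, "class_name", "mark")                     # rows 1 to 12
rows_between(Data, "class_name", "mark", inclusive = FALSE)  # rows 2 to 11
```

Passing from = 2 would instead return the rows from objectives-mark to additional-info-hobby-mark.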

Related

Cannot merge data with another dataset

I have a dataset called list that looks like this:
df
200000
5666666
This dataset continues to row 5551.
Another dataset also has 5551 observations. I want to merge the list dataset with this other dataset, but no variable is shared between them; only the row names are the same.
I tried
merge(list,df,by="rownames")
The error message says that it must be a valid column name.
I also tried merge_all, but it did not work either.
Could someone please help?
It's good practice to name your data frame variables more precisely: I wouldn't use list but something like df_description. Either way, you can merge by row names with by = "row.names" or by = 0. You can read more in the merge() documentation (under "Details").
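A small sketch of merging by row names, using two made-up data frames (left and right are hypothetical names standing in for the two datasets). merge() returns the matched row names as a new Row.names column and, with the default sort = TRUE, orders the result by that column as text; the last lines move it back into the row names.

```r
# Two made-up data frames that share row names but no columns
left <- data.frame(df = c(200000, 5666666, 42), row.names = c("a", "b", "c"))
right <- data.frame(score = c(1.5, 2.5, 3.5), row.names = c("c", "a", "b"))

# by = 0 is the same as by = "row.names": rows are matched on their row names
merged <- merge(left, right, by = 0)

# merge() returns the key as a Row.names column; turn it back into row names
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL
merged
```

If both datasets already list the same rows in the same order, cbind() of the two data frames is a simpler alternative that skips the matching and sorting.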

How to remove rows with specific characters from a dataframe in R

I have a data frame with three columns; the first column, Species_Name, contains all the species names. I want to remove the rows with undetermined names such as "Salmonella sp" and keep only the rows with fully determined names such as Salmonella enterica, Salmonella bongori and so on. I tried the following code, but it is not working. Please give any suggestions.
dfcox1 <- dfcox1 %>%
filter(Species_Name != "Salmonella sp")
Welcome to stackoverflow.com! Please create reproducible examples so that it is easier for other people to help you, which is especially easy to do when working with GNU R.
If you want to remove rows in a data frame according to a regular expression (e.g. a species name ending in " sp"), you can do so as follows:
library(dplyr)
dfcox1 %>%
dplyr::filter(!stringr::str_detect(Species_Name, " sp$"))
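A base-R sketch of the same filter, using a made-up dfcox1 because the real data was not posted. It uses grepl() in place of stringr::str_detect() and, as an assumption about messy labels, also trims surrounding whitespace and allows an optional trailing period ("Salmonella sp.").

```r
# Made-up stand-in for dfcox1 (the real data was not posted)
dfcox1 <- data.frame(
  Species_Name = c("Salmonella enterica", "Salmonella sp", "Salmonella bongori",
                   "Salmonella sp. "),
  n_isolates = c(10, 3, 7, 1),
  stringsAsFactors = FALSE
)

# TRUE for names ending in " sp" or " sp." once surrounding spaces are trimmed
undetermined <- grepl(" sp\\.?$", trimws(dfcox1$Species_Name))

# Keep only the fully determined species names
dfcox1_clean <- dfcox1[!undetermined, ]
dfcox1_clean
```

An exact comparison such as Species_Name != "Salmonella sp" only drops rows whose text is exactly that string, so variants with extra spaces or a period are kept; the pattern above also catches those.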

Assigning Unnamed Columns To Another DataFrame

I'm in a very basic class that introduces R for genetic purposes. I'm encountering a rather peculiar problem in trying to follow the instructions given. Here is what I have along with the instructor's notes:
MangrovesRaw<-read.csv("C:/Users/esteb/Documents/PopGen/MangrovesSites.csv")
#i'm going to make a new dataframe now, with one column more than the mangrovesraw dataframe but the same number of rows.
View(MangrovesRaw)
Mangroves<-data.frame(matrix(nrow = 528, ncol = 23))
#next I want you to name the first column of Mangroves "pop"
colnames(Mangroves)<-c(col1="pop")
#i'm now assigning all values of that column to be 1
Mangroves$pop<-1
#assign the rest of the columns (2 to 23) to the entirety of the MangrovesRaw dataframe
#then change the names to match the mangroves raw names
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
I'm not really sure how to assign columns that haven't been named, using $ as we have in the past. A friend suggested I first run
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw
#X338 is the name of the first column from MangrovesRaw
But while this does transfer the data from MangrovesRaw, it comes at the cost of messing up my column names: "X338." is prefixed to every subsequent column name. In an attempt to fix this, I found the following "fix":
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw[,2]
#Mangroves$X338<-MangrovesRaw[,2:22]
#MangrovesRaw has 22 columns in total
While this transferred all the data I needed for the X338 column, it didn't transfer any data for the remaining 21 columns. The commented-out line just results in the same problem of having "X338." show up in all my column names.
What am I doing wrong?
There are a few ways to solve this problem. It may be that your instructor wants it done a certain way, but here's one simple solution: just cbind() the Mangroves$pop column with the real data. Then the data and column names are already added.
Mangroves <- cbind(pop = Mangroves$pop, MangrovesRaw)  # naming the argument keeps the column called pop
Here's another way:
Mangroves[, 2:23] <- MangrovesRaw
colnames(Mangroves)[2:23] <- colnames(MangrovesRaw)
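A self-contained sketch of both approaches, using a made-up MangrovesRaw with 4 rows and 3 columns (the real CSV has 528 rows and 22 columns). The second version derives the column range from ncol() instead of hard-coding 2:23; locusA and locusB are invented column names.

```r
# Made-up stand-in for MangrovesRaw (the real CSV is not available here):
# 4 rows and 3 columns instead of 528 rows and 22 columns
MangrovesRaw <- data.frame(
  X338 = c(1, 2, 1, 2),
  locusA = c("A", "B", "A", "A"),
  locusB = c(0.1, 0.4, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Approach 1: prepend a column named pop, filled with 1 (recycled to every row)
Mangroves <- cbind(pop = 1, MangrovesRaw)

# Approach 2: pre-allocate an empty frame with one extra column, then fill by position
k <- ncol(MangrovesRaw)
Mangroves2 <- data.frame(matrix(nrow = nrow(MangrovesRaw), ncol = k + 1))
Mangroves2[, 1] <- 1
Mangroves2[, 2:(k + 1)] <- MangrovesRaw
colnames(Mangroves2) <- c("pop", colnames(MangrovesRaw))
```

Both versions end up with pop as the first column followed by the original columns under their original names, so no "X338." prefix appears.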

Remove multiple rows from a list of names in R (a list of 187 names to remove)?

I have a data frame in R containing over 29,000 rows. I need to remove multiple rows using only a list of names (187 names).
My dataset is about airlines, and I need to remove specific airlines from my data set that contains over 200 types of airlines. My first column contains all airline names, and I need to remove the entire row for those specific airlines.
I singled out all airline names that I want removed by this code: transmute(a_name_remove, airline_name). This gave me a table of all names of airlines that I want removed, now I have to remove that list of names from my original dataset named airlines.
I know there is a way to do this manually, which is: mydata[-c("a", "b"), ], for example. But writing out each name would be hectic.
Can you please help me use the list I already have to remove those rows from my dataset?
I cannot write out each name on its own.
I also tried airlines[!(row.names(airlines) %in% c(remove)), ], after turning my list "remove" into a data frame and also into a vector, and used that code to remove the rows from my original dataset "airlines", but it still did not work.
Thank you!
You can create a function that negates %in%, e.g.
'%not_in%' <- Negate('%in%')
So, following your code, it would look like this:
airlines[row.names(airlines) %not_in% remove, ]
Additionally, I do not recommend using remove as a variable name, since it is a base function in R; if possible, rename the variable, e.g. to discard_airlines:
airlines[row.names(airlines) %not_in% discard_airlines, ]
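The question says the airline names are in the first column rather than in the row names, so here is a sketch that matches on that column instead. It assumes the column is called airline_name (as in the transmute() call) and uses made-up data. It also pulls the names out of the data frame first, because %in% against a data frame compares each whole column rather than the individual values.

```r
# Made-up stand-in for the airlines data: airline names in the first column
airlines <- data.frame(
  airline_name = c("Air Alpha", "Air Beta", "Air Gamma", "Air Delta", "Air Beta"),
  flights = c(120, 45, 300, 80, 60),
  stringsAsFactors = FALSE
)

# Names to drop, stored as a one-column data frame (like a transmute() result)
a_name_remove <- data.frame(airline_name = c("Air Beta", "Air Delta"),
                            stringsAsFactors = FALSE)

# %in% needs a plain vector, so extract the column first
discard_airlines <- a_name_remove$airline_name

`%not_in%` <- Negate(`%in%`)

# Keep rows whose airline_name is NOT in the discard list
kept <- airlines[airlines$airline_name %not_in% discard_airlines, ]
kept
```

The same line works for 187 names, since discard_airlines can be any length.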

How to traverse two columns and fill in another column at the same index with a string in R?

I have a data frame with a column of strings that are the question bodies of a survey, and a separate data frame that matches those question bodies to question numbers. I want to go through the original data frame's column, check whether each value matches any value in the other data frame, and if it does, store the associated question number in a column of the original data frame. I am having a lot of trouble figuring this out; I have looked into using apply() or something similar, but I can't quite get it. Any help would be greatly appreciated.
If df1 is the first data frame, df2 the second, and question_body the name of the question strings column in both, then:
library(dplyr)
left_join(df1, df2, by = "question_body") %>% select(-question_body)
Of course, it would be easier to give you an accurate answer if you provided some actual examples of your data structure.
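For comparison, a base-R sketch of the same lookup without dplyr, using match(). The data and the question_number column name are made up. Unlike the join above, it keeps question_body and adds the number as a new column, which is what the question describes. Matching is exact, so differences in spacing or capitalisation give NA.

```r
# Made-up survey data: df1 holds responses, df2 maps question text to a number
df1 <- data.frame(
  question_body = c("How old are you?", "Do you own a car?", "How old are you?",
                    "Favourite colour?"),
  answer = c("34", "yes", "27", "blue"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  question_body = c("Do you own a car?", "How old are you?"),
  question_number = c("Q2", "Q1"),
  stringsAsFactors = FALSE
)

# For each df1 row, the position of the same text in df2 (NA if it is absent)
pos <- match(df1$question_body, df2$question_body)

# Look up the number at that position and store it in a new df1 column
df1$question_number <- df2$question_number[pos]
df1
```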
