count occurrences in pipe delimited string in dataframe - r

I have a Names column in my dataframe as follows:
Names
steve|chris|jeff
melissa|jo|john
chris|susan|redi
john|fiona|bart
jo|chris|fiona
The entries are pipe delimited. Is there a way to count the occurrences of the names in this column? For example, Chris occurs 3 times. Using a package like "plyr" works when there are only single entries in the column, but I am not sure how to handle combined entries like the ones above.
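One possible approach (a minimal sketch, assuming the column is called Names in a data frame df): split each entry on the pipe, flatten the result, and tabulate with table().
df <- data.frame(Names = c("steve|chris|jeff", "melissa|jo|john",
                           "chris|susan|redi", "john|fiona|bart",
                           "jo|chris|fiona"),
                 stringsAsFactors = FALSE)
# fixed = TRUE treats "|" as a literal character rather than a regex
table(unlist(strsplit(df$Names, "|", fixed = TRUE)))
# chris appears 3 times, jo/john/fiona 2 times each, all other names once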

Related

Is there an R methodology to select the columns from a dataframe that are listed in a separate array

I have a dataframe with over 100 columns. After applying certain conditions, I need a subset of the dataframe containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
Try this:
library(dplyr)
iris <- iris %>% select(contains(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep, called important_columns, you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
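For example (a quick illustration using the built-in iris data and a made-up lookup data frame):
lookup <- data.frame(names  = c("Sepal.Length", "Species"),
                     values = c(1, 2),
                     stringsAsFactors = FALSE)
head(iris[, lookup$names])
# returns only the Sepal.Length and Species columns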

R function for simple lookup replacement of excel

I want to extract values from file2 into file1, matching on the values in the indicated columns. It is a simple lookup function in Excel.
Many of the solutions given are based on matching column names, which I don't want to change in my data set.
The two files have a matching column, and a column from file2 is to be inserted into file1.
As your column names are different in the two data.frames, you need to tell merge which columns correspond to each other:
merge(file1, unique(file2[, c("Symbol", "GeneID")]), by.x="UniprotBlastGeneSymbol", by.y="Symbol")
Your result column will be called GeneID, not Column4, of course. If file2 contains gene IDs that are not found in file1, then you may also want all.y=FALSE (the default), so that those unmatched rows are dropped.
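A small self-contained sketch of the same call, with made-up rows and placeholder IDs (only the column names follow the question):
file1 <- data.frame(UniprotBlastGeneSymbol = c("TP53", "BRCA1", "EGFR"))
file2 <- data.frame(Symbol = c("TP53", "EGFR", "EGFR"),
                    GeneID = c(101, 202, 202))
merge(file1, unique(file2[, c("Symbol", "GeneID")]),
      by.x = "UniprotBlastGeneSymbol", by.y = "Symbol")
# keeps only the symbols present in both files and adds GeneID from file2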

Rename dataframe columns by string matching in R

I am looping through a series of ids, loading 2 csvs for each, and applying some analysis to them. I need to rename the columns of one of the 2 csvs to match the row values of the other. I need to do this inside the loop in order to apply it to the csvs for every id.
I have tried renaming the columns like this:
names(LCC_diff)[2:length(LCC_diff)] <- c("Bare.areas" = "Bare areas",
    "Tree." = "Tree ", "Urban.areas" = "Urban areas",
    "Water.bodies" = "Water bodies")
where LCC_diff is a dataframe and the first value in each pair is the original column name and the second is the name that I want to assign to that column, but it just replaces the column names in order and does not match them.
This is a problem because not all column names need to be replaced, and the csvs for different ids have these columns in different orders.
How do I match the original column names to the strings that I want to use to replace them?
Try converting the names first; matching is much easier once the names have the same form. For example, replace the spaces with dots:
library(stringr)
str_replace_all(c("Tree ","Bare areas")," ",".")
[1] "Tree." "Bare.areas"

R how to remove rows in a data frame based on the first character of a column

I have a big data frame and I want to remove certain rows from it based on whether the first character of a column is a letter or a number. A sample of my data frame looks like the below:
y<-c('34TA912','JENAR','TEST','34CC515')
z<-c('23.12.2015','24.12.2015','24.12.2015','25.12.2015')
abc<-data.frame(y,z)
Based on the sample above, I would like to remove the second and third rows, because the value in the y column of those rows starts with a letter instead of a number. The characters in the y column could be anything, so the only way I can filter is by checking the first character, without using any predefined value. If I use grep with a specific character, I could remove other rows as well, since they also contain letters. Can you assist?
We can use grep. The regex ^ indicates the beginning of the string. We match a numeric character ([0-9]) at the beginning of the string in the 'y' column using grep. The output is a numeric index, which we use to subset the rows of 'abc'.
abc[grep('^[0-9]', abc$y),]
# y z
#1 34TA912 23.12.2015
#4 34CC515 25.12.2015
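The same idea works with grepl, which returns a logical vector instead of integer indices:
abc[grepl('^[0-9]', abc$y), ]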

text matching loop in r

I have 10,000 or more texts in one column of a csv, file_1.
In another csv, file_2, I have some words which I need to search for in file_1, and I need to record in the next column whether a text contains those words.
I need to search for all the words in all the texts; a single text can often contain multiple words from file_2, and I want all the matched words recorded in the next column, comma separated.
Case matching can also be a challenge, and I want exact matches only:
Example:
file_1: a column of texts (not reproduced here)
file_2: Disney, Hollywood
Desired output: file_1 with an added column listing the matched words (not reproduced here)
I assume you will read the files into two separate data frames such as df1 and df2.
You can subset your search values from df2 as needed, or turn it into one large vector to search through using:
df2 <- as.vector(t(df2))
Then create a new column "Match" on df1 containing the items matched from df2.
for (i in 1:nrow(df1)) {
  df1$Match[i] <- paste0(df2[which(df2 %in% df1$SearchColumn[i])], collapse = ",")
}
This loops from row 1 to the last row of df1, finds the indices of matches in df2 using the which function, then takes those values and pastes them together separated by a comma. I'm sure someone else can find a way to achieve this without a loop, but I hope this works for you.
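A loop-free variant of the same logic (a sketch, using the same df1, df2, and SearchColumn names as above):
df1$Match <- sapply(seq_len(nrow(df1)), function(i) {
  paste0(df2[df2 %in% df1$SearchColumn[i]], collapse = ",")
})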
