R function for a simple lookup (replacement for Excel's lookup)

I want to extract the values from file2 into file1 by matching the values in the indicated columns. In Excel this is a simple lookup function.
But many of the solutions I have found rely on the key columns having the same name, and I don't want to change the column names in my data set.
The 2 files share a matching column, and a column from file2 should be inserted into file1.

As your column names are different in the two data.frames, you need to tell merge() which columns correspond to each other:
merge(file1, unique(file2[, c("Symbol", "GeneID")]), by.x = "UniprotBlastGeneSymbol", by.y = "Symbol")
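Your result column will be called GeneID, not Column4, of course. By default merge() keeps only rows whose key appears in both tables (all = FALSE, so all.y = FALSE is already the default and gene IDs from file2 that are not in file1 are dropped). If you want to keep every row of file1, as an Excel lookup would, add all.x = TRUE.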
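A minimal sketch with made-up toy values (the symbols, IDs and the extra ID column below are hypothetical, not from the original files). It shows merge() with differently named key columns and all.x = TRUE, plus a match()-based alternative that behaves more like an Excel lookup: it keeps file1's row order and row count and takes the first match when file2 has duplicates.

```r
# Hypothetical toy data standing in for file1 and file2
file1 <- data.frame(
  ID = 1:4,
  UniprotBlastGeneSymbol = c("TP53", "BRCA1", "EGFR", "XYZ1"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  Symbol = c("EGFR", "TP53", "BRCA1", "TP53"),  # TP53 appears twice
  GeneID = c(1956, 7157, 672, 7157),
  stringsAsFactors = FALSE
)

# merge() on differently named key columns; unique() drops the duplicate
# TP53 row, and all.x = TRUE keeps XYZ1 with GeneID = NA.
# Note: merge() sorts the result by the key column.
merged <- merge(file1,
                unique(file2[, c("Symbol", "GeneID")]),
                by.x = "UniprotBlastGeneSymbol", by.y = "Symbol",
                all.x = TRUE)

# match() alternative: for each symbol in file1, the position of its first
# occurrence in file2 (NA if absent), used to pull the GeneID across
file1$GeneID <- file2$GeneID[match(file1$UniprotBlastGeneSymbol, file2$Symbol)]
print(file1)
```

The match() version adds exactly one column and never changes the number of rows, which is usually what people expect when coming from Excel.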

Related

How to extract a common column from multiple tsv files and combine them into one dataframe in R?

I want to extract a common column named "framewise_displacement" from 162 TSV files named by subject ID (e.g., sub-CC123_timeseries.tsv, sub-CC124_timeseries.tsv, etc.). The files have different numbers of columns but the same number of rows, and I want to merge the extracted columns into a single data frame.
In the new data frame, each column should be the "framewise_displacement" column from one subject's file, labelled with that subject's ID, and the rows should be the same as in the original files.
I tried to use the vroom function in R, but it failed because the files have different numbers of columns.
I also tried the code below, but the output stacked all the columns into one single column.
library(purrr)
library(vroom)
files <- fs::dir_ls(path = "Documents/subject_timeseries", glob = "*.tsv")
merged_df <- map_df(files, ~vroom(.x, col_select = c(framewise_displacement)))
What should I do to merge them into one data frame with the desired columns side by side?
Any suggestions would be appreciated.
Many thanks!!!
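map_df() row-binds its results, which is why everything ends up stacked in one column. A minimal base-R sketch that instead collects one vector per file and binds them as columns; it swaps in list.files() and read.delim() for fs/vroom. The function name read_fd_columns is hypothetical, and treating "n/a" as missing is an assumption (BIDS-style TSVs often write missing values that way).

```r
# Hypothetical helper: read one column from every TSV in `dir` and
# return them side by side, one column per subject
read_fd_columns <- function(dir, column = "framewise_displacement") {
  files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)
  cols <- lapply(files, function(f) {
    # read.delim() reads tab-separated files with a header row;
    # assumption: missing values may be written as "n/a"
    fd <- read.delim(f, na.strings = c("NA", "n/a"))[[column]]
    if (is.null(fd)) stop("column '", column, "' not found in ", f)
    fd
  })
  # Name each column after the subject ID in the file name,
  # e.g. "sub-CC123_timeseries.tsv" -> "sub-CC123"
  names(cols) <- sub("_timeseries\\.tsv$", "", basename(files))
  # Bind as columns; check.names = FALSE keeps the hyphen in "sub-CC123"
  data.frame(cols, check.names = FALSE)
}
```

Usage would be something like fd <- read_fd_columns("Documents/subject_timeseries"). Because all files have the same number of rows, the vectors line up row by row; data.frame() stops with an error if any file has a different length.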

CSV including column zero R

I have a data frame in R full of unique values, taken from the unique row names of several different files. So I have a column zero with the name of the file each value came from, and a column 1 with the unique row names.
col0 col1
path/file1 name
path/file1 age
path/file2 color
path/file3 tree
path/file3 house
I want to export this as a CSV, but I haven't been able to get write.csv to include column 0.
I have also tried adding column 0 as a regular column with mutate, but that doesn't work either.
Any ideas?
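The question does not show how the data is stored, so this is a sketch under an assumption: "column zero" is not a real column but the names attached to the values, for example a named character vector (its names may repeat, unlike data.frame row names, which must be unique). The object x and the output file name are hypothetical. Turning the names into an ordinary column makes write.csv() export them.

```r
# Hypothetical input: values named by the file they came from
x <- c("path/file1" = "name", "path/file1" = "age",
       "path/file2" = "color",
       "path/file3" = "tree", "path/file3" = "house")

# Make the names an explicit column instead of attributes
out <- data.frame(col0 = names(x), col1 = unname(x),
                  stringsAsFactors = FALSE)

# row.names = FALSE avoids an extra unnamed index column in the CSV
out_file <- file.path(tempdir(), "unique_rownames.csv")
write.csv(out, out_file, row.names = FALSE)
```

If the object is instead a data frame whose row names hold the file paths, write.csv() already writes row names by default (row.names = TRUE); cbind(col0 = rownames(df), df) is another way to make them a named column.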

Rename dataframe columns by string matching in R

I am looping through a series of ids, loading 2 CSVs for each and applying some analysis to them. I need to rename the columns of one of the 2 CSVs to match the row values of the other, and I need to do this inside the loop so that it applies to the CSVs for every id.
I have tried renaming the columns like this:
names(LCC_diff)[2:length(LCC_diff)] <- c("Bare.areas" = "Bare areas",
    "Tree." = "Tree ", "Urban.areas" = "Urban areas",
    "Water.bodies" = "Water bodies")
where LCC_diff is a data frame, and in each pair the first value is the original column name and the second is the name I want to assign to that column. However, it just replaces the column names in order and does not match them.
This is a problem because not all column names need to be replaced, and the CSVs for different ids have these columns in different orders.
How do I match the original column names to the strings that I want to use to replace them?
Try renaming them first; matching is much easier if both sides use the same names. For example, replace the spaces with dots:
library(stringr)
str_replace_all(c("Tree ","Bare areas")," ",".")
[1] "Tree." "Bare.areas"
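To rename by matching rather than by position, the pairs in the question can serve as a named lookup vector: only columns whose current name appears in names(lookup) are changed, so column order and extra columns do not matter. A minimal base-R sketch; the helper name rename_by_lookup is hypothetical.

```r
# Lookup: current column name -> desired column name (pairs from the question)
lookup <- c("Bare.areas" = "Bare areas", "Tree." = "Tree ",
            "Urban.areas" = "Urban areas", "Water.bodies" = "Water bodies")

# Hypothetical helper: rename only the columns found in the lookup
rename_by_lookup <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)         # which columns to rename
  names(df)[hit] <- unname(lookup[names(df)[hit]])  # index lookup by name
  df
}
```

Inside the loop this would be LCC_diff <- rename_by_lookup(LCC_diff, lookup). With dplyr, rename(LCC_diff, any_of(c("Bare areas" = "Bare.areas"))) does the same thing, but note the pair direction there is new = old.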

Text matching loop in R

I have 10,000 or more texts in one column of a CSV file, file_1.
In another CSV file, file_2, I have some words that I need to search for in file_1, recording in the next column whether each text contains those words.
I need to search for all the words in all the texts. Often a single text contains multiple words from file_2, and I want all the matched words, comma separated, in the column next to the text.
Case matching can also be a challenge, and I want exact matches only.
Example:
file_1: [image: File_1]
file_2:
Disney
Hollywood
Desired output: [image: Desired Output]
I assume you will read the files into two separate data frames, such as df1 and df2.
You can subset your search values from df2 as needed, or turn df2 into one large vector to search through:
df2 <- as.vector(t(df2))
Then create a new column "Match" in df1 containing the words from df2 that occur in each text:
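df1$Match <- NA_character_
for (i in seq_len(nrow(df1))) {
  # TRUE for each search word found in this text as a whole word (case-sensitive)
  hits <- vapply(df2, function(w) grepl(paste0("\\b", w, "\\b"), df1$SearchColumn[i]), logical(1))
  df1$Match[i] <- paste0(df2[hits], collapse = ",")
}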
This loops over every row of df1, finds which words from df2 occur in that row's text, and pastes them together separated by commas. I'm sure someone else can find a way to do this without a loop, but I hope this works for you.
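A loop-free sketch of the same idea: combine all search words into one regular expression with word boundaries, then let gregexpr() find every match in every text and regmatches() extract them. The data, the SearchColumn name and the words vector are hypothetical stand-ins for file_1 and file_2.

```r
# Hypothetical data standing in for file_1 and file_2
df1 <- data.frame(SearchColumn = c("Disney bought a Hollywood studio",
                                   "hollywood in lower case",
                                   "Nothing here",
                                   "Disneyland is not Disney"),
                  stringsAsFactors = FALSE)
words <- c("Disney", "Hollywood")

# One alternation with word boundaries: \b(Disney|Hollywood)\b
# \b ensures exact whole-word matches, so "Disneyland" does not count
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# Find all matches in all texts at once (case-sensitive)
hits <- regmatches(df1$SearchColumn,
                   gregexpr(pattern, df1$SearchColumn, perl = TRUE))

# Collapse each text's matches into one comma-separated string;
# unique() lists a word once even if it occurs several times
df1$Match <- vapply(hits, function(m) paste(unique(m), collapse = ","),
                    character(1))
```

Adding ignore.case = TRUE to gregexpr() would make the search case-insensitive. If any search word contains regex metacharacters (such as "." or "+"), it would need escaping before being pasted into the pattern.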

count occurrences in pipe delimited string in dataframe

I have a Names column in my dataframe as follows:
Names
steve|chris|jeff
melissa|jo|john
chris|susan|redi
john|fiona|bart
jo|chris|fiona
The entries are pipe-delimited. Is there a way to count the occurrences of each name in this column? For example, chris occurs 3 times. Using a package like "plyr" works when each cell holds a single entry, but I'm not sure how to handle combined entries like the ones above.
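One base-R approach is to split every cell on the pipe, flatten the pieces into a single vector, and tabulate it. Because "|" is a regex metacharacter, the split uses fixed = TRUE to treat it literally. The data frame below simply re-creates the column from the question.

```r
# The Names column from the question
df <- data.frame(Names = c("steve|chris|jeff", "melissa|jo|john",
                           "chris|susan|redi", "john|fiona|bart",
                           "jo|chris|fiona"),
                 stringsAsFactors = FALSE)

# Split each entry on a literal "|" and flatten into one vector of names
all_names <- unlist(strsplit(df$Names, "|", fixed = TRUE))

# Count how often each name occurs, most frequent first
counts <- sort(table(all_names), decreasing = TRUE)
counts[["chris"]]  # 3
```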
