R function for identifying values from one column in another?

I have two different data frames, each consisting of a list of "genes" and a list of "interactors" (other genes). Is it possible with R to check whether there are any "genes" from one list that are also present in any of the "interactors" columns of the other data frame, and vice versa?
I am quite new to R, so perhaps there is an easy way to do this, but I don't even know how to search for it.
Thanks in advance!
Guillermo.

Could you please show a sample of your data?
In any case, I guess the following is what you need:
df_common<-data.frame(df[which(df$genes %in% df$interactors),])
It checks which elements of the column "genes" in the data frame df are also present (%in%) in the column "interactors" of the same data frame.
Is this what you are looking for? If not, please paste your input and desired output.
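Since the question is about two data frames, here is a minimal sketch of the same %in% idea applied across them. The data frames df1 and df2, the gene IDs, and the column names interactor1 and interactor2 are made up for illustration; adjust them to the real data.

```r
# Toy data: IDs and column names are placeholders, not real genes.
df1 <- data.frame(
  genes       = c("geneA", "geneB", "geneC"),
  interactor1 = c("geneD", "geneE", "geneF"),
  interactor2 = c("geneG", "geneH", "geneI"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  genes       = c("geneD", "geneJ", "geneH"),
  interactor1 = c("geneA", "geneK", "geneL"),
  interactor2 = c("geneM", "geneC", "geneN"),
  stringsAsFactors = FALSE
)

interactor_cols <- c("interactor1", "interactor2")

# Pool every interactor column of each data frame into one vector.
df1_interactors <- unique(unlist(df1[interactor_cols], use.names = FALSE))
df2_interactors <- unique(unlist(df2[interactor_cols], use.names = FALSE))

# Genes of one data frame that appear among the interactors of the other.
genes1_in_2 <- df1$genes[df1$genes %in% df2_interactors]
genes2_in_1 <- df2$genes[df2$genes %in% df1_interactors]
```

To keep the full matching rows rather than only the gene names, the same logical vector can be used as a row index, for example df1[df1$genes %in% df2_interactors, ].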

Related

R: Searching a column in a dataframe for matches to a reference list in another dataframe

I am trying to categorize genes with multiple GO descriptors into bins based on what those GO descriptors are related to. I have dataframe A which contains the raw data associated with a list of geneIDs (>500,000) and their associated GO descriptors and dataframe B which classifies these GO descriptors into larger groups.
Example of dataframe A
dfA
Example of dataframe B
dfB
Ideally, the final output would reference the entire list and generate a new column in dataframe A classifying each GeneID into the GO_Category values associated with its specific GO_IDs -- bonus points if it removes duplicate hits on the GO_Category values.
Looking something like this...
Example of Ideal Solution
However, I know that the ideal solution might be difficult to obtain, and I already have dataframe B listed out based on the unique GO_Categories so a solution like this might be easier to obtain.
Example of Acceptable Solution
So far I have struggled with getting any command to search for partial strings using a list from another dataframe with the goal of returning all matches.
I have had partial success with the acceptable solution approach and using:
dfA <- dfA %>%
mutate(GO_Cat_1 = c('No', 'Yes')[1+str_detect(dfA$GO_IDs, as.character(dfB$GO_IDs))])
The solution seems okay, however, it does return an error along the lines of
problem with mutate() column GO_Cat_1.
i GO_Cat_1 = ...[].
i longer object length is not a multiple of shorter object length
I have also tried to look into applying grepl/grep - but struggled to feed it a list of terms to look for partial string matches in dfA.
Any assistance is greatly appreciated!
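One possible direction, as a sketch only: str_detect() is vectorised over both the strings and the patterns, so passing the whole dfB$GO_IDs column pairs patterns with rows element by element instead of testing every pattern against every row, which is probably where the length message comes from. The example below uses base R on invented toy data. The column names GeneID, GO_IDs and GO_Category follow the question, but the ";" separator and all values are assumptions.

```r
# Hypothetical toy data; the real layout and separator may differ.
dfA <- data.frame(
  GeneID = c("g1", "g2", "g3"),
  GO_IDs = c("GO:0001;GO:0002", "GO:0003", "GO:0002;GO:0004"),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  GO_IDs      = c("GO:0001", "GO:0002", "GO:0003"),
  GO_Category = c("Metabolism", "Metabolism", "Transport"),
  stringsAsFactors = FALSE
)

# "Ideal" form: split each row's IDs, look up their categories,
# drop unmatched IDs and duplicates, and paste the result back together.
id_list <- strsplit(dfA$GO_IDs, ";", fixed = TRUE)
dfA$GO_Category <- vapply(id_list, function(ids) {
  cats <- dfB$GO_Category[match(ids, dfB$GO_IDs)]
  cats <- unique(cats[!is.na(cats)])
  paste(cats, collapse = ";")
}, character(1))

# "Acceptable" form: one Yes/No column per category, built by collapsing
# that category's IDs into a single alternation pattern for grepl().
pattern <- paste(dfB$GO_IDs[dfB$GO_Category == "Transport"], collapse = "|")
dfA$GO_Cat_Transport <- ifelse(grepl(pattern, dfA$GO_IDs), "Yes", "No")
```

The split-and-match version compares whole IDs, while the grepl() version does partial matching, so a short ID could also match inside a longer one. Which behaviour is wanted depends on the data.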

Unlisting a dataframe from a list of a list

I want to extract the data frames from a list that is itself inside a list. Some data frames also have a different number of columns than others. This is what I have tried without success.
Name of the first list is comments.
df <- do.call(rbind.fill,comments)
When I try
df <- do.call(rbind.fill,comments[[1]])
it does work, but I would like all the data frames to be combined into one.
I know that this is not a reproducible example, but please bear with me, as it would take some time to reproduce, and I think the problem is clear enough.
Thanks
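A possible approach, sketched on invented data: flatten the outer list by one level so that every element is a data frame, then bind them. The structure of comments below is an assumption (a list of lists of data frames), and the padding step is a base R stand-in for what rbind.fill does.

```r
# Hypothetical nested structure: a list of lists of data frames.
comments <- list(
  list(
    data.frame(id = 1:2, text = c("a", "b"), stringsAsFactors = FALSE),
    data.frame(id = 3L, text = "c", score = 5, stringsAsFactors = FALSE)
  ),
  list(
    data.frame(id = 4L, score = 7)
  )
)

# Remove one level of nesting: the result is a flat list of data frames.
flat <- unlist(comments, recursive = FALSE)

# Pad each data frame with NA columns so they all share the same columns.
all_cols <- unique(unlist(lapply(flat, names)))
padded <- lapply(flat, function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
})

df <- do.call(rbind, padded)
```

With the flat list in hand, do.call(rbind.fill, flat) from plyr or dplyr::bind_rows(flat) should do the padding and binding in one step.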

How to calculate combination of Data frame in R

I am a beginner in R program.
I imported a csv file. This file only contains one column with 50 characters, but R classifies it as a dataframe. I need all possible combinations within elements of this column. I think I need to work with a vector not with a data frame, how can I do it?
Thank you!
Actually your data frame already contains the vector you need. You can call it with
dataframe$column_name
The text before the $ operator specifies your data frame, and after is your vector, which is a column in your data frame. So when you run your calculations you can just write
some_function(dataframe$column_name)
In your specific case with a single column, it may be simplest to convert the data frame into a plain vector. But when you start manipulating your data, you'll likely store more vectors of variables, and you'll want to keep those vectors organized within data frames.
Do you mean unlist?
You can use it to change a data frame into a vector, and then use combn to get the combinations.
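Both suggestions can be sketched together as follows. The data frame df and its column item are invented for the example, and combinations of size 2 are an arbitrary choice.

```r
# Hypothetical one-column data frame, as read from a CSV file.
df <- data.frame(item = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

# Two equivalent ways to get the column as a plain vector.
v1 <- df$item
v2 <- unlist(df, use.names = FALSE)

# All combinations of 2 elements; combn() returns one combination per column.
pairs <- combn(v1, 2)
```

Here pairs is a 2 x 6 character matrix; t(pairs) gives one combination per row if that layout is easier to read.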

copying data from one data frame to other using variable in R

I am trying to transfer data from one data frame to another. I want to copy 8 columns from a huge data frame to a smaller one and name the columns n1, n2, etc.
First I try to find the number of the first column I need to copy from, using this:
x=as.numeric(which(colnames(old_df)=='N1_data'))
Then I am pasting it in new data frame this way
new_df[paste('N',1:8,'new',sep='')]=old_df[x:x+7]
However, when I run this, all 8 new columns contain exactly the same data. If I use the value of x directly instead, I get what I want:
new_df[paste('N',1:8,'new',sep='')]=old_df[10:17]
So my questions are
Why am I not able to use the variable x? I added as.numeric just to make sure it is a number and not a list, but that does not seem to help.
Is there any better or more efficient way to achieve this?
If I'm understanding your question correctly, you may be overthinking the problem.
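library(dplyr)
# Pick the eight source columns by name.
new_df <- select(old_df, N1_data, N2_data, N3_data, N4_data,
                 N5_data, N6_data, N7_data, N8_data)
# Rename N1_data ... N8_data to n1 ... n8; "\\1" is the captured digit.
colnames(new_df) <- sub("N(\\d)_data", "n\\1", colnames(new_df))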
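As for why the variable x did not work: the likely cause is operator precedence. In R the : operator binds more tightly than +, so x:x+7 is evaluated as (x:x) + 7, which is a single number, and that one column is then recycled into all 8 new columns. A small sketch, with an invented old_df standing in for the real data:

```r
x <- 10
wrong <- x:x + 7     # parsed as (x:x) + 7, i.e. the single value 17
right <- x:(x + 7)   # the intended sequence 10, 11, ..., 17

# Hypothetical 20-column data frame with the eight columns at positions 10-17.
old_df <- as.data.frame(matrix(1:40, nrow = 2))
names(old_df)[10:17] <- paste0("N", 1:8, "_data")

x <- which(colnames(old_df) == "N1_data")
new_df <- data.frame(row = 1:2)
new_df[paste0("N", 1:8, "new")] <- old_df[x:(x + 7)]
```

which() already returns a plain integer, so the as.numeric() wrapper is not needed.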

R data frame issue - non-numeric headers

This is definitely a rookie question but I'm not finding an answer for this (maybe because of my wording) so here goes:
I'm reading a data frame into RStudio (from a CSV file) that has 24 columns with headers. There are only numbers in these columns (they're essentially concentrations of several chemicals). It's called all. I need to use them as numeric vectors. When I read them in and type
is.numeric(all[,1])
I get
TRUE
When I type
is.numeric(all[1])
I get
FALSE
I think this is because R interprets the header as a factor. I also tried reading in a table without headers and with header=FALSE, but R names the columns V1, V2 etc., so the result ends up being the same.
I need to work with functions where I invoke something like all[2:24]. How can I go about to make R either "not see" the header or remove it altogether?
Thanks for the answers!
PS: the dataframe I am using (without headers - if it had headers, it would just have names instead of V1, V2, etc) is something like this:
This subsets the first column, not the first row:
all[,1] # subset first column
The following subsets the first row:
all[1,] # subset first row (headers of df not included)
To assign column names:
colnames(all) <- c("col1","col2")
Your assumption is wrong. You have a data.frame and all[1] does list subsetting, which results in a data.frame, which is not a vector, and not a numeric vector in particular.
You should study help("[") and An Introduction to R.
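To illustrate the difference, here is a small sketch on an invented three-column data frame. The name conc is used instead of all only to avoid masking the base function all(); the values are made up.

```r
# Hypothetical stand-in for the concentrations table.
conc <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.5, 2.5), V3 = c(3, 4))

a <- is.numeric(conc[, 1])    # TRUE: matrix-style indexing returns the vector
b <- is.numeric(conc[[1]])    # TRUE: [[ ]] extracts the column itself
c1 <- is.numeric(conc[1])     # FALSE: [ ] with one index keeps a data frame
d <- is.data.frame(conc[1])   # TRUE

# Checking several columns at once: test each column rather than the subset.
cols_numeric <- vapply(conc[2:3], is.numeric, logical(1))

# If a function needs a single numeric object, a matrix is one option.
m <- as.matrix(conc[2:3])
```

So the headers are not the problem: conc[2:3] is a data frame of numeric columns, and is.numeric() reports on the container, not on its columns.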
