Appending data in R - r

I am writing a script in which I apply many manipulations to one dataset and then apply the same manipulations to another dataset. Both datasets have the same rows, columns, and headers. I would like to join the two datasets by placing dataset A above dataset B. I wouldn't need the headers for dataset B; I just want to clump all of the data together as if it had never been separated in the first place. Is there a simple way to do this?

Yes. Use the rbind() function.
combineddataset = rbind(dataset1, dataset2)
Hope that helps.

And for completeness, you could also use the rbind.fill function found in the plyr package.
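As a minimal sketch of the rbind() approach, assuming two small made-up data frames (the names dataset1 and dataset2 and their columns are illustrative only), the stacked result keeps a single header and all rows from both inputs:

```r
# Two made-up data frames with identical columns (illustrative data only).
dataset1 <- data.frame(id = 1:2, value = c(10.5, 20.1))
dataset2 <- data.frame(id = 3:4, value = c(30.2, 40.8))

# Stack dataset1 on top of dataset2; column names must match.
combineddataset <- rbind(dataset1, dataset2)

nrow(combineddataset)   # number of rows in the stacked result
names(combineddataset)  # the shared header appears once
```

rbind() stops with an error if the column names differ, which is the case plyr's rbind.fill is meant for.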

Related

Create filtered dataframe with dplyr and for loop

I want to create several data frames from an original one in R using for loop.
I want to get three separated data frames for each species to conduct separate analysis.
I tried the following code but it doesn't work:
data(iris)
library(dplyr)
for i in levels(iris$Species){
paste0(i,".data") <- data.frame(filter(iris,
Species=="i"))
}
I do not necessarily need dplyr but it's the one I am used to.
I think it is better to keep the separate frames in a list, which, as pointed out in the comments, is done easily with split(iris, iris$Species).
If you really want them out of the list and into separate named frames, you can use
list2env(split(iris, iris$Species),envir = .GlobalEnv)
An even better approach would be to keep everything in one data frame and then use dplyr/tidyr’s group_by() and nest() functions to fit a model for each group.
See here for a detailed walkthrough: https://tidyr.tidyverse.org/articles/nest.html
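For reference, a minimal base R sketch of both routes: the split() list, and a corrected version of the loop from the question. The use of assign() for the loop is one possible fix, not necessarily what the answerer had in mind; the name species_frames is made up for the example.

```r
data(iris)

# Route 1: one list with a data frame per species.
species_frames <- split(iris, iris$Species)
names(species_frames)
nrow(species_frames$setosa)

# Route 2: a working version of the loop from the question.
# Note the parentheses around the loop header, the use of i (not "i")
# in the comparison, and assign() to create a variable from a string.
for (i in levels(iris$Species)) {
  assign(paste0(i, ".data"), iris[iris$Species == i, ])
}
nrow(setosa.data)
```

The list version is usually easier to work with afterwards, because lapply() can run the same analysis over every element.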

Combining two different data tables into one in R

I've got two data tables with the same structure and I'd like to combine them into one.
I've read about the merge function, but as I've understood, that'd try to merge by common values, so I don't know if that's what I'm looking for.
It'd be something like this:
You can use the rbind() function.
new_data <- rbind(data1, data2)
Hope this helps.
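To illustrate why merge is not what is wanted here, a small sketch with made-up data (the columns id and score are assumptions): rbind() stacks the rows, while merge() joins on common values and keeps only the matches.

```r
# Two made-up tables with the same structure (illustrative data only).
data1 <- data.frame(id = 1:3, score = c(5, 7, 9))
data2 <- data.frame(id = 3:5, score = c(9, 4, 6))

# rbind() appends the rows of data2 below data1.
stacked <- rbind(data1, data2)
nrow(stacked)

# merge() joins on id and keeps only ids present in both tables,
# placing the two score columns side by side (score.x, score.y).
matched <- merge(data1, data2, by = "id")
nrow(matched)
names(matched)
```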

R function for identifying values from one column in another?

I have two different data frames, each consisting of a list of "genes" and a list of "interactors" (other genes). Is it possible in R to check whether any "genes" from one data frame are also present in any of the "interactors" columns of the other data frame, and vice versa?
I am quite new in R, so perhaps there is an easy way to perform this, but I don't even know how to look for it.
Thanks in advance!
Guillermo.
Could you please show a sample of your data?
In any case, I guess the following is what you need:
df_common<-data.frame(df[which(df$genes %in% df$interactors),])
This checks which elements of the "genes" column in the data frame df are also present (%in%) in the "interactors" column of the same data frame.
Is this what you are looking for? If not, please paste your input and desired output.
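Since the question is about two data frames, here is a hedged sketch of the same %in% idea applied across frames. The data and the names df1 and df2 are made up, and each frame is assumed to have a single interactors column:

```r
# Made-up example data (illustrative only).
df1 <- data.frame(genes = c("A", "B", "C"),
                  interactors = c("X", "Y", "Z"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(genes = c("X", "D", "E"),
                  interactors = c("B", "C", "Q"),
                  stringsAsFactors = FALSE)

# Rows of df1 whose gene appears among the interactors of df2.
genes1_in_interactors2 <- df1[df1$genes %in% df2$interactors, ]

# And vice versa: rows of df2 whose gene appears among the interactors of df1.
genes2_in_interactors1 <- df2[df2$genes %in% df1$interactors, ]
```

If the interactors are spread over several columns, one option is to flatten them first with unlist() on those columns and test against the resulting vector.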

Reading subset of large data

I have a LARGE dataset with over 100 million rows. I only want to read the part of the data that corresponds to one particular level of a factor, say column1 == A. How do I accomplish this in R using read.csv?
Thank you
You can't filter rows using read.csv. You might try sqldf::read.csv.sql as outlined in answers to this question.
But I think most people would process the file using another tool first. For example, csvkit allows filtering by rows.
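If staying in base R is preferred, another option (not one suggested in the answer above) is to read the file in chunks and keep only the matching rows from each chunk. The helper below, read_csv_filtered, is hypothetical and only a sketch: it assumes a plain comma-separated file with a header line and no fields that span multiple lines.

```r
# Hypothetical helper (not part of base R): read a CSV file in chunks and
# keep only the rows where `column` equals `value`.
read_csv_filtered <- function(path, column, value, chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line holds the column names.
  col_names <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                    quiet = TRUE)

  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = col_names,
                      stringsAsFactors = FALSE)
    # which() drops rows where the comparison is NA.
    kept[[length(kept) + 1]] <- chunk[which(chunk[[column]] == value), ,
                                      drop = FALSE]
  }
  # Combine the filtered chunks into one data frame.
  do.call(rbind, kept)
}

# Small made-up file to try it on.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(column1 = c("A", "B", "A", "C"), x = 1:4), tmp,
          row.names = FALSE)
subset_a <- read_csv_filtered(tmp, "column1", "A", chunk_size = 2)
```

Only one chunk plus the matching rows is held in memory at a time. Column types are guessed per chunk, so for real data it may be safer to pass colClasses explicitly.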

Convert data frame to list

I am trying to go from a data frame to a list structure in R (and I know that, technically, a data frame is a list). I have a data frame containing reference chemicals and their mechanisms for different targets. For example, estrogen is an estrogen receptor agonist. What I would like is to transform the data frame into a list, because I am tired of typing out something like:
refchem$chemical_id[refchem$target=="AR" & refchem$mechanism=="Agonist"]
every time I need to access the list of specific reference chemicals. I would much rather access the chemicals by:
refchem$AR$Agonist
I am looking for a general answer, even though I have given a simplified example, because not all targets have all mechanisms.
This is really easy to accomplish with a loop:
example <- data.frame(target = rep(c("t1", "t2", "t3"), each = 20),
                      mechan = rep(c("m1", "m2"), each = 10, times = 3),
                      chems = paste0("chem", 1:60))
oneoption <- list()
for (target in unique(example$target)) {
  oneoption[[target]] <- list()
  for (mech in unique(example$mechan)) {
    oneoption[[target]][[mech]] <- as.character(example$chems[example$target == target & example$mechan == mech])
  }
}
I am just wondering if there is a more clever way to do it. I tried playing around with lapply and did not make any progress.
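Using split:
refchem_split <- split(refchem, list(refchem$target, refchem$mechanism))
should do the trick.
Because split() returns a new list rather than modifying refchem, a subset is then accessed as refchem_split$AR.Agonist. Each element is a data frame, so take its chemical_id column to get the chemicals. Adding drop = TRUE omits target/mechanism combinations that do not occur.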
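To get the nested target$mechanism access asked for in the question, one possible sketch combines split() with lapply(). It uses the example data frame from the question; the name nested is made up for the illustration.

```r
# Example data from the question.
example <- data.frame(target = rep(c("t1", "t2", "t3"), each = 20),
                      mechan = rep(c("m1", "m2"), each = 10, times = 3),
                      chems = paste0("chem", 1:60),
                      stringsAsFactors = FALSE)

# Split by target first, then split each target's chemicals by mechanism.
# Because mechan is a character column, only mechanisms that actually occur
# for a given target appear in its sub-list.
nested <- lapply(split(example, example$target),
                 function(d) split(d$chems, d$mechan))

nested$t1$m2  # chemicals for target t1 with mechanism m2
```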
If you make a keyed data.table instead, ...
you'll still have all the data in one data.frame (instead of a possibly-nested list of many);
you may find iterating over these subsets nicer; and
the syntax is pretty clean:
To access a subset:
DT[.('AR','Agonist')]
To do something for each group, that will be rbinded together in the result:
DT[,{do stuff},by=key(DT)]
Similar to aggregate(), any list of vectors of the correct length can go into the by, not just the key.
Finally, DT came from...
require(data.table)
DT <- data.table(refchem,key=c('target','mechanism'))
You can also use a plyr function:
library(plyr)
dlply(example, .(target, mechan))
It has the added advantage of using a function to process the data, if needed (there's an implicit identity in the above).
