I have a main data frame and 70 smaller data frames stored in a list. I would like to merge each of the 70 data frames in the list with the main data frame (so I end up with one huge data frame).
The small data frames all have 2 columns, but each has a different number of rows. For each one I would like to do two things:
1) Change the name of the second column (by appending a suffix to the name of column 1).
2) Then merge that small data frame into the main data frame.
Here is what I have in mind:
colnames(SmallDf1)[2] <- paste0(colnames(SmallDf1)[1], "_1")  # paste0() avoids the space paste() would insert
main_Df <- merge(main_Df, SmallDf1, by.x = "NAME", by.y = names(SmallDf1)[1], all.x = TRUE)
This obviously works for only a single data frame. Does anyone have ideas for doing this for all 70 data frames? Any help appreciated!
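A minimal sketch, assuming the 70 small data frames are held in a list (called `small_dfs` here, a hypothetical name) and that the suffix should be each data frame's position in that list: loop over the list, rename column 2, and merge into the growing main data frame. The toy data is made up for illustration.

```r
# Hypothetical toy data: a main data frame keyed by NAME and three small
# two-column data frames whose first column holds the key.
main_Df <- data.frame(NAME = c("a", "b", "c"), value = 1:3)
small_dfs <- list(
  data.frame(key = c("a", "b"), x = c(10, 20)),
  data.frame(key = c("b", "c", "d"), y = c(1, 2, 3)),
  data.frame(key = "a", z = 99)
)

# For each small data frame: rename column 2 to "<col 1 name>_<i>",
# then left-join it onto the accumulated main data frame.
for (i in seq_along(small_dfs)) {
  small <- small_dfs[[i]]
  colnames(small)[2] <- paste0(colnames(small)[1], "_", i)
  main_Df <- merge(main_Df, small, by.x = "NAME", by.y = colnames(small)[1],
                   all.x = TRUE)
}
```

Because every merge uses `all.x = TRUE`, no row of the main data frame is lost; keys that appear only in a small data frame (such as `"d"` above) are dropped, and main rows without a match get `NA`.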
I want to extract a common column named "framewise_displacement" from 162 TSV files named by subject ID (e.g., sub-CC123_timeseries.tsv, sub-CC124_timeseries.tsv, etc.), which have different numbers of columns but the same number of rows, and merge them into a single data frame.
In the new data frame, each column should be the "framewise_displacement" column from one subject's file, labelled with that subject's ID, and the rows should be the same as in the original files.
I tried to use the vroom function in R, but it failed because the files have different numbers of columns.
I also tried the code below, but the output stacked all the columns into a single column.
library(purrr)
library(vroom)
files = fs::dir_ls(path = "Documents/subject_timeseries", glob = "*.tsv")
merged_df <- map_df(files, ~vroom(.x, col_select = c(framewise_displacement)))
What should I do to merge them into one data frame with the desired columns side by side?
Any suggestions would be appreciated.
Many thanks!
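A minimal base R sketch, swapping `read.delim()` in for vroom: read each file, keep only `framewise_displacement`, name it by subject ID, and bind the vectors as columns. It assumes the subject ID is the file name minus `_timeseries.tsv` and that every file has the same number of rows; the two demo files are hypothetical stand-ins written to a temporary directory.

```r
# Hypothetical demo files standing in for Documents/subject_timeseries:
# same number of rows, different columns and column order.
dir <- file.path(tempdir(), "subject_timeseries")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(framewise_displacement = c(0.1, 0.2, 0.3), a = 1:3),
            file.path(dir, "sub-CC123_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(b = 4:6, framewise_displacement = c(0.4, 0.5, 0.6), c = 7:9),
            file.path(dir, "sub-CC124_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)

# Read each file and keep only the framewise_displacement column as a vector.
fd <- lapply(files, function(f) read.delim(f)[["framewise_displacement"]])

# Name each vector by subject ID (file name minus the suffix) and bind the
# vectors side by side; check.names = FALSE keeps the "-" in the IDs.
names(fd) <- sub("_timeseries\\.tsv$", "", basename(files))
merged_df <- data.frame(fd, check.names = FALSE)
```

`map_df()` binds its results as rows, which is why the original attempt produced one long column; purrr's `map_dfc()` is the column-binding counterpart.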
The two files have different column names and different numbers of columns. I would like to put the rows of the first file under the rows of the second file, matching the related columns by their positions.
I tried the following.
First, specifying the column positions for the two files:
one = df1[, 6:59]  # columns 6 to 59 (df1[6:59, ] would select rows)
two = df2[, 2:55]  # columns 2 to 55
Then binding the rows, using mutate_all() because some of the columns contain factor data, not integers:
library(dplyr)
a = bind_rows(mutate_all(one, as.character), mutate_all(two, as.character))
But it didn't work. Can anyone help, please?
If you bind by column position you'll need rbind(), but rbind() matches data frame columns by name, so the objects need the same column names. You'll need to reassign them with something like:
library(readr)
d1 = read_csv('file1.csv')
d2 = read_csv('file2.csv')
names(d2)[2:55] = names(d1)[6:59]  # give d2's related columns d1's names
The data frames will also need to have the same number of columns, so keep only the related columns:
# rbind method
rbind(d1[, 6:59], d2[, 2:55])
A dplyr approach will work with any number of columns (columns missing from one data frame are filled with NA), but again the column names will need to match for the related columns to line up.
# dplyr method
dplyr::bind_rows(list(d1, d2))
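A small self-contained base R sketch of the position-based approach, with made-up data frames standing in for the two CSV files (the column names and positions here are hypothetical): subset the related columns, copy the names across, then `rbind()`.

```r
# Hypothetical stand-ins: the related columns sit at different positions
# and carry different names in each data frame.
d1 <- data.frame(junk = 1:2, a = c(1, 2), b = c("x", "y"))
d2 <- data.frame(p = c(3, 4), q = c("z", "w"), extra = TRUE)

# Keep only the related columns, in matching order.
one <- d1[, 2:3]
two <- d2[, 1:2]

# Give the second set the first set's names so rbind() can match them.
names(two) <- names(one)

combined <- rbind(one, two)
```

`combined` has the columns `a` and `b` with the rows of `d1` on top; swap the arguments to `rbind()` to put the other file first.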
I have six data frames (all_road_25, all_road_50, all_road_100, all_road_300, all_road_500, all_road_1000), and each of them contains the same column "site" and another column "length". I want to join them all by the column "site", with each remaining column named after the data frame it came from. So I tried:
library(dplyr)
library(purrr)
all_roads_variables <- list(all_road_25, all_road_50, all_road_100,
                            all_road_300, all_road_500, all_road_1000) %>%
  reduce(full_join, by = "site")
names(all_roads_variables)[2:7] <- c("all_road_25", "all_road_50", "all_road_100",
                                     "all_road_300", "all_road_500", "all_road_1000")
It gives the results I want, but I have to copy all the names of the original data frames by hand.
Is there a way to make the script shorter?
We can use mget() for this. It returns a named list of all the objects whose names match the pattern, so the data frame names come along automatically:
all_roads_variables <- mget(ls(pattern = "^all_road_\\d+"))
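The snippet above only builds the named list. One way to finish the job is sketched here in base R, with `Map()` and `Reduce()` plus `merge()` swapped in for purrr and dplyr's `full_join()`, using three made-up data frames: rename each `length` column after its list name, then full-join everything on `site`.

```r
# Hypothetical toy data for three of the six data frames.
all_road_25  <- data.frame(site = c("s1", "s2"), length = c(1.5, 2.0))
all_road_50  <- data.frame(site = c("s1", "s3"), length = c(3.0, 4.5))
all_road_100 <- data.frame(site = c("s2", "s3"), length = c(5.0, 6.0))

# Named list of every object whose name matches the pattern.
roads <- mget(ls(pattern = "^all_road_\\d+"))

# Rename each "length" column after the data frame it came from.
roads <- Map(function(df, nm) {
  names(df)[names(df) == "length"] <- nm
  df
}, roads, names(roads))

# Full outer join of all the data frames on "site".
all_roads_variables <- Reduce(function(x, y) merge(x, y, by = "site", all = TRUE),
                              roads)
```

`ls()` returns names alphabetically, so the columns come out as `all_road_100`, `all_road_25`, `all_road_50`; reorder them afterwards if numeric order matters.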
I have two structurally identical data frames with the columns id-part1, id-part2 and data1.
id-part1 and id-part2 together form the index.
Now I want to calculate the difference in data1 between the two data frames, matched on the two id columns. However, a given combination of id-part1 and id-part2 might not exist in one of the data frames...
So it is a kind of SQL join operation, isn't it?
The merge() function is what you are looking for.
It works similarly to an SQL join. Given your description, a solution would be:
solution <- merge(DF1, DF2, by = c('id-part1', 'id-part2'), all.x = TRUE, all.y = TRUE)
DF1 and DF2 are your two data frames. merge() refers to them as x and y, where x is the first (DF1) and y the second (DF2).
The by argument defines the column names to match on (you can even specify different names for each data frame with by.x and by.y).
all.x and all.y specify the kind of join to perform, depending on which rows you want to keep: all.x = TRUE keeps every row of x (a left join), all.y = TRUE keeps every row of y (a right join), and both together give a full outer join.
The result is a new data frame with a separate data1 column from each input (named data1.x and data1.y by default). You can then continue with your calculations.
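To make the last step concrete, here is a minimal sketch with made-up values: a full outer join on the composite key, followed by the difference of the two suffixed data1 columns. `check.names = FALSE` is only there so the hyphenated column names survive `data.frame()`.

```r
# Hypothetical toy data with the two-part key from the question.
DF1 <- data.frame(`id-part1` = c(1, 1, 2), `id-part2` = c("a", "b", "a"),
                  data1 = c(10, 20, 30), check.names = FALSE)
DF2 <- data.frame(`id-part1` = c(1, 2, 3), `id-part2` = c("a", "a", "c"),
                  data1 = c(12, 25, 7), check.names = FALSE)

# Full outer join on both key columns; data1 becomes data1.x (from DF1)
# and data1.y (from DF2).
solution <- merge(DF1, DF2, by = c("id-part1", "id-part2"),
                  all.x = TRUE, all.y = TRUE)

# The difference is NA wherever a key combination is missing on one side.
solution$diff <- solution$data1.y - solution$data1.x
```

Only the key combinations present in both data frames, (1, a) and (2, a) here, get a numeric difference.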
I have data in a data frame with 139,104 rows, which is 96 × 1449. I have a phenotype file which contains the phenotype information for the 96 samples. Each SNP name is repeated for all 96 samples (1449 SNPs × 96 samples). I have to merge the two data frames based on sid and sen. This is what my two data frames look like:
dat <- data.frame(
snpname=rep(letters[1:12],12),
sid=rep(1:12,each=12),
genotype=rep(c('aa','ab','bb'), 12)
)
pheno <- data.frame(
sen=1:12,
disease=rep(c('N','Y'),6),
wellid=1:12
)
I have to merge or add the disease column and 3 other columns to the data file. I am unable to get merge() to work in R. I have searched Google, but I am not hitting the correct terms to find the answer. I would appreciate any input on this issue.
Thanks, Sharad
You can specify the columns you want to match on directly in merge(), using by.x for dat and by.y for pheno:
merge(dat, pheno, by.x = "sid", by.y = "sen")
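To check the result on the toy data from the question, here is a self-contained run (the genotype vector is written out at full length instead of relying on recycling):

```r
dat <- data.frame(
  snpname = rep(letters[1:12], 12),
  sid = rep(1:12, each = 12),
  genotype = rep(c("aa", "ab", "bb"), 48)
)
pheno <- data.frame(
  sen = 1:12,
  disease = rep(c("N", "Y"), 6),
  wellid = 1:12
)

# by.x names the key in dat, by.y the key in pheno; every row of dat picks
# up the phenotype columns of its sample.
merged <- merge(dat, pheno, by.x = "sid", by.y = "sen")
```

`merge()` puts the key column first, followed by the remaining columns of `dat` and then those of `pheno`. By default it is an inner join, so any sid without a matching sen is dropped; add `all.x = TRUE` to keep those rows with `NA` phenotypes.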