Remove part of name ending in dataframe column named .id - r

Background
I have some big dataframes (i.e. 15000 obs. of 100 variables) that all share a column named .id.
I need to prepare these dataframes for merging with each other. For the merge to work, the .id columns need to hold the same values.
The values in every .id column start with the same random string, call it randomValues, but there are two different types of endings, call them randomValues-ending_1 and randomValues-ending_2.
The question
How does one remove the -ending_1 and -ending_2 text from the .id column of these big dataframes?
Any help is much appreciated :)

# Strip the suffix from the values of the .id column (note the doubled backslash in the R string)
big.dataframe$.id <- gsub("-ending_\\d+$", "", big.dataframe$.id)
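To show the whole round trip, here is a minimal sketch that strips the endings from two small dataframes and then merges them on .id. The dataframes df1 and df2, their x and y columns, and the helper strip_ending are made up for illustration; only the .id column and the -ending_N suffixes come from the question.

```r
# Hypothetical stand-ins for the big dataframes in the question.
df1 <- data.frame(.id = c("abc-ending_1", "xyz-ending_1"), x = 1:2,
                  stringsAsFactors = FALSE)
df2 <- data.frame(.id = c("abc-ending_2", "xyz-ending_2"), y = 3:4,
                  stringsAsFactors = FALSE)

# Hypothetical helper: drop a trailing "-ending_<digits>" from each value.
strip_ending <- function(id) sub("-ending_\\d+$", "", id)

df1$.id <- strip_ending(df1$.id)
df2$.id <- strip_ending(df2$.id)

# With matching keys, the two dataframes can be merged on .id.
merged <- merge(df1, df2, by = ".id")
```

sub() is enough here because the pattern is anchored with $ and so can match at most once per value; gsub() gives the same result.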

Related

How to extract a common column from multiple tsv files and combine them into one dataframe in R?

I want to extract a common column named "framewise_displacement" from 162 tsv files named by subject ID (e.g., sub-CC123_timeseries.tsv, sub-CC124_timeseries.tsv, etc.), which have different numbers of columns but the same number of rows, and merge them into a single dataframe.
The new dataframe should have one "framewise_displacement" column per subject, labelled with the subject ID, and the same rows as the original files.
I tried to use the vroom function in R, but it failed because the files have different numbers of columns.
I also tried this code, but the output stacked all the columns into a single column.
files = fs::dir_ls(path = "Documents/subject_timeseries", glob = "*.tsv")
merged_df <- map_df(files, ~vroom(.x, col_select=c(framewise_displacement)))
What should I do to merge them into one dataframe with the desired columns side by side?
Any suggestions would be appreciated.
Many thanks!!!
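One possible approach, sketched in base R rather than vroom: read each file, keep only the shared column as a vector, name it after the subject, and bind the vectors column-wise. The two demo files written to a temporary directory are invented so the example runs on its own; the real path from the question would replace demo_dir. The sketch assumes every file has the same number of rows, as stated in the question.

```r
# Hypothetical demo data: two small TSV files with different column counts.
demo_dir <- tempfile("subject_timeseries")
dir.create(demo_dir)
write.table(data.frame(framewise_displacement = c(0.1, 0.2, 0.3), a = 1:3),
            file.path(demo_dir, "sub-CC123_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(b = 4:6, c = 7:9,
                       framewise_displacement = c(0.4, 0.5, 0.6)),
            file.path(demo_dir, "sub-CC124_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

files <- list.files(demo_dir, pattern = "\\.tsv$", full.names = TRUE)

# Read each file and keep only the shared column as a plain vector.
fd_list <- lapply(files, function(f) read.delim(f)[["framewise_displacement"]])

# Label each vector with the subject ID taken from the file name.
names(fd_list) <- sub("_timeseries\\.tsv$", "", basename(files))

# Equal-length vectors bind side by side: one column per subject.
merged_df <- as.data.frame(do.call(cbind, fd_list))
```

The map_df call in the question stacks rows because map_df row-binds its results; binding column-wise, as above, is what puts the subjects side by side.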

Is there an R method to select the columns of a dataframe that are listed in a separate array?

I have a dataframe with over 100 columns. After applying certain conditions, I need a subset of the dataframe containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
Try this:
library(dplyr)
# all_of() matches the listed names exactly; contains() would also match substrings
iris <- iris %>% select(all_of(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
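A small worked example of the bracket approach, with made-up data. The names myData, selection and selection_matrix, and all the values, are hypothetical. It also covers the case where the lookup is a true matrix rather than a data frame, where $ does not work and the first column has to be taken by position.

```r
# Hypothetical data frame with more columns than we want to keep.
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Hypothetical lookup: variable names in column 1, associated values in column 2.
selection <- data.frame(names = c("a", "c"), value = c(0.5, 0.8),
                        stringsAsFactors = FALSE)

# Keep only the listed columns; drop = FALSE keeps a data frame
# even if a single name is listed.
subset_df <- myData[, selection$names, drop = FALSE]

# If the lookup is a character matrix, select its first column by position.
selection_matrix <- cbind(names = c("a", "c"), value = c("0.5", "0.8"))
subset_from_matrix <- myData[, selection_matrix[, 1], drop = FALSE]
```

If some listed names might be missing from the data, wrapping them in intersect(selection$names, names(myData)) avoids an "undefined columns selected" error.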

Remove rows for multiple dataframes having a name matching a pattern

I am trying to remove the first 9 rows of multiple dataframes that have the same structure but different names (following a similar naming pattern). In my example, there are 4 dataframes named
Mydataframe_A, Mydataframe_B, Mydataframe_C, Mydataframe_D.
Currently it is working with the following code:
`Mydataframe_A`<- `Mydataframe_A`[-c(1:9),]
`Mydataframe_B`<- `Mydataframe_B`[-c(1:9),]
`Mydataframe_C`<- `Mydataframe_C`[-c(1:9),]
`Mydataframe_D`<- `Mydataframe_D`[-c(1:9),]
But I would like to write this in a single line, without having to spell out the name of each dataframe.
I think this could work by using a name pattern and lists because, for example, this is what I am doing to rbind different dataframes:
All_mydataframes <- rbindlist(mget(ls(pattern = "^Mydataframe_")))
Any idea how to do this?
Thanks a ton!
Since mget turns this into a list, you can use apply family functions:
rbindlist(lapply(mget(ls(pattern = "^Mydataframe_")), function(x) x[-c(1:9), ]))
This takes the list from mget, removes the first 9 rows of each element, then binds the list into a single data.table with rbindlist. The only drawback is that you can no longer tell which data.frame each row came from.
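If the goal is to trim each dataframe in place rather than bind them, a base R sketch could look like the following. The two small dataframes are invented for the demo. list2env writes the trimmed copies back under their original names, and the last step shows one way to keep a source label when combining.

```r
# Hypothetical demo dataframes with 12 rows each.
Mydataframe_A <- data.frame(x = 1:12)
Mydataframe_B <- data.frame(x = 101:112)

# Collect every matching dataframe into a named list.
dfs <- mget(ls(pattern = "^Mydataframe_"))

# Drop the first 9 rows of each; drop = FALSE keeps single-column data frames intact.
trimmed <- lapply(dfs, function(d) d[-(1:9), , drop = FALSE])

# Overwrite the original objects with their trimmed versions.
list2env(trimmed, envir = environment())

# Optional: bind them while recording which dataframe each row came from.
combined <- do.call(rbind, Map(function(d, n) cbind(source = n, d),
                               trimmed, names(trimmed)))
```

With data.table, passing idcol to rbindlist on the named list should give a similar source column without the Map step.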

How to copy multiple columns to a new dataframe in R

I have a data set (df2) with 400 columns and thousands of rows. The columns all have different names, but every name ends in either 'typeP' or 'typeR'. They are not ordered sequentially (e.g. P,P,P,P,R,R,R,R) but randomly (P,P,R,R,R,P,R,P etc). I want to create a new data frame with just those columns whose names contain 'typeP'.
I'm very new to R and so far I have only managed to find the positions of those columns using: grep("typeP",colnames(df2)). Any help would be appreciated!
After we get the index, we can use that to subset the initial dataset
df3 <- df2[grep("typeP",colnames(df2))]
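A tiny example with invented column names, to show the result. Since the question says the marker sits at the end of each name, this sketch anchors the pattern with $ so that a name merely containing "typeP" in the middle would not match. endsWith() is a base R alternative that avoids regular expressions.

```r
# Hypothetical data frame with mixed typeP / typeR columns.
df2 <- data.frame(a_typeP = 1:2, b_typeR = 3:4, c_typeP = 5:6)

# Positions of the columns whose names end in "typeP".
idx <- grep("typeP$", colnames(df2))

# Subset by position; single-bracket indexing keeps a data frame.
df3 <- df2[idx]

# Same selection using a logical vector instead of a regex.
df3_alt <- df2[endsWith(colnames(df2), "typeP")]
```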

How to multiply columns of same names belonging to different data.frame

I am having a problem. I have two data.frames with a lot of columns, and they have different numbers of rows: one has many rows and the other has only one row. Both data frames contain columns with the same names. I want to multiply the matching columns with each other, but I have not been able to solve it. Please help me.
The command
mapply("*", DataFrame1, DataFrame2)
should work if you want to multiply all columns. If the relevant columns are only a subset of all columns in the data frames, we first need to identify the columns that are present in both data frames.
mapply("*", DataFrame1[intersect(names(DataFrame1), names(DataFrame2))],
DataFrame2[intersect(names(DataFrame1), names(DataFrame2))])
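Here is a small worked example with made-up numbers. It computes the shared names once and uses SIMPLIFY = FALSE, which is an addition to the answer above: it makes mapply return a list, so the result converts cleanly to a data frame whatever the number of rows. The one-row DataFrame2 values are recycled along each column of DataFrame1.

```r
# Hypothetical inputs: many rows vs. a single row, partly overlapping columns.
DataFrame1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DataFrame2 <- data.frame(b = 10, c = 2, d = 5)

# Columns present in both data frames.
common <- intersect(names(DataFrame1), names(DataFrame2))

# Multiply matching columns pairwise; each single value is recycled down the column.
product <- as.data.frame(
  mapply("*", DataFrame1[common], DataFrame2[common], SIMPLIFY = FALSE)
)
```

Here product has columns b (40, 50, 60) and c (14, 16, 18); column a and column d are left out because they have no partner.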

Resources