I have a data set (df2) with 400 columns and thousands of rows. The columns all have different names, but every name ends in either 'typeP' or 'typeR'. They are not grouped by type (e.g. P,P,P,P,R,R,R,R) but mixed at random (P,P,R,R,R,P,R,P etc.). I want to create a new data frame with just those columns whose names contain 'typeP'.
I'm very new to R and so far I have only managed to find the positions of those columns using: grep("typeP",colnames(df2)). Any help would be appreciated!
Once we have the index, we can use it to subset the original data frame:
df3 <- df2[grep("typeP", colnames(df2))]
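As a small illustration, here is the same idea on a made-up data frame (the column names are hypothetical). Anchoring the pattern with $ restricts the match to names that end in typeP, and grepl returns a logical index rather than positions; either form of index works for subsetting.

```r
# Toy stand-in for df2; the column names are invented for illustration.
df2 <- data.frame(
  a_typeP = 1:3,
  b_typeR = 4:6,
  c_typeR = 7:9,
  d_typeP = 10:12
)

# "typeP$" matches only names that END in typeP.
keep <- grepl("typeP$", colnames(df2))

# drop = FALSE keeps the result a data frame even if only one column matches.
df3 <- df2[, keep, drop = FALSE]

colnames(df3)
```

The drop = FALSE argument only matters when a single column matches, but it makes the result's type predictable.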
I want to extract a common column named "framewise_displacement" from 162 tsv files named by subject ID (e.g. sub-CC123_timeseries.tsv, sub-CC124_timeseries.tsv, etc.) and merge the extracted columns into a single dataframe. The files have different numbers of columns but the same number of rows.
The new dataframe should have one "framewise_displacement" column per subject, labelled with the subject ID, and the same rows as the original files.
I tried the vroom function in R, but it failed because the files have different numbers of columns.
I also tried this code, but the output stacked all the columns into a single column.
library(purrr)
library(vroom)
files <- fs::dir_ls(path = "Documents/subject_timeseries", glob = "*.tsv")
merged_df <- map_df(files, ~vroom(.x, col_select = c(framewise_displacement)))
What should I do to merge them into one dataframe with the desired column side by side?
Any suggestions would be appreciated.
Many thanks!!!
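One possible approach, sketched here in base R rather than vroom, is to read each file, keep only the shared column as a vector, and then bind the vectors as columns instead of rows. The sketch assumes every file has a "framewise_displacement" column and the same number of rows. It writes two toy files into a temporary directory so it can run on its own; the directory and the values are invented, and only the file-name pattern comes from the question.

```r
# Build two toy TSV files in a temporary directory so the sketch runs on its own.
# In practice `dir` would be the folder holding the real subject files.
dir <- file.path(tempdir(), "subject_timeseries")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(framewise_displacement = c(0.1, 0.2), x = 1:2),
            file.path(dir, "sub-CC123_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(y = 3:4, z = 5:6, framewise_displacement = c(0.3, 0.4)),
            file.path(dir, "sub-CC124_timeseries.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)

# Read each file and keep only the shared column as a plain vector.
fd_list <- lapply(files, function(f) read.delim(f)[["framewise_displacement"]])

# Name each vector by subject ID (file name minus the common suffix).
names(fd_list) <- sub("_timeseries\\.tsv$", "", basename(files))

# A named list of equal-length vectors becomes one column per subject.
# check.names = FALSE keeps the hyphen in names such as "sub-CC123".
merged_df <- data.frame(fd_list, check.names = FALSE)
```

The stacking seen with map_df happens because it row-binds its results; building the columns first and combining them side by side avoids that.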
I have the following dataset and I want to divide the "Dose" and "Count" columns by 100. I want my new dataset to include the "Visit" column as well.
However, using the code below, my new dataset does not have the "Visit" column.
What if I want to divide only the "Dose" column by 100?
I uploaded my dataset into R and called it data.
data_new <- data[, 2:ncol(data)] / 100
Your new data frame is missing the untouched columns because you assign only the selected columns (after the vectorized division) to data_new. To divide a single column and keep everything else, overwrite that column in place:
data$Dose <- data$Dose / 100
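The same idea extends to several columns at once. The following is a minimal sketch on invented values, assuming the columns are named exactly "Visit", "Dose" and "Count" as in the question (R is case-sensitive about column names).

```r
# Toy stand-in for the dataset in the question; the values are invented.
data <- data.frame(
  Visit = c(1, 2, 3),
  Dose  = c(100, 200, 300),
  Count = c(50, 150, 250)
)

# Copy first so the original stays untouched, then overwrite only the chosen columns.
data_new <- data
cols <- c("Dose", "Count")
data_new[cols] <- data_new[cols] / 100

# For a single column, the same pattern with one name:
dose_only <- data
dose_only["Dose"] <- dose_only["Dose"] / 100
```

Because only the named columns are reassigned, "Visit" and any other columns carry over unchanged.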
I have a dataframe with over 100 columns. After applying certain conditions, I need a subset of the dataframe containing only the columns that are listed in a separate array.
The array has 50 rows and 2 columns. The first column holds the selected variable names and the second column holds some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
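Try this (all_of() selects columns whose names match exactly, whereas contains() would also pick up partial matches):
library(dplyr)
iris_subset <- iris %>% select(all_of(dataframe_with_names$names))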
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
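Whether $ works depends on what kind of object holds the names: it works on a data frame but not on a matrix. The sketch below shows both cases on invented data; myData, lookup_df and lookup_mat are hypothetical stand-ins, not objects from the question.

```r
# Toy data; myData and the lookup objects below are invented for illustration.
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Case 1: the lookup is a data frame, so $ works.
# stringsAsFactors = FALSE keeps the names as character on older R versions.
lookup_df <- data.frame(names = c("a", "c"), value = c(0.5, 0.7),
                        stringsAsFactors = FALSE)
subset1 <- myData[, lookup_df$names, drop = FALSE]

# Case 2: the lookup is a character matrix, where $ is not available;
# index the first column by position instead.
lookup_mat <- cbind(names = c("a", "c"), value = c("0.5", "0.7"))
subset2 <- myData[, lookup_mat[, 1], drop = FALSE]
```

Both calls return the same two-column data frame; only the way the name vector is extracted differs.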
My data frame has over 120 columns (variables) and I would like to create subsets based on column names.
For example I would like to create a subset where the column name includes the string "mood". Is this possible?
I generally use
SubData <- myData[,grep("whatIWant", colnames(myData))]
I know very well that the "," is not necessary and that colnames could be replaced by names, but names would not work with matrices, and I hate to change the formalism when changing objects.
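Applied to the "mood" case from the question, this could look like the sketch below. The data frame and its column names are invented for illustration; the second half shows the same call working unchanged on a matrix.

```r
# Toy data; the column names are invented for illustration.
myData <- data.frame(
  id           = 1:3,
  mood_morning = c(2, 3, 4),
  sleep        = c(7, 6, 8),
  mood_evening = c(3, 3, 5)
)

# value = TRUE returns the matching names instead of their positions.
mood_cols <- grep("mood", colnames(myData), value = TRUE)
SubData <- myData[, mood_cols, drop = FALSE]

# The same [, colnames()] formalism also works on a matrix.
m <- as.matrix(myData)
SubMat <- m[, grep("mood", colnames(m)), drop = FALSE]
```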
I have a data frame which I will call "abs.data" that contains 265 columns (variables). I have another data frame which I will call "corr.abs" that contains updated data on a subset of the columns in "abs.data". Both data frames have an equal number of rows, n=551. I need to replace the columns in "abs.data" with the correct observations in "corr.abs" where the column names match. I have tried the following
abs.samps <- colnames(abs.data) #vector of column names in abs. data
corr.abs.samps <- colnames(corr.abs) #vector of column names in corr.abs
abs.data[,which(abs.samps %in% corr.abs.samps==TRUE)] <- corr.abs[,which(corr.abs.samps %in% abs.samps==TRUE)] #replace columns in abs.data with correct observations in corr.abs where the column names are the same
When I run the left and right side of the last line of code R pulls the right columns, but it fails to replace the columns in abs.data with the correct data in corr.abs. Any ideas why?
You can find the common column names using
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))
Say you find that X2 is the common column.
You can first drop the column you do not want (here X2) from abs.data using subset:
x <- subset(abs.data, select = -X2)
Then you can add the updated column (here X2) from corr.abs to the new data frame:
y <- cbind(X2 = corr.abs$X2, x)
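An alternative sketch, which is not part of the answer above, indexes both data frames by the same vector of shared names and assigns in one step. This keeps the columns aligned by name even when the two data frames order them differently, and it leaves the column order of abs.data unchanged. It assumes the rows of both data frames are already in the same order; the toy values are invented.

```r
# Toy stand-ins for the two data frames; the values are invented.
abs.data <- data.frame(X1 = 1:3, X2 = c(10, 20, 30), X3 = c(7, 8, 9))
# Note the shared columns appear in a different order here.
corr.abs <- data.frame(X3 = c(70, 80, 90), X2 = c(11, 21, 31))

# Names present in both data frames.
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))

# Indexing both sides by the same name vector pairs each column with its namesake.
abs.data[comm_col] <- corr.abs[comm_col]
```

Indexing by position with which(... %in% ...) on each side can pair up the wrong columns when the orders differ, which is one reason to prefer names here.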