My aim is to replace the values in several columns with the values in column b_Y_1, provided the latter are not missing. If b_Y_1 is missing for a given row, the existing values in that row should remain unchanged. My problem is that I do not know how to write "remains unchanged" for several columns at once.
I use the data.table package.
data[ ,c("a_.Y3_1","a_.Y3_2","a_.Y3_3","a_Y1_1","a_Y1_2","a_Y1_3") := ifelse(!is.na(data$b_Y_1), data$b_Y_1,????)]
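One possible way to fill in the "remains unchanged" part is to apply the same ifelse to each target column through .SD, passing the column itself as the fallback. This is only a minimal sketch: the toy values are invented for illustration, and only two of the column names from the question are used.

```r
library(data.table)

# Toy data invented for illustration; only the column names come from the question.
data <- data.table(
  a_Y1_1 = c(1, 2, 3),
  a_Y1_2 = c(4, 5, 6),
  b_Y_1  = c(10, NA, 30)
)

# Columns to overwrite (hypothetical subset of the list in the question).
cols <- c("a_Y1_1", "a_Y1_2")

# For each target column x: take b_Y_1 where it is present, otherwise keep x.
data[, (cols) := lapply(.SD, function(x) ifelse(!is.na(b_Y_1), b_Y_1, x)),
     .SDcols = cols]
```

An alternative that should behave the same way is to restrict the assignment to the non-missing rows, data[!is.na(b_Y_1), (cols) := b_Y_1], so that the other rows are never touched.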
I have a dataframe with over 100 columns. After applying certain conditions, I need a subset of the dataframe containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how I should proceed?
Try this:
library(dplyr)
# all_of() matches the names exactly; contains() would also match partial names
selected <- iris %>% select(all_of(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
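To make the indexing above concrete, here is a small sketch in base R. The objects myData and array are toy stand-ins invented for illustration, and it assumes the array is really a data frame with a column called names. The as.character() call is a precaution in case that column is stored as a factor.

```r
# Toy stand-ins for the objects in the question (values are invented).
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
array  <- data.frame(names = c("a", "c"), value = c(0.1, 0.2),
                     stringsAsFactors = TRUE)

# as.character() guards against the names being stored as a factor, in which
# case indexing would use the integer codes instead of the labels.
keep <- as.character(array$names)

# drop = FALSE keeps the result a data frame even if only one name is listed.
subset_df <- myData[, keep, drop = FALSE]
```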
I omitted columns that are constant from a data frame
DataNew <- Data[, dataPreparation::whichAreConstant(Data, verbose= FALSE)]
However, I want a list of the unmatched columns together with their constant values.
I tried this command to get the list of unmatched columns, but I also want the column values:
setdiff(names(Data), names(DataNew))
If I understand the question right, you can just use that list of names to extract the unmatched columns from Data:
Data[, setdiff(names(Data), names(DataNew))]
And if you just want one occurrence of each value, take the first row only:
Data[1, setdiff(names(Data), names(DataNew))]
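If the dataPreparation package is not at hand, the same idea can be sketched in base R. This is an assumption-laden illustration rather than the package's own method: the toy Data frame is invented, and a column is treated as constant when it has exactly one distinct value.

```r
# Toy data invented for illustration.
Data <- data.frame(id = 1:4, site = "A", year = 2015, x = c(2.5, 3.1, 4.0, 1.2))

# A column is treated as constant when it has exactly one distinct value.
is_const <- vapply(Data, function(col) length(unique(col)) == 1, logical(1))

# Non-constant columns only.
DataNew <- Data[, !is_const, drop = FALSE]

# One row holding each constant column together with its value.
const_values <- Data[1, is_const, drop = FALSE]
```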
I'm trying to sum rows that contain a value in a different column.
rowSums(wood_plastics[,c(48,52,56,60)], na.rm=TRUE)
The above got me row sums for the columns identified but now I'd like to only sum rows that contain a certain year in a different column. I tried this
rowSums(mydata[,c(48,52,56,60)], na.rm=TRUE, mydata$current_year = '2015')
with no success. I thought I might have to single out the year value from column 7 and add it to the initial column list.
Any help is appreciated.
I would say simply
rowSums(mydata[mydata$current_year == '2015',c(48,52,56,60)], na.rm=TRUE)
Since I don't have the original data frame, I cannot give you the result. But the idea is that you select the rows you want before the comma and the columns you want after it. Is this clear enough for you?
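Here is a small sketch of that row-then-column selection on invented data. The column names v1 and v2 are hypothetical stand-ins for columns 48, 52, 56 and 60. Wrapping the condition in which() is an extra precaution, not part of the answer above: it keeps rows with a missing year from turning into all-NA rows.

```r
# Toy data invented for illustration.
mydata <- data.frame(
  current_year = c("2015", "2016", "2015", NA),
  v1 = c(1, 2, 3, 4),
  v2 = c(10, NA, 30, 40)
)

# which() returns only the row numbers where the comparison is TRUE,
# so rows with a missing year are left out.
rows_2015 <- which(mydata$current_year == "2015")

# Rows before the comma, columns after it.
sums_2015 <- rowSums(mydata[rows_2015, c("v1", "v2")], na.rm = TRUE)
```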
I have a data frame which I will call "abs.data" that contains 265 columns (variables). I have another data frame which I will call "corr.abs" that contains updated data on a subset of the columns in "abs.data". Both data frames have an equal number of rows, n=551. I need to replace the columns in "abs.data" with the correct observations in "corr.abs" where the column names match. I have tried the following
abs.samps <- colnames(abs.data) #vector of column names in abs.data
corr.abs.samps <- colnames(corr.abs) #vector of column names in corr.abs
abs.data[,which(abs.samps %in% corr.abs.samps==TRUE)] <- corr.abs[,which(corr.abs.samps %in% abs.samps==TRUE)] #replace columns in abs.data with correct observations in corr.abs where the column names are the same
When I run the left and right side of the last line of code R pulls the right columns, but it fails to replace the columns in abs.data with the correct data in corr.abs. Any ideas why?
You can find the common column names using
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))
Say, for example, that X2 turns out to be the common column.
You can first drop the outdated column, in this case X2, from abs.data using subset:
x <- subset(abs.data, select = -X2)
Then you can add the corrected column to the new data frame, naming it explicitly so that it keeps the name X2:
y <- cbind(X2 = corr.abs$X2, x)
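When there are many common columns, dropping and re-binding them one at a time gets tedious. A possible shortcut is to index both data frames by the same vector of names, so that the columns are matched by name rather than by position. This is a sketch on invented data, and it assumes the rows of the two data frames are already in the same order.

```r
# Toy data invented for illustration; note the different column order in corr.abs.
abs.data <- data.frame(X1 = 1:3, X2 = c(NA, NA, NA), X3 = 7:9)
corr.abs <- data.frame(X3 = c(70L, 80L, 90L), X2 = c(4, 5, 6))

# Names present in both data frames, in the order they appear in abs.data.
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))

# Indexing both sides by the same names pairs each column with its namesake.
abs.data[comm_col] <- corr.abs[comm_col]
```

Because both sides use comm_col, the column order of abs.data is left as it was, and a differing column order in corr.abs does not matter.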
Please can anyone offer guidance on how to
calculate column totals and then
sort on the resultant totals in R?
Everything I've tried so far to total the columns (e.g. colsums() and sapply()) returns the resultant totals as a vector (e.g. ABOUT 4022), and I cannot find any information on how to split this into the column header ABOUT and the column value 4022 and then sort both the header and the value by the column value.
Note that the function is called colSums (not colsums).
Use it with sort to sort the totals. Or, if you want to reorder the columns themselves, use order:
colSums(mtcars)
sort(colSums(mtcars))
mtcars[ ,order(colSums(mtcars))]
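If the header and the value are needed as two separate columns, the named vector returned by colSums can be unpacked with names(). A minimal sketch using the built-in mtcars data; the column labels column and total are arbitrary choices made for this example.

```r
# colSums() returns a named numeric vector: the names are the column headers.
totals <- colSums(mtcars)

# Split the headers and the values into two columns of a data frame.
totals_df <- data.frame(column = names(totals), total = unname(totals))

# Sort both header and value by the total, smallest first.
totals_df <- totals_df[order(totals_df$total), ]
```

Use order(-totals_df$total) instead to sort from largest to smallest.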