I removed the constant columns from a data frame:
DataNew <- Data[, -dataPreparation::whichAreConstant(Data, verbose = FALSE)]
However, I want a list of the unmatched columns together with their constant values.
I tried this command to get the list of unmatched columns, but I also want the column values:
setdiff(Data, DataNew)
If I understand the question correctly, you can use the list of unmatched column names to extract those columns from Data:
Data[, setdiff(names(Data), names(DataNew))]
And if you just want one occurrence of each value:
Data[1, setdiff(names(Data), names(DataNew))]
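For reference, here is a minimal base R sketch of the whole round trip. It does not use dataPreparation. The constant-column check via `unique()` and the toy `Data` frame are assumptions made for illustration only.

```r
# Toy data (hypothetical): 'site' and 'year' are constant, 'id' and 'score' are not
Data <- data.frame(id = 1:4, site = "A", year = 2020,
                   score = c(3.1, 2.7, 3.1, 4.0),
                   stringsAsFactors = FALSE)

# A column is treated as constant when it holds exactly one distinct value
is_constant <- vapply(Data, function(col) length(unique(col)) == 1, logical(1))

# Drop the constant columns
DataNew <- Data[, !is_constant, drop = FALSE]

# Names of the columns that were dropped
constant_names <- setdiff(names(Data), names(DataNew))

# One row is enough to show each constant value
constant_values <- Data[1, constant_names, drop = FALSE]
print(constant_values)
```

`drop = FALSE` keeps the result a data frame even when only one column is constant.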
Related
I have a data frame with over 100 columns. After applying certain conditions, I need a subset of the data frame containing only the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column holds the selected variable names and the second column holds some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
Try this:
library(dplyr)
# all_of() matches the names exactly; contains() would also match partial names
iris <- iris %>% select(all_of(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]
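A small self-contained sketch of that idea follows. The data frame `myData`, the lookup table `lookup`, and their contents are made up for illustration; the check for missing names is an optional extra, not part of the answers above.

```r
# Hypothetical data: four columns, of which we want two
myData <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Hypothetical lookup table: first column holds names, second holds values
lookup <- data.frame(names = c("d", "b"), value = c(0.5, 1.2),
                     stringsAsFactors = FALSE)

# as.character() guards against the names being stored as a factor
wanted <- as.character(lookup$names)

# Fail early if the lookup mentions a column that does not exist
missing_cols <- setdiff(wanted, names(myData))
stopifnot(length(missing_cols) == 0)

# drop = FALSE keeps a data frame even if only one name is listed
subsetData <- myData[, wanted, drop = FALSE]
print(names(subsetData))
```

The columns come back in the order given by the lookup table, not in their original order.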
My aim is to replace the values in multiple columns with the values in column b_Y_1, provided the latter are not missing. If b_Y_1 is missing in a given row, the values in that row should remain unchanged. My problem is that I do not know how to write "remains unchanged" for several columns at once.
I use the data.table package.
data[ ,c("a_.Y3_1","a_.Y3_2","a_.Y3_3","a_Y1_1","a_Y1_2","a_Y1_3") := ifelse(!is.na(data$b_Y_1), data$b_Y_1,????)]
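One possible way to express "remains unchanged" for several columns at once is to loop over the target columns with `.SD` and fall back to each column's own value. This is only a sketch under assumptions: the toy table and the shortened column list are invented, and it relies on `fifelse()`, which requires a reasonably recent data.table and the same type for `b_Y_1` and the target columns.

```r
library(data.table)

# Toy data (hypothetical): two target columns and the source column b_Y_1
data <- data.table(a_Y1_1 = c(1, 2, 3),
                   a_Y1_2 = c(4, 5, 6),
                   b_Y_1  = c(10, NA, 30))

cols <- c("a_Y1_1", "a_Y1_2")

# For each target column x: take b_Y_1 where it is present, otherwise keep x
data[, (cols) := lapply(.SD, function(x) fifelse(!is.na(b_Y_1), b_Y_1, x)),
     .SDcols = cols]

print(data)
```

Here the "else" branch of the conditional is simply the column itself, which is what keeps rows with a missing b_Y_1 untouched.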
I have a data frame with multiple columns that I want to group according to their names. When several column names match the same pattern, I want them combined into a single column holding the sum of the group.
colnames(dataframe)
[1] "Départements" "01...3" "01...4" "01...5" "02...6" "02...7" "02...8" "02...9" "02...10" "03...11"
[11] "03...12" "03...13" "04...14" "04...15" "05...16" "05...17" "05...18" "06...19" "06...20" "06...21"
So I use this bit of code, which works fine when every column is numeric. However, the first column is character, so I hit an error. How can I exclude the first column from the code?
# Group columns by pattern: collect the distinct name prefixes, then sum each group row-wise
patterns <- unique(substr(names(dataframe), 1, 3)) # store patterns in a vector
dataframe <- sapply(patterns, function(xx) rowSums(dataframe[, grep(xx, names(dataframe)), drop = FALSE])) # loop through the patterns
This is the error I get:
Error in rowSums(DEPTpolicedata_2012[, grep(xx, names(DEPTpolicedata_2012)), :
'x' must be numeric
You can simply remove the first column before computing the patterns and the row sums:
dataframe$Départements <- NULL
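If the label column should be kept rather than deleted, one option is to leave it out of the summation and attach it again afterwards. The sketch below is an assumption-laden illustration: the toy data frame, the unaccented column name `Departements`, the two-character prefix, and the use of `startsWith()` instead of `grep()` are all choices made here, not part of the question.

```r
# Toy data (hypothetical): one character column plus numeric columns
# whose names share a two-character prefix
df <- data.frame(
  Departements = c("Ain", "Aisne"),
  `01...2` = c(1, 2), `01...3` = c(3, 4), `02...4` = c(5, 6),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Work only with the numeric columns, i.e. everything except the first
value_cols <- names(df)[-1]
patterns <- unique(substr(value_cols, 1, 2))

# For each prefix, sum the matching columns row by row
sums <- sapply(patterns, function(p) {
  rowSums(df[, value_cols[startsWith(value_cols, p)], drop = FALSE])
})

# Re-attach the label column next to the grouped sums
result <- data.frame(Departements = df$Departements, sums, check.names = FALSE)
print(result)
```

`startsWith()` compares literal prefixes, so a dot in a prefix is not interpreted as a regular-expression wildcard the way it would be in `grep()`.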
I had a very big list of data in R, and I used the following line:
subsetList <- tmpList[tmpList$colName=="value"]
Where colName is the name of the column in the list and 'value' is the text I wanted to subset 'tmpList' on.
The result I received was a complete replication of 'tmpList' in the new list.
After some experimenting, I used the following instead:
subsetList <- tmpList[tmpList$colName=="value" , ]
(Note the comma inserted.)
Now my 'subsetList' contains only the rows where the content of the column given by 'colName' matches "value".
Why does the first attempt return all the rows? And why does the second attempt return only rows matching the equivalence criteria?
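The difference between the two forms can be seen on a small example, assuming 'tmpList' is in fact a data frame (the question does not say so explicitly). With a single index, a data frame is treated like a list of columns, so the logical vector picks columns and every row is returned. With a comma, the first index picks rows. The data below is made up.

```r
# Toy data frame (hypothetical) with three rows and three columns
df <- data.frame(colName = c("value", "other", "value"),
                 x = 1:3, y = 4:6,
                 stringsAsFactors = FALSE)

keep <- df$colName == "value"   # TRUE FALSE TRUE

# Single index: the logical vector selects COLUMNS 1 and 3, all rows kept
by_column <- df[keep]

# Two indices: the logical vector selects ROWS 1 and 3, all columns kept
by_row <- df[keep, ]

print(dim(by_column))
print(dim(by_row))
```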
I have a data frame which I will call "abs.data" that contains 265 columns (variables). I have another data frame which I will call "corr.abs" that contains updated data on a subset of the columns in "abs.data". Both data frames have an equal number of rows, n=551. I need to replace the columns in "abs.data" with the correct observations in "corr.abs" where the column names match. I have tried the following
abs.samps <- colnames(abs.data) #vector of column names in abs.data
corr.abs.samps <- colnames(corr.abs) #vector of column names in corr.abs
abs.data[,which(abs.samps %in% corr.abs.samps==TRUE)] <- corr.abs[,which(corr.abs.samps %in% abs.samps==TRUE)] #replace columns in abs.data with correct observations in corr.abs where the column names are the same
When I run the left and right sides of the last line of code separately, R pulls the right columns, but the assignment fails to replace the columns in abs.data with the correct data in corr.abs. Any ideas why?
You can find the common column names using
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))
Say you find that X2 is the common column.
You can first drop the outdated column, in this case X2, from abs.data using subset:
x <- subset(abs.data, select = -X2)
Then you can add the corrected column (here X2) to the new data frame:
y <- cbind(X2 = corr.abs$X2, x)
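When many columns are shared, it may be simpler to assign by name in one step. Indexing both sides with `which(... %in% ...)` pairs the columns by position, so the data can land in the wrong columns if the two data frames order them differently. Indexing both sides with the same vector of names avoids that. The toy data frames below are invented for illustration.

```r
# Toy data (hypothetical): corr.abs holds corrected X2 and X3,
# deliberately in a different column order than abs.data
abs.data <- data.frame(X1 = 1:3, X2 = c(9, 9, 9), X3 = 7:9)
corr.abs <- data.frame(X3 = c(70, 80, 90), X2 = c(0.1, 0.2, 0.3))

# Names present in both data frames
comm_col <- intersect(colnames(abs.data), colnames(corr.abs))

# Replace by name: both sides use the same ordered vector of names
abs.data[comm_col] <- corr.abs[comm_col]

print(abs.data)
```

This keeps the original column order of abs.data and leaves the non-shared columns untouched.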