I have two dataframes of equal dimensions.
One has a value in some cells (e.g. 'abc') whose positions I need to find. The other has entirely different values. I need to replace the values in the other dataframe at the same positions as 'abc'.
Example:
df1 <- data.frame('1'=c('abc','bbb','rweq','dsaf','cxc','rwer','anc','ewr','yuje','gda'),
'2'=c(NA,NA,'bbb','dsaf','rwer','dsaf','ewr','cxc','dsaf','cxc'),
'3'=c(NA,NA,'dsaf','abc','bbb','cxc','yuje',NA,'ewr','anc'),
'4'=c(NA,NA,'cxc',NA,'abc','anc',NA,NA,'yuje','rweq'),
'5'=c(NA,NA,'anc',NA,'abc',NA,NA,NA,'rwer','rwer'),
'6'=c(NA,NA,'rweq',NA,'dsaf',NA,NA,NA,'bbb','bbb'),
'7'=c(NA,NA,'abc',NA,'ewr',NA,NA,NA,'abc','abc'),
'8'=c(NA,NA,'abc',NA,'rweq',NA,NA,NA,'cxc','bbb'),
'9'=c(NA,NA,NA,NA,'abc',NA,NA,NA,'anc',NA),
'10'=c(NA,NA,NA,NA,'abc',NA,NA,NA,'rweq',NA))
df2 <- data.frame('1'=c('green','black','white','yelp','help','green','red','brown','green','crack'),
'2'=c(NA,NA,'black','yelp','green','yelp','brown','help','yelp','help'),
'3'=c(NA,NA,'yelp','green','black','help','green',NA,'brown','red'),
'4'=c(NA,NA,'help',NA,'green','red',NA,NA,'green','white'),
'5'=c(NA,NA,'red',NA,'green',NA,NA,NA,'green','green'),
'6'=c(NA,NA,'white',NA,'yelp',NA,NA,NA,'black','black'),
'7'=c(NA,NA,'green',NA,'brown',NA,NA,NA,'green','green'),
'8'=c(NA,NA,'green',NA,'white',NA,NA,NA,'help','black'),
'9'=c(NA,NA,NA,NA,'green',NA,NA,NA,'red',NA),
'10'=c(NA,NA,NA,NA,'green',NA,NA,NA,'white',NA))
I can find the sequential indices of 'abc', but which() returns them as a one-dimensional vector:
which(df1 == 'abc')
#[1] 1 24 35 45 63 69 70 73 85 95
And I don't know how to replace values using this method.
In the output I expect df2 with the 'green' values replaced only at the same positions as the 'abc' values in df1.
But note that 'green' values in df2 also occur at positions other than those of 'abc' in df1.
I don't think your problem is appropriately approached with the data in a data.frame. That introduces several complications. First, each variable (column) in the data frame is a factor with different levels (this is the default in R versions before 4.0.0, where stringsAsFactors = TRUE). Second, your code is making a comparison between a list (data.frame) and a factor (which is coerced into an atomic vector). The help page for the == operator states "... if the other is a list R attempts to coerce it to the type of the atomic vector ...". The help page also points out that factors get special handling in comparisons, where it first assumes you are comparing factor levels, which is what your code is doing.
I think you want to convert your data frames of identical dimensions to matrices first. If you need the results in a data.frame, convert back afterwards as I show here, but realize that the factor levels may have changed.
# Starting with the values assigned to df1 and df2
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)
index <- which(m1 == "abc")
m2[index] <- "abc"
df2 <- as.data.frame(m2)
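As a side note, a similar replacement can be sketched without the matrix round trip by using a logical mask, since a data.frame also accepts a logical matrix of the same dimensions as an index. The toy frames a and b and the value "REPLACED" below are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the columns are character vectors in any R version.

```r
# Toy data (hypothetical, not the df1/df2 from the question)
a <- data.frame(x = c("abc", "b"), y = c(NA, "abc"), stringsAsFactors = FALSE)
b <- data.frame(x = c("green", "red"), y = c(NA, "green"), stringsAsFactors = FALSE)

# Logical matrix: TRUE where a holds "abc"; NA cells are forced to FALSE
mask <- !is.na(a) & a == "abc"

# Replace the cells of b at exactly those positions
b[mask] <- "REPLACED"
b
```

The !is.na(a) part is there because a == "abc" yields NA for missing cells, and an index that contains NA is best avoided in an assignment.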
Here is a way to do it. Learn about the *apply family in R: I think it is the most useful group of functions in this language, whatever you plan to do ;) Also note that a data.frame is of type 'list'.
df1[] <- lapply(df1, function(frame, pattern, replace){ # for each frame = column:
  frame <- as.character(frame) # a factor column would reject a new level
  matches <- which(frame %in% pattern) # which positions of the column match the pattern
  if(length(matches) > 0) # if at least one position matches,
    frame[matches] <- replace # give it the value you want
  return(frame) # return the modified column
}, pattern="abc", replace= "<whatYouWant>") # don't forget this part: the needed arguments!
# df1[] <- ... keeps df1 a data.frame; a plain df1 <- lapply(...) would turn it into a list
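The order of the operands of %in% matters for this kind of lookup. A small sketch on a made-up vector (frame here is just an illustrative character vector, not a column from the question):

```r
frame <- c("x", "abc", "y", "abc")

# "Is the pattern anywhere in the column?" gives a single TRUE/FALSE,
# so which() can only ever return 1 (or nothing)
which("abc" %in% frame)

# "Which elements of the column equal the pattern?" gives one
# logical per element, so which() returns the matching positions
which(frame %in% "abc")
```

Only the second form gives positions that can be used to index back into the column.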
I have a dataset with three columns.
## generate sample data
set.seed(1)
x<-sample(1:3,50,replace = T )
y<-sample(1:3,50,replace = T )
z<-sample(1:3,50,replace = T )
data<-as.data.frame(cbind(x,y,z))
What I am trying to do is:
Select those rows where all the three columns have 1
Select those rows where only two columns have 1 (could be any column)
Select those rows where only one column has 1 (could be any column)
Basically I want any two columns (for 2nd case) to fulfill the conditions and not any specific column.
I am aware of rows selection using
subset<-data[data$x==1 & data$y==1 & data$z==1,]
But this only selects rows based on conditions for specific columns, whereas I want any of the three/two columns to fulfil my criteria
Thanks
n = 1 # or 2 or 3
data[rowSums(data == 1) == n,]
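To see why this works, here is a minimal sketch on a made-up data frame (d, ones and by_count are illustrative names only): data == 1 gives a logical matrix, and rowSums counts the TRUE values per row. The same counts can also be handed to split to get all groups at once.

```r
# Toy data: rows contain three, one, zero and two 1s respectively
d <- data.frame(x = c(1, 1, 2, 1),
                y = c(1, 2, 3, 1),
                z = c(1, 3, 3, 2))

ones <- rowSums(d == 1)    # number of columns equal to 1, per row
ones

by_count <- split(d, ones) # list of data frames, named by the count
by_count[["2"]]            # rows where exactly two columns are 1
```

Note that split only creates groups for counts that actually occur, so a count with no matching rows has no entry in the list.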
Here is another method:
rowCounts <- table(c(which(data$x==1), which(data$y==1), which(data$z==1)))
# this is the long way
df.oneOne <- data[as.integer(names(rowCounts)[rowCounts == 1]),]
df.oneTwo <- data[as.integer(names(rowCounts)[rowCounts == 2]),]
df.oneThree <- data[as.integer(names(rowCounts)[rowCounts == 3]),]
It is better to save multiple data.frames in a list, especially when there is some structure that guides this storage, as is the case here. Following @richard-scriven's suggestion, you can do this easily with lapply:
df.oneCountList <- lapply(1:3, function(i)
  data[as.integer(names(rowCounts)[rowCounts == i]),])
names(df.oneCountList) <- c("df.oneOne", "df.oneTwo", "df.oneThree")
You can then pull out the data.frames using either their index, df.oneCountList[[1]] or their name df.oneCountList[["df.oneOne"]].
@eddi below suggests a nice shortcut to my method of pulling out the table names, using tabulate and the arr.ind argument of which. When which is applied to a multidimensional object such as an array or a data.frame, setting arr.ind = TRUE produces the row and column indices where the logical expression evaluates to TRUE. His suggestion exploits this to pull out the vector of row indices where a 1 is found in any variable. The tabulate function is then applied to these row indices and returns a vector in which each element represents a row, in order, with rows that contain no 1 filled in as 0.
Under this method,
rowCounts <- tabulate(which(data == 1, arr.ind = TRUE)[,1])
returns a vector from which you might immediately pull the values. You can include the above lapply to get a list of data.frames:
df.oneCountList <- lapply(1:3, function(i) data[rowCounts == i,])
names(df.oneCountList) <- c("df.oneOne", "df.oneTwo", "df.oneThree")
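One caveat with tabulate, shown as a small sketch on made-up data (d and rows are illustrative names, not the data above): by default its result only extends to the largest row index that contains a 1, so it can be shorter than the number of rows when the last rows have no 1. Passing nbins keeps the lengths aligned, which matters when the result is used as a logical row index.

```r
# Toy data: the last row contains no 1
d <- data.frame(x = c(1, 2, 1, 3),
                y = c(1, 2, 2, 3))

rows <- which(d == 1, arr.ind = TRUE)[, 1]  # row index of every cell equal to 1

short <- tabulate(rows)                   # stops at the last row that has a 1
full  <- tabulate(rows, nbins = nrow(d))  # one count per row of d

length(short)
length(full)

d[full == 2, ]  # rows with exactly two 1s
```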
I have a data.frame of size 8326x13. I would like to order it in parts by a specific column, e.g. order the range 1:1375 only by column A. Then I would like to put this ordered part back into the same data.frame at the same position, 1:1375. Is it possible?
Thanks in advance.
Raúl.
Or, (using the dataset of useR)
indx <- rep(c(TRUE,FALSE), each=10) #create a logical index.
In this case the first 10 rows are ordered
data[indx,] <- data[order(data$A[indx]),]
Update
Or, instead of creating a logical index, extract the rows that need to be ordered and replace them with the ordered set
data[1:10,] <- data[order(data$A[1:10]),]
For your dataset, the index would be created like this:
indx <- rep(c(TRUE,FALSE), c(1375, 8326-1375))
As suggested by @JeremyS
A <- sample(1:100, 20)
B <- sample(letters[1:26],20)
data <- data.frame(A, B)
n <- 10 # you want range 1:n
lower <- data[(n+1):dim(data)[1], ] # split to two data.frame with lower and upper part
upper <- data[1:n,]
upper <- upper[order(upper$A),] # or order(upper[,m]), m is the column index
data.new <- rbind.data.frame(upper, lower)
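The same idea can be sketched for a range that does not start at row 1, by subsetting first and ordering within the subset. The range rng and the copy before are hypothetical names used only for this illustration, and the data is regenerated with a fixed seed.

```r
set.seed(42)
d <- data.frame(A = sample(1:100, 20), B = sample(letters, 20))
before <- d      # keep a copy to compare against

rng <- 5:12      # the block of rows to order

# Take the block, order it by A within the block, and write it back in place
d[rng, ] <- d[rng, ][order(d$A[rng]), ]

d$A[rng]         # now increasing
```

Subsetting with d[rng, ] before applying order is what makes this work for any block, because order returns positions relative to the block rather than to the whole data frame.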