I have a question about shifting rows within a particular column of a data frame.
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
B C
1 NA 1
2 NA NA
3 0 NA
4 NA 1
5 NA NA
6 0 NA
I tried the approach from the post "Shifting a column down by one":
na.omit(transform(data, B = c(NA, B[-nrow(data)])))
but I only get
B C
4 0 1
Expected output:
B C
1 0 1
2 0 1
How can I achieve that?
Thanks.
If you want to remove all NA from each column and do not care that the rows will not match between columns you can do:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-lapply(data,function(x){x[complete.cases(x)]})
res<-data.frame(res)
The second line says: for every column in data, keep only the values that are not NA.
Thanks to @thelatemail for the correction to my original solution below, which worked but would have kept the columns as factors:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-apply(data,2,function(x){x[complete.cases(x)]})
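If the columns might not hold the same number of non-NA values, data.frame(res) would either fail or silently recycle values, because the pieces have different lengths. A minimal sketch, assuming it is acceptable to pad the shorter columns with NA at the bottom (the sample data and the padding step are my own illustration, not part of the original answer):

```r
# Column C has three non-NA values here, column B only two
data <- data.frame(B = c(NA, NA, 0, NA, NA, 0), C = c(1, NA, NA, 1, NA, 1))

# Drop the NAs column by column; the results may differ in length
vals <- lapply(data, function(x) x[!is.na(x)])

# Pad every column with trailing NAs up to the longest one
n <- max(lengths(vals))
padded <- lapply(vals, function(x) c(x, rep(NA, n - length(x))))

res <- data.frame(padded)
res
```

The rows of res no longer correspond to the rows of data, which is the same trade-off the answer already accepts.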
I have data like one in the picture where there are two columns (Cday,Dday) with some missing values.
There can't be a row where there are values for both columns; there's a value on either one column or the other or in neither.
I want to create the column "new" that has copied values from whichever column there was a number.
Really appreciate any help!
Since no row has a value for both, you can just sum up the two existing columns. Assume your dataframe is called df.
df$new <- rowSums(df[, 2:3], na.rm = TRUE)
This will sum the rows, removing NAs and should give you what you want. (Note: you may need to adjust column numbering if you have more columns than what you've shown).
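One caveat with the rowSums approach: a row where both columns are NA sums to 0 rather than NA. If you want such rows to stay NA, something like the following sketch might work. The small data frame and the all_missing helper are made up for illustration:

```r
df <- data.frame(id = 1:4, Cday = c(1, NA, NA, 3), Dday = c(NA, NA, 2, NA))
cols <- c("Cday", "Dday")

# Sum across the two columns, ignoring NAs (all-NA rows become 0)
summed <- rowSums(df[, cols], na.rm = TRUE)

# Flag rows where neither column has a value
all_missing <- rowSums(!is.na(df[, cols])) == 0

# Put NA back for the rows that had nothing to sum
df$new <- ifelse(all_missing, NA, summed)
df
```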
The dplyr package has the coalesce function.
library(dplyr)
df <- data.frame(id=1:8, Cday=c(1,2,NA,NA,3,NA,2,NA), Dday=c(NA,NA,NA,3,NA,2,NA,1))
new <- df %>% mutate(new = coalesce(Dday, Cday)) # coalesce() has no na.rm argument
new
# id Cday Dday new
#1 1 1 NA 1
#2 2 2 NA 2
#3 3 NA NA NA
#4 4 NA 3 3
#5 5 3 NA 3
#6 6 NA 2 2
#7 7 2 NA 2
#8 8 NA 1 1
Here is some reproducible code that shows the problem I am trying to solve in another dataset. Suppose I have a dataframe df with some NULL values in it. I would like to replace these with NAs, as I attempt to do below. But when I print it, the value comes out as <NA>. See the second dataframe, d, which is what I would like to produce from df: there the NA is a regular old NA without the angle brackets.
> df = data.frame(a=c(1,2,3,"NULL"),b=c(1,5,4,6))
> df[4,1] = NA
> print(df)
a b
1 1 1
2 2 5
3 3 4
4 <NA> 6
>
> d = data.frame(a=c(1,2,3,NA),b=c(1,5,4,6))
> print(d)
a b
1 1 1
2 2 5
3 3 4
4 NA 6
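A likely explanation, offered as a sketch rather than a confirmed diagnosis: because "NULL" is a string, column a is stored as character (or as a factor in R versions before 4.0), and print() shows missing values in such columns as <NA>. Converting the column to numeric should give the plain NA. The stringsAsFactors = FALSE argument is added here only so the example behaves the same across R versions:

```r
df <- data.frame(a = c(1, 2, 3, "NULL"), b = c(1, 5, 4, 6),
                 stringsAsFactors = FALSE)

# Replace the "NULL" strings with NA while the column is still character
df$a[df$a == "NULL"] <- NA

# Convert to numeric so the missing value prints as NA instead of <NA>
df$a <- as.numeric(df$a)
print(df)
```

If the column were a factor, it would need as.numeric(as.character(df$a)) instead, since as.numeric() on a factor returns the internal level codes.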
First, I'm brand new to R and am making the switch from SAS. I have a dataset that is 1000 rows by 24 columns, where the columns are different treatments. I want to count, across each row of the dataset shown below, the number of times an observation meets a criterion.
Gene A B C D
1 AARS_3 NA NA 4.168365 NA
2 AASDHPPT_21936 NA NA NA -3.221287
3 AATF_26432 NA NA NA NA
4 ABCC2_22 4.501518 3.17992 NA NA
5 ABCC2_26620 NA NA NA NA
I was trying to create column vectors that counted
1) Number of NAs
2) Number of columns <0
3) Number of columns >0
I would then use cbind to add these to my large dataset
I solved the first one with:
NA.Count <- (apply(b01,MARGIN=1,FUN=function(x) length(x[is.na(x)])))
I tried to modify this to evaluate !is.na and then count the number of times the value was less than zero:
lt0 <- (apply(b01,MARGIN=1,FUN=function(x) ifelse(x[!is.na(x)],count(x[x<0]))))
which didn't work at all.
I tried a dozen ways to get dplyr mutate to work with this and did not succeed.
What I want are the last two columns below; and if you had a cleaner version of the NA.Count I did, that would also be greatly appreciated.
Gene A B C D NA.Count lt0 gt0
1 AARS_3 NA NA 4.168365 NA 3 0 1
2 AASDHPPT_21936 NA NA NA -3.221287 3 1 0
3 AATF_26432 NA NA NA NA 4 0 0
4 ABCC2_22 4.501518 3.17992 NA NA 2 0 2
5 ABCC2_26620 NA NA NA NA 4 0 0
Here is one way to do it taking advantage of the fact that TRUE equals 1 in R.
# test data frame
lil_df <- data.frame(Gene = c("AAR3", "ABCDE"),
A = c(NA, 3),
B = c(2, NA),
C = c(-1, -2),
D = c(NA, NA))
# is.na
NA.count <- rowSums(is.na(lil_df[,-1]))
# less than zero
lt0 <- rowSums(lil_df[,-1]<0, na.rm = TRUE)
# more than zero
mt0 <- rowSums(lil_df[,-1]>0, na.rm = TRUE)
# cbind to data frame
larger_df <- cbind(lil_df, NA.count, lt0, mt0 )
larger_df
Gene A B C D NA.count lt0 mt0
1 AAR3 NA 2 -1 NA 2 1 1
2 ABCDE 3 NA -2 NA 2 1 1
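To see why rowSums() on a logical matrix counts matches, here is a small sketch on a single made-up vector (x is illustrative only). Each comparison yields TRUE, FALSE or NA, TRUE counts as 1 when summed, and na.rm = TRUE drops the NA results:

```r
x <- c(NA, 2, -1, NA)

# Comparisons against NA give NA, everything else TRUE or FALSE
is_negative <- x < 0

# Summing a logical vector counts the TRUE values
n_negative <- sum(is_negative, na.rm = TRUE)
n_positive <- sum(x > 0, na.rm = TRUE)
n_missing <- sum(is.na(x))

c(n_missing = n_missing, n_negative = n_negative, n_positive = n_positive)
```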
I'm a beginner in R. Although I have read a lot in the manuals and on this board, I have to ask my first question. It is similar to the one here, but not quite the same, and I don't understand the explanation there. I have a dataframe with hundreds of thousands of rows and 30 columns, but for my question I created a simpler dataframe that you can use:
a <- sample(c(1,3,5,9), 20, replace = TRUE)
b <- sample(c(1,NA), 20, replace = TRUE)
df <- data.frame(a,b)
Now I want to compare the values in the last column (here column b): for each row, I want to check whether its value is the same as the value in the next row. If it is the same, I want to write 0 in a new column in that row; otherwise the new column should contain 1.
Here is my code, which is not working, because the rows of the new column only contain 0:
m<-c()
for (i in seq(along=df[,1])){
ifelse(df$b[i] == df$b[i+1],m <- 0, m <- 1)
df$mov <- m
}
The result I want looks like the example below. What's the mistake? And is there a better way than a loop? Looping could be very slow for my big dataset.
a b mov
1 9 NA 0
2 1 NA 1
3 1 1 1
4 5 NA 0
5 1 NA 0
6 3 NA 0
7 3 NA 1
8 5 1 0
9 1 1 0
10 3 1 0
11 1 1 0
12 9 1 0
13 1 1 1
14 5 NA 0
15 9 NA 0
16 9 NA 0
17 9 NA 0
18 5 NA 0
19 3 NA 0
20 1 NA 0
Thank you for your help!
There are a couple things to consider in your example.
First, to avoid a loop, you can create a copy of the vector that is shifted by one position. (There are about 20 ways to do this.) Then when you test vector B vs C it will do element-by-element comparison of each position vs its neighbor.
Second, equality comparisons don't work with NA: they always return NA. So NA == NA is not TRUE, it is NA! Again, there are about 20 ways to get around this, but here I have just replaced all the NAs in the temporary vector with a placeholder that will work for the tests of equality.
Finally, you have to decide what you want to do with the last value (which doesn't have a neighbor). Here I have put 1, which is your assignment for "doesn't match its neighbor".
So, depending on the range of values possible in b, you could do
c <- df$b
z <- length(c)
c[is.na(c)] <- 'x' # replace NA with a placeholder that allows the equality test (the vector becomes character)
df$mov <- c(1 * !(c[1:(z - 1)] == c[2:z]), 1) # note the parentheses: 1:z-1 means (1:z)-1; append 1 for the last value
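If b could legitimately contain the placeholder value, an alternative is to test for NA explicitly. A minimal sketch on a made-up vector (the names cur, nxt and same are illustrative), treating two NAs as equal and padding the last position with 1 as above:

```r
b <- c(NA, NA, 1, NA, 1, 1)

cur <- head(b, -1)  # every value except the last
nxt <- tail(b, -1)  # every value except the first

# Same if both are NA, or both are present and equal
same <- (is.na(cur) & is.na(nxt)) |
  (!is.na(cur) & !is.na(nxt) & cur == nxt)

# 0 where the neighbor matches, 1 otherwise; the last row has no neighbor
mov <- c(as.integer(!same), 1L)
mov
```

Because FALSE & NA is FALSE in R, the cur == nxt comparison never leaks an NA into the result.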
You could do something like this to mark the ones which match
df$bnext <- c(tail(df$b,-1),NA)
df$bnextsame <- ifelse(df$bnext == df$b | (is.na(df$b) & is.na(df$bnext)),0,1)
The result still contains NAs wherever exactly one of the two values is NA (and in the last row if its b is not NA), because a comparison with NA returns NA rather than TRUE/FALSE. Since those rows do not match their neighbor, you could add a df[is.na(df$bnextsame),"bnextsame"] <- 1 to fix that.
You can use a "rolling equality test" with zoo's rollapply. Also, identical is preferred to == here, because it treats two NAs as equal.
#identical(NA, NA)
#[1] TRUE
#NA == NA
#[1] NA
library(zoo)
df$mov <- c(rollapply(df$b, width = 2,
FUN = function(x) as.numeric(!identical(x[1], x[2]))), "no_comparison")
#`!` because you want `0` as `TRUE` ;
#I added a "no_comparison" to last value as it is not compared with any one
df
# a b mov
#1 5 1 0
#2 1 1 0
#3 9 1 1
#4 5 NA 1
#5 9 1 1
#.....
#19 1 NA 0
#20 1 NA no_comparison
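If you would rather not depend on zoo, roughly the same pairwise identical() test can be sketched in base R with mapply(). This is an alternative I am swapping in, not the answer's own method, and the vector b below is made up:

```r
b <- c(1, 1, NA, NA, 1)

# Compare each element with its successor; identical() treats NA and NA as equal
same <- mapply(identical, head(b, -1), tail(b, -1))

# 0 for a match, 1 for a mismatch; one element shorter than b
mov <- as.integer(!same)
mov
```

The result has length(b) - 1 entries, so the last row still needs a value of your choosing, as in the zoo version.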
I am looking for a way to add 3 values in 3 different columns to a matrix based on the value in an existing column.
experiment = rbind(1,1,1,2,2,2,3,3,3)
newColumns = matrix(NA,dim(experiment)[1],3) # make 3 columns of length experiment filled with NA
experiment = cbind(experiment,newColumns) # add new columns to the experimental data
experiment = data.frame(experiment)
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
experiment$new[experiment[,1]==2] = 5 # add a single column
print(experiment)
X1 X2 X3 X4 new
1 1 0 0 0 NA
2 1 1 1 1 NA
3 1 2 2 2 NA
4 2 NA NA NA 5
5 2 NA NA NA 5
6 2 NA NA NA 5
7 3 NA NA NA NA
8 3 NA NA NA NA
9 3 NA NA NA NA
This, however, fills the new columns the wrong way. I want column 2 to be all 0's, column 3 to be all 1's and column 4 to be all 2's.
I know I can do it one column at a time, but my real dataset is quite large, so that isn't my preferred solution. I would like to be able to easily add more columns just by making the range of columns larger and adding values to the 3 values in the example.
Instead of this:
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
Try this:
experiment[experiment[,1] == 1, 2:4] <- rep(0:2, each=3)
The problem is that you've provided 3 values (0,1,2) to fill 9 entries. The values are by default filled column-wise. So, the first column is filled with 0, 1, 2 and then the values get recycled. So, it goes again 0,1,2 and 0,1,2. Since you want 0,0,0,1,1,1,2,2,2, you should explicitly generate using rep(0:2, each=3) (the each does the task of generating the data shown just above).
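To avoid hard-coding the 3, the number of repeats can be taken from the number of matching rows. A minimal sketch, assuming the same layout as in the question (rows and vals are names introduced here for illustration):

```r
# Same shape as the question's data: a key column plus three empty columns
experiment <- data.frame(X1 = rep(1:3, each = 3), X2 = NA, X3 = NA, X4 = NA)

rows <- experiment$X1 == 1   # rows to fill
vals <- c(0, 1, 2)           # one value per target column

# Repeat each value once per matching row so the column-wise fill lines up
experiment[rows, 2:4] <- rep(vals, each = sum(rows))
experiment
```

Adding a column then only means widening the 2:4 range and appending one value to vals.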