I am new to R and have a fairly simple question that I just can't figure out. For my example I will use a data frame with 3 columns, but my actual data set has 139 columns and 10000 rows.
I want to replace all of the values in a given row with NA if the value in column C of that row is < 10.
Assume that all of my columns are either numeric or integer.
So I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
how about:
x[x$C <10 ,] <- NA
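One caveat, offered as a sketch rather than a correction: if column C itself contains NA, the comparison yields NA for those rows, and as far as I know data frame assignment refuses NA in a row subscript. Wrapping the condition in which() sidesteps this. The fourth row below is an assumption added to show the case; it is not in the question's data.

```r
# Toy data from the question, plus one extra row where C is missing (assumed)
x <- data.frame(A = c(5, 9, 2, 7), B = c(3, 4, 6, 1), C = c(12, 9, 11, NA))

# which() keeps only the positions where the condition is TRUE,
# so the row with a missing C is simply left untouched
x[which(x$C < 10), ] <- NA
print(x)
```

Row 2 becomes all NA, while the row whose C is missing keeps its A and B values.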
I have a CSV file with 11 columns (but the first 8 I am ignoring for now), last 3 (9 - 11) are important. I am missing some data for column 9, and these cells show up as NA. But to fill in these cells, I can multiply column 11 by column 10.
I want to create a data frame where all of column 9 is filled in and save that as a new CSV file. I first tried to multiply the columns. This worked and I got the missing data from column 9. Then I tried to merge the new column 9 with the column 9 from my data frame but R just attached the 2 columns together.
I would like for the NA data that has been calculated to replace the data in the original data frame (so I end up with a full column 9). Plus, I would like to only multiply the columns with NA cells so that no original data is replaced. How to do that?
col_9 <- matrix(dat[,10] * dat[,11], ncol=1)
print(col_9)
You can use the ifelse function:
dat[,9]=ifelse(is.na(dat[,9]), dat[,10]*dat[,11], dat[,9])
If the condition is TRUE (i.e. is.na(dat[,9])), the value is replaced by the second argument (dat[,10]*dat[,11]); otherwise it is replaced by the third one (i.e. dat[,9], so the original value is kept).
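The same fill can also be written with a logical index, which only touches the rows where column 9 is missing. This is a minimal sketch on made-up data: the column names total, price and qty, the placeholder columns 1 to 8, and the output file name are all assumptions, not part of the question.

```r
# Hypothetical stand-in for the CSV: columns 1-8 are placeholders
dat <- data.frame(matrix(1, nrow = 3, ncol = 8))
dat$total <- c(6, NA, 20)  # column 9, with one missing value
dat$price <- c(2, 3, 4)    # column 10
dat$qty   <- c(3, 5, 5)    # column 11

# Compute the product only for the rows where column 9 is NA
missing <- is.na(dat[, 9])
dat[missing, 9] <- dat[missing, 10] * dat[missing, 11]

# Save the completed data frame as a new CSV (temporary location here)
out_file <- file.path(tempdir(), "filled.csv")
write.csv(dat, out_file, row.names = FALSE)
```

Existing values in column 9 are never recomputed, which matches the requirement that no original data is replaced.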
I am interested in inserting all missing rows into a data table for a new range of values for 2 columns.
For example, dt1[,a] has some values from 1 to 5, as does dt1[,b], but I'd like not only all pairwise combinations of the values in columns a and b to be present, but all combinations over a newly defined range, e.g. 1 to 7 instead.
# Example data.table
library(data.table)
dt1 <- data.table(a=c(1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5),
                  b=c(1,3,4,5,1,2,3,4,1,2,3,1,2,3,4,5,3,4,5),
                  c=sample(1:10,19,replace=TRUE))
setkey(dt1,a,b)
# CJ in data.table will create all rows to ensure all
# pair wise combinations are present (using the nominated columns).
dt1[CJ(a,b,unique=T)]
The above is great, but it only uses the values already present in the nominated columns. I'd like the inserted rows to give me all combinations over a new, nominated range, e.g. 1 to 7. There would be 49 rows.
# the following is a temporary workaround
template <- data.table(a1=rep(1:7,each=7),b1=rep(1:7,7))
setkey(template,a1,b1)
full <- dt1[template]
Instead of the values already present in column 'a', we can pass a range of values to 'CJ' for 'a':
dt1[CJ(a = 1:7, b, unique = TRUE)]
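As far as I can tell, the call above widens only a, so with the example data it would give 7 x 5 = 35 rows, because b still contributes only its existing values. To reach the 49 rows the question asks for, both columns need an explicit range. A minimal sketch on a small made-up table (the values are assumptions, chosen as integers to match 1:7):

```r
library(data.table)

# Small stand-in table; only four (a, b) pairs are present
dt <- data.table(a = c(1L, 1L, 2L, 5L),
                 b = c(1L, 3L, 2L, 5L),
                 c = c(10L, 20L, 30L, 40L))

# Cross join of the full nominated range for both columns,
# joined back onto the table; missing pairs get NA in column c
full <- dt[CJ(a = 1:7, b = 1:7), on = c("a", "b")]
nrow(full)
```

Using on = avoids relying on a key having been set beforehand.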
I have a data frame with many rows and columns in it (3000x37) and I want to be able to select only rows that may have >= 2 columns of value "NA". These columns have data of different data types. I know how to do this in case I want to select only one column via:
df[is.na(df$col.name), ]
How to make this selection if I want to select two (or more) columns?
First create a vector nn with the number of NAs in each row, and then select only those rows with >= 2 NAs using d[nn>=2,]:
d = data.frame(x=c(NA,1,2,3), y=c(NA,"a",NA,"c"))
nn = apply(d, 1, FUN=function (x) {sum(is.na(x))})
d[nn>=2,]
x y
1 NA <NA>
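A vectorised variant of the same idea, as a sketch: is.na() on a data frame returns a logical matrix, so rowSums() can count the NAs per row without apply(). This should also be quicker on a 3000x37 frame, though I have not timed it.

```r
d <- data.frame(x = c(NA, 1, 2, 3), y = c(NA, "a", NA, "c"))

# Count the missing values in each row
nn <- rowSums(is.na(d))

# Keep the rows with at least two NAs
res <- d[nn >= 2, ]
print(res)
```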
I am trying to subset a data frame by taking the integer values of 2 columns of my data frame
Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])])
but it gives me an error : longer object length is not a multiple of shorter object length.
How can I construct a subset which is composed of NON NA values of column 2 AND column 3?
Thanks a lot!
Try this:
Subs1<-subset(DATA, (!is.na(DATA[,2])) & (!is.na(DATA[,3])))
The second parameter of subset is a logical vector with the same length as nrow(DATA), indicating whether to keep the corresponding row.
The na.omit function can be an answer to your question:
Subs1 <- na.omit(DATA[2:3])
[https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html]
Here is an example.
a, b and c are 3 vectors, of which a and b each have a missing value.
Once they are created, I use cbind to bind them into one matrix, which can afterwards be converted to a data frame.
The final result is a data frame where 2 out of 3 columns have a missing value.
So we need to keep only the rows with complete cases. DATA[complete.cases(DATA), ] keeps only the rows that have no missing values in any column. The subset object holds these complete rows.
a <- c(1,NA,2)
b <- c(NA,1,2)
c <- c(1,2,3)
DATA <- as.data.frame(cbind(a,b,c))
subset <- DATA[complete.cases(DATA), ]
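The snippet above checks every column, whereas the question asked only about columns 2 and 3. A sketch of that narrower check, on made-up data with an extra row (an assumption, added so that some rows survive): pass just those two columns to complete.cases() but index the full data frame, so all columns are kept in the result.

```r
DATA <- data.frame(a = c(1, NA, 2, 4),
                   b = c(NA, 1, 2, 5),
                   c = c(1, 2, NA, 3))

# TRUE where neither column 2 nor column 3 is NA
keep <- complete.cases(DATA[, 2:3])

# All columns are retained; an NA in column 1 does not drop the row
Subs1 <- DATA[keep, ]
print(Subs1)
```

By contrast, na.omit(DATA[2:3]) returns only columns 2 and 3.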
Looking at a Data Frame like so:
set.seed(3)
Data1<-rnorm(20, mean=20)
Dir_1<-rnorm(20,mean=2)
Data2<-rnorm(20, mean=21)
Dir_2<-rnorm(20,mean=2)
Data3<-rnorm(20, mean=22)
Dir_3<-rnorm(20,mean=2)
Data4<-rnorm(20, mean=19)
Dir_4<-rnorm(20,mean=2)
Data5<-rnorm(20, mean=20)
Dir_5<-rnorm(20,mean=2)
Data6<-rnorm(20, mean=23)
Dir_6<-rnorm(20,mean=2)
Data7<-rnorm(20, mean=21)
Dir_7<-rnorm(20,mean=2)
Data8<-rnorm(20, mean=25)
Dir_8<-rnorm(20,mean=2)
Index<-rnorm(20,mean=5)
DF<-data.frame(Data1,Dir_1,Data2,Dir_2,Data3,Dir_3,Data4,Dir_4,Data5,Dir_5,Data6,Dir_6,Data7,Dir_7,Data8,Dir_8,Index)
I end up with a data frame with two columns of data per observation (based on observation 1-8) and an index. Based on this index I would like to remove (or make NA) certain data observations.
As an example:
If the index is greater than 5, drop observation 8 (both Data and Dir) in that row
If the index is greater than 4, drop observations 7 and 8 in that row
If the index is greater than 3 and less than 3.5, drop 6,7,8 in that row
I was hoping to come up with a series of "if" statements that would let me drop columns for each row based on an index value.
Assuming what you want is not to "drop columns for a row" but to put NAs into the proper columns for the specific row, you need to use a few index vectors and not a series of if statements:
DF[DF$Index>3 & DF$Index<3.5, (6*2-1):(8*2)] <- NA
DF[DF$Index>4, (7*2-1):(8*2)] <- NA
DF[DF$Index>5, (8*2-1):(8*2)] <- NA
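The positional arithmetic above relies on the columns staying in Data/Dir order. A sketch of the same three rules using column names instead, so the result does not depend on column position; obs_cols is a hypothetical helper and the four Index values are made up to hit each rule.

```r
set.seed(3)
# Small stand-in frame: Index first, then Data1/Dir_1 ... Data8/Dir_8
DF <- data.frame(Index = c(3.2, 4.5, 5.5, 2.0))
for (k in 1:8) {
  DF[[paste0("Data", k)]] <- rnorm(4, mean = 20)
  DF[[paste0("Dir_", k)]] <- rnorm(4, mean = 2)
}

# Hypothetical helper: names of the Data and Dir columns for observations ks
obs_cols <- function(ks) c(paste0("Data", ks), paste0("Dir_", ks))

DF[DF$Index > 3 & DF$Index < 3.5, obs_cols(6:8)] <- NA
DF[DF$Index > 4, obs_cols(7:8)] <- NA
DF[DF$Index > 5, obs_cols(8)] <- NA
```

Note that any row with Index > 5 also satisfies Index > 4, so with the rules as stated the last line does not change anything the second line has not already set.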