Remove Data in a data frame based on an index column - r

Looking at a Data Frame like so:
set.seed(3)
Data1<-rnorm(20, mean=20)
Dir_1<-rnorm(20,mean=2)
Data2<-rnorm(20, mean=21)
Dir_2<-rnorm(20,mean=2)
Data3<-rnorm(20, mean=22)
Dir_3<-rnorm(20,mean=2)
Data4<-rnorm(20, mean=19)
Dir_4<-rnorm(20,mean=2)
Data5<-rnorm(20, mean=20)
Dir_5<-rnorm(20,mean=2)
Data6<-rnorm(20, mean=23)
Dir_6<-rnorm(20,mean=2)
Data7<-rnorm(20, mean=21)
Dir_7<-rnorm(20,mean=2)
Data8<-rnorm(20, mean=25)
Dir_8<-rnorm(20,mean=2)
Index<-rnorm(20,mean=5)
DF<-data.frame(Data1,Dir_1,Data2,Dir_2,Data3,Dir_3,Data4,Dir_4,Data5,Dir_5,Data6,Dir_6,Data7,Dir_7,Data8,Dir_8,Index)
I end up with a data frame with two columns of data per observation (based on observation 1-8) and an index. Based on this index I would like to remove (or make NA) certain data observations.
As an example:
If the index is greater than 5, drop observation 8 (both Data and Dir) in that row
If the index is greater than 4, drop observations 7 and 8 in that row
If the index is greater than 3 and less than 3.5, drop 6,7,8 in that row
I was hoping to come up with a series of "if" statements that would let me drop columns for each row based on an index value.

Assuming you do not literally want to "drop columns for a row" but rather to put NAs into the relevant columns of specific rows, use logical row indices and column ranges instead of a series of if statements:
DF[DF$Index>3 & DF$Index<3.5, (6*2-1):(8*2)] <- NA
DF[DF$Index>4, (7*2-1):(8*2)] <- NA
DF[DF$Index>5, (8*2-1):(8*2)] <- NA
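The column arithmetic in the lines above can be wrapped in a small function. This is only a sketch: blank_obs and the toy data frame demo are hypothetical names, and it assumes each observation occupies two adjacent columns (Data, then Dir) starting at column 1.

```r
# Hypothetical helper: set observations first_obs..n_obs to NA for the chosen rows.
# Assumes observation k lives in columns 2k-1 (Data) and 2k (Dir).
blank_obs <- function(df, rows, first_obs, n_obs) {
  cols <- (2 * first_obs - 1):(2 * n_obs)
  df[which(rows), cols] <- NA  # which() ignores any NA in the condition
  df
}

# Toy data: two observations plus an index column.
demo <- data.frame(Data1 = 1:4, Dir_1 = 5:8,
                   Data2 = 9:12, Dir_2 = 13:16,
                   Index = c(2.0, 3.2, 4.5, 5.5))

# Blank observation 2 wherever Index > 4 (rows 3 and 4).
demo <- blank_obs(demo, demo$Index > 4, first_obs = 2, n_obs = 2)
```

The Index column is left alone because it sits after the last observation column.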

Related

How do I edit my data frame (multiply columns) in R?

I have a CSV file with 11 columns (but the first 8 I am ignoring for now), last 3 (9 - 11) are important. I am missing some data for column 9, and these cells show up as NA. But to fill in these cells, I can multiply column 11 by column 10.
I want to create a data frame where all of column 9 is filled in and save that as a new CSV file. I first tried to multiply the columns. This worked and I got the missing data from column 9. Then I tried to merge the new column 9 with the column 9 from my data frame but R just attached the 2 columns together.
I would like for the NA data that has been calculated to replace the data in the original data frame (so I end up with a full column 9). Plus, I would like to only multiply the columns with NA cells so that no original data is replaced. How to do that?
col_9 <- matrix(dat[,10] * dat[,11], ncol=1)
print(col_9)
You can use the ifelse function:
dat[,9]=ifelse(is.na(dat[,9]), dat[,10]*dat[,11], dat[,9])
If the condition is TRUE (i.e. is.na(dat[,9])), the value is taken from the second argument (dat[,10]*dat[,11]); otherwise it is taken from the third one (i.e. dat[,9], so the original value is kept).
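An equivalent approach is to index only the missing rows and then write the result out. The sketch below uses a toy three-column data frame in place of the 11-column CSV; the column names and the output path are assumptions made for illustration.

```r
# Toy stand-in for the CSV: total plays the role of column 9,
# qty and price the roles of columns 10 and 11 (assumed names).
dat <- data.frame(total = c(6, NA, 20, NA),
                  qty   = c(2, 3, 4, 5),
                  price = c(3, 7, 5, 1))

# Rows where the target column is missing.
miss <- is.na(dat$total)

# Fill only those rows; existing values are never recomputed.
dat$total[miss] <- dat$qty[miss] * dat$price[miss]

# Save the completed table (a temporary file here, so nothing is overwritten).
out <- tempfile(fileext = ".csv")
write.csv(dat, out, row.names = FALSE)
```

With the real data, dat would come from read.csv and the columns would be addressed by position, as in the ifelse line above.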

Dividing two columns in a Dataframe and placing result in existing column, but referencing columns by Index rather than name

I have a data frame with 21 columns. From column 4 onwards, the columns are pairs of values (numerator and denominator). I want to divide each pair and place the result in the first column of the pair, i.e. column 4 should become column 4 divided by column 5, column 6 should become column 6 divided by column 7, and so on.
I know (or at least can find on Google) how to do this easily enough with column names, but I would prefer to refer to the columns by index.
This can be done by dividing two equally sized subsets of the data frame. The numerator is the subset from column 4 to the second-to-last column, the denominator is the subset from column 5 to the last column, and the result is assigned back to the numerator column indices:
df1[4:(ncol(df1)-1)] <- df1[4:(ncol(df1)-1)]/df1[5:ncol(df1)]
NOTE: This assumes the columns are numeric.
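Note that the one-liner above also overwrites each denominator column with its ratio to the following column. If only the numerator columns should change, one option is to step through the column positions in twos. A minimal sketch on a toy data frame (the column names are made up, and the pairs are assumed to start at column 4):

```r
# Toy frame: three leading columns, then (numerator, denominator) pairs.
df1 <- data.frame(id = 1:3, g1 = letters[1:3], g2 = LETTERS[1:3],
                  n1 = c(2, 4, 6), d1 = c(1, 2, 3),
                  n2 = c(9, 8, 7), d2 = c(3, 4, 7))

num <- seq(4, ncol(df1) - 1, by = 2)  # numerator positions: 4, 6, ...
den <- num + 1                        # denominator positions: 5, 7, ...

# Column-by-column division; only the numerator columns are replaced.
df1[num] <- df1[num] / df1[den]
```

The denominator columns keep their original values, so they can still be inspected or dropped afterwards.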

Insert all missing rows into data table for a range of values for 2 columns

I am interested in inserting all missing rows into a data table for a new range of values for 2 columns.
For example, dt1[,a] has some values from 1 to 5, as does dt1[,b], but I'd like not only all pairwise combinations of the existing values to be present in columns a and b, but all combinations within a newly defined range, e.g. 1 to 7 instead.
# Example data.table
library(data.table)
dt1 <- data.table(a=c(1,1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,5,5,5),
                  b=c(1,3,4,5,1,2,3,4,1,2,3,1,2,3,4,5,3,4,5),
                  c=sample(1:10,19,replace=TRUE))
setkey(dt1,a,b)
# CJ in data.table will create all rows to ensure all
# pairwise combinations are present (using the nominated columns).
dt1[CJ(a,b,unique=TRUE)]
The above is great but only uses the values already present in the nominated columns. I'd like the inserted rows to give me all combinations within a new, nominated range, e.g. 1 to 7. There would be 49 rows.
# the following is a temporary workaround
template <- data.table(a1=rep(1:7,each=7),b1=rep(1:7,7))
setkey(template,a1,b1)
full <- dt1[template]
Instead of the existing values in column 'a', we can pass a range of values to 'CJ' for 'a':
dt1[CJ(a = 1:7, b, unique = TRUE)]
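The call above widens only 'a'; 'b' still takes the values already in the table. A minimal sketch, assuming both columns should span 1 to 7, builds the full grid first and joins onto it. The small table and the names grid and full are illustrative only.

```r
library(data.table)

# Small stand-in for dt1 with integer key columns.
dt1 <- data.table(a = c(1L, 1L, 2L, 5L),
                  b = c(1L, 3L, 2L, 5L),
                  c = c(10L, 20L, 30L, 40L))

# Every (a, b) pair in the nominated range 1..7: 7 * 7 = 49 rows.
grid <- CJ(a = 1:7, b = 1:7)

# Join onto the grid: pairs missing from dt1 get NA in column c.
full <- dt1[grid, on = .(a, b)]
```

Using on = avoids relying on the key, so the sketch works whether or not setkey has been called.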

Select only rows if the value in a particular set of columns is 'NA' in R

I have a data frame with many rows and columns in it (3000x37) and I want to be able to select only rows that may have >= 2 columns of value "NA". These columns have data of different data types. I know how to do this in case I want to select only one column via:
df[is.na(df$col.name), ]
How to make this selection if I want to select two (or more) columns?
First create a vector nn with the number of NAs in each row, and then select only those rows with >= 2 NAs using d[nn>=2,]:
d = data.frame(x=c(NA,1,2,3), y=c(NA,"a",NA,"c"))
nn = apply(d, 1, FUN=function (x) {sum(is.na(x))})
d[nn>=2,]
   x    y
1 NA <NA>
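To count NAs only within a particular set of columns, as the question asks, the same idea can be written with rowSums. This is a sketch; the extra column z and the vector cols are made up for illustration.

```r
d <- data.frame(x = c(NA, 1, 2, 3),
                y = c(NA, "a", NA, "c"),
                z = c(NA, NA, 5, 6))

cols <- c("x", "y")             # hypothetical set of columns to check
nn <- rowSums(is.na(d[cols]))   # NA count per row, restricted to cols

# Keep rows with at least two NAs among the chosen columns.
sel <- d[nn >= 2, ]
```

is.na on a data frame returns a logical matrix, so rowSums works regardless of the column types involved.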

Change values in row based on a column value r

I am new to R with a fairly simple question, I just can't figure out the answer. For my example I will use a data frame with 3 columns, but my actual data set is 139 columns with 10000 rows.
I want to replace all of the values in a given row with NA if the value in column C of that row is less than 10.
Assume that all of my columns are either numeric or integer.
so I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C <10 ,] <- NA
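If column C can itself contain NA, the comparison yields NA for those rows. One way to sidestep that, sketched here on a toy frame with an extra fourth row added for illustration, is to wrap the condition in which():

```r
x <- data.frame(A = c(5, 9, 2, 7),
                B = c(3, 4, 6, 8),
                C = c(12, 9, 11, NA))

# which() keeps only the positions where the test is TRUE,
# so the row with a missing C is left untouched.
hit <- which(x$C < 10)
x[hit, ] <- NA
```

Here only row 2 is blanked; row 4 keeps its A and B values.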
