How do I edit my data frame (multiply columns) in R? - r

I have a CSV file with 11 columns (but the first 8 I am ignoring for now), last 3 (9 - 11) are important. I am missing some data for column 9, and these cells show up as NA. But to fill in these cells, I can multiply column 11 by column 10.
I want to create a data frame where all of column 9 is filled in and save that as a new CSV file. I first tried to multiply the columns. This worked and I got the missing data from column 9. Then I tried to merge the new column 9 with the column 9 from my data frame but R just attached the 2 columns together.
I would like for the NA data that has been calculated to replace the data in the original data frame (so I end up with a full column 9). Plus, I would like to only multiply the columns with NA cells so that no original data is replaced. How to do that?
col_9 <- matrix(dat[,10] * dat[,11], ncol=1)
print(col_9)

You can use the ifelse function:
dat[,9]=ifelse(is.na(dat[,9]), dat[,10]*dat[,11], dat[,9])
Where the condition is TRUE (i.e. is.na(dat[,9])), the value is replaced by the second argument (dat[,10]*dat[,11]); otherwise it is replaced by the third one (i.e. dat[,9], so the original value is kept).
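As a minimal sketch of the whole workflow (fill the gaps, then save a new CSV), the example below uses logical indexing instead of ifelse so that only the rows with NA are ever multiplied. The data frame contents, the column names total, price and qty, and the output file name are all made up for illustration.

```r
# Hypothetical stand-in for the CSV: 8 filler columns, then columns 9-11.
dat <- data.frame(matrix(0, nrow = 4, ncol = 8))
dat$total <- c(6, NA, 20, NA)   # column 9, with gaps
dat$price <- c(2, 3, 4, 5)      # column 10
dat$qty   <- c(3, 7, 5, 2)      # column 11

# Fill only the missing cells; existing values in column 9 are left untouched.
missing <- is.na(dat[, 9])
dat[missing, 9] <- dat[missing, 10] * dat[missing, 11]

# Write the completed data frame to a new CSV file (file name is an assumption).
out_file <- file.path(tempdir(), "dat_filled.csv")
write.csv(dat, out_file, row.names = FALSE)
```

Either form should give the same column 9; the indexed version just makes it explicit that rows with existing data are never touched.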

Related

Dividing two columns in a Dataframe and placing result in existing column, but referencing columns by Index rather than name

I have a dataframe with 21 columns. Columns 4 onwards are pairs of values (numerator and denominator). I want to divide the two and place the result into the first column of each pair, i.e. I want column 4 to become the result of column 4 divided by column 5, then column 6 to be the result of column 6 divided by column 7, and so on.
I know (or at least can find on google) how to do this easily enough with reference to the column names, but I would prefer not to use these and rather refer to the column index.
This can be done by dividing two equal-sized subsets of the data frame. The numerator is the subset from column 4 to the second-to-last column, and the denominator is the subset from column 5 to the last column. The result is assigned back to the numerator column indices:
df1[4:(ncol(df1)-1)] <- df1[4:(ncol(df1)-1)]/df1[5:ncol(df1)]
NOTE: This assumes the columns are of numeric class.
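One thing to be aware of: the one-liner divides every column from 4 onwards by its right-hand neighbour, so the denominator columns (5, 7, ...) are overwritten as well. If only the numerator columns should change, a sketch along the following lines steps through the pairs two columns at a time. The data frame and its column names here are hypothetical, with two pairs instead of nine.

```r
# Hypothetical data: 3 leading columns followed by two numerator/denominator pairs.
df1 <- data.frame(id = 1:2, a = 0, b = 0,
                  num1 = c(10, 20), den1 = c(2, 5),
                  num2 = c(9, 8),   den2 = c(3, 4))

num_idx <- seq(4, ncol(df1) - 1, by = 2)  # numerator columns: 4, 6, ...
den_idx <- num_idx + 1                    # denominator columns: 5, 7, ...

# Overwrite each numerator column with numerator / denominator.
df1[num_idx] <- df1[num_idx] / df1[den_idx]
```

The columns are still referenced purely by index, and the denominator columns keep their original values.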

Filling in values in a blank data frame

I have a data frame with a number of columns I read in, and now I want to add certain pieces only to certain columns.
For example, the variable periodicnumber exists in the dataframe called df and I want to give the first six rows the values 1 through 6. I thought the code below would work but I get the error:
periodicnumber=seq(1,6)
df$periodicnumber=periodicnumber
Error in `$<-.data.frame`(`*tmp*`, "periodicnumber", value = 1:6) :
replacement has 6 rows, data has 0
As in, were this in Excel, I would write the numbers 1 through 6 only on the periodicnumber column.
If you only want to change the first six rows of df, you need to specify that in the assignment:
periodicnumber=seq(1,6)
df$periodicnumber[1:6]<-periodicnumber
More generally:
df$column[seq_along(x)]<-x
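The error text ("data has 0") indicates that df had no rows when the assignment ran, so the sketch below assumes df already has at least six rows. The ten-row data frame and its symbol column are made up for illustration.

```r
# Hypothetical data frame that already has 10 rows.
df <- data.frame(periodicnumber = rep(NA_integer_, 10),
                 symbol = NA_character_)

x <- 1:6
# Overwrite only the first length(x) rows; the remaining rows stay NA.
df$periodicnumber[seq_along(x)] <- x
```

seq_along(x) is used rather than 1:length(x) because it also behaves sensibly when x is empty.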

R - How to get value from a column based on value from another column of same row

I have a data frame (df) with 8 columns and 1200 rows. Among those 8 columns I want to find the minimum value of column 7 and find the corresponding value of column 2 in that particular row where the minimum value of column 7 was found. Also column 2 holds characters so I want a character vector giving me its value.
I found the minimum of column 7 using
min_val <- min(as.numeric(df[, 7]), na.rm = TRUE)
Now how do I get the value from column 2 (variable name of column being 'column.2') corresponding to the row in which column 7 contains value of 'min_val' as calculated above?
This might be a trivial question but I am new to R so any help will be much appreciated.
Use which.min to get the minimum value index. Something like :
df[which.min(df[,7]),2]
Note that which.min only returns the first index of the minimum, so if you've got several rows with the same minimal value, you will only get the first one.
If you want to get all the minimum rows, you can use :
df[which(df[,7]==min(df[,7])), 2]
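Here is a small sketch of both variants that also carries over the as.numeric conversion and na.rm = TRUE from the question. The data are invented; only the name column.2 comes from the question.

```r
# Hypothetical data: column 2 holds labels, column 7 holds the values (stored as text).
df <- data.frame(V1 = 1:4, column.2 = c("a", "b", "c", "d"),
                 V3 = 0, V4 = 0, V5 = 0, V6 = 0,
                 V7 = c("3.5", "1.2", NA, "1.2"),
                 V8 = 0,
                 stringsAsFactors = FALSE)

vals <- as.numeric(df[, 7])   # convert once, as in the question

# First row holding the minimum (which.min skips NA values).
first_min <- df[which.min(vals), 2]

# Every row holding the minimum; which() drops the NA comparison results.
all_min <- df[which(vals == min(vals, na.rm = TRUE)), 2]
```

Both results are character vectors because column 2 was created with stringsAsFactors = FALSE.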
The same answer as juba's, but using the data.table package (juba's answer uses only base R, with no need to load any libraries).
# Load data.table
library(data.table)
# Convert df to a data.table by reference so columns can be referenced by bare name
setDT(df)
# Get the 2nd column's value corresponding to the first minimum value in the 7th column
df[which.min(V7), V2]
# Get all values in the 2nd column corresponding to the minimum value in the 7th column
df[V7 == min(V7), V2]
data.table is handy for working with data.frame-like objects, as is the dplyr package. Both are worth a look.
Here I've assumed your columns are named V1..V8. Otherwise, replace V7 and V2 with the names of the 7th and 2nd columns of your data.

Remove Data in a data frame based on an index column

Looking at a Data Frame like so:
set.seed(3)
Data1<-rnorm(20, mean=20)
Dir_1<-rnorm(20,mean=2)
Data2<-rnorm(20, mean=21)
Dir_2<-rnorm(20,mean=2)
Data3<-rnorm(20, mean=22)
Dir_3<-rnorm(20,mean=2)
Data4<-rnorm(20, mean=19)
Dir_4<-rnorm(20,mean=2)
Data5<-rnorm(20, mean=20)
Dir_5<-rnorm(20,mean=2)
Data6<-rnorm(20, mean=23)
Dir_6<-rnorm(20,mean=2)
Data7<-rnorm(20, mean=21)
Dir_7<-rnorm(20,mean=2)
Data8<-rnorm(20, mean=25)
Dir_8<-rnorm(20,mean=2)
Index<-rnorm(20,mean=5)
DF<-data.frame(Data1,Dir_1,Data2,Dir_2,Data3,Dir_3,Data4,Dir_4,Data5,Dir_5,Data6,Dir_6,Data7,Dir_7,Data8,Dir_8,Index)
I end up with a data frame with two columns of data per observation (based on observation 1-8) and an index. Based on this index I would like to remove (or make NA) certain data observations.
As an example:
If the index is greater than 5, drop observation 8 (both Data and Dir) in that row
If the index is greater than 4, drop observations 7 and 8 in that row
If the index is greater than 3 and less than 3.5, drop 6,7,8 in that row
I was hoping to come up with a series of "if" statements that would let me drop columns for each row based on an index value.
Assuming what you want is not to "drop columns for a row" but to put NAs into the proper columns for the specific row, you need to use a few index vectors rather than a series of if statements:
DF[DF$Index>3 & DF$Index<3.5, (6*2-1):(8*2)] <- NA
DF[DF$Index>4, (7*2-1):(8*2)] <- NA
DF[DF$Index>5, (8*2-1):(8*2)] <- NA
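The column arithmetic relies on observation k occupying columns 2k-1 (Data) and 2k (Dir). A sketch that wraps this in a small helper may make the intent easier to read. The helper name obs_cols and the five-row stand-in data frame are assumptions for illustration, not part of the original answer.

```r
# Small stand-in for DF: 8 Data/Dir pairs (16 columns) plus an Index column.
set.seed(3)
DF <- data.frame(matrix(rnorm(5 * 16), nrow = 5))
DF$Index <- c(2.0, 3.2, 4.5, 5.5, 3.8)

# Hypothetical helper: column positions of observations `from` through `last`.
obs_cols <- function(from, last = 8) (from * 2 - 1):(last * 2)

DF[DF$Index > 3 & DF$Index < 3.5, obs_cols(6)] <- NA  # blank observations 6, 7, 8
DF[DF$Index > 4, obs_cols(7)] <- NA                   # blank observations 7, 8
DF[DF$Index > 5, obs_cols(8)] <- NA                   # blank observation 8
```

With these thresholds the last rule is already covered by the second one, since any index above 5 is also above 4; it is kept to mirror the three rules as stated.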

Change values in row based on a column value r

I am new to R with a fairly simple question, I just can't figure out the answer. For my example I will use a data frame with 3 columns, but my actual data set is 139 columns with 10000 rows.
I want to replace all of the values in a given row with NA if the value in the same row in column C contains a value < 10.
Assume that all of my columns are either number or integer values.
so I want to take the data frame:
x=data.frame(c(5,9,2),c(3,4,6),c(12,9,11))
names(x)=c("A","B","C")
and replace row 2 with NA to create
y=data.frame(c(5,NA,2),c(3,NA,6),c(12,NA,11))
names(y)=c("A","B","C")
Thanks!
How about:
x[x$C <10 ,] <- NA
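Applied to the example data, this looks as follows. Wrapping the condition in which() is an optional defensive choice on my part: if column C itself contained NA, a logical row subscript with NA would make the data frame assignment fail, whereas which() simply drops those rows from the selection.

```r
# Example data from the question.
x <- data.frame(A = c(5, 9, 2), B = c(3, 4, 6), C = c(12, 9, 11))

# Set every column to NA in the rows where C is below 10.
x[which(x$C < 10), ] <- NA
```

The same line works unchanged for a data frame with 139 columns, since the column subscript is left empty.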
