I have a data frame as follows:
base<-matrix(1:20,nrow=10)
base1<-matrix(rnorm(180),nrow=10)
base2<-cbind(base,base1)
What I need is to replace part of each row with NAs, based on the first two columns: the numbers in those two columns give the range of columns in that row that need to be set to NA. So for the first row it would be something like this:
base2[1,1:11]<-NA
base2[1,]
This works for one row, but my real data frame has over 100,000 rows. Any idea how to do this quickly?
Thanks!
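One vectorised approach, sketched under the assumption that column 1 always holds the first column to blank and column 2 the last (as in the example), is to build a logical mask with col() and assign NA through it in a single step instead of looping over rows. Note that base2 as built here is a matrix, since cbind() of two matrices returns a matrix.

```r
# Rebuild the example (set.seed only makes the random part reproducible)
set.seed(1)
base <- matrix(1:20, nrow = 10)
base1 <- matrix(rnorm(180), nrow = 10)
base2 <- cbind(base, base1)

# col(base2) gives each cell's column number. Comparing it with base2[, 1]
# and base2[, 2] recycles those vectors down each column, so cell (i, j)
# is TRUE when base2[i, 1] <= j <= base2[i, 2]. The mask is computed from
# the original values before any NA is written.
base2[col(base2) >= base2[, 1] & col(base2) <= base2[, 2]] <- NA
```

The mask needs one logical value per cell, which for 100,000 rows and 20 columns is about two million values and should fit comfortably in memory.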
I have a data frame whose row names are the names of different genes, with two columns: Control_mean and Patient_mean.
I want to create a third column that stores the value of "Patient_mean - Control_mean" for each row, but I can't figure out how!
I tried to do so using this:
for(i in 1:nrow(newdf8)){
newdf8$log2FC[i] <- (newdf8[,2] - newdf8[,1])
}
but it didn't work: all the values in the new column became the same number instead of each row's actual difference.
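In the loop above, the right-hand side newdf8[,2] - newdf8[,1] is the full vector of differences on every iteration, so each single-element assignment receives a whole vector rather than the i-th difference. Because R arithmetic is vectorised, the loop is not needed at all. The sketch below uses a small hypothetical newdf8; it also shows the loop corrected by indexing [i] on the right-hand side. The column name log2FC suggests the means are already on a log2 scale (so their difference is a log2 fold change), which is an assumption here.

```r
# Hypothetical stand-in for newdf8, with gene names as row names
newdf8 <- data.frame(
  Control_mean = c(2, 5, 1),
  Patient_mean = c(3, 4, 8),
  row.names = c("geneA", "geneB", "geneC")
)

# Vectorised: subtract the whole columns in one step
newdf8$log2FC <- newdf8$Patient_mean - newdf8$Control_mean

# Loop version for comparison: preallocate, then use the i-th element on both sides
newdf8$log2FC_loop <- NA_real_
for (i in seq_len(nrow(newdf8))) {
  newdf8$log2FC_loop[i] <- newdf8$Patient_mean[i] - newdf8$Control_mean[i]
}
```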
Given a data frame where one column, say x, has nested or multiple values in some rows, how would I, for each row with multiple values in x, append duplicate rows to the data frame, identical except that each one holds a single value of x?
To explain better, see "mock dataframe pre-transform" below. Row 1 has the values "webui, cli, mobile" in column "module", and what I want is to append three near copies of row 1 to the data frame: one with module value "webui", one with "cli" and one with "mobile". I then also want to remove the original row 1. A similar operation would occur for row 4, so that the final data frame has 7 rows (see "mock dataframe post-transform" below).
mock dataframe pre-transform
mock dataframe post-transform
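The mock tables are not shown, so the sketch below uses a hypothetical four-row data frame in which module holds comma-separated strings, chosen so that row 1 expands to three rows and row 4 to two, giving seven rows as described. It splits each string with strsplit(), repeats each row once per piece, then writes the pieces back into module. Unlike the description, the expanded rows stay at their original position instead of being appended at the end; the resulting set of rows is the same.

```r
# Hypothetical data: the real columns and values are not shown in the question
df <- data.frame(
  id = 1:4,
  module = c("webui, cli, mobile", "cli", "webui", "webui, mobile"),
  stringsAsFactors = FALSE
)

# Split each module string on commas (and any following spaces)
parts <- strsplit(df$module, ",\\s*")

# Repeat each row once per piece, then replace module with the single pieces
out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
out$module <- unlist(parts)
rownames(out) <- NULL
```

If the tidyr package is available, tidyr::separate_rows(df, module, sep = ",\\s*") performs the same kind of expansion.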
I have a data frame with 21 columns; from column 4 onwards they come in pairs of values (numerator and denominator). I want to divide each pair and put the result into the first column of the pair, i.e. I want column 4 to become column 4 divided by column 5, then column 6 to become column 6 divided by column 7, and so on.
I know (or at least can find on Google) how to do this easily enough by referring to the column names, but I would rather refer to the columns by index instead.
It can be done by dividing two equally sized subsets of the data frame. The numerator subset runs from column 4 to the second-to-last column, and the denominator subset from column 5 to the last column; the result is then assigned back to the numerator column indices:
df1[4:(ncol(df1)-1)] <- df1[4:(ncol(df1)-1)]/df1[5:ncol(df1)]
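NOTE: This assumes the columns are of numeric class. It also overwrites the denominator columns (5, 7, ...) with ratios of adjacent columns (column 5 / column 6, and so on), so only the numerator columns 4, 6, ... hold the intended results.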
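If the denominator columns should stay untouched, a variant is to step through the pairs with seq(..., by = 2) so that only numerator columns are selected and overwritten. This is a sketch on a hypothetical 21-column data frame of random positive numbers, assuming columns 4 onwards form complete (numerator, denominator) pairs.

```r
# Hypothetical 21-column data frame: columns 1-3 are left alone,
# columns 4-21 form (numerator, denominator) pairs
set.seed(1)
df1 <- as.data.frame(matrix(runif(3 * 21, min = 1, max = 10), nrow = 3))
orig <- df1

num <- seq(4, ncol(df1) - 1, by = 2)  # numerator columns: 4, 6, ..., 20
den <- num + 1                        # denominator columns: 5, 7, ..., 21

# Divide each numerator column by its own denominator; denominators are not modified
df1[num] <- df1[num] / df1[den]
```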
I would like to divide every number in all columns by 1000, excluding the header row and the first column.
I have tried this code:
TEST2=(TEST[2:503,]/(1000))
But it is not what I am looking for. My dataframe has 503 columns.
Is TEST a data frame? In that case, the header (the column names) won't be divided by 1000 anyway. To select all columns except the first, use the column index (j, the second position inside the brackets), e.g.
TEST[, 2:ncol(TEST)]/1000 # selects every row, and columns 2 through the last
# same thing
TEST[, -1]/1000 # selects every row and every column except the 1st
Or you can select columns by name, etc. (you select columns the same way you are currently selecting rows, just in the second position).
Take a look at ?'[' to learn how to select particular rows and columns.
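TEST[, -1]/1000 returns a new data frame that no longer contains the first column. To keep the first column and divide the others in place, assign the result back to the same subset. A sketch with a small hypothetical TEST:

```r
# Hypothetical stand-in for TEST: an ID column followed by numeric columns
TEST <- data.frame(
  id = c("a", "b"),
  x = c(1000, 2500),
  y = c(500, 4000),
  stringsAsFactors = FALSE
)

TEST2 <- TEST
# Divide every column except the first, keeping the first column as-is
TEST2[, -1] <- TEST2[, -1] / 1000
```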
I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the other two.
I know how to do it when the columns have names, e.g. plot(data$V1, data$V2), but in this case they do not. How do I access each column individually when they have no names?
Thanks
Why not give them sensible names?
names(data)=c("This","That","Other")
plot(data$This,data$That)
That's better than using column numbers: names are meaningful, and if your data changes to have a different number of columns, code that relies on positions may break in several places. Give your data the correct names, and as long as you always refer to data$This, your code will keep working.
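For reference, a file read with read.table(..., header = FALSE) gets the default column names V1, V2, V3, which the names() call then replaces. The sketch below uses a small hypothetical stand-in for the ~10000 x 3 data set and plots the first column against both of the other two on one set of axes, using points() to add the second series.

```r
# Hypothetical stand-in for the headerless data set (default names V1..V3)
set.seed(42)
data <- as.data.frame(matrix(rnorm(30), ncol = 3))
names(data) <- c("This", "That", "Other")

# First column against the second, then add the third in red on the same axes
plot(data$This, data$That, ylim = range(data$That, data$Other),
     xlab = "This", ylab = "That / Other")
points(data$This, data$Other, col = "red")
```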
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it stems from matrix notation: a 4x3 matrix has 4 rows and 3 columns. Thus, when I want to select the element in the 1st row and 3rd column, I could do something like matrix[1,3].
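A quick check of the [row, column] convention on a 4x3 matrix, and the same positional indexing on a data frame built from it (the values are purely illustrative):

```r
# matrix() fills column by column, so column 3 holds 9, 10, 11, 12
m <- matrix(1:12, nrow = 4)

dims <- dim(m)         # 4 rows, 3 columns
elem <- m[1, 3]        # 1st row, 3rd column: 9
third_col <- m[, 3]    # row index left empty selects all rows: 9 10 11 12

# Positional indexing works the same way on a data frame, with or without names
dataset <- as.data.frame(m)
third_col_df <- dataset[, 3]
```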