I have a dataframe which has 15 columns. All values are numeric.
I have a vector of numeric values ranging from 1 to 15. Let's say x = c(5, 7, 2, 8, 13, 5, 6, ...).
From each row of the data frame, I need to get the value from the column whose index is given by the corresponding element of the vector.
For example, using vector x, pull the 5th value from the 1st row, the 7th value from the 2nd row, the 2nd value from the 3rd row, etc.
PS: I'm nowhere in this
For anyone interested:
data[cbind(seq_len(nrow(data)), x)]
where data is our data frame with 15 columns and x has one entry per row.
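To make the matrix-indexing trick concrete, here is a minimal sketch on a toy data frame. The 3-row data and the shortened x are made up for illustration. Since all columns are numeric, indexing the data frame with a two-column matrix behaves like indexing as.matrix(data).

```r
# Toy data (assumed for illustration): 3 rows, 15 numeric columns.
# matrix() fills column-wise, so cell [i, j] holds i + 3 * (j - 1).
data <- data.frame(matrix(1:45, nrow = 3, ncol = 15))
x <- c(5, 7, 2)  # column to pull from rows 1, 2 and 3

# Each row of the index matrix is a (row, column) pair.
idx <- cbind(seq_len(nrow(data)), x)
picked <- data[idx]
print(picked)  # 13 20 6
```

If the columns had mixed types, the same call would coerce everything to character first, so this approach fits best when all columns share a type.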
Related
I have a 93060-by-141 matrix file with filled values.
I need to set some entries to zero according to this pattern: rows 1:65 of column 1; rows 67:131 of column 2; rows 133:197 of column 3; and so on.
Some columns are excluded from this, i.e. the values in columns 10, 20 and 36 stay unchanged.
I think I will need a for loop, but I have no idea how to code the link between rows and columns that this condition describes.
You can change multiple values at once by indexing a matrix with a two-column matrix of indices. Each row of the index matrix gives the row number (first column) and column number (second column) of one cell. For example, mymat[cbind(1:10, 1)] <- 0 would change the first through tenth rows of column 1 to zero. In your case, you could put together several such cbind() statements with a call to rbind(). For example, mymat[rbind(cbind(1:5, 1), cbind(6:10, 2))] <- 0 would change the first through fifth rows of column 1 and the sixth through tenth rows of column 2 to zero. For the example you proposed above, it would be something like this:
mymat[rbind(
  cbind(1:65, 1),
  cbind(67:131, 2),
  cbind(133:197, 3))] <- 0
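Writing out one cbind() per column gets tedious for 141 columns, so the index matrix can also be built programmatically. The sketch below assumes the pattern in the question continues regularly (block j starts at row (j - 1) * stride + 1); zero_blocks is a hypothetical helper name, and the demo uses a small matrix instead of the real one.

```r
# Hypothetical helper: zero one block of rows per column, skipping some columns.
# Assumes the block for column j starts at row (j - 1) * stride + 1
# and covers block_len consecutive rows.
zero_blocks <- function(m, block_len, stride, skip = integer(0)) {
  cols <- setdiff(seq_len(ncol(m)), skip)
  idx <- do.call(rbind, lapply(cols, function(j) {
    start <- (j - 1) * stride + 1
    cbind(start:(start + block_len - 1), j)  # (row, column) pairs for column j
  }))
  m[idx] <- 0
  m
}

# Small demo: 5 columns, blocks of 3 rows every 4 rows, column 2 left unchanged.
m <- matrix(1, nrow = 20, ncol = 5)
res <- zero_blocks(m, block_len = 3, stride = 4, skip = 2)
```

If the regular-pattern assumption holds, the call for the question would be zero_blocks(mymat, block_len = 65, stride = 66, skip = c(10, 20, 36)).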
I have a data frame with 21 columns. Columns 4 onwards are pairs of values (numerator and denominator). I want to divide the two and place the result into the first column of each pair, i.e. I want column 4 to become the result of column 4 divided by column 5, then column 6 to become the result of column 6 divided by column 7, and so on.
I know (or at least can find on Google) how to do this easily enough by referring to the column names, but I would prefer to refer to the column index instead.
It can be done by dividing two equal-sized subsets of the data frame. The numerator is every second column starting from column 4 (4, 6, ..., up to the one before the last), and the denominator is the column right after each of those (5, 7, ..., up to the last). Assign the result back to the numerator columns only, so that the denominator columns are left untouched.
num <- seq(4, ncol(df1) - 1, by = 2)
df1[num] <- df1[num] / df1[num + 1]
NOTE: This assumes the columns are of numeric class.
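A small worked example may help check the indexing. The data frame below is made up: three leading columns followed by two numerator/denominator pairs, with the assumption that pairs sit in columns (4, 5), (6, 7), and so on.

```r
# Toy data (assumed for illustration): 3 leading columns, then 2 pairs.
df1 <- data.frame(
  id = 1:3, grp = c("a", "b", "c"), k = c(10, 20, 30),
  n1 = c(1, 2, 3),    d1 = c(2, 4, 6),
  n2 = c(10, 20, 30), d2 = c(5, 5, 5)
)

# Numerator columns are 4, 6, ...; each denominator is the next column.
num <- seq(4, ncol(df1) - 1, by = 2)
df1[num] <- df1[num] / df1[num + 1]

print(df1$n1)  # 0.5 0.5 0.5
print(df1$n2)  # 2 4 6
```

Stepping by 2 means only the numerator columns are overwritten; the denominator columns keep their original values.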
I have a data frame with three columns, two of which are character and the third is numeric. How do I find the maximum value of the numeric column while also getting the rest of the information from that row?
So far I have:
apply(dataframe, 2, max)
We can use which.max to get the position of the (first) maximum in the third column, then use that position as the row index.
df[which.max(df[, 3]), ]
If there are ties, we can compare (==) the elements of the third column with the max value of that column. This gives a logical vector that can be used as the row index in the same way and returns every tied row.
df[df[, 3] == max(df[, 3]), ]
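The difference between the two approaches shows up only when the maximum is tied. Here is a minimal sketch with made-up data (the column names are assumptions). As a side note, apply(dataframe, 2, max) does not help here because apply() first converts the mixed data frame to a character matrix.

```r
# Toy data (assumed for illustration): two character columns, one numeric.
df <- data.frame(
  name = c("a", "b", "c", "d"),
  grp  = c("x", "y", "x", "y"),
  val  = c(3, 9, 5, 9),
  stringsAsFactors = FALSE
)

# First row holding the maximum (only one row, even with ties)
first_max <- df[which.max(df[, 3]), ]

# All rows holding the maximum
all_max <- df[df[, 3] == max(df[, 3]), ]

print(first_max$name)  # "b"
print(all_max$name)    # "b" "d"
```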
I know how to add a column that is, say, the sum of two other columns, but I'm looking for a way to make a new column that equals the sum of a subset of rows in another column.
For example, I have a table, "table.1" and the third column "table.1[3]" consists of numbers. I want to add a fourth column such that the 1st row of column 4 = the sum of the values in column 3 from row 1 to 100; the 2nd row = sum of column 3 from row 2 to 101, and so on.
Essentially, at row x, I want table.1[x, 4] = sum(table.1[x:(x+99), 3])
Anyone know how I can add a column like that? Thanks.
One way would be to use the embed() function. In this example your "column 3" is named "a" in the data.frame, and I'm adding a column named "b" such that dd$b[x] <- sum(dd$a[x:(x+N-1)]). I'm using a window of N=4 rather than N=100 for simplicity.
dd <- data.frame(a = c(1, 3, 6, 4, 2, 4, 6, 7, 8, 1))
N <- 4
dd$b <- rowSums(embed(c(dd$a, rep.int(0, N - 1)), N))
Note that I padded the end of the dd$a vector so that when the range "goes off the end", I assume those values to be 0.
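To see that the embed() approach really computes a forward-looking window sum, it can be compared against a plain element-by-element version. This is only a sanity-check sketch; via_loop is an illustrative name, and it truncates the window at the end of the vector, which matches padding with zeros.

```r
a <- c(1, 3, 6, 4, 2, 4, 6, 7, 8, 1)
N <- 4

# embed() lays out each window of N consecutive values as one row;
# padding with N - 1 zeros keeps one row per original element.
via_embed <- rowSums(embed(c(a, rep.int(0, N - 1)), N))

# Direct version: sum a[i] .. a[i + N - 1], cut off at the end of the vector.
via_loop <- sapply(seq_along(a), function(i) {
  sum(a[i:min(i + N - 1, length(a))])
})

print(via_embed)  # 14 15 16 16 19 25 22 16 9 1
```

For the original question the same code applies with N <- 100 and a taken from the third column of table.1.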
I have a dataframe with 23000 rows and 8 columns
I want to subset it so that each identifier in column 1 appears only once. So far I do this with:
total_res2 <- unique(total_res['Entrez.ID']);
This produces 17,000 rows with only the information from column 1.
I am wondering how to extract the unique rows, based on this column and also take the information from the other 7 columns using only these unique rows.
This returns the rows of total_res containing the first occurrences of each Entrez.ID value:
subset(total_res, !duplicated(Entrez.ID))
Or, if you meant that you only want rows whose Entrez.ID occurs exactly once:
subset(total_res, ave(seq_along(Entrez.ID), Entrez.ID, FUN = length) == 1)
Next time please provide test data and expected output.
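Since no test data was given, here is a small made-up data frame showing how the two readings differ. Only the Entrez.ID column name comes from the question; the values and the val column are assumptions.

```r
# Toy data (assumed for illustration): IDs 1 and 2 are repeated.
total_res <- data.frame(
  Entrez.ID = c(1, 2, 2, 3, 1, 4),
  val = letters[1:6],
  stringsAsFactors = FALSE
)

# Reading 1: keep the first occurrence of every ID (all columns retained)
first_rows <- subset(total_res, !duplicated(Entrez.ID))

# Reading 2: keep only IDs that occur exactly once
single_rows <- subset(
  total_res,
  ave(seq_along(Entrez.ID), Entrez.ID, FUN = length) == 1
)

print(first_rows$val)   # "a" "b" "d" "f"
print(single_rows$val)  # "d" "f"
```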