How to get only line numbers from output in R?

This is my code, and I would like to get only the line number 2174 as output.
Note that the first output row will always be disregarded, so I only care about the second row and only want to see its line number, in this case 2174.
e[which(e$obs_pval==min(e$obs_pval)),]
snp obs_pval
1 1.852962e-07 1.852962e-07
2174 4.971520e+07 1.852962e-07

More than one row shares the minimum value, so your subset returns every matching row, which is why more than one row is displayed.
Do you always want just the last row when several values match the minimum? If so, you can wrap the call in tail():
tail(e[which(e$obs_pval == min(e$obs_pval)),], 1)
To just get the index:
tail(which(e$obs_pval == min(e$obs_pval)), 1)
or:
which(e$obs_pval == min(e$obs_pval))[length(which(e$obs_pval == min(e$obs_pval)))]
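The difference between returning the row and returning its index can be checked on a small example. This is only a sketch: the data frame below is made up for illustration and is not the asker's data.

```r
# Hypothetical data: rows 1 and 4 share the minimum obs_pval
e <- data.frame(
  snp      = c(0.5, 12, 7, 42),
  obs_pval = c(0.01, 0.20, 0.05, 0.01)
)

# All positions that hold the minimum value
hits <- which(e$obs_pval == min(e$obs_pval))

# Only the last matching position
last_hit <- tail(hits, 1)
print(last_hit)
```

Note that which() returns positions. These match the row labels printed by R only when the data frame still has its default row names.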

Related

Creating column based on values of other columns in R

I am trying to create a new column ($Correct) in a data frame based on values in two other columns ($Condition and $Response).
I realise that there are multiple ways of achieving this (I have since used another method), but I'm interested in the reason why the method below did not work.
training_data.df$Correct<- 0
training_data.df$Correct[training_data.df$Condition==2 & training_data.df$Response==1] <- 1
training_data.df$Correct[(training_data.df$Condition==1|3) & training_data.df$Response==2] <- 1
This method produces the correct values in the output (the new $Correct column), except for cases where $Condition==2 and $Response==2 (the value '1' prints in the $Correct column rather than '0').
This line of code works correctly on its own, but not in combination with the other (last) line for $Condition==1|3.
Can anyone explain why this occurs?
training_data.df$Condition==1|3
reads as:
"(training_data.df$Condition is equal to 1)"
or
"three".
"(training_data.df$Condition is equal to 1)" can be TRUE or FALSE.
"three" is not a comparison at all: as a nonzero number it is coerced to TRUE, so the whole expression is always TRUE.
Whereas what I think you mean is:
"training_data.df$Condition is equal to (either 1 or 3)".
This would be (training_data.df$Condition==1 | training_data.df$Condition==3) or training_data.df$Condition %in% c(1,3).
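The precedence issue is easy to verify on a short vector. The values in `condition` below are hypothetical and stand in for the $Condition column.

```r
# Hypothetical Condition values
condition <- c(1, 2, 3, 2)

# Parsed as (condition == 1) | 3, so every element comes out TRUE
wrong <- condition == 1 | 3

# Compare against each value explicitly, or use %in%
right_or <- condition == 1 | condition == 3
right_in <- condition %in% c(1, 3)

print(wrong)
print(right_in)
```

Because `wrong` is TRUE everywhere, the third assignment in the question effectively reduces to "Response == 2", which is why rows with Condition 2 and Response 2 were also set to 1.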

R programming- adding column in dataset error

cv.uk.df$new.d[2:nrow(cv.uk.df)] <- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1) # this line of code works
I wanted to know why we use -1 in tail() and -1 in head() to create this new column.
I tried to understand it by removing the -1, and R (I am running the code in RStudio) throws an error.
Could anyone shed some light on this? I can't explain how much I would appreciate it.
Look at what is being done. On the left-hand side of the assignment operator, we have:
cv.uk.df$new.d[2:nrow(cv.uk.df)] <-
Let's pick this apart.
cv.uk.df # This is the data.frame
$new.d # a new column to assign or a column to reassign
[2:nrow(cv.uk.df)] # the rows which we are going to assign
Specifically, this line of code will assign a new value to all rows of this column except the first. Why would we want to do that? We don't have your data, but from your example, it looks like you want to calculate the change from one row to the next. That calculation is undefined for the first row (there is no previous row).
Now let's look at the right-hand side.
<- tail(cv.uk.df$deaths, -1) - head(cv.uk.df$deaths, -1)
The cv.uk.df$deaths column has the same number of rows as the data.frame. R gets grouchy when the numbers of elements don't follow some rules. For data.frames, the right-hand side needs to have the same number of elements as the target, or a number that can be recycled a whole number of times. For example, if you have 10 rows, you need a replacement of 10 values. Or you can have 5 values that R will recycle.
If your data.frame has 100 rows, only 99 are being replaced in this operation. You cannot feed 100 values into an operation that expects 99. We need to trim the data. Let's look at what is happening. The tail() function has the usage tail(x, n), where it returns the last n values of x. If n is a negative integer, tail() returns all values but the first n. The head() function works similarly.
tail(cv.uk.df$deaths, -1) # This returns all values but the first
head(cv.uk.df$deaths, -1) # This returns all values but the last
This makes sense for your calculation. You cannot subtract the number of deaths in the row before the first row from the number in the first row, nor can you subtract the number of deaths in the last row from the number in the row after the last row. There are more intuitive ways to do this thing using functions from other packages, but this gets the job done.
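As a sketch of the idea, the tail()/head() subtraction can be compared with base R's diff(), which computes the same row-to-row differences. The `deaths` vector here is hypothetical and stands in for cv.uk.df$deaths.

```r
# Hypothetical cumulative death counts
deaths <- c(10, 12, 15, 21)

# The approach from the question: all-but-first minus all-but-last
by_tail_head <- tail(deaths, -1) - head(deaths, -1)

# Base R's diff() computes the same lagged differences
by_diff <- diff(deaths)

# Pad with NA so the result lines up with the original rows
new_d <- c(NA, by_diff)
print(new_d)
```

Padding with NA gives a vector as long as the data, so it could be assigned to a whole column without the [2:nrow(...)] index.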

In R: How do I increment a column value based on multiples of a certain value in the adjacent column

I'm quite new to R, unfortunately I wasn't able to find help in other related questions so far.
I have this dataframe called selection, including column 'RUN' and column 'TRNO'.
It originally had 9 columns. I added the column 'RUN' which contains a count that increases by 1 whenever the value in the column 'DAP' is 0, using this code:
# Insert column RUN in "selection" dataframe
library(dplyr)
selection$RUN <- cumsum(selection$DAP == 0)
That worked perfectly. Now I would like to do a similar operation for the column 'TRNO'. It also needs to contain a count that this time only increases when the column 'RUN' arrives at multiples of 80 (i.e. from RUN == 1-80 --> count =1; RUN == 81-160 --> count =2,...)
I tried several codes, amongst others this one:
# Insert column TRNO in "selection" dataframe
i = 0
repeat{
i = i+80
selection$TRNO <- cumsum(selection$RUN == i)
break
}
Instead of increasing the count at every multiple of 80, it returns "0" when RUN values are between 1-80, increases to 92 when RUN values are at 80, and then stagnates at 92 for all the higher values in RUN.
Try this:
selection$TRNO <- ceiling(selection$RUN/80)
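The ceiling() approach can be checked at the group boundaries with a few hypothetical RUN values. It also helps to see why the loop in the question misbehaved: the unconditional break ends the repeat after one pass, so only cumsum(selection$RUN == 80) is ever computed, which presumably counts the rows where RUN equals 80.

```r
# Hypothetical RUN values around the boundaries of each block of 80
RUN <- c(1, 80, 81, 160, 161)

# Runs 1-80 map to 1, runs 81-160 map to 2, and so on
TRNO <- ceiling(RUN / 80)
print(TRNO)
```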

Using logical functions and rowSums together

I am trying to understand an R code I have inherited (see below).
sel <- which(rowSums(m3T3L1mRNA.tmp[,c(2,4)] == 20) != 2)
The output of this code essentially excludes all rows from this table (there are thousands of rows, only the first 5 have been shown) that have the value 20 (which in this table equates to NAs).
The code works fine, but I am having trouble interpreting the code. As I understand the code is asking to get the rowSum of rows that contain a value of "20" at columns 2 and 4 (which is 40) and select ones that do not sum up to 2.
Where does the value 2 come from? Shouldn't it be as below for the code to work as I think it should?
sel <- which(rowSums(m3T3L1mRNA.tmp[,c(2,4)] == 20) != 40)
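One way to read the inherited line is that the comparison runs first, so rowSums() adds up TRUE/FALSE values rather than the numbers themselves. The sketch below uses a small hypothetical matrix `m` in place of m3T3L1mRNA.tmp.

```r
# Hypothetical matrix; 20 stands in for a missing value
m <- matrix(c( 5, 20,  7,  3,
              20, 20, 20, 20,
               3,  1,  4,  2),
            nrow = 3, byrow = TRUE)

# The comparison gives a logical matrix for columns 2 and 4
is20 <- m[, c(2, 4)] == 20

# rowSums() counts the TRUEs per row, so the result is 0, 1 or 2
counts <- rowSums(is20)

# Keep the rows where not both columns equal 20
sel <- which(counts != 2)
print(sel)
```

Under this reading, the 2 is the number of columns checked, not a sum of the stored values. Only rows where both column 2 and column 4 equal 20 are dropped, and comparing against 40 would never exclude anything because the count cannot exceed 2.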

Grouping price ranges

I am trying to group some price ranges from an .ods file, but have no idea how to do that.
e.g. I have a column with different prices like this:
11,61
6,15
13,68
7,69
6,00
What I want is to tell Calc to group everything from 0,00~10,99 and output text 0-10 and everything from 11,00~20,00 and output text 11-20, so the final output would be:
col1 col2
11,61 11-20
6,15 0-10
13,68 11-20
7,69 0-10
6,00 0-10
You can use the functions ROUNDDOWN() and ROUNDUP() with a negative count to get the previous or next multiple of 10 (-1), 100 (-2) or 1000 (-3). A negative count reduces the precision of the value by powers of 10. So, rounding to the previous or next multiple of 10 is done using:
=ROUNDDOWN(<yourvalue>; -1)
and
=ROUNDUP(<yourvalue>; -1)
respectively (take care to change the formula argument separators to commas (,) if the locale you're using requires it).
So, =ROUNDDOWN(11,61; -1) will result in 10, and =ROUNDUP(11,61; -1) will give you 20. This way, you can "calculate" the appropriate group for each value (example for value in A1):
=CONCATENATE(ROUNDDOWN($A1; -1)+1;"-";ROUNDUP($A1;-1))
To split it up on multiple lines:
=CONCATENATE( # Result will be a concatenated string
ROUNDDOWN($A1;-1)+1; # first value: previous multiple of 10, +1;
"-"; # second value: literal "-"
ROUNDUP($A1;-1) # third value: next multiple of 10
)
With your example data, this results in:
EDIT:
For a grouping 0-9, 9-19 and so on, the following formula should work:
=CONCATENATE(ABS(ROUNDDOWN($A2+1; -1)-1);"-";ROUNDUP($A2+1,01;-1)-1)
EDIT2:
For a solution using the IF() function, you could use:
=IF(A2 < 9;"0-9";IF(A2 < 19; "9-19";IF(A2 < 29; "19-29";"more than 29")))
To group values greater than 29, you will have to add corresponding IF clauses, replacing the string "more than 29" with additional checks. Every grouping range requires its own IF clause.
