Compute character position over a data.frame - R

How do you change the values in a row depending on the position of a specific character?
For each row, I want to replace all NA values that come BEFORE S with 0. After this specific character S, the NAs in the row have to be kept.
S marks the end of the data in each row.
Before S: NA should be a value (in fact, a zero value).
After S: NA stays NA, no values at all.
An example data frame is available here: dataframe.txt
I've tried this loop:
for (i in 1:length(df)) {
  x <- pos = 's'
  y <- pos = i
  if (y < x) {
    if (y == "NA") {
      replace(y, 0)
    }
  }
}
Maybe with the which() function?
Thanks for your ideas on this!
Alex,

This code will replace all NAs before "S" with 0 in your vector:
initial_row <- c(1,2,4,NA,4,NA,2,"S",NA,NA,NA)
result_row <- initial_row
result_row[is.na(result_row) & seq_along(result_row) < which(result_row == "S")[1]] <- 0
Explanation: First we copied the initial row into the result row that we will do the work on. Then we selected the NAs in the result row that are between position 1 and the position of the "S". Those values get replaced with zero.
Important assumptions:
The vector is at least length 2.
The vector contains an "S"
Loop version
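If you insist on using a loop (it will run slower), you can do this. The NA check has to come first, because NA == "S" evaluates to NA and if() cannot handle a missing value:
for(i in seq_along(result_row)){
  if(is.na(result_row[i])){
    result_row[i] <- 0        # NA before "S": replace with 0
  } else if(result_row[i] == "S"){
    break                     # reached the end-of-data marker
  }
}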
Edit: If your vector contains the character string "NA" instead of NA (which R recognizes as a missing element), this code needs to be modified as follows:
result_row[result_row == "NA" & seq_along(result_row) < which(result_row == "S")[1]] <- 0
or
for(i in 1:length(result_row)){
if(result_row[i] == "S"){
break
}
if(result_row[i] == "NA"){
result_row[i] <- 0
}
}
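The code above works on a single vector, while the question asks about every row of a data frame. A minimal sketch of applying the same rule row by row with apply() follows; fill_before_s and the small df are hypothetical stand-ins for the contents of dataframe.txt, and it assumes all columns are character (apply() converts the data frame to a matrix, and numeric columns could be reformatted along the way) and that rows without an "S" should be left unchanged.

```r
# Hypothetical helper: replace NAs that occur before the first "S" with 0.
fill_before_s <- function(row) {
  s_pos <- match("S", row)              # position of the first "S" (NA if absent)
  if (is.na(s_pos)) return(row)         # assumption: rows without "S" stay as-is
  before <- seq_along(row) < s_pos      # TRUE only for positions before "S"
  row[before & is.na(row)] <- 0         # stored as "0" in a character row
  row
}

# Hypothetical example data (all columns character)
df <- data.frame(
  v1 = c("1", NA, "3"),
  v2 = c(NA, "S", "5"),
  v3 = c("S", NA, NA),
  v4 = c(NA, NA, "S"),
  stringsAsFactors = FALSE
)

# apply() returns one column per row, so transpose back before rebuilding
result <- as.data.frame(t(apply(df, 1, fill_before_s)), stringsAsFactors = FALSE)
names(result) <- names(df)
result
```

In row 3, for example, the NA in v3 sits before the "S" in v4 and becomes "0", while in row 1 the NA in v4 comes after the "S" and is kept.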

Related

Using if & else if statements within a loop (for) for Conditional formatting

I have a matrix with 52 columns and 1290 rows that contains the p values and coefficients from a loop I have successfully run.
In this instance, I want to conditionally format one variable's coefficient (r) value (column 3) by appending asterisks to it, depending on the associated p value (column 19).
I have not been able to append "***" when a p value in exponential notation is present. I have also tried setting options(scipen=999), but to no avail. I have been able to isolate the 'e' in the p value using str_sub() and stri_sub(), so I could use an if statement if I can get the ifs to work correctly within the loop:
# p<0.001 (append "***") # p<0.01 (append "**") # p<0.05 (append "*")
The following statement works correctly in isolation but fails when I put it inside a for() loop:
# Working asterisk append to coefficient value
if(as.numeric(cell_id_input) < 0.05) {
cell_id_out <- paste0(cell_id_out,"*", sep="")
}
Loop over each row of matrix 'dta', using only columns 3 and 19, and append asterisks if the following criteria are met:
for (j in 1:nrow(dta)){
cell_id_input <- dta[j,19] # Column that contains associated (p) values for 'cell_id_out'
cell_id_out <- dta[j,3] { # Column that contains the coefficient (r) value associated with 'cell_id_input'
if(as.numeric(cell_id_input) < 0.001) {
cell_id_out <- paste0(cell_id_out,"***", sep="")
} else if (as.numeric(cell_id_input) < 0.01) {
cell_id_out <- paste0(cell_id_out,"**", sep="")
} else if(as.numeric(cell_id_input) < 0.05) {
cell_id_out <- paste0(cell_id_out,"*", sep="")
} else {
cell_id_out <- paste(cell_id_out)
}
df=data.frame("a"=c(0.1,0.6,0.00000000001,0.000000002),
"b"=c(1,4,3,5),stringsAsFactors = FALSE)
df$a = as.character(df$a)
for (i in 1:nrow(df)){
if (as.numeric(df$a[i])<0.05){
df$a[i]=paste0(df$a[i],"*")
}
}
a b
1 0.1 1
2 0.6 4
3 1e-11* 3
4 2e-09* 5
This works for me, and you can adapt it to your case. Your problem is that you assign the result to the cell_id_out variable but never write it back into dta (e.g. dta[j,3] <- cell_id_out at the end of each iteration).
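The loop can also be avoided entirely: a nested ifelse() picks the suffix for every p value at once. The sketch below uses a hypothetical helper add_stars and made-up example values; it assumes the p values contain no NA (an NA p value would produce an "NA" suffix).

```r
# Hypothetical helper: append significance stars to coefficients in one step.
add_stars <- function(coef, p) {
  p <- as.numeric(p)                  # also handles text such as "1e-11"
  stars <- ifelse(p < 0.001, "***",
           ifelse(p < 0.01,  "**",
           ifelse(p < 0.05,  "*", "")))
  paste0(coef, stars)
}

# Made-up example values
coefs <- c("0.52", "0.31", "0.12", "0.05")
pvals <- c("1e-11", "0.004", "0.03", "0.6")
starred <- add_stars(coefs, pvals)
starred

# Applied to the matrix from the question (not run here):
# dta[, 3] <- add_stars(dta[, 3], dta[, 19])
```

Note that if dta is a numeric matrix, writing character values into column 3 converts the whole matrix to character.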

R: Error in if argument is of length zero

I have a vector like this:
x <- c(0.9,0.9,0,0,0.9,0,0.8)
I want to eliminate all the zeros and create a new vector from it, so I have created this if statement:
if (x[i] == 0) {
y <- x[-(i)]}
But I get the following error:
Error in if (x[i] == 0) { : argument is of length zero
Does anyone have a solution?
Thanks in advance!
We don't need a for loop with if/else. This can be done simply with vectorization:
y <- x[x != 0]
Create a logical vector with the expression x != 0, use it to subset the original vector with square brackets (see ?Extract), and assign the resulting vector to y.
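The question does not show how i was set, but one way to get exactly this error is an index of 0 (or any empty index): x[0] is a zero-length vector, so x[0] == 0 is logical(0) and if() has nothing to test. The sketch below reproduces that under this assumption and then applies the vectorized fix.

```r
x <- c(0.9, 0.9, 0, 0, 0.9, 0, 0.8)

# Assumption: i was 0, so x[i] is numeric(0) and the condition is empty
msg <- tryCatch(
  if (x[0] == 0) "zero",
  error = function(e) conditionMessage(e)
)
msg

# Vectorized fix: keep only the non-zero elements
y <- x[x != 0]
y
```

Removing elements inside a loop would also shift the positions of the remaining elements, which is another reason to prefer the single subsetting step.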

Extract data one row below based on a specific condition

I have a very large data set (~1 GB) and would like to extract summary data with the following condition:
for loop:
if(a[i] == 999) then extract b[i+1]
else next
so that I can then call table() on the extracted b values to find their distribution/composition. Column b is of class character and column a is of class integer.
My R code:
summary123 <- data.frame()
j = 1
k = 1
for(i in 1:nrow(df1)){
if(df1$a[i] == 999 & i != nrow(df1)){
j = i + 1
summary123[k,1] <- df1$b[j]
k = k + 1
}
else{
next
}
}
However, it takes a long time. I would like a faster R equivalent.
Use lead from dplyr:
library(dplyr)
output <- lead(df1$b, 1)[df1$a == 999 & seq_len(nrow(df1)) < nrow(df1)]
table(output)
(The second condition skips the last row, which has no row below it; lead() fills that position with NA.)
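If you would rather avoid the dplyr dependency, the same extraction can be sketched in base R with which(). The small df1 below is a hypothetical stand-in for the real 1 GB data, with a as integer and b as character as described in the question.

```r
# Hypothetical stand-in for the real data
df1 <- data.frame(
  a = c(1L, 999L, 5L, 999L, 2L, 999L),
  b = c("x", "y", "z", "y", "w", "v"),
  stringsAsFactors = FALSE
)

idx <- which(df1$a == 999)     # rows where a == 999
idx <- idx[idx < nrow(df1)]    # the last row has no row below it
output <- df1$b[idx + 1]       # b taken from the row just below each match
counts <- table(output)
counts
```

Because idx + 1 is computed once for all matches, this avoids the growing data.frame and per-row assignment that make the original loop slow.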

R: Simple Function with For Loop

I have an elementary question that I sadly cannot figure out. I have a set of numeric vectors of 1s and 0s; one is stored in the return variable below, and the sums of ten such vectors are stored in the totals variable. I would like to check each of these individual vectors for consecutive zeroes and then return the total number of times this occurred. However, I'm quite rusty at for loops and functions and cannot get this result. My latest attempt is below. Any suggestions are welcome; I appreciate the help.
set.seed(1)
return = ifelse(runif(10) <= 0.6, 1, 0)
totals = sapply(1:10, function (x) sum(ifelse(runif(10)<=0.6,1,0)))
sums = function (x) {
g = 0
for (i in 1:length(x)-1) {
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Although this is not the most efficient way to do so (see akrun's answer), we can get your for loop to work:
sums=function (x)
{
g=0
# watch your brackets! 1:3-1 returns c(0,1,2), not c(1,2)!
for (i in 1:length(x)-1)
{
# To test for equality, use a double ==, rather than a single.
# also, your 'g' variable is not updated, which is what you want to do.
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Corrected:
sums <-function(x)
{
g=0
for (i in 1:(length(x)-1))
{
g= g+ifelse(x[i]+x[i+1]==0,1,0)
}
return (g)
}
You can call your function by:
return=ifelse(runif(10)<=0.6,1,0)
sums(return)
Or to generate ten vectors with random 1's and 0's, and apply your function to them, you could do:
totals = lapply(1:10, function (x) ifelse(runif(10) <= 0.6, 1, 0))
sapply(totals,sums)
Hope this helps!
If we are looking for runs of consecutive 0's (i.e. runs longer than 1) and their lengths, use rle:
with(rle(return), lengths[values==0 & lengths > 1])
#[1] 4
The return vector is
return
#[1] 1 1 1 0 1 0 0 0 0 1
Here we can see the run of 4 consecutive 0's, which shows that the answer matches the initial vector.
A for loop (it counts adjacent pairs of zeros, so it returns 3 for this vector rather than the run length of 4; shown just for the sake of answering):
sums <- function (x) {
g <- 0
for (i in tail(seq_along(x), -1)) {
if(x[i-1]==0 & x[i]==0) {
g <- g+1
}
}
g
}
sums(return)
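"The number of times consecutive zeroes occurred" can mean either the number of adjacent 0-0 pairs (what the loops count) or the number of separate runs of two or more zeros. Both can be computed without a loop; this is a sketch using the return vector printed above, with pairs and runs as illustrative names.

```r
x <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)   # the 'return' vector shown above

# Adjacent 0-0 pairs: compare each element with its successor
pairs <- sum(head(x, -1) == 0 & tail(x, -1) == 0)

# Separate runs of two or more zeros
r <- rle(x)
runs <- sum(r$values == 0 & r$lengths > 1)

c(pairs = pairs, runs = runs)
```

The single run of four zeros contains three adjacent pairs, so pairs is 3 and runs is 1. Either expression can be wrapped in a function and passed to sapply() over the list of vectors, as in the earlier answer.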

Check if something is in each row (row length>1)

Basically, I have a matrix, and for each row that contains "A" I want to append a "1" to a list; otherwise I want to append a "0".
The code is as follows:
is.there.A <- function(a,b,c,d,e) {
library(combinat)
x <- c(a,b,c,d,e)
matrix <- matrix(combn(x,3), ncol=3, byrow=T)
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if (matrix[i,] %in% "A") {c(list, "1")}
else {c(list, "0")}
print(list)
}
}
But it doesn't work, and this warning shows up:
Warning messages:
1: In if (matrix[i, ] %in% "A") { :
the condition has length > 1 and only the first element will be used
How can I overcome this to achieve the objective?
You can avoid the explicit loop by using apply:
is.there.A <- function(a,b,s,d,e) {
library(combinat)
x <- c(a,b,s,d,e)
.matrix <- matrix(combn(x,3), ncol=3, byrow=T)
any_A <- apply(.matrix, 1, `%in%`, x = 'A')
as.list(as.numeric(any_A))
}
Never grow an object within a for loop, pre-allocate then fill.
Avoid naming objects after functions (e.g. c, matrix or list).
You meant to test for "A" %in% matrix[i,], not the other way around. However, note that
row <- nrow(matrix)
list <- list()
for (i in seq(row)) {
if ("A" %in% matrix[i,]) {c(list, "1")}
else {c(list, "0")}
}
can be rewritten as
rowSums(matrix == "A") > 0
It returns a vector of logicals (TRUE/FALSE) which is the most appropriate output for your function. However, if you really need a list of '1' or '0', you can wrap it as follows:
as.list(ifelse(rowSums(matrix == "A") > 0, "1", "0"))
Also note that it is a bad idea to name an object matrix since it is also the name of a function in R.
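To see the vectorized rowSums() version end to end, here is a small sketch with hypothetical inputs "A" to "E" standing in for a to e. It uses combn() from base R's utils package rather than combinat, and the name m avoids shadowing matrix().

```r
x <- c("A", "B", "C", "D", "E")     # hypothetical inputs a..e

# One combination of 3 per row
m <- matrix(combn(x, 3), ncol = 3, byrow = TRUE)

has_A <- rowSums(m == "A") > 0      # TRUE where the row contains "A"
has_A

as.list(ifelse(has_A, "1", "0"))    # list of "1"/"0" if really needed
```

Of the 10 combinations, the 6 that include "A" come first in combn()'s ordering, so has_A is six TRUE values followed by four FALSE values.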
