Calculate count of number of switch in vector - r

I have a vector in which I have to count how many times the data switched from 0 to 100 and back to 0. An example is given below.
Input
X1<-c(100,100,100,0,0,0,0,0,100,100,100,100,100,0,0,0,0,100,100,100,0,0,100,100)
So the output should be 3, as the value started at 0, stayed at 100 for some time, and went back to 0. My requirement is to count how many times this switch has occurred. I am aware of rle, but that only gives me the lengths.
Thanks in advance for the help.

This looks sufficient
sum(X1[-1] != X1[-length(X1)]) / 2
The assumptions are that:
- You only have two unique values in X1.
- The last element of X1 equals the first element, that is, it switches back to the original state in the end.

You can do something like,
sum(diff(X1) == 100)
#[1] 3
#Or
min(sum(diff(X1) == 100), sum(diff(X1) == -100))
#[1] 3

You could run rle and then iterate through three elements of values at a time to see if the required condition has been met.
with(rle(X1),
     sum(sapply(3:length(lengths), function(i)
       values[i-2] == 0 & values[i-1] == 100 & values[i] == 0)))
#[1] 2
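The answers above return different numbers because they count different things. As a rough sketch (the names rises, falls, and complete_cycles are illustrative, not from the original posts), the three quantities can be compared side by side on the question's data:

```r
X1 <- c(100,100,100,0,0,0,0,0,100,100,100,100,100,0,0,0,0,100,100,100,0,0,100,100)

# Number of 0 -> 100 switches and of 100 -> 0 switches
rises <- sum(diff(X1) == 100)
falls <- sum(diff(X1) == -100)

# Number of complete 0 -> 100 -> 0 cycles, vectorised over the run values
v <- rle(X1)$values
n <- length(v)
complete_cycles <- if (n >= 3) {
  sum(v[1:(n - 2)] == 0 & v[2:(n - 1)] == 100 & v[3:n] == 0)
} else {
  0
}

rises            # 3
falls            # 3
complete_cycles  # 2
```

Here the vector starts and ends at 100, so there are three 100 -> 0 -> 100 cycles but only two 0 -> 100 -> 0 cycles; which count is wanted depends on how "switch" is defined.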

More generally, for counting switches among n states (numeric or character):
count_switches_groups <- function(seq.input) {
  COUNT <- 0
  transition <- rep("no switch", length(seq.input))
  # seq_along(...)[-1] is empty for inputs of length 0 or 1, unlike 2:length(...)
  for (i in seq_along(seq.input)[-1]) {
    if (seq.input[i] != seq.input[i - 1]) {
      COUNT <- COUNT + 1
      transition[i] <- paste0("from ", seq.input[i - 1], " to ", seq.input[i])
    }
  }
  total_switches <- COUNT
  state_transitions <- transition[transition != "no switch"]
  occurrences <- as.data.frame(table(state_transitions))
  return_list <- list(total_switches, occurrences)
  names(return_list) <- c("total_transitions", "unique_switches")
  return(return_list)
}
count_switches_groups(X1)

sum((np.diff(x)==100)|(np.diff(x)==-100))
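I think this would be the answer; it worked for me. (Note that this is NumPy syntax rather than R, and it counts every change in either direction.)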

Related

Referencing Variable Row Number in R

I am new to R and would appreciate your help with the following question:
I have code that runs through all of the values (x) in a column of a dataset called m, comparing them one by one to a fixed value via a for loop. I'd like for x to be compared to my fixed value (0.17) ONLY IF the cell in m[(SAME ROW AS x), "reference_column_name"] contains a certain string.
The goal is to add, at the end of m, a column of values 0, 1, 2, or 3 based on the comparison of x and on the cell from the reference column with the same row number as x. Something like this:
new_column
0
2
2
3
1
1
2
0
3
How do I refer to the row of x (as my variable is changing as the for loop continues)?
With what can I replace "(SAME ROW AS x)"?
this is my code:
m$new_column <- 0 #I start by assigning everything the value 0.
for (x in m$current_column) {
if ((grepl("string",((m[(SAME ROW AS x),"reference_column_name"])),fixed=TRUE))==TRUE){
if (is.na(x)){
m$new_column<-0
}
else if (x <= 0.17) {
m$new_column<-1}
else if (x > 0.17) {
m$new_column<-2}
}
else {m$new_column<-3}
}
I have changed all of the variable and column names to make reading this question easier - I am aware that names should be shorter.
Thanks for your help!
As per my understanding of your question, here is my solution:
m$new_column <- ifelse(grepl("string", m$ref_column), ifelse(is.na(m$x), 0, ifelse(m$x <= 0.17, 1, 2)), 3)
This code first checks for the string in the reference column of the same row. If the string is not found, the result is 3. If it is found, evaluation moves on to the 2nd ifelse block.
- In this block, it first checks for NA and assigns 0; otherwise it moves on to the 3rd ifelse block, where it finally checks whether your "x" column is 0.17 or less and assigns 1, else 2.
Hope this helps.
Could use a series of properly indexed assignments:
dat <- data.frame( x=runif(20), ref_col=sample( c("string", "not string"), 20, replace=TRUE) )
dat$new_col[dat$x > 0.17 & dat$ref_col=="string"] <- 2
dat$new_col[dat$x <= 0.17 & dat$ref_col=="string"] <- 1
dat$new_col[ is.na(dat$x)] <- 0
dat$new_col[ dat$ref_col != "string"] <- 3
dat
I didn't have any NAs in my x's, but I predict they would have been properly assigned.
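To answer the literal question of how to refer to the row of x: one option is to loop over row indices instead of over the values, so the same index i addresses both columns. A minimal sketch, assuming a small made-up data frame m with the column names used in the question:

```r
# Hypothetical example data using the question's column names
m <- data.frame(
  current_column = c(0.10, 0.50, NA, 0.05),
  reference_column_name = c("has string", "has string", "has string", "other"),
  stringsAsFactors = FALSE
)

m$new_column <- 0
for (i in seq_len(nrow(m))) {
  x <- m$current_column[i]  # value in row i
  if (grepl("string", m$reference_column_name[i], fixed = TRUE)) {
    if (is.na(x)) {
      m$new_column[i] <- 0
    } else if (x <= 0.17) {
      m$new_column[i] <- 1
    } else {
      m$new_column[i] <- 2
    }
  } else {
    m$new_column[i] <- 3
  }
}

m$new_column  # 1 2 0 3
```

Note the assignment to m$new_column[i] rather than m$new_column; assigning to the whole column overwrites every row on each iteration. The vectorised answers above avoid the loop entirely and will usually be faster.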

How to drop a buffer of rows in a data frame around rows of a certain condition

I am trying to remove rows in a data frame that are within x rows after rows meeting a certain condition.
I have a data frame with a response variable, a measurement type that represents the condition, and time. Here's a mock data set:
data <- data.frame(rlnorm(45,0,1),
c(rep(1,15),rep(2,15),rep(1,15)),
seq(
from=as.POSIXct("2012-1-1 0:00", tz="EST"),
to=as.POSIXct("2012-1-1 0:44", tz="EST"),
by="min"))
names(data) <- c('Variable','Type','Time')
In this mock case, I want to delete the first 5 rows in condition 1 after condition 2 occurs.
The way I thought about solving this problem was to generate a separate vector that determines the distance that each observation that is a 1 is from the last 2. Here's the code I wrote:
dist = vector()
for(i in 1:nrow(data)) {
  if(data$Type[i] != 1) dist[i] <- 0
  else {
    position = i
    tempcount = 0
    while(position > 0 && data$Type[position] == 1){
      position = position - 1
      tempcount = tempcount + 1
    }
    dist[i] = tempcount
  }
}
This code will do the trick, but it's extremely inefficient. I was wondering if anyone had some cleverer, faster solutions.
If I understand you correctly, this should do the trick:
criteria1 = which(data$Type[2:nrow(data)] == 2 & data$Type[2:nrow(data)] != data$Type[1:(nrow(data)-1)]) + 1
criteria2 = as.vector(sapply(criteria1,function(x) seq(x,x+5)))
data[-criteria2,]
How it works:
criteria1 contains the indices where Type==2 but the previous row has a different type. The strange-looking subsets like 2:nrow(data) are there because we want to compare each row to the previous one, and the first row has no previous row. Therefore we add +1 at the end.
criteria2 contains sequences running from each index in criteria1 to that index + 5.
The third line performs the subset.
This might need small modification, I wasn't exactly clear what criteria 1 and criteria 2 were from your code. Let me know if this works or you need any more advice!
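Read literally, the question asks to drop the first 5 Type 1 rows that follow a Type 2 block. A minimal sketch of that reading, using rle to label runs (the names buffer, run_id, pos_in_run, prev_type, and trimmed are illustrative, not from the original posts):

```r
set.seed(1)
data <- data.frame(
  Variable = rlnorm(45, 0, 1),
  Type = c(rep(1, 15), rep(2, 15), rep(1, 15))
)

buffer <- 5                                             # rows to drop after each Type 2 block
runs <- rle(data$Type)
run_id <- rep(seq_along(runs$lengths), runs$lengths)    # which run each row belongs to
pos_in_run <- sequence(runs$lengths)                    # position of each row within its run
prev_type <- c(NA, head(runs$values, -1))[run_id]       # type of the preceding run

# Drop Type 1 rows that sit within the first `buffer` rows of a run preceded by Type 2
drop <- data$Type == 1 & !is.na(prev_type) & prev_type == 2 & pos_in_run <= buffer
trimmed <- data[!drop, ]
```

With the mock data this removes rows 31 to 35 and leaves 40 rows. Because it avoids the nested while loop, the work stays roughly proportional to the number of rows.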

Invert sign of even numbered rows in r dataframe

I have a data frame with 10 items and I want to negate the even numbered rows. I came up with this monstrosity:
change_even <- data.frame(val=runif(10))
change_even$val[row( as.matrix(change_even[,'val']) ) %% 2 == 0 ] <- -change_even$val[row( as.matrix(change_even[,'val']) ) %% 2 == 0 ]
Is there a better way?
Simply you can use recycling:
change_even$val*c(1,-1)
#[1] 0.1077468 -0.5418167 0.8319609 -0.7230043 0.6649786 -0.7232669
#[7] 0.2677659 -0.4035824 0.6880934 -0.5600653
(values are not reproducible since seed was not set; however the alternating sign can be seen clearly).
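For a reproducible check, a seed can be set first. The sketch below (the seed value and the names even_rows and negated are arbitrary choices) negates the even rows with a logical index and compares the result with the recycling approach:

```r
set.seed(42)
change_even <- data.frame(val = runif(10))

# Logical index that is TRUE for rows 2, 4, 6, ...
even_rows <- seq_len(nrow(change_even)) %% 2 == 0

negated <- change_even$val
negated[even_rows] <- -negated[even_rows]

# Same result as recycling c(1, -1) along the column
all.equal(negated, change_even$val * c(1, -1))
```

The logical index also works unchanged when the number of rows is odd, whereas recycling c(1, -1) then triggers a warning about the longer object length not being a multiple of the shorter one.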
You can simply do,
change_even[c(FALSE,TRUE),] <- change_even[c(FALSE,TRUE),]*(-1)
With a data.table you can do something similar to the data.frame approach. See also: Selecting multiple odd or even columns/rows for dataframe in R
library(data.table)
change_even <- data.table(val=runif(10))
even_indexes<-seq(2,nrow(change_even),2)
change_even <- change_even[even_indexes,val:=val*-1]
Use the remainder operator to find the even numbered rows, then simply negate
change_even <- data.frame(val=runif(10))
change_even[seq(nrow(change_even)) %% 2 != 1,] = -change_even[seq(nrow(change_even)) %% 2 != 1,]
This is what I came up with:
change_even$val = change_even$val * c(rep(-1,nrow(change_even))^((row(change_even)+1)))
Another one:
(-1)^(0:(nrow(change_even)-1))*change_even$val

Operations on elements of column vectors

I have a column vector containing 1's. I also have another numeric column containing numbers.
Example:
day_eq day
1 1
1 5
1 3
1 2
I now want to say:
If an element from day is smaller than its corresponding element in day_eq,
make invalid (a column vector element) = 5.
This is my code:
for (i in 1:nrow(setin)){
  if (setin[[i,"day"]]<setin[[i,"day_eq"]]){
    setin[[i,"valid"]] = 0
    setin[[i,"invalid_code"]] = 5
  }
}
It isn't working. It keeps saying:
Error in if (setin[[i, "day"]] < setin[[i, "day_eq"]]) { :
missing value where TRUE/FALSE needed
or
In if (test.ID1$day_eq > test.ID1$day) { :
the condition has length > 1 and only the first element will be used
Where test.ID1 is the set name.
You don't need a loop for that. I'm not sure exactly what you are doing... but ifelse should be able to help you...
setin$valid <- ifelse(setin$day < setin$day_eq, 0, NA)
setin$invalid_code <- ifelse(setin$day < setin$day_eq, 5, NA)
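The "missing value where TRUE/FALSE needed" error typically means the comparison inside if() evaluated to NA, for example because day is missing in some row. A hedged sketch of a vectorised version that treats such rows as not invalid (the example data, the is_invalid name, and the choice of 1 for valid rows are assumptions, not from the question):

```r
# Hypothetical data: row 2 has day < day_eq, row 3 has a missing day
setin <- data.frame(day_eq = c(1, 1, 1, 1), day = c(1, 0, NA, 2))

# TRUE only where both values are present and day < day_eq
is_invalid <- !is.na(setin$day) & !is.na(setin$day_eq) & setin$day < setin$day_eq

setin$valid <- ifelse(is_invalid, 0, 1)          # assumption: 1 marks a valid row
setin$invalid_code <- ifelse(is_invalid, 5, NA)  # 5 only for invalid rows
```

Whether a row with a missing day should count as valid is a decision for the data owner; the sketch only shows how to keep NA from propagating into the condition.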
Your data is:
day_eq <- c(1,1,1,1)
day <- c (1,5,3,2)
setin <- data.frame(day_eq,day)
The solution using dplyr is:
library(dplyr)
setin %>% mutate(invalid = ifelse (day < day_eq, 5, 0))
I used setin as the data frame name; you also use test.ID1, so substitute that name if needed.

adding a column based on other values

I have a dataframe with millions of rows and three columns labeled Keywords, Impressions, Clicks. I'd like to add a column with values depending on the evaluation of this function:
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 & Clicks >= 1) {return("HasClicks")}
  else if (Impressions >= 1 & Clicks == 0) {return("NoClicks")}
  else {return("ZeroImp")}
}
So far so good. I then try this to create the column, but 1) it takes forever and 2) it marks all the rows as "HasClicks", even the ones where it shouldn't.
# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)
input data:
Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23
Wanted output:
Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"
Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note however that this presumes Clicks >= 0...)
Mydf$Type = ifelse(Mydf$Impressions >= 1,
ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
First, the if/else block in your function will return the warning:
Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
the condition has length > 1 and only the first element will be used
which explains why all the rows are the same.
Second, you should allocate your data.frame and fill in the elements rather than repeatedly combining objects together. I imagine this is causing your long run-times.
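As a rough illustration of the preallocation advice, the sketch below creates the result vector once and fills it by index instead of calling rbind in every iteration. The helper is_type and the variable type are hypothetical names; the data is the three-row input from the question:

```r
# Scalar helper; && is used because each call receives single values
is_type <- function(impressions, clicks) {
  if (impressions >= 1 && clicks >= 1) {
    "HasClicks"
  } else if (impressions >= 1 && clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23)
)

type <- character(nrow(Mydf))  # allocate once
for (i in seq_len(nrow(Mydf))) {
  type[i] <- is_type(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- type

Mydf$Type  # "ZeroImp" "NoClicks" "HasClicks"
```

For millions of rows a vectorised ifelse will likely still be much faster than any explicit loop; the point here is only that filling a preallocated vector avoids repeatedly copying a growing object.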
EDIT: My shared code. I'd love for someone to provide a more elegant solution.
Mydf <- data.frame(
Keywords = sample(c("Hello","World","R"),20,TRUE),
Impressions = sample(0:3,20,TRUE),
Clicks = sample(0:3,20,TRUE) )
Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
"HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
"NoClicks", Mydf$Type)
This is a case where arithmetic can be cleaner and most likely faster than nested ifelse statements.
Again building on Joshua's solution:
# Codes: 1 = ZeroImp, 2 = NoClicks, 3 = HasClicks (Impressions == 0 always maps to 1)
Mydf$Type <- factor(with(Mydf, (Impressions>=1)*(1 + (Clicks>=1)) + 1),
levels=1:3, labels=c("ZeroImp","NoClicks","HasClicks"))
