How to remove specific (side-by-side) duplicates in r?

How to remove specific (side-by-side) duplicates in r? - r

Suppose I have the following string:
l1 = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)
and I only want to keep the "FIRST new 1", that is, my desire outcome of the above strong is:
l1 = c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
I tried to shift and subtract the lists, whatever is not 1, set to 0; but this way doesn't work.

You may try (base R way)
x <- c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)
y <- rle(x)
z<- cumsum(y$lengths)[y$values == 0] + 1
w <- rep(0, length(x))
w[z] <- 1
w
[1] 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1
dplyr way
library(dplyr)
library(xts)
library(data.table)
x <- data.frame(
l1 = c(0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)
)
x %>%
mutate(y = rleid(l1)) %>%
group_by(y) %>%
mutate(l1 = ifelse((y %% 2) == first(l1) & row_number(y)>1, 0, l1)) %>%
ungroup %>%
select(-y) %>%
pull(l1)
[1] 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1

Clumsy way
bool IsNewOneAppeared = 0
for(int i;i<c.length;i++)
{
if(IsNewOneAppeared )
c[i]= 0;
else if(c[i] equal 1)
{
keep 1;
IsNewOneAppeared =1;
}
}

Related

How do I merge two vectors of same length into one vector that has the same length as well in R

I have two vectors like this:
vec1<-c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0)
vec2<-c(0, 0, 0, 1, 0, 0, 1, 0, 1, 0)
I want to merge it somehow to turn it into this:
vec<-c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0)
Is there any way to do this?

We can use pmax
pmax(vec1, vec2)
[1] 0 0 1 1 0 0 1 0 1 0
or with |
+(vec1|vec2)

Here is a solution with ifelse:
ifelse(vec1==0 & vec2==0, 0, 1)
[1] 0 0 1 1 0 0 1 0 1 0

Create a new variable based on other columns values

I have a paneldata dataframe structure, something like this:
df <- data.frame("id" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"Status_2014" = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
"Status_2015" = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0),
"Status_2016" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
I want to generate a new dummy variable, that takes the value 1, if the rows contains 1 in any of the three columns or otherwise 0 if not. It should end up like this:
df <- data.frame("id" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"Status_2014" = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
"Status_2015" = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0),
"Status_2016" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
"Final_status" = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0))
Can anyone help me achieve this?

We can use if_any on the columns that starts_with 'Status', to check for any 1 value in a row and it returns TRUE if there is one or else FALSE which is coerced to binary with as.integer/+
library(dplyr)
df %>%
mutate(Final_status = +(if_any(starts_with('Status'), ~ . ==1)))
-outptu
id Status_2014 Status_2015 Status_2016 Final_status
1 1 1 0 0 1
2 1 1 0 0 1
3 1 1 0 0 1
4 1 1 0 0 1
5 2 0 1 0 1
6 2 0 1 0 1
7 2 0 1 0 1
8 2 0 1 0 1
9 3 0 0 0 0
10 3 0 0 0 0
11 3 0 0 0 0
12 3 0 0 0 0
Or using rowSums from base R
df$Final_status <- +(rowSums(df[-1] > 0) > 0)

You write an if condition to define the variable as 1 or 0, and inside this condition the most straight forward ways would be a dplyr pipe.
I don't have the dplyr syntax in my head, to long not used, but dplyr is what you want.
https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
best greetings

Split comma- and pound-separated strings into different columns in R

I have a dataframe , a column of which contains colon and pound-separated strings.
data$col1
col1
1: 3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1]
2: 3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]
3: 4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0]
4: 2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
5: 3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
6: 5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
I want to extract the elements of the string and split it into different columns.
col1 col2 col3 EVS1 ... EVS12
3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1] 3 Tier_III_Uncertain 1 1
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0] 3 Tier_III_Uncertain 0 0
4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0] 4 Tier_III_Uncertain 0 0
2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0] 2 Tier_IV_benign 0 0
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0] 3 Tier_III_Uncertain 0 0
5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1] 5 Tier_III_Uncertain 0 1

read.table(text=gsub("[^A-Za-z_0-9-]", " ", data$col1),
col.names = c(paste0('col', 2:4), paste0('EVS', 1:12)))[-3]
col2 col3 EVS1 EVS2 EVS3 EVS4 EVS5 EVS6 EVS7 EVS8 EVS9 EVS10 EVS11 EVS12
1 3 Tier_III_Uncertain 1 0 0 1 0 0 0 0 0 -1 1 1
2 3 Tier_III_Uncertain 0 0 0 1 0 0 0 0 0 1 1 0
3 4 Tier_III_Uncertain 0 0 0 1 0 0 0 0 2 0 1 0
4 2 Tier_IV_benign 0 0 0 1 0 0 0 0 0 0 1 0
5 3 Tier_III_Uncertain 0 0 0 1 0 0 0 0 1 0 1 0
6 5 Tier_III_Uncertain 0 0 1 1 0 0 0 0 1 0 1 1

Assuming DT shown reproducibly in the Note at the end replace non-word characters and also EVS= with space. Then read that using fread and set the names. Finally cbind DT to it.
DT2 <- fread(text = gsub("EVS=|\\W", " ", DT$col1))
names(DT2) <- c("col2", "col3", paste0("EVS", 1:(ncol(DT2)-2)))
cbind(DT, DT2)
Note
library(data.table)
L <- "3#Tier_III_Uncertain EVS=[1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 1, 1]
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]
4#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0]
2#Tier_IV_benign EVS=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
3#Tier_III_Uncertain EVS=[0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]
5#Tier_III_Uncertain EVS=[0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1]"
DT <- data.table(col1 = trimws(readLines(textConnection(L))))

For the given combination in a data frame, calculate the frequency of occurrence of that combination in another data frame in R

I am having a data frame that has various combinations as follows:
structure(list(`Q1` = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0), `Q2` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), `Q3` = c(0, 1, 0, 0, 0, 1, 1, 0, 0,
0), `Q4` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Q5` = c(0, 0, 1, 0,
0, 1, 0, 1, 1, 0), `Q6` = c(1, 1, 0, 1, 1, 0, 0, 1, 1, 1), `Q7` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), `Q8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
1), `Q9` = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0), `Q10` = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `Q11` = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
`Q12` = c(1, 1, 1, 1, 1, 0, 1, 1, 0, 1)), row.names = c(NA,
-10L), class = "data.frame")
I am having a base data frame where I have different combinations with the weightage for each combination.
structure(list(Q1 = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1), Q2 = c(0,
1, 1, 0, 0, 0, 0, 0, 0, 0), Q3 = c(1, 0, 0, 1, 0, 0, 0, 0, 0,
0), Q4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Q5 = c(1, 0, 1, 0,
0, 0, 1, 0, 0, 1), Q6 = c(1, 1, 1, 0, 1, 0, 0, 1, 0, 1), Q7 = c(0,
0, 1, 1, 1, 0, 0, 0, 0, 0), Q8 = c(1, 0, 1, 0, 0, 1, 0, 0, 0,
0), Q9 = c(1, 0, 0, 0, 0, 0, 0, 1, 1, 0), Q10 = c(0, 0, 1, 0,
0, 1, 0, 0, 0, 0), Q11 = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0), Q12 = c(1,
0, 0, 0, 1, 0, 1, 0, 0, 0), RatingBinary = c(1, 1, 0, 1, 0, 1,
0, 1, 1, 1)), row.names = c(NA, 10L), class = "data.frame")
The problem statement is for each 1's combination in 1st data frame (i.e.Q6, Q9, Q12 in 1st row, Q3, Q6, Q12 in 2nd row), I need to get the number of rows that get satisfied in the base data frame.
For example: In the combination data frame (1st Df), in the 1st row Q6, Q9 & Q12 have the binary value 1. I need to get the count of this combination(Q6, Q9 & Q12 which have 1's) in the base data and get the number of rows that have the RatingBinary values 0's and 1's.
How can I get this implemented in R? Can anyone suggest a suitable solution for this scenario?

Here's an algorithmic approach.
Let's call a set in the first data frame a combo set; this is a set of three questions in a given row. Let's also call a set in the base data a base set; this is the set in a given row for which we are trying to find whether a given combo set is part of.
The approach is essentially to iterate through each combo set and find matches over all base sets. Sets seem to only be in threes, so I take advantage of that by hard coding a sum == 3 rather than doing an agnostic match. We store matches in a structure I call pair. A match is indicated by a 1. I define pair(x,y) where x is the row number of the combo data set and y is the row number of base dataset.
pair <- matrix(nrow = 10, ncol = 10)
for(i in 1:nrow(df)) {
ind <- which(df[i,] == 1)
for(j in 1:nrow(df2)) {
if(sum(df2[j, ind]) == 3){
pair[i,j] <- 1
} else {
pair[i,j] <- 0
}
}
}
The pair object is:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 0 0 0 0 0 0 0
[2,] 1 0 0 0 0 0 0 0 0 0
[3,] 1 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 1 0 0 0 0 0 0 0 0 0
[7,] 1 0 0 0 0 0 0 0 0 0
[8,] 1 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 0 0 0 0 0
[10,] 1 0 0 0 0 0 0 0 0 0
This means for only the first combo set did we find matches in all the base sets except for base set 4 and base set 5. Because there is only one match, the answer to your second question about the number of rows that have RatingBinary 0 or 1 becomes trivial -- it's just the RatingBinary for that row/base set in the base data set.

How to find bounding boxes of objects in raster?

I have a binary raster consisting of objects (1) and background (0). How can I find bounding boxes of objects? Each object should have its own bouding box.
Input:
library("raster")
mat = matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0,
0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE
)
ras = raster(mat)
I expect this result:
result = raster(matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0,
0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0,
0, 1, 0, 0, 1, 0,
0, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE
))

Here in an approach
Example data
library(raster)
mat = matrix(
c(0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 0, 0,
0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0),
ncol = 6, nrow = 8, byrow = TRUE )
ras <- raster(mat)
Solution
f <- function(r) {
x <- reclassify(ras, cbind(0,NA))
y <- rasterToPolygons(x, dissolve=TRUE)
z <- disaggregate(y)
e <- sapply(1:length(z), function(i) extent(z[i,]))
p <- spPolygons(e)
r <- rasterize(p, r)
d <- boundaries(r)
reclassify(d, cbind(NA, 0))
}
r <- f(res)
as.matrix(r)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 0 0 0 0 0
#[2,] 0 1 1 1 1 0
#[3,] 0 1 1 1 1 0
#[4,] 0 0 0 0 0 0
#[5,] 0 1 1 1 1 0
#[6,] 0 1 0 0 1 0
#[7,] 0 1 1 1 1 0
#[8,] 0 0 0 0 0 0
It is of course possible that bounding boxes of objects overlap, in which there is no solution, I suppose.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to remove specific (side-by-side) duplicates in r? - r

Clumsy way bool IsNewOneAppeared = 0 for(int i;i<c.length;i++) { if(IsNewOneAppeared ) c[i]= 0; else if(c[i] equal 1) { keep 1; IsNewOneAppeared =1; } }

Related

How do I merge two vectors of same length into one vector that has the same length as well in R

Create a new variable based on other columns values

Split comma- and pound-separated strings into different columns in R

For the given combination in a data frame, calculate the frequency of occurrence of that combination in another data frame in R

How to find bounding boxes of objects in raster?

Categories

Resources