I have a dataframe like this:
df<- data.frame(a = 0,b=0,c=1,d=1,e=0,f=1,g=1,h=1)
print(df) would give this result
a b c d e f g h
0 0 1 1 0 1 1 1
Now I need to find the longest run of consecutive 1s. In the example above, there is a run of two 1s (columns c and d) before a zero appears in the next column, and then a run of three (columns f, g and h). I want the result to look something like this, since 3 is the max of 2 and 3:
a b c d e f g h ***Max_Span***
0 0 1 1 0 1 1 1 ***3***
Is there an easy way to do this, rather than stepping through the values one at a time and comparing each with the previous one? Please advise.
You probably want the function rle.
Here is an example to see what it does (it computes the length of each run of equal values):
vect <- c(1, 0, 0, 1, 1, 1, 0)
rle(vect)
Run Length Encoding
lengths: int [1:4] 1 2 3 1
values : num [1:4] 1 0 1 0
Edit:
If you only want the runs of a particular value, just use which:
rle_vect <- rle(vect) #first we assign the output from rle
rle_vect$lengths[which(rle_vect$values==1)] # then we can access where values==1
#[1] 1 3
In your case you want the longest run of 1s only. Unlisting the row first hands rle a plain vector rather than a one-row data frame:
rle_1 <- rle(unlist(df[1, ]))
max(rle_1$lengths[which(rle_1$values == 1)])
#[1] 3
Data:
df[1, ]
# a b c d e f g h
#1 0 0 1 1 0 1 1 1
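To apply the same idea to a data frame with several rows, one option is to wrap the rle step in a small helper and apply it row by row. This is only a sketch: max_run_of_ones is a made-up helper name, the second row of the data is added purely for illustration, and returning 0 for a row without any 1s is an assumption about what you want in that case.

```r
# Hypothetical helper: length of the longest run of 1s in a numeric vector.
max_run_of_ones <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]
  # Assumption: a row with no 1s at all should report a span of 0.
  if (length(ones) == 0) 0L else max(ones)
}

# First row is the example from the question; the all-zero second row is illustrative.
df <- data.frame(a = c(0, 0), b = c(0, 0), c = c(1, 0), d = c(1, 0),
                 e = c(0, 0), f = c(1, 0), g = c(1, 0), h = c(1, 0))

# apply() passes each row to the helper as a plain numeric vector.
max_span <- apply(df, 1, max_run_of_ones)
df$Max_Span <- max_span
print(df)
```

Computing max_span before adding the column keeps Max_Span itself out of the values being scanned.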
I have a sequence which looks like this
SEQENCE
1 A
2 B
3 B
4 C
5 A
From this sequence, I want to get a matrix like the one below, where the element in the ith row and jth column is the number of times a move occurred from the ith row node to the jth column node:
A B C
A 0 1 0
B 0 1 1
C 1 0 0
How can I get this in R?
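1) Use table like this, with the origin of each move as the first argument so that rows are the "from" node and columns the "to" node, as in the question:
s <- DF[, 1]
table(head(s, -1), tail(s, -1))
giving:
A B C
A 0 1 0
B 0 1 1
C 1 0 0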
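2) or like this. Since embed does not work with factors, we convert the factor to character. Note that embed puts the later element in its first column, so in the output the rows (X1) are the destination and the columns (X2) the origin; wrap the result in t() to get the layout from the question.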
s <- as.character(DF[, 1])
do.call(table, data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
3) xtabs also works:
s <- as.character(DF[, 1])
xtabs(data = data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
Note: The input DF in reproducible form is:
Lines <- " SEQENCE
1 A
2 B
3 B
4 C
5 A"
DF <- read.table(text = Lines, header = TRUE)
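If some node never appears as an origin or never appears as a destination, table will drop that row or column and the result will no longer be square. A minimal sketch of one way around this, assuming you want every observed state on both axes, is to convert both halves to factors with a shared set of levels. The names states, from, to and trans are illustrative only.

```r
# The sequence from the question, written out as a character vector.
s <- c("A", "B", "B", "C", "A")

# Shared levels guarantee the same states on rows and columns.
states <- sort(unique(s))
from <- factor(head(s, -1), levels = states)  # origin of each move
to   <- factor(tail(s, -1), levels = states)  # destination of each move

# Rows are the origin, columns the destination.
trans <- table(from, to)
print(trans)
```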
I need to check whether the number of elements of each unique value in the variable PPT in A is equal to the number of elements of each unique value in PPT in B, and whether there is any value unique only to A or only to B.
For example:
PPTa <- c("ppt0100109","ppt0301104","ppt0100109","ppt0100109","ppt0300249","ppt0100109","ppt0300249","ppt0100109","ppt0504409","ppt2303401","ppt0704210","ppt0704210","ppt0100109")
CNa <- c(110,54,110,110,49,10,49,110,409,40,10,10,110)
LLa <- c(150,55,150,150,45,15,45,115,405,45,5,15,50)
A <-data.frame(PPTa,CNa,LLa)
PPTb <- c("ppt0100200","ppt0300249","ppt0100109","ppt0300249","ppt0100109","ppt0764091","ppt2303401","ppt0704210","ppt0704210","ppt0100109")
CNb <- c(110,54,110,110,49,10,49,110,409,40)
LLb <- c(150,55,150,150,45,15,45,115,405,45)
B <-data.frame(PPTb,CNb,LLb)
In this case, we have these unique values which occur a certain amount of times:
A$PPTa TIMES
"ppt0100109" 6
"ppt0301104" 1
"ppt0300249" 2
"ppt0504409" 1
"ppt2303401" 1
"ppt0704210" 2
B$PPTb TIMES
"ppt0100200" 1
"ppt0300249" 2
"ppt0100109" 3
"ppt0764091" 1
"ppt2303401" 1
"ppt0704210" 2
I would like to create a new matrix (or anything you could suggest) with a value of 0 if the unique value exists both in A and B with the same number of elements, a value of 1 if it exists in both dataframes A and B but the number of elements differ, and a value of 2 if the value exists only in one of the two dataframes.
Something like:
A$PPTa TIMES OUTPUT
"ppt0100109" 6 1
"ppt0301104" 1 2
"ppt0300249" 2 0
"ppt0504409" 1 2
"ppt2303401" 1 0
"ppt0704210" 2 0
B$PPTb TIMES OUTPUT
"ppt0100200" 1 2
"ppt0300249" 2 0
"ppt0100109" 3 1
"ppt0764091" 1 2
"ppt2303401" 1 0
"ppt0704210" 2 0
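You can use a nested ifelse statement. This assumes A and B hold the per-value counts (one row per unique PPT value with its TIMES, sorted by value) rather than the raw data frames, which is why each result below has six entries: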
ifelse(do.call(paste0, A) %in% do.call(paste0, B), 0, ifelse(A$PPTa %in% B$PPTb, 1, 2))
#[1] 1 0 2 2 0 0
ifelse(do.call(paste0, B) %in% do.call(paste0, A), 0, ifelse(B$PPTb %in% A$PPTa, 1, 2))
#[1] 1 2 0 0 2 0
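For completeness, here is a sketch of the whole pipeline starting from the raw PPT vectors in the question. It builds the count tables with table, then applies the same nested ifelse idea. The names tabA, tabB and classify are made up for this example, and pasting value and count with a space separator is a choice made here to avoid ambiguous concatenations.

```r
PPTa <- c("ppt0100109","ppt0301104","ppt0100109","ppt0100109","ppt0300249",
          "ppt0100109","ppt0300249","ppt0100109","ppt0504409","ppt2303401",
          "ppt0704210","ppt0704210","ppt0100109")
PPTb <- c("ppt0100200","ppt0300249","ppt0100109","ppt0300249","ppt0100109",
          "ppt0764091","ppt2303401","ppt0704210","ppt0704210","ppt0100109")

# One row per unique value with its count, sorted by value.
tabA <- as.data.frame(table(PPTa), stringsAsFactors = FALSE)
tabB <- as.data.frame(table(PPTb), stringsAsFactors = FALSE)
names(tabA) <- names(tabB) <- c("PPT", "TIMES")

# Hypothetical helper: 0 = same value and same count in both tables,
# 1 = value in both but counts differ, 2 = value only in x.
classify <- function(x, y) {
  ifelse(paste(x$PPT, x$TIMES) %in% paste(y$PPT, y$TIMES), 0,
         ifelse(x$PPT %in% y$PPT, 1, 2))
}

tabA$OUTPUT <- classify(tabA, tabB)
tabB$OUTPUT <- classify(tabB, tabA)
print(tabA)
print(tabB)
```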
I have a data.frame that looks like this:
> DF1
A B C D E
a x c h p
c d q t w
s e r p a
w l t s i
p i y a f
I would like to compare each column of my data.frame with the remaining columns in order to count the number of common elements. For example, I would like to compare column A with all the remaining columns (B, C, D, E) and count the common entities in this way:
A versus the remaining:
A vs B: 0 (because they have 0 common elements)
A vs C: 1 (c in common)
A vs D: 3 (p, s and a in common)
A vs E: 3 (p, w and a in common)
Then the same: B versus C,D,E columns and so on.
How can I implement this?
We can loop over the column names, intersect each column with every other column, and take the lengths of the intersections:
sapply(names(DF1), function(x) {
  x1 <- lengths(Map(intersect, DF1[setdiff(names(DF1), x)], DF1[x]))
  c(x1, setNames(0, setdiff(names(DF1), names(x1))))[names(DF1)]
})
# A B C D E
#A 0 0 1 3 3
#B 0 0 0 0 1
#C 1 0 0 1 0
#D 3 0 1 0 2
#E 3 1 0 2 0
Or this can be done more compactly: reshape the data to long format with melt, tabulate column against value, and take the cross product of that table. The diagonal is zeroed because it only counts each column against itself.
library(reshape2)
tcrossprod(table(melt(as.matrix(DF1))[-1])) * !diag(5)
# Var2
#Var2 A B C D E
# A 0 0 1 3 3
# B 0 0 0 0 1
# C 1 0 0 1 0
# D 3 0 1 0 2
# E 3 1 0 2 0
NOTE: The crossprod step is also implemented with RcppEigen, which would make this faster.
An alternative is to use combn twice: once to get the column pairs and once to count the elements each pair has in common.
cbind.data.frame returns a data.frame and setNames is used to add column names.
setNames(cbind.data.frame(t(combn(names(DF1), 2)),
    combn(names(DF1), 2, function(x) length(intersect(DF1[, x[1]], DF1[, x[2]])))),
    c("col1", "col2", "count"))
col1 col2 count
1 A B 0
2 A C 1
3 A D 3
4 A E 3
5 B C 0
6 B D 0
7 B E 1
8 C D 1
9 C E 0
10 D E 2
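Since DF1 is not given in reproducible form above, here is a self-contained sketch that rebuilds it from the printed table and computes the same counts with two nested sapply calls in base R. The name common is illustrative, and zeroing the diagonal is a choice made here to match the matrix output shown earlier.

```r
# DF1 rebuilt by hand from the printed table in the question.
DF1 <- data.frame(A = c("a", "c", "s", "w", "p"),
                  B = c("x", "d", "e", "l", "i"),
                  C = c("c", "q", "r", "t", "y"),
                  D = c("h", "t", "p", "s", "a"),
                  E = c("p", "w", "a", "i", "f"),
                  stringsAsFactors = FALSE)

# For every pair of columns, count the distinct elements they share.
common <- sapply(DF1, function(x) {
  sapply(DF1, function(y) length(intersect(x, y)))
})

# A column compared with itself is not of interest here.
diag(common) <- 0
print(common)
```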
I have a table (no header) in R:
1 1 1 0 0
1 1 1 0 0
0 0 0 1 1
What I want to do is extract continuous 1s from every row and column to later on make a string with these 1s and finally convert the string to an integer from binary.
For example, say I extract the continuous 1s from row 1 -> 1 1 1. I then make a string from these 1s and convert that string, read as a binary value, to an integer. The outcome should be 7.
Thank you.
If you can assume that you just have contiguous 1s with no 1s elsewhere, you can do something like this:
2 ^ rowSums(tab) - 1
For example:
tab <- as.table(matrix(c(1,1,1,0,0,1,1,1,0,0,0,0,0,1,1), nrow = 3, byrow = T))
2 ^ rowSums(tab) - 1
# A B C
# 7 7 3
If you cannot make this contiguity assumption, use the run-length encoding:
-1 + 2 ^ apply(tab, 1, function(x) max(rle(x)$lengths[rle(x)$values == 1]))
Then the example is
tab <- as.table(matrix(c(0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1), nrow = 3, byrow = T))
# A B C D E F
# A 0 1 1 1 0 0
# B 0 1 1 1 0 0
# C 0 0 0 0 1 1
and the above again yields
# A B C
# 7 7 3
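If you want to follow the steps in the question literally (extract the run, build a string of 1s, convert from binary), a sketch along these lines may help. It uses strrep to build the string and strtoi with base = 2 for the conversion. The helper name longest_ones is made up, and returning 0 for a row with no 1s is an assumption. Because strtoi returns an ordinary R integer, this only suits runs short enough to fit in one.

```r
# The table from the question as a plain matrix.
tab <- matrix(c(1, 1, 1, 0, 0,
                1, 1, 1, 0, 0,
                0, 0, 0, 1, 1), nrow = 3, byrow = TRUE)

# Hypothetical helper: length of the longest run of 1s in a vector.
longest_ones <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]
  if (length(ones)) max(ones) else 0L  # assumption: no 1s means length 0
}

run_len <- apply(tab, 1, longest_ones)  # use margin 2 for columns
bits <- strrep("1", run_len)            # "111" "111" "11"
ints <- strtoi(bits, base = 2)          # binary strings to integers
print(ints)
```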
A quick question jumping off of this one: "Fast replacing values in dataframe in R".
If I want to do this replacement but only for certain rows of my data frame, is there a way to add a row specification to:
df [df<0] =0
Something like applying this to rows 40-52 (Doesn't work):
df[df[40:52,] < 0] = 0
Any suggestions? Much appreciated.
Or simply:
df[40:52,][df[40:52,] < 0] <- 0
Here is a test:
test = data.frame(A = c(1,2,-1), B = c(4,-8,5), C = c(1,2,3), D = c(7,8,-9))
#> test
# A B C D
#1 1 4 1 7
#2 2 -8 2 8
#3 -1 5 3 -9
To replace the negative values with 0 for only rows 2 and 3, you can do:
test[2:3,][test[2:3,] < 0] <- 0
and you get
#> test
# A B C D
#1 1 4 1 7
#2 2 0 2 8
#3 0 5 3 0
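This is another way, utilizing R's recycling behavior: the row mask has one entry per row, so it is recycled down each column of the logical matrix df < 0.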
df[df < 0 & 1:nrow(df) %in% 40:52] <- 0
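A small sketch of the recycling approach on toy data may make the behavior easier to check. The data frame and the names rows and in_rows are made up for illustration; row 1 deliberately contains a negative value to show that it is left alone.

```r
# Illustrative data: negatives in every row, only rows 2 and 3 should be cleaned.
test <- data.frame(A = c(-1, 2, -3), B = c(4, -8, 5))

rows <- 2:3
# One TRUE/FALSE per row; recycled column by column against test < 0.
in_rows <- seq_len(nrow(test)) %in% rows

test[test < 0 & in_rows] <- 0
print(test)
```

Using seq_len(nrow(test)) instead of 1:nrow(test) also behaves sensibly when the data frame has zero rows.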