hex2bin in data frame - r

I'm new to R and excited about all the possibilities of data management and presentation.
I have run into a problem and have not found a solution:
I have built a data frame with:
require(BMS)
n = c(1, 2, 3, 4, 5, 6, 7, 8)
s = c("55aa55aa", "aa55aa55", "12345678", "9ABCDEF0", "55aa55aa", "aa55aa55", "12345678", "9ABCDEF0")
df = data.frame(n, s)
df$s <- as.character(df$s)
df
# n s
# 1 1 55aa55aa
# 2 2 aa55aa55
# 3 3 12345678
# 4 4 9ABCDEF0
# 5 5 55aa55aa
# 6 6 aa55aa55
# 7 7 12345678
# 8 8 9ABCDEF0
Column s holds 32-bit hex values, which I want to add to the data frame as the corresponding bit strings in a new column sbin.
It should look like this afterwards:
df
# n s sbin
# 1 1 55aa55aa 01010101101010100101010110101010
# 2 2 aa55aa55 10101010010101011010101001010101
# 3 3 12345678 00010010001101000101011001111000
# 4 4 9ABCDEF0 .......
# 5 5 55aa55aa ......
# 6 6 aa55aa55
# 7 7 12345678
# 8 8 9ABCDEF0
For the conversion I would like to use the hex2bin function from the BMS package.
I tried this:
lapply(df$s, hex2bin)
# [[1]]
# [1] 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0
# [[2]]
# [1] 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1
# [[3]]
# [1] 0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0
# .....
but did not get the required output.
In the end I would like to access each bit in the data frame rows. So I would like to get 32 vectors with 8 bits each in this example.

How about this?
df$sbin <- sapply(df$s, FUN = function(x) { paste(hex2bin(x), collapse = "") })
# n s sbin
# 1 1 55aa55aa 01010101101010100101010110101010
# 2 2 aa55aa55 10101010010101011010101001010101
# 3 3 12345678 00010010001101000101011001111000
# 4 4 9ABCDEF0 10011010101111001101111011110000
# 5 5 55aa55aa 01010101101010100101010110101010
# 6 6 aa55aa55 10101010010101011010101001010101
# 7 7 12345678 00010010001101000101011001111000
# 8 8 9ABCDEF0 10011010101111001101111011110000
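To get at the individual bits (the "32 vectors with 8 bits each" mentioned in the question), one option is a 0/1 matrix rather than a pasted string. The following is a minimal base R sketch that does not use BMS; hex_to_bits is a hypothetical helper name, and it assumes every string has exactly 8 hex digits.

```r
# Hypothetical helper (not from the BMS package): expands 8-digit hex strings
# into a 0/1 integer matrix, one row per string, most significant bit first.
hex_to_bits <- function(hex) {
  digits <- strsplit(hex, "")
  bits <- vapply(digits, function(d) {
    vals <- strtoi(d, base = 16L)   # each hex digit as an integer 0..15
    # Stack the 8s, 4s, 2s and 1s bit of every digit; as.integer() reads
    # the resulting 4 x 8 matrix column by column, i.e. digit by digit.
    as.integer(rbind((vals %/% 8L) %% 2L, (vals %/% 4L) %% 2L,
                     (vals %/% 2L) %% 2L, vals %% 2L))
  }, integer(32))
  t(bits)                           # rows = input strings, columns = bit positions
}

s <- c("55aa55aa", "aa55aa55", "12345678", "9ABCDEF0")
bits <- hex_to_bits(s)
dim(bits)      # 4 rows (strings) by 32 columns (bits)
bits[, 1]      # the most significant bit of every row

# Collapsing each row gives the same kind of string as the sbin column above
sbin <- apply(bits, 1, paste, collapse = "")
```

Each column of bits is then one bit position across all rows, so bits[, k] plays the role of the k-th of the 32 vectors.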

Count number of pairs across elements in a list in R?

Similar questions have been asked about counting pairs; however, none seem to be specifically useful for what I'm trying to do.
What I want is to count the number of pairs across multiple list elements and turn it into a matrix. For example, if I have a list like so:
myList <- list(
a = c(2,4,6),
b = c(1,2,3,4),
c = c(1,2,5,7),
d = c(1,2,4,5,8)
)
We can see that the pair 1:2 appears 3 times (once each in b, c, and d). The pair 1:3 appears only once in b. The pair 1:4 appears 2 times (once each in b and d)... etc.
I would like to count the number of times a pair appears and then turn it into a symmetrical matrix. For example, my desired output would look something like the matrix I created manually (where each element of the matrix is the total count for that pair of values):
> myMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 3 1 2 2 0 1 1
[2,] 3 0 1 3 2 1 1 1
[3,] 1 1 0 1 0 0 0 0
[4,] 2 3 1 0 1 1 0 1
[5,] 2 2 0 1 0 0 1 1
[6,] 0 1 0 1 0 0 0 0
[7,] 1 1 0 0 1 0 0 0
[8,] 1 1 0 1 1 0 0 0
Any suggestions are greatly appreciated
Inspired by @akrun's answer, I think you can use a cross product to get this very quickly and simply:
out <- tcrossprod(table(stack(myList)))
diag(out) <- 0
# values
#values 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
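Why this works: table(stack(myList)) is a 0/1 incidence matrix with one row per value and one column per list element, so multiplying it by its own transpose counts, for each pair of values, the list elements that contain both. A small sketch of the intermediate steps follows; inc is just an illustrative name, and unclass() is used here only to work on a plain matrix.

```r
myList <- list(
  a = c(2, 4, 6),
  b = c(1, 2, 3, 4),
  c = c(1, 2, 5, 7),
  d = c(1, 2, 4, 5, 8)
)

# Incidence matrix: rows are the values 1..8, columns are a, b, c, d
inc <- unclass(table(stack(myList)))
inc["2", ]            # value 2 occurs in all four list elements

# inc %*% t(inc): entry [i, j] counts the list elements containing both i and j
out <- tcrossprod(inc)
diag(out) <- 0        # a value paired with itself is not a pair
out["1", "2"]         # pair 1:2 is in b, c and d
out["4", "6"]         # pair 4:6 is only in a
```

This relies on each value appearing at most once per list element; repeated values inside one element would inflate the counts.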
Original answer:
Use combn to get each pairwise combination together with its reverse.
Then convert to a data.frame and tabulate the results.
tab <- lapply(myList, \(x) combn(x, m=2, FUN=\(cm) rbind(cm, rev(cm)), simplify=FALSE))
tab <- data.frame(do.call(rbind, unlist(tab, recursive=FALSE)))
table(tab)
# X2
#X1 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
We could loop over the list, get the pairwise combinations with combn, stack them into a two-column dataset, convert the 'values' column to a factor with levels 1 to 8, get the frequency count (table), take the cross product (crossprod), convert the output to logical (> 0), then Reduce the list elements by adding them elementwise, and finally set the diagonal elements to 0. (If needed, set the names attribute of the dimnames to NULL.)
out <- Reduce(`+`, lapply(myList, function(x)
crossprod(table(transform(stack(setNames(
combn(x,
2, simplify = FALSE), combn(x, 2, paste, collapse="_"))),
values = factor(values, levels = 1:8))[2:1]))> 0))
diag(out) <- 0
names(dimnames(out)) <- NULL
Output:
> out
1 2 3 4 5 6 7 8
1 0 3 1 2 2 0 1 1
2 3 0 1 3 2 1 1 1
3 1 1 0 1 0 0 0 0
4 2 3 1 0 1 1 0 1
5 2 2 0 1 0 0 1 1
6 0 1 0 1 0 0 0 0
7 1 1 0 0 1 0 0 0
8 1 1 0 1 1 0 0 0
I thought of a solution based on @TarJae's answer. It is not an elegant one, but it was a fun challenge!
Libraries
library(tidyverse)
Code
map_df(myList,function(x) as_tibble(t(combn(x,2)))) %>%
count(V1,V2) %>%
{. -> temp_df} %>%
bind_rows(
temp_df %>%
rename(V2 = V1, V1 = V2)
) %>%
full_join(
expand_grid(V1 = 1:8,V2 = 1:8)
) %>%
replace_na(replace = list(n = 0)) %>%
arrange(V2,V1) %>%
pivot_wider(names_from = V1,values_from = n) %>%
as.matrix()
Output
V2 1 2 3 4 5 6 7 8
[1,] 1 0 3 1 2 2 0 1 1
[2,] 2 3 0 1 3 2 1 1 1
[3,] 3 1 1 0 1 0 0 0 0
[4,] 4 2 3 1 0 1 1 0 1
[5,] 5 2 2 0 1 0 0 1 1
[6,] 6 0 1 0 1 0 0 0 0
[7,] 7 1 1 0 0 1 0 0 0
[8,] 8 1 1 0 1 1 0 0 0
First, compute the pairwise combinations of each vector in the list as a tibble, then bind them into one tibble and count the combinations.
library(tidyverse)
a <- as_tibble(t(combn(myList[[1]],2)))
b <- as_tibble(t(combn(myList[[2]],2)))
c <- as_tibble(t(combn(myList[[3]],2)))
d <- as_tibble(t(combn(myList[[4]],2)))
bind_rows(a,b,c,d) %>%
count(V1, V2)
V1 V2 n
<dbl> <dbl> <int>
1 1 2 3
2 1 3 1
3 1 4 2
4 1 5 2
5 1 7 1
6 1 8 1
7 2 3 1
8 2 4 3
9 2 5 2
10 2 6 1
11 2 7 1
12 2 8 1
13 3 4 1
14 4 5 1
15 4 6 1
16 4 8 1
17 5 7 1
18 5 8 1
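The long table of counts above still has to be turned into the symmetric matrix the question asks for. One way to do that in base R is sketched below, assuming the values are the integers 1 to 8 so they can be used directly as matrix indices; pairs and m are illustrative names.

```r
myList <- list(
  a = c(2, 4, 6),
  b = c(1, 2, 3, 4),
  c = c(1, 2, 5, 7),
  d = c(1, 2, 4, 5, 8)
)

# All within-element pairs as a two-column matrix (one row per pair occurrence)
pairs <- do.call(rbind, lapply(myList, function(x) t(combn(x, 2))))

m <- matrix(0, nrow = 8, ncol = 8, dimnames = list(1:8, 1:8))
for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, 1]
  j <- pairs[k, 2]
  m[i, j] <- m[i, j] + 1   # count the pair...
  m[j, i] <- m[j, i] + 1   # ...and its mirror image, keeping m symmetric
}
m
```

The explicit loop is slower than a cross product on large inputs, but it makes the counting rule easy to check by hand.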

How to convert a data set to a logic data frame in R? [duplicate]

I have a series of datasets with repeating scores. The data frame is as follows:
ID,Variable,Category
1,6,A
2,4,C
3,3,D
4,4,C
5,5,B
6,3,D
7,6,A
8,4,C
9,5,B
10,3,D
I want to turn it into a logical (0/1) indicator layout like this:
ID,A,B,C,D
1,1,0,0,0
2,0,0,1,0
3,0,0,0,1
4,0,0,1,0
5,0,1,0,0
6,0,0,0,1
7,1,0,0,0
8,0,0,1,0
9,0,1,0,0
10,0,0,0,1
Three options.
Using base R xtabs. This does not technically return a data.frame; it returns an object of class c("xtabs", "table"), whose conversion to a data.frame is not necessarily what one might expect.
xtabs(~ID + Category, data=dat)
# Category
# ID A B C D
# 1 1 0 0 0
# 2 0 0 1 0
# 3 0 0 0 1
# 4 0 0 1 0
# 5 0 1 0 0
# 6 0 0 0 1
# 7 1 0 0 0
# 8 0 0 1 0
# 9 0 1 0 0
# 10 0 0 0 1
class(xtabs(~ID + Category, data=dat))
# [1] "xtabs" "table"
head(as.data.frame(xtabs(~ID + Category, data=dat)))
# ID Category Freq
# 1 1 A 1
# 2 2 A 0
# 3 3 A 0
# 4 4 A 0
# 5 5 A 0
# 6 6 A 0
Using tidyr::pivot_wider:
tidyr::pivot_wider(dat, id_cols = ID, names_from = Category, values_from = Variable, values_fill = list(Variable = 0))
# # A tibble: 10 x 5
# ID A C D B
# <int> <int> <int> <int> <int>
# 1 1 6 0 0 0
# 2 2 0 4 0 0
# 3 3 0 0 3 0
# 4 4 0 4 0 0
# 5 5 0 0 0 5
# 6 6 0 0 3 0
# 7 7 6 0 0 0
# 8 8 0 4 0 0
# 9 9 0 0 0 5
# 10 10 0 0 3 0
Using data.table::dcast:
library(data.table)
dcast(as.data.table(dat), ID~Category, value.var = "Variable", fill = 0)
# ID A B C D
# 1: 1 6 0 0 0
# 2: 2 0 0 4 0
# 3: 3 0 0 0 3
# 4: 4 0 0 4 0
# 5: 5 0 5 0 0
# 6: 6 0 0 0 3
# 7: 7 6 0 0 0
# 8: 8 0 0 4 0
# 9: 9 0 5 0 0
# 10: 10 0 0 0 3
While options 2 and 3 do not produce your literal output, they show their flexibility: you can make them return all 0s and 1s by first setting dat$Variable <- 1L.
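If the goal is specifically a 0/1 data.frame, the xtabs result from option 1 can also be kept in its wide layout by converting it to a plain matrix first. The sketch below rebuilds dat from the question's sample (the answer does not show how dat was created, so that part is an assumption); tab and wide are illustrative names.

```r
dat <- data.frame(
  ID = 1:10,
  Variable = c(6, 4, 3, 4, 5, 3, 6, 4, 5, 3),
  Category = c("A", "C", "D", "C", "B", "D", "A", "C", "B", "D")
)

tab <- xtabs(~ ID + Category, data = dat)   # 10 x 4 contingency table of 0/1 counts

# unclass() drops the table class, so as.data.frame() keeps the wide shape
# instead of returning the long ID / Category / Freq layout.
# The IDs are the table's row names.
wide <- cbind(ID = as.integer(rownames(tab)), as.data.frame(unclass(tab)))
wide
```

Because the counts come from ID and Category alone, the Variable column plays no role here.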

Row-wise operation by group over time R

Problem:
I am trying to create a variable x2 that equals 1 for all rows within each ID group where, over time, x1 switches from 1 to 0.
Additionally, after the switch, x2 is set to 1 for every consecutive 0 in the run.
I tried to do this with library(dplyr), but could not work out how to look at previous records within the group.
Input Data:
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<-c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1")
df<-data.frame(ID,time,x1)
Required Output:
ID time x1 x2
1 1 0 0
1 2 1 0
1 3 1 0
1 4 1 0
1 5 1 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
3 1 1 0
3 2 0 1
3 3 0 1
4 1 1 0
4 2 1 0
5 1 1 0
5 2 0 1
5 3 1 0
It is better to have 'x1' as a numeric column:
library(data.table)
setDT(df)[, x2 := (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)), ID]
df
# ID time x1 x2
# 1: 1 1 0 0
# 2: 1 2 1 0
# 3: 1 3 1 0
# 4: 1 4 1 0
# 5: 1 5 1 0
# 6: 2 1 0 0
# 7: 2 2 0 0
# 8: 2 3 0 0
# 9: 2 4 0 0
#10: 3 1 1 0
#11: 3 2 0 1
#12: 3 3 0 1
#13: 4 1 1 0
#14: 4 2 1 0
#15: 5 1 1 0
#16: 5 2 0 1
#17: 5 3 1 0
Data:
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
If you want a dplyr answer, you can use @akrun's code in mutate after grouping by ID:
library(dplyr)
ID<-c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
time<-c("1","2","3","4","5","1","2","3","4","1","2","3","1","2","1","2","3")
x1<- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df<-data.frame(ID,time,x1)
df <- df %>%
group_by(ID) %>%
mutate(x2 = (cumsum(x1) < 2)*cumsum(c(FALSE, diff(x1) < 0)))
df
# ID time x1 x2
# 1 1 0 0
# 1 2 1 0
# 1 3 1 0
# 1 4 1 0
# 1 5 1 0
# 2 1 0 0
# 2 2 0 0
# 2 3 0 0
# 2 4 0 0
# 3 1 1 0
# 3 2 0 1
# 3 3 0 1
# 4 1 1 0
# 4 2 1 0
# 5 1 1 0
# 5 2 0 1
# 5 3 1 0
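One caveat about the expression used in the answers above: the cumsum(x1) < 2 factor turns everything off once a second 1 has been seen, so a group such as 1, 1, 0, 0 would get x2 = 0 throughout. The sketch below is an alternative under the assumption that the intended rule is "x1 is 0 and a 1 occurred earlier in the same ID"; flag_after_one is a hypothetical helper name. It reproduces the required output for the sample data.

```r
ID <- c("1","1","1","1","1","2","2","2","2","3","3","3","4","4","5","5","5")
x1 <- as.integer(c("0","1","1","1","1","0","0","0","0","1","0","0","1","1","1","0","1"))
df <- data.frame(ID, x1)

# Hypothetical helper: 1 where x is 0 and at least one 1 came before it.
# Since x is 0 at the current row, cumsum(x) > 0 can only be due to earlier rows.
flag_after_one <- function(x) as.integer(x == 0L & cumsum(x) > 0L)

# ave() applies the helper within each ID and keeps the original row order
df$x2 <- ave(df$x1, df$ID, FUN = flag_after_one)
df

# A group with two leading 1s, which the cumsum(x1) < 2 filter would zero out
flag_after_one(c(1L, 1L, 0L, 0L))
```

The same helper can be used inside group_by(ID) %>% mutate(...) or in a data.table by = ID call.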

Creating a new variable by detecting max value for each id

My data set contains three variables:
id <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id,ind,price)
I need to create a new variable (ind2) such that if ind=0, then ind2=0.
Also, if ind=1, then ind2=0, unless price is the maximum for that id, in which case ind2=1.
The new data looks like:
id ind ind2 price
1 0 0 1
1 0 0 2
1 0 0 3
1 0 0 4
1 0 0 5
1 0 0 6
2 1 0 1
2 1 0 2
2 1 0 3
2 1 1 4
5 1 0 1
5 1 0 2
5 1 0 3
5 1 0 4
5 1 0 5
5 1 1 6
library(dplyr)
mdata %>%
group_by(id) %>%
mutate(ind2 = +(ind == 1L & price == max(price)))
# id ind price ind2
# 1 1 0 1 0
# 2 1 0 2 0
# 3 1 0 3 0
# 4 1 0 4 0
# 5 1 0 5 0
# 6 1 0 6 0
# 7 2 1 1 0
# 8 2 1 2 0
# 9 2 1 3 0
# 10 2 1 4 1
# 11 5 1 1 0
# 12 5 1 2 0
# 13 5 1 3 0
# 14 5 1 4 0
# 15 5 1 5 0
# 16 5 1 6 1
Or, if you prefer data.table:
library(data.table)
setDT(mdata)[, ind2 := +(ind == 1L & price == max(price)), by = id]
Or with base R:
mdata$ind2 <- unlist(lapply(split(mdata,mdata$id),
function(x) +(x$ind == 1L & x$price == max(x$price))))
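The base R version above relies on the rows already being ordered by id, because split() returns the groups in sorted order and unlist() concatenates them in that order. A sketch that does not depend on row order uses ave(), which returns the group-wise result aligned with the original rows; group_max is an illustrative name.

```r
id <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id, ind, price)

# ave() returns each row's group maximum, in the original row order
group_max <- ave(mdata$price, mdata$id, FUN = max)

# 1 only where ind is 1 and the price equals the maximum within its id
mdata$ind2 <- as.integer(mdata$ind == 1 & mdata$price == group_max)
mdata
```

If several rows in an id share the maximum price, all of them are flagged, which is also how the dplyr and data.table versions behave.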

create a new data frame with existing ones

Suppose I have the following data frames
treatment1<-data.frame(id=c(1,2,7))
treatment2<-data.frame(id=c(3,7,10))
control<-data.frame(id=c(4,5,8,9))
I want to create a new data frame that is the union of those 3 and has an indicator column for each one that takes the value 1 where the id belongs to it.
experiment<-data.frame(id=c(1:10),treatment1=0, treatment2=0, control=0)
where experiment$treatment1[1]=1, and so on.
What is the best way of doing this in R?
Thanks!
Updated as per @Flodel:
kk<-rbind(treatment1,treatment2,control)
var1<-c("treatment1","treatment2","control")
kk$df<-rep(var1,c(dim(treatment1)[1],dim(treatment2)[1],dim(control)[1]))
kk
id df
1 1 treatment1
2 2 treatment1
3 7 treatment1
4 3 treatment2
5 7 treatment2
6 10 treatment2
7 4 control
8 5 control
9 8 control
10 9 control
If you want the result as 1s and 0s, you can use table:
ll<-table(kk)
ll
df
id control treatment1 treatment2
1 0 1 0
2 0 1 0
3 0 0 1
4 1 0 0
5 1 0 0
7 0 1 1
8 1 0 0
9 1 0 0
10 0 0 1
If you want it as a data.frame, then you can use reshape:
kk2<-reshape(data.frame(ll),timevar = "df",idvar = "id",direction = "wide")
names(kk2)[-1]<-sort(var1)
> kk2
id control treatment1 treatment2
1 1 0 1 0
2 2 0 1 0
3 3 0 0 1
4 4 1 0 0
5 5 1 0 0
6 7 0 1 1
7 8 1 0 0
8 9 1 0 0
9 10 0 0 1
df.bind <- function(...) {
df.names <- all.names(substitute(list(...)))[-1L]
ids.list <- setNames(lapply(list(...), `[[`, "id"), df.names)
num.ids <- max(unlist(ids.list))
tabs <- lapply(ids.list, tabulate, num.ids)
data.frame(id = seq(num.ids), tabs)
}
df.bind(treatment1, treatment2, control)
# id treatment1 treatment2 control
# 1 1 1 0 0
# 2 2 1 0 0
# 3 3 0 1 0
# 4 4 0 0 1
# 5 5 0 0 1
# 6 6 0 0 0
# 7 7 1 1 0
# 8 8 0 0 1
# 9 9 0 0 1
# 10 10 0 1 0
(Notice how it does include a row for id == 6.)
Taking
treatment1<-data.frame(id=c(1,2,7))
treatment2<-data.frame(id=c(3,7,10))
control<-data.frame(id=c(4,5,8,9))
You can use this:
x <- c("treatment1", "treatment2", "control")
f <- function(s) within(get(s), assign(s, 1))
r <- Reduce(function(x,y) merge(x,y,all=TRUE), lapply(x, f))
r[is.na(r)] <- 0
Result:
> r
id treatment1 treatment2 control
1 1 1 0 0
2 2 1 0 0
3 3 0 1 0
4 4 0 0 1
5 5 0 0 1
6 7 1 1 0
7 8 0 0 1
8 9 0 0 1
9 10 0 1 0
This illustrates what I was imagining to be the rbind strategy:
alldf <- rbind(treatment1,treatment2,control)
alldf$grps <- model.matrix( ~ factor( c( rep(1,nrow(treatment1)),
rep(2,nrow(treatment2)),
rep(3,nrow(control) ) ))-1)
dimnames( alldf[[2]])[2]<- list(c("trt1","trt2","ctrl"))
alldf
#-------------------
id grps.trt1 grps.trt2 grps.ctrl
1 1 1 0 0
2 2 1 0 0
3 7 1 0 0
4 3 0 1 0
5 7 0 1 0
6 10 0 1 0
7 4 0 0 1
8 5 0 0 1
9 8 0 0 1
10 9 0 0 1
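Another base R sketch builds the indicator columns directly with %in%, assuming (as in the other answers) that the first data frame is named treatment1; ids is an illustrative name. Like the merge approach, it only keeps ids that occur in at least one data frame, so there is no row for id 6.

```r
treatment1 <- data.frame(id = c(1, 2, 7))
treatment2 <- data.frame(id = c(3, 7, 10))
control    <- data.frame(id = c(4, 5, 8, 9))

# Union of all ids, one row each
ids <- sort(unique(c(treatment1$id, treatment2$id, control$id)))

# Each indicator is 1 where the id occurs in the corresponding data frame
experiment <- data.frame(
  id         = ids,
  treatment1 = as.integer(ids %in% treatment1$id),
  treatment2 = as.integer(ids %in% treatment2$id),
  control    = as.integer(ids %in% control$id)
)
experiment
```

To include every id from 1 to 10, as in the question's template, ids could be replaced by 1:10.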
