I've been trying to make a matrix from a data frame in R, without success. I have the following data frame:
Order Object idrA idoA
8001505892 CHR56029398AB 1 1
8001506013 CHR56029398AB 1 2
8001507782 CHR56029398AB 1 3
8001508088 CHR56029398AB 1 4
8001508788 CHR56029398AB 1 5
8001509281 CHR56029398AB 1 6
8001509322 CHR56029398AB 1 7
8001509373 CHR56029398AB 1 8
8001505342 MMRMD343563 2 9
8001506699 MMRMD343563 2 10
8001507102 MMRMD343563 2 11
8001507193 MMRMD343563 2 12
8001508554 MMRMD343563 2 13
8001508654 MMRMD343563 2 14
8001509151 MMRMD343563 2 15
8001509707 MMRMD343563 2 16
8001509712 MMRMD343563 2 17
8001509977 MMRMD343563 2 18
8001510279 MMRMD343563 2 19
8001505342 MMRMD343565 3 9
8001507112 MMRMD343565 3 20
8001507193 MMRMD343565 3 12
8001508554 MMRMD343565 3 13
8001508654 MMRMD343565 3 14
8001509151 MMRMD343565 3 15
8001509707 MMRMD343565 3 16
8001509712 MMRMD343565 3 17
8001509977 MMRMD343565 3 18
8001510279 MMRMD343565 3 19
8001505920 MMRMN146319 4 21
8001506733 MMRMN146319 4 22
8001506929 MMRMN146319 4 23
8001507112 MMRMN146319 4 20
8001507196 MMRMN146319 4 24
8001510302 MMRMN146319 4 25
8001517272 MMRMN146319 4 26
8001506186 MMRMN146320 5 27
8001506733 MMRMN146320 5 22
8001506929 MMRMN146320 5 23
8001507112 MMRMN146320 5 20
8001508638 MMRMN146320 5 28
8001509526 MMRMN146320 5 29
8001505452 SSR664050011 6 30
8001508551 SSR664050011 6 31
8001509229 SSR664050011 6 32
8001510174 SSR664050011 6 33
Here idrA is the ID of each object and idoA is the ID of each purchase order. I want to build a matrix with one row per order and one column per object, filled with 1s and 0s: 1 if the object was purchased in that order and 0 if it wasn't.
Example: the order with idoA=20 must have a vector like this (0,0,1,1,1,0).
I hope I explained this clearly, thanks!
You can use xtabs to create a cross table:
Recreate your data:
dat <- read.table(header=TRUE, text="
Order Object idrA idoA
8001505892 CHR56029398AB 1 1
....
8001506013 CHR56029398AB 1 2
8001507782 CHR56029398AB 1 3
8001509229 SSR664050011 6 32
8001510174 SSR664050011 6 33")
Create the cross table:
xtabs(Order ~ idoA + idrA, dat) != 0
idrA
idoA 1 2 3 4 5 6
1 TRUE FALSE FALSE FALSE FALSE FALSE
2 TRUE FALSE FALSE FALSE FALSE FALSE
....
20 FALSE FALSE TRUE TRUE TRUE FALSE
....
32 FALSE FALSE FALSE FALSE FALSE TRUE
33 FALSE FALSE FALSE FALSE FALSE TRUE
To coerce the logical values to numeric values, you can use apply() and as.numeric, but then you have some work left to replace the row names:
apply(xtabs(Order ~ idoA + idrA, dat) != 0, 2, as.numeric)
Or, you can use a little trick by adding 0 to the values. This coerces the logical values to numeric:
(xtabs(Order ~ idoA + idrA, dat) != 0) + 0
idrA
idoA 1 2 3 4 5 6
1 1 0 0 0 0 0
2 1 0 0 0 0 0
3 1 0 0 0 0 0
....
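As a variation on the above, here is a minimal sketch that counts rows instead of summing the Order numbers, by leaving the left-hand side of the xtabs formula empty. The data frame is only a small subset of the question's data (one row per object plus the three rows sharing idoA = 20), and the names counts and incidence are just illustrative.

```r
# Small subset of the question's data, typed in by hand for illustration.
dat <- data.frame(
  Order = c(8001505892, 8001505342, 8001507112, 8001507112, 8001507112, 8001505452),
  idrA  = c(1, 2, 3, 4, 5, 6),
  idoA  = c(1, 9, 20, 20, 20, 30)
)

# With an empty left-hand side, xtabs() counts the rows in each
# (idoA, idrA) cell rather than summing a numeric column.
counts <- xtabs(~ idoA + idrA, data = dat)

# Compare with 0 and add 0 so that repeated order/object pairs
# still end up as a plain 0/1 indicator.
incidence <- (counts > 0) + 0

incidence["20", ]
# expected for objects 1..6: 0 0 1 1 1 0
```

Counting rows avoids relying on the Order numbers being non-zero, which is the only reason the sum-based version works.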
Another option is to use acast from reshape2
library(reshape2)
res1 <- (acast(dat, idoA~idrA, value.var='Order', fill=0)!=0)+0
head(res1)
# 1 2 3 4 5 6
#1 1 0 0 0 0 0
#2 1 0 0 0 0 0
#3 1 0 0 0 0 0
#4 1 0 0 0 0 0
#5 1 0 0 0 0 0
#6 1 0 0 0 0 0
Or using dplyr/tidyr
library(dplyr)
library(tidyr)
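dat %>%
select(-Object) %>%
# one column per idrA; combinations that never occur are filled with 0
pivot_wider(names_from=idrA, values_from=Order, values_fill=0) %>%
# turn every non-zero Order number into 1
mutate(across(-idoA, ~ (.x != 0) + 0)) %>%
as.data.frame() %>%
head()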
#idoA 1 2 3 4 5 6
#1 1 1 0 0 0 0 0
#2 2 1 0 0 0 0 0
#3 3 1 0 0 0 0 0
#4 4 1 0 0 0 0 0
#5 5 1 0 0 0 0 0
#6 6 1 0 0 0 0 0
I am trying to use a package that expects its input table in a certain format. I am very new to R and don't know how to get my data into that format so that I can use the package.
Their table looks like this:
Recipient
Actor 1 10 11 12 2 3 4 5 6 7 8 9
1 0 0 0 1 3 1 1 2 3 0 2 6
10 1 0 0 1 0 0 0 0 0 0 0 0
11 13 5 0 5 3 8 0 1 3 2 2 9
12 0 0 2 0 1 1 1 3 1 1 3 0
2 0 0 2 0 0 1 0 0 0 2 2 1
3 9 9 0 5 16 0 2 8 21 45 13 6
4 21 28 64 22 40 79 0 16 53 76 43 38
5 2 0 0 0 0 0 1 0 3 0 0 1
6 11 22 4 21 13 9 2 3 0 4 39 8
7 5 32 11 9 16 1 0 4 33 0 17 22
8 4 0 2 0 1 11 0 0 0 1 0 1
9 0 0 3 1 0 0 1 0 0 0 0 0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, and change the row and column names to 1, ..., 6.
Also my data is listed under Data in my Workspace and it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have any suggestions for getting my data into the same type and style as theirs. All help is greatly appreciated!
OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
# Recipient
#Actor 0 1 2
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
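To illustrate that last point, here is a rough sketch of building the table directly from raw observations. The events data frame and its column names are made up for the example, since the question does not show the raw data.

```r
# Hypothetical raw data: one row per observed interaction.
events <- data.frame(
  actor     = c(1, 1, 2, 3, 3, 3),
  recipient = c(2, 3, 1, 1, 1, 2)
)

# Fixing the factor levels gives every individual a row and a column,
# even if it never acts or never receives.
ids <- 1:3
tab <- table(
  Actor     = factor(events$actor, levels = ids),
  Recipient = factor(events$recipient, levels = ids)
)

tab        # already a 'table' object with named dimensions
class(tab)
```

Naming the arguments to table() sets the dimension names in one step, so no renaming is needed afterwards.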
I have a question regarding creating new columns if a certain value appears in an existing row.
N=5
T=5
time<-rep(1:T, times=N)
id<- rep(1:N,each=T)
dummy<- c(0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0)
df <- data.frame(id, time, dummy)
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 1 4 1
5 1 5 0
6 2 1 0
7 2 2 0
8 2 3 1
9 2 4 0
10 2 5 0
11 3 1 0
12 3 2 1
13 3 3 0
14 3 4 1
15 3 5 0
16 4 1 0
17 4 2 0
18 4 3 0
19 4 4 0
20 4 5 0
21 5 1 1
22 5 2 0
23 5 3 0
24 5 4 1
25 5 5 0
In this case some cross-sections contain more than one 1. I am trying to create a new dummy variable/column for each additional 1. After that, each dummy should stay at 1 within its cross-section once its first 1 has appeared. I can fill the rows by using group_by(id) and the cummax function on each column. But how do I get the new variables without going through every cross-section manually? This is what I want to achieve:
id time dummy dummy2
1 1 1 0 0
2 1 2 0 0
3 1 3 1 0
4 1 4 1 1
5 1 5 1 1
6 2 1 0 0
7 2 2 0 0
8 2 3 1 0
9 2 4 1 0
10 2 5 1 0
11 3 1 0 0
12 3 2 1 0
13 3 3 1 0
14 3 4 1 1
15 3 5 1 1
16 4 1 0 0
17 4 2 0 0
18 4 3 0 0
19 4 4 0 0
20 4 5 0 0
21 5 1 1 0
22 5 2 1 0
23 5 3 1 0
24 5 4 1 1
25 5 5 1 1
Thanks! :)
You can use cummax for the first dummy, and you need cumsum to create dummy2:
df %>%
group_by(id) %>%
mutate(dummy1 = cummax(dummy), # don't alter 'dummy' here we need it in the next line
dummy2 = cummax(cumsum(dummy) == 2)) %>%
as.data.frame() # needed only to display the entire result
# id time dummy dummy1 dummy2
#1 1 1 0 0 0
#2 1 2 0 0 0
#3 1 3 1 1 0
#4 1 4 1 1 1
#5 1 5 0 1 1
#6 2 1 0 0 0
#7 2 2 0 0 0
#8 2 3 1 1 0
#9 2 4 0 1 0
#10 2 5 0 1 0
#11 3 1 0 0 0
#12 3 2 1 1 0
#13 3 3 0 1 0
#14 3 4 1 1 1
#15 3 5 0 1 1
#16 4 1 0 0 0
#17 4 2 0 0 0
#18 4 3 0 0 0
#19 4 4 0 0 0
#20 4 5 0 0 0
#21 5 1 1 1 0
#22 5 2 0 1 0
#23 5 3 0 1 0
#24 5 4 1 1 1
#25 5 5 0 1 1
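If some ids could have more than two 1s, the same idea can be generalised. The following is only a sketch in base R, assuming dummy contains nothing but 0s and 1s; n_events is a helper name introduced here, and the columns are named dummy1, dummy2, ... as in the output above.

```r
df <- data.frame(
  id    = rep(1:5, each = 5),
  time  = rep(1:5, times = 5),
  dummy = c(0,0,1,1,0, 0,0,1,0,0, 0,1,0,1,0, 0,0,0,0,0, 1,0,0,1,0)
)

# Running count of 1s within each id.
n_events <- ave(df$dummy, df$id, FUN = cumsum)

# Column k is 1 from the k-th 1 onwards. Because the running count never
# decreases, n_events >= k is equivalent to cummax(cumsum(dummy) == k).
for (k in seq_len(max(n_events))) {
  df[[paste0("dummy", k)]] <- as.integer(n_events >= k)
}

df
```

The loop runs once per new column, not once per cross-section, so nothing has to be handled manually for each id.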
I'm not sure how to phrase this question. I have some data which I'm trying to get into a different format (maybe even an array) so that I can vectorize it. This isn't very concrete, so here's a simplified example:
I have a file like dt, say:
set.seed(1)
time = 1:10
size <- round(runif(10), digits = 1)
count <- round(runif(10)*20)
dt <- data.frame(time,size, count)
dt
time size count
1 1 0.3 4
2 2 0.4 4
3 3 0.6 14
4 4 0.9 8
5 5 0.2 15
6 6 0.9 10
7 7 0.9 14
8 8 0.7 20
9 9 0.6 8
10 10 0.1 16
and I want to end up with:
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 0 4 0 0 0
3 3 0 0 0 0 14 0 0
4 4 0 0 0 0 0 0 8
5 5 0 15 0 0 0 0 0
6 6 0 0 0 0 0 0 10
7 7 0 0 0 0 0 0 14
8 8 0 0 0 0 0 20 0
9 9 0 0 0 0 8 0 0
10 10 16 0 0 0 0 0 0
which has introduced all the possible results for the size variable as new variables.
Then do a cumulative sum to get something like this, but really the previous step is the trickiest:
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 4 4 0 0 0
3 3 0 0 4 4 14 0 0
4 4 0 0 4 4 14 0 8
5 5 0 15 4 4 14 0 8
6 6 0 15 4 4 14 0 18
7 7 0 15 4 4 14 0 32
8 8 0 15 4 4 14 20 32
9 9 0 15 4 4 22 20 32
10 10 16 15 4 4 22 20 32
We can use dcast to create the 'size' columns, and then loop over them with lapply(...) and apply cumsum.
library(reshape2)
dt1 <- dcast(dt, time~paste0('size_', size), value.var='count', fill=0)
dt1[-1] <- lapply(dt1[-1], cumsum)
dt1
# time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
#1 1 0 0 4 0 0 0 0
#2 2 0 0 4 4 0 0 0
#3 3 0 0 4 4 14 0 0
#4 4 0 0 4 4 14 0 8
#5 5 0 15 4 4 14 0 8
#6 6 0 15 4 4 14 0 18
#7 7 0 15 4 4 14 0 32
#8 8 0 15 4 4 14 20 32
#9 9 0 15 4 4 22 20 32
#10 10 16 15 4 4 22 20 32
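For comparison, roughly the same result can be had without reshape2. This is a sketch using xtabs and apply from base R, with the values typed in from the printed dt above so that it does not depend on the random seed; wide and cum are names chosen for the example.

```r
dt <- data.frame(
  time  = 1:10,
  size  = c(0.3, 0.4, 0.6, 0.9, 0.2, 0.9, 0.9, 0.7, 0.6, 0.1),
  count = c(4, 4, 14, 8, 15, 10, 14, 20, 8, 16)
)

# Sum 'count' into a time x size table; empty cells become 0.
wide <- xtabs(count ~ time + size, data = dt)

# Cumulative sum down each size column (rows are in time order).
cum <- apply(wide, 2, cumsum)
colnames(cum) <- paste0("size_", colnames(cum))

cum
```

The result is a numeric matrix with time as row names, not a data frame with a time column, which may or may not matter for the later vectorised step.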
I have a data.frame with a factor identifying events
year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
And I would need a counter-type identifying a given window around the events. The result should look like this (for a window that is, for example, 3 periods around the event):
year event window
1 0
2 0
3 0
4 0
5 0
6 0 -3
7 0 -2
8 0 -1
9 1 0
10 0 1
11 0 2
12 0 3
13 0
14 0 -3
15 0 -2
16 0 -1
17 1 0
18 0 1
19 0 2
20 0 3
Any guidance on how to implement this within a function would be appreciated. You can copy the data.frame by pasting the block above in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Assuming the windows do not overlap, you can use one of my favourite base functions, filter (that is stats::filter, not the dplyr function of the same name):
DF <- read.table(text="year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0", header=TRUE)
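# Pad with 3 zeros on each side, convolve with the weights -3:3, then trim the padding.
# stats::filter is spelled out so that dplyr::filter cannot mask it.
DF$window <- head(stats::filter(c(rep(0, 3), DF$event, rep(0, 3)),
filter=-3:3)[-(1:3)], -3)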
DF$window[DF$window == 0 & DF$event==0] <- NA
# year event window
# 1 1 0 NA
# 2 2 0 NA
# 3 3 0 NA
# 4 4 0 NA
# 5 5 0 NA
# 6 6 0 -3
# 7 7 0 -2
# 8 8 0 -1
# 9 9 1 0
# 10 10 0 1
# 11 11 0 2
# 12 12 0 3
# 13 13 0 NA
# 14 14 0 -3
# 15 15 0 -2
# 16 16 0 -1
# 17 17 1 0
# 18 18 0 1
# 19 19 0 2
# 20 20 0 3
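If you would rather not rely on the convolution trick, here is an alternative sketch that computes the signed distance to the nearest event directly. It assumes at least one event is present; half_width, pos, d and nearest are names made up for this example. When a row is equally far from two events, the earlier event wins.

```r
event <- c(rep(0, 8), 1, rep(0, 7), 1, rep(0, 3))
half_width <- 3

# Row positions of the events.
pos <- which(event == 1)

# Signed distance from every row to every event (rows x events).
d <- outer(seq_along(event), pos, "-")

# For each row, keep the distance to the closest event.
nearest <- d[cbind(seq_along(event), apply(abs(d), 1, which.min))]

# Rows further away than the window width get NA.
window <- ifelse(abs(nearest) <= half_width, nearest, NA)
window
```

Changing half_width is enough to widen or narrow the window, with no padding to adjust.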
I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the consecutive rows that contain zero and reset to zero whenever the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would like to do this quickly, avoiding loops, since I have to repeat the operation for hundreds of files.
You can copy my dataframe by pasting it in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
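Since you mention hundreds of files, it may help to wrap this in a small function. The sketch below assumes every data frame has a column called factor, as in your example; add_count and run_id are names made up here.

```r
# Add a 'count' column that restarts at 0 on every row where factor == 1.
add_count <- function(d) {
  # run_id increases by 1 at each event, so it labels the runs.
  run_id <- cumsum(d$factor)
  # Number the rows within each run, starting from 0.
  d$count <- ave(run_id, run_id, FUN = seq_along) - 1
  d
}

dt <- data.frame(
  year   = 1:20,
  factor = c(rep(0, 8), 1, rep(0, 7), 1, rep(0, 3))
)

res <- add_count(dt)
res$count

# The same function can be mapped over a list of data frames.
all_res <- lapply(list(a = dt, b = dt), add_count)
```

With real files you would presumably build that list by reading each file first, for example with read.table inside lapply.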
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3