How to create an R matrix or table from data - r

My data is in a CSV file in the following format. I would like to create a matrix for a heatmap from this file. I am going to use ggplot in R.
A B C
1 apple 3
2 book 5
4 bag 1
9 desk 4
10 apple 8
11 book 66
14 desk 2
From the file above, I would like to create a matrix like this for the heatmap:
1 2 4 9 10 11 14
apple 3 0 0 0 8 0 0
book 0 5 0 0 0 66 0
bag 0 0 1 0 0 0 0
desk 0 0 0 4 0 0 2
I also have another column in the initial file, for ordering:
A B C D
1 apple 3 4
2 book 5 1
4 bag 1 2
9 desk 4 3
10 apple 8 4
11 book 66 1
14 desk 2 3
How can I order the rows of my matrix by this D ordering column? Alternatively, I would like to order them by the sum across columns 1-14.

You can use xtabs.
d <- read.delim(textConnection("
A B C
1 apple 3
2 book 5
4 bag 1
9 desk 4
10 apple 8
11 book 66
14 desk 2
"), sep=" ")
xtabs(C ~ B + A, d)
       A
B        1  2  4  9 10 11 14
         0  0  0  0  0  0  0
  apple  3  0  0  0  8  0  0
  bag    0  0  1  0  0  0  0
  book   0  5  0  0  0 66  0
  desk   0  0  0  4  0  0  2
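The ordering part of the question can be handled by indexing the xtabs result. A minimal sketch, assuming D is constant within each B value (as it is in the sample data); the names row_order, m_by_d and m_by_sum are only illustrative:

```r
# Data from the question, including the ordering column D
d <- read.table(text = "
A B C D
1 apple 3 4
2 book 5 1
4 bag 1 2
9 desk 4 3
10 apple 8 4
11 book 66 1
14 desk 2 3
", header = TRUE)

m <- xtabs(C ~ B + A, d)

# One (B, D) pair per row label; assumes D never varies within a B value
key <- unique(d[c("B", "D")])
row_order <- as.character(key$B[order(key$D)])

# Rows ordered by the D column
m_by_d <- m[row_order, ]

# Rows ordered by their totals across columns 1-14, largest first
m_by_sum <- m[order(rowSums(m), decreasing = TRUE), ]
```

Indexing by row name rather than by position keeps the reordering independent of the alphabetical order that xtabs uses for B.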

You can read the file with read.table. For help choosing the correct parameters, type ?read.table at the R prompt.

Using the read.delim part from Vincent above and a reshape approach. Not as elegant...
d <- read.delim(textConnection("
A B C
1 apple 3
2 book 5
4 bag 1
9 desk 4
10 apple 8
11 book 66
14 desk 2
"), sep=" ")
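library(reshape)  # provides melt() and cast()
# Repeat each A and B value C times, so every row stands for one count
Var1 <- rep(d[,1], d[,3])
Var2 <- rep(d[,2], d[,3])
d <- data.frame(Var1=Var1, Var2=Var2)
# Count how often each Var1 value occurs within each Var2
d <- cast(melt(d), Var2~value)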
> d
Var2 1 2 4 9 10 11 14
1 apple 3 0 0 0 8 0 0
2 bag 0 0 1 0 0 0 0
3 book 0 5 0 0 0 66 0
4 desk 0 0 0 4 0 0 2

Related

formatting table/matrix in R

I am trying to use a package whose example table is in a certain format. I am very new to R and don't know how to get my data into the same format so that I can use the package.
Their table looks like this:
     Recipient
Actor  1 10 11 12  2  3  4  5  6  7  8  9
   1   0  0  0  1  3  1  1  2  3  0  2  6
   10  1  0  0  1  0  0  0  0  0  0  0  0
   11 13  5  0  5  3  8  0  1  3  2  2  9
   12  0  0  2  0  1  1  1  3  1  1  3  0
   2   0  0  2  0  0  1  0  0  0  2  2  1
   3   9  9  0  5 16  0  2  8 21 45 13  6
   4  21 28 64 22 40 79  0 16 53 76 43 38
   5   2  0  0  0  0  0  1  0  3  0  0  1
   6  11 22  4 21 13  9  2  3  0  4 39  8
   7   5 32 11  9 16  1  0  4 33  0 17 22
   8   4  0  2  0  1 11  0  0  0  1  0  1
   9   0  0  3  1  0  0  1  0  0  0  0  0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, and also change the row and column names to 1, ..., 6.
Also my data is listed under Data in my Workspace and it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have any suggestions for getting my data into the same type and style as theirs. All help is greatly appreciated!
OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
#     Recipient
#Actor 0 1 2
#    0 1 4 7
#    1 2 5 8
#    2 3 6 9
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
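As a rough illustration of that last point, the sketch below builds the table directly from raw observations. The events data frame and its actor and recipient columns are hypothetical, since the question does not show the raw data:

```r
# Hypothetical raw data: one row per observed interaction
events <- data.frame(
  actor     = c(1, 1, 2, 3, 3, 3),
  recipient = c(2, 3, 1, 1, 1, 2)
)

# A shared set of levels keeps the table square even if some id never
# appears as an actor or as a recipient
ids <- 1:3
tab <- table(
  Actor     = factor(events$actor, levels = ids),
  Recipient = factor(events$recipient, levels = ids)
)
tab
```

Naming the arguments to table() sets the dimension names in the same step, so no separate names(dimnames(...)) call is needed.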

How to create a list of data frames as a result of subsetting an old one based on some conditions?

I have the following data frame:
T a b c
1 1 0 0 0
2 2 1 0 0
3 5 1 0 0
4 6 1 0 0
5 7 0 1 0
6 9 0 1 0
7 10 0 0 1
8 12 0 0 0
9 14 0 0 0
10 15 1 0 0
11 16 1 0 0
12 17 0 1 0
13 18 0 0 1
I want to subset this data frame and create a list of data frames. Each data frame should contain the rows of the original that form one sequence: consecutive rows with "1" in column a, followed by rows with "1" in column b, and finally rows with "1" in column c. The expected result (for this data frame) would be a list of 2 data frames:
data frame 1:
T a b c
1 2 1 0 0
2 5 1 0 0
3 6 1 0 0
4 7 0 1 0
5 9 0 1 0
6 10 0 0 1
and data frame 2:
T a b c
1 15 1 0 0
2 16 1 0 0
3 17 0 1 0
4 18 0 0 1
Any ideas?
Thank you in advance!
Based on the expected output, this groups consecutive rows in which a, b, or c equals 1:
i1 <- do.call(pmax, df1[-1])
grp <- inverse.rle(within.list(rle(i1 ==1), {values <- seq_along(values)}))
split(df1[i1==1,], grp[i1==1])
#$`2`
# T a b c
#2 2 1 0 0
#3 5 1 0 0
#4 6 1 0 0
#5 7 0 1 0
#6 9 0 1 0
#7 10 0 0 1
#$`4`
# T a b c
#10 15 1 0 0
#11 16 1 0 0
#12 17 0 1 0
#13 18 0 0 1
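The grouping step can be unpacked with plain rle and rep. This is only a sketch of the same idea on the question's data: like the code above, it groups consecutive rows with a 1 in any of a, b, c and does not verify that the 1s actually appear in a, then b, then c order. The names active, run_id and res are illustrative:

```r
df1 <- data.frame(
  T = c(1, 2, 5, 6, 7, 9, 10, 12, 14, 15, 16, 17, 18),
  a = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0),
  b = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0),
  c = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
)

# TRUE for rows where at least one of a, b, c is 1
active <- rowSums(df1[-1]) > 0

# Give every run of identical TRUE/FALSE values its own id
r <- rle(active)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Keep only the active rows and split them by run id
res <- split(df1[active, ], run_id[active])
```

The list names come from the run ids, which is why the result is labelled `2` and `4` rather than 1 and 2; names(res) <- NULL would drop them.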

Finding variance of columns from 2 dataframes

I have two data frames, A and B.
A <- data.frame(a=c(1,2,3,4,5),b=c(2,4,6,8,10),c=c(3,6,9,12,15),x=c(4,8,12,16,20),y=c(5,10,15,20,25))
B <- data.frame(a=c(1,2,3,4,5),b=c(2,4,6,8,10),c=c(3,6,9,12,15),x=c(4,8,12,16,20),y=c(5,10,15,20,25))
A
a b c x y
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
B
a b c x y
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
Expected Output:
C
a b c x y
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
Both have a key column which is alphanumeric.
Both data frames have 260 columns in all, of which 250 are float.
Is there an easy way to compute the variance of each of the 250 columns and store the result in another data frame?
I think you want the difference between the respective columns of the two data frames:
temp = names(A)
data.frame(A["a"], do.call(cbind, lapply(temp[!temp %in% "a"], function(x) A[x] - B[x])))
# a b c x y
#1 1 0 0 0 0
#2 2 0 0 0 0
#3 3 0 0 0 0
#4 4 0 0 0 0
#5 5 0 0 0 0
We can use Map/mapply to find the difference between the corresponding columns of 'A' and 'B':
cbind(A[1], mapply(`-`, A[-1], B[names(A)[-1]]))
# a b c x y
#1 1 0 0 0 0
#2 2 0 0 0 0
#3 3 0 0 0 0
#4 4 0 0 0 0
#5 5 0 0 0 0
Or just
cbind(A[1], A[-1] - B[-1])
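The one-liners above assume both data frames list the same keys in the same row order. If that is not guaranteed, a sketch along these lines aligns the rows first and restricts the subtraction to numeric columns. The column name key and the sample values are hypothetical:

```r
# Hypothetical data: 'key' is an alphanumeric identifier, the rest are numeric
A <- data.frame(key = c("k1", "k2", "k3"),
                x = c(1, 2, 3), y = c(10, 20, 30),
                stringsAsFactors = FALSE)
B <- data.frame(key = c("k3", "k1", "k2"),
                x = c(3, 1, 2.5), y = c(30, 10, 20),
                stringsAsFactors = FALSE)

# Reorder B so that its rows line up with the keys in A
B_aligned <- B[match(A$key, B$key), ]

# Only the numeric columns take part in the subtraction
num_cols <- names(A)[sapply(A, is.numeric)]

C <- cbind(A["key"], A[num_cols] - B_aligned[num_cols])
```

A key present in A but missing from B would yield NA in the corresponding row of C, which makes mismatches easy to spot.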

How to count number of particular values

My data looks like this:
ID CO MV
1 0 1
1 5 0
1 0 1
1 9 0
1 8 0
1 0 1
2 69 0
2 0 1
2 8 0
2 0 1
2 78 0
2 53 0
2 0 1
2 3 0
3 54 0
3 0 1
3 8 0
3 90 0
3 0 1
3 56 0
4 0 1
4 56 0
4 0 1
4 45 0
4 0 1
4 34 0
4 31 0
4 0 1
4 45 0
5 0 1
5 0 1
5 67 0
I want it to look like this:
ID CO MV CONUM
1 0 1 3
1 5 0 3
1 0 1 3
1 9 0 3
1 8 0 3
1 0 1 3
2 69 0 5
2 0 1 5
2 8 0 5
2 0 1 5
2 78 0 5
2 53 0 5
2 0 1 5
2 3 0 5
3 54 0 4
3 0 1 4
3 8 0 4
3 90 0 4
3 0 1 4
3 56 0 4
4 0 1 5
4 56 0 5
4 0 1 5
4 45 0 5
4 0 1 5
4 34 0 5
4 31 0 5
4 0 1 5
4 45 0 5
5 0 1 1
5 0 1 1
5 67 0 1
I want to create a column CONUM that holds, for each ID, the total number of non-zero values in the CO column. For example, the CO column for ID 1 has 3 non-zero values, so the corresponding values in the CONUM column are 3. The MV column is 0 if the CO column has a value and 1 if the CO column is 0, so another way to create the CONUM column would be to count the number of zeros in MV per ID. It would be great if you could help me with the R code to accomplish this. Thanks.
Here is an option with data.table
library(data.table)
setDT(df)[, CONUM := sum(CO != 0), by = ID][]
You can use ave in base R:
dat <- transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
and an option with dplyr
# install.packages("dplyr")
library(dplyr)
dat <- dat %>%
  group_by(ID) %>%
  mutate(CONUM = sum(CO != 0))
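The question also suggests counting the zeros in MV instead. A small base R sketch on a subset of the question's data (IDs 1 and 5 only) shows that both routes give the same result; conum_mv is an illustrative name:

```r
# Rows for ID 1 and ID 5 from the question
dat <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 5, 5, 5),
  CO = c(0, 5, 0, 9, 8, 0, 0, 0, 67),
  MV = c(1, 0, 1, 0, 0, 1, 1, 1, 0)
)

# Count the non-zero CO values within each ID
dat$CONUM <- ave(as.integer(dat$CO != 0), dat$ID, FUN = sum)

# Same count obtained from the MV flag (MV is 0 when CO has a value)
conum_mv <- ave(as.integer(dat$MV == 0), dat$ID, FUN = sum)
```

ave returns a vector as long as its input, so each group total is repeated on every row of that group, which is exactly the layout the question asks for.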

cumulative counter in dataframe R

I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the consecutive rows that contain zero, and reset to zero whenever the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would be glad to do it in a quick way, avoiding loops, since I have to do the operations for hundreds of files.
You can recreate my data frame by pasting it in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3
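To see why the cumsum/ave version works: cumsum(dt$factor) increases by one at every 1, so it acts as a group id, and seq_along numbers the rows inside each group. A self-contained sketch that rebuilds the question's 20 rows (grp is an illustrative name):

```r
# Rebuild the example: a 1 at years 9 and 17, zeros elsewhere
dt <- data.frame(
  year   = 1:20,
  factor = c(rep(0, 8), 1, rep(0, 7), 1, rep(0, 3))
)

# The running total of the 1s labels each stretch between resets
grp <- cumsum(dt$factor)

# Number the rows within each stretch, starting from 0
dt$count <- ave(grp, grp, FUN = seq_along) - 1
```

Both steps are vectorised, so no explicit loop over rows is needed when the same operation is repeated across many files.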
