Fill a matrix in R based on codes

I have a file with codes like this:
V1 V2
1 1.0000000
2 0.2000000
3 0.5000000
4 0.0000000
And one matrix with the codes like the following:
1 1 1 1 1 1 1
3 3 3 3 3 3 3
4 2 4 2 4 2 4
1 1 1 1 1 1 1
I would like to use a loop to make a new matrix in which each value is the value of the codes as follows:
1.0 1.0 1.0 1.0 1.0 1.0 1.0
0.5 0.5 0.5 0.5 0.5 0.5 0.5
0.0 0.2 0.0 0.2 0.0 0.2 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
Any ideas?

Because the codes here happen to be 1 to 4, which matches the row positions in the lookup table, you could index it directly; in general, use match:
data:
df_map <- data.frame(
V1 = c(1, 2, 3, 4),
V2 = c(1, 0.2, 0.5, 0)
)
m_codes <- matrix(sample(1:4, 32, TRUE), nrow = 4)
solution:
m_values <- matrix(df_map$V2[match(m_codes, df_map$V1)], nrow = nrow(m_codes))
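As a minimal sketch, the same lookup can be run on the exact matrix from the question instead of random codes. Assigning into m_values[] is a choice made here to keep the matrix dimensions without calling matrix() again; it is not part of the original answer.

```r
# Lookup table and code matrix copied from the question
df_map <- data.frame(V1 = c(1, 2, 3, 4), V2 = c(1, 0.2, 0.5, 0))
m_codes <- matrix(c(1, 1, 1, 1, 1, 1, 1,
                    3, 3, 3, 3, 3, 3, 3,
                    4, 2, 4, 2, 4, 2, 4,
                    1, 1, 1, 1, 1, 1, 1),
                  nrow = 4, byrow = TRUE)

# match() returns, for each code, its row position in df_map$V1;
# assigning into m_values[] preserves the 4 x 7 shape
m_values <- m_codes
m_values[] <- df_map$V2[match(m_codes, df_map$V1)]
m_values
```

No explicit loop is needed: match() is vectorised over every cell of the matrix.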

Related

How to find the first column with a certain value for each row with dplyr

I have a dataset like this:
df <- data.frame(id=c(1:4), time_1=c(1, 0.9, 0.2, 0), time_2=c(0.1, 0.4, 0, 0.9), time_3=c(0,0.5,0.3,1.0))
id time_1 time_2 time_3
1 1.0 0.1 0
2 0.9 0.4 0.5
3 0.2 0 0.3
4 0 0.9 1.0
And I want to identify for each row, the first column containing a 0, and extract the corresponding number (as the last element of colname), obtaining this:
id time_1 time_2 time_3 count
1 1.0 0.1 0 3
2 0.9 0.4 0.5 NA
3 0.2 0 0.3 2
4 0 0.9 1.0 1
Do you have a tidyverse solution?
We may use max.col
v1 <- max.col(df[-1] ==0, "first")
v1[rowSums(df[-1] == 0) == 0] <- NA
df$count <- v1
-output
> df
id time_1 time_2 time_3 count
1 1 1.0 0.1 0.0 3
2 2 0.9 0.4 0.5 NA
3 3 0.2 0.0 0.3 2
4 4 0.0 0.9 1.0 1
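To see why the second line of the base R code above is needed, here is a small sketch of what max.col() returns before the NA correction. The names is_zero, raw and count are introduced here only for illustration.

```r
df <- data.frame(id = 1:4,
                 time_1 = c(1, 0.9, 0.2, 0),
                 time_2 = c(0.1, 0.4, 0, 0.9),
                 time_3 = c(0, 0.5, 0.3, 1.0))

# Logical matrix: TRUE where a time column equals 0
is_zero <- df[-1] == 0

# With ties.method = "first", a row with no TRUE at all is a three-way tie,
# so max.col() reports column 1 for it (row 2 here)
raw <- max.col(is_zero, ties.method = "first")

# Keep the index only for rows that really contain a 0
count <- ifelse(rowSums(is_zero) > 0, raw, NA)
count
```

Row 2 has no zero, yet raw reports 1 for it, which is why rows without any match have to be overwritten with NA.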
Or, using dplyr: if_any() checks whether any 'time' column in a row equals 0; where it does, case_when() returns the index of the first 0 from max.col(), and rows with no 0 get NA by default. pick() requires dplyr 1.1.0 or later; with older versions, replace it with across().
library(dplyr)
df %>%
mutate(count = case_when(if_any(starts_with("time"), ~ .x== 0) ~
max.col(pick(starts_with("time")) ==0, "first")))
-output
id time_1 time_2 time_3 count
1 1 1.0 0.1 0.0 3
2 2 0.9 0.4 0.5 NA
3 3 0.2 0.0 0.3 2
4 4 0.0 0.9 1.0 1
You can do this:
df <- df %>%
rowwise() %>%
mutate (count = which(c_across(starts_with("time")) == 0)[1])
df
id time_1 time_2 time_3 count
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.1 0 3
2 2 0.9 0.4 0.5 NA
3 3 0.2 0 0.3 2
4 4 0 0.9 1 1

Create proportion matrix for multivariate categorical data

Suppose I've got this data simulated from the below R code:
library(RNGforGPD)
set.seed(1)
sample.size = 10; no.gpois = 3
lambda.vec = c(-0.2, 0.2, -0.3); theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342); N = diag(3); N[lower.tri(N)] = M
TV = N + t(N); diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec, verbose = TRUE)
data = GenMVGpois(sample.size, no.gpois, cstar, theta.vec, lambda.vec, details = FALSE)
> prop.table(table(data[,1]))
0 1 2
0.3 0.4 0.3
> prop.table(table(data[,2]))
2 3 6 8 10
0.2 0.4 0.1 0.2 0.1
> prop.table(table(data[,3]))
2 3 4 5 6
0.2 0.3 0.1 0.3 0.1
> table(data)
data
0 1 2 3 4 5 6 8 10
3 4 7 7 1 3 2 2 1
I'd like to create a proportion matrix for the three categorical variables. If a category does not occur in a given column, its proportion should be 0.
Cat  X1   X2   X3
0    0.3  0.0  0.0
1    0.4  0.0  0.0
2    0.3  0.2  0.2
3    0.0  0.4  0.3
4    0.0  0.0  0.1
5    0.0  0.0  0.3
6    0.0  0.1  0.1
8    0.0  0.2  0.0
10   0.0  0.1  0.0
This is the data-object:
dput(data)
structure(c(1, 0, 2, 1, 0, 0, 1, 2, 2, 1, 3, 8, 3, 3, 2, 2, 6,
3, 10, 8, 2, 5, 2, 6, 3, 3, 4, 3, 5, 5), .Dim = c(10L, 3L), .Dimnames = list(
NULL, NULL))
The comments in the code below explain the logic at each step.
props <- data.frame(Cat = sort(unique(c(data))) ) # Just the Cat column
#Now fill in the entries
# the entries will be obtained with table function
apply(data, 2, table) # run `table(.)` over the columns individually
[[1]]
0 1 2 # these are actually character valued names
3 4 3 # while these are the count values
[[2]]
2 3 6 8 10
2 4 1 2 1
[[3]]
2 3 4 5 6
2 3 1 3 1
Now iterate over that list to fill in values that match the Cat column:
props2 <- cbind(props, # using dfrm first argument returns dataframe object
lapply( apply(data, 2, table) , # irregular results are a list
function(col) { # first make a named vector of zeros
x <- setNames(rep(0,length(props$Cat)), props$Cat)
# could have skipped that step by using `tabulate`
# then fill with values using names as indices
x[names(col)] <- col # values to matching names
x}) )
props2
#-------------
Cat V1 V2 V3
0 0 3 0 0
1 1 4 0 0
2 2 3 2 2
3 3 0 4 3
4 4 0 0 1
5 5 0 0 3
6 6 0 1 1
8 8 0 2 0
10 10 0 1 0
#---
# now just "proportionalize" those counts
props2[2:4] <- prop.table(data.matrix(props2[2:4]), margin=2)
props2
#-------------
Cat V1 V2 V3
0 0 0.3 0.0 0.0
1 1 0.4 0.0 0.0
2 2 0.3 0.2 0.2
3 3 0.0 0.4 0.3
4 4 0.0 0.0 0.1
5 5 0.0 0.0 0.3
6 6 0.0 0.1 0.1
8 8 0.0 0.2 0.0
10 10 0.0 0.1 0.0
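A more compact variant of the same idea, sketched here as an alternative rather than as the answer's own method: converting each column to a factor with a shared set of levels makes table() report zero counts for the missing categories, so no manual filling is needed. The names cats and props are introduced here for illustration.

```r
data <- structure(c(1, 0, 2, 1, 0, 0, 1, 2, 2, 1, 3, 8, 3, 3, 2, 2, 6,
                    3, 10, 8, 2, 5, 2, 6, 3, 3, 4, 3, 5, 5),
                  .Dim = c(10L, 3L))

# All categories that occur anywhere in the data
cats <- sort(unique(c(data)))

# One column of proportions per variable; the shared factor levels
# force a 0 entry for categories absent from a column
props <- sapply(seq_len(ncol(data)), function(j) {
  as.vector(prop.table(table(factor(data[, j], levels = cats))))
})
rownames(props) <- as.character(cats)
colnames(props) <- c("X1", "X2", "X3")
props
```

The result is already a numeric matrix with the categories as row names, and each column sums to 1.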
A tidyverse alternative:
library(dplyr)
library(tidyr)
colnames(data) <- c("X1", "X2", "X3")
as_tibble(data) %>%
  pivot_longer(cols = X1:X3, values_to = "Cat") %>%
  count(name, Cat) %>%                      # counts per variable and category
  group_by(name) %>%
  mutate(proportion = n / sum(n)) %>%       # proportions within each variable
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = name, values_from = proportion,
              values_fill = 0) %>%          # absent categories become 0
  arrange(Cat)
# A tibble: 9 × 4
Cat X1 X2 X3
<dbl> <dbl> <dbl> <dbl>
1 0 0.3 0 0
2 1 0.4 0 0
3 2 0.3 0.2 0.2
4 3 0 0.4 0.3
5 4 0 0 0.1
6 5 0 0 0.3
7 6 0 0.1 0.1
8 8 0 0.2 0
9 10 0 0.1 0
If you would like it as a matrix, you can use as.matrix()

Quartile sorter with externally specified quartile breakpoints in R data.table

I want to sort observations into quartiles on the variable "varbl". Since my data is pretty big (2Gb), I am trying to implement it via data.table. The problem is that I need to use external quartile breaks, which are group-specific. The group variable is "prd" or "prd1".
My data and breakpoints are as follows:
data <- data.table(id = c(1,2,3,4,5,1,2,3,4,5), prd1 = c(1,1,1,1,1,2,2,2,2,2), varbl = c(-1.6, -0.7, 0.1, 1.2, -0.5, -0.8, 0.4, 1.2, 1.9, 4))
bks <- data.table(prd=c(1,2), br0 = c(-5,-5), br1=c(-1,0), br2=c(0, 0.5), br3=c(1, 3), br4=c(5,5))
> data
id prd1 varbl
1: 1 1 -1.6
2: 2 1 -0.7
3: 3 1 0.1
4: 4 1 1.2
5: 5 1 -0.5
6: 1 2 -0.8
7: 2 2 0.4
8: 3 2 1.2
9: 4 2 1.9
10: 5 2 4.0
> bks
prd br0 br1 br2 br3 br4
1: 1 -5 -1 0.0 1 5
2: 2 -5 0 0.5 3 5
The desired output is:
> output
id prd1 varbl ntile
1: 1 1 -1.6 1
2: 2 1 -0.7 2
3: 3 1 0.1 3
4: 4 1 1.2 4
5: 5 1 -0.5 2
6: 1 2 -0.8 1
7: 2 2 0.4 2
8: 3 2 1.2 3
9: 4 2 1.9 3
10: 5 2 4.0 4
I tried the following code, but it fails, since I can not subset bks on the same prd as the current prd1 from data:
data[, ntile := cut(varbl, breaks = bks[prd==prd1], include.lowest=TRUE, labels = 1:4)]
As another attempt, I tried to join data and bks first (I would prefer not to, as it will increase the size of data from 2Gb to 4Gb) and then sort observations into quantiles. It fails, since I do not understand how to use column names to construct a vector of breakpoints for every row. None of the attempts worked.
setnames(data, "prd1", "prd")
data <- data[bks, on="prd", nomatch=0]
data[, ntile := cut(varbl, breaks = .(br0, br1, br2, br3, br4), include.lowest=TRUE, labels=1:4)]
data[, ntile := cut(varbl, breaks = colnames(bks)[-1], include.lowest=TRUE, labels=1:4)]
data[, ntile := cut(varbl, breaks = c("br0", "br1", "br2", "br3", "br4"), include.lowest=TRUE, labels=1:4)]
Rearranging bks a little means you can do this as a join:
bks <- bks[, data.frame(embed(unlist(.SD),2)[,2:1]), by=prd]
bks[, grp := seq_len(.N), by=prd]
# prd X1 X2 grp
#1: 1 -5.0 -1.0 1
#2: 1 -1.0 0.0 2
#3: 1 0.0 1.0 3
#4: 1 1.0 5.0 4
#5: 2 -5.0 0.0 1
#6: 2 0.0 0.5 2
#7: 2 0.5 3.0 3
#8: 2 3.0 5.0 4
data[bks, on=c("prd1"="prd","varbl>=X1","varbl<X2"), grp := i.grp]
# id prd1 varbl grp
# 1: 1 1 -1.6 1
# 2: 2 1 -0.7 2
# 3: 3 1 0.1 3
# 4: 4 1 1.2 4
# 5: 5 1 -0.5 2
# 6: 1 2 -0.8 1
# 7: 2 2 0.4 2
# 8: 3 2 1.2 3
# 9: 4 2 1.9 3
#10: 5 2 4.0 4
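An alternative that avoids both the join and the reshaped bks table is to look up each group's breakpoints inside a grouped assignment and pass them to findInterval(). This is a sketch under the assumption that every prd1 value has a row in bks; brk_mat is a helper name introduced here and does not appear in the question or the answer.

```r
library(data.table)

data <- data.table(id = c(1,2,3,4,5,1,2,3,4,5),
                   prd1 = c(1,1,1,1,1,2,2,2,2,2),
                   varbl = c(-1.6, -0.7, 0.1, 1.2, -0.5, -0.8, 0.4, 1.2, 1.9, 4))
bks <- data.table(prd = c(1,2), br0 = c(-5,-5), br1 = c(-1,0),
                  br2 = c(0, 0.5), br3 = c(1, 3), br4 = c(5,5))

# One row of breakpoints per group, addressable by the group value as a name
brk_mat <- as.matrix(bks[, .(br0, br1, br2, br3, br4)])
rownames(brk_mat) <- as.character(bks$prd)

# Within each group, prd1 is a single value, so it selects one row of brk_mat;
# findInterval() uses left-closed intervals [b_i, b_{i+1}), and
# rightmost.closed = TRUE puts a value equal to the top break in bin 4
data[, ntile := findInterval(varbl, brk_mat[as.character(prd1), ],
                             rightmost.closed = TRUE),
     by = prd1]
data
```

Because the assignment is by reference and only the small breakpoint matrix is created, the data table itself is not copied or widened.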

Simulation and Scenarios in R + help for function

Suppose that I have the following.
A table with input data
table <- data.frame(id=c(1,2,3,4,5,6),
cost=c(100,200,300,400,500,600))
A list of possible outcomes, each with an associated probability
values<-list(c(1),
c(0.5),
c(0))
A simulation of different scenarios
esc<-sample(1:3,100,replace=T)
How can I add a new column computed with the following formula?
id cost final
1 100 100*ifelse(esc[1]==1,values[[1]],ifelse(esc[1]==2,values[[2]],values[[3]]))
2 200 200*ifelse(esc[2]==1,values[[1]],ifelse(esc[2]==2,values[[2]],values[[3]]))
Convert the esc variable into a factor, using values as the labels, then convert it back to numeric. This maps each scenario code in esc to its value.
esc <- as.numeric ( as.character( factor( esc, levels = sort( unique( esc )), labels = values) ) )
# [1] 1.0 0.5 0.5 0.0 1.0 0.0 0.0 0.5 0.5 1.0 1.0 1.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.0 1.0 0.5 1.0 1.0 0.5 1.0 0.5 0.0 0.5 0.5 0.5 0.5 1.0 0.0 0.0 0.0
# [38] 1.0 0.0 0.5 0.0 0.5 0.0 0.5 0.5 0.0 1.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.0 1.0 0.5 1.0 0.5 1.0 0.5 0.0 1.0 0.0 0.5 0.0 0.5 0.5
# [75] 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.5 0.0 0.0 0.0 0.5 0.5 0.0 0.5
table$esc <- esc[ 1: nrow(table) ] # add esc to table
Now multiply cost with esc to get final
within( table, final <- cost * esc)
# id cost esc final
# 1 1 100 1.0 100
# 2 2 200 0.5 100
# 3 3 300 0.5 150
# 4 4 400 0.0 0
# 5 5 500 1.0 500
# 6 6 600 0.0 0
Data:
table <- data.frame(id=c(1,2,3,4,5,6), cost=c(100,200,300,400,500,600))
values <- c(1, 0.5, 0)
set.seed(1L)
esc <- sample(1:3,100,replace=T)
esc
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3 3 1 2 1 1 2 1 2 3 2 2 2 2 1 3 3 3 1 3 2 3 2 3 2 2 3 1 2 3 3 2 3 2 1 1 1 1 2 2 2 3 1 2 1 2 1 2 3 1 3 2 3 2 2 2
# [76] 3 3 2 3 3 2 3 2 1 3 1 3 1 1 1 1 1 2 3 3 3 2 2 3 2
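Since the scenario codes are the integers 1 to 3, the nested ifelse() from the question can also be replaced by plain vector indexing. This is a sketch of that shortcut, not the approach used in the answer above; it assumes values is a plain numeric vector, as in the Data block (a list as in the question would first need unlist()).

```r
table <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                    cost = c(100, 200, 300, 400, 500, 600))
values <- c(1, 0.5, 0)
set.seed(1L)
esc <- sample(1:3, 100, replace = TRUE)

# values[esc] picks values[1], values[2] or values[3] per scenario code;
# only the first nrow(table) scenarios are paired with the rows
table$esc <- values[esc[seq_len(nrow(table))]]
table$final <- table$cost * table$esc
table
```

The exact numbers depend on the random number generator of the R version in use, so they may differ from the output shown above.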

Reshape matrix to data frame

I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want is to reshape this matrix file into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix file has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package, but I do not seem to be able to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
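Another base R option, sketched here with the column names requested in the question: row() and col() return matrices of row and column indices with the same shape as the input, and c() flattens all three in the same column-major order. The name long is introduced here for illustration.

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# c() flattens each matrix column by column, so the three vectors line up
long <- data.frame(RowNum = c(row(m)),
                   ColumnNum = c(col(m)),
                   Value = c(m))
long
```

Unlike the which(test == test) approach, this does not rely on a comparison, so cells containing NA are kept as rows in the result.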
