How to reshape a matrix using the value in a column in R

I have a matrix like this:
ID Count
1 2
2 3
3 2
I want to create a matrix in which the number of rows for an ID equals the value of Count while adding a new column containing the index for each row within the ID value. For the matrix above, the result should be:
ID Index
1 1
1 2
2 1
2 2
2 3
3 1
3 2

For a simple case you can just use rep and sequence.
ID=c(1,2,3)
Count=c(2,3,2)
cbind(ID=rep(ID, Count), Index=sequence(Count))
# ID Index
#[1,] 1 1
#[2,] 1 2
#[3,] 2 1
#[4,] 2 2
#[5,] 2 3
#[6,] 3 1
#[7,] 3 2
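If the data starts out as a data frame rather than two separate vectors, the same idea carries over. A minimal sketch, assuming a data frame named df (a name chosen here for illustration) with columns ID and Count:

```r
# Hypothetical input mirroring the matrix in the question
df <- data.frame(ID = c(1, 2, 3), Count = c(2, 3, 2))

# Repeat each ID Count times and number the rows within each ID
res <- data.frame(ID    = rep(df$ID, df$Count),
                  Index = sequence(df$Count))
res
```

An ID whose Count is 0 should simply drop out of the result, since both rep and sequence emit nothing for it.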

Using tidyverse
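library(tidyverse)
# example input (df was not defined in the original snippet)
df <- tibble(ID = c(1, 2, 3), Count = c(2, 3, 2))
df1 <- df %>%
  group_by(ID) %>%
  nest() %>%
  mutate(data = map(data, ~ seq_len(.x$Count))) %>%  # indices 1..Count for each ID
  unnest(data)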
Output
ID data
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2

Related

Convert 3D array to tidy data frame?

I have a 3D array that looks like this:
# Create two vectors
vector1 <- c(1,2,3,4,5,6)
vector2 <- c(10, 11, 12, 13, 14, 15,16)
# Convert to 3D array
my_array <- array(c(vector1, vector2), dim = c(2,3,2))
print(my_array)
where the output is
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 10 12 14
[2,] 11 13 15
I would like to turn this into a tidy dataset with one row per value and four columns:
the value itself
dimension 1
dimension 2
dimension 3
so for example, a few rows would be
Value Dimension1(Row) Dimension2(Column) Dimension3(Width)
1 1 1 1
2 2 1 1
...
15 2 3 2
Is there a good way to do this in base R, or with tidyverse tools like tidyr?
We could use reshape2::melt
library(reshape2)
melt(my_array)
Output
Var1 Var2 Var3 value
1 1 1 1 1
2 2 1 1 2
3 1 2 1 3
4 2 2 1 4
5 1 3 1 5
6 2 3 1 6
7 1 1 2 10
8 2 1 2 11
9 1 2 2 12
10 2 2 2 13
11 1 3 2 14
12 2 3 2 15
Or use as.data.frame.table in base R
as.data.frame.table(my_array)
Or we may also use
cbind(which(is.finite(my_array), arr.ind = TRUE), value = c(my_array))
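Another base R route is to compute the subscripts of every element with arrayInd and bind them to the values. This is only a sketch: the column names row, column and width are chosen here to match the question's wording, and the unused seventh element of vector2 is dropped.

```r
vector1 <- c(1, 2, 3, 4, 5, 6)
vector2 <- c(10, 11, 12, 13, 14, 15)
my_array <- array(c(vector1, vector2), dim = c(2, 3, 2))

# One row of (dim 1, dim 2, dim 3) subscripts per element, in storage order
idx <- arrayInd(seq_along(my_array), dim(my_array))

# Column names below are illustrative, not required by any function
tidy <- data.frame(value  = c(my_array),
                   row    = idx[, 1],
                   column = idx[, 2],
                   width  = idx[, 3])
tidy
```

Unlike as.data.frame.table, this keeps the index columns as integers rather than factors.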

Add dynamic subset conditions as variables into the data.frame

I have a data frame like
> x = data.frame(A=c(1,2,3),B=c(2,3,4))
> x
A B
1 1 2
2 2 3
3 3 4
and subsetting conditions in a data frame like
> cond = data.frame(condition=c('A>1','B>2 & B<4'))
> cond
condition
1 A>1
2 B>2 & B<4
which I then apply dynamically
> eval(parse(text=paste0("subset(x,",cond[1,'condition'],")")))
A B
2 2 3
3 3 4
> eval(parse(text=paste0("subset(x,",cond[2,'condition'],")")))
A B
2 2 3
Now, instead of subsetting, I would like to add the subsetting conditions as variables into the data. The end result would look like
A B condition1 condition2
1 1 2 0 0
2 2 3 1 1
3 3 4 1 0
How could I derive the above table using the dynamic conditions?
Before using eval(parse()), it is worth reading about its risks, for example
What specifically are the dangers of eval(parse(…))?
and the many other discussions available.
That said, to answer your question, we can continue your approach and use eval(parse()) inside sapply. The leading + converts the resulting logical matrix to 0/1 integers.
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
To add it to the dataframe,
x[paste0("condition", 1:nrow(cond))] <-
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
x
# A B condition1 condition2
#1 1 2 0 0
#2 2 3 1 1
#3 3 4 1 0
Simplifying it a bit (using @jogo's comment)
+(sapply(cond$condition, function(i) with(x, eval(parse(text=as.character(i))))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
Here is an option using tidyverse
library(tidyverse)
x %>%
mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse=";"))) %>%
rename_at(3:4, ~ paste0("condition", 1:2))
# A B condition1 condition2
#1 1 2 FALSE FALSE
#2 2 3 TRUE TRUE
#3 3 4 TRUE FALSE
If needed, the logical columns can be easily converted to binary with as.integer
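If you control how the conditions are stored, one way to sidestep parsing strings altogether is to keep them as quoted expressions. The sketch below assumes a list named conds (hypothetical, replacing the cond data frame) whose names become the new column names:

```r
x <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4))

# Hypothetical alternative to the string-based cond data frame
conds <- list(condition1 = quote(A > 1),
              condition2 = quote(B > 2 & B < 4))

# Evaluate each expression with the columns of x in scope;
# as.integer turns TRUE/FALSE into 1/0
x[names(conds)] <- lapply(conds, function(e) as.integer(eval(e, x)))
x
```

This does not help when the conditions genuinely arrive as text, in which case one of the parsing approaches above is still needed.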

Get columns in frame based on values in second frame

I have 2 data frames. One has an ID column with a lot of ordered IDs.
The other one contains just specific rows of the first data frame. Those are my markers.
I need to get the sum of the values in a specific column based on the id values in the second data frame.
The first data frame may be
id goals cards group
1 2 2 1
2 3 2 1
3 4 2 1
4 5 1 1
5 1 2 1
1 2 2 2
2 3 2 2
3 4 2 2
4 5 1 3
5 1 2 3
the second one:
id goals cards group
2 3 2 1
5 1 2 1
2 3 2 2
3 4 2 2
5 1 2 3
What I need to get:
id goals cards group points
1 2 2 1 2-(2+2)
2 3 2 1 0 cause in second list
3 4 2 1 4-(2+1+2)
4 5 1 1 5-(1+2)
5 1 2 1 0 cause in second list
1 2 2 2 2-(2+2)
2 3 2 2 0
3 4 2 2 0
4 5 1 3 5-(1+2)
5 1 2 3 0
Something like this?
df1<- df1%>%
rowwise() %>%
mutate(points=
goals
-(sum( df1$cards[df1$id <= df2$id & df1$id>df1$id])))
df1 = read.table(text = "
id goals cards
1 2 2
2 3 2
3 4 2
4 5 1
5 1 2
", header=T)
df2 = read.table(text = "
id goals cards
2 3 2
5 1 2
", header=T)
library(dplyr)
# function that gets an id and returns the sum of cards based on df2
GetSumOfCards = function(x) {
ids = min(df2$id[df2$id >= x]) # for a given id of df1, find the smallest id in df2 that is greater than or equal to it
ifelse(x %in% df2$id, # if the given id exists in df2
0, # sum of cards is zero
sum(df1$cards[df1$id >= x & df1$id <= ids])) # otherwise get sum of cards in df1 from this id until the id obtained before
}
# update function to be vectorised
GetSumOfCards = Vectorize(GetSumOfCards)
df1 %>%
mutate(sum_cards = GetSumOfCards(id), # get sum of cards for each id using the function
points = goals - sum_cards) # get the points
# id goals cards sum_cards points
# 1 1 2 2 4 -2
# 2 2 3 2 0 3
# 3 3 4 2 5 -1
# 4 4 5 1 3 2
# 5 5 1 2 0 1
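The same per-id logic can also be written without Vectorize or dplyr. This is a sketch under the assumption that the marker ids are held in a plain vector called markers (a hypothetical name standing in for df2$id):

```r
df1 <- data.frame(id    = 1:5,
                  goals = c(2, 3, 4, 5, 1),
                  cards = c(2, 2, 2, 1, 2))
markers <- c(2, 5)  # hypothetical stand-in for df2$id

sum_cards <- vapply(df1$id, function(i) {
  if (i %in% markers) return(0)          # marker rows count no cards
  nxt <- min(markers[markers > i])       # next marker after this id
  sum(df1$cards[df1$id >= i & df1$id <= nxt])
}, numeric(1))

df1$points <- df1$goals - sum_cards
df1
```

Note that an id with no later marker would make min() return Inf with a warning, so the sum would then run to the end of the data; whether that is the desired behaviour is not stated in the question.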
Based on your updated question, applying a similar function to every row makes the process very slow. So, this solution groups data in a way that you can just count the cards on chunks of data/rows:
df1 = read.table(text = "
id goals cards group
1 2 2 1
2 3 2 1
3 4 2 1
4 5 1 1
5 1 2 1
1 2 2 2
2 3 2 2
3 4 2 2
4 5 1 3
5 1 2 3
", header=T)
df2 = read.table(text = "
id goals cards group
2 3 2 1
5 1 2 1
2 3 2 2
3 4 2 2
5 1 2 3
", header=T)
library(dplyr)
df1 %>%
arrange(group, desc(id)) %>% # order by group and id descending (this will help with counting the cards)
left_join(df2 %>% # join specific columns of df2 and add a flag to know that this row exists in df2
select(id, group) %>%
mutate(flag = 1), by=c("id","group")) %>%
mutate(flag = ifelse(is.na(flag), 0, flag), # replace NA with 0
flag2 = cumsum(flag)) %>% # this flag will create the groups we need to count cards
group_by(group, flag2) %>% # for each new group (we need both as the card counting will change when we have a row from df2, or if group changes)
mutate(sum_cards = ifelse(flag == 1, 0, cumsum(cards))) %>% # get cumulative sum of cards unless the flag = 1, where we need 0 cards
ungroup() %>% # forget the grouping
arrange(group, id) %>% # back to original order
mutate(points = goals - sum_cards) %>% # calculate points
select(-flag, -flag2) # remove flags
# # A tibble: 10 x 6
# id goals cards group sum_cards points
# <int> <int> <int> <int> <dbl> <dbl>
# 1 1 2 2 1 4 -2
# 2 2 3 2 1 0 3
# 3 3 4 2 1 5 -1
# 4 4 5 1 1 3 2
# 5 5 1 2 1 0 1
# 6 1 2 2 2 4 -2
# 7 2 3 2 2 0 3
# 8 3 4 2 2 0 4
# 9 4 5 1 3 3 2
# 10 5 1 2 3 0 1

recode values into one column

I have a dataframe with one value per row, potentially in one of several columns. How can I create a single column that contains the number of the column the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which doesn't seem very R-like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row will be zero, then max.col() will give us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
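max.col() breaks ties at random by default, which matters as soon as a row has no 1 at all (every column then ties at zero). A minimal sketch of one way to guard against that, assuming such rows should become NA (an assumption, since the question only shows rows with exactly one 1):

```r
df <- data.frame(a = c(1, 0, 0, 0, 0),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0))  # last row is all zeros (added for illustration)

# Deterministic tie-breaking, then blank out rows that contain no 1
col_idx <- max.col(df, ties.method = "first")
col_idx[rowSums(df) == 0] <- NA
col_idx
```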
We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
or, to get both the row and the column index,
which(df==1, arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3

R merging with a preference

Suppose you have a matrix that consists of two columns of only 1's and 2's.
A B
1 2
2 2
1 1
2 1
2 1
2 2
2 1
How would you merge these two columns into one so that 2 always overwrites 1?
Desired Output:
C
2
2
1
2
2
2
2
Assuming that the data is stored in a dataframe named df, you can use
df$C <- pmax(df$A, df$B)
to create a new column C with the desired result.
In the case of a matrix m you can use
m <- cbind(m, pmax(m[,1], m[,2]))
colnames(m) <- LETTERS[1:ncol(m)]
#> m
# A B C
#[1,] 1 2 2
#[2,] 2 2 2
#[3,] 1 1 1
#[4,] 2 1 2
#[5,] 2 1 2
#[6,] 2 2 2
#[7,] 2 1 2
#> class(m)
#[1] "matrix"
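If there are more than two columns to merge, pmax can be applied to all of them at once through do.call. A sketch, assuming the columns to combine are listed in a character vector cols (a name introduced here for illustration):

```r
df <- data.frame(A = c(1, 2, 1, 2, 2, 2, 2),
                 B = c(2, 2, 1, 1, 1, 2, 1))

cols <- c("A", "B")  # hypothetical: extend with further column names as needed

# Pass each selected column to pmax as a separate, unnamed argument
df$C <- do.call(pmax, unname(as.list(df[cols])))
df
```

This relies on the larger value always being the preferred one, as in the question, where 2 should overwrite 1.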
Without ifelse:
df$C1 <- apply(df[,c("A","B")],1,max)
With ifelse:
df$C2 <- with(df, ifelse(A==1&B==1,1,2))
Result
> df
A B C1 C2
1 1 2 2 2
2 2 2 2 2
3 1 1 1 1
4 2 1 2 2
5 2 1 2 2
6 2 2 2 2
7 2 1 2 2
