melting matrices with logical values [duplicate] - r

This question already has answers here:
R matrix to rownames colnames values
(2 answers)
Closed 3 years ago.
I have a matrix with pairwise comparisons, of which the upper triangle and diagonal were set to NA.
df <- data.frame(a=c(NA,1,2), b=c(NA,NA,3), c=c(NA,NA,NA))
row.names(df) <- names(df)
I want to transform the matrix to long format, for which the standard procedure is to use reshape2's melt, followed by na.omit, so my desired output would be:
Var1 Var2 Value
a b 1
a c 2
b c 3
However, df$c is all NA and thus logical, and will be used as a non-measured variable by melt.
The output of melt(df) is therefore not what I am looking for.
library(reshape2)
melt(df)
How can I prevent melt from using df$c as an id variable?

The trick is to convert the row names to a column and then reshape to long format. A way to do it in the tidyverse would be:
library(tidyverse)
df %>%
  rownames_to_column() %>%
  gather(var, val, -1) %>%
  filter(!is.na(val))
# rowname var val
#1 b a 1
#2 c a 2
#3 c b 3
As @Humpelstielzche mentions in the comments, gather has an na.rm argument, so we can omit the final filtering step, i.e.
df %>%
  rownames_to_column() %>%
  gather(var, val, -1, na.rm = TRUE)
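Note that gather() has since been superseded by pivot_longer(); a roughly equivalent sketch (assuming tidyr >= 1.0.0 and the same df with row names as above) would be:
library(tidyverse)
df %>%
  rownames_to_column("Var1") %>%
  pivot_longer(-Var1, names_to = "Var2", values_to = "value",
               values_drop_na = TRUE)
The all-NA logical column c is combined with the numeric columns here, so the resulting value column comes out numeric without any extra conversion.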

While you have other answers already, this can be achieved with reshape2 and melt, if the appropriate function is called. In this case you don't want reshape2:::melt.data.frame but rather reshape2:::melt.matrix to be applied. So, try:
melt(as.matrix(df), na.rm=TRUE)
# Var1 Var2 value
#2 b a 1
#3 c a 2
#6 c b 3
If you then take a look at ?reshape2:::melt.matrix you will see the statement:
This code is conceptually similar to ‘as.data.frame.table’
which means you could also use the somewhat more convoluted:
na.omit(as.data.frame.table(as.matrix(df), responseName="value"))
# Var1 Var2 value
#2 b a 1
#3 c a 2
#6 c b 3

In base R, we can use row and col to get the row names and column names respectively and then filter out the NA values.
df1 <- data.frame(col = colnames(df)[col(df)], row = rownames(df)[row(df)],
                  value = unlist(df), row.names = NULL)
df1[!is.na(df1$value), ]
# col row value
#2 a b 1
#3 a c 2
#6 b c 3
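Another base R variation (my own sketch, not part of the answer above) is to use which() with arr.ind = TRUE on the matrix form and index the row and column names directly:
m <- as.matrix(df)
idx <- which(!is.na(m), arr.ind = TRUE)  # row/column positions of the non-NA cells
data.frame(Var1 = rownames(m)[idx[, "row"]],
           Var2 = colnames(m)[idx[, "col"]],
           value = m[idx])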

Related

R creating combinations with replacement

I have a small example like the following:
df1 = data.frame(Id1=c(1,2,3))
I want to obtain the list of all combinations with replacement which would look like this:
So far I have seen the following functions which produces some parts of the above table:
a) combn function
t(combn(df1$Id1,2))
# Does not create rows 1, 4 and 5 in the above image
b) expand.grid function
expand.grid(df1$Id1,df1$Id1)
# Duplicates rows 2, 3 and 5. In my case the combinations 1,2 and 2,1
# are the same, so I do not need both of them.
c) CJ function (from data.table)
#install.packages("data.table")
CJ(df1$Id1,df1$Id1)
#Same problem as the previous function
For reference, I know that in Python I could do the same using the itertools module (link here: https://www.hackerrank.com/challenges/itertools-combinations-with-replacement/problem)
Is there a way to do this in R?
Here's an alternative using expand.grid: create a unique key for every combination and then remove the duplicates.
library(dplyr)
expand.grid(df1$Id1, df1$Id1) %>%
  mutate(key = paste(pmin(Var1, Var2), pmax(Var1, Var2), sep = "-")) %>%
  filter(!duplicated(key)) %>%
  select(-key) %>%
  mutate(row = row_number())
# Var1 Var2 row
#1 1 1 1
#2 2 1 2
#3 3 1 3
#4 2 2 4
#5 3 2 5
#6 3 3 6
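A dependency-free sketch of the same idea, assuming the ids can be ordered: build the full grid and keep only one ordering of each pair (here Var1 >= Var2, which reproduces the row order shown above):
g <- expand.grid(Var1 = df1$Id1, Var2 = df1$Id1)
subset(g, Var1 >= Var2)  # drops the mirrored duplicates such as 1,2 vs 2,1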

Operations on single row in dplyr [duplicate]

This question already has answers here:
dplyr mutate/replace several columns on a subset of rows
(12 answers)
Closed 3 years ago.
Is it possible to perform dplyr operations with pipes on single rows of a dataframe? For example, say I have the following dataframe (call it df) and want to do some manipulations to its columns:
df <- df %>%
  mutate(col1 = col1 + col2)
This code sets one column equal to the sum of that column and another. What if I want to do this, but only for a single row?
df[1,] <- df[1,] %>%
  mutate(col1 = col1 + col2)
I realize this is an easy operation in base R, but I am super curious and would love to use dplyr operations and piping to make this happen. Is this possible or does it go against dplyr grammar?
Here's an example. Say I have a dataframe:
df = data.frame(a = rep(1, 100), b = rep(1,100))
The first example I showed:
df <- df %>%
  mutate(a = a + b)
Would result in column a being 2 for all rows.
The second example would only result in the first row of column a being 2.
mutate() is for creating or modifying whole columns.
For a single cell you can do something like df[1,1] <- df[1,1] + df[1,2].
An example: you can use mutate() with case_when() for conditional manipulation.
df %>%
  mutate(a = case_when(row_number(a) == 1 ~ a + b,
                       TRUE ~ a))
results in
# A tibble: 100 x 2
       a     b
   <dbl> <dbl>
 1     2     1
 2     1     1
 3     1     1
 4     1     1
 5     1     1
 6     1     1
 7     1     1
 8     1     1
 9     1     1
10     1     1
# … with 90 more rows
Data
library(tidyverse)
df <- tibble(a = rep(1, 100), b = rep(1,100))
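For a single fixed row, if_else() with row_number() is a slightly more direct sketch of the same idea (my own variant, not part of the answer above):
library(dplyr)
df <- tibble(a = rep(1, 100), b = rep(1, 100))
df %>%
  mutate(a = if_else(row_number() == 1, a + b, a))  # only row 1 gets a + b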

dplyr Update a cell in a data.frame

df <-data.frame(x=c(1:5),y=c(letters[1:5]))
Let's say I want to modify the last row,
update.row<-filter(df,x==5) %>% mutate(y="R")
How do I update this row in the data.frame?
The only way I found, albeit a strange one, is to do an anti-join and append the results:
df <- anti_join(df, update.row, by="x") %>%
  bind_rows(update.row)
However, it seems like a very inelegant way to achieve a simple task.
Any ideas are much appreciated...
With data.table, we can assign (:=) the value to the rows where i is TRUE. It is very efficient as the assignment is done in place.
library(data.table)
setDT(df)[x==5, y:="R"]
df
# x y
#1: 1 a
#2: 2 b
#3: 3 c
#4: 4 d
#5: 5 R
As the OP mentioned about the last row, a more general way is
setDT(df)[.N, y:= "R"]
Or, as @thelatemail mentioned, if we want to replace a particular row, we just specify its row index in i, i.e. 5 in this case.
setDT(df)[5, y:="R"]
If you insist on dplyr, perhaps:
df <-data.frame(x=c(1:5),y=c(letters[1:5]))
library(dplyr)
df %>%
  mutate(y = as.character(y)) %>%
  mutate(y = ifelse(row_number()==n(), "R", y))
# x y
#1 1 a
#2 2 b
#3 3 c
#4 4 d
#5 5 R
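If you have dplyr 1.0.0 or later, rows_update() is arguably the most natural dplyr verb for this kind of keyed replacement; a sketch, assuming y is stored as character (stringsAsFactors = FALSE) and that x uniquely identifies each row:
library(dplyr)  # 1.0.0 or later
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)
df %>%
  rows_update(data.frame(x = 5, y = "R"), by = "x")  # replace y where x == 5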

How to repeat empty rows so that each split has the same number

My goal is to get the same number of rows for each split (based on the Initials column). I am basically trying to pad the groups with rows so that each person has the same number, while retaining the Initials column so I can tell them apart. My attempt failed completely. Does anybody have suggestions?
df<-data.frame(Initials=c("a","a","b"),data=c(2,3,4))
attach(df)
maxrows <- max(table(Initials)) + 1
arr <- split(df, Initials)
lapply(arr, function(x){
  toadd <- maxrows - dim(x)[1]
  replicate(toadd, x <- rbind(x, rep(NA, 1))) # colnames -1 because col 1 should be the same Initial
})
Goal:
a 2
a 3
b 4
b NA
Using data.table...
my_rows <- seq.int(max(tabulate(df$Initials)))
library(data.table)
setDT(df)[ , .SD[my_rows], by=Initials]
# Initials data
# 1: a 2
# 2: a 3
# 3: b 4
# 4: b NA
.SD is the Subset of Data associated with each by= group. We can subset its rows like .SD[row_numbers], unlike a data.frame which requires an additional comma DF[row_numbers,].
The analogue in dplyr is
my_rows <- seq.int(max(tabulate(df$Initials)))
library(dplyr)
setDT(df) %>% group_by(Initials) %>% slice(my_rows)
# Initials data
# (fctr) (dbl)
# 1 a 2
# 2 a 3
# 3 b 4
# 4 b NA
Strangely, this only works if df is a data.table. I've filed a report/query with dplyr. There's a good chance that the dplyr devs will prevent this usage in a future version.
Here's a dplyr/tidyr method. We group_by Initials, add row numbers, ungroup, complete the Initials/row-number combinations, then remove the row numbers:
library(dplyr)
library(tidyr)
df %>% group_by(Initials) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  complete(Initials, row) %>%
  select(-row)
Source: local data frame [4 x 2]

  Initials  data
    (fctr) (dbl)
1        a     2
2        a     3
3        b     4
4        b    NA
Interesting problem. Try:
to.add <- max(table(df$Initials)) - table(df$Initials)
rbind(df, c(rep(names(to.add), to.add), rep(NA, ncol(df)-1)))
# Initials data
#1 a 2
#2 a 3
#3 b 4
#4 b <NA>
We calculate the number of extra rows needed for each initial, combine those extras with NA values, then rbind them to the data frame.
max(table(df$Initials)) finds the highest count among the initials, in this case 2 (for a). Subtracting each initial's count, table(df$Initials), from that maximum gives a vector of the additions needed. There is an added bonus to this method: by using table we automatically get a named vector.
We use the names of that vector to know 1) which initials to repeat, and 2) how many times to repeat them.
To restore the numeric class of the data column, assign the rbind() result (say, newdf) and then run newdf$data <- as.numeric(newdf$data).
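For completeness, a base R sketch along the lines of the original split()/lapply() attempt (my own reworking, assuming the example df from the question):
maxrows <- max(table(df$Initials))   # largest group size
padded <- lapply(split(df, df$Initials), function(x) {
  n_missing <- maxrows - nrow(x)
  if (n_missing > 0) {
    # pad the group with NA data rows, keeping its Initials value
    x <- rbind(x, data.frame(Initials = x$Initials[1],
                             data = rep(NA, n_missing)))
  }
  x
})
do.call(rbind, c(padded, make.row.names = FALSE))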

splitting text in column and add row number [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I would like to split some text in a data frame column and save it into a data frame together with the row number or an id column.
I normally used plyr to do this, but that approach no longer works with dplyr.
If I understand it correctly, my plyr code only works because of a bug in plyr.
So I am looking for the correct way to do this.
This is a minimal example in plyr:
library(plyr)
set.seed(1)
df <- data.frame(a=seq(2),
                 b=c(paste(sample(letters,3), collapse=';'),
                     paste(sample(letters,3), collapse=';')),
                 stringsAsFactors=FALSE)
ddply(df,.(a),summarise,unlist(strsplit(b,';')))
It turns the original data frame:
a b
1 1 g;j;n
2 2 x;f;v
Into this:
a ..1
1 1 g
2 1 j
3 1 n
4 2 x
5 2 f
6 2 v
What would be the correct dplyr solution?
I'm biased in favor of cSplit from the "splitstackshape" package, but you might be interested in unnest from "tidyr" in conjunction with "dplyr":
library(dplyr)
library(tidyr)
df %>%
  mutate(b = strsplit(b, ";")) %>%
  unnest(b)
# a b
# 1 1 g
# 2 1 j
# 3 1 n
# 4 2 x
# 5 2 f
# 6 2 v
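If you have a reasonably recent tidyr, separate_rows() does the split-and-lengthen in a single call; a minimal sketch on the same df:
library(tidyr)
separate_rows(df, b, sep = ";")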
You could do this using cSplit from splitstackshape
library(splitstackshape)
cSplit(df, 'b', ';', 'long')
# a b
#1: 1 g
#2: 1 j
#3: 1 n
#4: 2 x
#5: 2 f
#6: 2 v
Or using dplyr/tidyr
library(dplyr)
library(tidyr)
separate(df, b, c('b1', 'b2', 'b3'), sep=";") %>%
  gather(Var, b, -a) %>%
  select(-Var) %>%
  arrange(a)
Or another option would be to use do
df %>%
  group_by(a) %>%
  do(data.frame(b=unlist(strsplit(.$b, ';'))))
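A data.table sketch of the same split-by-group idea (my own addition, assuming the df from the question):
library(data.table)
setDT(df)[, .(b = unlist(strsplit(b, ";"))), by = a]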
