different print.gap value for specific column - r

Is there any way to have a different print.gap for a particular column?
Example data:
dd <- data.frame(col1 = 1:5, col2 = 1:5, col3 = I(letters[1:5]))
print(dd, quote = FALSE, right = TRUE, print.gap = 5)
Output with print.gap=5:
      col1     col2     col3
1        1        1        a
2        2        2        b
3        3        3        c
4        4        4        d
5        5        5        e
Desired output (a print.gap mix: the first two columns with print.gap=5, the third with print.gap=12):
      col1     col2            col3
1        1        1               a
2        2        2               b
3        3        3               c
4        4        4               d
5        5        5               e
I realise this may not be achievable with any change to the print statement, but perhaps someone has an alternative method or suggestion. The output is to be saved in a text file. Also please note, the solution should be flexible enough to not just increase the gap for the last column; it could be any column, or multiple columns with different print.gaps in a data frame.

There's probably a way to do this by defining a "proper" alternative print method, but here's a hackish solution that can be used to adjust each column width independently.
res <- rbind(
  data.frame(lapply(dd, as.character), stringsAsFactors = FALSE),
  strrep(" ", c(1, 7, 12))  # a padding row of 1, 7 and 12 spaces sets each column's minimum width
)
print(res, quote = FALSE, right = TRUE)
#   col1    col2         col3
# 1    1       1            a
# 2    2       2            b
# 3    3       3            c
# 4    4       4            d
# 5    5       5            e
# 6
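Since the output is destined for a text file, the printed lines can be captured and the trailing padding row dropped before writing (a sketch; the file name is made up):
lines <- capture.output(print(res, quote = FALSE, right = TRUE))
writeLines(lines[-length(lines)], "dd_formatted.txt")  # drop the blank padding row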

Count the amount of times value A occurs without value B and vice versa

I'm having trouble figuring out how to do the opposite of the answer to this question (and in R, not Python):
Count the amount of times value A occurs with value B
Basically I have a dataframe with a lot of combinations of pairs of columns like so:
df <- data.frame(id1 = c("1","1","1","1","2","2","2","3","3","4","4"),
                 id2 = c("2","2","3","4","1","3","4","1","4","2","1"))
I want to count how often each value in column A occurs anywhere in the whole dataframe without the corresponding value from column B. So the result for this small example would be the output of:
df_result <- data.frame(id1 = c("1","1","1","2","2","2","3","3","4","4"),
                        id2 = c("2","3","4","1","3","4","1","4","2","1"),
                        count = c("4","5","5","3","5","4","2","3","3","3"))
The important criterion is that the final results dataframe is collapsed by pair (so in my example rows 1 and 2 are duplicates; they are collapsed, and the count is the total frequency with which 1 is observed without 2). For tallying the occurrences, both columns must be examined, i.e. the order of the columns doesn't matter: a row with 1 in column A and 2 in column B counts the same as a row with 2 in column A and 1 in column B.
I can do this very slowly by filtering for each pair, but that isn't feasible for my real data, where I have many, many different pairs.
Any guidance is greatly appreciated.
First paste the two id columns together into id12 for later matching. Then use sapply to go through all rows and count the records where id1 appears in id12 but id2 doesn't. Keep only the distinct records and, finally, remove the id12 column.
library(dplyr)
df %>%
  mutate(id12 = paste0(id1, id2),
         count = sapply(1:nrow(.),
                        function(x) sum(grepl(id1[x], id12) & !grepl(id2[x], id12)))) %>%
  distinct() %>%
  select(-id12)
Or entirely in base R:
id12 <- paste0(df$id1, df$id2)
df$count <- sapply(1:nrow(df), function(x) sum(grepl(df$id1[x], id12) & !grepl(df$id2[x], id12)))
df <- df[!duplicated(df),]
Output
id1 id2 count
1 1 2 4
2 1 3 5
3 1 4 5
4 2 1 3
5 2 3 5
6 2 4 4
7 3 1 2
8 3 4 3
9 4 2 3
10 4 1 3
A full tidyverse version:
library(tidyverse)
df %>%
  mutate(id = paste(id1, id2),
         count = map_int(cur_group_rows(),
                         ~ sum(str_detect(id, id1[.x]) & str_detect(id, id2[.x], negate = TRUE))))
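One caveat about the pattern-matching versions above, not raised in the original answers: grepl()/str_detect() treat each id as a regular expression and match it as a substring of the pasted pairs, which works for the single-character ids here but would miscount ids such as "1" versus "11". A pattern-free sketch that compares the id columns directly:
# For each pair (id1[x], id2[x]), count rows where id1[x] appears in
# either column and id2[x] appears in neither -- exact comparisons only.
count_without <- function(df) {
  a <- as.character(df$id1)  # guard against factor columns
  b <- as.character(df$id2)
  sapply(seq_along(a), function(x)
    sum((a == a[x] | b == a[x]) & !(a == b[x] | b == b[x])))
}
df$count <- count_without(df)
df[!duplicated(df), ]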
A more efficient approach would be to work on a tabulation format:
tab = crossprod(table(rep(seq_len(nrow(df)), ncol(df)), c(df$id1, df$id2)))
#tab
#
# 1 2 3 4
# 1 7 3 2 2
# 2 3 6 1 2
# 3 2 1 4 1
# 4 2 2 1 5
So now we have the number of times each value appears with another (irrespective of their order in the two columns). From here, we need a way to subset the above table by each pair and subtract the value of their co-occurrence from the value of each id's total number of appearances.
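As a quick sanity check on tab, using the values printed above: the diagonal holds each id's total number of appearances, and the off-diagonal cells hold the pairwise co-occurrence counts.
tab["1", "1"]                  # 7: id 1 appears in 7 rows
tab["1", "2"]                  # 3: ids 1 and 2 appear together in 3 rows
tab["1", "1"] - tab["1", "2"]  # 4: rows containing 1 but not 2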
Make a grid of all combinations:
gr = expand.grid(id1 = colnames(tab), id2 = rownames(tab), stringsAsFactors = FALSE)
Create 2-column matrices to subset the table:
id1.ij = cbind(match(gr$id1, colnames(tab)),
match(gr$id1, rownames(tab)))
id2.ij = cbind(match(gr$id1, colnames(tab)),
match(gr$id2, rownames(tab)))
Subtract the respective values:
cbind(gr, count = tab[id1.ij] - tab[id2.ij])
# id1 id2 count
#1 1 1 0
#2 2 1 3
#3 3 1 2
#4 4 1 3
#5 1 2 4
#6 2 2 0
#7 3 2 3
#8 4 2 3
#9 1 3 5
#10 2 3 5
#11 3 3 0
#12 4 3 4
#13 1 4 5
#14 2 4 4
#15 3 4 3
#16 4 4 0
Of course, if we do not need the full grid of values, we can set:
gr = unique(df)
which results in:
# id1 id2 count
#1 1 2 4
#3 1 3 5
#4 1 4 5
#5 2 1 3
#6 2 3 5
#7 2 4 4
#8 3 1 2
#9 3 4 3
#10 4 2 3
#11 4 1 3

From axis values to coordinate pairs [duplicate]

I have two vectors of integers, say v1=c(1,2) and v2=c(3,4). I want to combine them and obtain this as a result (as a data.frame, or matrix):
> combine(v1,v2) <--- doesn't exist
1 3
1 4
2 3
2 4
This is a basic case. What about something a little more complicated: combining every row with every other row? E.g. imagine that we have two data.frames or matrices d1 and d2, and we want to combine them to obtain the following result:
d1
1 13
2 11
d2
3 12
4 10
> combine(d1,d2) <--- doesn't exist
1 13 3 12
1 13 4 10
2 11 3 12
2 11 4 10
How could I achieve this?
For the simple case of vectors there is expand.grid
v1 <- 1:2
v2 <- 3:4
expand.grid(v1, v2)
# Var1 Var2
#1 1 3
#2 2 3
#3 1 4
#4 2 4
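One small wrinkle: expand.grid varies its first argument fastest, so the rows come out in a different order from the question. If the question's ordering matters, sort the result:
g <- expand.grid(v1, v2)
g[order(g$Var1, g$Var2), ]
#   Var1 Var2
# 1    1    3
# 3    1    4
# 2    2    3
# 4    2    4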
I don't know of a function that will automatically do what you want for data frames (but see the edit below).
We could relatively easily accomplish this using expand.grid and cbind.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
expand.grid(df1, df2) # doesn't work so let's try something else
id <- expand.grid(seq(nrow(df1)), seq(nrow(df2)))
out <- cbind(df1[id[,1],], df2[id[,2],])
out
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#1.1 1 3 6 b
#2.1 2 4 6 b
Edit: As Joran points out in the comments, merge does this for us for data frames.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
merge(df1, df2)
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#3 1 3 6 b
#4 2 4 6 b
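Applied to the question's own d1 and d2 (a sketch; the column names x1, y1, x2, y2 are invented, since the question shows the data without headers), merge with no shared column names performs exactly this cross join, though the row order differs from the desired output:
d1 <- data.frame(x1 = c(1, 2), y1 = c(13, 11))
d2 <- data.frame(x2 = c(3, 4), y2 = c(12, 10))
merge(d1, d2)  # no common columns, so merge returns the Cartesian product
#   x1 y1 x2 y2
# 1  1 13  3 12
# 2  2 11  3 12
# 3  1 13  4 10
# 4  2 11  4 10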

cumulative product in R across columns

I have a dataframe in the following format
> x <- data.frame("a" = c(1,1),"b" = c(2,2),"c" = c(3,4))
> x
a b c
1 1 2 3
2 1 2 4
I'd like to add 3 new columns holding a cumulative product of the columns a, b, c; however, I need a reverse cumulative product, i.e. the output should be
row 1:
result_d = 1*2*3 = 6 , result_e = 2*3 = 6, result_f = 3
and similarly for row 2
The end result will be
a b c result_d result_e result_f
1 1 2 3 6 6 3
2 1 2 4 8 8 4
The column names do not matter; this is just an example. Does anyone have any idea how to do this?
As per my comment: is it possible to do this on a subset of columns? E.g. only for columns b and c, to return:
a b c results_e results_f
1 1 2 3 6 3
2 1 2 4 8 4
so that column "a" is effectively ignored?
One option is to loop through the rows, apply cumprod over the reversed elements, and then reverse the result:
nm1 <- paste0("result_", c("d", "e", "f"))
x[nm1] <- t(apply(x, 1, function(x) rev(cumprod(rev(x)))))
x
# a b c result_d result_e result_f
#1 1 2 3 6 6 3
#2 1 2 4 8 8 4
Or a vectorized option is rowCumprods from matrixStats:
library(matrixStats)
x[nm1] <- rowCumprods(as.matrix(x[ncol(x):1]))[,ncol(x):1]
Another base R option is Reduce with accumulate = TRUE, applied to the columns in reverse order:
temp = data.frame(Reduce("*", x[NCOL(x):1], accumulate = TRUE))
setNames(cbind(x, temp[NCOL(temp):1]),
         c(names(x), c("res_d", "res_e", "res_f")))
# a b c res_d res_e res_f
#1 1 2 3 6 6 3
#2 1 2 4 8 8 4
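As for the follow-up about a subset of columns: the same rev(cumprod(rev(...))) idea can be applied to just the chosen columns (a sketch reusing the apply() approach above; the result names are arbitrary):
x <- data.frame(a = c(1, 1), b = c(2, 2), c = c(3, 4))
# reverse cumulative product over columns b and c only; column a is ignored
x[c("results_e", "results_f")] <- t(apply(x[c("b", "c")], 1,
                                          function(r) rev(cumprod(rev(r)))))
x
#   a b c results_e results_f
# 1 1 2 3         6         3
# 2 1 2 4         8         4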

using a lookup table in R with varying counts of data

Hi, I've been working on this issue all weekend. I'm trying to do a simple lookup, but my lookup table has different counts of data per lookup key.
Let's say I have two tables:
Table1: (sample of 3 rows; there are some extra columns of data, but they're irrelevant to my problem)
col1 (GeneName)   col2
HGGR              .554444
BRAC4             .333222
FAM34             .111222
My lookup table is a table of gene groups followed by their respective genes. The lookup table can have a varying number of columns depending on how many genes are in the group. This is a small example; the table often has 20-30 genes per group...
Table2: (example of 2 rows; col1 is the GeneGroupName)
col1               col2    col3
CHR1_45000_46000   HGGR    BRAC4
CHR1_67000_70000   FAM34
What I want is another column in Table1 which shows the corresponding gene group!
FinalResultTable
col1               col2    col3
CHR1_45000_46000   HGGR    .554444
CHR1_45000_46000   BRAC4   .333222
CHR1_67000_70000   FAM34   .111222
The code I have so far is:
finalresult<-cbind( gene_group[match(table1[,1], gene_group[,2]),1], table1)
but of course that only works for genes found in the 2nd column of the gene group table! I need it to search through the whole table and return the row number....
Any help? Thanks in advance
David
One way to do it is to convert your Table 2 to long format, with a column for GeneGroupName and a single column for the member genes, and then use match.
(table1 <- data.frame(GeneName=sample(LETTERS[1:12]), col2=runif(12)))
# GeneName col2
# 1 F 0.6116285
# 2 L 0.5752088
# 3 J 0.7499011
# 4 D 0.9405068
# 5 A 0.9360968
# 6 K 0.6549850
# 7 I 0.7070163
# 8 E 0.3521952
# 9 C 0.4234293
# 10 G 0.7750203
# 11 B 0.1418680
# 12 H 0.6632382
(table2 <- data.frame(GeneGroupName=1:4, g1=LETTERS[1:4], g2=LETTERS[5:8],
g3=LETTERS[9:12]))
# GeneGroupName g1 g2 g3
# 1 1 A E I
# 2 2 B F J
# 3 3 C G K
# 4 4 D H L
(table2.long <- reshape(table2, direction='long', varying=list(-1), timevar='gene'))
# GeneGroupName gene g1 id
# 1.1 1 1 A 1
# 2.1 2 1 B 2
# 3.1 3 1 C 3
# 4.1 4 1 D 4
# 1.2 1 2 E 1
# 2.2 2 2 F 2
# 3.2 3 2 G 3
# 4.2 4 2 H 4
# 1.3 1 3 I 1
# 2.3 2 3 J 2
# 3.3 3 3 K 3
# 4.3 4 3 L 4
table1$GeneGroupName <- table2.long$GeneGroupName[match(table1$GeneName,
                                                        table2.long$g1)]
table1
# GeneName col2 GeneGroupName
# 1 F 0.6116285 2
# 2 L 0.5752088 4
# 3 J 0.7499011 2
# 4 D 0.9405068 4
# 5 A 0.9360968 1
# 6 K 0.6549850 3
# 7 I 0.7070163 1
# 8 E 0.3521952 1
# 9 C 0.4234293 3
# 10 G 0.7750203 3
# 11 B 0.1418680 2
# 12 H 0.6632382 4
One solution could be to use the data.table package.
Reproducing a minimal example:
library(data.table)
table1 = data.table(col1=c("HGGR","BRAC4","FAM34"), col2=c(.55,.33,.11))
table2 = data.table(col2=c("HGGR","FAM34"), col1=c("CHR1_45000_46000", "CHR1_67000_70000"), col3=c("BRAC4",NA))
# > table1
# col1 col2
# 1: BRAC4 0.33
# 2: FAM34 0.11
# 3: HGGR 0.55
# > table2
# col2 col1 col3
# 1: HGGR CHR1_45000_46000 BRAC4
# 2: FAM34 CHR1_67000_70000 NA
First reshape the second data.table with melt, merging col2 and col3 into a single gene column:
table2=melt(table2, id=c("col1"), value.name="col2", na.rm=TRUE)
table2[,variable:=NULL]
Then merge the two data.tables to get the wanted result:
setkey(table1, col1)
setkey(table2, col2)
table2[table1]
# col2 col1 col2.1
# BRAC4 CHR1_45000_46000 0.33
# FAM34 CHR1_67000_70000 0.11
# HGGR CHR1_45000_46000 0.55
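If the joined result should mirror the asker's FinalResultTable layout, the columns can be renamed and reordered afterwards (a sketch; res is just a name for the join result, the new column names are invented, and the column order is assumed to match the printout above):
res <- table2[table1]  # the join shown above: gene, group, value
setnames(res, c("Gene", "GeneGroupName", "Value"))
setcolorder(res, c("GeneGroupName", "Gene", "Value"))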
Modifying @jbaums's sample data a bit (adding an NA in table2), here is one way with dplyr and tidyr.
table1 <- data.frame(GeneName=sample(LETTERS[1:12]), col2=runif(12),
stringsAsFactors = FALSE)
table2 <- data.frame(GeneGroupName=1:4, g1=LETTERS[1:4], g2=LETTERS[5:8],
g3=c(LETTERS[9:11], NA), stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)
table2 %>%
  gather(gene, whatever, -GeneGroupName) %>%
  left_join(table1, by = c("whatever" = "GeneName")) %>%
  select(-gene, GeneGroupName, gene = whatever, value = col2)
# GeneGroupName gene value
#1 1 A 0.9926841
#2 2 B 0.3531973
#3 3 C 0.6547239
#4 4 D 0.4781180
#5 1 E 0.1293723
#6 2 F 0.6334933
#7 3 G 0.2132081
#8 4 H 0.5987610
#9 1 I 0.7317925
#10 2 J 0.9761707
#11 3 K 0.9240745
#12 4 <NA> NA
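gather() has since been superseded in tidyr; a hedged equivalent with pivot_longer() (assuming the same table1 and table2 as above) would be:
library(dplyr)
library(tidyr)
table2 %>%
  pivot_longer(-GeneGroupName, names_to = "col", values_to = "GeneName",
               values_drop_na = TRUE) %>%
  left_join(table1, by = "GeneName") %>%
  select(GeneGroupName, gene = GeneName, value = col2)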

ifelse: Assigning a condition from 1 column to another and multiple statements

I've got a data.frame. I am trying to use the values in columns 2, 3, and 4 to assign a value in col1. Is this possible?
dat<-data.frame(col1=c(1,2,3,4,5), col2=c(1,2,3,4,"U"), col3=c(1,2,3,"U",5), col4=c("U",2,3,4,5))
dat1=data.frame(col1=ifelse(dat$col2=="U"|dat$col3=="U"|dat$col4=="U", dat$col1=="U", dat$col1))
  col1
1    0
2    2
3    3
4    0
5    0
Why am I getting a 0 where a U should be?
Don't put the comparison dat$col1=="U" in the result slot of ifelse; return the literal "U" instead.
dat1 = data.frame(col1 = ifelse(dat$col2=="U" | dat$col3=="U" | dat$col4=="U",
                                "U",
                                dat$col1))
dat1
col1
1 U
2 2
3 3
4 U
5 U
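Note that mixing "U" with numbers forces the whole result to a common type, so col1 comes back as character (in recent R versions; older versions would make it a factor via data.frame), which may matter downstream:
str(dat1$col1)
# chr [1:5] "U" "2" "3" "U" "U"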
You probably want to be using this:
dat1 <- data.frame(col1=ifelse(dat$col2=="U"|dat$col3=="U"|dat$col4=="U", "U", dat$col1))
# I changed the dat$col1=="U" to just "U"
If the question is "Why am I getting a 0 where a U should be?", the answer lies in what you have assigned for the if-TRUE portion of your ifelse(.) statement.
Your ifelse statement essentially says
if any of columns 2 through 4 are U
then assign the value of `does column 1 == "U"` <-- Not sure if this is what you want
else assign the value of column 1
So when your ifelse test evaluates to TRUE, what gets returned is the value of col1=="U", coerced into an integer, i.e. 0 for FALSE, 1 for TRUE.
You can also take advantage of T/F getting evaluated to 1/0 to clean up your code:
# using the fact that rowSums(dat[2:4]=="U") will be 0 when "U" is not in any column:
ifelse(rowSums(dat[2:4]=="U")>0, "U", dat$col1)
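To see why this works with the example data, inspect the intermediate pieces: the comparison yields a logical matrix, and rowSums() counts the "U" hits per row.
dat[2:4] == "U"            # logical matrix: TRUE wherever a cell is "U"
rowSums(dat[2:4] == "U")   # per-row count of "U"s
# [1] 1 0 0 1 1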
any() makes things like this a lot neater
head(dat)
col1 col2 col3 col4
1 1 1 1 U
2 2 2 2 2
3 3 3 3 3
4 4 4 U 4
5 5 U 5 5
apply(dat, 1, function(x) any(x == 'U'))
[1] TRUE FALSE FALSE TRUE TRUE
dat[apply(dat, 1, function(x) any(x == 'U')), 1] <- 'U'
dat
col1 col2 col3 col4
1 U 1 1 U
2 2 2 2 2
3 3 3 3 3
4 U 4 U 4
5 U U 5 5
An easy way would be:
dat$col1[as.logical(rowSums(dat[-1]=="U"))] <- "U"
col1 col2 col3 col4
1 U 1 1 U
2 2 2 2 2
3 3 3 3 3
4 U 4 U 4
5 U U 5 5
