I want to bind more than two tables together with rbind() in R.
Based on my experience with R problems, I am sure the solution is easy, but I don't get it. Please see this example data:
# create sample data
set.seed(0)
df <- data.frame(A = 0,
B1 = sample(c(1:3, NA), 10, replace=TRUE),
B2 = sample(c(1:3, NA), 10, replace=TRUE),
B3 = sample(c(1:3, NA), 10, replace=TRUE),
C = 0)
# names of relevant objects
n <- names(df)[startsWith(names(df), 'B')]
As you can see (in n), I only want to use a selection of the columns of a data.frame.
Now I create tables from them and bind their rows together for better presentation.
t1 <- table(df$B1, useNA="always")
t2 <- table(df$B2, useNA="always")
t3 <- table(df$B3, useNA="always")
# this is a workaround
print( rbind(t1, t2, t3) )
But I would like to simplify this code, because my real data has many more than three tables.
This doesn't work:
# this is what I "want" but doesn't work
print( rbind( table(df[,n])) )
# another try
do.call('rbind', list(table(df[,n])))
Where is the error in my thinking?
We can lapply over the selected columns, call table on each of them individually, and join the results with rbind:
do.call("rbind", lapply(df[n], function(x) table(x, useNA = "always")))
# 1 2 3 <NA>
#B1 1 2 3 4
#B2 3 3 2 2
#B3 3 3 1 3
This can also be done using apply with margin = 2 (column-wise)
t(apply(df[n], 2, function(x) table(x, useNA = "always")))
# 1 2 3 <NA>
#B1 1 2 3 4
#B2 3 3 2 2
#B3 3 3 1 3
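On the question of where the thinking goes wrong: table() applied to a data frame with several columns cross-classifies them into one multi-way array, so there is only a single object and nothing for rbind() to stack. The sketch below illustrates this with the sample data from the question. Pinning the levels through factor() is an added assumption (values limited to 1:3); it keeps every per-column table the same length even if a value happens to be missing from a column.

```r
set.seed(0)
df <- data.frame(A = 0,
                 B1 = sample(c(1:3, NA), 10, replace = TRUE),
                 B2 = sample(c(1:3, NA), 10, replace = TRUE),
                 B3 = sample(c(1:3, NA), 10, replace = TRUE),
                 C = 0)
n <- names(df)[startsWith(names(df), "B")]

# one cross-tabulation of B1 x B2 x B3, not three one-way tables
cross <- table(df[, n], useNA = "always")
length(dim(cross))  # 3 dimensions, one per column

# one table per column; fixed levels (assumed to be 1:3) keep the lengths equal
counts <- t(sapply(df[n], function(x) {
  table(factor(x, levels = 1:3), useNA = "always")
}))
counts
```

Each row of counts sums to the number of rows in df, which is a quick sanity check on the result.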
You can do:
table(stack(df[n])[2:1],useNA = 'always')[-4,]
values
ind 1 2 3 <NA>
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
If you would rather not reverse the columns with [2:1], you can transpose instead:
t(table(stack(df[n]),useNA = 'always'))[-4,]
values
ind 1 2 3 <NA>
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
If you want it as a data.frame:
as.data.frame.matrix(table(stack(df[n])[2:1],useNA = 'always')[-4,])
1 2 3 NA
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
Related
I am facing another problem while coding in RStudio. I have two data frames (with the same number of rows and columns). I want to merge the two into one, so that the 6 columns of data frame 1 become columns 1, 3, 5, 7, 9, 11 of the new data frame, while those of data frame 2 become columns 2, 4, 6, 8, 10, 12.
I can do it with a for loop, but is there a smarter way or function to do it? Thank you in advance ; )
You can cbind them and then reorder the columns accordingly:
df1 <- as.data.frame(sapply(LETTERS[1:6], function(x) 1:3))
df2 <- as.data.frame(sapply(letters[1:6], function(x) 1:3))
cbind(df1, df2)[, matrix(seq_len(2*ncol(df1)), 2, byrow=TRUE)]
# A a B b C c D d E e F f
# 1 1 1 1 1 1 1 1 1 1 1 1 1
# 2 2 2 2 2 2 2 2 2 2 2 2 2
# 3 3 3 3 3 3 3 3 3 3 3 3 3
The code below will produce your required result, and will also work if one data frame has more columns than the other
# create sample data
df1 <- data.frame(
a1 = 1:10,
a2 = 2:11,
a3 = 3:12,
a4 = 4:13,
a5 = 5:14,
a6 = 6:15
)
df2 <- data.frame(
b1=11:20,
b2=12:21,
b3=13:22,
b4=14:23,
b5=15:24,
b6=16:25
)
# join by interleaving columns
want <- cbind(df1,df2)[,order(c(1:length(df1),1:length(df2)))]
Explanation:
cbind(df1,df2) combines the data frames with all the df1 columns first, then all the df2 columns.
The [,...] element re-orders these columns.
c(1:length(df1),1:length(df2)) gives 1 2 3 4 5 6 1 2 3 4 5 6 - i.e. the order of the columns in df1, followed by the order in df2
order() of this gives 1 7 2 8 3 9 4 10 5 11 6 12 which is the required column order
So [, order(c(1:length(df1), 1:length(df2)))] re-orders the columns so that the columns of the original data frames are interleaved as required.
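To illustrate the claim that this also works when one data frame has more columns than the other, here is a minimal sketch. The helper name interleave_cols and the small data frames are hypothetical, introduced only for this example; seq_along() is used in place of 1:length() so that a zero-column input does not produce the sequence 1:0.

```r
# hypothetical helper: interleave the columns of two data frames
interleave_cols <- function(x, y) {
  # positions 1, 2, ... of x followed by 1, 2, ... of y; order() pairs them up
  idx <- order(c(seq_along(x), seq_along(y)))
  cbind(x, y)[, idx, drop = FALSE]
}

df1 <- data.frame(a1 = 1:3, a2 = 4:6, a3 = 7:9)
df2 <- data.frame(b1 = 11:13, b2 = 14:16)

names(interleave_cols(df1, df2))
# "a1" "b1" "a2" "b2" "a3"
```

The surplus column of the wider data frame simply ends up at the end.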
mydata <- data.frame(a = 2, b = 3, c = 3)
myvec <- c(2, 9, 1)
I would like to column bind mydata with myvec. I want the final output to look something like this:
> mydata
a b c myvec1 myvec2 myvec3
1 2 3 3 2 9 1
However, if I simply use cbind, I don't get the desired result:
> cbind(mydata, myvec)
a b c myvec
1 2 3 3 2
2 2 3 3 9
3 2 3 3 1
One way is to iterate over the entries in myvec with a for loop. Is there a simpler way?
We can convert to list
cbind(mydata, setNames(as.list(myvec), paste0('myvec', seq_along(myvec))))
# a b c myvec1 myvec2 myvec3
#1 2 3 3 2 9 1
Or another option is
mydata[paste0('myvec', seq_along(myvec))] <- myvec
You could transpose the vector :
cbind(mydata, t(myvec))
# a b c 1 2 3
#1 2 3 3 2 9 1
You can name the columns with setNames or names<-
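A short sketch of that renaming step, assuming the column names myvec1, myvec2, myvec3 from the question are wanted; the intermediate object extra is only for illustration.

```r
mydata <- data.frame(a = 2, b = 3, c = 3)
myvec <- c(2, 9, 1)

# t(myvec) is a 1 x 3 matrix; turn it into a one-row data frame and name it
extra <- setNames(as.data.frame(t(myvec)),
                  paste0("myvec", seq_along(myvec)))
out <- cbind(mydata, extra)
names(out)
# "a" "b" "c" "myvec1" "myvec2" "myvec3"
```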
My data is like:
a <- data.frame(a1=c(2,2,1,1,2,2,3,3),
a2=c(5,4,2,2,5,5,6,6),
a3=c(3,1,5,5,7,7,8,8))
Then I sort the data like this:
library(dplyr)
aa <- a %>%
arrange(desc(a3),desc(a2),desc(a1))
The data looks like:
> aa
a1 a2 a3
1 3 6 8
2 3 6 8
3 2 5 7
4 2 5 7
5 1 2 5
6 1 2 5
7 2 5 3
8 2 4 1
Now I need to group the data by a3, a2 and a1. In aa, rows 1 and 2 will be in one group, and rows 3 and 4 will be in another. I then need to give every group an index, starting from 1, so the data should look like this:
> aa
a1 a2 a3 Index
1 3 6 8 1
2 3 6 8 1
3 2 5 7 2
4 2 5 7 2
5 1 2 5 3
6 1 2 5 3
7 2 5 3 4
8 2 4 1 5
In summary, I need to arrange the data in descending order first, then group it, then give every group an index starting from 1. Could anyone help me out here?
We could potentially use group_indices, but that would also have a reordering issue. Instead, an option is to paste (or str_c - from stringr) on the columns of interest and then match with unique values of pasted string
library(dplyr)
library(stringr)
aa %>%
mutate(Index = str_c(a1, a2, a3, sep = "_"),
Index = match(Index, unique(Index)))
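The paste-and-match idea does not depend on dplyr. A base R sketch, assuming the sort order from the question (a3, then a2, then a1, all descending) and an underscore as the separator, which is my own choice: without a separator, different rows can paste to the same string.

```r
a <- data.frame(a1 = c(2, 2, 1, 1, 2, 2, 3, 3),
                a2 = c(5, 4, 2, 2, 5, 5, 6, 6),
                a3 = c(3, 1, 5, 5, 7, 7, 8, 8))

# sort descending by a3, then a2, then a1
aa <- a[order(-a$a3, -a$a2, -a$a1), ]

# one key per row; match() against the unique keys numbers the groups
# in order of first appearance
key <- paste(aa$a1, aa$a2, aa$a3, sep = "_")
aa$Index <- match(key, unique(key))
aa$Index
# 1 1 2 2 3 3 4 5

# why the separator matters: these two different pairs give the same key
paste0(1, 23) == paste0(12, 3)
# TRUE
```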
Or instead of arranging separately, use it with across
library(tidyr)
a %>%
# a3:a1 selects the columns in reverse, matching desc(a3), desc(a2), desc(a1)
arrange(across(a3:a1, desc)) %>%
unite(Index, everything(), remove = FALSE) %>%
mutate(Index = match(Index, unique(Index)))
Or with .GRP in data.table
library(data.table)
setDT(aa)[, Index := .GRP, .(a1, a2, a3)]
aa
# a1 a2 a3 Index
#1: 3 6 8 1
#2: 3 6 8 1
#3: 2 5 7 2
#4: 2 5 7 2
#5: 1 2 5 3
#6: 1 2 5 3
#7: 2 5 3 4
#8: 2 4 1 5
Base R:
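# sort by a3, then a2, then a1, all descending (as in the question)
a_ordered <- with(a, a[order(-a3, -a2, -a1), ])
# start a new group whenever the pasted key differs from the previous row
a_ordered$idx <- with(a_ordered,
  cumsum(c(
    1,
    diff(as.integer(factor(paste(
      a1, a2, a3, sep = "_"
    )))) != 0
  )))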
Data:
a <- data.frame(
a1 = c(2, 2, 1, 1, 2, 2, 3, 3),
a2 = c(5, 4, 2, 2, 5, 5, 6, 6),
a3 = c(3, 1, 5, 5, 7, 7, 8, 8)
)
I have two vectors of integers, say v1=c(1,2) and v2=c(3,4), I want to combine and obtain this as a result (as a data.frame, or matrix):
> combine(v1,v2) <--- doesn't exist
1 3
1 4
2 3
2 4
This is a basic case. What about a little bit more complicated - combine every row with every other row? E.g. imagine that we have two data.frames or matrices d1, and d2, and we want to combine them to obtain the following result:
d1
1 13
2 11
d2
3 12
4 10
> combine(d1,d2) <--- doesn't exist
1 13 3 12
1 13 4 10
2 11 3 12
2 11 4 10
How could I achieve this?
For the simple case of vectors there is expand.grid
v1 <- 1:2
v2 <- 3:4
expand.grid(v1, v2)
# Var1 Var2
#1 1 3
#2 2 3
#3 1 4
#4 2 4
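Note that expand.grid varies its first argument fastest, so the rows come out in a different order from the one shown in the question. A small sketch, assuming that row order matters: pass the vectors in reverse and then swap the columns back. The column names v1 and v2 are just the vector names reused.

```r
v1 <- 1:2
v2 <- 3:4

# v2 is listed first so that it varies fastest; [2:1] restores the column order
res <- expand.grid(v2 = v2, v1 = v1)[2:1]
res
#   v1 v2
# 1  1  3
# 2  1  4
# 3  2  3
# 4  2  4
```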
I don't know of a function that will automatically do what you want for data frames (see the edit below).
We could relatively easily accomplish this using expand.grid and cbind.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
expand.grid(df1, df2) # doesn't work so let's try something else
id <- expand.grid(seq(nrow(df1)), seq(nrow(df2)))
out <-cbind(df1[id[,1],], df2[id[,2],])
out
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#1.1 1 3 6 b
#2.1 2 4 6 b
Edit: As Joran points out in the comments, merge does this for data frames: when the two data frames share no column names, it returns every combination of their rows.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
merge(df1, df2)
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#3 1 3 6 b
#4 2 4 6 b
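The merge call above gives all combinations only because df1 and df2 have no column names in common. If the names overlap, merge joins on the shared columns instead. A hedged sketch using the d1 and d2 values from the question, with hypothetical column names x and y: passing by = NULL asks merge for the Cartesian product explicitly.

```r
d1 <- data.frame(x = 1:2, y = c(13, 11))
d2 <- data.frame(x = 3:4, y = c(12, 10))

# default: joins on the shared columns x and y, and nothing matches here
joined <- merge(d1, d2)
nrow(joined)  # 0

# by = NULL: every row of d1 paired with every row of d2
crossed <- merge(d1, d2, by = NULL)
dim(crossed)  # 4 4
crossed
```

The overlapping column names are disambiguated with suffixes in the result, and the rows of d1 vary fastest, as with expand.grid.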
If I have a dataframe like this:
df <- data.frame(c1=1:6, c2=2:7)
I can happily replace values in c2 that are larger than 4 with
df$c2[df$c2 > 4] <- 10
yielding the desired output
c1 c2
1 1 2
2 2 3
3 3 4
4 4 10
5 5 10
6 6 10
However, I want to select the column by its name using a string, in this case "c2", as the column selection should not be hard-coded but it is context dependent.
The best I could come up with is
df[,c('c2')][df[,c('c2')] > 4] <- 1000
yielding
c1 c2
1 1 2
2 2 3
3 3 4
4 4 1000
5 5 1000
6 6 1000
It works, but I find it rather ugly. Is there a better way of doing the same thing?
Maybe using replace
df['c2'] <- replace(df['c2'], df['c2'] > 4, 100)
df
# c1 c2
#1 1 2
#2 2 3
#3 3 4
#4 4 100
#5 5 100
#6 6 100
Or something similar to your attempt:
df['c2'][df['c2'] > 4] <- 100
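Since the column name is supposed to come from a string, another base R option is double-bracket indexing, which returns the column as a plain vector. A minimal sketch; the variable col is a hypothetical stand-in for wherever the name comes from.

```r
df <- data.frame(c1 = 1:6, c2 = 2:7)
col <- "c2"  # hypothetical: column name supplied at run time

# df[[col]] is the column vector itself, so this mirrors df$c2[df$c2 > 4] <- 10
df[[col]][df[[col]] > 4] <- 10
df$c2
# 2 3 4 10 10 10
```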
If one is open to packages, we can use purrr's modify_at or dplyr's mutate_at
purrr::modify_at(df,"c2",
function(x)
ifelse(x>4,100,x))
With dplyr:
mutate_at(df,"c2",
function(x)
ifelse(x>4,100,x))
Using transform and ifelse
transform(df, c2 = ifelse(c2 > 4, 100, c2))
# c1 c2
#1 1 2
#2 2 3
#3 3 4
#4 4 100
#5 5 100
#6 6 100
If we need to pass a string, one option with dplyr is to convert it to a symbol and evaluate it:
library(dplyr)
df %>%
mutate(!! "c2" := replace(!! rlang::sym("c2"),
!! rlang::sym("c2") > 4, 100))
# c1 c2
#1 1 2
#2 2 3
#3 3 4
#4 4 100
#5 5 100
#6 6 100
df[df$c2 > 4, 'c2'] <- 10
# or
df$c2 <- with(df, replace(c2, c2 > 4, 10))
Using package data.table you could do:
library(data.table)
setDT(df)
df[c2 > 4, c2 := 10]