I have two data frames, df1 and df2, with the same column names but in a different column order. How can I combine them into df3 without creating additional columns or rows?
df1
a b c
1 3 6
df2
b c a
5 6 1
expected df3
a b c
1 3 6
1 5 6
I tried the code below, but it did not work:
df3=merge(df1, df2, by = "col.names")
We may use bind_rows, which matches columns by name automatically. If a column is missing from one of the data frames, the rows from that data frame get NA in that column. The order of the columns follows the first dataset passed to bind_rows, i.e. df1.
library(dplyr)
bind_rows(df1, df2)
-output
a b c
1 1 3 6
2 1 5 6
data
df1 <- structure(list(a = 1L, b = 3L, c = 6L), class = "data.frame", row.names = c(NA,
-1L))
df2 <- structure(list(b = 5L, c = 6L, a = 1L), class = "data.frame", row.names = c(NA,
-1L))
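To see the NA filling in action, here is a small sketch, assuming dplyr is installed. The data frame df2_extra and its column d are hypothetical and not part of the question.

```r
library(dplyr)

df1 <- data.frame(a = 1L, b = 3L, c = 6L)
# Hypothetical second data frame with an extra column d that df1 lacks
df2_extra <- data.frame(b = 5L, c = 6L, a = 1L, d = 9L)

# Columns are matched by name; d is filled with NA for the row coming from df1
res <- bind_rows(df1, df2_extra)
res
```

The column order in res starts with the columns of df1, and d is appended at the end.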
Rearrange the columns of one data frame to match the other, so that both have their columns in the same order, and then use rbind.
rbind(df1, df2[names(df1)])
# a b c
#1 1 3 6
#2 1 5 6
In this case, rbind(df1, df2) should work too, because rbind on data frames matches columns by name.
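A quick way to check this in base R is sketched below. The data frame df_other is a hypothetical example with a column name that does not occur in df1, added only to show when rbind refuses to combine.

```r
df1 <- data.frame(a = 1L, b = 3L, c = 6L)
df2 <- data.frame(b = 5L, c = 6L, a = 1L)

# rbind() on data frames matches columns by name, so the order in df2 does not matter
by_name <- rbind(df1, df2)
reordered <- rbind(df1, df2[names(df1)])
by_name

# Hypothetical data frame whose names differ from df1: rbind() stops with an error
df_other <- data.frame(a = 1L, b = 5L, z = 6L)
failed <- inherits(try(rbind(df1, df_other), silent = TRUE), "try-error")
failed
```

If the names may differ between the inputs, bind_rows is the more forgiving choice because it fills with NA instead of stopping.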
I'm looking for an efficient way to rename several columns.
I have a dataframe that looks like the following.
id sdf dir fki
1 3 4 2
2 5 2 1
3 4 1 2
I want to rename columns sdf, dir, and fki.
I know I could do so like this:
df <- df %>%
rename(newname1 = sdf,
newname2 = dir,
newname3 = fki)
With the number of columns I have, it takes a long time to type out every column name I would like to replace.
Ideally, I would like to create a vector with names:
newcolumns <- c("newname1", "newname2", "newname3")
And then specify that these should replace the column names in the dataframe, starting with column sdf. Is there a way to do this?
We can use rename_at (superseded by rename_with since dplyr 1.0.0, but it still works)
library(dplyr)
df %>%
rename_at(vars(-id), ~ newcolumns)
-output
# id newname1 newname2 newname3
#1 1 3 4 2
#2 2 5 2 1
#3 3 4 1 2
Or with rename_with
df %>%
rename_with(~ newcolumns, -id)
Or pass a named vector and use !!! in rename
df %>%
rename(!!! setNames(names(df)[-1], newcolumns))
Or using base R
names(df)[-1] <- newcolumns
data
df <- structure(list(id = 1:3, sdf = c(3L, 5L, 4L), dir = c(4L, 2L,
1L), fki = c(2L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-3L))
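The question asks for renaming "starting with column sdf". A minimal base R sketch of that idea follows, assuming the columns to rename are consecutive; the helper variables start and idx are introduced here for illustration only.

```r
df <- data.frame(id = 1:3, sdf = c(3L, 5L, 4L), dir = c(4L, 2L, 1L), fki = c(2L, 1L, 2L))
newcolumns <- c("newname1", "newname2", "newname3")

# Position of the first column to rename
start <- match("sdf", names(df))
# Consecutive positions, one per new name
idx <- seq(start, length.out = length(newcolumns))
# Guard against a missing start column or running past the last column
stopifnot(!is.na(start), max(idx) <= ncol(df))

names(df)[idx] <- newcolumns
names(df)
```

Unlike names(df)[-1], this does not depend on id being the only column that keeps its name.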
Suppose I have a data frame like this:
1 8
2 12
3 2
5 -6
6 1
8 5
I want to add a row in the places where the 4 and 7 would have gone in the first column and have the second column for these new rows be 0, so adding these rows:
4 0
7 0
I have no idea how to do this in R.
In Excel, I could use a VLOOKUP inside an IFERROR. Is there a similar combination of functions in R to make this happen?
Edit: also, suppose that row 1 was missing and needed to be filled in similarly. Would this require another solution? What if I wanted to add rows until I reached ten rows?
Use tidyr::complete to fill in the missing sequence between min and max values.
library(tidyr)
library(rlang) # only needed for sym() in the last example below
complete(df, V1 = min(V1):max(V1), fill = list(V2 = 0))
#Or using `seq`
#complete(df, V1 = seq(min(V1), max(V1)), fill = list(V2 = 0))
# V1 V2
# <int> <dbl>
#1 1 8
#2 2 12
#3 3 2
#4 4 0
#5 5 -6
#6 6 1
#7 7 0
#8 8 5
If we already know the range we want, we can pass it directly. For example, to get rows for V1 = 1 to 10:
complete(df, V1 = 1:10, fill = list(V2 = 0))
If we don't know the column names beforehand, we can do something like:
col1 <- names(df)[1]
col2 <- names(df)[2]
complete(df, !!sym(col1) := 1:10, fill = as.list(setNames(0, col2)))
data
df <- structure(list(V1 = c(1L, 2L, 3L, 5L, 6L, 8L), V2 = c(8L, 12L,
2L, -6L, 1L, 5L)), class = "data.frame", row.names = c(NA, -6L))
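For readers coming from the VLOOKUP inside IFERROR pattern, the same idea can be sketched in base R with merge, assuming the wanted range is 1 to 10. The names full and out are illustrative only.

```r
df <- data.frame(V1 = c(1L, 2L, 3L, 5L, 6L, 8L), V2 = c(8L, 12L, 2L, -6L, 1L, 5L))

# Every key value that should appear in the result
full <- data.frame(V1 = 1:10)
# Left join: keep all rows of full and look up V2 where a match exists (the lookup step)
out <- merge(full, df, by = "V1", all.x = TRUE)
# Replace the misses with 0 (the error-handling step)
out$V2[is.na(out$V2)] <- 0L
out
```

Changing 1:10 to min(df$V1):max(df$V1) fills only the gaps between the smallest and largest existing value.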
I have a data table that includes NAs in some cells, as shown below.
Datatable:
[image: the original data table]
However, I want to repeat the value in the 1st row of the "Category" column in the following two rows, which contain NA, without changing the other columns, "Numeric" and "Numeric.null". The same goes for the 4th row of Category: repeat it in the 5th and 6th rows, with no change in the other columns.
New:
[image: the desired result]
I'm just learning R programming. I tried the rep function but couldn't get it to work. Please help me.
We can use fill from tidyr
library(dplyr)
library(tidyr)
df1 <- df1 %>%
fill(Category)
df1
# Category Numeric Numeric.null
#1 A 1 1
#2 A 2 2
#3 A 3 4
#4 D 4 7
#5 D 5 6
#6 D 6 8
#7 E 7 11
Or using data.table together with na.locf0 from zoo
library(data.table)
library(zoo)
setDT(df1)[, Category := na.locf0(Category)][]
data
df1 <- structure(list(Category = c("A", NA, NA, "D", NA, NA, "E"), Numeric = 1:7,
Numeric.null = c(1L, 2L, 4L, 7L, 6L, 8L, 11L)),
class = "data.frame", row.names = c(NA,
-7L))
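If adding a package is not an option, the same "last observation carried forward" idea can be sketched in base R. This is only an illustration, not what fill or na.locf0 do internally; the variables x and grp are introduced here for the example.

```r
df1 <- data.frame(Category = c("A", NA, NA, "D", NA, NA, "E"), Numeric = 1:7,
                  stringsAsFactors = FALSE)

x <- df1$Category
# Running count of non-missing values: tells each row which earlier value it belongs to
grp <- cumsum(!is.na(x))
# Prepend NA so that rows before the first non-missing value (grp == 0) stay NA
df1$Category <- c(NA, x[!is.na(x)])[grp + 1]
df1$Category
```

Only Category is touched, so the other columns stay exactly as they were.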
I have a dataframe like this
a b
1 A.1 1
2 A.2 2
3 A.3 1
5 B.1 2
6 B.2 2
7 B.3 1
I need to compute, for each letter (A and B here), the sum of column b:
a b
1 A 4
2 B 5
One option is using separate from tidyr to separate the column 'a' based on the delimiter ., group using the new 'a' and get the sum of 'b'.
library(tidyr)
library(dplyr)
separate(df1, a, into=c('a', 'a1')) %>%
group_by(a) %>%
summarise(b=sum(b))
# a b
#1 A 4
#2 B 5
Or we can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)). Use sub to remove the trailing . and the digits that follow it, use the result as the grouping variable, and get the sum of 'b'.
library(data.table)
setDT(df1)[,list(b=sum(b)) , by = .(a=sub('\\.\\d+$', '', a))]
# a b
#1: A 4
#2: B 5
Or a similar option using the formula method of aggregate from base R.
aggregate(b~cbind(a=sub('\\.\\d+$', '', a)), df1, FUN=sum)
# a b
# 1 A 4
# 2 B 5
Or using sqldf
library(sqldf)
sqldf('select substr(a, 1, instr(a, ".")-1) as a1,
sum(b) as b
from df1
group by a1')
# a1 b
#1 A 4
#2 B 5
data
df1 <- structure(list(a = c("A.1", "A.2", "A.3", "B.1", "B.2", "B.3"
), b = c(1L, 2L, 1L, 2L, 2L, 1L)), .Names = c("a", "b"),
class = "data.frame", row.names = c(NA, -6L))
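All of the options above share two steps: derive the group label from 'a', then sum 'b' per label. A minimal base R sketch that keeps the two steps separate is shown below; grp, totals and res are names introduced here for illustration.

```r
df1 <- data.frame(a = c("A.1", "A.2", "A.3", "B.1", "B.2", "B.3"),
                  b = c(1L, 2L, 1L, 2L, 2L, 1L),
                  stringsAsFactors = FALSE)

# Step 1: strip the first dot and everything after it to get the group label
grp <- sub("\\..*$", "", df1$a)
# Step 2: sum b within each label
totals <- tapply(df1$b, grp, sum)

res <- data.frame(a = names(totals), b = as.vector(totals))
res
```

Inspecting grp on its own is a quick way to confirm that the pattern extracts the labels you expect before aggregating.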
I am new to R. I have a data frame
A 5 8 9 6
B 8 2 3 6
C 1 8 9 5
I want to turn it into
A 5
A 8
A 9
A 6
B 8
B 2
B 3
B 6
C 1
C 8
C 9
C 5
My actual data file is large.
Assuming you're starting with something like this:
mydf <- structure(list(V1 = c("A", "B", "C"), V2 = c(5L, 8L, 1L),
V3 = c(8L, 2L, 8L), V4 = c(9L, 3L, 9L),
V5 = c(6L, 6L, 5L)),
.Names = c("V1", "V2", "V3", "V4", "V5"),
class = "data.frame", row.names = c(NA, -3L))
mydf
# V1 V2 V3 V4 V5
# 1 A 5 8 9 6
# 2 B 8 2 3 6
# 3 C 1 8 9 5
Try one of the following:
library(reshape2)
melt(mydf, 1)
Or
cbind(mydf[1], stack(mydf[-1]))
Or
library(splitstackshape)
merged.stack(mydf, var.stubs = "V[2-5]", sep = "var.stubs")
The name pattern in the last example is unlikely to be applicable to your actual data though.
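Note that stack returns the values column by column, so the rows come out grouped by source column rather than by label. The sketch below builds on the stack approach and reorders the result to match the layout in the question; the name long is introduced here for illustration.

```r
mydf <- data.frame(V1 = c("A", "B", "C"), V2 = c(5L, 8L, 1L), V3 = c(8L, 2L, 8L),
                   V4 = c(9L, 3L, 9L), V5 = c(6L, 6L, 5L),
                   stringsAsFactors = FALSE)

# stack() returns the values column by column, plus an ind column naming the source
long <- cbind(mydf[1], stack(mydf[-1]))
# order() leaves ties in their original order, so values stay in column order within each label
long <- long[order(long$V1), c("V1", "values")]
rownames(long) <- NULL
long
```

Keeping the ind column instead of dropping it is useful when you need to know which original column each value came from.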
Someone could probably do this in a better way but here I go...
I put your data into a data frame called data
# Repeat each value of the first column (c - 1) times, where c is the number of columns
rep(data[, 1], each = ncol(data) - 1)
# Converting the remaining columns to a matrix lets you flatten them into a vector.
# Transpose first, because as.vector() reads a matrix column by column rather than row by row
as.vector(t(as.matrix(data[, 2:5])))
# Combining these ideas you get...
data.frame(col1 = rep(data[, 1], each = ncol(data) - 1),
col2 = as.vector(t(as.matrix(data[, 2:5]))))
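If you can use a matrix, you can just 'cast' it to a vector and add the row names. I have assumed that you really want 'a', 'b', 'c' as row names.
n <- 3
data <- matrix(1:9, ncol = n)
# as.vector() reads a matrix column by column, so transpose first to go row by row
data <- matrix(as.vector(t(data)), ncol = 1)
rownames(data) <- rep(letters[1:3], each = n)
If you want to keep the row names from your first data frame, this is of course also possible without libraries.
n <- 3
# Example row names standing in for those of your own data
data <- matrix(1:9, ncol = n, dimnames = list(c("A", "B", "C"), NULL))
# Save the row names before flattening (avoid calling the variable names, which masks base::names)
row_labels <- rownames(data)
data <- matrix(as.vector(t(data)), ncol = 1)
rownames(data) <- rep(row_labels, each = n)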