I am new to R. I have a data frame:
A 5 8 9 6
B 8 2 3 6
C 1 8 9 5
I want to reshape it into:
A 5
A 8
A 9
A 6
B 8
B 2
B 3
B 6
C 1
C 8
C 9
C 5
My actual data file is big.
Assuming you're starting with something like this:
mydf <- structure(list(V1 = c("A", "B", "C"), V2 = c(5L, 8L, 1L),
V3 = c(8L, 2L, 8L), V4 = c(9L, 3L, 9L),
V5 = c(6L, 6L, 5L)),
.Names = c("V1", "V2", "V3", "V4", "V5"),
class = "data.frame", row.names = c(NA, -3L))
mydf
# V1 V2 V3 V4 V5
# 1 A 5 8 9 6
# 2 B 8 2 3 6
# 3 C 1 8 9 5
Try one of the following:
library(reshape2)
melt(mydf, 1)
Or
cbind(mydf[1], stack(mydf[-1]))
Or
library(splitstackshape)
merged.stack(mydf, var.stubs = "V[2-5]", sep = "var.stubs")
The name pattern in the last example is unlikely to be applicable to your actual data though.
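Note that melt and stack return the values column by column (all of V2, then all of V3, and so on), whereas the output in the question is grouped by the first column. A minimal sketch of the extra sorting step, using the base R stack approach on the same sample data (the name long is only for illustration):

```r
mydf <- data.frame(V1 = c("A", "B", "C"), V2 = c(5L, 8L, 1L), V3 = c(8L, 2L, 8L),
                   V4 = c(9L, 3L, 9L), V5 = c(6L, 6L, 5L))

# stack() puts all of V2 first, then V3, and so on; `ind` records the source column
long <- cbind(mydf[1], stack(mydf[-1]))

# sort by the label, then by source column, so A's values come first, then B's, ...
long <- long[order(long$V1, long$ind), ]
rownames(long) <- NULL
long
```

If the row order does not matter for your analysis, this step can be skipped.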
Someone could probably do this in a better way but here I go...
I put your data into a data frame called data
#repeat each value in the first column (c - 1) times, where c is the number of columns (length(data[1,]))
rep(data[,1], each=length(data[1,])-1)
#turning your data frame into a matrix allows you to then turn it into a vector...
#transpose the matrix first because as.vector concatenates columns rather than rows
as.vector(t(as.matrix(data[,2:5])))
#combining these ideas you get...
data.frame(col1=rep(data[,1], each=length(data[1,])-1),
col2=as.vector(t(as.matrix(data[,2:5]))))
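The 2:5 above is specific to the example. A sketch of the same idea that works out the columns from the data frame itself, assuming the first column holds the labels and all remaining columns hold the values (n_values and long are names made up for this example):

```r
data <- data.frame(V1 = c("A", "B", "C"), V2 = c(5L, 8L, 1L), V3 = c(8L, 2L, 8L),
                   V4 = c(9L, 3L, 9L), V5 = c(6L, 6L, 5L))

# assumption: every column except the first is a value column
n_values <- ncol(data) - 1

long <- data.frame(
  col1 = rep(data[[1]], each = n_values),    # each label repeated once per value
  col2 = as.vector(t(as.matrix(data[-1])))   # values read row by row
)
long
```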
If you can use a matrix, you can just 'cast' it to a vector and add the row names. I have assumed that you really want 'a', 'b', 'c' as row names.
n <- 3
data <- matrix(1:9, ncol = n)
# transpose first: as.vector() reads a matrix column by column, and we want it row by row
data <- t(t(as.vector(t(data))))
rownames(data) <- rep(letters[1:3], each = n)
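If you want to keep the row names from your original data, this is of course also possible without libraries.
n <- 3
# row names are set here only for illustration; matrix() alone creates none
data <- matrix(1:9, ncol = n, dimnames = list(c("A", "B", "C"), NULL))
names <- rownames(data)
# transpose first so the values are read row by row
data <- t(t(as.vector(t(data))))
rownames(data) <- rep(names, each = n)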
I have 2 data frames, df1 and df2, with the same column names but in a different column order. How can I combine them into df3 without creating additional columns or rows?
df1
a b c
1 3 6
df2
b c a
5 6 1
expected df3
a b c
1 3 6
1 5 6
I tried the code below, but it did not work:
df3=merge(df1, df2, by = "col.names")
We may use bind_rows, which automatically matches columns by name; if a column is missing from one of the datasets, it fills that column with NA for those rows. The order of the columns is based on the first dataset passed to bind_rows, i.e. df1.
library(dplyr)
bind_rows(df1, df2)
-output
a b c
1 1 3 6
2 1 5 6
data
df1 <- structure(list(a = 1L, b = 3L, c = 6L), class = "data.frame", row.names = c(NA,
-1L))
df2 <- structure(list(b = 5L, c = 6L, a = 1L), class = "data.frame", row.names = c(NA,
-1L))
Rearrange the columns of one data frame to match the other, so that both have their columns in the same order, and then use rbind.
rbind(df1, df2[names(df1)])
# a b c
#1 1 3 6
#2 1 5 6
In this case, rbind(df1, df2) should work too, because rbind matches data frame columns by name.
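A quick check, using the same df1 and df2, that rbind lines up data frame columns by name rather than by position (by_name and explicit are names chosen for this sketch):

```r
df1 <- data.frame(a = 1L, b = 3L, c = 6L)
df2 <- data.frame(b = 5L, c = 6L, a = 1L)

by_name  <- rbind(df1, df2)               # columns matched by name
explicit <- rbind(df1, df2[names(df1)])   # columns reordered first, then bound

by_name
all(by_name == explicit)
```

Reordering explicitly is still a reasonable habit, since it makes the intent visible to the reader.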
Suppose I have a data frame like this:
1 8
2 12
3 2
5 -6
6 1
8 5
I want to add a row in the places where the 4 and 7 would have gone in the first column and have the second column for these new rows be 0, so adding these rows:
4 0
7 0
I have no idea how to do this in R.
In Excel, I could use a VLOOKUP inside an IFERROR. Is there a similar combination of functions in R to make this happen?
Edit: also, suppose that row 1 was missing and needed to be filled in similarly. Would this require another solution? What if I wanted to add rows until I reached ten rows?
Use tidyr::complete to fill in the missing sequence between min and max values.
library(tidyr)
library(rlang)
complete(df, V1 = min(V1):max(V1), fill = list(V2 = 0))
#Or using `seq`
#complete(df, V1 = seq(min(V1), max(V1)), fill = list(V2 = 0))
# V1 V2
# <int> <dbl>
#1 1 8
#2 2 12
#3 3 2
#4 4 0
#5 5 -6
#6 6 1
#7 7 0
#8 8 5
If we already know min and max of the dataframe we can use them directly. Let's say we want data from V1 = 1 to 10, we can do.
complete(df, V1 = 1:10, fill = list(V2 = 0))
If we don't know the column names beforehand, we can do something like :
col1 <- names(df)[1]
col2 <- names(df)[2]
complete(df, !!sym(col1) := 1:10, fill = as.list(setNames(0, col2)))
data
df <- structure(list(V1 = c(1L, 2L, 3L, 5L, 6L, 8L), V2 = c(8L, 12L,
2L, -6L, 1L, 5L)), class = "data.frame", row.names = c(NA, -6L))
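For the VLOOKUP-inside-IFERROR idea from the question, a rough base R counterpart is match, which returns NA where a key is missing. A minimal sketch on the same df, offered as an alternative and not as what complete does internally (all_ids, idx and filled are names chosen here for illustration):

```r
df <- data.frame(V1 = c(1L, 2L, 3L, 5L, 6L, 8L),
                 V2 = c(8L, 12L, 2L, -6L, 1L, 5L))

all_ids <- seq(min(df$V1), max(df$V1))   # use 1:10 here to force ten rows
idx <- match(all_ids, df$V1)             # row of df for each id, NA if absent

# take the looked-up value where there is one, 0 otherwise
filled <- data.frame(V1 = all_ids,
                     V2 = ifelse(is.na(idx), 0, df$V2[idx]))
filled
```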
I have a data frame (x) in R with 5 numeric columns. In addition, I have the sorting order to be followed, in the form of a vector, i.e.
1, 0, 2, 4, 3
dataset
v1 v2 v3 v4 v5
1 2 3 4 5
3 13 12 1 4
6 4 6 5 3
Expected result
v1 v2 v3 v4 v5
3 13 12 1 4
1 2 3 4 5
6 4 6 5 3
The vector defines the sorting order: the first column is sorted first, then the 3rd column, then the 5th, and then the 4th (the 0 means the 2nd column is not used). Manually it can be done as
x = x[order(x[[1]]), ]
x = x[order(x[[3]]), ]
x = x[order(x[[5]]), ]
x = x[order(x[[4]]), ]
rownames(x) = NULL
The problem is that this is easy for 5 columns but becomes complicated for hundreds of columns.
Any lead on this will be appreciated.
Thanks
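We can use match on the sort-order vector to get the column indices in sorting order, and then apply the sorts one after another in a for loop
# position in vec of 1, 2, 3, ...: the columns in the order they should be sorted (0 = skip)
i1 <- match(seq_along(x), vec, nomatch = 0)
i1 <- i1[i1 != 0]
for (i in i1) {
  # x[[i]] extracts the column as a plain vector for order()
  x <- x[order(x[[i]]), ]
}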
x
# v1 v2 v3 v4 v5
# 2 3 13 12 1 4
# 1 1 2 3 4 5
# 3 6 4 6 5 3
data
x <- structure(list(v1 = c(1L, 3L, 6L), v2 = c(2L, 13L, 4L), v3 = c(3L,
12L, 6L), v4 = c(4L, 1L, 5L), v5 = c(5L, 4L, 3L)), .Names = c("v1",
"v2", "v3", "v4", "v5"), class = "data.frame", row.names = c(NA,
-3L))
vec <- c(1, 0, 2, 4, 3)
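Because order keeps tied rows in their existing order, sorting column by column in a loop makes the column sorted last the primary key. Assuming that is the intended result, here is a sketch of the same thing as a single order call, passing the columns in reverse (cols and sorted are illustrative names):

```r
x <- data.frame(v1 = c(1L, 3L, 6L), v2 = c(2L, 13L, 4L), v3 = c(3L, 12L, 6L),
                v4 = c(4L, 1L, 5L), v5 = c(5L, 4L, 3L))
vec <- c(1, 0, 2, 4, 3)

# columns in the order the loop would sort them: 1, 3, 5, 4
cols <- match(seq_len(sum(vec > 0)), vec)

# the column sorted last in the loop becomes the first (primary) key here;
# unname() stops column names from being read as named arguments of order()
sorted <- x[do.call(order, unname(as.list(x[rev(cols)]))), ]
sorted
```

This avoids re-sorting the whole data frame once per column, which may matter with hundreds of columns.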
So here is my challenge. I am trying to get rid of rows of data that are best organized as a column. The original data set looks like
1|1|a
2|3|b
2|5|c
1|4|d
1|2|e
10|10|f
And the end result desired is
1 |1,2,4 |a| e d
2 |3,5 |b| c
10|10 |f| NA
The reshaping is based on the minimum value of Col 2 within each group of Col 1: the new column 3 holds the value belonging to the group's minimum, and the new column 4 collapses the values that do not belong to the minimum. Some of the approaches I tried include:
newTable[min(newTable[,(1%o%2)]),] ## returns the minimum of both COL 1 and 2 only
ddply(newTable,"V1", summarize, newCol = paste(V7,collapse = " ")) ## collapses all values by Col 1 and creates a new column nicely.
Variations combining these lines of code into a single line have not worked, in part due to my limited knowledge. These modifications are not included here.
Try:
library(dplyr)
library(tidyr)
dat %>%
  group_by(V1) %>%
  # across() needs dplyr >= 1.0.0; it replaces the older summarise_each(funs(...))
  summarise(across(everything(), ~ paste(sort(.x), collapse = ","))) %>%
  extract(V3, c("V3", "V4"), "(.),?(.*)")
gives the output
# V1 V2 V3 V4
#1 1 1,2,4 a d,e
#2 2 3,5 b c
#3 10 10 f
Or using aggregate and str_split_fixed
res1 <- aggregate(.~ V1, data=dat, FUN=function(x) paste(sort(x), collapse=","))
library(stringr)
res1[, paste0("V", 3:4)] <- as.data.frame(str_split_fixed(res1$V3, ",", 2),
stringsAsFactors=FALSE)
If you need NA for missing values
res1[res1==''] <- NA
res1
# V1 V2 V3 V4
#1 1 1,2,4 a d,e
#2 2 3,5 b c
#3 10 10 f <NA>
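Note that sort(.) in the code above sorts each column on its own, so V3 ends up in alphabetical order and not in the order of V2. A base R sketch, assuming you want V3 to follow V2 within each group as in the question's expected output (res and the helper argument g are names made up for this example):

```r
dat <- data.frame(V1 = c(1L, 2L, 2L, 1L, 1L, 10L),
                  V2 = c(1L, 3L, 5L, 4L, 2L, 10L),
                  V3 = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# order by group, then by V2, so the first row of each group holds the minimum
dat <- dat[order(dat$V1, dat$V2), ]

res <- do.call(rbind, lapply(split(dat, dat$V1), function(g) {
  data.frame(
    V1 = g$V1[1],
    V2 = paste(g$V2, collapse = ","),
    V3 = g$V3[1],                        # value at the group's minimum V2
    V4 = if (nrow(g) > 1) paste(g$V3[-1], collapse = " ") else NA_character_,
    stringsAsFactors = FALSE
  )
}))
rownames(res) <- NULL
res
```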
data
dat <- structure(list(V1 = c(1L, 2L, 2L, 1L, 1L, 10L), V2 = c(1L, 3L,
5L, 4L, 2L, 10L), V3 = c("a", "b", "c", "d", "e", "f")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))
Here's an approach using data.table, with data from @akrun's post:
It might be useful to store the columns as list instead of pasting them together.
require(data.table) ## 1.9.2+
setDT(dat)[order(V1, V2), list(V2=list(V2), V3=V3[1L], V4=list(V3[-1L])), by=V1]
# V1 V2 V3 V4
# 1: 1 1,2,4 a e,d
# 2: 2 3,5 b c
# 3: 10 10 f
setDT(dat) converts the data.frame to data.table, by reference (without copying it). Then, we sort it by columns V1,V2 and group by V1 column on the sorted data, and for each group, we create the columns V2, V3 and V4 as shown.
V2 and V4 will be of type list here. If you'd rather have a character column where all entries are pasted together, just replace list(.) with paste(., collapse=...).
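When pasting all entries of a group into one string, it is paste's collapse argument that does the joining; sep only separates the arguments within a single paste call. A small base R illustration (v is a made-up vector):

```r
v <- c(1L, 2L, 4L)

separate <- paste(v, sep = ",")        # still one string per element
joined   <- paste(v, collapse = ",")   # a single string

length(separate)
joined
```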
HTH
I have multiple files with many rows and three columns, and need to merge them where the first two columns match.
File 1
12 13 a
13 15 b
14 17 c
4 9 d
. . .
. . .
81 23 h
File 2
12 13 e
3 10 b
14 17 c
4 9 j
. . .
. . .
1 2 k
File 3
12 13 m
13 15 k
1 7 x
24 9 d
. . .
. . .
1 2 h
and so on.
I want to merge them to obtain the following result
12 13 a e m
13 15 b k
14 17 c c
4 9 d j
3 10 b
24 9 d
. . .
. . .
81 23 h
1 2 k
1 7 x
The first thing that usually comes to mind with these types of problems is merge, perhaps in conjunction with a Reduce(function(x, y) merge(x, y, by = "somecols", all = TRUE), yourListOfDataFrames).
However, merge is not always the most efficient function, especially since it looks like you want to "collapse" all the values to fill in the rows from left to right, which would not be the default merge behavior.
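To see what that default behaviour looks like, here is a sketch of the Reduce/merge pattern on three small sample files, merging on V1 and V2 (the name merged is only for illustration):

```r
file1 <- data.frame(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
                    V3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
                    V3 = c("e", "b", "c", "j"), stringsAsFactors = FALSE)
file3 <- data.frame(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
                    V3 = c("m", "k", "x", "d"), stringsAsFactors = FALSE)

# full outer join of all files on the first two columns
merged <- Reduce(function(x, y) merge(x, y, by = c("V1", "V2"), all = TRUE),
                 list(file1, file2, file3))
merged
# each value stays in the column of the file it came from, with NA elsewhere;
# for example, the 13/15 row has NA in the file2 column between b and k
```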
Instead, I suggest you stack everything into one long data.frame and reshape it after you have added an index variable.
Here are two approaches:
Option 1: "dplyr" + "tidyr"
Use mget to put all of your data.frames into a list.
Use bind_rows to convert that list into a single data.frame (in place of the older rbind_all, which current dplyr no longer provides).
Use sequence(n()) in mutate from "dplyr" to group the data and create an index.
Use spread from "tidyr" to transform from a "long" format to a "wide" format.
library(dplyr)
library(tidyr)
combined <- bind_rows(mget(ls(pattern = "^file\\d")))
combined %>%
group_by(V1, V2) %>%
mutate(time = sequence(n())) %>%
ungroup() %>%
spread(time, V3, fill = "")
# Source: local data frame [7 x 5]
#
# V1 V2 1 2 3
# 1 1 7 x
# 2 3 10 b
# 3 4 9 d j
# 4 12 13 a e m
# 5 13 15 b k
# 6 14 17 c c
# 7 24 9 d
Option 2: "data.table"
Use mget to put all of your data.frames into a list.
Use rbindlist to convert that list into a single data.table.
Use sequence(.N) to generate your sequence by your groups.
Use dcast.data.table to convert the "long" data.table into a "wide" one.
library(data.table)
dcast.data.table(
rbindlist(mget(ls(pattern = "^file\\d")))[,
time := sequence(.N), by = list(V1, V2)],
V1 + V2 ~ time, value.var = "V3", fill = "")
# V1 V2 1 2 3
# 1: 1 7 x
# 2: 3 10 b
# 3: 4 9 d j
# 4: 12 13 a e m
# 5: 13 15 b k
# 6: 14 17 c c
# 7: 24 9 d
Both of these answers assume we are starting with the following sample data:
file1 <- structure(
list(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
V3 = c("a", "b", "c", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file2 <- structure(
list(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
V3 = c("e", "b", "c", "j")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file3 <- structure(
list(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
V3 = c("m", "k", "x", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
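For reference, the same stack, index, and reshape idea can also be sketched in base R. This is not one of the two options described above, just an illustration of the steps on the same sample values; combined, time and wide are names chosen here, and missing cells come out as NA instead of "":

```r
file1 <- data.frame(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
                    V3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
file2 <- data.frame(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
                    V3 = c("e", "b", "c", "j"), stringsAsFactors = FALSE)
file3 <- data.frame(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
                    V3 = c("m", "k", "x", "d"), stringsAsFactors = FALSE)

# 1. stack everything into one long data.frame
combined <- do.call(rbind, list(file1, file2, file3))

# 2. index the rows within each V1/V2 combination: 1, 2, 3, ...
combined$time <- ave(seq_len(nrow(combined)), combined$V1, combined$V2,
                     FUN = seq_along)

# 3. go from long to wide, one V3 column per index value
wide <- reshape(combined, idvar = c("V1", "V2"), timevar = "time",
                direction = "wide")
wide
```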