How to combine multiple .csv files with different columns in R? [duplicate] - r

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 6 years ago.
Suppose I have 8 tables. Six columns are the same in every table, but 5 of the 8 tables have one extra column (with the same name in all 5, so those tables have 7 columns in total).
My question is: how can I bind all 8 tables so that the other 3 tables also get the extra column that the 5 tables have?
I hope the question is clear.

You can use rbind.fill from the plyr package for this:
library(plyr)
# df_list is a list of the data frames read from the csv files, e.g. via lapply(list_paths, read.csv)
df_list <- list(data.frame(a = c(1, 2), b = c(3, 4)),
                data.frame(a = c(4, 5), b = c(6, 3), c = c(20, 21)))
> do.call(rbind.fill, df_list)
a b c
1 1 3 NA
2 2 4 NA
3 4 6 20
4 5 3 21
Alternatively, use rbindlist from data.table with fill = TRUE, as @akrun suggested. This is probably a lot faster for larger datasets.
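For reference, a minimal sketch of the rbindlist route mentioned above, reusing the same two example data frames. It assumes the data.table package is installed; the name combined is only illustrative. The fill = TRUE argument is what makes rbindlist pad a missing column with NA.

```r
library(data.table)

# Same toy input as in the rbind.fill example
df_list <- list(data.frame(a = c(1, 2), b = c(3, 4)),
                data.frame(a = c(4, 5), b = c(6, 3), c = c(20, 21)))

# fill = TRUE adds any column missing from a table and fills it with NA
combined <- rbindlist(df_list, fill = TRUE)
combined
```

The result is a data.table; as.data.frame(combined) converts it back to a plain data frame if needed.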

Related

How to duplicate a row based on the presence of multiple values in a column in R [duplicate]

This question already has answers here:
Unlist data frame column preserving information from other column
(3 answers)
Closed 2 years ago.
I have a dataframe with a column trans containing phonetic transcriptions of words, and a column pos_num which records the position of the phoneme t in the transcription strings.
df <- data.frame(
trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"), stringsAsFactors = FALSE
)
df$pos_num <- sapply(strsplit(df$trans, ""), function(x) which(grepl("t", x)))
df
trans pos_num
1 ðət 3
2 əˈpærəntli 8
3 ˈkɒntrækt 5, 9
4 təˈwɔːdz 1
5 pəˈteɪtəʊz 4, 7
In some transcriptions, t occurs more than once, resulting in multiple values in pos_num. Where this is the case I would like to duplicate the entire row, with the original row containing one value and the duplicated row containing the other value. The desired output would be:
df
trans pos_num
1 ðət 3
2 əˈpærəntli 8
3 ˈkɒntrækt 5
4 ˈkɒntrækt 9
5 təˈwɔːdz 1
6 pəˈteɪtəʊz 4
7 pəˈteɪtəʊz 7
How can this be achieved? (There seem to be a few posts on that question for other programming languages but not R.)
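library(data.table)
setDT(df)  # convert df to a data.table by reference
# unlist the pos_num list column within each trans group: one row per position
df[, .(pos_num = unlist(pos_num)), by = .(trans)]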
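A base R sketch of the same idea, in case data.table is not available. It assumes pos_num is a list column like the one in the question; here it is filled in directly from the values shown there rather than recomputed, and the name long is only illustrative.

```r
# Rebuild the question's data, with pos_num as a list column
df <- data.frame(
  trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"),
  stringsAsFactors = FALSE
)
df$pos_num <- list(3L, 8L, c(5L, 9L), 1L, c(4L, 7L))

# Repeat each transcription once per position, then flatten the positions
long <- data.frame(
  trans = rep(df$trans, lengths(df$pos_num)),
  pos_num = unlist(df$pos_num),
  stringsAsFactors = FALSE
)
long
```

Unlike grouping by trans, this keeps rows apart even if the same transcription appears in more than one row.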

Is there a good way to compare 2 data tables but compare the data from i to data of i+1 in second data table [duplicate]

This question already has answers here:
Remove duplicated rows
(10 answers)
Closed 2 years ago.
I have tried various functions including compare and all.equal but I am having difficulty finding a test to see if variables are the same.
For context, I have a data.frame which in some cases has a duplicate result. I have tried copying the data.frame so I can compare it with itself. I would like to remove the duplicates.
One approach I considered was to take row A from dataframe 1 and subtract row B of dataframe 2 from it. If the difference was zero, I planned to remove one of them.
Is there an approach I can use to do this without copying my data?
Any help would be great, I'm new to R coding.
Suppose I had a data.frame named data:
data
Col1 Col2
A 1 3
B 2 7
C 2 7
D 2 8
E 4 9
F 5 12
I can use the duplicated function to identify duplicated rows and not select them:
data[!duplicated(data),]
Col1 Col2
A 1 3
B 2 7
D 2 8
E 4 9
F 5 12
I can also perform the same action on a single column:
data[!duplicated(data$Col1),]
Col1 Col2
A 1 3
B 2 7
E 4 9
F 5 12
Sample Data
data <- data.frame(Col1 = c(1,2,2,2,4,5), Col2 = c(3,7,7,8,9,12))
rownames(data) <- LETTERS[1:6]
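Two small variations on the above, as a sketch using the same sample data: unique() gives the same result as the !duplicated() filter for whole rows, and fromLast = TRUE keeps the last copy of each duplicate instead of the first. The names dedup_first and dedup_last are only illustrative.

```r
data <- data.frame(Col1 = c(1, 2, 2, 2, 4, 5), Col2 = c(3, 7, 7, 8, 9, 12))
rownames(data) <- LETTERS[1:6]

# Keep the first occurrence of each duplicated row (row C is dropped)
dedup_first <- unique(data)

# Keep the last occurrence instead (row B is dropped)
dedup_last <- data[!duplicated(data, fromLast = TRUE), ]

rownames(dedup_first)
rownames(dedup_last)
```

Neither variant needs a second copy of the data frame.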

Repeating rows in data frame by using the content of a column in R [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
I want to create a data frame by repeating rows by using content of a column in a data frame. Below is the source data frame.
data.frame(c("a","b","c"), c(4,5,6), c(2,2,3)) -> df
colnames(df) <- c("sample", "measurement", "repeat")
df
sample measurement repeat
1 a 4 2
2 b 5 2
3 c 6 3
I want to repeat the rows by using the "repeat" column and its content to get a data frame like the one below. Ideally, I would like to have a function to do this.
sample measurement repeat
1 a 4 2
2 a 4 2
3 b 5 2
4 b 5 2
5 c 6 3
6 c 6 3
7 c 6 3
Thanks in advance!
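Solved. df[rep(rownames(df), df[["repeat"]]), ] did the job. (repeat is a reserved word in R, so the column has to be accessed as df[["repeat"]] or df$`repeat` rather than df$repeat.)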
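Since the question asks for a function, here is a minimal sketch that wraps the same indexing idea. The name repeat_rows and its times_col argument are hypothetical, not from any package.

```r
# Source data from the question; colnames<- is used because data.frame()
# would rename a column called "repeat" (a reserved word) to "repeat."
df <- data.frame(c("a", "b", "c"), c(4, 5, 6), c(2, 2, 3), stringsAsFactors = FALSE)
colnames(df) <- c("sample", "measurement", "repeat")

# Repeat every row of df as many times as stated in the column times_col
repeat_rows <- function(df, times_col) {
  out <- df[rep(seq_len(nrow(df)), df[[times_col]]), , drop = FALSE]
  rownames(out) <- NULL  # reset row names to 1..n
  out
}

result <- repeat_rows(df, "repeat")
result
```

Indexing with seq_len(nrow(df)) instead of rownames(df) gives the same rows and also works when row names are not unique strings.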

Compare two datasets to find the rows that are not present in one of the datset in r [duplicate]

This question already has answers here:
Find complement of a data frame (anti - join)
(7 answers)
Closed 3 years ago.
I have two datasets. The Ids in the datasets are unordered, and some Ids are present in one dataset but not in the other.
What I want in the end is a CSV file containing the Ids that the two datasets do not have in common.
Dataset 1
Id Quant
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Dataset 2
Id Quant2
6 d
4 a
5 f
2 e
1 a
3 b
You can use the dplyr package, which has an anti_join function for precisely this task:
library(dplyr)
anti_join(dataset1, dataset2, by = "Id")
This will return all rows of dataset1 where there is no matching Id in dataset2. Similarly, to get the rows of dataset2 with no matching Id in dataset1, use
anti_join(dataset2, dataset1, by = "Id")
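To get from the two anti-joins to the single CSV file the question asks for, a base R sketch of the same logic might look like this. It rebuilds the two example datasets from the question; the output file name non_common_ids.csv and the variable names are assumptions for illustration.

```r
dataset1 <- data.frame(Id = 1:7, Quant = letters[1:7], stringsAsFactors = FALSE)
dataset2 <- data.frame(Id = c(6, 4, 5, 2, 1, 3),
                       Quant2 = c("d", "a", "f", "e", "a", "b"),
                       stringsAsFactors = FALSE)

# Rows whose Id has no match in the other dataset (base R anti-join)
only_in_1 <- dataset1[!dataset1$Id %in% dataset2$Id, ]
only_in_2 <- dataset2[!dataset2$Id %in% dataset1$Id, ]

# Collect the non-common Ids from both sides and write them out
non_common <- data.frame(Id = c(only_in_1$Id, only_in_2$Id))
out_file <- file.path(tempdir(), "non_common_ids.csv")  # hypothetical path
write.csv(non_common, out_file, row.names = FALSE)
```

With the example data only Id 7 ends up in the file, since every Id of dataset 2 also occurs in dataset 1.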

Combining multiple columns in one R [duplicate]

This question already has answers here:
Flatting a dataframe with all values of a column into one
(3 answers)
Closed 5 years ago.
How can I combine all of a dataframe's columns into just 1 column in an efficient way? I mean without using the column names, using dplyr or tidyr in R, because I have too many columns (10,000+).
For example, converting this data frame
> Multiple_dataframe
a b c
1 4 7
2 5 8
3 6 9
to
> Uni_dataframe
d
1
2
3
4
5
6
7
8
9
I looked around Stack Overflow but without success.
We can use unlist
Uni_dataframe <- data.frame(d = unlist( Multiple_dataframe, use.names = FALSE))
Or using dplyr/tidyr (as the question is specific about it)
library(tidyverse)
Uni_dataframe <- gather(Multiple_dataframe, key, d) %>%
select(-key)
