Merge list elements with different dimensions in R [duplicate]

This question already has answers here:
Convert data from long format to wide format with multiple measure columns
(6 answers)
Closed 3 years ago.
I have a df:
df= data.frame(year=c(rep(2018,4),rep(2017,3)),Area=c(1:4,1:3),P=1:7,N=1:7)
I want to split it by years, and then merge everything together again so I can see years as columns for each area. In order to do this, I am splitting and merging:
s=split(df,df$year)
m=merge(s[[1]][,2:4],s[[2]][,2:4],by='Area',all=TRUE)
colnames(m)=c('area','P2017','N2017','P2018','N2018')
I am sure there is a more efficient way, especially as the possibility of errors is very high once I include data from other years.
Any suggestions?

We can gather the data to long form (excluding the year and Area columns), unite the key with the year, and then spread it back to wide format.
library(dplyr)
library(tidyr)
df %>%
  gather(key, value, -year, -Area) %>%
  unite(key, key, year, sep = "") %>%
  spread(key, value)
# Area N2017 N2018 P2017 P2018
#1 1 5 1 5 1
#2 2 6 2 6 2
#3 3 7 3 7 3
#4 4 NA 4 NA 4
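On newer tidyr (1.0.0 or later), pivot_wider can do the same reshape in one step. A minimal sketch with the same df; the names_glue spec is just one way to get names like P2017:
library(tidyr)
pivot_wider(df, id_cols = Area, names_from = year,
            values_from = c(P, N), names_glue = "{.value}{year}")
This gives one row per Area with P2017/P2018/N2017/N2018 columns, and NA where a year is missing.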

We can do this with dcast from data.table, which can take multiple value.var columns.
library(data.table)
dcast(setDT(df), Area ~ year, value.var = c("P", "N"))
# Area P_2017 P_2018 N_2017 N_2018
#1: 1 5 1 5 1
#2: 2 6 2 6 2
#3: 3 7 3 7 3
#4: 4 NA 4 NA 4

Related

Filling out missing information by grouping in R [duplicate]

This question already has answers here:
Replace NA with previous or next value, by group, using dplyr
(5 answers)
Closed 5 months ago.
I have a sample dataset below:
df <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                 gender = c("male", NA, "female", "female", NA, "female"))
> df
id gender
1 1 male
2 1 <NA>
3 2 female
4 2 female
5 3 <NA>
6 3 female
Within each group of the same id, some rows are missing the gender value. What I would like to do is fill those missing cells based on the existing information for that id.
So the desired output would be:
> df
id gender
1 1 male
2 1 male
3 2 female
4 2 female
5 3 female
6 3 female
Any thoughts?
Thanks!
You can use dplyr::group_by and tidyr::fill e.g.:
df |>
  dplyr::group_by(id) |>
  tidyr::fill(gender, .direction = "updown")
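If gender is constant within an id, an alternative sketch is to overwrite the NAs with the group's first non-missing value:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(gender = ifelse(is.na(gender), first(na.omit(gender)), gender)) %>%
  ungroup()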

Invert rows using dplyr [duplicate]

This question already has answers here:
Reorder the rows of data frame in dplyr
(2 answers)
dplyr arrange by reverse alphabetical order [duplicate]
(1 answer)
Closed 3 years ago.
How can I invert the rows of a dataframe/tibble using dplyr? I don't want to arrange it by a certain variable, but rather have it just inverted.
I.e. the tibble
# A tibble: 5 x 2
a b
<int> <chr>
1 1 one
2 2 two
3 3 three
4 4 four
5 5 five
should become
# A tibble: 5 x 2
a b
<int> <chr>
1 5 five
2 4 four
3 3 three
4 2 two
5 1 one
Just arrange() by descending row_number() like this:
my_tibble %>%
  dplyr::arrange(-dplyr::row_number())
We can use desc
my_tibble %>%
  arrange(desc(row_number()))
Or another option is slice
my_tibble %>%
  slice(rev(row_number()))
Or the 'a' column
my_tibble %>%
  arrange(desc(a))
# a b
#1 5 five
#2 4 four
#3 3 three
#4 2 two
#5 1 one
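For completeness, the same inversion works without arrange at all; a base R sketch (and a slice variant), assuming the same my_tibble with dplyr loaded:
my_tibble[rev(seq_len(nrow(my_tibble))), ]
my_tibble %>%
  slice(n():1)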

R: reshape dataframe with duplicated variable names labeled var.1, var.2 [duplicate]

This question already has answers here:
R: reshaping wide to long [duplicate]
(1 answer)
Using tidyr to combine multiple columns [duplicate]
(1 answer)
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I'm hoping to reshape a dataframe in R so that a set of columns read in with duplicated names, and then renamed as var, var.1, var.2, anothervar, anothervar.1, anothervar.2 etc. can be treated as independent observations. I would like the number appended to the variable name to be used as the observation so that I can melt my data.
For example,
dat <- data.frame(ID = 1:3, var = c("A", "A", "B"),
                  anothervar = c(5, 6, 7), var.1 = c("C", "D", "E"),
                  anothervar.1 = c(1, 2, 3))
> dat
ID var anothervar var.1 anothervar.1
1 1 A 5 C 1
2 2 A 6 D 2
3 3 B 7 E 3
How can I reshape the data so it looks like the following:
ID obs var anothervar
1 1 A 5
1 2 C 1
2 1 A 6
2 2 D 2
3 1 B 7
3 2 E 3
Thank you for your help!
We can use melt from data.table, which can take multiple patterns in the measure argument.
library(data.table)
melt(setDT(dat), measure = patterns("^var", "anothervar"),
     variable.name = "obs", value.name = c("var", "anothervar"))[order(ID)]
# ID obs var anothervar
#1: 1 1 A 5
#2: 1 2 C 1
#3: 2 1 A 6
#4: 2 2 D 2
#5: 3 1 B 7
#6: 3 2 E 3
As for a tidyverse solution, we can use unite with gather
dat %>%
  unite("1", var, anothervar) %>%
  unite("2", var.1, anothervar.1) %>%
  gather(obs, value, -ID) %>%
  separate(value, into = c("var", "anothervar"))
# ID obs var anothervar
#1 1 1 A 5
#2 2 1 A 6
#3 3 1 B 7
#4 1 2 C 1
#5 2 2 D 2
#6 3 2 E 3
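On tidyr 1.0.0 or later, pivot_longer with a ".value" spec handles this directly. A sketch with the same dat; the rename to var.0/anothervar.0 is only there so every measure column follows the name.obs pattern:
library(dplyr)
library(tidyr)
dat %>%
  rename(var.0 = var, anothervar.0 = anothervar) %>%
  pivot_longer(-ID, names_to = c(".value", "obs"), names_sep = "\\.") %>%
  mutate(obs = as.integer(obs) + 1L)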

Collapsing a dataframe in R [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I'm attempting to collapse a dataframe onto itself. The aggregate function seems like my best bet, but I'm not sure how to have some columns summed while others remain the same.
My dataframe looks like this (columns col1-col4):
col1 col2 col3 col4
A 1 3 2
A 2 3 4
B 1 2 4
B 4 2 2
How can I use the aggregate function or the ddply function to create something that looks like this:
A 3 3 6
B 5 2 6
We can use dplyr
library(dplyr)
df1 %>%
  group_by(col1) %>%
  summarise_each(funs(if(n_distinct(.) == 1) .[1] else sum(.)))
Or, if the column 'col3' is constant within each group, another option is to keep it in the group_by and then summarise the others
df1 %>%
  group_by(col1, col3) %>%
  summarise_each(funs(sum))
# col1 col3 col2 col4
# <chr> <int> <int> <int>
#1 A 3 3 6
#2 B 2 5 6
Or with aggregate
aggregate(.~col1+col3, df1, FUN = sum)
# col1 col3 col2 col4
#1 B 2 5 6
#2 A 3 3 6
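summarise_each and funs are superseded in current dplyr; the same logic can be written with across. A sketch, assuming df1 has the col1-col4 columns shown above:
library(dplyr)
df1 %>%
  group_by(col1) %>%
  summarise(across(everything(),
                   ~ if (n_distinct(.x) == 1) first(.x) else sum(.x)))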

Group by, take count and filter out entries corresponding to count greater than 1 [duplicate]

This question already has answers here:
Remove duplicated rows
(10 answers)
Closed 6 years ago.
The following is my data,
data
date number value
2016-05-05 1 5
2016-05-05 1 6
2016-05-06 2 7
2016-05-06 2 8
2016-05-07 3 9
2016-05-08 4 10
2016-05-09 5 11
When I use the following command,
data %>% group_by(date, number) %>% summarize(count = n())
I get the following,
date number count
2016-05-05 1 2
2016-05-06 2 2
2016-05-07 3 1
2016-05-08 4 1
2016-05-09 5 1
Now I want to filter out the entries whose count is greater than 1, i.e. remove the date/number combinations that occur more than once. My output should be like the following,
data
date number value
2016-05-07 3 9
2016-05-08 4 10
2016-05-09 5 11
where the first four entries, since they have a count greater than 1, have been filtered out. Can anybody help me with this, or give some idea of how to approach it?
We can use filter after grouping by 'date' and 'number', checking whether the number of rows (n()) is equal to 1 and keeping only those rows.
library(dplyr)
data %>%
  group_by(date, number) %>%
  filter(n() == 1)
# date number value
# <chr> <int> <int>
#1 2016-05-07 3 9
#2 2016-05-08 4 10
#3 2016-05-09 5 11
Just to provide some alternatives using data.table
library(data.table)
setDT(data)[, if (.N == 1) .SD, .(date, number)]
Or with base R
data[with(data, ave(number, number, date, FUN = length) == 1), ]
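Another dplyr sketch uses add_count, which attaches the group size as a column so no explicit group_by is needed:
library(dplyr)
data %>%
  add_count(date, number) %>%
  filter(n == 1) %>%
  select(-n)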
