tidyr wide to long? [duplicate] - r

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 3 years ago.
I'm trying to learn how to use tidyr to transform wide data to long. Suppose my data looks like this:
df1 <- data.frame(V1=c(1.5,7.4),
V2=c(6.7,8.8),
sim=c(1,2))
I want to transform it to look like this:
df2 <- data.frame(sim=c(1,1,2,2),
Value=c(1.5,6.7,7.4,8.8))
I think gather is the function that I have to use, but I'm not understanding the help file. This is what I have right now:
df1 %>%
gather(key=sim, value=V1:V2)
The error says "Error: Invalid column specification"
Thanks!

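Try
library(tidyr)
library(dplyr)  # select() and arrange() come from dplyr
df1 %>%
  gather(sim1, Value, V1:V2) %>%  # stack V1 and V2 into a single Value column
  select(-sim1) %>%               # drop the key column holding "V1"/"V2"
  arrange(sim)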
# sim Value
#1 1 1.5
#2 1 6.7
#3 2 7.4
#4 2 8.8
According to ?gather
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
key,value: Names of key and value columns to create in output.
...: Specification of columns to gather. Use bare variable names.
Select all variables between x and z with ‘x:z’, exclude y
with ‘-y’. For more options, see the select documentation.
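In tidyr 1.0.0 and later, gather is superseded by pivot_longer. The following is a minimal sketch of the same reshape, assuming a recent tidyr is installed; the intermediate column name "variable" is only an illustrative choice and is dropped straight away.

```r
library(tidyr)
library(dplyr)

df1 <- data.frame(V1 = c(1.5, 7.4),
                  V2 = c(6.7, 8.8),
                  sim = c(1, 2))

# Stack V1 and V2 into one Value column; "variable" (a hypothetical name)
# records which original column each value came from.
df2 <- df1 %>%
  pivot_longer(cols = V1:V2, names_to = "variable", values_to = "Value") %>%
  select(sim, Value)

print(df2)
```

pivot_longer emits the values row by row, so the result is already grouped by sim and no extra arrange step is needed here.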

Related

How to add a new column to calculate mean for each group using dplyr in R [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a table with 2 columns.
Type: 1 or 2 or 3 or 4
Data: corresponding data (there are multiple data for each type)
Now I want to create a third column that contains the mean of the data for each type, i.e., all the rows with type 1 have the same mean value. I think I should do it with the mutate function, but I'm not sure how to proceed.
data %>% mutate(meanData = ifelse(...))
Can somebody help?
Thank you in advance.
We can do a group by operation
library(dplyr)
data <- data %>%
group_by(Type) %>%
mutate(meanData = mean(Data))
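As a worked sketch of the grouped mutate above, here it is on a small made-up table (the values are invented for illustration; only the column names Type and Data come from the question). The base R function ave is shown alongside as an alternative that needs no packages.

```r
library(dplyr)

# Hypothetical sample data: two rows each of type 1 and 2, one row of type 3
data <- data.frame(Type = c(1, 1, 2, 2, 3),
                   Data = c(10, 20, 30, 50, 7))

result <- data %>%
  group_by(Type) %>%
  mutate(meanData = mean(Data)) %>%  # mean is computed within each Type
  ungroup()                          # drop the grouping so later steps are not grouped

# Base R alternative: ave() returns the group mean repeated for every row
base_means <- ave(data$Data, data$Type)

print(result)
```

Both approaches keep every original row and repeat the group mean on each one, which is what distinguishes mutate from summarise.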

How would you substring a column using tidyverse in R? [duplicate]

This question already has answers here:
Extract the first 2 Characters in a string
(4 answers)
Closed 2 years ago.
I need to rename a column in R and then shorten the values in that column. For example, the initial data frame contains "2017-18" and "2018-19", and I need to keep only the first four digits, essentially cutting off the "-##" portion. I've attempted to use substr(), but when I do, I get an error about converting to character.
data <- read_excel("nba.xlsx")
data1<- data %>%
rename(year=season) %>%
select(year)
data1 <- data1 + as.numeric(substr(year,1,4))
Above is the code I currently have; I have tried rearranging and moving things around. Any help would be greatly appreciated. Thank you!
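Use str_replace:
library(dplyr)    # tibble(), mutate(), %>%
library(stringr)  # str_replace()
df <- tibble(season = c("2017-18", "2018-19"))
df %>% mutate(year = str_replace(season, "-.*", ""))  # remove the dash and everything after it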
# A tibble: 2 x 2
season year
<chr> <chr>
1 2017-18 2017
2 2018-19 2018
Alternatively, use str_sub inside mutate:
df %>% mutate(year = str_sub(season, 1, 4))
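The substr attempt in the question fails because year is a column of the data frame rather than a standalone object, and because a data frame cannot be added to a numeric vector with +. A minimal base R sketch of the intended rename-and-truncate step, using a made-up two-row data frame in place of the Excel file:

```r
# Hypothetical stand-in for the data read from nba.xlsx
data1 <- data.frame(season = c("2017-18", "2018-19"),
                    stringsAsFactors = FALSE)

# Rename season to year
names(data1)[names(data1) == "season"] <- "year"

# Keep the first four characters and convert them to numbers;
# the column is referenced through the data frame with $
data1$year <- as.numeric(substr(data1$year, 1, 4))

print(data1)
```

Assigning back into data1$year replaces the column in place, so no arithmetic on the data frame itself is needed.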

Can I sort a column (Character) by last part of a string / value? [duplicate]

This question already has answers here:
R Sort strings according to substring
(2 answers)
Closed 3 years ago.
I have a data.frame with a column (character) that has a list of values such as (the prefix refers to the season and suffix a year):
Wi_1984,
Su_1985,
Su_1983,
Wi_1982,
Su_1986,
Su_1984,
I want to keep the column type and format as it is, but what I would like to do is order the df by this column in ascending season_year order. So I would like to produce:
Wi_1982,
Su_1983,
Su_1984,
Wi_1984,
Su_1985,
Su_1986,
Using normal sorting will arrange by Wi_ or Su_ and not by _1984 i.e. _year. Any help much appreciated. If this could be done in dplyr / tidyverse that would be grand.
We can use parse_number to get the numeric part and use that in arrange
library(dplyr)
library(readr)
df1 %>%
arrange(parse_number(col1))
Or, if numbers can also appear in the prefix, extract only the trailing digits
library(stringr)  # str_extract()
df1 %>%
  arrange(as.numeric(str_extract(col1, "\\d+$")))
To answer based on @zx8754's comment, you can do,
library(dplyr)
library(tidyr)  # separate()
df %>%
  separate(X1, into = c('season', 'year')) %>%
  arrange_at(vars(c(2, 1)))  # sort by year (column 2), then season (column 1)
which gives,
# A tibble: 6 x 2
season year
<chr> <chr>
1 Wi 1982
2 Su 1983
3 Su 1984
4 Wi 1984
5 Su 1985
6 Su 1986
In base R, we can extract the numeric part using sub and order
df[order(as.integer(sub(".*?(\\d+)", "\\1", df$col))), ]
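Sorting on the year alone leaves rows that share a year (Wi_1984 and Su_1984 here) in whatever order they had in the input. A minimal base R sketch that keeps the column unchanged and sorts by year first and season prefix second, using the values from the question; the column name col1 is an assumption, and ordering Su before Wi relies on plain alphabetical order of the prefixes rather than any calendar logic.

```r
df1 <- data.frame(col1 = c("Wi_1984", "Su_1985", "Su_1983",
                           "Wi_1982", "Su_1986", "Su_1984"),
                  stringsAsFactors = FALSE)

# Split each value into its two parts without modifying the column itself
year   <- as.integer(sub(".*_", "", df1$col1))  # text after the underscore
season <- sub("_.*", "", df1$col1)              # text before the underscore

# Order by year, breaking ties alphabetically by season prefix
sorted <- df1[order(year, season), , drop = FALSE]

print(sorted$col1)
```

If winter should come before summer within a year, the season vector could be converted to a factor with the desired level order before calling order.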

How do I merge rows with the same name in R? [duplicate]

This question already has answers here:
How to get summary statistics by group
(14 answers)
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 5 years ago.
I'm trying to merge rows of a data set by using the mean operator.
Basically, I want to convert data set 1 into data set 2 (see below)
1. ID  MEASUREMENT      2. ID  MEASURE
   A   20                  A   22.5
   B   30                  B   30
   A   25                  .   .
   .   .                   .   .
   .   .
How can I do this in R?
Note that in contrast to the example I have given here, my data set is really large and I can't look through the data set, group rows according to their id's then find colMeans.
My thoughts are to order the dataset, separate the measures for each id, then find each mean and regroup the data. However, this will be very time consuming.
I would really appreciate it if someone could help with direct code or even a for loop.
This code should be able to do that for you.
library(data.table)
setDT(dat)
dat = dat[ , .(MEASURE = mean(MEASUREMENT)), by = .(ID)]
Just to be a little more complete, I'll throw in an example and a way to do this in base R.
Data:
dat = data.frame(ID = c("A","A","A","B","B","C"), MEASUREMENT = c(1:3,61,13,7))
With only base R functions:
aggregate(MEASUREMENT ~ ID, FUN = mean, dat)
ID MEASUREMENT
1 A 2
2 B 37
3 C 7
With data.table:
library(data.table)
setDT(dat)
dat = dat[ , .(MEASURE = mean(MEASUREMENT)), by = .(ID)]
> dat
ID MEASURE
1: A 2
2: B 37
3: C 7
You can also do this easily in dplyr, assuming your data is in df
library(dplyr)
df <- df %>%
group_by(ID) %>%
summarize(MEASURE = mean(MEASUREMENT))
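To tie the approaches above back to the numbers in the question, here is a small base R sketch on just the three rows shown there (A 20, B 30, A 25). The data frame name dat1 is an illustrative choice.

```r
# The three visible rows of data set 1 from the question
dat1 <- data.frame(ID = c("A", "B", "A"),
                   MEASUREMENT = c(20, 30, 25),
                   stringsAsFactors = FALSE)

# One row per ID, holding the mean of that ID's measurements
res <- aggregate(MEASUREMENT ~ ID, data = dat1, FUN = mean)

# Rename the result column to match data set 2 in the question
names(res)[names(res) == "MEASUREMENT"] <- "MEASURE"

print(res)
```

No manual sorting or splitting by ID is required; aggregate handles the grouping internally, as do the data.table and dplyr versions.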

What is the right way to reference part of a dataframe after piping? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
What is the correct way to do something like this? I am trying to get the colSums of each group for specific columns. The . syntax seems incorrect with this type of subsetting.
csv<-data.frame(id_num=c(1,1,1,2,2),c(1,2,3,4,5),c(1,2,3,3,3))
temp<-csv%>%group_by(id_num)%>%colSums(.[,2:3],na.rm=T)
This can be done with summarise_each or, in more recent versions of dplyr, with the additional functions summarise_at and summarise_if, which were introduced for convenience.
csv %>%
group_by(id_num) %>%
summarise_each(funs(sum))
csv %>%
group_by(id_num) %>%
summarise_at(2:3, sum)
If we are using column names, they can be passed to summarise_at as a character vector
csv %>%
group_by(id_num) %>%
summarise_at(names(csv)[-1], sum)
NOTE: In the OP's dataset, the column names for the 2nd and 3rd columns were not specified resulting in something like c.1..2..3..4..5.
Using the vars to apply the function on the selected column names
csv %>%
group_by(id_num) %>%
summarise_at(vars(c.1..2..3..4..5.), sum)
# # A tibble: 2 × 2
# id_num c.1..2..3..4..5.
# <dbl> <dbl>
#1 1 6
#2 2 9
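Since dplyr 1.0.0, the scoped verbs (summarise_each, summarise_at, summarise_if) are superseded by across used inside summarise. A minimal sketch under that assumption; the column names a and b are hypothetical, added here so the columns can be referred to cleanly instead of by the auto-generated names.

```r
library(dplyr)

# Same values as in the question, but with explicit (hypothetical) column names
csv <- data.frame(id_num = c(1, 1, 1, 2, 2),
                  a = c(1, 2, 3, 4, 5),
                  b = c(1, 2, 3, 3, 3))

# Sum the selected columns within each id_num group, ignoring NAs
temp <- csv %>%
  group_by(id_num) %>%
  summarise(across(c(a, b), ~ sum(.x, na.rm = TRUE)))

print(temp)
```

The original colSums call does not work per group because colSums is not a dplyr verb and knows nothing about the grouping set up by group_by; summarise applies the function once per group.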
