How would you substring a column using tidyverse in R? [duplicate]

This question already has answers here:
Extract the first 2 Characters in a string
(4 answers)
Closed 2 years ago.
I need to rename a column in R and then shorten the values in that column. For example, the initial data frame contains "2017-18" and "2018-19", and I need to keep only the first four digits, essentially cutting off the "-##" portion. I've attempted to use substr(), but when I do, R reports a problem with converting to character.
data <- read_excel("nba.xlsx")
data1<- data %>%
rename(year=season) %>%
select(year)
data1 <- data1 + as.numeric(substr(year,1,4))
Above is the code I currently have; I have tried rearranging and moving things around. Any help would be greatly appreciated. Thank you!

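Use str_replace:
library(tidyverse) # provides tibble(), mutate(), %>% and str_replace()
df <- tibble(season = c("2017-18", "2018-19"))
df %>% mutate(year = str_replace(season, "-.*", "")) # drop everything from the hyphen onward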
# A tibble: 2 x 2
season year
<chr> <chr>
1 2017-18 2017
2 2018-19 2018
Alternately, use str_sub:
df %>% mutate(year = str_sub(season, 1, 4))
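The question's attempt wraps the result in as.numeric(), so a numeric year seems to be the goal. A minimal sketch of the full pipeline, including the rename, might look like the following. The two-row tibble is a stand-in for the Excel file and is an assumption for illustration only.

```r
library(dplyr)
library(stringr)

# Stand-in for read_excel("nba.xlsx"); the real file is not available here
data <- tibble(season = c("2017-18", "2018-19"))

data1 <- data %>%
  rename(year = season) %>%                        # season -> year
  mutate(year = as.numeric(str_sub(year, 1, 4)))   # keep first 4 characters, then convert
```

The conversion happens inside mutate(), which is where the column name year is visible; calling substr(year, 1, 4) outside the pipeline fails because year is not a standalone object.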

Related

R how to find any months in a column [duplicate]

This question already has answers here:
Is there an R function for finding the index of an element in a vector?
(4 answers)
Closed 1 year ago.
I have a column of random words, and some of the words are month names (any abbreviation, such as Jan or Dec). I want to find the row numbers of the entries that contain a month name. How can I do that?
df = tibble(word=c("asd","May","jbsd"))
grepl(c("Jan","Feb","Mar","Apr","May", etc), df[["word"]])
You can use which with %in% to get the row number of the match.
which(df$word %in% month.abb)
#[1] 2
Note that month.abb always holds the English three-letter abbreviations, regardless of your system locale, so this only works if df$word is in English.
Edit: I just saw the comments; I'll leave my answer in case you also want the month name.
Using dplyr:
df %>%
mutate(rownumber = row_number()) %>%
filter(word %in% month.abb)
Output:
# A tibble: 1 x 2
word rownumber
<chr> <int>
1 May 2
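The %in% approach only finds words that are exactly a month abbreviation. If the words may merely contain one, as the question's grepl() attempt suggests, a hedged base R sketch is to collapse month.abb into a single regular expression. The extra word "xDecx" is made up here to show a partial match.

```r
# Sample vector; "xDecx" is a hypothetical entry that contains a month abbreviation
words <- c("asd", "May", "jbsd", "xDecx")

# Build one alternation pattern: "Jan|Feb|...|Dec"
pattern <- paste(month.abb, collapse = "|")

# grep() returns the positions of the elements that match the pattern
rows <- grep(pattern, words)
```

The match is case-sensitive; passing ignore.case = TRUE to grep() would also catch "may" or "DEC".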

How do I split number in a column into two columns at certain length in data frame [duplicate]

This question already has answers here:
Convert integer as "20160119" to different columns of "day" "year" "month"
(5 answers)
Closed 2 years ago.
I have a dataframe:
df <- data.frame(year = c(200501:200512))
and I want to split the column into two after the 4th digit, so that my data frame looks like this:
df_split <- data.frame(year = 2005, month = 1:12)
This is not so much a question about data frames, but about vectors in R in general. If your actual problem contains more nuances, then update your question or post a new question.
If your vector year is numerical (as asked) you can do simple maths:
year0 <- 200501:200512
year <- as.integer(year0 / 100) # examine the result of `year0 / 100` to see why I use `as.integer` at all and not `round`.
month <- year0 - year * 100 # subtract the year part, scaled back up, to leave only the month
Edit: As nicola pointed out, we can calculate it in other ways, with the exact same result:
year <- year0 %/% 100
month <- year0 %% 100
Alternatively, a tidyr version may be more compact:
library(tidyr)
df %>% separate(year, into = c("yr", "mth"), sep = 4, convert = TRUE)
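To tie the arithmetic back to the data frame the question asks for, here is a small base R sketch using integer division and modulo. The name df_split follows the question; nothing else is assumed.

```r
year0 <- 200501:200512

# %/% is integer division (drops the last two digits); %% is the remainder (keeps them)
df_split <- data.frame(
  year  = year0 %/% 100,
  month = year0 %% 100
)
```

Both columns come out numeric, so no conversion step is needed afterwards.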

How to add a new column to calculate mean for each group using dplyr in R [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a table with 2 columns.
Type: 1 or 2 or 3 or 4
Data: corresponding data (there are multiple data for each type)
Now I want to create a third column that contains the mean of Data for each Type, i.e., all the rows with Type 1 have the same mean value. I think I should do it with the mutate function, but I'm not sure how to proceed.
data %>% mutate(meanData = ifelse(...))
Can somebody help?
Thank you in advance.
We can do a group by operation
library(dplyr)
data <- data %>%
group_by(Type) %>%
mutate(meanData = mean(Data))
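A self-contained sketch of the same idea, with made-up sample values since the question does not show its data:

```r
library(dplyr)

# Hypothetical sample data: several Data values per Type
data <- tibble(
  Type = c(1, 1, 2, 2, 3),
  Data = c(10, 20, 5, 15, 7)
)

result <- data %>%
  group_by(Type) %>%               # mean() is now computed within each Type
  mutate(meanData = mean(Data)) %>%
  ungroup()                        # drop the grouping so later steps are not grouped
```

Because mutate() keeps every row, each row receives the mean of its own group; summarise() would instead collapse the table to one row per Type.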

Can I sort a column (Character) by last part of a string / value? [duplicate]

This question already has answers here:
R Sort strings according to substring
(2 answers)
Closed 3 years ago.
I have a data.frame with a column (character) that has a list of values such as (the prefix refers to the season and suffix a year):
Wi_1984,
Su_1985,
Su_1983,
Wi_1982,
Su_1986,
Su_1984,
I want to keep the column type and format as it is, but what I would like to do is order the df by this column in ascending season_year order. So I would like to produce:
Wi_1982,
Su_1983,
Su_1984,
Wi_1984,
Su_1985,
Su_1986,
Using normal sorting will arrange by Wi_ or Su_ and not by _1984 i.e. _year. Any help much appreciated. If this could be done in dplyr / tidyverse that would be grand.
We can use parse_number to get the numeric part and use that in arrange
library(dplyr)
library(readr)
df1 %>%
arrange(parse_number(col1))
Or, if numbers can also appear in the prefix, extract only the trailing digits
library(stringr) # for str_extract()
df1 %>%
arrange(as.numeric(str_extract(col1, "\\d+$")))
To answer based on @zx8754's comment, you can do,
library(dplyr)
library(tidyr) # separate() comes from tidyr
df %>%
separate(X1, into = c('season', 'year')) %>%
arrange_at(vars(c(2, 1)))
which gives,
# A tibble: 6 x 2
season year
<chr> <chr>
1 Wi 1982
2 Su 1983
3 Su 1984
4 Wi 1984
5 Su 1985
6 Su 1986
In base R, we can extract the numeric part using sub and order
df[order(as.integer(sub(".*?(\\d+)", "\\1", df$col))), ]
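Sorting by the year alone leaves ties such as Wi_1984 and Su_1984 in their original order. A minimal sketch that keeps the column unchanged and breaks ties by the season prefix is shown below. It assumes that alphabetical order of the prefix (Su before Wi) is the wanted tie-break, which matches the desired output in the question; the names df1 and col1 follow the answers above.

```r
library(dplyr)
library(stringr)

df1 <- tibble(col1 = c("Wi_1984", "Su_1985", "Su_1983",
                       "Wi_1982", "Su_1986", "Su_1984"))

sorted <- df1 %>%
  arrange(
    as.numeric(str_extract(col1, "\\d+$")),  # primary key: trailing year
    str_extract(col1, "^[^_]+")              # tie-break: season prefix before "_"
  )
```

The sort keys are computed on the fly inside arrange(), so col1 keeps its type and format.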

tidyr wide to long? [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 3 years ago.
I'm trying to learn how to use tidyr to transform wide data to long. Suppose my data looks like this:
df1 <- data.frame(V1=c(1.5,7.4),
V2=c(6.7,8.8),
sim=c(1,2))
I want to transform it to look like this:
df2 <- data.frame(sim=c(1,1,2,2),
Value=c(1.5,6.7,7.4,8.8))
I think gather is the function that I have to use, but I'm not understanding the help file. This is what I have right now:
df1 %>%
gather(key=sim, value=V1:V2)
The error says "Error: Invalid column specification"
Thanks!
Try
library(tidyr)
df1 %>%
gather(sim1, Value, V1:V2) %>%
select(-sim1) %>%
arrange(sim)
# sim Value
#1 1 1.5
#2 1 6.7
#3 2 7.4
#4 2 8.8
According to ?gather
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
key,value: Names of key and value columns to create in output.
...: Specification of columns to gather. Use bare variable names.
Select all variables between x and z with 'x:z', exclude y
with '-y'. For more options, see the select documentation.
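In current tidyr releases gather() is superseded by pivot_longer(). As a hedged sketch, and not the answerer's method, the same reshaping could be written as follows; the helper column name variable is chosen here for illustration.

```r
library(tidyr)
library(dplyr)

df1 <- data.frame(V1 = c(1.5, 7.4),
                  V2 = c(6.7, 8.8),
                  sim = c(1, 2))

df2 <- df1 %>%
  pivot_longer(cols = V1:V2,           # columns to stack
               names_to = "variable",  # holds "V1"/"V2"; dropped below
               values_to = "Value") %>%
  select(-variable)
```

pivot_longer() emits the values row by row, so the result is already ordered by sim and no arrange() step is needed.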
