R: how to find any months in a column [duplicate]

This question already has answers here:
Is there an R function for finding the index of an element in a vector?
(4 answers)
Closed 1 year ago.
I have a column of random words, and some of the words contain a month name (it could be any of them, such as Jan or Dec). I want to find the row numbers of the entries that contain a month name. How can I do that?
df = tibble(word=c("asd","May","jbsd"))
grepl(c("Jan","Feb","Mar","Apr","May", etc), df[["word"]])

You can use which with %in% to get the row number of the match.
which(df$word %in% month.abb)
#[1] 2
Note that month.abb always holds the English abbreviations ("Jan" to "Dec"), whatever your system locale, so this only matches if df$word is in English and uses the same capitalisation.
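The %in% test above only finds entries that are exactly a month abbreviation. Since the question says some words contain a month, a minimal sketch with grepl follows. It assumes the months appear as English three-letter abbreviations with a leading capital; the vector words and the entry "report_Jan2020" are made up for illustration.

```r
# Hypothetical example data: the last entry embeds a month inside a longer word.
words <- c("asd", "May", "jbsd", "report_Jan2020")

# Build one regular expression "Jan|Feb|...|Dec" from the built-in constant.
pattern <- paste(month.abb, collapse = "|")

# grepl() returns TRUE wherever the pattern occurs anywhere in the string;
# which() turns that logical vector into row numbers.
rows <- which(grepl(pattern, words))
print(rows)
# [1] 2 4
```

Matching is case-sensitive here; passing ignore.case = TRUE to grepl would also catch "may", at the cost of more false positives in ordinary words.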

Edit: I just saw the comments; I'll leave my answer in case you also want the month name.
Using dplyr:
library(dplyr)
df %>%
  mutate(rownumber = row_number()) %>%  # record each row's position
  filter(word %in% month.abb)           # keep rows that are a month abbreviation
Output:
# A tibble: 1 x 2
word rownumber
<chr> <int>
1 May 2

Related

Is there any command which give from a specific numeric column how many times a number exists? [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Counting distinct values in column of a data frame in R
(2 answers)
Closed 4 months ago.
Having a specific column like this number_of_columns_with_text:
df <- data.frame(id = c(1,2,3,4,5,6), number_of_columns_with_text = c(3,2,1,3,1,1))
Is there any command that counts how many times each number occurs in this column?
Example output
data.frame(number = c(1,2,3), volume = c(3,1,2))
What you might be looking for is table(...)
> table(df$number_of_columns_with_text)
1 2 3
3 1 2
In dplyr, you can first group_by the variable you want to tabulate and then use n() to count the frequencies of the distinct values:
library(dplyr)
df %>%
group_by(number_of_columns_with_text)%>%
summarise(volume = n())
# A tibble: 3 × 2
number_of_columns_with_text volume
<dbl> <int>
1 1 3
2 2 1
3 3 2
Using dplyr
library(tidyverse)
df %>%
group_by(number_of_columns_with_text) %>%
count()
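To get exactly the two-column layout from the question (number and volume), the table() result can be converted to a data frame. This is a base R sketch, not the answerers' code; the column names number and volume are taken from the example output in the question.

```r
df <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                 number_of_columns_with_text = c(3, 2, 1, 3, 1, 1))

# table() counts each distinct value; as.data.frame() gives one row per value.
counts <- as.data.frame(table(df$number_of_columns_with_text),
                        stringsAsFactors = FALSE)
names(counts) <- c("number", "volume")

# The values come back as character labels, so convert them to numbers again.
counts$number <- as.numeric(counts$number)
print(counts)
#   number volume
# 1      1      3
# 2      2      1
# 3      3      2
```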

How would you substring a column using tidyverse in R? [duplicate]

This question already has answers here:
Extract the first 2 Characters in a string
(4 answers)
Closed 2 years ago.
I need to rename a column in R and then shorten its values. For example, the initial data frame contains "2017-18" and "2018-19", and I need to keep only the first four digits, essentially cutting off the "-##" portion. I've attempted to use substr(), but I get an error about converting to character.
data <- read_excel("nba.xlsx")
data1<- data %>%
rename(year=season) %>%
select(year)
data1 <- data1 + as.numeric(substr(year,1,4))
Above is the code that I currently have; I have tried rearranging it and moving things around. Any help would be greatly appreciated. Thank you!
Use str_replace:
df <- tibble(season = c("2017-18", "2018-19"))
df %>% mutate(year = str_replace(season, "-.*", ""))
# A tibble: 2 x 2
season year
<chr> <chr>
1 2017-18 2017
2 2018-19 2018
Alternatively, use str_sub inside mutate:
df %>% mutate(year = str_sub(season, 1, 4))
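The attempt in the question fails because substr(year, 1, 4) refers to year as a free variable rather than as a column, and because the result is added to the whole data frame instead of being assigned to a column. A minimal base R sketch of the same idea, assuming the renamed column is called year and holds strings like "2017-18" (the data frame below stands in for the Excel file):

```r
# Stand-in for the data read from nba.xlsx, already renamed to `year`.
data1 <- data.frame(year = c("2017-18", "2018-19"), stringsAsFactors = FALSE)

# Take characters 1 to 4 of each value and overwrite the column,
# converting the result to a number.
data1$year <- as.numeric(substr(data1$year, 1, 4))
print(data1$year)
# [1] 2017 2018
```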

Can I sort a column (Character) by last part of a string / value? [duplicate]

This question already has answers here:
R Sort strings according to substring
(2 answers)
Closed 3 years ago.
I have a data.frame with a column (character) that has a list of values such as (the prefix refers to the season and suffix a year):
Wi_1984,
Su_1985,
Su_1983,
Wi_1982,
Su_1986,
Su_1984,
I want to keep the column type and format as it is, but what I would like to do is order the df by this column in ascending season_year order. So I would like to produce:
Wi_1982,
Su_1983,
Su_1984,
Wi_1984,
Su_1985,
Su_1986,
Using normal sorting will arrange by Wi_ or Su_ and not by _1984 i.e. _year. Any help much appreciated. If this could be done in dplyr / tidyverse that would be grand.
We can use parse_number to get the numeric part and use that in arrange
library(dplyr)
library(readr)
df1 %>%
arrange(parse_number(col1))
Or if the numbers can appear as prefix, then extract the last part
df1 %>%
arrange(as.numeric(str_extract(col1, "\\d+$")))
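Following @zx8754's comment, you can split the column and sort by year, then season:
library(dplyr)
library(tidyr)  # separate() comes from tidyr
df %>%
  separate(X1, into = c('season', 'year')) %>%
  arrange_at(vars(c(2, 1)))  # sort by column 2 (year), then column 1 (season)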
which gives,
# A tibble: 6 x 2
season year
<chr> <chr>
1 Wi 1982
2 Su 1983
3 Su 1984
4 Wi 1984
5 Su 1985
6 Su 1986
In base R, we can extract the numeric part using sub and order
df[order(as.integer(sub(".*?(\\d+)", "\\1", df$col))), ]
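The question asks to keep the column unchanged and to sort by year with the season as tie-breaker (Su_1984 before Wi_1984). A base R sketch of that, assuming the column is named col and every value has the form prefix_year:

```r
df <- data.frame(col = c("Wi_1984", "Su_1985", "Su_1983",
                         "Wi_1982", "Su_1986", "Su_1984"),
                 stringsAsFactors = FALSE)

# Everything after the underscore is the year; everything before is the season.
year   <- as.integer(sub(".*_", "", df$col))
season <- sub("_.*", "", df$col)

# order() sorts by year first and uses the season prefix to break ties.
sorted <- df[order(year, season), , drop = FALSE]
print(sorted$col)
# [1] "Wi_1982" "Su_1983" "Su_1984" "Wi_1984" "Su_1985" "Su_1986"
```

The helper vectors year and season exist only for ordering, so the data frame keeps its single character column.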

How can I count the frequency of a variable in R? [duplicate]

This question already has answers here:
Count number of occurences for each unique value
(14 answers)
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 3 years ago.
I am currently trying to count the frequency of countries that appear in a dataframe object.
I tried using count commands as well as rle(sort(x)), which apparently is used to search for strings, but it did not yield any results.
rle(sort(x))
I tried this, but it does not seem to work. I also tried
count(x, "COUNTRY")
but all it does is count how many entries there are.
How can I get a result such as:
Country Frequency
[1] United States 3
[2] Mexico 5
[3] Germany 12
Here is a small example using dplyr and the built-in dataset mtcars:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
count(cyl)
or
mtcars %>%
group_by(cyl) %>%
add_count(cyl)
Another solution is: table(yourdataframe$x)
count(x,Country,Frequency)
Include both columns to get a deeper breakdown; it will then count each combination of Country and Frequency.
or
X%>%group_by(Country)%>%summarise(sum = sum(Frequency), n = n())
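For the two-column layout asked for in the question, the table() approach can be turned into a data frame directly. A base R sketch; the data frame x and its COUNTRY values below are invented purely for illustration:

```r
# Hypothetical data: one row per observation, with a COUNTRY column.
x <- data.frame(COUNTRY = c("Mexico", "Germany", "Mexico",
                            "United States", "Germany", "Germany"),
                stringsAsFactors = FALSE)

# Naming the table dimension sets the first column name;
# responseName sets the name of the count column.
freq <- as.data.frame(table(Country = x$COUNTRY),
                      responseName = "Frequency",
                      stringsAsFactors = FALSE)

# Most frequent countries first.
freq <- freq[order(-freq$Frequency), ]
print(freq)
```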

how do I generate a new column in a tibble by combining 2 other columns? [duplicate]

This question already has answers here:
How to combine multiple character columns into a single column in an R data frame
(7 answers)
Closed 4 years ago.
I have a tibble with the variables Year and Quarter in separate columns in the format '2006' (Year) and '2' (Quarter). I want to merge these to create a new column called Year_Qtr in the format 2006-2 or something similar.
I have tried the unite function with the code:
SA_Long_Data %>%
unite(Year_Qtr, Year, Quarter)
This has created the variable I need but produced it as a table rather than a new column within the tibble.
Can anyone help?
You have a remove parameter in unite which is TRUE by default, set it to FALSE. You can also use the sep argument to have dashes instead of underscores.
library(tidyverse)
SA_Long_Data <- tibble::tibble(Year=1:3,Quarter=5:7)
SA_Long_Data %>% unite(Year_Qtr, Year, Quarter,remove = FALSE,sep="-")
# A tibble: 3 x 3
Year_Qtr Year Quarter
* <chr> <int> <int>
1 1-5 1 5
2 2-6 2 6
3 3-7 3 7
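A likely reason the question's attempt looked like "a table rather than a new column" is that the result of unite was printed but never assigned back. A base R sketch of the same combination with paste, assigning the new column explicitly; the Year and Quarter values are made up to match the format described in the question:

```r
# Hypothetical values in the format described in the question.
SA_Long_Data <- data.frame(Year = c(2006, 2006, 2007),
                           Quarter = c(1, 2, 1))

# paste() joins the two columns element-wise; assigning the result
# stores it as a new column and keeps Year and Quarter in place.
SA_Long_Data$Year_Qtr <- paste(SA_Long_Data$Year, SA_Long_Data$Quarter, sep = "-")
print(SA_Long_Data$Year_Qtr)
# [1] "2006-1" "2006-2" "2007-1"
```

With unite, the equivalent step is to assign the piped result, for example SA_Long_Data <- SA_Long_Data %>% unite(...).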
