Counting consecutive values in a column by id [closed] - r

I have a data frame as below:
df <- data.frame(
id= c(1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
name= c("john","bob","bob","bob","bob","bob","leo","bob","bob","max","mike","mike","mike","mike","mike","mike","mike","Ronaldo","mike")
)
I want to count how many times a particular value appears back to back in the name column, grouped by id.
What I expect is shown below:
expected_output<-data.frame(
id=c(2,3),
column_name="name",
value=c("bob","mike"),
count=c(5,7))
Thanks in advance for your help.

If you want to select the longest consecutive run of a name for each id, you can first count consecutive names using data.table::rleid and then keep only the row with the maximum count in each id.
library(dplyr)
df %>%
count(id, name, cons = data.table::rleid(name), name = 'count') %>%
group_by(id) %>%
slice(which.max(count)) %>%
select(-cons)
# id name count
# <dbl> <chr> <int>
#1 1 john 1
#2 2 bob 5
#3 3 mike 7
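For readers without dplyr or data.table, the same idea can be sketched in base R with rle(), which returns the value and length of each back-to-back run. This is only an illustration of the technique, not the answer's own code; longest_run() is a hypothetical helper name introduced here.

```r
# Same sample data as in the question.
df <- data.frame(
  id = c(1, 2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,3),
  name = c("john","bob","bob","bob","bob","bob","leo","bob","bob","max",
           "mike","mike","mike","mike","mike","mike","mike","Ronaldo","mike")
)

# longest_run() is a hypothetical helper: for the names of one id, return
# the value and the length of the longest back-to-back run.
longest_run <- function(x) {
  r <- rle(as.character(x))
  i <- which.max(r$lengths)  # first run of maximal length
  data.frame(name = r$values[i], count = r$lengths[i])
}

# Apply the helper to each id separately and stack the results.
pieces <- lapply(split(df$name, df$id), longest_run)
runs <- cbind(id = as.numeric(names(pieces)), do.call(rbind, pieces))
rownames(runs) <- NULL
runs
```

The expected output in the question leaves out id 1, whose longest run has length 1. If that is intended, subsetting with runs[runs$count > 1, ] would drop such rows.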


randomly sampling of dataset to decrease the values in the dataset [closed]

I am currently trying to decrease the values in a column randomly according to a given sum.
For example, if the main data is like this;
ID Value
1 4
2 10
3 16
After running the code, the sum of Value should be 10, and this needs to be done randomly (the decrease for each member should be chosen randomly):
ID Value
1 1
2 8
3 1
I tried several commands and libraries but could not manage it. I am still a novice, and any help would be appreciated!
Thanks
Edit: Sorry I was not clear enough. I would like to assign a new value for each observation smaller than original (randomly). And at the end new sum of value will be equal to 10
Using the sample data
dd <- read.table(text="ID Value
1 4
2 10
3 16", header=TRUE)
and the dplyr and tidyr libraries, you can do
library(dplyr)
library(tidyr)
dd %>%
mutate(ID=factor(ID)) %>%
uncount(Value) %>%
sample_n(10) %>%
count(ID, name = "Value", .drop=FALSE)
Here we repeat each row once per unit of Value, then randomly sample 10 rows, then count them back up. We turn ID into a factor to make sure IDs with 0 observations are preserved.
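The same repeat, sample, and count steps can be sketched in base R, which may make the mechanics easier to follow. This is an illustrative equivalent rather than the answer's code; the seed value is arbitrary and only there to make the sketch reproducible.

```r
dd <- data.frame(ID = 1:3, Value = c(4, 10, 16))

set.seed(1)  # arbitrary seed, an assumption for reproducibility only

# One entry per unit of Value: ID 1 appears 4 times, ID 2 10 times, ID 3 16 times.
units <- rep(dd$ID, times = dd$Value)

# Draw 10 of the 30 units without replacement.
kept <- sample(units, 10)

# Count how many units each ID kept; IDs with no draws get 0.
new_value <- tabulate(match(kept, dd$ID), nbins = nrow(dd))

result <- data.frame(ID = dd$ID, Value = new_value)
result
```

Because the draw is without replacement, no new value can exceed the original one and the total is always 10. A value can still come out equal to the original or be 0, so an extra check would be needed if every value must be strictly smaller and positive.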

Use str_detect to find the ID of the novel Pride and Prejudice [closed]

Use str_detect to find the ID of the novel Pride and Prejudice.
How many different ID numbers are returned?
library(tidyverse)
library(gutenbergr)
library(tidytext)
options(digits = 3)
One option is to filter the rows based on the substring "Pride and Prejudice" in the 'title' column and get the number of distinct 'gutenberg_id' values with n_distinct. If the ids are already unique, just do summarise(n = n()).
library(dplyr) # provides %>%, filter() and summarise()
library(gutenbergr)
library(stringr)
gutenberg_metadata %>%
filter(str_detect(title, "Pride and Prejudice")) %>%
summarise(n = n_distinct(gutenberg_id))
# A tibble: 1 x 1
# n
# <int>
#1 6
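To see why counting distinct ids can differ from counting rows, here is a small base R sketch on made-up data. The data frame meta is a hypothetical stand-in for gutenberg_metadata (its ids and titles are invented), and grepl() with fixed = TRUE is used in place of str_detect so that no packages are needed.

```r
# Hypothetical stand-in for gutenberg_metadata; ids and titles are invented.
meta <- data.frame(
  gutenberg_id = c(10, 10, 20, 30),
  title = c("Pride and Prejudice",
            "Pride and Prejudice",
            "Pride and Prejudice, a play",
            "Emma"),
  stringsAsFactors = FALSE
)

# Keep rows whose title contains the literal substring.
hits <- meta[grepl("Pride and Prejudice", meta$title, fixed = TRUE), ]

n_rows <- nrow(hits)                        # matching rows
n_ids  <- length(unique(hits$gutenberg_id)) # distinct ids among them
c(rows = n_rows, ids = n_ids)
```

Here three rows match but only two distinct ids are returned, which is the case n_distinct is meant to handle.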

Moving words in a cell to individual columns [closed]

I have a csv file that has a column with multiple words in each cell. I wonder if there's any R function to move words in each cell to individual cells.
The following are data in two cells in the dataset:
arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes
passangerstories,chakarmanee,atranginikhil,maharashtra,india
Thanks. Any help appreciated.
Chamil
Let's assume this data.frame:
require(dplyr)
require(tidyr)
df<-data.frame(id=1:2, words=c("arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes","passangerstories,chakarmanee,atranginikhil,maharashtra,india"))
df
# id words
#1 1 arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes
#2 2 passangerstories,chakarmanee,atranginikhil,maharashtra,india
Then we can run this using dplyr and tidyr to break down the words cells into multiple columns:
df %>% separate_rows(words) %>%
group_by(id) %>%
mutate(wordid=row_number()) %>%
spread(wordid,words,sep=".")
# A tibble: 2 x 7
# Groups: id [2]
# id wordid.1 wordid.2 wordid.3 wordid.4 wordid.5 wordid.6
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 arecapalm betelnut konkan nature traveldiaries mirrorlessframes
#2 2 passangerstories chakarmanee atranginikhil maharashtra india NA
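A base R variant of the same reshaping, shown as a sketch for cases where tidyr is not available: split each cell on commas with strsplit(), pad the shorter rows with NA, and bind the pieces into columns. The column names word.1, word.2, ... are chosen here for illustration and differ from the wordid.N names above.

```r
df <- data.frame(
  id = 1:2,
  words = c("arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes",
            "passangerstories,chakarmanee,atranginikhil,maharashtra,india"),
  stringsAsFactors = FALSE
)

# Split each cell into a character vector of words.
parts <- strsplit(df$words, ",", fixed = TRUE)

# Pad shorter rows with NA so that every row has the same number of columns.
width <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))

# Bind the padded vectors row-wise and attach the id column.
wide <- data.frame(id = df$id, do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("word.", seq_len(width))
wide
```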

Sort data in a data frame and rank them [closed]

I have a data frame like this
Name Value
A. -5
B. 100
F. 0
G. -5
I want to sort the data in an ascending order and add a rank column. So I want something like this:
Name Value Rank
A. -5 1
G. -5 1
F. 0 2
B. 100 3
A base R solution could be:
v1 <- order(df$Value)
data.frame(df[v1, ], rank = as.numeric(factor(df$Value[v1])))
# Name Value rank
#1 A. -5 1
#4 G. -5 1
#3 F. 0 2
#2 B. 100 3
This sorts the data frame with order, then converts the sorted Value column to a factor and then to numeric, so that rows with the same Value get the same rank.
This can be achieved easily with the dplyr package.
#Recreate the data
df <- read.table(text = "Name Value
A. -5
B. 100
F. 0
G. -5", header = TRUE)
library(dplyr)
df %>% arrange(Value) %>% mutate(Rank = dense_rank(Value))
The dplyr pipeline reads: take the data frame df, then arrange it by Value, then add a new column Rank equal to the dense rank of Value.
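As a sketch of what "dense" means here, the base R code below computes a dense rank by matching each value against the sorted unique values, and contrasts it with rank(ties.method = "min"), which leaves gaps after ties. The MinRank column is added purely for comparison and is not part of either answer.

```r
df <- data.frame(Name = c("A.", "B.", "F.", "G."),
                 Value = c(-5, 100, 0, -5),
                 stringsAsFactors = FALSE)

# Sort ascending by Value; order() keeps tied rows in their original order.
sorted <- df[order(df$Value), ]

# Dense rank: position of each value among the sorted unique values (no gaps).
sorted$Rank <- match(sorted$Value, sort(unique(sorted$Value)))

# Minimum rank: tied values share the lowest rank, later ranks skip ahead.
sorted$MinRank <- rank(sorted$Value, ties.method = "min")

sorted
```

With the sample data, Rank comes out as 1, 1, 2, 3, which is what the question asks for, while MinRank is 1, 1, 3, 4.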

How to separate data based on different variable values [closed]

I have a dataset of around 1.5 lakh (150,000) observations and 2 variables: name and amount. The same name can appear many times; for example, the name ABC can appear 50 times in the dataset.
I want a new data frame with two variables: name and total amount, where each name has a unique value and total amount is the sum of all amounts in previous dataset. For example if ABC appears three times with amount == 1, 2 and 3 respectively in the previous dataset then in the new dataset, ABC will only appear one time with total amount == 6.
You can use data.table for big datasets:
library(data.table)
res<- setDT(df)[, list(Total_Amount=sum(amount)), by=name]
Or use dplyr
library(dplyr)
df %>%
group_by(name) %>%
summarise(Total_Amount=sum(amount))
Or, as suggested by @hrbrmstr,
count(df, name, wt=amount)
data
set.seed(24)
df <- data.frame(name=sample(LETTERS[1:5], 25, replace=TRUE),
amount=sample(150,25, replace=TRUE))
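For completeness, the same grouped sum can be sketched in base R with aggregate(), using the ABC example from the question. The data frame toy and the extra name XYZ are made up for illustration.

```r
# Toy data: ABC appears three times with amounts 1, 2 and 3, as in the question.
toy <- data.frame(name = c("ABC", "XYZ", "ABC", "ABC", "XYZ"),
                  amount = c(1, 10, 2, 3, 20),
                  stringsAsFactors = FALSE)

# Sum amount within each name; the result has one row per unique name.
res <- aggregate(amount ~ name, data = toy, FUN = sum)
names(res)[2] <- "Total_Amount"
res
```

ABC ends up on a single row with Total_Amount equal to 6. For a dataset of the size described in the question, the data.table version above is likely to be the faster choice.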
