Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I have a CSV file with a column that holds multiple comma-separated words in each cell. Is there an R function to move the words in each cell into individual cells?
Here is the data from two cells in the dataset:
arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes
passangerstories,chakarmanee,atranginikhil,maharashtra,india
Thanks. Any help appreciated.
Chamil
Let's assume this data.frame:
library(dplyr)
library(tidyr)
df <- data.frame(id = 1:2, words = c("arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes", "passangerstories,chakarmanee,atranginikhil,maharashtra,india"), stringsAsFactors = FALSE)
df
# id words
#1 1 arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes
#2 2 passangerstories,chakarmanee,atranginikhil,maharashtra,india
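Then we can use dplyr and tidyr to split the words in each cell into multiple columns:
df %>% separate_rows(words, sep = ",") %>%   # one row per word
group_by(id) %>%
mutate(wordid = row_number()) %>%            # position of each word within its id
spread(wordid, words, sep = ".")             # back to wide; spread() is superseded by pivot_wider() in newer tidyr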
# A tibble: 2 x 7
# Groups: id [2]
id wordid.1 wordid.2 wordid.3 wordid.4 wordid.5 wordid.6
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 arecapalm betelnut konkan nature traveldiaries mirrorlessframes
2 2 passangerstories chakarmanee atranginikhil maharashtra india NA
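If you'd rather avoid the tidyverse, the same reshaping can be sketched in base R with strsplit(). This is a minimal sketch, assuming the words are separated by plain commas; shorter rows are padded with NA so every row gets the same number of columns, and the word.1, word.2, ... column names are just an illustrative choice.

```r
# Same two example cells as above; stringsAsFactors = FALSE keeps words as character
df <- data.frame(
  id = 1:2,
  words = c("arecapalm,betelnut,konkan,nature,traveldiaries,mirrorlessframes",
            "passangerstories,chakarmanee,atranginikhil,maharashtra,india"),
  stringsAsFactors = FALSE
)

# Split every cell on commas: a list with one character vector per row
parts <- strsplit(df$words, ",", fixed = TRUE)
n_max <- max(lengths(parts))

# Pad each vector with NA up to n_max, then transpose so rows match df's rows
mat <- t(vapply(parts,
                function(p) c(p, rep(NA_character_, n_max - length(p))),
                character(n_max)))
colnames(mat) <- paste0("word.", seq_len(n_max))

wide <- data.frame(id = df$id, mat, stringsAsFactors = FALSE)
wide
```

Unlike the separate_rows() pipeline, this never builds the intermediate long table and the result is not grouped.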
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
Input:
library("UsingR")
library("dplyr")
data("kid.weights")
attach(kid.weights)
df <- data.frame(gender = gender[1:6],
weight = weight[1:6],
height = height[1:6],
Kg = weight[1:6] * 0.453592,
M = height[1:6] * 0.0254)
df
df %>%
group_by(df$gender) %>%
summarise(mean(df$weight))
Output:
> df %>%
+ group_by(df$gender) %>%
+ summarise(mean(df$weight))
# A tibble: 2 x 2
`df$gender` `mean(df$weight)`
<fct> <dbl>
1 F 58.3
2 M 58.3
I want to make a data frame of the mean (or median) weight in kg by gender, but it is not working: as shown above, both groups get the same value.
How can I solve it?
Once you pipe df with %>%, you don't need to refer to df anymore:
df %>%
group_by(gender) %>%
summarise(mean(weight))
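Inside dplyr verbs such as group_by() and summarise(), you can refer to the columns of the piped data frame directly, and summarise() evaluates them within each group. df$gender and df$weight, on the other hand, always give you the whole column, which is why both groups got the same overall mean.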
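For comparison, here is a base-R sketch of the same idea using tapply(), which also evaluates a function separately within each group. The small data frame is hypothetical (not the UsingR data); it reuses the question's pound-to-kg factor and shows both the mistake (referring to the whole column inside the per-group function) and the fix.

```r
# Hypothetical toy data: weight in pounds, as in kid.weights
df <- data.frame(gender = c("F", "M", "F", "M", "F", "M"),
                 weight = c(40, 60, 50, 80, 66, 70),
                 stringsAsFactors = FALSE)
df$Kg <- df$weight * 0.453592   # pounds to kilograms

# Mistake: the function ignores its group argument and uses the whole column,
# so every gender gets the same overall mean
wrong <- tapply(df$Kg, df$gender, function(x) mean(df$Kg))

# Fix: use the per-group subset x that tapply() passes in
mean_kg   <- tapply(df$Kg, df$gender, mean)
median_kg <- tapply(df$Kg, df$gender, median)

result <- data.frame(gender = names(mean_kg),
                     mean_kg = as.vector(mean_kg),
                     median_kg = as.vector(median_kg),
                     stringsAsFactors = FALSE)
result
```

With dplyr, the analogous call is df %>% group_by(gender) %>% summarise(mean_kg = mean(Kg), median_kg = median(Kg)).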
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I have a data frame as below:
df <- data.frame(
id= c(1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
name= c("john","bob","bob","bob","bob","bob","leo","bob","bob","max","mike","mike","mike","mike","mike","mike","mike","Ronaldo","mike")
)
For each id, I want to count how many times a value appears back to back (consecutively) in the name column.
The expected output is:
expected_output<-data.frame(
id=c(2,3),
column_name="name",
value=c("bob","mike"),
count=c(5,7))
Thanks for helping in advance
If you want the longest run of consecutive identical names for each id, you can first label the runs with data.table::rleid(), count the rows in each run, and keep only the largest count in each id.
library(dplyr)
df %>%
count(id, name, cons = data.table::rleid(name), name = 'count') %>%
group_by(id) %>%
slice(which.max(count)) %>%
select(-cons)
# id name count
# <dbl> <chr> <int>
#1 1 john 1
#2 2 bob 5
#3 3 mike 7
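data.table::rleid() builds on the same idea as base R's rle() (run-length encoding). As a dependency-free sketch, you can apply rle() to the names within each id and keep the longest run; which.max() keeps the first run if two runs tie for longest.

```r
df <- data.frame(
  id = c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  name = c("john", "bob", "bob", "bob", "bob", "bob", "leo", "bob", "bob", "max",
           "mike", "mike", "mike", "mike", "mike", "mike", "mike", "Ronaldo", "mike"),
  stringsAsFactors = FALSE   # rle() needs an atomic vector, not a factor
)

# Longest run of identical consecutive values in a character vector
longest_run <- function(x) {
  r <- rle(x)                      # run lengths and the value of each run
  i <- which.max(r$lengths)        # first run with the maximum length
  data.frame(name = r$values[i], count = r$lengths[i], stringsAsFactors = FALSE)
}

# split() keeps the original row order inside each id
runs <- lapply(split(df$name, df$id), longest_run)
result <- data.frame(id = as.numeric(names(runs)), do.call(rbind, runs),
                     row.names = NULL, stringsAsFactors = FALSE)
result
```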
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
Use str_detect to find the ID of the novel Pride and Prejudice.
How many different ID numbers are returned?
library(tidyverse)
library(gutenbergr)
library(tidytext)
options(digits = 3)
One option is to filter the rows whose 'title' column contains the substring "Pride and Prejudice" and count the distinct 'gutenberg_id' values with n_distinct(). If the ids are already unique, summarise(n = n()) is enough.
library(dplyr)
library(gutenbergr)
library(stringr)
gutenberg_metadata %>%
filter(str_detect(title, "Pride and Prejudice")) %>%
summarise(n = n_distinct(gutenberg_id))
# A tibble: 1 x 1
# n
# <int>
#1 6
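The difference between n_distinct(gutenberg_id) and n() only matters when a title matches several rows with the same id. Here is a base-R sketch with grepl() on a small, hypothetical metadata table (the ids and titles are made up, not real gutenbergr data) that shows both counts.

```r
# Hypothetical metadata: one id appears twice, one title is missing
meta <- data.frame(
  gutenberg_id = c(101, 101, 102, 103, 104),
  title = c("Pride and Prejudice", "Pride and Prejudice",
            "Pride and Prejudice", "Emma", NA),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches the literal substring; grepl() returns FALSE for NA titles
hits <- meta[grepl("Pride and Prejudice", meta$title, fixed = TRUE), ]

n_rows <- nrow(hits)                         # like summarise(n = n())
n_ids  <- length(unique(hits$gutenberg_id))  # like summarise(n = n_distinct(gutenberg_id))
```

Here n_rows is 3 but n_ids is 2, because id 101 is listed twice.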
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I have two large data frames with the same column names and the same row names in the same order. Is there an R function to add the two data frames together element-wise?
Element-wise addition is what + does with most objects:
> d <- data.frame(x=1:3, y=4:6)
> d
x y
1 1 4
2 2 5
3 3 6
> d2 <- data.frame(z=4:6, w=6:4)
> d + d2
x y
1 5 10
2 7 10
3 9 10
The column names come from the first data frame, and columns are matched by position, not by name, so the order of the columns in the two data frames matters. As yours are in the same order, you should be fine.
You'll get an error if the number of rows or columns differ.
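Because + matches columns by position rather than by name, it can be worth reordering the second data frame by the first one's names before adding. A minimal sketch, assuming both data frames have the same set of column names:

```r
a <- data.frame(x = 1:3, y = 4:6)
b <- data.frame(y = c(10, 20, 30), x = c(100, 200, 300))  # same names, different order

by_position <- a + b            # adds a$x to b$y and a$y to b$x
by_name     <- a + b[names(a)]  # reorder b's columns to match a, then add

# Data frames of different sizes cannot be added: this raises an error
size_error <- tryCatch({ a + a[1:2, ]; FALSE }, error = function(e) TRUE)
```

by_position$x is 11, 22, 33 (the wrong pairing), while by_name$x is 101, 202, 303.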
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
I am new to R programming and have learnt lots of R functions, but I can't understand how to mutate a data frame. I am taking the Coursera course Introduction to Probability and Data, and I recently came across an exercise I can't solve, which asks to mutate the data frame as follows:
Suppose you define a flight to be "on time" if it gets to the destination on time or earlier than expected, regardless of any departure delays. Mutate the data frame to create a new variable called arr_type with levels "on time" and "delayed" based on this definition. Then, determine the on-time arrival percentage based on whether the flight departed on time or not. What proportion of flights that were "delayed" departing arrive "on time"?
Please guide me and explain how to understand this exercise.
Here's how it works:
(df <- data.frame(group=gl(2,2), value=1:4))
# group value
# 1 1 1
# 2 1 2
# 3 2 3
# 4 2 4
library(dplyr)
df %>% group_by(group) %>% mutate(avg=mean(value))
# Source: local data frame [4 x 3]
# Groups: group [2]
#
# group value avg
# (fctr) (int) (dbl)
# 1 1 1 1.5
# 2 1 2 1.5
# 3 2 3 3.5
# 4 2 4 3.5
You can also group by several variables, e.g. group_by(plane, flight), so you should be able to get where you want from here.
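Applied to the exercise, the steps are: create arr_type from the arrival delay, then compute the share of on-time arrivals within each departure group. The sketch below uses base R's transform() and tapply() in place of mutate(), group_by() and summarise(), on hypothetical toy data. The column names arr_delay and dep_delay, and the rule that a departure is on time when dep_delay <= 0, are assumptions; the course may define the departure groups differently.

```r
# Hypothetical toy flights (delays in minutes; negative means early)
flights <- data.frame(dep_delay = c(-2, 10, 30, 0, 15, 5),
                      arr_delay = c(-5, -1, 25, 3, 0, -3))

# Exercise definition: "on time" = arrived on time or early, whatever the departure
flights <- transform(flights,
                     arr_type = ifelse(arr_delay <= 0, "on time", "delayed"),
                     # Assumed departure rule, for illustration only
                     dep_type = ifelse(dep_delay <= 0, "on time", "delayed"))

# Share of on-time arrivals within each departure group
prop_on_time <- tapply(flights$arr_type == "on time", flights$dep_type, mean)
prop_on_time["delayed"]   # the proportion the exercise asks about
```

In dplyr terms this is mutate(arr_type = ...) followed by group_by(dep_type) and summarise(mean(arr_type == "on time")).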