Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
Imagine having the following table called DT
ID Path Status
AA XXX Completed
AB XXX Completed
AC XXX In progress
AD XYY Completed
AE XYY In progress
I want to group this table by Path and count (1) the number of unique IDs and (2) the number of unique IDs with the status 'Completed' (there are no duplicate IDs in the original table DT).
I tried the following code:
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(DT$Status == "Completed"), Count=n())
This gives the following result:
Path CountComplete Count
XXX 3 3
XYY 3 2
CountComplete always gives the total number of unique IDs with the status 'Completed' across the whole table, not per Path. That makes sense, because the calculation refers to the original table and not to the grouped dataset.
How should I adapt the code in order for CountComplete to group according to Path?
Thanks in advance for the help.
The reason is that DT$ pulls the full column from the original dataset instead of the 'Status' values within each group
sum(DT$Status == "Completed")
^^^^
it should be
library(dplyr)
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(Status == "Completed"), Count=n())
DT_Grouped
# A tibble: 2 x 3
# Path CountComplete Count
# <chr> <int> <int>
#1 XXX 2 3
#2 XYY 1 2
If it is a data.table, the corresponding method would be
library(data.table)
setDT(DT)[, .(CountComplete = sum(Status == "Completed"), Count = .N), by = Path]
data
DT <- structure(list(ID = c("AA", "AB", "AC", "AD", "AE"), Path = c("XXX",
"XXX", "XXX", "XYY", "XYY"), Status = c("Completed", "Completed",
"In progress", "Completed", "In progress")),
class = "data.frame", row.names = c(NA,
-5L))
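If DT could ever contain repeated IDs, the same grouped summary can count distinct IDs instead of rows. This is only a sketch under that assumption: it rebuilds the example DT from above and swaps in dplyr's n_distinct() for n() and sum(); the output column names follow the answer.

```r
library(dplyr)

# Same example data as above, rebuilt so the block runs on its own
DT <- data.frame(
  ID     = c("AA", "AB", "AC", "AD", "AE"),
  Path   = c("XXX", "XXX", "XXX", "XYY", "XYY"),
  Status = c("Completed", "Completed", "In progress", "Completed", "In progress"),
  stringsAsFactors = FALSE
)

DT_Grouped <- DT %>%
  group_by(Path) %>%
  summarise(
    # bare column names are evaluated within each group, unlike DT$Status
    CountComplete = n_distinct(ID[Status == "Completed"]),
    Count         = n_distinct(ID)
  )

DT_Grouped
```

With no duplicate IDs, as in the question, this gives the same counts as the sum()/n() version.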
I am rather new to R, and I have been trying to write a code that will find and concatenate multiple choice question responses when the data is in long format. The data needs to be pivoted wide, but cannot without resolving the duplicate IDs that result from these multiple choice responses. I want to combine the extra multiple choice response to the distinct ID number, so that it would look like: "affiliation 1, affiliation 2" for the individual respondent, in long format. I would prefer to not use row numbers, as the data is recollected on a monthly basis and row numbers may not stay constant. I need to identify the duplicate ID due to the multiple choice question, and attach its secondary answer to the other response.
I have tried various versions of aggregate, grouping and summarizing, filter, unique, and distinct, but haven't been able to solve the problem.
Here is an example of the data:
ID Question Response
1 question 1 affiliation x
1 question 2 course 1
2 question 1 affiliation y
2 question 2 course 1
3 question 1 affiliation x
3 question 1 affiliation z
4 question 1 affiliation y
I want the data to look like this:
ID Question Response Text
1 question 1 affiliation x
1 question 2 course 1
2 question 1 affiliation y
2 question 2 course 1
3 question 1 affiliation x, affiliation z
4 question 1 affiliation y
so that it is prepared for pivot_wider.
Some example code that I've tried:
library(tidyverse)
course1 <- all_surveys %>%
filter(`Survey Title`=="course 1") %>%
aggregate("ID" ~ "Response Text", by(`User ID`, Question), FUN=sum) %>%
pivot_wider(id_cols = c("ID", `Response Date`),
names_from = "Question",
values_from = "Response Text") %>%
select([questions to be retained from Question])
I have also tried
group_by(question_new, `User ID`) %>%
summarize(text = str_c("Response Text", collapse = ", "))
as well as
aggregate(c[("Response Text" ~ "question_new")],
by = list(`User ID` = `User ID`, `Response Date` = `Response Date`),
function(x) unique(na.omit(x)))
and a bunch of different iterations of the above.
Thank you very much, in advance!
We can try to pivot_wider using values_fn = toString:
df %>% pivot_wider(names_from = Question,
values_from = response,
values_fn = toString)
small minimal example
df<-tibble(ID = c(1,1,2,2), Question = c('question 1', 'question 2', 'question 1', 'question 1'), response = c('affiliation x', 'course 1', 'affiliation x', 'affiliation y'))
# A tibble: 4 × 3
ID Question response
<dbl> <chr> <chr>
1 1 question 1 affiliation x
2 1 question 2 course 1
3 2 question 1 affiliation x
4 2 question 1 affiliation y
output
# A tibble: 2 × 3
ID `question 1` `question 2`
<dbl> <chr> <chr>
1 1 affiliation x course 1
2 2 affiliation x, affiliation y NA
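If the data should stay in long format first, as the question asks, the duplicate ID/Question pairs can also be collapsed before pivoting. The following is a minimal sketch on the same small example, assuming the response column is named response as above; df_long is a name introduced here for illustration.

```r
library(dplyr)

df <- tibble(
  ID       = c(1, 1, 2, 2),
  Question = c("question 1", "question 2", "question 1", "question 1"),
  response = c("affiliation x", "course 1", "affiliation x", "affiliation y")
)

df_long <- df %>%
  group_by(ID, Question) %>%
  # use the bare column name; a quoted "response" would be treated
  # as a literal string and not as the column
  summarise(response = toString(response), .groups = "drop")

df_long
```

The result has one row per ID and Question, so a later pivot_wider no longer needs values_fn.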
I am a newbie in programming with R, and this is my first question ever here on Stack Overflow.
Let's say that I have a data frame with 4 columns:
(1) Individual ID (numeric);
(2) Morality of the individual (factor);
(3) The city (factor);
(4) Numbers of books possessed (numeric).
Person_ID <- c(1,2,3,4,5,6,7,8,9,10)
Morality <- c("Bad guy","Bad guy","Bad guy","Bad guy","Bad guy",
"Good guy","Good guy","Good guy","Good guy","Good guy")
City <- c("NiceCity", "UglyCity", "NiceCity", "UglyCity", "NiceCity",
"UglyCity", "NiceCity", "UglyCity", "NiceCity", "UglyCity")
Books <- c(0,3,6,9,12,15,18,21,24,27)
mydf <- data.frame(Person_ID, City, Morality, Books)
I am using this code in order to get the counts by each category for the variable Morality in each city:
mycounts<-melt(mydf,
idvars = c("City"),
measure.vars = c("Morality"))%>%
dcast(City~variable+value,
value.var="value",fill=0,fun.aggregate=length)
The code gives this kind of table with the sums:
names(mycounts)<-gsub("Morality_","",names(mycounts))
mycounts
City Bad guy Good guy
1 NiceCity 3 2
2 UglyCity 2 3
I wonder if there is a similar way to use dcast() for numerical variables (inside the same script), e.g. in order to get the sum of the Books possessed by all individuals living in each city:
#> City Bad guy Good guy Books
#>1 NiceCity 3 2 [Total number of books in NiceCity]
#>2 UglyCity 2 3 [Total number of books in UglyCity]
Do you mean something like this:
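library(reshape2)
library(dplyr) # for %>%
mydf %>%
melt(
# the argument is id.vars; Books is kept as an id variable so dcast() can sum it
id.vars = c("City", "Books"),
measure.vars = c("Morality")
) %>%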
dcast(
City ~ variable + value,
value.var = "Books",
fill = 0,
fun.aggregate = sum
)
#> City Morality_Bad guy Morality_Good guy
#> 1 NiceCity 18 42
#> 2 UglyCity 12 63
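The table sketched in the question (the counts per Morality plus one Books total per city) can also be produced without melt()/dcast(). This is a sketch that swaps in a plain dplyr summary and is not the dcast() approach from the answer; it rebuilds mydf with the same values as in the question.

```r
library(dplyr)

# Same values as the question's mydf
mydf <- data.frame(
  Person_ID = 1:10,
  City      = rep(c("NiceCity", "UglyCity"), times = 5),
  Morality  = rep(c("Bad guy", "Good guy"), each = 5),
  Books     = seq(0, 27, by = 3)
)

mycounts <- mydf %>%
  group_by(City) %>%
  summarise(
    `Bad guy`  = sum(Morality == "Bad guy"),   # count per category
    `Good guy` = sum(Morality == "Good guy"),
    Books      = sum(Books)                    # total books per city
  )

mycounts
```

The category columns are written out by hand here, which is fine for two levels but would need a different approach for many levels.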
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
When I try to separate a column with (long) string values:
df <- tbl_df(c("Indian | Londen", "Greek | Amsterdam", "Hamburger and BBQ | Paris du Nord"))
df <- separate(df, col = value, into = c("var1","var2"), sep = " | ")
I get a warning message saying that there are too many values at three locations, and when I look at the altered data frame I don't get the desired result:
# A tibble: 3 × 2
var1 var2
* <chr> <chr>
1 Indian |
2 Greek |
3 Hamburger and
It seems to split at each space. Does anyone know a way to work around this? var2 should contain the city or area name. Thanks.
separate() interprets the sep argument as a regular expression when it is a character string. In regex, | is a special character meaning "or", so the pattern " | " reads as "a space or a space", which is the same as a single space. That is why your strings are split at every space. You need to escape the |:
df <- separate(df, col = value, into = c("var1","var2"), sep = " \\| ")
df
# A tibble: 3 × 2
# var1 var2
#* <chr> <chr>
#1 Indian Londen
#2 Greek Amsterdam
#3 Hamburger and BBQ Paris du Nord
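The difference between the unescaped and the escaped pattern can be checked in base R with strsplit(), which treats split as a regular expression by default. This is only an illustrative sketch on the question's three strings; the variable names are made up here.

```r
x <- c("Indian | Londen", "Greek | Amsterdam", "Hamburger and BBQ | Paris du Nord")

# Unescaped: " | " is the regex "space OR space", so every space is a split point
unescaped <- strsplit(x, " | ")

# Escaped: the pipe is matched literally, together with the spaces around it
escaped <- strsplit(x, " \\| ")

# fixed = TRUE switches off regex interpretation altogether
literal <- strsplit(x, " | ", fixed = TRUE)

unescaped[[1]]
escaped[[3]]
```

The escaped and the fixed variants give the same pieces, while the unescaped one leaves the pipe as a piece of its own.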
The pipe has a special meaning in regex (it means "OR"), so you have to escape it first. You can also put it inside a character class, [|], to get the same result. Note that both patterns below split on the pipe alone, so the spaces around it stay in the values:
df1 <- separate(df, col = value, into = c("var1","var2"), sep = "\\|")
OR
df1 <- separate(df, col = value, into = c("var1","var2"), sep = "[|]")
BASE R way:
dfx<- data.frame(do.call("rbind",strsplit(df$value,split="\\|")))
Output:
> dfx
X1 X2
1 Indian Londen
2 Greek Amsterdam
3 Hamburger and BBQ Paris du Nord
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I'm looking for a suggestion: I'm trying to re-order/group a data frame by a variable value.
For example transforming a native data frame VARS
into something like this:
So far, I've tried for-loops with cbind/rbind depending on how the data is organized, aggregate, apply, etc. But there's always some wrinkle that prevents the methods from working.
I appreciate any help!
First, I'd like to point out that reading up on how to give a useful example, and including the raw data via dput, will go a long way toward getting feedback. That said:
For the dataset you showed:
A <- structure(list(Var_Typer = c("cnt", "Cont", "cnt", "cnt", "fact",
"fact", "Char", "Char", "Cont"), R_FIELD = c("Gender", "Age",
"Activation", "WakeUpStroke", "ArMode", "PreHospActiv", "EMTag",
"EMTdx", "EMTlams")), .Names = c("Var_Typer", "R_FIELD"), row.names = c(NA,
-9L), class = "data.frame")
> head(A)
Var_Typer R_FIELD
1 cnt Gender
2 Cont Age
3 cnt Activation
4 cnt WakeUpStroke
5 fact ArMode
6 fact PreHospActiv
library(reshape2) # dcast
library(dplyr) # %>%
library(jsonlite) # rbind.pages
B <- apply(
dcast(A, Var_Typer ~ R_FIELD, value.var = 'R_FIELD'), 1, function(i){
ndf <- as.data.frame(rbind(i[complete.cases(i)]))
colnames(ndf) <- c('Class',1:(length(ndf)-1))
ndf
}) %>% rbind.pages %>% (function(x){
x[is.na(x)] <- "..."
x
})
Class 1 2 3
1 Char EMTag EMTdx ...
2 cnt Activation Gender WakeUpStroke
3 Cont Age EMTlams ...
4 fact ArMode PreHospActiv ...
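A similar grouping can be sketched in base R with split(), which returns one vector of field names per variable type. This is an alternative to the dcast() approach above, not the same method; groups, width and padded are names introduced here, and the fields keep their original row order within each group.

```r
A <- data.frame(
  Var_Typer = c("cnt", "Cont", "cnt", "cnt", "fact", "fact", "Char", "Char", "Cont"),
  R_FIELD   = c("Gender", "Age", "Activation", "WakeUpStroke", "ArMode",
                "PreHospActiv", "EMTag", "EMTdx", "EMTlams"),
  stringsAsFactors = FALSE
)

# One character vector of field names per variable type
groups <- split(A$R_FIELD, A$Var_Typer)

# Pad every group with "..." up to the longest group so they form one matrix
width  <- max(lengths(groups))
padded <- t(sapply(groups, function(v) c(v, rep("...", width - length(v)))))

padded
```

If a ragged list is acceptable, groups alone is already enough and the padding step can be dropped.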
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
I pulled in a large .csv file with columns such as "paid" and "description".
I am trying to figure out how to only pull the "paid" column when the "description" is Bronchitis or some other illness that is in the column.
This would be like doing a pivot table in Excel and filtering only on a certain Description and receiving all of the individual paid rows.
Paid Description val
$500 Bronchitis 1.5
$3,250 'Complication of Pregnancy/Childbirth' 2.2
$5,400 Burns 3.3
$20.50 Bronchitis 4.4
$24 Ashtma 1.2
If your data is
paid <- c(300,200,150)
desc <- c("bronchitis","headache","broken.leg")
df <- data.frame(paid, desc)
Try
df[df$desc == "bronchitis", "paid"]
# the argument ahead of the comma filters the rows,
# the argument after the comma selects the column
# > df[df$desc == "bronchitis", "paid"]
# [1] 300
or
library(dplyr)
df %>% filter(desc=="bronchitis") %>% select(paid)
# filter refers to the row condition
# select filters the output column(s)
# > df %>% filter(desc=="bronchitis") %>% select(paid)
# paid
# 1 300
Using data.table
library(data.table)#v1.9.5+
setkey(setDT(df1), Description)[.('Bronchitis'),'Paid', with=FALSE]
# Paid
#1: $500
#2: $20.50
data
df1 <- data.frame(
Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
"Burns", "Bronchitis", "Ashtma"),
val = c(1.5, 2.2, 3.3, 4.4, 1.2), stringsAsFactors = FALSE)
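Because the Paid values in the question are text such as "$3,250", they usually need converting before they can be summed or averaged. Below is a minimal base R sketch, assuming the column names from the question's table; claims and PaidNum are hypothetical names introduced for this example.

```r
claims <- data.frame(
  Paid        = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  stringsAsFactors = FALSE
)

# Strip "$" and thousands separators so the amounts become numeric
claims$PaidNum <- as.numeric(gsub("[$,]", "", claims$Paid))

# All paid amounts for one description, and their total
bronchitis_paid <- claims$PaidNum[claims$Description == "Bronchitis"]
total <- sum(bronchitis_paid)
```

The match on Description is exact and case-sensitive, so misspelled entries in the data would need cleaning first.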