R - .csv file - extract variables [closed] - r

I read in a large .csv file with columns such as "paid" and "description".
I am trying to figure out how to pull only the "paid" values for rows where "description" is Bronchitis, or some other illness that appears in that column.
This would be like building a pivot table in Excel, filtering on a certain Description, and getting back all of the individual paid rows.
Paid Description val
$500 Bronchitis 1.5
$3,250 'Complication of Pregnancy/Childbirth' 2.2
$5,400 Burns 3.3
$20.50 Bronchitis 4.4
$24 Ashtma 1.2

If your data is
paid <- c(300,200,150)
desc <- c("bronchitis","headache","broken.leg")
df <- data.frame(paid, desc)
Try
df[df$desc == "bronchitis", "paid"]
# the argument before the comma filters the rows,
# the argument after the comma selects the column
# > df[df$desc == "bronchitis", "paid"]
# [1] 300
or
library(dplyr)
df %>% filter(desc=="bronchitis") %>% select(paid)
# filter keeps the rows that match the condition
# select chooses the output column(s)
# > df %>% filter(desc=="bronchitis") %>% select(paid)
# paid
# 1 300
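
The question's Paid values are currency strings such as "$3,250", and it asks about more than one illness. A minimal base R sketch of that case follows, assuming the CSV columns are named Paid and Description as in the question's sample. The object names claims, PaidNum and wanted are made up for illustration.

```r
# Sample rows copied from the question; in practice this would come from read.csv()
claims <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  stringsAsFactors = FALSE
)

# Strip "$" and "," so the amounts can be treated as numbers
claims$PaidNum <- as.numeric(gsub("[$,]", "", claims$Paid))

# Keep only the rows whose Description is one of the illnesses of interest
wanted <- c("Bronchitis", "Burns")
subset_paid <- claims[claims$Description %in% wanted, c("Description", "PaidNum")]

# Optional: total paid per illness, similar to a pivot table sum
total_by_desc <- tapply(subset_paid$PaidNum, subset_paid$Description, sum)
```

Using %in% rather than == lets the same line handle one illness or several.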

Using data.table
library(data.table) # v1.9.5+
setkey(setDT(df1), Description)[.('Bronchitis'),'Paid', with=FALSE]
# Paid
#1: $500
#2: $20.50
data
df1 <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  val = c(1.5, 2.2, 3.3, 4.4, 1.2),
  stringsAsFactors = FALSE)

Related

Group and summarize data with countif calculation [closed]

Imagine having the following table called DT
ID Path Status
AA XXX Completed
AB XXX Completed
AC XXX In progress
AD XYY Completed
AE XYY In progress
I want to group this table by Path and count (1) the number of unique IDs and (2) the number of unique IDs with the status 'Completed' (there are no duplicate IDs in the original table DT).
I tried the following code:
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(DT$Status == "Completed"), Count=n())
This gives the following result:
Path CountComplete Count
XXX 3 3
XYY 3 2
CountComplete always gives the total number of unique IDs with the status 'Completed', not grouped by Path. That is logical, as the calculation refers to the original table and not the grouped dataset.
How should I adapt the code in order for CountComplete to group according to Path?
Thanks in advance for the help.
The reason is that DT$ pulls the full column from the original dataset instead of the 'Status' values within each group:
sum(DT$Status == "Completed")
^^^^
it should be
library(dplyr)
DT_Grouped <- DT %>%
group_by(Path) %>%
summarise(CountComplete = sum(Status == "Completed"), Count=n())
DT_Grouped
# A tibble: 2 x 3
# Path CountComplete Count
# <chr> <int> <int>
#1 XXX 2 3
#2 XYY 1 2
If it is a data.table, the corresponding method would be
library(data.table)
setDT(DT)[, .(CountComplete = sum(Status == "Completed"), Count = .N), by = Path]
data
DT <- structure(list(ID = c("AA", "AB", "AC", "AD", "AE"), Path = c("XXX",
"XXX", "XXX", "XYY", "XYY"), Status = c("Completed", "Completed",
"In progress", "Completed", "In progress")),
class = "data.frame", row.names = c(NA,
-5L))
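
For comparison, the same counts can be sketched in base R with tapply and table. Here DT$Status is fine, because tapply itself does the grouping from two full-length vectors. The names count_complete, count_all and res are illustrative only.

```r
DT <- data.frame(
  ID = c("AA", "AB", "AC", "AD", "AE"),
  Path = c("XXX", "XXX", "XXX", "XYY", "XYY"),
  Status = c("Completed", "Completed", "In progress", "Completed", "In progress"),
  stringsAsFactors = FALSE
)

# Sum the logical vector within each Path: TRUE counts as 1
count_complete <- tapply(DT$Status == "Completed", DT$Path, sum)

# Number of rows per Path
count_all <- table(DT$Path)

# Combine into one data frame, aligning both results by Path name
res <- data.frame(
  Path = names(count_complete),
  CountComplete = as.vector(count_complete),
  Count = as.vector(count_all[names(count_complete)])
)
```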

Regrouping categorical values in a dataframe [closed]

I'm looking for a suggestion: I'm trying to re-order/group a data frame by a variable value.
For example transforming a native data frame VARS
into something like this:
So far, I've tried for-loops with cbind/rbind depending on how the data is organized, aggregate, apply, etc. But there's always some wrinkle that prevents the methods from working.
I appreciate any help!
First, I'd like to point out that reading up on how to give a useful example, and providing the raw data using dput, will go a long way toward getting feedback. That said:
For the dataset you showed:
A <- structure(list(Var_Typer = c("cnt", "Cont", "cnt", "cnt", "fact",
"fact", "Char", "Char", "Cont"), R_FIELD = c("Gender", "Age",
"Activation", "WakeUpStroke", "ArMode", "PreHospActiv", "EMTag",
"EMTdx", "EMTlams")), .Names = c("Var_Typer", "R_FIELD"), row.names = c(NA,
-9L), class = "data.frame")
> head(A)
Var_Typer R_FIELD
1 cnt Gender
2 Cont Age
3 cnt Activation
4 cnt WakeUpStroke
5 fact ArMode
6 fact PreHospActiv
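library(reshape2)  # dcast
library(jsonlite)  # rbind_pages (successor to the deprecated rbind.pages)
library(magrittr)  # %>%
B <- apply(
  dcast(A, Var_Typer ~ R_FIELD, value.var = 'R_FIELD'), 1, function(i){
    # keep only the non-missing entries of this row
    ndf <- as.data.frame(rbind(i[complete.cases(i)]))
    colnames(ndf) <- c('Class', 1:(length(ndf)-1))
    ndf
  }) %>% rbind_pages %>% (function(x){
    x[is.na(x)] <- "..."
    x
  })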
Class 1 2 3
1 Char EMTag EMTdx ...
2 cnt Activation Gender WakeUpStroke
3 Cont Age EMTlams ...
4 fact ArMode PreHospActiv ...
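
A dependency-free variant of the same regrouping can be sketched in base R with split, padding the shorter groups with "...". This is an alternative to the dcast approach above, not the answerer's method. Fields keep their original order within each class, and the order of the classes depends on the locale's sort order. The names groups, width and padded are illustrative.

```r
A <- data.frame(
  Var_Typer = c("cnt", "Cont", "cnt", "cnt", "fact", "fact", "Char", "Char", "Cont"),
  R_FIELD = c("Gender", "Age", "Activation", "WakeUpStroke", "ArMode",
              "PreHospActiv", "EMTag", "EMTdx", "EMTlams"),
  stringsAsFactors = FALSE
)

# One character vector of field names per class
groups <- split(A$R_FIELD, A$Var_Typer)

# Pad every group to the length of the longest one
width <- max(lengths(groups))
padded <- lapply(groups, function(v) c(v, rep("...", width - length(v))))

# Bind the padded vectors as rows and add the class label as the first column
B <- data.frame(Class = names(padded), do.call(rbind, padded),
                stringsAsFactors = FALSE)
names(B)[-1] <- seq_len(width)
rownames(B) <- NULL
```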

I want to merge data in a single column according to the unique bill no in R [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
The data is like this
1. CM-00063262-15 EARRINGS
2. CM-00063262-15 EARRINGS
3. CM-00063262-15 NECKLACE
4. CM-00063262-15 WALLET-WOMEN'S
5. CM-00063263-15 SLACKS
6. CM-00063264-15 BATH TUB
7. CM-00063264-15 GIFT SET
I want output like this
1. CM-00063262-15 EARRINGS,EARRINGS,NECKLACE,WALLET-WOMEN'S
2. CM-00063263-15 SLACKS
3. CM-00063264-15 BATH TUB,GIFT SET
Thank you in advance
Use this:
aggregate(V2 ~ V1, data = df, FUN = paste, collapse = ",")
We need to extract the bill number and use it as the grouping variable:
library(data.table)
setDT(df1)[, toString(unique(sub("\\S+\\s+", "", Col))),
           by = .(grp = sub("\\s+.*", "", Col))]
# grp V1
#1: CM-00063262-15 EARRINGS, NECKLACE, WALLET-WOMEN'S
#2: CM-00063263-15 SLACKS
#3: CM-00063264-15 BATH TUB, GIFT SET
If the OP's dataset has two columns instead of one, it is much easier:
setDT(df1)[, toString(unique(Col2)), by = Col1]
data
df1 <- structure(list(Col = c("CM-00063262-15 EARRINGS",
"CM-00063262-15 EARRINGS",
"CM-00063262-15 NECKLACE", "CM-00063262-15 WALLET-WOMEN'S", "CM-00063263-15 SLACKS",
"CM-00063264-15 BATH TUB", "CM-00063264-15 GIFT SET")),
.Names = "Col", class = "data.frame", row.names = c(NA, -7L))
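
The desired output in the question keeps the repeated EARRINGS entry, whereas unique() drops it. A base R sketch that keeps duplicates follows, assuming each line is a bill number, then whitespace, then the item. The names bills, parts and merged are illustrative.

```r
bills <- data.frame(
  Col = c("CM-00063262-15 EARRINGS", "CM-00063262-15 EARRINGS",
          "CM-00063262-15 NECKLACE", "CM-00063262-15 WALLET-WOMEN'S",
          "CM-00063263-15 SLACKS", "CM-00063264-15 BATH TUB",
          "CM-00063264-15 GIFT SET"),
  stringsAsFactors = FALSE
)

# Split each line at the first run of whitespace
parts <- data.frame(
  bill_no = sub("\\s+.*", "", bills$Col),   # everything before the first space
  item = sub("^\\S+\\s+", "", bills$Col),   # everything after it
  stringsAsFactors = FALSE
)

# Paste all items of a bill into one comma-separated string, duplicates included
merged <- aggregate(item ~ bill_no, data = parts, FUN = paste, collapse = ",")
```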

R reshape wide to long data [closed]

I have a dataframe like so:
[1] "drugevent" "prr" "prr_lowerCI" "prr_upperCI" "EBGM"
[6] "EBG_lowerCI" "EBGM_upperCI" "strata.coded" "strata" "Reference"
And I want to make a plot for each drugevent, using ggplot. In order to do so I need to format my DF like so:
[1] "drug", "event", "measurement" (prr or EBGM), "lowerCI" (for the corresponding measurement), "upperCI", "strata"
But despite the many posts on SO and the R tutorials, I was not able to correctly reshape the data. In my last try I added an ID like so:
mutate(DF, count=1:n())
melted the data
melt(DF, id.vars="count")
then I made several DFs subsetting the values of interest
subset(melted, variable %in% c("prr","EBGM"))
then the upper and lower confidence intervals, strata and drug event,
but when I merged them like so:
merge(measurement, lowerCI, by="count")
in the end I had duplicated values with 4 rows for each count.
The code is messy and the result is wrong. Could you please help me with this?
Edit: examples.
initial data:
drugevent prr prr_lowerCI prr_upperCI
1 CLARITHROMYCIN-Erythema Multiforme 1.3539930 0.1903270 2.517659
2 CLARITHROMYCIN-Erythema Multiforme 1.7741342 0.6647390 2.883529
EBGM EBG_lowerCI EBGM_upperCI strata count
1 0.9003325 0.2128934 2.772558 Infants 1
2 1.4471096 0.5997188 3.053965 Children 2
the desired result:
measurement value upperCI strata drug
1 prr 1.353992979 2.51765895 Infants CLARITHROMYCIN
2 EBGM 0.9009 2.77 Infants CLARITHROMYCIN
reaction lowerCI
1 Erythema Multiforme 2.51765895
2 Erythema Multiforme 1.447
From what I understand, you want a long format of the original data frame, split by measurement (prr or EBGM):
dfPRR <- cbind(df[, !grepl("EBG", colnames(df))], measurement="prr")
colnames(dfPRR)[2:4] <- c("value", "lowerCI", "upperCI")
dfEBGM <- cbind(df[, !grepl("prr", colnames(df))], measurement="EBGM")
colnames(dfEBGM)[2:4] <- c("value", "lowerCI", "upperCI")
rbind(dfPRR, dfEBGM)
Data used
structure(list(drugevent = structure(c(1L, 1L), .Label = "CLARITHROMYCIN-Erythema Multiforme", class = "factor"),
prr = c(1.353993, 1.7741342), prr_lowerCI = c(0.190327, 0.664739
), prr_upperCI = c(2.517659, 2.883529), EBGM = c(0.9003325,
1.4471096), EBG_lowerCI = c(0.2128934, 0.5997188), EBGM_upperCI = c(2.772558,
3.053965), strata = structure(1:2, .Label = c(" Infants",
" Children"), class = "factor"), count = 1:2), .Names = c("drugevent",
"prr", "prr_lowerCI", "prr_upperCI", "EBGM", "EBG_lowerCI", "EBGM_upperCI",
"strata", "count"), class = "data.frame", row.names = c(NA, -2L
))
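
The desired result also separates drugevent into drug and reaction, which the rbind approach above leaves combined. A sketch of that extra step in base R follows, assuming the drug name itself contains no hyphen so that the first "-" is the separator. The helper to_long and its arguments are hypothetical names, not part of any package.

```r
df <- data.frame(
  drugevent = rep("CLARITHROMYCIN-Erythema Multiforme", 2),
  prr = c(1.353993, 1.7741342),
  prr_lowerCI = c(0.190327, 0.664739),
  prr_upperCI = c(2.517659, 2.883529),
  EBGM = c(0.9003325, 1.4471096),
  EBG_lowerCI = c(0.2128934, 0.5997188),
  EBGM_upperCI = c(2.772558, 3.053965),
  strata = c("Infants", "Children"),
  stringsAsFactors = FALSE
)

# Build the long-format rows for one measurement (hypothetical helper)
to_long <- function(d, value, lower, upper, label) {
  data.frame(
    drug = sub("-.*", "", d$drugevent),         # text before the first hyphen
    reaction = sub("^[^-]*-", "", d$drugevent), # text after the first hyphen
    strata = d$strata,
    measurement = label,
    value = d[[value]],
    lowerCI = d[[lower]],
    upperCI = d[[upper]],
    stringsAsFactors = FALSE
  )
}

long <- rbind(
  to_long(df, "prr", "prr_lowerCI", "prr_upperCI", "prr"),
  to_long(df, "EBGM", "EBG_lowerCI", "EBGM_upperCI", "EBGM")
)
```

Each row of long then carries one measurement with its own interval, which is the shape ggplot expects for plotting by measurement and strata.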

How to Plot heat map for the given dataset using R [closed]

How can I plot a heatmap for the following data, with ID on the Y axis and the corresponding names on the X axis?
ID Name1 Name2 Name3 Name4 Name5 Name6
Gp2 2,86148 7,86926 5,00778 3,6586 5,66554 2,00694
Cldn10 3,30779 8,03876 4,73097 4,4237 7,96975 3,54605
Cldn10 4,261 8,7293 4,4683 4,3483 9,03017 4,68187
read.table allows you to specify that the data contains a header and row names:
x = read.table('filename', header = TRUE, row.names = 1)
heatmap(as.matrix(x))
This assumes that the data does not contain ',' as a thousands separator. If you want to use , as decimal point, just specify the appropriate option:
x = read.table('filename', header = TRUE, row.names = 1, dec = ',')
If your input data contains commas as thousands separators, you need to remove them first:
raw_data = readLines('filename')
raw_data = gsub('(\\d+),(\\d+)', '\\1\\2', raw_data)
x = read.table(text = raw_data, header = TRUE, row.names = 1)
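
One caveat with the sample in the question: the ID Cldn10 appears twice, and read.table rejects duplicate row names when row.names = 1 is used. A sketch that works around this follows, assuming the commas are decimal marks. It reads the ID as an ordinary column and makes the labels unique afterwards. The names raw, x and m are illustrative.

```r
# Sample rows copied from the question, embedded as text instead of a file
raw <- "ID Name1 Name2 Name3 Name4 Name5 Name6
Gp2 2,86148 7,86926 5,00778 3,6586 5,66554 2,00694
Cldn10 3,30779 8,03876 4,73097 4,4237 7,96975 3,54605
Cldn10 4,261 8,7293 4,4683 4,3483 9,03017 4,68187"

# dec = ',' treats the comma as the decimal mark; ID stays a regular column
x <- read.table(text = raw, header = TRUE, dec = ",", stringsAsFactors = FALSE)

# Numeric matrix of the Name columns, with de-duplicated IDs as row names
m <- as.matrix(x[, -1])
rownames(m) <- make.unique(as.character(x$ID))

# Rows (IDs) go on the Y axis, columns (names) on the X axis
heatmap(m)
```

make.unique appends a numeric suffix to repeated labels, so both Cldn10 rows stay visible in the plot.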
