Multiple entries with same patient id: how to make it one entry - r

I have multiple entries in my data for the same patient ID and want to combine them into one entry. What are my options? Here is the data:
PtID WorryHighBGNow
40 5
40 1
40 2
70 3
101 4
263 2
263 5
263 3
143 4
245 4
137 3
219 2
219 3
219 4
3 3
264 3
264 3
98 1
200 3
105 3
111 4
149 3
I want to create a visualization like the one below from this data, where the y axis shows the columns of my table and the x axis shows the rankings 1, 2, 3, 4, 5.

If x is your data frame, you can try this:
library(data.table)
d <- setDT(x)[, list(WorryHighBGNow = paste(WorryHighBGNow, collapse = ', ')),by = c('PtID')]
It will give a result like
PtID WorryHighBGNow
40 5, 1, 2
70 3
101 4
263 2, 5, 3
And so on.
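If data.table is not available, roughly the same collapse can be sketched in base R with aggregate(). The data frame x below is a small hypothetical subset of the question's data, and the ', ' separator is only one possible choice.

```r
# Hypothetical subset of the question's data
x <- data.frame(
  PtID = c(40L, 40L, 40L, 70L, 263L, 263L, 263L),
  WorryHighBGNow = c(5L, 1L, 2L, 3L, 2L, 5L, 3L)
)

# One row per PtID; the scores are pasted together in their original order
d <- aggregate(
  WorryHighBGNow ~ PtID,
  data = x,
  FUN = function(v) paste(v, collapse = ", ")
)
print(d)
```

Note that aggregate() returns the groups sorted by PtID rather than in order of first appearance.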

I'm not really sure this is what you need. I've just tried to mimic the visualization you linked in the question as closely as possible.
library(tidyverse)
dat %>%
mutate_all(factor) %>%
count(WorryHighBGNow) %>%
mutate(percentage = round(n / sum(n) * 100, 1)) %>%
mutate(WorryHighBGNow = reorder(WorryHighBGNow, n)) %>%
ggplot(aes(x = WorryHighBGNow, y = percentage,
fill = WorryHighBGNow, label = paste(percentage, '%'))) +
geom_col() +
geom_text(hjust = -.1, fontface = 'bold') +
scale_fill_brewer(type = 'qual', breaks = 1:5) +
coord_flip() +
expand_limits(y = 50) +
theme_void() +
theme(legend.position = 'bottom')
Data:
dat <- structure(
list(
PtID = c(40L, 40L, 40L, 70L, 101L, 263L, 263L, 263L, 143L, 245L, 137L, 219L,
219L, 219L, 3L, 264L, 264L, 98L, 200L, 105L, 111L, 149L),
WorryHighBGNow = c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L, 2L, 3L, 4L,
3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)
),
class = "data.frame", row.names = c(NA, -22L)
)
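The count-and-percentage step of the pipeline above can also be checked in base R. This is only a sketch of that one step (no plotting); the vector worry is copied from the WorryHighBGNow column of the data above, and the name itself is made up for illustration.

```r
# Values of WorryHighBGNow, copied from the data above
worry <- c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L, 2L, 3L, 4L,
           3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)

# Count how often each ranking occurs (every row counts, as in count())
counts <- table(factor(worry, levels = 1:5))

# Convert the counts to percentages rounded to one decimal place
percentage <- round(prop.table(counts) * 100, 1)

print(counts)
print(percentage)
```

Like the tidyverse version, this counts every row, so a patient with three entries contributes three times.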

Related

How do I put columns on the x-axis?

Please help. I am trying to put all the columns on the x-axis and then make side-by-side bars grouped by date.
This is my data. I really tried, but to no avail.
dateVisited hh_visited hh_ind_confirmed new_in_mig out_mig deaths HOH_death Preg_Obs Preg_Outcome child_forms
102 2020-07-21 292 1170 131 86 18 7 3 14 79
103 2020-07-22 400 1553 115 100 25 10 11 18 107
104 2020-07-23 381 1458 103 67 21 9 5 23 87
105 2020-07-24 345 1379 90 98 12 4 3 20 89
106 2020-07-25 436 1585 131 119 13 2 7 20 117
107 2020-07-26 0 0 0 0 0 0 0 0 0
I think you're looking for something like this:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(cols = -1) %>%
ggplot(aes(name, value)) +
geom_col(aes(fill = dateVisited), width = 0.6,
position = position_dodge(width = 0.8)) +
guides(x = guide_axis(angle = 45))
Reproducible Data from question
df <- structure(list(dateVisited = structure(1:6, .Label = c("2020-07-21",
"2020-07-22", "2020-07-23", "2020-07-24", "2020-07-25", "2020-07-26"
), class = "factor"), hh_visited = c(292L, 400L, 381L, 345L,
436L, 0L), hh_ind_confirmed = c(1170L, 1553L, 1458L, 1379L, 1585L,
0L), new_in_mig = c(131L, 115L, 103L, 90L, 131L, 0L), out_mig = c(86L,
100L, 67L, 98L, 119L, 0L), deaths = c(18L, 25L, 21L, 12L, 13L,
0L), HOH_death = c(7L, 10L, 9L, 4L, 2L, 0L), Preg_Obs = c(3L,
11L, 5L, 3L, 7L, 0L), Preg_Outcome = c(14L, 18L, 23L, 20L, 20L,
0L), child_forms = c(79L, 107L, 87L, 89L, 117L, 0L)), class = "data.frame",
row.names = c("102", "103", "104", "105", "106", "107"))
Your data cannot be used easily, since it takes time to format it into something that can be ingested by R. Here is something to get you started. I made up a hypothetical data frame of 4 columns that resembles your data, used the function melt from the reshape2 package to put the data into a form that ggplot2 understands, and used ggplot2 to generate a bar plot.
library(ggplot2)
df <- data.frame(dateVisited = seq(as.Date('2019-01-01'), as.Date('2019-12-31'), 30),
hh_visited = runif(13, 0, 436),
hh_ind_confirmed = runif(13, 0, 1585),
new_in_mig = runif(13, 0, 131))
df <- reshape2::melt(df, id.vars = 'dateVisited')
ggplot(data = df, aes(x = dateVisited, y = value, fill = variable))+
geom_col(position = 'dodge')
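Both answers rely on the same idea: reshape the wide table into long format, with one row per date and column, before plotting. As a rough illustration of what that long shape looks like, here is a base R sketch using stack(). The two-row data frame is a hypothetical cut of the question's data, and the column names values and ind come from stack() (melt would call them value and variable, pivot_longer value and name).

```r
# Hypothetical two-row, two-measure cut of the question's data
df <- data.frame(
  dateVisited = c("2020-07-21", "2020-07-22"),
  hh_visited = c(292L, 400L),
  new_in_mig = c(131L, 115L)
)

# Every column except the date is treated as a measure
value_cols <- setdiff(names(df), "dateVisited")

# stack() puts all measures into one 'values' column and records the
# source column in 'ind'; the dates are repeated once per measure
long <- data.frame(
  dateVisited = rep(df$dateVisited, times = length(value_cols)),
  stack(df[value_cols])
)
print(long)
```

Each row of long now corresponds to one bar, which is the shape that geom_col() expects.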

How to aggregate a data frame by columns and rows?

I have the following data set:
Class Total AC Final_Coverage
A 1000 1 55
A 1000 2 66
B 1000 1 77
A 1000 3 88
B 1000 2 99
C 1000 1 11
B 1000 3 12
B 1000 4 13
B 1000 5 22
C 1000 2 33
C 1000 3 44
C 1000 4 55
C 1000 5 102
A 1000 4 105
A 1000 5 109
I would like to get the average of the AC and the Final_Coverage for the first three rows of each class. Then, I want to store the average values along with the class name in a new dataframe. To do that, I did the following:
library(readr)
dataset <- read_csv("/home/ad/Desktop/testt.csv")
classes <- unique(dataset$Class)
new_data <- data.frame(Class = character(0), AC = numeric(0), Coverage = numeric(0))
for(class in classes){
new_data$Class <- class
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage <- coverageMean
new_data$AC <- acMean
}
Everything works fine except entering the average value into the new_data frame. I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "Class", value = "A") :
replacement has 1 row, data has 0
Do you know how to solve this?
This should get you the new data frame using dplyr:
library(dplyr)
dataset %>% group_by(Class) %>% slice(1:3) %>% summarise(AC= mean(AC),
Coverage= mean(Final_Coverage))
In your method, the error arises because you initialized the new data frame with 0 rows and then tried to assign a single value to one of its columns. That is what the error message says: the replacement has 1 row, but the data has 0. This would work, though:
new_data <- data.frame(Class = classes, AC = NA, Coverage = NA)
for(class in classes){
# Class is already filled in above, so it must not be overwritten here
dataClass <- subset(dataset, Class == class)
tenRows <- dataClass[1:3,]
coverageMean <- mean(tenRows$Final_Coverage)
acMean <- mean(tenRows$AC)
new_data$Coverage[classes == class] <- coverageMean
new_data$AC[classes == class] <- acMean
}
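Another way to avoid assigning into a zero-row data frame is to build one single-row data frame per class and bind them together at the end. The following is only a sketch: dataset here is a hypothetical subset of the question's data (the first three rows of each class), and firstThree is a made-up helper name.

```r
# Hypothetical subset of the question's data
dataset <- data.frame(
  Class = c("A", "A", "B", "A", "B", "C", "B", "C", "C"),
  AC = c(1, 2, 1, 3, 2, 1, 3, 2, 3),
  Final_Coverage = c(55, 66, 77, 88, 99, 11, 12, 33, 44)
)

# Build a one-row data frame for each class
rows <- lapply(unique(dataset$Class), function(cl) {
  firstThree <- head(dataset[dataset$Class == cl, ], 3)
  data.frame(
    Class = cl,
    AC = mean(firstThree$AC),
    Coverage = mean(firstThree$Final_Coverage)
  )
})

# Stack the per-class rows into the final result
new_data <- do.call(rbind, rows)
print(new_data)
```

This keeps the loop-like structure of the original attempt but never touches a data frame with 0 rows.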
You could look into aggregate(). The filter AC <= 3 picks out the first three rows of each class in this sample data.
> aggregate(df1[df1$AC <= 3, 3:4], by=list(Class=df1[df1$AC <= 3, 1]), FUN=mean)
Class AC Final_Coverage
1 A 2 69.66667
2 B 2 62.66667
3 C 2 29.33333
DATA
df1 <- structure(list(Class = structure(c(1L, 1L, 2L, 1L, 2L, 3L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
Total = c(1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L,
1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L),
AC = c(1L, 2L, 1L, 3L, 2L, 1L, 3L, 4L, 5L, 2L, 3L, 4L, 5L,
4L, 5L), Final_Coverage = c(55L, 66L, 77L, 88L, 99L, 11L,
12L, 13L, 22L, 33L, 44L, 55L, 102L, 105L, 109L)), class = "data.frame", row.names = c(NA,
-15L))

Difficulty converting a data.frame to a time series object in R

I am a novice in R. I have a tab-separated text file with sales data for each day. The format is product-id, day0, day1, day2, day3 and so on. Part of the input file is given below:
productid 0 1 2 3 4 5 6
1 53 40 37 45 69 105 62
4 0 0 2 4 0 8 0
5 57 133 60 126 90 87 107
6 108 130 143 92 88 101 66
10 0 0 2 0 4 0 36
11 17 22 16 15 45 32 36
I used the code below to read the file:
pdInfo <- read.csv("products.txt",header = TRUE, sep="\t")
This reads the entire file, and pdInfo is a data frame. I would like to convert it to a time series object for further processing. When I run a stationarity test, the augmented Dickey–Fuller (ADF) test, I get an error. I tried the code below:
x <- ts(data.matrix(pdInfo),frequency = 1)
adf <- adf.test(x)
error: Error in adf.test(x) : x is not a vector or univariate time series
Thanks in advance for the suggestions
In R, time series are usually in the form "one row per date", whereas your data is in the form "one column per date". You probably need to transpose the data before you convert it to a ts object.
First transpose it:
y= t(pdInfo)
Then use the top row (the product IDs) as the column names:
colnames(y) = y[1,]
y= y[-1,] # to drop the first row
This should work:
x = ts(y, frequency = 1)
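The error message in the question says that adf.test wants a vector or a univariate series, so after transposing, one column at a time would have to be pulled out. The sketch below covers only the base R part (no tseries call); pdInfo is a hypothetical three-product, four-day cut of the question's data, with the X0, X1, ... names that read.csv would typically produce.

```r
# Hypothetical cut of the question's data
pdInfo <- data.frame(
  productid = c(1L, 4L, 5L),
  X0 = c(53L, 0L, 57L),
  X1 = c(40L, 0L, 133L),
  X2 = c(37L, 2L, 60L),
  X3 = c(45L, 4L, 126L)
)

# Drop the id column, transpose so that each row is a day,
# and label each column with its product id
y <- t(pdInfo[, -1])
colnames(y) <- pdInfo$productid

# Multivariate series: one column per product
x <- ts(y, frequency = 1)

# Selecting a single column gives a univariate series
product5 <- x[, "5"]
print(product5)
```

A univariate series such as product5 is the kind of input that the error message asks for.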
A tidyverse alternative that runs adf.test on each product separately:
library(purrr)
library(dplyr)
library(tidyr)
library(tseries)
# create the data
df <- structure(list(productid = c(1L, 4L, 5L, 6L, 10L, 11L),
X0 = c(53L, 0L, 57L, 108L, 0L, 17L),
X1 = c(40L, 0L, 133L, 130L, 0L, 22L),
X2 = c(37L, 2L, 60L, 143L, 2L, 16L),
X3 = c(45L, 4L, 126L, 92L, 0L, 15L),
X4 = c(69L, 0L, 90L, 88L, 4L, 45L),
X5 = c(105L, 8L, 87L, 101L, 0L, 32L),
X6 = c(62L, 0L, 107L, 66L, 36L, 36L)),
.Names = c("productid", "0", "1", "2", "3", "4", "5", "6"),
class = "data.frame", row.names = c(NA, -6L))
# apply adf.test to each productid and return p.value
adfTest <- df %>% gather(key = day, value = sales, -productid) %>%
arrange(productid, day) %>%
group_by(productid) %>%
nest() %>%
mutate(adf = data %>% map(., ~adf.test(as.ts(.$sales)))
,adf.p.value = adf %>% map_dbl(., "p.value")) %>%
select(productid, adf.p.value)

How to merge 4 graphs that have different x-axis and y-axis intervals into one plot

I have 4 graphs; two of them are shown below with their data frames and the resulting plots.
Here is my first dataframe (h1):
h1 <- structure(list(Tool.Module = structure(c(4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("", "M_AUB01",
"M_ETR01", "M_FRA01", "M_FRA01_01", "M_FRA01_02", "M_FRA01_03",
"M_FRA01_04", "M_KPR01_00", "M_KPR02_00", "M_LAM01", "M_LAM01_01",
"M_LAM02", "M_LAM02_01", "M_LAY01", "M_LOT01_01", "M_LOT01_02_1",
"M_LOT01_02_2", "M_LOT01_03_1", "M_LOT01_03_2", "M_LOT01_04",
"M_TAB01_1", "M_TAB01_2"), class = "factor"), end1 = structure(c(1428984210,
1428984310, 1428985632, 1428985772, 1428985881, 1428985990, 1428986230,
1428986332, 1428986460, 1428986580, 1428986700, 1428986780, 1428986923,
1428987020, 1428988400), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Tool.Module",
"end1"), row.names = c(7L, 37L, 102L, 111L, 118L, 123L, 140L,
147L, 156L, 167L, 174L, 180L, 188L, 191L, 280L), class = "data.frame")
It looks like this:
> h1
Tool.Module end1
7 M_FRA01 2015-04-14 06:03:30
37 M_FRA01 2015-04-14 06:05:10
102 M_FRA01 2015-04-14 06:27:12
111 M_FRA01 2015-04-14 06:29:32
118 M_FRA01 2015-04-14 06:31:21
123 M_FRA01 2015-04-14 06:33:10
140 M_FRA01 2015-04-14 06:37:10
147 M_FRA01 2015-04-14 06:38:52
156 M_FRA01 2015-04-14 06:41:00
167 M_FRA01 2015-04-14 06:43:00
174 M_FRA01 2015-04-14 06:45:00
180 M_FRA01 2015-04-14 06:46:20
188 M_FRA01 2015-04-14 06:48:43
191 M_FRA01 2015-04-14 06:50:20
280 M_FRA01 2015-04-14 07:13:20
Here is the command for the plot:
plot(h1$end1, seq_along(h1$end1), type = "b")
My second dataframe (h2):
h2 <- structure(list(Tool.Module = structure(c(3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M_AUB02",
"M_ETR02", "M_FRA02", "M_FRA02_01", "M_FRA02_02", "M_FRA02_03",
"M_FRA02_04", "M_KPR03_00", "M_KPR04_00", "M_LAM03", "M_LAM03_01",
"M_LAM04", "M_LAM04_01", "M_LAY02", "M_LOT02_01", "M_LOT02_02_1",
"M_LOT02_02_2", "M_LOT02_03_1", "M_LOT02_03_2", "M_LOT02_04",
"M_TAB02_1", "M_TAB02_2"), class = "factor"), end2 = structure(c(1428984300,
1428984380, 1428984480, 1428984570, 1428984660, 1428984740, 1428984830,
1428984920, 1428985020, 1428985120, 1428985183, 1428985270, 1428985360,
1428985450, 1428985540), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Tool.Module",
"end2"), row.names = c(17L, 24L, 34L, 44L, 52L, 60L, 69L, 79L,
89L, 99L, 107L, 114L, 124L, 132L, 140L), class = "data.frame")
Here is the command for the plot:
plot(h2$end2, seq_along(h2$end2), type = "b")
I would like to show both of the graphs above in one box, and I tried using lines(). Here is the command for the plot:
plot(h2$end2, seq_along(h2$end2), type = "b")
lines(h1$end1,seq_along(h1$end1), type = "b", col = "red")
But this does not give the graph I want. What I actually want is to show 4 graphs (like the two above) in one box.
Your X axis looks the same to me. Since I do not have your data, here is an example with other data:
library(wikipediatrend)
views1 <-wp_trend(page = "European debt crisis",from = "2010-01-01",to = "2014-12-31",lang = "en",friendly = TRUE,requestFrom = "wp.trend.tester at wptt.wptt",userAgent = TRUE)
views2 <-wp_trend(page = "National debt of the United States",from = "2010-01-01",to = "2014-12-31",lang = "en",friendly = TRUE,requestFrom = "wp.trend.tester at wptt.wptt",userAgent = TRUE)
views3 <-wp_trend(page = "Arab Spring",from = "2010-01-01",to = "2014-12-31",lang = "en",friendly = TRUE,requestFrom = "wp.trend.tester at wptt.wptt",userAgent = TRUE)
views4 <-wp_trend(page = "Greek government-debt crisis",from = "2010-01-01",to = "2014-12-31",lang = "en",friendly = TRUE,requestFrom = "wp.trend.tester at wptt.wptt",userAgent = TRUE)
combview1<-cbind(views1,views2[,2],views3[,2],views4[,2])
library(ggplot2)
library(reshape2)
meltdf1 <- melt(combview1,id="Time")
ggplot(meltdf1,aes(x=Time,y=value,colour=variable,group=variable)) + geom_line()
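For the base-graphics attempt in the question, one likely reason the overlay looks wrong is that plot() sets the axis limits from the first series only, so later lines() calls can fall partly outside the box. A minimal sketch of computing shared limits first is given below; end1 and end2 are made-up stand-ins for h1$end1 and h2$end2, and the same pattern would extend to four series.

```r
# Made-up stand-ins for h1$end1 and h2$end2
start <- as.POSIXct("2015-04-14 06:00:00", tz = "UTC")
end1 <- start + c(210, 310, 1632, 1772, 1881)
end2 <- start + c(300, 380, 480, 570, 660)

# Axis limits that cover every series
xlim <- range(c(end1, end2))
ylim <- c(1, max(length(end1), length(end2)))

# Draw an empty frame with the shared limits, then add each series
plot(xlim, ylim, type = "n", xlab = "end time", ylab = "index")
lines(end1, seq_along(end1), type = "b", col = "black")
lines(end2, seq_along(end2), type = "b", col = "red")
legend("topleft", legend = c("end1", "end2"),
       col = c("black", "red"), lty = 1, pch = 1)
```

Because the frame is drawn before any data, the order in which the series are added no longer affects what is visible.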

Interaction plot in R

I have the following dataset:
ID Result Days Position
1 70 0 1
1 80 23 1
2 90 15 2
2 89 30 2
2 99 40 2
3 23 24 1
etc...
From it, I want to make 2 spaghetti plots: one for those who are in position 1 and one for those in position 2. I tried a "for & if" loop, but I just got the mixed plot many times. I am using ggplot.
dfPr <- df[df$Progress==1, ]
x11()
ggplot(dfPr, aes(x=OrderToFirstBx, y=result.num, color=factor(MRN))) +
geom_line() + theme_bw() + xlab("Time in Days") + ylab("ALT")
This worked! But if you have another solution please tell me.
Thank you.
You gave very limited example data, and your sample code doesn't seem to match the variable names in your sample data, which makes it very hard to tell exactly what you want.
If you want two separate plots, using facets might be the easiest. Try
library(ggplot2)
#sample data
dfPr <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L, 3L), Result = c(70L,
80L, 90L, 89L, 99L, 23L), Days = c(0L, 23L, 15L, 30L, 40L, 24L
), Position = c(1L, 1L, 2L, 2L, 2L, 1L)), .Names = c("ID", "Result",
"Days", "Position"), class = "data.frame", row.names = c(NA,
-6L))
ggplot(dfPr, aes(x=Days, y=Result, group=ID)) +
geom_line() + facet_wrap(~Position)
