Interaction plot in R

I have the following dataset:
ID Result Days Position
1 70 0 1
1 80 23 1
2 90 15 2
2 89 30 2
2 99 40 2
3 23 24 1
etc...
I want to make two spaghetti plots: one for those in position 1 and one for those in position 2. I tried a "for & if" loop, but I just got the same mixed plot many times. I am using ggplot2.
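library(ggplot2)
# keep only the rows with Progress == 1 (the comma selects rows, not columns)
dfPr <- df[df$Progress == 1, ]
x11()  # open a new plotting window
ggplot(dfPr, aes(x = OrderToFirstBx, y = result.num, color = factor(MRN))) +
  geom_line() + theme_bw() + xlab("Time in Days") + ylab("ALT")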
This worked! But if you have another solution, please tell me.
Thank you.

You gave very limited example data, and your sample code doesn't match the variable names in your sample data, which makes it hard to tell exactly what you want.
If you want two separate plots, using facets might be the easiest. Try
library(ggplot2)

# sample data
dfPr <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L, 3L), Result = c(70L,
80L, 90L, 89L, 99L, 23L), Days = c(0L, 23L, 15L, 30L, 40L, 24L
), Position = c(1L, 1L, 2L, 2L, 2L, 1L)), .Names = c("ID", "Result",
"Days", "Position"), class = "data.frame", row.names = c(NA,
-6L))
ggplot(dfPr, aes(x=Days, y=Result, group=ID)) +
geom_line() + facet_wrap(~Position)
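If you really need two separate plots instead of facets, you can split the data by Position and build one ggplot per subset. The sketch below does this with base R's split() and lapply() on the sample data. The object name `plots` and the panel titles are illustrative only. Inside a for loop, a ggplot object is drawn only when you call print() on it explicitly. That may be why a loop-based attempt does not behave as expected.

```r
library(ggplot2)

# Sample data (ID, Result, Days, Position)
dfPr <- data.frame(
  ID       = c(1L, 1L, 2L, 2L, 2L, 3L),
  Result   = c(70L, 80L, 90L, 89L, 99L, 23L),
  Days     = c(0L, 23L, 15L, 30L, 40L, 24L),
  Position = c(1L, 1L, 2L, 2L, 2L, 1L)
)

# split() gives one data frame per Position value;
# lapply() builds one spaghetti plot (one line per ID) for each subset
plots <- lapply(split(dfPr, dfPr$Position), function(d) {
  ggplot(d, aes(x = Days, y = Result, group = ID, colour = factor(ID))) +
    geom_line() +
    ggtitle(paste("Position", d$Position[1]))
})

# ggplot objects are not auto-printed inside a loop, so print them explicitly
for (p in plots) print(p)
```

The list is named after the Position values, so `plots[["2"]]` returns the plot for position 2 alone.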


Multiple entries with same patient id: how to make it one entry

I have multiple entries in my data for the same patient ID, and I want to combine them into one entry. What are my options? Here is the data:
PtID WorryHighBGNow
40 5
40 1
40 2
70 3
101 4
263 2
263 5
263 3
143 4
245 4
137 3
219 2
219 3
219 4
3 3
264 3
264 3
98 1
200 3
105 3
111 4
149 3
I want to create a visualization like the one I linked from this data, with the columns of my table on the y axis and the ranking 1, 2, 3, 4, 5 on the x axis.
If x is your data frame, you can try this with data.table:
library(data.table)
d <- setDT(x)[, list(WorryHighBGNow = paste(WorryHighBGNow, collapse = ',')), by = c('PtID')]
It gives a result like:
PtID WorryHighBGNow
40 5,1,2
70 3
101 4
263 2,5,3
And so on.
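If you'd rather not depend on data.table, base R's aggregate() can collapse the rows in the same way. This is a minimal sketch that assumes the data frame is called `x`. The anonymous function passed as FUN just pastes each patient's values together. Unlike the data.table version, aggregate() returns the rows sorted by PtID.

```r
# Patient data from the question
x <- data.frame(
  PtID = c(40L, 40L, 40L, 70L, 101L, 263L, 263L, 263L, 143L, 245L, 137L,
           219L, 219L, 219L, 3L, 264L, 264L, 98L, 200L, 105L, 111L, 149L),
  WorryHighBGNow = c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L,
                     2L, 3L, 4L, 3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)
)

# One row per PtID; each patient's values are pasted into a single string
d <- aggregate(WorryHighBGNow ~ PtID, data = x,
               FUN = function(v) paste(v, collapse = ","))
d
```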
I'm not sure this is exactly what you need; I've tried to mimic the visualization you linked in the question as closely as possible.
library(tidyverse)
dat %>%
  mutate_all(factor) %>%
  # number of answers per rating
  count(WorryHighBGNow) %>%
  # share of each rating, in percent
  mutate(percentage = round(n / sum(n) * 100, 1)) %>%
  # order the bars by frequency
  mutate(WorryHighBGNow = reorder(WorryHighBGNow, n)) %>%
  ggplot(aes(x = WorryHighBGNow, y = percentage,
             fill = WorryHighBGNow, label = paste(percentage, '%'))) +
  geom_col() +
  geom_text(hjust = -.1, fontface = 'bold') +
  scale_fill_brewer(type = 'qual', breaks = 1:5) +
  coord_flip() +
  expand_limits(y = 50) +
  theme_void() +
  theme(legend.position = 'bottom')
Data:
dat <- structure(
list(
PtID = c(40L, 40L, 40L, 70L, 101L, 263L, 263L, 263L, 143L, 245L, 137L, 219L,
219L, 219L, 3L, 264L, 264L, 98L, 200L, 105L, 111L, 149L),
WorryHighBGNow = c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L, 2L, 3L, 4L,
3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)
),
class = "data.frame", row.names = c(NA, -22L)
)
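You can check the percentages shown as bar labels without ggplot2 by using table() and prop.table() in base R. This sketch repeats the WorryHighBGNow values from the data above so it runs on its own. The largest share is about 45 % (rating 3), which presumably explains why the plot uses expand_limits(y = 50) to leave room for the labels.

```r
# Same values as dat$WorryHighBGNow
worry <- c(5L, 1L, 2L, 3L, 4L, 2L, 5L, 3L, 4L, 4L, 3L, 2L, 3L, 4L,
           3L, 3L, 3L, 1L, 3L, 3L, 4L, 3L)

# Share of answers per rating, in percent, rounded like the plot labels
pct <- round(prop.table(table(worry)) * 100, 1)
pct
```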

Replacing NA depending on distribution type of gender in R

I marked the missing values as NA and selected the incomplete rows:
data[data=="na"] <- NA
data[!complete.cases(data),]
I must replace them, but the method depends on the type of distribution: if shapiro.test() shows that a variable is not normally distributed, the missing values must be replaced by the median; if it is normal, by the mean.
The distribution must be checked separately for each sex (1 = girl, 2 = man).
data=structure(list(sex = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), emotion = c(20L,
15L, 49L, NA, 34L, 35L, 54L, 45L), IQ = c(101L, 98L, 105L, NA,
123L, 120L, 115L, NA)), .Names = c("sex", "emotion", "IQ"), class = "data.frame", row.names = c(NA,
-8L))
The desired output:
sex emotion IQ
1 20 101
1 15 98
1 49 105
1 28 101
2 34 123
2 35 120
2 54 115
2 45 119
The following code replaces NA values according to the Shapiro-Wilk test (mean if p > 0.05, otherwise median), separately for each sex:
library(dplyr)
data %>%
  group_by(sex) %>%
  mutate(
    # keep observed values; fill NAs with the mean if the group looks normal
    # (Shapiro-Wilk p > 0.05), otherwise with the median
    emotion = ifelse(!is.na(emotion), emotion,
                     ifelse(shapiro.test(emotion)$p.value > 0.05,
                            mean(emotion, na.rm = TRUE), median(emotion, na.rm = TRUE))),
    IQ = ifelse(!is.na(IQ), IQ,
                ifelse(shapiro.test(IQ)$p.value > 0.05,
                       mean(IQ, na.rm = TRUE), median(IQ, na.rm = TRUE)))
  )
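The same rule can also be written without dplyr as a small reusable function, applied within each sex using base R's ave(). This is a sketch: the helper name impute_by_normality and its alpha argument are hypothetical, and shapiro.test() needs at least 3 non-missing values per group. On the sample data, every group with an NA passes the normality check, so the NAs become group means. The results are 28 for emotion and roughly 101.3 and 119.3 for IQ, with no rounding applied.

```r
# Hypothetical helper: fill NAs in x with the mean if the Shapiro-Wilk test
# does not reject normality (p > alpha), otherwise with the median
impute_by_normality <- function(x, alpha = 0.05) {
  if (!anyNA(x)) return(x)  # nothing to fill, skip the test
  observed <- x[!is.na(x)]
  fill <- if (shapiro.test(observed)$p.value > alpha) mean(observed) else median(observed)
  x[is.na(x)] <- fill
  x
}

# Data from the question
data <- data.frame(
  sex     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L),
  IQ      = c(101L, 98L, 105L, NA, 123L, 120L, 115L, NA)
)

out <- data
for (v in c("emotion", "IQ")) {
  # ave() applies the helper separately within each sex
  out[[v]] <- ave(as.numeric(data[[v]]), data$sex, FUN = impute_by_normality)
}
out
```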

How to create a new column showing if and how many variables share a specific observation

I have a question concerning the analysis of some bioinformatics data in R.
My test data frame consists of a variable "sequence" with different letter codes as observations and three different variables representing individuals/samples (P1, P2, P3) that say how often the particular observation was counted in an individual (so P3 contains the sequence "AB" 23 times for example).
I now want to create a new column (already present in my data frame as the dummy column X filled with NA) that shows, for each sequence, whether it is shared between individuals (P1, P2, P3) and, more importantly, how many of the three individuals share it. The values in the new column can therefore only range from 1 to 3. For example, for sequence "ABCDE" the new column would show 1 because it occurs only in individual P3; for "ABC" it would show 2 because it occurs in both P2 and P3; and for "ABCD" it would show 3 since all individuals contain the sequence.
My test data looks like this after dput():
structure(list(Sequence = structure(1:9, .Label = c("AB", "ABC",
"ABCD", "ABCDE", "ABCDEF", "ABCDEFG", "ABCDEFGH", "ABCDEFGHI",
"ABCDEFGHIJ"), class = "factor"), P1 = c(5L, 0L, 20L, 0L, 3L,
1L, 0L, 0L, 0L), P2 = c(6L, 2L, 3L, 0L, 2L, 0L, 56L, 10L, 3L),
P3 = c(23L, 34L, 8L, 5L, 0L, 6L, 0L, 78L, 5L), X = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Sequence",
"P1", "P2", "P3", "X"), class = "data.frame", row.names = c(NA,
-9L))
Thank you!
You can count, for each row, how many of the "P" columns have a positive count:
mydf$X <- rowSums(mydf[, grep("^P", names(mydf))]>0)
head(mydf, 4)
# Sequence P1 P2 P3 X
#1 AB 5 6 23 3
#2 ABC 0 2 34 2
#3 ABCD 20 3 8 3
#4 ABCDE 0 0 5 1
Alternatively, we can use Reduce with lapply:
df1$X <- Reduce(`+`, lapply(df1[2:4], `>`, 0))
df1$X
#[1] 3 2 3 1 2 2 1 2 2
Reduce can be very efficient, as the linked benchmarks show.
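The question asks both whether a sequence is shared and how many individuals share it, so both can be computed together. This sketch builds the test data directly and uses the rowSums() approach. It selects the columns by name on the assumption that they are exactly P1, P2 and P3. The Shared column is a hypothetical addition for the "if" part.

```r
# Test data from the question (the dummy column X is filled below)
mydf <- data.frame(
  Sequence = c("AB", "ABC", "ABCD", "ABCDE", "ABCDEF", "ABCDEFG",
               "ABCDEFGH", "ABCDEFGHI", "ABCDEFGHIJ"),
  P1 = c(5L, 0L, 20L, 0L, 3L, 1L, 0L, 0L, 0L),
  P2 = c(6L, 2L, 3L, 0L, 2L, 0L, 56L, 10L, 3L),
  P3 = c(23L, 34L, 8L, 5L, 0L, 6L, 0L, 78L, 5L)
)

# Number of individuals with a positive count for each sequence
mydf$X <- rowSums(mydf[, c("P1", "P2", "P3")] > 0)
# Hypothetical extra column: is the sequence shared by more than one individual?
mydf$Shared <- mydf$X > 1
mydf
```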

Plot Bar Chart in R

I have some data with two columns. The first column is continuous; the second is binary (t/f). I want to plot this as a bar chart in R. I want to group the numbers in the first column into categories like 0-100, 101-200, ..., and then plot the number of t's on the y axis. I have used ggplot2, but I'm not clear on how to group the x-axis data.
1 123 t
2 145 t
3 222 t
4 345 f
5 455 t
6 567 t
7 245 t
8 300 t
9 150 t
10 600 t
11 333 t
First, here's your sample data in a data.frame:
dd<-structure(list(V1 = 1:11, V2 = c(123L, 145L, 222L, 345L, 455L,
567L, 245L, 300L, 150L, 600L, 333L), V3 = structure(c(2L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("f", "t"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -11L))
Here's a strategy for plotting:
library(ggplot2)
# bin V2 into 100-wide ranges and count only the "t" rows in each bin
ggplot(dd, aes(x = cut(V2, breaks = c(0, 1:9 * 100)), weight = as.numeric(V3 == "t"))) +
  geom_bar() + xlab("value")
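We define x and weight in aes(). cut() breaks the numbers into ranges, and weight turns each row into a 0/1 value (1 for "t") that is summed within each bin. geom_bar() uses its default count stat here, because stat = "bin" expects a continuous x variable and the output of cut() is discrete.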
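Another option is to count the t's per bin yourself and plot the counts with geom_col(). Because table() keeps every level of the factor created by cut(), empty ranges appear as zero-height bars rather than being dropped. This is a sketch on the sample data. The bin column, t_counts and the axis label are illustrative names.

```r
library(ggplot2)

# Sample data: V2 is the continuous value, V3 the t/f flag
dd <- data.frame(
  V1 = 1:11,
  V2 = c(123, 145, 222, 345, 455, 567, 245, 300, 150, 600, 333),
  V3 = factor(c("t", "t", "t", "f", "t", "t", "t", "t", "t", "t", "t"))
)

# Bin V2 into (0,100], (100,200], ..., (800,900]
dd$bin <- cut(dd$V2, breaks = c(0, 1:9 * 100))

# Count the "t" rows per bin; empty bins are kept with Freq = 0
t_counts <- as.data.frame(table(bin = dd$bin[dd$V3 == "t"]))

p <- ggplot(t_counts, aes(x = bin, y = Freq)) +
  geom_col() + xlab("value") + ylab("number of t")
print(p)
```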

return a plot for each level of a factor in r

I want to produce an X,Y plot for each separate ID from the data frame 'trajectories':
**trajectories**
X Y ID
2 4 1
1 6 1
2 4 1
1 8 2
3 7 2
1 5 2
1 4 3
1 6 3
7 4 3
I use the code:
sapply(unique(trajectories$ID),(plot(log(abs(trajectories$X)+0.01),log((trajectories$Y)+0.01))))
But this does not work; I get the error:
Error in match.fun(FUN) :
c("'(plot(log(abs(trajectories$X)+0.01),log((trajectories$Y)' is not a function, character or symbol", "' 0.01)))' is not a function, character or symbol")
Is there a way to rewrite this code so that I get a separate plot for each ID?
You can use the ggplot2 package for this nicely:
library(ggplot2)
trajectories <- structure(list(X = c(2L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 7L), Y = c(4L, 6L, 4L, 8L, 7L, 5L, 4L, 6L, 4L), ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)), .Names = c("X", "Y", "ID"), class = "data.frame", row.names = c(NA, -9L))
ggplot(trajectories, aes(x=log(abs(X) + 0.01), y=log(Y))) +
geom_point() +
facet_wrap( ~ ID)
For what it's worth, the reason your code fails is exactly what the error says: the second argument to sapply needs to be a function. If you define your plot code as a function:
myfun <- function(DF) {
plot(log(abs(DF$X) + 0.01), log(DF$Y))
}
But this alone will not split your data by ID. You could also use the plyr or data.table package to do the splitting and plotting, but you will need to write the plots to a file, or each new plot will replace the previous one.
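Base R's split() can do the splitting instead of plyr or data.table. This sketch passes each per-ID subset to a plotting function through lapply(). That fixes the original sapply() error, since lapply() receives a function. The plots are written to a PDF with one page per ID, so none is overwritten. The function name plot_one_id and the file name trajectories_by_id.pdf are hypothetical. The function is a variant of myfun that adds the ID as a title.

```r
trajectories <- data.frame(
  X  = c(2, 1, 2, 1, 3, 1, 1, 1, 7),
  Y  = c(4, 6, 4, 8, 7, 5, 4, 6, 4),
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Hypothetical variant of myfun that labels each plot with its ID
plot_one_id <- function(DF) {
  plot(log(abs(DF$X) + 0.01), log(DF$Y),
       main = paste("ID", DF$ID[1]), xlab = "log(|X| + 0.01)", ylab = "log(Y)")
}

# One PDF page per ID, so no plot is overwritten
pdf("trajectories_by_id.pdf")  # hypothetical output file
invisible(lapply(split(trajectories, trajectories$ID), plot_one_id))
invisible(dev.off())
```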
The lattice package is useful here.
library(lattice)
# Make the data frame
X <- c(2,1,2,1,3,1,1,1,7)
Y <- c(4,6,4,8,7,5,4,6,4)
ID <- c(1,1,1,2,2,2,3,3,3)
trajectories <- data.frame(X=X, Y=Y, ID=ID)
# Plot the graphs as a scatter plot by ID
xyplot(Y~X | ID,data=trajectories)
# Another useful solution is to treat ID as a factor
# Now, the individual plots are labeled
xyplot(Y~X | factor(ID),data=trajectories)
This is possible even in base R. Using the iris dataset:
coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)
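Applied to the trajectories data from the question, the same base R approach might look like the sketch below. Wrapping ID in factor() so that each ID gets its own panel is an assumption about the desired output. Without it, coplot() would condition on overlapping intervals of a numeric ID.

```r
trajectories <- data.frame(
  X  = c(2, 1, 2, 1, 3, 1, 1, 1, 7),
  Y  = c(4, 6, 4, 8, 7, 5, 4, 6, 4),
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# One panel per level of factor(ID)
coplot(Y ~ X | factor(ID), data = trajectories)
```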
