Select columns with the same name but a different [number] in R

I have column names like alpha[1], alpha[2], ..., alpha[9].
Can I select all the alpha columns at once instead of typing alpha[1], alpha[2], ..., alpha[9]?
How can I change the following code so that R knows I need results for all alpha parameters?
t_alpha <- mcmc_trace(mcmc,pars="alpha")

Something like this perhaps?
library(dplyr)
library(magrittr)
df %>% select(matches("^alpha"))
# alpha.1. alpha.10.
# 1 55 43
# 2 97 20
# 3 80 84
# 4 24 60
# 5 27 21
# 6 98 70
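A base-R sketch of the same idea, assuming the draws are held in a matrix whose column names look like alpha[1] ... alpha[9] (the `draws` object and its `sigma` column are made up for illustration). `grep()` with `value = TRUE` returns the matching names, which can be used to subset the columns or, presumably, be passed as `pars` in the question's call. If `mcmc_trace()` is the bayesplot function, its `regex_pars` argument should also accept a pattern such as `"^alpha\\["` directly. Note that `data.frame()` rewrites `alpha[1]` to `alpha.1.` (via `make.names()`) unless `check.names = FALSE`, which is why the output above shows dotted names.

```r
# Hypothetical draws: 10 iterations of alpha[1]..alpha[9] plus one unrelated column
set.seed(1)
draws <- matrix(rnorm(100), nrow = 10,
                dimnames = list(NULL, c(sprintf("alpha[%d]", 1:9), "sigma")))

# Every column whose name starts with "alpha[" (the bracket is escaped in the regex)
alpha_names <- grep("^alpha\\[", colnames(draws), value = TRUE)
alpha_names

# Keep only those columns: 10 rows, 9 columns
alpha_draws <- draws[, alpha_names]
dim(alpha_draws)
```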

Related

Dividing all possible rows within a given sub-data in R

My data looks like this:
set <- c(1,1,1,2,2,3,3,3,3,3,4,4)
density <- c(1,3,3,1,3,1,1,1,3,3,1,3)
counts <- c(100,2,4,76,33,12,44,13,54,36,65,1)
data <- data.frame(set,density,counts)
data$set <- as.factor(data$set)
data$density <- as.factor(data$density)
Within a given set there are two density levels, "1" and "3". For a given set, I want to divide the counts for density "1" by the counts for density "3" in every possible combination. I then want to print the original count associated with density "1", the ratio, and the set.
For example, the result for the first few rows should look like:
set counts ratio
1 100 50 #100/2
1 100 25 #100/4
2 76 2.3 #76/33
3 12 0.22 #12/54
3 12 0.33 #12/36
3 44 0.8148 #44/54
...
I thought I could achieve this with dplyr, but it seems a little too complicated for dplyr.
It looks like the comments get you most of the way there. Here's a dplyr solution. With left_join, each density-1 row gets matched with every density-3 row in the same set, giving output in line with your specification.
library(dplyr)
data_combined <- data %>% filter(density == 1) %>%
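# Match each density-1 row with each density-3 row in the same set
# (dplyr >= 1.1.0 may warn about a many-to-many relationship; adding relationship = "many-to-many" to left_join() silences it)
left_join(data %>% filter(density == 3), by = "set") %>%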
mutate(ratio = counts.x / counts.y) %>%
select(set, counts.x, counts.y, ratio)
data_combined
# set counts.x counts.y ratio
#1 1 100 2 50.0000000
#2 1 100 4 25.0000000
#3 2 76 33 2.3030303
#4 3 12 54 0.2222222
#5 3 12 36 0.3333333
#6 3 44 54 0.8148148
#7 3 44 36 1.2222222
#8 3 13 54 0.2407407
#9 3 13 36 0.3611111
#10 4 65 1 65.0000000
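For comparison, a base-R sketch of the same pairing with `merge()`, which forms every combination of density-1 and density-3 rows sharing a `set`. Unlike `left_join()`, this is an inner join, so a set missing one of the two densities would be dropped rather than kept with `NA`. The `counts.1`/`counts.3` names come from the `suffixes` chosen here, and row order within a set may differ from the dplyr output.

```r
set <- c(1,1,1,2,2,3,3,3,3,3,4,4)
density <- c(1,3,3,1,3,1,1,1,3,3,1,3)
counts <- c(100,2,4,76,33,12,44,13,54,36,65,1)
data <- data.frame(set, density, counts)

# Split the rows by density level
d1 <- data[data$density == 1, c("set", "counts")]
d3 <- data[data$density == 3, c("set", "counts")]

# Pair every density-1 row with every density-3 row in the same set
paired <- merge(d1, d3, by = "set", suffixes = c(".1", ".3"))
paired$ratio <- paired$counts.1 / paired$counts.3
paired
```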

Counting Attempts of an event in R

I'm relatively new to R and still learning. I have the following data frame, data:
ID grade Test_Date
1 56 01-25-2012
1 63 02-21-2016
1 73 02-31-2016
2 41 12-23-2015
2 76 01-07-2016
3 66 02-08-2016
I am looking to count the number of people (in this case only two unique individuals) who passed their tests after multiple attempts (passing is defined as 65 or over). So the final product would return a list of unique IDs who needed multiple attempts before their test scores reached 65. This would tell me that roughly two-thirds (about 67%) of the clients in this data frame required multiple test sessions before getting a passing grade.
Below is my idea, more or less, framed as an if statement:
If ID appears twice
count how often it appears, until TEST GRADE >= 65
ifelse(duplicated(data$ID), count(ID), NA)
I'm struggling with the second part, where I want to count the occurrences of each ID until grade >= 65.
The other option I see is some sort of loop. Below is my attempt:
for (i in data$ID) {
duplicated(data$ID)
count(data$ID)
# Here is where something would say "until grade >= 65"
}
Again, the struggle is how to tell R to stop counting once the grade reaches 65.
Appreciate the help!
You can use data.table:
library(data.table)
dt <- fread(" ID grade Test_Date
1 56 01-25-2012
1 63 02-21-2016
1 73 02-31-2016
2 41 12-23-2015
2 76 01-07-2016
3 66 02-08-2016")
# count the number of tries per ID, then keep only the rows with a passing grade
dt <- dt[, N:=.N, by=ID][grade>=65]
# proportion of passing test takers who needed more than one try
length(dt[N>1]$ID)/length(dt$ID)
[1] 0.6666667
Another option, though the other two work just fine:
library(dplyr)
dat2 <- data %>%
group_by(ID) %>%
summarize(
multiattempts = n() > 1 & any(grade < 65),
maxgrade = max(grade)
)
dat2
# Source: local data frame [3 x 3]
# ID multiattempts maxgrade
# <int> <lgl> <int>
# 1 1 TRUE 73
# 2 2 TRUE 76
# 3 3 FALSE 66
sum(dat2$multiattempts) / nrow(dat2)
# [1] 0.6666667
Here is a method using the aggregate function and subsetting that returns the maximum score, from the second test onward, for testers who took the test more than once.
multiTestMax <- aggregate(grade~ID, data=df[duplicated(df$ID),], FUN=max)
multiTestMax
ID grade
1 1 73
2 2 76
To get the number of rows, you can use nrow:
nrow(multiTestMax)
[1] 2
or, as a proportion of all test takers:
nrow(multiTestMax) / length(unique(df$ID))
data
df <- read.table(header=TRUE, text="ID grade Test_Date
1 56 01-25-2012
1 63 02-21-2016
1 73 02-31-2016
2 41 12-23-2015
2 76 01-07-2016
3 66 02-08-2016")
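A base-R sketch that counts attempts literally "until the grade hits 65", as the question describes: for each ID, `match(TRUE, grade >= 65)` gives the position of the first passing attempt, and an `NA` would mean that ID never passed. It assumes the rows are already in chronological order within each ID; the dates are not parsed (and `02-31-2016` is not a valid calendar date anyway).

```r
data <- read.table(header = TRUE, text = "ID grade Test_Date
1 56 01-25-2012
1 63 02-21-2016
1 73 02-31-2016
2 41 12-23-2015
2 76 01-07-2016
3 66 02-08-2016")

# Assumption: rows are in chronological order within each ID.
# For each ID, the attempt number of the first grade >= 65 (NA if never passed).
attempts_to_pass <- sapply(split(data$grade, data$ID),
                           function(g) match(TRUE, g >= 65))
attempts_to_pass

# Share of passing test takers who needed more than one attempt
mean(attempts_to_pass > 1, na.rm = TRUE)
```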

Calculate the mean of every n rows in a data frame in R when the first row varies

First make some example data:
df = data.frame(matrix(rnorm(200), nrow=100))
df1=data.frame(t(c(25,34)))
The starting row is different in each column. For example, in X1 I would like to start from the 25th row, while in X2 I would start from row 34. Then I want to calculate the mean of each block of 5 values over the next 50 rows, for all the columns in df.
I am new to R, so this is probably very obvious. Can anyone suggest how I can do this?
You could try Map.
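lst <- Map(function(x, y) {
  # drop the rows before the starting row y
  x1 <- x[y:length(x)]
  # group consecutive values in blocks of 5 and average each block
  tapply(x1, as.numeric(gl(length(x1), 5, length(x1))), FUN = mean)
}, df, df1)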
lst
# $X1
# 1 2 3 4 5 6
#-0.16500158 0.11339623 -0.86961872 -0.54985564 0.19958461 0.35234983
# 7 8 9 10 11 12
#0.32792769 0.65989801 -0.30409184 -0.53264725 -0.45792792 -0.59139844
# 13 14 15 16
# 0.03934133 -0.38068187 0.10100007 1.21017392
#$X2
# 1 2 3 4 5 6
# 0.24525622 0.07367300 0.18733973 -0.43784202 -0.45756095 -0.45740178
# 7 8 9 10 11 12
#-0.54086152 0.10439072 0.65660937 0.70623380 -0.51640088 0.46506135
# 13 14
#-0.09428336 -0.86295101
Because of the length difference, it might be better to keep it as a list. But, if you need it in a matrix/data.frame, you can make the lengths equal by padding with NAs.
do.call(cbind,lapply(lst, `length<-`,(max(sapply(lst, length)))))
Update
If you need only 50 rows, change y:length(x) to y:(y+49) in the Map code.
data
set.seed(24)
df <- data.frame(matrix(rnorm(200), nrow=100))
df1 <- data.frame(t(c(25,34)))
It's not entirely clear (especially the second line of your code), but I think this might be close to what you want to do:
every_fifth_row <- df[seq(1, nrow(df), 5), ]
every_fifth_row
# X1 X2
# 1 -0.09490455 -0.28417104
# 6 -0.14949662 0.12857284
# 11 0.15297366 -0.84428186
# 16 -1.03397309 0.04775516
# 21 -1.95735213 -1.03750794
# 26 1.61135194 1.10189370
# 31 0.12447365 1.80792719
# 36 -0.92344017 0.66639710
# 41 -0.88764143 0.10858376
# 46 0.27761464 0.98382526
# 51 -0.14503359 -0.66868956
# 56 -1.70208187 0.05993688
# 61 0.33828525 1.00208639
# 66 -0.41427863 1.07969341
# 71 0.35027994 -1.46920059
# 76 1.38943839 0.01844205
# 81 -0.81560917 -0.32133221
# 86 1.38188423 -0.77755471
# 91 1.53247872 -0.98660308
# 96 0.45721909 -0.22855622
rowMeans(every_fifth_row)
colMeans(every_fifth_row)
# Alternative
# apply(every_fifth_row, 1, mean) # Row-wise mean
# apply(every_fifth_row, 2, mean) # Column-wise mean
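A base-R sketch for the exact request (a different start row per column, the next 50 rows, the mean of each block of 5): reshaping a 50-value window into a 5-row matrix makes each matrix column one block, so `colMeans()` returns the 10 block means. It assumes every start row leaves at least 50 rows (true here, since 34 + 49 = 83 <= 100); `block_means` is just an illustrative name.

```r
set.seed(24)
df <- data.frame(matrix(rnorm(200), nrow = 100))
df1 <- data.frame(t(c(25, 34)))

# For each column x and its start row, take 50 rows and average each block of 5
block_means <- mapply(function(x, start) {
  window <- x[start:(start + 49)]
  colMeans(matrix(window, nrow = 5))  # column j of the matrix = block j
}, df, df1)

block_means  # 10 rows (blocks) x 2 columns (X1, X2)
```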

How to make a spaghetti plot in R?

I have the following:
head(dataframe):
ID Result Days
1 70 0
1 80 23
2 90 15
2 89 30
2 99 40
3 23 24
etc.
What I am trying to do is create a spaghetti plot with the above dataset. This is what I use:
interaction.plot(dataframe$Days, dataframe$ID, dataframe$Result, xlab="Time", ylab="Results", legend=FALSE)
but none of the patients' lines are continuous, even when they should form one long line.
Also, I want to convert the above data frame to something like this:
ID Result Days
1 70 0
1 80 23
2 90 0
2 89 15
2 99 25
3 23 0
etc. (I am trying to take the first (or minimum) day of each ID and have its days start from zero and go up.) Also, in the spaghetti plot I want all patients to have the same color if a condition is met, and another color if it is not.
Thank you for your time and patience.
How about this, using ggplot2 and data.table:
# libs
library(ggplot2)
library(data.table)
# your data
df <- data.table(ID=c(1,1,2,2,2,3),
Result=c(70,80,90,89,99,23),
Days=c(0,23,15,30,40,24))
# adjust each ID to start at day 0, sort
df <- merge(df, df[, list(min_day=min(Days)), by=ID], by='ID')
df[, adj_day:=Days-min_day]
df <- df[order(ID, Days)]
# plot
ggplot(df, aes(x=adj_day, y=Result, color=factor(ID))) +
geom_line() + geom_point() +
theme_bw()
Contents of updated data.frame (actually a data.table):
ID Result Days min_day adj_day
1 70 0 0 0
1 80 23 0 23
2 90 15 15 0
2 89 30 15 15
2 99 40 15 25
3 23 24 24 0
You can handle the color coding easily using scale_color_manual().
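A base-R sketch of both requests without ggplot2: `ave()` subtracts each ID's minimum day, and the lines are drawn one ID at a time so each patient's points stay connected. (`interaction.plot()` tabulates the response over every Days x ID combination, so an ID with no observation on a given day gets an empty cell, which likely explains the broken lines.) The question does not say what the color condition is, so the rule used here (maximum Result of at least 80) is purely hypothetical; replace it with the real one.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2, 3),
                 Result = c(70, 80, 90, 89, 99, 23),
                 Days = c(0, 23, 15, 30, 40, 24))

# Shift each ID so that its earliest day becomes 0, then sort
df$adj_day <- df$Days - ave(df$Days, df$ID, FUN = min)
df <- df[order(df$ID, df$adj_day), ]

# Hypothetical condition: the patient's maximum Result is at least 80
max_by_id <- sapply(split(df$Result, df$ID), max)
line_cols <- ifelse(max_by_id >= 80, "red", "blue")  # named by ID

# Empty plot sized to the data, then one connected line per patient
plot(df$adj_day, df$Result, type = "n", xlab = "Time", ylab = "Results")
for (id in unique(df$ID)) {
  rows <- df$ID == id
  lines(df$adj_day[rows], df$Result[rows], type = "b",
        col = line_cols[as.character(id)])
}
```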

ggplot2 is plotting a line strangely

I am trying to plot the time series x_t = A + (-1)^t B.
To do this I am using the following code. The problem is that the ggplot output looks wrong.
library(ggplot2)
set.seed(42)
N<-2
A<-sample(1:20,N)
B<-rnorm(N)
X<-c(A+B,A-B)
dat<-sapply(1:N,function(n) X[rep(c(n,N+n),20)],simplify=FALSE)
dat<-data.frame(t=rep(1:20,N),w=rep(A,each=20),val=do.call(c,dat))
ggplot(data=dat,aes(x=t, y=val, color=factor(w)))+
geom_line() + facet_grid(w ~ ., scales = "free")
looking at the head of dat everything looks right:
> head(dat)
t w val
1 1 12 10.5533
2 2 12 13.4467
3 3 12 10.5533
4 4 12 13.4467
5 5 12 10.5533
6 6 12 13.4467
So the lower (blue) line should only have values 10.5533 and 13.4467. But it also takes different values. What is wrong in my code?
Thanks in advance for any help
You really should be more careful before asserting that something is "wrong". The way you are creating dat, the rows are not ordered by dat$t, so head(...) does not display the extra values:
head(dat[order(dat$w,dat$t),],10)
# t w val
# 21 1 18 18.43530
# 61 1 18 18.36313
# 22 2 18 19.56470
# 62 2 18 17.63687
# 23 3 18 18.43530
# 63 3 18 18.36313
# 24 4 18 19.56470
# 64 4 18 17.63687
# 25 5 18 18.43530
# 65 5 18 18.36313
Note the row numbers: rows 21 and 61 both have t = 1 and w = 18, but different values of val.
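Looking at the construction more closely, the duplicated t values appear to come from recycling: `X[rep(c(n, N+n), 20)]` returns 40 values per series, so `val` ends up with 80 values while `t` and `w` have 40, and `data.frame()` silently recycles the shorter columns. The sketch below is an assumed fix, not the answerer's: it checks the lengths and uses 10 repetitions so each series has exactly 20 values, after which the question's `ggplot()` call should work unchanged. It keeps the question's order (A + B at t = 1); use `c(N + n, n)` instead if the formula's sign convention matters.

```r
set.seed(42)
N <- 2
A <- sample(1:20, N)
B <- rnorm(N)
X <- c(A + B, A - B)

# Original construction: 40 values per series, 80 in total
old_vals <- do.call(c, sapply(1:N, function(n) X[rep(c(n, N + n), 20)], simplify = FALSE))
length(old_vals)  # twice as long as rep(1:20, N)

# Assumed fix: 10 repetitions of the (A + B, A - B) pair give 20 values per series
new_vals <- do.call(c, sapply(1:N, function(n) X[rep(c(n, N + n), 10)], simplify = FALSE))
dat <- data.frame(t = rep(1:20, N), w = rep(A, each = 20), val = new_vals)

# Each (t, w) pair now occurs exactly once
anyDuplicated(dat[, c("t", "w")])
```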
