Rename factors in a spineplot with R

Is it possible to rename factors in a spineplot? The names of my factors are too long, so they overlap.
Thanks for your advice!

Reading the help for spineplot, it is clear that you can pass the parameters yaxlabels and xaxlabels to control the vectors for annotation of the axes.
One useful function is abbreviate which will shorten character strings.
Combining this information with the spineplot example gives:
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
levels = c(1, 2, 3),
labels = c("none", "some", "marked"))
spineplot(improved ~ treatment, yaxlabels=abbreviate(levels(improved), 2))
Not all of the plot functions in R have this type of parameter. For a more general solution, it might be necessary to rename the factor levels before passing the factor to a plot function. You can access and modify the level names using the levels function:
levels(treatment) <- abbreviate(levels(treatment), 5)
plot(improved ~ treatment)
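If abbreviate produces labels that are hard to read, the same two approaches also work with labels written by hand. The following is a minimal sketch; the short labels ("no", "sm", "mk", "plc", "trt") are made up for illustration and must be given in the same order as levels().

```r
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3),
                   labels = c("none", "some", "marked"))

# Option 1: leave the factor untouched and only change the axis annotation.
# The labels below are hypothetical, listed in the order of levels(improved).
spineplot(improved ~ treatment, yaxlabels = c("no", "sm", "mk"))

# Option 2: rename the levels themselves; the underlying data do not change.
levels(treatment) <- c("plc", "trt")
plot(improved ~ treatment)
```

Renaming levels only changes the labels, so the counts per group stay the same.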

Related

R: Exclude granges overlapped by another range

I would like to compute the part of a query range that remains after excluding another set of ranges. Is there a way to do it? Thanks.
query <- IRanges(1, 10)
ranges2exclude <- IRanges(c(2, 6), c(3, 7))
The output I want:
IRanges(c(1, 4, 8), c(1, 5, 10))
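The desired output can be reproduced in base R by treating each range as a set of integer positions. This is only a sketch of the idea and does not use the IRanges API; the variable names are made up for illustration. With IRanges loaded, a set operation such as setdiff() applied to the two objects may be the more direct route, but the thread does not confirm that.

```r
# Integer positions covered by the query and by the ranges to exclude
query_pos   <- 1:10
exclude_pos <- c(2:3, 6:7)

# Positions that survive the exclusion
remaining <- setdiff(query_pos, exclude_pos)

# Start a new run wherever consecutive positions are not adjacent
run_id <- cumsum(c(1, diff(remaining) != 1))

starts <- as.integer(tapply(remaining, run_id, min))
ends   <- as.integer(tapply(remaining, run_id, max))

data.frame(start = starts, end = ends)
```

The starts and ends could then be passed to IRanges(starts, ends) to rebuild a ranges object.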

Loop over non-standard variable names in R

I have a dataframe (df) with variables that look similar to vector-variables:
myvariable[1], myvariable[2] , myvariable[3] , etc.
However, if I want to refer to them, R automatically puts backticks around them:
df$`myvariable[1]`
I want to use those variables within a for-loop, and hence, want to change the number within the brackets automatically. Does anyone know how to do this?
PS: This question is different from other questions insofar as R doesn't see my variables as vector variables but rather as single variables that look the same. Hence, the []-part of my variables is seen as only some kind of string and not as a subsetting operator.
PS2: dput(head(zTT$subjects[, c("myvariable[1]","myvariable[3]","myvariable[4]")],4))
structure(list(`myvariable[1]` = c(2, 4, 2, 9), `myvariable[3]` = c(1,
1, 2, 3), `myvariable[4]` = c(2, 4, 2, 7)), .Names = c("myvariable[1]",
"myvariable[3]", "myvariable[4]"), row.names = c(NA, 4L), class = "data.frame")
As akrun has suggested, you can use [[. The code below uses your own data frame to construct the string which corresponds to the list names.
temp <- structure(list(`myvariable[1]` = c(2, 4, 2, 9),
`myvariable[3]` = c(1, 1,2, 3),
`myvariable[4]` = c(2, 4, 2, 7)),
.Names = c("myvariable[1]", "myvariable[3]",
"myvariable[4]"), row.names = c(NA, 4L),
class = "data.frame")
for (i in c(1, 3, 4)) {
myVar <- paste0("myvariable[", i, "]")
print(temp[[myVar]])
}
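When the indices are not known in advance, the matching column names can be taken from the data frame itself instead of being pasted together. A minimal sketch, assuming every column of interest follows the myvariable[<number>] pattern; the mean is just an example of something to compute per column.

```r
temp <- data.frame(`myvariable[1]` = c(2, 4, 2, 9),
                   `myvariable[3]` = c(1, 1, 2, 3),
                   `myvariable[4]` = c(2, 4, 2, 7),
                   check.names = FALSE)  # keep the bracketed names as they are

# Brackets are escaped because they are special characters in a regex
vars <- grep("^myvariable\\[[0-9]+\\]$", names(temp), value = TRUE)

col_means <- numeric(0)
for (v in vars) {
  # [[ takes the column name as a plain string, so no backticks are needed
  col_means[v] <- mean(temp[[v]])
}
col_means
```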

R: apply the pclm function

I am having trouble applying the Penalized Composite Link Model (PCLM) function, which only works with vectors. I use the pclm function to generate single years of age (syoa) population data from 5-year age group population data.
pclm() can be installed by following the instructions given by the author on https://github.com/mpascariu/ungroup.
Usage of the function:
pclm(x, y, nlast,control = list())
-x: vector of the cumulative sum points of the sequence in y.
-y: vector of values to be ungrouped.
-nlast: Length of the last interval.
-control: List with additional parameters.
Here's my training dataset:
data<-data.frame(
GEOID= c(1,2),
name= c("A","B"),
"Under 5 years"= c(17,20),
"5-9 years"= c(82,90),
"10-14 years"= c(18, 22),
"15-19 years"= c(90,88),
"20-24 years"= c(98, 100),
check.names=FALSE)
#generating a data.frame storing the fitted values from the pclm for the first row: GEOID=1.
#using the values directly
syoa <- data.frame(fitted(pclm(x=c(0, 5, 10, 15, 20), y=c(17,82,18,90,98), nlast=5, control = list(lambda = .1, deg = 3, kr = 1))))
#or referring to the vector by its rows and columns
syoa <- data.frame(fitted(pclm(x=c(0, 5, 10, 15, 20), y=c(data[1,3:7]), nlast=5, control = list(lambda = .1, deg = 3, kr = 1))))
As my data have many observations, I'd like to apply the pclm() function across all the rows for columns 3-7: data[,3:7].
apply(data[3:7], 1, pclm(x=c(0, 5, 10, 15, 20), y=c(data[,3:7]), nlast=5, control = list(lambda = .1, deg = 3, kr = 1)))
but it's not working and gives the following error message:
Error in eval(substitute(expr), data, enclos = parent.frame()) :
(list) object cannot be coerced to type 'double'
I don't know whether the issue is related to apply() or to the pclm() function. Can anyone help? Thanks.
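It's easier than I thought: apply() needs a function as its third argument, and that function receives each row as a vector.
syoa <- data.frame(apply(data[3:7], 1, function(x) {
  # x holds the five age-group counts of one row
  fit <- pclm(x = c(0, 5, 10, 15, 20), y = c(x), nlast = 5, control = list(lambda = NA, deg = 3, kr = 1))
  round(fitted(fit))
}))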
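The row-wise pattern can be tried without installing ungroup. In the sketch below, ungroup_uniform is a hypothetical stand-in for pclm() that simply spreads each 5-year count evenly over five single years; it is not a penalized composite link model, it only shows how apply() hands each row to the function.

```r
data <- data.frame(
  GEOID = c(1, 2),
  name = c("A", "B"),
  "Under 5 years" = c(17, 20),
  "5-9 years" = c(82, 90),
  "10-14 years" = c(18, 22),
  "15-19 years" = c(90, 88),
  "20-24 years" = c(98, 100),
  check.names = FALSE)

# Hypothetical stand-in for pclm(): split every age-group count evenly
# across the single years in that group.
ungroup_uniform <- function(y, width = 5) {
  rep(as.numeric(y) / width, each = width)
}

# apply() calls the function once per row, passing that row as a vector.
# Each call returns 25 values, so the result has one column per input row.
syoa_uniform <- apply(data[3:7], 1, ungroup_uniform)

dim(syoa_uniform)
colSums(syoa_uniform)  # totals per area match the grouped counts
```

The original call failed because the third argument was the result of calling pclm() once on the whole data frame, not a function to be called per row.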

Create conditional variable in multiple data.tables (or data.frames)

I want to execute the same action in multiple data.tables (or data.frames). For example, I want to create the same variable conditional on the same rule in all data.tables.
A simple example can be (df1=df2=df3, without loss of generality here)
df1 <- data.frame(var1 = c(1, 2, 2, 2, 1), var2 =c(20, 10, 10, 10, 20), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 2, 2, 1), var2 =c(20, 10, 10, 10, 20), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 2, 2, 1), var2 =c(20, 10, 10, 10, 20), var3 = c(10, 8, 15, 7, 9))
My approach was (i) to create a list of the data frame names (list.df) and (ii) to loop over this list, trying to create the variable:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
My (bad) attempt:
for(i in 1:3){
a<-get(paste(list.df[[i]], "$var1", sep=""))
b<-get(paste(list.df[[i]], "$var2", sep=""))
name<-paste(list.df[[i]], "$var.new", sep="")
assign(name, ifelse(a==2 & b==10, 1, 0))
}
Clearly R cannot create this new variable the way I am doing it, as I get an error message "object not found".
Any clues on how to fix my bad code? I have a feeling that dplyr could help me but I don't know how.
We can use mget after creating the strings of object names with paste, so that we get the values (i.e., the data.frames) in a list. We then loop through the list with lapply and transform each dataset by creating the variable ('varNew'), which is a binary variable. We can either use ifelse on the logical statement or just wrap it with + to coerce TRUE/FALSE to 1/0.
lst <- lapply(mget(paste0('df', 1:3)), transform,
varNew = +(var1==2 & var2==10))
If we need to update the original objects, we can use list2env.
list2env(lst, envir = .GlobalEnv)
df1
df2
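The attempt in the question fails because get() looks up an object by its exact name, and no object is literally called "df1$var1". A minimal sketch of a loop-based alternative, assuming the data frames are collected in a named list first (the list name dfs is made up here):

```r
df1 <- data.frame(var1 = c(1, 2, 2, 2, 1), var2 = c(20, 10, 10, 10, 20), var3 = c(10, 8, 15, 7, 9))
df2 <- df1
df3 <- df1

# Store the data frames themselves, not their names
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

for (nm in names(dfs)) {
  # [[ selects one data frame from the list; $ then adds the new column
  dfs[[nm]]$var.new <- ifelse(dfs[[nm]]$var1 == 2 & dfs[[nm]]$var2 == 10, 1, 0)
}

dfs$df1
```

Keeping the data frames in the list afterwards is often simpler than writing them back to the global environment.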

Line in R plot should start at a different timepoint

I have the following example data set:
date<-c(1,2,3,4,5,6,7,8)
valuex<-c(2,1,2,1,2,3,4,2)
valuey<-c(2,3,4,5,6)
Now I plot the date and the valuex variable:
plot(date,valuex,type="l")
Now I want to add a line for the valuey variable, but it should start on the 4th day rather than at the beginning, so I add NA values:
valuexmod<-c(rep(NA,3),valuex)
and I add the line with:
lines(date,valuexmod,type="l",col="red")
But this does not work. R ignores the NA values and the valuexmod line starts on the first day, but it should start on the 4th day.
Given that date and valuex already have the same length, padding valuex with three NAs makes it longer than date, so I am assuming that you have a typo above and meant to pad valuey.
Try this instead:
date <- c(1, 2, 3, 4, 5, 6, 7, 8)
valuex <- c(2, 1, 2, 1, 2, 3, 4, 2)
valuey <- c(2, 3, 4, 5, 6)
valueymod <- c(rep(NA, 3), valuey)
plot(date, valuex, type = "l", ylim = range(c(valuex, valuey)))
lines(date, valueymod, type = "l", col = "red")
Related to your question is a point made in help("lines")...
The coordinates can contain NA values. If a point contains NA in either its x or y value, it is omitted from the plot, and lines are not drawn to or from such points. Thus missing values can be used to achieve breaks in lines.
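Padding with NA is one option; another is to pass lines() only the dates that valuey belongs to. A minimal sketch, assuming valuey covers the last five days (the variable start is introduced here for illustration):

```r
date   <- c(1, 2, 3, 4, 5, 6, 7, 8)
valuex <- c(2, 1, 2, 1, 2, 3, 4, 2)
valuey <- c(2, 3, 4, 5, 6)

# First day on which valuey has a value, assuming it ends on the last date
start <- length(date) - length(valuey) + 1

plot(date, valuex, type = "l", ylim = range(c(valuex, valuey)))
# x and y must have the same length, so subset date instead of padding valuey
lines(date[start:length(date)], valuey, col = "red")
```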
