I have multiple lists that I want to combine, but I get the wrong result.
This is the code I used:
hiv.Scatter <- list(predictions = predictdata, labels = L)
for (k in 1:2){
hiv.Scatter <-
list(predictions = append(
list(hiv.Scatter$predictions),
list(predictdata)
),
labels = append(list(hiv.Scatter$labels), list(L)))
}
But with the code above I got very strange results.
The result I expected is:
> str(hiv.Scatter)
List of 2
$ predictions:List of 3
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
$ labels :List of 3
..$ : num [1:6] 1 1 1 1 1 1
..$ : num [1:6] 1 1 1 1 1 1
..$ : num [1:6] 1 1 1 1 1 1
The data I used
> dput(L)
c(1, 1, 1, 1, 1, 1)
> dput(predictdata)
c(0.0287037037037037, 0.00648148148148148, 0.00925925925925926,
0.0435185185185185, 0.012962962962963, 0.00833333333333333)
Thanks for your help
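Wrapping the existing component in list() on every iteration nests it one level deeper each time. Initialise each component as a list and append to it directly: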
hiv.Scatter <- list(predictions = list(predictions = predictdata),
labels = list(labels = L))
for (k in 1:2){
hiv.Scatter[[1]] <- append(hiv.Scatter[[1]],
list(predictions = predictdata))
hiv.Scatter[[2]] <- append(hiv.Scatter[[2]], list(labels = L))
}
OR, this
hiv.Scatter <- list(predictions = list(predictions = predictdata),
labels = list(labels = L))
for (k in 1:2){
hiv.Scatter$predictions <- append(hiv.Scatter$predictions,
list(predictions = predictdata))
hiv.Scatter$labels <- append(hiv.Scatter$labels, list(labels = L))
}
Either version gives the desired output:
str(hiv.Scatter)
# List of 2
# $ predictions:List of 3
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# $ labels :List of 3
# ..$ labels: num [1:6] 1 1 1 1 1 1
# ..$ labels: num [1:6] 1 1 1 1 1 1
# ..$ labels: num [1:6] 1 1 1 1 1 1
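As a loop-free alternative, the same shape can be built in one step with rep(). This is only a sketch: the count n is a name introduced here for illustration and does not appear in the original code, and the inner elements are left unnamed, as in the expected output of the question.

```r
# Data from the question
predictdata <- c(0.0287037037037037, 0.00648148148148148, 0.00925925925925926,
                 0.0435185185185185, 0.012962962962963, 0.00833333333333333)
L <- c(1, 1, 1, 1, 1, 1)

n <- 3  # hypothetical: number of copies wanted in each component

# rep() on a one-element list repeats that element without nesting it
hiv.Scatter <- list(predictions = rep(list(predictdata), n),
                    labels      = rep(list(L), n))
str(hiv.Scatter)
```

If the vectors differ between iterations, collecting them with lapply() and assigning the result to each component should avoid the nesting in the same way.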
Related
I got this error after running write.csv(). How can I fix it?
Thanks
write.csv (res_basic,"ceRNA_basic_result", row.names=TRUE)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 43131, 10499
str(res_basic):
List of 2
$ cesig :'data.frame': 43131 obs. of 5 variables:
..$ targetce : chr [1:43131] "AASDHPPT" "AASDHPPT" "AASDHPPT" "AASDHPPT" ...
..$ anotherce : chr [1:43131] "ADGRG1" "AFAP1" "BCL3" "C1orf147" ...
..$ miRNAs : chr [1:43131] "hsa-miR-6837-3p" "hsa-miR-1185-1-3p" "hsa-miR-6837-3p" "hsa-miR-1185-1-3p" ...
..$ miRNAs_num: num [1:43131] 1 1 1 1 1 1 1 1 1 1 ...
..$ ratio : num [1:43131] 0.5 1 1 1 1 0.5 1 1 1 1 ...
$ cenotsig:'data.frame': 10499 obs. of 5 variables:
..$ targetce : chr [1:10499] "AASDHPPT" "AASDHPPT" "AASDHPPT" "AASDHPPT" ...
..$ anotherce : chr [1:10499] "ARCN1" "BACH1" "CDK6" "DNAJA1" ...
..$ miRNAs : chr [1:10499] "hsa-miR-1185-1-3p" "hsa-miR-1185-1-3p" "hsa-miR-7849-3p" "hsa-miR-6837-3p" ...
..$ miRNAs_num: num [1:10499] 1 1 1 1 1 1 1 1 1 1 ...
..$ ratio : num [1:10499] 0.333 0.25 0.2 0.333 0.333 ...
res_basic is a list of two data.frames with different numbers of rows, so write.csv() cannot coerce it to a single data.frame. To write the two datasets to separate files, loop over the list:
Map(function(x, y) write.csv(x, paste0(y, ".csv"), row.names = TRUE),
res_basic, names(res_basic))
If it needs to be a single file, bind them into a single data.frame first and then write that:
library(dplyr)
write.csv(bind_rows(res_basic, .id = 'grp'),
"ceRNA_basic_result.csv", row.names=TRUE)
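If dplyr is not available, the single-file variant can be sketched in base R. The list res_demo below is made-up stand-in data with the same shape as res_basic (two data.frames with different row counts), and the file is written to a temporary directory, so both are assumptions for illustration only.

```r
# Hypothetical stand-in for res_basic: two data.frames of different length
res_demo <- list(
  cesig    = data.frame(targetce = c("A", "B"), ratio = c(0.5, 1)),
  cenotsig = data.frame(targetce = "C", ratio = 0.25)
)

# Tag each data.frame with its list name, then stack them row-wise
combined <- do.call(rbind,
                    Map(function(d, nm) cbind(grp = nm, d),
                        res_demo, names(res_demo)))

out <- file.path(tempdir(), "ceRNA_demo_result.csv")
write.csv(combined, out, row.names = FALSE)

# Read the file back to confirm that all rows were written
back <- read.csv(out)
nrow(back)
```

The grp column plays the same role as the .id argument of bind_rows(): it records which list element each row came from.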
I'm using MLR package and I stumbled on a problem with an S4 object. More specifically it's the slot name that causes the trouble. I'm looking for a way to change the slot's name, not the value.
Here's a reproducible code example that generates the object in question:
lrn1 = makeLearner("classif.lda", predict.type = "prob")
lrn2 = makeLearner("classif.ksvm", predict.type = "prob")
lrns = list(lrn1, lrn2)
rdesc.outer = makeResampleDesc("CV", iters = 5)
ms = list(auc, mmce)
bmr = benchmark(lrns, tasks = sonar.task, resampling = rdesc.outer,
measures = ms, show.info = FALSE)
preds = getBMRPredictions(bmr, drop = TRUE)
ROCRpreds = lapply(preds, asROCRPrediction)
ROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, "tpr", "fpr"))
The object has two slots that are lists, and I need to rename both of them: instead of x.values and y.values, the slot names should be x and y respectively.
str(ROCRperfs$classif.lda)
Formal class 'performance' [package "ROCR"] with 6 slots
..@ x.name : chr "False positive rate"
..@ y.name : chr "True positive rate"
..@ alpha.name : chr "Cutoff"
..@ x.values :List of 5
.. ..$ : num [1:43] 0 0 0 0 0 ...
.. ..$ : num [1:42] 0 0 0 0.0526 0.0526 ...
.. ..$ : num [1:42] 0 0 0 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
.. ..$ : num [1:43] 0 0 0.0476 0.0476 0.0476 ...
.. ..$ : num [1:43] 0 0 0 0 0 ...
..@ y.values :List of 5
.. ..$ : num [1:43] 0 0.0417 0.0833 0.125 0.1667 ...
.. ..$ : num [1:42] 0 0.0455 0.0909 0.0909 0.1364 ...
.. ..$ : num [1:42] 0 0.0476 0.0952 0.0952 0.1429 ...
.. ..$ : num [1:43] 0 0.0476 0.0476 0.0952 0.1429 ...
.. ..$ : num [1:43] 0 0.0435 0.087 0.1304 0.1739 ...
..@ alpha.values:List of 5
.. ..$ : num [1:43] Inf 1 1 1 1 ...
.. ..$ : num [1:42] Inf 1 1 1 0.999 ...
.. ..$ : num [1:42] Inf 1 1 1 1 ...
.. ..$ : num [1:43] Inf 1 1 0.999 0.999 ...
.. ..$ : num [1:43] Inf 1 1 1 1 ...
As I'm a beginner to OOP in R, all I could do was print the slot with slot().
The bottom line is that all I want to do with the object in question is to plot it as follows:
plot(ROCRperfs[[1]], col = "blue", avg = "vertical", spread.estimate = "stderror",
show.spread.at = seq(0.1, 0.8, 0.1), plotCI.col = "blue", plotCI.lwd = 2, lwd = 2)
You cannot change the structure of an S4 class once it's defined. This is a feature, not a bug. By imposing restrictions on what can be done, S4 reduces the chance of bugs creeping into your code.
For example, consider what might happen if you changed the slotnames in the object to x and y, and then passed the object to a function that's expecting x.values and y.values. By not allowing you to make this change, S4 rules out the possibility that code down the line will be given an object whose structure they can't handle.
For your use case, you can just plot the x.values and y.values slots individually:
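# x.values and y.values are lists with one element per CV fold, so pick a fold;
# avg, spread.estimate and the plotCI.* arguments belong to ROCR's own plot method
plot(ROCRperfs[[1]]@x.values[[1]], ROCRperfs[[1]]@y.values[[1]],
type = "l", col = "blue", lwd = 2,
xlab = ROCRperfs[[1]]@x.name, ylab = ROCRperfs[[1]]@y.name)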
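To illustrate the point without depending on ROCR, here is a minimal sketch with a toy S4 class. The class name perfDemo and its values are invented for this example; only the slot names x.values and y.values mirror the real object. It shows that assigning to a slot that is not in the class definition fails, and that copying the values into an ordinary list with the names you want is straightforward.

```r
library(methods)

# Hypothetical toy class that mimics two slots of ROCR's 'performance' class
setClass("perfDemo", representation(x.values = "list", y.values = "list"))
p <- new("perfDemo",
         x.values = list(c(0, 0.5, 1)),
         y.values = list(c(0, 0.8, 1)))

slotNames(p)  # the slot names are fixed by the class definition

# Assigning to an undefined slot is rejected with an error
can_add_slot <- tryCatch({ p@x <- 1; TRUE }, error = function(e) FALSE)
can_add_slot

# Copying the values into a plain list gives you the names x and y
xy <- list(x = p@x.values[[1]], y = p@y.values[[1]])
str(xy)
```

A plain list like xy can be passed to any function that expects components named x and y, while the original S4 object stays valid for the package's own methods.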
I would like to plot a 3D surface graph like the one in the figure:
My attempt with the plotly package is below:
library(plotly)
packageVersion("plotly")
# [1] ‘4.5.2’
# random data
a <- 0; s <- c(1:16)
x <- seq(a-3*max(s), a+3*max(s), len=10)
f <- sapply(s, function(ss) dnorm(x, a, ss))
df0=data.frame(x=rep(x,length(s)),
y=rep(s,each=length(x)),
z=f,
col=rep(seq(1,31,2),each=length(x)))
df0 %>% group_by(y) %>%
plot_ly(x = ~x, y = ~y, z = ~f, type = 'scatter3d', mode = 'lines',
line = list(width = 6,color = ~col,colorscale = 'Viridis'))
I have the error message:
Error in function_list[[i]](value) : could not find function "group_by"
The group argument is deprecated and I have not had success with group_by.
Question. How to rewrite the group_by argument?
There is a problem in the construction of the dataset 'df0'. If we look at its structure
str(df0)
#'data.frame': 160 obs. of 19 variables:
# $ x : num -48 -37.33 -26.67 -16 -5.33 ...
# $ y : int 1 1 1 1 1 1 1 1 1 1 ...
# $ z.1 : num 0.00 8.83e-304 1.53e-155 1.03e-56 2.66e-07 ...
# $ z.2 : num 1.67e-126 4.33e-77 4.97e-40 2.53e-15 5.70e-03 ...
# $ z.3 : num 3.42e-57 3.13e-35 9.26e-19 8.85e-08 2.74e-02 ...
# $ z.4 : num 5.37e-33 1.21e-20 2.23e-11 3.35e-05 4.10e-02 ...
# $ z.5 : num 7.76e-22 6.25e-14 5.31e-08 4.77e-04 4.52e-02 ...
# $ z.6 : num 8.42e-16 2.60e-10 3.42e-06 1.90e-03 4.48e-02 ...
# $ z.7 : num 3.51e-12 3.79e-08 4.02e-05 4.18e-03 4.26e-02 ...
# $ z.8 : num 7.59e-10 9.31e-07 1.93e-04 6.75e-03 3.99e-02 ...
# $ z.9 : num 2.95e-08 8.13e-06 5.50e-04 9.13e-03 3.72e-02 ...
# $ z.10: num 3.96e-07 3.75e-05 1.14e-03 1.11e-02 3.46e-02 ...
# $ z.11: num 2.66e-06 1.14e-04 1.92e-03 1.26e-02 3.22e-02 ...
# $ z.12: num 1.12e-05 2.63e-04 2.81e-03 1.37e-02 3.01e-02 ...
# $ z.13: num 3.36e-05 4.97e-04 3.74e-03 1.44e-02 2.82e-02 ...
# $ z.14: num 7.98e-05 8.14e-04 4.64e-03 1.48e-02 2.65e-02 ...
# $ z.15: num 0.000159 0.001201 0.005477 0.015058 0.024967 ...
# $ z.16: num 0.000277 0.001639 0.006217 0.015123 0.023586 ...
# $ col : num 1 1 1 1 1 1 1 1 1 1 ...
the problem becomes evident: sapply() returns a matrix in f, so it has to be converted to a vector to create the 'z' column
df0 <- data.frame(x=rep(x,length(s)),
y=rep(s,each=length(x)),
z=c(f), ######
col=rep(seq(1,31,2),each=length(x)))
str(df0)
#'data.frame': 160 obs. of 4 variables:
#$ x : num -48 -37.33 -26.67 -16 -5.33 ...
#$ y : int 1 1 1 1 1 1 1 1 1 1 ...
#$ z : num 0.00 8.83e-304 1.53e-155 1.03e-56 2.66e-07 ...
#$ col: num 1 1 1 1 1 1 1 1 1 1 ...
The other error mentioned is about group_by, which is a dplyr function. Once we load
library(dplyr)
that error message goes away as well.
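The reason c(f) lines up with the x and y columns can be checked directly. The sketch below reuses the data from the question and leaves plotly out; it relies on the fact that c() flattens a matrix column by column, which matches rep(s, each = length(x)).

```r
# Data from the question
a <- 0; s <- 1:16
x <- seq(a - 3 * max(s), a + 3 * max(s), length.out = 10)
f <- sapply(s, function(ss) dnorm(x, a, ss))

dim(f)  # 10 rows (one per x value), 16 columns (one per value of s)

# c() stacks the columns of f one after another, so the first 10 values
# belong to s = 1, the next 10 to s = 2, and so on
df0 <- data.frame(x = rep(x, length(s)),
                  y = rep(s, each = length(x)),
                  z = c(f))

# Spot check: the z values for y == 3 are the density with sd = 3
isTRUE(all.equal(df0$z[df0$y == 3], dnorm(x, a, 3)))
```

Had the matrix been flattened row by row instead, for example with c(t(f)), the z values would no longer match their x and y coordinates.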
I have a list of dataframes as follows (dput is way too big even with head=1 so I've had to do a mockup here with str(df_list))
$ OC_AH_026C :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 45.183 111.038 162.785 -0.712 83.473 ...
$ OC_AH_026C.1:'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 69.6 125.1 156.4 12.8 97.4 ...
$ OC_AH_026T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 13 12.5 103.1 56.7 145.4 ...
$ OC_AH_058T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 87.114 118.963 184.31 -0.173 171.733 ...
$ OC_AH_084T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 29.111 103.142 57.476 -0.712 50.156 ...
$ OC_AH_086T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 49.8 81 111.5 47 98.8 ...
$ OC_AH_088T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 117 152 224 121 196 ...
$ OC_AH_096T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 49.5 102.8 93.6 15.2 103.2 ...
I am trying to flag the significant values in the third column (Means) of each dataframe, after grouping Means into bins with dplyr. A value that is significantly elevated should get a 1, a significantly depressed one a -1, and anything else a 0, stored in a new column of each dataframe.
To do the grouping I have done as follows which works fine:
CLL <- function (col) {
col <- col %>%
group_by(chr, binnum = (leftPos) %/% 500000) %>%
summarise(Means = mean(Means)) %>%
mutate(leftPos = (binnum+1) * 120000) %>%
select(leftPos, Means)}
CML<-lapply(df_list, CLL)
I am stuck on then calculating the upper and lower limits for each Means column in each dataframe. I think this is because I do not know how to reference this column because it is in a list of dataframes. For a non list dataframe I use:
UL = median(col2, na.rm = TRUE) + alpha*IQR(col2, na.rm = TRUE)
LL = median(col2, na.rm = TRUE) - alpha*IQR(col2, na.rm = TRUE)
I have tried to reference the third column of each dataframe as follows:
tre<-lapply(CML, "[[", 3)
but of course this extracts the third column and puts it in 'tre' whereas I want to alter the dataframes in the list so that the third column has its relationship with the other two columns maintained.
So.....
a) How do I reference the Means column and get the upper and lower limit of each dataframe and then
b) on the basis of whether the row in the Means column of each dataframe are >upper limit or
This is what you can do, which is similar to @Roland's answer.
Say that you have data that looks like this (a simplified version of the data you showed):
df_list <- list(OC_AH_026C = data.frame(chr = 1,
leftPos= c(73, 81, 41, 44),
Means = c(111, 111, 162, -0.7)),
OC_AH_026C.1 = data.frame(chr = 1,
leftPos = c(73, 81, 41, 44),
Means = c(69, 125, 156, 12)))
You can use lapply to "loop" through the elements of the list like this. The function calculates the UL and LL of the column named by col2 (here "Means") and adds a column (res) that indicates whether the Means value lies below LL (-1), above UL (1) or inside the interval (0):
df_list2 <- lapply(df_list, function(df, alpha, col2) {
# perform all your calculations here
df$LL <- median(df[, col2], na.rm = TRUE) - alpha*IQR(df[, col2], na.rm = TRUE)
df$UL <- median(df[, col2], na.rm = TRUE) + alpha*IQR(df[, col2], na.rm = TRUE)
# -1 if Means < LL,
# 1 if Means > UL
# 0 otherwise, nest the operators
# if you wish to calculate more complex conditions
df$res <- 0 + ((df$Means < df$LL)*(-1)) + ((df$Means > df$UL)*1)
return(df)
}, alpha = 0.95, col2 = "Means")
df_list2
# $OC_AH_026C
# chr leftPos Means LL UL res
# 1 1 73 111.0 72.35875 149.6412 0
# 2 1 81 111.0 72.35875 149.6412 0
# 3 1 41 162.0 72.35875 149.6412 1
# 4 1 44 -0.7 72.35875 149.6412 -1
#
# $OC_AH_026C.1
# chr leftPos Means LL UL res
# 1 1 73 69 22.9 171.1 0
# 2 1 81 125 22.9 171.1 0
# 3 1 41 156 22.9 171.1 0
# 4 1 44 12 22.9 171.1 -1
(I hope I got your question right of what you need, otherwise let me know and I will correct the answer).
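The flagging rule itself can be isolated in a small helper, which makes it easy to check against the output shown above. The function name flag_means is made up for this sketch; the input is the Means column of the first example data.frame.

```r
# Hypothetical helper: -1 below the lower limit, 1 above the upper limit, 0 otherwise
flag_means <- function(means, alpha = 0.95) {
  centre <- median(means, na.rm = TRUE)
  spread <- alpha * IQR(means, na.rm = TRUE)
  as.integer(means > centre + spread) - as.integer(means < centre - spread)
}

# Means column of OC_AH_026C from the example data
flags <- flag_means(c(111, 111, 162, -0.7))
flags
# [1]  0  0  1 -1
```

With a helper like this, the list version reduces to something like lapply(df_list, function(df) { df$res <- flag_means(df$Means); df }).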
data.table way
For the sake of completeness, I include a data.table way, which is faster (but gets rid of the list structure). The approach looks like this:
library(data.table)
library(magrittr) # for some piping
# combine all listed data.frames to one data.table with another column, which indicates the name
dt <- lapply(1:length(df_list), function(i) {
nam <- names(df_list)[i]
df <- df_list[[i]]
tmpdt <- data.table(name = nam, df)
}) %>% rbindlist
# calculate the limits
alpha = 0.95
dt[, LL := median(Means, na.rm = T) - alpha*IQR(Means, na.rm = T), by = name]
dt[, UL := median(Means, na.rm = T) + alpha*IQR(Means, na.rm = T), by = name]
dt[, res := 0 + ((Means < LL)*(-1)) + ((Means > UL)*1)]
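For readers without data.table, the same per-group limits can be sketched in base R with ave(), which returns a group statistic repeated for every row of its group. The example reuses the small df_list from above; the names long, centre and spread are introduced here only for illustration.

```r
# Example data from above
df_list <- list(
  OC_AH_026C   = data.frame(chr = 1, leftPos = c(73, 81, 41, 44),
                            Means = c(111, 111, 162, -0.7)),
  OC_AH_026C.1 = data.frame(chr = 1, leftPos = c(73, 81, 41, 44),
                            Means = c(69, 125, 156, 12))
)
alpha <- 0.95

# Stack the data.frames and record which list element each row came from
long <- do.call(rbind,
                Map(function(d, nm) cbind(name = nm, d),
                    df_list, names(df_list)))

# ave() computes the statistic within each group and aligns it with the rows
centre <- ave(long$Means, long$name, FUN = median)
spread <- alpha * ave(long$Means, long$name, FUN = IQR)

long$LL  <- centre - spread
long$UL  <- centre + spread
long$res <- (long$Means > long$UL) - (long$Means < long$LL)
long
```

This will be slower than data.table on large inputs, but it has no dependencies, and split(long, long$name) would restore the list structure if needed.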
I am looking for the right R idiom to run a function over a set of parameters and create a long data frame from the results. Imagine that you have the following toy function:
fun <- function(sd, mean, foobar = "foobar") {
list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
Now you want to run fun over different values of sd and mean:
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
I want to run fun for the parameters in each row of pars, and collect the results in a data frame with columns sd, mean, pos, value. Here is a rather clumsy solution:
set.seed(42)
## Run fun
res <- lapply(seq_len(nrow(pars)), function(x) {
do.call(fun, as.list(pars[x, ]))
})
## Select the result we need
res <- lapply(res, "[[", "random")
## Make it a single data frame
res <- do.call(rbind, res)
## Together with the parameters
res <- as.data.frame(cbind(sd = par_sd, mean = par_mean, res))
colnames(res) <- c("sd", "mean", 1:10)
## Make it a long data frame
res <- reshape2::melt(res, id.vars=c("sd", "mean"),
variable.name = "pos", value.name="value")
## Done
res[1:5,]
#> sd mean pos value
#> 1 1 0 1 2.37095845
#> 2 2 0 1 3.60973931
#> 3 3 0 1 0.08008422
#> 4 4 0 1 2.82180049
#> 5 5 0 1 2.02999300
Is there a simpler way to do this? Does anyone know a package that does things like this? My quick search did not give any good results...
If you're willing to amend fun() to return a data.frame, I find the most elegant solution is plyr's mdply.
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
library(plyr)
results = mdply(pars, fun, foobar = "stuff")
str(results)
mapply would seem a good fit:
> str(with(pars, mapply(fun, sd=sd, mean=mean) ) )
List of 30
$ : num [1:10] 3.16 2.28 2.84 1.49 3.43 ...
$ : chr "foobar"
$ : num [1:10] 3.429 0.157 0.583 1.542 6.485 ...
$ : chr "foobar"
$ : num [1:10] -4.56 -1.51 -1.33 7.16 3.21 ...
$ : chr "foobar"
$ : num [1:10] -2.275 2.225 4.196 0.962 15.739 ...
$ : chr "foobar"
$ : num [1:10] 6.23 10.08 2.85 6.81 4.51 ...
$ : chr "foobar"
$ : num [1:10] 1.65 3.15 5.62 5.91 6.14 ...
$ : chr "foobar"
$ : num [1:10] 4.26 1.95 7.33 2.72 6.29 ...
$ : chr "foobar"
$ : num [1:10] 7.53 6.74 3.6 6.43 3.08 ...
$ : chr "foobar"
$ : num [1:10] -0.4181 -0.0584 5.5812 1.038 8.2482 ...
$ : chr "foobar"
$ : num [1:10] 0.2377 4.8557 5.2177 -0.0706 2.0434 ...
$ : chr "foobar"
$ : num [1:10] 2.95 4.3 5.26 8.58 5.81 ...
$ : chr "foobar"
$ : num [1:10] -0.85 4.83 8.19 5.17 6.58 ...
$ : chr "foobar"
$ : num [1:10] 3.59 11.46 6.29 6.57 2.97 ...
$ : chr "foobar"
$ : num [1:10] 0.117 3.142 10.473 10.196 5.56 ...
$ : chr "foobar"
$ : num [1:10] 13.03 2.64 -1.07 5.29 1.97 ...
$ : chr "foobar"
- attr(*, "dim")= int [1:2] 2 15
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "random" "foobar"
..$ : NULL
By default mapply will attempt to simplify the result. If you want to keep the results as separate objects (here with fun amended to return a data.frame), switch that default off:
> str(with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) )
List of 15
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 1.08 0.68 3.16 3.38 5.96 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 0.0927 5.1506 -1.0109 2.7136 2.1263 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] -0.331 2.9 -1.705 5.471 4.712 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
snipped
And if you need them in one stacked dataframe, it's just:
> str(do.call( rbind, with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
'data.frame': 150 obs. of 2 variables:
$ random: num 1 3.34 2.5 4.72 4.25 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
If you want these "labeled" with the sd and mean values, just modify the constructor function like this:
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10,
sd=sd, mean=mean, foobar = foobar)
}
str(do.call( rbind, with(pars,
mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
#---------------
'data.frame': 150 obs. of 4 variables:
$ random: num 1.42 1.13 3.73 4.5 5.63 ...
$ sd : int 1 1 1 1 1 1 1 1 1 1 ...
$ mean : int 0 0 0 0 0 0 0 0 0 0 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
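To get exactly the columns asked for in the question (sd, mean, pos, value) without changing fun(), one possible base R sketch is to build a small data.frame per parameter row with Map() and stack the pieces. The names pieces and res are arbitrary; fun and pars are copied from the question.

```r
# Toy function and parameters from the question (fun is left unchanged)
fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

set.seed(42)

# One long-format piece per parameter row: 10 positions and their values
pieces <- Map(function(sd, mean) {
  data.frame(sd = sd, mean = mean, pos = 1:10,
             value = fun(sd, mean)$random)
}, pars$sd, pars$mean)

# Stack the 15 pieces into a single long data frame
res <- do.call(rbind, pieces)
str(res)
```

The rows come out grouped by parameter set rather than by pos, so the row order differs from the melt() result in the question, although the content is organised in the same long format.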