I have 4 columns and 34 rows of data. I tried to export it to Excel in xlsx format using write.xlsx, but the resulting Excel file shows only one entry.
library(openxlsx)
data = scale(DATA2)
write.xlsx(data, "outpu2t.xlsx");
This is my data
and this is the output
The key consideration here is that the output of the scale() function is an object of type matrix(), whereas write.xlsx() requires an input of type data.frame(). The following code creates a data frame, uses scale() to scale it, and prints the structure to show that the data frame has been converted to a matrix().
df <- data.frame(matrix(runif(4 * 34),ncol=4))
str(df)
> df <- data.frame(matrix(runif(4 * 34),ncol=4))
> str(df)
'data.frame': 34 obs. of 4 variables:
$ X1: num 0.438 0.134 0.671 0.392 0.613 ...
$ X2: num 0.9 0.793 0.668 0.351 0.275 ...
$ X3: num 0.201 0.892 0.74 0.788 0.14 ...
$ X4: num 0.996 0.619 0.492 0.904 0.615 ...
scaledData <- scale(df)
str(scaledData)
> scaledData <- scale(df)
> str(scaledData)
num [1:34, 1:4] -0.174 -1.386 0.752 -0.36 0.521 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:center")= Named num [1:4] 0.482 0.591 0.508 0.471
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:scale")= Named num [1:4] 0.251 0.206 0.306 0.29
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
We can solve the problem by casting the output of scale() with data.frame().
The following code generates a 34 x 4 matrix, scales it, and casts to a data.frame() as part of write.xlsx().
aMatrix <- matrix(runif(4 * 34),ncol=4)
library(openxlsx)
write.xlsx(data.frame(scale(aMatrix)),"./data/aSpreadsheet.xlsx")
The resulting spreadsheet looks like this when viewed in Microsoft Excel.
Note that writexl::write_xlsx() will also fail when passed an input of type matrix(), so this is not a tidyverse vs. openxlsx problem.
b <- scale(aMatrix)
write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
...generates the following error:
> write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
Error in write_xlsx(b, "./data/aSpreadsheetWritexl.xlsx") :
Argument x must be a data frame or list of data frames
I am aware that your question is about the package {openxlsx}. I used it a while back as well and ran into multiple problems. I am biased towards the {tidyverse} family, and there is a handy package from that part of the R/RStudio ecosystem: {writexl}.
If not yet installed: install.packages("writexl")
Then run the following; it does not require installing any other dependencies:
library(writexl)
# create a reproducible data set of 34 rows
my_data <- iris[1:34,]
# write-out my_data to the data subfolder in the project - configure as appropriate for your environment
write_xlsx(x = my_data, path = "./data/my_data.xlsx")
This produces the following without problems:
The solution is to convert matrix data into a data frame.
data <- as.data.frame(data)
Then,
write.xlsx(data, "outpu2t.xlsx")
will work as expected.
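As a quick check of the conversion described above, here is a minimal base R sketch. The DATA2 object is a made-up stand-in for the asker's 34 x 4 data set, and the variable names are only for illustration. It shows that scale() returns a matrix, and that as.data.frame() keeps the column names but not the "scaled:center" and "scaled:scale" attributes, so save those first if you need them later.

```r
# Hypothetical stand-in for the asker's data: 34 rows, 4 columns
set.seed(1)
DATA2 <- data.frame(matrix(runif(34 * 4), ncol = 4))

scaled <- scale(DATA2)
is.matrix(scaled)        # scale() returns a matrix, not a data frame

# Save the centring and scaling constants before converting,
# because as.data.frame() does not carry these attributes over
centers <- attr(scaled, "scaled:center")
scales  <- attr(scaled, "scaled:scale")

scaled_df <- as.data.frame(scaled)
is.data.frame(scaled_df)
dim(scaled_df)           # still 34 rows and 4 columns
names(scaled_df)         # column names are preserved
```

The resulting scaled_df can then be passed to write.xlsx() or write_xlsx() as in the answers above.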
I'm trying to compute a phenotypic covariance matrix between a fatty acid dataset and a phylogenetic tree using the Rphylopars package.
I'm able to load the data set and phylogeny; however, when I attempt to run the test I get the error message
Error in class(tree) <- "phylo" : attempt to set an attribute on NULL
This is the code for the test
phy <- read.tree("combined_trees.txt")
plot(phy)
phy$tip.label
FA_data <- read.csv("fatty_acid_example_data.csv", header = TRUE, na.strings = ".")
head(FA_data)
str(FA_data)
PPE <- phylopars(trait_data = FA_data$fatty1_continuous, tree = FA_data$phy)
Not sure what other info will help figure out the issue. The data set and phylogeny loaded without an error.
In the tutorial, the tree and trait data are jointly simulated by the simtraits() function, so both end up as elements of a single list. In your case (which will be typical of real-data cases), the tree and the trait data come from different sources, so most likely you want
PPE <- phylopars(trait_data = FA_data, tree = phy)
provided that FA_data contains a first column named species that matches the tip names in phy, and otherwise only the numeric data you want to use (potentially only the single fatty1_continuous column).
For comparison, the data structure returned by simtraits() looks like this (using str()):
List of 4
$ trait_data:'data.frame': 45 obs. of 5 variables:
..$ species: chr [1:45] "t7" "t8" "t2" "t3" ...
..$ V1 : num [1:45] 1.338 0.308 1.739 2.009 2.903 ...
..$ V2 : num [1:45] -2.002 -0.115 -0.349 -4.452 NA ...
..$ V3 : num [1:45] -1.74 NA 1.09 -2.54 -1.19 ...
..$ V4 : num [1:45] 2.496 2.712 1.198 1.675 -0.117 ...
$ tree :List of 4
..$ edge : int [1:28, 1:2] 29 29 28 28 27 27 26 26 25 25 ...
..$ edge.length: num [1:28] 0.0941 0.0941 0.6233 0.7174 0.0527 ...
..$ Nnode : int 14
..$ tip.label : chr [1:15] "t7" "t8" "t2" "t3" ...
..- attr(*, "class")= chr "phylo"
..- attr(*, "order")= chr "postorder"
...
You can see that simtraits() returns a list containing (among other things) (1) a data frame with species as the first column and the other columns numeric and (2) a phylogenetic tree.
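The error message in the question is consistent with FA_data$phy being NULL, since `$` on a data frame returns NULL for a column that does not exist. The sketch below uses only base R and made-up species names (the data frame contents and tip_labels are assumptions for illustration, not the asker's data). It shows that behaviour, plus one way to check that the species column and the tip labels agree before calling phylopars().

```r
# Hypothetical trait table; column names mirror those in the question
FA_data <- data.frame(
  species = c("sp_a", "sp_b", "sp_c"),
  fatty1_continuous = c(0.12, 0.40, 0.33)
)

# There is no column called "phy", so this is NULL
is.null(FA_data$phy)

# Stand-in for phy$tip.label from the tree read with read.tree()
tip_labels <- c("sp_a", "sp_b", "sp_d")

# Species present in one source but not the other
missing_from_tree <- setdiff(FA_data$species, tip_labels)
missing_from_data <- setdiff(tip_labels, FA_data$species)
missing_from_tree
missing_from_data
```

If either result is non-empty, the names probably need reconciling before the tree and the trait data are used together.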
I would like to use rollapply or rollapplyr to apply the modwt function to my time series data.
I'm familiar with how rollapply/r works but I need some help setting up the output so that I can correctly store my results when using rollapply.
The modwt function in the waveslim package takes a time series and decomposes it into J levels. For my particular problem J = 4, which means I get 4 sets of coefficients from my single time series, stored in a list of 5. Of this list I am only concerned with d1, d2, d3 and d4.
The output of the modwt function looks as follows
> str(ar1.modwt)
List of 5
$ d1: num [1:200] -0.223 -0.12 0.438 -0.275 0.21 ...
$ d2: num [1:200] 0.1848 -0.4699 -1.183 -0.9698 -0.0937 ...
$ d3: num [1:200] 0.5912 0.6997 0.5416 0.0742 -0.4989 ...
$ d4: num [1:200] 1.78 1.86 1.85 1.78 1.65 ...
$ s4: num [1:200] 4.64 4.42 4.19 3.94 3.71 ...
- attr(*, "class")= chr "modwt"
- attr(*, "wavelet")= chr "la8"
- attr(*, "boundary")= chr "periodic"
In the example above I have applied the modwt function to the full length time series of length 200 but I wish to apply it to a small rolling window of 30 using rollapply.
I have already tried the following but the output is a large matrix and I cannot easily identify which values belong to d1,d2,d3 or d4
roller <- rollapplyr(ar1, 30,FUN=modwt,wf="la8",n.levels=4,boundary="periodic")
The output of this is a large matrix with the following structure:
> str(roller)
List of 855
$ : num [1:30] 0.117 -0.138 0.199 -1.267 1.872 ...
$ : num [1:30] -0.171 0.453 -0.504 -0.189 0.849 ...
$ : num [1:30] 0.438 -0.3868 0.1618 -0.0973 -0.0247 ...
$ : num [1:30] -0.418 0.407 0.639 -2.013 1.349 ...
...lots of rows omitted...
$ : num [1:30] 0.307 -0.658 -0.105 1.128 -0.978 ...
[list output truncated]
- attr(*, "dim")= int [1:2] 171 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "d1" "d2" "d3" "d4" ...
How can I set up a variable such that it will store the (200-30)+1 lists with lists within this for each of the scales d1,d2,d3 and d4?
For a reproducible example please use the following:
library(waveslim)
data(ar1)
ar1.modwt <- modwt(ar1, "la8", 4)
1) Define modwt2, which invokes modwt, takes the first 4 components and strings them out into a numeric vector. Then use rollapplyr (from the zoo package) with it, giving rollr, where each row of rollr is the result of one call to modwt2. Finally, reshape each row of rollr into a separate matrix and create a list, L, of those matrices:
library(zoo)  # provides rollapplyr
modwt2 <- function(...) unlist(head(modwt(...), 4))
rollr <- rollapplyr(ar1, 30, FUN = modwt2, wf = "la8", n.levels = 4, boundary = "periodic")
L <- lapply(seq_len(nrow(rollr)), function(i) matrix(rollr[i, ], , 4))
If a 30 x 4 x 171 array is desired then the following will simplify it into a 3d array:
simplify2array(L)
or as a list of lists:
lapply(L, function(x) as.list(as.data.frame(x)))
2) This is an alternate solution that just uses lapply directly and returns a list each of whose components is the list consisting of d1, d2, d3 and d4.
lapply(1:(200-30+1), function(i, ...) head(modwt(ar1[seq(i, length = 30)], ...), 4),
wf = "la8", n.levels = 4, boundary = "periodic")
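To make the shape of that list-of-lists result concrete, here is a self-contained sketch of the same bookkeeping in base R. The decompose() function is a made-up stand-in for modwt() (it only mimics "returns a named list of equal-length vectors"), and series stands in for ar1, so none of the numbers mean anything as wavelet coefficients.

```r
# Hypothetical stand-in for modwt(): a named list of equal-length vectors
decompose <- function(x) {
  list(d1 = x - mean(x), d2 = cumsum(x), s = rep(mean(x), length(x)))
}

set.seed(42)
series <- rnorm(200)   # stand-in for ar1
width  <- 30
starts <- seq_len(length(series) - width + 1)

# One list element per window, each holding only the components of interest
windows <- lapply(starts, function(i) {
  res <- decompose(series[seq(i, length.out = width)])
  res[c("d1", "d2")]
})
names(windows) <- paste0("w", starts)

length(windows)            # number of windows: 200 - 30 + 1
names(windows[[1]])        # components kept for each window
length(windows$w171$d2)    # each component has one value per window position
```

Naming the outer list by window start makes it easy to find a given window again, e.g. windows$w5$d1.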
I'm splitting a dataframe in multiple dataframes using the command
data <- apply(data, 2, function(x) data.frame(sort(x, decreasing=F)))
I don't know how to access them all at once. I know I can access each one with the $ operator, but then I have to do that for every data frame:
df1<- head(data$`1`,k)
df2<- head(data$`2`,k)
Can I get these data frames in one go (for example by storing them in some structure)? The indexes of these data frames shouldn't change.
str(data) gives
List of 2
$ 7:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.265 0.332 0.458 0.51 0.52 ...
$ 8:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.173 0.224 0.412 0.424 0.5 ...
str(data[1:2])
List of 2
$ 7:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.265 0.332 0.458 0.51 0.52 ...
$ 8:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.173 0.224 0.412 0.424 0.5 ...
Thanks to @r2evans I got it done; here is his code from the comments:
Yes. Two short demos: lapply(data, head, n=2), or more generically
sapply(data, function(df) mean(df$x)). – r2evans
and after that fetching the indexes
df<-lapply(df, rownames)
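Putting the pieces together, a small reproducible sketch might look like the following. The matrix m and its row and column names are made up for illustration, and the sorted column is given the explicit name value to avoid the auto-generated sort.x..decreasing...F. name. The point is that lapply() handles every data frame in the list in one go and that the row names (the original indexes) survive both sort() and head().

```r
# Hypothetical input: 3 rows, 2 columns, with row names acting as indexes
m <- matrix(c(0.5, 0.1, 0.9,
              0.3, 0.8, 0.2),
            ncol = 2,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b")))

# One sorted single-column data frame per column of m
sorted <- apply(m, 2, function(x) data.frame(value = sort(x)))

k <- 2
top <- lapply(sorted, head, n = k)   # first k rows of every data frame
idx <- lapply(top, rownames)         # original row names of those rows

top$a
idx
```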
I've created this data frame and want to access the individual elements for plotting, but it seems I can't. What kind of data frame have I created, and how can I access its individual elements?
> print(df)
B.mean B.conf1 B.conf2
1 0.75000000 -0.18826132 1.68826132
2 0.66666667 0.01334534 1.31998799
3 0.33333333 -0.31998799 0.98665466
> names(df)
[1] "B"
> str(df)
'data.frame': 3 obs. of 1 variable:
$ B: num [1:3, 1:3] 0.75 0.6667 0.3333 -0.1883 0.0133 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mean" "conf1" "conf2"
The 'B' column is a matrix as evident from the str of 'df'. By using do.call with data.frame, it gets converted to 3 columns of a data.frame.
do.call(data.frame, df)
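For anyone who wants to reproduce this, here is a minimal sketch with rounded, made-up values. The way the matrix column is created (assigning a matrix with `$<-`) is an assumption, since the question does not show how df was built. It also shows that the individual values can be reached through matrix indexing without flattening.

```r
# Hypothetical 3 x 3 matrix with named columns
m <- cbind(mean  = c(0.75, 0.67, 0.33),
           conf1 = c(-0.19, 0.01, -0.32),
           conf2 = c(1.69, 1.32, 0.99))

# Build a data frame whose only column, B, is that matrix
df <- data.frame(id = 1:3)
df$B <- m
df$id <- NULL

names(df)         # a single column called "B"
df$B[, "mean"]    # matrix indexing reaches the individual values

# Flatten the matrix column into three ordinary columns
flat <- do.call(data.frame, df)
names(flat)
flat$B.mean
```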
I would like to create a for-loop in order to generate several tables called S1, S2...
For the moment, this is what I have:
S1<-int[int$IdVar == "X1",1:n]
S2<-int[int$IdVar == "X2",1:n]
S3<-int[int$IdVar == "X3",1:n]
S4<-int[int$IdVar == "X4",1:n]
S5<-int[int$IdVar == "X5",1:n]
S6<-int[int$IdVar == "X6",1:n]
S7<-int[int$IdVar == "X7",1:n]
But the number of levels of the IdVar variable can vary, so I have to add or remove lines, which is not very efficient!
Could you please help me find the best way to create my loop?
I hope I have made this sufficiently clear.
Thank you very much for your help,
We could loop over the values "X1" to "X7" using lapply and subset the 'int' dataset, then change the names of the list elements to S1:S7.
lst <- setNames(lapply(paste0('X', 1:7), function(x)
int[int$IdVar==x, 1:n]), paste0("S", 1:7))
It would be better to keep the datasets within the list rather than creating individual objects in the global environment. If you want to do this, then list2env can be used.
list2env(lst, envir=.GlobalEnv)
Another option is using assign with a for loop.
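A minimal sketch of the assign approach, assuming a small made-up int data frame (the values are placeholders). It takes the levels from the data with unique(), so the loop adapts when IdVar has more or fewer values.

```r
# Hypothetical data: three IdVar values, two rows each
int <- data.frame(IdVar = rep(paste0("X", 1:3), each = 2),
                  val = 1:6,
                  stringsAsFactors = FALSE)
n <- ncol(int)

ids <- unique(int$IdVar)
for (i in seq_along(ids)) {
  # Creates S1, S2, ... in the current environment
  assign(paste0("S", i), int[int$IdVar == ids[i], 1:n])
}

S2
```

This creates one object per level, so keeping the tables in a list is still usually easier to work with afterwards.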
A sillier but nonetheless effective solution:
Get R to automatically write a script to assign the variables and then source that script.
for(s in 1:7){
write(paste0("S", s, "<-int[int$IdVar == \"X", s, "\",1:n]"), "tmp.R", append = TRUE)
}
source("tmp.R")
file.remove("tmp.R")
Such an approach I also find useful for referring back to variables that you've created in an automated manner (as you are doing). You can use assign to create variables, but not to refer back to them afterwards. (If anyone knows another way to do that I'd be interested).
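On referring back to variables by a constructed name: base R has get() for a single name and mget() for several, which may cover this case. A small sketch with made-up S1 and S2 objects:

```r
# Hypothetical objects, standing in for variables created by assign()
S1 <- data.frame(val = 1:2)
S2 <- data.frame(val = 3:4)

nm <- paste0("S", 1:2)

one <- get(nm[1])        # look up a single object by its name
all_tables <- mget(nm)   # look up several; returns a named list

one
names(all_tables)
```

Since mget() returns a list anyway, this is another hint that storing the tables in a list from the start saves a step.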
Consider using a list instead of creating global variables. 'split' will do what you want, and if you don't like the names "X1", you can change them to "S1" easily. This will also handle all the values in 'IdVar' without you having to specify them:
> n <- 100
> int <- data.frame(IdVar = sample(paste0("X", 1:7), n, TRUE)
+ , val = runif(n)
+ , stringsAsFactors = FALSE
+ )
> # split into a list
> S <- split(int, int$IdVar)
>
> str(S)
List of 7
$ X1:'data.frame': 13 obs. of 2 variables:
..$ IdVar: chr [1:13] "X1" "X1" "X1" "X1" ...
..$ val : num [1:13] 0.104 0.515 0.135 0.501 0.94 ...
$ X2:'data.frame': 18 obs. of 2 variables:
..$ IdVar: chr [1:18] "X2" "X2" "X2" "X2" ...
..$ val : num [1:18] 0.7697 0.6108 0.9354 0.2199 0.0235 ...
$ X3:'data.frame': 16 obs. of 2 variables:
..$ IdVar: chr [1:16] "X3" "X3" "X3" "X3" ...
..$ val : num [1:16] 0.758 0.347 0.48 0.781 0.157 ...
$ X4:'data.frame': 11 obs. of 2 variables:
..$ IdVar: chr [1:11] "X4" "X4" "X4" "X4" ...
..$ val : num [1:11] 0.658 0.247 0.515 0.731 0.114 ...
$ X5:'data.frame': 15 obs. of 2 variables:
..$ IdVar: chr [1:15] "X5" "X5" "X5" "X5" ...
..$ val : num [1:15] 0.502 0.71 0.394 0.738 0.147 ...
$ X6:'data.frame': 14 obs. of 2 variables:
..$ IdVar: chr [1:14] "X6" "X6" "X6" "X6" ...
..$ val : num [1:14] 0.687 0.625 0.705 0.468 0.382 ...
$ X7:'data.frame': 13 obs. of 2 variables:
..$ IdVar: chr [1:13] "X7" "X7" "X7" "X7" ...
..$ val : num [1:13] 0.903 0.466 0.558 0.799 0.527 ...
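The renaming mentioned above can be done with sub() on the list names. A short sketch, using a deterministic made-up int (rep() instead of sample(), so the group sizes are predictable):

```r
# Hypothetical data: 100 rows cycling through X1..X7
int <- data.frame(IdVar = rep(paste0("X", 1:7), length.out = 100),
                  val = seq_len(100),
                  stringsAsFactors = FALSE)

S <- split(int, int$IdVar)

# Rename the list elements from X1..X7 to S1..S7
names(S) <- sub("^X", "S", names(S))

names(S)
sapply(S, nrow)        # rows per group
head(S$S1, 2)          # individual tables are reached by name
```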