I am trying to write a table from R into Excel. Here is some sample code:
library(XLConnect)
wb <- loadWorkbook("C:\\Users\\Bob\\Desktop\\Example.xls", create=TRUE)
output <- as.table(output)
createSheet(wb, name="Worksheet 1")
writeWorksheet(wb, output, sheet="Worksheet 1")
saveWorkbook(wb)
But it seems that the writeWorksheet function converts the table into a data frame. This makes the data look messy and unformatted. I want the table structure to be preserved. How would I modify the above code?
The issue here is that writeWorksheet converts the table object to a data frame. That conversion essentially "melts" the table into long format, with one row per combination of levels, whereas a table object is typically printed to the console in "wide" format.
It is a bit of a nuisance, but you generally have to manually convert the table into a data frame that matches the format you're after. An example:
library(reshape2)
tbl <- with(mtcars,table(cyl,gear))
> tbl
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2
> as.data.frame(tbl)
  cyl gear Freq
1   4    3    1
2   6    3    2
3   8    3   12
4   4    4    8
5   6    4    4
6   8    4    0
7   4    5    2
8   6    5    1
9   8    5    2
> tbl_df <- as.data.frame(tbl)
> final <- dcast(tbl_df,cyl~gear,value.var = "Freq")
> final
  cyl  3 4 5
1   4  1 8 2
2   6  2 4 1
3   8 12 0 2
> class(final)
[1] "data.frame"
Then you should be able to write that data frame to the Excel worksheet with no problem.
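If you would rather not depend on reshape2, here is a minimal sketch of a base R alternative, assuming a two-way table like the one above. The names wide and the extra cyl column are only for illustration. As far as I know, writeWorksheet does not write row names by default, which is why they are moved into a regular column here.

```r
# Build the same two-way table as in the example above
tbl <- with(mtcars, table(cyl, gear))

# as.data.frame.matrix() keeps the wide layout (one column per gear value)
wide <- as.data.frame.matrix(tbl)

# Move the row names into a regular column so they survive the export
wide <- cbind(cyl = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The resulting wide data frame should be usable in place of final when calling writeWorksheet.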
I have a table like this
table(mtcars$gear, mtcars$cyl)
I want to order the rows so that those with the most observations in the 4-cylinder column come first. E.g.
     4  6  8
  4  8  4  0
  5  2  1  2
  3  1  2 12
I have been playing with order/sort/rank without much success. How could I order tables output?
We can convert the table to a data frame and then order by the column.
sort_col <- "4"
tab <- as.data.frame.matrix(table(mtcars$gear, mtcars$cyl))
# use [[ ]] so that order() receives a plain vector rather than a one-column data frame
tab[order(-tab[[sort_col]]), ]
# OR tab[order(tab[[sort_col]], decreasing = TRUE), ]
#   4 6  8
# 4 8 4  0
# 5 2 1  2
# 3 1 2 12
If we don't want to convert it to a data frame and want to keep the table structure, we can do
tab <- table(mtcars$gear, mtcars$cyl)
tab[order(-tab[,dimnames(tab)[[2]] == sort_col]),]
#      4  6  8
#   4  8  4  0
#   5  2  1  2
#   3  1  2 12
You could try this. Use sort on the relevant column, specifying decreasing=TRUE; take the names of the sorted rows and subset using those.
table(mtcars$gear, mtcars$cyl)[names(sort(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE)), ]
     4  6  8
  4  8  4  0
  5  2  1  2
  3  1  2 12
In the same vein as Milan's answer, but using the order() function instead of looking up the names() of a sort()-ed vector.
The [,1] selects the first column as the ordering key.
table(mtcars$gear, mtcars$cyl)[order(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE),]
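As a small sketch of the same idea, the table can be computed once and the sort column picked by name instead of by position. The variable names tab and sorted are just illustrative.

```r
# Compute the table once instead of twice
tab <- table(mtcars$gear, mtcars$cyl)

# Order the rows by the "4" cylinder column, largest count first
sorted <- tab[order(tab[, "4"], decreasing = TRUE), ]
sorted
```

Selecting the column by name keeps working if the column order of the table changes.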
Let's say I have a data frame with the following structure:
> DF <- data.frame(x=1:5, y=6:10)
> DF
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
I need to build a new data frame with overlapping observations from the first data frame, to be used as input for building the A matrix for the Rglpk optimization library. I would use observation windows of length n, so that if n=2 the resulting data frame would join rows 1&2, 2&3, 3&4, and so on. The number of rows in the resulting data frame would be
(numberOfObservations-windowSize+1)*windowSize
The result for this example with windowSize=2 would be a structure like
  x  y
1 1  6
2 2  7
3 2  7
4 3  8
5 3  8
6 4  9
7 4  9
8 5 10
I could do a loop like
windowSize <- 2
DFResult <- NULL
numBlocks <- nrow(DF)-windowSize+1
for (i in 1:numBlocks) {
  DFResult <- rbind(DFResult, DF[i:(i+windowSize-1), ])
}
But this seems very inefficient, especially for very large data frames.
I also tried
rollapply(data=DF, width=windowSize, FUN=function(x) x, by.column=FALSE, by=1)
     x y
[1,] 1 6
[2,] 2 7
[3,] 2 7
[4,] 3 8
where I was trying to repeat a block of rows without applying any aggregate function. This does not work, since some rows are missing.
I am a bit stumped by this and have looked around for similar problems but could not find any. Does anyone have any better ideas?
We could use a vectorized approach (shown here for windowSize = 2, i.e. pairs of adjacent rows)
i1 <- seq_len(nrow(DF))
res <- DF[c(rbind(i1[-length(i1)], i1[-1])),]
row.names(res) <- NULL
res
#  x  y
#1 1  6
#2 2  7
#3 2  7
#4 3  8
#5 3  8
#6 4  9
#7 4  9
#8 5 10
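For window sizes other than 2, one possible generalization is to build the row indices with outer(). This is only a sketch, using the DF from the question; windowSize = 3 and the names numBlocks, idx and res are chosen for illustration.

```r
DF <- data.frame(x = 1:5, y = 6:10)
windowSize <- 3
numBlocks <- nrow(DF) - windowSize + 1

# Each column of the outer() matrix holds the row indices of one window;
# c() flattens it column by column, so windows stay contiguous
idx <- c(outer(seq_len(windowSize) - 1, seq_len(numBlocks), `+`))

res <- DF[idx, ]
row.names(res) <- NULL
res
```

The result has (nrow(DF) - windowSize + 1) * windowSize rows, matching the formula in the question.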
I am very new to R and still getting my head around it, so my question may be very basic, but please help me out!
I have a large data frame, with more than 400000 rows.
GENE_ID  p1  p2  p3  ...
41       1   2   3
41       4   5   6
41       7   8   9
85       1   2   3
1923     1   2   3
1923     4   5   6
First, I wanted to simply use GENE_ID as the row names, but that failed because some gene IDs are not unique.
Now I am thinking of turning this data frame into a list in which each element contains the expression levels of one gene.
So what I want is a list that looks something like this:
mylist$`41`
[1] 1 2 3 4 5 6 7 8 9
mylist$`85`
[1] 1 2 3
mylist$`1923`
[1] 1 2 3 4 5 6
Any advice to achieve this would be greatly appreciated.
We can melt by 'GENE_ID' and then split the values to get a list of vectors. Note that within each gene the values come out column by column.
library(reshape2)
mylist <- melt(df1, id.var = 'GENE_ID')
split(mylist$value, mylist$GENE_ID)
#$`41`
#[1] 1 4 7 2 5 8 3 6 9
#$`85`
#[1] 1 2 3
#$`1923`
#[1] 1 4 2 5 3 6
Also, we can do this in base R
v1 <- unlist(df1[-1], use.names = FALSE)
grp <- rep(df1[,1], ncol(df1[-1]))
split(v1, grp)
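If the values need to come out row by row within each gene, as in the output the question asks for, one option is to split the data frame first and then flatten each piece with t(). This is a sketch; df1 is reconstructed here from the rows shown in the question, and by_gene is an illustrative name.

```r
# Sample data reconstructed from the rows shown in the question
df1 <- data.frame(GENE_ID = c(41, 41, 41, 85, 1923, 1923),
                  p1 = c(1, 4, 7, 1, 1, 4),
                  p2 = c(2, 5, 8, 2, 2, 5),
                  p3 = c(3, 6, 9, 3, 3, 6))

# Split the value columns into one small data frame per gene
by_gene <- split(df1[-1], df1$GENE_ID)

# Transposing before flattening reads each piece row by row
mylist <- lapply(by_gene, function(d) as.vector(t(d)))
mylist$`41`
```

Because the list names are numbers, elements have to be accessed with backticks or with mylist[["41"]].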
In R, I need to create a vector from a given data frame, row-wise. For example:
data frame:
A B
1 4
2 5
3 6
vector = 1 4 2 5 3 6
This shouldn't be too hard, I think. Thanks in advance.
> df <- data.frame(A=1:3,B=4:6)
> as.vector(t(df))
[1] 1 4 2 5 3 6
I have a dataset stored in a text file in the format of bins of values followed by counts, like this:
var_a 1:5 5:12 7:9 9:14 ...
indicating that var_a took on the value 1 5 times in the dataset, 5 12 times, etc. Each variable is on its own line in that format.
I'd like to be able to perform calculations on this dataset in R, like quantiles, variance, and so on. Is there an easy way to load the data from the file and calculate these statistics? Ultimately I'd like to make a box-and-whisker plot for each variable.
Cheers!
You could use readLines to read in the data file
.x <- readLines(datafile)
I will create some dummy data, as I don't have the file. This should be the equivalent of the output of readLines
## dummy
.x <- c("var_a 1:5 5:12 7:9 9:14", 'var_b 1:5 2:12 3:9 4:14')
I split each line on spaces to separate the variable name from its value:count pairs
#split by space
space_split <- strsplit(.x, ' ')
# get the variable names (first in each list), as a character vector
variable_names <- sapply(space_split,'[[',1)
# get the variable contents (everything but the first element in each list)
variable_contents <- lapply(space_split,'[',-1)
# a function to do the appropriate replicates
do_rep <- function(x){rep.int(x[1],x[2])}
# recreate the variables
variables <- lapply(variable_contents, function(x){
.list <- strsplit(x, ':')
unlist(lapply(lapply(.list, as.numeric), do_rep))
})
names(variables) <- variable_names
You could get the variance for each variable using
lapply(variables, var)
## $var_a
## [1] 6.848718
##
## $var_b
## [1] 1.138462
or get boxplots (a list of vectors can be passed to boxplot directly)
boxplot(variables)
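Since the question also asks about quantiles, here is a compact sketch of the same parse-and-expand idea followed by quantile(). The helper parse_counts and the variable lines are hypothetical names, and the two input lines are the dummy data from above rather than a real file.

```r
# Hypothetical helper: turn one "name v1:c1 v2:c2 ..." line into a numeric vector
parse_counts <- function(line) {
  parts <- strsplit(line, " ")[[1]]
  # Two-column character matrix: value, count
  pairs <- do.call(rbind, strsplit(parts[-1], ":"))
  rep(as.numeric(pairs[, 1]), times = as.integer(pairs[, 2]))
}

# Dummy data standing in for readLines(datafile)
lines <- c("var_a 1:5 5:12 7:9 9:14", "var_b 1:5 2:12 3:9 4:14")

variables <- lapply(lines, parse_counts)
names(variables) <- sub(" .*", "", lines)

# Quartiles for each variable
quartiles <- lapply(variables, quantile, probs = c(0.25, 0.5, 0.75))
quartiles
```

Expanding the counts like this is simple but uses memory proportional to the total count, so it may not suit very large counts.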
Not knowing the actual form that your data is in, I would probably use something like readLines to get each line in as a vector, then do something like the following:
# Some sample data
temp = c("var_a 1:5 5:12 7:9 9:14",
"var_b 1:7 4:9 3:11 2:10",
"var_c 2:5 5:14 6:6 3:14")
# Extract the names
NAMES = gsub("[0-9: ]", "", temp)
# Extract the data
temp_1 = strsplit(temp, " |:")
temp_1 = lapply(temp_1, function(x) as.numeric(x[-1]))
# "Expand" the data
temp_1 = lapply(1:length(temp_1),
function(x) rep(temp_1[[x]][seq(1, length(temp_1[[x]]), by=2)],
temp_1[[x]][seq(2, length(temp_1[[x]]), by=2)]))
names(temp_1) = NAMES
temp_1
# $var_a
# [1] 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9
#
# $var_b
# [1] 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2
#
# $var_c
# [1] 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3