I am using the xlsx package to write tables to an Excel file. I chose xlsx because it can write multiple tabs. However, xlsx converts the tables into data.frames, and in doing so changes their dimensions.
b <- sample( c("bob", "mark", "joanna"), 100, replace=TRUE)
a <- c( sample( 1:5, 100, replace=TRUE) )
a <- data.frame( a , b)
d <- table( a$a , a$b )
e <- data.frame(d)
print (e)
print(d)
See how the dimensions of d are different from those of e? Is there an easy way to keep the dimensions of d when converting? I looked around in previous questions and didn't see anyone tackle this.
You're probably looking for as.data.frame.matrix:
> as.data.frame.matrix(d)
bob joanna mark
1 1 7 9
2 10 7 4
3 4 6 14
4 6 8 11
5 5 7 1
There are different "methods" that are used when calling as.data.frame on different types of input. Run methods("as.data.frame") to see a list of them. Looking at that list, you will see that there is a specific method for tables; you can view its code by just typing as.data.frame.table. If you instead treat your table as a matrix, you get the behaviour I think you're expecting.
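To see the two dispatch behaviours side by side, here is a minimal sketch built from the question's own data (set.seed is added so the sample is reproducible; it is not in the original question):

```r
set.seed(42)  # added for reproducibility; not in the original question
b <- sample(c("bob", "mark", "joanna"), 100, replace = TRUE)
a <- sample(1:5, 100, replace = TRUE)
d <- table(a, b)

# The table method (what as.data.frame dispatches to for tables)
# stacks the counts into long format: one row per cell of the table.
long <- as.data.frame.table(d)
nrow(long) == prod(dim(d))  # TRUE: one row per cell

# The matrix method keeps the rows-by-columns shape of d.
wide <- as.data.frame.matrix(d)
all(dim(wide) == dim(d))    # TRUE: shape preserved
```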
Is there any way to View data frames in R while referring to them with another variable? Say I have 10 data frames named df1 to df10; is there a way I can View them using i instead of 1:10?
Example:
df1 = as.data.frame(c(1:20))
i = 1
View(paste("df", i, sep =""))
I would like this last piece of code to do the same as View(df1). Is there any command or similar in R that lets you do that?
The answer to your immediate question is get:
df1 <- data.frame(x = 1:5)
df2 <- data.frame(x = 6:10)
> get(paste0("df",1))
x
1 1
2 2
3 3
4 4
5 5
But having multiple similar objects with names like df1, df2, etc. in your workspace is considered fairly bad practice in R; experienced R users prefer instead to put related objects in a named list:
df_list <- setNames(list(df1,df2),paste0("df",1:2))
> df_list[[paste0("df",1)]]
x
1 1
2 2
3 3
4 4
5 5
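Putting the two approaches together (using just the df1/df2 toys from above; the View calls are commented out because they only work in an interactive session):

```r
df1 <- data.frame(x = 1:5)
df2 <- data.frame(x = 6:10)
i <- 1

# Direct answer: build the name, get() the object, then View it.
obj <- get(paste0("df", i))
# View(obj)  # interactive sessions only

# List-based alternative: index by the constructed name instead.
df_list <- setNames(list(df1, df2), paste0("df", 1:2))
picked <- df_list[[paste0("df", i)]]
identical(obj, picked)  # TRUE
# for (nm in names(df_list)) View(df_list[[nm]], title = nm)
```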
Currently, in one Excel sheet I have one block of data that begins at row 1; the last row varies, but it is usually around row 18 or 19. Once the first data set ends there are two blank rows, and then the second data set begins, which is also around 18 or 19 rows. The two data sets have the same number of columns and share the same headers. I save the Excel sheet as a CSV and then read it in R with read.csv(), but after that I do not know how to separate the two data sets into separate data frames.
I realize I could just copy and paste the second data set into a separate Excel sheet and read it in, but I do not want to do that; I want to leave the Excel sheet untouched.
Example of the excel sheet:
A B C D # FIRST DATA SET
1 2 3 4
A B C D # SECOND DATA SET
5 6 7 8
Any help will be appreciated and please let me know if more info is needed.
There are probably many ways to achieve what you want. One is to read the file in with readLines, determine the indices of the empty lines, and then use read.csv on the two subsets:
txt <- readLines(con=textConnection("1,2,3,4
5,6,7,8


a,b,c,d,e
f,g,h,i,j"))
read.csv(header=F, text=txt[1:which.max(txt=="")])
# V1 V2 V3 V4
# 1 1 2 3 4
# 2 5 6 7 8
read.csv(header=F, text=txt[(which.max(txt=="")+2):length(txt)])
# V1 V2 V3 V4 V5
# 1 a b c d e
# 2 f g h i j
Regarding the toy example you added:
txt <- readLines(con=textConnection("A B C D #1st
1 2 3 4


A B C D #2nd
5 6 7 8"))
txt <- sub("\\s+#.*$", "", txt) # delete comments if necessary
read.table(header=T, check.names = F, text=txt[1:which.max(txt=="")])
# A B C D
# 1 1 2 3 4
read.table(header=T, check.names = F, text=txt[(which.max(txt=="")+2):length(txt)])
# A B C D
# 1 5 6 7 8
That depends. If you know the row number where the first block ends, and the second block has no header, you can do
mydata <- read.csv('yourfile.csv', header=TRUE)
block1 <- mydata[1:18,]
block2 <- mydata[19:nrow(mydata), ]
If your blocks have different structures, such as a different number of columns with each block having its own column names, then it's better to use readLines() and pass the result to read.csv. How do you tell those blocks apart?
In reply to your comment:
Then it's relatively easy. As Kota Mori pointed out, read your data with the blank lines kept. Assuming your first column has numeric values, and no NAs except between your data sets:
mydata <- read.table('yourfile.csv', header=TRUE, blank.lines.skip = FALSE)
blines <- which(is.na(mydata[,1]))
data1 <- mydata[1:(blines[1]-1),]
data2 <- mydata[(blines[length(blines)]+1):nrow(mydata),]
You should alter the test for the blank rows depending on your data.
This depends on the structure of your data file. If there are two empty rows between the two data sets, setting blank.lines.skip = FALSE in read.csv() lets you locate where to split the data.
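A self-contained sketch of that approach, using a temporary file that mimics the layout described in the question (two blocks with the same headers, separated by two blank rows; the file name and contents are illustrative, not from the original):

```r
# Build a stand-in for the CSV described in the question.
csv <- tempfile(fileext = ".csv")
writeLines(c("A,B,C,D", "1,2,3,4", "", "", "A,B,C,D", "5,6,7,8"), csv)

# Read everything, keeping blank lines, to find where the gap is.
raw <- read.csv(csv, header = FALSE, blank.lines.skip = FALSE,
                colClasses = "character")
blank <- which(raw$V1 == "")                 # row indices of the blank lines

# Re-read each block separately: stop before the blanks, resume after them.
block1 <- read.csv(csv, nrows = blank[1] - 2)         # minus header row
block2 <- read.csv(csv, skip = blank[length(blank)])  # its header follows
block1
block2
```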
First of all, I would like to say that I am new to R programming. I was experimenting with some R code and ran into behaviour I did not expect; I hope someone can help me figure it out.
I ran the following code to read data from a CSV file:
normData= read.csv("normData.csv");
and my normData looks like:
But when I ran the following code to form a data frame:
datExpr0 = as.data.frame(t(normData));
I get the following data:
Can someone please tell me where the extra row (V1, V2, V3, V4, V5, V6) is coming from?
Try using:
setNames(as.data.frame(t(normData[-1])), normData[[1]])
However, it might be better to use the row.names argument in read.table (or read.csv) to read your "X" column directly as the row names. Then you should be able to use as.data.frame(t(...)) directly.
Here's a small example to show what's happening:
Start with a data.frame with characters as the first column:
df <- data.frame(A = letters[1:3],
B = 1:3, C = 4:6)
df
# A B C
# 1 a 1 4
# 2 b 2 5
# 3 c 3 6
When you transpose the entire thing, you also transpose that first column (thereby also creating a character matrix).
as.data.frame(t(df))
# V1 V2 V3
# A a b c
# B 1 2 3
# C 4 5 6
So, we drop the column first, and use the values from the column to replace the "V1", "V2"... names.
setNames(as.data.frame(t(df[-1])), df[[1]])
# a b c
# B 1 2 3
# C 4 5 6
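And here is a sketch of the row.names suggestion, using a temporary CSV that mimics a file whose first column ("X") holds row labels (the column and label names are made up for illustration):

```r
# Stand-in for normData.csv: first column "X" holds the row labels.
csv <- tempfile(fileext = ".csv")
writeLines(c("X,s1,s2", "gene1,1,4", "gene2,2,5", "gene3,3,6"), csv)

normData <- read.csv(csv, row.names = 1)  # "X" becomes row names, not a column
datExpr0 <- as.data.frame(t(normData))    # transpose stays numeric, no V1, V2...
datExpr0
#    gene1 gene2 gene3
# s1     1     2     3
# s2     4     5     6
```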
I'm relatively new to R (~3 months), so I'm just getting the hang of all the different data types. While lists are a super useful way of holding dissimilar data in one place, they are also extremely inflexible in function calls, and riddle me with angst.
For the work I'm doing, I often use lists because I need to hold a bunch of vectors of different lengths. For example, I'm tracking performance statistics for about 10,000 different vehicles, and certain vehicles are so similar they can essentially be treated as the same vehicle for some analyses.
So let's say we have this list of vehicle ID's:
List <- list(a=1, b=c(2,3,4), c=5)
For simplicity's sake.
I want to do two things:
Tell me which element of the list a particular vehicle is in. So when I tell R I'm working with vehicle 2, it should tell me b (or [2]). I feel like it should be something simple, like how you can do
match(3,b)
> 2
Convert it into a data frame or something similar so that it can be saved as a CSV. Unused rows could be blank or NA. What I've had to do so far is:
for (i in seq_along(List)) {
  length(List[[i]]) <- max(as.numeric(as.matrix(summary(List)[, 1])))
}
DF <- as.data.frame(List)
Which seems dumb.
For your first question:
which(sapply(List, `%in%`, x = 3))
# b
# 2
For your second question, you could use a function like this one:
list.to.df <- function(arg.list) {
max.len <- max(sapply(arg.list, length))
arg.list <- lapply(arg.list, `length<-`, max.len)
as.data.frame(arg.list)
}
list.to.df(List)
# a b c
# 1 1 2 5
# 2 NA 3 NA
# 3 NA 4 NA
Both of those tasks (and many others) would become much easier if you were to "flatten" your data into a data.frame. Here's one way to do that:
fun <- function(X)
data.frame(element = X, vehicle = List[[X]], stringsAsFactors = FALSE)
df <- do.call(rbind, lapply(names(List), fun))
# element vehicle
# 1 a 1
# 2 b 2
# 3 b 3
# 4 b 4
# 5 c 5
With a data.frame in hand, here's how you could perform your two tasks:
## Task #1
with(df, element[match(3, vehicle)])
# [1] "b"
## Task #2
write.csv(df, file = "outfile.csv")
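As a side note, base R's stack() produces essentially the same flattened two-column layout as the do.call(rbind, ...) approach above, with columns named values and ind:

```r
List <- list(a = 1, b = c(2, 3, 4), c = 5)

# stack() flattens a named list of vectors into values/ind columns.
df <- stack(List)
df
#   values ind
# 1      1   a
# 2      2   b
# 3      3   b
# 4      4   b
# 5      5   c

# Task 1 with this layout: which element holds vehicle 3?
as.character(df$ind[match(3, df$values)])
# [1] "b"
```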
In a previous question,
Convert table into matrix by column names
I saw an approach that I want to use for a CSV file or for a table in R. Could you show me how to modify the first command line?
x <- read.table(textConnection('
models cores time
4 1 0.000365
4 2 0.000259
4 3 0.000239
4 4 0.000220
8 1 0.000259
8 2 0.000249
8 3 0.000251
8 4 0.000258'), header=TRUE)
library(reshape)
cast(x, models ~ cores)
Should I use the following for a data.csv file?
x <- read.csv(textConnection("data.csv"), header=TRUE)
And should I use the following for an R table named xyz?
x <- xyz(textConnection(xyz), header=TRUE)
Is it necessary to use textConnection in order to use the cast command?
Thank you.
Several years later...
read.table and its derivatives like read.csv now have a text argument, so you don't need to mess around with textConnections directly anymore.
read.table(text = "
x y z
1 1.9 'a'
2 0.6 'b'
", header = TRUE)
The main use for textConnection is when people who ask questions on SO just dump their data onscreen, rather than writing code to let answerers generate it themselves. For example,
Blah blah blah I'm stuck here is my data plz help omg
x y z
1 1.9 'a'
2 0.6 'b'
etc.
In this case you can copy the text from the screen and wrap it in a call to textConnection, like so:
the_data <- read.table(tc <- textConnection("x y z
1 1.9 'a'
2 0.6 'b'"), header = TRUE); close(tc)
It is much nicer when questioners provide code, like this:
the_data <- data.frame(x = 1:2, y = c(1.9, 0.6), z = letters[1:2])
When you are using your own data, you shouldn't ever need to use textConnection.
my_data <- read.csv("my data file.csv") should suffice.
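To answer the CSV part of the question directly: read the file first, then reshape; textConnection is not needed. A minimal sketch, assuming data.csv contains the models/cores/time columns from the question (a temporary file stands in for it here); base R's xtabs gives a similar wide layout if the reshape package isn't available:

```r
# Stand-in for data.csv with a subset of the question's columns/values.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(models = rep(c(4, 8), each = 2),
                     cores  = rep(1:2, times = 2),
                     time   = c(0.000365, 0.000259, 0.000259, 0.000249)),
          csv, row.names = FALSE)

x <- read.csv(csv)   # header = TRUE is the default for read.csv
# library(reshape); cast(x, models ~ cores)   # as in the question
wide <- xtabs(time ~ models + cores, data = x)  # base-R equivalent layout
wide
```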