R: load an R table or CSV file with the textConnection command

In a previous question,
Convert table into matrix by column names,
the approach below was used. I want to use the same approach for a CSV file or for a table that is already in R. Would you mind teaching me how to modify the first command line?
x <- read.table(textConnection('
models cores time
4 1 0.000365
4 2 0.000259
4 3 0.000239
4 4 0.000220
8 1 0.000259
8 2 0.000249
8 3 0.000251
8 4 0.000258
'), header=TRUE)
library(reshape)
cast(x, models ~ cores)
Should I use the following for a data.csv file?
x <- read.csv(textConnection("data.csv"), header=TRUE)
Should I use the following for an R table named xyz?
x <- xyz(textConnection(xyz), header=TRUE)
Is it necessary to use textConnection in order to use the cast command?
Thank you.

Several years later...
read.table and its derivatives like read.csv now have a text argument, so you don't need to mess around with textConnections directly anymore.
read.table(text = "
x y z
1 1.9 'a'
2 0.6 'b'
", header = TRUE)
The main use for textConnection is when people who ask questions on SO just dump their data onscreen, rather than writing code to let answerers generate it themselves. For example,
Blah blah blah I'm stuck here is my data plz help omg
x y z
1 1.9 'a'
2 0.6 'b'
etc.
In this case you can copy the text from the screen and wrap it in a call to textConnection, like so:
the_data <- read.table(tc <- textConnection("x y z
1 1.9 'a'
2 0.6 'b'"), header = TRUE); close(tc)
It is much nicer when questioners provide code, like this:
the_data <- data.frame(x = 1:2, b = c(2.9, 0.6), c = letters[1:2])
When you are using your own data, you shouldn't ever need to use textConnection.
my_data <- read.csv("my data file.csv") should suffice.
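To tie this back to the original question: textConnection is not required for cast at all. A minimal sketch, assuming the reshape package is installed, that reads the pasted table with the text argument and then reshapes it exactly as in the linked answer:
library(reshape)
x <- read.table(text = "
models cores time
4 1 0.000365
4 2 0.000259
4 3 0.000239
4 4 0.000220
8 1 0.000259
8 2 0.000249
8 3 0.000251
8 4 0.000258
", header = TRUE)
cast(x, models ~ cores)   # the remaining column, time, supplies the cell values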

Related

View data frames by pasting their names in R

Is there any way to View data frames in R while referring to them with another variable? Say I have 10 data frames named df1 to df10; is there a way I can View them while using i instead of 1:10?
Example:
df1 = as.data.frame(c(1:20))
i = 1
View(paste("df", i, sep =""))
I would like this last piece of code to do the same as View(df1). Is there any command or function in R that allows you to do that?
The answer to your immediate question is get:
df1 <- data.frame(x = 1:5)
df2 <- data.frame(x = 6:10)
> get(paste0("df",1))
x
1 1
2 2
3 3
4 4
5 5
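For the View() part of the question specifically, get() slots straight in; a quick sketch:
i <- 1
View(get(paste0("df", i)))   # opens the same viewer as View(df1)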
But having multiple similar objects with names like df1, df2, etc. in your workspace is considered fairly bad practice in R; experienced R folks prefer to put related objects in a named list:
df_list <- setNames(list(df1,df2),paste0("df",1:2))
> df_list[[paste0("df",1)]]
x
1 1
2 2
3 3
4 4
5 5
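If the data frames already exist as df1, df2, ..., mget() from base R collects them into such a list in one step, and View() takes a title argument so each viewer tab stays labelled. A small sketch, assuming df1 and df2 from above exist in the workspace:
df_list <- mget(paste0("df", 1:2))
i <- 1
View(df_list[[paste0("df", i)]], title = paste0("df", i))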

How to import two sets of data from the same Excel sheet in R?

Currently, in one Excel sheet, I have one block of data that begins at row 1; its last row varies but is usually around 18 or 19. Once the first set of data ends, there are two blank rows, and then the second data set begins, which is also around 18 or 19 rows. The two data sets have the same number of columns and share the same headers. I save the Excel sheet as a CSV. Then in R I do read.csv(), but after that I do not know how to separate the two sets of data into separate data.frames.
I realize I could just copy and paste the second data set into a separate Excel sheet and read it in, but I do not want to do that. I want to leave the Excel sheet untouched.
Example of the Excel sheet:
A B C D # FIRST DATA SET
1 2 3 4
A B C D # SECOND DATA SET
5 6 7 8
Any help will be appreciated and please let me know if more info is needed.
There are probably many ways to achieve what you want. Maybe just read it in using readLines, then determine the indices of the two empty lines and use read.csv on the two subsets:
txt <- readLines(con=textConnection("1,2,3,4
5,6,7,8
a,b,c,d,e
f,g,h,i,j"))
read.csv(header=F, text=txt[1:which.max(txt=="")])
# V1 V2 V3 V4
# 1 1 2 3 4
# 2 5 6 7 8
read.csv(header=F, text=txt[(which.max(txt=="")+2):length(txt)])
# V1 V2 V3 V4 V5
# 1 a b c d e
# 2 f g h i j
With regard to your added toy example:
txt <- readLines(con=textConnection("A B C D #1st
1 2 3 4
A B C D #2nd
5 6 7 8"))
txt <- sub("\\s+#.*$", "", txt) # delete comments if necessary
read.table(header=T, check.names = F, text=txt[1:which.max(txt=="")])
# A B C D
# 1 1 2 3 4
read.table(header=T, check.names = F, text=txt[(which.max(txt=="")+2):length(txt)])
# A B C D
# 1 5 6 7 8
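A more general variant of the same idea, in case the number of blank lines or blocks varies: group the non-blank lines by how many blank lines precede them and read each group separately. A sketch (the file name here is hypothetical):
txt <- readLines("yourfile.csv")
txt <- sub("\\s+#.*$", "", txt)                     # drop trailing comments, as above
blank <- !nzchar(trimws(txt))                       # TRUE for empty or whitespace-only lines
blocks <- split(txt[!blank], cumsum(blank)[!blank]) # one group of lines per block
dfs <- lapply(blocks, function(b)
  read.table(text = b, header = TRUE, check.names = FALSE))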
That depends. If you know the row number where the first block ends and the second block has no header row, you can do
mydata <- read.csv('yourfile.csv', header=TRUE)
block1 <- mydata[1:18,]
block2 <- mydata[19:nrow(mydata), ]
If your blocks have different structures, such as a different number of columns, and each block has its own column names, then it’s better to use the readLines() function and pass the result to read.csv. How do you tell those blocks apart?
In reply to your comment:
Then it’s relatively easy. As Kota Mori pointed out, read your data keeping the blank lines. Assuming your first column has numeric values and no NAs except between your data sets,
mydata <- read.csv('yourfile.csv', header=TRUE, blank.lines.skip = FALSE)
blines <- which(is.na(mydata[,1]))
data1 <- mydata[1:(blines[1]-1),]
data2 <- mydata[(blines[length(blines)]+1):nrow(mydata),]
You may need to alter how the blank rows are detected, depending on your data.
This depends on the data file you have.
If you have two empty rows between the two data sets, setting blank.lines.skip = FALSE in read.csv() lets you locate where to split the data.
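A sketch of that approach (the file name is hypothetical; note that the repeated header line of the second block also has to be dropped, and the columns re-converted, because the extra header row forces everything to character):
raw <- read.csv("yourfile.csv", blank.lines.skip = FALSE)
blanks <- which(rowSums(is.na(raw) | raw == "") == ncol(raw))  # rows that are completely empty
block1 <- raw[seq_len(min(blanks) - 1), ]
block2 <- raw[seq(max(blanks) + 2, nrow(raw)), ]               # +2 also skips the repeated header row
block1[] <- lapply(block1, function(col) type.convert(as.character(col), as.is = TRUE))
block2[] <- lapply(block2, function(col) type.convert(as.character(col), as.is = TRUE))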

Keep dimensions of a table when converting to a data.frame

I am using the xlsx package to write tables to an Excel file. I want to use the xlsx package so I can write multiple tabs. xlsx converts the tables into data.frames and in so doing changes the dimensions.
b <- sample( c("bob", "mark", "joanna"), 100, replace=TRUE)
a <- c( sample( 1:5, 100, replace=TRUE) )
a <- data.frame( a , b)
d <- table( a$a , a$b )
e <- data.frame(d)
print (e)
print(d)
See how the dimensions of d are different from those of e. Is there an easy way to keep the dimensions of d when converting? I looked around in previous questions and didn't see anyone tackle this.
You're probably looking for as.data.frame.matrix:
> as.data.frame.matrix(d)
bob joanna mark
1 1 7 9
2 10 7 4
3 4 6 14
4 6 8 11
5 5 7 1
There are different "methods" that are used when calling as.data.frame on different types of inputs. Run methods("as.data.frame") to see a list of those. Looking at that list, you would see that there is a specific method for tables. You can view that code by just typing as.data.frame.table. If you treat your table as a matrix, you get the behavior I think you're expecting.
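For the xlsx use case in the question, that means converting with as.data.frame.matrix before writing each tab. A sketch, assuming the xlsx package is installed (file and sheet names are illustrative):
library(xlsx)
write.xlsx(as.data.frame.matrix(d), "counts.xlsx", sheetName = "wide")           # table with its dimensions kept
write.xlsx(e, "counts.xlsx", sheetName = "long", append = TRUE)                  # second tab in the same file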

Assigning one variable to another based on (and within) a macro

I am using RStudio 0.98.1062.
What I am trying to do, within a macro, is create a new variable based on another one (that already has a suffix defined by me) in the same data frame. The name of the data frame and the index (suffix) are macro parameters.
Here is my code:
read_data <- defmacro(fileName, monthIndex, dfName,
  expr = {
    dfName <- read.table(fileName, head = TRUE, sep = ",")
    # add suffix for the variables for the corresponding month
    colnames(dfName) <- paste(colnames(dfName), monthIndex, sep = "_")
    # dfName["EasyClientMerge"] <- numeric()
    within(dfName, assign("EasyClientMerge", paste("dfName$EasyClientNumber", monthIndex, sep = "_")))
  })
If the macro parameters are (..., monthIndex = 6, dfName = m201309), I expect the following variable to be created:
m201309$EasyClientMerge<-m201309$EasyClient_6
First of all, a new variable is not created within the data frame, and second, it seems that the string "m201309$EasyClient_6" is used rather than a reference to the data frame and variable name.
Thanks a lot in advance, because I am kind of stuck!
If you really insist on producing hard-coded data.frames within a function (in my opinion a bad choice), you can do it like so:
> dfName <- "new.df"
> assign(dfName, value = list(clientMerge = 1:10, clientMerge2 = 1:10))
> as.data.frame(new.df)
clientMerge clientMerge2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
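A plain function is often the cleaner route here: build the data frame inside the function, reference the suffixed column with [[ ]] indexing instead of pasting its name into a string, and return the result. A sketch following the question's column names (the file name and the final call are illustrative):
read_month <- function(fileName, monthIndex) {
  df <- read.table(fileName, header = TRUE, sep = ",")
  colnames(df) <- paste(colnames(df), monthIndex, sep = "_")
  # [[ ]] returns the column itself, not the string of its name
  df[["EasyClientMerge"]] <- df[[paste("EasyClientNumber", monthIndex, sep = "_")]]
  df
}
m201309 <- read_month("clients_201309.csv", 6)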

Why is as.data.frame doing this in R?

First of all, I would like to say that I am new to R programming. I was experimenting with some R code and ran into some strange behaviour that I did not expect. I hope someone can help me figure it out.
I ran the following code to read data from a CSV file:
normData= read.csv("normData.csv");
My normData looks like this: [screenshot of the data frame omitted]
But when I run the following code to form a data frame:
datExpr0 = as.data.frame(t(normData));
I get the following: [screenshot of the transposed data omitted]
Can someone please tell me where the extra row (V1, V2, V3, V4, V5, V6) is coming from?
Try using:
setNames(as.data.frame(t(normData[-1])), normData[[1]])
However, it might be better to see if you can use the row.names argument in read.table to directly read your "X" column as the row names. Then you should be able to use as.data.frame(t(...)) directly.
Here's a small example to show what's happening:
Start with a data.frame with characters as the first column:
df <- data.frame(A = letters[1:3],
B = 1:3, C = 4:6)
df
# A B C
# 1 a 1 4
# 2 b 2 5
# 3 c 3 6
When you transpose the entire thing, you also transpose that first column (thereby also creating a character matrix).
as.data.frame(t(df))
# V1 V2 V3
# A a b c
# B 1 2 3
# C 4 5 6
So, we drop the column first, and use the values from the column to replace the "V1", "V2"... names.
setNames(as.data.frame(t(df[-1])), df[[1]])
# a b c
# B 1 2 3
# C 4 5 6
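And a sketch of the row.names route mentioned above, using the question's file name, which sidesteps the character-matrix problem entirely:
normData <- read.csv("normData.csv", row.names = 1)  # first column becomes the row names
datExpr0 <- as.data.frame(t(normData))               # those names come back as column names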
