Reading row names with read_csv2 (readr package) in R

I am trying to load an example dataset from here: http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv to run an example PCA.
The correctly loaded data frame can be replicated with this line of code:
decathlon = read.csv('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
header = TRUE, row.names = 1, check.names = FALSE,
dec = '.', sep = ';')
However, I was wondering whether this can be replicated with functions from the readr package. The suitable function seems to be read_csv2, but it has no row.names argument:
dplyrtlon = read_csv2('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
col_names = TRUE, col_types = NULL, skip = 0)
Any suggestion on how to do this within readr?

readr returns tibbles instead of plain data frames. readr itself is usually faster than read.csv, but the tibbles it returns do not support row names.
Depending on what you want to do with your data after reading it in, you could either give the first column a name (it looks like it holds last names):
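library(readr)
# read_delim() with delim = ";" instead of read_csv2(): this file separates
# fields with ";" but uses "." as the decimal mark, while read_csv2() assumes ","
dplyrtlon <- read_delim('http://www.agrocampus-ouest.fr/math/RforStat/decathlon.csv',
delim = ';', col_types = NULL, skip = 0)
names(dplyrtlon)[1] <- "last_name"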
or you could convert the tibble to a data frame and use the contents of the first column as row names:
r <- as.data.frame(dplyrtlon)
rownames(r) <- r[, 1]   # the first column becomes the row names
r <- r[, -1]            # then drop that column
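If you do this conversion often, the three lines above can be wrapped in a small helper. This is a minimal base-R sketch: first_col_to_rownames() is a hypothetical name (not part of readr), the toy data is made up, and drop = FALSE is added so that a file with only one value column still comes back as a data frame rather than a bare vector.

```r
# Hypothetical helper (not part of readr or tibble): move the first column
# of a data frame or tibble into the row names.
first_col_to_rownames <- function(x) {
  x <- as.data.frame(x)                 # tibbles don't keep row names
  rownames(x) <- as.character(x[[1]])   # errors if the names are duplicated
  x[, -1, drop = FALSE]                 # drop = FALSE keeps a data frame
}

# Made-up stand-in for the decathlon file, already read into R
toy <- data.frame(
  last_name = c("Alpha", "Bravo", "Charlie"),
  x100m     = c(11.04, 10.76, 11.02),
  long_jump = c(7.58, 7.40, 7.30)
)

res <- first_col_to_rownames(toy)
res["Bravo", "x100m"]
```

The tibble package also provides column_to_rownames() for the same job, if you prefer to stay in the tidyverse.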

Related

Problem with write_excel_csv when splitting a data frame and saving the output

I have a data frame with the columns Branch, Region, Sales and Stock, which I need to split by the values in the Region column so that I get one data frame per region.
I have used this code:
lapply(names(s), function(nm)
write_excel_csv(s, "G:/19011/"+paste(nm,",",collapse = null)+".csv" , col_names = FALSE, quote = "none", append = FALSE)
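but it doesn't generate the files.
Note: it has to be write_excel_csv, because write.csv and similar functions can't encode the Arabic text.
Your code has several problems: + does not concatenate strings in R (paste0() does), null should be NULL, the lapply() call is missing its closing parenthesis, names(s) gives the column names rather than the regions, and every iteration writes the whole of s rather than one region's subset. Using gapminder::gapminder as an example dataset and purrr::iwalk, this can be achieved like so: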
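s <- gapminder::gapminder
# split() returns a named list with one data frame per continent
s <- split(s, s$continent)
# iwalk() passes each data frame as .x and its list name as .y
purrr::iwalk(s, ~ readr::write_excel_csv(.x, paste0("G:/19011/", .y, ".csv") , col_names = FALSE, quote = "none", append = FALSE))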
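The split-then-write pattern itself does not depend on purrr. Below is a minimal base-R sketch of the same idea using Map(), assuming a made-up data frame with the question's Branch/Region/Sales/Stock columns and a temporary directory in place of G:/19011/. write.csv() stands in for readr::write_excel_csv() only so the sketch runs without extra packages; for Arabic text that Excel has to open correctly, keep write_excel_csv() as in the answer above.

```r
# Made-up data with the question's columns
s <- data.frame(
  Branch = c("B1", "B2", "B3", "B4"),
  Region = c("North", "South", "North", "East"),
  Sales  = c(100, 250, 175, 90),
  Stock  = c(10, 4, 7, 12)
)

out_dir <- tempdir()   # stand-in for "G:/19011/"

# split() returns a named list: one data frame per Region value
by_region <- split(s, s$Region)

# Map() walks the list and its names together, like purrr::iwalk's .x and .y
paths <- Map(function(d, region) {
  path <- file.path(out_dir, paste0(region, ".csv"))
  # write.csv() is a dependency-free stand-in for readr::write_excel_csv()
  write.csv(d, path, row.names = FALSE, fileEncoding = "UTF-8")
  path
}, by_region, names(by_region))

unlist(paths)
```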

Creating text files from column in data frame in R without for loop

I am trying to create individual text files from the columns of a data frame using dplyr and the map function from the purrr package, so that I do not need a for loop and can use the existing column names as the file names for the new .txt files.
Here is the dataframe:
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)
Then I created this function:
textfilecreate <- function(filename){
filename1 <- noquote(names(filename))
colunmname <- select(filename, filename1)
myfile <- paste0( "_", colunmname, ".txt")
write.table(colunmname, file = myfile, sep = "", row.names = FALSE,
col.names = FALSE, quote = FALSE, append = FALSE)
}
Then I called the map function:
map(data_link, textfilecreate)
I got this error:
Error in noquote(names(filename)) : attempt to set an attribute on NULL
I know that I am missing something but I cannot quite pinpoint what.
Thanks in advance.
One of the difficulties here is that map loops through the columns one at a time, so inside your function you are working on a plain vector of values rather than a data frame. names() of a plain vector is NULL, and noquote(NULL) is what fails with "attempt to set an attribute on NULL".
However, you don't need any select()-ing here, as map already loops through and hands you each column. The remaining issue is how to get the column names for the file names.
One alternative is to loop through the dataset and its column names simultaneously, building each file name from the name and writing the column itself. I use walk2 instead of map2 because we only want the side effect of writing files, and walk2 doesn't build a new list of results.
A two-argument function:
textfilecreate = function(column, name){
# column: the values of one column; name: that column's name
myfile = paste0("_", name, ".txt")
write.table(column, file = myfile, sep = "", row.names = FALSE,
col.names = FALSE, quote = FALSE, append = FALSE)
}
Now loop through the columns and their names with walk2. By default, elements of the first list are passed as the first argument and elements of the second list as the second argument.
library(purrr)
walk2(df, names(df), textfilecreate)
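Alternatively, you can simply use lapply from base R (invisible() hides the list of NULLs it returns):
invisible(lapply(names(df), function(colname)
write.table(df[, colname], file = paste0(colname, ".txt"),
row.names = FALSE, col.names = FALSE, quote = FALSE)))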
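The walk2 approach also has a direct base-R counterpart: Map() walks two lists in parallel, passing each column and its name to the function. A minimal sketch using the question's df; the files go to tempdir() rather than the working directory purely so the example is self-contained.

```r
df <- data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc"), b = c(TRUE, FALSE, TRUE))
out_dir <- tempdir()   # assumption: write to a temp dir instead of the working dir

# Map() pairs each column with its name, like purrr::walk2;
# invisible() hides the list of NULLs that write.table() returns
invisible(Map(function(column, name) {
  write.table(column, file = file.path(out_dir, paste0("_", name, ".txt")),
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}, df, names(df)))
```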

R: Dynamically create a variable name

I'm looking to create multiple data frames using a for loop and then stitch them together with merge().
I'm able to create my data frames using assign(paste(), blah). But then, in the same for loop, I need to delete the first column of each of these data frames.
Here's the relevant bits of my code:
for (j in 1:3)
{
#This is to create each data frame
#This works
assign(paste(platform, j, "df", sep = "_"), read.csv(file = paste(masterfilename, extension, sep = "."), header = FALSE, skip = 1, nrows = 100))
#This is to delete first column
#This does not work
assign(paste(platform, j, "df$V1", sep = "_"), NULL)
}
In the first situation I'm assigning my variables to a data frame, so they inherit that type. But in the second situation, I'm assigning it to NULL.
Does anyone have any suggestions on how I can work this out? Also, is there a more elegant solution than assign(), which seems to bog down my code? Thanks,
n.i.
assign can be used to build variable names, but "name$V1" isn't a variable name. $ is an operator in R, so you are really trying to build an expression, and assign can't do that. In this case it's best to avoid assign completely: you don't need to create a bunch of separate variables. If your data frames are related, just keep them in a list.
mydfs <- lapply(1:3, function(j) {
df <- read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100)
df$V1 <- NULL   # drop the first column
df
})
Now you can access them with mydfs[[1]], mydfs[[2]], etc., and you can run functions over all the data sets with any of the *apply family of functions.
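If you still want the platform_j_df labels that assign() was creating, you can attach them as list names instead. A minimal sketch, assuming each j reads its own file (the question's call reads the same file every time), with platform set to a made-up value and three tiny hypothetical CSV files written to a temporary directory:

```r
platform <- "demo"   # hypothetical value standing in for the question's platform
tmp <- tempdir()

# Hypothetical input: three small CSV files, each with a first line to skip
files <- file.path(tmp, paste0("part_", 1:3, ".csv"))
for (j in 1:3) {
  writeLines(c("ignored first line",
               paste0("r1,", j, ",", 10 * j),
               paste0("r2,", j + 1, ",", 20 * j)),
             files[j])
}

mydfs <- lapply(files, function(f) {
  d <- read.csv(f, header = FALSE, skip = 1, nrows = 100)
  d$V1 <- NULL   # drop the first column
  d
})
# List names replace the variable names assign() would have built
names(mydfs) <- paste(platform, 1:3, "df", sep = "_")
```

mydfs$demo_2_df (or mydfs[["demo_2_df"]]) then plays the role of the variable that assign() would have created.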
As @joran pointed out in his comment, the proper way to do this is with a list. But if you want to stick with assign, you can replace your second statement with
assign(paste(platform, j, "df", sep = "_"),
get(paste(platform, j, "df", sep = "_"))[
2:length(get(paste(platform, j, "df", sep = "_")))]
If you wanted to use a list instead, your code to read the data frames would look like
dfs <- replicate(3,
read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100), simplify = FALSE)
Note that you can use replicate because your call to read.csv does not depend on j in the loop. Then you can remove the first column of each:
dfs <- lapply(dfs, function(d) d[-1])
Or, combining both steps in one command:
dfs <- replicate(3,
read.csv(file = paste(masterfilename, extension, sep = "."),
header = FALSE, skip = 1, nrows = 100)[-1], simplify = FALSE)

using column names when appending data in write.table

I am looping through some data and appending it to a CSV file. I want the column names at the top of the file once, and not repeated in the middle of the file on every loop iteration.
If I use col.names=T, the column names are repeated for each new iteration. If I use col.names=F, there are no column names at all.
How do I do this most efficiently? I feel that this is such a common case that there must be a way to do it, without writing code especially to handle it.
write.table(dd, "data.csv", append=TRUE, col.names=T)
See ?file.exists.
write.table(dd, "data.csv", append=TRUE, col.names=!file.exists("data.csv"))
Thus column names are written only when you are not appending to a file that already exists.
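Combined with the row-name advice in the next answer, a minimal sketch of the whole append loop might look like this; chunks is a made-up list of data frames and f a temporary file standing in for data.csv. Tying append to file.exists() as well is an extra choice on top of the one-liner: write.table() warns ("appending column names to file") whenever append = TRUE and column names are written, which would otherwise happen on the very first write.

```r
f <- tempfile(fileext = ".csv")   # stand-in for "data.csv"
chunks <- list(data.frame(A = 1:2, B = c("x", "y")),
               data.frame(A = 3:4, B = c("z", "w")))

for (dd in chunks) {
  has_file <- file.exists(f)
  # Header only on the first write; append only once the file exists;
  # no row names, so repeated 1, 2, ... labels never reach the file
  write.table(dd, f, sep = ",", row.names = FALSE,
              col.names = !has_file, append = has_file)
}

result <- read.csv(f)
```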
You may also run into a problem with duplicated row names: each appended chunk writes its own row names (1, 2, 3, ...), and read.table will then refuse to read the file back because duplicate row names are not allowed. You could give this a try: in the first write to file, call write.table with row.names = FALSE only. Then, from the second write onward, use both col.names = FALSE and row.names = FALSE.
Here's the first write to file
> d1 <- data.frame(A = 1:5, B = 1:5) ## example data
> write.table(d1, "file.txt", row.names = FALSE)
We can check it with read.table("file.txt", header = TRUE). Then we can append the same data frame to that file with
> write.table(d1, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE)
And again we can check it with read.table("file.txt", header = TRUE)
So, if you have a list of data frames, say dlst, your code chunk that appends the data frames together might look something like
> dlst <- rep(list(d1), 3) ## list of example data
> write.table(dlst[[1]], "file.txt", row.names = FALSE)
> invisible(lapply(dlst[-1], write.table, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE))
But as @MrFlick suggests, it would be much better to combine the data frames in R and then write them to the file once. This avoids many of the errors and problems that can occur when writing to a file piece by piece. If the data is in a list, that can be done with
> dc <- do.call(rbind, dlst)
> write.table(dc, "file.txt")
If you are appending to a database table with dbWriteTable() rather than to a file, try reassigning the data frame's column names with names() (using the same names it already has) and then call dbWriteTable() with row.names = FALSE. This should solve the issue.
For example, if your data frame df1 has the columns obs, name and age:
names(df1) <- c('obs','name','age')
and then:
dbWriteTable(conn, 'table_name', df1, append = TRUE, row.names = FALSE)

d_ply and dist() together

I'm having trouble with a R code that I wrote. Particularly it looks like this:
n<- nrow(aa)
for (i in 1:n)
{
A<- aa[i,]
d_ply(A, 1, function(row){
cu<- dist(A)
write.table(cu, file = paste(row$header, "txt", sep = "."), sep = "\t")
}, .progress='text', .print = TRUE)
}
I would like to get one file per row of the aa matrix (each file named after that row's header), containing the distance matrix of that row, but it seems very hard. When I run the code I get this error:
cannot coerce class '"dist"' into a data.frame
How can I solve this?
First, assuming aa is a data frame, A is just a single row, so d_ply has nothing left to split. You don't need the for loop at all: d_ply already applies a function to each piece of a data frame, and splitting by the header column gives one piece per row (assuming the header values are unique).
The second issue is that dist returns a dist object, which has to be turned into a matrix with as.matrix before write.table can write it.
Third, you need to convert the row from a one-row data frame to a numeric vector before using dist.
This leads to the following code:
library(plyr)
# .(header) splits aa into one-row pieces, one per header value
d_ply(aa, .(header), function(row){
# assumes header is the first column and the remaining columns are numeric
cu<- dist(as.numeric(row[-1]))
write.table(as.matrix(cu), file = paste(row$header, "txt", sep = "."), sep = "\t")
}, .progress='text', .print = TRUE)
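For comparison, here is a minimal base-R sketch of the same per-row job without plyr. It assumes, as the answer does, that the first column of aa is called header and holds the file names and that the remaining columns are numeric; the data and the output directory (tempdir()) are made up.

```r
# Made-up stand-in for aa: a header column, then numeric columns
aa <- data.frame(header = c("rowA", "rowB"),
                 v1 = c(1, 2), v2 = c(4, 6), v3 = c(9, 3))
out_dir <- tempdir()

for (i in seq_len(nrow(aa))) {
  values <- as.numeric(unlist(aa[i, -1]))  # one row -> plain numeric vector
  cu <- as.matrix(dist(values))            # "dist" object -> ordinary matrix
  write.table(cu, file = file.path(out_dir, paste0(aa$header[i], ".txt")),
              sep = "\t")
}
```

For rowA the values are 1, 4 and 9, so the written matrix holds the pairwise distances 3, 8 and 5 between them.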
