How to merge a set of lists into a single data frame - r

I am new to R and coding in general, so please bear with me.
I have a spreadsheet with 7 sheets. Six of them are formatted the same way, and I am skipping the one that is not.
The code I have is thus:
library(readxl)
# Read sheets 2 to 7 into a list, one element per sheet
lst <- lapply(2:7,
  function(i) read_excel("CONFIDENTIAL Ratio 062018.xlsx", sheet = i)
)
This code was taken from this post: How to import multiple xlsx sheets in R
So far so good: the code works and I have a large list with 6 sub-lists that appears to hold all of my data.
This is where I get stuck. Being so new, I do not understand lists yet, and I need the lists merged into one single data frame that looks and feels like the source data (columns and rows).
I cannot work out how to get from a list to a single data frame. I've tried rbind and other suggestions from here, but they all either fail or only partially work, and I end up with a data frame that still looks like a list.

If each sheet has the same number of columns (ncol) and the same column names (colnames), then this will work. It needs the dplyr package.
library(dplyr)
my_dataframe <- bind_rows(lst)
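For reference, a base-R route that avoids the extra package is sketched below. The two small data frames are made-up stand-ins for the sheets returned by read_excel, and the column names id and ratio are assumptions, not taken from the spreadsheet.

```r
# Hypothetical stand-ins for two sheets read by read_excel()
lst <- list(
  data.frame(id = 1:2, ratio = c(0.5, 0.7)),
  data.frame(id = 3:4, ratio = c(0.9, 1.1))
)

# Stack the data frames row-wise; every element must share the same column names
combined <- do.call(rbind, lst)

str(combined)
```

do.call(rbind, lst) requires identical column names in every element, whereas bind_rows fills columns that are missing from some elements with NA.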

Related

Changing a List to a Dataframe in R

I have used the "htmltab" library to get data on the NFL draft and combine. The data was retrieved fine, but the results are lists at the moment. I intend to merge them and analyse the data. At the moment it looks like this:
[Image: list of combine 2016 data]
Whenever I try to use unlist, I lose the column headers and the result is still a list.
Any suggestions?
library(htmltab)
urlcom16 <- "http://nflcombineresults.com/nflcombinedata.php?year=2016&pos=&college="
com16 <- htmltab(doc = urlcom16, which = 1)
Try as.data.frame(com16). If it doesn't work, you might not have the same vector length in each list entry.
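A small sketch of what that means in practice. The lists below are invented for illustration and are not the combine data; the names name and weight are assumptions.

```r
# Equal-length entries convert cleanly; the list names become column names
equal <- list(name = c("A", "B", "C"), weight = c(310, 225, 198))
df <- as.data.frame(equal)

# Entries of different lengths cannot be lined up into rows
unequal <- list(name = c("A", "B", "C"), weight = c(310, 225))
result <- tryCatch(
  as.data.frame(unequal),
  error = function(e) conditionMessage(e)  # capture the error message instead of stopping
)

# Checking the length of each entry shows which one is off
lengths(unequal)
```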

Change a date column in multiple data frames with one function

I know there are several questions about applying one function to multiple data frames. However, I couldn't find a solution to my problem, though I think I got close using a solution from this question:
Same function over multiple data frames in R
I have 12 data frames with 4 columns each. The second one contains the date as an integer (e.g. 20161014, so %Y%m%d).
To get it into 2016-10-14 I used
TX_SOUID100758.txt[,2]<-as.Date(as.character(TX_SOUID100758.txt[,2]), "%Y%m%d")
Since I want to apply this function on all 15 data frames I tried
zch_filelist <- list.files(path=path, pattern="*.txt")
for (file in zch_filelist){
assign(file, read.csv(paste(path, file, sep=''),na.strings = -9999))
}
lapply(zch_filelist, function(x) (as.Date(as.character(x[2]), "%Y%m%d")))
I used the previously created list of file names when I imported the files into R.
However, it is not working. I guess the mistake is in the indexing inside the as.Date call.
Any help is greatly appreciated.
Thanks!
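One likely cause is that zch_filelist holds file names (character strings), not the data frames themselves, so x[2] indexes into a string rather than a column. A minimal sketch of one way around this, assuming the files are read into a list first. The two small data frames are made-up stand-ins for the imported files, and the column names are assumptions.

```r
# Hypothetical stand-ins for the imported files. With real files this list
# could be built with something like:
#   frames <- lapply(file.path(path, zch_filelist), read.csv, na.strings = "-9999")
frames <- list(
  a = data.frame(id = 1:2, date = c(20161014L, 20161015L)),
  b = data.frame(id = 3:4, date = c(20170101L, 20170102L))
)

# Convert the second column of every data frame and return the modified copy
frames <- lapply(frames, function(df) {
  df[[2]] <- as.Date(as.character(df[[2]]), format = "%Y%m%d")
  df
})

str(frames$a)
```

The function has to return df at the end; otherwise lapply would keep only the converted column.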

View( ) function in R: How to view part of huge data frame as spreadsheet in R

I have a huge data frame df in R. If I invoke View(df), RStudio stops responding because it's too big. Is there any way to view, say, the first 500 lines of a data frame as a spreadsheet?
(I know it's possible to use head, but I want to see it as a spreadsheet: it has too many columns, and reading head output with that many columns is not user friendly.)
If you want to see the first 100 lines of the data frame df as a spreadsheet, use
View(head(df,100))
You can subset the data frame like a matrix, as df[rowfrom:rowto, columnfrom:columnto]. For example, in your case:
df[1:500,]
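The two suggestions can be combined: subset rows and columns first, then pass the smaller piece to View. A sketch with a made-up data frame; the limits of 500 rows and 20 columns are arbitrary choices for illustration.

```r
# Hypothetical large data frame standing in for df
df <- as.data.frame(matrix(0, nrow = 1000, ncol = 50))

# Take at most 500 rows and 20 columns; min() guards against smaller inputs
preview <- df[seq_len(min(500, nrow(df))), seq_len(min(20, ncol(df)))]

# View() opens a spreadsheet-style viewer, so only call it in an interactive session
if (interactive()) View(preview)
```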

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into a separate data frame. Next, I want to extract certain elements from each data frame into a matrix and calculate the mean of all these matrices.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder, I now have the data frames mferg_1 to mferg_11.
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
You probably should not be using assign for this in the first place. Working with a bunch of different data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1 <- list.files()
# Read every file into one list of data frames
mferg <- lapply(group_1, function(filename) {
  read.csv(filename, header=FALSE, sep=";", quote="", dec=",", col.names=1:90)
})
and you get each value with mferg[[1]], mferg[[2]], etc. You can then create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.
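The question also asks for the mean across all of the extracted matrices. A minimal sketch of one way to do that with the list approach, assuming an element-wise mean is what is wanted. The two small data frames are made-up stand-ins for the csv contents.

```r
# Hypothetical stand-ins for the data frames read from the csv files
mferg <- list(
  as.data.frame(matrix(1, nrow = 6, ncol = 12)),
  as.data.frame(matrix(3, nrow = 6, ncol = 12))
)

# Extract the same block from every data frame and convert it to a numeric matrix
mferg_matrix <- lapply(mferg, function(x) as.matrix(x[1:5, 1:10]))

# Add the matrices element by element, then divide by how many there are
mferg_mean <- Reduce(`+`, mferg_matrix) / length(mferg_matrix)

dim(mferg_mean)
```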

Replacing Rows in a large data frame in R

I have to collect some rows manually, and the R Cookbook recommends pre-allocating memory for a large data frame. Say my code is
dataSize <- 500000
shoesRead <- read.csv(file = "someShoeCsv.csv", header = TRUE, sep = ",")
shoes <- data.frame(size = integer(dataSize), price = double(dataSize),
                    cost = double(dataSize), retail = double(dataSize))
So now, I have some data about shoes which I imported via csv, and then I perform some calculation and want to insert into the data frame shoes. Let's say the someShoeCsv.csv has a column called ukSize and so
usSize <- ukSize * 1.05 #for example
My question is: how do I do so? Given that usSize is now a variable derived from the ukSize column read from the csv file, running this code:
shoes <- rbind(shoes,
data.frame("size"=usSize, "price"=price,
"cost"=cost, "retail"=retail));
adds to the already large data frame.
I have experimented with building a list and then calling rbind, but I understand that is tedious, so I am trying this method instead, still to no avail.
I'm not quite sure what you're trying to do, but if you're trying to replace some of the pre-allocated rows with new data, you could do so like this:
Nreplace = length(usSize)
shoes$size[1:Nreplace] <- usSize
shoes$price[1:Nreplace] <- shoesRead$price
And so on, for the rest of the columns.
Here's some unsolicited advice. Looking at the code you've included, you reference ukSize and price etc without referencing the data frame, which makes it appear like you've done attach(shoesRead). Definitely never use attach(). If you want the price vector, for example, just do shoesRead$price. It's just a little bit more typing for the sake of much more readable code.
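Putting the replacement and the shoesRead$ advice together, a self-contained sketch might look like the following. The three-row shoesRead is invented in place of someShoeCsv.csv, and dataSize is shrunk to 5 so the result is easy to inspect.

```r
# Hypothetical stand-in for the data read from someShoeCsv.csv
shoesRead <- data.frame(ukSize = c(6, 7, 8), price = c(50, 60, 70))

# Pre-allocate a small data frame (the question uses 500000 rows)
dataSize <- 5
shoes <- data.frame(size = numeric(dataSize), price = double(dataSize))

# Reference columns through the data frame rather than relying on attach()
usSize <- shoesRead$ukSize * 1.05

# Overwrite the first rows in place instead of appending with rbind()
idx <- seq_along(usSize)
shoes$size[idx] <- usSize
shoes$price[idx] <- shoesRead$price

shoes
```

The rows beyond length(usSize) keep their pre-allocated zeros, so it helps to track how many rows have been filled.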
