How to read an unstructured excel sheet into R and clean it to be used in Shiny? - r

Our excel sheet is formatted in a strange manner. Some headers are located in the first row, others are located in either the 2nd,3rd, or 4th row. Beneath the 4th row is the first subset of data we want to generate graphs from, there are multiple subsets as you go down the excel sheet. Each of these subsets is separated by an empty row. The first column is dedicated to the name of the source of the data. For example in the first column and 5th row, there is a label called "communications" and to the right is the data. The rows in the first column under "communications" are empty until the next label. We need to be able to read the separate subsets in shiny to generate individual graphs. How do you recommend we go about this? We are fairly new to R and are lost on where to go.

Related

Paste name of column to other columns in R?

I have recently received an output from the online survey (ESRI Survey123), storing the each recored attribte as a new column of teh table. The survey reports characteristics of single trees located on study site: e.g. beech1, beech2, etc. For each beech, several attributes are recorded such as height, shape, etc.
This is how the output table looks like in Excel. ID simply represent the site number:
Now I wonder, how can I read those data into R to make sure that columns 1:3 belong to beech1, columns 4:6 represent beech2, etc.? I am looking for something that would paste the beech1 into names of the following columns: beech1.height, beech1.shape. But I am not sure how to do it?

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
perhaps the easiest solution would be to read it as it is then remove rows, i.e. yourdata <- yourdata[!is.na(yourdata$columname),] ; this would work if you don't expect any NA's in a particular column, like id. If you have data gaps everywhere you can test for all NAs in multiple columns - let me know if that's what you need.

Read AGS type file in R

I am trying to read a special type of file (the format is called AGS) which looks like in the image:
This is basically a TEXT file, which contains many tables with different dimensions inside, separated by 2 (but sometimes more) empty rows. As you might guess, the problem is related to the fact that these tables have different number of columns and obviously different column names.
The first row in each table (here tables are denoted as GROUP) shows the name of the table, e.g. LOCA, HDPH, etc. The second row shows the column names. The third row shows the units of each column. All the other rows show the observations. In each row, columns are separated by commas and values are inside double quotes.
I am struggling to read this type of file. The ideal output would be to have each of these tables into separated data frames. Any help and ideas are much appreciated.
An example file can be downloaded here: example AGS file

Manipulating Data/Data Wrangling of excel sheets in R and changing layout

I have over a hundred files in excel that in general have the same format of :
Screenshot of format
they are all different lengths too!
I want to combine the data of all the different excels sheet into this format:
Desired format
So far my thinking is:
create a new column in every excel sheet that is titled Day
combine day column and Steps column in some kind of join that combines the title of the two columns to say day{1}steps and have the data of where they intersect be the data that goes in that row
Is this the right thought process on how to do that? Any tips on how to get to my end result?

R XLConnect getting index/formula to a chunk of data using content found in first cell

Sorry if this is difficult to understand - I don't have enough karma to add a picture so I will do the best I can to describe this! Using XLConnect package within R to read & write from/to Excel spreadsheets.
I am working on a project in which I am trying to take columns of data out of many workbooks and concatenate them together into rows of a new workbook based on which workbook they came from (each workbook is data from a consecutive business day). The snag is that the data that I seek is only a small part (10 rows X 3 columns) of each workbook/worksheet and is not always located in the same place within the worksheet due to sloppiness on behalf of the person who originally created the spreadsheets. (e.g. I can't just start at cell A2 because the dataset that starts at A2 in one workbook might start at B12 or C3 in another workbook).
I am wondering if it is possible to search for a cell based on its contents (e.g. a cell containing the title "Table of Arb Prices") and return either the index or reference formula to be able to access that cell.
Also wondering if, once I reference that cell based on its contents, if there is a way to adjust that formula to get to where I know another cell is compared to that one. For example if a cell with known contents is always located 2 rows above and 3 columns to the left of the cell where I wish to start collecting data, is it possible for me to take that first reference formula and increment it by 2 rows and 3 columns to get the reference formula for the cell I want?
Thanks for any help and please advise me if you need further information to be able to understand my questions!
You can just read the entire worksheet in as a matrix with something like
library(XLConnect)
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
mm <- as.matrix(readWorksheetFromFile(demoExcelFile, sheet=1))
class(mm)<-"character" # convert all to character
Then you can search for values and get the row/colum
which(mm=="3.435", arr.ind=T)
# row col
# [1,] 23 6
Then you can offset those and extract values from the matrix how ever you like. In the end, when you know where you want to read from, you can convert to a cleaner data frame with
read.table(text=apply(mm[25:27, 6:8],1,paste, collapse="\t"), sep="\t")
Hopefully that gives you a general idea of something you can try. It's hard to be more specific without knowing exactly what your input data looks like.

Resources