Read AGS type file in R

Read AGS type file in R - r

I am trying to read a special type of file (the format is called AGS) which looks like in the image:
This is basically a TEXT file, which contains many tables with different dimensions inside, separated by 2 (but sometimes more) empty rows. As you might guess, the problem is related to the fact that these tables have different number of columns and obviously different column names.
The first row in each table (here tables are denoted as GROUP) shows the name of the table, e.g. LOCA, HDPH, etc. The second row shows the column names. The third row shows the units of each column. All the other rows show the observations. In each row, columns are separated by commas and values are inside double quotes.
I am struggling to read this type of file. The ideal output would be to have each of these tables into separated data frames. Any help and ideas are much appreciated.
An example file can be downloaded here: example AGS file

Related

How to stop R from reading first row as column name when scraping a pdf

Unfortunately, the pdf I'm scraping is sensitive so I can't share it.
It's about 50 pages long and none of the columns have actual column headers so R is taking the first row and using it as the column names. Not a huge deal, I can always add that row back in and replace the column names. The problem is each page has a different first line so when I run all the pages, it take the first line from each page and takes is as a new column name. So, page one spits out 10 nice columns with the wrong names. Then it moves to page two and recognizes new column names so in addition to adding new rows it adds another 10 columns. So in the end instead of 1000 obs. of 10 variables, I have 1000 obs. of 500 variables.
I hope this explanation makes sense.
Using extract_tables(), I'm able to specify table area and column widths. Is there a command I can use with extract_tables() to tell it not to assume/use column names?

Paste name of column to other columns in R?

I have recently received an output from the online survey (ESRI Survey123), storing the each recored attribte as a new column of teh table. The survey reports characteristics of single trees located on study site: e.g. beech1, beech2, etc. For each beech, several attributes are recorded such as height, shape, etc.
This is how the output table looks like in Excel. ID simply represent the site number:
Now I wonder, how can I read those data into R to make sure that columns 1:3 belong to beech1, columns 4:6 represent beech2, etc.? I am looking for something that would paste the beech1 into names of the following columns: beech1.height, beech1.shape. But I am not sure how to do it?

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?

perhaps the easiest solution would be to read it as it is then remove rows, i.e. yourdata <- yourdata[!is.na(yourdata$columname),] ; this would work if you don't expect any NA's in a particular column, like id. If you have data gaps everywhere you can test for all NAs in multiple columns - let me know if that's what you need.

R Code: csv file data incorrectly breaking across lines

I have some csv data that I'm trying to read in, where lines are breaking across rows weirdly.
An example of the file (the files are the same but the date varies) is here: http://nemweb.com.au/Reports/Archive/DispatchIS_Reports/PUBLIC_DISPATCHIS_20211118.zip
The csv is non-rectangular because there's 4 different types of data included, each with their own heading rows. I can't skip a certain number of lines because the length of the data varies by date.
The data that I want is the third dataset (sometimes the second), and has approximately twice the number of headers as the data above it. So I use read.csv() without a header and ideally it should pull all data and fill NAs above.
But for some reason read.csv() seems to decide that there's 28 columns of data (corresponding to the data headers on row 2) which splits the data lower down across three rows - so instead of the data headers being on one row, it splits across three; and so does all the rows of data below it.
I tried reading it in with the column names explicitly defined, it's still splitting the rows weirdly
I can't figure out what's going on - if I open the csv file in Excel it looks perfectly normal.
If I use readr::read_lines() there's no errant carriage returns or new lines as far as I can tell.
Hoping someone might have some guidance, otherwise I'll have to figure out a kind of nasty read_lines approach.

Printing out R Dataframe - Single Character Between Columns While Maintaining Alignment (Variable Spacing)

In a previous question, I received output for an R dataframe that had two aligned columns. The answer gave me the following output:
While the post answered my initial question, it seems as if the program I intend to use requires a text file in which the two columns are both aligned and separated by a single character (e.g. a tab). The previous solution instead results in a large and variable number of spaces between the first and second columns (depending on the length of the string in the first column for that particular row.) Inserting a single character, however, results in a misalignment of the columns.
Is there any way in which I can replace a large number of spaces with a single character that has variable spacing to 'reach' to the second column?
If it helps, this webpage contains a .txt file that you may download to see the intended output (although it does not suffer from the problem with the first column having variable name lengths, it has a single 'space character' that separates the first and second columns. If I 'copy and paste' this specific space character between columns 1 and 2, the program can successfully interpret the .txt file. This copy + paste results in a single character separating the columns and appropriate alignment.)
For further example, the first of the following pictures (note the highlight is a single character) properly parses while the second does not:

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Read AGS type file in R - r

Related

How to stop R from reading first row as column name when scraping a pdf

Paste name of column to other columns in R?

How to skip empty rows while reading multiple tabs in R?

R Code: csv file data incorrectly breaking across lines

Printing out R Dataframe - Single Character Between Columns While Maintaining Alignment (Variable Spacing)

Categories

Resources