I've got a series of Excel files to upload to R for analysis. I'm not sure how to shape/upload the data so that the column names come from the first two rows instead of just one. When uploaded with row 1 as the column names, "Health Outcomes" and "Health Factors" don't include columns E and G, so I can't just combine row 2 with the column names after the upload. I've tried other methods such as those here, but they haven't worked since some of the entries in row 1 are blank. In theory I could make some of these changes manually, but I'm sure there's a better way. Any help is appreciated!
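One possible approach: read the two header rows separately, forward-fill the blanks in row 1 (merged header cells usually come in as NA), and paste the two rows together. This is only a sketch; the file name "data.xlsx" and the header values below are stand-ins for your own, and the actual read calls (shown commented, using readxl) assume your headers occupy exactly rows 1-2.

```r
# Sketch: combine two header rows into one set of column names.
combine_headers <- function(top, bottom) {
  # Forward-fill blanks in the top row (merged header cells read in as NA)
  for (i in seq_along(top)[-1]) {
    if (is.na(top[i])) top[i] <- top[i - 1]
  }
  # Keep the top name alone when row 2 is blank, otherwise paste the two
  ifelse(is.na(bottom), top, paste(top, bottom, sep = "."))
}

# e.g. a top row with blanks under merged "Health Outcomes"/"Health Factors"
top    <- c("ID", "Health Outcomes", NA, "Health Factors", NA)
bottom <- c(NA, "Rank", "Z-Score", "Rank", "Z-Score")
combine_headers(top, bottom)
# "ID" "Health Outcomes.Rank" "Health Outcomes.Z-Score" ...

# With readxl, read the headers and the data separately (untested here):
# hdr <- readxl::read_excel("data.xlsx", n_max = 2, col_names = FALSE)
# dat <- readxl::read_excel("data.xlsx", skip = 2,
#          col_names = combine_headers(as.character(hdr[1, ]),
#                                      as.character(hdr[2, ])))
```

The key point is doing the fill-and-paste on plain character vectors before any names are assigned, so blank cells in row 1 are no longer a problem.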
Unfortunately, the PDF I'm scraping is sensitive, so I can't share it.
It's about 50 pages long and none of the columns have actual column headers, so R is taking the first row and using it as the column names. Not a huge deal; I can always add that row back in and replace the column names. The problem is that each page has a different first line, so when I run all the pages, it takes the first line from each page and treats it as a new set of column names. So, page one spits out 10 nice columns with the wrong names. Then it moves to page two and recognizes new column names, so in addition to adding new rows it adds another 10 columns. In the end, instead of 1000 obs. of 10 variables, I have 1000 obs. of 500 variables.
I hope this explanation makes sense.
Using extract_tables(), I'm able to specify table area and column widths. Is there a command I can use with extract_tables() to tell it not to assume/use column names?
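If I remember the tabulizer API correctly, extract_tables() with output = "matrix" (the default) returns a list of character matrices and does not assign column names at all, so each page's first line stays as data. You can then stack the pages yourself and name the columns once. A sketch with fake in-memory "pages" standing in for the real extraction ("report.pdf" is a placeholder):

```r
# Real call would look something like (untested here):
# tabs <- tabulizer::extract_tables("report.pdf", guess = FALSE, output = "matrix")

# Demonstration with two fake "pages" of 10 columns each:
page1 <- matrix(as.character(1:20),  nrow = 2, ncol = 10)
page2 <- matrix(as.character(21:40), nrow = 2, ncol = 10)
tabs  <- list(page1, page2)

# Stack the pages row-wise, then assign column names yourself
dat <- as.data.frame(do.call(rbind, tabs), stringsAsFactors = FALSE)
names(dat) <- paste0("col", seq_len(ncol(dat)))
dim(dat)   # 4 rows, 10 columns -- not 20
```

Because the matrices are nameless, rbind() lines the pages up purely by position, which is what keeps the column count at 10 instead of multiplying per page.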
I have recently received output from an online survey (ESRI Survey123), which stores each recorded attribute as a new column of the table. The survey reports characteristics of single trees located on the study site: e.g. beech1, beech2, etc. For each beech, several attributes are recorded, such as height, shape, etc.
This is how the output table looks in Excel. ID simply represents the site number:
Now I wonder: how can I read those data into R and make sure that columns 1:3 belong to beech1, columns 4:6 to beech2, etc.? I am looking for something that would paste beech1 into the names of the following columns: beech1.height, beech1.shape. But I am not sure how to do it.
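If every tree has the same set of attributes in the same order, one way is to generate the full set of names with outer() and apply them when reading. The tree labels, attribute names, and file name below are stand-ins for your survey's:

```r
# Sketch: build "tree.attribute" names for repeated column groups
trees <- c("beech1", "beech2")
attrs <- c("height", "shape", "condition")

# One name per column: beech1.height, beech1.shape, ..., beech2.condition
col_names <- as.vector(t(outer(trees, attrs, paste, sep = ".")))
col_names

# Apply after reading, e.g. with readxl (ID column assumed first; untested here):
# dat <- readxl::read_excel("survey.xlsx", skip = 1,
#                           col_names = c("ID", col_names))
```

The t() transpose is what makes the names come out grouped by tree (all of beech1's attributes first) rather than by attribute.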
Our Excel sheet is formatted in a strange manner. Some headers are located in the first row; others are located in the 2nd, 3rd, or 4th row. Beneath the 4th row is the first subset of data we want to generate graphs from, and there are multiple subsets as you go down the sheet. Each of these subsets is separated by an empty row. The first column is dedicated to the name of the source of the data. For example, in the first column and 5th row there is a label called "communications", and to the right is the data. The rows in the first column under "communications" are empty until the next label. We need to be able to read the separate subsets in Shiny to generate individual graphs. How do you recommend we go about this? We are fairly new to R and are lost on where to go.
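One common pattern for sheets like this: read everything in with no column names (e.g. readxl::read_excel(..., col_names = FALSE)), then split the rows into subsets wherever an all-NA row appears. A sketch, using a small stand-in data frame in place of the real sheet:

```r
# Stand-in for the raw sheet; empty spreadsheet rows read in as all-NA rows
raw <- data.frame(
  label = c("communications", NA, NA, NA, "logistics", NA),
  value = c(10, 12, 9, NA, 4, 7),
  stringsAsFactors = FALSE
)

# A row is "empty" when every cell is NA
empty <- apply(is.na(raw), 1, all)

# Group id increases after each empty row; drop the empty rows themselves
grp     <- cumsum(empty)[!empty]
subsets <- split(raw[!empty, ], grp)

length(subsets)        # 2 subsets, one per block
subsets[[1]]$value     # 10 12 9
```

Each element of subsets is then a self-contained block you can clean up (pull its header row, fill the label down, etc.) and hand to a separate plot in Shiny.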
I have data on one Excel sheet with about 10 columns, but I need to count data using just two of them. For example:
Column C has district names and Column D says what each person's title is.
The file has hundreds of entries, and I need to know how many Admins are in district 1; the results would go on a different tab. I have tried using VLOOKUP, but my knowledge only takes me as far as looking up a single criterion.
If your "one Excel sheet" is called Sheet1, district 1 is in column C of that sheet and Admins in column D, then on a different tab:
=COUNTIFS(Sheet1!C:C,"district 1",Sheet1!D:D,"Admins")
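For comparison, the same two-criteria count is a one-liner in base R once the sheet is read in (e.g. with readxl::read_excel). The small data frame below is a stand-in for your sheet, and the column names are assumptions:

```r
# Stand-in for the sheet's relevant two columns
sheet1 <- data.frame(
  district = c("district 1", "district 1", "district 2", "district 1"),
  title    = c("Admin", "Teacher", "Admin", "Admin"),
  stringsAsFactors = FALSE
)

# Logical AND of the two criteria, summed like COUNTIFS
sum(sheet1$district == "district 1" & sheet1$title == "Admin")
# 2
```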
Sorry if this is difficult to understand - I don't have enough karma to add a picture, so I will do the best I can to describe this! I'm using the XLConnect package within R to read from and write to Excel spreadsheets.
I am working on a project in which I am trying to take columns of data out of many workbooks and concatenate them into rows of a new workbook based on which workbook they came from (each workbook is data from a consecutive business day). The snag is that the data I seek is only a small part (10 rows x 3 columns) of each workbook/worksheet and is not always located in the same place within the worksheet, due to sloppiness on the part of the person who originally created the spreadsheets (e.g. I can't just start at cell A2, because the dataset that starts at A2 in one workbook might start at B12 or C3 in another).
I am wondering if it is possible to search for a cell based on its contents (e.g. a cell containing the title "Table of Arb Prices") and return either the index or reference formula to be able to access that cell.
Also wondering if, once I reference that cell based on its contents, if there is a way to adjust that formula to get to where I know another cell is compared to that one. For example if a cell with known contents is always located 2 rows above and 3 columns to the left of the cell where I wish to start collecting data, is it possible for me to take that first reference formula and increment it by 2 rows and 3 columns to get the reference formula for the cell I want?
Thanks for any help and please advise me if you need further information to be able to understand my questions!
You can just read the entire worksheet in as a matrix with something like
library(XLConnect)
demoExcelFile <- system.file("demoFiles/mtcars.xlsx", package = "XLConnect")
mm <- as.matrix(readWorksheetFromFile(demoExcelFile, sheet=1))
class(mm) <- "character"  # convert all cells to character
Then you can search for values and get the row/column:
which(mm=="3.435", arr.ind=T)
# row col
# [1,] 23 6
Then you can offset those indices and extract values from the matrix however you like. In the end, when you know where you want to read from, you can convert to a cleaner data frame with
read.table(text=apply(mm[25:27, 6:8],1,paste, collapse="\t"), sep="\t")
Hopefully that gives you a general idea of something you can try. It's hard to be more specific without knowing exactly what your input data looks like.
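To make the offset step from the question concrete: once which(..., arr.ind = TRUE) locates the title cell, you can just add known row/column offsets to its indices. A sketch using a small stand-in matrix instead of a real workbook (the title text and offsets are assumptions):

```r
# Stand-in worksheet matrix; the data block sits 2 rows down, 3 cols right
mm <- matrix("", nrow = 6, ncol = 6)
mm[2, 1] <- "Table of Arb Prices"
mm[4:6, 4:6] <- as.character(1:9)

# Locate the title cell, then shift to the data's top-left corner
loc <- which(mm == "Table of Arb Prices", arr.ind = TRUE)
r0  <- loc[1, "row"] + 2
c0  <- loc[1, "col"] + 3

block <- mm[r0:(r0 + 2), c0:(c0 + 2)]
dim(block)   # 3 3
```

The same arithmetic works on the matrix read in with readWorksheetFromFile() above, since which() with arr.ind = TRUE gives you plain numeric indices to add to.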