Scraping expanded financial tables from Yahoo Finance with R

I am trying to scrape the expanded financial tables on Yahoo Finance with rvest.
url <- "https://finance.yahoo.com/quote/AEFES.IS/balance-sheet?p=AEFES.IS"
# url.session is not defined in the post; presumably an rvest session opened on url
tic.nodes <- url.session %>%
html_elements(".fi-row") %>%
html_elements("[title]") %>%
html_text()
[1] "Total Revenue" "Cost of Revenue"
[3] "Gross Profit" "Operating Expense"
[5] "Operating Income" "Net Non Operating Interest Income Expense"
[7] "Pretax Income" "Tax Provision"
[9] "Net Income Common Stockholders" "Diluted NI Available to Com Stockholders"
[11] "Basic EPS" "Diluted EPS"
[13] "Basic Average Shares" "Diluted Average Shares"
[15] "Total Operating Income as Reported" "Rent Expense Supplemental"
[17] "Total Expenses" "Net Income from Continuing & Discontinued Operation"
[19] "Normalized Income" "Interest Income"
[21] "Interest Expense" "Net Interest Income"
[23] "EBIT" "EBITDA"
[25] "Reconciled Cost of Revenue" "Reconciled Depreciation"
[27] "Net Income from Continuing Operation Net Minority Interest" "Total Unusual Items Excluding Goodwill"
[29] "Total Unusual Items" "Normalized EBITDA"
[31] "Tax Rate for Calcs" "Tax Effect of Unusual Items"
However, the expanded table has 47 rows. In the HTML, every row's class starts with fi-row, but my code does not pick up the sub-rows that appear when the income statement is expanded. Can anyone help, please?

Related

Scraping keywords on PHP page

I would like to scrape the keywords inside the dropdown table of this webpage https://www.aeaweb.org/jel/guide/jel.php
The problem is that the drop-down menu of each item prevents me from scraping the table directly because it only takes the heading and not the inner content of each item.
rvest::read_html("https://www.aeaweb.org/jel/guide/jel.php") %>%
rvest::html_table()
I thought of scraping each line that starts with Keywords:, but I do not understand how to do that. It seems the HTML does not show the items inside the table.
An RSelenium solution:
#Start the server
library(RSelenium)
driver = rsDriver(
browser = c("firefox"))
remDr <- driver[["client"]]
#Navigate to the url
remDr$navigate("https://www.aeaweb.org/jel/guide/jel.php")
#xpath of the table
remDr$findElement(using = "xpath",'/html/body/main/div/section/div[4]') -> out
#get text from the table
out <- out$getElementText()
out= out[[1]]
Split the text into lines using the stringr package:
library(stringr)
str_split(out, "\n", n = Inf, simplify = FALSE)
[[1]]
[1] "A General Economics and Teaching"
[2] "B History of Economic Thought, Methodology, and Heterodox Approaches"
[3] "C Mathematical and Quantitative Methods"
[4] "D Microeconomics"
[5] "E Macroeconomics and Monetary Economics"
[6] "F International Economics"
[7] "G Financial Economics"
[8] "H Public Economics"
[9] "I Health, Education, and Welfare"
[10] "J Labor and Demographic Economics"
[11] "K Law and Economics"
[12] "L Industrial Organization"
[13] "M Business Administration and Business Economics; Marketing; Accounting; Personnel Economics"
[14] "N Economic History"
[15] "O Economic Development, Innovation, Technological Change, and Growth"
[16] "P Economic Systems"
[17] "Q Agricultural and Natural Resource Economics; Environmental and Ecological Economics"
[18] "R Urban, Rural, Regional, Real Estate, and Transportation Economics"
[19] "Y Miscellaneous Categories"
[20] "Z Other Special Topics"
To get the keywords for History of Economic Thought, Methodology, and Heterodox Approaches:
out1 <- remDr$findElement(using = 'xpath', value = '//*[@id="cl_B"]')
out1$clickElement()
out1 <- remDr$findElement(using = 'xpath', value = '/html/body/main/div/section/div[4]/div[2]/div[2]/div/div/div/div[2]')
out1$getElementText()
[[1]]
[1] "Keywords: History of Economic Thought"
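The question's idea of keeping only the lines that start with Keywords: can be sketched in base R once the text has been retrieved. This is a minimal sketch, not part of the original answer: page_text is a hypothetical stand-in for the string returned by getElementText(), built from lines shown above.

```r
# Hypothetical stand-in for the text returned by getElementText()
page_text <- paste(
  "B History of Economic Thought, Methodology, and Heterodox Approaches",
  "Keywords: History of Economic Thought",
  "C Mathematical and Quantitative Methods",
  sep = "\n"
)

# Split into lines, as str_split(out, "\n") does above
lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]

# Keep the lines that start with "Keywords:" and strip that prefix
keyword_lines <- grep("^Keywords:", lines, value = TRUE)
keywords <- sub("^Keywords:[[:space:]]*", "", keyword_lines)
print(keywords)
```

The same two lines of grep and sub should work on the real text after each category has been clicked open, assuming the expanded text keeps the "Keywords:" prefix.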

Error replacing period with spaces in list

I have a list of indicators with periods in their names, and I want to replace those periods with spaces. I know the gsub() function can replace punctuation, but every time I try to replace the dots with spaces, the list comes back NULL.
list_AM = list(list(geo = "EU", sales="West.Europe.Sales",
indicator = list("SA","NSA","composites_industry_value","DUCS","WUCS","T30","Rovings",
"Mats","WE.Construction.Gross.output..sales...Real.USD","WE.Construction.Production.index","WE.Glass.Gross.operating.surplus..profits...Nominal.USD",
"WE.Glass.Gross.output..sales...Nominal.USD","WE.Glass.Investment..Nominal.USD","WE.Glass.Production.index","WE.Glass.Value.added.output..As.a.percent.of.GDP",
"WE.Glass.Value.added.output..As.a.percent.of.manufacturing","WE.Glass.Value.added.output..As.a.percent.of.world.total","WE.Industrial.Production.Gross.operating.surplus..profits...Nominal.USD",
"WE.Industrial.Production.Gross.output..sales...Nominal.USD","WE.Glass.Investment..Nominal.USD","WE.Glass.Production.index","WE.Glass.Value.added.output..As.a.percent.of.GDP","WE.Glass.Value.added.output..As.a.percent.of.manufacturing",
"WE.Glass.Value.added.output..As.a.percent.of.world.total","WE.Industrial.Production.Gross.operating.surplus..profits...Nominal.USD","WE.Industrial.Production.Gross.output..sales...Nominal.USD","WE.Industrial.Production.Production.index",
"WE.Industrial.Production.Value.added.output..As.a.percent.of.GDP","WE.Industrial.Production.Value.added.output..As.a.percent.of.world.total","WE.Manufacturing.Gross.operating.surplus..profits...Nominal.USD","WE.Manufacturing.Gross.output..sales...Nominal.USD",
"WE.Manufacturing.Investment..Nominal.USD","WE.Manufacturing.Production.index","WE.Manufacturing.Value.added.output..As.a.percent.of.GDP","WE.Manufacturing.Value.added.output..As.a.percent.of.world.total","WE.Current.account.of.balance.of.payments.in.US...share.of.GDP",
"WE.Employment..total.1","WE.External.debt..total..US.","WE.Foreign.direct.investment..US.","WE.GDP.per.capita..nominal..US.","WE.GDP..nominal..US.","WE.Government.balance..share.of.GDP","WE.Population..total","WE.Reserves..foreign.exchange..US.",
"WE.Reserves..months.of.import.cover","WE.Stockbuilding..real..share.of.GDP","WE.Visible.trade.balance..share.of.GDP","WE.Consumer.price.index","WE.Gross.government.debt..as.a...of.GDP.","WE.Industrial.production.index","WE.Interest.rate..short.term",
"WE.Interest.rate..Yield.on.10.year.Government.Debt.Securities....per.annum.","WE.Services.balance..as...of.GDP","WE.Share.price.index","WE.Unemployment.rate","WE.Capacity.utilisation","WE.Consumption..government..PPP.exchange.rate..nominal..US.","WE.Consumption..government..nominal..US.",
"WE.Consumption..government..nominal..share.of.GDP.1","WE.Consumption..private..PPP.exchange.rate..nominal..US.","WE.Exports..goods...services..constant.prices.and.exchange.rate..US.....of.World","WE.GDP..industry..real","WE.GVA.Agriculture.share.of.GVA","WE.GVA.Industry.share.of.GVA",
"WE.GVA.Manufacturing.of.GVA","WE.GVA.Services..share.of.GVA","WE.Gross.value.added.in.construction..real","WE.Gross.value.added.in.services..real","WE.Imports..goods...services..constant.prices.and.exchange.rate..US.....of.World","WE.Imports..goods..PPP.exchange.rate..nominal..US.",
"WE.Industrial.production.index.1","WE.Investment..government..nominal","WE.Investment..machinery...equipment..nominal","WE.Investment..private..non.residential.structures..nominal","WE.Investment..total.fixed.investment..nominal..US.",
"WE.Investment..total.fixed..nominal..share.of.GDP","WE.Net.investment..nominal..US.","WE.Output.gap","WE.Productivity..trend","WE.Stockbuilding..nominal..US.",
"WE.Stockbuilding..nominal..share.of.GDP","WE.Stockbuilding..real..annual.contribution.to.growth","WE.Trend.productivity.target","WE.World.trade.index","WE.House.price.index","WE.Housing.starts","WE.Interest.rate.on.building.society.mortgages","WE.Market.value.of.housing.stock..LCU",
"WE.Residential.property.transactions","WE.Stock.of.owner.occupied.houses","WE.Consumers..expenditure..durables..nominal","WE.Financial.liabilities..household.sector..as.a...of.disposable.income","WE.Liabilities..debt.other.than.loans..households","WE.Personal.consumer.credit",
"WE.Retail.sales..value.index","WE.Retail.sales..volume.index","WE.Savings..personal.sector.ratio")))
For example, instead of "WE.Residential.property.transactions" I want the list to return
"WE Residential property transactions"
Based on its structure, this is a nested (recursive) list, so a function that walks the nested list recursively, such as rapply or rrapply, can apply gsub to every element, matching the . and replacing it with a space (' ').
Note that . is a regex metacharacter that matches any character (regex mode is the default), so to match it literally either use fixed = TRUE (which should be faster), escape it (\\.), or place it inside square brackets ([.]).
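The three ways of matching a literal dot can be checked on a single string before applying them to the whole list. A small base R sketch, using one of the indicator names from the question:

```r
x <- "WE.Residential.property.transactions"

# An unescaped "." matches every character, so the whole string is blanked out
blank <- gsub(".", " ", x)

# Three equivalent ways to match a literal dot
by_fixed   <- gsub(".", " ", x, fixed = TRUE)
by_escape  <- gsub("\\.", " ", x)
by_bracket <- gsub("[.]", " ", x)

print(by_fixed)
```

All three literal variants give the same result; only the unescaped pattern replaces every character.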
library(rrapply)
list_AM2 <- rrapply(list_AM, f = function(x) gsub(".", " ", x, fixed = TRUE))
-output
> list_AM2
[[1]]
[[1]]$geo
[1] "EU"
[[1]]$sales
[1] "West Europe Sales"
[[1]]$indicator
[[1]]$indicator[[1]]
[1] "SA"
[[1]]$indicator[[2]]
[1] "NSA"
[[1]]$indicator[[3]]
[1] "composites_industry_value"
[[1]]$indicator[[4]]
[1] "DUCS"
[[1]]$indicator[[5]]
[1] "WUCS"
[[1]]$indicator[[6]]
[1] "T30"
[[1]]$indicator[[7]]
[1] "Rovings"
[[1]]$indicator[[8]]
[1] "Mats"
[[1]]$indicator[[9]]
[1] "WE Construction Gross output sales Real USD"
[[1]]$indicator[[10]]
[1] "WE Construction Production index"
[[1]]$indicator[[11]]
[1] "WE Glass Gross operating surplus profits Nominal USD"
[[1]]$indicator[[12]]
[1] "WE Glass Gross output sales Nominal USD"
[[1]]$indicator[[13]]
[1] "WE Glass Investment Nominal USD"
[[1]]$indicator[[14]]
[1] "WE Glass Production index"
[[1]]$indicator[[15]]
[1] "WE Glass Value added output As a percent of GDP"
[[1]]$indicator[[16]]
[1] "WE Glass Value added output As a percent of manufacturing"
[[1]]$indicator[[17]]
[1] "WE Glass Value added output As a percent of world total"
[[1]]$indicator[[18]]
[1] "WE Industrial Production Gross operating surplus profits Nominal USD"
[[1]]$indicator[[19]]
[1] "WE Industrial Production Gross output sales Nominal USD"
[[1]]$indicator[[20]]
[1] "WE Glass Investment Nominal USD"
[[1]]$indicator[[21]]
[1] "WE Glass Production index"
[[1]]$indicator[[22]]
[1] "WE Glass Value added output As a percent of GDP"
[[1]]$indicator[[23]]
[1] "WE Glass Value added output As a percent of manufacturing"
[[1]]$indicator[[24]]
[1] "WE Glass Value added output As a percent of world total"
[[1]]$indicator[[25]]
[1] "WE Industrial Production Gross operating surplus profits Nominal USD"
[[1]]$indicator[[26]]
[1] "WE Industrial Production Gross output sales Nominal USD"
[[1]]$indicator[[27]]
[1] "WE Industrial Production Production index"
[[1]]$indicator[[28]]
[1] "WE Industrial Production Value added output As a percent of GDP"
[[1]]$indicator[[29]]
[1] "WE Industrial Production Value added output As a percent of world total"
[[1]]$indicator[[30]]
[1] "WE Manufacturing Gross operating surplus profits Nominal USD"
[[1]]$indicator[[31]]
[1] "WE Manufacturing Gross output sales Nominal USD"
[[1]]$indicator[[32]]
[1] "WE Manufacturing Investment Nominal USD"
[[1]]$indicator[[33]]
[1] "WE Manufacturing Production index"
[[1]]$indicator[[34]]
[1] "WE Manufacturing Value added output As a percent of GDP"
[[1]]$indicator[[35]]
[1] "WE Manufacturing Value added output As a percent of world total"
[[1]]$indicator[[36]]
[1] "WE Current account of balance of payments in US share of GDP"
[[1]]$indicator[[37]]
[1] "WE Employment total 1"
[[1]]$indicator[[38]]
[1] "WE External debt total US "
[[1]]$indicator[[39]]
[1] "WE Foreign direct investment US "
[[1]]$indicator[[40]]
[1] "WE GDP per capita nominal US "
[[1]]$indicator[[41]]
[1] "WE GDP nominal US "
[[1]]$indicator[[42]]
[1] "WE Government balance share of GDP"
[[1]]$indicator[[43]]
[1] "WE Population total"
[[1]]$indicator[[44]]
[1] "WE Reserves foreign exchange US "
[[1]]$indicator[[45]]
[1] "WE Reserves months of import cover"
[[1]]$indicator[[46]]
[1] "WE Stockbuilding real share of GDP"
[[1]]$indicator[[47]]
[1] "WE Visible trade balance share of GDP"
[[1]]$indicator[[48]]
[1] "WE Consumer price index"
[[1]]$indicator[[49]]
[1] "WE Gross government debt as a of GDP "
[[1]]$indicator[[50]]
[1] "WE Industrial production index"
[[1]]$indicator[[51]]
[1] "WE Interest rate short term"
[[1]]$indicator[[52]]
[1] "WE Interest rate Yield on 10 year Government Debt Securities per annum "
[[1]]$indicator[[53]]
[1] "WE Services balance as of GDP"
[[1]]$indicator[[54]]
[1] "WE Share price index"
[[1]]$indicator[[55]]
[1] "WE Unemployment rate"
[[1]]$indicator[[56]]
[1] "WE Capacity utilisation"
[[1]]$indicator[[57]]
[1] "WE Consumption government PPP exchange rate nominal US "
[[1]]$indicator[[58]]
[1] "WE Consumption government nominal US "
[[1]]$indicator[[59]]
[1] "WE Consumption government nominal share of GDP 1"
[[1]]$indicator[[60]]
[1] "WE Consumption private PPP exchange rate nominal US "
[[1]]$indicator[[61]]
[1] "WE Exports goods services constant prices and exchange rate US of World"
[[1]]$indicator[[62]]
[1] "WE GDP industry real"
[[1]]$indicator[[63]]
[1] "WE GVA Agriculture share of GVA"
[[1]]$indicator[[64]]
[1] "WE GVA Industry share of GVA"
[[1]]$indicator[[65]]
[1] "WE GVA Manufacturing of GVA"
[[1]]$indicator[[66]]
[1] "WE GVA Services share of GVA"
[[1]]$indicator[[67]]
[1] "WE Gross value added in construction real"
[[1]]$indicator[[68]]
[1] "WE Gross value added in services real"
[[1]]$indicator[[69]]
[1] "WE Imports goods services constant prices and exchange rate US of World"
[[1]]$indicator[[70]]
[1] "WE Imports goods PPP exchange rate nominal US "
[[1]]$indicator[[71]]
[1] "WE Industrial production index 1"
[[1]]$indicator[[72]]
[1] "WE Investment government nominal"
[[1]]$indicator[[73]]
[1] "WE Investment machinery equipment nominal"
[[1]]$indicator[[74]]
[1] "WE Investment private non residential structures nominal"
[[1]]$indicator[[75]]
[1] "WE Investment total fixed investment nominal US "
[[1]]$indicator[[76]]
[1] "WE Investment total fixed nominal share of GDP"
[[1]]$indicator[[77]]
[1] "WE Net investment nominal US "
[[1]]$indicator[[78]]
[1] "WE Output gap"
[[1]]$indicator[[79]]
[1] "WE Productivity trend"
[[1]]$indicator[[80]]
[1] "WE Stockbuilding nominal US "
[[1]]$indicator[[81]]
[1] "WE Stockbuilding nominal share of GDP"
[[1]]$indicator[[82]]
[1] "WE Stockbuilding real annual contribution to growth"
[[1]]$indicator[[83]]
[1] "WE Trend productivity target"
[[1]]$indicator[[84]]
[1] "WE World trade index"
[[1]]$indicator[[85]]
[1] "WE House price index"
[[1]]$indicator[[86]]
[1] "WE Housing starts"
[[1]]$indicator[[87]]
[1] "WE Interest rate on building society mortgages"
[[1]]$indicator[[88]]
[1] "WE Market value of housing stock LCU"
[[1]]$indicator[[89]]
[1] "WE Residential property transactions"
[[1]]$indicator[[90]]
[1] "WE Stock of owner occupied houses"
[[1]]$indicator[[91]]
[1] "WE Consumers expenditure durables nominal"
[[1]]$indicator[[92]]
[1] "WE Financial liabilities household sector as a of disposable income"
[[1]]$indicator[[93]]
[1] "WE Liabilities debt other than loans households"
[[1]]$indicator[[94]]
[1] "WE Personal consumer credit"
[[1]]$indicator[[95]]
[1] "WE Retail sales value index"
[[1]]$indicator[[96]]
[1] "WE Retail sales volume index"
[[1]]$indicator[[97]]
[1] "WE Savings personal sector ratio"
If there are runs of several consecutive .s, use \\.+ (one or more) so that each run is replaced with a single ' ':
list_AM2 <- rrapply(list_AM, f = function(x) gsub("\\.+", " ", x))
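Since rapply was mentioned as an alternative, here is a sketch of the same idea in base R without the rrapply package. The list small_list and the helper clean_dots are illustrative names, not from the original; small_list is a shortened copy of list_AM, and trimming the trailing space is an extra step the original answer does not perform.

```r
# A small nested list in the same shape as list_AM (shortened for illustration)
small_list <- list(list(
  geo = "EU",
  sales = "West.Europe.Sales",
  indicator = list("SA", "WE.GDP..nominal..US.", "WE.Output.gap")
))

# Collapse each run of dots into one space, then trim the trailing space
clean_dots <- function(x) trimws(gsub("\\.+", " ", x))

# how = "replace" keeps the nested structure and the element names
small_clean <- rapply(small_list, clean_dots, classes = "character", how = "replace")
str(small_clean)
```

With how = "replace" the result has the same shape as the input, so it can be indexed exactly like list_AM2 above.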

How do you scrape multiple pages from the same website in RStudio

I want to download data from multiple pages of the same website using RStudio:
https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2
The only difference between page 2 and page 3 is that the hyperlink ends in 3 instead of 2.
I have no problem getting what I need from 25 jobs in 1 page, but I want to get 100 jobs from 4 pages.
I am using the selector gadget chrome extension.
I tried the for loop
for (page_result in seq(from = 1, to = 101, by = 25)) {
link = paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=2")
page = read_html(link)
}
I can’t figure out how to do it
I think I need to fit in page_result into the link, but I don’t know where.
I welcome any ideas.
I have the rvest and dplyr packages, but I want the for loop to go through each page. Any idea how best to do this? Thanks.
The 4 links can easily be generated in a for loop.
Copy the CSS selector from the DOM and iterate k over 5 to 30 to pick up the 25 jobs on each page; positions that are not job listings return NA.
library(rvest)

AllJOBS <- character()
for (i in 1:4) {
  # Build the URL for page i by appending the page number
  url <- paste0("https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page=", i)
  # Download each page once and reuse the parsed document for every k
  page <- read_html(url)
  for (k in 5:30) {
    # Title link inside the k-th child of the results column
    jobs <- page %>% html_node(css = paste0("#page > div.container > div.column-wrap.order-one-two > div.two-thirds > div:nth-child(",k,") > div > div.job-result-logo-title > div.job-result-title > h2 > a")) %>% html_text()
    AllJOBS <- append(AllJOBS, jobs)
  }
  print(paste0("Page", i))
  # Pause 1 to 2 seconds between page requests
  Sys.sleep(runif(1, 1, 2))
}
output
> AllJOBS
[1] "Senior Consultant - Fund Static Data"
[2] "Data Warehouse Engineer"
[3] "Senior Software Engineer - Big Data DevOps"
[4] "HR Data Analyst"
[5] "Data Insights Engineer - Dublin - Permanent/Contract - SQL Server"
[6] NA
[7] "Data Engineer - Master Data Services - SQL Server - Permanent/Contract"
[8] "Senior Data Protection Officer (DPO) - Contract"
[9] "QC Data Analyst (Trending)"
[10] "Senior Data Warehouse Developer"
[11] "Senior Data Analyst FTC"
[12] "Compliance Advisory and Data Protection Relationship Manager"
[13] "Contracts Manager-Data Center"
[14] "Payments Product Data Analyst"
[15] "Data Center Product Hardware Platform Engineer"
[16] "People Data Privacy Program Lead"
[17] "Head of Data Science"
[18] "Data Protection Counsel (Product or Compliance)"
[19] "Data Engineer, GMS"
[20] "Data Protection Associate General Counsel"
[21] "Senior Data Engineer"
[22] "Geospatial Data Scientist"
[23] "Data Solutions Manager"
[24] "Data Protection Solicitor"
[25] "Junior Data Scientist"
[26] "Master Data Specialist"
[27] "Temp QC Electronic Data Management Analyst"
[28] "20725 -Data Scientist - Limerick"
[29] "Technical Support Specialist - Data Centre"
[30] "Lead QC Micro Analyst (data review and compliance)"
[31] "Temp QC Data Analyst"
[32] "#Abbvie Compliance Engineer (Data Integrity)"
[33] "People Data Analyst"
[34] "Senior Electrical Design Engineer - Data Centre Ex"
[35] "Laboratory Data Entry Assistant, UCD NVRL"
[36] "Data Migrations Specialist"
[37] "Data Protection Officer"
[38] "Data Center Operations Engineer (Linux)"
[39] "Senior Electrical Engineer | Data Centre LV Design"
[40] "Data Scientist - (Process Sciences)"
[41] "Mgr Supply Logistics Global Materials Data"
[42] "Data Protection / Privacy Delivery Consultant"
[43] "Global Supply Chain Data Analyst"
[44] "QC Data Analyst"
[45] "0582GradeVIIFOIOLOL1120 - Grade VII Data Protection / Freedom of Information & Compliance Officer"
[46] "DPO001 - Deputy Data Protection Officer (General Manager) Office of the Head of Data Protection, HSE"
[47] "Senior Campaign Data Analyst"
[48] "Data & Reporting Analyst II"
[49] "Azure Data Analytics Solution Architect"
[50] "Head of Risk Assurance for IT, Data, Projects and Outsourcing"
[51] "Trainee Data Technician, Ireland"
[52] NA
You can deal with the NAs separately. Does this answer your question, or have I misinterpreted it?
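Two pieces of the approach above can be tried without touching the website: building the page URLs and dropping the NA entries afterwards. A minimal sketch in base R; the short AllJOBS vector here is a stand-in for the scraped result, not real output.

```r
# Base URL from the question; only the trailing page number changes
base_url <- "https://www.irishjobs.ie/ShowResults.aspx?Keywords=Data&autosuggestEndpoint=%2fautosuggest&Location=0&Category=&Recruiter=Company&btnSubmit=Search&Page="

# paste0 is vectorised, so all four page URLs are built in one call
page_urls <- paste0(base_url, 1:4)

# Stand-in for the scraped vector; NA marks a position with no job title
AllJOBS <- c("Data Warehouse Engineer", NA, "HR Data Analyst", NA)

# Drop the NA entries
jobs_clean <- AllJOBS[!is.na(AllJOBS)]
print(jobs_clean)
```

This also shows where the page number belongs in the link: it is simply appended after "Page=".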

Scraping live updating information using R

I am new to web scraping and have been trying to scrape the right-hand-side list of UK local authorities and their numbers of Covid-19 cases.
Here is the website:
https://www.arcgis.com/apps/opsdashboard/index.html#/f94c3c90da5b4e9f9a0b19484dd4bb14
I have been able to scrape Wikipedia, but I have no idea where to start with the above website. Any tips or links would be very helpful and much appreciated!
I have been able to get some numbers from the page with the following code:
library(rvest)
library(RSelenium)
port <- as.integer(4444L + rpois(lambda = 1000, 1))
rd <- rsDriver(chromever = "105.0.5195.52", browser = "chrome", port = port)
remDr <- rd$client
remDr$open()
url <- "https://coronavirus.data.gov.uk/"
remDr$navigate(url)
html_Content <- remDr$getPageSource()[[1]]
text <- read_html(html_Content) %>% html_text2()
text <- strsplit(text, "\n")[[1]]
text[54 : 72]
[1] "Last 7 days – first dose"
[2] "10,536Number of people vaccinated (first dose) in the 7 days to 2 October 2022"
[3] "Total – first dose"
[4] "45,275,970Total number of people vaccinated (first dose) reported on 2 October 2022"
[5] "Last 7 days – second dose"
[6] "18,800Number of people vaccinated (second dose) in the 7 days to 2 October 2022"
[7] "Total – second dose"
[8] "42,718,917Total number of people vaccinated (second dose) reported on 2 October 2022"
[9] "Last 7 days – booster or third dose"
[10] "25,518Number of people vaccinated (booster or third dose) in the 7 days to 2 October 2022"
[11] "Total – booster or third dose"
[12] "33,613,297Total number of people vaccinated (booster or third dose) reported on 2 October 2022"
[13] "Percentage of population aged 12+"
[14] "93.6%Percentage of population aged 12+ vaccinated (first dose) reported on 2 October 2022"
[15] "First dose"
[16] "88.3%Percentage of population aged 12+ vaccinated (second dose) reported on 2 October 2022"
[17] "Second dose"
[18] "69.5%Percentage of population aged 12+ vaccinated (booster or third dose) reported on 2 October 2022"
[19] "Booster or third dose"
I hope this is helpful!
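In the output above, each figure is fused to its description (for example "10,536Number of people ..."). One possible way to separate them, as a sketch in base R on two of the lines shown; the names figure, label and value are illustrative, and the pattern assumes every line of interest starts with a digit.

```r
x <- c(
  "10,536Number of people vaccinated (first dose) in the 7 days to 2 October 2022",
  "93.6%Percentage of population aged 12+ vaccinated (first dose) reported on 2 October 2022"
)

# Leading figure: digits, commas and decimal points, optionally followed by %
figure <- sub("^([0-9][0-9,.]*%?).*$", "\\1", x)

# Remaining text after the figure
label <- substring(x, nchar(figure) + 1)

# Numeric value with thousands separators and percent signs removed
value <- as.numeric(gsub("[,%]", "", figure))

print(data.frame(value = value, label = label))
```

Lines that do not begin with a digit, such as the headings, would be returned unchanged by sub and should be filtered out first.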

Replace all non-alphanumeric with a period

I am trying to rename all of these atrocious column names in a data frame I received from a government agency.
> colnames(thedata)
[1] "Region" "Resource Assessment Site ID"
[3] "Site Name/Facility" "Design Head (feet)"
[5] "Design Flow (cfs)" "Installed Capacity (kW)"
[7] "Annual Production (MWh)" "Plant Factor"
[9] "Total Construction Cost (1,000 $)" "Annual O&M Cost (1,000 $)"
[11] "Cost per Installed Capacity ($/kW)" "Benefit Cost Ratio with Green Incentives"
[13] "IRR with Green Incentives" "Benefit Cost Ratio without Green Incentives"
[15] "IRR without Green Incentives"
The column headers contain spaces and special non-alphanumeric characters, which makes referring to them awkward, so I have to rename them. I would like to replace every non-alphanumeric character with a period. I tried:
old.col.names <- colnames(thedata)
new.col.names <- gsub("^a-z0-9", ".", old.col.names)
The ^ is a "not" operator, so I thought this would replace everything in old.col.names that is not alphanumeric with a period.
Can anyone help?
Here are three options to consider:
make.names(x)
gsub("[^A-Za-z0-9]", ".", x)
names(janitor::clean_names(setNames(data.frame(matrix(NA, ncol = length(x))), x)))
Here's what each looks like:
make.names(x)
## [1] "Region" "Resource.Assessment.Site.ID"
## [3] "Site.Name.Facility" "Design.Head..feet."
## [5] "Design.Flow..cfs." "Installed.Capacity..kW."
## [7] "Annual.Production..MWh." "Plant.Factor"
## [9] "Total.Construction.Cost..1.000..." "Annual.O.M.Cost..1.000..."
## [11] "Cost.per.Installed.Capacity....kW." "Benefit.Cost.Ratio.with.Green.Incentives"
## [13] "IRR.with.Green.Incentives" "Benefit.Cost.Ratio.without.Green.Incentives"
## [15] "IRR.without.Green.Incentives"
gsub("[^A-Za-z0-9]", ".", x)
## [1] "Region" "Resource.Assessment.Site.ID"
## [3] "Site.Name.Facility" "Design.Head..feet."
## [5] "Design.Flow..cfs." "Installed.Capacity..kW."
## [7] "Annual.Production..MWh." "Plant.Factor"
## [9] "Total.Construction.Cost..1.000..." "Annual.O.M.Cost..1.000..."
## [11] "Cost.per.Installed.Capacity....kW." "Benefit.Cost.Ratio.with.Green.Incentives"
## [13] "IRR.with.Green.Incentives" "Benefit.Cost.Ratio.without.Green.Incentives"
## [15] "IRR.without.Green.Incentives"
library(janitor)
names(clean_names(setNames(data.frame(matrix(NA, ncol = length(x))), x)))
## [1] "region" "resource_assessment_site_id"
## [3] "site_name_facility" "design_head_feet"
## [5] "design_flow_cfs" "installed_capacity_kw"
## [7] "annual_production_mwh" "plant_factor"
## [9] "total_construction_cost_1_000" "annual_o_m_cost_1_000"
## [11] "cost_per_installed_capacity_kw" "benefit_cost_ratio_with_green_incentives"
## [13] "irr_with_green_incentives" "benefit_cost_ratio_without_green_incentives"
## [15] "irr_without_green_incentives"
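As for why the pattern in the question changed nothing: ^ only means "not" inside square brackets, while outside them it anchors the start of the string. The sketch below illustrates this on three of the column names and, as an optional variation not in the answer above, collapses runs of non-alphanumeric characters into a single period.

```r
x <- c("Region", "Design Head (feet)", "Annual O&M Cost (1,000 $)")

# Outside brackets, ^ anchors the start of the string, so this pattern looks
# for the literal text "a-z0-9" at the start and changes nothing here
unchanged <- gsub("^a-z0-9", ".", x)

# Inside brackets, ^ negates the class; + collapses each run into one period
collapsed <- gsub("[^A-Za-z0-9]+", ".", x)

# Optionally drop a trailing period left by a closing parenthesis
tidy <- sub("\\.$", "", collapsed)
print(tidy)
```

Whether to collapse runs is a matter of taste: the single-character version keeps the names identical to what make.names produces here, while the collapsed version is shorter to type.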
Sample data:
x <- c("Region", "Resource Assessment Site ID", "Site Name/Facility",
"Design Head (feet)", "Design Flow (cfs)", "Installed Capacity (kW)",
"Annual Production (MWh)", "Plant Factor", "Total Construction Cost (1,000 $)",
"Annual O&M Cost (1,000 $)", "Cost per Installed Capacity ($/kW)",
"Benefit Cost Ratio with Green Incentives", "IRR with Green Incentives",
"Benefit Cost Ratio without Green Incentives", "IRR without Green Incentives")
