I am subsetting a large ffdf object and noticed that subset.ff generates a large number of NAs. An alternative approach using ffwhich is much faster and generates no NAs. Here is my test:
library(ffbase)
# deals is the ffdf I would like to subset
unique(deals$COMMODITY)
ff (open) integer length=7 (7) levels: CASH CO2 COAL ELEC GAS GCERT OIL
[1] [2] [3] [4] [5] [6] [7]
CASH CO2 COAL ELEC GAS GCERT OIL
# Using subset.ff
started.at=proc.time()
deals0 <- subset.ff(deals,deals$COMMODITY %in% c("CASH","COAL","CO2","ELEC","GCERT"))
cat("Finished in",timetaken(started.at),"\n")
Finished in 12.640sec
# NAs are generated
unique(deals0$COMMODITY)
ff (open) integer length=8 (8) levels: CASH CO2 COAL ELEC GAS GCERT OIL <NA>
[1] [2] [3] [4] [5] [6] [7] [8]
CASH CO2 COAL ELEC GAS GCERT OIL NA
# Subset using ffwhich
started.at=proc.time()
idx <- ffwhich(deals,COMMODITY %in% c("CASH","COAL","CO2","ELEC","GCERT"))
deals1 <- deals[idx,]
cat("Finished in",timetaken(started.at),"\n")
Finished in 3.130sec
# No NAs are generated
unique(deals1$COMMODITY)
ff (open) integer length=7 (7) levels: CASH CO2 COAL ELEC GAS GCERT OIL
[1] [2] [3] [4] [5] [6] [7]
CASH CO2 COAL ELEC GAS GCERT OIL
Any idea why this is happening?
subset.ff is probably applying "[" to your criterion without adding a !is.na(.) clause. By default, "[" returns the items for which the criterion vector is either TRUE or NA. The regular subset function adds a !is.na(.) clause, but maybe the authors of ffbase didn't get around to that.
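The difference between "[" and subset described above can be seen in base R on an ordinary data frame. This is only a minimal sketch with made-up data; it does not use ff or ffbase, and it uses == rather than %in% because %in% returns FALSE rather than NA for missing values.

```r
# Toy data frame (hypothetical values) with one missing commodity
df <- data.frame(x = c("a", NA, "b", "c"), y = 1:4, stringsAsFactors = FALSE)

# The criterion is NA wherever x is NA
keep <- df$x == "a" | df$x == "b"

# "[" keeps TRUE rows and turns NA entries into all-NA rows
by_bracket <- df[keep, ]

# subset() treats NA in the criterion as FALSE
by_subset <- subset(df, keep)

nrow(by_bracket)  # 3, including one all-NA row
nrow(by_subset)   # 2
```

Wrapping the criterion in which(), or adding & !is.na(keep), makes "[" behave like subset here.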
I have a list of indicators with periods in their names, and I want to replace those periods with spaces. I know that the gsub() function can replace punctuation, but every time I try to replace the dots with spaces the list returns NULL.
list_AM = list(list(geo = "EU", sales="West.Europe.Sales",
indicator = list("SA","NSA","composites_industry_value","DUCS","WUCS","T30","Rovings",
"Mats","WE.Construction.Gross.output..sales...Real.USD","WE.Construction.Production.index","WE.Glass.Gross.operating.surplus..profits...Nominal.USD",
"WE.Glass.Gross.output..sales...Nominal.USD","WE.Glass.Investment..Nominal.USD","WE.Glass.Production.index","WE.Glass.Value.added.output..As.a.percent.of.GDP",
"WE.Glass.Value.added.output..As.a.percent.of.manufacturing","WE.Glass.Value.added.output..As.a.percent.of.world.total","WE.Industrial.Production.Gross.operating.surplus..profits...Nominal.USD",
"WE.Industrial.Production.Gross.output..sales...Nominal.USD","WE.Glass.Investment..Nominal.USD","WE.Glass.Production.index","WE.Glass.Value.added.output..As.a.percent.of.GDP","WE.Glass.Value.added.output..As.a.percent.of.manufacturing",
"WE.Glass.Value.added.output..As.a.percent.of.world.total","WE.Industrial.Production.Gross.operating.surplus..profits...Nominal.USD","WE.Industrial.Production.Gross.output..sales...Nominal.USD","WE.Industrial.Production.Production.index",
"WE.Industrial.Production.Value.added.output..As.a.percent.of.GDP","WE.Industrial.Production.Value.added.output..As.a.percent.of.world.total","WE.Manufacturing.Gross.operating.surplus..profits...Nominal.USD","WE.Manufacturing.Gross.output..sales...Nominal.USD",
"WE.Manufacturing.Investment..Nominal.USD","WE.Manufacturing.Production.index","WE.Manufacturing.Value.added.output..As.a.percent.of.GDP","WE.Manufacturing.Value.added.output..As.a.percent.of.world.total","WE.Current.account.of.balance.of.payments.in.US...share.of.GDP",
"WE.Employment..total.1","WE.External.debt..total..US.","WE.Foreign.direct.investment..US.","WE.GDP.per.capita..nominal..US.","WE.GDP..nominal..US.","WE.Government.balance..share.of.GDP","WE.Population..total","WE.Reserves..foreign.exchange..US.",
"WE.Reserves..months.of.import.cover","WE.Stockbuilding..real..share.of.GDP","WE.Visible.trade.balance..share.of.GDP","WE.Consumer.price.index","WE.Gross.government.debt..as.a...of.GDP.","WE.Industrial.production.index","WE.Interest.rate..short.term",
"WE.Interest.rate..Yield.on.10.year.Government.Debt.Securities....per.annum.","WE.Services.balance..as...of.GDP","WE.Share.price.index","WE.Unemployment.rate","WE.Capacity.utilisation","WE.Consumption..government..PPP.exchange.rate..nominal..US.","WE.Consumption..government..nominal..US.",
"WE.Consumption..government..nominal..share.of.GDP.1","WE.Consumption..private..PPP.exchange.rate..nominal..US.","WE.Exports..goods...services..constant.prices.and.exchange.rate..US.....of.World","WE.GDP..industry..real","WE.GVA.Agriculture.share.of.GVA","WE.GVA.Industry.share.of.GVA",
"WE.GVA.Manufacturing.of.GVA","WE.GVA.Services..share.of.GVA","WE.Gross.value.added.in.construction..real","WE.Gross.value.added.in.services..real","WE.Imports..goods...services..constant.prices.and.exchange.rate..US.....of.World","WE.Imports..goods..PPP.exchange.rate..nominal..US.",
"WE.Industrial.production.index.1","WE.Investment..government..nominal","WE.Investment..machinery...equipment..nominal","WE.Investment..private..non.residential.structures..nominal","WE.Investment..total.fixed.investment..nominal..US.",
"WE.Investment..total.fixed..nominal..share.of.GDP","WE.Net.investment..nominal..US.","WE.Output.gap","WE.Productivity..trend","WE.Stockbuilding..nominal..US.",
"WE.Stockbuilding..nominal..share.of.GDP","WE.Stockbuilding..real..annual.contribution.to.growth","WE.Trend.productivity.target","WE.World.trade.index","WE.House.price.index","WE.Housing.starts","WE.Interest.rate.on.building.society.mortgages","WE.Market.value.of.housing.stock..LCU",
"WE.Residential.property.transactions","WE.Stock.of.owner.occupied.houses","WE.Consumers..expenditure..durables..nominal","WE.Financial.liabilities..household.sector..as.a...of.disposable.income","WE.Liabilities..debt.other.than.loans..households","WE.Personal.consumer.credit",
"WE.Retail.sales..value.index","WE.Retail.sales..volume.index","WE.Savings..personal.sector.ratio")))
For example, instead of "WE.Residential.property.transactions" I want the list to return
"WE Residential property transactions"
Based on the structure, this is a recursive (nested) list, so a function that walks the nested list recursively, such as rapply or rrapply, can be used to apply gsub, matching each . and replacing it with a space (' ').
Note that . is a metacharacter that matches any character in regex mode (the default), so to match it literally we can use fixed = TRUE (which should be faster), escape it (\\.), or place it inside square brackets ([.]).
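A quick base R check on a single string (taken from the list above) suggests that the three literal forms agree, and shows what the unescaped pattern does instead:

```r
x <- "WE.Housing.starts"

fixed_form   <- gsub(".", " ", x, fixed = TRUE)  # literal match, no regex engine
escaped_form <- gsub("\\.", " ", x)              # escaped metacharacter
bracket_form <- gsub("[.]", " ", x)              # dot inside a character class

# Without fixed = TRUE or escaping, "." matches every character,
# so the whole string is replaced by spaces
unescaped <- gsub(".", " ", x)

fixed_form  # "WE Housing starts"
```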
library(rrapply)
list_AM2 <- rrapply(list_AM, f = function(x) gsub(".", " ", x, fixed = TRUE))
-output
> list_AM2
[[1]]
[[1]]$geo
[1] "EU"
[[1]]$sales
[1] "West Europe Sales"
[[1]]$indicator
[[1]]$indicator[[1]]
[1] "SA"
[[1]]$indicator[[2]]
[1] "NSA"
[[1]]$indicator[[3]]
[1] "composites_industry_value"
[[1]]$indicator[[4]]
[1] "DUCS"
[[1]]$indicator[[5]]
[1] "WUCS"
[[1]]$indicator[[6]]
[1] "T30"
[[1]]$indicator[[7]]
[1] "Rovings"
[[1]]$indicator[[8]]
[1] "Mats"
[[1]]$indicator[[9]]
[1] "WE Construction Gross output sales Real USD"
[[1]]$indicator[[10]]
[1] "WE Construction Production index"
[[1]]$indicator[[11]]
[1] "WE Glass Gross operating surplus profits Nominal USD"
[[1]]$indicator[[12]]
[1] "WE Glass Gross output sales Nominal USD"
[[1]]$indicator[[13]]
[1] "WE Glass Investment Nominal USD"
[[1]]$indicator[[14]]
[1] "WE Glass Production index"
[[1]]$indicator[[15]]
[1] "WE Glass Value added output As a percent of GDP"
[[1]]$indicator[[16]]
[1] "WE Glass Value added output As a percent of manufacturing"
[[1]]$indicator[[17]]
[1] "WE Glass Value added output As a percent of world total"
[[1]]$indicator[[18]]
[1] "WE Industrial Production Gross operating surplus profits Nominal USD"
[[1]]$indicator[[19]]
[1] "WE Industrial Production Gross output sales Nominal USD"
[[1]]$indicator[[20]]
[1] "WE Glass Investment Nominal USD"
[[1]]$indicator[[21]]
[1] "WE Glass Production index"
[[1]]$indicator[[22]]
[1] "WE Glass Value added output As a percent of GDP"
[[1]]$indicator[[23]]
[1] "WE Glass Value added output As a percent of manufacturing"
[[1]]$indicator[[24]]
[1] "WE Glass Value added output As a percent of world total"
[[1]]$indicator[[25]]
[1] "WE Industrial Production Gross operating surplus profits Nominal USD"
[[1]]$indicator[[26]]
[1] "WE Industrial Production Gross output sales Nominal USD"
[[1]]$indicator[[27]]
[1] "WE Industrial Production Production index"
[[1]]$indicator[[28]]
[1] "WE Industrial Production Value added output As a percent of GDP"
[[1]]$indicator[[29]]
[1] "WE Industrial Production Value added output As a percent of world total"
[[1]]$indicator[[30]]
[1] "WE Manufacturing Gross operating surplus profits Nominal USD"
[[1]]$indicator[[31]]
[1] "WE Manufacturing Gross output sales Nominal USD"
[[1]]$indicator[[32]]
[1] "WE Manufacturing Investment Nominal USD"
[[1]]$indicator[[33]]
[1] "WE Manufacturing Production index"
[[1]]$indicator[[34]]
[1] "WE Manufacturing Value added output As a percent of GDP"
[[1]]$indicator[[35]]
[1] "WE Manufacturing Value added output As a percent of world total"
[[1]]$indicator[[36]]
[1] "WE Current account of balance of payments in US share of GDP"
[[1]]$indicator[[37]]
[1] "WE Employment total 1"
[[1]]$indicator[[38]]
[1] "WE External debt total US "
[[1]]$indicator[[39]]
[1] "WE Foreign direct investment US "
[[1]]$indicator[[40]]
[1] "WE GDP per capita nominal US "
[[1]]$indicator[[41]]
[1] "WE GDP nominal US "
[[1]]$indicator[[42]]
[1] "WE Government balance share of GDP"
[[1]]$indicator[[43]]
[1] "WE Population total"
[[1]]$indicator[[44]]
[1] "WE Reserves foreign exchange US "
[[1]]$indicator[[45]]
[1] "WE Reserves months of import cover"
[[1]]$indicator[[46]]
[1] "WE Stockbuilding real share of GDP"
[[1]]$indicator[[47]]
[1] "WE Visible trade balance share of GDP"
[[1]]$indicator[[48]]
[1] "WE Consumer price index"
[[1]]$indicator[[49]]
[1] "WE Gross government debt as a of GDP "
[[1]]$indicator[[50]]
[1] "WE Industrial production index"
[[1]]$indicator[[51]]
[1] "WE Interest rate short term"
[[1]]$indicator[[52]]
[1] "WE Interest rate Yield on 10 year Government Debt Securities per annum "
[[1]]$indicator[[53]]
[1] "WE Services balance as of GDP"
[[1]]$indicator[[54]]
[1] "WE Share price index"
[[1]]$indicator[[55]]
[1] "WE Unemployment rate"
[[1]]$indicator[[56]]
[1] "WE Capacity utilisation"
[[1]]$indicator[[57]]
[1] "WE Consumption government PPP exchange rate nominal US "
[[1]]$indicator[[58]]
[1] "WE Consumption government nominal US "
[[1]]$indicator[[59]]
[1] "WE Consumption government nominal share of GDP 1"
[[1]]$indicator[[60]]
[1] "WE Consumption private PPP exchange rate nominal US "
[[1]]$indicator[[61]]
[1] "WE Exports goods services constant prices and exchange rate US of World"
[[1]]$indicator[[62]]
[1] "WE GDP industry real"
[[1]]$indicator[[63]]
[1] "WE GVA Agriculture share of GVA"
[[1]]$indicator[[64]]
[1] "WE GVA Industry share of GVA"
[[1]]$indicator[[65]]
[1] "WE GVA Manufacturing of GVA"
[[1]]$indicator[[66]]
[1] "WE GVA Services share of GVA"
[[1]]$indicator[[67]]
[1] "WE Gross value added in construction real"
[[1]]$indicator[[68]]
[1] "WE Gross value added in services real"
[[1]]$indicator[[69]]
[1] "WE Imports goods services constant prices and exchange rate US of World"
[[1]]$indicator[[70]]
[1] "WE Imports goods PPP exchange rate nominal US "
[[1]]$indicator[[71]]
[1] "WE Industrial production index 1"
[[1]]$indicator[[72]]
[1] "WE Investment government nominal"
[[1]]$indicator[[73]]
[1] "WE Investment machinery equipment nominal"
[[1]]$indicator[[74]]
[1] "WE Investment private non residential structures nominal"
[[1]]$indicator[[75]]
[1] "WE Investment total fixed investment nominal US "
[[1]]$indicator[[76]]
[1] "WE Investment total fixed nominal share of GDP"
[[1]]$indicator[[77]]
[1] "WE Net investment nominal US "
[[1]]$indicator[[78]]
[1] "WE Output gap"
[[1]]$indicator[[79]]
[1] "WE Productivity trend"
[[1]]$indicator[[80]]
[1] "WE Stockbuilding nominal US "
[[1]]$indicator[[81]]
[1] "WE Stockbuilding nominal share of GDP"
[[1]]$indicator[[82]]
[1] "WE Stockbuilding real annual contribution to growth"
[[1]]$indicator[[83]]
[1] "WE Trend productivity target"
[[1]]$indicator[[84]]
[1] "WE World trade index"
[[1]]$indicator[[85]]
[1] "WE House price index"
[[1]]$indicator[[86]]
[1] "WE Housing starts"
[[1]]$indicator[[87]]
[1] "WE Interest rate on building society mortgages"
[[1]]$indicator[[88]]
[1] "WE Market value of housing stock LCU"
[[1]]$indicator[[89]]
[1] "WE Residential property transactions"
[[1]]$indicator[[90]]
[1] "WE Stock of owner occupied houses"
[[1]]$indicator[[91]]
[1] "WE Consumers expenditure durables nominal"
[[1]]$indicator[[92]]
[1] "WE Financial liabilities household sector as a of disposable income"
[[1]]$indicator[[93]]
[1] "WE Liabilities debt other than loans households"
[[1]]$indicator[[94]]
[1] "WE Personal consumer credit"
[[1]]$indicator[[95]]
[1] "WE Retail sales value index"
[[1]]$indicator[[96]]
[1] "WE Retail sales volume index"
[[1]]$indicator[[97]]
[1] "WE Savings personal sector ratio"
If there are runs of consecutive dots, we can use \\.+ (one or more dots) and replace each run with a single ' '
list_AM2 <- rrapply(list_AM, f = function(x) gsub("\\.+", " ", x))
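The same idea can be sketched with base R's rapply, which the answer mentions as an alternative. The small list below is a cut-down stand-in for list_AM, and the trimws() call is an addition not in the original answer; it drops the trailing space left by names that end in a dot.

```r
# Cut-down stand-in for list_AM (same nesting, fewer entries)
small <- list(list(geo = "EU",
                   sales = "West.Europe.Sales",
                   indicator = list("SA", "WE.GDP..nominal..US.")))

# how = "replace" keeps the nested structure and only rewrites character leaves
cleaned <- rapply(small,
                  function(x) trimws(gsub("\\.+", " ", x)),
                  classes = "character",
                  how = "replace")

cleaned[[1]]$sales           # "West Europe Sales"
cleaned[[1]]$indicator[[2]]  # "WE GDP nominal US"
```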
I have the following data, which I read from a .txt file using the read_lines function from readr:
txtread<-read_lines("expenses_copy1.txt")
txtread
[1] "Amount:Category:Date:Description"
[2] "5.25:supply:20170222:box of staples"
[3] "79.81:meal:20170222:lunch with ABC Corp. clients Al, Bob, and Cy"
[4] "43.00:travel:20170222:cab back to office"
[5] "383.75:travel:20170223:flight to Boston, to visit ABC Corp."
[6] "55.00:travel:20170223:cab to ABC Corp. in Cambridge, MA"
[7] "23.25:meal:20170223:dinner at Logan Airport"
[8] "318.47:supply:20170224:paper, toner, pens, paperclips, tape"
[9] "142.12:meal:20170226:host dinner with ABC clients, Al, Bob, Cy, Dave, Ellie"
[10] "303.94:util:20170227:Peoples Gas"
[11] "121.07:util:20170227:Verizon Wireless"
[12] "7.59:supply:20170227:Python book (used)"
[13] "79.99:supply:20170227:spare 20\" monitor"
[14] "49.86:supply:20170228:Stoch Cal for Finance II"
[15] "6.53:meal:20170302:Dunkin Donuts, drive to Big Inc. near DC"
[16] "127.23:meal:20170302:dinner, Tavern64"
[17] "33.07:meal:20170303:dinner, Uncle Julio's"
[18] "86.00:travel:20170304:mileage, drive to/from Big Inc., Reston, VA"
[19] "22.00:travel:20170304:tolls"
[20] "378.81:travel:20170304:Hyatt Hotel, Reston VA, for Big Inc. meeting"
I want to read these lines into four vectors, "Amount", "Category", "Date" and "Description", and build a data frame from them so that I have a dataset I can work with.
I tried the following:
for (i in length(txtread) ) {
data<-read.table(textConnection(txtread[[i]]))
print(data)
}
However, this doesn't seem to work.
How can I read this data into a data frame in R?
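One possible approach, sketched here on a few of the lines shown above: since the fields are separated by ":", read.table can parse the character vector directly through its text argument. This assumes that ":" never appears inside a description. Quoting and comment handling are switched off because the descriptions contain apostrophes and a double quote.

```r
# A few of the lines from the question, as read_lines would return them
txtread <- c("Amount:Category:Date:Description",
             "5.25:supply:20170222:box of staples",
             "79.99:supply:20170227:spare 20\" monitor",
             "33.07:meal:20170303:dinner, Uncle Julio's")

# Parse all lines at once; the first line supplies the column names
expenses <- read.table(text = txtread, sep = ":", header = TRUE,
                       quote = "", comment.char = "",
                       stringsAsFactors = FALSE)

# Convert the yyyymmdd integers to Date objects
expenses$Date <- as.Date(as.character(expenses$Date), format = "%Y%m%d")

str(expenses)
```

The loop in the question also visits only one element, because for (i in length(txtread)) iterates over the single number 20 rather than over 1 to 20.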
library(rvest)
jobbank <- read_html("https://www.jobbank.gc.ca/LMI_bulletin.do?cid=3373&AREA=0007&INDUSTRYCD=&EVENTCD=")
Error in open.connection(x, "rb") :
Timeout was reached: Connection timed out after 10015 milliseconds
jobbank %>%
html_node(".lmiBox") %>%
html_text()
Error in eval(lhs, parent, parent) : object 'jobbank' not found
I'm trying to extract keywords from the news section of the website, but I get these two error messages.
Seems to be working fine on my side.
library(rvest)
#> Loading required package: xml2
library(stringr)
jobbank <- read_html("https://www.jobbank.gc.ca/LMI_bulletin.do?cid=3373&AREA=0007&INDUSTRYCD=&EVENTCD=")
jobbank %>%
html_node(".lmiBox") %>%
html_text() %>%
str_split("(\r\\n+\\s+)|(\\n\\s+)")
#> [[1]]
#> [1] ""
#> [2] "Week of Jan 14 - Jan 18, 2019Lowe's Canada is looking to hire about 2,650 full-time, part-time and seasonal staff at its stores in Ontario. The company will hold a National Hiring Day on February 23."
#> [3] "The Ministry of Innovation, Science, and Economic Development announced $5M in funding to support automotive innovation at APAG Elektronik Corp. and Service Mold + Aerospace Inc. in Windsor, creating 160 jobs"
#> [4] "A $1M investment by the provincial government into Kenora's Downtown Revitalization Project for a plaza and infrastructure upgrades will create 75 new jobs"
#> [5] "Redfin Corp., an American real estate brokerage, is expanding into Canada and hiring in Toronto"
#> [6] "The construction of townhomes at Walkerville Stones in Windsor is expected to begin this spring "
#> [7] "The Ontario Emerging Jobs Institute (OEJI) at the Nav Centre in Cornwall opened. The OEJI provides skills training in areas with worker shortages."
#> [8] "The Chartwell Meadowbrook Retirement Residence in Lively broke ground on their expansion project, which includes 41 new suites and 14 town homes"
#> [9] "Lambton College created an Information Technology and Communication Research Centre using a five-year, $2M grant from the Natural Sciences and Engineering Research Council of Canada. They hope to use part of the funding to employ students."
#> [10] "SnapCab, a workspace pod manufacturer in Kingston, has grown from 20 to 25 employees with more hiring expected to occur in 2019"
#> [11] "Niagara Pallet & Recyclers Ltd., a manufacturer of pallets and shipping materials in Smithville, is hiring general labour workers, AZ and DZ drivers, production staff, forklift drivers and saw operators"
#> [12] "A1 Demolition will begin demolition of the former Maliboo Club in Simcoe. The plan is to rebuild the structure with residential and commercial space."
#> [13] "MidiCi: The Neapolitan Pizza Co., Sweet Jesus, La Carnita and The Pie Commission will be among several restaurants opening in the 34,000-sq.-ft. Food District in Mississauga this spring "
#> [14] "Menkes Developments Ltd., in partnership with TD Greystone Asset Management, will renovate the former Canada Permanent Trust Building in Toronto. Work on the 270,000-sq.-ft. space is expected to take between 12 and 18 months."
#> [15] "Westmount Signs & Printing in Waterloo is hiring experienced installers after doubling the size of its workforce to 24 employees in the last year and a half"
#> [16] "Microbrewery, Heral Haus Brewing Co. opened in Stratford at the end of December"
#> [17] "Demolition is expected to start this month on Windsor's old City Hall and is expected to be complete by August"
#> [18] "Urban Planet, a clothing store, will open as early as February 2019 at the Cornwall Square mall in Cornwall"
#> [19] "The federal government committed $3.5M towards the construction of a new art gallery in Thunder Bay, bringing total government funding for the project to $27.5M"
#> [20] "The Rec Room, a 44,000-sq.-ft. entertainment complex by Cineplex Entertainment LP, is scheduled to open in Mississauga in March "
#> [21] "Yang Teashop opened a second location in Toronto with plans to open two more locations in the Greater Toronto Area"
#> [22] "Spacecraft Brewery opened in Sudbury"
#> [23] "The Town of Lakeshore will be accepting applications for 11 summer student positions until March 1"
#> [24] "Virtual reality arcade Cntrl V opened in Lindsay"
#> [25] "A new restaurant, Presqu'ile Café and Burger, opened in Brighton"
#> [26] "Beauty brand Morphe LLC opened a store in Mississauga"
#> [27] "Footwear retailer Brown Shoe Company of Canada Ltd. Inc. will open an outlet store in Halton Hills in April"
#> [28] "The Westdale Theatre in Hamilton is scheduled to reopen in February "
#> [29] "Early ON/Family Grouping will open a child care centre in Monkton"
#> [30] "The De Novo addiction treatment centre opened in Huntsville "
#> [31] "French Revolution Bakery & Crêperie opened in Dundas"
#> [32] "A Williams Fresh Cafe is slated to open in Stoney Creek, one of three new locations opening this year in southwestern Ontario"
#> [33] "Monigram Coffee Midtown cafe will open in Kitchener this winter "
#> [34] "My Roti Place opened a fourth restaurant in Toronto"
#> [35] "A Gangster Cheese restaurant opened in Whitby"
#> [36] "A Copper Branch restaurant opened in Mississauga "
#> [37] "Hallmark Canada will exit about 20 company-owned stores across Canada in 2019 by either transitioning them to independent ownership or closing them. The loacations of the affected stores have not been identified."
#> [38] "Lush Cosmetics at the Intercity Shopping Centre in Thunder Bay will close at the end of January"
#> [39] ""
Created on 2019-01-28 by the reprex package (v0.2.1)
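In the question, the second error is a consequence of the first: read_html timed out, so jobbank was never created. If the timeout is intermittent, retrying the request may help. The helper below is a hypothetical sketch (with_retry is not part of rvest), written in base R so it can be tried without a network connection.

```r
# Hypothetical helper: call f() up to `attempts` times, pausing `wait`
# seconds between failures, and re-raise the last error if all attempts fail
with_retry <- function(f, attempts = 3, wait = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (i < attempts) Sys.sleep(wait)
  }
  stop(last_error)
}

# Stand-in for a flaky request: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  "page contents"
}

page <- with_retry(flaky)

# With rvest this might be used as (requires network access):
# jobbank <- with_retry(function() rvest::read_html(url), wait = 5)
```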
I am new to the R language. For my assignment, I am trying to generate multi-level dummy variables for several variables (3 in total). However, I ran into a problem with each approach I tried:
Method 1: following https://stats.idre.ucla.edu/r/modules/coding-for-categorical-variables-in-regression-models/
The code:
> housing_prices2$Fuel.Type.f <- factor(housing_prices2$Fuel.Type)
> is.factor(housing_prices2$Fuel.Type.f)
[1] TRUE
> housing_prices2$Fuel.Type.f[1:10]
[1] Electric Gas Gas Gas Gas Gas Oil
[8] Oil Electric Gas
Levels: Electric Gas None Oil Solar Unknown/Other Wood
This works well. However, I ran into a problem on the next line:
> summary(lm(write ~ Fuel.Type.f, data = housing_prices2))
Error in model.frame.default(formula = write ~ Fuel.Type.f, data = housing_prices2,: object is not a matrix
I have no idea what this error means and it doesn't make sense to me, so I decided to try another method.
Method 2: following "Convert categorical variables to numeric in R"
For the variable Fuel.Type, it works well:
> Fuel.Type <- as.factor(c("Electric", "Gas", "None", "Oil", "Solar", "Unknown/Other",
+ "Wood"))
> Fuel.Type
[1] Electric Gas None Oil Solar
[6] Unknown/Other Wood
Levels: Electric Gas None Oil Solar Unknown/Other Wood
> unclass(Fuel.Type)
[1] 1 2 3 4 5 6 7
attr(,"levels")
[1] "Electric" "Gas" "None" "Oil"
[5] "Solar" "Unknown/Other" "Wood"
But when I try to generate dummies for the other variables, I get this error:
> housing_prices2$Heat.Type.f[1:10]
NULL
Warning message:
Unknown or uninitialised column: 'Heat.Type.f'.
I am clueless about what is causing this error either...
Any suggestions are appreciated!
BTW, here is the structure of my sample data:
>$ Fuel.Type : chr "Electric" "Gas" "Gas" "Gas"
>$ Heat.Type : chr "Electric" "Hot Water" "Hot Water" "Hot Air"
>$ Sewer.Type : chr "Private" "Private" "Public" "Private"
I figured out my problem last night.
The problem was that I had mixed up the data objects, since I had created a new one named:
hp2 <- read_excel("Desktop/hw/424/hw1/housing_prices2.xlsx")
In addition, I got the Y variable wrong as well; see
summary(lm(write ~ Fuel.Type.f, data = housing_prices2))
My Y variable is not actually write.
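For reference, a minimal sketch of the factor approach once the data object and response are consistent. The data frame and the Price column are made up for illustration; the real response variable is not named in the post.

```r
# Toy data (hypothetical values); Price stands in for the real response
housing <- data.frame(
  Price = c(100, 150, 160, 155, 120, 125),
  Fuel.Type = c("Electric", "Gas", "Gas", "Gas", "Oil", "Oil"),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor
housing$Fuel.Type.f <- factor(housing$Fuel.Type)

# lm() expands a factor into dummy columns on its own;
# model.matrix() shows the dummies it will use
dummies <- model.matrix(~ Fuel.Type.f, data = housing)
colnames(dummies)

fit <- lm(Price ~ Fuel.Type.f, data = housing)
coef(fit)
```

The first level ("Electric" here) is the baseline, so it gets no dummy column of its own.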
I have a CSV data file with 50000+ records stored in the data frame 'data'. I am creating subsets of the data based on two factors, Segment and Market, which take the values below:
customer_segments <- c('Consumer','Corporate','Home Office')
markets <- c('Africa','APAC','Canada','EMEA','EU','LATAM','US')
To get the subsets for all 21 combinations of Market and Segment, I am using the nested for loops below with the assign and paste functions:
for(i in 1:length(markets)){
for(j in 1:length(customer_segments)){
assign(paste(markets[i],customer_segments[j],sep='_'),data[(data$Market == markets[i]) & (data$Segment == customer_segments[j]), ])
}
}
This creates 21 data frames and names them accordingly, e.g. Canada_Home Office.
The problem is that I want to iterate over all 21 data frames and aggregate three attributes (Sales, Quantity and Profit) on each, but I am not sure how to refer to these data frames in a loop. Maybe I could iterate if I had all 21 data frames in a vector, but I am not sure whether that is the best option.
Create all combinations of markets and customer_segments using expand.grid().
df <- expand.grid(markets, customer_segments)
head(df)
# Var1 Var2
# 1 Africa Consumer
# 2 APAC Consumer
# 3 Canada Consumer
# 4 EMEA Consumer
# 5 EU Consumer
# 6 LATAM Consumer
Then build a character vector of the combined market and customer segment names:
df1 <- as.vector(paste(df$Var1,df$Var2, sep = " "))
df1
# [1] "Africa Consumer" "APAC Consumer" "Canada Consumer"
# [4] "EMEA Consumer" "EU Consumer" "LATAM Consumer"
# [7] "US Consumer" "Africa Corporate" "APAC Corporate"
# [10] "Canada Corporate" "EMEA Corporate" "EU Corporate"
# [13] "LATAM Corporate" "US Corporate" "Africa Home Office"
# [16] "APAC Home Office" "Canada Home Office" "EMEA Home Office"
# [19] "EU Home Office" "LATAM Home Office" "US Home Office"
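A different route that avoids assign() altogether is to keep the subsets in a named list and loop over that list. The sketch below uses a small made-up data frame in place of the real 'data'; only the column names come from the question.

```r
# Toy stand-in for 'data' (hypothetical values)
data <- data.frame(
  Market   = c("US", "US", "EU", "EU", "EU"),
  Segment  = c("Consumer", "Corporate", "Consumer", "Consumer", "Corporate"),
  Sales    = c(10, 20, 30, 40, 50),
  Quantity = c(1, 2, 3, 4, 5),
  Profit   = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One grouping factor per Market/Segment pair, named like "EU_Consumer"
groups <- interaction(data$Market, data$Segment, sep = "_", drop = TRUE)

# Named list of data frames, one per combination present in the data
subsets <- split(data, groups, drop = TRUE)

# Aggregate the three attributes on every subset
totals <- lapply(subsets, function(d) colSums(d[, c("Sales", "Quantity", "Profit")]))

totals[["EU_Consumer"]]
```

Each element can then be addressed by name or position, for example subsets[["US_Corporate"]] or subsets[[1]].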