Web scraping India Energy Dashboard data

I am trying to scrape data from the India Energy Dashboard (https://www.niti.gov.in/edm/#elecGeneration) using Python. When I click on download, the website returns the error NET::ERR_CERT_DATE_INVALID, and I guess that is why I am not getting a 200 response. I also tried the TableauScraper library, but I get the error "NoneType has no attribute text".
This is the code I am running:
#!pip install TableauScraper
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/app/profile/niti.energy.vertical/viz/ElectricityGeneration_0/Source"
ts = TS()
ts.loads(url)

You need to inspect the Network tab in your browser's dev tools and get the correct URL for the data source. Here is one way to obtain that data:
from tableauscraper import TableauScraper as TS
url = 'https://public.tableau.com/views/ElectricityGeneration_0/Source?%3Adisplay_static_image=y&%3AbootstrapWhenNotified=true&%3Aembed=true&%3Alanguage=en-US&:embed=y&:showVizHome=n&:apiID=host0'
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
    print(f"worksheet name : {t.name}")  # show worksheet name
    print(t.data)  # show dataframe for this worksheet
Result in terminal:
worksheet name : Generation Trend by Source
Year Name-value Year Name-alias Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-value Year Name-[federated.0hpknup10wcqib1b9qd9s1xn749g].[none:YearName:nk]-alias SUM(Generation TWh)-value SUM(Generation TWh)-alias SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-value SUM(Generation TWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[sum:Calculation_3893502658938839040:qk]-alias Energy Source-alias
0 FY06 FY06 FY06 FY06 697.06083 697.06083 0.22062 0.22062 WIND
1 FY07 FY07 FY07 FY07 751.53005 751.53005 0.21588 0.21588 WIND
2 FY08 FY08 FY08 FY08 809.263687 809.263687 11.065371 11.065371 WIND
3 FY09 FY09 FY09 FY09 838.682997 838.682997 13.19954 13.19954 WIND
4 FY10 FY10 FY10 FY10 898.527489 898.527489 15.171851 15.171851 WIND
... ... ... ... ... ... ... ... ... ...
150 0 0 FY16 FY16 0 0 16.680499 16.680499 BIOMASS-BAGASSE
151 0 0 FY17 FY17 0 0 14.15864 14.15864 BIOMASS-BAGASSE
152 0 0 FY18 FY18 0 0 15.2523 15.2523 BIOMASS-BAGASSE
153 0 0 FY19 FY19 0 0 16.326489 16.326489 BIOMASS-BAGASSE
154 0 0 FY20 FY20 0 0 13.742429 13.742429 BIOMASS-BAGASSE
155 rows × 9 columns
worksheet name : Generation by Source in
Energy Source-alias SUM(Generation TWh)-alias SUM(Generation GWh)-alias SUM(Generation GWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[pcto:sum:Generation_GWh:qk]-alias
0 NUCLEAR 46.47245 46,472MWh 0.028636
1 HYDRO 156.117158 156,117MWh 0.096197
2 COAL 1199.742768 1,199,743MWh 0.739262
3 BIOMASS-BAGASSE 13.742429 13,742MWh 0.008468
4 DIESEL 2.027548 2,028MWh 0.001249
5 NATURAL GAS 73.885792 73,886MWh 0.045527
6 RENEWABLES 0.365895 366MWh 0.000225
7 SMALL HYDRO 9.451229 9,451MWh 0.005824
8 SOLAR 51.938299 51,938MWh 0.032004
9 WIND 69.149642 69,150MWh 0.042609
For documentation, please see https://github.com/bertrandmartel/tableau-scraping
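The column names in the output above are long, so a small helper to shorten them may be handy. The following is only a sketch: the helper name short_alias_names is made up for illustration, and it assumes that the "-value" columns mirror the "-alias" ones (as they do in the rows printed above) and can therefore be dropped.

```python
import re

# Matches the data-source prefix seen in the printed column names,
# e.g. "[federated.0hpknup10wcqib1b9qd9s1xn749g]."
FEDERATED_PREFIX = re.compile(r"\[federated\.[^\]]+\]\.")


def short_alias_names(columns):
    """Map each '...-alias' column to a shorter name (hypothetical helper).

    Columns that do not end in '-alias' are left out of the mapping.
    """
    mapping = {}
    for name in columns:
        if not name.endswith("-alias"):
            continue  # assumption: '-value' columns duplicate the '-alias' ones
        # Drop the data-source prefix, then the trailing '-alias' suffix.
        mapping[name] = FEDERATED_PREFIX.sub("", name)[: -len("-alias")]
    return mapping


# Column names copied from the second worksheet in the output above.
columns = [
    "Energy Source-alias",
    "SUM(Generation TWh)-alias",
    "SUM(Generation GWh)-alias",
    "SUM(Generation GWh)-[federated.0hpknup10wcqib1b9qd9s1xn749g].[pcto:sum:Generation_GWh:qk]-alias",
]

mapping = short_alias_names(columns)
for old, new in mapping.items():
    print(f"{new!r} <- {old!r}")
```

With a worksheet t from the loop above, something like t.data[list(mapping)].rename(columns=mapping), where mapping = short_alias_names(t.data.columns), should then give a DataFrame with the shorter headers.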

Related

Problems to separate data

I have a table called FreqAnual:
Fêmea Macho
Abril 3 0
Agosto 1 0
Dezembro 7 0
Fevereiro 6 4
Janeiro 6 4
Julho 1 0
Junho 5 0
Maio 3 0
Março 20 2
Novembro 4 1
Outubro 3 0
It comes from an Excel dataset with a column "Mes" (month) that has one row per record, and another column for sex, with the values Fêmea (female) and Macho (male).
I built it with FreqAnual <- table(Dados_procesados$Mes, Dados_procesados$Sexo).
So I tried FreqJan <- Dados_Procesados [Mes == Janeiro, ], and also the version with the $ before Mes, but all I get is:
FreqJan <- Dados_Procesados [Mes = Janeiro, ]
Error: object 'Dados_Procesados' not found
What can I do? The subtable approach didn't work either.
I was expecting something like
Fêmea Macho
Janeiro 6 4
I need it in that form so that I can run a monthly G-test on the sex ratio and see whether there are significant differences.

How can I create a vector with values that are obtained by a function that returns different values for every row?

I have a function club_points(club) that returns the total points of a club. Now I want to build a data frame with the clubs in the rows and each club's club_points value in a column. Is there a way to apply my function so that the points are automatically assigned to the same row as the club?
After some research I believe I have to use the apply family, but since I am new to this I don't know how to do it.
teams total_points
1 Rio Ave 0
2 Moreirense 0
3 Sp Lisbon 0
4 Tondela 0
5 Boavista 0
6 Guimaraes 0
7 Setubal 0
8 Estoril 0
9 Belenenses 0
10 Chaves 0
11 Maritimo 0
12 Pacos Ferreira 0
13 Porto 0
14 Arouca 0
15 Benfica 0
16 Feirense 0
17 Sp Braga 0
18 Nacional 0
This is the current format of my data frame final_pos, but I would like to fill the total_points column by applying the club_points function to each club.
Do you mean something like
final_pos$total_points <- Vectorize(club_points, "club")(final_pos$teams)
or
final_pos$total_points <- sapply(final_pos$teams,club_points)

create matrix from raw data

My data looks like this:
> head(data, 20)
# A tibble: 20 x 2
hosp zip
<chr> <chr>
1 010001 14843
2 010001 36303
3 010016 13320
4 010021 10468
5 010023 36040
6 010023 36116
7 010023 36116
8 010023 36116
9 010024 36401
10 010029 10025
11 010029 11412
12 010029 11733
13 010033 14086
14 010033 14701
15 010033 35244
16 010034 12308
17 010038 11413
18 010039 10011
19 010039 11704
20 010039 35749
hosp is the hospital ID and zip is the zip code. The patients in each hospital come from multiple zip codes. How can I create a matrix that shows, for each hospital, how many patients came from each zip code?
The ideal matrix would look like this:
zip 010001 010016 010021 ... hosp
14843 1 0 0
36303 1 0 0
13320 0 1 0
10468 0 0 1
Thanks!!
As was stated in the comments, you can use table(). The t() function puts the zip codes on the left:
t(as.matrix(table(data)))

Converting from long to wide in R -- values not being filled in cells

I've perused countless posts and could not find a solution to what I'm experiencing. I have some stock data (in long format) that I wish to convert into wide format. I'm using the cast function in the reshape package, but for some reason the cells are being filled with 1 and 0 instead of the adjusted closing prices. This is a sample of my dataset:
Ticker days Adj.Close
ABC -100 44
ABC -99 43
ABC -98 43.4
ABC -97 44.3
... ... ...
When I use cast(df, Ticker ~ days, value = "Adj.Close"), I get the right structure, but the cells are filled with 1 and 0 like this:
Ticker -100 -99 -98 -97 ...
ABC 1 0 0 1
DEF 0 0 0 1
GHI 1 1 0 0
Does anyone have any idea how to fix this? Thanks so much.

Feature selection with "fscaret" in R issue with empty list()

I have a data frame with many factor and numeric variables, and I'm trying to use the fscaret package for feature selection.
Sample of the data:
age clerical construc educ earns74 gdhlth inlf leis1 rlxal
32 0 0 12 0 0 1 3529 3479
40 1 0 14 9500 1 1 3929 3329
20 1 1 10 329 0 0 5300 2309
22 0 0 6 602 1 0 5205 4290
I try to run the following code:
fsMod <- c("gbm", "treebag", "ridge", "lasso", "Boruta", "glm")
myFS<-fscaret(train.sleepDF, test.sleepDF, myTimeLimit = 40, preprocessData=TRUE, Used.funcRegPred = 'fsMod', with.labels=TRUE,
supress.output=FALSE, no.cores=2)
But when I then look at myFS$VarImp, I get list().
I've read the package documentation, and it also mentions this issue without giving a clear solution. There appears to be a troublesome method in the calculations, but how do I identify it?
Is there a way to solve the problem?
Any help is greatly appreciated.
