I am unable to import in R a downloaded xls file

I am trying to import directly the .xls file served by the download link used in the code below (RTE, a French electricity distributor).
Based on an earlier question, I have built the following code:
library(rio)
Chemin = "F:/DGTresor/00.Refontes/06.Electricite_HauteFrequence" #WhateverPath
## RTE mois en cours
temporaire <- tempfile()
download.file("https://eco2mix.rte-france.com/download/eco2mix/eCO2mix_RTE_En-cours-TR.zip",temporaire)
unzip(zipfile=temporaire,
files = "eCO2mix_RTE_En-cours-TR.xls",
exdir=Chemin)
RTE_EnCours <- import(paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.xls"))
The file exists after unzipping, but I am unable to read it. I get the following error: libxls error: Unable to open file

I am not sure why this happens, but when I try to open the .xls file manually, I get an error like "The file format and its extension do not match". The file appears to be tab-separated text saved with an .xls extension, so to work around the issue I renamed it to .csv with the code below.
file.rename(paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.xls"), paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.csv"))
After that, importing the file works:
# header=FALSE prevents the column names from being shifted relative to the data
RTE_EnCours <- read.csv(paste0(Chemin, "/eCO2mix_RTE_En-cours-TR.csv"), sep = "\t", header = FALSE, row.names = NULL)
# drop the last column, which is entirely NA
RTE_EnCours <- RTE_EnCours[, -ncol(RTE_EnCours)]
# use the first row as the column names
colnames(RTE_EnCours) <- as.character(unlist(RTE_EnCours[1, ]))
# remove the first row, which now duplicates the header
RTE_EnCours <- RTE_EnCours[-1, ]
head(RTE_EnCours)
gives,
Périmètre Nature Date Heures Consommation Prévision J-1 Prévision J Fioul Charbon Gaz Nucléaire Eolien Solaire Hydraulique
2 France Données temps réel 2020-10-01 00:00 46957 46500 47100 134 286 4524 35004 4327 0 4645
3 France Données temps réel 2020-10-01 00:15 46342 45350 45950 149 318 4727 35278 4336 0 4953
4 France Données temps réel 2020-10-01 00:30 44689 44200 44800 149 304 4380 34732 4428 0 4580
5 France Données temps réel 2020-10-01 00:45 43277 42950 43700 165 308 4244 34644 4528 0 4147
6 France Données temps réel 2020-10-01 01:00 42511 41700 42600 165 302 4012 34780 4488 0 4096
7 France Données temps réel 2020-10-01 01:15 42714 41650 42750 165 297 4114 35145 4630 0 3758
Pompage Bioénergies Ech. physiques Taux de Co2 Ech. comm. Angleterre Ech. comm. Espagne Ech. comm. Italie Ech. comm. Suisse
2 -751 1087 -2299 58 179 -914 -1732 -1283
3 -750 1055 -3724 59
4 -920 1045 -4009 58 179 -914 -1732 -1283
5 -1861 1048 -3946 59
6 -1857 1039 -4514 56 497 -1759 -2279 -2217
7 -2005 1037 -4427 57
Ech. comm. Allemagne-Belgique Fioul - TAC Fioul - Cogén. Fioul - Autres Gaz - TAC Gaz - Cogén. Gaz - CCG Gaz - Autres
2 -79 0 21 113 -2 585 3941 0
3 0 21 128 -1 580 4148 0
4 -159 0 21 128 -1 580 3801 0
5 0 21 144 -1 582 3663 0
6 1252 0 21 144 -1 579 3434 0
7 0 21 144 -1 581 3534 0
Hydraulique - Fil de l'eau + éclusée Hydraulique - Lacs Hydraulique - STEP turbinage Bioénergies - Déchets Bioénergies - Biomasse
2 3355 1288 2 183 447
3 3336 1615 2 174 435
4 3242 1338 0 174 434
5 3155 992 0 174 437
6 3060 1036 0 172 434
7 2992 766 0 177 436
Bioénergies - Biogaz
2 301
3 294
4 294
5 294
6 294
7 294
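The diagnosis above (text content behind an .xls extension) can be checked before choosing a reader. The following is a minimal sketch, not the answerer's method: `is_binary_xls` is a hypothetical helper, and the demo file is invented for illustration. It relies on the fact that binary .xls files are OLE2 compound documents, which start with a fixed 8-byte signature; anything else is read here as tab-separated text, without renaming the file.

```r
# Hypothetical helper: TRUE when the file starts with the OLE2 signature
# that binary .xls workbooks carry in their first 8 bytes.
is_binary_xls <- function(path) {
  magic <- readBin(path, what = "raw", n = 8)
  identical(magic, as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)))
}

# Invented demo file: tab-separated text saved with an .xls extension.
demo <- tempfile(fileext = ".xls")
writeLines(c("Date\tHeures\tConsommation",
             "2020-10-01\t00:00\t46957"), demo)

if (is_binary_xls(demo)) {
  message("Real .xls workbook: use an Excel reader such as rio::import()")
} else {
  # Text in disguise: read it as tab-separated, the extension does not matter.
  dat <- read.delim(demo, header = TRUE, stringsAsFactors = FALSE)
  print(dat)
}
```

For the real download, the file encoding may also need to be specified; that is not covered by this sketch.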

Related

How to add a string to each cell of a row in a R data table?

I'm facing an issue which looks very basic, but I'm not able to find the solution.
I have a simple table:
Statut occupation                              1983  1988  1996  2002  2007  2012  2017
Propriétaire du logement                        207   267   305   363   468   597   482
Locataire                                        35    40    33    52    50    61    60
Locataire de l'habitat social (OPH, OTHS)         0     0     0     0     0     2     0
Logé gratuitement (parents, amis, employeurs)    39    47    69    99    57    87    98
Total général                                   281   354   407   514   575   745   640
I want to get this result, with every cell of one row (here the third) wrapped in formatting markers:
Statut occupation                              1983  1988  1996  2002  2007  2012  2017
Propriétaire du logement                        207   267   305   363   468   597   482
Locataire                                        35    40    33    52    50    61    60
*Locataire de l'habitat social (OPH, OTHS)*     *0*   *0*   *0*   *0*   *0*   *2*   *0*
Logé gratuitement (parents, amis, employeurs)    39    47    69    99    57    87    98
Total général                                   281   354   407   514   575   745   640
The purpose is just to add formatting (italic, underline, non-breaking spaces...) to all the cells of one row. It looks like this is not that easy in R.
What I've tried
I tried to get the name of each column and modify the corresponding cell in a for loop:
n.row=3
cols=colnames(Y)
for (i in 1:ncol(Y)){
Y[n.row,get(cols[i])]<-as.data.table(sprintf("*%s*",as.character(Y[n.row,get(cols[i])])))
}
The problem here is that Y[n.row,get(cols[i])] always returns "Statut occupation" (the column name), whatever the value of n.row. Why?
I also tried to make it work with the index of the column directly:
n.row=3
for (i in 1:ncol(Y)){
Y[n.row,..i]<-sprintf("*%s*",Y[n.row,..i])
}
Here :
Y[n.row,..i] is giving me the proper information...
sprintf("%s",as.character(Y[n.row,..i])) is giving the proper string whatever the class of the column...
But Y[n.row,..i]<-sprintf("%s",as.character(Y[n.row,..i])) returns
Error in [<-.data.table(*tmp*, n.row, ..i, value = " Locataire de l'habitat social (OPH, OTHS)") :
object '..i' not found
I don't understand this behaviour. Each piece of information can be retrieved on its own, but I cannot assign one to the other because ..i is suddenly no longer found.
Any explanation would be appreciated, or maybe I'm not using the proper strategy to do what I need :) !
Thanks for your help!

Note that the formatted value is a character string, so all columns should be of type character to keep the types consistent (hence colClasses = 'character' below).
library(data.table)
# Fields are separated by tabs (\t) because the row labels themselves contain spaces
Y <- fread("Statut occupation\t1983\t1988\t1996\t2002\t2007\t2012\t2017
Propriétaire du logement\t207\t267\t305\t363\t468\t597\t482
Locataire\t35\t40\t33\t52\t50\t61\t60
Locataire de l'habitat social (OPH, OTHS)\t0\t0\t0\t0\t0\t2\t0
Logé gratuitement (parents, amis, employeurs)\t39\t47\t69\t99\t57\t87\t98
Total général\t281\t354\t407\t514\t575\t745\t640", sep = "\t", header = TRUE, colClasses = 'character')
n.row <- 3
Y[n.row, names(Y) := as.list(sprintf("*%s*",Y[n.row,]))]
output:
Statut occupation 1983 1988 1996 2002 2007 2012 2017
<char> <char> <char> <char> <char> <char> <char> <char>
1: Propriétaire du logement 207 267 305 363 468 597 482
2: Locataire 35 40 33 52 50 61 60
3: *Locataire de l'habitat social (OPH, OTHS)* *0* *0* *0* *0* *0* *2* *0*
4: Logé gratuitement (parents, amis, employeurs) 39 47 69 99 57 87 98
5: Total général 281 354 407 514 575 745 640
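For readers who prefer to keep the loop from the question, the same row update can be sketched with data.table's set(), which assigns by reference and takes plain row and column indices, so no `..i` lookup is involved. This is an alternative to the `:=` call above, not the answerer's method; the two-row table and its column name `statut` are shortened stand-ins for the real data.

```r
library(data.table)

# Shortened stand-in for the table above; all columns are character
# so that a cell can hold a formatted string such as "*35*".
Y <- data.table(
  statut = c("Propriétaire du logement", "Locataire"),
  `1983` = c("207", "35"),
  `1988` = c("267", "40")
)

n.row <- 2L
for (j in seq_along(Y)) {
  # Y[[j]][n.row] reads one cell; set() writes the wrapped value back in place.
  set(Y, i = n.row, j = j, value = sprintf("*%s*", Y[[j]][n.row]))
}
print(Y)
```

Only the chosen row changes; the other rows keep their original strings.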

How can I call for something in a data.frame when the distinction has to be done in two columns?

Sorry for the very specific question, but I have a file as such:
Adj Year man mt wm wmt by bytl gr grtl
3 careless 1802 0 126 0 54 0 13 0 51
4 careless 1803 0 166 0 72 0 1 0 18
5 careless 1804 0 167 0 58 0 2 0 25
6 careless 1805 0 117 0 5 0 5 0 7
7 careless 1806 0 408 0 88 0 15 0 27
8 careless 1807 0 214 0 71 0 9 0 32
...
560 mean 1939 21 5988 8 1961 0 1152 0 1512
561 mean 1940 20 5810 6 1965 1 914 0 1444
562 mean 1941 10 6062 4 2097 5 964 0 1550
563 mean 1942 8 5352 2 1660 2 947 2 1506
564 mean 1943 14 5145 5 1614 1 878 4 1196
565 mean 1944 42 5630 6 1939 1 902 0 1583
566 mean 1945 17 6140 7 2192 4 1004 0 1906
Now I have to call for specific values (e.g. [careless, 1804, man] or [mean, 1944, wmt]).
I have no clue how to do that. One possibility would be to split the data.frame and create an array, if I'm correct, but I'd love to have a simpler solution.
Thank you in advance!

Subsetting on specific values of the Adj and Year columns and selecting the man column gives the required value:
df[df$Adj == "careless" & df$Year == 1804, "man"]
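The same pattern works for any of the (Adj, Year, column) triples mentioned in the question. Below is a small self-contained sketch using three rows copied from the table above; `lookup` is a hypothetical helper name introduced only for illustration.

```r
# Three rows taken from the question's table (only the man and wmt columns kept).
df <- data.frame(
  Adj  = c("careless", "careless", "mean"),
  Year = c(1804, 1805, 1944),
  man  = c(0, 0, 42),
  wmt  = c(58, 5, 1939),
  stringsAsFactors = FALSE
)

# Hypothetical helper: both conditions must hold for a row to be selected.
lookup <- function(data, adj, year, column) {
  data[data$Adj == adj & data$Year == year, column]
}

lookup(df, "careless", 1804, "man")
lookup(df, "mean", 1944, "wmt")
```

If several rows match both conditions, the helper returns a vector with one value per matching row.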

Selecting a subset of a sqlite database with dplyr

I'm trying to pull down a subset of rows in a sqlite database using dplyr. Since slice doesn't work with tbl_sql objects, I'm using the window function row_number. But I get the following error:
Source: sqlite 3.8.6
[/Library/Frameworks/R.framework/Versions/3.2/Resources/library/dplyr/db/nycflights13.sqlite]
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: no such function: ROW_NUMBER
dplyr version 0.4.3.9000, RSQLite version 1.0.0. Reproducible example:
library(dplyr)
library(nycflights13)
flights_sqlite <- tbl(nycflights13_sqlite(), "flights")
filter(flights_sqlite, row_number(month) == 1L) %>% collect()

Probably there's a more efficient and faster way, but head seems to do the job.
To extract first n rows, for instance first 10 records:
head(flights_sqlite, 10) %>% collect()
Output:
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160 1089 5 42
4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183 1576 5 44
5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116 762 5 54
6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150 719 5 54
7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158 1065 5 55
8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53 229 5 57
9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140 944 5 57
10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138 733 5 58
To extract a percentage of the first rows, for instance the first 10%:
head(flights_sqlite, nrow(flights_sqlite)*0.1) %>% collect()
To subset any specific number of rows. For instance rows 578 and 579:
head(flights_sqlite, nrow(flights_sqlite))[578:579, ] %>% collect()
Output:
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
578 2013 1 1 1701 -9 2026 11 AA N3FUAA 695 JFK AUS 247 1521 17 1
579 2013 1 1 1701 1 1856 16 UA N418UA 689 LGA ORD 144 733 17 1
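A range of rows can also be requested from SQLite itself with LIMIT and OFFSET, so that only the wanted rows leave the database. The sketch below is an assumption-laden alternative to the head() calls above, not the answerer's method: it uses DBI and RSQLite directly on an invented in-memory table named `flights_demo` with an `id` column. Row order in a SQL table is not guaranteed, so the query sorts explicitly.

```r
library(DBI)

# Invented in-memory table with 10 rows, standing in for the flights table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "flights_demo", data.frame(id = 1:10, dep_delay = 11:20))

# Rows 4 and 5 in id order: skip 3 rows (OFFSET), then take 2 (LIMIT).
rows <- dbGetQuery(
  con,
  "SELECT * FROM flights_demo ORDER BY id LIMIT 2 OFFSET 3"
)
print(rows)

dbDisconnect(con)
```

For rows 578 and 579 of a real table the same query would use LIMIT 2 OFFSET 577, given a column that defines the order.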

R: trouble reading dates and time

I have some problems reading dates and times properly, and I wonder why. The problem occurs only on my Windows installation of R; running the exact same script on my UNIX installation works fine.
Basically, I want to read in a file with the date and time in the second column, like this:
TrainData[[i]] = read.csv(TrainFiles[i],header=F, colClasses=c(NA,"POSIXct",rep(NA,8)))
colnames(TrainData[[i]])=c("comp","time","s1","s2","s3","s4","r1","r2","r3","r4")
However, only the dates are read, not the times, and my data looks like this:
comp time s1 s2 s3 s4 r1 r2 r3 r4
1 1 2009-08-18 711 630 69 600 689 20 40 1
2 5 2009-08-18 725 460 101 705 689 20 40 1
3 6 2009-08-18 711 505 69 678 689 20 40 1
4 1 2009-08-18 705 630 69 600 689 20 40 1
5 2 2009-08-18 734 516 101 671 689 20 40 1
6 3 2009-08-18 743 637 69 595 689 20 40 1
7 4 2009-08-18 730 577 101 633 689 20 40 1
8 2 2009-08-18 721 511 101 674 689 20 40 1
9 3 2009-08-18 747 563 101 642 689 20 40 1
10 4 2009-08-18 716 572 101 636 689 20 40 1
Running the exact same code on UNIX returned both dates and times.
When I read in another file in the same script, with dates and times in the two first columns, I get the correct format of the date/time:
TrainData[[i]]=read.csv(TrainFiles[i],header=F, colClasses=c("POSIXct","POSIXct",NA))
colnames(TrainData[[i]])=c("start","end","fault")
returns
start end fault
1 2010-10-24 04:25:53 2010-10-24 11:22:33 6
2 2010-10-30 12:57:16 2010-11-02 12:29:54 6
3 2010-11-05 10:40:17 2010-11-05 11:59:51 6
4 2010-11-05 17:07:37 2010-11-06 14:30:01 6
5 2010-11-06 23:59:59 2010-11-07 00:14:49 6
6 2010-11-06 23:59:59 2010-11-07 00:14:49 6
7 2010-11-06 23:59:59 2010-11-07 00:14:49 6
8 2010-11-06 23:59:59 2010-11-07 00:14:49 6
9 2010-11-06 23:59:59 2010-11-07 00:14:50 6
10 2010-11-06 23:59:47 2010-11-07 00:14:51 6
Actually, I found a solution that works, eventually, but I wonder why I get these problems.
It appears that my Sys.timezone is set to "Europe/Berlin". If I set this to NA, the times will be read in as well, i.e. using Sys.setenv(tz=NA). If I then run the same code, my data looks like this:
comp time s1 s2 s3 s4 r1 r2 r3 r4
1 1 2009-08-18 18:12:00 711 630 69 600 689 20 40 1
2 5 2009-08-18 18:14:27 725 460 101 705 689 20 40 1
3 6 2009-08-18 18:14:31 711 505 69 678 689 20 40 1
4 1 2009-08-18 18:14:43 705 630 69 600 689 20 40 1
5 2 2009-08-18 18:14:47 734 516 101 671 689 20 40 1
6 3 2009-08-18 18:14:51 743 637 69 595 689 20 40 1
7 4 2009-08-18 18:15:00 730 577 101 633 689 20 40 1
8 2 2009-08-18 18:29:33 721 511 101 674 689 20 40 1
9 3 2009-08-18 18:29:37 747 563 101 642 689 20 40 1
10 4 2009-08-18 18:29:45 716 572 101 636 689 20 40 1
The other file still gets times, but they are now consistently two hours off.
This is what the CSV files look like (basically, text separated by commas):
1,2009-08-18 18:12:00,711,630,69,600,689,20,40,1
5,2009-08-18 8:14:27,725,460,101,705,689,20,40,1
6,2009-08-18 18:14:31,711,505,69,678,689,20,40,1
1,2009-08-18 18:14:43,705,630,69,600,689,20,40,1
2,2009-08-18 8:14:47,734,516,101,671,689,20,40,1
3,2009-08-18 18:14:51,743,637,69,595,689,20,40,1
4,2009-08-18 8:15:00,730,577,101,633,689,20,40,1
2,2009-08-18 8:29:33,721,511,101,674,689,20,40,1
3,2009-08-18 8:29:37,747,563,101,642,689,20,40,1
4,2009-08-18 8:29:45,716,572,101,636,689,20,40,1
Why am I having these problems reading in the times? I suspect that using tz=NA is not correct, but it is the only way I found that works. Can anyone help me figure out why the times are ignored when tz = "Europe/Berlin"?
Is it generally advised to set tz=NA when reading files like this? Even though it seems to make the times readable, tz=NA results in warning messages when I later work with the data:
Warning message:
In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'NA'
Can anyone help me explain the differences I get?
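One way to take the platform's default parsing out of the picture is to read the timestamp column as plain text and convert it afterwards with an explicit format and time zone. This is only a sketch under assumptions: it does not explain the Windows behaviour described above, and it uses two lines copied from the sample file rather than the real TrainFiles.

```r
# Two lines copied from the sample CSV shown in the question.
csv_text <- c(
  "1,2009-08-18 18:12:00,711,630,69,600,689,20,40,1",
  "6,2009-08-18 18:14:31,711,505,69,678,689,20,40,1"
)

# Read the second column as character instead of letting read.csv parse it.
train <- read.csv(
  text = csv_text, header = FALSE,
  colClasses = c("integer", "character", rep("integer", 8))
)
colnames(train) <- c("comp", "time", "s1", "s2", "s3", "s4", "r1", "r2", "r3", "r4")

# Convert with an explicit format and time zone, so nothing is guessed.
train$time <- as.POSIXct(train$time, format = "%Y-%m-%d %H:%M:%S",
                         tz = "Europe/Berlin")
format(train$time, "%Y-%m-%d %H:%M:%S")
```

Because the time zone is passed to as.POSIXct directly, the global TZ setting does not need to be changed.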

Barplot using three columns

The data in the table is given below:
Year NSW Vic. Qld SA WA Tas. NT ACT Aust.
1 1917 1904 1409 683 440 306 193 5 3 4941
2 1927 2402 1727 873 565 392 211 4 8 6182
3 1937 2693 1853 993 589 457 233 6 11 6836
4 1947 2985 2055 1106 646 502 257 11 17 7579
5 1957 3625 2656 1413 873 688 326 21 38 9640
6 1967 4295 3274 1700 1110 879 375 62 103 11799
7 1977 5002 3837 2130 1286 1204 415 104 214 14192
8 1987 5617 4210 2675 1393 1496 449 158 265 16264
9 1997 6274 4605 3401 1480 1798 474 187 310 18532
I want to plot a graph with Year on the x-axis and the total value on the y-axis. The barplot should show the ACT and NT values for each year.
I tried the following command:
barplot(as.matrix(r_data$ACT, r_data$NT), main="r_data", ylab="Total", beside=TRUE)
The command above drew the bars for the ACT column per year, but not those for the NT column.

You have to create the matrix in a different way, because as.matrix() converts only its first argument, so r_data$NT was ignored:
barplot(as.matrix(r_data[c("ACT", "NT")]),
main="r_data", ylab="Total", beside=TRUE)
You can also use cbind instead of as.matrix and keep the rest of your original approach:
barplot(cbind(r_data$ACT, r_data$NT),
main="r_data", ylab="Total", beside=TRUE)
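To get one group of bars per year on the x-axis, as the question asks, the matrix can be transposed so that each column is a year and each row is a territory. This is a hedged variation on the calls above, using the first three years of the table; the object name `heights` is introduced here for illustration.

```r
# First three years of the table (Year, NT and ACT columns only).
r_data <- data.frame(
  Year = c(1917, 1927, 1937),
  NT   = c(5, 4, 6),
  ACT  = c(3, 8, 11)
)

# Rows = territories, columns = years: barplot() draws one group per column.
heights <- t(as.matrix(r_data[c("ACT", "NT")]))
colnames(heights) <- r_data$Year

barplot(heights, beside = TRUE,
        legend.text = rownames(heights),
        main = "r_data", xlab = "Year", ylab = "Total")
```

With beside = TRUE, the ACT and NT bars stand next to each other within each year; the column names supply the axis labels.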
