R: trouble reading dates and time

I am having trouble reading dates and times correctly, and I wonder why. The problem occurs only on my Windows installation of R; running the exact same script on my UNIX installation works fine.
Basically, I want to read in a file with date and time as the second column, like this:
TrainData[[i]] = read.csv(TrainFiles[i],header=F, colClasses=c(NA,"POSIXct",rep(NA,8)))
colnames(TrainData[[i]])=c("comp","time","s1","s2","s3","s4","r1","r2","r3","r4")
However, only the dates are read, not the times, and my data looks like this:
comp time s1 s2 s3 s4 r1 r2 r3 r4
1 1 2009-08-18 711 630 69 600 689 20 40 1
2 5 2009-08-18 725 460 101 705 689 20 40 1
3 6 2009-08-18 711 505 69 678 689 20 40 1
4 1 2009-08-18 705 630 69 600 689 20 40 1
5 2 2009-08-18 734 516 101 671 689 20 40 1
6 3 2009-08-18 743 637 69 595 689 20 40 1
7 4 2009-08-18 730 577 101 633 689 20 40 1
8 2 2009-08-18 721 511 101 674 689 20 40 1
9 3 2009-08-18 747 563 101 642 689 20 40 1
10 4 2009-08-18 716 572 101 636 689 20 40 1
Running the exact same code on UNIX returned both dates and times.
When I read in another file in the same script, with dates and times in the first two columns, I get the correct date/time format:
TrainData[[i]]=read.csv(TrainFiles[i],header=F, colClasses=c("POSIXct","POSIXct",NA))
colnames(TrainData[[i]])=c("start","end","fault")
returns
start end fault
1 2010-10-24 04:25:53 2010-10-24 11:22:33 6
2 2010-10-30 12:57:16 2010-11-02 12:29:54 6
3 2010-11-05 10:40:17 2010-11-05 11:59:51 6
4 2010-11-05 17:07:37 2010-11-06 14:30:01 6
5 2010-11-06 23:59:59 2010-11-07 00:14:49 6
6 2010-11-06 23:59:59 2010-11-07 00:14:49 6
7 2010-11-06 23:59:59 2010-11-07 00:14:49 6
8 2010-11-06 23:59:59 2010-11-07 00:14:49 6
9 2010-11-06 23:59:59 2010-11-07 00:14:50 6
10 2010-11-06 23:59:47 2010-11-07 00:14:51 6
I eventually found a workaround, but I still wonder why I get these problems.
It appears that my Sys.timezone is set to "Europe/Berlin". If I set this to NA using Sys.setenv(tz=NA), the times are read in as well. If I then run the same code, my data looks like this:
comp time s1 s2 s3 s4 r1 r2 r3 r4
1 1 2009-08-18 18:12:00 711 630 69 600 689 20 40 1
2 5 2009-08-18 18:14:27 725 460 101 705 689 20 40 1
3 6 2009-08-18 18:14:31 711 505 69 678 689 20 40 1
4 1 2009-08-18 18:14:43 705 630 69 600 689 20 40 1
5 2 2009-08-18 18:14:47 734 516 101 671 689 20 40 1
6 3 2009-08-18 18:14:51 743 637 69 595 689 20 40 1
7 4 2009-08-18 18:15:00 730 577 101 633 689 20 40 1
8 2 2009-08-18 18:29:33 721 511 101 674 689 20 40 1
9 3 2009-08-18 18:29:37 747 563 101 642 689 20 40 1
10 4 2009-08-18 18:29:45 716 572 101 636 689 20 40 1
The other file still gets times, but now they are consistently two hours off.
This is what the CSV files look like (basically, text separated by commas):
1,2009-08-18 18:12:00,711,630,69,600,689,20,40,1
5,2009-08-18 8:14:27,725,460,101,705,689,20,40,1
6,2009-08-18 18:14:31,711,505,69,678,689,20,40,1
1,2009-08-18 18:14:43,705,630,69,600,689,20,40,1
2,2009-08-18 8:14:47,734,516,101,671,689,20,40,1
3,2009-08-18 18:14:51,743,637,69,595,689,20,40,1
4,2009-08-18 8:15:00,730,577,101,633,689,20,40,1
2,2009-08-18 8:29:33,721,511,101,674,689,20,40,1
3,2009-08-18 8:29:37,747,563,101,642,689,20,40,1
4,2009-08-18 8:29:45,716,572,101,636,689,20,40,1
Why am I having these problems reading in the times? I would not expect tz=NA to be correct, but it is the only approach I have found that works. Can anyone help me figure out why the times are ignored when tz = "Europe/Berlin"?
Is it generally advised to set tz=NA when reading files like this? Even though it seems to work for reading in the times, tz=NA results in warning messages when I later work with the data:
Warning message:
In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'NA'
Can anyone help explain the differences I get?
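One way to sidestep platform-dependent parsing is to read the time column as text and convert it explicitly, stating both the format and the time zone. This is a minimal sketch and not a diagnosis of the Windows behaviour. The inline csv_text sample and the name train are illustrative, and "Europe/Berlin" is assumed from the question.

```r
# Three sample lines taken from the file shown above (inline for illustration)
csv_text <- c(
  "1,2009-08-18 18:12:00,711,630,69,600,689,20,40,1",
  "6,2009-08-18 18:14:31,711,505,69,678,689,20,40,1",
  "1,2009-08-18 18:14:43,705,630,69,600,689,20,40,1"
)

# Read the second column as plain text instead of asking read.csv for POSIXct
train <- read.csv(text = csv_text, header = FALSE,
                  colClasses = c(NA, "character", rep(NA, 8)))
colnames(train) <- c("comp", "time", "s1", "s2", "s3", "s4",
                     "r1", "r2", "r3", "r4")

# Convert explicitly; the time zone is an assumption, adjust it to your data
train$time <- as.POSIXct(train$time,
                         format = "%Y-%m-%d %H:%M:%S",
                         tz = "Europe/Berlin")

# Show the parsed values with the time of day included
format(train$time, "%Y-%m-%d %H:%M:%S")
```

Because the format and time zone are passed explicitly, the conversion no longer depends on the session's time zone setting.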

Related

Removing and adding observations specific to an id variable within a dataframe of multiple ids in R

I have a dataframe containing location data of different animals. Each animal has a unique id and each observation has a time stamp and some further metrics of the location observation. See a subset of the data below. The subset contains the first two observations of each id.
> sub
id lc lon lat a b c date
1 111 3 -79.2975 25.6996 414 51 77 2019-04-01 22:08:50
2 111 3 -79.2975 25.6996 414 51 77 2019-04-01 22:08:50
3 222 3 -79.2970 25.7001 229 78 72 2019-01-07 20:36:27
4 222 3 -79.2970 25.7001 229 78 72 2019-01-07 20:36:27
5 333 B -80.8211 24.8441 11625 6980 37 2018-12-17 20:45:05
6 333 3 -80.8137 24.8263 155 100 69 2018-12-17 21:00:43
7 444 3 -80.4535 25.0848 501 33 104 2019-10-20 19:44:16
8 444 1 -80.8086 24.8364 6356 126 87 2020-01-18 20:32:28
9 555 3 -77.7211 24.4887 665 45 68 2020-07-12 21:09:17
10 555 3 -77.7163 24.4897 285 129 130 2020-07-12 21:10:35
11 666 2 -77.7221 24.4902 1129 75 66 2020-07-12 21:09:02
12 666 2 -77.7097 24.4905 314 248 164 2020-07-12 21:11:37
13 777 3 -77.7133 24.4820 406 58 110 2020-06-20 11:18:18
14 777 3 -77.7218 24.4844 170 93 107 2020-06-20 11:51:06
15 888 3 -79.2975 25.6996 550 34 79 2017-11-25 19:10:45
16 888 3 -79.2975 25.6996 550 34 79 2017-11-25 19:10:45
However, I need to do some data housekeeping: I need to include the day/time and location at which each animal was released. After that, I need to filter out the observations for each animal that occurred before its release.
I have an additional dataframe that contains the necessary release metadata:
> stack
id release lat lon
1 888 2017-11-27 14:53 25.69201 -79.31534
2 333 2019-01-31 16:09 25.68896 -79.31326
3 222 2019-02-02 15:55 25.70051 -79.31393
4 111 2019-04-02 10:43 25.68534 -79.31341
5 444 2020-03-13 15:04 24.42892 -77.69518
6 666 2020-10-27 09:40 24.58290 -77.69561
7 555 2020-01-21 14:38 24.43333 -77.69637
8 777 2020-06-25 08:54 24.42712 -77.76427
So my question is: how can I add the release information (time and lat/lon) to the dataframe for each id (the columns a, b, and c can be NA)? And how can I then filter out the observations that occurred before each animal's release time? I have been looking into possibilities using dplyr but have not yet been able to resolve my issue.

You've not provided an easy way of obtaining your data (dput() is by far the best), and you have issues with your date-time values (release uses Y-M-D H:M whereas date uses Y-M-D H:M:S), so for clarity I've included the code to obtain the data frames I use at the end of this post.
First, the solution:
library(tidyverse)
library(lubridate)
sub %>%
left_join(stack, by="id") %>%
mutate(
release=ymd_hms(paste0(release, ":00")),
date=ymd_hms(date)
) %>%
filter(date >= release)
id lc lon.x lat.x a b c date release lat.y lon.y
1 555 3 -77.7211 24.4887 665 45 68 2020-07-12 21:09:17 2020-01-21 14:38:00 24.43333 -77.69637
2 555 3 -77.7163 24.4897 285 129 130 2020-07-12 21:10:35 2020-01-21 14:38:00 24.43333 -77.69637
As I indicated in comments.
To obtain the data
sub <- read.table(textConnection("id lc lon lat a b c date
1 111 3 -79.2975 25.6996 414 51 77 '2019-04-01 22:08:50'
2 111 3 -79.2975 25.6996 414 51 77 '2019-04-01 22:08:50'
3 222 3 -79.2970 25.7001 229 78 72 '2019-01-07 20:36:27'
4 222 3 -79.2970 25.7001 229 78 72 '2019-01-07 20:36:27'
5 333 B -80.8211 24.8441 11625 6980 37 '2018-12-17 20:45:05'
6 333 3 -80.8137 24.8263 155 100 69 '2018-12-17 21:00:43'
7 444 3 -80.4535 25.0848 501 33 104 '2019-10-20 19:44:16'
8 444 1 -80.8086 24.8364 6356 126 87 '2020-01-18 20:32:28'
9 555 3 -77.7211 24.4887 665 45 68 '2020-07-12 21:09:17'
10 555 3 -77.7163 24.4897 285 129 130 '2020-07-12 21:10:35'
11 666 2 -77.7221 24.4902 1129 75 66 '2020-07-12 21:09:02'
12 666 2 -77.7097 24.4905 314 248 164 '2020-07-12 21:11:37'
13 777 3 -77.7133 24.4820 406 58 110 '2020-06-20 11:18:18'
14 777 3 -77.7218 24.4844 170 93 107 '2020-06-20 11:51:06'
15 888 3 -79.2975 25.6996 550 34 79 '2017-11-25 19:10:45'
16 888 3 -79.2975 25.6996 550 34 79 '2017-11-25 19:10:45'"), header=TRUE)
stack <- read.table(textConnection("id release lat lon
1 888 '2017-11-27 14:53' 25.69201 -79.31534
2 333 '2019-01-31 16:09' 25.68896 -79.31326
3 222 '2019-02-02 15:55' 25.70051 -79.31393
4 111 '2019-04-02 10:43' 25.68534 -79.31341
5 444 '2020-03-13 15:04' 24.42892 -77.69518
6 666 '2020-10-27 09:40' 24.58290 -77.69561
7 555 '2020-01-21 14:38' 24.43333 -77.69637
8 777 '2020-06-25 08:54' 24.42712 -77.76427"), header=TRUE)
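The solution above filters out pre-release observations but does not add a row for the release itself. A minimal base R sketch of that extra step follows. It uses a two-animal excerpt of the data, assumes UTC for both time columns, and the names release_rows, combined and kept are illustrative.

```r
# Two-animal excerpt of the data above (ids 111 and 555)
sub <- data.frame(
  id   = c(111, 111, 555, 555),
  lc   = c("3", "3", "3", "3"),
  lon  = c(-79.2975, -79.2975, -77.7211, -77.7163),
  lat  = c(25.6996, 25.6996, 24.4887, 24.4897),
  a    = c(414, 414, 665, 285),
  b    = c(51, 51, 45, 129),
  c    = c(77, 77, 68, 130),
  date = c("2019-04-01 22:08:50", "2019-04-01 22:08:50",
           "2020-07-12 21:09:17", "2020-07-12 21:10:35")
)
stack <- data.frame(
  id      = c(111, 555),
  release = c("2019-04-02 10:43", "2020-01-21 14:38"),
  lat     = c(25.68534, 24.43333),
  lon     = c(-79.31341, -77.69637)
)

# Parse both time columns; UTC is an assumption, not stated in the question
sub$date      <- as.POSIXct(sub$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
stack$release <- as.POSIXct(stack$release, format = "%Y-%m-%d %H:%M", tz = "UTC")

# One release row per animal, shaped like sub; lc, a, b and c are left as NA
release_rows <- data.frame(
  id = stack$id, lc = NA_character_,
  lon = stack$lon, lat = stack$lat,
  a = NA_real_, b = NA_real_, c = NA_real_,
  date = stack$release
)

# Append the release rows, then keep only rows at or after each animal's release
combined <- rbind(sub, release_rows)
release_time <- stack$release[match(combined$id, stack$id)]
kept <- combined[combined$date >= release_time, ]
kept <- kept[order(kept$id, kept$date), ]
kept
```

In this excerpt, animal 111 keeps only its release row because both of its observations predate the release, while animal 555 keeps its release row and both observations.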

I am unable to import in R a downloaded xls file

I am trying to directly import the .xls file that comes from this link (French electricity distributor).
I have built, based on this question, the following code:
library(rio)
Chemin = "F:/DGTresor/00.Refontes/06.Electricite_HauteFrequence" #WhateverPath
## RTE mois en cours
temporaire <- tempfile()
download.file("https://eco2mix.rte-france.com/download/eco2mix/eCO2mix_RTE_En-cours-TR.zip",temporaire)
unzip(zipfile=temporaire,
files = "eCO2mix_RTE_En-cours-TR.xls",
exdir=Chemin)
RTE_EnCours <- import(paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.xls"))
The file exists, but I am unable to read it. I get the following error: libxls error: Unable to open file

I am not sure why this happens, but when I try to open the .xls file manually, I get an error along the lines of "The file format and extension don't match". To solve the issue, I changed the file extension to .csv with the code below.
file.rename(paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.xls"), paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.csv"))
After that, importing the file works:
# use header=FALSE to prevent the columns from shifting
RTE_EnCours<- read.csv(paste0(Chemin,"/eCO2mix_RTE_En-cours-TR.csv"),sep="\t",header=FALSE,row.names=NULL)
# drop the last column, which is entirely NA
RTE_EnCours <- RTE_EnCours[,-ncol(RTE_EnCours)]
# assigning the first row as the column names
colnames(RTE_EnCours) <-as.character(unlist(RTE_EnCours[1,]))
# removing the first row
RTE_EnCours <- RTE_EnCours[-1,]
head(RTE_EnCours)
gives:
Périmètre Nature Date Heures Consommation Prévision J-1 Prévision J Fioul Charbon Gaz Nucléaire Eolien Solaire Hydraulique
2 France Données temps réel 2020-10-01 00:00 46957 46500 47100 134 286 4524 35004 4327 0 4645
3 France Données temps réel 2020-10-01 00:15 46342 45350 45950 149 318 4727 35278 4336 0 4953
4 France Données temps réel 2020-10-01 00:30 44689 44200 44800 149 304 4380 34732 4428 0 4580
5 France Données temps réel 2020-10-01 00:45 43277 42950 43700 165 308 4244 34644 4528 0 4147
6 France Données temps réel 2020-10-01 01:00 42511 41700 42600 165 302 4012 34780 4488 0 4096
7 France Données temps réel 2020-10-01 01:15 42714 41650 42750 165 297 4114 35145 4630 0 3758
Pompage Bioénergies Ech. physiques Taux de Co2 Ech. comm. Angleterre Ech. comm. Espagne Ech. comm. Italie Ech. comm. Suisse
2 -751 1087 -2299 58 179 -914 -1732 -1283
3 -750 1055 -3724 59
4 -920 1045 -4009 58 179 -914 -1732 -1283
5 -1861 1048 -3946 59
6 -1857 1039 -4514 56 497 -1759 -2279 -2217
7 -2005 1037 -4427 57
Ech. comm. Allemagne-Belgique Fioul - TAC Fioul - Cogén. Fioul - Autres Gaz - TAC Gaz - Cogén. Gaz - CCG Gaz - Autres
2 -79 0 21 113 -2 585 3941 0
3 0 21 128 -1 580 4148 0
4 -159 0 21 128 -1 580 3801 0
5 0 21 144 -1 582 3663 0
6 1252 0 21 144 -1 579 3434 0
7 0 21 144 -1 581 3534 0
Hydraulique - Fil de l?eau + éclusée Hydraulique - Lacs Hydraulique - STEP turbinage Bioénergies - Déchets Bioénergies - Biomasse
2 3355 1288 2 183 447
3 3336 1615 2 174 435
4 3242 1338 0 174 434
5 3155 992 0 174 437
6 3060 1036 0 172 434
7 2992 766 0 177 436
Bioénergies - Biogaz
2 301
3 294
4 294
5 294
6 294
7 294
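Since the file reads correctly as tab-separated text, the rename is probably not strictly needed: the reader functions in base R look at the content, not the extension. The sketch below uses a small hypothetical stand-in file (written to a temporary directory, with made-up file name and only three columns) to show how one might check whether a file is a genuine binary .xls and, if not, read it directly as tab-delimited text.

```r
# Hypothetical stand-in for the download: tab-separated text saved as .xls
path <- file.path(tempdir(), "example_mislabelled.xls")
writeLines(c("Date\tHeures\tConsommation",
             "2020-10-01\t00:00\t46957",
             "2020-10-01\t00:15\t46342"),
           path)

# A genuine binary .xls (OLE2 container) starts with the bytes D0 CF 11 E0
magic <- readBin(path, what = "raw", n = 4)
is_real_xls <- identical(magic, as.raw(c(0xD0, 0xCF, 0x11, 0xE0)))

# If it is not a real .xls, read it as tab-delimited text, whatever the extension
if (!is_real_xls) {
  dat <- read.delim(path, header = TRUE, stringsAsFactors = FALSE)
}
dat
```

For the real file, the trailing empty column and the encoding of the accented headers would still need the handling shown in the answer above.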

Adding a column to a data frame with two different variables

I am sure this has a super easy answer, but I am struggling with how to add a column with two different values to my dataframe. Currently, this is what it looks like:
vcv.index model.index par.index grid index estimate se lcl ucl fixed
1 6 6 16 A 16 0.8856724 0.07033280 0.6650468 0.9679751
2 7 7 17 A 17 0.6298118 0.06925471 0.4873052 0.7528014
3 8 8 18 A 18 0.6299359 0.06658557 0.4930263 0.7487169
4 9 9 19 A 19 0.6297988 0.05511771 0.5169948 0.7300157
5 10 10 20 A 20 0.7575811 0.05033490 0.6461758 0.8424612
6 21 21 61 B 61 0.8713467 0.07638687 0.6404598 0.9626184
7 22 22 62 B 62 0.6074379 0.06881230 0.4677827 0.7314827
8 23 23 63 B 63 0.6041054 0.06107520 0.4805279 0.7156792
9 24 24 64 B 64 0.5806565 0.06927308 0.4422237 0.7074601
10 25 25 65 B 65 0.7370944 0.05892108 0.6070620 0.8357394
11 41 41 121 C 121 0.8048479 0.09684385 0.5519097 0.9324759
12 42 42 122 C 122 0.5259547 0.07165218 0.3871380 0.6608721
13 43 43 123 C 123 0.5427100 0.07127273 0.4033255 0.6757137
14 44 44 124 C 124 0.5168820 0.06156392 0.3975561 0.6343132
15 45 45 125 C 125 0.6550049 0.07378403 0.5002851 0.7826343
16 196 196 586 A 586 0.8536314 0.08709394 0.5979992 0.9580976
17 197 197 587 A 587 0.5672194 0.07079508 0.4268452 0.6975725
18 198 198 588 A 588 0.5675415 0.06380445 0.4408540 0.6859714
19 199 199 589 A 589 0.5666874 0.06499899 0.4377071 0.6872233
20 200 200 590 A 590 0.7058542 0.05985868 0.5769484 0.8085177
21 211 211 631 B 631 0.8360614 0.09413427 0.5703031 0.9514472
22 212 212 632 B 632 0.5432872 0.07906200 0.3891364 0.6895701
23 213 213 633 B 633 0.5400994 0.06497607 0.4129055 0.6622759
24 214 214 634 B 634 0.5161692 0.06292706 0.3943257 0.6361202
25 215 215 635 B 635 0.6821667 0.07280044 0.5263841 0.8056298
26 226 226 676 C 676 0.7621875 0.10484478 0.5077465 0.9087471
27 227 227 677 C 677 0.4607440 0.07326970 0.3240229 0.6036386
28 228 228 678 C 678 0.4775168 0.08336433 0.3219349 0.6375872
29 229 229 679 C 679 0.4517655 0.06393339 0.3319262 0.5774725
30 230 230 680 C 680 0.5944330 0.07210672 0.4491995 0.7248303
Then I add a column with periods 1-5, repeated until it reaches the end, with this code:
SurJagPred$estimates %<>% mutate(Primary = rep(1:5, 6))
I also need to add sex (F, M). Rows 1-15 are female and rows 16-30 are male. So overall it should look like this:
vcv.index model.index par.index grid index estimate se lcl ucl fixed Primary Sex
1 6 6 16 A 16 0.8856724 0.07033280 0.6650468 0.9679751 1 F
2 7 7 17 A 17 0.6298118 0.06925471 0.4873052 0.7528014 2 F
3 8 8 18 A 18 0.6299359 0.06658557 0.4930263 0.7487169 3 F
4 9 9 19 A 19 0.6297988 0.05511771 0.5169948 0.7300157 4 F

We can use rep with the each argument to repeat each element of a vector a given number of times:
SurJagPred$estimates %<>%
mutate(Sex = rep(c("F", "M"), each = 15))
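To make the difference between the two calls concrete, here is a small base R sketch on a stand-in data frame (estimates is a hypothetical 30-row substitute for SurJagPred$estimates). With times the whole vector is cycled; with each every element is repeated in a block.

```r
# Hypothetical 30-row stand-in for SurJagPred$estimates
estimates <- data.frame(index = 1:30)

# times: cycle the whole vector 1,2,3,4,5 six times
estimates$Primary <- rep(1:5, times = 6)

# each: repeat every element 15 times, giving 15 "F" followed by 15 "M"
estimates$Sex <- rep(c("F", "M"), each = 15)

# Rows around the boundary between the two sexes
estimates[14:17, ]
```

Both vectors have length 30, so they fit the data frame without recycling.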

merging multiple p-values from Fisher test to the original data

I have run a Fisher test on all my rows, which outputs a lot of p-values. How can I correctly combine the p-values with the original columns? I tried the following code, but the rows in the original data (d) do not match the p-values (e) in the merged dataframe (f).
d <- read.table('test.txt', header = FALSE)
e <-apply(d,1, function(x) fisher.test(matrix(x,nr=2), alternative='greater')$p.value)
f <-merge(d,as.data.frame(e),by.x=0,by.y=0)
> d
V1 V2 V3 V4
1 1 839 63 222247
2 1 839 47 222263
3 1 839 299 222011
4 6 834 1821 220489
5 1 839 198 222112
6 1 839 324 221986
7 2 838 808 221502
8 3 837 935 221375
9 4 836 1723 220587
10 1 839 117 222193
> e
[1] 2.144749e-01 1.656028e-01 6.776690e-01 6.848409e-01 5.280300e-01 7.067099e-01 8.091576e-01 6.859446e-01
[9] 8.895988e-01 3.592658e-01
> f
Row.names V1 V2 V3 V4 e
1 1 1 839 63 222247 2.144749e-01
2 10 1 839 117 222193 3.592658e-01
3 11 6 834 850 221460 1.071752e-01
4 12 29 811 11625 210685 9.941101e-01
5 13 2 838 1231 221079 9.463472e-01
6 14 1 839 1236 221074 9.907043e-01
7 15 3 837 905 221405 6.647785e-01
8 16 3 837 793 221517 5.768163e-01
9 17 6 834 687 221623 4.906665e-02
10 18 1 839 226 222084 5.753710e-01

f <-cbind(d,e)
# V1 V2 V3 V4 e
#1 1 839 63 222247 0.2144749
#2 1 839 47 222263 0.1656028
#3 1 839 299 222011 0.6776690
#4 6 834 1821 220489 0.6848409
#5 1 839 198 222112 0.5280300
#6 1 839 324 221986 0.7067099
#7 2 838 808 221502 0.8091576
#8 3 837 935 221375 0.6859446
#9 4 836 1723 220587 0.8895988
#10 1 839 117 222193 0.3592658
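A likely reason the merged rows look mismatched is that merging on row names sorts the result by the Row.names column, which holds character strings, so "10" comes right after "1". The values still belong to the right rows; only the order changes. The sketch below uses a hypothetical 12-row data frame with made-up values in e (not real p-values) to show the effect and two ways around it.

```r
# Hypothetical data: 12 rows, with e standing in for the vector of p-values
d <- data.frame(V1 = 1:12, V2 = 101:112)
e <- (1:12) / 100

# Merging on row names sorts by Row.names as text: "1", "10", "11", "12", "2", ...
f_merge <- merge(d, as.data.frame(e), by.x = 0, by.y = 0)
merged_order <- as.character(f_merge$Row.names)

# Option 1: restore the numeric order after merging
f_fixed <- f_merge[order(as.integer(merged_order)), ]

# Option 2: e is already in the same row order as d, so bind it directly
f_bind <- cbind(d, e)

head(merged_order)
head(f_bind)
```

Because apply() returns its results in row order, cbind is the simpler choice here and needs no key at all.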

debugger freezes on X (arm)

I am developing an application for the Olimex A20 with Qt 5.7. The app needs to run on X. If I just run the application, it works perfectly fine. The issue is with debugging: the debugger freezes. Below is the stack trace I see when I interrupt the debugger.
This is the line of code where the debugger is waiting for something to happen (qwaitcondition_unix.cpp, line 143):
code = pthread_cond_wait(&cond, &mutex);
This is the main stack trace from thread #1
1 __libc_do_syscall 0xb5dd6514
2 pthread_cond_wait@@GLIBC_2.4 0xb5dd1da6
3 QWaitConditionPrivate::wait qwaitcondition_unix.cpp 143 0xb63a7a44
4 QWaitCondition::wait qwaitcondition_unix.cpp 215 0xb63a7a44
5 QSemaphore::acquire qsemaphore.cpp 143 0xb63a2dba
6 QMetaObject::activate qobject.cpp 3708 0xb6506c6e
7 QMetaObject::activate qobject.cpp 3602 0xb6506fee
8 QDBusConnectionManager::connectionRequested moc_qdbusconnectionmanager_p.cpp 141 0xb3baf2a0
9 QDBusConnectionManager::connectToBus qdbusconnection.cpp 225 0xb3b6b37e
10 QDBusConnectionManager::busConnection qdbusconnection.cpp 134 0xb3b6b488
11 QDBusConnection::sessionBus qdbusconnection.cpp 1195 0xb3b6c058
12 DBusConnection::DBusConnection dbusconnection.cpp 73 0xb3e08a4c
13 QSpiAccessibleBridge::QSpiAccessibleBridge bridge.cpp 66 0xb3dff070
14 QXcbIntegration::accessibility qxcbintegration.cpp 337 0xb3dc563c
15 platformAccessibility qaccessible.cpp 485 0xb6a4ae34
16 QAccessible::isActive qaccessible.cpp 791 0xb6a4ae34
17 QQuickTextInputPrivate::emitCursorPositionChanged qquicktextinput.cpp 4206 0xb6eb5fde
18 QQuickTextInputPrivate::moveCursor qquicktextinput.cpp 3264 0xb6eb9d76
19 QQuickTextInputPrivate::setCursorPosition qquicktextinput_p_p.h 407 0xb6eb9e2a
20 QQuickTextInput::setReadOnly qquicktextinput.cpp 682 0xb6eb9e2a
21 QQuickTextInput::qt_static_metacall moc_qquicktextinput_p.cpp 1180 0xb6f593e8
22 QQuickTextInput::qt_metacall moc_qquicktextinput_p.cpp 1257 0xb6f59cf8
23 QQmlPropertyPrivate::write qqmlproperty.cpp 1254 0xb68a648a
24 QQmlPropertyPrivate::writeValueProperty qqmlproperty.cpp 1183 0xb68a7594
25 QQmlBinding::write qqmlbinding.cpp 333 0xb68f38da
26 QQmlBinding::update qqmlbinding.cpp 197 0xb68f46fc
27 QQmlObjectCreator::finalize qqmlobjectcreator.cpp 1202 0xb68faf92
28 QQmlComponentPrivate::complete qqmlcomponent.cpp 926 0xb68a861e
29 QQmlComponentPrivate::completeCreate qqmlcomponent.cpp 962 0xb68a8698
30 QQmlComponent::create qqmlcomponent.cpp 788 0xb68a85ac
31 QQmlApplicationEnginePrivate::_q_finishLoad qqmlapplicationengine.cpp 136 0xb68f55fc
32 QQmlApplicationEnginePrivate::startLoad qqmlapplicationengine.cpp 115 0xb68f57a4
33 QQmlApplicationEngine::load qqmlapplicationengine.cpp 260 0xb68f57de
34 main main.cpp 50 0x1fd60
This is thread #6
1 __libc_do_syscall 0xb5dd6514
2 pthread_cond_wait@@GLIBC_2.4 0xb5dd1da6
3 _mali_osu_lock_wait 0xb61ec7fe
4 __egl_worker_thread 0xb61e7096
5 start_thread 0xb5dcd5dc
6 ?? 0xb5ffd71c
Has anyone come across this issue? Any pointers would be appreciated.
