I have a script that regularly (once a week) scrapes a website. The website is updated once a week (or once every two weeks). Up till now, I've had no issue retrieving the correct information.
Yet today, for some unknown reason, I am getting the wrong information from read_html.
library(rvest)
urlATP <- "http://www.tennisleader.fr/classement/1"
checkDate <- as.Date(
  gsub("Au ", "", html_text(html_nodes(read_html(urlATP), ".date-atp"))),
  format = "%d/%m/%Y"
)
print(checkDate)
Which returns this:
[1] "2018-07-02"
But when I go to the website, the date is different:
<p class="date-atp">Au 16/07/2018</p>
What could explain this mismatch, and more importantly, how do I get rid of it?
Additional information:
The date retrieved is the one that was on the website last week.
I have tried clearing memory and refreshing the R session, but with no success.
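For reference, here is a minimal sketch with httr that asks any intermediate cache not to serve a stored copy of the page (that caching is actually the cause here is only an assumption, not a confirmed diagnosis):
library(httr)
library(rvest)
# request a fresh copy, bypassing possible intermediate caches (assumption, not a confirmed cause)
resp <- GET("http://www.tennisleader.fr/classement/1",
            add_headers(`Cache-Control` = "no-cache", Pragma = "no-cache"))
page <- read_html(content(resp, as = "text", encoding = "UTF-8"))
gsub("Au ", "", html_text(html_nodes(page, ".date-atp")))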
Related
I work with gmailR and Power BI, and I observe that my email data shows different message times in RStudio (the correct time, UTC+1, the same as in the Gmail account) and in Power BI (the wrong time, UTC+0), even though I use the same query code in both tools.
Screenshot: https://ibb.co/TM12Bk1
My query code is based on gmailR: I get specific data as lists from Gmail and transform them into a table (a tibble object in R). Then I run the same code in Power BI as a data source.
I get the time from the message list, not from the gm_data function (because of cases where that function returns null values). Below is sample code:
gmail_DK <- msgs_meta_DK %>% {
  tibble(
    date = map_chr(., "internalDate")
  )
}
and then I convert the timestamp to a normal date-time. Sample code:
mutate(date = as.POSIXct(as.numeric(date) / 1000, origin = "1970-01-01"))
I checked the regional settings in Power BI Desktop and they are correct: UTC+1.
I'm in the UTC+1 time zone, the Gmail account is set to UTC+1 (it is a customer service mailbox, but as far as I know all the customer agents are also in the UTC+1 time zone), and most messages are from and to UTC+1 users.
I can work around the difference using Power Query, or just add an hour in my R query code, but I'm curious why this difference happens.
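For what it's worth, here is a minimal sketch of how the same internalDate value renders differently depending on the tz argument (the timestamp below and the Europe/Warsaw zone are placeholders for illustration only); without an explicit tz, R prints the instant in the machine's local zone, while another tool may read the same underlying instant as UTC:
# hypothetical internalDate value, milliseconds since the Unix epoch
ts_ms <- 1546300800000                                                 # 2019-01-01 00:00:00 UTC
as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "UTC")            # printed as UTC
as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "Europe/Warsaw")  # printed as 01:00 CET (UTC+1)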
I already checked Stack Overflow for an answer, but I only found questions related to showing a timestamp in Power BI Desktop, which is quite different from the behaviour in the Power BI Service, e.g. see
How to display current date and time in power bi visuals?
Visualizing last refresh date in power bi
Why?
- I don't want to see in my report the timestamp of the current date and time, since I already have this in the status bar of my operating system.
- I don't want to see in my report the timestamp of the last "report" refresh, when only the measures get updated (like in the Service).
- I don't want to see the timestamp of last re-import of (most-likely unchanged) data in the Desktop/Service.
What I want to see in my report is the timestamp of the last "dataset" refresh in the Service, which cannot be achieved by a measure, but only by an M function!
The problem now is that the Service runs in UTC time, while I'm of course interested in local time, and all the M functions that convert a datetimezone value only accept a fixed time shift in hours and do not consider daylight saving time.
What would a solution look like that properly overcomes this deficit and shows the proper local time of the dataset refresh in a Power BI Service report?
For whatever reason, Microsoft hasn't built in native daylight saving handling yet, but please vote for them to fix this here.
However, various people have suggested workarounds involving hard-coding the dates/times when DST changes apply, or referencing an external oracle:
https://intellitect.com/convert-utc-local-time-daylight-savings-support-power-bi/
https://blog.crossjoin.co.uk/2017/03/28/daylight-saving-time-and-time-zones-in-m/
https://powerpivotpro.com/2019/01/dst-refresh-date-function-power-bi-service/
https://radacad.com/solving-dax-time-zone-issue-in-power-bi
As a workaround I've been pulling the proper local time from worldtimeapi.org so far, see e.g. this PowerQuery M-script:
let
    // call worldtimeapi.org and parse the JSON response
    Source = Json.Document(
        Web.Contents("http://worldtimeapi.org/api/timezone/Europe/Berlin")),
    #"Converted to Table" = Record.ToTable(Source),
    // keep only the "datetime" field, which already carries the local UTC offset
    #"Filtered Rows" = Table.SelectRows(
        #"Converted to Table", each ([Name] = "datetime")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows", {"Name"}),
    // type the value as datetimezone and name the column after the zone
    #"Changed Type" = Table.TransformColumnTypes(
        #"Removed Columns", {{"Value", type datetimezone}}),
    #"Renamed Columns" = Table.RenameColumns(
        #"Changed Type", {{"Value", "Europe/Berlin"}})
in
    #"Renamed Columns"
However, I just realized that this workaround has since become somewhat obsolete:
In the Power BI Service, switch the New Look to ON; then in the title bar next to the report name you get e.g. "Data updated 26/04/20", and in the drop-down menu you can even see the exact update time.
I'm getting the same kind of issue with "Events".
I'm trying to track the creation of Invoice Disputes (when the client detects some error in the invoice) for March '19.
The thing is that when I get the data for the event without filtering anything in the query (in order to use the filters on the Report tab), I get a value of 316. But I know that the correct value is 302.
Wrong Value
So I tried to filter the month directly in the query before loading the data, and guess what? The correct value shows up.
Correct Value
Any ideas?
In my DevStack setup there is an issue with displaying details in the Rating section.
Pricing is configured correctly, and during instance creation the rate is displayed in the instance creation window.
But after creating the instance, when I check the Rating section for rates or cost, it does not display the value as expected.
I checked the DB table (rated_data_frames) in CloudKitty; it does not have the necessary values immediately.
I kept checking for several hours, and I can see that the CloudKitty DB only gets updated with the values for the newly created instance some hours after its creation; only then is it also displayed in the front end.
I want to know why this happens. Is there any solution to get the results immediately? I simply need to see the results immediately in the Rating section.
I can see that the cloudkitty.conf file has the following section:
# Rating period in seconds. (integer value)
#period = 3600
#wait_periods = 2
Would changing these values help?
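If I read the comments above correctly (purely an assumption on my part: period being the collection/rating interval in seconds and wait_periods the number of periods the collector holds back before rating), the defaults would add up to roughly
(wait_periods + 1) * period = (2 + 1) * 3600 s = 10800 s, i.e. about 3 hours
of delay, which is the same order of magnitude as the delay described above.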
I am new to this forum, so I hope I am asking my question in the right place.
I have a problem inserting a datetime into a Google spreadsheet from a form created in App Inventor 2.
In App Inventor 2 I created a form that fills in a Google spreadsheet. Basically I merged the Pizza Party example (http://appinventor.mit.edu/explore/ai2/pizzaparty.html) with this example http://puravidaapps.com/spreadsheet.php to use a Google spreadsheet instead of a fusion table.
The user selects in how many minutes he wants his order and then sees all the orders in a table sorted by delivery time.
Problem A)
Firstly, I want to save the current datetime plus the desired delay into the Google spreadsheet and sort the table by this new datetime.
1) When I use the block "call clock format time" + "call clock addminutes", the spreadsheet is populated with text, but then I can't sort the table by delivery datetime. In fact I believe the sorting is done on the number regardless of the am/pm or the day of the month, so for example instead of 4am, 6am, 2pm, 3pm I get: 2pm, 3pm, 4am, 6am.
2) I then tried to remove the block "call clock format time", and in the Google form I kept the field format = text,
but the Google spreadsheet is populated with the following:
java.util.GregorianCalendar[time=1395531335908,areFieldsSet=true,lenient=true,zone=Europe/Dublin,firstDayOfWeek=2,minimalDaysInFirstWeek=4,ERA=1,YEAR=2014,MONTH=2,WEEK_OF_YEAR=12,WEEK_OF_MONTH=4,DAY_OF_MONTH=22,DAY_OF_YEAR=81,DAY_OF_WEEK=7,DAY_OF_WEEK_IN_MONTH=4,AM_PM=1,HOUR=11,HOUR_OF_DAY=23,MINUTE=35,SECOND=35,MILLISECOND=908,ZONE_OFFSET=0,DST_OFFSET=0]
3) I then tried to remove the block "call clock format time", and in the Google form I changed the field format = time,
but then the Google spreadsheet isn't populated with anything.
4) I tried using the segment block, but after a while I realised the block "format time" actually returns this format: "hh:mm:ss AM/PM",
so selecting the first 5 characters is not good enough, because it does not take into account the am/pm element or the day of the month.
5) I found a temporary solution by defining the desired delivery time as a new global variable and extracting a string in the format hh:mm by joining the blocks ".hour instant" and ".minute instant".
However, this is not a final solution, because what I extracted is of course a string of text, and when sorting, 01:10 will always be considered smaller than 23:50, for example, regardless of the date.
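To make the sorting issue concrete, here is a tiny illustration (in R rather than in AI2 blocks, purely to show the principle; the serial numbers are hypothetical spreadsheet date values):
# text times sort alphabetically, ignoring which day they belong to
times_as_text <- c("23:50", "01:10")          # 23:50 today, 01:10 tomorrow
sort(times_as_text)                           # "01:10" "23:50" -- tomorrow's order jumps ahead
# date serial numbers keep the chronological order
times_as_serial <- c(41756.9931, 41757.0486)  # the same two instants as spreadsheet serial values
sort(times_as_serial)                         # order preserved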
So is there a way of actually saving in the Google spreadsheet not a string of text, but the actual date and time?
Problem B)
Secondly, I would like to filter/show only the rows of the Google spreadsheet whose delivery time expired no more than 1 hour ago (as well as orders with a delivery time in the future, e.g. 2 hours from now()).
I tried using some Google Visualization API Query Language commands, altering the URL of the Google spreadsheet (something like WHERE "now() - Delivery Time < 60 mins"; I cannot remember the exact code I wrote), but unsuccessfully.
Would anyone know how to filter my results?
Thanks in advance,
alterettore
So there are a few things to note.
If you're using Taifun's example as you mention, you'll notice that when you submit data to Google Spreadsheets using a form, the first column is always a timestamp, even if you're not submitting a date or time. Trying to send the current date/time is redundant - go ahead and make use of what Google provided.
Google Spreadsheets (and Excel) store date/time as a number. If you want to store a date in GS, the best way to do so is not as formatted text, but by sending a number. Use App Inventor to calculate the number you need. For example, today (April 27) in GS is 41756; noon today would be 41756.5.
To generate this number, start with AI's millisecond function. NOTE: Both GS and AI count time from an epoch, but they use different zero points (and GS counts days while AI counts milliseconds), so you have to manipulate the result a bit. The formula I've used in AI in the past is this:
GS Date/Time = (Clock1.GetMillis(Clock1.Now) / 86400000) + 25569
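As a quick sanity check (done in R here purely for illustration, not in an AI2 block), the same arithmetic reproduces the April 27 example above:
# milliseconds since 1970-01-01 UTC for noon on 27 April 2014
ms <- as.numeric(as.POSIXct("2014-04-27 12:00:00", tz = "UTC")) * 1000
ms / 86400000 + 25569    # 41756.5, i.e. the spreadsheet serial value for that noon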
Hope this helps!