Mac/PC compatibility issue with date format in R

Everything works on my PC at work, but at home on my Mac I run into a problem.
I entered my data in Excel.
It displays dates as dd/mm/yy even if I type dd/mm/yyyy, but it keeps in memory the way I typed them (dd/mm/yyyy).
I save the file as a CSV and read it into a data.frame.
Here is the problem:
data$ddn
[1] 29/11/58 25/07/64 25/09/67 03/01/82 15/05/58 29/07/78 22/03/69 23/01/60 15/12/60 16/06/64
[11] 10/12/60 23/08/78 13/04/67 29/11/59 25/09/56 10/10/87 22/06/60 21/06/76 01/11/63 08/07/69
[21] 22/05/52 06/05/69 04/03/64 08/04/75 09/03/54 22/04/69 29/04/71 18/03/79 14/06/71 03/06/60
71 Levels: 01/11/63 01/12/40 02/07/48 03/01/82 03/05/68 03/06/60 04/03/64 05/01/62 ... 31/07/70
> class(data$ddn)
[1] "factor"
data$ddn <- as.Date(data$ddn, format = "%d/%m/%Y")  # this syntax works perfectly on my PC
data$ddn
[1] "0058-11-29" "0064-07-25" "0067-09-25" "0082-01-03" "0058-05-15" "0078-07-29" "0069-03-22"
[8] "0060-01-23" "0060-12-15" "0064-06-16" "0060-12-10" "0078-08-23" "0067-04-13" "0059-11-29"
[15] "0056-09-25" "0087-10-10" "0060-06-22" "0076-06-21" "0063-11-01" "0069-07-08" "0052-05-22"
[22] "0069-05-06" "0064-03-04" "0075-04-08" "0054-03-09" "0069-04-22" "0071-04-29" "0079-03-18"
[29] "0071-06-14" "0060-06-03"
data$ddn<-as.Date(data$ddn,format="%d/%m/%y")
data$ddn
[1] "2058-11-29" "2064-07-25" "2067-09-25" "1982-01-03" "2058-05-15" "1978-07-29" "1969-03-22"
[8] "2060-01-23" "2060-12-15" "2064-06-16" "2060-12-10" "1978-08-23" "2067-04-13" "2059-11-29"
[15] "2056-09-25" "1987-10-10" "2060-06-22" "1976-06-21" "2063-11-01" "1969-07-08" "2052-05-22"
[22] "1969-05-06" "2064-03-04" "1975-04-08" "2054-03-09" "1969-04-22" "1971-04-29" "1979-03-18"
[29] "1971-06-14" "2060-06-03"
R chooses to put 19 or 20 before the two-digit year and I do not know why.
And if I modify the original data (cell format: text or standard instead of date), 29/11/58 becomes 20056 (again, I am perplexed).
I thought it was an Excel problem, but the CSV that works with R on my PC doesn't work on my Mac.
How can I correct this R compatibility problem between PC and Mac?
Thanks.
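A possible explanation, sketched under assumptions: the CSV written on the Mac seems to contain two-digit years, so %Y reads 58 as the literal year 0058, while %y follows the rule documented in ?strptime (00 to 68 become 20xx, 69 to 99 become 19xx). The value 20056 looks like an Excel serial day count in the 1904 date system, the historical default of Excel for Mac. The helper fix_century() below is hypothetical and assumes that every value is a past date, such as a birth date.

```r
# Two-digit years: %y maps 00-68 to 20xx and 69-99 to 19xx (see ?strptime)
ddn <- c("29/11/58", "03/01/82", "22/03/69")
parsed <- as.Date(ddn, format = "%d/%m/%y")

# Hypothetical helper: move any date later than `cutoff` back by one century.
# Assumes no genuine date lies after the cutoff.
fix_century <- function(d, cutoff = Sys.Date()) {
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > cutoff
  lt$year[future] <- lt$year[future] - 100
  as.Date(lt)
}
fixed <- fix_century(parsed)

# Excel serial number, assuming the workbook uses the 1904 date system
from_serial <- as.Date(20056, origin = "1904-01-01")
```

If the dates can legitimately lie in the future, a fixed cutoff passed as the second argument may be a safer choice than Sys.Date().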

Related

Parse text from html tags in xml table

I have been asked to review questions for grammar and spelling:
library(XML)
tbls_all <- readHTMLTable(url_v9)
length(tbls_all)
[1] 34
names(tbls_all)
[1] "NULL" "DisplayedQuestions" "AllQuestions1"
[4] "HiddenAnswerTable1" "HiddenAnswerTable1" "AllQuestions2"
[7] "HiddenAnswerTable2" "HiddenAnswerTable2" "AllQuestions3"
[10] "HiddenAnswerTable3" "HiddenAnswerTable3" "AllQuestions4"
[13] "HiddenAnswerTable4" "HiddenAnswerTable4" "AllQuestions5"
[16] "HiddenAnswerTable5" "HiddenAnswerTable5" "AllQuestions6"
[19] "HiddenAnswerTable6" "HiddenAnswerTable6" "AllQuestions7"
[22] "HiddenAnswerTable7" "HiddenAnswerTable7" "AllQuestions8"
[25] "HiddenAnswerTable8" "HiddenAnswerTable8" "AllQuestions9"
[28] "HiddenAnswerTable9" "HiddenAnswerTable9" "AllQuestions10"
[31] "HiddenAnswerTable10" "HiddenAnswerTable10" "TotalsTable"
[34] "HiddenTable"
I am only interested in the AllQuestions tables, so:
tbls_q <- tbls_all[grep('AllQuestions\\d', names(tbls_all))]
length(tbls_q)
[1] 10
names(tbls_q[[1]])
[1] "V1" "V2" "V3" "V4"
The questions are in V1:
tbls_q[[1]]$V1[2]
[1] "<strong>Now I am going to evaluate how well you can remember the names of some common items. First, I will show you pictures of 16 items that I want you to remember. Each item belongs to a different category. For example, 'type of reading materials' is a category. I will show you the items four at a time and ask you to tell me which item belongs with each category and then to immediately recall the items when I tell you their categories. Later, I will ask you to recall all of the items I have shown you. For any items you miss, I will tell you the categories to help you recall more items. You will have 3 tries to recall the items.</strong>(726368 - WAS_Card1_Intro)"
> tbls_q[[1]]$V1[4]
[1] "<br><br><font color=\"blue\"><i>Bear - Correctly Named?</i></font>(726370 - WAS_Card1_Word1_Name)"
> tbls_q[[1]]$V1[3]
[1] "<font color=\"blue\"><i>Place Worksheet 1 in front of the subject.</i></font><br><br><strong>There are 4 pictures on this worksheet. When I tell you a category, point to the item that is in that category and tell me its name. <br><br><br>Point to the 4–Legged Animal and tell me its name.</strong><br><br><font color=\"blue\"><i>Bear - Correctly Identified?</i></font>(726369 - WAS_Card1_Word1_Identify)"
At this point, I'm stuck on how to extract the text without the embedded HTML tags, so that I can report what each question says, what it should say, and which variable it belongs to (726369, for example). I can imagine some regex approaches, but they seem fragile...
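One way to avoid stripping tags with regex is to let the XML package (already loaded above) parse each cell as an HTML fragment and keep only its text nodes. The sketch below is a minimal illustration, not a definitive solution: parse_question() is a hypothetical helper, and it assumes that every cell ends with "(digits - variable name)" as in the three examples shown.

```r
library(XML)

# Hypothetical helper: split one cell into clean text, variable id and name.
# Assumes each cell ends with "(<digits> - <variable name>)".
parse_question <- function(x) {
  x <- as.character(x)  # cells may be factors
  doc <- htmlParse(x, asText = TRUE, encoding = "UTF-8")
  on.exit(free(doc))
  # collect the text nodes, which drops all markup
  txt <- paste(xpathSApply(doc, "//text()", xmlValue), collapse = " ")
  txt <- trimws(gsub("[[:space:]]+", " ", txt))
  pat <- "\\(([0-9]+) - ([^()]+)\\)$"
  m <- regmatches(txt, regexec(pat, txt))[[1]]
  list(text = trimws(sub(pat, "", txt)), id = m[2], variable = m[3])
}

cell <- "<br><br><font color=\"blue\"><i>Bear - Correctly Named?</i></font>(726370 - WAS_Card1_Word1_Name)"
q <- parse_question(cell)
```

Applying the helper with lapply() over tbls_q[[1]]$V1 would give one list per question; cells that do not follow the assumed pattern return NA for id and variable.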

list.files based on numbers

I am trying to create a list of files on which I want to run a function. I created a pattern that matches the 35 files I want to use.
mypattern <- paste0("NBS_NLoans_since2009_", seq(1, 35),".xls")
[1] "NBS_NLoans_since2009_1.xls" "NBS_NLoans_since2009_2.xls" "NBS_NLoans_since2009_3.xls" "NBS_NLoans_since2009_4.xls"
[5] "NBS_NLoans_since2009_5.xls" "NBS_NLoans_since2009_6.xls" "NBS_NLoans_since2009_7.xls" "NBS_NLoans_since2009_8.xls"
[9] "NBS_NLoans_since2009_9.xls" "NBS_NLoans_since2009_10.xls" "NBS_NLoans_since2009_11.xls" "NBS_NLoans_since2009_12.xls"
[13] "NBS_NLoans_since2009_13.xls" "NBS_NLoans_since2009_14.xls" "NBS_NLoans_since2009_15.xls" "NBS_NLoans_since2009_16.xls"
[17] "NBS_NLoans_since2009_17.xls" "NBS_NLoans_since2009_18.xls" "NBS_NLoans_since2009_19.xls" "NBS_NLoans_since2009_20.xls"
[21] "NBS_NLoans_since2009_21.xls" "NBS_NLoans_since2009_22.xls" "NBS_NLoans_since2009_23.xls" "NBS_NLoans_since2009_24.xls"
[25] "NBS_NLoans_since2009_25.xls" "NBS_NLoans_since2009_26.xls" "NBS_NLoans_since2009_27.xls" "NBS_NLoans_since2009_28.xls"
[29] "NBS_NLoans_since2009_29.xls" "NBS_NLoans_since2009_30.xls" "NBS_NLoans_since2009_31.xls" "NBS_NLoans_since2009_32.xls"
[33] "NBS_NLoans_since2009_33.xls" "NBS_NLoans_since2009_34.xls" "NBS_NLoans_since2009_35.xls"
Then I used the pattern to get those files from my directory, but I got only one file. I have tried different patterns, but I got either one file or more than 35 files. Thanks for any suggestions.
list.files(pattern = mypattern)
[1] "NBS_NLoans_since2009_1.xls"
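A likely cause is that list.files() expects pattern to be a single regular expression, so a vector of 35 strings appears to be reduced to its first element. Two possible workarounds are sketched below; the temporary directory and the 40 empty sample files are made up purely so that the example runs on its own.

```r
# Build a throwaway directory with sample files (illustrative only)
dir <- file.path(tempdir(), "nbs_demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, paste0("NBS_NLoans_since2009_", 1:40, ".xls"))))

# Option 1: list everything, then keep only the exact names wanted
wanted <- paste0("NBS_NLoans_since2009_", seq_len(35), ".xls")
by_name <- intersect(list.files(dir), wanted)

# Option 2: a single regular expression covering the numbers 1 to 35
rx <- "^NBS_NLoans_since2009_([1-9]|[12][0-9]|3[0-5])\\.xls$"
by_regex <- list.files(dir, pattern = rx)
```

Option 1 is probably easier to maintain when the range of numbers changes; note that list.files() returns names in alphabetical order, so file 10 comes before file 2.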

search_tweets() (rtweet package) does not return all expected columns

I'm using the rtweet package, and the search_tweets() function does not return a data frame with all the columns. The result has only 35 columns and lacks the "screen_name" and "mentions_screen_name" columns. How can I get the rest of the columns? Below is an example of the columns returned.
tweets.df <- search_tweets("science")
names(tweets.df)
[1] "created_at" "id"
[3] "id_str" "full_text"
[5] "truncated" "display_text_range"
[7] "entities" "metadata"
[9] "source" "in_reply_to_status_id"
[11] "in_reply_to_status_id_str" "in_reply_to_user_id"
[13] "in_reply_to_user_id_str" "in_reply_to_screen_name"
[15] "geo" "coordinates"
[17] "place" "contributors"
[19] "is_quote_status" "retweet_count"
[21] "favorite_count" "favorited"
[23] "retweeted" "possibly_sensitive"
[25] "lang" "retweeted_status"
[27] "quoted_status_id" "quoted_status_id_str"
[29] "quoted_status" "text"
[31] "favorited_by" "display_text_width"
[33] "quoted_status_permalink" "query"
[35] "possibly_sensitive_appealable"
You seem to have installed the development version of rtweet (a version between 0.7.0 and 1.0.0), which is not yet released on CRAN. Could you post the output of packageVersion("rtweet")?
The devel version of rtweet returns only the columns returned by the API, but the user information is retrievable via users_data(tweets.df). There you will find the id and screen name of the user who posted each tweet.
The previous mentions_screen_name is the in_reply_to_screen_name column.
Please make sure that you read the documentation of the version you are using.
Get the user data for the tweets with the users_data() function:
library(rtweet)
tweets <- search_tweets("science", n = 100)
users <- users_data(tweets)
# get the screen names of the users
users["screen_name"]

How to set calendar (365 days) when converting numeric object to Date object with origin specified

I am working with NetCDF files and trying to convert a numeric vector to a Date object.
I am using the as_date function from the lubridate package. The function works well. However, the converted dates start from 2039-12-24, when they should start from 2040-01-01.
I guess the problem might be the calendar. I checked the NetCDF attributes, and the file uses a calendar year of 365 days. Other issues could also be causing the problem. Any ideas?
time <- as_date(time, origin = as.Date('2006-1-1')) # manipulate time
The numeric vector looks like this:
> time[1:200]
[1] 12410.5 12411.5 12412.5 12413.5 12414.5 12415.5 12416.5 12417.5 12418.5 12419.5 12420.5 12421.5 12422.5 12423.5
[15] 12424.5 12425.5 12426.5 12427.5 12428.5 12429.5 12430.5 12431.5 12432.5 12433.5 12434.5 12435.5 12436.5 12437.5
[29] 12438.5 12439.5 12440.5 12441.5 12442.5 12443.5 12444.5 12445.5 12446.5 12447.5 12448.5 12449.5 12450.5 12451.5
[43] 12452.5 12453.5 12454.5 12455.5 12456.5 12457.5 12458.5 12459.5 12460.5 12461.5 12462.5 12463.5 12464.5 12465.5
[57] 12466.5 12467.5 12468.5 12469.5 12470.5 12471.5 12472.5 12473.5 12474.5 12475.5 12476.5 12477.5 12478.5 12479.5
[71] 12480.5 12481.5 12482.5 12483.5 12484.5 12485.5 12486.5 12487.5 12488.5 12489.5 12490.5 12491.5 12492.5 12493.5
[85] 12494.5 12495.5 12496.5 12497.5 12498.5 12499.5 12500.5 12501.5 12502.5 12503.5 12504.5 12505.5 12506.5 12507.5
[99] 12508.5 12509.5 12510.5 12511.5 12512.5 12513.5 12514.5 12515.5 12516.5 12517.5 12518.5 12519.5 12520.5 12521.5
[113] 12522.5 12523.5 12524.5 12525.5 12526.5 12527.5 12528.5 12529.5 12530.5 12531.5 12532.5 12533.5 12534.5 12535.5
[127] 12536.5 12537.5 12538.5 12539.5 12540.5 12541.5 12542.5 12543.5 12544.5 12545.5 12546.5 12547.5 12548.5 12549.5
[141] 12550.5 12551.5 12552.5 12553.5 12554.5 12555.5 12556.5 12557.5 12558.5 12559.5 12560.5 12561.5 12562.5 12563.5
[155] 12564.5 12565.5 12566.5 12567.5 12568.5 12569.5 12570.5 12571.5 12572.5 12573.5 12574.5 12575.5 12576.5 12577.5
[169] 12578.5 12579.5 12580.5 12581.5 12582.5 12583.5 12584.5 12585.5 12586.5 12587.5 12588.5 12589.5 12590.5 12591.5
[183] 12592.5 12593.5 12594.5 12595.5 12596.5 12597.5 12598.5 12599.5 12600.5 12601.5 12602.5 12603.5 12604.5 12605.5
[197] 12606.5 12607.5 12608.5 12609.5
Better late than never :). I had similar issues with different types of calendars, but since I discovered the ncdf4.helpers package, everything is much easier and faster. An example:
library(ncdf4)
library(ncdf4.helpers)
# open the NetCDF file
ncin <- nc_open("nc_file.nc")
# obtain the time dimension in date format
Time_in_Date_format <- nc.get.time.series(f = ncin,
                                          time.dim.name = "time")
nc_close(ncin)
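For readers who want to see why the dates drift, the 365-day arithmetic can also be sketched in base R. With no leap days, 12410 days is exactly 34 years (34 * 365), so the first value lands on 2040-01-01, whereas the Gregorian calendar inserts 8 leap days in that span and gives 2039-12-24. The function noleap_to_date() is a hypothetical helper, and it assumes that the time units are "days since 2006-01-01" on a no-leap calendar.

```r
# Hypothetical helper: convert "days since <origin_year>-01-01" on a
# 365-day (no-leap) calendar to Date. The fractional part (time of day)
# is dropped. Assumes the origin is 1 January of origin_year.
noleap_to_date <- function(days, origin_year = 2006) {
  whole <- floor(days)
  year <- origin_year + whole %/% 365
  doy <- whole %% 365  # 0-based day of the year
  # 0-based day of the year on which each month starts
  month_starts <- cumsum(c(0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30))
  month <- findInterval(doy, month_starts)
  day <- doy - month_starts[month] + 1
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

first_date <- noleap_to_date(12410.5)
```

Because the no-leap calendar has no 29 February, the resulting Date vector skips that day in Gregorian leap years such as 2040.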

Issue with encoding of Cyrillic character strings

I have some Cyrillic strings in my data frame that I can't manage to read accurately.
This is how the data frame looks after I load the CSV:
unique(transactions$orders)
[1] "ÌÈÏÑ-ÏÏ30Å" "ÈÍÒ-ÏÏ30Å" "ÊÈÁÑ-ÏÏ30Å" "ÊÈÁÑ-ÏÏ50Å" "ÌÈÏÑ-ÏÏ50Å" "ÊÈÁÑ-ÏÏ53Å" "ÈÍÒ-ÏÏ53Å"
[8] "ÌÈÏÑ-ÏÏ53Å" "ÈÍÒ-ÏÏ30" "ÊÈÁÑ-ÏÏ30" "ÌÈÏÑ-ÏÏ30" "ÌÈÏÑ-ÏÏ50" "ÈÍÒ-ÏÏ10" "ÊÈÁÑ-ÏÏ50"
[15] "ÈÍÒ-ÏÏ40" "ÊÈÁÑ-ÏÏ53" "ÈÍÒ-ÏÏ53" "ÌÈÏÑ-ÏÏ53" "ÊÈÁÑ-ÏÏ10" "ÈÍÒ-ÏÏ30Ï" "ÊÈÁÑ-ÏÏ50Ï"
[22] "ÌÈÏÑ-ÏÏ30Ï" "ÊÈÁÑ-ÏÏ30Ï" "ÌÈÏÑ-ÏÏ50Ï" "ÈÍÒ-ÏÏ50" "ÌÈÏÑ-ÏÏ10" "ÊÈÁÑ-ÏÏ53Ï" "ÈÍÒ-ÏÏ53Ï"
Any ideas how I can fix this?
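The output looks like Windows-1251 (CP1251) bytes being displayed as Latin-1: the characters Ì, È, Ï, Ñ are the bytes CC, C8, CF, D1, which CP1251 maps to Cyrillic capital letters. Assuming that guess is right, a minimal sketch of the conversion follows; the file name in the comment is a placeholder.

```r
# Bytes of the first four characters as stored in the file (assumed CP1251)
bytes <- as.raw(c(0xCC, 0xC8, 0xCF, 0xD1))
garbled <- rawToChar(bytes)  # displayed as the accented letters in a Latin-1 session

# Reinterpret the bytes as CP1251 and convert them to UTF-8
fixed <- iconv(garbled, from = "CP1251", to = "UTF-8")

# The encoding can also be declared when reading the file
# ("transactions.csv" is a placeholder file name):
# transactions <- read.csv("transactions.csv", fileEncoding = "CP1251")
```

If the strings are already loaded, applying iconv() with the same arguments to transactions$orders may be enough, provided the session still holds the original bytes.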
