as.Date not working in R script for Jupyter Books

I have been writing code in RStudio and tried to move it over to Jupyter Books to share it with people.
The code all works in RStudio, but when I run it in Jupyter Books, as.Date() does not convert the date column (which starts out as a factor) into a Date, which means I have no data when I subset by date later on.
Has anyone had this happen and know a solution? Or will I just need to use lubridate or similar to convert the dates?
Thanks,
Dave

My guess is that you are running different R versions in the two places. Run R.version.string in both to check which version of R each is using. Since R 4.0.0, the default behaviour when importing string data changed: previously strings were imported as factors, and now (since 4.0.0) they are imported as characters.
The solution is to import your dataset with stringsAsFactors = FALSE in both places, so that you see the same output in both:
data <- read.csv('filename.csv', stringsAsFactors = FALSE)
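Independent of the import setting, a conversion that works whether the column was read as a factor (R < 4.0.0) or as character (R >= 4.0.0) is to coerce through character first. A minimal sketch; the column name date_col and the "%Y-%m-%d" format are assumptions to adjust to your data:

```r
# Robust conversion: as.character() is a no-op on character columns
# and unwraps factor levels, so this works under either default.
data$date_col <- as.Date(as.character(data$date_col), format = "%Y-%m-%d")

# Subsetting by date then behaves the same in both environments:
recent <- subset(data, date_col >= as.Date("2020-01-01"))
```

Note that the function name is as.Date with a capital D; a call to as.date() would fail with "could not find function" rather than silently producing NAs.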

Related

R Program Version issue

Results differ between R 3.6 and R 4.1.
My R 3.6 code on an Ubuntu 18 server runs fine, but the same code on Ubuntu 20 with R 4.1 produces wrong results; see the attached screenshot.
The purpose of this code is to normalize each column by dividing by its sum.
Thank you all in advance.
Please don't post code as an image. It is also advised to post a reproducible example.
In any case, in your example on R 3.6, all_bins is a factor. However, in your R 4.1 example, all_bins is a character vector.
This is because of a change in R 4.0.0:
R now uses a 'stringsAsFactors = FALSE' default, and hence by default no longer converts strings to factors in calls to data.frame() and read.table().
In order to reproduce the server behaviour on your local machine, when you read in bins in your local version of R, you need to add the argument stringsAsFactors = TRUE, e.g.:
bins <- read.csv("path/to/file", stringsAsFactors = TRUE)
This should solve this particular issue. However, you may run into other differences between R 3.6 and R 4.1 on different machines. I would recommend running the same version of R and packages on both machines, perhaps using renv, if you want to ensure the output is the same.
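The renv workflow mentioned above can be sketched as follows (the package and functions are real; treating the project directory as already set up is an assumption):

```r
# One-time setup in the project directory: record exact package
# versions in a lockfile that the other machine can restore from.
install.packages("renv")
renv::init()       # creates renv.lock and a private project library
renv::snapshot()   # records the currently installed package versions

# On the second machine, inside a copy of the same project:
renv::restore()    # reinstalls the versions listed in renv.lock
```

This pins package versions per project; the R version itself still has to be matched manually (or via a container) on both machines.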

Inconsistencies of R scripts between RStudio and TibcoSpotfire

When making data functions for TIBCO Spotfire (build version 7.8.1.0.9), I use RStudio (R version 3.5.2, 2018-12-20) for writing and debugging the functions, and then I copy my code into Spotfire when I am done.
On several occasions, I have noticed inconsistencies in how R code runs between RStudio and Spotfire. Whenever these arise, the results produced by RStudio are consistent with the online R documentation, and those produced by Spotfire are not.
I have not been tracking examples as I go, but I do have my most recent example of this available. Below is a simplified version of that data function. It and the paragraph below it are more in-the-weeds than is ideal for this post, but hopefully it demonstrates the type of issue I keep coming across.
# Converts date strings "yyyy-MM-dd" to week-number strings "yyyyww",
# where ww is the week number in the year (ISO 8601 convention).
# dates is a vector (R) or column in a data table (Spotfire)
# containing strings formatted as "yyyy-MM-dd". In Spotfire,
# the data type for the column is String, not Date.
dates <- c("2019-01-01", "2019-01-07")  # example input for reproducibility
Week <- strftime(dates, format="%Y%V")
The behaviour I see in RStudio matches the documentation for R's strftime function: it returns values like "201901", which is what the documentation indicates it should for the format argument used. Spotfire returns values like "2019" - no week-number info at all, contrary to the documentation. If I replace format="%Y%V" with format="%Y%W", RStudio returns values like "201900", which again is what the documentation indicates. As far as I can tell, with format="%Y%W" Spotfire returns the values that format="%Y%V" is supposed to produce - so I guess it internally changes the inputs in some manner.
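The two specifiers can be compared side by side in plain R; a small sketch with sample dates of my own choosing:

```r
dates <- as.Date(c("2019-01-01", "2019-01-07"))

# %V: ISO 8601 week number; week 01 is the week containing the
# first Thursday of the year. Not implemented on all platforms
# (notably some Windows C runtimes), which can cause the week
# digits to be dropped entirely.
strftime(dates, format = "%Y%V")

# %W: week of the year (00-53) with Monday as the first day;
# days before the year's first Monday fall in week 00.
strftime(dates, format = "%Y%W")
```

On a platform with full strftime support, 2019-01-01 (a Tuesday) falls in ISO week 01 but %W week 00, so the two formats disagree by design for early-January dates.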
My basic question is: how do I get around this sort of thing, and how can I know when/how Spotfire is going to alter my functions and their variables in some unexpected manner? E.g., is there some special version of R that TIBCO uses that is not the documented R, or is there documentation from TIBCO on how it handles R code internally?
Thanks for any help.
The short answer is yes. Spotfire natively runs TERR, a separate R implementation from TIBCO. This page lists the main differences, though it is not exhaustive: R/4.4.0/doc/html/Differences_Between_TERR_and_R/differences.html
They are two separate language engines. If you google 'TIBCO TERR' you will find a lot of information. You can find the exact version of TERR you are running in Spotfire under Tools > TERR Tools.
You can point RStudio at the TERR installation on your machine, the same way you point it at your R installation; this way you can verify your code does what you expect before pasting it into Spotfire. It looks like, in this case, %V is not supported but %W is. You can also use open-source R within Spotfire, but then you need a statistics server.
Gaia

gsub error message when addressing column in dataframe in RStudio

Since a couple of days I get the following error message in RStudio from time to time and can't figure out what is causing it.
When I type a data.frame name followed by $ in the console window to address a specific column (for example df$SomeVariable), the following message is shown in the console and is printed over and over with every letter I type:
Error in gsub(reStrip, "", completions, perl = TRUE) :
input string 38 is invalid UTF-8
The error message doesn't have any real effect. Everything works just fine except the automatic completion of the variable name.
I'm using R version 3.4.4 and RStudio Version 1.0.143 on a Windows computer. In the R script I am currently working on I don't use gsub or any other "string" or regular expression function for that matter. The issue appeared with various data.frames and various types of variables in the data.frames (numeric, integer, date, factor, etc.). It also happens with various packages. Currently, I am using combinations of the packages readr, dplyr, plm, lfe, readstata13, infuser, and RPostgres. The issue disappears for a while after closing RStudio and opening it again but re-appears after working for a while.
Does anyone have an idea what may cause this and how to fix it?
I had the same problem a few days ago. I did some research and found that when you import the dataset, you can change the encoding. Change the encoding to "latin1" and that may fix your problem. Sorry for my poor English, I'm from South America. Hope it works.
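Two ways to apply that encoding fix in code, as a sketch; the file name and column name are assumptions:

```r
# Re-read the file declaring its encoding, so strings arrive as
# valid (converted) text rather than raw latin1 bytes:
df <- read.csv("data.csv", fileEncoding = "latin1")

# Or repair an already-loaded character column in place:
df$SomeVariable <- iconv(df$SomeVariable, from = "latin1", to = "UTF-8")
```

The error itself comes from RStudio's autocomplete calling gsub() on the column names and values, so invalid UTF-8 anywhere in the frame can trigger it even though your own script never uses gsub.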

Convert Stata 13 .dta file to CSV without using Stata [duplicate]

Is there a way to read a Stata version 13 dataset file in R?
I have tried to do the following:
> library(foreign)
> data = read.dta("TEAdataSTATA.dta")
However, I got an error:
Error in read.dta("TEAdataSTATA.dta") :
not a Stata version 5-12 .dta file
Could someone point out if there is a way to fix this?
There is a new package to import Stata 13 files into a data.frame in R.
Install the package and read a Stata 13 dataset with read.dta13():
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("TEAdataSTATA.dta")
Update: since version 0.8, readstata13 also imports files from Stata 6 to 14.
More about the package: https://github.com/sjewo/readstata13
There's a new package called haven, by Hadley Wickham, which can load Stata 13 .dta files (as well as SAS and SPSS files):
library(haven) # haven package now available on cran
df <- read_dta('c:/somefile.dta')
See: https://github.com/hadley/haven
If you have Stata 13, you can load the file there and save it in Stata 12 format using the command saveold (see help saveold), then take it to R.
If you have Stata 10 to 12, you can use the user-written command use13 (by Sergiy Radyakin) to load the file and save it there, then move it to R. You can install use13 by running ssc install use13.
Details can be found at http://radyakin.org/transfer/use13/use13.htm
Other alternatives, still with Stata, involve exporting the Stata format to something else that R will read, e.g. text-based files. See help export within Stata.
Update
Starting Stata 14, saveold has a version() option, allowing one to save in Stata .dta formats as old as Stata 11.
In the meantime, the savespss command has become a member of the SSC archive and can be installed into Stata with: findit savespss
The homepage http://www.radyakin.org/transfer/savespss/savespss.htm continues to work, but the program should be installed from the SSC now, not from the beta location.
I am not familiar with the current state of R programs regarding their ability
to read other file formats, but if someone doesn't have Stata installed on their computer and R cannot read a specific version of Stata's dta files, Pandas in Python can now do the vast majority of such conversions.
Basically, the data from the dta file are first loaded using the pandas.read_stata function. As of version 0.23.0, the supported encoding and formats can be found in a related answer of mine.
Then one can either save the data as a csv file and import them
using standard R functions, or instead use the pandas.DataFrame.to_feather function, which exports the data using a serialization format built on Apache Arrow. The latter has extensive support in R as it was conceived to promote interoperability with Pandas.
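That pandas route can be sketched as follows; it requires pandas, and the file names are placeholders. The in-memory frame below stands in for the real dataset, which in practice would come straight from pd.read_stata("TEAdataSTATA.dta"):

```python
import pandas as pd

# Stand-in for data loaded from a real .dta file:
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Round-trip through the Stata format to mimic the conversion,
# then export to CSV for import into R with read.csv():
df.to_stata("example.dta", write_index=False)
back = pd.read_stata("example.dta")
back.to_csv("example.csv", index=False)
```

For larger datasets, DataFrame.to_feather avoids the CSV step entirely; the feather file can be read in R with arrow::read_feather.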
I had the same problem. I tried read.dta13 and read.dta, but nothing worked. Then I tried the easiest and least expected option: MS Excel! It opened marvelously. I saved it as a .csv and used it in R. Hope this helps!
