Reading date as a character in googlevis - r

This is in reference to an earlier question, "Using a ordered factor as timevar in Motion Chart", but I couldn't leave this as a comment there :/
I am getting the same error as the person in that question. According to the answer: "the documentation says that timevar argument can't handle factor. It can handle character if and only if they are in a particular format, that is (for example): 2010Q1." The thing is, I already have my data formatted like that, in a csv file: http://www.filedropper.com/texasgdp
Time GDP
2006Q1 500
2006Q2 1000
2006Q3 2000
2006Q4 2600....etc
So, if this is the character format, why am I still getting the same error? Is there a way I could just have RStudio reread that entire column as "character" rather than as a factor?

Yes, you can use this as you read the data in:
read.csv('file path',stringsAsFactors=FALSE)
Or you can convert the column to a character vector after reading the data:
df$Time <- as.character(df$Time)
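To see both options on data shaped like the sample in the question, here is a minimal sketch. The inline csv_text string is a hypothetical stand-in for the real file, which is not shown in full. Note that since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so the factor behaviour mainly shows up on older versions or when it is requested explicitly.

```r
# Hypothetical inline copy of the CSV from the question.
csv_text <- "Time,GDP
2006Q1,500
2006Q2,1000
2006Q3,2000
2006Q4,2600"

# Explicitly request factors to reproduce the pre-4.0.0 default.
df_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(df_factor$Time)

# Option 1: keep strings as character while reading.
df_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df_char$Time)

# Option 2: convert the factor column after reading.
df_factor$Time <- as.character(df_factor$Time)
class(df_factor$Time)
```

Either way, the Time column ends up as a character vector such as "2006Q1", which is the form the quoted answer says timevar accepts.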

Related

R problem Date column stored as Factor R can't convert it

I have downloaded the SP500 data from Yahoo Finance (ticker GSPC) and am trying to filter it by year. However, the Date column is stored as a factor, so R can't filter it. Can anyone help me convert it? I tried multiple solutions, but nothing worked.
So far I've loaded the lubridate package and used the following code, but all the values just got replaced with NAs.
as.Date(SP500$Date, format = "%m-%d-%Y")
Then I used SP500$Date <- ymd(SP500$Date, format = "%Y-%m-%d"), and again nothing happened. (SP500 is the name of the data frame that holds the data.)
I also tried just SP500$Date <- as.Date(SP500$Date), but R says it does not know how to convert it to Date.
Any help would be much appreciated! Thank you!
Classes only exist in the environment of a programming language. What likely happened was that your data (perhaps a .csv file?) got interpreted as factor by R during reading.
Everything you're trying to do here can be accomplished using the base library in R (meaning you don't need to import anything).
If you're dealing with dates:
df$date <- as.Date(df$date, format = "%Y-%m-%d")
If you're dealing with datetimes:
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
(The specific format may vary; see ?strptime for the list of format codes.)
Coercion in R can be finicky, and the format parameter is unforgiving of errors. I personally frequently mistake - for /, or conflate "%Y-%m-%d" with "%d-%m-%Y", and the result is a column of NAs instead of dates. Likewise, if the format isn't consistent in your data, entries that can't be described by the format you supplied will come back as NA.
Sometimes your dates are actually numbers. If they are digits that spell out a date (e.g. 20181111), convert them to character and parse them with format = "%Y%m%d". If they are day counts since the epoch (e.g. 17492), you may need to supply '1970-01-01' to the origin parameter of as.Date(). The latter happens, for example, if you are iterating through a vector of Dates using a for loop: R won't honour the class of the passed Dates and will hand you plain numbers.
It may sound like a band-aid solution, but class coercions from common types like character are usually well written; I often pre-emptively coerce the object to character when I'm clueless about why my attempt to coerce a class failed.
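As a rough sketch of the advice above, the following uses a toy SP500 data frame. The rows and Close values are made up for illustration, and the "%Y-%m-%d" layout is an assumption about how the downloaded file stores dates.

```r
# Toy stand-in for the downloaded data (hypothetical values).
SP500 <- data.frame(
  Date  = factor(c("2018-11-09", "2018-11-12", "2019-01-02")),
  Close = c(100, 101, 102)
)

# Go through character first, then parse with a format that matches the data.
SP500$Date <- as.Date(as.character(SP500$Date), format = "%Y-%m-%d")

# Filter by year once the column is a real Date.
in_2018 <- SP500[format(SP500$Date, "%Y") == "2018", ]

# A format that does not match the text yields NA rather than a date.
bad <- as.Date("11/09/2018", format = "%Y-%m-%d")

# Digits that spell out a date: parse them as text.
from_digits <- as.Date(as.character(20181111), format = "%Y%m%d")

# A day count since the epoch: supply the origin.
from_days <- as.Date(17492, origin = "1970-01-01")
```

If a conversion returns only NAs, comparing the format string against a few raw values with head(as.character(SP500$Date)) is usually the quickest check.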

Why are dates in a list first converted to numeric when coercing to character

I'm trying to convert a column of dates into strings, because I want to use them as factor levels at some later point in my code.
The date column is part of a tibble, and is of class Date. I figured that a simple as.character() conversion would do the trick, but unfortunately I was wrong. Instead of neatly formatted strings it returns a number in string form. For example, today (22 November 2017) would come out as "17492". So somewhere in the process the date gets converted into its numeric form and only then turned into a character string.
Now I did find a workaround, by unlisting the data, converting it again to dates and then to character strings, but it is fairly inefficient.
Can anyone explain i) why this occurs and ii) if there is an easier fix?
Below a reproducible example:
#Get current system date
foo <- Sys.Date()
#Convert to list
foo <- as.list(foo)
#The following then produces the number string:
as.character(foo)
[1] "17492"
#The following code works but is a rather annoying work-around
as.character(as.Date(unlist(foo), origin=as.Date("1970-01-01")))
[1] "2017-11-22"
Given the number of useful comments and the final solutions provided, I'll post an answer summary here.
The first thing to do if you run into this problem is to check whether you actually want to convert the full list, or a column within the list, with the column actually being a vector. This was my underlying problem, as MrFlick and neilfws pointed out. The reason I missed it was that in my case the list was a one-column tibble, the column being named "date". Using as.character(foo) returned my "numeric string" "17492", but using as.character(foo$date) did exactly what it was supposed to do and returned "2017-11-22".
In case your list is really just a list, or a list of lists, the solution of d.b. works like a charm: use lapply(foo, as.character) or sapply(foo, as.character), depending a bit on your desired output.
Now as to why this happens: the direct reason, as pointed out by d.b., is that if as.character() encounters a list, it first unlist()s it, which drops the Date class, and only then does the conversion.
The deeper why was pointed out by joran and in the linked duplicate question. In short: usually it does not make sense to convert a full list to a single data type, as a list can contain many. For example, as.numeric() returns an error on a list whose elements are not all single values. as.character() is the exception: it writes out the full list as text (perhaps to keep records).
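The two cases from this summary can be sketched as follows. A plain data.frame stands in for the one-column tibble here (an assumption made to avoid a package dependency), and the date is fixed rather than taken from Sys.Date() so the result is reproducible.

```r
# A list holding a Date, as in the question (fixed date instead of Sys.Date()).
foo <- as.list(as.Date("2017-11-22"))

# Converting the whole list loses the Date class; the question shows "17492".
whole_list <- as.character(foo)

# Converting element by element keeps the Date formatting.
per_element <- sapply(foo, as.character)

# One-column data frame standing in for the tibble: convert the column,
# not the container.
df <- data.frame(date = as.Date("2017-11-22"))
from_column <- as.character(df$date)
```

lapply(foo, as.character) does the same as the sapply() call but returns a list instead of a character vector.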

Need to convert factor variable to numeric, but is little more complicated [duplicate]

This question already has answers here:
How to read data when some numbers contain commas as thousand separator?
Today I downloaded a dataset in csv format from the Eurostat website. I loaded it into RStudio with the read.csv command and subsetted it to get the data I need. Now I have 12 observations with around 9 variables. One of the variables is the value I am interested in, but the problem is that the value is coded as a factor variable (with 754 levels).
This would be easily overcome with the as.numeric command, but the numbers are in a format like "48,478", so R doesn't see a single number (just my guess). If I use as.numeric I don't get 48478 but some different number, maybe a mean or something else, but definitely not 48478. After a few minutes I realized that the problem is probably the "," and started looking for a way to remove it.
One solution I found is to use the edit command and erase it manually, but I am planning to use more subsets of the original dataset, and I hope it isn't necessary to use the edit command and manually erase the offending symbol every time I make a new dataset.
You can read the data in and then replace the "," before converting string to numeric:
Read the dataset with stringsAsFactors=FALSE:
raw <- read.csv("a.csv",stringsAsFactors=FALSE)
Convert the string to numeric (the same logic as replacing the "," in an editor):
raw$number <- as.numeric(gsub(",","",raw$numberAsString)) # convert numberAsString to numeric after removing ","
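For context on the "different number" in the question: calling as.numeric() directly on a factor returns the internal integer level codes, not the values printed on screen. A minimal sketch with made-up values (the vector below is hypothetical, not Eurostat data):

```r
# Made-up values with a comma as thousands separator, stored as a factor.
value <- factor(c("48,478", "1,250", "310"))

# Direct conversion gives the level codes, which is why the result looks unrelated.
codes <- as.numeric(value)

# Go through character, strip the commas, then convert.
clean <- as.numeric(gsub(",", "", as.character(value), fixed = TRUE))
clean
```

The same gsub() call works unchanged on a column that was read with stringsAsFactors = FALSE, since it is already character.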

how to read the variables in the format of long digits to R

My question: I have a column with values in a format such as 20000000002185979. Every time I read the csv file into R, they become "2e+16", so I can't distinguish between different values. Do you have any good ideas about how to keep the original format when reading the file into R? Thx!
Since this turned out to be the answer you wanted, I'll post it here to close out the question.
Since R is unable to maintain that many digits of precision with its numeric values, you'll have to read it in as a character value. You can do that by setting the colClasses parameter of read.table.
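A minimal sketch of the colClasses approach, using an inline CSV with hypothetical column names id and value (the real file's layout is not shown in the question):

```r
# Hypothetical two-column file; only the long-digit column is forced to character.
csv_text <- "id,value
20000000002185979,1
20000000002185980,2"

dat <- read.csv(text = csv_text, colClasses = c(id = "character"))

class(dat$id)     # character, so every digit is preserved
class(dat$value)  # still inferred automatically
dat$id
```

Naming the entry in colClasses leaves the other columns to R's usual type guessing, which is convenient when only one column carries long identifiers.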

R claims that data is non-numeric, but after writing to file is numeric

I have read in a table in R, and am trying to take log of the data. This gives me an error that the last column contains non-numeric values:
> log(TD_complete)
Error in Math.data.frame(list(X2011.01 = c(187072L, 140815L, 785077L, :
non-numeric variable in data frame: X2013.05
The data "looks" numeric, i.e. when I read it my brain interprets it as numbers. I can't be totally wrong since the following will work:
> write.table(TD_complete,"C:\\tmp\\rubbish.csv", sep = ",")
> newdata = read.csv("C:\\tmp\\rubbish.csv")
> log(newdata)
The last line will happily output numbers.
This doesn't make any sense to me - either the data is numeric when I read it in the first time round, or it is not. Any ideas what might be going on?
EDIT: Unfortunately I can't share the data, it's confidential.
Review the colClasses argument of read.csv(), where you can specify what type each column should be read and stored as. That might not be so helpful if you have a large number of columns, but using it makes sure R doesn't have to guess what type of data you're using.
Just because "the last line will happily output numbers" doesn't mean R is treating the values as numeric.
Also, it would help to see some of your data.
If you provide the actual data or a sample of it, help will be much easier.
In this case I assume R has the column in question stored as a string and writes its contents out as plain text in the CSV file. When the file is read back, read.csv() infers each column's type again, and a column that contains nothing but digits is interpreted as a number. In other words, by writing and reading a CSV file you converted a string containing only numbers into a proper integer (or float).
But without the actual data or the rest of the code this is mere conjecture.
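The conjecture above can be reproduced on a small made-up data frame. The values below are hypothetical (the real data was not shared); only the column names are borrowed from the error message in the question.

```r
# Made-up data: the last column holds digits stored as character.
TD <- data.frame(
  X2011.01 = c(187072L, 140815L),
  X2013.05 = c("1200", "3400"),
  stringsAsFactors = FALSE
)

# Inspect how R actually stores each column.
sapply(TD, class)

# log() on the whole data frame fails because of the character column.
msg <- tryCatch(log(TD), error = function(e) conditionMessage(e))

# Round trip through a CSV file: read.csv() re-infers the types.
tmp <- tempfile(fileext = ".csv")
write.table(TD, tmp, sep = ",")
newdata <- read.csv(tmp)
sapply(newdata, class)

# Direct fix without the round trip.
TD$X2013.05 <- as.numeric(TD$X2013.05)
logged <- log(TD)
```

Checking sapply(TD_complete, class) on the real data would show whether the last column is character or factor, which is the quickest way to confirm or rule out this explanation.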

Resources