Importing custom-formatted Excel data into R

I'm trying to analyse some race data in R. The data is mainly finishing times, currently in the custom Excel format hh:mm:ss. However, when I import it into R I cannot do any analysis, because I always receive the following warning message:
Warning message:
In mean.default(Swim) : argument is not numeric or logical: returning NA
Can anyone advise how best to get around this (probably simple) stumbling block? Thanks for your help.

When you import a column with the format hh:mm:ss from Excel into R, it is often imported as a factor (or as character, depending on the import function and its settings). That is the reason for your warning: a character or factor vector is neither numeric nor logical.
To be able to do any analysis on the data, you need to convert it first. If you have your data as character, you can do
as.integer(as.POSIXct(Swim, format = "%H:%M:%S", tz = "UTC")) %% 86400
to get each hh:mm:ss value as a number of seconds (tz = "UTC" stops your local time-zone offset from shifting the result). If Swim is a factor, first convert it with:
Swim <- as.character(Swim)
to get a character vector.
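A minimal sketch of the conversion, using a few made-up finishing times in place of the real Swim column. It shows the as.POSIXct() approach and, as an alternative of my own, a hypothetical helper to_seconds() that splits each string on ":" and therefore also copes with times of 24 hours or more (which as.POSIXct() would turn into NA).

```r
# Made-up finishing times, as they might arrive from Excel as text
Swim <- c("00:30:22", "00:42:22", "01:05:10")

# Seconds since midnight; tz = "UTC" keeps the local time-zone offset out of the result
swim_secs <- as.integer(as.POSIXct(Swim, format = "%H:%M:%S", tz = "UTC")) %% 86400

# Hypothetical helper: split "hh:mm:ss" on ":" and weight hours, minutes, seconds
to_seconds <- function(x) {
  parts <- do.call(rbind, strsplit(as.character(x), ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  as.vector(parts %*% c(3600, 60, 1))
}

swim_secs          # 1822 2542 3910
to_seconds(Swim)   # same values
mean(swim_secs)    # 2758 seconds, i.e. about 46 minutes
```

Because both results are plain numbers, mean(), sum() and friends work on them directly.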

Related

Expecting numeric in B2 / R2C2: got a date in R

I am reading in a data set from Excel that has dates in it. When I run my code it gives me this warning: "Expecting numeric in B2 / R2C2: got a date"
All of my dates are messed up. How do I solve this?
It helps us to help you if you show the exact code that you used, including any packages used.
That warning looks like it comes from the readxl package (but could be a different package).
Basically, when functions like read_excel() (from readxl) are not told specifically what type of data is in each column, they read a number of rows at the top of the sheet (up to 1000 by default, controlled by guess_max) and make an educated guess about each column's type, then read the data based on those guesses. Base R's read.table() works a little differently: it reads the values as text and converts each column as a whole, unless you supply colClasses.
Your warning means that there was a cell that your R function was expecting to be a number (either from its educated guess or because you told it to expect a number), but it found a date instead, so it warns you about a potential problem. Note that a warning means the code continued to run; some values may just not match what you were expecting. An error would have stopped the code and returned nothing.
To fix the problem, you can either explicitly tell your R function what type of data is in each column (exactly how depends on the function), or fix your Excel file so that it is clear what type of data each column holds (remember, just because something looks like a date in Excel does not mean that Excel realizes it is a date or tells other programs that it is a date).
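For the first option, readxl's read_excel() accepts a col_types argument (for example col_types = c("text", "date")). If the file has already been read and a date column came through as numbers such as 43101, those are likely Excel serial dates, i.e. day counts. The sketch below assumes the common Windows 1900 date system, whose serial numbers convert correctly with origin = "1899-12-30" for dates after February 1900 (workbooks using the Mac 1904 date system would need origin = "1904-01-01" instead); the values are made up.

```r
# Made-up date column that came through as Excel serial numbers
serial <- c(43101, 43132, 43160)

# Excel's 1900 date system counts days from 1899-12-30 (for dates after Feb 1900)
dates <- as.Date(serial, origin = "1899-12-30")

dates          # "2018-01-01" "2018-02-01" "2018-03-01"
class(dates)   # "Date"
```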

How to convert character into time duration?

I want to calculate the time spent on different types of activities, collected in an Excel spreadsheet.
After reading the file, all the time values come in as character and I'm unable to transform them into HH:MM:SS durations.
Dataframe example:
df <- data.frame(id=c(1,2,3,4,5,6),
name=c('Sean','Bob','Dylan',"Barbara","Louis","Marine"),
Swimming=c("00:00:00","00:30:22","00:42:22",
"00:50:53","00:20:11","00:30:12"),
Skating=c("00:10:23","00:10:22","00:02:22",
"00:20:53","00:30:11","00:10:12"))
I need to transform these character values in the Swimming and Skating columns into time durations so I can manipulate them. For example, I want to know how many hours all of them spend swimming in total.
I tried the parse_date_time() function from the lubridate package:
parse_date_time(df[3:4],"HMS")
Gives me this warning:
Warning message:
All formats failed to parse. No formats found.
How can I transform this data in a way I can manipulate?
I've just successfully tested @thelatemail's suggestion. It worked perfectly; then I just converted the result to hours.
I'll duplicate @thelatemail's response here for anyone who feels lost and misses the comments:
I think as.duration(hms(df$Swimming)) is preferable: sum(hms(df$Swimming)) gives a really odd result, while sum(as.duration(hms(df$Swimming))) gives the expected total.
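For anyone who would rather stay in base R, as.difftime() can parse the same strings into durations. This is a sketch of an alternative to the lubridate approach above, not what the thread used; df is rebuilt from the question so the block is self-contained.

```r
df <- data.frame(id = 1:6,
                 name = c("Sean", "Bob", "Dylan", "Barbara", "Louis", "Marine"),
                 Swimming = c("00:00:00", "00:30:22", "00:42:22",
                              "00:50:53", "00:20:11", "00:30:12"),
                 Skating = c("00:10:23", "00:10:22", "00:02:22",
                             "00:20:53", "00:30:11", "00:10:12"),
                 stringsAsFactors = FALSE)

# Parse "hh:mm:ss" strings into difftime objects measured in hours
swim <- as.difftime(df$Swimming, format = "%H:%M:%S", units = "hours")
sum(swim)   # about 2.9 hours in total

# Total time per activity for both columns at once: seconds, then hours
secs <- sapply(df[c("Swimming", "Skating")], function(x)
  as.numeric(as.difftime(x, format = "%H:%M:%S", units = "secs")))
colSums(secs) / 3600
```

Like the POSIXct-based tricks, as.difftime() parses with strptime(), so it expects each value to be a valid clock time below 24:00:00.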

R problem: Date column stored as Factor, R can't convert it

I have downloaded the SP500 data from Yahoo Finance (ticker GSPC) and am trying to filter it by year; however, the Date column is stored as a factor, so R can't filter it. Can anyone help me convert it? I tried multiple solutions, but nothing worked.
So far I've loaded the lubridate package and used the following code, but all the values just got replaced with NAs.
as.Date(SP500$Date, format = "%m-%d-%Y")
Then I tried SP500$Date <- ymd(SP500$Date, format = "%Y-%m-%d") and again nothing happened. (SP500 is the name of the data frame that I stored the data in.)
I also tried just SP500$Date <- as.Date(SP500$Date), but R says it does not know how to convert it to Date.
Any help would be much appreciated! Thank you!
Classes only exist in the environment of a programming language. What likely happened is that your data (perhaps a .csv file?) got interpreted as a factor by R while it was being read.
Everything you're trying to do here can be accomplished with base R alone (meaning you don't need to load any packages).
If you're dealing with dates:
df$date <- as.Date(df$date, format = "%Y-%m-%d")
If you're dealing with datetimes:
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
(the specific format will of course vary with your data; see ?strptime for the list of format codes)
Occasionally, coercion in R can be finicky. The format parameter is unforgiving of mistakes: I frequently type - for /, or confuse "%Y-%m-%d" with "%d-%m-%Y", and the conversion then returns NA instead of a date. Likewise, if the format isn't consistent across your data, any value that can't be described by the format you supplied will become NA.
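A small sketch of that failure mode, with made-up rows standing in for the SP500 data: when the format string does not match, as.Date() quietly returns NA (which may be what happened with the "%m-%d-%Y" attempt in the question, assuming the file stores dates as YYYY-MM-DD). Once the column is a real Date, filtering by year is straightforward.

```r
# Made-up stand-in for the SP500 data, with Date read in as a factor
SP500 <- data.frame(Date = factor(c("2017-12-29", "2018-01-02", "2018-01-03")),
                    Close = c(100, 101, 102))

# Wrong format string: nothing matches, so every value becomes NA
as.Date(as.character(SP500$Date), format = "%m-%d-%Y")

# Matching format string; going via character is safe for factors
SP500$Date <- as.Date(as.character(SP500$Date), format = "%Y-%m-%d")

# Keep only the rows from 2018
sp_2018 <- SP500[format(SP500$Date, "%Y") == "2018", ]
nrow(sp_2018)   # 2
```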
Sometimes your dates are actually stored as numbers. If they count days since 1970-01-01 (e.g. 17846), supply '1970-01-01' to the origin parameter of as.Date(). If they spell out the date digits (e.g. 20181111), convert them to character and use format = "%Y%m%d" instead. Dates can also turn into numbers behind your back: for example, if you iterate through a vector of Dates with a for loop, R drops the Date class and hands you the underlying numbers.
It may sound like a band-aid solution, but coercions from common types like character are usually well implemented, so when I can't work out why an attempt to change a class failed, I often coerce the object to character first.
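A sketch of the numeric cases just described, with made-up values: a day count since 1970-01-01 needs origin, whereas a number like 20181111 encodes the digits of the date and has to go through character with a "%Y%m%d" format. The last lines show the for-loop behaviour, and looping over indices as one way around it.

```r
# Days since 1970-01-01
as.Date(17846, origin = "1970-01-01")               # "2018-11-11"

# Date digits packed into a number
as.Date(as.character(20181111), format = "%Y%m%d")  # "2018-11-11"

# Looping over a Date vector drops the class
dates <- as.Date(c("2018-11-11", "2018-11-12"))
for (d in dates) print(class(d))                    # "numeric", twice

# Looping over indices keeps it
for (i in seq_along(dates)) print(class(dates[i]))  # "Date", twice
```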

Reading date as a character in googleVis

So this is in reference to an earlier question, Using a ordered factor as timevar in Motion Chart, but it wouldn't let me leave this as a comment :/
I am getting the same error that person was getting. According to the answer there, "the documentation says that timevar argument can't handle factor. It can handle character if and only if they are in a particular format, that is (for example): 2010Q1." The thing is, I already have my data formatted like that, in a CSV file: http://www.filedropper.com/texasgdp
Time GDP
2006Q1 500
2006Q2 1000
2006Q3 2000
2006Q4 2600
... etc.
So, if this is the character format, why am I still getting the same error? Is there a way I could just have R read that entire column as "character" rather than as a factor?
Yes, you can use this as you read the data in:
df <- read.csv('file path', stringsAsFactors = FALSE)
Or you can convert the column to a character vector after reading the data:
df$Time <- as.character(df$Time)
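A self-contained sketch using the four rows shown in the question, read from inline text instead of the original file (whose exact layout is an assumption here). Since R 4.0.0, stringsAsFactors = FALSE is already the default for read.csv() and read.table(), so on a current R the column arrives as character either way.

```r
# Rows from the question, whitespace-separated as shown there
gdp <- read.table(text = "Time GDP
2006Q1 500
2006Q2 1000
2006Q3 2000
2006Q4 2600", header = TRUE, stringsAsFactors = FALSE)

class(gdp$Time)   # "character"
class(gdp$GDP)    # "integer"

# If a factor slipped through anyway, convert it after reading
gdp$Time <- as.character(gdp$Time)
```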

R claims that data is non-numeric, but after writing to file is numeric

I have read in a table in R and am trying to take the log of the data. This gives me an error saying that the last column contains non-numeric values:
> log(TD_complete)
Error in Math.data.frame(list(X2011.01 = c(187072L, 140815L, 785077L, :
non-numeric variable in data frame: X2013.05
The data "looks" numeric, i.e. when I read it my brain interprets it as numbers. I can't be totally wrong since the following will work:
> write.table(TD_complete,"C:\\tmp\\rubbish.csv", sep = ",")
> newdata = read.csv("C:\\tmp\\rubbish.csv")
> log(newdata)
The last line will happily output numbers.
This doesn't make any sense to me - either the data is numeric when I read it in the first time round, or it is not. Any ideas what might be going on?
EDIT: Unfortunately I can't share the data, it's confidential.
Review the colClasses argument of read.csv(), where you can specify what type each column should be read and stored as. That might not be so helpful if you have a large number of columns, but using it makes sure R doesn't have to guess what type of data you're using.
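A minimal sketch of colClasses on a small CSV held in a string. The column names and the first two values follow the error message in the question; the remaining numbers are made up. Declaring the types up front means a value that does not fit makes read.csv() stop with an error instead of silently producing a non-numeric column.

```r
csv <- "X2011.01,X2013.05
187072,1200
140815,1350"

# Tell read.csv() the type of every column instead of letting it guess
td <- read.csv(text = csv, colClasses = c("numeric", "numeric"))
str(td)
log(td)   # works: every column is numeric

# A stray non-numeric value now causes an error at read time
bad <- "X2011.01,X2013.05
187072,1200
140815,n/a"
res <- tryCatch(read.csv(text = bad, colClasses = c("numeric", "numeric")),
                error = function(e) conditionMessage(e))
res   # the error message from scan()
```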
Just because "the last line will happily output numbers" doesn't mean R is treating the values as numeric.
Also, it would help to see some of your data.
If you provide the actual data or a sample of it, it will be much easier to help.
In this case I assume R has the column in question stored as strings (or as a factor). When you write it to a CSV file and read it back in, read.csv() sees values made up only of digits and converts the column to numbers. In other words, by writing and reading a CSV file you converted a column of digit-only strings into a proper integer (or floating-point) column.
But without the actual data or the rest of the code this is mere conjecture.
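A sketch of that conjecture with made-up values: a column of digit strings makes log() fail on the whole data frame, and converting the column in place fixes it without the round trip through a file. If the column is a factor instead, go through as.character() first, because as.numeric() on a factor returns the internal level codes rather than the numbers you see printed.

```r
td <- data.frame(X2011.01 = c(187072L, 140815L, 785077L),
                 X2013.05 = c("1200", "1350", "990"),
                 stringsAsFactors = FALSE)

sapply(td, class)   # the last column is character
res <- tryCatch(log(td), error = function(e) conditionMessage(e))

# Convert the offending column, then log() works
td$X2013.05 <- as.numeric(td$X2013.05)
log(td)

# Factor pitfall: as.numeric() gives level codes, not the values
f <- factor(c("990", "1200", "1350"))
as.numeric(f)                 # 3 1 2 (codes of the sorted levels)
as.numeric(as.character(f))   # 990 1200 1350
```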
