Expecting numeric in B2 / R2C2: got a date in R - r

I am reading in a data set from Excel that has dates in it. When I run my code it gives me this warning: "Expecting numeric in B2 / R2C2: got a date"
All of my dates are messed up. How do I solve this?

It helps us to help you if you show the exact code that you used, including any packages used.
That warning looks like it comes from the readxl package (but could be a different package).
Basically, when functions like read_excel or even read.table are not told specifically what type of data is in each column, R will read several rows at the top of the file and make an educated guess as to what type of data is in each column; then it will start over and read the data based on those guesses.
Your warning means that there was a cell that your R function was expecting to be a number (based either on the educated guess, or because you told it to expect a number) and instead it saw a date, so it gives a warning to let you know that there was a potential problem. Note that a warning means the code continued to run; there may just be some values that don't match what you were expecting. An error would have stopped the code running and not returned anything.
To fix the problem you can either explicitly tell your R function what type of data is in each column (exactly how depends on the function), or fix your Excel file so that it is clear what each type of data is (remember, just because something looks like a date in Excel does not mean that Excel realizes it is a date or tells other programs that it is a date).
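If the package really is readxl, declaring the column types could look like the sketch below. It uses the type-me.xlsx workbook that ships with readxl as a stand-in for your own file, so the path, sheet name, and the two column types are assumptions to adapt to your data.

```r
library(readxl)

# Stand-in workbook bundled with readxl; replace with the path to your own file.
path <- readxl_example("type-me.xlsx")

# Declare one type per column instead of letting read_excel() guess.
# Cells that still do not fit the declared type are reported as warnings.
df <- read_excel(path, sheet = "date_coercion", col_types = c("date", "text"))

# Check how the first column was stored.
class(df[[1]])
```

With the types declared up front, any remaining warnings point at individual cells that need fixing in the workbook rather than at a wrong guess for the whole column.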

Related

How to read from REDCap forms with data validation into R (REDCapR::redcap_read)

I've been using the REDCapR package to read in data from my survey form. I was reading in the data with no issue using redcap_read until I realized I needed to add a field restriction to one question on my survey. Initially it was a short answer field asking users how many of something they had, and people were doing expectedly annoying things like spelling out numbers or entering "a few" instead of a number. But all of that data read in fine. I changed the field to be a short answer field (same type as before) that requires the response to be an integer and now the data won't read into R using redcap_read.
When I run:
redcap_read(redcap_uri=uri, token=api_token)$data
I get the error message that:
Column [name of my column] can't be converted from numeric to character
I also noticed when I looked at the data that it read in the 1st and 6th records of that column (both zeros) just fine (out of 800+ records), but everything else is NA. Is there an inherent problem with trying to read in data from a text field restricted to an integer, or is there another way to do this?
Edit: it also reads the dates fine, which are text fields with a date field restriction. This seems to be very specific to reading in the validated numbers from the text field.
I also tried redcapAPI::exportRecords. It does read in the rest of the dataset, but it returns NA for all values in the column with the validation restriction.
Upgrade REDCapR to the version on GitHub, which stacks the batches on top of each other before determining the data type (see #257).
# install.packages("remotes") # Run this line if the 'remotes' package isn't installed already.
remotes::install_github(repo="OuhscBbmc/REDCapR")
In your case, I believe that the batches (of 200 records, by default) contain different data types (character & numeric, according to the error message), which won't stack on top of each other silently.
The REDCapR::redcap_read() function should work then. (If not, please create a new issue).
Two alternatives are calling redcap_read_oneshot with a large value of guess_max, or calling redcap_read_oneshot with guess_type = FALSE so that every column comes back as character and you convert it yourself.
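To see why per-batch guessing can clash, here is a small base R simulation. It is not REDCapR's code; the responses vector and the batch split are made up for illustration, and type.convert() stands in for whatever type guessing the reader does.

```r
# Hypothetical responses to one text field, as they arrive from the server.
responses <- c("0", "3", "a few", "12")

# Guess the type separately for each batch of two records.
batch_1 <- type.convert(responses[1:2], as.is = TRUE)
batch_2 <- type.convert(responses[3:4], as.is = TRUE)
class(batch_1)  # "integer"
class(batch_2)  # "character"

# Guess once, after stacking all records: a single, consistent type.
all_at_once <- type.convert(responses, as.is = TRUE)
class(all_at_once)  # "character"
```

The two batches disagree about the column type, which is the kind of mismatch the error message describes; guessing after stacking avoids it.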

R Error in `row.names<-.data.frame`(`*tmp*`, value = value) while using tell of the sensitivity package

I am conducting a sensitivity study using the sensitivity package. When I try to calculate the sensitivity indices from the output data of the external model, I get the error given in the title.
The output is a three column table stored in a csv file which I read in as follows:
day1 <- read.csv("day_1_outputs.csv",header=FALSE)
Now when I try to calculate sensitivity indices with the output of the first column:
tell(sob.pars,day1[,1])
I get:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
At first I thought I should use a matrix-like object, because in another study I generated the output from a raster image read in as a matrix, which worked fine, but that didn't help.
The help page for tell says to use a vector for the model results, but even if I store the column of the data frame in a vector before calling tell, the problem persists.
I guess my main problem is that I don't understand the error message in conjunction with the tell function. sob.pars is a list returned by one of the sensitivity analysis object constructors from the same package, so I don't know which row names of that object the message is referring to.
Any hint is appreciated.
Finally found out what the problem was. The error is kind of misleading.
The problem was not the row names, since these were identical; that's what confused me in the first place. There was obviously nothing wrong with them.
The actual problem was the column names in sob.pars. These were missing. Once I added these everything worked fine. Thanks rawr anyways (I just only now noticed someone had commented on the question, I thought I would be notified when this happens, but I guess not).
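For anyone hitting the same thing, a minimal base R sketch of naming the parameter columns before building the sensitivity object follows. The matrices X1 and X2 and the names p1 to p3 are hypothetical; how the samples reach the constructor depends on which one you use.

```r
# Hypothetical input samples for three parameters.
set.seed(1)
n <- 5
X1 <- matrix(runif(3 * n), nrow = n)
X2 <- matrix(runif(3 * n), nrow = n)

# A bare matrix carries no column names.
is.null(colnames(X1))  # TRUE

# Give every parameter a name before handing the samples to a constructor.
param_names <- c("p1", "p2", "p3")
colnames(X1) <- param_names
colnames(X2) <- param_names
colnames(X1)
```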

convert period in stata to NA in r

I have a dataset in Stata and I want to take it to R, but there are some missing values in Stata and they are represented using a period. I want to get the data into R, which I do by loading the foreign package and then using the read.table() function. How do I convert the periods in Stata, which are genuinely missing, to NA in R?
If I understand you correctly, you first load the foreign package for loading a .dta file, correct?
library("foreign")
Then you would read in your data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking for a way to convert the missing value from Stata, usually displayed as a dot (.), to the missing value in R, NA. Is that also correct?
One thing to know here is that Stata handles missing values "behind the scenes" in multiple ways. There are actually 27 different missing values in Stata (. and .a through .z), which most users never need to tell apart. You do not need to know them for your problem though, because read.dta() handles them itself.
To tackle a problem like this yourself in the future, check the help file for your function first:
help(read.dta)
There you will see that the function handles Stata's extended missing-value types automatically and converts them to NA.
If you want to know which type of missing value was recognized, you can set the argument missing.type=TRUE:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.
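A small round trip shows both behaviours. This is a sketch that writes a throwaway .dta file with write.dta() so it can run without your data; the data frame, its column names, and the temporary file are made up for illustration.

```r
library(foreign)

# Write a tiny Stata file containing one missing value (format version 10).
stata_file <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, income = c(10.5, NA, 7)), stata_file,
          version = 10L)

# Read it back and ask for the missing-type information as well.
dat <- read.dta(stata_file, missing.type = TRUE)

# The Stata missing value arrives in R as NA.
is.na(dat$income)

# Per-variable missing-type codes, as described in the help text.
attr(dat, "missing")
```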

MS Project - Column with formula does not calculate correctly

I am trying to add an indicator light to my MS Project sheet similar to this one: Late Indicator Tool. I'm using a simplified formula: IIf([% Complete]<>100,DateDiff("d",[Deadline],[Finish]))
For any row that I enter all the information by hand, the formula works perfectly. However, the formula returns 0 for any rows where I paste data in from other project files (even if all I paste in is the task name).
Even if I attempt to use an even simpler formula ([Deadline]-[Finish]), it still returns 0 (and breaks even further by returning 4294925695.29 or 4294925708.67 instead of #Error in the rows where the Deadline is NA).
Has anyone else had any issues with calculated columns in MS Project and can help me fix it?
EDIT: I gave up on this approach when I discovered a work-around: There is a column called "Finish Variance" that will automatically calculate the difference between the Finish date and the date in the "Baseline Finish" column (which I am now using instead of "Deadline").
Your first problem sounds like your project may be corrupted (or the file that you are pasting from). I suggest building a small sample project to see if you can replicate this error. (I could not replicate it.)
As for the second problem, when the Deadline is NA, Project substitutes a default value of the largest unsigned 32-bit integer (2^32-1 = 4294967295). To avoid this unintended value, use an IIf test in your formula to return your own value when Deadline is NA.
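The two odd numbers in the question fit this explanation. A quick check of the arithmetic, sketched in R; reading the remainders as the day serial numbers of each row's Finish date is an assumption.

```r
# Value Project appears to use for an NA date.
na_sentinel <- 2^32 - 1

# Results reported in the question for [Deadline]-[Finish] with Deadline = NA.
observed <- c(4294925695.29, 4294925708.67)

# What is left after removing the sentinel: about 41599.71 and 41586.33.
implied_finish <- na_sentinel - observed
round(implied_finish, 2)
```

Both remainders are ordinary-looking day counts about 13 days apart, which is what you would expect if the formula computed the NA sentinel minus the Finish date.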

R claims that data is non-numeric, but after writing to file is numeric

I have read in a table in R, and am trying to take log of the data. This gives me an error that the last column contains non-numeric values:
> log(TD_complete)
Error in Math.data.frame(list(X2011.01 = c(187072L, 140815L, 785077L, :
non-numeric variable in data frame: X2013.05
The data "looks" numeric, i.e. when I read it my brain interprets it as numbers. I can't be totally wrong since the following will work:
> write.table(TD_complete,"C:\\tmp\\rubbish.csv", sep = ",")
> newdata = read.csv("C:\\tmp\\rubbish.csv")
> log(newdata)
The last line will happily output numbers.
This doesn't make any sense to me - either the data is numeric when I read it in the first time round, or it is not. Any ideas what might be going on?
EDIT: Unfortunately I can't share the data, it's confidential.
Review the colClasses argument of read.csv(), where you can specify what type each column should be read and stored as. That might not be so helpful if you have a large number of columns, but using it makes sure R doesn't have to guess what type of data you're using.
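A minimal sketch of colClasses, using an inline two-column CSV in place of the confidential file; the values are made up and only the column names are borrowed from the error message.

```r
# Inline stand-in for the real file.
csv_text <- "X2011.01,X2013.05
187072,1200
140815,950"

# Declare the storage type of each column up front.
td <- read.csv(text = csv_text, colClasses = c("integer", "numeric"))
sapply(td, class)

# Every column is numeric, so log() works on the whole data frame.
log(td)
```

If a cell cannot be parsed as the declared type, the read fails with an error instead of quietly storing the column as text, which points you at the offending value.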
Just because "the last line will happily output numbers" doesn't mean R is treating the values as numeric.
Also, it would help to see some of your data.
If you provide the actual data or a sample of it, help will be much easier.
In this case I assume R has the column in question stored as a string. When you write it to a CSV file and read it back, read.csv() guesses the column types afresh and treats a column that contains nothing but digits as numeric. In other words, by writing and reading a CSV file you converted a string containing only digits into a proper integer (or float).
But without the actual data or the rest of the code this is mere conjecture.
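The conjecture can be reproduced with made-up data. In this sketch the data frame is hypothetical and only the column names come from the question; as.numeric() is one possible in-place fix that avoids the detour through a file.

```r
# Hypothetical data: the last column holds digits stored as text.
td <- data.frame(X2011.01 = c(187072L, 140815L),
                 X2013.05 = c("1200", "950"),
                 stringsAsFactors = FALSE)
sapply(td, class)  # integer, character: log(td) would fail here

# Round trip through a CSV file, as in the question.
tmp <- tempfile(fileext = ".csv")
write.table(td, tmp, sep = ",")
newdata <- read.csv(tmp)
sapply(newdata, class)  # both columns are now integer

# Converting the column directly gives the same result without the file.
fixed <- td
fixed$X2013.05 <- as.numeric(fixed$X2013.05)
log(fixed)
```

Running sapply(TD_complete, class) on the real data would show which columns were not read as numbers.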
