My data frame is as follows:
> t
Day TestID VarID
1 2013-04-27 Total Total
> str(t)
'data.frame': 1 obs. of 3 variables:
$ Day : Date, format: "2013-04-27"
$ TestID: factor [1, 1] Total
..- attr(*, "levels")= chr "Total"
$ VarID : Factor w/ 3 levels "0|0","731|18503",..: 3
When I try to rbind it, I get the following error:
> rbind(t,t)
Error in NextMethod() : invalid value
but when I try to recreate the data frame directly I don't get that error:
> t <- data.frame(Day = as.Date("2013-04-27"),TestID = "Total", VarID = "Total")
> t
Day TestID VarID
1 2013-04-27 Total Total
> str(t)
'data.frame': 1 obs. of 3 variables:
$ Day : Date, format: "2013-04-27"
$ TestID: Factor w/ 1 level "Total": 1
$ VarID : Factor w/ 1 level "Total": 1
> rbind(t,t)
Day TestID VarID
1 2013-04-27 Total Total
2 2013-04-27 Total Total
Can anyone help me figure out what is going on and how I can avoid this error?
Thanks.
The major difference I see is that the TestID variable in your first data frame is factor [1, 1] (a matrix) rather than Factor (a vector).
Working version:
t1 <- data.frame(Day = as.Date("2013-04-27"),
TestID = "Total", VarID = "Total")
rbind(t1,t1)
Convert to the mangled version (TestID becomes a 1x1 matrix):
t2 <- t1
dim(t2$TestID) <- c(1,1)
str(t2$TestID)
## factor [1, 1] Total
## - attr(*, "levels")= chr "Total"
rbind(t2,t2)
## Error in NextMethod() : invalid value
Fix the mangled version:
t3 <- t2
t3$TestID <- drop(t3$TestID)
rbind(t3,t3) ## works
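To find and repair such columns in a larger data frame, something like the following might help. It is only a sketch: the has_dim name is made up for illustration, and stringsAsFactors = TRUE is set explicitly because newer R versions no longer convert strings to factors by default.

```r
# Rebuild the mangled data frame (TestID becomes a 1x1 factor matrix)
t2 <- data.frame(Day = as.Date("2013-04-27"),
                 TestID = "Total", VarID = "Total",
                 stringsAsFactors = TRUE)
dim(t2$TestID) <- c(1, 1)

# Flag every column that carries a dim attribute (has_dim is a hypothetical name)
has_dim <- vapply(t2, function(col) !is.null(dim(col)), logical(1))

# Drop the extra dimensions column by column, as in the fix above
for (nm in names(t2)[has_dim]) {
  t2[[nm]] <- drop(t2[[nm]])
}

fixed <- rbind(t2, t2)
```

Note that drop() only removes dimensions of extent 1, so a column that is a genuine multi-column matrix would need different handling.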
folks...
I am having trouble with date/time showing up properly in lubridate.
Here's my code:
Temp.dat <- read_excel("Temperature Data.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC=as.factor(TempC),
TempF=as.factor(TempF),
Month=as.factor(Month),
Day=as.factor(Day),
Year=as.factor(Year),
Time=as.factor(Time))%>%
select(TempC, Treatment, Month, Day, Year, Time)%>%
mutate(Measurement=make_datetime(Month, Day, Year, Time))
Here's what it spits out:
tibble [44 x 7] (S3: tbl_df/tbl/data.frame)
$ TempC : Factor w/ 38 levels "15.5555555555556",..: 31 32 29 20 17 28 27 26 23 24 ...
$ Treatment : Factor w/ 2 levels "Grass","Soil": 1 1 1 1 2 2 2 2 2 2 ...
$ Month : Factor w/ 1 level "6": 1 1 1 1 1 1 1 1 1 1 ...
$ Day : Factor w/ 2 levels "15","16": 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 1 level "2022": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : Factor w/ 3 levels "700","1200","1600": 3 3 3 3 3 3 3 3 3 3 ...
**$ Measurement: POSIXct[1:44], format: "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" ...**
I've put asterisks around the problem result. It should give June 16th at 0700 or something like that, but instead it's defaulting to January 1, 1 AD for some reason. I've tried adding colons to the date in Excel, but that defaults to a 12-hour clock and I'd like to keep this at 24 hours.
What's going on here?
This will work as long as the cell format in the Excel file is set to Time, so that the column imports as a date-time object that lubridate can interpret.
library(readxl)    # provides read_excel()
library(dplyr)
library(lubridate)
Temp.dat <- read_excel("t.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC = as.numeric(TempC),
TempF = as.numeric(TempF),
Month = as.numeric(Month),
Day = as.numeric(Day),
Year = as.numeric(Year),
Hour = hour(Time),
Minute = minute(Time)) %>%
select(TempC, Treatment, Month, Day, Year, Hour, Minute) %>%
mutate(Measurement = make_datetime(year = Year,
month = Month,
day = Day,
hour = Hour,
min = Minute))
Notice that the arguments passed to make_datetime() are numeric, which is what the function expects. If you pass factors, the function gives you the weird dates you were seeing.
No need to convert Time to string and extract hours and minutes, as I suggested in the comments, since you can use lubridate's minute() and hour() functions.
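A small base R sketch of why factors go wrong: converting a factor straight to a number yields its internal level codes, not the labels. The time_f vector below is made up to mirror the Time levels from the question. With a code of 3 landing in the hour slot and a code of 1 in the year slot, this appears consistent with the 0001-01-01 03:00:00 result shown above.

```r
# Hypothetical factor mirroring the Time column from the question
time_f <- factor(c("1600", "700", "1200"), levels = c("700", "1200", "1600"))

# Direct conversion returns the level codes, not the labels
codes <- as.integer(time_f)

# Going through character first recovers the labels as numbers
values <- as.numeric(as.character(time_f))

print(codes)
print(values)
```

Note also that make_datetime() takes its arguments in the order year, month, day, hour, min, sec, so naming them, as in the code above, avoids mixing up positions.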
EDIT
In order to use lubridate's functions, Time needs to be a date-time object. You can check that it is by looking at what read_excel() produces:
> str(read_excel("t.xlsx", sheet = "Sheet1", na="NA"))
tibble [2 × 7] (S3: tbl_df/tbl/data.frame)
$ Treatment: chr [1:2] "s" "c"
$ TempC : num [1:2] 34 23
$ TempF : num [1:2] 99 60
$ Month : num [1:2] 5 4
$ Day : num [1:2] 1 15
$ Year : num [1:2] 2020 2021
$ Time : POSIXct[1:2], format: "1899-12-31 04:33:23" "1899-12-31 03:20:23"
See that Time is type POSIXct, a date-time object. If it is not, then you need to convert it into one if you want to use lubridate's minute() and hour() functions. If it cannot be converted, there are other solutions, but they depend on what you have.
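If Time instead comes in as plain numbers such as 700, 1200 and 1600 (as the factor levels in the question suggest), one possible approach is integer division. This is a sketch under that assumption; it uses base R's ISOdatetime() rather than lubridate so that it runs without extra packages, the date values are made up, and the tz = "UTC" choice is arbitrary.

```r
# Assumed input: times stored as HHMM numbers
time_raw <- c(700, 1200, 1600)

# Split into hours and minutes
hour_part <- time_raw %/% 100
minute_part <- time_raw %% 100

# Build date-times for an example date (2022-06-16)
measurement <- ISOdatetime(2022, 6, 16, hour_part, minute_part, 0, tz = "UTC")
stamps <- format(measurement, "%Y-%m-%d %H:%M")
print(stamps)
```

The same hour_part and minute_part vectors could be passed to make_datetime() as its hour and min arguments.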
I have read a lot of blogs, but I cannot find the answer to my question:
I have a date 2020-25-02 17:45:03 and I would like to convert it to two columns day and time.
hello <- strptime(as.character("2020-25-02 17:42:03"),"%Y-%m-%d %H:%M:%S")
df$day <- as.Date(hello, format = "%Y-%d-%m")
But I also would like df$time. Is it possible ?
> library(chron)
> dtimes = c("2002-06-09 12:45:40","2003-01-29 09:30:40",
+ "2002-09-04 16:45:40","2002-11-13 20:00:40",
+ "2002-07-07 17:30:40")
> dtparts = t(as.data.frame(strsplit(dtimes,' ')))
> row.names(dtparts) = NULL
> thetimes = chron(dates=dtparts[,1],times=dtparts[,2],
+ format=c('y-m-d','h:m:s'))
> thetimes
[1] (02-06-09 12:45:40) (03-01-29 09:30:40) (02-09-04 16:45:40)
[4] (02-11-13 20:00:40) (02-07-07 17:30:40)
Please see this link
Use function hms in package lubridate.
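# assumes hello is the raw character string (day before month), not the strptime() result
hello <- "2020-25-02 17:42:03"
df <- data.frame(day = as.Date(hello, format = "%Y-%d-%m"))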
df$time <- lubridate::hms(sub("^[^ ]*\\b(.*)$", "\\1", hello))
df
# day time
#1 2020-02-25 17H 42M 3S
str(df)
#'data.frame': 1 obs. of 2 variables:
# $ day : Date, format: "2020-02-25"
# $ time:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 3
# .. ..@ year : num 0
# .. ..@ month : num 0
# .. ..@ day : num 0
# .. ..@ hour : num 17
# .. ..@ minute: num 42
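If a plain character time is enough, a base R alternative is to parse the string once and then format the two parts separately. This is a sketch, not the approach from the answer above; the tz = "UTC" setting is an assumption made to keep the result independent of the local time zone.

```r
# Parse once, telling R that the day comes before the month
x <- as.POSIXct("2020-25-02 17:42:03",
                format = "%Y-%d-%m %H:%M:%S", tz = "UTC")

# Split into a Date column and a character time column
df <- data.frame(day = as.Date(x, tz = "UTC"),
                 time = format(x, "%H:%M:%S"),
                 stringsAsFactors = FALSE)
print(df)
```

Here df$time is just text, so it sorts and prints fine but does not support arithmetic the way a lubridate Period does.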
I want to concatenate two numeric columns and get a number as the result.
Example:
First column: 123456
Second column: 78910
Desired Result: 12345678910
test<-matrix(
c(328897771052600448,4124523780886268),
nrow=1,
ncol=2
)
test<-data.frame(test)
str(test)
Both columns are numeric
colnames(test)<-c("post_visid_high","post_visid_low")
test_2<-transform(test,visit_id=as.numeric(paste0(post_visid_high,post_visid_low)))
Problem:
My concatenated result is: 3.288977710526004289528e+33
I don't understand why I get this (incorrect?) number.
When I exclude "as.numeric" I get the right result:
test_2<-transform(test,visit_id=paste0(post_visid_high,post_visid_low))
test_2
But it's converted into a "factor":
str(test_2)
These numbers are too large to be stored exactly as numeric. You can either store them as strings by specifying stringsAsFactors = FALSE:
test_2<-transform(test,visit_id=paste0(post_visid_high,post_visid_low), stringsAsFactors = FALSE)
test_2
#> post_visid_high post_visid_low visit_id
#> 1 3.288978e+17 4.124524e+15 3288977710526004484124523780886268
str(test_2)
#> 'data.frame': 1 obs. of 3 variables:
#> $ post_visid_high: num 3.29e+17
#> $ post_visid_low : num 4.12e+15
#> $ visit_id : chr "3288977710526004484124523780886268"
Or you can use something like gmp to process arbitrarily large integers:
library(gmp)
test_3 <- test
test_3$visit_id <- as.bigz(paste0(test_3$post_visid_high, test_3$post_visid_low))
test_3
#> post_visid_high post_visid_low visit_id
#> 1 3.288978e+17 4.124524e+15 3288977710526004484124523780886268
str(test_3)
#> 'data.frame': 1 obs. of 3 variables:
#> $ post_visid_high: num 3.29e+17
#> $ post_visid_low : num 4.12e+15
#> $ visit_id : 'bigz' raw 3288977710526004484124523780886268
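To illustrate the precision limit: a double has a 53-bit significand, so integers above 2^53 can no longer all be told apart. The sketch below shows that, and then builds the ID from character input, which is an assumption about how the data could be read in (for example by importing the ID columns as text), not something stated in the question.

```r
# Above 2^53, neighbouring integers collapse to the same double
same <- (2^53 == 2^53 + 1)
print(same)
print(.Machine$double.digits)

# Keeping the IDs as text from the start avoids any rounding
high <- "328897771052600448"
low <- "4124523780886268"
visit_id <- paste0(high, low)
print(visit_id)
```

Both example IDs are larger than 2^53 (about 9.007e15 is the cutoff for the low part, and the high part is well beyond it), so storing them as numeric at any stage risks losing digits.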
I have a function that I apply to a column and puts results in another column and it sometimes gives me integer(0) as output. So my output column will be something like:
45
64
integer(0)
78
How can I detect these integer(0)'s and replace them with NA? Is there something like is.na() that will detect them?
Edit: Ok I think I have a reproducible example:
df1 <-data.frame(c("267119002","257051033",NA,"267098003","267099020","267047006"))
names(df1)[1]<-"ID"
df2 <-data.frame(c("257051033","267098003","267119002","267047006","267099020"))
names(df2)[1]<-"ID"
df2$vals <-c(11,22,33,44,55)
fetcher <-function(x){
y <- df2$vals[which(match(df2$ID,x)==TRUE)]
return(y)
}
sapply(df1$ID,function(x) fetcher(x))
The output from this sapply is the source of the problem.
> str(sapply(df1$ID,function(x) fetcher(x)))
List of 6
$ : num 33
$ : num 11
$ : num(0)
$ : num 22
$ : num 55
$ : num 44
I don't want this to be a list. I want a vector, and instead of num(0) I want NA (note that this toy data gives num(0); my real data gives integer(0)).
Here's a way to (a) replace integer(0) with NA and (b) transform the list into a vector.
# a regular data frame
> dat <- data.frame(x = 1:4)
# add a list including integer(0) as a column
> dat$col <- list(45,
+ 64,
+ integer(0),
+ 78)
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col:List of 4
..$ : num 45
..$ : num 64
..$ : int(0) 
..$ : num 78
# find zero-length values
> idx <- !(sapply(dat$col, length))
# replace these values with NA
> dat$col[idx] <- NA
# transform list to vector
> dat$col <- unlist(dat$col)
# now the data frame contains vector columns only
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col: num 45 64 NA 78
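The same replacement can be written a little more compactly with lengths(), which returns the length of every list element in one call. A minimal sketch on a standalone list (the col and out names are only for illustration):

```r
# A list containing one zero-length element
col <- list(45, 64, integer(0), 78)

# lengths() gives the length of each element; zero-length ones become NA
col[lengths(col) == 0] <- NA

# Flatten to an ordinary numeric vector
out <- unlist(col)
print(out)
```

This assumes every non-empty element has length 1; longer elements would make unlist() return more values than there are rows.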
It is best to do that in your function. I'll call it myFunctionForApply here, but it stands for your current function. Before you return, check the length and return NA if it is 0:
myFunctionForApply <- function(x, ...) {
# Do your processing
# Let's say it ends up in variable 'ret':
if (length(ret) == 0)
return(NA)
return(ret)
}
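Applied to the toy data from the question, the length check might look like the sketch below. It rewrites fetcher with a plain equality test instead of the original match() call, takes only the first hit if an ID occurs more than once (an assumption), and uses vapply() so the result is always a numeric vector. The last line shows a vectorised lookup with match(), which is a different technique that happens to return NA for missing IDs by itself.

```r
df1 <- data.frame(ID = c("267119002", "257051033", NA, "267098003",
                         "267099020", "267047006"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("257051033", "267098003", "267119002",
                         "267047006", "267099020"),
                  vals = c(11, 22, 33, 44, 55),
                  stringsAsFactors = FALSE)

fetcher <- function(x) {
  y <- df2$vals[which(df2$ID == x)]
  # Nothing found (including x = NA): return NA instead of numeric(0)
  if (length(y) == 0) return(NA_real_)
  y[1]  # assumption: keep the first hit if there are several
}

# vapply() guarantees one numeric value per input
res <- vapply(df1$ID, fetcher, numeric(1), USE.NAMES = FALSE)

# Vectorised alternative: match() returns NA where there is no match
res2 <- df2$vals[match(df1$ID, df2$ID)]
print(res)
```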
I have a data.frame mydf, that contains data from 27 subjects. There are two predictors, congruent (2 levels) and offset (5 levels), so overall there are 10 conditions. Each of the 27 subjects was tested 20 times under each condition, resulting in a total of 10*27*20 = 5400 observations. RT is the response variable. The structure looks like this:
> str(mydf)
'data.frame': 5400 obs. of 4 variables:
$ subject : Factor w/ 27 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
$ congruent: logi TRUE FALSE FALSE TRUE FALSE TRUE ...
$ offset : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 1 2 5 5 2 2 3 5 ...
$ RT : int 330 343 457 436 302 311 595 330 338 374 ...
I've used daply() to calculate the mean RT of each subject in each of the 10 conditions:
myarray <- daply(mydf, .(subject, congruent, offset), summarize, mean = mean(RT))
The result looks just the way I wanted, i.e. a 3d-array; so to speak 5 tables (one for each offset condition) that show the mean of each subject in the congruent=FALSE vs. the congruent=TRUE condition.
However if I check the structure of myarray, I get a confusing output:
List of 270
$ : num 417
$ : num 393
$ : num 364
$ : num 399
$ : num 374
...
# and so on
...
[list output truncated]
- attr(*, "dim")= int [1:3] 27 2 5
- attr(*, "dimnames")=List of 3
..$ subject : chr [1:27] "1" "2" "3" "5" ...
..$ congruent: chr [1:2] "FALSE" "TRUE"
..$ offset : chr [1:5] "1" "2" "3" "4" ...
This looks totally different from the structure of the prototypical ozone array from the plyr package, even though it's a very similar format (3 dimensions, only numerical values).
I want to compute some further summarizing information on this array, by means of aaply. Precisely, I want to calculate the difference between the congruent and the incongruent means for each subject and offset.
However, even the most basic application of aaply(), like aaply(myarray,2,mean), returns nonsense output:
FALSE TRUE
NA NA
Warning messages:
1: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
2: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
I have no idea why the daply() function returns such weirdly structured output and thereby prevents any further use of aaply. Any kind of help is appreciated; I frankly admit that I have hardly any experience with the plyr package.
Since you haven't included your data it's hard to know for sure, but I tried to make a dummy set based on your str(). You can do what you want (I'm guessing) with two uses of ddply. First the means, then the difference of the means.
library(plyr)
#Make dummy data
mydf <- data.frame(subject = rep(1:5, each = 150),
congruent = rep(c(TRUE, FALSE), each = 75),
offset = rep(1:5, each = 15), RT = sample(300:500, 750, replace = TRUE))
#Make means
mydf.mean <- ddply(mydf, .(subject, congruent, offset), summarise, mean.RT = mean(RT))
#Calculate difference between congruent and incongruent (TRUE minus FALSE, assuming rows are sorted with FALSE first)
mydf.diff <- ddply(mydf.mean, .(subject, offset), summarise, diff.mean = diff(mean.RT))
head(mydf.diff)
# subject offset diff.mean
# 1 1 1 39.133333
# 2 1 2 9.200000
# 3 1 3 20.933333
# 4 1 4 -1.533333
# 5 1 5 -34.266667
# 6 2 1 -2.800000
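The list-array in the question is probably a consequence of passing summarize to daply(): summarize returns a data frame for each cell, so the cells cannot be simplified to plain numbers. If a numeric 3d array is what you are after, one option that avoids plyr altogether is base R's tapply(). This is a sketch on the same kind of dummy data (the set.seed() call and the mean_arr and diff_arr names are mine), not a fix of the original daply() call.

```r
set.seed(1)  # only to make the dummy data reproducible
mydf <- data.frame(subject = rep(1:5, each = 150),
                   congruent = rep(c(TRUE, FALSE), each = 75),
                   offset = rep(1:5, each = 15),
                   RT = sample(300:500, 750, replace = TRUE))

# Numeric array of cell means: subject x congruent x offset
mean_arr <- tapply(mydf$RT, mydf[c("subject", "congruent", "offset")], mean)

# Congruent minus incongruent mean, per subject and offset
diff_arr <- mean_arr[, "TRUE", ] - mean_arr[, "FALSE", ]
print(dim(mean_arr))
print(dim(diff_arr))
```

Because mean_arr is an ordinary numeric array, functions such as apply() work on it directly.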