I need both the original and a rounded-to-the-day version of a datetime in a data.table. When I use the base round function to do this (as recommended here), I start getting errors regarding the number of items when I try to add it back into my data.table - even though the length looks right.
Example:
temp <- data.table(ID=1:3,dates_unrounded=rep(as.POSIXct(NA),3),dates_rounded=rep(as.POSIXct(NA),3))
dates_form1 <- c("2021-04-01","2021-06-30","2021-05-22")
dates_form2 <- as.POSIXct(dates_form1,format="%Y-%m-%d")
temp$dates_unrounded <- dates_form2
dates_form3 <- round(dates_form2,"days")
temp$dates_rounded <- dates_form3
length(dates_form3)
length(temp$dates_unrounded)
When run, produces:
> temp <- data.table(ID=1:3,dates_unrounded=rep(as.POSIXct(NA),3),dates_rounded=rep(as.POSIXct(NA),3))
> dates_form1 <- c("2021-04-01","2021-06-30","2021-05-22")
> dates_form2 <- as.POSIXct(dates_form1,format="%Y-%m-%d")
> temp$dates_unrounded <- dates_form2
> dates_form3 <- round(dates_form2,"days")
> temp$dates_rounded <- dates_form3
Error in set(x, j = name, value = value) :
Supplied 11 items to be assigned to 3 items of column 'dates_rounded'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
> length(dates_form3)
[1] 3
> length(temp$dates_unrounded)
[1] 3
What's going wrong and how do I fix it?
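?round.POSIXt reveals that in this case, round() returns a POSIXlt object. data.table cannot store POSIXlt columns: a POSIXlt value is internally a list of date-time components (11 of them here, which is where "Supplied 11 items" comes from), even though length() reports 3. So convert back to POSIXct before assigning: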
dates_form3 <- round(dates_form2,"days")
dates_form3 <- as.POSIXct(dates_form3)
temp$dates_rounded <- dates_form3
length(dates_form3)
length(temp$dates_unrounded)
and you're fine.
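To see the mismatch without data.table, here is a minimal base-R sketch. The timestamps and tz = "UTC" are assumptions chosen for illustration; the point is only that round() hands back a list-based POSIXlt, while as.POSIXct() gives one number per element again.

```r
# Illustrative timestamps; tz = "UTC" keeps the result independent of the local time zone
x <- as.POSIXct(c("2021-04-01 09:30:00",
                  "2021-06-30 18:45:00",
                  "2021-05-22 12:00:00"), tz = "UTC")

r <- round(x, "days")   # round.POSIXt returns a POSIXlt object
class(r)                # "POSIXlt" "POSIXt"
is.list(unclass(r))     # TRUE: one list component per date-time field
length(r)               # 3, the number of date-times, not of components

# Convert back to POSIXct so each element is a single value again
rounded <- as.POSIXct(r)
result <- data.frame(ID = 1:3, dates_unrounded = x, dates_rounded = rounded)
```

The same as.POSIXct() conversion is what makes the assignment into the data.table column succeed.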
I have the following setup:
mydata:
today_date
r1 11.11.21
r2 11.11.21
r3 11.11.21
I want to convert column like 'today_date' to a date using
as.Date(today_date,tryFormats = c("%d.%m.%Y")).
So I'm using the following function, which is supposed to change the corresponding column to proper dates:
myfun <- function(x){
x<- as.Date(x, tryFormats = c("%d.%m.%Y"))
}
In this function x is representing a variable corresponding to: mydata$today_date
Sadly, x does not seem to properly represent the object that is to be replaced, so instead of:
myfun(mydata$today_date)
I still have to use:
mydata$today_date<- myfun(mydata$today_date)
How can I manipulate the function so the as.Date()-functionality is directly applied? I'm pretty certain that the variable in myfun(x) is not properly able to represent the subsection of my dataframe that I want to change. Any help is very welcome!
Try this. Note that a two-digit year needs "%y"; with "%Y", "21" is read as the year 21 and you get 0021-11-11.
df <- data.frame(today_date = c("11.11.21","11.11.21","11.11.21"))
myfun <- function(df, var = 'today_date'){
df[[var]] <- as.Date(df[[var]], tryFormats = c("%d.%m.%y"))
return(df)
}
The output is
> myfun(df, "today_date")
today_date
1 2021-11-11
2 2021-11-11
3 2021-11-11
I like the magrittr assignment pipe syntax for this.
library(magrittr)
mydata$today_date %<>% myfun()
Instead of mydata$today_date<- myfun(mydata$today_date)
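As background on why some form of assignment is needed at all: R functions work on a copy of their arguments, so changing x inside a function never changes the caller's object. A small sketch, where to_date and the sample dates are hypothetical names and values used only for illustration:

```r
mydata <- data.frame(today_date = c("11.11.21", "12.11.21"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: converts and returns, it cannot modify mydata itself
to_date <- function(x) as.Date(x, format = "%d.%m.%y")

to_date(mydata$today_date)      # the converted dates are returned, then discarded
class(mydata$today_date)        # still "character"

mydata$today_date <- to_date(mydata$today_date)  # assign the result back
class(mydata$today_date)        # "Date"
```

The magrittr %<>% operator performs the same assign-back step; it only shortens how it is written.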
I have 10 variables X1, X2, ..., X10:
> X1
[1] 11.388245 3.847984 3.271024 3.637894
> X2
[1] 3.603660 3.176091 20.868740 4.229564 3.150181 3.379059 11.379710 3.577636 5.094401
> X10
[1] 11.613462 7.360181 3.210812 5.066974 5.391218 3.049254 10.639178 4.154140
[9] 3.502896 7.919751 3.416924 6.577095 5.047722 3.953996 3.153649 3.005215
ms<-list()
for (i in c(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)){
n<-length(i)
m<-n/sum(log(i/3))
ls<-c(ms,m)
}
The above R code does not work.
What I want is a final numeric variable ms that contains the 10 values obtained by calculating n/sum(log(i/3)).
For example one of the value:
> n<-length(X1)
> m<-n/sum(log(X1/3))
>
> m
[1] 2.148009
After applying X1, X2, ..., X10 in the loop, I want to get:
Ms <-(m1 m2 m3 ...m10)
The c function is concatenating your vectors into 1 long vector, e.g. c(1:3, 5:7) will be 1 vector with 6 elements.
I think what you want is to use list instead of c which will keep the vectors as individual vectors.
Your for loop should work if you do something like:
ms<-list()
for (i in list(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)){
n<-length(i)
m<-n/sum(log(i/3))
ms<-c(ms,m)
}
Note the fix to the last line.
But since the goal is to create a new list with the results, using the lapply function may be simpler:
ms <- lapply( list(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10),
function(x) length(x)/sum(log(x/3))
)
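Since the question asks for a numeric vector rather than a list, vapply may be a closer fit than lapply. A minimal sketch using only X1 and X2 as printed in the question (the other eight vectors are left out here):

```r
X1 <- c(11.388245, 3.847984, 3.271024, 3.637894)
X2 <- c(3.603660, 3.176091, 20.868740, 4.229564, 3.150181,
        3.379059, 11.379710, 3.577636, 5.094401)

# vapply checks that every result is a single number and returns a numeric vector
ms <- vapply(list(X1, X2),
             function(x) length(x) / sum(log(x / 3)),
             numeric(1))
ms
```

The first element should agree with the value of m the question computes by hand for X1.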
Here is an MWE:
ms <- c()
# to define ms more efficiently, preallocate it:
# ms <- vector("double", 10)
for(i in seq_len(10)){
# use get() to retrieve an object by name from the global environment
n<-length(get(paste0('X',i)))
m<-n/sum(log(get(paste0('X',i))/3))
ms[i]=m
}
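If the variables really have to be looked up by name, mget() fetches them all at once into a named list, which avoids calling get() twice per iteration. A sketch with three short vectors; X3 here is made-up sample data, not the asker's:

```r
X1 <- c(11.388245, 3.847984, 3.271024, 3.637894)
X2 <- c(3.603660, 3.176091, 20.868740, 4.229564)
X3 <- c(11.613462, 7.360181, 3.210812, 5.066974)   # hypothetical values

# Look up X1..X3 by name in the current environment; the result is a named list
vars <- mget(paste0("X", 1:3), envir = environment())

ms <- vapply(vars, function(x) length(x) / sum(log(x / 3)), numeric(1))
ms   # named numeric vector with names X1, X2, X3
```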
I have a dataframe with ~9000 rows of human coded data in it, two coders per item so about 4500 unique pairs. I want to break the dataset into each of these pairs, so ~4500 dataframes, run a kripp.alpha on the scores that were assigned, and then save those into a coder sheet I have made. I cannot get the loop to work to do this.
I can get it to work individually, using this:
example.m <- as.matrix(example.m)
s <- kripp.alpha(example.m)
example$alpha <- s$value
However, when trying a loop I am getting either "Error in get(v) : object 'NA' not found" when running this:
for (i in items) {
v <- i
v <- v[c("V1","V2")]
v <- assign(v, as.matrix(get(v)))
s <- kripp.alpha(v)
i$alpha <- s$value
}
Or am getting "In i$alpha <- s$value : Coercing LHS to a list" when running:
for (i in items) {
i.m <- i[c("V1","V2")]
i.m <- as.matrix(i.m)
s <- kripp.alpha(i.m)
i$alpha <- s$value
}
Here is an example set of data. Items is a list of individual dataframes.
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
items <- c("l","t")
I am sure this is a basic question, but what I want is for each file, i, to add a column with the alpha score at the end. Thanks!
Your problem is with scoping and with extracting objects that are referenced through strings: items holds the names "l" and "t", not the data frames themselves. You'd need to get() or eval() those names to make your current approach work.
Here's another solution
library("irr") # For kripp.alpha
# Produce the data
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
# Collect the data as a list right away
items <- list(l, t)
Now you can sapply() directly over the elements in the list.
sapply(items, function(v) {
kripp.alpha(as.matrix(v[c("V1","V2")]))$value
})
which produces
[1] 0.0 -0.5
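The question also asks for the score to be stored as a column in each data frame. Assigning to the loop variable i only changes a copy, so one option is to have lapply return the modified data frames. In the sketch below, score_fun is a placeholder (an assumption, so the example runs without the irr package); with irr loaded it would be function(m) kripp.alpha(m)$value.

```r
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1), nrow = 2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1), nrow = 2))
items <- list(l = l, t = t)

# Placeholder statistic for illustration only; swap in
# function(m) kripp.alpha(m)$value once library("irr") is loaded
score_fun <- function(m) mean(m)

# Each data frame is modified inside the function and returned,
# so the list is rebuilt with the new column attached
items <- lapply(items, function(d) {
  d$alpha <- score_fun(as.matrix(d[c("V1", "V2")]))
  d
})

names(items$l)   # V1 ... V6 plus "alpha"
```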
RStudio Version 1.0.143
Windows Ver: Windows10 Pro
I have 300+ files which have the same structure, and I want to create a loop that calculates the correlation for the required files. I can get the right files and calculate the correlation, but I can't get all of the results shown together. I tried to save them to a vector, but R tells me "object not found". And if that can be made to work, I am also worried about whether the contents of the vector will persist if I run the function several times. Here's the loop:
for(i in ind_larg){
+ specdata_i <- read.csv(i)
+ com_case_ind <- complete.cases(specdata_i)
+ sulfate_i <- specdata_i[,2][com_case_ind]
+ nitrate_i <- specdata_i[,3][com_case_ind]
+ ou[i] <- cor(sulfate_i, nitrate_i)
+ }
and the result
Error: object 'ou' not found
I'm not sure if you need the rest of the code before this, so I attach them at the end here.
> setwd("C:/Users/sunxi/Coursera/specdata")
> ind <- dir(path = "C:/Users/sunxi/Coursera/specdata", pattern = ".csv") #Save the index of the files to a vector.
> specdata_ful <- lapply(ind, read.csv) #combine all the files to a data frame.
> specdat_recon_ful <- do.call(rbind, specdata_ful) #Reconstruct the data frame to put the same variable in one column.
> com_case_ful <- complete.cases(specdat_recon_ful) #Filter the complete cases.
> id_ful <- specdat_recon_ful[,4][com_case_ful] #The ID of the complete cases.
> sulfate_ful <- specdat_recon_ful[,2][com_case_ful] #The sulfate value of the complete cases.
> nitrate_ful <- specdat_recon_ful[,3][com_case_ful] #The nitrate value of the complete cases.
> id_freq_ful <- table(id_ful) #Summary the frequency in each id
> id_freq_mat_ful <- as.data.frame(id_freq_ful) #transfer the table into the data.frame.
> good <- id_freq_mat_ful[["Freq"]] > 1000 #Filter the freqency larger than threshold.
> id_good <- id_freq_mat_ful[["id_ful"]][good] #Filter the id has the frequency of complete cases larger than the threshold.
> ind_larg <- ind[id_good] #Create an index for the id has required requency.
You have to create the variable ou before you access it with ou[i]:
ou <- c()
for(i in ind_larg){
# your loop here...
ou[i] <- cor(sulfate_i, nitrate_i)
}
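On the worry about stale contents between runs: building the whole vector in one call means it is recreated from scratch each time. A self-contained sketch follows; the two temporary CSV files and their column names (Date, sulfate, nitrate, ID) are assumptions made up for the example, only the column positions 2 and 3 come from the question.

```r
# Create two small hypothetical CSV files in a temporary directory
tmp <- tempfile("specdata_")
dir.create(tmp)
d1 <- data.frame(Date = 1:4, sulfate = c(1, 2, NA, 4), nitrate = c(2, 4, 6, 8), ID = 1)
d2 <- data.frame(Date = 1:4, sulfate = c(4, 3, 2, 1), nitrate = c(1, 2, 3, NA), ID = 2)
write.csv(d1, file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(d2, file.path(tmp, "002.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# One correlation per file; vapply creates the result vector itself,
# so nothing has to exist beforehand and nothing is left over from earlier runs
ou <- vapply(files, function(f) {
  d <- read.csv(f)
  ok <- complete.cases(d)
  cor(d[ok, 2], d[ok, 3])
}, numeric(1))
names(ou) <- basename(files)
ou
```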
Thanks in advance, and sorry if this question has been answered previously - I have looked pretty extensively. I have a dataset whose header row contains concatenated information, specifically: name,color code,some function expression. For example, one value may be:
cost#FF0033#log(x)+6.
I have all of the code to extract the information, and I end up with a vector of expressions that I would like to convert to a list of actual functions.
For example:
func.list <- list()
test.func <- c("x","x+1","x+2","x+3","x+4")
where test.func is the vector of expressions. What I would like is:
func.list[[3]]
To be equivalent to
function(x){x+2}
I know that I can create a function using:
somefunc <- function(x){eval(parse(text="x+1"))}
to convert a character value into a function. The problem comes when I try and loop through to make multiple functions. For an example of something I tried that didn't work:
for(i in 1:length(test.func)){
temp <- test.func[i]
f <- assign(function(x){eval(expr=parse(text=temp))})
func.list[[i]] <- f
}
Based on another post (http://stats.stackexchange.com/questions/3836/how-to-create-a-vector-of-functions) I also tried this:
makefunc <- function(y){y;function(x){y}}
for(i in 1:length(test.func)){
func.list[[i]] <- assign(x=paste("f",i,sep=""),value=makefunc(eval(parse(text=test.func[i]))))
}
Which gives the following error: Error in eval(expr, envir, enclos) : object 'x' not found
The eventual goal is to take the list of functions and apply the jth function to the jth column of the data.frame, so that the user of the script can specify how to normalize each column within the concatenated information given by the column header.
Maybe initialize your list with a single generic function, and then update them using:
foo <- function(x){x+3}
> body(foo) <- quote(x+4)
> foo
function (x)
x + 4
More specifically, starting from a character, you'd probably do something like:
body(foo) <- parse(text = "x+5")
Just to add onto joran's answer, this is what finally worked:
test.data <- matrix(data=rep(1,25),5,5)
test.data <- data.frame(test.data)
test.func <- c("x","x+1","x+2","x+3","x+4")
func.list <- list()
for(i in 1:length(test.func)){
func.list[[i]] <- function(x){}
body(func.list[[i]]) <- parse(text=test.func[i])
}
processed <- mapply(do.call,func.list,lapply(test.data,list))
Thanks again, joran.
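The same idea can be wrapped in a small factory so that each function gets its body set once, and Map can then pair the jth function with the jth column. This is a sketch along the lines of the code above; make_fun is a hypothetical helper name.

```r
# Hypothetical helper: build function(x) <expr> from a character string
make_fun <- function(txt) {
  f <- function(x) NULL
  body(f) <- parse(text = txt)[[1]]   # first (and only) parsed expression
  f
}

test.func <- c("x", "x+1", "x+2", "x+3", "x+4")
func.list <- lapply(test.func, make_fun)

test.data <- data.frame(matrix(1, 5, 5))

# Apply the jth function to the jth column, keeping the data frame shape
processed <- test.data
processed[] <- Map(function(f, col) f(col), func.list, test.data)
processed
```

Because the expression is parsed when the function is created, printing func.list[[3]] shows the actual body instead of an eval(parse(...)) wrapper.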
This is what I do:
f <- list(identity="x",plus1 = "x+1", square= "x^2")
funCreator <- function(snippet){
txt <- snippet
function(x){
exprs <- parse(text = txt)
eval(exprs)
}
}
listOfFunctions <- lapply(setNames(f,names(f)),function(x){funCreator(x)}) # I like to have some control of the names of the functions
listOfFunctions[[1]] # try to see what the actual function looks like?
library(pryr)
unenclose(listOfFunctions[[3]]) # good way to see the actual function http://adv-r.had.co.nz/Functional-programming.html
# Call your functions
listOfFunctions[[2]](3) # 3+1 = 4
do.call(listOfFunctions[[3]],list(3)) # 3^2 = 9
attach(listOfFunctions) # you can also attach your list of functions and call them by name
square(3) # 3^2 = 9
identity(7) # 7 ## masked object identity, better detach it now!
detach(listOfFunctions)