Create a data frame with date as column names - r

I would like to construct a data table in R that has columns as dates and rows as times (without date info). Basically I have a table in the form:
Time 21.04.15 22.04.15 24.04.15 03.05.15
00:00 0.4 0.4 0.4 0.4
01:00 0.4 0.4 0.4 0.4
02:00 0.4 0.4 0.4 0.4
03:00 0.6 0.6 0.6 0.6
04:00 0.6 0.6 0.6 0.6
05:00 0.7 0.8 0.8 0.8
06:00 0.7 0.8 0.8 0.8
07:00 0.7 0.8 0.8 0.8
...
I would like to address (plot, extract) the columns by date and elements by date and time.
Is this possible?

The best you can do is rename them with character strings that represent dates; I don't think the names themselves can be Date objects. (I'll admit I've never tried, and I'm not going to experiment with it, because doing so seems like a really bad idea.)
Assuming your current column names are in dd.mm.yy format, run
names(df_object) <- format(as.Date(names(df_object), format = "%d.%m.%y"),
                           format = "%Y-%m-%d")
But, like the commenters, while this will work, I have a hard time imagining circumstances where it is beneficial.
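For illustration, here is a minimal sketch (with made-up values and the hypothetical name df_object) of building such a table, applying the rename above, and then addressing a column by date and an element by date and time:
# Hypothetical example: times as rows, one column per date in dd.mm.yy form
df_object <- data.frame(
  Time = sprintf("%02d:00", 0:3),
  "21.04.15" = c(0.4, 0.4, 0.4, 0.6),
  "22.04.15" = c(0.4, 0.4, 0.4, 0.6),
  check.names = FALSE
)
# Rename only the date columns to ISO-style character strings
date_cols <- names(df_object) != "Time"
names(df_object)[date_cols] <- format(
  as.Date(names(df_object)[date_cols], format = "%d.%m.%y"), "%Y-%m-%d")
df_object[["2015-04-21"]]                           # one column, addressed by date
df_object[df_object$Time == "03:00", "2015-04-21"]  # one element, by date and time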

Related

Selecting rows with time in R

I have a data frame that looks like this:
Subject Time Freq1 Freq2 ...
A 6:20 0.6 0.1
A 6:30 0.1 0.5
A 6:40 0.6 0.1
A 6:50 0.6 0.1
A 7:00 0.3 0.4
A 7:10 0.1 0.5
A 7:20 0.1 0.5
B 6:00 ... ...
I need to delete the rows in the time range from 7:00 to 7:30, so that, in this case, all the 6:00, 6:10, 6:20, ... rows remain.
I have tried creating a data frame with just the times I want to keep, but R does not seem to recognize the times as numbers or as names, and I get the same error when trying to remove the ones I don't need directly. It is probably quite simple, but I haven't found a solution.
Any suggestions?
We can convert the Time column to a Period class using the lubridate package and then filter the data frame on that column.
library(dplyr)
library(lubridate)
dat2 <- dat %>%
  mutate(HM = hm(Time)) %>%
  filter(HM < hm("7:00") | HM > hm("7:30")) %>%
  select(-HM)
dat2
# Subject Time Freq1 Freq2
# 1 A 6:20 0.6 0.1
# 2 A 6:30 0.1 0.5
# 3 A 6:40 0.6 0.1
# 4 A 6:50 0.6 0.1
# 5 B 6:00 NA NA
DATA
dat <- read.table(text = "Subject Time Freq1 Freq2
A '6:20' 0.6 0.1
A '6:30' 0.1 0.5
A '6:40' 0.6 0.1
A '6:50' 0.6 0.1
A '7:00' 0.3 0.4
A '7:10' 0.1 0.5
A '7:20' 0.1 0.5
B '6:00' NA NA",
header = TRUE)
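If, instead, the goal were to keep only the 7:00 to 7:30 rows and drop everything else, the same pipeline works with the condition inverted (a minimal variation of the code above):
dat %>%
  mutate(HM = hm(Time)) %>%
  filter(HM >= hm("7:00") & HM <= hm("7:30")) %>%
  select(-HM)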

How to meta analyze p values of different observations

I am trying to meta-analyze p-values from different studies. I have a data frame:
DF1
p-value1 p-value2 p-value3 m
0.1 0.2 0.3 a
0.2 0.3 0.4 b
0.3 0.4 0.5 c
0.4 0.4 0.5 a
0.6 0.7 0.9 b
0.6 0.7 0.3 c
I am trying to get a fourth column with the meta-analyzed combination of p-value1 to p-value3.
I tried to use the metap package:
p <- rbind(DF1$`p-value1`, DF1$`p-value2`, DF1$`p-value3`)
pv <- split(p, p$m)
library(metap)
for (i in 1:length(pv))
{pvalue <- sumlog(pv[[i]]$pvalue)}
But it results in one p value. Thank you for any help.
You can try
apply(DF1[,1:3], 1, sumlog)
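If you also want the combined value per row as a new column, one option (a sketch assuming metap::sumlog returns an object whose p component is the combined p-value, and using p_meta as an illustrative column name) is:
library(metap)
# combined p-value per row, added as a new column
DF1$p_meta <- apply(DF1[, 1:3], 1, function(p) sumlog(p)$p)
# or, to combine all p-values within each level of m, as the question's attempt suggests:
# tapply(unlist(DF1[, 1:3]), rep(DF1$m, 3), function(p) sumlog(p)$p)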

dynamic column names in data.table correlation

I've combined the outputs for each user and item (for a recommendation system) into this all x all R data.table. For each row in this table, I need to calculate the correlation between user scores 1,2,3 & item scores 1,2,3 (e.g. for the first row what is the correlation between 0.5,0.6,-0.2 and 0.2,0.8,-0.3) to see how well the user and the item match.
user item user_score_1 user_score_2 user_score_3 item_score_1 item_score_2 item_score_3
A 1 0.5 0.6 -0.2 0.2 0.8 -0.3
A 2 0.5 0.6 -0.2 0.4 0.1 -0.8
A 3 0.5 0.6 -0.2 -0.2 -0.4 -0.1
B 1 -0.6 -0.1 0.9 0.2 0.8 -0.3
B 2 -0.6 -0.1 0.9 0.4 0.1 -0.8
B 3 -0.6 -0.1 0.9 -0.2 -0.4 -0.1
I have a solution that works - which is:
scoresDT[, cor(c(user_score_1,user_score_2,user_score_3), c(item_score_1,item_score_2,item_score_3)), by= .(user, item)]
...where scoresDT is my data.table.
This is all well and good, and it works... but I can't get it to work with dynamic variables instead of hard-coding the variable names.
Normally in a data.frame I could create a list and just pass that in, but as it's in character format, the data.table doesn't like it. I've tried using a list with with = FALSE and have had some success with basic subsetting of the data.table, but not with the correlation syntax that I need...
Any help is much, much appreciated!
Thanks,
Andrew
Here's what I would do:
mDT = melt(scoresDT,
           id.vars = c("user", "item"),
           measure.vars = patterns("item_score_", "user_score_"),
           value.name = c("item_score", "user_score")
)
mDT[, cor(item_score, user_score), by=.(user,item)]
user item V1
1: A 1 0.8955742
2: A 2 0.9367659
3: A 3 -0.8260332
4: B 1 -0.6141324
5: B 2 -0.9958706
6: B 3 0.5000000
I'd keep the data in its molten/long form, which fits more naturally with R and data.table functionality.
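If you do want to stay in the wide format and drive the calculation from character vectors of column names (the "dynamic variables" asked about), one possible sketch uses mget() inside j; user_cols, item_cols and match_cor are illustrative names, not part of the original post:
library(data.table)
# the score column names held as ordinary character vectors
user_cols <- paste0("user_score_", 1:3)
item_cols <- paste0("item_score_", 1:3)
# within each (user, item) group, look the columns up by name and correlate them
scoresDT[, .(match_cor = cor(unlist(mget(user_cols)),
                             unlist(mget(item_cols)))),
         by = .(user, item)]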

horizontal correlation across variables in R data frame

I want to calculate a correlation score between two sets of numbers, but these numbers are within each row.
The background is that I'm building a recommender system, using PCA to give me scores for each user and each item against each derived feature (1, 2, 3 in this case).
user item user_score_1 user_score_2 user_score_3 item_score_1 item_score_2 item_score_3
A 1 0.5 0.6 -0.2 0.2 0.8 -0.3
A 2 0.5 0.6 -0.2 0.4 0.1 -0.8
A 3 0.5 0.6 -0.2 -0.2 -0.4 -0.1
B 1 -0.6 -0.1 0.9 0.2 0.8 -0.3
B 2 -0.6 -0.1 0.9 0.4 0.1 -0.8
B 3 -0.6 -0.1 0.9 -0.2 -0.4 -0.1
I've combined the outputs for each user and item into this all x all table. For each row in this table, I need to calculate the correlation between user scores 1,2,3 & item scores 1,2,3 (e.g. for the first row what is the correlation between 0.5,0.6,-0.2 and 0.2,0.8,-0.3) to see how well the user and the item match.
The other alternative would be to do the correlation before I join the users and items into an all x all dataset, but I'm not sure how best to do that either.
I don't think I can transpose the table as in reality the users and items total is very large.
Any thoughts on a good approach?
Thanks,
Andrew
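For what it's worth, a hedged sketch of one row-wise approach for a plain data frame (column names as in the example above; df and match_cor are illustrative names):
user_cols <- paste0("user_score_", 1:3)
item_cols <- paste0("item_score_", 1:3)
# correlation of the user scores against the item scores, computed row by row
df$match_cor <- sapply(seq_len(nrow(df)), function(i)
  cor(unlist(df[i, user_cols]), unlist(df[i, item_cols])))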

rollapply function on specific column of dataframes within list

I must admit to complete lunacy when trying to understand how functions within functions are defined and passed in R. The examples always presume you understand every nuance and don't provide descriptions of the process. I have yet to come across a plain-English, idiot's-guide breakdown of the process. So the first question is: do you know of one?
Now my physical problem.
I have a list of data.frames: fileData.
I want to use the rollapply() function on specific columns in each data.frame. I then want all the results (lists) combined. So, starting with one of the data.frames, and using the built-in mtcars data frame as an example:
Of course, I need to tell rollapply() to use the function PPI() along with its associated parameters, which are the columns.
PPI <- function(a, b){
  value = (a + b)
  PPI = sum(value)
  return(PPI)
}
I tried this:
f <- function(x) PPI(x$mpg, x$disp)
fileData <- list(mtcars, mtcars, mtcars)
df <- fileData[[1]]
and got stopped at
rollapply(df, 20, f)
Error in x$mpg : $ operator is invalid for atomic vectors
I think this is related to zoo using matrices, but numerous other attempts couldn't resolve the rollapply issue. So, moving on to what I believe is next:
lapply(fileData, function(x) rollapply ......
Seems a mile away. Some guidance and solutions would be very welcome.
Thanks.
I will try to help you and show how you can debug the problem. One trick that is very helpful in R is to learn how to debug; generally I use the browser() function.
Problem:
Here I am changing your function f by adding one line:
f <- function(x) {
  browser()
  PPI(x$changeFactor_A, x$changeFactor_B)
}
Now when you run:
rollapply(df, 1, f)
The debugger stops and you can inspect the value of the argument x:
Browse[1]> x
[1,] 1e+05
As you can see, it is a scalar value, so you can't apply the $ operator to it; hence you get the error:
Error in x$changeFactor_A : $ operator is invalid for atomic vectors
General guidance:
Now I will explain how you should approach this.
Either you change your PPI function to take a single parameter, excess, so that you do the subtraction outside of it (easier),
or you use mapply to get a generalized solution (harder, but more general and very useful).
Avoid using $ within functions. Personally, I use it only at the R console.
Complete solution:
I assume that your data.frames (zoo objects) have changeFactor_A and changeFactor_B columns.
sapply(fileData, function(dat){
  dat <- transform(dat, excess = changeFactor_A - changeFactor_B)
  rollapply(dat[, 'excess'], 2, sum)
})
Or, more generally:
sapply(fileData, function(dat){
  excess <- get_excess(dat, 'changeFactor_A', 'changeFactor_B')
  rollapply(excess, 2, sum)
})
Where
get_excess <- function(data, colA, colB){
  ### do whatever you want here, e.g. the simple difference used above;
  ### return a vector
  excess <- data[[colA]] - data[[colB]]
  excess
}
Look at the "Usage" section of the help page to ?rollapply. I'll admit that R help pages are not easy to parse, and I see how you got confused.
The problem is that rollapply can deal with ts, zoo or plain numeric vectors, but only with a single series at a time. You are feeding it a function that takes two arguments. Granted, your f and PPI can trivially be vectorized, but rollapply simply isn't made for that.
Solution: calculate your excess outside rollapply (excess is easily vectorially calculated, and it does not involve any rolling calculations), and only then rollapply your function to it:
> library(zoo)
> mtcars$excess <- mtcars$mpg - mtcars$disp
> rollapply(mtcars$excess, 3, sum)
[1] -363.2 -460.8 -663.1 -784.8 -893.9 ...
You may possibly be interested in mapply, which vectorizes a function for multiple arguments, similarly to apply and friends, which work on single arguments. However, I know of no analogue of mapply with rolling windows.
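For completeness, a hedged sketch of passing both columns through the rolling window at once via by.column = FALSE (essentially what the answer below works out), using mpg, disp and the PPI() function from the question:
library(zoo)
# with by.column = FALSE each window arrives as a matrix, so PPI() can pick out its two columns
rollapply(as.matrix(mtcars[, c("mpg", "disp")]), 20,
          function(w) PPI(w[, "mpg"], w[, "disp"]),
          by.column = FALSE)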
I sweated away and took some time to slowly understand how to break down the process and protocol of calling a function with arguments from another function. A great site that helped was Advanced R by the one and only Hadley Wickham, again! The pictures showing the process breakdown are nearly ideal, although I still needed my thinking cap on for a few details.
Here is a complete example with notes. Hopefully someone else finds it useful.
library(zoo)
#Create a list of dataframes for the example.
listOfDataFrames <- list(mtcars, mtcars, mtcars)
#Give each element a name.
names(listOfDataFrames) <- c("A", "B", "C")
#This is a simple function just for the example!
#I want to perform this function on column 'col' of matrix 'm'.
#Of course to make the whole task worthwhile, this function is usually something more complex.
fApplyFunction <- function(m, col){
  mean(m[, col])
}
#This function is called from lapply() and does 'something' to the data frame that is passed.
#I created this function to keep the lapply() call very simple.
#The 'something' is to apply the function fApplyFunction(), which requires an argument 'thisCol'.
fOnEachElement <- function(thisDF, thisCol){
  #Convert to a matrix for the zoo library.
  thisMatrix <- as.matrix(thisDF)
  rollapply(thisMatrix, 5, fApplyFunction, thisCol, partial = FALSE, by.column = FALSE)
}
#This is where the program really starts!
#
#Apply a function to each element of list.
#The list is 'fileData', with each element being a dataframe.
#The function to apply to each element is 'fOnEachElement'
#The additional argument for 'fOnEachElement' is "vs", which is the name of the column I want the function performed on.
#lapply() returns each result as an element of a list.
listResults <- lapply(listOfDataFrames, fOnEachElement, "vs")
#Combine all elements of the list into one dataframe.
combinedResults <- do.call(cbind, listResults)
#Now that I understand the argument passing, I could call rollapply() directly from lapply()...
#Note that ONLY the additional arguments of rollapply() are passed. The primary argument is passed automatically by lapply().
listResults2 <- lapply(listOfDataFrames, rollapply, 5, fApplyFunction, "vs", partial = FALSE, by.column = FALSE)
Results:
> combinedResults
A B C
[1,] 0.4 0.4 0.4
[2,] 0.6 0.6 0.6
[3,] 0.6 0.6 0.6
[4,] 0.6 0.6 0.6
[5,] 0.6 0.6 0.6
[6,] 0.8 0.8 0.8
[7,] 0.8 0.8 0.8
[8,] 0.8 0.8 0.8
[9,] 0.6 0.6 0.6
[10,] 0.4 0.4 0.4
[11,] 0.2 0.2 0.2
[12,] 0.0 0.0 0.0
[13,] 0.0 0.0 0.0
[14,] 0.2 0.2 0.2
[15,] 0.4 0.4 0.4
[16,] 0.6 0.6 0.6
[17,] 0.8 0.8 0.8
[18,] 0.8 0.8 0.8
[19,] 0.6 0.6 0.6
[20,] 0.4 0.4 0.4
[21,] 0.2 0.2 0.2
[22,] 0.2 0.2 0.2
[23,] 0.2 0.2 0.2
[24,] 0.4 0.4 0.4
[25,] 0.4 0.4 0.4
[26,] 0.4 0.4 0.4
[27,] 0.2 0.2 0.2
[28,] 0.4 0.4 0.4
> listResults
$A
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
$B
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
$C
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
> listResults2
$A
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
$B
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
$C
[1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
