I am an R newbie, trying to do simple things.
I wanted to examine the running correlation between two time series (two CSV files).
Below is my code, after loading the gtools package:
v1<-read.csv("var1.csv", header = FALSE)
v2<-read.csv("var2.csv", header = FALSE)
running(v1,v2,fun=cor, width=5)
Instead of the correlations, this prints:
named list()
Then I try again by assigning first a variable:
p1<-running(v1,v2,fun=cor, width=5)
plot(p1)
I receive the following error message:
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but
does not have components 'x' and 'y'
What am I missing?
How can I create a plot that shows the running correlation and the line that represents the 95% confidence interval?
Thanks!
v1 and v2 are as follows:
v1 = structure(list(var1 = c(-0.888829723, -0.638363898, -0.820331055, -0.711637919, -3.631666745, 0.528082315, -0.888551728, 3.670203445, -0.406498322, 1.185030346, 1.427746793, -0.393369446, 2.905055593, -0.401353407, -0.563123881, 1.140042632, 7.078661195, 2.556181809, 0.888551728, -3.670203445, 0.406498322, -1.185030346, -1.427746793, 0.393369446, -2.016225871, 1.039717305, 1.383454936, -0.428404714, -3.44699445, -3.084264124)), .Names = "var1", class = "data.frame", row.names = c(NA, -30L))
v2 = structure(list(var2 = c(0.008871463, -0.218818955, 1.055065334, 1.353131909, -1.021284981, -2.153524661, 1.825212612, 0.460388983, 1.48721711, -1.78249802, 0.46047233, -0.894777526, -0.852226438, 0.136373161, -0.248409748, -0.411183561, 0.912205699, -1.856740048, -1.825212612, -0.460388983, -1.48721711, 1.78249802, -0.46047233, 0.894777526, 0.843354976, 0.082445794, -0.806655586, -0.941948347, 0.109079282, 4.010264709)), .Names = "var2", class = "data.frame", row.names = c(NA, -30L))
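Say you want a 5-year rolling correlation over the period 1988-2017 (30 years). running() works on plain numeric vectors, so pull the column out of each data frame first; passing the data frames themselves is what appears to produce the empty named list():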
v1<-read.csv("var1.csv", header = FALSE)
v2<-read.csv("var2.csv", header = FALSE)
v1 = structure(list(var1 = c(-0.888829723, -0.638363898, -0.820331055, -0.711637919, -3.631666745, 0.528082315, -0.888551728, 3.670203445, -0.406498322, 1.185030346, 1.427746793, -0.393369446, 2.905055593, -0.401353407, -0.563123881, 1.140042632, 7.078661195, 2.556181809, 0.888551728, -3.670203445, 0.406498322, -1.185030346, -1.427746793, 0.393369446, -2.016225871, 1.039717305, 1.383454936, -0.428404714, -3.44699445, -3.084264124)), .Names = "var1", class = "data.frame", row.names = c(NA, -30L))
v1 = as.vector(v1$var1)
v2 = structure(list(var2 = c(0.008871463, -0.218818955, 1.055065334, 1.353131909, -1.021284981, -2.153524661, 1.825212612, 0.460388983, 1.48721711, -1.78249802, 0.46047233, -0.894777526, -0.852226438, 0.136373161, -0.248409748, -0.411183561, 0.912205699, -1.856740048, -1.825212612, -0.460388983, -1.48721711, 1.78249802, -0.46047233, 0.894777526, 0.843354976, 0.082445794, -0.806655586, -0.941948347, 0.109079282, 4.010264709)), .Names = "var2", class = "data.frame", row.names = c(NA, -30L))
v2 = as.vector(v2$var2)
rc <- running(v1, v2, fun = cor, width = 5)
length(rc)
plot((2017-length(rc) + 1):2017, rc, type="l")
This should give you the rolling correlation plot.
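The question also asks for a 95% confidence interval, which running() with fun = cor does not return. Below is a minimal base R sketch of one way to get it, assuming the interval that cor.test() reports for each window is acceptable. roll_cor_ci() is a hypothetical helper, not part of gtools, and the series are simulated rather than the question's data.

```r
set.seed(42)                      # illustrative data, not the question's series
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

# Hypothetical helper: Pearson correlation plus confidence interval per window
roll_cor_ci <- function(x, y, width = 5, level = 0.95) {
  # cor.test() only reports a confidence interval with at least 4 pairs
  stopifnot(length(x) == length(y), width >= 4)
  starts <- seq_len(length(x) - width + 1)
  res <- vapply(starts, function(i) {
    idx <- i:(i + width - 1)
    ct <- cor.test(x[idx], y[idx], conf.level = level)
    c(r = unname(ct$estimate), lower = ct$conf.int[1], upper = ct$conf.int[2])
  }, numeric(3))
  as.data.frame(t(res))           # one row per window
}

ci <- roll_cor_ci(x, y, width = 5)
end_year <- (2017 - nrow(ci) + 1):2017   # label each window by its last year

plot(end_year, ci$r, type = "l", ylim = c(-1, 1),
     xlab = "window end year", ylab = "5-year rolling correlation")
lines(end_year, ci$lower, lty = 2)       # lower 95% bound
lines(end_year, ci$upper, lty = 2)       # upper 95% bound
```

With only five points per window the intervals are very wide, so a larger width may be needed before the bands become informative.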
I am trying to recreate this plot, but ggplot seems to object to the negative numbers in the data frame, judging by the error message: Error: colours encodes as numbers must be positive. Does anyone know what the issue is? These are very large data frames, but I wouldn't have thought that would be a problem.
## Load packages
library(tidyverse)
require(data.table)
## Read in data frames
m1<-fread("m1.csv", header = F)
m2<-fread("m2.csv", header = F)
L<-fread("l.csv", header = F)
LP<-fread("LP.csv", header = F)
## Get rate by taking m1 from m2
rate<-m1[1,]-m2[1,] ### subtract p1 rate from p2
## Transpose the data frame
t_rate <- transpose(rate)
## Create row ID's to merge data frames
L$row_num <- seq.int(nrow(L))
t_rate$row_num <- seq.int(nrow(t_rate))
all<-merge(L, t_rate, by = "row_num") ## merge the dataframes based on their ID
## Get rid of ID now we don't need it
all$row_num=NULL
## Plot the graph
ggplot(all,x=all$V1.x,y=all$V2,col=all$V1.y)+
geom_point(data=all,x=all$V1.x,y=all$V2,col=all$V1.y,size=0.1)+
geom_point(data=LP,x=LP$V1,y=LP$V2,size=1)
### Data (all)
structure(list(V1.x = c(163.75, 164.25, 164.75, 165.25, 165.75,
166.25), V2 = c(-75.25, -75.25, -75.25, -75.25, -75.25, -75.25
), V1.y = c(1.55995, 1.56093, 1.56237, 1.56545, 1.56764, 1.56827
)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L))
## Data (LP)
structure(list(V1 = c(169.7, 147.93, 150.01, 146.71, 147.31,
-63.26), V2 = c(-46.47, -42.344, -36.59, -38.64, -43.3, 44.739
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))
The issue is that you did not map the variables to aesthetics inside aes(), but passed vectors directly to the arguments. When you set the colour that way, you have to pass colour names, colour codes, or positive numbers.
To fix the issue, simply map the variables to aesthetics like so:
library(ggplot2)
ggplot(all, aes(x = V1.x, y = V2)) +
geom_point(aes(color = V1.y), size = 0.1) +
geom_point(data = LP, aes(x = V1, y = V2), size = 1)
Let's say I have 2 different functions to apply, for example max and min. After applying a bunch of functions I get the outputs below. I want to assign a function to each output.
Here is my data and its structure.
data<-structure(list(Apr = structure(list(`a1` = structure(list(
date = c("04-01-2036", "04-02-2036", "04-03-2036"), value = c(0,
3.13, 20.64)), .Names = c("date", "value"), row.names = 92:94, class = "data.frame"),
`a2` = structure(list(date = c("04-01-2037", "04-02-2037",
"04-03-2037"), value = c(5.32, 82.47, 15.56)), .Names = c("date",
"value"), row.names = 457:459, class = "data.frame")), .Names = c("a1",
"a2")), Dec = structure(list(`d1` = structure(list(
date = c("12-01-2039", "12-02-2039", "12-03-2039"), value = c(3,
0, 11)), .Names = c("date", "value"), row.names = 1431:1433, class = "data.frame"),
`d2` = structure(list(date = c("12-01-2064", "12-02-2064",
"12-03-2064"), value = c(0, 5, 0)), .Names = c("date", "value"
), row.names = 10563:10565, class = "data.frame")), .Names = c("d1",
"d2"))), .Names = c("Apr", "Dec"))
I applied these functions:
drop<-function(y){
lapply(y, function(x)(x[!(names(x) %in% c("date"))]))
}
q1<-lapply(data, drop)
q2<-lapply(q1, function(x) unlist(x,recursive = FALSE))
daily_max<-lapply(q2, function(x) lapply(x, max))
dailymax <- data.frame(matrix(unlist(daily_max), nrow=length(daily_max), byrow=TRUE))
row.names(dailymax)<-names(daily_max)
max_value <- apply(dailymax, 1, which.max)
And I'm getting
Apr Dec
2 1
And I am applying an arbitrary function to both Apr[2] and Dec[1], like:
Map(function(x, y) sum(x[[y]]), q2, max_value)
So the function is executed according to these outputs: on Apr's second element (a2) and on Dec's first element (d1). As you can see, the outputs are the numbers 1 and 2.
What I want
What I want is to assign a specific function to each of 1 and 2: if the output is 1, max is executed; if it is 2, min. So min will be applied to Apr[2] and max will be applied to Dec[1].
I will get this:
min(q2$Apr$a2.value)
[1] 5.32
max(q2$Dec$d1.value)
[1] 11
How can I achieve this automatically for all my functions?
You can use switch here to pick a function based on the number in max_value.
apply_function <- function(x, num) switch(num, `1` = max, `2` = min)(x)
Map(function(x, y) apply_function(x[[y]], y), q2, max_value)
#$Apr
#[1] 5.32
#$Dec
#[1] 11
Map returns a list; if you want a vector as output, use mapply.
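As a variation on the same idea, here is a small sketch that keeps the functions in a list indexed by position, which may scale better than switch when there are more than two functions. The q2 and max_value objects are rebuilt by hand from the question's values, and funs is a name introduced here for illustration.

```r
# Hand-built stand-ins for q2 and max_value from the question
q2 <- list(
  Apr = list(a1.value = c(0, 3.13, 20.64), a2.value = c(5.32, 82.47, 15.56)),
  Dec = list(d1.value = c(3, 0, 11),       d2.value = c(0, 5, 0))
)
max_value <- c(Apr = 2L, Dec = 1L)

# Position k in this list holds the function to use when the index is k
funs <- list(max, min)

# mapply simplifies to a named numeric vector instead of a list
res <- mapply(function(x, k) funs[[k]](x[[k]]), q2, max_value)
res
```

Adding a third function only requires appending it to funs; the Map/mapply call stays the same.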
I have a function which yields 2 dataframes. As functions can only return one object, I combined these dataframes into a list. However, I need to work with both dataframes separately. Is there a way to automatically split the list into its component dataframes, or to write the function in a way that both objects are returned separately?
The function:
install.packages("plyr")
require(plyr)
fun.docmerge <- function(x, y, z, crit, typ, doc = checkmerge) {
mergedat <- paste(deparse(substitute(x)), "+",
deparse(substitute(y)), "=", z)
countdat <- nrow(x)
check_t1 <- data.frame(mergedat, countdat)
z1 <- join(x, y, by = crit, type = typ)
countdat <- nrow(z1)
check_t2 <- data.frame(mergedat, countdat)
doc <- rbind(doc, check_t1, check_t2)
t1<-list()
t1[["checkmerge"]]<-doc
t1[[z]]<-z1
return(t1)
}
This is the call to the function, saving the result list to the new object results.
results <- fun.docmerge(x = df1, y = df2, z = "df3", crit = c("id"), typ = "left")
The following sample data replicate the problem:
df1 <- structure(list(id = c("XXX1", "XXX2", "XXX3",
"XXX4"), tr.isincode = c("ISIN1", "ISIN2",
"ISIN3", "ISIN4")), .Names = c("id", "isin"
), row.names = c(NA, 4L), class = "data.frame")
df2 <- structure(list(id= c("XXX1", "XXX5"), wrong= c(1L,
1L)), .Names = c("id", "wrong"), row.names = 1:2, class = "data.frame")
checkmerge <- structure(list(mergedat = structure(integer(0), .Label = character(0), class = "factor"),
countdat = numeric(0)), .Names = c("mergedat", "countdat"
), row.names = integer(0), class = "data.frame")
In the example, a list containing the dataframes df3 and checkmerge is returned. I need both dataframes as separate objects. I know I could assign them manually (e.g., checkmerge <- results$checkmerge), but I want to eliminate manual steps as much as possible and am therefore looking for an automated way.
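One possible approach, sketched under the assumption that every element of the returned list is named: base R's list2env() creates one variable per list element in a chosen environment. The results list below is a made-up stand-in for the output of fun.docmerge, and target is a name introduced here for illustration.

```r
# Made-up stand-in for the list returned by fun.docmerge()
results <- list(
  checkmerge = data.frame(mergedat = "df1 + df2 = df3", countdat = 4),
  df3        = data.frame(id = c("XXX1", "XXX2"), wrong = c(1L, NA))
)

# Create one variable per named list element in a target environment
target <- new.env()
list2env(results, envir = target)
ls(target)

# To create the objects directly in the workspace instead:
# list2env(results, envir = globalenv())
```

Note that list2env() silently overwrites existing objects with the same names, so an existing checkmerge in the workspace would be replaced.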
I have a large dataset on which I use dplyr's summarize to generate some means.
Occasionally, I would like to perform arithmetic on that output.
For example, I would like to get the mean of the means in one column of the output below, say "m.biomass".
I've tried mean(data.sum[,7]) and mean(as.list(data.sum[,7])). Is there a quick and easy way to achieve this?
data.sum <-structure(list(scenario = c("future", "future", "future", "future"
), state = c("fl", "ga", "ok", "va"), m.soc = c(4090.31654013689,
3654.45350562628, 2564.33199749487, 4193.83388887064), m.npp = c(1032.244475,
821.319385, 753.401315, 636.885535), sd.soc = c(56.0344229400332,
97.8553643582118, 68.2248389927858, 79.0739969429246), sd.npp = c(34.9421782033153,
27.6443555578531, 26.0728757486901, 24.0375040705595), m.biomass = c(5322.76631158111,
3936.79457763176, 3591.0902359206, 2888.25308402464), sd.m.biomass = c(3026.59250918009,
2799.40317348016, 2515.10516340438, 2273.45510178843), max.biomass = c(9592.9303,
8105.109, 7272.4896, 6439.2259), time = c("1980-1999", "1980-1999",
"1980-1999", "1980-1999")), .Names = c("scenario", "state", "m.soc",
"m.npp", "sd.soc", "sd.npp", "m.biomass", "sd.m.biomass", "max.biomass",
"time"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4), vars = list(quote(scenario)), labels = structure(list(
scenario = "future"), class = "data.frame", row.names = c(NA,
-1), vars = list(quote(scenario)), drop = TRUE, .Names = "scenario"), indices = list(0:3))
We can use [[ to extract the column as a vector, because mean works on a vector or a matrix, not on a data.frame. For a single column, use this:
mean(data.sum[[7]])
#[1] 3934.726
If data.sum had only the data.frame class, data.sum[,7] would return the column as a vector, but the tbl_df class prevents it from being collapsed to a vector.
For multiple columns, dplyr also has specialised functions (summarise_each and funs are deprecated in recent dplyr releases in favour of across):
data.sum %>%
summarise_each(funs(mean), 3:7)
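The difference between the two kinds of extraction can be reproduced in base R alone. This is a small sketch on a made-up data.frame with rounded values from the question, using drop = FALSE to imitate the way a tbl_df keeps a single column as a data frame.

```r
df <- data.frame(state = c("fl", "ga", "ok", "va"),
                 m.biomass = c(5322.766, 3936.795, 3591.090, 2888.253))

v  <- df[["m.biomass"]]                # numeric vector
d1 <- df[, "m.biomass", drop = FALSE]  # one-column data.frame, as a tbl_df would return

mean(v)                     # works: the input is a numeric vector
suppressWarnings(mean(d1))  # NA: mean() does not handle a data.frame
colMeans(d1)                # column-wise alternative that accepts a data.frame
```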
I have a dataframe (test) in R. One of its columns contains coordinates in this list structure:
> dput(test$coordinates)
list(structure(list(x = c(-1.294832, -1.294883, -1.294262,
-1.249478), y = c(54.61024, 54.61008, 54.610016, 54.610006
)), .Names = c("x", "y"), row.names = c(NA, -284L), class = c("tbl_df",
"tbl", "data.frame")))
I've reduced the number of coordinates for clarity.
Ultimately I wish to convert the dataframe into a spatial lines dataframe, but to do that I need test$coordinates in a lines form. However, I get the following error:
> lines(test$coordinates)
Error in xy.coords(x, y) :
'x' is a list, but does not have components 'x' and 'y'
I have tried to convert test$coordinates to other forms, but it usually results in some error. How do I transform this list into a line?
Extra info: this is a follow-up question to
Convert data frame to spatial lines data frame in R with x,y x,y coordintates
UPDATE as requested dput(head(test)):
> dput(head(test))
structure(list(rid = 1, start_id = 1L, start_code = "E02002536",
end_id = 106L, end_code = "E02006909", strategy = "fastest",
distance = 12655L, time_seconds = 2921L, calories = 211L,
document.id = 1L, array.index = 1L, start = "Geranium Close",
finish = "Hylton Road", startBearing = 0, startSpeed = 0,
start_longitude = -1.294832, start_latitude = 54.610241,
finish_longitude = -1.249478, finish_latitude = 54.680691,
crow_fly_distance = 8362, event = "depart", whence = 1473171787,
speed = 20, itinerary = 419956, clientRouteId = 0, plan = "fastest",
note = "", length = 12655, time = 2921, busynance = 42172,
quietness = 30, signalledJunctions = 3, signalledCrossings = 2,
west = -1.300074, south = 54.610006, east = -1.232447, north = 54.683814,
name = "Geranium Close to Hylton Road", walk = 0, leaving = "2016-09-06 15:23:07",
arriving = "2016-09-06 16:11:48", grammesCO2saved = 2359,
calories2 = 211, type = "route", coordinates = list(structure(list(
x = c(-1.294832, -1.294883, -1.294262, -1.294141, -1.29371,
-1.293726, -1.293742, -1.29351, -1.293368, -1.292816,
-1.248019, -1.249478), y = c(54.61024, 54.61008, 54.610016,
54.610006, 54.610038, 54.610142, 54.610247, 54.610262,
54.681238, 54.680975, 54.680601, 54.680404
)), .Names = c("x", "y"), row.names = c(NA, -284L), class = c("tbl_df",
"tbl", "data.frame")))), .Names = c("rid", "start_id", "start_code",
"end_id", "end_code", "strategy", "distance", "time_seconds",
"calories", "document.id", "array.index", "start", "finish",
"startBearing", "startSpeed", "start_longitude", "start_latitude",
"finish_longitude", "finish_latitude", "crow_fly_distance", "event",
"whence", "speed", "itinerary", "clientRouteId", "plan", "note",
"length", "time", "busynance", "quietness", "signalledJunctions",
"signalledCrossings", "west", "south", "east", "north", "name",
"walk", "leaving", "arriving", "grammesCO2saved", "calories2",
"type", "coordinates"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
lines is a plotting function. I'm assuming you want sp::SpatialLines. See ?"SpatialLines-class" for how to construct such an object.
Here's how to do it for your case (with your data frame stored as xy), provided you don't have a "corrupt" data.frame (see the bottom of this post).
library(sp)
coords <- as.data.frame(xy$coordinates[[1]])[1:12, ]
out <- SpatialLines(list(Lines(list(Line(coords)), ID = 1)))
An object of class "SpatialLines"
Slot "lines":
[[1]]
An object of class "Lines"
Slot "Lines":
[[1]]
An object of class "Line"
Slot "coords":
x y
1 -1.294832 54.61024
2 -1.294883 54.61008
3 -1.294262 54.61002
4 -1.294141 54.61001
5 -1.293710 54.61004
6 -1.293726 54.61014
7 -1.293742 54.61025
8 -1.293510 54.61026
9 -1.293368 54.68124
10 -1.292816 54.68097
11 -1.248019 54.68060
12 -1.249478 54.68040
Slot "ID":
[1] "1"
Slot "bbox":
min max
x -1.294883 -1.248019
y 54.610006 54.681238
Slot "proj4string":
CRS arguments: NA
To add data to this object, you should use
SpatialLinesDataFrame(out, data = yourdata)
but see this example for more info.
I got a warning when I tried to coerce your coordinates to a data.frame. Hopefully this isn't the case for your full dataset.
> as.data.frame(xy$coordinates[[1]])
x y
1 -1.294832 54.61024
2 -1.294883 54.61008
3 -1.294262 54.61002
...
281 <NA> <NA>
282 <NA> <NA>
283 <NA> <NA>
284 <NA> <NA>
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
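The warning arises because the row.names attribute in the posted dput claims 284 rows while the shortened columns hold far fewer values. Below is a base R sketch of how one might detect and repair such a mismatch; bad and fixed are names introduced here, and only four coordinate pairs from the question are used.

```r
# Same kind of mismatch as in the question: 4 values per column,
# but the row.names attribute claims 284 rows
bad <- structure(list(x = c(-1.294832, -1.294883, -1.294262, -1.249478),
                      y = c(54.61024, 54.61008, 54.610016, 54.610006)),
                 row.names = c(NA, -284L), class = "data.frame")

nrow(bad)              # row count taken from the row.names attribute
lengths(unclass(bad))  # actual number of values in each column

# Rebuild from the raw columns so that the row names match the data
fixed <- data.frame(x = bad[["x"]], y = bad[["y"]])
nrow(fixed)
```

If the full dataset has all 284 coordinate pairs in each column, the two counts agree and no repair is needed.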