Updating old column entries from new data frame - r

I am working on a problem. Here is an idea of what the original 60k row data frame looks like.
dataOne <- data.frame(
marketVal = c(NA, 543534, NA, 115435, NA),
bathrooms = c(3,3,2,3,5),
garageSqFt = c(400, 385, 454, 534, 210),
totalSqFT = c(NA, NA, 1231, 2232, 4564),
units = c(1, 1, 1, 1, 1),
subDivId = c("112", "111", "111", "111", "112"),
ID = c(4,56,67,94,130) )
Some of the NA's for market value have been retrieved and stored in a new
data frame that looks like so:
dataTwo <- data.frame(
marketVal = c(123123,234234),
IDTwo = c(4,67) )
str(dataTwo)
dataOne$marketVal <- dataTwo$marketVal[match(dataTwo$ID, dataOne$ID)]
By comparing IDs from both data frames, I am attempting to replace the NAs in the first data frame with the market values in the second data frame. I've tried the match function as follows:
dataOne$marketValue <- dataTwo$marketValue[match(dataOne$ID, dataTwo$ID)]
but receive an error "replacement has 2 rows, data has 5 calls". I figured that the two data frames not being the same size wouldn't matter, as we are only comparing the IDs found in either. How can I accomplish this efficiently, considering around 4500 NAs need to be updated?

Your method isn't working because it is producing a vector with 5 values: 1 NA 2 NA NA which is longer than your dataTwo dataframe. Drop the NA values and your method would work.
This is how I would do it:
rowMatch <- which(dataOne$ID %in% dataTwo$IDTwo)
dataOne[rowMatch, ]$marketVal <- dataTwo$marketVal
(Please note that your ID variables are actually ID and IDTwo respectively in the example you provided. This also assumes that dataTwo is sorted in the same order as the matching rows of dataOne.)
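If the rows of dataTwo are not guaranteed to be in the same order as the matching rows of dataOne, a match()-based lookup may be safer. The following is a minimal sketch, assuming the column names from the example (ID in dataOne, IDTwo in dataTwo) and a cut-down version of the data. It only fills entries that are currently NA and leaves existing market values untouched.

```r
dataOne <- data.frame(
  marketVal = c(NA, 543534, NA, 115435, NA),
  ID = c(4, 56, 67, 94, 130)
)
dataTwo <- data.frame(
  marketVal = c(234234, 123123),  # deliberately in a different order than dataOne
  IDTwo = c(67, 4)
)

# For every row of dataOne, the position of its ID in dataTwo (NA if absent)
pos <- match(dataOne$ID, dataTwo$IDTwo)

# Only update rows that are NA in dataOne and have a match in dataTwo
fill <- is.na(dataOne$marketVal) & !is.na(pos)
dataOne$marketVal[fill] <- dataTwo$marketVal[pos[fill]]

dataOne$marketVal
```

Since match() and logical indexing are vectorised, this should scale comfortably to 60k rows and a few thousand replacements.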

You can use merge
library(dplyr)     # coalesce()
library(magrittr)  # %$% is not attached by library(tidyverse)
new <- merge(dataOne, dataTwo, by.x = 'ID', by.y = 'IDTwo', all.x = TRUE)
new$marketVal <- new %$% coalesce(marketVal.x, marketVal.y)

We could use safe_left_join from my package safejoin, and "patch"
the matches from the rhs into the lhs when columns conflict.
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
dataOne <- data.frame(
marketVal = c(NA, 543534, NA, 115435, NA),
bathrooms = c(3,3,2,3,5),
garageSqFt = c(400, 385, 454, 534, 210),
totalSqFT = c(NA, NA, 1231, 2232, 4564),
units = c(1, 1, 1, 1, 1),
subDivId = c("112", "111", "111", "111", "112"),
ID = c(4,56,67,94,130) )
dataTwo <- data.frame(
marketVal = c(123123,234234),
IDTwo = c(4,67) )
safe_left_join(dataOne, dataTwo, by=c(ID= "IDTwo"), conflict = "patch")
# marketVal bathrooms garageSqFt totalSqFT units subDivId ID
# 1 123123 3 400 NA 1 112 4
# 2 543534 3 385 NA 1 111 56
# 3 234234 2 454 1231 1 111 67
# 4 115435 3 534 2232 1 111 94
# 5 NA 5 210 4564 1 112 130
Or, for the same effect in this case, we can use dplyr::coalesce:
library(dplyr)
safe_left_join(dataOne, dataTwo, by=c(ID= "IDTwo"), conflict = coalesce)

Related

sdTrim (trimr package) does not recognize defined conditions

I'm having an issue with the sdTrim function, which had previously run perfectly.
I have a dataframe (= new_data) containing the following variable names:
There are 8 different conditions: FA_1, HIT_1, ..., FA_4, HIT_4
I wanted to trim the reaction times and calculate a mean per participant and per condition. I used the following code:
trimmedData <- sdTrim(new_data, minRT = 150, sd = 2, pptVar = "participant", condVar = "condition", rtVar = "rt", accVar = "accuracy", perParticipant = TRUE, returnType = "mean")
This used to work fine, but suddenly my condition variable is not recognized as such anymore: instead of 8 variables, all are put into one:
What seems to be the issue here?
I tried different ways of including perCondition = TRUE, FALSE etc. which did not change anything.
The participant and condition variables are characters; rt is numeric.
As far as I can tell, the problem is with your data, not with your code. The example data you posted only has one row per participant/condition at most; there isn't a FA_3 or FA_4 for participant 988. If your real data doesn't have enough data for each combination of participant and conditions, then it looks like sdTrim just averages by participant.
I'm unfamiliar with reaction time data, but you might be able to accomplish what you're looking for using group_by and summarize from dplyr.
Below is an example with a larger dataset based on your example data.
library(trimr)
set.seed(123)
participant <- c(rep("1", 100), rep("2", 100), rep("3", 100))
accuracy <- sample(x = c("1", "0"), size = 300, replace = TRUE, prob = c(.9, .1))
condition <- sample(x = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4"), size = 300, replace = TRUE)
rt <- sample(x = 250:625, size = 300)
new_data <- data.frame(participant, accuracy, condition, rt)
trimmedData <- sdTrim(data = new_data,
minRT = 150,
sd = 2,
pptVar = "participant",
condVar = "condition",
rtVar = "rt",
accVar = "accuracy",
perParticipant = TRUE,
returnType = "mean")
print(trimmedData)
participant FA_1 hit_1 hit_3 hit_2 FA_4 FA_2 FA_3 hit_4
1 1 439.800 477.250 433.85 440.375 426.286 439.500 508.8 457.429
2 2 477.067 489.933 466.50 360.000 405.000 387.533 427.2 428.364
3 3 398.333 446.500 438.00 362.077 445.000 432.333 419.2 497.125
Update (1/23/23)
In both your original and your updated datasets, you simply don't have enough values per condition to properly use sdTrim() with both perParticipant = TRUE and perCondition = TRUE (perCondition is automatically set to TRUE if you don't specify it).
Here is a link to the sdTrim() function on GitHub. Start looking at line 545, which describes what happens when you have both perParticipant and perCondition set to TRUE.
Part of this function involves taking the standard deviation of the data for each combination of participant and condition. If you only have one value for each combination of participant and condition, your standard deviation value will be NA. See the below example of just using participant 988 and condition hit_4. Once your standard deviation is NA, NA's just follow after that.
You either need a larger dataset with more values for each combination of participant and condition or you need to set perParticipant and perCondition to both be FALSE. If you do the second option, you will have two NaN values because those values fall under the minRT threshold that you set. However, you can avoid that by also doing returnType = "raw".
new_data <- structure(list(participant = c("986", "986", "986", "986", "986", "986", "986", "986", "988", "988", "988", "988", "988", "988", "988", "988"), accuracy = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), condition = c("hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "FA_3", "FA_4", "hit_4", "hit_1", "FA_1", "hit_2", "FA_2", "hit_3", "hit_4", "FA_3", "FA_4"), rt = c(638, 286, 348, 310, 404, 301, 216, 534, 348, 276, 256, 293, 495, 438, 73, 73)), row.names = c(NA, -16L), class = "data.frame")
stDev <- 2
minRT <- 150
# get the list of participant numbers
participant <- unique(new_data$participant)
# get the list of experimental conditions
conditionList <- unique(new_data$condition)
# trim the data
trimmedData <- new_data[new_data$rt > minRT, ]
# ready the final data set
finalData <- as.data.frame(matrix(0, nrow = length(participant), ncol = length(conditionList)))
# give the columns the condition names
colnames(finalData) <- conditionList
# add the participant column
finalData <- cbind(participant, finalData)
# convert to data frame
finalData <- data.frame(finalData)
# initialise looping variables for subjects (i) and conditions (j)
i <- 1
j <- 2
# take apart the loop
# focus on participant 988, condition hit_4
currSub <- "988"
currCond <- "hit_4"
# get relevant data
tempData <- trimmedData[trimmedData$participant == currSub & trimmedData$condition == currCond, ]
# find the cutoff
curMean <- mean(tempData$rt)
print(curMean)
[1] 438
curSD <- sd(tempData$rt)
print(curSD) # <- here is where the NA values start
[1] NA
curCutoff <- curMean + (stDev * curSD)
# trim the data
curData <- tempData[tempData$rt < curCutoff, ]
# find the average, and add to the data frame
finalData[i, j] <- round(mean(curData$rt))
head(finalData)
> participant hit_1 FA_1 hit_2 FA_2 hit_3 FA_3 FA_4 hit_4
1 986 NA 0 0 0 0 0 0 0
2 988 0 0 0 0 0 0 0 0
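The group_by/summarize idea mentioned earlier can also be sketched in base R with aggregate(), swapped in here for dplyr so that the example needs no extra packages. This is not sdTrim()'s algorithm: it only applies the minRT cutoff and averages, with no standard-deviation trimming. The toy data are made up for illustration, and the 150 ms cutoff is taken from the question.

```r
toy <- data.frame(
  participant = c("986", "986", "986", "988", "988", "988"),
  condition   = c("hit_1", "hit_1", "FA_1", "hit_1", "FA_1", "FA_1"),
  rt          = c(638, 348, 286, 348, 276, 73)
)
minRT <- 150

# Drop responses faster than the cutoff, then average per participant/condition
kept  <- toy[toy$rt > minRT, ]
means <- aggregate(rt ~ participant + condition, data = kept, FUN = mean)
means

# A single observation has no standard deviation, which is where the NAs come from
sd(438)
```

With only one value per cell, the mean is still defined, which is why a plain average works where SD-based trimming cannot.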

How would I go about joining or merging data frames of different sizes while also overwriting missing values in R?

R Question
I am looking to join multiple data frames of unequal size. While joining the data frames, I would like to overwrite any NAs. I attempted to use the coalesce function, but equal sized data frames were required.
Example
x <- data.frame(
ID = c(1,2,3,4,5),
Location = c("Georgia", NA, NA, "Idaho", "Texas"),
Cost = c(NA, 200, NA, 400, 500)
)
y <- data.frame(
ID = c(1, 2, 3),
Location = c("Wyoming", "Florida", "Toronto"),
Cost = c(150, 100, 450)
)
Desired Result
ID Location Cost
1 Georgia 150
2 Florida 200
3 Toronto 450
4 Idaho 400
5 Texas 500
You can do a full_join and then use coalesce for Location and Cost columns.
library(dplyr)
full_join(x, y, by = 'ID') %>%
mutate(Location = coalesce(Location.x, Location.y),
Cost = coalesce(Cost.x, Cost.y)) %>%
select(names(x))
# ID Location Cost
#1 1 Georgia 150
#2 2 Florida 200
#3 3 Toronto 450
#4 4 Idaho 400
#5 5 Texas 500
In base R, we can use ifelse to select values from Location and Cost columns.
transform(merge(x, y, by = 'ID', all = TRUE),
Location = ifelse(is.na(Location.x), Location.y, Location.x),
Cost = ifelse(is.na(Cost.x), Cost.y, Cost.x))[names(x)]
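When many columns overlap, writing one coalesce or ifelse call per column gets repetitive. A possible generalisation, sketched here in base R, loops over the shared columns after the merge. The helper name patch_na is hypothetical and not part of any package.

```r
x <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Location = c("Georgia", NA, NA, "Idaho", "Texas"),
  Cost = c(NA, 200, NA, 400, 500),
  stringsAsFactors = FALSE
)
y <- data.frame(
  ID = c(1, 2, 3),
  Location = c("Wyoming", "Florida", "Toronto"),
  Cost = c(150, 100, 450),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs in x from y for every shared non-key column
patch_na <- function(x, y, key) {
  merged <- merge(x, y, by = key, all = TRUE, suffixes = c(".x", ".y"))
  shared <- setdiff(intersect(names(x), names(y)), key)
  for (col in shared) {
    left  <- merged[[paste0(col, ".x")]]
    right <- merged[[paste0(col, ".y")]]
    # keep the value from x unless it is missing
    merged[[col]] <- ifelse(is.na(left), right, left)
  }
  merged[union(names(x), names(y))]
}

patched <- patch_na(x, y, key = "ID")
patched
```

The explicit stringsAsFactors = FALSE matters on R versions before 4.0, where ifelse() on factor columns would return integer codes rather than the labels.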

Replacing only NA values in xts object column wise using specific formula

I want to replace NA values in my xts object with formula Beta * Exposure * Index return.
Suppose my xts object is Position_SimPnl, created below:
library(xts)
df1 <- data.frame(Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
Apple = c(10, 20,30,40,50,60,70,80),
Audi = c(1,2,3,4,5,6,7,8),
BMW = c(NA, NA, NA, NA, NA, 6000,7000,8000),
AENA = c(50,51,52,53,54,55,56,57))
Position_SimPnl <- xts(df1, order.by = Sys.Date() - 1:8)
For Beta there is a specific dataframe:
Beta_table <- data.frame (AENA = c(0.3,0.5,0.6), Apple = c(0.2,0.5,0.8), Google = c(0.1,0.3,0.5), Audi = c(0.4,0.6,0.7), AXP = c(0.5,0.7, 0.9), BMW = c(0.3,0.4, 0.5))
rownames(Beta_table) <- c(".SPX", ".FTSE", ".STOXX")
For exposure there is another dataframe:
Base <- data.frame (RIC = c("AENA","BMW","Apple","Audi","Google"), Exposure = c(100,200,300,400,500))
For Index return there is a xts object (Index_FX_Returns):
df2 <- data.frame(.SPX = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
.FTSE = c(0.5, 0.4,0.3,0.2,0.3,0.4,0.3,0.4),
.STOXX = c(0.15,0.25,0.35,0.3,0.45,0.55,0.65,0.5))
Index_FX_Returns <- xts(df2,order.by = Sys.Date() - 1:8)
Also there is a dataframe which links RIC with Index:
RIC_Curr_Ind <- data.frame(RIC = c("AENA", "Apple", "Google", "Audi", "BMW"), Currency = c("EUR.","USD.","USD.","EUR.","EUR."), Index = c(".STOXX",".SPX",".SPX",".FTSE",".FTSE"))
What I want is for a particular position of NA value in Position_SimPnl it should look into the column name and get the corresponding index name from RIC_Curr_Ind dataframe and then look for the beta value from Beta_table by matching column name (column name of NA) and row name (index name derived from column name of NA).
Then again by matching the column name from Position_SimPnl with the RIC column from 'Base' dataframe it would extract the corresponding exposure value.
Then by matching column name from Position_SimPnl with RIC column from RIC_Curr_Ind dataframe, it would get the corresponding index name and from that index name it would look into the column name for xts object Index_FX_Returns and get the corresponding return value for the same date as of the NA value.
After getting the Beta, Exposure and Index return values, I want the NA value to be replaced by the formula Beta * Exposure * Index return. I want only the NA values in Position_SimPnl to be replaced; the other values should remain as they were. I used the following code to replace the NA values:
do.call(merge, lapply(Position_SimPnl, function(y) {if(is.na(y)){y = (Beta_table[match(RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)],rownames(Beta_table)), match(colnames(y),colnames(Beta_table))]) * (Base$Exposure[match(colnames(y), Base$RIC)]) * (Index_FX_Returns[,RIC_Curr_Ind$Index[match(colnames(y),RIC_Curr_Ind$RIC)]])} else{y}}))
However in the output, if a particular column contains NA it is replacing all the values in the column (including which were not NA previously). Also I am getting multiple warning messages like
"In if (is.na(y)) { ... :
the condition has length > 1 and only the first element will be used".
I think because of this all values of column are getting transformed including non-NA ones. Can anyone suggest how to effectively replace these NA values by the formula mentioned above, keeping the other values same. Any help would be appreciated
Because you need to combine all data sets to achieve your formula Beta * Exposure * Index, consider building a master data frame comprised of all needed components. However, you face two challenges:
different data types (xts objects and data frame)
different data formats (wide and long formats)
For proper merging and calculating, consider converting all data components into data frames and reshaping to long format (i.e., all but Base and RIC_Curr_Ind). Then, merge and calculate with ifelse to fill NA values. Of course, at the end, you will have to reshape back to wide and convert back to XTS.
Reshape
# USER-DEFINED METHOD GIVEN THE MULTIPLE CALLS
proc_transpose <- function(df, col_pick, val_col, time_col) {
reshape(df,
varying = names(df)[col_pick],
times = names(df)[col_pick], ids = NULL,
v.names = val_col, timevar = time_col,
new.row.names = 1:1E4, direction = "long")
}
# POSITIONS
Position_SimPnl_wide_df <- data.frame(date = index(Position_SimPnl),
coredata(Position_SimPnl))
Position_SimPnl_long_df <- proc_transpose(Position_SimPnl_wide_df, col_pick = -1,
val_col = "Position", time_col = "RIC")
# BETA
Beta_table_long_df <- proc_transpose(transform(Beta_table, Index = row.names(Beta_table)),
col_pick = 1:ncol(Beta_table),
val_col = "Beta", time_col = "RIC")
# INDEX
Index_FX_Returns_wide_df <- data.frame(date = index(Index_FX_Returns),
coredata(Index_FX_Returns))
Index_FX_Returns_long_df <- proc_transpose(Index_FX_Returns_wide_df, col_pick = -1,
val_col = "Index_value", time_col = "Index")
Merge
# CHAIN MERGE
master_df <- Reduce(function(...) merge(..., by="RIC"),
list(Position_SimPnl_long_df,
Beta_table_long_df,
Base)
)
# ADDITIONAL MERGES (NOT INCLUDED IN ABOVE CHAIN DUE TO DIFFERENT by)
master_df <- merge(master_df,
Index_FX_Returns_long_df, by=c("Index", "date"))
master_df <- merge(master_df,
RIC_Curr_Ind, by=c("Index", "RIC"))
Calculation
# FORMULA: Beta * Exposure * Index
master_df$Position <- with(master_df, ifelse(is.na(Position),
Beta * Exposure * Index_value,
Position))
Final Preparation
# RE-ORDER ROWS AND SUBSET COLS
master_df <- data.frame(with(master_df, master_df[order(RIC, date),
c("date", "RIC", "Position")]),
row.names = NULL)
# RESHAPE WIDE (REVERSE OF ABOVE)
Position_SimPnl_new <- setNames(reshape(master_df, idvar = "date",
v.names = "Position", timevar = "RIC",
direction = "wide"),
c("date", unique(master_df$RIC)))
# CONVERT TO XTS
Position_SimPnl_new <- xts(transform(Position_SimPnl_new, date = NULL),
order.by = Position_SimPnl_new$date)
Position_SimPnl_new
# AENA Apple Audi BMW Google
# 2019-11-27 58 80 8 8000 800.0
# 2019-11-28 57 70 7 7000 700.0
# 2019-11-29 56 60 6 6000 600.0
# 2019-11-30 55 50 5 24 500.0
# 2019-12-01 54 40 4 16 2.0
# 2019-12-02 53 30 3 24 1.5
# 2019-12-03 52 20 2 32 1.0
# 2019-12-04 51 10 1 40 0.5
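The warning in the question arises because if() expects a single TRUE/FALSE, while is.na() on a whole column returns a vector. A lighter alternative to reshaping is to compute the replacement for an entire column and assign it only where that column is NA. Below is a minimal sketch on plain matrices (standing in for coredata() of the xts objects), restricted to the two columns that contain NAs. The object names pos, ret, beta, exposure and index_of are illustrative and do not appear in the original post.

```r
# Toy stand-ins for the objects in the question (rows = dates)
pos <- cbind(Google = c(NA, NA, NA, NA, 500, 600, 700, 800),
             BMW    = c(NA, NA, NA, NA, NA, 6000, 7000, 8000))
ret <- matrix(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08,
                0.5, 0.4, 0.3, 0.2, 0.3, 0.4, 0.3, 0.4),
              ncol = 2, dimnames = list(NULL, c(".SPX", ".FTSE")))
beta <- matrix(c(0.1, 0.3, 0.3, 0.4), nrow = 2,
               dimnames = list(c(".SPX", ".FTSE"), c("Google", "BMW")))
exposure <- c(Google = 500, BMW = 200)
index_of <- c(Google = ".SPX", BMW = ".FTSE")

for (ric in colnames(pos)) {
  idx  <- index_of[[ric]]                      # index linked to this RIC
  fill <- beta[idx, ric] * exposure[[ric]] * ret[, idx]
  gaps <- is.na(pos[, ric])                    # element-wise, one flag per date
  pos[gaps, ric] <- fill[gaps]                 # non-NA values stay untouched
}
pos
```

The filled values agree with the NA positions in the output above (0.5 to 2.0 for Google, 40 down to 16 and back to 24 for BMW).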

How to fill in missing values of df1 (at time t) using the ratio df2(t)/df2(t-1) easily?

There are two data frames: the first has some missing values, the second has none. The rules are:
1: For year(t) of df1, if the value is missing, use the value of year(t-1) * ratio, where ratio = value of year(t) / value of year(t-1) in df2.
2: In df1, there is no data for either 2012 or 2013, but we only need to impute the missing values for 2012, which is one year after the most recent data. We don't have to impute all the years.
My approach is a little clumsy. Does anyone have a better way to do this?
data2 = data.frame('population by age' = seq(5, 8, by = 1),
'2008' = c(145391,
140621,
136150,
131944
),
'2009' = c(148566,
143943,
139367,
135083
),
'2010' = c(152330,
147261,
142555,
138172
),
'2011' = c(156630,
151387,
146491,
141905
),
'2012' = c(133545,
129737,
126124,
122678
),
'2013' = c(119397,
116093,
112666,
109174))
data1 <- data.frame('grade' = seq(1, 4, by = 1),
'2008'= c(218701,
NA,
142190,
NA),
'2009' = c(NA,
196398,
155033,
NA),
'2010' = c(212512,
NA,
176268,
143699),
'2011' = c(218529,
198933,
NA,
159103),
'2012' = c(NA,
NA,
NA,
NA),
'2013' = c(NA,
NA,
NA,
NA)
)
# Find the column number of the last column with non-na value
ind <- !is.na(data1)
t1 <- tapply(data1[ind], col(data1)[ind],tail, 1)
last_non_na_col <- as.numeric(tail(unlist(dimnames(t1)), n = 1))
for (i in 1:nrow(data1)) {
for (j in 3:(last_non_na_col+1)) {
if (is.na(data1[i,j])) {
data1[i,j] = data1[i,j-1]*data2[i,j]/data2[i,j-1]
}
}
}
The output will be like this. And this is exactly what I want.
> data1
grade X2008 X2009 X2010 X2011 X2012 X2013
1 1 218701 223476.9 212512.0 218529.0 186321.0 NA
2 2 NA 196398.0 200925.1 198933.0 170483.4 NA
3 3 142190 155033.0 176268.0 181134.8 155951.2 NA
4 4 NA NA 143699.0 159103.0 137545.8 NA
First, create a new data frame with the values to substitute. I'm using package data.table to do it; you can try a solution using dplyr/tidyr if you prefer. Then replace the NAs in data1. Because there's no data for every grade and year, there will still be NAs. So put everything inside a while loop:
library(data.table)
while( anyNA(data1[ncol(data1)]) ) {
data1.sub <- copy(data1)
for( t in 3:ncol(data1.sub) ) set( data1.sub, j = t, value = data1[[t-1]]*(data2[[t]]/data2[[t-1]]) )
data1[ is.na(data1) ] <- data1.sub[ is.na(data1) ]
}
I'm using 3:ncol() because there's no information prior to X2008. Here's the result:
> data1
grade X2008 X2009 X2010 X2011 X2012 X2013
1 1 218701 223476.9 212512.0 218529.0 186321.0 166581.8
2 2 NA 196398.0 200925.1 198933.0 170483.4 152554.2
3 3 142190 155033.0 176268.0 181134.8 155951.2 139310.5
4 4 NA NA 143699.0 159103.0 137545.8 122405.2
The same result can be accomplished with the code from your question if the for loop runs over all columns after X2008:
for (i in 1:nrow(data1)) {
for (j in 3:ncol(data1)) {
if (is.na(data1[i,j])) data1[i,j] = data1[i,j-1]*data2[i,j]/data2[i,j-1]
} }
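Because each column depends only on the one to its left, a single left-to-right pass is enough, and the inner loop over rows can be replaced by a vectorised assignment. Here is a sketch with made-up numbers, where d1 and d2 are small stand-ins for data1 and data2 and are assumed to share the same row and column layout:

```r
d1 <- data.frame(grade = 1:2,
                 X2008 = c(100, NA),
                 X2009 = c(NA, 50),
                 X2010 = c(NA, NA))
d2 <- data.frame(age = 5:6,
                 X2008 = c(10, 10),
                 X2009 = c(20, 5),
                 X2010 = c(40, 10))

# Start at column 3: the first year column has nothing to its left to scale from
for (j in 3:ncol(d1)) {
  gaps <- is.na(d1[[j]])
  # previous year's value times the year-on-year ratio taken from d2
  d1[[j]][gaps] <- d1[[j - 1]][gaps] * d2[[j]][gaps] / d2[[j - 1]][gaps]
}
d1
```

To impute only one year past the last observed column, as in the question, the upper bound of the loop can be set to last_non_na_col + 1 instead of ncol(d1). Values with no earlier observation in their row (such as the first year of grade 2 here) stay NA.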

Convert data frame to spatial lines data frame in R with x,y x,y coordinates

I have a data frame in R, one of the columns contains the coordinates for points along a line in the form:
x,y x,y x,y x,y
So the whole data frame looks like
id dist speed coord
1 45 6 1.294832,54.610240 -1.294883,54.610080 -1.294262,54.6482757
2 23 34 2.788732,34.787940 6.294883,24.567080 -5.564262,-45.7676757
I would like to convert this to a spatial lines data frame, and I assume that the first step would be to separate the coordinates into two columns in the form:
x, x, x, x
y, y, y, y
But I am unsure how to proceed.
EDIT
For those requesting it, here is a dput of the actual file:
> finalsub <- final[final$rid <3,]
> dput(finalsub)
structure(list(rid = c(1, 2), start_id = c(1L, 1L), start_code = c("E02002536",
"E02002536"), end_id = c(106L, 106L), end_code = c("E02006909",
"E02006909"), strategy = c("fastest", "quietest"), distance = c(12655L,
12909L), time_seconds = c(2921L, 3422L), calories = c(211L, 201L
), document.id = c(1L, 1L), array.index = 1:2, start = c("Geranium Close",
"Geranium Close"), finish = c("Hylton Road", "Hylton Road"),
startBearing = c(0, 0), startSpeed = c(0, 0), start_longitude = c(-1.294832,
-1.294832), start_latitude = c(54.610241, 54.610241), finish_longitude = c(-1.249478,
-1.249478), finish_latitude = c(54.680691, 54.680691), crow_fly_distance = c(8362,
8362), event = c("depart", "depart"), whence = c(1473171787,
1473171787), speed = c(20, 20), itinerary = c(419956, 419957
), clientRouteId = c(0, 0), plan = c("fastest", "quietest"
), note = c("", ""), length = c(12655, 12909), time = c(2921,
3422), busynance = c(42172, 17242), quietness = c(30, 75),
signalledJunctions = c(3, 4), signalledCrossings = c(2, 0
), west = c(-1.300074, -1.294883), south = c(54.610006, 54.609851
), east = c(-1.232447, -1.232447), north = c(54.683814, 54.683814
), name = c("Geranium Close to Hylton Road", "Geranium Close to Hylton Road"
), walk = c(0, 0), leaving = c("2016-09-06 15:23:07", "2016-09-06 15:23:07"
), arriving = c("2016-09-06 16:11:48", "2016-09-06 16:20:09"
), coordinates = c("-1.294832,54.610240 -1.294883,54.610080 -1.294262,54.610016 -1.294141,54.610006 -1.293710,54.610038 -1.293726,54.610142 -1.293742,54.610247 -1.293510,54.610262 -1.293368,54.610258 -1.292816,54.610195 -1.292489,54.610152 -1.292298,54.610667 -1.292205,54.610951 -1.292182,54.611063 -1.292183,54.611153 -1.292239,54.611341 -1.292305,54.611447 -1.292375,54.611534 -1.292494,54.611639 -1.292739,54.611830 -1.292909,54.611980 -1.293010,54.612107 -1.293111,54.612262 -1.293192,54.612423 -1.293235,54.612546 -1.293267,54.612684 -1.293279,54.612818 -1.293510,54.612813 -1.293732,54.612790 -1.294324,54.612691 -1.295086,54.612568 -1.295313,54.612539 -1.295379,54.612543 -1.295889,54.612645 -1.295945,54.612648 -1.296006,54.612642 -1.297154,54.612414 -1.297502,54.612895 -1.297733,54.612847 -1.297990,54.612796 -1.298292,54.612747 -1.298515,54.612727 -1.299088,54.612681 -1.299564,54.612669 -1.299798,54.612663 -1.300006,54.612660 -1.300057,54.612809 -1.300056,54.613335 -1.300071,54.613693 -1.300074,54.614044 -1.300042,54.614482 -1.300015,54.614786 -1.299947,54.615220 -1.299907,54.615394 -1.299854,54.615644 -1.299730,54.616048 -1.299495,54.616700 -1.299196,54.617347 -1.298236,54.619313 -1.298010,54.619762 -1.297703,54.620418 -1.297520,54.620831 -1.297169,54.621690 -1.297061,54.621981 -1.296416,54.623873 -1.296310,54.624308 -1.296225,54.624888 -1.296215,54.625286 -1.296220,54.625546 -1.296241,54.625803 -1.296268,54.625913 -1.296323,54.626011 -1.296397,54.626096 -1.296540,54.626190 -1.296719,54.626323 -1.296893,54.626433 -1.297042,54.626589 -1.297111,54.626710 -1.297122,54.626825 -1.297110,54.626948 -1.297058,54.627052 -1.296961,54.627172 -1.296861,54.627258 -1.296760,54.627325 -1.296603,54.627397 -1.296491,54.627438 -1.296338,54.627472 -1.296154,54.627496 -1.295966,54.627513 -1.295746,54.627526 -1.295618,54.627522 -1.295421,54.627510 -1.295197,54.627466 -1.295102,54.627436 -1.294832,54.627376 -1.294665,54.627355 -1.294502,54.627350 -1.294331,54.627366 -1.294014,54.627415 
-1.293557,54.627499 -1.293001,54.627611 -1.292613,54.627701 -1.291836,54.627902 -1.291248,54.628076 -1.290635,54.628269 -1.290142,54.628446 -1.289621,54.628649 -1.289031,54.628903 -1.288538,54.629137 -1.288132,54.629350 -1.287681,54.629604 -1.287262,54.629862 -1.286866,54.630124 -1.286103,54.630643 -1.285748,54.630868 -1.285396,54.631082 -1.284974,54.631322 -1.284541,54.631540 -1.284101,54.631752 -1.283390,54.632048 -1.282592,54.632385 -1.282161,54.632566 -1.281762,54.632734 -1.281111,54.633011 -1.280519,54.633274 -1.280009,54.633501 -1.279834,54.633579 -1.279392,54.633781 -1.278434,54.634210 -1.277896,54.634457 -1.276936,54.634898 -1.276210,54.635223 -1.275594,54.635485 -1.274596,54.635891 -1.273929,54.636180 -1.273392,54.636428 -1.272814,54.636726 -1.271627,54.637353 -1.271084,54.637605 -1.270550,54.637825 -1.269916,54.638054 -1.269389,54.638222 -1.268821,54.638383 -1.268165,54.638555 -1.265953,54.639131 -1.263766,54.639693 -1.262790,54.639953 -1.262005,54.640154 -1.261340,54.640329 -1.260658,54.640532 -1.260123,54.640712 -1.259489,54.640957 -1.258639,54.641312 -1.258010,54.641554 -1.257141,54.641861 -1.255996,54.642247 -1.254429,54.642787 -1.253524,54.643099 -1.252790,54.643343 -1.251636,54.643727 -1.250900,54.643982 -1.250258,54.644219 -1.249668,54.644419 -1.249123,54.644623 -1.248778,54.644762 -1.246709,54.645642 -1.244773,54.646492 -1.244140,54.646746 -1.243551,54.646973 -1.242738,54.647233 -1.242353,54.647360 -1.241810,54.647503 -1.241182,54.647642 -1.240373,54.647780 -1.239950,54.647854 -1.239961,54.647890 -1.239435,54.647967 -1.239583,54.649724 -1.239343,54.649878 -1.239011,54.650011 -1.237692,54.650177 -1.236610,54.650296 -1.236417,54.650323 -1.236257,54.650351 -1.236015,54.650414 -1.235833,54.650469 -1.235081,54.650723 -1.234846,54.650805 -1.234312,54.650977 -1.234094,54.651025 -1.233980,54.651044 -1.233308,54.651137 -1.233173,54.651160 -1.233063,54.651200 -1.232967,54.651252 -1.232849,54.651347 -1.232814,54.651423 -1.232810,54.651495 -1.232823,54.651569 
-1.232840,54.651893 -1.232827,54.652010 -1.232504,54.653177 -1.232447,54.653500 -1.232451,54.653723 -1.232474,54.653943 -1.232535,54.654301 -1.232590,54.654635 -1.232903,54.654627 -1.232948,54.655599 -1.232982,54.656334 -1.232795,54.656365 -1.233131,54.658020 -1.233336,54.658428 -1.233507,54.658699 -1.233592,54.658803 -1.234197,54.659389 -1.234690,54.659825 -1.234979,54.660119 -1.235153,54.660314 -1.235343,54.660572 -1.235566,54.661037 -1.235656,54.661355 -1.235690,54.661638 -1.235677,54.661902 -1.235677,54.661984 -1.235683,54.663215 -1.235656,54.663632 -1.235639,54.664273 -1.235613,54.664639 -1.235593,54.664822 -1.235566,54.664957 -1.235508,54.665351 -1.235197,54.667327 -1.235120,54.668542 -1.235100,54.668897 -1.235199,54.669535 -1.235358,54.670231 -1.235437,54.670482 -1.235756,54.671241 -1.236144,54.672181 -1.236375,54.672971 -1.236309,54.673562 -1.236286,54.673704 -1.236127,54.674365 -1.235918,54.675272 -1.235827,54.675620 -1.235749,54.675960 -1.235735,54.676152 -1.235740,54.676328 -1.235754,54.676598 -1.235770,54.676987 -1.235771,54.677013 -1.235793,54.677480 -1.235758,54.677760 -1.235607,54.678134 -1.235470,54.678420 -1.235167,54.678875 -1.234263,54.679929 -1.234207,54.680065 -1.234175,54.680201 -1.234204,54.680465 -1.234300,54.681119 -1.234362,54.681549 -1.234427,54.681771 -1.234560,54.682172 -1.234782,54.682824 -1.236530,54.682837 -1.236725,54.682829 -1.237133,54.682813 -1.238813,54.683143 -1.241021,54.683814 -1.241819,54.683771 -1.242854,54.683717 -1.242946,54.683718 -1.243082,54.683716 -1.244694,54.683772 -1.244658,54.683077 -1.245038,54.682805 -1.245047,54.681990 -1.245011,54.681238 -1.245220,54.680975 -1.247056,54.680601 -1.248019,54.680404 -1.249478,54.680691",
"-1.294832,54.610240 -1.294883,54.610080 -1.294262,54.610016 -1.294141,54.610006 -1.293710,54.610038 -1.293726,54.610142 -1.293742,54.610247 -1.293510,54.610262 -1.293368,54.610258 -1.292816,54.610195 -1.292489,54.610152 -1.292298,54.610667 -1.292167,54.610651 -1.291371,54.610562 -1.291240,54.610556 -1.291107,54.610564 -1.290983,54.610581 -1.290467,54.610665 -1.290253,54.610690 -1.290017,54.610689 -1.289770,54.610665 -1.289500,54.610620 -1.289281,54.610570 -1.289124,54.610514 -1.288957,54.610440 -1.288611,54.610277 -1.288420,54.610222 -1.287445,54.610110 -1.287259,54.610664 -1.286758,54.610611 -1.285446,54.610462 -1.285308,54.610459 -1.283356,54.610475 -1.283159,54.610475 -1.283156,54.610324 -1.283153,54.610119 -1.282818,54.610118 -1.282560,54.610114 -1.282110,54.610131 -1.281962,54.610153 -1.281788,54.610200 -1.281639,54.610257 -1.281298,54.609964 -1.281196,54.609851 -1.280586,54.610008 -1.280272,54.610054 -1.279816,54.610091 -1.279480,54.610104 -1.279112,54.610121 -1.278953,54.610146 -1.278815,54.610183 -1.278669,54.610225 -1.278524,54.610279 -1.278428,54.610326 -1.278327,54.610377 -1.278042,54.610237 -1.277946,54.610211 -1.277849,54.610204 -1.277454,54.610206 -1.277268,54.610211 -1.276621,54.610222 -1.276217,54.610233 -1.276085,54.610240 -1.275571,54.610315 -1.275426,54.610347 -1.275334,54.610373 -1.275248,54.610417 -1.275204,54.610477 -1.275066,54.610765 -1.274836,54.611248 -1.274811,54.611349 -1.274833,54.611414 -1.274925,54.611607 -1.274953,54.611664 -1.274669,54.611698 -1.272541,54.610433 -1.271290,54.610716 -1.270069,54.611677 -1.269365,54.611847 -1.268450,54.612165 -1.267142,54.612923 -1.266539,54.613386 -1.265920,54.614177 -1.265663,54.614259 -1.264195,54.616131 -1.263730,54.616670 -1.263665,54.616739 -1.263407,54.617051 -1.262407,54.618192 -1.262185,54.618424 -1.262077,54.618537 -1.261506,54.619136 -1.261394,54.619342 -1.261507,54.619520 -1.261799,54.620013 -1.261791,54.620138 -1.261695,54.620233 -1.261342,54.620279 -1.261237,54.620334 
-1.261175,54.620442 -1.261128,54.620493 -1.260857,54.620616 -1.260783,54.620697 -1.260729,54.620807 -1.260729,54.620942 -1.260677,54.621042 -1.260600,54.621109 -1.260457,54.621250 -1.260409,54.621298 -1.260364,54.621336 -1.260140,54.621409 -1.260052,54.621475 -1.259959,54.621607 -1.259881,54.621722 -1.259603,54.622326 -1.259445,54.622670 -1.259349,54.623096 -1.259359,54.623266 -1.259490,54.623825 -1.259497,54.623856 -1.259882,54.624563 -1.259894,54.624684 -1.259646,54.624993 -1.259529,54.625176 -1.259093,54.625999 -1.258939,54.626244 -1.258780,54.626429 -1.258157,54.626772 -1.257604,54.627106 -1.256140,54.627787 -1.255933,54.627903 -1.255874,54.627953 -1.255754,54.628092 -1.255576,54.628504 -1.255534,54.628645 -1.255601,54.629176 -1.255572,54.629415 -1.255265,54.630017 -1.255104,54.630209 -1.254200,54.630725 -1.254084,54.630831 -1.254037,54.630915 -1.254018,54.631129 -1.254128,54.631712 -1.254107,54.631961 -1.253979,54.632187 -1.253594,54.632613 -1.253530,54.632749 -1.253501,54.632882 -1.253404,54.633827 -1.253482,54.634447 -1.253483,54.634654 -1.253445,54.634902 -1.253389,54.635040 -1.253275,54.635186 -1.252555,54.635902 -1.252473,54.636063 -1.252330,54.636859 -1.252273,54.637015 -1.252176,54.637175 -1.251977,54.637365 -1.251763,54.637507 -1.251314,54.637755 -1.250800,54.638114 -1.250487,54.638358 -1.250153,54.638709 -1.249968,54.638840 -1.249562,54.639001 -1.248455,54.639365 -1.248124,54.639494 -1.247386,54.639821 -1.246797,54.640013 -1.246610,54.640098 -1.246513,54.640170 -1.246574,54.640279 -1.247014,54.640580 -1.247341,54.641088 -1.247564,54.641365 -1.248256,54.642661 -1.248319,54.642919 -1.247650,54.643048 -1.246829,54.643234 -1.245867,54.643306 -1.245202,54.643286 -1.243595,54.643087 -1.243282,54.643073 -1.242962,54.643076 -1.242041,54.643065 -1.241790,54.643031 -1.241369,54.642888 -1.240421,54.642544 -1.240237,54.642499 -1.240075,54.642497 -1.239926,54.642545 -1.239858,54.642627 -1.239837,54.642696 -1.239909,54.642989 -1.239960,54.643498 -1.239964,54.643532 
-1.239983,54.644175 -1.239936,54.644470 -1.239883,54.644807 -1.239842,54.645116 -1.239825,54.645489 -1.239748,54.646014 -1.239733,54.646299 -1.239798,54.646734 -1.239789,54.646896 -1.239705,54.647229 -1.239683,54.647331 -1.239608,54.647496 -1.239566,54.647621 -1.239562,54.647760 -1.239618,54.647903 -1.239421,54.647932 -1.239435,54.647967 -1.239583,54.649724 -1.239343,54.649878 -1.239011,54.650011 -1.237692,54.650177 -1.236610,54.650296 -1.236417,54.650323 -1.236257,54.650351 -1.236015,54.650414 -1.235833,54.650469 -1.235081,54.650723 -1.234846,54.650805 -1.234312,54.650977 -1.234094,54.651025 -1.233980,54.651044 -1.233308,54.651137 -1.233173,54.651160 -1.233063,54.651200 -1.232967,54.651252 -1.232849,54.651347 -1.232814,54.651423 -1.232810,54.651495 -1.232823,54.651569 -1.232840,54.651893 -1.232827,54.652010 -1.232504,54.653177 -1.232447,54.653500 -1.232451,54.653723 -1.232474,54.653943 -1.232535,54.654301 -1.232590,54.654635 -1.232619,54.654960 -1.232723,54.655606 -1.232795,54.656365 -1.233131,54.658020 -1.233336,54.658428 -1.233507,54.658699 -1.233592,54.658803 -1.234197,54.659389 -1.234690,54.659825 -1.234979,54.660119 -1.235153,54.660314 -1.235343,54.660572 -1.235566,54.661037 -1.235656,54.661355 -1.235690,54.661638 -1.235677,54.661902 -1.235677,54.661984 -1.235683,54.663215 -1.235656,54.663632 -1.235639,54.664273 -1.235613,54.664639 -1.235593,54.664822 -1.235566,54.664957 -1.235508,54.665351 -1.235197,54.667327 -1.235120,54.668542 -1.235100,54.668897 -1.235199,54.669535 -1.235358,54.670231 -1.235437,54.670482 -1.235756,54.671241 -1.236144,54.672181 -1.236375,54.672971 -1.236309,54.673562 -1.236286,54.673704 -1.236127,54.674365 -1.235918,54.675272 -1.235827,54.675620 -1.235749,54.675960 -1.235735,54.676152 -1.235740,54.676328 -1.235754,54.676598 -1.235770,54.676987 -1.235771,54.677013 -1.235793,54.677480 -1.235758,54.677760 -1.235607,54.678134 -1.235470,54.678420 -1.235167,54.678875 -1.234263,54.679929 -1.234207,54.680065 -1.234175,54.680201 -1.234204,54.680465 
-1.234300,54.681119 -1.234362,54.681549 -1.234427,54.681771 -1.234560,54.682172 -1.234782,54.682824 -1.236530,54.682837 -1.236725,54.682829 -1.237133,54.682813 -1.238813,54.683143 -1.241021,54.683814 -1.241819,54.683771 -1.242854,54.683717 -1.242946,54.683718 -1.243082,54.683716 -1.244694,54.683772 -1.244658,54.683077 -1.245038,54.682805 -1.245047,54.681990 -1.245011,54.681238 -1.245220,54.680975 -1.247056,54.680601 -1.248019,54.680404 -1.249478,54.680691"
), grammesCO2saved = c(2359, 2406), calories = c(211, 201
), type = c("route", "route")), .Names = c("rid", "start_id",
"start_code", "end_id", "end_code", "strategy", "distance", "time_seconds",
"calories", "document.id", "array.index", "start", "finish",
"startBearing", "startSpeed", "start_longitude", "start_latitude",
"finish_longitude", "finish_latitude", "crow_fly_distance", "event",
"whence", "speed", "itinerary", "clientRouteId", "plan", "note",
"length", "time", "busynance", "quietness", "signalledJunctions",
"signalledCrossings", "west", "south", "east", "north", "name",
"walk", "leaving", "arriving", "coordinates", "grammesCO2saved",
"calories", "type"), row.names = 1:2, class = "data.frame")
I believe you want a column in your data frame that holds, for each row, a list (or data frame) with x.coord and y.coord columns. To achieve that, we can use unnest and nest from tidyr together with dplyr:
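library(dplyr)
library(tidyr)
result <- finalsub %>%
  # split each coordinates string on spaces into a list of "x,y" pairs
  mutate(coordinates = strsplit(coordinates, split = " ", fixed = TRUE)) %>%
  # one row per "x,y" pair
  unnest(coordinates) %>%
  # split each pair on the comma and convert both parts to numeric
  mutate(coordinates = strsplit(coordinates, split = ",", fixed = TRUE),
         x.coord = as.numeric(unlist(coordinates)[c(TRUE, FALSE)]),
         y.coord = as.numeric(unlist(coordinates)[c(FALSE, TRUE)])) %>%
  select(-coordinates) %>%
  # collect x.coord and y.coord into a list column named coordinates
  # (tidyr 1.0.0 and later prefer nest(coordinates = c(x.coord, y.coord)))
  nest(x.coord, y.coord, .key = coordinates)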
Notes:
The first mutate splits the character vector in your coordinates column by " ", which turns each entry into a list of x,y pairs.
unnest separates this list into rows.
In the second mutate, we split each x,y pair by "," to separate x from y, then create x.coord and y.coord columns to hold them. Note the conversion to numeric here.
Finally, we use nest to collect the x.coord and y.coord columns as a list under the column named coordinates. Note that we first have to remove the original coordinates column.
The result using your dput data, printing only the coordinates column:
print(result$coordinates)
##[[1]]
### A tibble: 284 x 2
## x.coord y.coord
## <dbl> <dbl>
##1 -1.294832 54.61024
##2 -1.294883 54.61008
##3 -1.294262 54.61002
##4 -1.294141 54.61001
##5 -1.293710 54.61004
##6 -1.293726 54.61014
##7 -1.293742 54.61025
##8 -1.293510 54.61026
##9 -1.293368 54.61026
##10 -1.292816 54.61019
### ... with 274 more rows
##
##[[2]]
### A tibble: 322 x 2
## x.coord y.coord
## <dbl> <dbl>
##1 -1.294832 54.61024
##2 -1.294883 54.61008
##3 -1.294262 54.61002
##4 -1.294141 54.61001
##5 -1.293710 54.61004
##6 -1.293726 54.61014
##7 -1.293742 54.61025
##8 -1.293510 54.61026
##9 -1.293368 54.61026
##10 -1.292816 54.61019
### ... with 312 more rows
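For comparison, the same split-and-collect steps can be sketched in base R without tidyr. This is only an illustration under stated assumptions: the routes data frame and its values are made up here and stand in for finalsub, which has many more columns.

```r
# Toy stand-in for finalsub (hypothetical data): one space-separated
# string of "x,y" pairs per route.
routes <- data.frame(
  rid = c(1, 2),
  coordinates = c("-1.29,54.61 -1.30,54.62",
                  "-1.24,54.68 -1.25,54.69 -1.26,54.70"),
  stringsAsFactors = FALSE
)

# Step 1: split each string on spaces -> list of character vectors of pairs.
pairs <- strsplit(routes$coordinates, split = " ", fixed = TRUE)

# Step 2: split each pair on the comma and build one small data frame per
# route, converting both parts to numeric.
routes$coordinates <- lapply(pairs, function(p) {
  xy <- do.call(rbind, strsplit(p, split = ",", fixed = TRUE))
  data.frame(x.coord = as.numeric(xy[, 1]), y.coord = as.numeric(xy[, 2]))
})

# Each element of the list column is now a data frame of coordinates.
routes$coordinates[[2]]
```

Unlike the pipeline above, this never expands the data to one row per point, so the other columns are left untouched; the routes may have different numbers of points.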
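# Split each route string on spaces; rbind gives one column (X1, X2, ...) per point.
# This assumes every row has the same number of points.
df1 <- data.frame(id = c(1, 2), dist = c(45, 23), speed = c(6, 24),
                  do.call(rbind, strsplit(df$cord, split = " ")))
library(reshape2)
# Reshape to long format: one row per (id, point).
df1 <- melt(df1, id = c("id", "dist", "speed"))
# Split each "x,y" pair; as.character guards against factor columns (R < 4.0).
df2 <- data.frame(do.call(rbind, strsplit(as.character(df1$value), split = ",")))
df1$value <- NULL
df1 <- cbind(df1, df2)
names(df1)[5:6] <- c("x", "y")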
  id dist speed variable         x           y
1  1   45     6       X1  1.294832   54.610240
2  2   23    24       X1  2.788732   34.787940
3  1   45     6       X2 -1.294883   54.610080
4  2   23    24       X2  6.294883   24.567080
5  1   45     6       X3 -1.294262  54.6482757
6  2   23    24       X3 -5.564262 -45.7676757
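The reshape2 approach relies on every row having the same number of points, because do.call(rbind, ...) needs vectors of equal length. A minimal base R sketch that reaches a similar long layout without that requirement might look like this; the df below is a hypothetical input with a cord column, and the point and long names are introduced only for illustration.

```r
# Hypothetical input: one space-separated string of "x,y" pairs per row.
df <- data.frame(
  id = c(1, 2), dist = c(45, 23), speed = c(6, 24),
  cord = c("1.294832,54.610240 -1.294883,54.610080 -1.294262,54.648275",
           "2.788732,34.787940 6.294883,24.567080"),
  stringsAsFactors = FALSE
)

# Split into pairs and count how many points each row contributes.
pairs <- strsplit(df$cord, split = " ", fixed = TRUE)
n <- lengths(pairs)

# Split every pair on the comma -> two-column character matrix.
xy <- do.call(rbind, strsplit(unlist(pairs), split = ",", fixed = TRUE))

# Repeat the id/dist/speed values once per point and attach numeric x and y.
long <- data.frame(
  df[rep(seq_len(nrow(df)), times = n), c("id", "dist", "speed")],
  point = sequence(n),
  x = as.numeric(xy[, 1]),
  y = as.numeric(xy[, 2]),
  row.names = NULL
)
long
```

Here x and y come out numeric, whereas the cbind of the split strings above leaves them as character (or factor) columns.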
