R GGplot2 Stacked Columns Chart - r
I am trying to do a Stacked Columns Chart in R. Sorry but I am learning thats why i need help
This is how i have the data
structure(list(Category = structure(c(2L, 3L, 4L, 1L), .Label = c("MLC1000",
"MLC1051", "MLC1648", "MLC5726"), class = "factor"), Minutes = c(2751698L,
2478850L, 556802L, 2892097L), Items = c(684L, 607L, 135L, 711L
), Visits = c(130293L, 65282L, 25484L, 81216L), Sold = c(2625L,
1093L, 681L, 1802L)), .Names = c("Category", "Minutes", "Items",
"Visits", "Sold"), class = "data.frame", row.names = c(NA, -4L)
)
And i want to create this graphic
I think there are two pretty basic principles that you should apply to make this problem easier to handle. First, you should make your data tidy. Second, you shouldn't leave ggplot to do your calculations for you.
library(tidyverse)
a <- data_frame(
category = letters[1:4],
minutes = c(2751698, 2478850, 556802, 2892097),
visits = c(130293, 65282, 25484, 81216),
sold = c(2625, 1093, 681, 1802)
) %>%
gather(variable, value, -category) %>% # make tidy
group_by(variable) %>%
mutate(weight = value / sum(value)) # calculate weight variable
## Source: local data frame [12 x 4]
## Groups: variable [3]
## category variable value weight
## <chr> <chr> <dbl> <dbl>
## 1 a minutes 2751698 0.31703610
## 2 b minutes 2478850 0.28559999
## 3 c minutes 556802 0.06415178
## 4 d minutes 2892097 0.33321213
## 5 a visits 130293 0.43104127
## 6 b visits 65282 0.21596890
## 7 c visits 25484 0.08430734
## 8 d visits 81216 0.26868249
## 9 a sold 2625 0.42331882
## 10 b sold 1093 0.17626189
## 11 c sold 681 0.10982100
## 12 d sold 1802 0.29059829
I don't know what was up with your structure(), but I couldn't build a data frame from it without crashing my R session.
Once we get the data into this format, the ggplot2 call is actually really easy:
ggplot(a, aes(x = variable, weight = weight * 100, fill = category)) +
geom_bar()
Related
Low to high frequency conversion in panel data in R using tempdisagg
I have daily panel data with four variables: date, cusip(id identifier), PD (probability of default), and price. PD is only available on a quarterly basis for the first day of January, April, July, and October. I want to generate daily data for PD using Chow-Lin frequency conversion from tempdisagg package. I know how to apply td() function on time series, but I didn't find examples with panel data frames. Here are my code and sample data using reproduce() from devtools package, so only few sample days are included instead of full quarter. Running td() reports an error: Error in td(PD ~ price, conversion = "first", method = "chow-lin-fixed", fixed.rho = 0.5) : In numeric mode, 'to' must be an integer number. I know that both price and PD are high-frequency daily indicators in mydata, so I guess I need to use to.quarterly() function on PDor something similar. library(dplyr) library(zoo) library(tempdisagg) library(tsbox) mydata <- structure(list(date = structure(c(13516, 13516, 13517, 13517,13518, 13518, 13521, 13605, 13605, 13606), class = "Date"), cusip = c("31677310","66585910", "31677310", "66585910", "31677310", "66585910", "31677310","66585910", "31677310", "66585910"), PD = c(0.076891, 0.096,NA, NA, NA, NA, NA, 0.094341, 0.08867, NA), price = c(40.98, 61.31,40.99, 60.77, 40.18, 59.97, 39.92, 59.96, 38.6, 60.69)), row.names = c(6L,13L, 36L, 43L, 66L, 73L, 96L, 1843L, 1866L, 1873L), class = "data.frame") mydata <- mydata%>% group_by(cusip) %>% arrange(cusip,date) %>% mutate(PDdaily = td(PD ~ price, conversion = "first",method = "chow-lin-fixed", fixed.rho = 0.5))
Your example is not sufficient. For each disaggregation, we need at least 3 low frequency values to be able to perform a regression. Here is an alternative example, with 3 pairs of low and high frequency series: library(tidyverse) library(tempdisagg) library(tsbox) mydata <- ts_c( low_freq = ts_frequency(fdeaths, "year"), high_freq = mdeaths ) %>% ts_tbl() %>% ts_wide() %>% crossing(id = 1:3) %>% arrange(id) Applying td multiple times on data in a data frame will be cumbersome. It is easier to extract the data into two lists, one with the low and one with high frequency series: list_lf <- group_split(ts_na_omit(select(mydata, time, value = low_freq, id)), id, keep = FALSE) list_hf <- group_split(select(mydata, time, value = high_freq, id), id, keep = FALSE) Now you can use Map() or map2() to apply the function to each pair of elements: ans <- map2(list_lf, list_hf, ~ predict(td(.x ~ .y))) Transforming the disaggregated data back to a data frame: bind_rows(ans, .id = "id") #> # A tibble: 216 x 3 #> id time value #> <chr> <date> <dbl> #> 1 1 1974-01-01 59.2 #> 2 1 1974-02-01 54.2 #> 3 1 1974-03-01 54.4 #> 4 1 1974-04-01 54.4 #> 5 1 1974-05-01 47.3 #> 6 1 1974-06-01 42.8 #> 7 1 1974-07-01 43.3 #> 8 1 1974-08-01 40.6 #> 9 1 1974-09-01 42.0 #> 10 1 1974-10-01 47.3 #> # … with 206 more rows Created on 2020-06-03 by the reprex package (v0.3.0)
How to calculate elapsed times for the total duration of events?
I have collected a dataframe that models the duration of time for events in a group problem solving session in which the members Communicate (Discourse Code) and construct models (Modeling Code). Each minute that that occurs is captured in the Time_Processed column. Technically these events occur simultaneously. I would like to know how long the students are constructing each type of model which is the total duration of that model or the time elapsed before that model changes. I have the following dataset: Looks like this: `Modeling Code` `Discourse Code` Time_Processed <fct> <fct> <dbl> 1 OFF OFF 10.0 2 MA Q 11.0 3 MA AG 16.0 4 V S 18.0 5 V Q 20.0 6 MA C 21.0 7 MA C 23.0 8 MA C 25.0 9 V J 26.0 10 P S 28.0 # My explicit dataframe. df <- structure(list(`Modeling Code` = structure(c(3L, 2L, 2L, 6L, 6L, 2L, 2L, 2L, 6L, 4L), .Label = c("A", "MA", "OFF", "P", "SM", "V"), class = "factor"), `Discourse Code` = structure(c(7L, 8L, 1L, 9L, 8L, 2L, 2L, 2L, 6L, 9L), .Label = c("AG", "C", "D", "DA", "G", "J", "OFF", "Q", "S"), class = "factor"), Time_Processed = c(10, 11, 16, 18, 20, 21, 23, 25, 26, 28)), row.names = c(NA, -10L), .Names = c("Modeling Code", "Discourse Code", "Time_Processed"), class = c("tbl_df", "tbl", "data.frame")) For this dataframe I can find how often the students were constructing each type of model logically like this. With Respect to the Modeling Code and Time_Processed columns, At 10 minutes they are using the OFF model method, then at 11 minutes, they change the model so the duration of the OFF model is (11 - 10) minutes = 1 minute. There are no other occurrences of the "OFF" method so the duration of OFF = 1 min. Likewise, for Modeling Code method "MA", the model is used from 11 minutes to 16 minutes (duration = 5 minutes) and then from 16 minutes to 18 minutes before the model changes to V with (duration = 2 minutes), then the model is used again at 21 minutes and ends at 26 minutes (duration = 5 minutes). So the total duration of "MA" is (5 + 2 + 5) minutes = 12 minutes. Likewise the duration of Modeling Code method "V" starts at 18 minutes, ends at 21 minutes (duration = 3 minutes), resumes at 26 minutes, ends at 28 minutes (duration = 2) minutes. So total duration of "V" is 3 + 2 = 5 minutes. Then the duration of Modeling Code P, starts at 28 minutes and there is no continuity so total duration of P is 0 minutes. So the total duration (minutes) table of the Modeling Codes is this: Modeling Code Total_Duration OFF 1 MA 12 V 5 P 0 This models a barchart that looks like this: How can the total duration of these modeling methods be constructed? It would also be nice to know the duration of the combinations such that the only visible combination in this small subset happens to be Modeling Code "MA" paired with Discourse Code "C" and this occurs for 26 - 21 = 5 minutes. Thank you.
UPDATED SOLUTION df %>% mutate(dur = lead(Time_Processed) - Time_Processed) %>% replace_na(list(dur = 0)) %>% group_by(`Modeling Code`) %>% summarise(tot_time = sum(dur)) (^ Thanks to Nick DiQuattro) PREVIOUS SOLUTION Here's one solution that creates a new variable, mcode_grp, which keeps track of discrete groupings of the same Modeling Code. It's not particularly pretty - it requires looping over each row in df - but it works. First, rename columns for ease of reference: df <- df %>% rename(m_code = `Modeling Code`, d_code = `Discourse Code`) We'll update df with a few extra variables. - lead_time_proc gives us the Time_Processed value for the next row in df, which we'll need when computing the total amount of time for each m_code batch - row_n for keeping track of row number in our iteration - mcode_grp is the unique label for each m_code batch df <- df %>% mutate(lead_time_proc = lead(Time_Processed), row_n = row_number(), mcode_grp = "") Next, we need a way to keep track of when we've hit a new batch of a given m_code value. One way is to keep a counter for each m_code, and increment it whenever a new batch is reached. Then we can label all the rows for that m_code batch as belonging to the same time window. mcode_ct <- df %>% group_by(m_code) %>% summarise(ct = 0) %>% mutate(m_code = as.character(m_code)) This is the ugliest part. We loop over every row in df, and check to see if we've reached a new m_code. If so, we update accordingly, and register a value for mcode_grp for each row. mc <- "" for (i in 1:nrow(df)) { current_mc <- df$m_code[i] if (current_mc != mc) { mc <- current_mc mcode_ct <- mcode_ct %>% mutate(ct = ifelse(m_code == mc, ct + 1, ct)) current_grp <- mcode_ct %>% filter(m_code == mc) %>% select(ct) %>% pull() } df <- df %>% mutate(mcode_grp = ifelse(row_n == i, current_grp, mcode_grp)) } Finally, group_by m_code and mcode_grp, compute the duration for each batch, and then sum over m_code values. df %>% group_by(m_code, mcode_grp) %>% summarise(start_time = min(Time_Processed), end_time = max(lead_time_proc)) %>% mutate(total_time = end_time - start_time) %>% group_by(m_code) %>% summarise(total_time = sum(total_time)) %>% replace_na(list(total_time=0)) Output: # A tibble: 4 x 2 m_code total_time <fct> <dbl> 1 MA 12. 2 OFF 1. 3 P 0. 4 V 5. For any dplyr/tidyverse experts out there, I'd love tips on how to accomplish more of this without resorting to loops and counters!
R: Multiple bar plots of mean value vs. month vs. genre [closed]
Closed. This question needs debugging details. It is not currently accepting answers. Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question. Closed 6 years ago. Improve this question I have the following data-frame, where variable is 10 different genre categories of movies, eg. drama, comedy etc. > head(grossGenreMonthLong) Gross ReleasedMonth variable value 5 33508485 2 drama 1 6 67192859 2 drama 1 8 37865 4 drama 1 9 76665507 1 drama 1 10 221594911 2 drama 1 12 446438 2 drama 1 Reproducible dataframe: dput(head(grossGenreMonthLong)) structure(list(Gross = c(33508485, 67192859, 37865, 76665507, 221594911, 446438), ReleasedMonth = c(2, 2, 4, 1, 2, 2), variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("drama", "comedy", "short", "romance", "action", "crime", "thriller", "documentary", "adventure", "animation" ), class = "factor"), value = c(1, 1, 1, 1, 1, 1)), .Names = c("Gross", "ReleasedMonth", "variable", "value"), row.names = c(5L, 6L, 8L, 9L, 10L, 12L), class = "data.frame") I would like to calculate the mean gross vs. month for each of the 10 genres and plot them in separate bar charts using facets (varying by genre). In other words, what's a quick way to plot 10 bar charts of mean gross vs. month for each of the 10 genres?
You should provide a reproducible example to make it easier for us to help you. dput(my.dataframe) is one way to do it, or you can generate an example dataframe like below. Since you haven't given us a reproducible example, I'm going to put on my telepathy hat and assume the "variable" column in your screenshot is the genre. n = 100 movies <- data.frame( genre=sample(letters[1:10], n, replace=T), gross=runif(n, min=1, max=1e7), month=sample(12, n, replace=T) ) head(movies) # genre gross month # 1 e 5545765.4 1 # 2 f 3240897.3 3 # 3 f 1438741.9 5 # 4 h 9101261.0 6 # 5 h 926170.8 7 # 6 f 2750921.9 1 (My genres are 'a', 'b', etc). To do a plot of average gross per month, you will need to calculate average gross per month. One such way to do so is using the plyr package (there is also data.table, dplyr, ...) library(plyr) monthly.avg.gross <- ddply(movies, # the input dataframe .(genre, month), # group by these summarize, avgGross=mean(gross)) # do this. The dataframe monthly.avg.gross now has one row per (month, genre) with a column avgGross that has the average gross in that (month, genre). Now it's a matter of plotting. You have hinted at "facet" so I assume you're using ggplot. library(ggplot2) ggplot(monthly.avg.gross, aes(x=month, y=avgGross)) + geom_point() + facet_wrap(~ genre) You can do stuff like add month labels and treat month as a factor instead of a number like here, but that's peripheral to your question.
Thank you very much #mathematical.coffee. I was able to adapt your answer to produce the appropriate bar charts. meanGrossGenreMonth = ddply(grossGenreMonthLong, .(ReleasedMonth, variable), summarise, mean.Gross = mean(Gross, na.rm = TRUE)) # plot bar plots with facets ggplot(meanGrossGenreMonth, aes(x = factor(ReleasedMonth), y=mean.Gross)) + geom_bar(stat = "identity") + facet_wrap(~ variable) +ylab("mean Gross ($)") + xlab("Month") +ggtitle("Mean gross revenue vs. month released by Genre")
Convert data frame to spatial lines data frame in R with x,y x,y coordintates
I have a data frame in R, one of the columns contains the coordinates for points along a line in the form: x,y x,y x,y x,y So the whole data frame looks like id dist speed coord 1 45 6 1.294832,54.610240 -1.294883,54.610080 -1.294262,54.6482757 2 23 34 2.788732,34.787940 6.294883,24.567080 -5.564262,-45.7676757 I would like to convert this to a spatial lines data frame, and I assume that the fist step would be to separate the coordinates into two columns in the from: x, x, x, x y, y, y, y But I am unsure how to proceed. EDIT For those requesting it a dput of the actual file > finalsub <- final[final$rid <3,] > dput(finalsub) structure(list(rid = c(1, 2), start_id = c(1L, 1L), start_code = c("E02002536", "E02002536"), end_id = c(106L, 106L), end_code = c("E02006909", "E02006909"), strategy = c("fastest", "quietest"), distance = c(12655L, 12909L), time_seconds = c(2921L, 3422L), calories = c(211L, 201L ), document.id = c(1L, 1L), array.index = 1:2, start = c("Geranium Close", "Geranium Close"), finish = c("Hylton Road", "Hylton Road"), startBearing = c(0, 0), startSpeed = c(0, 0), start_longitude = c(-1.294832, -1.294832), start_latitude = c(54.610241, 54.610241), finish_longitude = c(-1.249478, -1.249478), finish_latitude = c(54.680691, 54.680691), crow_fly_distance = c(8362, 8362), event = c("depart", "depart"), whence = c(1473171787, 1473171787), speed = c(20, 20), itinerary = c(419956, 419957 ), clientRouteId = c(0, 0), plan = c("fastest", "quietest" ), note = c("", ""), length = c(12655, 12909), time = c(2921, 3422), busynance = c(42172, 17242), quietness = c(30, 75), signalledJunctions = c(3, 4), signalledCrossings = c(2, 0 ), west = c(-1.300074, -1.294883), south = c(54.610006, 54.609851 ), east = c(-1.232447, -1.232447), north = c(54.683814, 54.683814 ), name = c("Geranium Close to Hylton Road", "Geranium Close to Hylton Road" ), walk = c(0, 0), leaving = c("2016-09-06 15:23:07", "2016-09-06 15:23:07" ), arriving = c("2016-09-06 16:11:48", "2016-09-06 16:20:09" ), coordinates = c("-1.294832,54.610240 -1.294883,54.610080 -1.294262,54.610016 -1.294141,54.610006 -1.293710,54.610038 -1.293726,54.610142 -1.293742,54.610247 -1.293510,54.610262 -1.293368,54.610258 -1.292816,54.610195 -1.292489,54.610152 -1.292298,54.610667 -1.292205,54.610951 -1.292182,54.611063 -1.292183,54.611153 -1.292239,54.611341 -1.292305,54.611447 -1.292375,54.611534 -1.292494,54.611639 -1.292739,54.611830 -1.292909,54.611980 -1.293010,54.612107 -1.293111,54.612262 -1.293192,54.612423 -1.293235,54.612546 -1.293267,54.612684 -1.293279,54.612818 -1.293510,54.612813 -1.293732,54.612790 -1.294324,54.612691 -1.295086,54.612568 -1.295313,54.612539 -1.295379,54.612543 -1.295889,54.612645 -1.295945,54.612648 -1.296006,54.612642 -1.297154,54.612414 -1.297502,54.612895 -1.297733,54.612847 -1.297990,54.612796 -1.298292,54.612747 -1.298515,54.612727 -1.299088,54.612681 -1.299564,54.612669 -1.299798,54.612663 -1.300006,54.612660 -1.300057,54.612809 -1.300056,54.613335 -1.300071,54.613693 -1.300074,54.614044 -1.300042,54.614482 -1.300015,54.614786 -1.299947,54.615220 -1.299907,54.615394 -1.299854,54.615644 -1.299730,54.616048 -1.299495,54.616700 -1.299196,54.617347 -1.298236,54.619313 -1.298010,54.619762 -1.297703,54.620418 -1.297520,54.620831 -1.297169,54.621690 -1.297061,54.621981 -1.296416,54.623873 -1.296310,54.624308 -1.296225,54.624888 -1.296215,54.625286 -1.296220,54.625546 -1.296241,54.625803 -1.296268,54.625913 -1.296323,54.626011 -1.296397,54.626096 -1.296540,54.626190 -1.296719,54.626323 -1.296893,54.626433 -1.297042,54.626589 -1.297111,54.626710 -1.297122,54.626825 -1.297110,54.626948 -1.297058,54.627052 -1.296961,54.627172 -1.296861,54.627258 -1.296760,54.627325 -1.296603,54.627397 -1.296491,54.627438 -1.296338,54.627472 -1.296154,54.627496 -1.295966,54.627513 -1.295746,54.627526 -1.295618,54.627522 -1.295421,54.627510 -1.295197,54.627466 -1.295102,54.627436 -1.294832,54.627376 -1.294665,54.627355 -1.294502,54.627350 -1.294331,54.627366 -1.294014,54.627415 -1.293557,54.627499 -1.293001,54.627611 -1.292613,54.627701 -1.291836,54.627902 -1.291248,54.628076 -1.290635,54.628269 -1.290142,54.628446 -1.289621,54.628649 -1.289031,54.628903 -1.288538,54.629137 -1.288132,54.629350 -1.287681,54.629604 -1.287262,54.629862 -1.286866,54.630124 -1.286103,54.630643 -1.285748,54.630868 -1.285396,54.631082 -1.284974,54.631322 -1.284541,54.631540 -1.284101,54.631752 -1.283390,54.632048 -1.282592,54.632385 -1.282161,54.632566 -1.281762,54.632734 -1.281111,54.633011 -1.280519,54.633274 -1.280009,54.633501 -1.279834,54.633579 -1.279392,54.633781 -1.278434,54.634210 -1.277896,54.634457 -1.276936,54.634898 -1.276210,54.635223 -1.275594,54.635485 -1.274596,54.635891 -1.273929,54.636180 -1.273392,54.636428 -1.272814,54.636726 -1.271627,54.637353 -1.271084,54.637605 -1.270550,54.637825 -1.269916,54.638054 -1.269389,54.638222 -1.268821,54.638383 -1.268165,54.638555 -1.265953,54.639131 -1.263766,54.639693 -1.262790,54.639953 -1.262005,54.640154 -1.261340,54.640329 -1.260658,54.640532 -1.260123,54.640712 -1.259489,54.640957 -1.258639,54.641312 -1.258010,54.641554 -1.257141,54.641861 -1.255996,54.642247 -1.254429,54.642787 -1.253524,54.643099 -1.252790,54.643343 -1.251636,54.643727 -1.250900,54.643982 -1.250258,54.644219 -1.249668,54.644419 -1.249123,54.644623 -1.248778,54.644762 -1.246709,54.645642 -1.244773,54.646492 -1.244140,54.646746 -1.243551,54.646973 -1.242738,54.647233 -1.242353,54.647360 -1.241810,54.647503 -1.241182,54.647642 -1.240373,54.647780 -1.239950,54.647854 -1.239961,54.647890 -1.239435,54.647967 -1.239583,54.649724 -1.239343,54.649878 -1.239011,54.650011 -1.237692,54.650177 -1.236610,54.650296 -1.236417,54.650323 -1.236257,54.650351 -1.236015,54.650414 -1.235833,54.650469 -1.235081,54.650723 -1.234846,54.650805 -1.234312,54.650977 -1.234094,54.651025 -1.233980,54.651044 -1.233308,54.651137 -1.233173,54.651160 -1.233063,54.651200 -1.232967,54.651252 -1.232849,54.651347 -1.232814,54.651423 -1.232810,54.651495 -1.232823,54.651569 -1.232840,54.651893 -1.232827,54.652010 -1.232504,54.653177 -1.232447,54.653500 -1.232451,54.653723 -1.232474,54.653943 -1.232535,54.654301 -1.232590,54.654635 -1.232903,54.654627 -1.232948,54.655599 -1.232982,54.656334 -1.232795,54.656365 -1.233131,54.658020 -1.233336,54.658428 -1.233507,54.658699 -1.233592,54.658803 -1.234197,54.659389 -1.234690,54.659825 -1.234979,54.660119 -1.235153,54.660314 -1.235343,54.660572 -1.235566,54.661037 -1.235656,54.661355 -1.235690,54.661638 -1.235677,54.661902 -1.235677,54.661984 -1.235683,54.663215 -1.235656,54.663632 -1.235639,54.664273 -1.235613,54.664639 -1.235593,54.664822 -1.235566,54.664957 -1.235508,54.665351 -1.235197,54.667327 -1.235120,54.668542 -1.235100,54.668897 -1.235199,54.669535 -1.235358,54.670231 -1.235437,54.670482 -1.235756,54.671241 -1.236144,54.672181 -1.236375,54.672971 -1.236309,54.673562 -1.236286,54.673704 -1.236127,54.674365 -1.235918,54.675272 -1.235827,54.675620 -1.235749,54.675960 -1.235735,54.676152 -1.235740,54.676328 -1.235754,54.676598 -1.235770,54.676987 -1.235771,54.677013 -1.235793,54.677480 -1.235758,54.677760 -1.235607,54.678134 -1.235470,54.678420 -1.235167,54.678875 -1.234263,54.679929 -1.234207,54.680065 -1.234175,54.680201 -1.234204,54.680465 -1.234300,54.681119 -1.234362,54.681549 -1.234427,54.681771 -1.234560,54.682172 -1.234782,54.682824 -1.236530,54.682837 -1.236725,54.682829 -1.237133,54.682813 -1.238813,54.683143 -1.241021,54.683814 -1.241819,54.683771 -1.242854,54.683717 -1.242946,54.683718 -1.243082,54.683716 -1.244694,54.683772 -1.244658,54.683077 -1.245038,54.682805 -1.245047,54.681990 -1.245011,54.681238 -1.245220,54.680975 -1.247056,54.680601 -1.248019,54.680404 -1.249478,54.680691", "-1.294832,54.610240 -1.294883,54.610080 -1.294262,54.610016 -1.294141,54.610006 -1.293710,54.610038 -1.293726,54.610142 -1.293742,54.610247 -1.293510,54.610262 -1.293368,54.610258 -1.292816,54.610195 -1.292489,54.610152 -1.292298,54.610667 -1.292167,54.610651 -1.291371,54.610562 -1.291240,54.610556 -1.291107,54.610564 -1.290983,54.610581 -1.290467,54.610665 -1.290253,54.610690 -1.290017,54.610689 -1.289770,54.610665 -1.289500,54.610620 -1.289281,54.610570 -1.289124,54.610514 -1.288957,54.610440 -1.288611,54.610277 -1.288420,54.610222 -1.287445,54.610110 -1.287259,54.610664 -1.286758,54.610611 -1.285446,54.610462 -1.285308,54.610459 -1.283356,54.610475 -1.283159,54.610475 -1.283156,54.610324 -1.283153,54.610119 -1.282818,54.610118 -1.282560,54.610114 -1.282110,54.610131 -1.281962,54.610153 -1.281788,54.610200 -1.281639,54.610257 -1.281298,54.609964 -1.281196,54.609851 -1.280586,54.610008 -1.280272,54.610054 -1.279816,54.610091 -1.279480,54.610104 -1.279112,54.610121 -1.278953,54.610146 -1.278815,54.610183 -1.278669,54.610225 -1.278524,54.610279 -1.278428,54.610326 -1.278327,54.610377 -1.278042,54.610237 -1.277946,54.610211 -1.277849,54.610204 -1.277454,54.610206 -1.277268,54.610211 -1.276621,54.610222 -1.276217,54.610233 -1.276085,54.610240 -1.275571,54.610315 -1.275426,54.610347 -1.275334,54.610373 -1.275248,54.610417 -1.275204,54.610477 -1.275066,54.610765 -1.274836,54.611248 -1.274811,54.611349 -1.274833,54.611414 -1.274925,54.611607 -1.274953,54.611664 -1.274669,54.611698 -1.272541,54.610433 -1.271290,54.610716 -1.270069,54.611677 -1.269365,54.611847 -1.268450,54.612165 -1.267142,54.612923 -1.266539,54.613386 -1.265920,54.614177 -1.265663,54.614259 -1.264195,54.616131 -1.263730,54.616670 -1.263665,54.616739 -1.263407,54.617051 -1.262407,54.618192 -1.262185,54.618424 -1.262077,54.618537 -1.261506,54.619136 -1.261394,54.619342 -1.261507,54.619520 -1.261799,54.620013 -1.261791,54.620138 -1.261695,54.620233 -1.261342,54.620279 -1.261237,54.620334 -1.261175,54.620442 -1.261128,54.620493 -1.260857,54.620616 -1.260783,54.620697 -1.260729,54.620807 -1.260729,54.620942 -1.260677,54.621042 -1.260600,54.621109 -1.260457,54.621250 -1.260409,54.621298 -1.260364,54.621336 -1.260140,54.621409 -1.260052,54.621475 -1.259959,54.621607 -1.259881,54.621722 -1.259603,54.622326 -1.259445,54.622670 -1.259349,54.623096 -1.259359,54.623266 -1.259490,54.623825 -1.259497,54.623856 -1.259882,54.624563 -1.259894,54.624684 -1.259646,54.624993 -1.259529,54.625176 -1.259093,54.625999 -1.258939,54.626244 -1.258780,54.626429 -1.258157,54.626772 -1.257604,54.627106 -1.256140,54.627787 -1.255933,54.627903 -1.255874,54.627953 -1.255754,54.628092 -1.255576,54.628504 -1.255534,54.628645 -1.255601,54.629176 -1.255572,54.629415 -1.255265,54.630017 -1.255104,54.630209 -1.254200,54.630725 -1.254084,54.630831 -1.254037,54.630915 -1.254018,54.631129 -1.254128,54.631712 -1.254107,54.631961 -1.253979,54.632187 -1.253594,54.632613 -1.253530,54.632749 -1.253501,54.632882 -1.253404,54.633827 -1.253482,54.634447 -1.253483,54.634654 -1.253445,54.634902 -1.253389,54.635040 -1.253275,54.635186 -1.252555,54.635902 -1.252473,54.636063 -1.252330,54.636859 -1.252273,54.637015 -1.252176,54.637175 -1.251977,54.637365 -1.251763,54.637507 -1.251314,54.637755 -1.250800,54.638114 -1.250487,54.638358 -1.250153,54.638709 -1.249968,54.638840 -1.249562,54.639001 -1.248455,54.639365 -1.248124,54.639494 -1.247386,54.639821 -1.246797,54.640013 -1.246610,54.640098 -1.246513,54.640170 -1.246574,54.640279 -1.247014,54.640580 -1.247341,54.641088 -1.247564,54.641365 -1.248256,54.642661 -1.248319,54.642919 -1.247650,54.643048 -1.246829,54.643234 -1.245867,54.643306 -1.245202,54.643286 -1.243595,54.643087 -1.243282,54.643073 -1.242962,54.643076 -1.242041,54.643065 -1.241790,54.643031 -1.241369,54.642888 -1.240421,54.642544 -1.240237,54.642499 -1.240075,54.642497 -1.239926,54.642545 -1.239858,54.642627 -1.239837,54.642696 -1.239909,54.642989 -1.239960,54.643498 -1.239964,54.643532 -1.239983,54.644175 -1.239936,54.644470 -1.239883,54.644807 -1.239842,54.645116 -1.239825,54.645489 -1.239748,54.646014 -1.239733,54.646299 -1.239798,54.646734 -1.239789,54.646896 -1.239705,54.647229 -1.239683,54.647331 -1.239608,54.647496 -1.239566,54.647621 -1.239562,54.647760 -1.239618,54.647903 -1.239421,54.647932 -1.239435,54.647967 -1.239583,54.649724 -1.239343,54.649878 -1.239011,54.650011 -1.237692,54.650177 -1.236610,54.650296 -1.236417,54.650323 -1.236257,54.650351 -1.236015,54.650414 -1.235833,54.650469 -1.235081,54.650723 -1.234846,54.650805 -1.234312,54.650977 -1.234094,54.651025 -1.233980,54.651044 -1.233308,54.651137 -1.233173,54.651160 -1.233063,54.651200 -1.232967,54.651252 -1.232849,54.651347 -1.232814,54.651423 -1.232810,54.651495 -1.232823,54.651569 -1.232840,54.651893 -1.232827,54.652010 -1.232504,54.653177 -1.232447,54.653500 -1.232451,54.653723 -1.232474,54.653943 -1.232535,54.654301 -1.232590,54.654635 -1.232619,54.654960 -1.232723,54.655606 -1.232795,54.656365 -1.233131,54.658020 -1.233336,54.658428 -1.233507,54.658699 -1.233592,54.658803 -1.234197,54.659389 -1.234690,54.659825 -1.234979,54.660119 -1.235153,54.660314 -1.235343,54.660572 -1.235566,54.661037 -1.235656,54.661355 -1.235690,54.661638 -1.235677,54.661902 -1.235677,54.661984 -1.235683,54.663215 -1.235656,54.663632 -1.235639,54.664273 -1.235613,54.664639 -1.235593,54.664822 -1.235566,54.664957 -1.235508,54.665351 -1.235197,54.667327 -1.235120,54.668542 -1.235100,54.668897 -1.235199,54.669535 -1.235358,54.670231 -1.235437,54.670482 -1.235756,54.671241 -1.236144,54.672181 -1.236375,54.672971 -1.236309,54.673562 -1.236286,54.673704 -1.236127,54.674365 -1.235918,54.675272 -1.235827,54.675620 -1.235749,54.675960 -1.235735,54.676152 -1.235740,54.676328 -1.235754,54.676598 -1.235770,54.676987 -1.235771,54.677013 -1.235793,54.677480 -1.235758,54.677760 -1.235607,54.678134 -1.235470,54.678420 -1.235167,54.678875 -1.234263,54.679929 -1.234207,54.680065 -1.234175,54.680201 -1.234204,54.680465 -1.234300,54.681119 -1.234362,54.681549 -1.234427,54.681771 -1.234560,54.682172 -1.234782,54.682824 -1.236530,54.682837 -1.236725,54.682829 -1.237133,54.682813 -1.238813,54.683143 -1.241021,54.683814 -1.241819,54.683771 -1.242854,54.683717 -1.242946,54.683718 -1.243082,54.683716 -1.244694,54.683772 -1.244658,54.683077 -1.245038,54.682805 -1.245047,54.681990 -1.245011,54.681238 -1.245220,54.680975 -1.247056,54.680601 -1.248019,54.680404 -1.249478,54.680691" ), grammesCO2saved = c(2359, 2406), calories = c(211, 201 ), type = c("route", "route")), .Names = c("rid", "start_id", "start_code", "end_id", "end_code", "strategy", "distance", "time_seconds", "calories", "document.id", "array.index", "start", "finish", "startBearing", "startSpeed", "start_longitude", "start_latitude", "finish_longitude", "finish_latitude", "crow_fly_distance", "event", "whence", "speed", "itinerary", "clientRouteId", "plan", "note", "length", "time", "busynance", "quietness", "signalledJunctions", "signalledCrossings", "west", "south", "east", "north", "name", "walk", "leaving", "arriving", "coordinates", "grammesCO2saved", "calories", "type"), row.names = 1:2, class = "data.frame") >
I believe what you want to end up with is a column in your data frame that for each row is a list (or data frame) with x.coord and y.coord columns. To achieve that, we can use unnest and nest from tidyr with dplyr: library(dplyr) library(tidyr) result <- finalsub %>% mutate(coordinates = strsplit(coordinates,split=" ",fixed=TRUE)) %>% unnest(coordinates) %>% mutate(coordinates = strsplit(coordinates,split=",",fixed=TRUE), x.coord = as.numeric(unlist(coordinates)[c(TRUE,FALSE)]), y.coord = as.numeric(unlist(coordinates)[c(FALSE,TRUE)])) %>% select(-coordinates) %>% nest(x.coord,y.coord,.key=coordinates) Notes: The first mutate splits the character vector in your coordinates column by " " to separate each coordinate x,y resulting in a list of these. unnest separates this list into rows. In the second mutate, we first split each coordinate x,y, this time by "," to separate each coordinate into x and y. Then we create separate x.coord and y.coord columns to hold these. Note the conversion to numeric here. Finally, we use nest to collect the x.coord and y.coord columns as a list under the column named coordinates. Note that we first have to remove the original coordinates column. The result using your dput data, printing only the coordinates column: print(result$coordinates) ##[[1]] ### A tibble: 284 x 2 ## x.coord y.coord ## <dbl> <dbl> ##1 -1.294832 54.61024 ##2 -1.294883 54.61008 ##3 -1.294262 54.61002 ##4 -1.294141 54.61001 ##5 -1.293710 54.61004 ##6 -1.293726 54.61014 ##7 -1.293742 54.61025 ##8 -1.293510 54.61026 ##9 -1.293368 54.61026 ##10 -1.292816 54.61019 ### ... with 274 more rows ## ##[[2]] ### A tibble: 322 x 2 ## x.coord y.coord ## <dbl> <dbl> ##1 -1.294832 54.61024 ##2 -1.294883 54.61008 ##3 -1.294262 54.61002 ##4 -1.294141 54.61001 ##5 -1.293710 54.61004 ##6 -1.293726 54.61014 ##7 -1.293742 54.61025 ##8 -1.293510 54.61026 ##9 -1.293368 54.61026 ##10 -1.292816 54.61019 ### ... with 312 more rows
df1 <- data.frame(id= c(1,2), dist =c(45,23), speed = c(6,24) ,do.call(rbind,strsplit(df$cord,split = " "))) library(reshape2) df1 <- melt(df1,id=c("id","dist","speed")) df2<- data.frame(do.call(rbind,strsplit(df1$value, split=","))) df1$value <- NULL df1 <- cbind(df1,df2) names(df1)[5:6] <- c("x","y") id dist speed variable x y 1 1 45 6 X1 1.294832 54.610240 2 2 23 24 X1 2.788732 34.787940 3 1 45 6 X2 -1.294883 54.610080 4 2 23 24 X2 6.294883 24.567080 5 1 45 6 X3 -1.294262 54.6482757 6 2 23 24 X3 -5.564262 -45.7676757
Comparing pairs of rows in a list of data frames
I have a list that's 1314 element long. Each element is a data frame consisting of two rows and four columns. Game.ID Team Points Victory 1 201210300CLE CLE 94 0 2 201210300CLE WAS 84 0 I would like to use the lapply function to compare points for each team in each game, and change Victory to 1 for the winning team. I'm trying to use this function: test_vic <- lapply(all_games, function(x) {if (x[1,3] > x[2,3]) {x[1,4] = 1}}) But the result it produces is a list 1314 elements long with just the Game ID and either a 1 or a null, a la: $`201306200MIA` [1] 1 $`201306160SAS` NULL How can I fix my code so that each data frame maintains its shape. (I'm guessing solving the null part involves if-else, but I need to figure out the right syntax.) Thanks.
Try lapply(all_games, function(x) {x$Victory[which.max(x$Points)] <- 1; x}) Or another option would be to convert the list to data.table by using rbindlist and then do the conversion library(data.table) rbindlist(all_games)[,Victory:= +(Points==max(Points)) ,Game.ID][] data all_games <- list(structure(list(Game.ID = c("201210300CLE", "201210300CLE" ), Team = c("CLE", "WAS"), Points = c(94L, 84L), Victory = c(0L, 0L)), .Names = c("Game.ID", "Team", "Points", "Victory"), class = "data.frame", row.names = c("1", "2")), structure(list(Game.ID = c("201210300CME", "201210300CME" ), Team = c("CLE", "WAS"), Points = c(90, 92), Victory = c(0L, 0L)), .Names = c("Game.ID", "Team", "Points", "Victory"), row.names = c("1", "2"), class = "data.frame"))
You could try dplyr: library(dplyr) all_games %>% bind_rows() %>% group_by(Game.ID) %>% mutate(Victory = row_number(Points)-1) Which gives: #Source: local data frame [4 x 4] #Groups: Game.ID # # Game.ID Team Points Victory #1 201210300CLE CLE 94 1 #2 201210300CLE WAS 84 0 #3 201210300CME CLE 90 0 #4 201210300CME WAS 92 1