I am at a loss! I am trying to sort my data by business_id. Each id has several dates associated with it. I am trying to create a new variable that shows the time in days between the first and last date associated with each business_id, such that:
row.names business_id Days
1 x8453 DxUn-ukNL27GOuwjnFGFKA 876
The data currently is structured as:
row.names date business_id
1 X27038 2012-04-21 FV0BkoGOd3Yu_eJnXY15ZA
2 X60951 2012-05-14 Trar_9cFAj6wXiXfKfEqZA
3 X60462 2011-10-05 DxUn-ukNL27GOuwjnFGFKA
4 X2078 2010-12-19 PlcCjELzSI3SqX7mPF5cCw
5 X166883 2011-09-29 pF7uRzygyZsltbmVpjIyvw
6 X177828 2010-09-19 XkNQVTkCEzBrq7OlRHI11Q
7 X128628 2012-05-05 6TWRuHn24DL6vnW8Uyu4Vw
8 X202882 2011-12-10 Xo9Im4LmIhQrzJcO4R3ZbA
9 X64569 2012-02-07 Z67obTep38V9HMtA10yu5A
10 X14667 2009-07-18 xsSnuGCCJD4OgWnOZ0zB4A
11 X17432 2012-08-11 XkNQVTkCEzBrq7OlRHI11Q
Thanks in advance!
Update:
str(data)
'data.frame': 2299 obs. of 2 variables:
$ date :List of 2299
..$ X2736 : chr "2012-05-29"
..$ X160403: chr "2011-08-29"
..$ X19897 : chr "2010-09-27"
..$ X44519 : chr "2012-05-22"
..$ X75910 : chr "2012-10-22"
..$ X13052 : chr "2010-07-14"
$ business_id:List of 2299
..$ X2736 : chr "EFJAVVBQQqftuqY5Wb3WtQ"
..$ X160403: chr "YDlk9buwF8JQE3JgQgraOw"
..$ X19897 : chr "sc1UacpE3cVNJueMdXiCyA"
..$ X44519 : chr "VY_tvNUCCXGXQeSvJl757Q"
..$ X75910 : chr "fowXs9zAM0TQhSfSkPeVuw"
..$ X13052 : chr "xM5F0cLAlKWoB8rOgt5ZOw"
..$ X87807 : chr "nLL0sjLdZ13YdvhXKyss7A"
Edit now that the OP has provided the structure:
Your data is structured quite oddly. A usual structure in R is a data.frame, which is technically a list of vectors where the vectors are all the same length. In your case, you have a list of two (named) lists.
Store the names somewhere else for the time being:
old.names <- names(x[[1]])
Then turn the data into an ordinary data.frame, using the handy unlist() function:
x$date <- unlist(x$date)
x$business_id <- unlist(x$business_id)
Use str(x) to see the difference. The names can go back in now, and it's also a good time to turn your "date" column from a character into a proper date, and sort by date order.
x$old.names <- old.names
x$date <- as.POSIXct(x$date)
x <- x[order(x$date), ]
My original answer should now work.
Original answer:
Like agstudy I'd use the plyr package, but if you have the "date" column in a date format and want to keep it that way, you could try:
require(plyr)
ddply(x, "business_id", summarise
, duration = difftime(max(date), min(date), units = "days")
, old.names = old.names[1])
This also gives you flexibility on the units.
With your example data, sorting by date ascending with x <- x[order(x$date), ] means that old.names[1] gives you the name of the earliest row, and old.names[length(old.names)] would give you the name of the most recent row, but I don't know whether that ordering is reliably preserved inside ddply.
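For comparison, the same per-id duration can be sketched in base R, without plyr. This is only a sketch on toy data modelled on the question: it assumes the date column has already been converted to Date, and span_days and result are names made up for the illustration.

```r
# Toy data modelled on the question; the date column is assumed to be Date.
x <- data.frame(
  date = as.Date(c("2012-04-21", "2010-09-19", "2012-08-11")),
  business_id = c("FV0BkoGOd3Yu_eJnXY15ZA",
                  "XkNQVTkCEzBrq7OlRHI11Q",
                  "XkNQVTkCEzBrq7OlRHI11Q"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: days between the earliest and latest date in a group.
span_days <- function(d) as.numeric(difftime(max(d), min(d), units = "days"))

# Split the dates by id, then apply the helper to each group.
days <- sapply(split(x$date, x$business_id), span_days)

result <- data.frame(business_id = names(days), Days = unname(days))
print(result)
```

An id with a single date gets a duration of 0, which matches what the plyr versions return.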
Further edit:
I only showed how to handle the names because they're in your example. They look as though they were originally column headers from imported data, and R has prepended "X" to them because names aren't allowed to begin with numerals.
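The "X" prefix can be reproduced with make.names(), which is what the import functions use when they check names. A small sketch, assuming the original headers were plain numbers (the values here are taken from the row names in the question with the "X" removed):

```r
# Assumed original headers: purely numeric strings.
imported <- c("27038", "60951", "60462")

# make.names() prepends "X" because a syntactic name cannot start with a digit.
with_x <- make.names(imported)
print(with_x)

# Stripping the leading "X" recovers the original labels.
original <- sub("^X", "", with_x)
print(original)
```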
Using plyr package:
ddply(dat,.(business_id),function(x)
if(length(x$date)>1)
diff(range(as.POSIXct(x$date)))
else 0)
business_id V1
1 6TWRuHn24DL6vnW8Uyu4Vw 0
2 DxUn-ukNL27GOuwjnFGFKA 0
3 FV0BkoGOd3Yu_eJnXY15ZA 0
4 pF7uRzygyZsltbmVpjIyvw 0
5 PlcCjELzSI3SqX7mPF5cCw 0
6 Trar_9cFAj6wXiXfKfEqZA 0
7 XkNQVTkCEzBrq7OlRHI11Q 692
8 Xo9Im4LmIhQrzJcO4R3ZbA 0
9 xsSnuGCCJD4OgWnOZ0zB4A 0
10 Z67obTep38V9HMtA10yu5A 0
Related
I have a data.frame called StockWeights. The structure of the data.frame is as follows:
'data.frame': 3 obs. of 6 variables:
$ Id : chr "159347" "161863" "22646"
$ ISIN : chr "DK0061156759" "DK0061533726" "DK0060681468"
$ $id : chr "21" "22" "23"
$ Name : chr "159347" "161863" "22646"
$ SumPeriod:'data.frame': 3 obs. of 27 variables:
..$ AccPeriodBasTwrAtMarketPrice : num 0.0969 0.538 -0.1071
..$ AccPeriodLocTwrAtMarketPrice : num 0.0969 0.538 -0.1071
..$ BopDate : chr "2022-02-28T00:00:00" "2022-02-28T00:00:00" "2022-02-28T00:00:00"
..$ BopBasHoldingValueAtMarketPrice: num 7592267 5135961 7166816
My question is then: How can I "unlist" this SumPeriod data.frame column and display the BopBasHoldingValueAtMarketPrice column together with the Id and ISIN columns? What I have done so far is to use the pluck function in the purrr package as such:
StockWeights %>%
pluck('SumPeriod') %>%
select("EopBasHoldingValueAtMarketPrice")
Which only gives me the "EopBasHoldingValueAtMarketPrice":
'data.frame': 3 obs. of 1 variable:
$ EopBasHoldingValueAtMarketPrice: num 7599626 5163591 7159142
But I can't find a way to get these three values together with the corresponding "Id" and "ISIN" from the original data.frame. Does anyone have an idea how to achieve this? Sorry for not providing a reproducible example. The data I am looking at comes from an API call and I am having some trouble recreating it manually. But the end goal is to get a data.frame that looks like:
df = data.frame(
Id = c("159347", "161863", "22646"),
ISIN = c("DK0061156759", "DK0061533726", "DK0060681468"),
BopBasHoldingValueAtMarketPrice = c(7592267,5135961,7166816)
)
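One way to get there, as a minimal sketch: since SumPeriod is itself a data.frame stored as a column, its columns can be reached with a second $. The toy StockWeights below is an assumption that only mimics the parts of the str() output shown above (two of the 27 nested columns); the real API result may differ.

```r
# Rebuild a small stand-in for the API result (assumed shape, not the real data).
StockWeights <- data.frame(
  Id = c("159347", "161863", "22646"),
  ISIN = c("DK0061156759", "DK0061533726", "DK0060681468"),
  stringsAsFactors = FALSE
)
# Assigning a data.frame to a column creates a nested data.frame column.
StockWeights$SumPeriod <- data.frame(
  BopDate = rep("2022-02-28T00:00:00", 3),
  BopBasHoldingValueAtMarketPrice = c(7592267, 5135961, 7166816),
  stringsAsFactors = FALSE
)

# Pull the nested column out next to the top-level identifiers.
result <- data.frame(
  Id = StockWeights$Id,
  ISIN = StockWeights$ISIN,
  BopBasHoldingValueAtMarketPrice =
    StockWeights$SumPeriod$BopBasHoldingValueAtMarketPrice,
  stringsAsFactors = FALSE
)
print(result)
```

Because the nested data.frame has one row per outer row, the values stay aligned with Id and ISIN without any join.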
In R I have a series of lists with incrementing numeric suffixes, e.g. mylist1, mylist2, mylist3.
I want to concatenate these, like c(mylist1, mylist2, mylist3).
Is there a shorthand way to manage this?
I think you are trying to create a list of lists.
You can do it simply by calling:
list(list1, list2, list3)
If you have many lists with a similar name pattern, you can use mget to get all objects whose names match a specific pattern (found with ls(pattern = x)).
data
mylist8<-list(1,2)
mylist9<-list(3,4)
mylist10<-list(5,6)
#Included the lists with indexes 8:10 so that the importance of ordering by `parse_number(ls)` is highlighted. Without the `parse_number` step, the list would be sorted by names, with a different order
Answer
library(readr)  # provides parse_number()
list_names <- ls(pattern = 'list\\d+')
list_of_lists <- mget(list_names[order(parse_number(list_names))])
> str(list_of_lists)
List of 3
$ mylist8:List of 2
..$ : num 1
..$ : num 2
$ mylist9:List of 2
..$ : num 3
..$ : num 4
$ mylist10:List of 2
..$ : num 5
..$ : num 6
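If the goal is the flat concatenation c(mylist1, mylist2, mylist3) from the question rather than a list of lists, the same lookup can feed do.call(c, ...). A base R sketch, assuming the objects live in the environment the code runs in; nms and combined are names made up for the illustration, and the numeric suffix is extracted with sub() instead of parse_number().

```r
mylist8 <- list(1, 2)
mylist9 <- list(3, 4)
mylist10 <- list(5, 6)

# Find the objects by name, then order them by their numeric suffix
# (plain alphabetical order would put mylist10 before mylist8).
nms <- ls(pattern = "^mylist[0-9]+$")
nms <- nms[order(as.integer(sub("^mylist", "", nms)))]

# mget() returns a named list of lists; unname() avoids composite element names
# before c() flattens one level.
combined <- do.call(c, unname(mget(nms)))
str(combined)
```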
I am learning sf in R. Since I like data.table very much, I thought I could use both. However, it seems that an sf object derived from a data.table cannot use data.table methods any more. Here is an example:
First I generate a very simple data.table and convert it to an sf object. So far so good.
> dfr <- data.table(id = c("hwy1", "hwy2"),
+ cars_per_hour = c(78, 22),
+ lat = c(1, 2),
+ lon = c(3, 4))
> my_sf <- st_as_sf(dfr , coords = c("lon", "lat"))
Then I check the structure of my_sf. It is an sf object, a data.table and a data.frame.
> str(my_sf)
Classes ‘sf’, ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
$ id : chr "hwy1" "hwy2"
$ cars_per_hour: num 78 22
$ geometry :sfc_POINT of length 2; first list element: 'XY' num 3 1
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr "id" "cars_per_hour"
Then I tried an arbitrary function, unique, and it does not work. In fact my_sf does not behave like a data.table at all.
> my_sf[, unique(id)]
Error in unique(id) : object 'id' not found
Does anyone know the reason for it? Is it not possible to use data.table for sf?
My guess is that st_as_sf has destroyed the .internal.selfref attribute, turning the data.table back into a data.frame even though the class name has been preserved.
> str(dfr)
#Classes ‘data.table’ and 'data.frame': 2 obs. of 4 variables:
#$ id : chr "hwy1" "hwy2"
#$ cars_per_hour: num 78 22
#$ lat : num 1 2
#$ lon : num 3 4
#- attr(*, ".internal.selfref")=<externalptr>
setDT(my_sf) might be enough to turn back the data.frame into a data.table
I want to set an attribute ("full.name") of certain variables in a data frame by subsetting the dataframe and iterating over a character vector. I tried two solutions but neither works (varsToPrint is a character vector containing the variables, questionLabels is a character vector containing the labels of questions):
Sample data:
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3=seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")
Solution 1:
attrApply <- function(var, label) {
`<-`(attr(var, "full.name"), label)
}
mapply(attrApply, jtiPrint[varsToPrint], questionLabels)
Solution 2:
i <- 1
for (var in jtiPrint[varsToPrint]) {
attr(var, "full.name") <- questionLabels[i]
i <- i + 1
}
Desired output (for e.g. variable 1):
attr(jtiPrint$question1, "full.name")
[1] "question1Label"
The problem in solution 2 seems to be that R sets the attribute on a new data frame containing only one variable (the indexed variable). However, I don't understand why solution 1 does not work. Any ideas how to fix either of these two approaches?
Solution 1 :
The function is 'attr<-' not '<-'(attr...), also you need to set SIMPLIFY=FALSE (otherwise a matrix is returned instead of a list) and then call as.data.frame :
attrApply <- function(var, label) {
`attr<-`(var, "full.name", label)
}
df <- as.data.frame(mapply(attrApply,jtiPrint[varsToPrint],questionLabels,SIMPLIFY = FALSE))
> str(df)
'data.frame': 5 obs. of 2 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
Solution 2 :
You need to set the attribute on the column of the data.frame itself; in your loop you're setting it on copies of the columns. Index by name so it also works when the variables are not the first columns:
for(i in seq_along(varsToPrint)){
attr(jtiPrint[[varsToPrint[i]]],"full.name") <- questionLabels[i]
}
> str(jtiPrint)
'data.frame': 5 obs. of 3 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
$ question3: int 1 2 3 4 5
Anyway, note that the two approaches lead to a different result. In fact the mapply solution returns a subset of the previous data.frame (so no column 3) while the second approach modifies the existing jtiPrint data.frame.
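The two approaches can also be combined: apply a function over the selected columns and assign the result back into the same columns, so that jtiPrint keeps all of its variables. This is a sketch using the sample data from the question; set_full_name is a hypothetical helper name.

```r
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3 = seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")

# Hypothetical helper: return the column with the attribute attached.
set_full_name <- function(col, label) {
  attr(col, "full.name") <- label
  col
}

# Map() pairs each selected column with its label; assigning back into
# jtiPrint[varsToPrint] replaces only those columns and leaves question3 alone.
jtiPrint[varsToPrint] <- Map(set_full_name, jtiPrint[varsToPrint], questionLabels)

print(attr(jtiPrint$question1, "full.name"))
```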
I would like to save my data train.user (213451 obs. of 20 variables. 2 of the variables are lists) as a csv file.
I use:
write.csv(train.user, "train_user.csv", row.names = FALSE)
but an error occurs
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
unimplemented type 'list' in 'EncodeElement'
This is what my data train.user looks like (output of str, showing only part of it):
'data.frame': 213451 obs. of 20 variables:
$ id : Factor w/ 213451 levels "00023iyk9l","0005ytdols",..: 100523 48039 26485 68504 48956 147281 129610 2144 59779 40826 ...
$ gender : Factor w/ 4 levels "-unknown-","FEMALE",..: 1 3 2 2 1 1 2 2 2 1 ...
$ age :List of 213451
..$ : num NA
..$ : num 38
..$ : num 56
..$ : num 42
..$ : num 41
.. [list output truncated]
It seems like the column age is stored as a list, and write.csv doesn't accept this format. From my naive intuition, I tried to re-store the column as a data frame with following code, but it failed.
train.user$age <- as.data.frame(train.user$age)
Error message:
Error in `$<-.data.frame`(`*tmp*`, "age", value = list(NA_real_. = NA_real_, :
replacement has 1 row, data has 213451
I also tried train.user$age <- data.frame(lapply(train.user$age, unlist)) as suggested in another post, but the same error occurs.
I appreciate any help!
train.user$age <- unlist(train.user$age)
Technically, a data.frame is a list of equal-length vectors, but most functions will assume that all of the columns are atomic vectors and will fail when you try to use a list.
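A small end-to-end sketch of that fix on toy data (the three-row train.user here is an assumption standing in for the real 213451-row data). One caveat worth checking first: unlist() only lines up with the rows if every list element has length 1, so the sketch asserts that before flattening.

```r
# Toy stand-in for the real data: a data.frame with a list column.
train.user <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
train.user$age <- list(NA_real_, 38, 56)

# Guard: a NULL or longer element would change the length after unlist().
stopifnot(all(lengths(train.user$age) == 1))

# Flatten the list column into an ordinary numeric vector.
train.user$age <- unlist(train.user$age)

# write.csv() now accepts the column; a temporary file keeps the sketch tidy.
out_file <- tempfile(fileext = ".csv")
write.csv(train.user, out_file, row.names = FALSE)
read_back <- read.csv(out_file)
str(read_back)
```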
NB: Don't edit an answer into your question.
pacman::p_load(tidyverse)
# map() would return a list again; map_dbl() returns a plain numeric vector
train.user %>% as_tibble() %>%
mutate(age = map_dbl(age, ~unlist(.)))