This question already has answers here:
Convert integer to class Date
(3 answers)
Closed 2 years ago.
I am trying to read a date column from a dataframe, which is stored as a string (see column 'Date_String'). This is what my data looks like:
X. Date_String ASIN Stars positive_rating
1 0 20150430 B00GKKI4IE 5 0
2 1 20150430 B00GKKI4IE 5 0
3 2 20150430 B00GKKI4IE 5 0
4 3 20150429 B00GKKI4IE 5 0
5 4 20150428 B00GKKI4IE 5 0
6 5 20150428 B00GKKI4IE 5 0
and this is what I am using to format this column as Date
data$date.of.review <- as.Date(data$Date_String, "%Y%m%d")
and getting an error message
Error in charToDate(x) :
character string is not in a standard unambiguous format
Any ideas how to solve it? Thanks.
Here is the structure of the dataframe:
structure(list(X. = 0:5, Date_String = c(20150430L, 20150430L,
20150430L, 20150429L, 20150428L, 20150428L), ASIN = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("B00GKKI4IE", "B00I4OBXWI", "B00IB17BFM",
"B00IN2WD5C", "B00J58F0IA", "B00K7NCS9G", "B00KJEZIBS", "B00KZER5GS",
"B00MK39H68", "B00O1GTTWY"), class = "factor"), Stars = c(5L,
5L, 5L, 5L, 5L, 5L), positive_rating = c(0, 0, 0, 0, 0, 0)), .Names = c("X.",
"Date_String", "ASIN", "Stars", "positive_rating"), row.names = c(NA,
6L), class = "data.frame")
The column's class is integer, not character. If you wrap the variable in as.character, as.Date can read it.
sapply(data, class)
data$date.of.review <- as.Date(as.character(data$Date_String), "%Y%m%d")
data
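For context on why the original call fails: when the column is an integer, as.Date dispatches to its numeric method, whose second argument is origin rather than format, so "%Y%m%d" ends up being parsed as a date itself. A minimal, self-contained sketch of the fix, using a cut-down three-row version of the data (the small data frame here is an illustration, not the full dataset):

```r
# Cut-down stand-in for the question's data; Date_String is an integer column
data <- data.frame(Date_String = c(20150430L, 20150429L, 20150428L))

# Confirm the column is integer, not character
class(data$Date_String)

# Convert to character first, then parse with an explicit format
data$date.of.review <- as.Date(as.character(data$Date_String), format = "%Y%m%d")

class(data$date.of.review)
```

Naming the format argument explicitly avoids relying on argument position, which differs between the character and numeric methods.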
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 1 year ago.
The dataframe above is an example of the original one. I am trying to create the following new dataframe based on this original one:
Thank you!
We can use xtabs from base R
xtabs(abundance ~ StationCode + SpeciesCode, df1)
-output
SpeciesCode
StationCode AME BCF BKB CAP
O-01 2 1 5 0
O-02 1 0 1 1
O-03 0 4 2 0
O-04 0 0 8 1
data
df1 <- structure(list(SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
"CAP", "BKB", "BKB", "BKB", "BKB"), StationCode = c("O-01", "O-02",
"O-03", "O-01", "O-04", "O-02", "O-04", "O-01", "O-02", "O-03"
), abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-10L))
This question already has answers here:
Convert continuous numeric values to discrete categories defined by intervals
(2 answers)
Cut by Defined Interval
(2 answers)
Closed 2 years ago.
I have data with a numeric variable A. I want to group the values of A to get something like B.
data <- structure(list(A = c(0, 0, 0, 0, 1, 2, 9, 15, 30, 100, 0.2, 0.003,
95, 18), B = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 10L, 1L, 1L,
10L, 2L)), class = "data.frame", row.names = c(NA, -14L))
Are you trying to create B from A? It looks like you want something like
data$A %/% 10
[1] 0 0 0 0 0 0 0 1 3 10 0 0 9 1
or
(data$A %/% 10)+1
[1] 1 1 1 1 1 1 1 2 4 11 1 1 10 2
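Note that the second version gives 11 for A = 100, while the B in the question has 10 there. If the top group is meant to be capped at 10 (an assumption read off the example data, not something the question states), a sketch using pmin could look like this; B2 is a hypothetical name for the new column:

```r
data <- data.frame(
  A = c(0, 0, 0, 0, 1, 2, 9, 15, 30, 100, 0.2, 0.003, 95, 18),
  B = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L, 10L, 1L, 1L, 10L, 2L)
)

# Integer-divide by 10, shift to start at 1, and cap the result at 10
data$B2 <- pmin(data$A %/% 10 + 1, 10)

# Compare against the B column supplied in the question
all(data$B2 == data$B)
```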
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I am trying to remove some rows of my data by adding them to a different row, in the form of another column. Is there a way I can group rows together by a certain variable?
I have tried using group_by statement in the dplyr package, but it does not seem to solve my issue.
library(dplyr)
late <- read.csv(file.choose())
late <- group_by(late, state, add = FALSE)
The data set I have (named "late") now is in this form:
ontime state count
0 AL 1
1 AL 44
null AL 3
0 AR 5
1 AR 50
...
But I would like it to be:
state count0 count1 countnull
AL 1 44 3
AR 5 50 null
...
Ultimately, I want to calculate count0/count1 for each state. So if there is a better way of going about this, I would be open to any suggestions.
You could do this with dcast() from the reshape2 package
library(reshape2)
df = data.frame(
ontime = c(0,1,NA,0,1),
state = c("AL","AL","AL","AR","AR"),
count = c(1,44,3,5,50)
)
dcast(df, state ~ ontime, value.var = "count")
With spread:
library(dplyr)
library(tidyr)
df %>%
mutate(ontime = paste0('count', ontime)) %>%
spread(ontime, count)
Output:
state count0 count1 countnull
1 AL 1 44 3
2 AR 5 50 NA
Data:
df <- structure(list(ontime = structure(c(1L, 2L, 3L, 1L, 2L), .Label = c("0",
"1", "null"), class = "factor"), state = structure(c(1L, 1L,
1L, 2L, 2L), .Label = c("AL", "AR"), class = "factor"), count = c(1L,
44L, 3L, 5L, 50L)), class = "data.frame", row.names = c(NA, -5L
))
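Since the stated goal is count0/count1 for each state, a base R sketch without extra packages might look like the following. It assumes ontime is stored as text with the labels "0", "1" and "null", as in the question; the names wide and ratio are illustrative. One difference from the output above: xtabs fills missing combinations with 0 rather than NA.

```r
df <- data.frame(
  ontime = c("0", "1", "null", "0", "1"),
  state  = c("AL", "AL", "AL", "AR", "AR"),
  count  = c(1, 44, 3, 5, 50)
)

# Cross-tabulate counts: one row per state, one column per ontime value
wide <- as.data.frame.matrix(xtabs(count ~ state + ontime, data = df))

# Prefix the column names so they read count0, count1, countnull
names(wide) <- paste0("count", names(wide))

# Ratio of count0 to count1 per state (states are the row names)
wide$ratio <- wide$count0 / wide$count1
wide
```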
I am working with time series data and want to calculate the difference between the first and final measurement times, and put these numbers into a new and simpler dataframe. For example, for this dataframe
structure(list(time = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), indv = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), value = c(1L, 3L, 5L, 8L, 3L, 4L,
7L, 8L)), .Names = c("time", "indv", "value"), class = "data.frame", row.names = c(NA,
-8L))
or
time indv value
1 1 1
2 1 3
3 1 5
4 1 8
1 2 3
2 2 4
3 2 7
4 2 8
I can use this code
ddply(test, .(indv), transform, value_change = (value[length(value)] - value[1]), time_change = (time[length(time)] - time[1]))
to give
time indv value value_change time_change
1 1 1 7 3
2 1 3 7 3
3 1 5 7 3
4 1 8 7 3
1 2 3 5 3
2 2 4 5 3
3 2 7 5 3
4 2 8 5 3
However, I would like to eliminate the redundant rows and make a new and simpler dataframe like this
indv time_change value_change
1 3 7
2 3 5
Does anyone have any clever way to do this?
Thanks!
Just replace transform with summarize. You can also make your code a little prettier by using head and tail:
library(plyr)  # provides ddply() and summarize()
ddply(test, .(indv), summarize,
      value_change = tail(value, 1) - head(value, 1),
      time_change = tail(time, 1) - head(time, 1))
For maximum readability, write a function:
change <- function(x) tail(x, 1) - head(x, 1)
ddply(test, .(indv), summarize, value_change = change(value),
time_change = change(time))
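If plyr is not available, roughly the same summary can be sketched in base R with aggregate. This is an alternative to the ddply approach above, not the answer's method; the data frame is rebuilt here from the question's values, and res is a hypothetical name for the result.

```r
test <- data.frame(
  time  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  indv  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  value = c(1L, 3L, 5L, 8L, 3L, 4L, 7L, 8L)
)

# Last element minus first element of a vector
change <- function(x) tail(x, 1) - head(x, 1)

# Apply change() to time and value within each individual
res <- aggregate(test[c("time", "value")], by = list(indv = test$indv), FUN = change)
names(res)[2:3] <- c("time_change", "value_change")
res
```

Like the ddply version, this relies on the rows already being sorted by time within each individual.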
This question already has an answer here:
lagging panel data with data.table
(1 answer)
Closed 9 years ago.
I have the following data.table:
AssetNumber StartDate ActionNumber PerviousActionNumber
1 20090602 1
1 20090626 3
1 20090721 5
1 20091008 1
2 20090604 3
2 20090628 2
2 20090723 1
2 20091010 2
2 20091018 3
Load the dataset with:
set <- structure(list(AssetNumber = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
StartDate = c(20090602L, 20090626L, 20090721L, 20091008L,
20090604L, 20090628L, 20090723L, 20091010L, 20091018L), ActionNumber = c(1L,
3L, 5L, 1L, 3L, 2L, 1L, 2L, 3L), PerviousAction. = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("AssetNumber", "StartDate",
"ActionNumber", "PerviousActionNumber"), class = "data.frame", row.names = c(NA,
-9L))
I want to fill the column "PerviousActionNumber" with the "ActionNumber" from the previous record. Can anybody advise me on which function to use to accomplish this? I expect to get the following result:
AssetNumber StartDate ActionNumber PerviousActionNumber
1 20090602 1
1 20090626 3 1
1 20090721 5 3
1 20091008 1 5
2 20090604 3 1
2 20090628 2 3
2 20090723 1 2
2 20091010 2 1
2 20091018 3 2
More generally, is there a specific package in R that includes inter-record functions like this?
You're looking for an assignment by reference using :=. This is pretty basic. I suggest you read the data.table manual/documentation.
library(data.table)
dt <- as.data.table(set)  # `set` from the question is a plain data.frame
# thanks to @agstudy
dt[, previousActionNumber := c(NA, head(ActionNumber, -1))]
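The same lag can be sketched in base R without data.table, which may help show what the expression does: it drops the last value and prepends NA, shifting the column down by one row. This version fills the existing PerviousActionNumber column and, following the expected output in the question, does not restart at each AssetNumber; only the columns needed for the illustration are rebuilt here.

```r
set <- data.frame(
  AssetNumber  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  ActionNumber = c(1L, 3L, 5L, 1L, 3L, 2L, 1L, 2L, 3L)
)

# Shift ActionNumber down by one row: NA first, then every value except the last
set$PerviousActionNumber <- c(NA, head(set$ActionNumber, -1))
set
```

If the lag should instead restart within each asset, the shift would need to be applied per group, which the expected output in the question does not ask for.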