Filter a datetime column in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 6 years ago.
I am a beginner in R and am stuck with a datetime column.
My data has multiple columns, and one of them holds a date and time, like:
Originaltime magnitude depth
2017-01-10T16:42:23.247Z 4.6 18.8
1963-09-02T23:16:55.510Z 3.767 12
1963-08-29T16:46:25.520Z 3.727 12
I want to filter my data using the Originaltime column, but using only the year.
I have tried:
b <- year(historiceventstbl$Originaltime)
head(b)
#[1] 2017 2017 2017 2017 2017 2017
I am not sure how to go from here.
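For a reproducible example, the sample data from the question can be recreated like this (a sketch; the name df1 and the character column type are assumed to match the answers below):
df1 <- read.table(text = "Originaltime magnitude depth
2017-01-10T16:42:23.247Z 4.6 18.8
1963-09-02T23:16:55.510Z 3.767 12
1963-08-29T16:46:25.520Z 3.727 12", header = TRUE, stringsAsFactors = FALSE)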

We can use filter from dplyr
library(dplyr)
library(lubridate)
df1 %>%
  filter(year(Originaltime) == 1963)
# Originaltime magnitude depth
#1 1963-09-02T23:16:55.510Z 3.767 12
#2 1963-08-29T16:46:25.520Z 3.727 12
Or using subset from base R
subset(df1, year(Originaltime)==1963)
Or using only base R:
subset(df1, sub("^(.{4}).*", "\\1", Originaltime)==1963)
# Originaltime magnitude depth
#2 1963-09-02T23:16:55.510Z 3.767 12
#3 1963-08-29T16:46:25.520Z 3.727 12
It may be better to use date-time functions when manipulating time, so:
subset(df1, format(as.POSIXct(Originaltime, format = "%Y-%m-%dT%H:%M:%OS"), "%Y") == 1963)
# Originaltime magnitude depth
#2 1963-09-02T23:16:55.510Z 3.767 12
#3 1963-08-29T16:46:25.520Z 3.727 12
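As a side note, lubridate can parse these ISO 8601 timestamps directly, including the fractional seconds and the Z suffix (a sketch, assuming Originaltime is still a character column):
df1$Originaltime <- lubridate::ymd_hms(df1$Originaltime)
subset(df1, lubridate::year(Originaltime) == 1963)
Storing the column as POSIXct once makes every later filter or plot simpler.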

You can also try this with base R only:
df[substring(as.character(df$Originaltime), 1, 4) == 2017,]
# Originaltime magnitude depth
#1 2017-01-10T16:42:23.247Z 4.6 18.8

Related

Randomly sampling a dataset to decrease its values [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 1 year ago.
I am currently trying to decrease the values in a column randomly according to a given sum.
For example, if the main data is like this;
ID Value
1 4
2 10
3 16
after running the code, the sum of Value should be 10, and this needs to be done randomly (the decrease for each member should be chosen randomly):
ID Value
1 1
2 8
3 1
I have tried several commands and libraries but could not manage it. Still a novice, so any help would be appreciated!
Thanks.
Edit: Sorry, I was not clear enough. I would like to randomly assign each observation a new value smaller than its original one, so that the new sum of Value equals 10.
Using the sample data
dd <- read.table(text="ID Value
1 4
2 10
3 16", header=TRUE)
and the dplyr and tidyr libraries, you can do:
library(dplyr)
library(tidyr)
dd %>%
  mutate(ID = factor(ID)) %>%
  uncount(Value) %>%
  sample_n(10) %>%
  count(ID, name = "Value", .drop = FALSE)
Here we repeat each row once for each unit of Value, then randomly sample 10 rows, then count them back up. We turn ID into a factor to make sure IDs left with 0 observations are preserved.
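The same idea in base R, in case dplyr/tidyr are not available (a sketch; the target sum of 10 is taken from the question):
set.seed(1)  # for reproducibility
pool <- rep(dd$ID, dd$Value)  # repeat each ID once per unit of Value
kept <- sample(pool, 10)  # randomly keep 10 units in total
dd$Value <- as.vector(table(factor(kept, levels = dd$ID)))  # count back up, keeping zero counts
dd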

Dividing all integers of a column in R [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers. Closed 3 years ago.
I am trying to divide all integers in a column by another integer. I have a database with a column whose integers go above 1*10^20. Because of this my plots are way too big. I need to normalize the data to get a better understanding of what is going on. For example, the data that I have:
  x Day   Amount
1 1   1 23440100
2 2   2 41231020
3 3   3 32012010
I am using a data.frame for my own data, so here is the data frame for the data above:
x <- c(1,2,3)
day <- c(1,2,3)
Amount <- c(23440100, 41231020, 32012010)
my.data <- data.frame(x, day, Amount)
I tried using another answer, provided here, but that doesn't seem to work.
The code that I tried:
test <- my.data[, 3]/1000
Hope someone can help me out! Cheers, Chester
I guess you are looking for this?
my.data$Amount <- my.data$Amount/1000
such that
> my.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
Use mutate from dplyr
Since you're using a data.frame, you can use this simple code:
library(dplyr)
mutated.data <- my.data %>%
  mutate(Amount = Amount / 1000)
> mutated.data
x day Amount
1 1 1 23440.10
2 2 2 41231.02
3 3 3 32012.01
Hope this helps.
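If several columns ever need the same rescaling, dplyr's across generalises the mutate call (a sketch; across requires dplyr >= 1.0.0):
library(dplyr)
my.data %>%
  mutate(across(Amount, ~ .x / 1000))  # list more columns inside across() as needed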

How to combine columns in the same dataframe? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 4 years ago.
I want to combine columns in the same dataframe.
For example:
# I have data type is below
region=c("A","B","C")
Q1=c("ads","qwer","zxcv")
Q2=c("poi","lkj","mnb")
temp=data.frame(region, Q1, Q2)
### I want to change it to the below
region1=c("A","B","C")
Q=c("ads,poi","qwer,lkj","zxcv,mnb")
temp2=data.frame(region1, Q)
How can I do it?
temp$Q <- apply(temp[-1], 1, toString)
temp[c("Q1", "Q2")] <- NULL
temp
region Q
1 A ads, poi
2 B qwer, lkj
3 C zxcv, mnb
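If there are many Q columns, do.call keeps this vectorised without apply (a sketch, starting from the original temp):
temp$Q <- do.call(paste, c(temp[-1], sep = ","))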
Using base R you can do:
temp$Q <- paste(temp$Q1, temp$Q2, sep=",")
temp <- temp[,c("region", "Q")]
temp
region Q
1 A ads,poi
2 B qwer,lkj
3 C zxcv,mnb
This solution uses the mutate function from the dplyr package to create the new column Q, using paste0 to concatenate the columns Q1 and Q2; at the end, the columns Q1 and Q2 are removed with select and -:
library(dplyr)
temp %>% mutate(Q = paste0(Q1, ",", Q2)) %>% select(-Q1, -Q2)
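tidyr's unite does the same in one step, dropping the source columns by default (a sketch):
library(tidyr)
unite(temp, "Q", Q1, Q2, sep = ",")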

Aggregate data based on dates [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 6 years ago.
I have a dataset that is a similar structure to this:
account_no <- c(1:5, 2, 2 , 3)
interaction_date <- c("1/1/2016","2/5/2016", "3/2/2016", "27/4/2016","11/10/2015", "11/10/2015","11/10/2015","2/5/2016")
interaction_date <- as.Date(interaction_date, format = "%d/%m/%Y")
action <- c("a","c","b","c","c","a","a","b")
df <- data.frame(account_no ,interaction_date, action)
df
There are a couple of other attributes associated with each row, but this is the typical structure.
Essentially it is log data, describing interactions of a user (account_no), the time they interacted and the action they took.
I've been told to find underlying trends in the data.
Is there a way I can aggregate the data based on account_no that would give me an insight into the average length in days between interaction dates?
Or some sort of count to see what is the most common action taken on a specific day?
There are about 80,000 rows in the dataset, and there may be a number of actions on the same account on the same day. Is there a way in which I can break this down into something meaningful?
Here's how you can get a sense of the gap between interaction dates:
df$interaction_date <- as.Date(df$interaction_date, '%d/%m/%Y')  ## coerce to Date
df <- df[order(df$interaction_date), ]  ## ensure ordered by interaction_date
aggregate(cbind(gap = interaction_date) ~ account_no, df, function(x) mean(diff(unique(x))))
## account_no gap
## 1 1 NaN
## 2 2 204
## 3 3 89
## 4 4 NaN
## 5 5 NaN
Only accounts 2 and 3 had 2 or more interactions, so the remainder get an invalid result. The gap unit is days between interaction dates.
I added the unique() call to exclude multiple interactions on the same date, since I assumed you wouldn't want those to lower the averages.
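The same computation in dplyr, for comparison (a sketch, assuming interaction_date has already been coerced to Date):
library(dplyr)
df %>%
  group_by(account_no) %>%
  summarise(gap = mean(diff(unique(sort(interaction_date)))))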
Or using data.table
library(data.table)
setDT(df)[, interaction_date := as.IDate(interaction_date, "%d/%m/%Y")]
df[order(account_no, interaction_date), .(Gap = mean(diff(interaction_date))), account_no]
# account_no Gap
#1: 1 NaN days
#2: 2 102 days
#3: 3 89 days
#4: 4 NaN days
#5: 5 NaN days
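Note that this data.table version averages over duplicate same-day interactions, which is why account 2 shows 102 days rather than the 204 above; wrap the dates in unique() inside diff() to match the aggregate answer. For the other part of the question, the most common action on each day, a base R sketch:
tab <- table(df$interaction_date, df$action)  # action counts per day
apply(tab, 1, function(x) names(x)[which.max(x)])  # most frequent action each day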

Merge 2 columns into one in dataframe [closed]

Closed. This question needs debugging details. It is not currently accepting answers. Closed 8 years ago.
This should be simple, but I am struggling with it.
I want to combine two columns in a single dataframe into one. I have separate columns for customer ID (20227) and year (2009). I want to create a new column that has both (2009_20227).
You could use paste
transform(dat, newcol=paste(year, customerID, sep="_"))
Or use interaction
dat$newcol <- as.character(interaction(dat, sep="_"))
data
dat <- data.frame(year=2009:2013, customerID=20227:20231)
An alternative way with the function unite from tidyr:
library(tidyr)
df = data.frame(year=2009:2013, customerID=20227:20231) # using akrun's data
unite(df, newcol, c(year, customerID), remove=FALSE)
# newcol year customerID
#1 2009_20227 2009 20227
#2 2010_20228 2010 20228
#3 2011_20229 2011 20229
#4 2012_20230 2012 20230
#5 2013_20231 2013 20231
Another alternative (using @akrun's data):
dat <- data.frame(year=2009:2013, customerID=20227:20231)
dat$newcol <- paste(dat$year, dat$customerID, sep="_")
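Or with sprintf, which also gives control over formatting such as zero-padding (a sketch on the same data):
dat$newcol <- sprintf("%d_%d", dat$year, dat$customerID)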
