Selecting rows with time in R

I have a data frame that looks like this:
Subject Time Freq1 Freq2 ...
A 6:20 0.6 0.1
A 6:30 0.1 0.5
A 6:40 0.6 0.1
A 6:50 0.6 0.1
A 7:00 0.3 0.4
A 7:10 0.1 0.5
A 7:20 0.1 0.5
B 6:00 ... ...
I need to delete the rows whose time is not in the range 7:00 to 7:30, so in this case all the 6:00, 6:10, 6:20... rows.
I have tried creating a data frame with just the times I want to keep, but R does not seem to recognize the times as numbers or as names, and I get the same error when trying to remove the unwanted rows directly. It is probably quite simple, but I haven't found a solution.
Any suggestions?

We can convert the Time column to a Period class with the lubridate package and then filter the data frame on that column, keeping only the rows between 7:00 and 7:30.
library(dplyr)
library(lubridate)
dat2 <- dat %>%
  mutate(HM = hm(Time)) %>%
  filter(HM >= hm("7:00") & HM <= hm("7:30")) %>%
  select(-HM)
dat2
# Subject Time Freq1 Freq2
# 1 A 7:00 0.3 0.4
# 2 A 7:10 0.1 0.5
# 3 A 7:20 0.1 0.5
DATA
dat <- read.table(text = "Subject Time Freq1 Freq2
A '6:20' 0.6 0.1
A '6:30' 0.1 0.5
A '6:40' 0.6 0.1
A '6:50' 0.6 0.1
A '7:00' 0.3 0.4
A '7:10' 0.1 0.5
A '7:20' 0.1 0.5
B '6:00' NA NA",
header = TRUE)
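If you would rather avoid the lubridate dependency, here is a minimal base R sketch of the same filter (assuming every Time value is an H:MM string; to_minutes is a hypothetical helper, not from any package):
# Hypothetical helper: convert "H:MM" strings to minutes since midnight.
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}
m <- to_minutes(dat$Time)
# Keep only rows whose time falls between 7:00 and 7:30 inclusive.
dat2 <- dat[m >= to_minutes("7:00") & m <= to_minutes("7:30"), ]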

Related

Calculate average within a specified range

I am using the 'diamonds' dataset from ggplot2 and want to find the average of the 'carat' column. However, I want the average within each interval of width 0.1:
Between
0.2 and 0.29
0.3 and 0.39
0.4 and 0.49
etc.
You can use the function aggregate to compute the mean by group, where the group is calculated with carat %/% 0.1:
library(ggplot2)
averageBy <- 0.1
aggregate(diamonds$carat, list(diamonds$carat %/% averageBy * averageBy), mean)
This gives the mean of carat within each 0.1-wide bin:
Group.1 x
1 0.2 0.2830764
2 0.3 0.3355529
3 0.4 0.4181711
4 0.5 0.5341423
5 0.6 0.6821408
6 0.7 0.7327491
...
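For comparison, here is a dplyr sketch of the same binned mean (the bin column name is my own choice, not from the question):
library(ggplot2)  # for the diamonds dataset
library(dplyr)
diamonds %>%
  group_by(bin = carat %/% 0.1 * 0.1) %>%
  summarise(mean_carat = mean(carat))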

Create xts object from CSV

I'm trying to generate an xts object from a CSV file. The output looks okay as a plain data frame, i.e. the Date and Value columns are character and numeric, respectively.
However, when I try to make it into an xts, the output seems dubious.
What is the leftmost column of the xts output?
> test <- read.csv("Test.csv", header = TRUE, as.is = TRUE)
> test
Date Value
1 1/12/2014 1.5
2 2/12/2014 0.9
3 1/12/2015 -0.1
4 2/12/2015 -0.3
5 1/12/2016 -0.7
6 2/12/2016 0.2
7 7/12/2016 -1.0
8 8/12/2016 -0.2
9 9/12/2016 -1.1
> xts(test, order.by = as.POSIXct(test$Date), format = "%d/%m/%Y")
Date Value
0001-12-20 "1/12/2014" " 1.5"
0001-12-20 "1/12/2015" "-0.1"
0001-12-20 "1/12/2016" "-0.7"
0002-12-20 "2/12/2014" " 0.9"
0002-12-20 "2/12/2015" "-0.3"
0002-12-20 "2/12/2016" " 0.2"
0007-12-20 "7/12/2016" "-1.0"
0008-12-20 "8/12/2016" "-0.2"
0009-12-20 "9/12/2016" "-1.1"
I'd simply like an xts ordered by Date, rather than by the mystery column on the left. I've tried as.Date for the xts as well, but get the same result.
I recommend you use read.zoo to read the data from CSV, then convert the result to xts using as.xts.
Text <- "Date,Value
1/12/2014,1.5
2/12/2014,0.9
1/12/2015,-0.1
2/12/2015,-0.3
1/12/2016,-0.7
2/12/2016,0.2
7/12/2016,-1.0
8/12/2016,-0.2
9/12/2016,-1.1"
z <- read.zoo(text=Text, sep=",", header=TRUE, format="%m/%d/%Y", drop=FALSE)
x <- as.xts(z)
# Value
# 2014-01-12 1.5
# 2014-02-12 0.9
# 2015-01-12 -0.1
# 2015-02-12 -0.3
# 2016-01-12 -0.7
# 2016-02-12 0.2
# 2016-07-12 -1.0
# 2016-08-12 -0.2
# 2016-09-12 -1.1
Note that you will need to omit text = Text from your actual call, and replace it with file = "your_file_name.csv".
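For instance, assuming the question's Test.csv holds the same data, the call would look like:
library(xts)  # loads zoo as well
z <- read.zoo("Test.csv", sep = ",", header = TRUE, format = "%m/%d/%Y", drop = FALSE)
x <- as.xts(z)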
The issue appears to be twofold. First, there is a misplaced parenthesis in your call: the format argument is passed to xts() rather than to as.POSIXct(), so the dates are parsed incorrectly. Second, the leftmost column is the index, which makes the Date column superfluous.
df <- read.table(text="
Date Value
1/12/2014 1.5
2/12/2014 0.9
1/12/2015 -0.1
2/12/2015 -0.3
1/12/2016 -0.7
2/12/2016 0.2
7/12/2016 -1.0
8/12/2016 -0.2
9/12/2016 -1.1",
header=TRUE)
df$Date <- as.Date(df$Date, format="%d/%m/%Y")
library(xts)
xts(df[-1], order.by=df[,1])
# Value
# 2014-12-01 1.5
# 2014-12-02 0.9
# 2015-12-01 -0.1
# 2015-12-02 -0.3
# 2016-12-01 -0.7
# 2016-12-02 0.2
# 2016-12-07 -1.0
# 2016-12-08 -0.2
# 2016-12-09 -1.1
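For completeness, the question's original one-liner also works once the format argument is moved inside the date conversion and the Date column is dropped (a sketch using as.Date, since the data are daily):
library(xts)
xts(test$Value, order.by = as.Date(test$Date, format = "%d/%m/%Y"))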

extract irregular numeric data from strings

I have data like below. I wish to extract the first and last year from each string here called my.string. Some strings only contain one year and some strings contain no years. No strings contain more than two years. I have provided the desired result in the object named desired.result below the example data set. I am using R.
When a string contains two years, those years are contained within a portion of the string that looks like this: ga49.51 or ea22.24
When a string contains only one year, that year is contained in a portion of the string that looks like this: time11
I know a bit about regex, but this problem seems too irregular and complex for me to figure out. I am not even sure where to begin. Thank you for any advice.
EDIT
Perhaps delete the numbers before the first colon (:) and the remaining numbers are what I want.
my.data <- read.table(text = '
my.string cov1 cov2
42:Alpha:ga6.8 -0.1 2.2
43:Alpha:ga9.11 -2.5 0.6
44:Alpha:ga30.32 -1.3 0.5
45:Alpha:ga49.51 -2.5 0.6
50:Alpha:time1:ga.time -1.7 0.9
51:Alpha:time2:ga.time -1.5 0.8
52:Alpha:time3:ga.time -1.0 1.0
2:Beta:ea2.9 -1.7 0.6
3:Beta:ea17.19 -5.0 0.8
4:Beta:ea22.24 -6.4 1.0
8:Beta:as 0.2 0.6
9:Beta:sd 1.7 0.4
12:Beta:time1:ea.tim -2.6 1.8
13:Beta:time10:ea.ti -3.6 1.1
14:Beta:time11:ea.ti -3.1 0.7
', header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
desired.result <- read.table(text = '
my.string cov1 cov2 time1 time2
42:Alpha:ga6.8 -0.1 2.2 6 8
43:Alpha:ga9.11 -2.5 0.6 9 11
44:Alpha:ga30.32 -1.3 0.5 30 32
45:Alpha:ga49.51 -2.5 0.6 49 51
50:Alpha:time1:ga.time -1.7 0.9 1 NA
51:Alpha:time2:ga.time -1.5 0.8 2 NA
52:Alpha:time3:ga.time -1.0 1.0 3 NA
2:Beta:ea2.9 -1.7 0.6 2 9
3:Beta:ea17.19 -5.0 0.8 17 19
4:Beta:ea22.24 -6.4 1.0 22 24
8:Beta:as 0.2 0.6 NA NA
9:Beta:sd 1.7 0.4 NA NA
12:Beta:time1:ea.tim -2.6 1.8 1 NA
13:Beta:time10:ea.ti -3.6 1.1 10 NA
14:Beta:time11:ea.ti -3.1 0.7 11 NA
', header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
I suggest using the stringr library to extract the data you need, since it handles NA values better and also allows a constrained-width lookbehind:
> library(stringr)
> my.data$time1 <- str_extract(my.data$my.string, "(?<=time)\\d+|(?<=\\b[ge]a)\\d+")
> my.data$time2 <- str_extract(my.data$my.string, "(?<=\\b[ge]a\\d{1,100}\\.)\\d+")
> my.data
my.string cov1 cov2 time1 time2
1 42:Alpha:ga6.8 -0.1 2.2 6 8
2 43:Alpha:ga9.11 -2.5 0.6 9 11
3 44:Alpha:ga30.32 -1.3 0.5 30 32
4 45:Alpha:ga49.51 -2.5 0.6 49 51
5 50:Alpha:time1:ga.time -1.7 0.9 1 <NA>
6 51:Alpha:time2:ga.time -1.5 0.8 2 <NA>
7 52:Alpha:time3:ga.time -1.0 1.0 3 <NA>
8 2:Beta:ea2.9 -1.7 0.6 2 9
9 3:Beta:ea17.19 -5.0 0.8 17 19
10 4:Beta:ea22.24 -6.4 1.0 22 24
11 8:Beta:as 0.2 0.6 <NA> <NA>
12 9:Beta:sd 1.7 0.4 <NA> <NA>
13 12:Beta:time1:ea.tim -2.6 1.8 1 <NA>
14 13:Beta:time10:ea.ti -3.6 1.1 10 <NA>
15 14:Beta:time11:ea.ti -3.1 0.7 11 <NA>
The first regex matches:
(?<=time)\\d+ - 1+ digits that have time before them
| - or
(?<=\\b[ge]a)\\d+ - 1+ digits that have ga or ea as a whole word in front
The second regex matches:
(?<=\\b[ge]a\\d{1,100}\\.) - check if the current position is preceded with ga or ea as a whole word, followed by 1 to 100 digits (that should be enough for your scenario; 100-digit chunks are hardly expected here, and you may even decrease the value), and then a .
\\d+ - 1+ digits
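One follow-up: str_extract returns character vectors, so to match the numeric time1 and time2 columns in desired.result you would convert the results, e.g.:
my.data$time1 <- as.numeric(my.data$time1)
my.data$time2 <- as.numeric(my.data$time2)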
Here's a regex that will extract either of the two types, and output them to different columns at the end of the lines:
Search: .*(?:time(\d+)|(?:[ge]a)(\d+)\.(\d+)).*
Replace: $0\t$1\t$2\t$3
Breakdown:
.*(?: ... ).* ensures that the whole line is matched, and uses a non-capturing group for the main alternation
time(\d+): this is the first half of the alternation, capturing any digits after a "time"
(?:[ge]a)(\d+)\.(\d+): the second half of the alternation matches "ga" or "ea" followed by two sets of digits, each in its own capture group
Replacement: $0 puts the whole line back. Each of the other capture groups is added, with tabs in between.
See regex101 example
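If you would rather run this search-and-replace inside R, a rough equivalent with sub is sketched below (perl = TRUE enables the non-capturing groups; strings with no match are returned unchanged, without the appended tabs):
out <- sub("(.*(?:time(\\d+)|(?:[ge]a)(\\d+)\\.(\\d+)).*)",
           "\\1\t\\2\t\\3\t\\4",  # \\1 is the whole line, mirroring $0
           my.data$my.string, perl = TRUE)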

How to meta analyze p values of different observations

I am trying to meta-analyze p-values from different studies. I have a data frame DF1:
p-value1 p-value2 p-value3 m
0.1 0.2 0.3 a
0.2 0.3 0.4 b
0.3 0.4 0.5 c
0.4 0.4 0.5 a
0.6 0.7 0.9 b
0.6 0.7 0.3 c
I am trying to add a fourth column containing the meta-analyzed combination of p-value1 to p-value3.
I tried to use the metap package:
p <- rbind(DF1$p-value1, DF1$p-value2, DF1$p-value3)
pv <- split(p, p$m)
library(metap)
for (i in 1:length(pv)) {
  pvalue <- sumlog(pv[[i]]$pvalue)
}
But it results in one p value. Thank you for any help.
You can try
apply(DF1[,1:3], 1, sumlog)
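Note that sumlog returns an object (with components such as p and chisq) rather than a bare number, so the call above yields a list of those objects. Assuming you want just the combined p-value as a new column, a minimal sketch (p.meta is a hypothetical column name):
library(metap)
DF1$p.meta <- apply(DF1[, 1:3], 1, function(p) sumlog(p)$p)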

Create a data frame with date as column names

I would like to construct a data frame in R that has dates as column names and times (without date info) as rows. Basically I have a table of the form:
Time 21.04.15 22.04.15 24.04.15 03.05.15
00:00 0.4 0.4 0.4 0.4
01:00 0.4 0.4 0.4 0.4
02:00 0.4 0.4 0.4 0.4
03:00 0.6 0.6 0.6 0.6
04:00 0.6 0.6 0.6 0.6
05:00 0.7 0.8 0.8 0.8
06:00 0.7 0.8 0.8 0.8
07:00 0.7 0.8 0.8 0.8
...
I would like to address (plot, extract) columns by date, and individual elements by date and time.
Is this possible?
The best you can do is rename them with character strings that represent dates; I don't think the names themselves can be Date objects. (I'll admit I've never tried, and I'm not going to experiment with it, because doing so seems like a really bad idea.)
Assuming your current column names, apart from the first (Time), are in dd.mm.yy format, run
names(df_object)[-1] <- format(as.Date(names(df_object)[-1], format = "%d.%m.%y"),
                               format = "%Y-%m-%d")
Like the commenters, though, while this will work, I have a hard time imagining circumstances where it is beneficial.
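Once renamed, the columns can be addressed by the date strings, for example (using the first date from the sample table):
df_object[["2015-04-21"]]                     # extract the 21.04.15 column
plot(df_object[["2015-04-21"]], type = "l")   # quick plot of that day's values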
