I am trying to graph the following data file:
61.0 16.4 100.0 28.6 28.6 12.2 12.2
59.0 25.4 100.0 21.4 21.4 11.8 11.8
69.0 15.9 100.0 35.7 35.7 11.5 11.5
59.0 23.7 100.0 23.4 23.4 11.8 11.8
49.0 20.4 100.0 18.0 18.0 9.8 9.8
84.0 13.1 90.9 50.8 50.8 16.8 16.8
59.0 16.9 100.0 22.6 22.6 11.8 11.8
71.0 16.9 100.0 32.8 32.8 14.2 14.2
68.0 19.1 100.0 26.2 26.2 13.6 13.6
91.0 13.2 100.0 51.6 51.6 18.2 18.2
57.0 22.8 100.0 29.4 29.4 11.4 11.4
52.0 26.9 100.0 17.8 17.8 10.4 10.4
55.0 21.8 100.0 32.2 32.2 11.0 11.0
68.0 19.1 100.0 29.8 29.8 13.6 13.6
50.0 22.0 100.0 19.0 19.0 10.0 10.0
149.0 12.1 66.7 111.2 111.2 29.8 29.8
69.0 20.3 100.0 29.8 29.8 13.8 13.8
I am very new to gnuplot and I can't seem to figure out the correct code to produce the graph I want.
I was trying something like this:
gnuplot> set output 'datastore1.png'
gnuplot> plot 'desktop1.dat' using 0:1 title "totalio" with lines, \
         'desktop1.dat' using 0:2 title "readpercentage" with lines, \
         'desktop1.dat' using 0:3 title "cachehitpercentage" with lines, \
         'desktop1.dat' using 0:4 title "currentkbpersecond" with lines, \
         'desktop1.dat' using 0:5 title "maximumkbpersecond" with lines, \
         'desktop1.dat' using 0:6 title "currentiopersecond" with lines, \
         'desktop1.dat' using 0:7 title "maximumiopersecond" with lines
gnuplot> quit
However, the graph is not exactly correct.
Thanks for the help!
Not sure what you are trying to plot here, but I think the problem is that every 'using' spec takes column 0 as the x value; gnuplot treats column 0 as the row index of the data file, so all of your series are simply plotted against line number rather than against a meaningful x value. Rather plot one column against another, e.g.
p 'desktop1.dat' u 1:2, 'desktop1.dat' u 1:3
Edit:
If you are plotting against time, you should add another column to the data file you read in, so that each line looks like
15 61.0 16.4 100.0 28.6 28.6 12.2 12.2
as an example for the first line of your data. Afterwards you can use the plotting command I gave above.
Related
I have a data set recorded at 1-minute intervals and I am looking for a way to convert it to hourly averages. I am new to R programming for data analysis. Below is an example of what my data looks like.
If there are other easy ways to solve this besides R, please specify. I hope to hear from someone soon.
TimeStamp TSP PM10 PM2.5 PM1 T RH
1 01/12/2022 14:08 44.3 14.2 6.97 3.34 32.9 53.2
2 01/12/2022 14:09 40.3 16.9 7.10 3.52 33.1 53.1
3 01/12/2022 14:10 36.5 15.6 7.43 3.64 33.2 53.1
4 01/12/2022 14:11 33.0 16.5 7.29 3.40 33.2 52.6
5 01/12/2022 14:12 41.3 18.2 7.73 3.41 33.3 52.9
6 01/12/2022 14:13 38.5 16.3 7.54 3.44 33.3 53.3
7 01/12/2022 14:14 38.5 18.5 6.80 3.14 33.2 53.6
8 01/12/2022 14:15 30.7 17.1 6.86 3.33 33.2 53.7
9 01/12/2022 14:16 32.5 18.3 8.56 4.42 33.3 53.5
10 01/12/2022 14:17 26.4 15.6 9.34 4.70 33.4 53.0
11 01/12/2022 14:18 23.8 14.6 7.56 3.97 33.4 52.5
12 01/12/2022 14:19 18.1 11.4 6.15 3.08 33.4 51.7
13 01/12/2022 14:20 22.4 12.2 6.43 3.49 33.5 50.9
14 01/12/2022 14:21 17.9 12.9 6.03 3.15 33.6 50.9
15 01/12/2022 14:22 18.6 12.8 5.87 3.19 33.7 50.7
16 01/12/2022 14:23 22.3 10.7 5.49 2.74 33.7 50.6
17 01/12/2022 14:24 18.1 9.2 4.87 2.52 33.7 49.9
18 01/12/2022 14:25 19.2 13.0 5.12 2.65 33.7 50.2
19 01/12/2022 14:26 19.0 10.3 5.01 2.78 33.9 50.0
20 01/12/2022 14:27 20.0 10.3 4.78 2.57 34.0 49.4
21 01/12/2022 14:28 14.1 9.6 4.71 2.45 34.1 49.0
22 01/12/2022 14:29 16.1 10.3 4.83 2.68 34.1 48.9
23 01/12/2022 14:30 13.9 10.0 5.21 2.99 34.2 49.5
24 01/12/2022 14:31 27.3 11.5 5.90 2.94 34.2 49.7
25 01/12/2022 14:32 23.8 12.8 5.77 2.97 34.2 49.6
26 01/12/2022 14:33 19.3 12.4 5.92 3.29 34.3 49.6
27 01/12/2022 14:34 30.9 14.4 6.10 3.22 34.3 49.3
28 01/12/2022 14:35 30.5 15.0 5.73 2.98 34.3 49.9
29 01/12/2022 14:36 24.7 13.9 6.17 3.17 34.3 50.0
30 01/12/2022 14:37 27.0 12.3 6.16 3.14 34.2 50.2
31 01/12/2022 14:38 27.0 12.4 5.65 3.28 34.2 50.3
32 01/12/2022 14:39 22.2 12.5 5.51 3.10 34.2 50.2
33 01/12/2022 14:40 19.0 11.6 5.46 3.06 34.1 50.3
34 01/12/2022 14:41 24.3 14.3 5.45 3.01 34.1 50.2
35 01/12/2022 14:42 17.6 10.9 5.64 3.30 34.1 50.5
36 01/12/2022 14:43 20.9 10.1 5.80 3.26 34.0 51.0
37 01/12/2022 14:44 19.0 11.7 5.93 3.27 33.9 50.9
38 01/12/2022 14:45 25.7 15.6 6.20 3.40 33.9 51.1
39 01/12/2022 14:46 20.1 14.4 6.08 3.39 34.0 51.3
40 01/12/2022 14:47 14.8 11.1 5.91 3.44 34.1 50.9
I have tried several methods I found through my research, but none of them seems to work for me. Below is the code I have tried:
ref.data.hourly <- ref.data %>%
group_by(hour = format (as.POSIXct(cut(TimeStamp, break = "hour")), "%H")) %>%
summarise(meanval = mean(val, na.rm = TRUE))
I have also tried this
ref.data$TimeStamp <- as.POSIXct(ref.data$TimeStamp, format = "%d/%m/%Y %H:%M")
ref.data.xts$TimeStamp <- NULL
ref.data$TimeStamp <- strptime(ref.data$TimeStamp, "%d/%m/%Y %H:%M")
ref.data$group <- cut(ref.data$TimeStamp, breaks = "hour")
Your first attempt seems sensible to me. Without further information about your data or a specific error message, I assume the problem is the date-time parsing (or the use of cut() on date-time values).
A workaround is to convert the dates to character (if they aren't already) and simply drop the minutes. Given that as.character(ref.data$TimeStamp) is consistently formatted like 01/12/2022 14:08, you can do the following:
ref.data.hourly <- ref.data %>%
  mutate(hour_grps = substr(as.character(TimeStamp), 1, 13)) %>%  # keep only "dd/mm/yyyy HH"
  group_by(hour_grps) %>%
  summarise(meanval = mean(val, na.rm = TRUE))  # replace val with the column you want to average
I don't think this is good practice because it will break if you use the same code on slightly different formatted data. For instance, if the code were used on a computer with different locale, the date-time formatting used with as.character() may change. So please consider this a quick fix, not a permanent solution.
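If you want something less fragile, here is a sketch that parses the timestamps explicitly with lubridate and truncates them to the hour. It assumes the dd/mm/yyyy HH:MM format and the column layout from your sample; adjust as needed.
library(dplyr)
library(lubridate)

ref.data.hourly <- ref.data %>%
  mutate(TimeStamp = dmy_hm(TimeStamp),                 # parse "01/12/2022 14:08" explicitly
         hour_grps = floor_date(TimeStamp, "hour")) %>% # truncate each timestamp to its hour
  group_by(hour_grps) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))  # hourly mean of every numeric column
Because the day-month order is given explicitly in the format, this does not depend on the machine's locale settings.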
I am currently analyzing a dataset of yearly C-section rates across the 50 US states from 2004 to 2020. I want to create one scatterplot that contains the rates from Alabama, Mississippi, and Utah. I am having trouble writing the code because I haven't used R in a while. This is what I have so far:
Plot2 <- ggplot(Rates, aes(...1, ...2)) +
  geom_line() +
  ggtitle("C-Section Rates") +
  xlab("Year") +
  ylab("Percentage of Live Births (%)")
And here is the dataset that I am analyzing
Rate <- read.table(text="YEAR AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
2020 35 22.9 28.4 33.8 30.5 27.2 34.1 31.7 35.9 33.9 26.3 23.5 30.8 30.1 30.2 30.1 34.3 36.8 29.7 33.7 32.4 32.5 28.5 38.2 29.3 27.6 28.8 32.9 32.1 33.2 26.1 33.6 29.9 27 31.3 32.1 28.8 30.6 33.4 33.5 24.7 32.1 34.7 23.1 26.9 32.6 28.5 34.2 26.7 26.4
2019 34.6 21.6 27.8 34.5 30.8 26.8 34.6 31.5 36.5 34.3 26.8 24 30.6 29.3 29.6 29.7 33.6 36.7 30.2 33 31.4 32 27.6 38.5 30.1 28.4 29.1 32.8 31.6 33.8 26.4 33.2 29.1 26.5 31 32.1 28 30.2 32 33.2 24.5 31.8 34.8 23.1 25.8 31.9 27.8 34.6 26.7 26.3
2018 34.7 22.4 27.5 34.8 30.9 26.1 34.8 31.3 36.8 34 26.9 24 31.2 29.8 29.8 29.7 34.3 37 30.4 33.9 31.5 32.1 27 38.3 30 28.1 29.9 33.8 31.6 34.9 25.3 33.9 29.4 26.5 30.8 32.8 28 30.1 32.2 33.5 24.6 32.4 35 22.7 25.9 32.4 27.9 34.1 26.6 27.4
2017 35.1 22.5 26.9 33.5 31.4 26.5 34.8 31.8 37.2 34.2 25.9 23.7 31.1 29.7 29.7 30 35.2 37.5 29.9 33.9 31.6 31.9 27.4 37.8 30.1 28.5 30.4 34.1 31 35.9 24.7 34.1 29.4 28.3 30.3 32.2 28.1 30.5 31.5 33.5 24.5 32.4 35 22.8 25.7 32.6 27.7 35.2 26.4 26.4
2016 34.4 23 27.5 32.3 31.9 26.2 35.4 31.8 37.4 33.8 25.2 23.9 31.1 29.8 30.1 29.5 34.6 37.5 28.9 33.7 31.3 32 26.8 38.2 30.2 29.1 31 33.8 30.9 36.2 24.8 33.8 29.4 26.8 30.8 32 27.2 29.8 31.2 33.5 25.3 32.5 34.4 22.3 25.7 33 27.4 34.9 26 27.4
2015 35.2 22.9 27.6 32.3 32.3 25.9 34 31.9 37.3 33.6 25.9 24.4 31 29.6 29.8 29.6 34.4 37.5 29.4 34.9 31.4 31.9 26.5 38 30.3 29.7 31.1 34.6 30.8 36.8 24.3 33.8 29.3 27.5 30.4 32.4 27.1 30.1 30.6 33.7 25.7 33.2 34.4 22.8 25.5 32.9 27.5 34.9 26.2 27.3
2014 35.4 23.7 27.8 32 32.7 25.6 34.2 31.5 37.2 33.8 24.6 24.2 31.2 30.3 30 29.8 35.1 38.3 29.8 34.9 31.6 32.8 26.5 37.7 30.1 31.4 30.8 34.4 29.9 37.4 23.8 33.9 29.5 27.6 30.5 33.1 27.4 30.4 30.7 34.3 24.8 33.7 34.9 22.3 25.8 33.1 27.6 35.4 26.1 27.8
2004 31.8 21.9 24.7 31.5 30.7 24.6 32.4 30 34.9 30.5 25.6 22.6 28.8 28.2 26.7 28.9 33.9 36.8 28.3 31.1 32.2 28.8 25.3 35.1 29.7 25.8 28.6 31 28 36.3 22.2 31.5 29.3 26.4 28.1 32.5 27.6 28.9 30.3 32.7 25.1 31.1 32.6 21.6 25.9 31.4 27.8 34.2 23.7 24.6", header=TRUE)
ggplot2 is designed to work most smoothly with "long" aka tidy data, where each row is an observation and each column is a variable. Your original data is "wide," with the states all in separate columns. One way to switch between the two data shapes is pivot_longer from the tidyr package, which is loaded along with ggplot2 when we load tidyverse. You can filter using filter from dplyr, also loaded in tidyverse.
library(tidyverse)

Rate %>%
  pivot_longer(-YEAR, names_to = "STATE") %>%   # wide -> long: one row per YEAR/STATE pair
  filter(STATE %in% c("AL", "MS", "UT")) %>%    # keep only the three states of interest
  ggplot(aes(YEAR, value, color = STATE)) +
  geom_point()
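If you also want the title and axis labels from your original attempt, add them to the same chain:
Rate %>%
  pivot_longer(-YEAR, names_to = "STATE") %>%
  filter(STATE %in% c("AL", "MS", "UT")) %>%
  ggplot(aes(YEAR, value, color = STATE)) +
  geom_point() +
  ggtitle("C-Section Rates") +
  xlab("Year") +
  ylab("Percentage of Live Births (%)")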
I want to compute the rolling mean over a vector whereby the window grows with each entry in the vector. Basically, I want to have the mean of all elements up to the i-th, i+1-th, i+2-th, and so forth.
To make it more clear, I'll provide an example and a solution which works for smaller datasets but does not scale up well:
library(zoo)
# data:
x <- 1:100
# solution:
rolling_average <- rollapply(x, seq_along(x), mean, align = "right")
# result:
rolling_average
# [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
# [27] 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5
# [53] 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5
# [79] 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
Using this approach on a vector with 500,000 entries fills up my memory within seconds and renders my PC unusable. Alternatively, I tried roll_mean from RcppRoll, but couldn't come up with a solution because RcppRoll::roll_mean only accepts a single fixed integer window length.
So, what is the best approach to solve this problem on a large scale? Any help is greatly appreciated.
We can exploit the fact that the mean of the first i elements is just the cumulative sum up to position i divided by i, which is fully vectorised and runs in linear time:
cumsum(x) / seq_along(x)
# [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5
# [21] 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5
# [41] 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5
# [61] 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5
# [81] 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
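As a quick sanity check, this matches the zoo-based result from the question for the same x <- 1:100:
all.equal(cumsum(x) / seq_along(x),
          zoo::rollapply(x, seq_along(x), mean, align = "right"))
# [1] TRUE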
We can also use cummean from dplyr:
library(dplyr)
cummean(x)
#[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
#[20] 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5
#[39] 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0
#[58] 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5
#[77] 39.0 39.5 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0
#[96] 48.5 49.0 49.5 50.0 50.5
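For very long vectors, data.table's frollmean may also be worth a look, since it supports an adaptive (per-element) window length. A sketch, assuming a recent data.table version that provides the adaptive rolling functions:
library(data.table)
frollmean(x, n = seq_along(x), adaptive = TRUE)  # the window ending at each element grows with its index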
I have the following data.
HEIrank1
HEI.ID X2007 X2008 X2009 X2010 X2011 X2012
1 OP 41.8 147.6 90.3 82.9 106.8 63.0
2 MO 20.0 20.8 21.1 20.9 12.6 20.6
3 SD 21.2 32.3 25.7 23.9 25.0 40.1
4 UN 51.8 39.8 19.9 20.9 21.6 22.5
5 WS 18.0 19.9 15.3 13.6 15.7 15.2
6 BF 11.5 36.9 20.0 23.2 18.2 23.8
7 ME 34.2 30.3 28.4 30.1 31.5 25.6
8 IM 7.7 18.1 20.5 14.6 17.2 17.1
9 OM 11.4 11.2 12.2 11.1 13.4 19.2
10 DC 14.3 28.7 20.1 17.0 22.3 16.2
11 OC 28.6 44.0 24.9 27.9 34.0 30.7
12 TH 7.4 10.0 5.8 8.8 8.7 8.6
13 CC 12.1 11.0 12.2 12.1 14.9 15.0
14 MM 11.7 24.2 18.4 18.6 31.9 31.7
15 MC 19.0 13.7 17.0 20.4 20.5 12.1
16 SH 11.4 24.8 26.1 12.7 19.9 25.9
17 SB 13.0 22.8 15.9 17.6 17.2 9.6
18 SN 11.5 18.6 22.9 12.0 20.3 11.6
19 ER 10.8 13.2 20.0 11.0 14.9 14.2
20 SL 44.9 21.6 21.3 26.5 17.0 8.0
I tried the following commands to draw a regression line for each HEI:
year <- c(2007, 2008, 2009, 2010, 2011, 2012)
op <- as.numeric(HEIrank1[1, ])
lm.r <- lm(op ~ year)
plot(year, op)
abline(lm.r)
I want to draw a regression line for each college in one graph and I do not know how. Can you help me?
Here's my approach with ggplot2, but the graph is uninterpretable with that many lines.
library(ggplot2)
library(reshape2)

mdat <- melt(HEIrank1, variable.name = "year")    # reshape wide -> long; HEI.ID is kept as the id column
mdat$year <- as.numeric(substring(mdat$year, 2))  # strip the leading "X": "X2007" -> 2007

ggplot(mdat, aes(year, value, colour = HEI.ID, group = HEI.ID)) +
  geom_point() + stat_smooth(se = FALSE, method = "lm")
Faceting may be a better way to go:
ggplot(mdat, aes(year, value, group = HEI.ID)) +
  geom_point() + stat_smooth(se = FALSE, method = "lm") +
  facet_wrap(~HEI.ID)
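If you prefer the tidyr tooling used elsewhere on this page, the same reshape can be sketched with pivot_longer (assuming the HEIrank1 columns shown above):
library(tidyverse)

mdat <- HEIrank1 %>%
  pivot_longer(-HEI.ID, names_to = "year", values_to = "value") %>%
  mutate(year = as.numeric(substring(year, 2)))   # "X2007" -> 2007

ggplot(mdat, aes(year, value, group = HEI.ID)) +
  geom_point() + stat_smooth(se = FALSE, method = "lm") +
  facet_wrap(~HEI.ID)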
I have this data frame:
head(df,10)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
3 36.4 13.1 13.9 36.6 9.26 57.9 28.0 34.96 26049 3492
4 31.1 11.2 12.6 45.1 7.81 48.8 25.9 37.85 17515 2754
5 33.2 13.4 13.2 40.3 8.69 54.3 26.9 35.67 23510 3265
6 34.0 12.8 13.7 39.4 8.77 54.8 26.5 35.19 25151 3305
7 32.7 12.4 13.6 41.3 8.49 53.0 25.9 35.97 25214 3201
8 33.4 13.7 12.5 40.3 8.76 54.7 27.1 36.50 23943 3391
9 35.2 13.8 13.5 37.5 9.20 57.5 27.8 33.08 25647 3385
10 34.6 14.9 14.9 35.6 9.35 58.4 27.8 35.81 27324 3790
11 30.4 13.3 13.0 43.3 8.29 51.8 24.9 38.31 25178 2881
12 32.0 13.3 14.0 40.7 8.58 53.6 26.1 35.97 25677 3162
I have a DateTime value like this:
DateTime<-Sys.time()
I would like to insert another column into this df and increment the DateTime value by 30 seconds for each row.
I'm doing this:
for (i in 1:nrow(df)) {
  df[1,]$DateTime <- DateTime
  DateTime <- DateTime + 30
}
This loop is not doing what I'm trying to do. Any help is greatly appreciated.
# vectorised: add 0, 30, 60, ... seconds to the current time, one offset per row
df$DateTime <- Sys.time() + 30 * (seq_len(nrow(df)) - 1)