How to select groups of rows and store to variables?


I currently use the following code to read a csv file, plot the data points based on one column, and store a CpK number in a variable. This code correctly calculates the CpK for the entire data set, and the graph works as well. I am now looking to calculate the CpK number for each month in the dataset (graphing is not necessary). I looked through the data.table documentation as well as other R documentation, but I am having a tough time selecting only the data for each month.
Current Code:(I could have calculated the CpK in one formula, but I have it broken up purposely)
mydf <- read.csv('ID35.csv', header = TRUE, sep=",")
date <- strptime(mydf$DATETIME, "%Y/%m/%d %H:%M:%S")
plot(date,mydf$AVG,xlab='Date',ylab='AVG',main='Data')
abline(h=mydf$MIN,col=3,lty=1)
abline(h=mydf$MAX,col=3,lty=1)
grid(NULL,NULL,col="black")
legend("topright", legend = c(" ", " "), text.width = strwidth("1,000,000"), lty = 1:2, xjust = 1, yjust = 1, title = "Points")
myavg <-mean(mydf$AVG, na.rm=TRUE)
newds <- (mydf$AVG - myavg)^2
newsum <- sum(newds, na.rm=TRUE)
N <- length(mydf$AVG) - 1
newN <- 1/N
total <- newN*newsum
sigma <- total^(1/2)
USL <- mean(mydf$MAX, na.rm=TRUE)
LSL <- mean(mydf$MIN, na.rm=TRUE)
cpk <- min(((USL-myavg)/(3*sigma)),((myavg-LSL)/(3*sigma)))
cpk
Here is what the dataset looks like(date formatting is already done):
mydf(only 24/1000 rows):
Code DATETIME AVG MIN TARG_AVG MAX
N9 2012/04/10 14:03:37 0.2647 0.22 0.25 0.27
NA 2012/03/30 07:48:17 0.2589 0.22 0.25 0.27
NB 2012/03/24 19:23:08 0.2912 0.22 0.25 0.27
NB 2012/03/25 16:10:17 0.2659 0.22 0.25 0.27
NC 2012/04/10 00:58:29 0.2622 0.22 0.25 0.27
ND 2012/04/14 18:32:52 0.2600 0.22 0.25 0.27
NG 2012/04/21 14:47:47 0.2671 0.22 0.25 0.27
NH 2012/04/09 20:31:17 0.2648 0.22 0.25 0.27
NL 2012/04/24 07:28:17 0.2527 0.22 0.25 0.27
NP 2012/04/23 13:26:50 0.2640 0.22 0.25 0.27
NQ 2012/04/14 20:30:42 0.2590 0.22 0.25 0.27
NS 2012/05/02 09:09:52 0.2651 0.22 0.25 0.27
NU 2012/05/04 13:07:49 0.2688 0.22 0.25 0.27
NV 2012/05/19 23:07:08 0.2716 0.22 0.25 0.27
NX 2012/05/03 02:00:13 0.2670 0.22 0.25 0.27
NY 2012/05/04 12:56:52 0.2680 0.22 0.25 0.27
NZ 2012/05/06 10:05:38 0.2697 0.22 0.25 0.27
O0 2012/05/07 22:01:11 0.2675 0.22 0.25 0.27
O3 2012/06/21 18:09:47 0.2606 0.22 0.25 0.27
04 2012/06/21 18:47:36 0.2545 0.22 0.25 0.27
51 2012/07/24 21:13:08 0.2541 0.22 0.25 0.27
O5 2012/07/26 16:54:09 0.2575 0.22 0.25 0.27
O6 2012/07/20 02:42:29 0.2603 0.22 0.25 0.27
OD 2012/08/25 20:56:55 0.2559 0.22 0.25 0.27
OH 2012/08/28 10:30:11 0.2372 0.22 0.25 0.27
From the table above, the only two columns I care about are DATETIME and AVG. Once I have the new "myavg" variable for each month, I can use the same formula to calculate the CpK number. I am thinking the variable name could be something like '2012/08'. I think the code should go something like this (pseudocode):
for(each month mydf$DATETIME) (date like 2012/04*,2012/05*)
monthavg <-(mydf$AVG, na.rm=TRUE)
Is there a way to store the CpK number for each month in variables I can access?

aggregate(mydf$AVG, list(month=months(as.Date(mydf$DATETIME))), mean)
# month x
# 1 April 0.2618125
# 2 August 0.2465500
# 3 July 0.2573000
# 4 June 0.2575500
# 5 March 0.2720000
# 6 May 0.2682429
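The aggregate() call above returns the monthly means; the same grouping idea can be extended to the full CpK calculation. The following is a minimal sketch, not the original answer: the helper name cpk_one and the "%Y/%m" month key are assumptions introduced here, the sample rows are copied from the question, and sd() is used in place of the manual sigma steps (the two agree when AVG has no missing values).

```r
# Six rows copied from the question's sample; MIN and MAX are constant there.
mydf <- data.frame(
  DATETIME = c("2012/03/24 19:23:08", "2012/03/25 16:10:17", "2012/03/30 07:48:17",
               "2012/04/09 20:31:17", "2012/04/10 14:03:37", "2012/04/14 18:32:52"),
  AVG = c(0.2912, 0.2659, 0.2589, 0.2648, 0.2647, 0.2600),
  MIN = 0.22,
  MAX = 0.27,
  stringsAsFactors = FALSE
)

# Hypothetical helper: CpK for one block of rows, same formula as the question.
cpk_one <- function(d) {
  mu    <- mean(d$AVG, na.rm = TRUE)
  sigma <- sd(d$AVG, na.rm = TRUE)      # sample standard deviation (n - 1)
  usl   <- mean(d$MAX, na.rm = TRUE)
  lsl   <- mean(d$MIN, na.rm = TRUE)
  min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
}

# Year/month key such as "2012/04"; the time part of DATETIME is ignored.
mydf$month <- format(as.Date(mydf$DATETIME, format = "%Y/%m/%d"), "%Y/%m")

# Split the rows by month and apply the helper to each piece.
cpk_by_month <- sapply(split(mydf, mydf$month), cpk_one)

cpk_by_month              # named numeric vector, one CpK per month
cpk_by_month[["2012/04"]] # access a single month by its name
```

Keeping the results in one named vector, rather than in separately named variables, makes it easy to look up a month by its key or to loop over all months.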

Related

Column Mean for rows with unique values

How can I compute the mean R, R1, R2, R3 values from the rows sharing the same lon,lat field? I'm sure this question exists multiple times, but I could not easily find it.
lon lat length depth R R1 R2 R3
1 147.5348 -35.32395 13709 1 0.67 0.80 0.84 0.83
2 147.5348 -35.32395 13709 2 0.47 0.48 0.56 0.54
3 147.5348 -35.32395 13709 3 0.43 0.29 0.36 0.34
4 147.4290 -35.27202 12652 1 0.46 0.61 0.60 0.58
5 147.4290 -35.27202 12652 2 0.73 0.96 0.95 0.95
6 147.4290 -35.27202 12652 3 0.77 0.92 0.92 0.91

I'd recommend the split-apply-combine strategy: split by BOTH lon and lat, apply mean to each group, then recombine into a single data frame.
With dplyr, that looks like:
library(dplyr)
mydata %>%
  group_by(lon, lat) %>%
  summarize(
    mean_r  = mean(R),
    mean_r1 = mean(R1),
    mean_r2 = mean(R2),
    mean_r3 = mean(R3)
  )
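If adding a package is not an option, the same split-apply-combine step can be sketched in base R with aggregate(). This is an alternative to the dplyr answer above, not part of it; the data frame is rebuilt here from the six rows shown in the question.

```r
# The six rows from the question.
mydata <- data.frame(
  lon    = c(147.5348, 147.5348, 147.5348, 147.4290, 147.4290, 147.4290),
  lat    = c(-35.32395, -35.32395, -35.32395, -35.27202, -35.27202, -35.27202),
  length = c(13709, 13709, 13709, 12652, 12652, 12652),
  depth  = c(1, 2, 3, 1, 2, 3),
  R      = c(0.67, 0.47, 0.43, 0.46, 0.73, 0.77),
  R1     = c(0.80, 0.48, 0.29, 0.61, 0.96, 0.92),
  R2     = c(0.84, 0.56, 0.36, 0.60, 0.95, 0.92),
  R3     = c(0.83, 0.54, 0.34, 0.58, 0.95, 0.91)
)

# cbind() on the left-hand side averages several columns at once;
# the right-hand side lists the grouping columns.
res <- aggregate(cbind(R, R1, R2, R3) ~ lon + lat, data = mydata, FUN = mean)
res
```

The result has one row per unique lon/lat pair and one column per averaged variable.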

Logarithmic scale in x-axis

I have the following code:
S = [100 200 500 1000 10000];
H = [0.14 0.15 0.17 0.19 0.28;0.14 0.16 0.18 0.20 0.29;0.15 0.17 0.19 0.21 0.31;0.16 0.17 0.20 0.22 0.32;0.23 0.22 0.28 0.30 0.44;0.23 0.23 0.29 0.3 0.5;0.33 0.32 0.4 0.42 0.63;0.32 0.31 0.39 0.40 0.61;0.23 0.23 0.30 0.30 0.50];
for i = 1:9
hold on
plot(S, H(i,:));
legend('GHM01','GHM02','GHM03','GHM04','GHM05','GHM06','GHM07','GHM08','GHM09'); %legend not correctly
axis([100 10000 0.1 1])
end
set(gca,'xscale','log')
The x-axis looks like this:
Because the S-values are very far from each other, I used a logarithmic x-axis (and a linear y-axis).
I have 5 values on the axis (see S), and I only want those 5 values visible on the x-axis, with equidistant spacing between them. How do I do this? Or is there a better alternative for displaying my x-axis than a logarithmic scale?

If you want the X-axis ticks to be equally spaced although the values are not (neither on a linear nor on a log scale), then you are basically treating this axis as categorical, and each value should get a temporary ordinal position (say 1:5) that determines the distance between them.
Here is a quick implementation of your comment above:
S = {'100' '200' '500' '1000' '10000'};
H = [0.14 0.15 0.17 0.19 0.28;...
0.14 0.16 0.18 0.20 0.29;
0.15 0.17 0.19 0.21 0.31;
0.16 0.17 0.20 0.22 0.32;
0.23 0.22 0.28 0.30 0.44;
0.23 0.23 0.29 0.3 0.5;
0.33 0.32 0.4 0.42 0.63;
0.32 0.31 0.39 0.40 0.61;
0.23 0.23 0.30 0.30 0.50];
f = figure;
plot(1:length(S),H);
f.Children.XTick = 1:length(S);
f.Children.XTickLabel = S;
IMHO this is the most straightforward way to solve this problem ;)

Indexing certain values in a function

I have a data frame that looks like this:
df <-
ID TIME AMT k10 k12 k21
1.00 0.00 50.00 0.10 0.40 0.01
1.00 1.00 0.00 0.10 0.40 0.01
1.00 2.00 0.00 0.10 0.40 0.01
1.00 3.00 50.00 0.10 0.40 0.01
1.00 4.00 0.00 0.10 0.40 0.01
2.00 0.00 100.00 0.25 0.50 0.06
2.00 1.00 0.00 0.25 0.50 0.06
2.00 2.00 0.00 0.25 0.50 0.06
I am using the values of k10, k12, k21 to process certain calculations in the function below. Each of these values is specific to a subject ID and doesn't change with time. My question is: how can I write the function so that it uses the first value for each subject ID? As you may notice in the function below, this is what I am currently using:
k10 <- d$k10
k12 <- d$k12
k21 <- d$k21
Each of these gives a vector holding the same value at every time point, which is obviously unnecessary. I just need one value for each. I think that is one reason why I am getting warnings saying "number of items to replace is not a multiple of replacement length".
#This is the function that I am using:
TwoCompIVbolus <- function(d){
#set initial values in the compartments
d$A1[d$TIME==0] <- d$AMT[d$TIME==0] # drug amount in the central compartment at time zero.
d$A2[d$TIME==0] <- 0 # drug amount in the peripheral compartment at time zero.
k10 <- d$k10
k12 <- d$k12
k21 <- d$k21
k20 <- 0
E1 <- k10+k12
E2 <- k21+k20
#calculate hybrid rate constants
lambda1 <- 0.5*(k12+k21+k10+sqrt((k12+k21+k10)^2-4*k21*k10))
lambda2 <- 0.5*(k12+k21+k10-sqrt((k12+k21+k10)^2-4*k21*k10))
for(i in 2:nrow(d))
{
t <- d$TIME[i]-d$TIME[i-1]
A1last <- d$A1[i-1]
A2last <- d$A2[i-1]
A1term = (((A1last*E2+A2last*k21)-A1last*lambda1)*exp(-t*lambda1)-((A1last*E2+A2last*k21)-A1last*lambda2)*exp(-t*lambda2))/(lambda2-lambda1)
d$A1[i] = A1term + d$AMT[i] #Amount in the central compartment
A2term = (((A2last*E1+A1last*k12)-A2last*lambda1)*exp(-t*lambda1)-((A2last*E1+A1last*k12)-A2last*lambda2)*exp(-t*lambda2))/(lambda2-lambda1)
d$A2[i] = A2term #Amount in the peripheral compartment
}
d
}
#to apply it for each subject (ddply comes from the plyr package)
library(plyr)
simdf <- ddply(df, .(ID), TwoCompIVbolus)

You can just use k10 <- d$k10[1]
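To see why taking the first element helps, here is a small sketch built on the first subject from the question. It only illustrates the lengths involved; the name k10_vec is introduced here for comparison and is not part of the original function.

```r
# Subject 1 from the question: the rate constants repeat on every row.
d <- data.frame(ID = 1, TIME = 0:4, AMT = c(50, 0, 0, 50, 0),
                k10 = 0.10, k12 = 0.40, k21 = 0.01)

k10_vec <- d$k10      # what the function currently uses: one value per row

# Taking element [1] gives a single value per subject.
k10 <- d$k10[1]
k12 <- d$k12[1]
k21 <- d$k21[1]

# With scalar constants, the hybrid rate constants are scalars too, so
# assignments such as d$A1[i] <- A1term + d$AMT[i] replace one item with one.
lambda1 <- 0.5 * (k12 + k21 + k10 + sqrt((k12 + k21 + k10)^2 - 4 * k21 * k10))
lambda2 <- 0.5 * (k12 + k21 + k10 - sqrt((k12 + k21 + k10)^2 - 4 * k21 * k10))

length(k10_vec)   # one entry per row of d
length(lambda1)   # a single value
```

With the full vectors, lambda1 and lambda2 have one element per row, so A1term and A2term do as well; assigning such a vector into the single slot d$A1[i] is what triggers the "number of items to replace" warning.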

Error using corrplot

I need help with interpreting an error message using corrplot.
Here is my script
install.packages("ggplot2")
install.packages("corrplot")
install.packages("xlsx")
library(ggplot2)
library(corrplot)
library(xlsx)
#set working dir
setwd("C:/R")
#read xlsx data into R
df <- read.xlsx("TP_diff_frame.xlsx",1)
#set column as index
rownames(df) <- df$country
#remove column
df2<-subset(df, select = -c(country) )
#round values to two decimals
corrplot(df2, method="shade",shade.col=NA, tl.col="black", tl.srt=45)
My df2:
> df2
a b c d e f g
Sweden 0.09 0.19 0.00 -0.25 -0.04 0.01 0.00
Germany 0.11 0.19 0.01 -0.35 0.01 0.02 0.01
UnitedKingdom 0.14 0.21 0.03 -0.32 -0.05 0.00 0.00
RussianFederation 0.30 0.26 -0.07 -0.41 -0.09 0.00 0.00
Netherlands 0.09 0.16 -0.05 -0.26 0.02 0.02 0.01
Belgium 0.12 0.20 0.01 -0.34 0.01 0.00 0.00
Italy 0.14 0.22 0.01 -0.37 0.00 0.00 0.00
France 0.14 0.24 -0.04 -0.34 0.00 0.00 0.00
Finland 0.16 0.17 0.01 -0.26 -0.08 0.00 0.00
Norway 0.15 0.21 0.10 -0.37 -0.09 0.00 0.00
And the error message:
> corrplot(df2, method="shade",shade.col=NA, tl.col="black", tl.srt=45)
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
length of 'dimnames' [2] not equal to array extent

I think the problem is that you are plotting the data frame instead of the correlation matrix. Try to change the last line to this:
corrplot(cor(df2), method="shade",shade.col=NA, tl.col="black", tl.srt=45)
The function cor calculates the correlation matrix, which is what you need to plot.

In order to use the corrplot package for heatmap plots, you should convert your data.frame to a matrix and also set the is.corr argument to FALSE.
df2 <- as.matrix(df2)
corrplot(df2, is.corr=FALSE)
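The two suggestions above plot different things, and a short sketch may make the difference concrete. The data frame below is rebuilt from the question's printout; the corrplot calls are guarded so the block also runs where the package is not installed.

```r
# df2 as printed in the question: 10 countries (rows) by 7 variables (columns).
df2 <- data.frame(
  a = c(0.09, 0.11, 0.14, 0.30, 0.09, 0.12, 0.14, 0.14, 0.16, 0.15),
  b = c(0.19, 0.19, 0.21, 0.26, 0.16, 0.20, 0.22, 0.24, 0.17, 0.21),
  c = c(0.00, 0.01, 0.03, -0.07, -0.05, 0.01, 0.01, -0.04, 0.01, 0.10),
  d = c(-0.25, -0.35, -0.32, -0.41, -0.26, -0.34, -0.37, -0.34, -0.26, -0.37),
  e = c(-0.04, 0.01, -0.05, -0.09, 0.02, 0.01, 0.00, 0.00, -0.08, -0.09),
  f = c(0.01, 0.02, 0.00, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00),
  g = c(0.00, 0.01, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00),
  row.names = c("Sweden", "Germany", "UnitedKingdom", "RussianFederation",
                "Netherlands", "Belgium", "Italy", "France", "Finland", "Norway")
)

# Option 1: correlations between the columns a..g, a square 7 x 7 matrix.
M <- cor(df2)
dim(M)

# Option 2: the raw values as a 10 x 7 matrix, to be drawn as a heatmap.
raw <- as.matrix(df2)
dim(raw)

if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M, method = "shade", tl.col = "black", tl.srt = 45)
  corrplot::corrplot(raw, is.corr = FALSE)
}
```

Which option is appropriate depends on whether the goal is to show how the variables relate to each other (option 1) or to display the per-country values themselves (option 2).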

Another option is to break it up into two lines of code. Note that cor() needs numeric columns, so pass df2 (the data frame without the country column) rather than df.
M <- cor(df2, use = "na.or.complete")
corrplot(M, method="shade",shade.col=NA, tl.col="black", tl.srt=45)
I'd run a simple corrplot first (e.g. corrplot.mixed(M)) to make sure it works, then get into the fine tuning and aesthetics.

Multiple boxplots with predefined statistics using lattice-like graphs in r

I have a dataset which looks like this
VegType 87MIN 87MAX 87Q25 87Q50 87Q75 96MIN 96MAX 96Q25 96Q50 96Q75 00MIN 00MAX 00Q25 00Q50 00Q75
1 0.02 0.32 0.11 0.12 0.13 0.02 0.26 0.08 0.09 0.10 0.02 0.28 0.10 0.11 0.12
2 0.02 0.45 0.12 0.13 0.13 0.02 0.20 0.09 0.10 0.11 0.02 0.26 0.11 0.12 0.12
3 0.02 0.29 0.13 0.14 0.14 0.02 0.27 0.11 0.11 0.12 0.02 0.26 0.12 0.13 0.13
4 0.02 0.41 0.13 0.13 0.14 0.02 0.58 0.10 0.11 0.12 0.02 0.34 0.12 0.13 0.13
5 0.02 0.42 0.12 0.13 0.14 0.02 0.46 0.10 0.11 0.11 0.02 0.28 0.12 0.12 0.13
6 0.02 0.32 0.13 0.14 0.14 0.02 0.52 0.12 0.12 0.13 0.02 0.29 0.13 0.14 0.14
7 0.02 0.55 0.12 0.13 0.14 0.02 0.24 0.10 0.11 0.11 0.02 0.37 0.12 0.12 0.13
8 0.02 0.55 0.12 0.13 0.14 0.02 0.19 0.10 0.11 0.12 0.02 0.22 0.11 0.12 0.13
In reality I have 26 variables and 5 years (87, 96 and 00 in the column names are years). In an ideal world I would like to have a lattice-like graph with 26 plots, one per variable, with each plot containing 5 boxes, i.e. one per year. I understand that it is not possible to do this in lattice because lattice won't accept predefined statistics. Is there a fairly painless way to do this in R with predefined stats? I have used bxp for simple boxplots, plotting all the variables for one year in a single plot, e.g.
Yr01 = read.csv('dat.csv',header=T)
dat01=t(Yr01[,c("01Min","01Q25","01Mean","01Q75","01Max")])
bxp(list(stats=dat01, n=rep(26, ncol(dat01))),ylim=c(0.07,0.2))
but I don't know how to go from there to what I need.
Thanks.

This can be done, at least using ggplot2, but you'll have to reshape your data quite a bit. You also really need data where the quantiles actually make sense. Your quantile values are inconsistent: for example, Var1 has 01Max = 0.26 and 01Q75 = .67, so the third quartile exceeds the maximum.
First, I'll recreate a valid data set:
n <- c("01Min", "01Max", "01Med", "01Q25", "01Q75", "02Min",
"02Max", "02Med", "02Q25", "02Q75")
v1 <- c(0.03, 0.76, 0.41, 0.13, 0.67, 0.10, 0.43, 0.27, 0.2, 0.33)
v2 <- c(0.03, 0.28, 0.14, 0.08, 0.20, 0.02, 0.77, 0.13, 0.06, 0.44)
df <- data.frame(v1=v1, v2=v2)
df <- as.data.frame(t(df))
names(df) <- n
df <- cbind(var=c("v1","v2"), df)
> df
# var 01Min 01Max 01Med 01Q25 01Q75 02Min 02Max 02Med 02Q25 02Q75
# v1 v1 0.03 0.76 0.41 0.13 0.67 0.10 0.43 0.27 0.20 0.33
# v2 v2 0.03 0.28 0.14 0.08 0.20 0.02 0.77 0.13 0.06 0.44
Next, we'll reshape the data:
require(reshape2)
df.m <- melt(df, id="var")
# match the run of digits at the start of the string and capture it in the
# first group: () captures a pattern. Replace the whole string with the
# first captured group, referred to as "\\1"
df.m$year <- gsub("^([0-9]+)(.*)$", "\\1", df.m$variable)
# the same, but refer to the pattern captured by the second
# parenthesis using "\\2"
df.m$quan <- gsub("^([0-9]+)(.*)$", "\\2", df.m$variable)
df.f <- dcast(df.m, var+year ~ quan, value.var="value")
To get to this format:
> df.f
# var year Max Med Min Q25 Q75
# 1 v1 01 0.76 0.41 0.03 0.13 0.67
# 2 v1 02 0.43 0.27 0.10 0.20 0.33
# 3 v2 01 0.28 0.14 0.03 0.08 0.20
# 4 v2 02 0.77 0.13 0.02 0.06 0.44
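The splitting of the column names is the step most likely to need adjusting, so here is a tiny standalone sketch of it using base R only. The optional "X?" in the pattern is an addition made here so that the same expression handles names with and without the prefix that read.csv() puts in front of names starting with a digit.

```r
# Column names in both styles: as typed, and as read.csv() would rename them.
cols <- c("01Min", "01Q25", "02Max", "X87MIN", "X00Q75")

pattern <- "^X?([0-9]+)(.*)$"   # optional X, then digits (group 1), then the rest (group 2)

year <- sub(pattern, "\\1", cols)   # keep only the leading digits
quan <- sub(pattern, "\\2", cols)   # keep only what follows the digits

data.frame(cols, year, quan)
```

Checking the year and quan columns on a handful of names like this before reshaping the full data helps catch a pattern that does not match.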
Now, we can plot by directly providing the quantile values to corresponding parameters using the corresponding column names as follows:
require(ggplot2)
require(scales)
p <- ggplot(df.f, aes(x=var, ymin=`Min`, lower=`Q25`, middle=`Med`,
upper=`Q75`, ymax=`Max`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
p
# if you want faceting:
p + facet_wrap( ~ var, scales="free")
You can now accomplish your task of plotting all years for each var in a separate plot by using lapply with this code and subsetting as follows:
# unique() works whether var is a factor or a character column
lapply(unique(as.character(df.f$var)), function(x) {
  p <- ggplot(df.f[df.f$var == x, ],
              aes(x=var, ymin=`Min`, lower=`Q25`,
                  middle=`Med`, upper=`Q75`, ymax=`Max`))
  p <- p + geom_boxplot(aes(fill=year), stat="identity")
  # pass the plot explicitly instead of relying on last_plot()
  ggsave(paste0(x, ".pdf"), plot = p)
})
Edit: Your data is different from the earlier data you provided in some aspects. So, here's the version of the code for your new data:
# change var to VegType everywhere
require(reshape2)
df.m <- melt(df, id="VegType")
df.m$year <- gsub("^X([0-9]+)(.*)$", "\\1", df.m$variable) # pattern has an X
df.m$quan <- gsub("^X([0-9]+)(.*)$", "\\2", df.m$variable) # pattern has an X
df.f <- dcast(df.m, VegType+year ~ quan, value.var="value")
df.f$VegType <- factor(df.f$VegType) # convert integer to factor
require(ggplot2)
require(scales)
p <- ggplot(df.f, aes(x=VegType, ymin=`MIN`, lower=`Q25`, middle=`Q50`,
upper=`Q75`, ymax=`MAX`))
p <- p + geom_boxplot(aes(fill=year), stat="identity")
p
You can facet/write as separate plots using same code as before.
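For readers who would rather stay with bxp(), as in the question, a rough base-graphics sketch of the same idea follows. It is an alternative to the ggplot2 answer above, not a restatement of it: the helper name stats_for is made up here, only the first two VegType rows from the question are used, and n is set to 1 because the real group sizes are not given.

```r
# First two rows of the question's table; names carry the "X" prefix
# that read.csv() adds to column names starting with a digit.
dat <- data.frame(
  VegType = 1:2,
  X87MIN = c(0.02, 0.02), X87MAX = c(0.32, 0.45), X87Q25 = c(0.11, 0.12),
  X87Q50 = c(0.12, 0.13), X87Q75 = c(0.13, 0.13),
  X96MIN = c(0.02, 0.02), X96MAX = c(0.26, 0.20), X96Q25 = c(0.08, 0.09),
  X96Q50 = c(0.09, 0.10), X96Q75 = c(0.10, 0.11),
  X00MIN = c(0.02, 0.02), X00MAX = c(0.28, 0.26), X00Q25 = c(0.10, 0.11),
  X00Q50 = c(0.11, 0.12), X00Q75 = c(0.12, 0.12)
)

years <- c("87", "96", "00")
quans <- c("MIN", "Q25", "Q50", "Q75", "MAX")  # bottom-to-top order used by bxp()

# Hypothetical helper: 5 x length(years) matrix of box statistics for one row.
stats_for <- function(row) {
  sapply(years, function(y) unlist(row[paste0("X", y, quans)], use.names = FALSE))
}

# One panel per VegType, one box per year inside each panel.
op <- par(mfrow = c(1, nrow(dat)))
for (i in seq_len(nrow(dat))) {
  s <- stats_for(dat[i, ])
  bxp(list(stats = s, n = rep(1, ncol(s)), names = years),
      main = paste("VegType", dat$VegType[i]))
}
par(op)
```

For the full data set, par(mfrow = ...) would need a grid large enough for all 26 panels, and the whiskers here simply run from MIN to MAX as supplied.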
