I am trying to generate 0 and 1 for absence and presence. My data are line segments, and I have to plot 0 or 1 at an interval of 0.1: 1 for points that lie within a segment and 0 for points outside it.
V1 V2 V3 V4 V5 V6 V7
3 17 26.0 26.0 0 12-Jun-84 1 0
4 17 48.0 48.0 1 12-Jun-84 3 0
5 17 56.7 56.7 0 12-Jun-84 1 0
143 17 16.3 16.3 0 19-Jun-84 1 8
144 17 17.7 17.7 0 19-Jun-84 1 8
145 17 22.0 22.0 0 19-Jun-84 1 8
V2 and V3 are the start and end points, and V4 is the separation between them.
I have tried:
tran17 <- seq(0, 80, by=0.1)
tran17.date1 <- rep(0, length(tran17))
##
sub1 <-which(tran17 >= c$V2[i] & tran17 <= c$V3[i])
tran17.date1[sub1] <- 1
Thank you.
Ignoring your data example and focusing on your question, I think this solves the problem. Also, if V1 is a grouping factor, you can use tapply over PAmatrix.
# test data
set.seed(1104)
dat = data.frame(V1=17, V2=runif(200, 10, 60))
dat$V3 = dat$V2 + runif(200, 0, 20)
dat$V4 = dat$V3 - dat$V2
V1 V2 V3 V4
1 17 37.25826 45.54194 8.2836734
2 17 17.44098 22.86841 5.4274331
3 17 49.78488 55.51627 5.7313965
4 17 51.66640 52.54813 0.8817293
5 17 21.84276 39.38477 17.5420079
6 17 53.39457 54.51613 1.1215530
# functions to solve the problem
isInside = function(limits, tran) as.numeric(tran>=limits[1] & tran<=limits[2])
PAmatrix = function(data, tran) t(apply(data, 1, isInside, tran=tran))
# calculate the PA matrix
tran17 = seq(0, 80, by=0.1)
PA17 = PAmatrix(data=dat[,c("V2","V3")], tran=tran17)
# plot the results
image(seq(nrow(dat)), tran17, PA17, col=c("blue", "red"))
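If a single 0/1 vector for the whole transect is wanted rather than one row per segment, the same comparison can be collapsed across segments. The following is a minimal sketch, not part of the answer above: `presence_on_grid` and its `tol` argument are hypothetical names, and the small tolerance is an assumption added because grid values produced by `seq(..., by = 0.1)` are not exact decimals, so a zero-length segment could otherwise miss its grid point.

```r
# Hypothetical helper: 1 where a grid point falls inside any segment, else 0.
# `tol` is an assumed tolerance for floating-point grid values.
presence_on_grid <- function(start, end, grid, tol = 1e-9) {
  # rows = segments, columns = grid points
  inside <- outer(start - tol, grid, "<=") & outer(end + tol, grid, ">=")
  as.integer(colSums(inside) > 0)
}

# Toy segments (illustrative values, not the asker's data)
grid <- seq(0, 3, by = 0.1)
presence <- presence_on_grid(start = c(1.05, 2.5), end = c(1.35, 2.5), grid = grid)
which(presence == 1)
```

The second toy segment has zero length, like the rows in the question where the start and end points are equal.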
tran17 <- seq(0, 80, by=0.1)
tran17.date1 <- rep(0, length(tran17))
dm <- which(c$V5 == "31-Jul-84")
for(i in dm){
print(i)
sub1 <-which(tran17 >= c$V2[i] & tran17 <= c$V3[i])
tran17.date1[sub1] <- 1
}
plot(tran17, tran17.date1)
I have a dataset that has 3 different conditions. Data within condition 1 will need to be divided by 15, data within conditions 2 and 3 will need to be divided by 10. I tried to do for() in order to create separate datasets for each condition and then merge the two groups (group 1 is composed of condition 1, group 2 is composed of conditions 2 and 3). This is what I have so far for condition 1. Is there an easier way to do this that does not require creating subgroups?
Group1 <- NULL
for (val in ParticipantID) {
ParticipantID_subset_Group1 <- subset(PronounData, ParticipantID == val & Condition == "1")
I_Words_PPM <- (ParticipantID_subset_Group1$I_Words/"15")
YOU_Words_PPM <- (ParticipantID_subset_Group1$YOU_Words/"15")
WE_Words_PPM <- (ParticipantID_subset_Group1$WE_Words/"15")
df <- data.frame(val, Group, I_Words_PPM, YOU_Words_PPM, WE_Words_PPM)
Group1 <- rbind(Group1, df)
}
dim(Group1)
colnames(Group1) <- c("ParticipantID", "Condition", "I_Words_PPM", "YOU_Words_PPM", "WE_Words_PPM")
View(Group1)
Couldn't fully test this solution without example data, but this should do what you want:
# make some fake data
PronounData <- data.frame(
ParticipantID = 1:9,
Condition = rep(1:3, 3),
I_Words = sample(0:20, 9, replace = TRUE),
YOU_Words = sample(0:40, 9, replace = TRUE),
WE_Words = sample(0:10, 9, replace = TRUE)
)
# if Condition 1, divide by 15
PronounData[PronounData$Condition == 1, c("I_Words_PPM", "YOU_Words_PPM", "WE_Words_PPM")] <-
PronounData[PronounData$Condition == 1, c("I_Words", "YOU_Words", "WE_Words")] / 15
# if Condition 2 or 3, divide by 10
PronounData[PronounData$Condition %in% 2:3, c("I_Words_PPM", "YOU_Words_PPM", "WE_Words_PPM")] <-
PronounData[PronounData$Condition %in% 2:3, c("I_Words", "YOU_Words", "WE_Words")] / 10
# result
PronounData
# ParticipantID Condition I_Words YOU_Words WE_Words I_Words_PPM YOU_Words_PPM WE_Words_PPM
# 1 1 1 17 40 6 1.1333 2.6667 0.4000
# 2 2 2 14 1 6 1.4000 0.1000 0.6000
# 3 3 3 2 34 8 0.2000 3.4000 0.8000
# 4 4 1 0 33 1 0.0000 2.2000 0.0667
# 5 5 2 4 15 0 0.4000 1.5000 0.0000
# 6 6 3 1 7 6 0.1000 0.7000 0.6000
# 7 7 1 6 10 1 0.4000 0.6667 0.0667
# 8 8 2 1 33 9 0.1000 3.3000 0.9000
# 9 9 3 9 40 0 0.9000 4.0000 0.0000
NB, R is built on vectorized operations, so looping through each row is rarely the best solution. Instead, you generally want to find a way of modifying whole vectors/columns at once, or at least subsets of them. This will usually be faster and simpler.
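One way to apply that advice in a single step, sketched under the assumption that the divisor depends only on Condition (15 for condition 1, 10 otherwise): build a per-row divisor vector and divide all three columns at once. The toy values and the names `word_cols` and `divisor` are illustrative, not taken from the question.

```r
# Toy data with round numbers (illustrative only)
PronounData <- data.frame(
  ParticipantID = 1:6,
  Condition = rep(1:3, 2),
  I_Words   = c(15, 10, 20, 30, 5, 0),
  YOU_Words = c(30, 20, 10, 15, 0, 40),
  WE_Words  = c(0, 5, 10, 45, 10, 20)
)

word_cols <- c("I_Words", "YOU_Words", "WE_Words")

# One divisor per row: 15 for condition 1, 10 for conditions 2 and 3
divisor <- ifelse(PronounData$Condition == 1, 15, 10)

# Dividing a data frame by a vector of length nrow() divides every column row-wise
PronounData[paste0(word_cols, "_PPM")] <- PronounData[word_cols] / divisor
PronounData
```

This avoids repeating the row subset for each group; adding a fourth condition would only change how `divisor` is built.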
I have a dataframe, and I want to do some calculations depending on the previous rows (like dragging information down in Excel). My DF looks like this:
set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2
df
# DA HB D AD GM GL R RM
# 1 2 399 4 13 30 646 60 0
# 2 2 97 4 10 31 NA NA 0
# 3 1 102 5 5 31 NA NA 0
# 4 3 325 4 2 31 NA NA 0
# 5 2 78 3 14 30 NA NA 0
# 6 1 269 4 8 30 NA NA 0
I want to fill in the missing values in my GL, R and RM columns; the values depend on each other. For example:
attach(df)
#calc GL and R for the 2nd row
df$GL[2] <- GL[1]+HB[2]+RM[1]
df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]
#calc GL and R for the 3rd row
df$GL[3] <- df$GL[2]+HB[3]+df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]
#and so on..
Is there a way to do all the calculations at once, instead of row by row?
In addition, each time the column 'DA' = 1, the previous values of 'R' should be summed up in 'RM' for that row, but only from the last occurrence. So that
attach(df)
df$RM[3] <-R[1]+R[2]+R[3]
#and RM for the 6th row is calculated by
#df$RM[6] <-R[4]+R[5]+R[6]
Thanks a lot in advance!
You can use a for loop to calculate GL values and once you have them you can do the calculation for R columns directly.
for(i in 2:nrow(df)) {
df$GL[i] <- with(df, GL[i-1]+HB[i]+RM[i-1])
}
df$R <- with(df, (GL* D)/(GM *AD))
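Because GL uses the previous row's RM, and RM is itself built from R, the three columns may need to be filled in the same pass. The following is only a sketch under stated assumptions: R uses the formula exactly as written in the question (`GL * D / GM * AD`), RM stays 0 on rows where DA is not 1, and `since_reset` is a hypothetical helper variable holding the sum of R since the last DA == 1 row.

```r
# Toy data with round numbers (hypothetical, not the data from the question)
df <- data.frame(
  DA = c(2, 2, 1, 3, 2, 1),
  HB = 10, D = 1, AD = 1, GM = 1,
  GL = c(100, rep(NA, 5)),
  R  = c(10, rep(NA, 5)),
  RM = 0
)

since_reset <- df$R[1]  # running sum of R since the last DA == 1 row
for (i in 2:nrow(df)) {
  df$GL[i] <- df$GL[i - 1] + df$HB[i] + df$RM[i - 1]
  df$R[i]  <- df$GL[i] * df$D[i] / df$GM[i] * df$AD[i]
  since_reset <- since_reset + df$R[i]
  if (df$DA[i] == 1) {
    df$RM[i] <- since_reset  # sum of R back to the previous DA == 1 row
    since_reset <- 0
  }
}
df
```

With this toy input, the RM value written in row 3 feeds into GL in row 4, which a two-step approach (all GL first, then R) would not capture.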
You can use indexing to solve the first two problems:
> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
+ D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
+ GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> # View df
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60 0
2 2 97 4 10 31 NA NA 0
3 1 102 5 5 31 NA NA 0
4 3 325 4 2 31 NA NA 0
5 2 78 3 14 30 NA NA 0
6 1 269 4 8 30 NA NA 0
> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] + df$RM[-nrow(df)]))
> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean this for last *?
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]
The third problem can be solved with a loop.
> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
+ df$RM[i] <- df$R[i] + df$RM[i-1] * (df$DA[i] != 2)
+ }
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60.000000 60.000000
2 2 97 4 10 31 743 9.587097 9.587097
3 1 102 5 5 31 845 27.258065 36.845161
4 3 325 4 2 31 1170 75.483871 112.329032
5 2 78 3 14 30 1248 8.914286 8.914286
6 1 269 4 8 30 1517 25.283333 34.197619
Do these results look correct?
Update: Assuming RM should = R unless DA = 1, and in that case RM = sum of current row and previous R up to (not including) the above row with DA = 1, try the following loop.
df$RM[1] <- cs <- df$R[1]
for (i in 2:nrow(df)) {
df$RM[i] <- df$R[i] + cs * (df$DA[i] == 1)
cs <- (cs + df$R[i]) * (df$DA[i] != 1)  # reset the running sum after a DA == 1 row
}
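Once R is known for every row, the same reading (RM equals R unless DA is 1, in which case RM is the sum of R since the previous DA == 1 row) can also be written without a loop. This is a sketch with made-up numbers; `grp` and `running` are illustrative names.

```r
# Toy vectors (illustrative values only)
R  <- c(60, 10, 20, 30, 40, 50)
DA <- c(2, 2, 1, 3, 2, 1)

# A new group starts on the row after each DA == 1 row
grp <- cumsum(c(0, head(DA == 1, -1)))

# Cumulative sum of R within each group
running <- ave(R, grp, FUN = cumsum)

# Use the group total on DA == 1 rows, plain R elsewhere
RM <- ifelse(DA == 1, running, R)
RM
```

This only works when R does not depend on RM; if it does, the values have to be built row by row.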
I'm trying to come up with a function that does the following to a data.frame outputting a new data.frame with the same names:
1- Creates a seq(min(target), max(target), .1).
2- Takes the mean of all other variables.
For example, if q is our data.frame, and jen is the target in it, I want to reformat q such that jen's data becomes seq(min(jen), max(jen), .1), and both bob and joe just change to their mean values.
Is it possible to do this in R?
I tried something but it is far from being accurate.
q = data.frame(bob = 1:5 - 3, jen = c(1.7, 2.6, 2.5, 4.4, 3.8) - 3, joe = 5:9)
change <- function(dataframe = q, target = "jen"){
n <- names(dataframe)
dataframe[target] <- seq(from = min(target), max(target), .1)
}
A base R solution. My idea is to create the target column first in the function, and then use a for-loop to add the mean of other columns.
# Example data frame
q <- data.frame(bob = 1:5 - 3, jen = c(1.7, 2.6, 2.5, 4.4, 3.8) - 3, joe = 5:9)
# Create the function
change <- function(dat, target){
vec <- dat[, target]
target_new <- seq(min(vec), max(vec), by = 0.1)
dat2 <- data.frame(target_new)
names(dat2) <- target
for (i in names(dat)[!names(dat) %in% target]){
dat2[[i]] <- mean(dat[[i]])
}
dat2 <- dat2[, names(dat)]
return(dat2)
}
# Apply the function
change(q, "jen")
# bob jen joe
# 1 0 -1.3 7
# 2 0 -1.2 7
# 3 0 -1.1 7
# 4 0 -1.0 7
# 5 0 -0.9 7
# 6 0 -0.8 7
# 7 0 -0.7 7
# 8 0 -0.6 7
# 9 0 -0.5 7
# 10 0 -0.4 7
# 11 0 -0.3 7
# 12 0 -0.2 7
# 13 0 -0.1 7
# 14 0 0.0 7
# 15 0 0.1 7
# 16 0 0.2 7
# 17 0 0.3 7
# 18 0 0.4 7
# 19 0 0.5 7
# 20 0 0.6 7
# 21 0 0.7 7
# 22 0 0.8 7
# 23 0 0.9 7
# 24 0 1.0 7
# 25 0 1.1 7
# 26 0 1.2 7
# 27 0 1.3 7
# 28 0 1.4 7
Here is one option with base R
data.frame(Map(function(x, y) if(x=="mean") get(x)(y) else
get(x)(min(y), max(y), by = 0.1), setNames(c("mean", "seq", "mean"), names(q)), q))
Or with dplyr
library(dplyr)
library(tidyr)  # provides unnest() and complete()
q %>%
summarise(bob = mean(bob),
jen = list(seq(min(jen), max(jen), by = 0.1)),
joe = mean(joe)) %>%
unnest(cols = jen)
Or if there are many columns to get the mean and only a single column sequence, then instead of specifying one by one
q %>%
mutate_at(c(1,3), mean) %>%
group_by(bob, joe) %>%
summarise(jen = list(seq(min(jen), max(jen), by = 0.1))) %>%
unnest(cols = jen)
Or use complete (from tidyr)
q %>%
group_by(bob = mean(bob), joe = mean(joe)) %>%
complete(jen = seq(min(jen), max(jen), by = .1))
My solution uses colMeans function and repeats the result as many times as the sequence is long. Then I replace the target column with the sequence results.
q = data.frame(bob = 1:5 - 3, jen = c(1.7, 2.6, 2.5, 4.4, 3.8) - 3, joe = 5:9)
manip <- function(target, df){
t.column <- which(colnames(df) == target)
dfmeans <- colMeans(df)
minmax <- range(df[,t.column],na.rm = T)
t.seq <- seq(minmax[1],minmax[2],.1)
newdf <- matrix(dfmeans, ncol = length(dfmeans))[rep(1, length(t.seq)),]
newdf[,t.column] <- t.seq
colnames(newdf) <- colnames(df)
return(as.data.frame(newdf))
}
manip("jen",q)
I have vectors of different lengths. For instance:
df1
[1] 1 95 5 2 135 4 3 135 4 4 135 4 5 135 4 6 135 4
df2
[1] 1 70 3 2 110 4 3 112 4
I'm trying to write a script in R in order to have any vector enter the function or for loop and it returns a dataframe of three columns. So a separate dataframe for each input vector. Each vector is a multiple of three (hence, the three columns). I'm fairly new to R in terms of writing functions and can't seem to figure this out. Here was my attempt:
newdf = c()
ld <- length(df1)
ld_mult <- length(df1)/3
ld_seq <- seq(from=1,to=ld,by=3)
ld_seq2 <- ld_seq + 2
for (i in 1:ld_mult) {
newdf[i,] <- df1[ld_seq[i]:ld_seq2[i]]
}
the output I want for df1 would be:
1 95 5
2 135 4
3 135 4
4 135 4
5 135 4
6 135 4
Here's an example of how you could use matrix for that purpose:
x <- c(1, 95, 5, 2, 135, 4, 3, 135, 4)
as.data.frame(matrix(x, ncol = 3, byrow = TRUE))
# V1 V2 V3
#1 1 95 5
#2 2 135 4
#3 3 135 4
And
y <- c(1, 70, 3, 2, 110, 4, 3, 112, 4)
as.data.frame(matrix(y, ncol = 3, byrow = TRUE))
# V1 V2 V3
#1 1 70 3
#2 2 110 4
#3 3 112 4
Or if you want to make it a custom function:
newdf <- function(vec) {
as.data.frame(matrix(vec, ncol = 3, byrow = TRUE))
}
newdf(y)
#V1 V2 V3
#1 1 70 3
#2 2 110 4
#3 3 112 4
You could also let the user specify the number of columns to create by adding another argument to newdf:
newdf <- function(vec, cols = 3) {
as.data.frame(matrix(vec, ncol = cols, byrow = T))
}
Now the default number of columns is 3 if the user doesn't specify a number. To choose a different number, call it like this:
newdf(z, 5) # to create 5 columns (z stands for any vector whose length is a multiple of 5)
Another nice little addon for the function would be a check if the input vector length is a multiple of the number of columns specified in the function call:
newdf <- function(vec, cols = 3) {
if(length(vec) %% cols != 0) {
stop("Input vector length is not a multiple of the number of columns. Please double check.")
}
as.data.frame(matrix(vec, ncol = cols, byrow = TRUE))
}
newdf(x, 4)
#Error in newdf(x, 4) :
# Input vector length is not a multiple of the number of columns. Please double check.
If you had multiple vectors sitting in a list, here's how you could convert each of them to be a data.frame:
> l <- list(x,y)
> l
#[[1]]
#[1] 1 95 5 2 135 4 3 135 4
#
#[[2]]
#[1] 1 70 3 2 110 4 3 112 4
> lapply(l, newdf)
#[[1]]
# V1 V2 V3
#1 1 95 5
#2 2 135 4
#3 3 135 4
#
#[[2]]
# V1 V2 V3
#1 1 70 3
#2 2 110 4
#3 3 112 4
What function can I use to emulate ggplot2's default color palette for a desired number of colors? For example, an input of 3 would produce a character vector of the three HEX colors that ggplot2 uses by default for three groups.
It is just equally spaced hues around the color wheel, starting from 15:
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
For example:
n = 4
cols = gg_color_hue(n)
dev.new(width = 4, height = 4)
plot(1:n, pch = 16, cex = 2, col = cols)
This is the result from
library(scales)
show_col(hue_pal()(4))
show_col(hue_pal()(3))
These answers are all very good, but I wanted to share another thing I discovered on Stack Overflow that is really quite useful.
Basically, @DidzisElferts shows how you can get all the colours, coordinates, etc. that ggplot uses to build a plot you created. Very nice!
library(ggplot2)
p <- ggplot(mpg, aes(x = class, fill = class)) + geom_bar()
ggplot_build(p)$data
[[1]]
fill y count x ndensity ncount density PANEL group ymin ymax xmin xmax
1 #F8766D 5 5 1 1 1 1.111111 1 1 0 5 0.55 1.45
2 #C49A00 47 47 2 1 1 1.111111 1 2 0 47 1.55 2.45
3 #53B400 41 41 3 1 1 1.111111 1 3 0 41 2.55 3.45
4 #00C094 11 11 4 1 1 1.111111 1 4 0 11 3.55 4.45
5 #00B6EB 33 33 5 1 1 1.111111 1 5 0 33 4.55 5.45
6 #A58AFF 35 35 6 1 1 1.111111 1 6 0 35 5.55 6.45
7 #FB61D7 62 62 7 1 1 1.111111 1 7 0 62 6.55 7.45
From page 106 of the ggplot2 book by Hadley Wickham:
The default colour scheme, scale_colour_hue picks evenly spaced hues
around the hcl colour wheel.
With a bit of reverse engineering you can construct this function:
ggplotColours <- function(n = 6, h = c(0, 360) + 15){
if ((diff(h) %% 360) < 1) h[2] <- h[2] - 360/n
hcl(h = (seq(h[1], h[2], length = n)), c = 100, l = 65)
}
Demonstrating this in barplot:
y <- 1:3
barplot(y, col = ggplotColours(n = 3))
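To make the "evenly spaced hues" idea concrete, the hue angles can be computed on their own before being passed to hcl(). This is a small sketch: `hue_angles` is a hypothetical helper name, and the start angle of 15, chroma of 100 and luminance of 65 are the values used in the functions above.

```r
# Hypothetical helper: n hue angles evenly spaced around the colour wheel.
# The endpoint start + 360 is dropped because it is the same hue as start.
hue_angles <- function(n, start = 15) {
  seq(start, start + 360, length.out = n + 1)[seq_len(n)]
}

angles <- hue_angles(3)  # 15, 135, 255: steps of 360 / 3 degrees
cols <- hcl(h = angles, c = 100, l = 65)
cols
```

Asking for n + 1 points and dropping the last one is what keeps the first and last colours from coinciding.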
To get the hex values instead of the plot you can use:
hue_pal()(3)
Instead of this code:
show_col(hue_pal()(3))