Rescale positive and negative numbers into [0, 1] and [-1, 0]

Rescale positive and negative numbers into [0, 1] and [-1, 0] - r

I have a data frame like this:
Input_df <- data.frame(Enl_ID = c("INTS121410", "INTS175899", "INTS171428", "INTS156006", "INTS196136", "INTS114771" ), `CN4244` = c(5, 0, -0.4, -0.6, 10, 2), `CN4249` = c(10, -4, -10, -2, 6, 0), `CN4250` = c(40, 10, 4, -10, 0, 4))
I'm trying to rescale the positive values between 0-1 and negative values between 0 to -1 so the output would be like
Output_df <- data.frame(Enl_ID = c("INTS121410", "INTS175899", "INTS171428", "INTS156006", "INTS196136", "INTS114771" ), `CN4244` = c(0.5, 0, -0.66, -1, 1, 0.2), `CN4249` = c(1, -0.4, -1, -0.2, 0.6, 0), `CN4250` = c(1, 0.25, 0.1, -1, 0, 0.1))
I found few examples like at stackoverflow but this is only for single-column and my file run into almost 2000 column so it is not possible to do it manually on every column.
Any idea how to do it?
Any help would be appreciated. Thanks in advance

You could use
library(dplyr)
Input_df %>%
mutate(across(starts_with("CN"), ~.x / max(abs(.x))))
This returns
Enl_ID CN4244 CN4249 CN4250
1 INTS121410 0.50 1.0 1.00
2 INTS175899 0.00 -0.4 0.25
3 INTS171428 -0.04 -1.0 0.10
4 INTS156006 -0.06 -0.2 -0.25
5 INTS196136 1.00 0.6 0.00
6 INTS114771 0.20 0.0 0.10
Or, if you want different rescaling factors for positive and negative values:
Input_df %>%
mutate(across(starts_with("CN"),
~case_when(.x >= 0 ~ .x / max(.x),
TRUE ~ - .x / min(.x))))
This returns
Enl_ID CN4244 CN4249 CN4250
1 INTS121410 0.5000000 1.0 1.00
2 INTS175899 0.0000000 -0.4 0.25
3 INTS171428 -0.6666667 -1.0 0.10
4 INTS156006 -1.0000000 -0.2 -1.00
5 INTS196136 1.0000000 0.6 0.00
6 INTS114771 0.2000000 0.0 0.10

Related

Mutating new columns based on common string using existing columns

Sample data:
X_5 X_1 Y alpha_5 alpha_1 beta_5 beta_1
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.21 0.02 0.61 10 5 3 0.01
2 0.01 0.02 0.37 0.4 0.01 0.8 0.5
3 0.02 0.03 0.55 0.01 0.01 0.3 0.99
4 0.04 0.05 0.29 0.01 0.005 0.03 0.55
5 0.11 0.1 -0.08 0.22 0.015 0.01 0.01
6 0.22 0.21 -0.08 0.02 0.03 0.01 0.01
I have a dataset which has columns of some variable of interest, say alpha, beta, and so on. I also have this saved as a character vector. I want to be able to mutate new columns based on these variable names, suffixed with an identifier, using the existing columns in the dataset as part of some transformation, like this:
df %>% mutate(
alpha_new = ((alpha_5-alpha_1) / (X_5-X_1) * Y),
beta_new = ((beta_5-beta_1) / (X_5-X_1) * Y)
)
X_5 X_1 Y alpha_5 alpha_1 beta_5 beta_1 alpha_new beta_new
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.21 0.02 0.61 10 5 3 0.01 16.1 9.60
2 0.01 0.02 0.37 0.4 0.01 0.8 0.5 -14.4 -11.1
3 0.02 0.03 0.55 0.01 0.01 0.3 0.99 0 38.0
4 0.04 0.05 0.29 0.01 0.005 0.03 0.55 -0.145 15.1
5 0.11 0.1 -0.08 0.22 0.015 0.01 0.01 -1.64 0
6 0.22 0.21 -0.08 0.02 0.03 0.01 0.01 0.0800 0
In my real data I have many more columns like this and I'm struggling to implement this in a "tidy" way which isn't hardcoded, what's the best practice for my situation?
Sample code:
structure(
list(
X_5 = c(0.21, 0.01, 0.02, 0.04, 0.11, 0.22),
X_1 = c(0.02,
0.02, 0.03, 0.05, 0.10, 0.21),
Y = c(0.61, 0.37, 0.55, 0.29, -0.08, -0.08),
alpha_5 = c(10, 0.4, 0.01, 0.01, 0.22, 0.02),
alpha_1 = c(5, 0.01, 0.01, 0.005, 0.015, 0.03),
beta_5 = c(3, 0.8, 0.3, 0.03, 0.01, 0.01),
beta_1 = c(0.01, 0.5, 0.99, 0.55, 0.01, 0.01)
),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame")
) -> df
variable_of_interest <- c("alpha", "beta")

Here's another way to approach this with dynamic creation of columns. With map_dfc from purrr you can column-bind new results, creating new column names with bang-bang on left hand side of := operator, and using .data to access column values on right hand side.
library(tidyverse)
bind_cols(
df,
map_dfc(
variable_of_interest,
~ transmute(df, !!paste0(.x, '_new') :=
(.data[[paste0(.x, '_5')]] - .data[[paste0(.x, '_1')]]) /
(X_5 - X_1) * Y)
)
)
Output
X_5 X_1 Y alpha_5 alpha_1 beta_5 beta_1 alpha_new beta_new
1 0.21 0.02 0.61 10.00 5.000 3.00 0.01 16.05263 9.599474
2 0.01 0.02 0.37 0.40 0.010 0.80 0.50 -14.43000 -11.100000
3 0.02 0.03 0.55 0.01 0.010 0.30 0.99 0.00000 37.950000
4 0.04 0.05 0.29 0.01 0.005 0.03 0.55 -0.14500 15.080000
5 0.11 0.10 -0.08 0.22 0.015 0.01 0.01 -1.64000 0.000000
6 0.22 0.21 -0.08 0.02 0.030 0.01 0.01 0.08000 0.000000

Better to pivot the data first
library(dplyr)
library(tidyr)
# your data
df <- structure(list(X_5 = c(0.21, 0.01, 0.02, 0.04, 0.11, 0.22), X_1 = c(0.02,
0.02, 0.03, 0.05, 0.1, 0.21), Y = c(0.61, 0.37, 0.55, 0.29, -0.08,
-0.08), alpha_5 = c(10, 0.4, 0.01, 0.01, 0.22, 0.02), alpha_1 = c(5,
0.01, 0.01, 0.005, 0.015, 0.03), beta_5 = c(3, 0.8, 0.3, 0.03,
0.01, 0.01), beta_1 = c(0.01, 0.5, 0.99, 0.55, 0.01, 0.01)), class = "data.frame", row.names = c(NA,
-6L))
df <- df |> mutate(id = 1:n()) |>
pivot_longer(cols = -c(id, Y, X_5, X_1),
names_to = c("name", ".value"), names_sep="_") |>
mutate(new= (`5` - `1`) / (X_5 - X_1) * Y) |>
pivot_wider(id_cols = id, names_from = "name", values_from = c(`5`,`1`, `new`),
names_glue = "{name}_{.value}", values_fn = sum)
df
#> # A tibble: 6 × 7
#> id alpha_5 beta_5 alpha_1 beta_1 alpha_new beta_new
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 10 3 5 0.01 16.1 9.60
#> 2 2 0.4 0.8 0.01 0.5 -14.4 -11.1
#> 3 3 0.01 0.3 0.01 0.99 0 38.0
#> 4 4 0.01 0.03 0.005 0.55 -0.145 15.1
#> 5 5 0.22 0.01 0.015 0.01 -1.64 0
#> 6 6 0.02 0.01 0.03 0.01 0.0800 0
Created on 2023-02-16 with reprex v2.0.2
Note: if you want to add X_5 and X_1 in the output use id_cols = c(id, X_5, X_1) instead.

I modified your data to create a bit more complicated situation. My hope is that this is close to your real situation. The condition in this idea is that two columns that you wanna pair up stay next to each other. The first job is to collect column names that begin with small letters. Next job is to create a data frame. Here I keep the column names in odd positions
in target in the first column, and ones in even positions in the second column. I was thinking in the same line of Ben; I used map2_dfc to create an output data frame. In this function, I replaced all small letters with X so that I could specify two column names in the original data (i.e., ones starting with X). Then, I did the calculation as you specified. Finally, I created a column name for outcome in the loop. If you want to add the result to the original data, you can run the final line with cbind.
grep(x = names(df), pattern = "[[:lower:]]+_[0-9]+", value = TRUE) -> target
tibble(first_element = target[c(TRUE, FALSE)],
second_element = target[c(FALSE, TRUE)]) -> mydf
map2_dfc(.x = mydf$first_element,
.y = mydf$second_element,
.f = function(x, y) {
sub(x = x, pattern = "[[:lower:]]+", replacement = "X") -> foo1
sub(x = y, pattern = "[[:lower:]]+", replacement = "X") -> foo2
outcome <- ((df[x] - df[y]) / (df[foo1] - df[foo2]) * df["Y"])
names(outcome) <- paste(x,
sub(x = y, pattern = "[[:lower:]]+", replacement = ""),
sep = "")
return(outcome)
}) -> result
cbind(df, result)
# alpha_5_1 alpha_2_6 beta_5_1 beta_3_4
#1 16.05263 0.10736 9.599474 0.27145
#2 -14.43000 0.10730 -11.100000 0.28564
#3 0.00000 0.28710 37.950000 0.50820
#4 -0.14500 0.21576 15.080000 0.64206
#5 -1.64000 -0.06416 0.000000 -0.61352
#6 0.08000 -0.08480 0.000000 -0.25400
DATA
structure(list(
X_5 = c(0.21, 0.01, 0.02, 0.04, 0.11, 0.22),
X_1 = c(0.02,0.02, 0.03, 0.05, 0.10, 0.21),
X_2 = 1:6,
X_6 = 6:11,
X_3 = 21:26,
X_4 = 31:36,
Y = c(0.61, 0.37, 0.55, 0.29, -0.08, -0.08),
alpha_5 = c(10, 0.4, 0.01, 0.01, 0.22, 0.02),
alpha_1 = c(5, 0.01, 0.01, 0.005, 0.015, 0.03),
alpha_2 = c(0.12, 0.55, 0.39, 0.28, 0.99, 0.7),
alpha_6 = 1:6,
beta_5 = c(3, 0.8, 0.3, 0.03, 0.01, 0.01),
beta_1 = c(0.01, 0.5, 0.99, 0.55, 0.01, 0.01),
beta_3 = c(0.55, 0.28, 0.76, 0.86, 0.31, 0.25),
beta_4 = c(5, 8, 10, 23, 77, 32)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame")) -> df

Replacing zeroes with NA for values preceding non-zero

I'm new to R and have been struggling with the following for a while now so I was hoping someone would be able to help me out.
The sample data represents stock price returns (each row is a monthly period). The real data set is much bigger and is structured like the input below:
Input:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)
stock1 stock2 stock3 stock4
[1,] 0.01 0.00 0.00 0.00
[2,] -0.02 0.00 0.00 -0.02
[3,] 0.01 0.02 0.02 0.01
[4,] 0.05 0.04 0.00 0.00
[5,] 0.04 -0.03 -0.01 0.00
[6,] -0.02 0.02 0.03 -0.02
Any zeroes that precedes a non-zero for a given stock represents missing data as opposed to a return of zero for the period. I would like to set these values as NA so the output I would like to achieve is the following:
Desired Output:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(NA, NA, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(NA, NA, 0.02, 0, -0.01, 0.03)
stock4 <- c(NA, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)
stock1 stock2 stock3 stock4
[1,] 0.01 NA NA NA
[2,] -0.02 NA NA -0.02
[3,] 0.01 0.02 0.02 0.01
[4,] 0.05 0.04 0.00 0.00
[5,] 0.04 -0.03 -0.01 0.00
[6,] -0.02 0.02 0.03 -0.02
I've tried a few things but they only seem to work for a single vector as opposed to a data set with multiple columns. I've tried using lapply to get around this but haven't had any luck so far. The closest I've gotten is shown below.
My single vector solution:
stock1[1:min(which(stock1!=0))-1 <- NA
My multiple vector solution which does not work:
lapply(df,function(x) x[1:min(which(x!=0))-1 <- NA]
Would greatly appreciate any guidance! Thanks!

There are three issues. First, writing:
df <- cbind(stock1,stock2,stock3,stock4)
doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:
df <- data.frame(stock1,stock2,stock3,stock4)
Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment will return a single NA, and the lapply will return a data frame of one row of NAs instead of the data frame you want).
Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
Therefore, the following will give you what you want:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)
as.data.frame(lapply(df, function(x) {
n <- min(which(x != 0)) - 1
if (n > 0)
x[1:n] <- NA
x
}))
The output is as expected:
stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02
Update: As #Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:
as.data.frame(lapply(df, function(x) {
n <- min(which(x != 0)) - 1
x[0:n] <- NA # use 0:n instead of 1:n
x
}))
This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:
x[0:0] <- NA # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.

This might be not the most elegant way, but I think it works
changeValues <- function(x){
place <- min(which(diff(c(0,cumsum(x==0)))==0))-1;
x[0:place] <- NA
x
}
apply(df,2,changeValues)
EDIT: Some brief explanation to the function: First I create a vector that increases at each position where is a zero in your column, then I check at which position this vector does not increase (=that means, there are not two zeros next to each other) and then I still take the minimum of that and make sure that these are only leading zeros (so that not values from within the matrix are changed)

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4) #the following function only works if df is actually a data.frame
df[] <- lapply(df, function(x) {ifelse(cumsum(x) == 0 & x == 0, NA, x)})
df
stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02
Some explanation: first check for each cell whether the cumulative colSum ánd the current cell are equal to 0. If so, return NA, else the original value. The brackets behind df make sure the lapply function returns a dataframe again that is assigned to df.
Also, if you don't really need df to be a dataframe, this works as well:
df <- cbind(stock1,stock2,stock3,stock4)
apply(df, 2, function(x) {ifelse(cumsum(x) == 0 & x == 0, NA, x)})

transform data.frame matrix-column into columns

When using aggregate with compound function, the resulting data.frame has matrices inside columns.
ta=aggregate(cbind(precision,result,prPo)~rstx+qx+laplace,t0
,function(x) c(x=mean(x),m=min(x),M=max(x)))
ta=head(ta)
dput(ta)
structure(list(rstx = c(3, 3, 2, 3, 2, 3), qx = c(0.2, 0.25,
0.3, 0.3, 0.33, 0.33), laplace = c(0, 0, 0, 0, 0, 0), precision = structure(c(0.174583333333333,
0.186833333333333, 0.3035, 0.19175, 0.30675, 0.193666666666667,
0.106, 0.117, 0.213, 0.101, 0.22, 0.109, 0.212, 0.235, 0.339,
0.232, 0.344, 0.232), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), result = structure(c(-142.333333333333,
-108.316666666667, -69.1, -85.7, -59.1666666666667, -68.5666666666667,
-268.8, -198.2, -164, -151.6, -138.2, -144.8, -30.8, -12.2, -14.2,
-3.8, -12.6, -3.4), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), prPo = structure(c(3.68416666666667,
3.045, 2.235, 2.53916666666667, 2.0775, 2.23666666666667, 1.6,
1, 1.02, 0.54, 0.87, 0.31, 5.04, 4.02, 2.77, 3.53, 2.63, 3.25
), .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "m", "M")))), .Names = c("rstx",
"qx", "laplace", "precision", "result", "prPo"), row.names = c(NA,
6L), class = "data.frame")
Is there a function that transform data.frame matrix-colum into columns?
Manually, for each matrix-column, column bind plus column delete works:
colnames(ta)
[1] "rstx" "qx" "laplace" "precision" "result" "prPo"
ta[,"precision"] # ta[,4]
x m M
[1,] 0.1745833 0.106 0.212
[2,] 0.1868333 0.117 0.235
[3,] 0.3035000 0.213 0.339
[4,] 0.1917500 0.101 0.232
[5,] 0.3067500 0.220 0.344
[6,] 0.1936667 0.109 0.232
#column bind + column delete
ta=cbind(ta,precision=ta[,4])
ta=ta[,-4]
colnames(ta)
[1] "rstx" "qx" "laplace" "result" "prPo" "precision.x" "precision.m"
[8] "precision.M"
ta
rstx qx laplace result.x result.m result.M prPo.x prPo.m prPo.M precision.x precision.m
1 3 0.20 0 -142.33333 -268.80000 -30.80000 3.684167 1.600000 5.040000 0.1745833 0.106
2 3 0.25 0 -108.31667 -198.20000 -12.20000 3.045000 1.000000 4.020000 0.1868333 0.117
3 2 0.30 0 -69.10000 -164.00000 -14.20000 2.235000 1.020000 2.770000 0.3035000 0.213
4 3 0.30 0 -85.70000 -151.60000 -3.80000 2.539167 0.540000 3.530000 0.1917500 0.101
5 2 0.33 0 -59.16667 -138.20000 -12.60000 2.077500 0.870000 2.630000 0.3067500 0.220
6 3 0.33 0 -68.56667 -144.80000 -3.40000 2.236667 0.310000 3.250000 0.1936667 0.109
precision.M
1 0.212
2 0.235
3 0.339
4 0.232
5 0.344
6 0.232

matrix doesn't support matrix-column. So as.matrix() transform data.frame into matrix, breaking up matrix-column.
Here is my idea:
library(tidyverse)
ta2 <- ta %>%
as.matrix() %>%
as.data.frame()

Somewhere in Stackoverflow I found a very simple solution:
cbind(ta[-ncol(ta)],ta[[ncol(ta)]])
rstx qx laplace precision.x precision.m precision.M result.x result.m result.M x m
1 3 0.20 0 0.1745833 0.1060000 0.2120000 -142.33333 -268.80000 -30.80000 3.684167 1.60
2 3 0.25 0 0.1868333 0.1170000 0.2350000 -108.31667 -198.20000 -12.20000 3.045000 1.00
3 2 0.30 0 0.3035000 0.2130000 0.3390000 -69.10000 -164.00000 -14.20000 2.235000 1.02
4 3 0.30 0 0.1917500 0.1010000 0.2320000 -85.70000 -151.60000 -3.80000 2.539167 0.54
5 2 0.33 0 0.3067500 0.2200000 0.3440000 -59.16667 -138.20000 -12.60000 2.077500 0.87
6 3 0.33 0 0.1936667 0.1090000 0.2320000 -68.56667 -144.80000 -3.40000 2.236667 0.31
M
1 5.04
2 4.02
3 2.77
4 3.53
5 2.63
6 3.25
Just that!

length of 'dimnames' [2] not equal to array extent when using corrplot function from a matrix read from a csv file

I wanna read the data from a csv file, save it as a matrix and use it for visualization.
data<-read.table("Desktop/Decision_Tree/cor_test_.csv",header = F,sep = ",")
data
V1 V2 V3 V4 V5 V6
1 1.00 0.00 0.00 0.00 0.00 0
2 0.11 1.00 0.00 0.00 0.00 0
3 0.12 0.03 1.00 0.00 0.00 0
4 -0.04 0.54 0.32 1.00 0.00 0
5 -0.12 0.57 -0.09 0.26 1.00 0
6 0.21 -0.04 0.24 0.18 -0.21 1
It goes well. But then:
corrplot(data, method = 'color', addCoef.col="grey")
It is said that:
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
length of 'dimnames' [2] not equal to array extent
I don't know how to solve it.

corrplot requires a matrix, I assume your data is a data frame. Use as.matrix(data) instead.
Example:
## Your data as data frame:
data <- structure(list(V1 = c(1, 0.11, 0.12, -0.04, -0.12, 0.21), V2 = c(0,
1, 0.03, 0.54, 0.57, -0.04), V3 = c(0, 0, 1, 0.32, -0.09, 0.24
), V4 = c(0, 0, 0, 1, 0.26, 0.18), V5 = c(0, 0, 0, 0, 1, -0.21
), V6 = c(0, 0, 0, 0, 0, 1)), .Names = c("V1", "V2", "V3", "V4",
"V5", "V6"), row.names = c(NA, -6L), class = "data.frame")
## Using the data frame results in an error:
corrplot::corrplot(data, method = 'color', addCoef.col = "grey")
# Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr, :
# length of 'dimnames' [2] not equal to array extent
## Using the matrix works:
corrplot::corrplot(as.matrix(data), method = 'color', addCoef.col = "grey")

loops in R, finding the mean for one column depends on another column

So my test data looks like this:
structure(list(day = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L
), Left = c(0.25, 0.33, 0, 0, 0.25, 0.33, 0.5, 0.33, 0.5, 0),
Left1 = c(NA, NA, 0, 0.5, 0.25, 0.33, 0.1, 0.33, 0.5, 0),
Middle = c(0, 0, 0.3, 0, 0.25, 0, 0.3, 0.33, 0, 0), Right = c(0.25,
0.33, 0.3, 0.5, 0.25, 0.33, 0.1, 0, 0, 0.25), Right1 = c(0.5,
0.33, 0.3, 0, 0, 0, 0, 0, 0, 0.75), Side = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("L", "R"), class = "factor")), .Names = c("day",
"Left", "Left1", "Middle", "Right", "Right1", "Side"), class = "data.frame", row.names = c(NA,
-10L))
or this:
day Left Left1 Middle Right Right1 Side
1 0.25 NA 0.00 0.25 0.50 R
1 0.33 NA 0.00 0.33 0.33 R
2 0.00 0.00 0.30 0.30 0.30 R
2 0.00 0.50 0.00 0.50 0.00 R
2 0.25 0.25 0.25 0.25 0.00 L
3 0.33 0.33 0.00 0.33 0.00 L
I would like to write a loop to find the standard error and average value for each day on the chosen side..
Ok.. So far I have this code:
td<-read.csv('test data.csv')
IDs<-unique(td$day)
se<-function(x) sqrt(var(x)/length(x))
for (i in 1:length (IDs)) {
day.i<-which(td$day==IDs[i])
td.i<-td[day.i,]
if(td$Side=='L'){
side<-cbind(td.i$Left + td.i$Left1)
}else{
side<-cbind(td.i$Right + td.i$Right1)
}
mean(side)
se(side)
print(mean)
print(se)
}
But I am getting error messages like this
Error: unexpected '}' in "}"
Obviously, I am also not getting the print out of means for each day.. Does anyone know why?
also working on things here: http://www.talkstats.com/showthread.php/27187-Writing-a-mean-loop..-(literally)

Convert your data into a list and work with that instead:
First, split up your data into a list according to Side, subsetting the relevant columns along the way.
td = split(td, td$Side)
NAMES = names(td)
td = lapply(1:length(td),
function(x) td[[x]][c(1, grep(NAMES[x],
names(td[[x]])))])
names(td) = NAMES
td
# $L
# day Left Left1
# 5 2 0.25 0.25
# 6 3 0.33 0.33
# 7 3 0.50 0.10
# 8 4 0.33 0.33
# 9 4 0.50 0.50
#
# $R
# day Right Right1
# 1 1 0.25 0.50
# 2 1 0.33 0.33
# 3 2 0.30 0.30
# 4 2 0.50 0.00
# 10 4 0.25 0.75
Then, use lapply and aggregate to apply whatever functions you want to your data.
lapply(1:length(td),
function(x) aggregate(list(td[[x]][-1]),
list(day = td[[x]]$day), mean))
# [[1]]
# day Left Left1
# 1 2 0.250 0.250
# 2 3 0.415 0.215
# 3 4 0.415 0.415
#
# [[2]]
# day Right Right1
# 1 1 0.29 0.415
# 2 2 0.40 0.150
# 3 4 0.25 0.750

Still not entirely sure if I understand (that is if you want mean and SE for both Left and Left 1 or some sort of combination like sum). This is how I interpreted your question:
FUN <- function(dat, side = "L") {
DF <- split(dat, dat$Side)[[side]]
ind <- if(side=="L") 2:3 else 5:6
stderr <- function(x) sqrt(var(x)/length(x))
meanNse <- function(x) c(mean=mean(x), se=stderr(x))
OUT <- aggregate(DF[, ind], list(DF[, 1]), meanNse)
names(OUT)[1] <- "day"
return(OUT)
}
#test it
FUN(td)
FUN(td, "R")
Which yields:
> FUN(td)
day Left.mean Left.se Left1.mean Left1.se
1 2 0.250 NA 0.250 NA
2 3 0.415 0.085 0.215 0.115
3 4 0.415 0.085 0.415 0.085
> FUN(td, "R")
day Right.mean Right.se Right1.mean Right1.se
1 1 0.29 0.04 0.415 0.085
2 2 0.40 0.10 0.150 0.150
3 4 0.25 NA 0.750 NA

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Rescale positive and negative numbers into [0, 1] and [-1, 0] - r

Related

Mutating new columns based on common string using existing columns

Replacing zeroes with NA for values preceding non-zero

transform data.frame matrix-column into columns

length of 'dimnames' [2] not equal to array extent when using corrplot function from a matrix read from a csv file

loops in R, finding the mean for one column depends on another column

Categories

Resources