How can I keep the structure of a variable in R?

I have a data frame and a function. When I run the function with the data frame as an argument, it seems to change the structure of the data frame, so I get errors.
I have pasted the code and the data frame I am working with below.
The error I get is "Error in plot[row, 1] : incorrect number of dimensions".
Code
avg.value <- function(plot, delta, row) {
  #plot=deparse(substitute(plot))
  #delta=Week1DeltaT
  value <- numeric()
  avg.value <- numeric()
  for (row in 1:11) {
    # compute first value separately
    value[1] <- plot[row, 1] * delta[1]
    # loop over the rest of the values
    end <- length(delta) - 1
    for (i in 1:end) {
      value[i + 1] <- (plot[row, i + 1] - plot[row, i]) * delta[i + 1]
    }
    avg.value[row] <- sum(value) / 4
  }
  return(avg.value)
}
Variables
plot (a data frame):
6 7 8 9 10 11 12 13
R1 3 3 4 4 4 4 4 4
R2 1 3 3 3 3 4 4 4
R3 1 1 3 4 4 4 4 4
R4 1 3 4 4 4 4 4 4
R5 3 4 4 4 4 4 4 4
R6 1 3 3 4 4 4 4 4
R7 1 2 2 3 3 4 4 4
R8 2 4 4 4 4 4 4 4
R9 1 2 2 4 4 4 4 4
R10 3 4 4 4 4 4 4 4
R11 1 1 4 4 4 4 4 4
delta (a numeric vector of length 8):
6 7 8 9 10 11 12 13
row (a single numeric value from 1 to 11)
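The post does not show exactly how the function was called, but this error typically means that the object being indexed with [row, col] has no dimensions. Checking the argument with str() or dim() before the call shows whether it is still two-dimensional. The sketch below uses toy data, and the names plot_df and check_dims are hypothetical. It shows that a data frame can be indexed with [row, 1], whereas the character string produced by deparse(substitute(plot)) (as in the commented-out first line of the function) cannot:

```r
# Toy data frame with two dimensions (values taken from rows R1 and R2)
plot_df <- data.frame(`6` = c(3, 1), `7` = c(3, 3), check.names = FALSE)
str(plot_df)   # a data.frame with 2 observations of 2 variables
dim(plot_df)   # 2 2, so [row, col] indexing works
plot_df[2, 1]  # 1

# Hypothetical helper mimicking the commented-out line
# plot=deparse(substitute(plot)), which replaces the data frame by its name
check_dims <- function(plot) {
  plot <- deparse(substitute(plot))  # now just the string "plot_df"; dim() is NULL
  tryCatch(plot[1, 1], error = function(e) conditionMessage(e))
}
check_dims(plot_df)  # "incorrect number of dimensions"
```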
From the comments I realized that I had accidentally started the loop in the wrong place. Now the function works as I wanted.
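For comparison, the per-row loop in avg.value() can also be written without loops: keep the first column as-is, replace the remaining columns by the differences between consecutive columns, then weight each column by delta, sum across each row, and divide by 4 as in the original. This is a minimal sketch; the name avg_value_vec is hypothetical, it assumes ncol(plot) == length(delta), and the toy delta values are not the OP's:

```r
avg_value_vec <- function(plot, delta) {
  m <- as.matrix(plot)  # works for both data frames and matrices
  stopifnot(ncol(m) == length(delta))
  # first column as-is, then differences between consecutive columns
  d <- cbind(m[, 1], m[, -1, drop = FALSE] - m[, -ncol(m), drop = FALSE])
  # weight each column by delta, sum across the row, divide by 4 as in the original
  drop(d %*% delta) / 4
}

plot_df <- data.frame(`6` = c(3, 1), `7` = c(3, 3), `8` = c(4, 3), check.names = FALSE)
avg_value_vec(plot_df, delta = c(6, 7, 8))
# [1] 6.5 5.0
```

Because the whole matrix is processed at once, there is no loop variable that can accidentally shadow the row argument.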

Related

Convert a small dataset written in SPSS to CSV

I have a small dataset written in SPSS syntax which comes from Table 5.3 p. 189 of this book (type 210 in the page slot to see the table).
I was wondering if there is a way to convert this data to a .csv file (I want to use the data in R afterwards).
# SPSS Code:
DATA LIST FREE/gpid anx socskls assert.
BEGIN DATA.
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4
END DATA.
EDIT: to help check answers, here is how the data actually looks after reading it into SPSS:
gpid anx socskls assert
1 5 3 3
1 5 4 3
1 4 5 4
1 4 5 4
1 3 5 5
1 4 5 4
1 4 5 5
1 4 4 4
1 5 4 3
1 5 4 3
1 4 4 4
2 6 2 1
2 6 2 2
2 5 2 3
2 6 2 2
2 4 4 4
2 7 1 1
2 5 4 3
2 5 2 3
2 5 3 3
2 5 4 3
2 6 2 3
3 4 4 4
3 4 3 3
3 4 4 4
3 4 5 5
3 4 5 5
3 4 4 4
3 4 5 4
3 4 6 5
3 4 4 4
3 5 3 3
3 4 4 4
If I understand correctly, the 1st, 5th, 9th, and 13th columns of the dataset belong to variable gpid, the 2nd, 6th, 10th, and 14th columns belong to variable anx, and so on. So, we need to reshape from wide to long format with multiple measure variables, where each measure variable spans several columns and where some values are missing.
Many roads lead to Rome.
This is what I would do using my favourite tools. In particular, this approach uses the ability of data.table::melt() to reshape multiple measure columns simultaneously. No manual cleanup of the data section in a text editor is required.
The resulting dataset result can be used directly in any subsequent R code, as requested by the OP. There is no need to take a detour via a .csv file (however, feel free to save result as a .csv file).
library(data.table)
library(magrittr)
cols <- c("gpid", "anx", "socskls", "assert")
raw <- fread(text = "
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4",
fill = TRUE)
mv <- colnames(raw) %>%
matrix(ncol = 4L, byrow = TRUE) %>%
as.data.table() %>%
setnames(new = cols)
result <- melt(raw, measure.vars = mv, na.rm = TRUE)[
order(rowid(variable))][
, variable := NULL]
result
gpid anx socskls assert
1: 1 5 3 3
2: 1 5 4 3
3: 1 4 5 4
4: 1 4 5 4
5: 1 3 5 5
6: 1 4 5 4
7: 1 4 5 5
8: 1 4 4 4
9: 1 5 4 3
10: 1 5 4 3
11: 1 4 4 4
12: 2 6 2 1
13: 2 6 2 2
14: 2 5 2 3
15: 2 6 2 2
16: 2 4 4 4
17: 2 7 1 1
18: 2 5 4 3
19: 2 5 2 3
20: 2 5 3 3
21: 2 5 4 3
22: 2 6 2 3
23: 3 4 4 4
24: 3 4 3 3
25: 3 4 4 4
26: 3 4 5 5
27: 3 4 5 5
28: 3 4 4 4
29: 3 4 5 4
30: 3 4 6 5
31: 3 4 4 4
32: 3 5 3 3
33: 3 4 4 4
gpid anx socskls assert
Some explanations
fread() returns a data.table raw with default column names V1, V2, ..., V16, with missing values filled with NA.
mv is a data.table which indicates which columns of raw belong to each target variable:
mv
gpid anx socskls assert
1: V1 V2 V3 V4
2: V5 V6 V7 V8
3: V9 V10 V11 V12
4: V13 V14 V15 V16
This information is used by melt(), which also removes rows with missing values from the resulting long format.
After reshaping, the rows are ordered by variable number, so they are reordered into the original row order with order(rowid(variable)). Finally, the variable column is removed.
EDIT: Improved version
On second thought, here is a streamlined version of the code which skips the creation of mv and uses data.table chaining:
library(data.table)
cols <- c("gpid", "anx", "socskls", "assert")
result <- fread(
text = "
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4",
fill = TRUE, col.names = rep(cols, 4L))[
, melt(.SD, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)][
order(rowid(variable))][
, variable := NULL][]
result
Here, the columns are renamed within the call to fread(). In this case, duplicated column names are desirable (as opposed to the usual use case) because the patterns() function in the subsequent call to melt() uses the duplicated column names to combine the columns which belong to one measure variable.
This requires some manual clean-up in Notepad or a similar editor to put the data into the right format, but essentially it can be imported using the following:
df <- data.frame(
gpid = c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,
2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3),
anx = c(5,5,4,4,3,4,4,4,5,5,4,6,6,5,6,
4,7,5,5,5,5,6,4,4,4,4,4,4,4,4,4,5,4),
socskls = c(3,4,5,5,5,5,5,4,4,4,4,2,2,2,2,
4,1,4,2,3,4,2,4,3,4,5,5,4,5,6,4,3,4),
assert = c(3,3,4,4,5,4,5,4,3,3,4,1,2,3,2,
4,1,3,3,3,3,3,4,3,4,5,5,4,4,5,4,3,4)
)
write.csv(df, "df.csv", row.names = FALSE)
Note that the first 4 values (1, 5, 3, 3) are the gpid, anx, socskls, and assert values for row 1, whereas the values 1, 5, 4, 3, which appear to be in the next column of the pasted SPSS syntax (i.e. the next 4 values reading the syntax left to right), are actually the values for participant 10.
Note: I'm assuming you don't have SPSS installed. If you did, the easiest option would be to use the SPSS syntax to create the dataset in SPSS and then export it to R.
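Since DATA LIST FREE reads the values sequentially, four at a time in variable order, a base R alternative that needs no manual clean-up is to read all the numbers with scan() and fill a 4-column matrix row by row. This is a minimal sketch; the output file name spss_data.csv in the temporary directory is only an example:

```r
txt <- "
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4"

vals <- scan(text = txt, quiet = TRUE)  # all numbers, in reading order
# every 4 consecutive values form one case: gpid, anx, socskls, assert
df <- as.data.frame(matrix(vals, ncol = 4, byrow = TRUE))
names(df) <- c("gpid", "anx", "socskls", "assert")
write.csv(df, file.path(tempdir(), "spss_data.csv"), row.names = FALSE)
```

This reproduces the 33 rows shown in the OP's edit, because it follows the same left-to-right reading rule as SPSS.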
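Using readLines() and some string-manipulation tools, assuming the file spss1.txt contains the SPSS syntax above with the 4-value blocks separated by two or more spaces (the split regex below relies on that):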
tmp <- readLines("spss1.txt") ## read from .txt
tmp <- trimws(gsub("[A-Z/.]", "", tmp)) ## remove caps and specials
nm <- strsplit(tmp[[1]], " ")[[1]] ## split names
tmp <- unlist(strsplit(tmp[3:11], "\\s{2,}") ) ## split data blocks
Finally, splitting at the spaces gives the result.
dat <- setNames(
type.convert(do.call(rbind.data.frame, strsplit(tmp, "\\s"))),
nm)
Result
dat
# gpid anx socskls assert
# 1 1 5 3 3
# 2 1 5 4 3
# 3 1 4 5 4
# 4 1 4 5 4
# 5 1 3 5 5
# 6 1 4 5 4
# 7 1 4 5 5
# 8 1 4 4 4
# 9 1 5 4 3
# 10 1 5 4 3
# 11 1 4 4 4
# 12 2 6 2 1
# 13 2 6 2 2
# 14 2 5 2 3
# 15 2 6 2 2
# 16 2 4 4 4
# 17 2 7 1 1
# 18 2 5 4 3
# 19 2 5 2 3
# 20 2 5 3 3
# 21 2 5 4 3
# 22 2 6 2 3
# 23 3 4 4 4
# 24 3 4 3 3
# 25 3 4 4 4
# 26 3 4 5 5
# 27 3 4 5 5
# 28 3 4 4 4
# 29 3 4 5 4
# 30 3 4 6 5
# 31 3 4 4 4
# 32 3 5 3 3
# 33 3 4 4 4
Note: This gives the same Wilks' lambda as Emily Kothe's method. Maybe the authors used different data, or your MANOVA method is flawed?

Transforming a looping factor variable into a sequence of numerics

I have a factor variable with 6 levels, which simplified looks like:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 2... 1 1 1 2 2... (with n = 78)
Note that each number is repeated mostly, but not always, three times.
I need to transform this variable into the following pattern:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8...
where each repetition of the 6 levels continues counting upward.
Is there any way / any function that lets me do that?
Sorry for my bad description!
Assuming that you have a numeric vector that represents the simplified version you posted, i.e. x = c(1,1,1,2,2,3,3,3,1,1,2,2), you can use this:
library(dplyr)
cumsum(x != lag(x, default = 0))
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
This compares each value to the previous one and adds 1 whenever they differ (starting from 1).
Maybe you can try rle():
r <- rle(x)
v <- rep(seq_along(r$values), r$lengths)
Example with dummy data
x = c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)
then we can get
> v
[1] 1 1 1 2 2 3 3 3 4 4 5 6 7 7 8 8 9 9
[19] 9 10 10
In base R you can use diff() and cumsum():
c(1, cumsum(diff(x)!=0)+1)
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
Data:
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)
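The question mentions a factor rather than a numeric vector. diff() is meant for numbers, so one cautious variant is to compare neighbouring elements of the factor directly (== and != work between factors with the same levels) and accumulate the changes. A sketch using the example data above converted to a factor:

```r
f <- factor(c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2))
# TRUE wherever a new run starts; the first element always starts a run
starts <- c(TRUE, f[-1] != f[-length(f)])
run_id <- cumsum(starts)
run_id
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
```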

Sequentially remove vector elements

I want to replicate a vector with one of its values left out each time, in sequence.
For example, my vector is
value <- 1:7
First the series without 1, then without 2, and so on. In the end, all the series are combined into one vector.
The intended output looks like
2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2 3 4 5 6
Is there any smart way to do this?
You could use the diagonal matrix to set up a logical vector, using it to remove the appropriate values.
n <- 7
rep(1:n, n)[!diag(n)]
# [1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5
# [36] 7 1 2 3 4 5 6
Well, you can certainly do it as a one-liner but I am not sure it qualifies as smart. For example:
x <- 1:7
do.call("c", lapply(as.list(-1:-length(x)), function(a)x[a]))
This simply uses lapply to create a list of copies of x with each of its entries deleted in turn, and then concatenates them using c. The do.call function applies its first argument (a function) to its second argument (a list of arguments to the function).
For fun, it's also possible to just use rep:
> n <- 7
> rep(1:n, n)[rep(c(FALSE, rep(TRUE, n)), length.out=n^2)]
[1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2
[39] 3 4 5 6
But lapply is cleaner, I think.
You could also do:
n <- 7
rep(seq(n), n)[-seq(1,n*n,n+1)]
#[1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2 3 4 5 6
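The diag() trick is not limited to 1:n: rep(x, n) repeats any vector of length n, and the logical matrix !diag(n) drops the i-th element from the i-th copy. A small helper built on that idea (the name drop_each is hypothetical), with an lapply() equivalent for comparison:

```r
drop_each <- function(x) {
  n <- length(x)
  # column i of the n x n layout is the i-th copy of x; the diagonal marks x[i]
  rep(x, n)[!diag(n)]
}

y <- c(10, 20, 30)
drop_each(y)
# [1] 20 30 10 30 10 20

# Same result with an explicit loop over positions
unlist(lapply(seq_along(y), function(i) y[-i]))
```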

Symmetric circulant matrices in R

I want to create symmetric circulant matrices.
Example of order 4:
1 2 : 3 4
2 1 : 4 3
.........
3 4 : 1 2
4 3 : 2 1
Example of order 8:
1 2 3 4 : 5 6 7 8
2 1 4 3 : 6 5 8 7
3 4 1 2 : 7 8 5 6
4 3 2 1 : 8 7 6 5
..................
5 6 7 8 : 1 2 3 4
6 5 8 7 : 2 1 4 3
7 8 5 6 : 3 4 1 2
8 7 6 5 : 4 3 2 1
How do I do this in R?
This appears to solve the problem but is way too clever. The flip(x)==1 idiom gives a matrix (once converted to numeric) of the form [0 1; 1 0] ...
flip <- function(x) x[rev(seq(nrow(x))),]
x <- matrix(c(1,2,2,1),2)
x2 <- kronecker(2*(flip(x)==1),x,"+") ## 4x4 solution
x3 <- kronecker(4*(flip(x)==1),x2,"+") ## 8x8 solution
Repeat for larger matrices of size 2^n (embed in a for loop if you want to do this a lot) ... I don't know what your desired answer would be for a matrix that's not of size 2^n (e.g. 12x12), but you might be able to find a way to extend this machinery.
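The for loop suggested above can be wrapped in a small function. This sketch (the name sym_circ and its argument k, giving a 2^k x 2^k result for k >= 1, are hypothetical) reuses flip() and the kronecker() step, doubling the size each time:

```r
flip <- function(x) x[rev(seq(nrow(x))), ]

sym_circ <- function(k) {
  stopifnot(k >= 1)
  base <- matrix(c(1, 2, 2, 1), 2)
  swap <- flip(base) == 1   # logical [FALSE TRUE; TRUE FALSE]
  m <- base
  size <- 2
  while (size < 2^k) {
    # off-diagonal blocks get `size` added, diagonal blocks stay equal to m
    m <- kronecker(size * swap, m, "+")
    size <- 2 * size
  }
  m
}

sym_circ(3)  # the 8 x 8 example from the question
```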

Create set of matrices from concatenating columns of another matrix in r

I have two matrices A and B of dimension 5 by 3 and 5 by 2, respectively. I want to produce a series of matrices by appending each column of matrix B to A. The resulting matrices would be 5 by 4.
Let A be
1 2 3
4 5 6
7 8 9
2 3 1
4 1 5
and B be
1 2
2 5
3 8
6 3
2 1
Then the resulting matrices are
1 2 3 1
4 5 6 2
7 8 9 3
2 3 1 6
4 1 5 2
and
1 2 3 2
4 5 6 5
7 8 9 8
2 3 1 3
4 1 5 1
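Use our old friend the assignment operator. Assigning the 1st column of B to a new 4th column of A works when A is a data frame, as in the output below (for a plain matrix, A[, 4] is out of bounds, so use cbind(A, B[, 1]) instead):
A[, 4] <- B[, 1]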
> A
V1 V2 V3 V4
1 1 2 3 1
2 4 5 6 2
3 7 8 9 3
4 2 3 1 6
5 4 1 5 2
Then A[, 4] <- B[, 2], and so on (store each result first, because each assignment overwrites column 4).
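To get all the combined matrices at once instead of overwriting column 4 each time, one option is to loop over the columns of B and cbind() each one onto A, collecting the results in a list. A sketch with the matrices from the question:

```r
A <- matrix(c(1, 4, 7, 2, 4,
              2, 5, 8, 3, 1,
              3, 6, 9, 1, 5), nrow = 5)  # filled column by column
B <- matrix(c(1, 2, 3, 6, 2,
              2, 5, 8, 3, 1), nrow = 5)

# one 5 x 4 matrix per column of B; A itself is left unchanged
combined <- lapply(seq_len(ncol(B)), function(j) cbind(A, B[, j]))
combined[[2]]
```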
