R saccades analysis

I have eye-tracking gaze data in the form of x/y coordinates and timestamps.
Now I want to plot the saccades using the R package saccades. Unfortunately, it doesn't work. I guess my data is in the wrong format.
My data:
> View(EUFKDCDL_Q09AS_saccades_2)
> head(EUFKDCDL_Q09AS_saccades)
# A tibble: 6 x 4
time x y trial
<dbl> <dbl> <dbl> <dbl>
1 1550093577941 732 391 1
2 1550093577962 706 320 1
3 1550093577980 666 352 1
4 1550093578000 886 288 1
5 1550093578017 787 221 1
6 1550093578037 729 302 1
The code that didn't work:
> fixations <- detect.fixations(EUFKDCDL_Q09AS_saccades)
Error in detect.fixations(EUFKDCDL_Q09AS_saccades) :
No saccades were detected. Something went wrong.
The full code that should work according to GitHub (it's with the sample data):
> library(saccades)
> data(samples)
> head(samples)
time x y trial
1 0 53.18 375.73 1
2 4 53.20 375.79 1
3 8 53.35 376.14 1
4 12 53.92 376.39 1
5 16 54.14 376.52 1
6 20 54.46 376.74 1
> fixations <- detect.fixations(samples)
> head(fixations[c(1,4,5,10)])
trial x y dur
0 1 53.81296 377.40741 71
1 1 39.68156 379.58711 184
2 1 59.99267 379.92467 79
3 1 18.97898 56.94046 147
4 1 40.28365 39.03599 980
5 1 47.36547 35.39441 1310
> diagnostic.plot(samples, fixations)
So I guess there must be a problem with how my data is structured? What does the error mean?
I hope one of you can help me create this saccade plot, as in the attached screenshot.
I am an R newbie as well...please be patient with me. :D
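One way to test the format guess is to compare the data with the package's sample data, which prints as a plain data frame with time starting at 0 and a gap of 4 between samples. The sketch below is only a check under stated assumptions, not a confirmed fix: `gaze` is a hypothetical data frame rebuilt from the six rows shown above, and the code does not call detect.fixations. Nothing in the post confirms that these steps remove the error.

```r
# Hypothetical stand-in for the first six rows of the tibble shown above.
gaze <- data.frame(
  time  = c(1550093577941, 1550093577962, 1550093577980,
            1550093578000, 1550093578017, 1550093578037),
  x     = c(732, 706, 666, 886, 787, 729),
  y     = c(391, 320, 352, 288, 221, 302),
  trial = 1
)

# If the real data is a tibble, as.data.frame() turns it into a plain data frame.
gaze <- as.data.frame(gaze)

# Rebase the epoch timestamps so that time starts at 0, as in the sample data.
gaze$time <- gaze$time - gaze$time[1]

# Gap between consecutive samples, in the units of the time column.
sample_gaps <- diff(gaze$time)
print(sample_gaps)
```

In these six rows the gaps are roughly 20 time units, compared with 4 in the sample data, so the two recordings differ in sampling rate as well as in timestamp origin.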

Related

How to sort a data frame by column?

I want to sort a data frame by the values of a column (the first column, called Initial). My data frame is:
I called my dataframe: t2
Initial Final Changes
1 1 200
1 3 500
3 1 250
24 25 175
21 25 180
1 5 265
3 3 147
I am trying with code:
t2 <- t2[order(t2$Initial, t2$Final, decreasing=False),]
But, the result is of the type:
Initial Final Changes
3 1 250
3 3 147
21 25 180
24 25 175
1 5 265
1 1 200
1 3 500
And when I try with code:
t2 <- t2[order(t2$Initial, t2$Final, decreasing=TRUE),]
The result is:
Initial Final Changes
1 5 265
1 1 200
1 3 500
24 25 175
21 25 180
3 1 250
3 3 147
I don't understand what is happening.
Can you help me, please?

It is possible that the columns are factors. In that case, convert them to numeric and it should work:
library(dplyr)
t2 %>%
arrange_at(1:2, ~ desc(as.numeric(as.character(.))))
Or with base R
t2[1:2] <- lapply(t2[1:2], function(x) as.numeric(as.character(x)))
t2[do.call(order, c(t2[1:2], decreasing = TRUE)), ]
Or the OP's code should work as well.
Note that the first option the OP tried has decreasing = False (may be a typo). In R the logical constant is upper case, FALSE:
t2[order(t2$Initial, t2$Final, decreasing=FALSE),]
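The factor explanation above can be illustrated with a small sketch. It assumes, hypothetically, that Initial and Final were read in as factors built from text. The post does not show the column types, so this is one possible cause rather than a diagnosis.

```r
# Hypothetical copy of t2 with the key columns stored as factors,
# which is the situation the answer above guards against.
t2 <- data.frame(
  Initial = factor(c("1", "1", "3", "24", "21", "1", "3")),
  Final   = factor(c("1", "3", "1", "25", "25", "5", "3")),
  Changes = c(200, 500, 250, 175, 180, 265, 147)
)

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(t2$Initial)

# Going through as.character() first recovers the numbers that were printed.
t2[1:2] <- lapply(t2[1:2], function(x) as.numeric(as.character(x)))

# With numeric columns, order() sorts by value (ascending is the default).
sorted <- t2[order(t2$Initial, t2$Final), ]
print(sorted)
```

The as.character() step matters because calling as.numeric() directly on a factor yields level codes, which would sort in level order rather than numeric order.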

create matrix from raw data

My data looks like this:
> head(data, 20)
# A tibble: 20 x 2
hosp zip
<chr> <chr>
1 010001 14843
2 010001 36303
3 010016 13320
4 010021 10468
5 010023 36040
6 010023 36116
7 010023 36116
8 010023 36116
9 010024 36401
10 010029 10025
11 010029 11412
12 010029 11733
13 010033 14086
14 010033 14701
15 010033 35244
16 010034 12308
17 010038 11413
18 010039 10011
19 010039 11704
20 010039 35749
hosp is the hospital ID and zip is the zip code. Patients in each hospital came from multiple zip codes. How can I create a matrix showing, for each hospital, how many patients came from each zip code?
Ideal matrix would be like this:
zip 010001 010016 010021 ... hosp
14843 1 0 0
36303 1 0 0
13320 0 1 0
10468 0 0 1
Thanks!!

As was stated in the comments you can use table. The t() function puts zip code on the left:
t(as.matrix(table(data)))
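As a small sketch of the table() approach, the code below rebuilds the first eight rows of the question's data in a plain data frame (an assumption: the original is a tibble, and only part of it is shown) and cross-tabulates them.

```r
# First eight rows of the question's data, as a plain data frame.
data <- data.frame(
  hosp = c("010001", "010001", "010016", "010021",
           "010023", "010023", "010023", "010023"),
  zip  = c("14843", "36303", "13320", "10468",
           "36040", "36116", "36116", "36116"),
  stringsAsFactors = FALSE
)

# table() counts each hosp/zip combination; t() puts zip on the rows.
counts <- t(table(data))
print(counts)

# Look up a single cell by zip code and hospital ID.
counts["36116", "010023"]
```

The rows come out sorted by zip code rather than in order of first appearance, so the layout differs slightly from the ideal matrix in the question while holding the same counts.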

SMOTE length of 'dimnames' [2] not equal to array extent

I was trying to oversample my dataset using SMOTE and I keep running into this error.
trainSM <- SMOTE(conversion ~ ., train,perc.over = 1000,perc.under = 200)
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE),
nrow = nr, : length of 'dimnames' [2] not equal to array extent
My dataset is as follows:
conversion horizon length_of_stay guests rooms price comp_price
(dbl) (int) (int) (int) (int) (int) (int)
1 1 193 2 2 1 199 210
2 1 263 2 2 1 171 88
3 1 300 3 2 1 164 164
4 1 70 4 2 1 76 80
5 1 65 6 2 2 260 260
6 1 50 3 2 1 171 176
7 1 4 3 2 1 158 167
8 1 29 3 2 1 171 171
9 0 130 1 2 1 161 160
10 0 26 2 2 1 110 110
I have tried working with only numerical predictors and also with categorical predictors, but had no luck with either.
Any help/guidance is greatly appreciated.

Passing a data.frame that is a tibble into DMwR::SMOTE() will throw this error. You can work around it by using as.data.frame(your_train_data) to 'un-tibble' your data.frame:
trainSM <- SMOTE(conversion ~ ., as.data.frame(train), perc.over = 1000, perc.under = 200)
The issue is that SMOTE() uses single-bracket subsetting. Tibbles (i.e. a data.frame turned into a tibble::data_frame) are much stricter about return values: single-bracket subsetting always returns a data frame (even if the result is only a single vector or even a single value).
Here's the problematic part of the SMOTE() source code:
# The idea here is to determine which level of the response variable appears least.
# Unfortunately, if data is a tibble, then data[,tgt] returns a data frame,
# which of course, doesn't have any levels, so the value of minCL is always NULL
minCl <- levels(data[, tgt])[which.min(table(data[, tgt]))]
# this is where the error is thrown--you're testing a data frame against NULL
minExs <- which(data[, tgt] == minCl)
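The difference in return type can be sketched in base R without loading tibble. This is an imitation under a stated assumption: `drop = FALSE` on a plain data frame stands in for what a tibble returns from `data[, tgt]`. The small `train` data frame is hypothetical.

```r
# Hypothetical training data with a factor response in column 1.
train <- data.frame(
  conversion = factor(c(1, 1, 1, 0)),
  price      = c(199, 171, 164, 161)
)
tgt <- 1

# Plain data frame: single-bracket column subsetting drops to a vector.
as_vector <- train[, tgt]

# Stand-in for tibble behaviour: the result stays a one-column data frame.
as_frame <- train[, tgt, drop = FALSE]

# levels() works on the factor vector but yields NULL for a data frame.
print(levels(as_vector))
print(levels(as_frame))

# The minority-class lookup only succeeds on the vector form.
min_cl <- levels(as_vector)[which.min(table(as_vector))]
print(min_cl)
```

With the data frame form, the same lookup has no levels to index, which is why minCl ends up NULL in the SMOTE() lines quoted above.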

Printing only certain panels in R lattice

I am drawing a quantile-quantile plot for some data that I have. I would like to print only the panels that satisfy a condition that I put in for panel.qq(x,y,...).
Let me give you an example. The following is my code,
qq(y ~ x|cond,data=test.df,panel=function(x,y,subscripts,...){
if(length(unique(test.df[subscripts,2])) > 3 ){panel.qq(x,y,subscripts,...)}})
Here y is the factor and x is the variable that will be plotted on the x and y axes. cond is the conditioning variable. What I would like is for only those panels to be printed that pass the condition in the panel function, which is
if(length(unique(test.df[subscripts,2])) > 3).
I hope this information helps. Thanks in advance.
Added Sample data,
y x cond
1 1 6 125
2 2 5 125
3 1 5 125
4 2 6 125
5 1 3 125
6 2 8 125
7 1 8 125
8 2 3 125
9 1 5 125
10 2 6 125
11 1 5 124
12 2 6 124
13 1 6 124
14 2 5 124
15 1 5 124
16 2 6 124
17 1 4 124
18 2 7 124
19 1 0 123
20 2 11 123
21 1 0 123
22 2 11 123
23 1 0 123
24 2 11 123
25 1 0 123
26 2 11 123
27 1 0 123
28 2 2 123
So this is the sample data. What I would like is to not have a panel for 123, as the number of unique values for 123 is 3, while for the others it's 4. Thanks again.

Yeah, I think it is a subset problem, not a lattice one. You don't include an example, but it looks like you want to keep only rows where there are more than 3 rows for each value of whatever is in column 2 of your data frame. If so, here is a data.table solution.
library(data.table)
test.dt <- as.data.table(test.df)
test.dt.subset <- test.dt[,N:=.N,by=c2][N>3]
Where c2 is that variable in the second column. The last line of code first adds a variable, N, for the count of rows (.N) for each value of c2, then subsets for N>3.
UPDATE: And since a data table is also a data frame, you can use test.dt.subset directly as the data source in the call to qq (or other lattice function).
UPDATE 2: Here is one way to do the same thing without data.table:
d <- data.frame(x=1:15,y=1:15%%2, # example data frame
c2=c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
d$N <- 1 # create a column for count
split(d$N,d$c2) <- lapply(split(d$x,d$c2),length) # populate with count
d
d[d$N>3,] # subset

I did something very similar to DaveTurek.
My sample dataframe above is test.df
test.df.list <- split(test.df,test.df$cond,drop=F)
final.test.df <- do.call("rbind",lapply(test.df.list,function(r){
if(length(unique(r$x)) > 3){r}}))
So, here I am splitting test.df into a list of data frames by the conditioning variable. Next, in the lapply I check the number of unique values in each subset data frame. If this number is greater than 3, the data frame is returned; if not, it is ignored. Finally, do.call binds all the data frames back into one big data frame to run the quantile-quantile plot on.
In case anyone wants to know the qq function call after getting the specific data, it is:
trellis.device(postscript,file="test.ps",color=F,horizontal=T,paper='legal')
qq(y ~ x|cond,data=final.test.df,layout=c(1,1),pch=".",cex=3)
dev.off()
Hope this helps.
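The same filter on unique values per group can also be sketched with ave() from base R, which avoids the split and rbind round trip. This is an alternative to the code above, not the method either answer used. The data frame is rebuilt from the sample data in the question.

```r
# Sample data from the question: 10 rows for 125, 8 for 124, 10 for 123.
test.df <- data.frame(
  y    = rep(1:2, 14),
  x    = c(6, 5, 5, 6, 3, 8, 8, 3, 5, 6,
           5, 6, 6, 5, 5, 6, 4, 7,
           0, 11, 0, 11, 0, 11, 0, 11, 0, 2),
  cond = rep(c(125, 124, 123), times = c(10, 8, 10))
)

# For every row, the number of distinct x values within its cond group.
n_unique <- ave(test.df$x, test.df$cond,
                FUN = function(v) length(unique(v)))

# Keep only the groups with more than 3 distinct x values.
final.test.df <- test.df[n_unique > 3, ]
print(unique(final.test.df$cond))
```

Group 123 has only three distinct x values (0, 11 and 2), so its rows are dropped and no panel would be drawn for it.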

Assign industry codes according to ranges in R

I would like to assign overall industry/parent codes to a data.frame (df below) containing more detailed/child codes (called ChildCodes below). The following data serves to illustrate my data.frame containing the detailed codes:
> df <- as.data.frame(cbind(c(1,2,3,4,5,6),c(110,101,200,2041,3651,2102)))
> names(df) <- c('Id','ChildCodes')
> df
Id ChildCodes
1 1 110
2 2 101
3 3 200
4 4 2041
5 5 3651
6 6 2102
The industry/parent codes are in the .csv file here: https://www.dropbox.com/s/5qtb7ysys1ar0lj/IndustryCodes.csv
The problem for me is the format of the .csv file. The file shows the parent/industry code in column 1 and ranges of child/detailed codes in the next 2 columns. Here is a subset:
> IndustryCodes <- as.data.frame(cbind(c(1,1,2,5,6),c(100,200,2040,2100,3650),c(199,299,2046,2199,3651)))
> names(IndustryCodes) <- c('IndustryGroup','LowerRange','UpperRange')
> IndustryCodes
IndustryGroup LowerRange UpperRange
1 1 100 199
2 1 200 299
3 2 2040 2046
4 5 2100 2199
5 6 3650 3651
So ChildCode 110 corresponds to industry group 1, 2041 to industry group 2, etc. How do I best assign the industry/parent codes (IndustryGroup) to df in R?
Thanks!

You can use sapply to get the Industry code for every child code:
sapply(df$ChildCodes,
function(x) IndustryCodes$IndustryGroup[IndustryCodes$LowerRange <= x &
x <= IndustryCodes$UpperRange])
# [1] 1 1 1 2 6 5
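A variant of the same lookup is sketched below for the case where a child code falls outside every range. The helper name `lookup_group` is hypothetical, and returning NA for unmatched codes is an assumption about the desired behaviour rather than something the question specifies.

```r
df <- data.frame(Id = 1:6,
                 ChildCodes = c(110, 101, 200, 2041, 3651, 2102))
IndustryCodes <- data.frame(
  IndustryGroup = c(1, 1, 2, 5, 6),
  LowerRange    = c(100, 200, 2040, 2100, 3650),
  UpperRange    = c(199, 299, 2046, 2199, 3651)
)

# Hypothetical helper: first industry group whose range contains the code,
# or NA when no range matches.
lookup_group <- function(code) {
  hit <- which(IndustryCodes$LowerRange <= code &
               code <= IndustryCodes$UpperRange)
  if (length(hit) == 0) NA_real_ else IndustryCodes$IndustryGroup[hit[1]]
}

# vapply() guarantees one numeric value per child code.
df$IndustryGroup <- vapply(df$ChildCodes, lookup_group, numeric(1))
print(df)
```

Using vapply() with numeric(1) keeps the result a plain numeric vector even when some codes have no match, whereas sapply() would fall back to a list in that case.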
