Creating a heatmap in R from eye-tracker data

I have a table with the following columns:
frame,X,Y
It is the output of several eye-tracking analyses.
Now I would like to create a heatmap in R, like the following:
I tried several scripts found online, but none of them gave me that result.
How can I do this?
Here is some sample data (ignore the first two columns):
task,visualization,frame,X,Y
1,b,1,383,221
1,b,1,632,356
1,b,1,947,663
1,b,1,546,206
1,b,1,488,272
1,b,1,578,752
1,b,1,415,261
1,b,1,693,158
1,b,1,684,528
1,b,1,592,67
1,b,1,393,180
1,b,1,1033,709
1,b,1,1080,739
1,b,1,711,523
1,b,1,1246,49
1,b,1,742,69
1,b,1,601,370
1,b,10,902,684
1,b,10,517,241
1,b,10,583,86
1,b,10,582,754
1,b,10,426,257
1,b,10,575,229
1,b,10,697,150
1,b,10,379,520
1,b,10,390,286
1,b,10,618,396
1,b,10,710,143
1,b,10,383,188
1,b,10,1026,713
1,b,10,1078,625
1,b,10,713,521

You can get this type of plot quite easily using stat_bin2d from ggplot2:
library(ggplot2)
ggplot(dat, aes(x = X, y = Y)) + stat_bin2d(bins = 10)
This does simple binning. As @RomanLustrik suggested, you could also perform some kind of kernel smoothing, which can also be done with ggplot2:
ggplot(dat, aes(x = X, y = Y)) +
stat_density2d(geom = "tile", aes(fill = ..density..), contour = FALSE) +
geom_point()
Note that dat is the example data you gave. To get your data into a data.frame:
dat = read.table(textConnection("task,visualization,frame,X,Y
1,b,1,383,221
1,b,1,632,356
1,b,1,947,663
1,b,1,546,206
1,b,1,488,272
1,b,1,578,752
1,b,1,415,261
1,b,1,693,158
1,b,1,684,528
1,b,1,592,67
1,b,1,393,180
1,b,1,1033,709
1,b,1,1080,739
1,b,1,711,523
1,b,1,1246,49
1,b,1,742,69
1,b,1,601,370
1,b,10,902,684
1,b,10,517,241
1,b,10,583,86
1,b,10,582,754
1,b,10,426,257
1,b,10,575,229
1,b,10,697,150
1,b,10,379,520
1,b,10,390,286
1,b,10,618,396
1,b,10,710,143
1,b,10,383,188
1,b,10,1026,713
1,b,10,1078,625
1,b,10,713,521"), header = TRUE, sep = ",")
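To make it clearer what the binning step does, here is a minimal base R sketch that counts gaze points per grid cell, which is roughly what stat_bin2d computes before drawing the tiles. The 1280 x 800 screen size and the 4 x 4 grid are assumptions chosen only to cover the sample values; they are not stated in the question.

```r
# First five gaze points of frame 1, copied from the question's sample data
dat <- data.frame(
  X = c(383, 632, 947, 546, 488),
  Y = c(221, 356, 663, 206, 272)
)

# Assumed screen size (1280 x 800) split into a 4 x 4 grid
x_breaks <- seq(0, 1280, length.out = 5)
y_breaks <- seq(0, 800, length.out = 5)

# Assign each point to a cell, then count points per cell
x_bin <- cut(dat$X, breaks = x_breaks, include.lowest = TRUE)
y_bin <- cut(dat$Y, breaks = y_breaks, include.lowest = TRUE)
counts <- table(y_bin, x_bin)

print(counts)
```

Each entry of counts is the number of gaze points in one cell, so a larger bins value in stat_bin2d simply means a finer grid of the same kind.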


Overlay bathymetric data onto OSM multipolygons

I want to draw the map of a lake with the bathymetries I have taken with the sonar. I have a .sl2 file (Lowrance sonar) that I have converted to .csv (Namely, Sonar_13_07_1.csv). Finally I get 3 columns with 30665 rows, here is an example of the first rows:
latitude,longitude,waterDepthM
39.8197123940846,-3.11133523036904,0
39.8197193169248,-3.11133523036904,0
39.8197193169248,-3.11134424374202,0
39.8197262397644,-3.11134424374202,0
39.8197331626032,-3.11135325711499,0
39.8197400854413,-3.11135325711499,0
39.8197470082787,-3.11135325711499,0
39.8197539311154,-3.11135325711499,0
39.8197608539514,-3.11135325711499,0
39.8197608539514,-3.11134424374202,0
39.8197677767867,-3.11134424374202,0
39.8197677767867,-3.11135325711499,0
39.8197746996213,-3.11135325711499,0
39.8197746996213,-3.11134424374202,0
39.8197816224553,-3.11134424374202,0
39.8197885452885,-3.11134424374202,0
39.8197885452885,-3.11133523036904,0
39.819795468121,-3.11133523036904,0
39.8198023909528,-3.11132621699607,0
39.8198023909528,-3.11133523036904,0
39.819809313784,-3.11132621699607,0
39.8198162366144,-3.11132621699607,0
39.8198231594442,-3.11132621699607,0
39.8198370051015,-3.11132621699607,0
39.8198300822732,-3.11132621699607,0
39.8198439279292,-3.11132621699607,0
39.8198508507561,-3.11133523036904,0
39.8198508507561,-3.11132621699607,0
39.8198577735824,-3.11133523036904,0
39.8198646964079,-3.11133523036904,0
39.8198646964079,-3.11134424374202,0
39.8198716192328,-3.11135325711499,0
39.8198716192328,-3.11134424374202,0
39.8198716192328,-3.11136227048797,0
39.8198785420569,-3.11135325711499,0
39.8198716192328,-3.11136227048797,-0.691144658182553
39.8198785420569,-3.11135325711499,-0.691144658182553
39.8198716192328,-3.11136227048797,-0.72783260768886
39.8198785420569,-3.11135325711499,-0.72783260768886
39.8198716192328,-3.11136227048797,-0.735494858005278
39.8198785420569,-3.11135325711499,-0.735494858005278
39.8198716192328,-3.11136227048797,-0.754367615888273
39.8198785420569,-3.11135325711499,-0.754367615888273
39.8198716192328,-3.11136227048797,-0.762954301055886
39.8198785420569,-3.11135325711499,-0.762954301055886
39.8198785420569,-3.11136227048797,-0.762954301055886
I managed to plot it like this:
library(ggplot2)
ggplot(Sonar_13_07_1, aes(longitude, latitude, colour = waterDepthM)) + geom_point() + coord_equal() + xlim(x_coords) + ylim(y_coords)
On the other hand, I plotted the base map of the lake:
library(osmdata)
library(sf)
x_coords <- c(-3.109, -3.117)
y_coords <- c(39.817, 39.832)
bounding_box <- matrix(nrow = 2, ncol = 2, byrow = TRUE,
data = c(x_coords, y_coords),
dimnames = list(c("x", "y"),
c("min", "max")))
osm_water_sf <- osmdata::opq(bbox = bounding_box) %>% # Limit query to bounding_box
osmdata::add_osm_feature(key = 'natural', value = 'water') %>% # Limit query to waterbodies
osmdata::osmdata_sf() # Convert to simple features
ggplot(data=osm_water_sf$osm_polygons) +
geom_sf(color="blue", fill="lightblue") +
xlim(x_coords) + ylim(y_coords)
But I can't manage to merge the two plots into one, superimposing the bathymetric data on the base map. My eventual goal is to obtain contours from the bathymetric data of the lake, but I'm already stuck at merging the two plots.
Thanks for the suggestion, I have managed to have a spatial data frame like this:
"","waterDepthM","geometry"
"1",0,c(-3.11133523036904, 39.8197123940846)
"2",0,c(-3.11133523036904, 39.8197193169248)
"3",0,c(-3.11134424374202, 39.8197193169248)
"4",0,c(-3.11134424374202, 39.8197262397644)
"5",0,c(-3.11135325711499, 39.8197331626032)
"6",0,c(-3.11135325711499, 39.8197400854413)
"7",0,c(-3.11135325711499, 39.8197470082787)
"8",0,c(-3.11135325711499, 39.8197539311154)
"9",0,c(-3.11135325711499, 39.8197608539514)
"10",0,c(-3.11134424374202, 39.8197608539514)
but when I try to plot both data frames together, I only get a base map of the lake with the depth legend, but no depth values appear.
ggplot() +
geom_sf(data=osm_water_sf$osm_polygons, color="blue", fill="NA") +
geom_sf(data=Sonar_13_07_1, aes(col=waterDepthM)) +
xlim(x_coords) + ylim(y_coords)
Use the sf package to convert your sonar data frame to a spatial data frame. Something like:
library(sf)
Sonar_13_07_1 = st_as_sf(Sonar_13_07_1, coords=c("longitude","latitude"), crs=4326)
Then you have an object you can add to a ggplot via geom_sf, alongside your OSM vector data. Something like:
ggplot(data=osm_water_sf$osm_polygons) +
geom_sf(color="blue", fill="lightblue") +
geom_sf(data=Sonar_13_07_1, aes(col=waterDepthM)) +
xlim(x_coords) + ylim(y_coords)
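If the points still do not show up, one thing worth ruling out (a guess, not a diagnosis of your plot) is that the sonar coordinates fall outside the axis limits, since data outside scale limits are normally dropped. A minimal base R check, using the first rows from the question; the helper name inside is made up for this sketch:

```r
# First three sonar rows, copied from the question
sonar <- data.frame(
  latitude    = c(39.8197123940846, 39.8197193169248, 39.8197193169248),
  longitude   = c(-3.11133523036904, -3.11133523036904, -3.11134424374202),
  waterDepthM = c(0, 0, 0)
)

# Axis limits as defined in the question
x_coords <- c(-3.109, -3.117)
y_coords <- c(39.817, 39.832)

# Hypothetical helper: TRUE where a value lies within the limits,
# regardless of the order in which the two limits are given
inside <- function(v, lim) v >= min(lim) & v <= max(lim)

in_box <- inside(sonar$longitude, x_coords) & inside(sonar$latitude, y_coords)
cat(sum(in_box), "of", nrow(sonar), "points fall inside the plot limits\n")
```

For these sample rows every point is inside the limits, so on this evidence the limits themselves would not explain the missing points.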

I need to know why I get the error 'unexpected input in "p<-ggplot(data=mov2, aes(x=Genre,y=Gross % US))" '

I want to set up the plot's data and aes layers, but this code doesn't run:
p <- ggplot(data = mov2, aes(x = Genre, y = Gross % US))
When the aes layer is removed, it works:
p <- ggplot(data = mov2)
p <- ggplot(data = mov2, aes(x = Genre, y = Gross % US)) # this line raises the error
v <- ggplot(data = movies, aes(x = Genre, y = CriticRating)) # this line works
Error: unexpected input in "p<-ggplot(data=mov2, aes(x=Genre,y=Gross % US))"
Most R code will get confused by column names that contain spaces or unusual symbols like %. You need to surround those names with backticks so R knows they are supposed to be column names. Try
p <- ggplot(data=mov2, aes(x=Genre,y=`Gross % US`))
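The same rule applies outside ggplot2. Here is a small base R sketch, assuming a made-up stand-in for mov2 (the values are invented purely for illustration), that shows how such a name is kept, accessed, and what R would turn it into if it were allowed to "fix" it:

```r
# Hypothetical stand-in for mov2; the numbers are invented for illustration
mov2 <- data.frame(
  Genre = c("action", "comedy"),
  `Gross % US` = c(40.5, 55.2),
  check.names = FALSE  # keep the column name exactly as written
)

# The name contains spaces and %, so it is not a valid bare symbol
print(names(mov2))

# Backticks let you refer to it anyway
print(mov2$`Gross % US`)

# What R would rename it to with check.names = TRUE
print(make.names("Gross % US"))
```

If the backticks bother you, renaming the column once (for example to the make.names version) avoids them in all later code.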

How to use a custom-defined function to change a text label in geom_text()

I have some data, and I want to use some variables from stat_count() to label a bar plot.
This is what I want to do:
library(ggplot2)
library(scales)
percent_and_count <- function(pct, cnt){
paste0(percent(pct), ' (', cnt, ')')
}
ggplot(df, aes(x=Type)) +
stat_count(aes(y=(..prop..))) +
geom_text(aes(y=(..prop..), label=percent_and_count(..prop.., ..count..)),
stat='count')
However, I get this error, since it can't find the function in what I assume is either some base packages or the data frame:
Error in eval(expr, envir, enclos) : could not find function "percent_and_count"
I get this error if I do percent(..prop..) as well, although it is fine with scales::percent(..prop..). I did not load my function from a package.
If all else fails, I can do
geom_text(aes(y=(..prop..), label=utils::getAnywhere('percent_and_count')$objs[[1]]((..prop..),(..count..))))
But this seems needlessly roundabout for what should be a stupidly simple task.
You can use bquote and aes_:
# Sample data
set.seed(2017);
df <- data.frame(
Type = sample(6, 100, replace = T)
);
library(ggplot2);
library(scales);
# Your custom function
percent_and_count <- function(pct, cnt){
paste0(percent(pct), ' (', cnt, ')')
}
ggplot(df, aes(x = Type)) +
stat_count(aes(y = ..prop..)) +
geom_text(
stat = "count",
aes_(
y = ~(..prop..),
label = bquote(.(percent_and_count)((..prop..), (..count..)))))
Explanation: bquote(.(percent_and_count)(...)) ensures that percent_and_count is found (as terms .(...) are evaluated in the parent environment). We then use aes_ to ensure that quoted expressions (either with ~ or bquote) are properly evaluated.
Still not pretty, but probably more straightforward than using utils::getAnywhere.
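A different route, which sidesteps the lookup problem rather than solving it, is to compute the labels before plotting. This is a minimal base R sketch of that alternative; the sprintf-based percent_and_count is a stand-in for the scales::percent version, and labels_df is a name made up here:

```r
# Same sample data as above
set.seed(2017)
df <- data.frame(Type = sample(6, 100, replace = TRUE))

# Stand-in for the custom function; sprintf replaces scales::percent
percent_and_count <- function(pct, cnt) {
  sprintf("%.0f%% (%d)", 100 * pct, cnt)
}

# Count and proportion per Type, computed outside ggplot
counts <- table(df$Type)
labels_df <- data.frame(
  Type  = as.integer(names(counts)),
  count = as.integer(counts),
  prop  = as.numeric(prop.table(counts))
)
labels_df$label <- percent_and_count(labels_df$prop, labels_df$count)

print(labels_df)
```

With the labels in an ordinary column, the custom function is called in your own environment, and the plot only needs aes(label = label) on a layer that uses labels_df as its data.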

Multiple lines multiple error bars using ggplot2 in R

I have three CSV files which are read into R as data frames. I want to create a line plot of the "means" column, using the "sd" column for error bars above and below each point.
This code gives me multiple lines on a plot, but with only one set of error bars:
ggplot(data=edge_c_summary,aes(x = times,y=means))+
geom_errorbar(aes(ymin=means-sd,ymax=means+sd))+
geom_line(aes(y=means))+
geom_line(data = ridge_c_summary,aes(x=times,y=means))+
geom_errorbar(aes(ymin=means-sd,ymax=means+sd))+
geom_line(data = valley_c_summary,aes(x=times,y=means))+
geom_errorbar(aes(ymin=means-sd,ymax=means+sd))
How can I change this code to make each line have the appropriate error bar for each point?
edge_c_summary
"","times","means","sd"
"1",1,23.6566108007449,0.97897699678658
"12",2,22.7815144766147,1.15800405896118
"19",3,23.3195763580458,1.10152573531062
"20",4,22.3962138084633,1.25626506966065
"21",5,23.0657328322515,1.17624485082946
"22",6,22.1194877505568,1.32888708114411
"23",7,22.9947511929107,1.25304663407105
"24",8,23.121714922049,1.53918225223541
"25",9,25.9304732720463,2.01279986529601
"2",10,27.2791342952275,2.63979959777048
"3",11,28.7510747185261,2.66804271260005
"4",12,29.4782463928968,3.00223132377325
"5",13,29.7261003070624,2.90440605187483
"6",14,30.3099889012209,3.15106156713522
"7",15,29.4545951486163,2.87696770282654
"8",16,29.1991111111111,2.73260690130748
"9",17,27.6885928961749,2.28949704545011
"10",18,26.8358888888889,1.99002819664902
"11",19,25.4207579378628,1.30543445825041
"13",20,24.6197777777778,1.28917282788259
"14",21,24.4374658469945,1.0001400647698
"15",22,23.7050055617353,1.12314557626891
"16",23,23.9770833333333,0.974658804573153
"17",24,23.2177975528365,1.12526920271045
"18",25,23.5250320924262,1.12891528015421
ridge_c_summary
"","times","means","sd"
"1",1,23.681434407626,0.989915240381175
"2",10,26.7027079303675,2.32962251222789
"3",11,27.9654291654292,2.38864888176336
"4",12,28.7457528957529,2.69414439432221
"5",13,28.9534165181224,2.68690267338402
"6",14,29.4438223938224,2.91979342111894
"7",15,28.8215325215325,2.6872152195944
"8",16,28.5877813504823,2.57493709806332
"9",17,27.3870056497175,2.19608259108006
"10",18,26.8308927424534,2.03789359897681
"11",19,25.5481404343945,1.41979111451077
"12",2,23.1454838709677,1.13422699496685
"13",20,24.9886246786632,1.36068090029202
"14",21,24.5601606664683,1.05832239119392
"15",22,24.1409646302251,1.16360525517371
"16",23,24.0566369047619,1.00175077418615
"17",24,23.6077813504823,1.11726702939239
"18",25,23.5780952380952,1.10355334756497
"19",3,23.3004172876304,1.10354221988403
"20",4,22.7314193548387,1.23686119466203
"21",5,23.0191654247392,1.18428611015011
"22",6,22.451935483871,1.29021975136401
"23",7,22.9037125037125,1.26259590667806
"24",8,23.1967741935484,1.48879695691969
"25",9,25.306534006534,1.76717581300979
valley_c_summary
"","times","means","sd"
"1",1,23.6594671201814,1.00814940817697
"2",10,26.0565511411665,2.16929556678063
"3",11,27.7657114295235,2.35397972988285
"4",12,28.3993260320135,2.71926477093656
"5",13,28.8432522492503,2.59319788793986
"6",14,29.1439865433137,2.86403883310426
"7",15,28.7382333333333,2.61080581070595
"8",16,28.488161209068,2.54623846359401
"9",17,27.2384794931644,2.06859192137737
"10",18,26.7695542472666,1.97980925001807
"11",19,25.4289052069426,1.36213237635363
"12",2,23.234375,1.2419107444281
"13",20,25.0288607594937,1.58285604050205
"14",21,24.5043071786311,1.02557712012499
"15",22,24.1491983122363,1.22981051413331
"16",23,24.0402003338898,0.981743823579669
"17",24,23.6662173546757,1.19576801398666
"18",25,23.700081300813,1.0898936548588
"19",3,23.3752591106653,1.08538931168628
"20",4,22.8620981387479,1.32723123739125
"21",5,23.1140421263791,1.16174678633048
"22",6,22.5889264581572,1.39010429942654
"23",7,22.9904,1.22621465254853
"24",8,23.0340371621622,1.48447539690888
"25",9,25.0078692897633,1.60606487763767
The easiest solution is to add an extra column to each data frame for grouping. For example, using dplyr::mutate and dplyr::bind_rows:
library(dplyr)
library(ggplot2)
edge_c_summary %>%
mutate(source = "edge_c") %>%
bind_rows(mutate(ridge_c_summary, source = "ridge_c")) %>%
bind_rows(mutate(valley_c_summary, source = "valley_c")) %>%
ggplot(aes(times, means)) +
geom_line(aes(color = source, group = source)) +
geom_errorbar(aes(ymin = means - sd, ymax = means + sd, color = source))
edge_c_summary <- read.csv(file="edge_c_summary.csv",header=TRUE,sep=",")
ridge_c_summary <- read.csv(file="ridge_c_summary.csv",header=TRUE,sep=",")
valley_c_summary <- read.csv(file="valley_c_summary.csv",header=TRUE,sep=",")
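Each geom_errorbar needs its own data argument; otherwise it inherits edge_c_summary from the ggplot() call and draws the same error bars again. I also added different colours so the lines are distinguishable; you can drop them if you don't like them.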
ggplot(data=edge_c_summary,aes(x = times,y=means))+
geom_errorbar(data=edge_c_summary,aes(ymin=means-sd,ymax=means+sd))+
geom_line(aes(y=means))+
geom_line(data = ridge_c_summary,aes(x=times,y=means),colour="red")+
geom_errorbar(data=ridge_c_summary,aes(ymin=means-sd,ymax=means+sd),colour="red")+
geom_line(data = valley_c_summary,aes(x=times,y=means),colour="blue")+
geom_errorbar(data=valley_c_summary,aes(ymin=means-sd,ymax=means+sd),colour="blue")
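For reference, the grouping-column idea from the first answer does not depend on dplyr. Below is a minimal base R sketch using only the first two time points of each table from the question; the column names source, ymin and ymax are choices made for this sketch.

```r
# First two time points of each summary, copied from the question
edge_c_summary <- data.frame(
  times = c(1, 2),
  means = c(23.6566108007449, 22.7815144766147),
  sd    = c(0.97897699678658, 1.15800405896118)
)
ridge_c_summary <- data.frame(
  times = c(1, 2),
  means = c(23.681434407626, 23.1454838709677),
  sd    = c(0.989915240381175, 1.13422699496685)
)
valley_c_summary <- data.frame(
  times = c(1, 2),
  means = c(23.6594671201814, 23.234375),
  sd    = c(1.00814940817697, 1.2419107444281)
)

# Stack the three tables and record where each row came from
combined <- rbind(
  cbind(edge_c_summary,   source = "edge_c"),
  cbind(ridge_c_summary,  source = "ridge_c"),
  cbind(valley_c_summary, source = "valley_c")
)

# Error bar limits, one pair per row
combined$ymin <- combined$means - combined$sd
combined$ymax <- combined$means + combined$sd

print(combined)
```

Because every row carries its own source, ymin and ymax, a single geom_line and a single geom_errorbar mapped to source can draw all three series.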

GTrendsR + ggplot2?

I want to generate a plot of interest over time using GTrendsR and ggplot2
The plot I want (generated with google trends) is this:
Any help will be much appreciated.
Thanks!
This is the best I was able to get:
library(ggplot2)
library(devtools)
library(GTrendsR)
usr = "my.email"
psw = "my.password"
ch = gConnect(usr, psw)
location = "all"
query = "MOOCs"
MOOCs_trends = gTrends(ch, geo = location, query = query)
MOOCs<-MOOCs_trends[[1]]
MOOCs$moocs<-as.numeric(as.character(MOOCs$moocs))
MOOCs$Week <- as.character(MOOCs$Week)
MOOCs$start <- as.Date(MOOCs$Week)
ggplot(MOOCs[MOOCs$moocs!=0,], aes(start, moocs)) +
geom_line(colour = "blue") +
ylab("Trends") + xlab("") + theme_bw()
I think that to match the graph generated by google I would need to aggregate the data to months instead of weeks... not sure how to do that yet
The object returned by gtrendsR is a list, of which the trend element is a data.frame that you would want to plot.
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
MOOCsDF <- MOOCs_trends$trend
ggplot(data = MOOCsDF) + geom_line(aes(x=start, y=moocs))
This gives:
Now if you want to aggregate by month, I would suggest using the floor_date function from the lubridate package, in combination with dplyr (note that I am using the chain operator %>% which dplyr re-exports from the magrittr package).
library(dplyr)
library(lubridate)
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
MOOCsDF <- MOOCs_trends$trend
# Round each week's start date down to the first day of its month
MOOCsDF$start <- floor_date(MOOCsDF$start, unit = 'month')
MOOCsDF %>%
group_by(start) %>%
summarise(moocs = sum(moocs)) %>%
ggplot() + geom_line(aes(x=start, y=moocs))
This gives:
Note 1: The query MOOCs was changed to moocs, by gtrendsR, this is reflected in the y variable that you're plotting.
Note 2: the capitalization of some names has changed (e.g. gtrendsR, not GTrendsR); I am using the current versions.
This will get you most of the way there. The plot doesn't look quite right, but that's more a consequence of the data being a bit different. Here are the necessary conversions to numeric and to dates.
MOOCs<-MOOCs_trends[[1]]
library(ggplot2)
library(plyr)
## Convert to string, then to numeric (as.numeric on a factor would return level codes)
MOOCs$Week <- as.character(MOOCs$Week)
MOOCs$moocs <- as.numeric(as.character(MOOCs$moocs))
# split the string
MOOCs$start <- unlist(llply(strsplit(MOOCs$Week," - "), function(x) return(x[2])))
MOOCs$start <- as.POSIXlt(MOOCs$start)
ggplot(MOOCs,aes(x=start,y=moocs))+geom_point()+geom_path()
Google might do some smoothing, but this will plot the data you have.
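If you want the monthly aggregation without lubridate or dplyr, it can be sketched in base R as well. The weekly values below are invented purely for illustration (they are not real Google Trends numbers), and weekly, month and monthly are names made up for this sketch:

```r
# Hypothetical weekly series: eight consecutive weeks with made-up values
weekly <- data.frame(
  start = seq(as.Date("2013-01-06"), by = "week", length.out = 8),
  moocs = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Label each week with the year and month of its start date
weekly$month <- format(weekly$start, "%Y-%m")

# Sum the weekly values within each month
monthly <- aggregate(moocs ~ month, data = weekly, FUN = sum)

print(monthly)
```

Summing follows the answer above; since the trend values are a relative index, using FUN = mean instead may give a curve on a scale closer to the weekly one.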
