The lattice equivalent of geom_tile when displaying text - r

I am interested in knowing if there is a lattice alternative to geom_tile() in ggplot2 when I want to display factor levels/map fill to text. Example data frame (df) follows...
Gene Sample Mutation
A1 2 Missense
A2 2 WT
A1 3 Missense
A2 3 Missense
With ggplot2 this is trivial:
qplot(x = Sample, y = Gene, fill = Mutation, data = df, geom = "tile")
What would the lattice equivalent be? (I am interested because aligning axes between plots in ggplot2 is currently convoluted and cumbersome.)
df <- structure(list(Gene = structure(c(1L, 2L, 1L, 2L), .Label = c("A1", "A2"), class = "factor"),
Sample = structure(c(1L, 1L, 2L, 2L ), .Label = c("2", "3"), class = "factor"),
Mutation = structure(c(1L, 2L, 1L, 1L), .Label = c("Missense", "WT"), class = "factor")), .Names = c("Gene", "Sample", "Mutation"), row.names = c(NA, -4L), class = "data.frame")

Check out the levelplot() function in lattice, for example:
library("lattice")
df <- transform(df, Sample = factor(Sample))
levelplot(Mutation ~ Gene * Sample, data = df)
You'll need to work out the colour scale key yourself though.
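One way to work out the colour key, as a minimal sketch rather than a definitive recipe: map the factor to integer codes, put the break points halfway between the codes, and label the key with the factor levels. The column name MutationCode and the two colours are assumptions made for illustration; the data frame is rebuilt from the question's example.

```r
library(lattice)

# Example data from the question
df <- data.frame(
  Gene     = factor(c("A1", "A2", "A1", "A2")),
  Sample   = factor(c("2", "2", "3", "3")),
  Mutation = factor(c("Missense", "WT", "Missense", "Missense"))
)

lev <- levels(df$Mutation)                # "Missense", "WT"
n   <- length(lev)
df$MutationCode <- as.integer(df$Mutation) # hypothetical helper column: 1..n

# Breaks sit halfway between the integer codes, so each level gets one colour
breaks <- seq(0.5, n + 0.5, by = 1)

p <- levelplot(
  MutationCode ~ Gene * Sample, data = df,
  at = breaks,
  col.regions = c("firebrick", "grey80"),  # one colour per level (arbitrary choice)
  colorkey = list(
    at = breaks,
    labels = list(at = seq_len(n), labels = lev)  # show level names, not codes
  )
)
print(p)
```

With more levels, col.regions needs one colour per level in the order given by levels(df$Mutation).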

Related

trying to summarize survey data for questions with 'select all that apply' using R

We have a survey with 'select all that apply' questions, so each response is a quoted string with the values separated by commas, e.g. "red, black,green".
There are other questions about income, so I also have a factor with the levels 'low, medium, high'.
I want to be able to answer questions such as: what percentage selected 'red', and how does that break down by income?
I can split the string with
df4 <- c("black,silver,green")
I can create a data frame with a timestamp and the split string with
t2 <- as.data.frame(c(df2[2],l2))
I am not able to understand how to do this for all rows at one time.
Here is the dput() of the input:
structure(list(RespData = structure(1:2, .Label = c("1/20/2020",
"1/21/2020"), class = "factor"), CarColor = c("red,blue,green,yellow",
"black,silver,green")), row.names = c(NA, -2L), class = "data.frame")
and here is the dput() of the desired output:
structure(list(RespData = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L), .Label = c("1/20/2020", "1/21/2020"), class = "factor"),
Cars = structure(c(3L, 1L, 2L, 4L, 5L, 6L, 2L), .Label = c("blue",
"green", "red", "yellow", "black", "silver"), class = "factor")), row.names = c(NA,
-7L), class = "data.frame")
Example of the function:
MySplitFunc <- function(ListIn) {
  # build an empty data frame and set the column names
  x1.all <- ListIn[0,]
  names(x1.all) <- c("ResponseTime", "Descriptive")
  # for each row build the data and combine to growing list
  for(x in 1:nrow(ListIn)) {
    #print(x)
    r1 <- ListIn[x,1]
    c1 <- strsplit(ListIn[x,2],",")
    x1 <- as.data.frame(c(r1,c1))
    # set the names and combine to all
    names(x1) <- c("ResponseTime", "Descriptive")
    x1.all <- rbind(x1.all,x1)
  }
  # strip the whitespace
  x1.all <- data.frame(lapply(x1.all, trimws), stringsAsFactors = TRUE)
  return(x1.all)
}
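For comparison, the whole column can probably be split in one pass without a loop. This is only a sketch in base R, assuming the input has the shape of the dput above; the names resp, parts, long and share are made up for illustration. The factor levels of Cars come out in alphabetical order, which differs from the level order in the desired output, although the values are the same.

```r
# Input as in the question's dput
resp <- data.frame(
  RespData = factor(c("1/20/2020", "1/21/2020")),
  CarColor = c("red,blue,green,yellow", "black,silver,green"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorised: one character vector of colours per respondent
parts <- strsplit(resp$CarColor, ",", fixed = TRUE)

# Repeat each respondent's key once per selected colour, then flatten
long <- data.frame(
  RespData = rep(resp$RespData, lengths(parts)),
  Cars     = trimws(unlist(parts)),   # trimws() handles "red, black" style input
  stringsAsFactors = TRUE
)

# Share of respondents who selected each colour
# (assumes nobody lists the same colour twice)
share <- table(long$Cars) / nrow(resp)
```

If an income column were present in resp, it could be repeated with rep() in the same way and then used to tabulate the colours by income group.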

Prevent repeated facets in ggplot2 when using facet_wrap

I have repeated facet headers that I need to combine into a single header with the following sample data:
structure(list(Level1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label =
"Repeated", class = "factor"), Level2 = structure(c(1L, 1L, 2L, 2L, 2L),
.Label = c("A", "B"), class = "factor"), Level3 = structure(1:5, .Label
= c("A", "B", "C", "D", "E"), class = "factor"), Value = c(6L, 2L, 3L,
4L, 0L)), .Names = c("Level1", "Level2", "Level3", "Value"), class =
"data.frame", row.names = c(NA, -5L))
I created a barplot with ggplot2 using facet_wrap() and set the number of columns to 1:
ggplot(data=samp, aes(x=reorder(Level3, Value), y=Value)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Value") +
  xlab("Level3") +
  facet_wrap(~Level1 + Level2, scales = 'free_y', ncol = 1)
I need to show the nesting for the Level3, Level2, and Level1 factors but without repeating any facet headers. Here the Level1 factor facet header is repeated.
I attempted to follow instructions from the post below, but could not figure out how to customize to my data: Nested facets in ggplot2 spanning groups
Any assistance is appreciated.

R Wide to long format for multiple variables with patterns [duplicate]

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I have a data set with a single identifier and five columns that repeat 18 times. I want to restructure the data into long format keeping the first five column headings as the column headings. Below is a sample with just two repeats:
structure(list(Response.ID = 1:2, Task = structure(c(1L, 1L), .Label = "task1", class = "factor"),
Freq = structure(c(1L, 1L), .Label = "Daily", class = "factor"),
Hours = c(3L, 2L), Value = c(10L, 8L), Mood = structure(1:2, .Label = c("Engaged",
"Neutral"), class = "factor"), Task.1 = structure(c(1L, 1L
), .Label = "task2", class = "factor"), Freq.1 = structure(c(1L,
1L), .Label = "Weekly", class = "factor"), Hours.1 = c(4L,
4L), Value.1 = c(10L, 6L), Mood.1 = structure(c(2L, 1L), .Label = c("Neutral",
"Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood", "Task.1", "Freq.1", "Hours.1", "Value.1", "Mood.1"), class = "data.frame", row.names = c(NA, -2L))
I tried the melt and patterns functions, which come close to my desired outcome but without the desired column headings:
df = melt(df1, id.vars = c("Response.ID"), measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"))
Here is the result:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), variable = structure(c(1L, 1L, 2L, 2L), class = "factor", .Label = c("1", "2")), value1 = c("task1", "task1", "task2", "task2"), value2 = c("Daily", "Daily", "Weekly", "Weekly"), value3 = c(3L, 2L, 4L, 4L), value4 = c("Engaged", "Neutral", "Optimistic", "Neutral")), .Names = c("Response.ID", "variable", "value1", "value2", "value3", "value4"), row.names = c(NA, -4L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000330788>)
When I tried to specify the names with the value.name argument, as below, I received an error:
df = melt(df1, id.vars = c("Response.ID"),measure.vars = patterns("^Task", "^Freq","^Hours","^Mood"), value.name=c("Task", "Freq", "Hours", "Value","Mood"))
My desired result would look like this:
structure(list(Response.ID = c(1L, 2L, 1L, 2L), Task = structure(c(1L, 1L, 2L, 2L), .Label = c("task1", "task2"), class = "factor"),
Freq = structure(c(1L, 1L, 2L, 2L), .Label = c("Daily", "Weekly"
), class = "factor"), Hours = c(3L, 2L, 4L, 4L), Value = c(10L,
8L, 10L, 6L), Mood = structure(c(1L, 2L, 3L, 2L), .Label = c("Engaged",
"Neutral", "Optimistic"), class = "factor")), .Names = c("Response.ID", "Task", "Freq", "Hours", "Value", "Mood"), class = "data.frame", row.names = c(NA, -4L))
It looks to me like you embarked on a difficult journey by using melt: the function is well named, in the sense that trying to use it will probably melt your brain. Joking aside, melt does a lot of work under the hood, and it could be inefficient on a large dataset.
I would instead solve the problem by hand with rbindlist (from the excellent data.table package, which also ships an optimized version of melt if you really want to use it), concatenating the groups of columns yourself. This also preserves the column names:
> rbindlist(lapply(1:2, function(i) df1[,c(1,((i-1)*5+2):((i-1)*5+6))]))
   Response.ID  Task   Freq Hours Value       Mood
1:           1 task1  Daily     3    10    Engaged
2:           2 task1  Daily     2     8    Neutral
3:           1 task2 Weekly     4    10 Optimistic
4:           2 task2 Weekly     4     6    Neutral
This works on your example. For the real dataset, replace 1:2 with one index per repetition (so, lapply(1:18, ...)).
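The same idea can be sketched in base R, without data.table, by computing the number of blocks from the column count instead of hard-coding it. This is an illustration under the assumption that the first column is the identifier and every block has exactly five columns in the same order; block_size, n_blocks and base_names are names invented for the sketch, and the data is rebuilt from the question with character columns instead of factors.

```r
# Two-block sample from the question
df1 <- data.frame(
  Response.ID = 1:2,
  Task = "task1", Freq = "Daily", Hours = c(3L, 2L), Value = c(10L, 8L),
  Mood = c("Engaged", "Neutral"),
  Task.1 = "task2", Freq.1 = "Weekly", Hours.1 = c(4L, 4L), Value.1 = c(10L, 6L),
  Mood.1 = c("Optimistic", "Neutral"),
  stringsAsFactors = FALSE
)

block_size <- 5                                   # columns per repeated group
n_blocks   <- (ncol(df1) - 1) / block_size        # 2 here, 18 in the real data
base_names <- names(df1)[1:(block_size + 1)]      # ID plus the first block's names

long <- do.call(rbind, lapply(seq_len(n_blocks), function(i) {
  # ID column plus the i-th block of measurement columns
  cols <- c(1, (i - 1) * block_size + 2:(block_size + 1))
  setNames(df1[, cols], base_names)               # give every block the same names
}))
rownames(long) <- NULL
```

Renaming each block before binding is what keeps the original headings; rbind() on data frames matches columns by name.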

Collapse and aggregate several row values by date

I've got a data set that looks like this:
date, location, value, tally, score
2016-06-30T09:30Z, home, foo, 1,
2016-06-30T12:30Z, work, foo, 2,
2016-06-30T19:30Z, home, bar, , 5
I need to aggregate these rows together, to obtain a result such as:
date, location, value, tally, score
2016-06-30, [home, work], [foo, bar], 3, 5
There are several challenges for me:
The resulting row (a daily aggregate) must combine all the rows for that day (2016-06-30 in my example above).
Some columns (strings) should become an array containing all the values present on that day.
Others (ints) should become a sum.
I've had a look at dplyr, and if possible I'd like to do this in R.
Thanks for your help!
Edit:
Here's a dput of the data
structure(list(date = structure(1:3, .Label = c("2016-06-30T09:30Z",
"2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"),
location = structure(c(1L, 2L, 1L), .Label = c("home", "work"
), class = "factor"), value = structure(c(2L, 2L, 1L), .Label = c("bar",
"foo"), class = "factor"), tally = c(1L, 2L, NA), score = c(NA,
NA, 5L)), .Names = c("date", "location", "value", "tally",
"score"), class = "data.frame", row.names = c(NA, -3L))
mydat<-structure(list(date = structure(1:3, .Label = c("2016-06-30T09:30Z",
"2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"),
location = structure(c(1L, 2L, 1L), .Label = c("home", "work"
), class = "factor"), value = structure(c(2L, 2L, 1L), .Label = c("bar",
"foo"), class = "factor"), tally = c(1L, 2L, NA), score = c(NA,
NA, 5L)), .Names = c("date", "location", "value", "tally",
"score"), class = "data.frame", row.names = c(NA, -3L))
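mydat$date <- as.Date(mydat$date)  # keeps only the day part of the timestamp
library(data.table)
mydat.dt <- data.table(mydat)
# paste the string columns together per day (only location and value, not the numeric columns)
mydat.dt <- mydat.dt[, lapply(.SD, paste0, collapse=" "), by = date, .SDcols = c("location", "value")]
# sum the numeric columns per day and bind them on
cbind(mydat.dt, aggregate(mydat[,c("tally", "score")], by=list(mydat$date), FUN = sum, na.rm=TRUE)[2:3])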
which gives you:
         date       location       value tally score
1: 2016-06-30 home work home foo foo bar     3     5
Note that you could probably do it all in one step inside the data.table call, but I found two steps a quicker and easier way to get the same result.
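If only the distinct values per day are wanted (home, work rather than home work home), a base R variant might look like the sketch below. It assumes the string columns should be collapsed into one comma-separated string rather than a true list column; day, collapse_unique, strings, sums and result are names invented for the illustration, and the data is rebuilt with character columns instead of factors.

```r
mydat <- data.frame(
  date     = c("2016-06-30T09:30Z", "2016-06-30T12:30Z", "2016-06-30T19:30Z"),
  location = c("home", "work", "home"),
  value    = c("foo", "foo", "bar"),
  tally    = c(1L, 2L, NA),
  score    = c(NA, NA, 5L),
  stringsAsFactors = FALSE
)

# Keep only the calendar day of each timestamp
mydat$day <- as.Date(substr(mydat$date, 1, 10))

# Distinct values, in order of first appearance, joined into one string
collapse_unique <- function(x) paste(unique(x), collapse = ", ")

strings <- aggregate(mydat[c("location", "value")],
                     by = list(day = mydat$day), FUN = collapse_unique)
sums    <- aggregate(mydat[c("tally", "score")],
                     by = list(day = mydat$day), FUN = sum, na.rm = TRUE)

result <- merge(strings, sums, by = "day")
```

The two aggregate() calls differ only in the summary function, which keeps the rule "strings are collected, integers are summed" explicit.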

Ggmap-geompoint, how to make grouping?

Suppose I have this dataframe
latitude longitude category
42.39905 -72.93871 A
42.39905 -73.93871 B
43.37471 -73.36336 A
43.37471 -74.36336 B
44.28322 -74.31423 B
What I would like to do is group the coordinates by their rounded (integer) values. Then, for each group, I could draw a bubble whose size depends on the number of points in the group.
The colour should diverge from A to B, depending on how many more A than B there are in the group. So far, I've been doing this:
map = get_map(location="jk",zoom=6,source="stamen")
#Plot the point
ggmap(map)+
geom_point(data=zipmap,
aes(x=round(longitude),y=round(latitude),colour=category))+
scale_color_brewer(type='div')
But as you would expect, the colour is not diverging, and the bubble size is not implemented. How could I achieve this? I can't use scale_x_continuous, as it is already used somewhere in ggmap.
Here is one direction to try.
dput(df)
structure(list(latitude = c(42.39905, 42.39905, 43.37471, 43.37471,
44.28322), longitude = c(-73, -74, -73, -74, -74), category = structure(c(1L,
2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), latround = structure(c(1L,
1L, 2L, 2L, 3L), .Label = c("42", "43", "44"), class = "factor"),
longround = structure(c(2L, 1L, 2L, 1L, 1L), .Label = c("-74",
"-73"), class = "factor")), .Names = c("latitude", "longitude",
"category", "latround", "longround"), row.names = c(NA, -5L), class = "data.frame")
df$latround <- as.factor(round(df$latitude)) # round the coords
df$longround <- as.factor(round(df$longitude))
library(dplyr) # group by rounded coordinates and count the categories
df2 <- df %>% group_by(latround) %>% summarise(catnumber = n())
latround catnumber
1 42 2
2 43 2
3 44 1
library(ggmap)
You don't say what the location jk refers to, so the following only outlines an approach to plotting.
map <- get_map(location="jk",zoom=6,source="stamen")
# Plot the points; df2 must also carry longround (e.g. group_by(latround, longround)) for this to run
ggmap(map)+
  geom_point(data=df2, aes(x=longround, y=latround, size=catnumber, colour=catnumber))+
  scale_color_brewer(type='div') # more is needed in the ggmap code
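To fill in the grouping step, here is a rough base R sketch that counts the points in each rounded latitude/longitude cell and computes an A-versus-B balance per cell. The column names lat_cell, lon_cell, n, n_a, n_b and balance are invented for the illustration, and the balance formula (n_a - n_b) / n is just one possible way to express "how many more A than B".

```r
zipmap <- data.frame(
  latitude  = c(42.39905, 42.39905, 43.37471, 43.37471, 44.28322),
  longitude = c(-72.93871, -73.93871, -73.36336, -74.36336, -74.31423),
  category  = c("A", "B", "A", "B", "B")
)

# Rounded coordinates define the cell each point falls into (kept numeric for plotting)
zipmap$lat_cell <- round(zipmap$latitude)
zipmap$lon_cell <- round(zipmap$longitude)

# Indicator columns so that a plain sum gives the counts per cell
zipmap$n   <- 1
zipmap$n_a <- as.integer(zipmap$category == "A")
zipmap$n_b <- as.integer(zipmap$category == "B")

cells <- aggregate(cbind(n, n_a, n_b) ~ lat_cell + lon_cell,
                   data = zipmap, FUN = sum)

# +1 means only A in the cell, -1 means only B, 0 means an even split
cells$balance <- (cells$n_a - cells$n_b) / cells$n
```

The resulting cells data frame could then be passed to geom_point with aes(x = lon_cell, y = lat_cell, size = n, colour = balance). Because balance is continuous, a diverging continuous scale such as scale_colour_gradient2() is likely a better fit than scale_color_brewer(), which expects discrete values.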
