Casting dataframe gives error R - r

Here is the dataframe df on which I'm trying to do a pivot using cast function
dput(df)
structure(list(Val = c(1L, 2L, 2L, 5L, 2L, 5L), `Perm 1` = structure(c(1L,
2L, 3L, 3L, 3L, 3L), .Label = c("Blue", "green", "yellow"
), class = "factor"), `Perm 2` = structure(c(1L, 2L, 2L, 3L,
3L, 3L), .Label = c("Blue", "green", "yellow"), class = "factor"),
`Perm 3` = structure(c(1L, 2L, 2L, 2L, 3L, 3L), .Label = c("Blue",
"green", "yellow"), class = "factor")), .Names = c("Val",
"Perm 1", "Perm 2", "Perm 3"), row.names = c(NA, 6L), class = "data.frame")
And expecting the data after pivot
Blue 1 1 1
green 2 4 9
yellow 14 12 7
I tried doing
cast(df, df$Val ~ df$`Perm 1`+df$`Perm 2`+df$`Perm 3`, sum, value = 'Val')
But this gives error
Error: Casting formula contains variables not found in molten data: df$Val, df$`Perm1`, df$`Perm2`
How can I be able to do pivot so that I'll be able to get the desired O/P
P.S- The dataframe DF has around 36 column but for simplicity I took only 3 columns.
Any suggestion will be appreciated.
Thank you
Domnick

It appears you want to sum, grouped by each permutation in your dataset. Although hacky, I think this works for your problem. First we create a function to perform that summation using tidyeval syntax. Link for more information: Group by multiple columns in dplyr, using string vector input
sum_f <- function(col, df) {
library(tidyverse)
df <- df %>%
group_by_at(col) %>%
summarise(Val = sum(Val)) %>%
ungroup()
df[,2]
}
We then apply it to your dataset using lapply, and binding together the summations.
bind_cols(lapply(c('Perm1', 'Perm2', 'Perm3'), sum_f, df))
This gets us the above answer.
Caveats: Need to know the name of the columns you have to sum over for this to work. Also, each column needs to have the same levels of your permutations i.e. blue, green, yellow. The code will respect this ordering.

Related

Check which are the features that differentiate between clusters, using a boxplot

I applied the UMAP dimentionaity reduction over my data, and clustred it. I got three different clusters:
I have the data that specefices to which cluster does eahc sample belong, with the name of the sample and everything. Here is a subsample of it, let's call it df_cluster:
structure(list(X1 = c(17.6942795910888, 16.5328416912875, 15.0031683863395,
16.3550118351627, 17.6931159161312, 16.9869249394253, 16.3790173297882,
15.8964870189374, 17.1055608092973, 16.4568632337052), X2 = c(-1.64953541728691,
0.185674946464158, -1.38521677790428, -0.448487127519734, -1.63670327964466,
-0.456667476792068, -0.091689040488956, -1.77486494294163, -1.86407675524967,
0.14666260432486), cluster = c(1L, 2L, 2L, 1L, 2L, 1L, 3L, 3L,
1L, 3L)), row.names = c("Patient1", "Patient13", "Patient2", "Patient99",
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), class = "data.frame")
The samples of df_cluster are the same in the original data, data, which I used for the clustering. Which is basically just the samples you saw as rows, and features as columns, looks something like this:
structure(c(-0.0741098696855045, -0.094401270881699, 0.0410284948786532,
-0.163302950330185, -0.0942478217207681, -0.167314411991775,
-0.118272811489486, -0.0366277340916379, -0.0349008907108641,
-0.167823357941815, -0.178835447722468, -0.253897294559596, -0.0372301980787381,
-0.230579110769457, -0.224125346052727, -0.196933050675633, -0.344608041139497,
-0.0550538743643369, -0.157003425700701, -0.162295446209879,
-0.0384421660291032, -0.0275306107582565, 0.186447606591857,
-0.124972070102036, -0.15348122673842, -0.106812144494277, -0.104757782473888,
0.0686746776877563, -0.0662055287009653, 0.00388752358937872), dim = c(10L,
3L), dimnames = list(c("Patient1", "Patient13", "Patient2", "Patient99",
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), c("Feature1", "Feature2",
"Feature3")))
I just want to view each of those features (the columns of data), in each cluster, using a box plot or a violin plot. Kind of a comparison between the clusters.
So in the X-axis I'll have clusters 1, 2, and 3, the Y-axis would be the values. Each feature will get a plot. I've drawn an example by hand to make it more clear:
You could use facets.
But first you need to pivot the dataframe.
df_cluster <- structure(list(X1 = c(17.6942795910888, 16.5328416912875, 15.0031683863395,
16.3550118351627, 17.6931159161312, 16.9869249394253, 16.3790173297882,
15.8964870189374, 17.1055608092973, 16.4568632337052), X2 = c(-1.64953541728691,
0.185674946464158, -1.38521677790428, -0.448487127519734, -1.63670327964466,
-0.456667476792068, -0.091689040488956, -1.77486494294163, -1.86407675524967,
0.14666260432486), cluster = c(1L, 2L, 2L, 1L, 2L, 1L, 3L, 3L,
1L, 3L)), row.names = c("Patient1", "Patient13", "Patient2", "Patient99",
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), class = "data.frame")
data <- structure(c(-0.0741098696855045, -0.094401270881699, 0.0410284948786532,
-0.163302950330185, -0.0942478217207681, -0.167314411991775,
-0.118272811489486, -0.0366277340916379, -0.0349008907108641,
-0.167823357941815, -0.178835447722468, -0.253897294559596, -0.0372301980787381,
-0.230579110769457, -0.224125346052727, -0.196933050675633, -0.344608041139497,
-0.0550538743643369, -0.157003425700701, -0.162295446209879,
-0.0384421660291032, -0.0275306107582565, 0.186447606591857,
-0.124972070102036, -0.15348122673842, -0.106812144494277, -0.104757782473888,
0.0686746776877563, -0.0662055287009653, 0.00388752358937872), dim = c(10L,
3L), dimnames = list(c("Patient1", "Patient13", "Patient2", "Patient99",
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), c("Feature1", "Feature2",
"Feature3")))
library(tidyverse)
data %>%
as.data.frame() %>%
rownames_to_column("Patient") %>%
left_join(df_cluster %>% rownames_to_column("Patient") %>% select(Patient, cluster)) %>%
pivot_longer(- c(cluster, Patient)) %>% #Pivot the dataframe
ggplot(aes(as.factor(cluster), value)) +
geom_boxplot() +
facet_grid(~ name)

trying to summarize survey data for questions with 'select all that apply' using R

We have a survey that asks for 'select all that apply' so the result is a string inside quotes with the values separated by commas. i.e. "red, black,green"
There are other question about income so I have a factor with 'low, medium, high'
I want to be able to answer questions: What percent selected 'Red', then group that by income.
I can split the string with
'''df4 <- c("black,silver,green")'''
I can create a data frame with a timestamp and the split string with
'''t2 <- as.data.frame(c(df2[2],l2))'''
I am not able to understand how to do this for all rows at one time.
Here is a DPUT of the input:
structure(list(RespData = structure(1:2, .Label = c("1/20/2020",
"1/21/2020"), class = "factor"), CarColor = c("red,blue,green,yellow",
"black,silver,green")), row.names = c(NA, -2L), class = "data.frame")
and here is a DPUT of the desired output:
structure(list(RespData = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L), .Label = c("1/20/2020", "1/21/2020"), class = "factor"),
Cars = structure(c(3L, 1L, 2L, 4L, 5L, 6L, 2L), .Label = c("blue",
"green", "red", "yellow", "black", "silver"), class = "factor")), row.names = c(NA,
-7L), class = "data.frame")
Example of Function:
MySplitFunc <- function(ListIn) {
# build an empty data frame and set the column names
x1.all <- ListIn[0,]
names(x1.all) <- c("ResponseTime", "Descriptive")
# for each row build the data and combine to growing list
for(x in 1:nrow(ListIn)) {
#print(x)
r1 <- ListIn[x,1]
c1 <- strsplit(ListIn[x,2],",")
x1 <- as.data.frame(c(r1,c1))
# set the names and combine to all
names(x1) <- c("ResponseTime", "Descriptive")
x1.all <- rbind(x1.all,x1)
}
# strip the whitespace
x1.all <- data.frame(lapply(x1.all, trimws), stringsAsFactors = TRUE)
return(x1.all)
}

Need help calculating percentages of each categorical variable in a column

I have a data frame with around 16 columns that have yes, no and neutral as the categories. At the end I want to calculate the percentage of Yes, No and Neutral. An example of the data frame is:
a = c('yes', 'yes', 'no', 'neutral', 'no', 'yes','no','neutral','neutral')
b = c('no', 'yes','no', 'no', 'no', 'neutral', 'yes', 'neutral','neutral')
abcd = data.frame(a,b)
Is there a way I can achieve this in r?
If you want to count the percentage for the entire dataframe as a whole we can unlist the data, calculate their count using table and convert it into percentage.
table(unlist(abcd))/(nrow(abcd) * ncol(abcd)) * 100
# neutral no yes
# 33.333 38.889 27.778
If you want to do this for each column separately, we can use sapply
sapply(abcd, table)/nrow(abcd) * 100
# a b
#neutral 33.333 33.333
#no 33.333 44.444
#yes 33.333 22.222
EDIT
If there are certain levels missing, we can convert it to factor first and then use table
sapply(abcd, function(x)
table(factor(x, levels = c("Yes", "No", "Neutral"))))/nrow(abcd) * 100
Base R solution:
# For each combination:
res <- data.frame(round(prop.table(table(abcd)) * 100, 2))
# For each var separately:
res$total_a_cat <- ave(res$Freq, res$a, FUN = sum)
res$total_b_cat <- ave(res$Freq, res$b, FUN = sum)
Data:
abcd <-
structure(list(
a = structure(
c(3L, 3L, 2L, 1L, 2L, 3L, 2L, 1L,
1L),
.Label = c("neutral", "no", "yes"),
class = "factor"
),
b = structure(
c(2L,
3L, 2L, 2L, 2L, 1L, 3L, 1L, 1L),
.Label = c("neutral", "no",
"yes"),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-9L))

Conditional updating coordinate column in dataframe

I am attempting to populate two newly empty columns in a data frame with data from other columns in the same data frame in different ways depending on if they are populated.
I am trying to populate the values of HIGH_PRCN_LAT and HIGH_PRCN_LON (previously called F_Lat and F_Lon) which represent the final latitudes and londitudes for those rows this will be based off the values of the other columns in the table.
Case 1: Lat/Lon2 are populated (like in IDs 1 & 2), using the great
circle algorithm a midpoint between them should be calculated and
then placed into F_Lat & F_Lon.
Case 2: Lat/Lon2 are empty, then the values of Lat/Lon1 should be put
into F_Lat and F_Lon (like with IDs 3 & 4).
My code is as follows but doesn't work (see previous versions, removed in an edit).
The preperatory code I am using is as follows:
incidents <- structure(list(id = 1:9, StartDate = structure(c(1L, 3L, 2L,
2L, 2L, 3L, 1L, 3L, 1L), .Label = c("02/02/2000 00:34", "02/09/2000 22:13",
"20/01/2000 14:11"), class = "factor"), EndDate = structure(1:9, .Label = c("02/04/2006 20:46",
"02/04/2006 22:38", "02/04/2006 23:21", "02/04/2006 23:59", "03/04/2006 20:12",
"03/04/2006 23:56", "04/04/2006 00:31", "07/04/2006 06:19", "07/04/2006 07:45"
), class = "factor"), Yr.Period = structure(c(1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), .Label = c("2000 / 1", "2000 / 2", "2000 /3"
), class = "factor"), Description = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "ENGLISH TEXT", class = "factor"),
Location = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L
), .Label = c("Location 1", "Location 1 : Location 2"), class = "factor"),
Location.1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = "Location 1", class = "factor"), Postcode.1 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Postcode 1", class = "factor"),
Location.2 = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
1L), .Label = c("", "Location 2"), class = "factor"), Postcode.2 = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("", "Postcode 2"
), class = "factor"), Section = structure(c(2L, 2L, 3L, 1L,
4L, 4L, 2L, 1L, 4L), .Label = c("East", "North", "South",
"West"), class = "factor"), Weather.Category = structure(c(1L,
2L, 4L, 2L, 2L, 2L, 4L, 1L, 3L), .Label = c("Animals", "Food",
"Humans", "Weather"), class = "factor"), Minutes = c(13L,
55L, 5L, 5L, 5L, 522L, 1L, 11L, 22L), Cost = c(150L, 150L,
150L, 20L, 23L, 32L, 21L, 11L, 23L), Location.1.Lat = c(53.0506727,
53.8721035, 51.0233529, 53.8721035, 53.6988355, 53.4768766,
52.6874562, 51.6638245, 51.4301359), Location.1.Lon = c(-2.9991256,
-2.4004125, -3.0988341, -2.4004125, -1.3031529, -2.2298073,
-1.8023421, -0.3964916, 0.0213837), Location.2.Lat = c(52.7116187,
53.746791, NA, 53.746791, 53.6787167, 53.4527824, 52.5264907,
NA, NA), Location.2.Lon = c(-2.7493169, -2.4777984, NA, -2.4777984,
-1.489026, -2.1247029, -1.4645023, NA, NA)), class = "data.frame", row.names = c(NA, -9L))
#gpsColumns is used as the following line of code is used for several data frames.
gpsColumns <- c("HIGH_PRCN_LAT", "HIGH_PRCN_LON")
incidents [ , gpsColumns] <- NA
#create separate variable(?) containing a list of which rows are complete
ind <- complete.cases(incidents [,17])
#populate rows with a two Lat/Lons with great circle middle of both values
incidents [ind, c("HIGH_PRCN_LON_2","HIGH_PRCN_LAT_2")] <-
with(incidents [ind,,drop=FALSE],
do.call(rbind, geosphere::midPoint(cbind.data.frame(Location.1.Lon, Location.1.Lat), cbind.data.frame(Location.2.Lon, Location.2.Lat))))
#populate rows with one Lat/Lon with those values
incidents[!ind, c("HIGH_PRCN_LAT","HIGH_PRCN_LON")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]
I will use the geosphere::midPoint function based off a recommendation here: http://r.789695.n4.nabble.com/Midpoint-between-coordinates-td2299999.html.
Unfortunately, it doesn't appear that this way of populating the column will work when there are several cases.
The current error that is thrown is:
Error in `$<-.data.frame`(`*tmp*`, F_Lat, value = integer(0)) :
replacement has 0 rows, data has 178012
Edit: also posted to reddit: https://www.reddit.com/r/Rlanguage/comments/bdvavx/conditional_updating_column_in_dataframe/
Edit: Added clarity on the parts of the code I do not understand.
#replaces the F_Lat2/F_Lon2 columns in rows with a both sets of input coordinates
dataframe[ind, c("F_Lat2","F_Lon2")] <-
#I am unclear on what this means, specifically what the "with" function does and what "drop=FALSE" does and also why they were used in this case.
with(dataframe[ind,,drop=FALSE],
#I am unclear on what do.call and rbind are doing here, but the second half (geosphere onwards) is binding the Lats and Lons to make coordinates as inputs for the gcIntermediate function.
do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
cbind.data.frame(Lat2, Lon2), n = 1)))
Though your code doesn't work as-written for me, and I cannot calculate the same precise values your expect, I suspect the error your seeing can be fixed with these steps. (Data is down at the bottom here.)
Pre-populate the empty columns.
Pre-calculate the complete.cases step, it'll save time.
Use cbind.data.frame for inside gcIntermediate.
I'm inferring from
gcIntermediate([dataframe...
^
this is an error in R
that you are binding those columns together, so I'll use cbind.data.frame. (Using cbind itself produced some ignorable warnings from geosphere, so you can use it instead and perhaps suppressWarnings, but that function is a little strong in that it'll mask other warnings as well.)
Also, since it appears you want one intermediate value for each pair of coordinates, I added the gcIntermediate(..., n=1) argument.
The use of do.call(rbind, ...) is because gcIntermediate returns a list, so we need to bring them together.
dataframe$F_Lon2 <- dataframe$F_Lat2 <- NA_real_
ind <- complete.cases(dataframe[,4])
dataframe[ind, c("F_Lat2","F_Lon2")] <-
with(dataframe[ind,,drop=FALSE],
do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
cbind.data.frame(Lat2, Lon2), n = 1)))
dataframe[!ind, c("F_Lat2","F_Lon2")] <- dataframe[!ind, c("Lat1","Lon1")]
dataframe
# ID Lat1 Lon1 Lat2 Lon2 F_Lat F_Lon F_Lat2 F_Lon2
# 1 1 19.05067 -3.999126 92.71332 -6.759169 55.88200 -5.379147 55.78466 -6.709509
# 2 2 58.87210 -1.400413 54.74679 -4.479840 56.80945 -2.940126 56.81230 -2.942029
# 3 3 33.02335 -5.098834 NA NA 33.02335 -5.098834 33.02335 -5.098834
# 4 4 54.87210 -4.400412 NA NA 54.87210 -4.400412 54.87210 -4.400412
Update, using your new incidents data and switching to geosphere::midPoint.
Try this:
incidents$F_Lon2 <- incidents$F_Lat2 <- NA_real_
ind <- complete.cases(incidents[,4])
incidents[ind, c("F_Lat2","F_Lon2")] <-
with(incidents[ind,,drop=FALSE],
geosphere::midPoint(cbind.data.frame(Location.1.Lat,Location.1.Lon),
cbind.data.frame(Location.2.Lat,Location.2.Lon)))
incidents[!ind, c("F_Lat2","F_Lon2")] <- dataframe[!ind, c("Lat1","Lon1")]
One (big) difference is that geosphere::gcIntermediate(..., n=1) returns a list of results, whereas geosphere::midPoint(...) (no n=) returns just a matrix, so no rbinding required.
Data:
dataframe <- read.table(header=T, stringsAsFactors=F, text="
ID Lat1 Lon1 Lat2 Lon2 F_Lat F_Lon
1 19.0506727 -3.9991256 92.713318 -6.759169 55.88199535 -5.3791473
2 58.8721035 -1.4004125 54.746791 -4.47984 56.80944725 -2.94012625
3 33.0233529 -5.0988341 NA NA 33.0233529 -5.0988341
4 54.8721035 -4.4004125 NA NA 54.8721035 -4.4004125")

Errorbars in r of two groups ggplot2

I'd like to plot standard deviations of the mean(z)/mean(b) which are grouped by two factors $angle and $treatment:
z= Tracer angle treatment
60 0 S
51 0 S
56.415 15 X
56.410 15 X
b=Tracer angle treatment
21 0 S
15 0 S
16.415 15 X
26.410 15 X
So far I've calculated the mean for each variable based on angle and treatment:
aggmeanz <-aggregate(z$Tracer, list(angle=z$angle,treatment=z$treatment), FUN=mean)
aggmeanb <-aggregate(b$Tracer, list(angle=b$angle,treatment=b$treatment), FUN=mean)
It now looks like this:
aggmeanz
angle treatment x
1 0 S 0.09088021
2 30 S 0.18463353
3 60 S 0.08784315
4 80 S 0.09127198
5 90 S 0.12679296
6 0 X 2.68670392
7 15 X 0.50440692
8 30 X 0.83564470
9 60 X 0.52856956
10 80 X 0.63220093
11 90 X 1.70123025
But when I come to plot it, I can't quite get what I'm after
ggplot(aggmeanz, aes(x=aggmeanz$angle,y=aggmeanz$x/aggmeanb$x, colour=treatment)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=0.1, ymax=1.15),
width=.2,
position=position_dodge(.9)) +
theme(panel.grid.minor = element_blank()) +
theme_bw()
EDIT:
dput(aggmeanz)
structure(list(time = structure(c(1L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("0", "15", "30", "60", "80", "90"
), class = "factor"), treatment = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("S", "X"), class = "factor"),
x = c(56.0841582902523, 61.2014237854156, 42.9900742785269,
42.4688447229277, 41.3354173870287, 45.7164231791512, 55.3943182966382,
55.0574951462903, 48.1575625699563, 60.5527200655174, 45.8412287451211
)), .Names = c("time", "treatment", "x"), row.names = c(NA,
-11L), class = "data.frame")
> dput(aggmeanb)
structure(list(time = structure(c(1L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("0", "15", "30", "60", "80", "90"
), class = "factor"), treatment = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("S", "X"), class = "factor"),
x = c(56.26325504249, 61.751655279608, 43.1687113436753,
43.4147408285209, 41.9113698082799, 46.2800894420131, 55.1550995335947,
54.7531592595068, 47.3280215294235, 62.4629068516043, 44.2590192583692
)), .Names = c("time", "treatment", "x"), row.names = c(NA,
-11L), class = "data.frame")
EDIT 2: I calculated the standard dev as follows:
aggstdevz <-aggregate(z$Tracer, list(angle=z$angle,treatment=z$treatment), FUN=std)
aggstdevb <-aggregate(b$Tracer, list(angle=b$angle,treatment=b$treatment), FUN=std)
Any thoughts would be much appreciated,
Cheers
As others have noted, you'll need to join the two dataframes together. There are also some little quirks in the dput data you showed, so I've renamed some columns to make sure that they join appropriately and match what you've attempted. NOTE: You'll need name the two means differently so that they don't get merged together or cause conflicts.
names(aggmeanb)[names(aggmeanb) == "x"] = "mean_b"
names(aggmeanb)[names(aggmeanb) == "time"] = "angle"
names(aggmeanz)[names(aggmeanz) == "x"] = "mean_z"
names(aggmeanz)[names(aggmeanz) == "time"] = "angle"
joined_data = join(aggmeanb, aggmeanz)
joined_data$divmean = joined_data$mean_b/joined_data$mean_z
> head(joined_data)
angle treatment mean_b mean_z divmean
1 0 S 56.26326 56.08416 1.003193
2 30 S 61.75166 61.20142 1.008991
3 60 S 43.16871 42.99007 1.004155
4 80 S 43.41474 42.46884 1.022273
5 90 S 41.91137 41.33542 1.013934
6 0 X 46.28009 45.71642 1.012330
ggplot(joined_data, aes(factor(angle), divmean)) +
geom_boxplot() +
theme(panel.grid.minor = element_blank()) +
theme_bw()
It might be that the data you've included is just a bit of your real data set, but as is there's only one data point per angle-treatment group. However, when you are using a fuller dataset, you can try something like:
ggplot(joined_data, aes(factor(angle), diffmean, group = treatment)) +
geom_boxplot() +
facet_grid(.~angle, scales = "free_x")
That will group the boxes by angle and then allow you to fill them by treatment.
Think about the problem in two steps:
create a data frame (say data) which contains all the information
you would like to visualize. In this case, this seems to be the two
factors (angle, treatment), the mean group differences (say dif)
and standard errors (say ste).
visualize this information.
Step 2) will be easy. This should probably produce something very similar to your sketch.
ggplot(data, aes(x=angle, y=dif, colour=treatment)) +
geom_point(position=position_dodge(0.1)) +
geom_errorbar(aes(ymin=dif-ste, ymax=dif+ste), width=.1, position=position_dodge(0.1)) +
theme_bw()
However, at this point, you do not provide enough information to get help with Step 1. Try to include code which produces your original data (or the type of data you have) instead of copy-pasting chunks of your data output or pasting the aggregated data which lacks standard errors.
Combining your two aggregated data frames and generating random numbers for standard error produces the graph below:
#I imported your two aggregated data frames from your dput output.
data <- cbind(aggmeanb, aggmeanz$x, rnorm(11))
names(data) <- c("angle", "treatment", "meanz", "meanb", "ste")
data$dif <- data$meanz - data$meanb

Resources