R Program Vector, record Column Percent - r

This is my data frame:
head(sep)
I need to find the percentage of all SEP 11 in each row.
For instance, in the first row, the percentage of SEP 11 is
100 * ((63 + 124) / (63 + 124 + 0 + 0))
and I would like this stored in a newly created 8th column.
Thanks
dput
> dput(head(sep))
structure(list(Site = structure(1:6, .Label = c("31R001", "31R002",
"31R003", "31R004", "31R005", "31R006", "31R007", "31R008", "31R011",
"31R013", "31R014", "31R016", "31R018", "31R019", "31R020", "31R021",
"31R022", "31R023", "31R024", "31R025", "31R026", "31R027", "31R029",
"31R030", "31R031", "31R032", "31R034", "31R035", "31R036", "31R038",
"31R039", "31R040", "31R041", "31R042", "31R043", "31R044", "31R045",
"31R046", "31R048", "31R049", "31R050", "31R051", "31R052", "31R053",
"31R054", "31R055", "31R056", "31R057", "31R058", "31R059", "31R060",
"31R061", "31R069", "31R071", "31R072", "31R075", "31R435", "31R440",
"31R445", "31R450", "31R455", "31R460", "31R470", "31R600", "31R722",
"31R801", "31R825", "31R826", "31R829", "31R840", "31R843", "31R861",
"31R880"), class = "factor"), Latitude = c(33.808874, 33.877256,
33.820825, 33.852373, 33.829697, 33.810274), Longitude = c(-117.844048,
-117.700135, -117.811845, -117.795516, -117.787532, -117.830429
), Windows.SEP.11 = c(63L, 174L, 11L, 85L, 163L, 71L), Mac.SEP.11 = c(0L,
1L, 4L, 0L, 0L, 50L), Windows.SEP.12 = c(124L, 185L, 9L, 75L,
23L, 5L), Mac.SEP.12 = c(0L, 1L, 32L, 1L, 0L, 50L)), .Names = c("Site",
"Latitude", "Longitude", "Windows.SEP.11", "Mac.SEP.11", "Windows.SEP.12",
"Mac.SEP.12"), row.names = c(NA, 6L), class = "data.frame")

Assuming that you want the row sums of the columns whose names contain 'Windows', subset the dataset (called "sep1" here) using grep. Then take rowSums(Sub1), divide by the row sums of all four count columns (sep1[4:7]), multiply by 100, and assign the result to a new column ("newCol"):
Sub1 <- sep1[grep("Windows", names(sep1))]
sep1$newCol <- 100*rowSums(Sub1)/rowSums(sep1[4:7])
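The calculation above can be checked on a small scale. This is a minimal sketch that assumes a cut-down `sep1` holding only the four count columns and the first three rows of the posted data; the `win` index name is made up for illustration.

```r
# First three rows of the four count columns from the posted data
sep1 <- data.frame(
  Windows.SEP.11 = c(63L, 174L, 11L),
  Mac.SEP.11     = c(0L, 1L, 4L),
  Windows.SEP.12 = c(124L, 185L, 9L),
  Mac.SEP.12     = c(0L, 1L, 32L)
)

# Positions of the columns whose names start with "Windows"
win <- grep("^Windows", names(sep1))

# Percentage of each row's total that falls in the Windows columns
sep1$newCol <- 100 * rowSums(sep1[win]) / rowSums(sep1[1:4])
```

The first row gives 100 * (63 + 124) / (63 + 124 + 0 + 0), matching the worked example in the question. If the percentage is meant to cover the SEP 11 columns instead, changing the pattern to "SEP\\.11" should select those columns the same way.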

Related

How can I rearrange the date from d-m-y to m-d-y in R?

I am having issues with the following R code. I am trying to rearrange CSV date values in a column from day-month-year to month-day-year. Two issues arise: the format is changed to year-month-day instead, and this error message appears when I attempt to plot the results:
Error: Column New_Date is a date/time and must be stored as POSIXct, not POSIXlt.
I am new to R and unsure on how to fix this error.
I have gone through a lot of similar topics, however because of lack of knowledge in R, I am unable to understand whether these topics can translate to my own code, and the information that I need.
Any help is much appreciated. The code is due relatively soon, so any fast responses are going to be worshipped. Thanks!
structure(list(Date = structure(c(48L, 11L, 36L, 35L, 1L, 14L
), .Label = c("01-02-18", "02-03-18", "02-10-18", "03-01-18",
"03-04-18", "03-05-18", "03-08-18", "03-09-18", "05-07-18", "05-12-18",
"07-02-18", "07-06-18", "07-11-18", "08-03-18", "09-01-18", "09-05-18",
"09-08-18", "09-10-18", "10-01-18", "10-04-18", "10-09-18", "11-07-18",
"12-11-18", "12-12-18", "13-02-18", "13-06-18", "14-03-18", "14-09-18",
"15-01-18", "15-05-18", "16-04-18", "16-08-18", "17-07-18", "18-12-18",
"19-01-18", "19-02-18", "19-06-18", "19-10-18", "19-11-18", "20-03-18",
"20-04-18", "20-08-18", "20-09-18", "21-05-18", "23-07-18", "23-11-18",
"24-12-18", "25-01-18", "25-02-18", "25-05-18", "25-06-18", "25-10-18",
"26-03-18", "26-09-18", "27-04-18", "29-08-18", "30-07-18", "31-05-18",
"31-10-18"), class = "factor"), New_Date = structure(list(sec = c(0,
0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L,
0L, 0L, 0L, 0L, 0L), mday = c(25L, 7L, 19L, 19L, 1L, 8L), mon = c(0L,
1L, 1L, 0L, 1L, 2L), year = c(-1882L, -1882L, -1882L, -1882L,
-1882L, -1882L), wday = c(4L, 3L, 1L, 5L, 4L, 4L), yday = c(24L,
37L, 49L, 18L, 31L, 66L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L),
zone = c("LMT", "LMT", "LMT", "LMT", "LMT", "LMT"), gmtoff = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), class = c("POSIXlt", "POSIXt"))), row.names = c(NA, 6L
), class = "data.frame")
EDIT:
I am now getting this error: "Error in plot.window(...) : need finite 'xlim' values"
Below is my code:
beaches$Date = as.Date(as.character(beaches$Date), '%d-%m-%y')
beaches$New_Date = format(beaches$Date, '%m-%d-%y')
Palm_beach = filter(beaches, Site == "Palm Beach")
Shelly_beach = filter(beaches, Site == "Shelly Beach (Manly)")
plot(Palm_beach$Date, Palm_beach$Enterococci..cfu.100ml., col = "green", main = "Palm Beach vs Shelly Beach", xlab = "Dates", ylab = "Enterococci (cfu)")
points(Shelly_beach$Date, Shelly_beach$Enterococci..cfu.100ml., col = "red")
Try this:
beaches$Date = as.Date(as.character(beaches$Date), '%d-%m-%y')
beaches$New_Date = format(beaches$Date, '%m-%d-%y')
Output:
> head(beaches[, c('Date', 'New_Date')])
Date New_Date
1 2018-01-25 01-25-18
2 2018-02-07 02-07-18
3 2018-02-19 02-19-18
4 2018-01-19 01-19-18
5 2018-02-01 02-01-18
6 2018-03-08 03-08-18
Since neither the input nor the output is a date, it might make more sense to just use regular expressions rather than converting to and from dates:
beaches$New_Date <- sub("(\\d+)-(\\d+)-(\\d+)", "\\2-\\1-\\3", beaches$Date)
#### OUTPUT ####
Date New_Date
1 25-01-18 01-25-18
2 07-02-18 02-07-18
3 19-02-18 02-19-18
4 19-01-18 01-19-18
5 01-02-18 02-01-18
6 08-03-18 03-08-18
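A small sketch of the same regular-expression swap, assuming a plain character vector `dates` (made-up values, including one malformed entry). Anchoring the pattern and checking it with grepl() makes it easier to spot strings that did not match, because sub() returns non-matching input unchanged.

```r
dates <- c("25-01-18", "07-02-18", "not a date")

# Anchored pattern: exactly dd-mm-yy
pattern <- "^(\\d{2})-(\\d{2})-(\\d{2})$"

# Swap the first two groups (day and month); unmatched strings pass through as-is
swapped <- sub(pattern, "\\2-\\1-\\3", dates)

# Flag which inputs actually matched the expected layout
ok <- grepl(pattern, dates)
```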
First of all, you have to make sure that the original Date column is in character format.
In your data it is a factor. Convert the Date column to a date first, and then you can create the New_Date column:
df$Date <- as.Date(as.character(df$Date), format = "%d-%m-%y")
df$New_Date <- format(df$Date, "%m-%d-%Y")
If you only want the last two digits of the year column you can use this instead:
df$New_Date2 <- format(df$Date, "%m-%d-%y")
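The POSIXlt error in the question typically appears when the result of strptime() is stored directly in a data frame column. The year = -1882 in the posted dput is also consistent with the two-digit year having been parsed with %Y (giving year 18) rather than %y. Here is a minimal sketch, assuming a toy `beaches` data frame with only a `Date` factor column; the column names `Date_parsed` and `Date_ct` are made up for illustration. It parses once, keeps a real Date column for plotting, and keeps the month-day-year text only for display.

```r
beaches <- data.frame(Date = factor(c("25-01-18", "07-02-18", "19-02-18")))

# Parse the day-month-year text once; %y is the two-digit year
parsed <- as.Date(as.character(beaches$Date), format = "%d-%m-%y")

# All-NA dates are one common cause of "need finite 'xlim' values" when plotting
stopifnot(!anyNA(parsed))

beaches$Date_parsed <- parsed                    # Date class, suitable for plot()
beaches$New_Date <- format(parsed, "%m-%d-%y")   # character, for display only

# If a date-time is required, POSIXct (not POSIXlt) can be stored in a column
beaches$Date_ct <- as.POSIXct(parsed)
```

Running the as.Date() line a second time on a column that is already a Date would try to read text like "2018-01-25" with the "%d-%m-%y" format and return NA, which is why the sketch writes to a new column instead of overwriting `Date`.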

How to run a function against several dataframes and output dataframes with the same name as input in R

I have several dataframes that I am applying a function to
The function works but I would like to lapply it to several dataframes and output the result according to the input names.
Here is an example of one of the dataframes
structure(list(chr = structure(c(1L, 1L, 1L), .Label = c("chr1",
"chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22",
"chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX",
"chrY"), class = "factor"), leftPos = c(100260254L, 100735342L,
100805662L), strand.x = structure(c(1L, 1L, 2L), .Label = c("-",
"+"), class = "factor"), X50CellJ_SLX.9395.FSeqJ.fq.gz = c(7L,
295L, 132L), Cytospongex10_SLX.9395.FSeqK.fq.gz = c(72L, 256L,
148L), FFPE20X_SLX.9395.fq.gz = c(5L, 74L, 36L), Tumour10_SMACCO_AH_088_SLX.9396.FSeqH.fq.gz = c(13L,
154L, 65L), Tumour11_SMACCO_SH_020_SLX.9396.FSeqI.fq.gz = c(1L,
0L, 0L), Tumour12_SMACCO_ED_008_SLX.9396.FSeqJ.fq.gz = c(3L,
25L, 8L), Tumour13_SMACCO_AH_086_SLX.9396.FSeqK.fq.gz = c(7L,
120L, 28L), Tumour1_SMACCO_AH_100_SLX.9396.FSeqA.fq.gz = c(0L,
0L, 0L), Tumour2_SMACCO_AH_058_SLX.9396.FSeqB.fq.gz = c(24L,
98L, 42L), Tumour3_SMACCO_SH_051_SLX.9396.FSeqC.fq.gz = c(29L,
92L, 29L), Tumour4_SMACCO_ED_031_SLX.9396.FSeqD.fq.gz = c(18L,
53L, 14L), Tumour5_SMACCO_RS_027_SLX.9396.FSeqE.fq.gz = c(8L,
93L, 17L), Tumour7_SMACCO_AH_026_SLX.9396.FSeqF.fq.gz = c(30L,
205L, 60L), Tumour9_SMACCO_ST_024_SLX.9396.FSeqG.fq.gz = c(15L,
129L, 17L), strand.y = structure(c(1L, 1L, 2L), .Label = c("-",
"+"), class = "factor"), Tumour14_SMACCO_AH_094_SLX.9394.FSeqA.fq.gz = c(0L,
7L, 3L), Tumour15_SMACCO_WG_006_SLX.9394.FSeqB..fq.gz = c(3L,
19L, 4L), Tumour16_SMACCO_ST_035_SLX.9394.FSeqC.fq.gz = c(1L,
23L, 8L), Tumour17_SMACCO_ST_034_SLX.9394.fq.gz = c(7L, 26L,
5L), Control19_SLX.9394.FSeqE.fq.gz = c(51L, 256L, 36L), Control20_SLX.9394.FSeqF.fq.gz = c(23L,
110L, 34L), Control21_SLX.9394.FSeqG..fq.gz = c(30L, 56L,
11L), Control22_SLX.9394.FSeqH.fq.gz = c(22L, 72L, 24L), Control23_SLX.9394.FSeqI.fq.gz = c(10L,
23L, 2L), Control25_SLX.9394.FSeqJ.fq.gz = c(17L, 72L, 8L),
Control27_SLX.9394.FSeqK.fq.gz = c(10L, 21L, 9L), Control28_SLX.9395.FSeqA.fq.gz = c(13L,
40L, 4L), Control29_SLX.9395.FSeqB.fq.gz = c(14L, 39L,
6L), Control30_SLX.9395.FSeqC.fq.gz = c(5L, 32L, 5L),
Control31_SLX.9395.FSeqD.fq.gz = c(7L, 11L, 5L), Control32_SLX.9395.FSeqE.fq.gz = c(5L,
32L, 4L), Control33_SLX.9395.FSeqF.fq.gz = c(10L, 25L,
6L), Control34_SLX.9395.FSeqG.fq.gz = c(3L, 32L, 1L),
Control35_SLX.9395.FSeqH.fq.gz = c(10L, 33L, 0L), Controls = c(0L,
0L, 0L), Samples = c(0L, 0L, 0L)), .Names = c("chr", "leftPos",
"strand.x", "X50CellJ_SLX.9395.FSeqJ.fq.gz", "Cytospongex10_SLX.9395.FSeqK.fq.gz",
"FFPE20X_SLX.9395.fq.gz", "Tumour10_SMACCO_AH_088_SLX.9396.FSeqH.fq.gz",
"Tumour11_SMACCO_SH_020_SLX.9396.FSeqI.fq.gz", "Tumour12_SMACCO_ED_008_SLX.9396.FSeqJ.fq.gz",
"Tumour13_SMACCO_AH_086_SLX.9396.FSeqK.fq.gz", "Tumour1_SMACCO_AH_100_SLX.9396.FSeqA.fq.gz",
"Tumour2_SMACCO_AH_058_SLX.9396.FSeqB.fq.gz", "Tumour3_SMACCO_SH_051_SLX.9396.FSeqC.fq.gz",
"Tumour4_SMACCO_ED_031_SLX.9396.FSeqD.fq.gz", "Tumour5_SMACCO_RS_027_SLX.9396.FSeqE.fq.gz",
"Tumour7_SMACCO_AH_026_SLX.9396.FSeqF.fq.gz", "Tumour9_SMACCO_ST_024_SLX.9396.FSeqG.fq.gz",
"strand.y", "Tumour14_SMACCO_AH_094_SLX.9394.FSeqA.fq.gz",
"Tumour15_SMACCO_WG_006_SLX.9394.FSeqB..fq.gz", "Tumour16_SMACCO_ST_035_SLX.9394.FSeqC.fq.gz",
"Tumour17_SMACCO_ST_034_SLX.9394.fq.gz", "Control19_SLX.9394.FSeqE.fq.gz",
"Control20_SLX.9394.FSeqF.fq.gz", "Control21_SLX.9394.FSeqG..fq.gz",
"Control22_SLX.9394.FSeqH.fq.gz", "Control23_SLX.9394.FSeqI.fq.gz",
"Control25_SLX.9394.FSeqJ.fq.gz", "Control27_SLX.9394.FSeqK.fq.gz",
"Control28_SLX.9395.FSeqA.fq.gz", "Control29_SLX.9395.FSeqB.fq.gz",
"Control30_SLX.9395.FSeqC.fq.gz", "Control31_SLX.9395.FSeqD.fq.gz",
"Control32_SLX.9395.FSeqE.fq.gz", "Control33_SLX.9395.FSeqF.fq.gz",
"Control34_SLX.9395.FSeqG.fq.gz", "Control35_SLX.9395.FSeqH.fq.gz",
"Controls", "Samples"), row.names = c(NA, 3L), class = "data.frame")
Here is what I have so far
mylist <- list(A = OriginalMeta , B = SLX9392 , C = SLX9393, D = SLX9397, E = Gastric, F = Dysplasia, G = GoodDysplasia, H = Cholangio, I = LCM_PS14_1105_1F)
sortIt <- function(df1) {
df1$strand.x<- NULL
df1$strand.y<- NULL
df1$strand<-NULL
df1$X.<-NULL
names(df1)[1] <- c("chr")
#Get rid of X and Y chromosomes
df1 <- df1[!grepl("chrX", df1$chr), ]
df1 <- df1[!grepl("chrY", df1$chr), ]
xyAss3<-df1
return(xyAss3)
}
lapply(names(mylist),
sortIt(x)write.csv(mylist[x],
file =paste0(x,'.csv')))
The thing is, I just don't know how to feed mylist into the function. Should I pass x from the lapply call as df1? I'm a bit confused about how to tie it all together.
I think you'll do better to fold the creation of the .csv into your function and then use a for loop to apply that function to each object in your list in turn. So something like this, where df is the sample data frame you posted:
mylist <- list(A = df, B = df)
sortIt <- function(i) {
df = mylist[[i]]
df[,"strand.x"] <- NULL
df[,"strand.y"] <- NULL
df[,"strand"] <- NULL
df[,"X."] <- NULL
names(df) <- c("chr", names(df)[2:length(names(df))])
df <- df[!grepl("chrX", df$chr), ]
df <- df[!grepl("chrY", df$chr), ]
write.csv(df, file = paste0(names(mylist)[i], ".csv"), row.names=FALSE)
}
for (i in seq_along(mylist)) {sortIt(i)}
If you were trying to create new objects in your workspace, one of the apply functions would be a better bet. But when you're only writing files, the function is called for its side effect rather than its return value, so I think a for loop is the clearer choice.
Not really sure what you are trying to achieve, but guessing that you want to save the transformed data frame to a file with a name taken from the list, this could do the job (it should work with the rest of your code - note the [[1]]):
lapply(names(mylist),
function(x) write.csv(sortIt(mylist[x][[1]]),
file = paste0(x,'.csv')))
Another option is to use mapply; here is a complete example:
# create the data
dframes <- lapply(1:3, function(x) data.frame(x=rnorm(10), y=runif(10)))
names(dframes) <- LETTERS[1:3]
# the transformation function
sortdf <- function(df) df[order(df$x),]
# two variants of apply
lapply(names(dframes),
function(name) write.csv(sortdf(dframes[name][[1]]),
file=paste0(name, '.csv')))
# mapply does not have the ugly [[1]] syntax bit, I'd prefer it myself
mapply(function(name, df) write.csv(sortdf(df), file=paste0(name, '.csv')),
names(dframes),
dframes)
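To keep the transformed data frames under the same names as the inputs, it may help to split the work in two: transform with lapply(), which preserves list names, and then write the files. This is a sketch under stated assumptions: the toy `mylist` is made-up data, `sortIt` is a simplified variant of the function in the question (it matches chromosome names exactly instead of using grepl), and files go to a temporary directory.

```r
# Toy stand-ins for the real data frames (hypothetical data)
mylist <- list(
  A = data.frame(chr = c("chr1", "chrX", "chr2"),
                 strand.x = c("-", "+", "-"),
                 count = 1:3),
  B = data.frame(chr = c("chrY", "chr3"),
                 strand.x = c("+", "-"),
                 count = 4:5)
)

# Drop helper columns if present, then remove the X and Y chromosomes
sortIt <- function(df1) {
  drop_cols <- c("strand.x", "strand.y", "strand", "X.")
  df1 <- df1[, !(names(df1) %in% drop_cols), drop = FALSE]
  df1[!(df1$chr %in% c("chrX", "chrY")), , drop = FALSE]
}

# lapply keeps the names of its input list, so results$A comes from mylist$A
results <- lapply(mylist, sortIt)

# Write one CSV per element, named after the list element
out_dir <- tempdir()
paths <- file.path(out_dir, paste0(names(results), ".csv"))
invisible(Map(function(df, path) write.csv(df, path, row.names = FALSE),
              results, paths))
```

Keeping `results` as a named list means the cleaned data frames stay available in the session as results$A, results$B, and so on, independent of whether the files are written.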

How is geom_point removing rows containing missing values?

I'm unsure why none of my data points show up on the map.
Store_ID visits CRIND_CC ISCC EBITDAR top_bottom Latitude Longitude
(int) (int) (int) (int) (dbl) (chr) (fctr) (fctr)
1 92 348 14819 39013 76449.15 top 41.731373 -93.58184
2 2035 289 15584 35961 72454.42 top 41.589428 -93.80785
3 50 266 14117 27262 49775.02 top 41.559017 -93.77287
4 156 266 7797 25095 28645.95 top 41.6143 -93.834404
5 66 234 8314 18718 46325.12 top 41.6002 -93.779236
6 207 18 2159 17999 20097.99 bottom 41.636208 -93.531876
7 59 23 10547 28806 52168.07 bottom 41.56153 -93.88083
8 101 23 1469 11611 7325.45 bottom 41.20982 -93.84298
9 130 26 2670 13561 14348.98 bottom 41.614517 -93.65789
10 130 26 2670 13561 14348.98 bottom 41.6145172 -93.65789
11 24 27 17916 41721 69991.10 bottom 41.597134 -93.49263
> dput(droplevels(top_bottom))
structure(list(Store_ID = c(92L, 2035L, 50L, 156L, 66L, 207L,
59L, 101L, 130L, 130L, 24L), visits = c(348L, 289L, 266L, 266L,
234L, 18L, 23L, 23L, 26L, 26L, 27L), CRIND_CC = c(14819L, 15584L,
14117L, 7797L, 8314L, 2159L, 10547L, 1469L, 2670L, 2670L, 17916L
), ISCC = c(39013L, 35961L, 27262L, 25095L, 18718L, 17999L, 28806L,
11611L, 13561L, 13561L, 41721L), EBITDAR = c(76449.15, 72454.42,
49775.02, 28645.95, 46325.12, 20097.99, 52168.07, 7325.45, 14348.98,
14348.98, 69991.1), top_bottom = c("top", "top", "top", "top",
"top", "bottom", "bottom", "bottom", "bottom", "bottom", "bottom"
), Latitude = structure(c(11L, 4L, 2L, 7L, 6L, 10L, 3L, 1L, 8L,
9L, 5L), .Label = c("41.20982", "41.559017", "41.56153", "41.589428",
"41.597134", "41.6002", "41.6143", "41.614517", "41.6145172",
"41.636208", "41.731373"), class = "factor"), Longitude = structure(c(3L,
7L, 5L, 8L, 6L, 2L, 10L, 9L, 4L, 4L, 1L), .Label = c("-93.49263",
"-93.531876", "-93.58184", "-93.65789", "-93.77287", "-93.779236",
"-93.80785", "-93.834404", "-93.84298", "-93.88083"), class = "factor")), row.names = c(NA,
-11L), .Names = c("Store_ID", "visits", "CRIND_CC", "ISCC", "EBITDAR",
"top_bottom", "Latitude", "Longitude"), class = c("tbl_df", "tbl",
"data.frame"))
Creating the plot:
map <- qmap('Des Moines') +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
I get the warning message:
Removed 11 rows containing missing values (geom_point).
However, this works without the use of ggmap():
ggplot(top_bottom) +
geom_point(aes(x = as.numeric(Longitude), y = as.numeric(Latitude)),
colour = top_bottom, size = 3)
How do I get the points to overlay on ggmap??
You are using as.numeric() on a factor. As seen here, that gives you the integer level code of each value, not the number its label represents. Unsurprisingly, none of those level codes are coordinates on the canvas displayed for "Des Moines".
Use as.numeric(as.character(Latitude)) and as.numeric(as.character(Longitude)), as ugly as it seems.
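The difference is easy to see on a tiny factor. The sketch below uses three of the latitude labels from the question in a stand-alone vector named `lat` (a name chosen for illustration).

```r
lat <- factor(c("41.731373", "41.589428", "41.20982"))

# Level codes: positions in the sorted set of labels, not coordinates
codes <- as.numeric(lat)

# Convert through character to recover the numbers the labels represent
values <- as.numeric(as.character(lat))

# Equivalent form mentioned in ?factor: convert the levels once, then index
values2 <- as.numeric(levels(lat))[lat]
```

Here `codes` holds small integers such as 1, 2 and 3, which lie nowhere near Iowa on a map, while `values` and `values2` hold the actual latitudes.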
Looking at the sample data, it seems that one data point falls outside the map area.
library(dplyr)
library(ggplot2)
library(ggmap)
### You can find lon/lat for bbox using your ggmap object.
### For instance, des1 <- ggmap(mymap1)
### str(des1)
### You could use bb2bbox() in the ggmap package to find lon/lat.
filter(top_bottom,
between(Latitude, 41.27057, 41.92782),
between(Longitude, -94.04787, -93.16897)) -> inside
setdiff(top_bottom, inside)
# Store_ID visits CRIND_CC ISCC EBITDAR top_bottom Latitude Longitude
#1 101 23 1469 11611 7325.45 bottom 41.20982 -93.84298
Since you used qmap() without specifying zoom, I do not know what zoom level you had. Let's play around a bit. In the first case, there is one data point missing; Removed 1 rows containing missing values (geom_point).
mymap1 <- get_map('Des Moines', zoom = 10)
ggmap(mymap1) +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
mymap2 <- get_map('Des Moines', zoom = 9)
ggmap(mymap2) +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
So the key thing, I think, is that you want to make sure you choose the right zoom level for your data set. For that, you may want to specify zoom in qmap(). I hope this will help you.
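The bounding-box check can also be done in base R without downloading a map. This sketch assumes the bounding box quoted in the filter() call above and a three-row subset of the data; `stores`, `bbox` and `inside` are illustrative names.

```r
# Three stores from the posted data, with numeric coordinates
stores <- data.frame(
  Store_ID  = c(92L, 101L, 24L),
  Latitude  = c(41.731373, 41.20982, 41.597134),
  Longitude = c(-93.58184, -93.84298, -93.49263)
)

# Bounding box taken from the filter() call in the answer above
bbox <- c(left = -94.04787, bottom = 41.27057,
          right = -93.16897, top = 41.92782)

# TRUE for rows whose point lies within the box
inside <- stores$Latitude  >= bbox[["bottom"]] &
          stores$Latitude  <= bbox[["top"]] &
          stores$Longitude >= bbox[["left"]] &
          stores$Longitude <= bbox[["right"]]

# Rows that a map with this extent would drop, with a warning
dropped <- stores[!inside, ]
```

Any row in `dropped` corresponds to a "Removed n rows containing missing values" warning for a map with that extent, so checking it before plotting shows whether a wider zoom is needed.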
DATA
top_bottom <- structure(list(Store_ID = c(92L, 2035L, 50L, 156L, 66L, 207L,
59L, 101L, 130L, 130L, 24L), visits = c(348L, 289L, 266L, 266L,
234L, 18L, 23L, 23L, 26L, 26L, 27L), CRIND_CC = c(14819L, 15584L,
14117L, 7797L, 8314L, 2159L, 10547L, 1469L, 2670L, 2670L, 17916L
), ISCC = c(39013L, 35961L, 27262L, 25095L, 18718L, 17999L, 28806L,
11611L, 13561L, 13561L, 41721L), EBITDAR = c(76449.15, 72454.42,
49775.02, 28645.95, 46325.12, 20097.99, 52168.07, 7325.45, 14348.98,
14348.98, 69991.1), top_bottom = structure(c(2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("bottom", "top"), class = "factor"),
Latitude = c(41.731373, 41.589428, 41.559017, 41.6143, 41.6002,
41.636208, 41.56153, 41.20982, 41.614517, 41.6145172, 41.597134
), Longitude = c(-93.58184, -93.80785, -93.77287, -93.834404,
-93.779236, -93.531876, -93.88083, -93.84298, -93.65789,
-93.65789, -93.49263)), .Names = c("Store_ID", "visits",
"CRIND_CC", "ISCC", "EBITDAR", "top_bottom", "Latitude", "Longitude"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11"))

Fitting gaussian to data geom_point in ggplot2

I have the following data set
structure(list(Collimator = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("n", "y"), class = "factor"), angle = c(0L,
15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L, 180L,
0L, 15L, 30L, 45L, 60L, 75L, 90L, 105L, 120L, 135L, 150L, 165L,
180L), X1 = c(2099L, 11070L, 17273L, 21374L, 23555L, 23952L,
23811L, 21908L, 19747L, 17561L, 12668L, 6008L, 362L, 53L, 21L,
36L, 1418L, 6506L, 10922L, 12239L, 8727L, 4424L, 314L, 38L, 21L,
50L), X2 = c(2126L, 10934L, 17361L, 21301L, 23101L, 23968L, 23923L,
21940L, 19777L, 17458L, 12881L, 6051L, 323L, 40L, 34L, 46L, 1352L,
6569L, 10880L, 12534L, 8956L, 4418L, 344L, 58L, 24L, 68L), X3 = c(2074L,
11109L, 17377L, 21399L, 23159L, 23861L, 23739L, 21910L, 20088L,
17445L, 12733L, 6046L, 317L, 45L, 26L, 46L, 1432L, 6495L, 10862L,
12300L, 8720L, 4343L, 343L, 38L, 34L, 60L), average = c(2099.6666666667,
11037.6666666667, 17337, 21358, 23271.6666666667, 23927, 23824.3333333333,
21919.3333333333, 19870.6666666667, 17488, 12760.6666666667,
6035, 334, 46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
10888, 12357.6666666667, 8801, 4395, 333.6666666667, 44.6666666667,
26.3333333333, 59.3333333333)), .Names = c("Collimator", "angle",
"X1", "X2", "X3", "average"), row.names = c(NA, -26L), class = "data.frame")
I first scale the average counts for both collimator y and n to make the highest count 1
df <- ddply(df, .(Collimator), transform,
norm.average = average / max(average))
and plot the curves:
ggplot(df, aes(x=angle,y=norm.average,col=Collimator)) +
geom_point() + geom_line()
Using geom_line is quite unpleasing on the eye and I would rather fit to the data using stat_smooth. Each data set should be symmetric about the mean so I think a Gaussian fit should be ideal. How can I fit a Gaussian to the dataset collimator="y" and collimator="n" in ggplot2 or using base R. Also I would like to output the mean and standard deviation. Can this be done?
Your data are not strictly Gaussian, only Gaussian-like in shape; here is one way to fit and visualize the curve:
fit <- dlply(df, .(Collimator), function(x) {
co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
})
ggplot(df, aes(x = angle, y = norm.average, col = Collimator)) + geom_point() + fit
Updated
To obtain the parameters:
fit <- dlply(df, .(Collimator), function(x) {
co <- coef(nls(norm.average ~ exp(-(angle - m)^2/(2 * s^2)), data = x, start = list(s = 50, m = 80)))
r <- stat_function(fun = function(x) exp(-(x - co["m"])^2/(2 * co["s"]^2)), data = x)
attr(r, ".coef") <- co
r
})
then,
> ldply(fit, attr, ".coef")
Collimator s m
1 n 52.99117 82.60820
2 y 21.99518 86.61268
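The same fit can be reproduced in base R for one group, without plyr or ggplot2. This is a sketch using the Collimator = "y" averages from the question, with the same model and starting values as the answer above; `d`, `grid` and `co` are illustrative names. The model fixes the peak height at 1 and the baseline at 0, so `m` plays the role of the mean and `s` the standard deviation of the fitted curve.

```r
# Collimator = "y" rows from the posted data
angle <- seq(0, 180, by = 15)
average <- c(46, 27, 42.6666666667, 1400.6666666667, 6523.3333333333,
             10888, 12357.6666666667, 8801, 4395, 333.6666666667,
             44.6666666667, 26.3333333333, 59.3333333333)

# Scale so that the highest count is 1, as in the question
d <- data.frame(angle = angle, norm.average = average / max(average))

# Nonlinear least squares fit of an unnormalized Gaussian shape
fit <- nls(norm.average ~ exp(-(angle - m)^2 / (2 * s^2)),
           data = d, start = list(s = 50, m = 80))
co <- coef(fit)   # named vector with elements "s" and "m"

# Smooth fitted curve on a fine grid, for plotting with lines()
grid <- data.frame(angle = seq(0, 180, by = 1))
grid$fitted <- predict(fit, newdata = grid)
```

With base graphics, plot(d$angle, d$norm.average) followed by lines(grid$angle, grid$fitted) draws the points and the fitted curve; summary(fit) additionally reports standard errors for `s` and `m`.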

melting multiple spans of variables

(Still) new to R, and very confused as to how I should accomplish multiple melts of my data. Here is a subset:
df <- structure(list(Subject = c(101L, 101L, 101L, 102L, 102L, 102L
), Condition = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("apass",
"vpas"), class = "factor"), FreqCode = structure(c(1L, 1L, 1L,
2L, 2L, 2L), .Label = c("LessVerbal", "MoreVerbal"), class = "factor"),
Item = c(1L, 4L, 7L, 1L, 4L, 7L), Len = c(80L, 68L, 85L,
68L, 85L, 79L), R1_1.RT = c(237L, 203L, 207L, 336L, 487L,
340L), R1_2.RT = c(177L, 225L, 162L, 634L, 590L, 347L), R1_3.RT = c(200L,
226L, 212L, 707L, 653L, 379L), R1.RT = c(614L, 654L, 581L,
1677L, 1730L, 1066L), R1_1 = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "The", class = "factor"), R1_2 = structure(c(3L,
1L, 2L, 1L, 2L, 4L), .Label = c("antique", "course", "new",
"road"), class = "factor"), R1_3 = structure(c(4L, 1L, 2L,
1L, 2L, 3L), .Label = c("car", "materials", "surfaces", "technology"
), class = "factor"), R1 = structure(c(3L, 1L, 2L, 1L, 2L,
4L), .Label = c("The antique car", "The course materials",
"The new technology", "The road surfaces"), class = "factor")), .Names = c("Subject",
"Condition", "FreqCode", "Item", "Len", "R1_1.RT", "R1_2.RT",
"R1_3.RT", "R1.RT", "R1_1", "R1_2", "R1_3", "R1"), class = "data.frame", row.names =
c(NA,
-6L))
My goal is to get output that (in part) looks like this:
Region RT WordRegion Word
R1_1.RT 237 R1_1 the
...
R1_2.RT 177 R1_2 new
...
EDIT: The variables ending with ".RT" (e.g., R1_1.RT) are Region names and will be melted into a Region column. The variables ending in numbers (e.g., R1_1) correspond exactly to the Region names and hold the associated words. I want them to be melted alongside the Region names so that I can analyze them in relation to the Region column.
In the first part of the code, I melt all of the values into a Region column and change the value to RT. This seems to work fine:
#long transform (with individual regions at end)
SmallMelt1 = melt(df, measure.vars = c("R1_1.RT", "R1_2.RT", "R1_3.RT", "R1.RT"), var = "Region")
#change newly created column name to "RT" (note:you have to change the number in [] to match your data)
colnames(SmallMelt1)[11 ] <- "RT"
But I don't get how to simultaneously melt another span of variables such that they will line up vertically with the first span. I want to do something like this, after the first melt, but it does not work:
#Second Melt for region names (doesn't work)
SmallMelt2 = melt(SmallMelt1, measure.vars = c("R1_1", "R1_2", "R1_3", "R1"), var = "WordRegion")
#Change name to Word
colnames(SmallMelt2)[9] <- "Word" #add col number for "value" here
Please let me know if you need any clarification. I hope someone can help... thanks in advance - DT
So, after consulting with someone off-list, I found the solution. My mistake was that I was trying to run the second step on the output of the first step. By running the two steps independently on the original data and then concatenating, I get the right result.
SmallMelt1 = melt(df, measure.vars = c("R1_1.RT", "R1_2.RT", "R1_3.RT", "R1.RT"), var = "Region")
SmallMelt2 = melt(df, measure.vars = c("R1_1", "R1_2", "R1_3", "R1"), var = "WordRegion")
SmallMelt3=cbind(SmallMelt1,SmallMelt2[,11])
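The cbind() step relies on both melts returning rows in the same order and on the value column sitting at position 11. An alternative is to pair each RT column with its word column explicitly. This is a base R sketch, not the reshape2 approach used above, and it assumes a cut-down `df` with two subjects and two regions taken from the posted data; `regions` and `long` are illustrative names.

```r
# Cut-down version of the posted data: two subjects, two regions
df <- data.frame(
  Subject = c(101L, 102L),
  R1_1.RT = c(237L, 336L),
  R1_2.RT = c(177L, 634L),
  R1_1    = c("The", "The"),
  R1_2    = c("new", "antique"),
  stringsAsFactors = FALSE
)

regions <- c("R1_1", "R1_2")

# For each region, take the RT column and the word column together,
# so the pairing never depends on row order or column position
long <- do.call(rbind, lapply(regions, function(r) {
  data.frame(
    Subject    = df$Subject,
    Region     = paste0(r, ".RT"),
    RT         = df[[paste0(r, ".RT")]],
    WordRegion = r,
    Word       = df[[r]],
    stringsAsFactors = FALSE
  )
}))
```

Each row of `long` then carries the reading time and the word for the same subject and region, which is the layout sketched in the desired output.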
