Deleting duplicate data & replacing missing values in certain areas - R

So I have been looking at this data set, which was originally an Excel sheet. Once the data set is loaded into RStudio, I have a few issues.
First of all, I changed all the blank cells into NA, but once I run
CarparkData[is.na(CarparkData)] <- 0
it only changes the data which was originally NA, not the cells that were blank.
Secondly, deleting duplicate data: I used the following code and nothing happened.
library("dplyr")
install.packages("tidyverse")
library(tidyverse)
x <- CarparkData
duplicated(x)        # logical vector flagging repeated rows
x[duplicated(x), ]   # view only the duplicated rows
x[!duplicated(x), ]  # the data with duplicates removed
As I have a column for Date and Time, I would like to use it as the column for deleting duplicated rows: some rows hold the same values but at different times (those should stay), whereas rows that match on both the values and the Date and Time are true duplicates.
And thirdly, replacing missing values.
Some of the data has FULL written in it, and I would like to home in on one column and change FULL to the number that means full for that specific car park, i.e. change the FULL cells in that column only and not all the FULL cells.
Sample Data
> dput(head(CarparkData))
structure(list(Parnell = c(188L, 183L, 185L, 229L, 237L, 272L
), Ilac = c(665, 683, 694, 769, 786, 839), Jervis = c(421, 408,
403, 417, 423, 455), Arnotts = c(340, 344, 350, 359, 359, 355
), Malboro = c(160L, 160L, 156L, 157L, 173L, 207L), Abbey = c(0,
0, 0, 0, 0, 0), `Thomas Street` = c(173, 173, 173, 186, 189,
198), `Christ Church` = c(77, 76, 74, 73, 83, 91), Setanta = structure(c(24L,
23L, 23L, NA, NA, 46L), .Label = c("10", "100", "101", "102",
"103", "104", "107", "108", "110", "111", "112", "113", "114",
"115", "120", "123", "125", "128", "129", "131", "14", "17",
"19", "21", "24", "27", "28", "29", "30", "31", "32", "34", "36",
"39", "40", "44", "45", "47", "48", "51", "52", "53", "56", "57",
"6", "60", "63", "66", "67", "7", "70", "72", "74", "78", "79",
"80", "81", "82", "84", "85", "86", "89", "9", "91", "92", "93",
"94", "96", "98", "FULL"), class = "factor"), Dawson = c(70,
87, 83, 118, 122, 140), Trinity = c(142L, 143L, 145L, 165L, 167L,
191L), Greenrcs = structure(c(NA, 8L, 9L, NA, 4L, 5L), .Label = c("1125",
"157", "205", "250", "262", "264", "266", "267", "270", "296",
"305", "311", "319", "320", "324", "327", "342", "347", "350",
"353", "364", "371", "374", "375", "378", "379", "459", "463",
"591", "729", "754", "761", "879", "902", "903", "907", "911",
"913", "916", "917", "922", "931", "944", "955", "974", "985",
"FULL"), class = "factor"), Drury = c(148, 143, 147, 182, 193,
235), `Brown Thomas` = c(230, 231, 0, 267, 272, 293), `Date & Time` = structure(1:6, .Label = c("2019-03-19 13:43:33",
"2019-03-19 13:55:39", "2019-03-19 14:07:35", "2019-03-19 15:45:02",
"2019-03-19 16:00:02", "2019-03-19 16:45:03", "2019-03-19 17:00:02",
"2019-03-19 17:45:03", "2019-03-19 18:00:01", "2019-03-19 18:00:02",
"2019-03-19 18:45:03", "2019-03-19 19:00:01", "2019-03-19 19:00:02",
"2019-03-19 19:07:12", "2019-03-19 19:45:03", "2019-03-19 20:00:01",
"2019-03-19 20:00:02", "2019-03-19 20:45:03", "2019-03-19 21:00:01",
"2019-03-19 21:00:03", "2019-03-19 21:45:04", "2019-03-19 22:00:01",
"2019-03-19 22:00:03", "2019-03-19 22:45:04", "2019-03-19 23:00:01",
"2019-03-19 23:00:02", "2019-03-19 23:00:03", "2019-03-19 23:45:04",
"2019-03-20 00:00:01", "2019-03-20 00:00:02", "2019-03-20 00:00:03",
"2019-03-20 00:45:04", "2019-03-20 01:00:01", "2019-03-20 01:00:02",
"2019-03-20 01:00:03", "2019-03-20 01:45:04", "2019-03-20 02:00:01",
"2019-03-20 02:00:02", "2019-03-20 02:00:03", "2019-03-20 02:45:04",
"2019-03-20 03:00:01", "2019-03-20 03:00:02", "2019-03-20 03:00:03",
"2019-03-20 03:45:05", "2019-03-20 04:00:01", "2019-03-20 04:00:02",
"2019-03-20 04:00:04", "2019-03-20 04:45:05", "2019-03-20 05:00:01",
"2019-03-20 05:00:02",
Thanks.

First issue... if you want to explicitly set all empty cells to NA, you can use a custom function like the following:
empty_as_na <- function(x) {
  if ("factor" %in% class(x)) x <- as.character(x)  # since ifelse() won't work with factors
  ifelse(as.character(x) != "", x, NA)
}
And then apply this function:
dplyr::mutate_all(df, .funs = empty_as_na)
where df is your data frame.
Second issue... for removing duplicate rows, you should look at dplyr::distinct()
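For the Date & Time case specifically, a minimal sketch (the backtick-quoted column name comes from the question's dput; the toy data below stands in for CarparkData): distinct() can deduplicate on just that column while keeping the rest.

```r
library(dplyr)

# Toy stand-in for CarparkData: rows 2 and 3 share the same timestamp
CarparkData <- data.frame(
  Parnell = c(188, 183, 183),
  `Date & Time` = c("2019-03-19 13:43:33",
                    "2019-03-19 13:55:39",
                    "2019-03-19 13:55:39"),
  check.names = FALSE
)

# Keep the first row per distinct `Date & Time`; .keep_all = TRUE retains
# the other columns instead of returning only the key column.
deduped <- CarparkData %>% distinct(`Date & Time`, .keep_all = TRUE)
nrow(deduped)  # 2
```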
Third issue... I did not get what the issue is... maybe you could clarify?
I am sorry, I can't give you a complete working example with the data you provided... but these functions should get you where you want.
EDIT
Solution for the third issue, based on the comments...
Probably not the most elegant solution, but again, this is limited by the reprex not being provided.
Let df be your data frame, column_new your new column, column_number the column you mentioned that has numbers or FULL written in it, and column_car the column identifying the car park.
df %>%
  mutate(
    column_new = case_when(
      column_number == "FULL" & column_car == "car_a" ~ 300,
      column_number == "FULL" & column_car == "car_b" ~ 500,
      # all case_when() branches must return the same type, so convert
      # the factor column to numeric in the fall-through branch
      TRUE ~ as.numeric(as.character(column_number))
    )
  )


Error in as.vector(data) when creating a mask for SECR

I am trying to create a mask in SECR using a .shp file. I always get this error when I use make.mask():
Error in as.vector(data) : no method for coercing this S4 class to a vector
This is my code:
fence <- rgdal::readOGR('/Rdata/SECR', layer = 'building')
OGR data source with driver: ESRI Shapefile Source: "/Rdata/SECR",
layer: "building" with 2492 features
It has 1 fields
library(secr)
#This is secr 3.2.1. For overview type ?secr
> qmask = make.mask(Quenda_traps,
+ buffer = 300,
+ type = "trapbuffer",
+ poly = fence,
+ poly.habitat = "FALSE")
Error in as.vector(data) : no method for coercing this S4 class to
a vector
dput(Quenda_traps)
structure(list(x = c(390576.21, 390637.85, 390594.93, 390528.49,
390646.58, 390488.12, 390681.01, 390499.98, 390632.29, 390677.26,
390642.7, 390710.33, 390690.37, 390741.81, 390588.01, 390655.06,
390575.97, 390246.66, 390340.13, 390236.33, 390309.59, 390295.93,
390164.11, 390065.71, 390120.42, 390117.17, 390091.7, 389875.57,
390179.69, 390157.45, 390164.94, 390151.02, 390172.17, 390246.28,
390263.25, 390256.32, 390308.2, 390135.06, 390093.3, 389914.13,
389916.76, 389869.37, 389809.17, 389782.5, 389818.78, 389802.75,
389818.78, 389771.52, 389792.74, 389791.25, 389905.36, 389832.62,
389886.16, 389863.21, 389908.68, 389912.46, 389902.05, 389528.11,
389661.6, 389689.54, 389657.88, 389678.71, 389569.25, 389618.44,
389564.87, 389615.37, 389662.18, 389630.96, 389713.09, 389654.91,
389744.37, 389762.02, 389715.87, 389696.2), y = c(6451727.44,
6451613.91, 6451566.89, 6451511.85, 6451416.66, 6451402.77, 6451287.32,
6451177.83, 6451164.84, 6451108.78, 6451188.57, 6450929.54, 6450855.04,
6450723.66, 6450716.47, 6450451.11, 6450343.83, 6451821.46, 6451645.08,
6451553.05, 6451588.21, 6451541.5, 6451509.03, 6451442.56, 6451358.89,
6451222.49, 6451221.11, 6451303.03, 6451115.63, 6450989, 6450994.62,
6450797.13, 6450761.88, 6450717.22, 6450719.62, 6450399.14, 6450403.03,
6450352.38, 6450197.83, 6451841.15, 6451684.86, 6451526.92, 6451419.83,
6451441.72, 6451316.83, 6451367.43, 6451316.83, 6451235.39, 6451105.9,
6450981.72, 6450992.93, 6450910.1, 6450935.07, 6450787.37, 6450685.86,
6450684.79, 6450600.42, 6451656.26, 6451534.65, 6451395.26, 6451267.42,
6451262.1, 6451257.59, 6451248.14, 6451138.91, 6451096.22, 6451132.2,
6450964.46, 6450964.24, 6450844.98, 6450864.8, 6450717.54, 6450620.58,
6450519.48)), class = c("traps", "data.frame"), row.names = c("1001",
"1002", "1003", "1004.1", "1004.2", "1005.1", "1005.2", "1006.1",
"1006.2", "1006.3", "1006.4", "1007", "1008", "1009", "1010",
"1011", "1012", "2001", "2002", "2003.1", "2003.2", "2003.3",
"2004", "2005", "2006", "2007.1", "2007.2", "2008.1", "2008.2",
"2009.1", "2009.2", "2010.1", "2010.2", "2011.1", "2011.2", "2012.1",
"2012.2", "2013", "2014", "3001", "3002", "3003", "3004.1", "3004.2",
"3005.1", "3005.2", "3005.3", "3006", "3007", "3008.1", "3008.2",
"3009.1", "3009.2", "3010", "3011.1", "3011.2", "3012", "4001",
"4002", "4003", "4004.1", "4004.2", "4005.1", "4005.2", "4006",
"4007.1", "4007.2", "4008.1", "4008.2", "4009.1", "4009.2", "4010",
"4011", "4012"), detector = "multi", usage = structure(c(1, 1,
1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 0, 1... ), .Dim = c(74L, 85L), .Dimnames = list(
c("1001", "1002", "1003", "1004.1", "1004.2", "1005.1", "1005.2",
"1006.1", "1006.2", "1006.3", "1006.4", "1007", "1008", "1009",
"1010", "1011", "1012", "2001", "2002", "2003.1", "2003.2",
"2003.3", "2004", "2005", "2006", "2007.1", "2007.2", "2008.1",
"2008.2", "2009.1", "2009.2", "2010.1", "2010.2", "2011.1",
"2011.2", "2012.1", "2012.2", "2013", "2014", "3001", "3002",
"3003", "3004.1", "3004.2", "3005.1", "3005.2", "3005.3",
"3006", "3007", "3008.1", "3008.2", "3009.1", "3009.2", "3010",
"3011.1", "3011.2", "3012", "4001", "4002", "4003", "4004.1",
"4004.2", "4005.1", "4005.2", "4006", "4007.1", "4007.2",
"4008.1", "4008.2", "4009.1", "4009.2", "4010", "4011", "4012"
), c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19", "20", "21",
"22", "23", "24", "25", "26", "27", "28", "29", "30", "31",
"32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51",
"52", "53", "54", "55", "56", "57", "58", "59", "60", "61",
"62", "63", "64", "65", "66", "67", "68", "69", "70", "71",
"72", "73", "74", "75", "76", "77", "78", "79", "80", "81",
"82", "83", "84", "85"))), spacex = 0.240000000048894, spacey = 0.21999999973923, spacing = 79.5867492192872)
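No answer was posted for this question. One detail that stands out in the call above (an observation, not a confirmed fix for the S4 coercion error, which typically points at the object passed to poly): poly.habitat is documented as a logical in secr, but the call passes the string "FALSE". A sketch of the cleaned-up call:

```r
library(secr)

# Pass a logical, not the quoted string "FALSE"; also check that `fence`
# (the SpatialPolygonsDataFrame returned by readOGR) uses the same
# coordinate system as the trap locations before building the mask.
qmask <- make.mask(Quenda_traps,
                   buffer = 300,
                   type = "trapbuffer",
                   poly = fence,
                   poly.habitat = FALSE)
```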

Anova loop in R

I'm currently running a lot of ANOVAs in R; my code is shown below the data.
My data look like this:
structure(list(Vial_nr = c(151L, 151L, 151L, 162L), Concentration = structure(c(1L,
1L, 1L, 1L), .Label = c("a", "b", "c", "d", "e", "x", "y"), class = "factor"),
Line = structure(c(1L, 1L, 1L, 1L), .Label = c("20", "23",
"40", "73"), class = "factor"), Sex = structure(c(1L, 1L,
1L, 1L), .Label = c("f", "m"), class = "factor"), Fly = structure(c(1L,
2L, 3L, 1L), .Label = c("1", "2", "3"), class = "factor"),
Temp = structure(c(1L, 1L, 1L, 1L), .Label = c("23", "29"
), class = "factor"), X0.5_sec = c(51.84, 41.28, 8.64, 28.8
), X1_sec = c(41.76, 8.64, 10.56, 42.72), X1.5_sec = c(42.72,
17.28, 10.08, 57.12), X2_sec = c(51.36, 29.76, 19.68, 71.52
), X2.5_sec = c(52.8, 44.64, 39.36, 69.12), X3_sec = c(55.68,
52.8, 58.08, 82.56), Vial = structure(c(138L, 138L, 138L,
149L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20", "21", "22", "23", "24", "25", "26", "27", "28",
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38",
"40", "41", "42", "43", "47", "49", "50", "56", "57", "59",
"61", "62", "63", "64", "66", "67", "68", "69", "70", "71",
"72", "73", "74", "75", "76", "77", "78", "79", "80", "81",
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91",
"92", "93", "94", "95", "96", "97", "98", "99", "100", "101",
"102", "103", "104", "105", "106", "107", "108", "109", "110",
"111", "112", "113", "114", "115", "116", "117", "118", "119",
"120", "121", "122", "123", "124", "125", "126", "127", "128",
"129", "130", "131", "132", "133", "134", "135", "136", "137",
"138", "139", "140", "141", "142", "143", "144", "145", "146",
"147", "148", "149", "150", "151", "152", "153", "154", "155",
"156", "157", "158", "159", "160", "161", "162", "163", "164",
"165", "166", "167", "168", "169", "170", "171", "172", "173",
"174", "175", "176", "177", "178", "179", "180", "181", "182",
"183", "184", "185", "186", "187", "188", "189", "190", "191",
"192", "193", "194", "195", "196", "197", "198", "199", "200",
"201", "202", "203", "204", "205", "206", "207", "208", "209",
"210", "211", "212", "213", "214", "215", "216", "217", "218",
"219", "220", "221", "222", "223", "224", "225", "226", "227",
"228", "229", "230", "231", "232", "233", "234", "235", "236",
"237", "238", "239", "240", "241", "242", "243", "244", "245",
"246", "247", "248", "249", "250", "251", "252", "253", "254",
"255", "256", "257", "258", "259", "260", "261", "262", "263",
"264", "265", "266", "267", "268", "269", "270", "271", "273",
"274", "275", "276", "277", "278", "279", "280", "461", "462",
"463", "464", "465", "466", "467", "468", "469", "470", "471",
"472", "473", "474", "475", "476", "477", "478", "479", "480",
"481", "482", "483", "484", "485", "486", "487", "488", "489",
"490", "491", "492", "493", "494", "495", "496", "497", "498",
"499", "500", "501", "502", "503", "504", "505", "506", "507",
"508", "509", "510", "511", "512", "513", "514", "515", "516",
"517", "518", "519", "520", "521", "522", "523", "524", "525",
"526", "527", "528", "529", "530", "531", "532", "533", "534",
"535", "536", "537", "538", "539", "540", "541", "542", "543",
"544", "545", "546", "547", "548", "549", "550"), class = "factor")), .Names = c("Vial_nr",
"Concentration", "Line", "Sex", "Fly", "Temp", "X0.5_sec", "X1_sec",
"X1.5_sec", "X2_sec", "X2.5_sec", "X3_sec", "Vial"), row.names = 4:7, class = "data.frame")
dat <- read.table("Complete RING.txt", header =TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$Fly <- as.factor(dat$Fly)
dat$Temp <- as.factor(dat$Temp)
str(dat)
Line20af <- subset(dat, Line=="20" & Concentration=="a" & Sex=="f")
Line20bf <- subset(dat, Line=="20" & Concentration=="b" & Sex=="f")
Line20cf <- subset(dat, Line=="20" & Concentration=="c" & Sex=="f")
Line20df <- subset(dat, Line=="20" & Concentration=="d" & Sex=="f")
Line20ef <- subset(dat, Line=="20" & Concentration=="e" & Sex=="f")
Line20xf <- subset(dat, Line=="20" & Concentration=="x" & Sex=="f")
Line20yf <- subset(dat, Line=="20" & Concentration=="y" & Sex=="f")
out20 <- matrix(ncol=6, nrow=7)
colnames(out20) <- colnames(Line20af)[7:12]
rownames(out20) <- paste0("Concentration", 1:7)
for(i in 1:6){
tmp <- data.frame(Line20af[,6+i],Line20af$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[1,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20bf[,6+i],Line20bf$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[2,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20cf[,6+i],Line20cf$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[3,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20df[,6+i],Line20df$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[4,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20ef[,6+i],Line20ef$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[5,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20xf[,6+i],Line20xf$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[6,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
tmp <- data.frame(Line20yf[,6+i],Line20yf$Temp)
colnames(tmp)<-c("y","c")
fit1=lm(y~c,data=tmp)
fit2=lm(y~1,data=tmp)
out20[7,i]<-anova(fit1,fit2)$"Pr(>F)"[2]
}
xtable(out20)
But doing this many times by copy-paste seems really dumb. I have tried making temporary data frames so I don't end up with too many data sets, but that hasn't yielded any results. I have also tried a for loop using unique() on Concentration to create the new data sets, but then R doesn't print any results. Is there a way to optimize this script so it streamlines my workflow without overflowing my workspace?
EDIT: added a dput of a data snippet
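No answer was posted here. As a sketch only (assuming the question's column layout, with the six response columns in positions 7 to 12 and dat prepared as in the code above), the seven copy-pasted blocks differ only in the Concentration level, so they collapse into nested loops:

```r
concs <- c("a", "b", "c", "d", "e", "x", "y")   # the Concentration levels used above

out20 <- matrix(ncol = 6, nrow = length(concs))
colnames(out20) <- colnames(dat)[7:12]
rownames(out20) <- paste0("Concentration", concs)

for (j in seq_along(concs)) {
  sub <- subset(dat, Line == "20" & Concentration == concs[j] & Sex == "f")
  for (i in 1:6) {
    tmp <- data.frame(y = sub[, 6 + i], c = sub$Temp)
    fit1 <- lm(y ~ c, data = tmp)   # model with a Temp effect
    fit2 <- lm(y ~ 1, data = tmp)   # intercept-only model
    out20[j, i] <- anova(fit1, fit2)$"Pr(>F)"[2]
  }
}
```

The same idea extends to looping over Line and Sex as well, storing each result matrix in a named list instead of separate out20-style objects.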

Counting the Occurence of Hexadecimal Numbers - R

So I have a file which contains a large number of hexadecimal digits in pairs, plus an 'NA'/missing-data symbol of "??".
A4 BB 08 6F E7 88 D9 10 11 12 AC CB C8 CC #Row of data in the file.
?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? #Row of missing data in the file.
I'm attempting to pipe all of that in and get some insight into the frequency of each hexadecimal value from 00 to FF. So far I have read it into a structure using read.table() (call it test), and I'm really not sure what to do from there. I've tried a number of different things to suppress the lines with "??" in any column and then convert the rest to hex values and get something useful out of this. If anyone can point me towards the tools I need to complete this task, I'd much appreciate it.
Edit:
As per request the output of dput.
structure(list(V2 = structure(c(88L, 209L, 124L, 91L, 132L, 235L
), .Label = c("??", "00", "01", "02", "03", "04", "05", "06",
"07", "08", "09", "0A", "0B", "0C", "0D", "0E", "0F", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19", "1A", "1B", "1C",
"1D", "1E", "1F", "20", "21", "22", "23", "24", "25", "26", "27",
"28", "29", "2A", "2B", "2C", "2D", "2E", "2F", "30", "31", "32",
"33", "34", "35", "36", "37", "38", "39", "3A", "3B", "3C", "3D",
"3E", "3F", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "4A", "4B", "4C", "4D", "4E", "4F", "50", "51", "52", "53",
"54", "55", "56", "57", "58", "59", "5A", "5B", "5C", "5D", "5E",
"5F", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69",
"6A", "6B", "6C", "6D", "6E", "6F", "70", "71", "72", "73", "74",
"75", "76", "77", "78", "79", "7A", "7B", "7C", "7D", "7E", "7F",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "8A",
"8B", "8C", "8D", "8E", "8F", "90", "91", "92", "93", "94", "95",
"96", "97", "98", "99", "9A", "9B", "9C", "9D", "9E", "9F", "A0",
"A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "AA", "AB",
"AC", "AD", "AE", "AF", "B0", "B1", "B2", "B3", "B4", "B5", "B6",
"B7", "B8", "B9", "BA", "BB", "BC", "BD", "BE", "BF", "C0", "C1",
"C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "CA", "CB", "CC",
"CD", "CE", "CF", "D0", "D1", "D2", "D3", "D4", "D5", "D6", "D7",
"D8", "D9", "DA", "DB", "DC", "DD", "DE", "DF", "E0", "E1", "E2",
"E3", "E4", "E5", "E6", "E7", "E8", "E9", "EA", "EB", "EC", "ED",
"EE", "EF", "F0", "F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8",
"F9", "FA", "FB", "FC", "FD", "FE", "FF"), class = "factor"),
There are a number of other columns as well. I left them off as they have the same ~257 label values, give or take a hex value here or there.
as.hexmode(names(test)) resulted in the same issue: couldn't coerce 'x' to hexmode.
Edit: Okay, I had some success and got it to do what I wanted, more or less.
First I wanted to merge the columns, as I just wanted an overall count of the occurrences (this may even have been unnecessary):
test2 <-
c(as.character(test[,1]),as.character(test[,2]),as.character(test[,3]),as.character(test[,4]),
as.character(test[,5]), as.character(test[,6]), as.character(test[,7]),
as.character(test[,8]), as.character(test[,9]), as.character(test[,10]),
as.character(test[,11]), as.character(test[,12]), as.character(test[,13]),
as.character(test[,14]), as.character(test[,15]), as.character(test[,16]))
Then I just wanted the counts of each value:
table(test2)
No conversion to integers or any such shenanigans necessary. I feel more than a little dumb, but oh well. I am still curious, though, whether there's a better way to get the overall count of each value across all rows and columns, as the way I did it seems clunky.
Edit:
The ultimate answer was (going with my original naming convention):
table(unlist(lapply(test, as.character)))
Thank you BondedDust.
See if you get some success with:
as.hexmode ( names(test) )
The output you offer suggests a table object has been created and the first row would be the names (in character mode) of the entries seen below those hex characters. It remains unclear whether you are showing the content of an external text file or output on the console, so this may be a WAG.
> res <- scan(what="")
1: A4 BB 08 6F E7 88 D9 10 11 12 AC CB C8 CC
15:
Read 14 items
> as.hexmode(res)
[1] "a4" "bb" "08" "6f" "e7" "88" "d9" "10" "11" "12" "ac" "cb" "c8" "cc"
> dput( as.hexmode(res) )
structure(c(164L, 187L, 8L, 111L, 231L, 136L, 217L, 16L, 17L,
18L, 172L, 203L, 200L, 204L), class = "hexmode")
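Putting the pieces from this thread together, a self-contained sketch (toy data standing in for the file) that also drops the "??" marker before counting:

```r
# Toy stand-in for the data frame read with read.table()
test <- data.frame(V1 = c("A4", "??", "A4"),
                   V2 = c("BB", "??", "08"),
                   stringsAsFactors = TRUE)

# Count every pair across all rows and columns, then drop the missing-data symbol
counts <- table(unlist(lapply(test, as.character)))
counts <- counts[names(counts) != "??"]
counts  # 08 and BB once each, A4 twice
```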

How to calculate the mean of one variable (sales) in respect to another (id)?

I have a data set that looks like this:
id         date         sales
19164958   2001-09-01   30
39578413   2001-09-01   75.6
There are about 65k observations in the data set. The data is structured in 4 columns: id (non-consecutive, in the range of 10 to 80 million), churn, date and sales. It describes the spending of all customers over about 3/4 of a year.
Now I shall calculate the average spending of each customer. I have been given this code:
aggr.data <- merge(data[, lapply(.SD, mean), by = c("id"),.SDcols = c("sales")],
data[, lapply(.SD, mean), by = c("id"),.SDcols = c("sales")],
c("id", "sales"))
Now I have the problem that R does not know .SD.
Can anybody please tell me what I have to change to get the results? Or does anybody know what other commands I can use to get the average spending of each id?
Thank you for your help.
dput(head(tel))
structure(list(id = c(19164958L, 39578413L, 43061957L, 51326773L,
54271247L, 70765025L), churn = c(0L, 0L, 0L, 0L, 0L, 0L), date = structure(c(11566,
11566, 11566, 11566, 11566, 11566), class = "Date"), sales = structure(c(522L,
849L, 649L, 649L, 522L, 649L), .Label = c("100", "100.2", "100.4",
"100.6", "100.8", "101", "101.2", "101.4", "101.6", "101.8",
"102", "102.4", "102.8", "103", "103.2", "103.4", "103.6", "103.8",
"104", "104.2", "104.4", "104.8", "105", "105.2", "105.6", "105.8",
"106", "106.2", "106.4", "106.6", "106.8", "107", "107.2", "107.4",
"107.6", "108", "108.2", "108.4", "108.6", "108.8", "109", "109.2",
"109.4", "109.6", "109.8", "110", "110.2", "110.4", "110.8",
"111", "111.2", "111.4", "111.6", "111.8", "112", "112.4", "112.6",
"112.8", "113.2", "113.4", "113.6", "114", "114.2", "114.4",
"114.8", "115.2", "115.6", "116", "116.2", "116.4", "116.8",
"117", "117.2", "117.4", "117.6", "117.8", "118", "118.4", "118.8",
"119.2", "119.6", "119.8", "120", "120.4", "120.6", "120.8",
"121.2", "121.4", "121.6", "121.8", "122", "122.2", "122.4",
"122.8", "123", "123.2", "123.6", "123.8", "124", "124.4", "124.8",
"125", "125.2", "125.4", "125.6", "125.8", "126", "126.4", "126.8",
"127", "127.2", "127.6", "127.8", "128", "128.4", "128.8", "129",
"129.2", "129.4", "129.6", "130", "130.2", "130.4", "130.8",
"131.2", "131.4", "131.6", "131.8", "132", "132.4", "132.8",
"133.2", "133.4", "133.6", "133.8", "134", "134.4", "134.8",
"135", "135.2", "135.6", "135.8", "136", "136.2", "136.4", "136.8",
"137.2", "137.6", "138", "138.4", "138.6", "138.8", "139.2",
"139.6", "140", "140.2", "140.4", "140.8", "141.2", "141.4",
"141.6", "142", "142.2", "142.4", "142.6", "142.8", "143.2",
"143.6", "144", "144.2", "144.4", "144.6", "144.8", "145.2",
"145.4", "145.6", "146", "146.4", "146.6", "146.8", "147.2",
"147.6", "147.8", "148", "148.2", "148.4", "148.6", "148.8",
"149.2", "149.6", "149.8", "150", "150.2", "150.4", "150.8",
"151", "151.2", "151.6", "152", "152.4", "152.8", "153", "153.2",
"153.6", "154", "154.4", "154.6", "154.8", "155.2", "155.6",
"155.8", "156", "156.2", "156.4", "156.6", "156.8", "157.2",
"157.4", "157.6", "157.8", "158", "158.4", "158.8", "159.2",
"159.4", "159.6", "160", "160.2", "160.4", "160.8", "161.2",
"161.4", "161.6", "162", "162.4", "162.8", "163", "163.2", "163.6",
"163.8", "164", "164.4", "164.8", "165", "165.2", "165.6", "166",
"166.4", "166.8", "167.2", "167.4", "167.6", "168", "168.4",
"168.8", "169.2", "169.6", "170", "170.2", "170.4", "170.8",
"171", "171.2", "171.6", "172", "172.4", "172.8", "173.2", "173.6",
"173.8", "174", "174.4", "174.8", "175.2", "175.6", "175.8",
"176.4", "176.8", "177", "177.2", "177.6", "178", "178.2", "178.4",
"178.8", "179.2", "179.4", "179.6", "179.8", "180", "180.4",
"180.8", "181", "181.2", "181.6", "182", "182.2", "182.4", "182.8",
"183.2", "183.6", "183.8", "184", "184.4", "184.8", "185.2",
"185.6", "186", "186.4", "187.2", "187.4", "187.6", "187.8",
"188", "188.4", "188.8", "189.2", "189.6", "189.8", "190", "190.4",
"190.8", "191.6", "192", "192.4", "192.8", "193.2", "193.6",
"194", "194.4", "194.8", "195.2", "195.6", "196.4", "196.8",
"197.2", "197.6", "197.8", "198", "198.2", "198.4", "198.8",
"199.2", "199.6", "200", "200.4", "200.8", "201.2", "201.6",
"202", "202.4", "202.8", "203.6", "204", "204.4", "204.8", "205.6",
"206", "206.4", "206.6", "206.8", "207.2", "207.6", "208", "208.4",
"208.8", "209.2", "209.6", "209.8", "210", "210.4", "210.6",
"210.8", "211.2", "211.6", "212.8", "213.2", "213.6", "214",
"214.4", "214.8", "215.2", "215.4", "216", "216.2", "216.8",
"217", "217.2", "217.6", "218.4", "218.8", "219.2", "219.6",
"220", "221", "221.2", "221.6", "221.8", "222", "222.4", "223.2",
"223.6", "224.4", "224.8", "225.2", "225.6", "226", "226.2",
"226.4", "226.6", "227.2", "227.8", "228", "228.4", "228.8",
"229.2", "229.6", "229.8", "230", "230.4", "230.8", "231.6",
"232.2", "232.4", "232.8", "233.2", "233.6", "234.4", "234.8",
"235.6", "235.8", "236", "237.2", "237.4", "237.6", "238", "239.2",
"240.4", "240.8", "241.2", "241.6", "242", "242.4", "243.4",
"243.6", "244.6", "245.2", "245.6", "246", "246.4", "247.2",
"248", "249.6", "250", "250.4", "250.8", "251.2", "251.6", "252.8",
"254.4", "254.8", "255.2", "255.4", "255.6", "256", "256.4",
"256.8", "257.2", "257.6", "258.8", "259.2", "260", "261.6",
"262", "262.4", "262.8", "263.2", "263.6", "264", "264.4", "264.8",
"266", "266.8", "267.2", "267.6", "268.4", "270", "270.2", "270.4",
"271", "271.2", "271.6", "272.4", "272.8", "273.2", "274", "274.4",
"275.2", "275.6", "276", "276.8", "278.8", "279.2", "279.6",
"280", "281.6", "282", "282.6", "283.2", "284.8", "285.6", "287.2",
"289.6", "290.4", "291.2", "293.2", "295.2", "296", "296.8",
"298", "299", "30", "30.2", "30.4", "30.6", "30.8", "300.8",
"301.2", "301.6", "302.8", "303.6", "304", "304.4", "305.2",
"306", "306.4", "307.2", "308.8", "309.2", "31", "31.2", "31.4",
"31.6", "31.8", "310.8", "313.2", "313.6", "314", "315", "315.6",
"316", "316.4", "316.8", "317", "318.4", "319.6", "32", "32.2",
"32.4", "32.6", "32.8", "322", "324.8", "326.4", "326.8", "327.2",
"328.4", "329.2", "329.6", "33", "33.2", "33.4", "33.6", "33.8",
"331.6", "332.4", "332.8", "334", "338.4", "338.6", "339.2",
"34", "34.2", "34.4", "34.6", "34.8", "340", "341.2", "342",
"342.4", "347.2", "347.6", "35", "35.2", "35.4", "35.6", "35.8",
"350", "352.8", "353.2", "354", "354.8", "355.6", "357.6", "36",
"36.2", "36.4", "36.6", "36.8", "360.8", "361.6", "362", "362.4",
"363.6", "365.6", "367.6", "368", "368.4", "369.6", "37", "37.2",
"37.4", "37.6", "37.8", "371.6", "372.4", "375.6", "377", "38",
"38.2", "38.4", "38.6", "38.8", "382.6", "384.8", "385.2", "387.2",
"388", "388.4", "39", "39.2", "39.4", "39.6", "39.8", "390.4",
"391.2", "397.6", "399.6", "40", "40.2", "40.4", "40.6", "40.8",
"405.2", "408.8", "41", "41.2", "41.4", "41.6", "41.8", "411.6",
"414.4", "419.2", "42", "42.2", "42.4", "42.6", "42.8", "43",
"43.2", "43.4", "43.6", "43.8", "430.2", "432.4", "437.2", "438",
"439.6", "44", "44.2", "44.4", "44.6", "44.8", "444.8", "45",
"45.2", "45.4", "45.6", "45.8", "450", "454", "455.6", "46",
"46.2", "46.4", "46.6", "46.8", "47", "47.2", "47.4", "47.6",
"47.8", "473.2", "474", "475.6", "48", "48.2", "48.4", "48.6",
"48.8", "482.4", "49", "49.2", "49.4", "49.6", "49.8", "50",
"50.2", "50.4", "50.6", "50.8", "500", "503.2", "51", "51.2",
"51.4", "51.6", "51.8", "52", "52.2", "52.4", "52.6", "52.8",
"521.6", "53", "53.2", "53.4", "53.6", "53.8", "54", "54.2",
"54.4", "54.6", "54.8", "55", "55.2", "55.4", "55.6", "55.8",
"550", "56", "56.2", "56.4", "56.6", "56.8", "57", "57.2", "57.4",
"57.6", "57.8", "58", "58.2", "58.4", "58.6", "58.8", "59", "59.2",
"59.4", "59.6", "59.8", "60", "60.2", "60.4", "60.6", "60.8",
"61", "61.2", "61.4", "61.6", "61.8", "62", "62.2", "62.4", "62.6",
"62.8", "63", "63.2", "63.4", "63.6", "63.8", "64", "64.2", "64.4",
"64.6", "64.8", "65", "65.2", "65.4", "65.6", "65.8", "66", "66.2",
"66.4", "66.6", "66.8", "67", "67.2", "67.4", "67.6", "67.8",
"68", "68.2", "68.4", "68.6", "68.8", "69", "69.2", "69.4", "69.6",
"69.8", "70", "70.2", "70.4", "70.6", "70.8", "71", "71.2", "71.4",
"71.6", "71.8", "72", "72.2", "72.4", "72.6", "72.8", "73", "73.2",
"73.4", "73.6", "73.8", "74", "74.2", "74.4", "74.6", "74.8",
"75", "75.2", "75.4", "75.6", "75.8", "76", "76.2", "76.4", "76.6",
"76.8", "77", "77.2", "77.4", "77.6", "77.8", "78", "78.2", "78.4",
"78.6", "78.8", "79", "79.2", "79.4", "79.6", "79.8", "80", "80.2",
"80.4", "80.6", "80.8", "81", "81.2", "81.4", "81.6", "81.8",
"82", "82.2", "82.4", "82.6", "82.8", "83", "83.2", "83.4", "83.6",
"83.8", "84", "84.2", "84.4", "84.6", "84.8", "85", "85.2", "85.4",
"85.6", "85.8", "86", "86.4", "86.6", "86.8", "87", "87.2", "87.6",
"87.8", "88", "88.2", "88.4", "88.6", "88.8", "89", "89.2", "89.6",
"89.8", "90", "90.2", "90.4", "90.6", "90.8", "91", "91.2", "91.4",
"91.6", "91.8", "92", "92.2", "92.4", "92.6", "92.8", "93", "93.2",
"93.4", "93.6", "93.8", "94", "94.2", "94.4", "94.6", "94.8",
"95", "95.2", "95.4", "95.6", "95.8", "96", "96.2", "96.4", "96.6",
"96.8", "97", "97.2", "97.4", "97.6", "97.8", "98", "98.2", "98.4",
"98.6", "98.8", "99", "99.2", "99.4", "99.6", "99.8"), class = "factor")), .Names = c("id",
"churn", "date", "sales"), row.names = c(NA, 6L), class = "data.frame")
mydf$sales <- as.numeric(as.character(mydf$sales))
Using base R
tapply(mydf$sales,mydf$id,mean)
where mydf is your dataframe
Using data.table package
library(data.table)
DT<-data.table(mydf)
DT[,mean(sales),by=id]
Using plyr package
library(plyr)
ddply(mydf, .(id), summarise, meansales = mean(sales))
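For completeness, a tiny self-contained check of the base-R route (toy numbers, not the questioner's file):

```r
# Toy stand-in: two customers, one of them with two purchases
mydf <- data.frame(id    = c(19164958, 19164958, 39578413),
                   sales = c(30, 40, 75.6))

# Mean sales per id, named by the id values
avg <- tapply(mydf$sales, mydf$id, mean)
avg  # 19164958 -> 35, 39578413 -> 75.6
```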

Computing angle between two vectors (with one vector having a specific X,Y position)

I am trying to compute the angle between two vectors, where one vector is fixed and the other is constantly moving. I already know the math involved, and I found this code earlier:
theta <- acos( sum(a*b) / ( sqrt(sum(a * a)) * sqrt(sum(b * b)) ) )
I tried defining my a as:
a<-c(503,391)
and my b as:
b <- NM[, c("X","Y")]
When I apply the theta function I get:
Warning message:
In acos(sum(a * b)/(sqrt(sum(a * a)) * sqrt(sum(b * b)))) : NaNs produced
I would appreciate help to solve this.
And here is my sample data:
structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34",
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45",
"46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56",
"57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67",
"68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78",
"79", "80", "81", "82", "83", "84", "85", "86", "87", "88", "89",
"90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101", "102", "103", "104", "105", "106", "107", "108", "109",
"110"), class = "factor"), T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6 ), X =
c(528.04, 528.04, 528.04, 528.04, 528.04, 528.04), Y = c(10.32,
10.32, 10.32, 10.32, 10.32, 10.32), V = c(0, 0, 0, 0, 0, 0),
GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA,
0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1",
"t10", "t100", "t101", "t102", "t103", "t104", "t105", "t106",
"t107", "t108", "t109", "t11", "t110", "t12", "t13", "t14",
"t15", "t16", "t17", "t18", "t19", "t2", "t20", "t21", "t22",
"t23", "t24", "t25", "t26", "t27", "t28", "t29", "t3", "t30",
"t31", "t32", "t33", "t34", "t35", "t36", "t37", "t38", "t39",
"t4", "t40", "t41", "t42", "t43", "t44", "t45", "t46", "t47",
"t48", "t49", "t5", "t50", "t51", "t52", "t53", "t54", "t55",
"t56", "t57", "t58", "t59", "t6", "t60", "t61", "t62", "t63",
"t64", "t65", "t66", "t67", "t68", "t69", "t7", "t70", "t71",
"t72", "t73", "t74", "t75", "t76", "t77", "t78", "t79", "t8",
"t80", "t81", "t82", "t83", "t84", "t85", "t86", "t87", "t88",
"t89", "t9", "t90", "t91", "t92", "t93", "t94", "t95", "t96",
"t97", "t98", "t99"), class = "factor")), .Names = c("A", "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 6L),
class = "data.frame")
Your function is not vectorized. Try this:
theta <- function(x, Y) {
  apply(Y, 1, function(y, x) {
    acos(sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))))
  }, x = x)
}
a<-c(503,391)
b <- DF[, c("X","Y")]
theta(a,b)
# 1 2 3 4 5 6
#0.6412264 0.6412264 0.6412264 0.6412264 0.6412264 0.6412264
There is a problem with the acos and atan functions in this application: they cannot recover angles over the full circle, only over part of it. In 2D you need two values to specify a vector, and likewise you need two values (sin and cos) to pin down its direction up to 2*pi. Here is a demonstration of the acos problem:
plot(seq(1,10,pi/20)) ## A sequence of numbers
plot(cos(seq(1,10,pi/20))) ## Their cosines
plot(acos(cos(seq(1,10,pi/20)))) ## NOT Back to the original sequence
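A base-R alternative (a sketch, not part of the original answers) is atan2(), which takes both components and therefore recovers the full-circle angle:

```r
# Full-circle angle of the vector (x, y), in radians in [0, 2*pi)
angle2 <- function(x, y) atan2(y, x) %% (2 * pi)

angle2(1, 0)    # 0
angle2(0, 1)    # pi/2
angle2(-1, 0)   # pi
angle2(0, -1)   # 3*pi/2
```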
Here's an idea:
angle <- circular::coord2rad(x, y)
plot(angle)
where (x, y) are the coordinates of the point whose direction is angle, and
as.numeric(angle)
gives the angle in radians. To report geographical directions, convert to degrees in [0, 360), and so on, you can use the extra parameters of the circular functions, e.g.:
x <- coord2rad(ea,eo, control.circular = list(type = "directions",units = "degrees"))
plot(x)
as.numeric(x)
