I have a vector x :
x<-rpois(16,lambda=10)
and a lookup table named u3 :
minx<-min(x)
maxx<-max(x)
dg<-maxx-minx
k<-5
sa<-ceiling(dg/k)
u1=data.frame(seq(minx,maxx,sa))
colnames(u1)<-"x"
u2<-NULL
for (i in 1:k)
{
u2[i]<-u1[i,] + sa-1
}
u2<-as.data.frame(u2)
colnames(u2)<-"y"
u3<-cbind(u1,u2)
for(i in 1:nrow(u2))
{
u3$range[i]<-paste(u1[i,],u2[i,],sep="-")
}
print(u3)
My u3 data frame looks like this:
x y range
1 3 5 3-5
2 6 8 6-8
3 9 11 9-11
4 12 14 12-14
5 15 17 15-17
I want to do a calculation here:
For every value in x, I want to look up which range in u3 (columns 1 and 2) it falls into.
If the value is within a range, it should be counted, and the count for each range should be written to u3 as a new column.
Something like this:
count=0
for(i in 1:length(x))
{
for(j in 1:nrow(u3))
{
u3$count[j]<-if(x[i]>=u3[j,1] & x[i]<=u3[j,2]) {count=count + 1}
}
}
but I can't get it to work.
Do you have any idea how to approach such a problem?
I don't know how to tell R to look up each value in the lookup table dynamically and record the counts.
This is the kind of output I want:
x y range count
1 3 5 3-5 2
2 6 8 6-8 5
3 9 11 9-11 1
4 12 14 12-14 4
5 15 17 15-17 3
Thank you
With set.seed(0):
x
[1] 13 8 14 14 11 14 12 11 9 14 11 8 2 8 10 7
u3
x y range
1 2 4 2-4
2 5 7 5-7
3 8 10 8-10
4 11 13 11-13
5 14 16 14-16
cbind(u3,data.frame(table(findInterval(x,u3$x))))
x y range Var1 Freq
1 2 4 2-4 1 1
2 5 7 5-7 2 1
3 8 10 8-10 3 5
4 11 13 11-13 4 5
5 14 16 14-16 5 4
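Keep in mind that findInterval(x,u3$x) puts each value into [x_i, x_(i+1)), i.e. the lower bound is inclusive and the next lower bound is exclusive. It only works for your example because your ranges are contiguous integer ranges, so [x, y] covers the same values as [x, next x). Also note that table() drops bins that contain no values, so the cbind() only lines up when every range holds at least one value of x.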
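A minimal sketch of the same counting idea that also keeps ranges with zero hits. It assumes the x and u3 values printed above, hard-coded here so the result does not depend on the random number generator. tabulate() with nbins is swapped in for table(), and the explicit comparison at the end is only a cross-check against the condition in the question.

```r
# Values copied from the printed example above
x  <- c(13, 8, 14, 14, 11, 14, 12, 11, 9, 14, 11, 8, 2, 8, 10, 7)
u3 <- data.frame(x = c(2, 5, 8, 11, 14), y = c(4, 7, 10, 13, 16))
u3$range <- paste(u3$x, u3$y, sep = "-")

# For each value, the index of the last lower bound that is <= the value
bin <- findInterval(x, u3$x)

# tabulate() counts every bin from 1 to nbins, including empty ones
u3$count <- tabulate(bin, nbins = nrow(u3))

# Cross-check with the explicit lower <= value <= upper test from the question
check <- sapply(seq_len(nrow(u3)), function(j) sum(x >= u3$x[j] & x <= u3$y[j]))
stopifnot(all(u3$count == check))

u3$count
# [1] 1 1 5 5 4
```

If the ranges had gaps between them, the explicit sapply() comparison would be the safer of the two, since findInterval() only looks at the lower bounds.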
I want to create a table like:
1 1 6 6 10 10 ...
2 2 7 7 11 11 ...
3 3 8 8 12 12 ...
4 4 9 9 13 13 ...
5 5 14 14 ...
15 15 ...
I want to use the variables
n (number of repeats), m (total number of columns) and k (the start of each block, i.e. the previous column's end number + 1, for example 6 = 5 + 1 and 10 = 9 + 1), with a different number of rows per block,
to create the table.
I know I can use something like:
rep(list(1:5,6:9,10:15), each = 2)
but how can I turn this into parameters, i.e. build list(1:5,6:9,10:15,...) from a general expression in n, m and k?
I tried a loop, for (i in 1:m) etc., but could not work it out.
Finally, I want a single sequence by using unlist(): 1,2,3,4,5,6,1,2,3,4,5,6......
Many thanks.
Maybe the code below can help
len <- c(5,4,6)
res <- unlist(unname(rep(split(1:sum(len),
findInterval(1:sum(len),cumsum(len)+1)),
each = 2)))
which gives
> res
[1] 1 2 3 4 5 1 2 3 4 5 6 7 8 9 6 7 8 9 10 11 12 13 14 15 10 11 12 13 14 15
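To see why this works, here is a sketch of the intermediate steps, assuming the same len as above. The names breaks, grp and blocks are introduced only for illustration.

```r
len <- c(5, 4, 6)

# First value after the end of each block: 6 10 16
breaks <- cumsum(len) + 1

# Block id (0, 1, 2) for every position 1..sum(len)
grp <- findInterval(seq_len(sum(len)), breaks)
grp
# [1] 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2

# split() turns the ids into one integer vector per block
blocks <- unname(split(seq_len(sum(len)), grp))
lengths(blocks)
# [1] 5 4 6
```

rep(blocks, each = 2) then duplicates each block in place before unlist() flattens the result.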
Probably, something like this would be helpful.
#Number of times to repeat
r <- 2
#Length of each sequence
len <- c(5, 4, 6)
#Get the end of the sequence
end <- cumsum(len)
#Calculate the start of each sequence
start <- c(1, end[-length(end)] + 1)
#Create a sequence of start and end and repeat it r times
Map(function(x, y) rep(seq(x, y), r), start, end)
#[[1]]
# [1] 1 2 3 4 5 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 6 7 8 9
#[[3]]
# [1] 10 11 12 13 14 15 10 11 12 13 14 15
You could unlist to get it as one vector.
unlist(Map(function(x, y) rep(seq(x, y), r), start, end))
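The steps above could be wrapped into a function so that the block lengths and the number of repeats become parameters. This is only a sketch: rep_blocks is a hypothetical name, and it assumes every block length is at least 1.

```r
# Hypothetical helper: len = block lengths, r = number of repeats per block
rep_blocks <- function(len, r = 2) {
  end   <- cumsum(len)       # last value of each block
  start <- end - len + 1     # first value of each block
  unlist(Map(function(s, e) rep(seq(s, e), r), start, end))
}

res <- rep_blocks(c(5, 4, 6), r = 2)
res
#  [1]  1  2  3  4  5  1  2  3  4  5  6  7  8  9  6  7  8  9 10 11 12 13 14 15 10
# [26] 11 12 13 14 15
```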
Hi, I have a data frame with multiple columns: the first 5 columns are my metadata, and the remaining
columns (their count will always be even) are the actual columns used in the calculation.
Formula: (col6*col9) + (col7*col10) + (col8*col11)
country<-c("US","US","US","US")
name <-c("A","B","c","d")
dob<-c(2017,2018,2018,2010)
day<-c(1,4,7,9)
hour<-c(10,11,2,4)
a <-c(1,3,4,5)
d<-c(1,9,4,0)
e<-c(8,1,0,7)
f<-c(10,2,5,6)
j<-c(1,4,2,7)
m<-c(1,5,7,1)
df=data.frame(country,name,dob,day,hour,a,d,e,f,j,m)
How do I get the final sum if I have more columns?
I have tried the code below:
df$final <-(df$a*df$f)+(df$d*df$j)+(df$e*df$m)
Here is one way to generalize the computation:
x <- ncol(df) - 5
df$final <- rowSums(df[6:(5 + x/2)] * df[(ncol(df) - x/2 + 1):ncol(df)])
# country name dob day hour a d e f j m final
# 1 US A 2017 1 10 1 1 8 10 1 1 19
# 2 US B 2018 4 11 3 9 1 2 4 5 47
# 3 US c 2018 7 2 4 4 0 5 2 7 28
# 4 US d 2010 9 4 5 0 7 6 7 1 37
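The same idea can be sketched as a small function that splits the value columns into a first and a second half. paired_sum and n_meta are hypothetical names, and the sketch assumes the value columns are numeric and come after the metadata columns.

```r
df <- data.frame(
  country = c("US", "US", "US", "US"), name = c("A", "B", "c", "d"),
  dob = c(2017, 2018, 2018, 2010), day = c(1, 4, 7, 9), hour = c(10, 11, 2, 4),
  a = c(1, 3, 4, 5), d = c(1, 9, 4, 0), e = c(8, 1, 0, 7),
  f = c(10, 2, 5, 6), j = c(1, 4, 2, 7), m = c(1, 5, 7, 1)
)

# Hypothetical helper: n_meta is the number of leading metadata columns
paired_sum <- function(data, n_meta = 5) {
  vals <- data[-seq_len(n_meta)]            # drop the metadata columns
  stopifnot(ncol(vals) %% 2 == 0)           # the halves must have equal size
  half <- ncol(vals) / 2
  # Multiply column i of the first half by column i of the second half
  unname(rowSums(vals[seq_len(half)] * vals[half + seq_len(half)]))
}

df$final <- paired_sum(df)
df$final
# [1] 19 47 28 37
```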
My data looks like this:
x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18
y is a grouping variable. I would like to see how well this grouping went.
To do so, I want to extract a sample of n pairs of cases that are grouped together by variable y
and n pairs of cases that are not grouped together by variable y, in order to calculate the number of
false positives and false negatives (pairs falsely grouped or falsely not grouped). How do I extract a sample of grouped pairs
and a sample of not-grouped pairs?
I would like the samples to look like this (for n=6) :
Grouped sample:
x y
2 2
3 2
9 9
10 9
15 14
17 14
Not-grouped sample:
x y
1 1
2 2
6 8
6 8
11 11
19 17
How would I go about this in R?
I'm not entirely clear on what you'd like to do, partly because I feel there is some context missing as to what you're trying to achieve. I also don't quite understand your expected output (for example, the not-grouped sample contains an entry 6 8 that does not exist in your original data...)
That aside, here is a possible approach.
# Maximum number of samples per group
n <- 3;
# Set fixed RNG seed for reproducibility
set.seed(2017);
# Grouped samples
df.grouped <- do.call(rbind.data.frame, lapply(split(df, df$y),
function(x) if (nrow(x) > 1) x[sample(nrow(x), min(n, nrow(x))), ]));
df.grouped;
# x y
#2.3 3 2
#2.2 2 2
#6.6 6 6
#6.7 7 6
#9.10 10 9
#9.9 9 9
#13.13 13 13
#13.14 14 13
#14.15 15 14
#14.17 17 14
# Ungrouped samples
df.ungrouped <- df[sample(nrow(df.grouped)), ];
df.ungrouped;
# x y
#7 7 6
#1 1 1
#9 9 9
#4 4 4
#3 3 2
#2 2 2
#5 5 5
#6 6 6
#10 10 9
#8 8 8
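Explanation: Split df based on y, then draw min(n, nrow(x)) rows from each subset x that contains more than one row; rbinding the results gives df.grouped. For df.ungrouped, note that sample(nrow(df.grouped)) permutes 1, ..., nrow(df.grouped), so it returns the first nrow(df.grouped) rows of df in random order. To draw that many rows from all of df instead, use df[sample(nrow(df), nrow(df.grouped)), ].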
Sample data
df <- read.table(text =
"x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18", header = T)
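If the goal is to sample pairs of cases rather than single rows, one possible sketch is to enumerate all pairs of rows and split them by whether both rows share the same y. This is a different technique from the split/lapply approach above, and draw_pairs is a hypothetical helper name.

```r
df <- data.frame(x = 1:20,
                 y = c(1, 2, 2, 4, 5, 6, 6, 8, 9, 9, 11, 12, 13, 13, 14, 15, 14, 16, 17, 18))

# All unordered pairs of row indices, one pair per row of the matrix
pairs <- t(combn(nrow(df), 2))

# TRUE where both rows of a pair have the same y
same <- df$y[pairs[, 1]] == df$y[pairs[, 2]]

# Hypothetical helper: draw up to n pairs from the candidate rows of `pairs`
draw_pairs <- function(candidates, n) {
  pick <- candidates[sample.int(length(candidates), min(n, length(candidates)))]
  data.frame(x1 = df$x[pairs[pick, 1]], x2 = df$x[pairs[pick, 2]],
             y1 = df$y[pairs[pick, 1]], y2 = df$y[pairs[pick, 2]])
}

set.seed(2017)
n <- 3
grouped   <- draw_pairs(which(same), n)    # pairs with equal y
ungrouped <- draw_pairs(which(!same), n)   # pairs with different y
```

Enumerating every pair grows quadratically with the number of rows, so this is only reasonable for small data sets like the one shown.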
I need to extract separate tables from each Excel sheet and store them in a list object. I have two lists: "allsheets" contains 38 sheets, each of which includes at least 2 tables, and "dataRowMeta" contains information about which rows are relevant for each table. For example,
a1 <- data.frame(y1=c(1:15),y2=c(6:20))
a2 <- data.frame(y1=c(3:18),y2=c(2:17))
allsheets <- list(a1, a2)
d1<- data.frame(starthead=c(1,9),endhead=c(2,10),startdata =c(3,11),
enddata = c(7,14),footer = c(8,15))
d2<- data.frame(starthead=c(1,10),endhead=c(2,11),startdata =c(3,12),
enddata = c(8,15),footer = c(9,16))
dataRowMeta <- list(d1,d2)
Here is allsheets:
[[1]]
y1 y2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
6 6 11
7 7 12
8 8 13
9 9 14
10 10 15
11 11 16
12 12 17
13 13 18
14 14 19
15 15 20
[[2]]
y1 y2
1 3 2
2 4 3
3 5 4
4 6 5
5 7 6
6 8 7
7 9 8
8 10 9
9 11 10
10 12 11
11 13 12
12 14 13
13 15 14
14 16 15
15 17 16
16 18 17
and here is dataRowMeta :
[[1]]
starthead endhead startdata enddata footer
1 1 2 3 7 8
2 9 10 11 14 15
[[2]]
starthead endhead startdata enddata footer
1 1 2 3 8 9
2 10 11 12 15 16
I've tried to write a loop that subsets each sheet according to dataRowMeta, but failed to get the desired output.
I am getting an error:
Error in sheet[[a[m]:b[m], ]] : incorrect number of subscripts
I guess that's because I am iterating over a list, not matrices... but how do I tell R to subset a list in this case?
I need the 1st and 4th columns of dataRowMeta (starthead and enddata) as the "start" and "end" row ids of the future tables.
tables <- function(allsheets,dataRowMeta){
for(i in 1 : length(dataRowMeta)){
for (j in 1 : nrow(dataRowMeta[[i]])){
a <-""
b <- ""
a <- dataRowMeta[[i]][j:j,1]
b <- dataRowMeta[[i]][j:j,4]
for (k in 1 : length(allsheets)){
sheet <- allsheets[k]
for ( m in 1 : length(a)){
tbl <- sheet[[a[m]:b[m],]]
}
}
}
}}
Desired output: I can get this for the first table of the first sheet (sheet1):
sheet1 <- allsheets[[1]]
tmp1 <- sheet1[dataRowMeta[[1]][1:1,1] :dataRowMeta[[1]][1:1,4] ,]
> tmp1
y1 y2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
6 6 11
7 7 12
I need a loop that does this for all sheets. Please help me figure out how to do it. Thank you!
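A minimal sketch of one way this could be done, assuming the i-th element of dataRowMeta describes the i-th sheet. Two details matter: allsheets[[k]] (double brackets) extracts the data frame itself, while rows are selected with single brackets, sheet[from:to, ]. split_sheet is a hypothetical helper name.

```r
a1 <- data.frame(y1 = 1:15, y2 = 6:20)
a2 <- data.frame(y1 = 3:18, y2 = 2:17)
allsheets <- list(a1, a2)
d1 <- data.frame(starthead = c(1, 9), endhead = c(2, 10), startdata = c(3, 11),
                 enddata = c(7, 14), footer = c(8, 15))
d2 <- data.frame(starthead = c(1, 10), endhead = c(2, 11), startdata = c(3, 12),
                 enddata = c(8, 15), footer = c(9, 16))
dataRowMeta <- list(d1, d2)

# Hypothetical helper: cut one sheet into the tables described by its metadata,
# using starthead as the first row and enddata as the last row of each table
split_sheet <- function(sheet, meta) {
  Map(function(from, to) sheet[from:to, , drop = FALSE],
      meta$starthead, meta$enddata)
}

# Pair sheet i with metadata i; the result is a list (sheets) of lists (tables)
tables <- Map(split_sheet, allsheets, dataRowMeta)

tables[[1]][[1]]   # rows 1 to 7 of the first sheet
```

tables[[i]][[j]] is then the j-th table of the i-th sheet, and unlist(tables, recursive = FALSE) would flatten this into a single list of data frames.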
I have a table that I routinely compute with R that has three dimensions. I would like to add some tables within the (here 5) marginal tables. What I usually do is like:
A=sample(LETTERS[1:5],100, rep=T)
b=sample(letters[1:2],100, rep=T)
numbers=sample(1:3,100, rep=T)
( tab=table(A,b,numbers) )
( tab1=ftable(addmargins(tab)) )
I would like to add the sum of the first few marginal tables and then the sum of the remaining tables at the bottom, then the grand total. I can imagine that in the resulting ftable I would want the As,Bs,Cs, then the sum of those three, then the Ds, Es, and the sum of those two, then the sum of all of the tables, like:
         numbers  1  2  3 Sum
A     b
A     a           1  5  0   6
      b           4  2  2   8
      Sum         5  7  2  14
B     a           2  6  6  14
      b           5  4  5  14
      Sum         7 10 11  28
C     a           3  2  5  10
      b           1  2  2   5
      Sum         4  4  7  15
sumac a           6 13 11  30   #### how do I add these three lines
      b           ....
      Sum         ....
D     a           2  1  1   4
      b           4  5  3  12
      Sum         6  6  4  16
E     a           5  3  4  12
      b           4  3  8  15
      Sum         9  6 12  27
sumde a           7  4  5  16   #### and these
      b           ....
      Sum         ....
sumae a          13 17 16  46
      b          18 16 20  54
      Sum        31 33 36 100
As usual I think the solution is prolly many fewer lines than the question. Thanks
Seth Latimer
You could do something like this:
isABC<-ifelse(A %in% c("A", "B", "C"), "ABC", "DE")
( tab=table(isABC,A,b,numbers) )
( tab1=ftable(addmargins(tab)) )
Now you have a larger table which holds even more rows than you want, but those should be easy to drop...
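An alternative sketch that builds the block sums directly instead of dropping rows afterwards. It assumes the same kind of sample data as in the question, with a fixed seed and explicit factor levels added so that every level is present. The names block_sum, rows, out and tab2 are introduced only for illustration.

```r
set.seed(1)
A <- factor(sample(LETTERS[1:5], 100, replace = TRUE), levels = LETTERS[1:5])
b <- factor(sample(letters[1:2], 100, replace = TRUE), levels = letters[1:2])
numbers <- factor(sample(1:3, 100, replace = TRUE), levels = 1:3)
arr <- unclass(table(A, b, numbers))   # plain 5 x 2 x 3 array of counts

# Sum the b-by-numbers slices over a set of levels of A
block_sum <- function(lv) apply(arr[lv, , , drop = FALSE], c(2, 3), sum)

# Target layout of the first dimension, with the block totals in place
rows <- c("A", "B", "C", "sumac", "D", "E", "sumde", "sumae")
out <- array(0L, dim = c(length(rows), dim(arr)[2:3]),
             dimnames = c(list(A = rows), dimnames(arr)[2:3]))
out[LETTERS[1:5], , ] <- arr
out["sumac", , ] <- block_sum(c("A", "B", "C"))
out["sumde", , ] <- block_sum(c("D", "E"))
out["sumae", , ] <- block_sum(LETTERS[1:5])

# Add Sum margins over b and numbers only; the totals over A are already rows
tab2 <- ftable(addmargins(as.table(out), margin = c(2, 3)))
tab2
```

Restricting addmargins() to dimensions 2 and 3 avoids a Sum row over A, which would otherwise add the block totals on top of the raw counts.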