Creating a new variable in R: a proportion of 2 existing variables

I would like to analyse the proportion of infected bees (DWV/TOTAL) as a function of time (DAY_SINCE_TREATMENT), but how do I create the new variable, the proportion of infected bees (DWV/TOTAL)?
The dataset looks like this:
COLONY DAY_SINCE_TREATMENT CTRL DWV TOTAL
1 A 11 0 1 1
2 A 13 4 3 7
3 A 15 17 8 25
4 A 17 3 0 3
5 A 18 7 1 8
6 A 19 6 1 7

We can create the PROP variable by
DF1$PROP <- DF1$DWV/DF1$TOTAL
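As a minimal sketch, the six sample rows from the question can be typed in by hand to check the calculation. The name DF1 follows the answer above; the real data set presumably has more colonies and days.

```r
# Rebuild the sample rows shown in the question (assumed to be representative)
DF1 <- data.frame(
  COLONY = rep("A", 6),
  DAY_SINCE_TREATMENT = c(11, 13, 15, 17, 18, 19),
  CTRL = c(0, 4, 17, 3, 7, 6),
  DWV = c(1, 3, 8, 0, 1, 1),
  TOTAL = c(1, 7, 25, 3, 8, 7)
)

# Proportion of infected bees per row
DF1$PROP <- DF1$DWV / DF1$TOTAL

# A row with TOTAL == 0 would give NaN (0/0); none occur in this sample
DF1
```

The new column can then be used like any other, for example plot(PROP ~ DAY_SINCE_TREATMENT, data = DF1).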

Related

The R function I previously used successfully no longer works

I am trying to create a summary for this data set
Morph ID black white orange green
1 O 1 2 1 0 3
2 O 2 2 1 3 0
3 O 3 2 1 1 2
4 O 4 3 0 2 1
5 O 5 3 0 2 1
6 O 6 3 0 1 2
7 O 7 3 0 1 2
8 O 8 3 0 3 0
9 O 9 0 3 2 1
10 O 10 3 0 3 0
11 O 11 3 0 1 2
12 O 12 0 3 2 1
13 O 13 3 0 2 1
14 O 14 3 0 2 1
15 O 15 2 1 1 2
I created the summary below before with a data set that has the exact same format.
n mean sd min Q1 median Q3 max percZero Choice se
sum.greenO 15 0.8666667 1.187234 0 0 0 2 3 60.00000 Orange 0.3065424
sum.greenG 15 2.1333333 1.187234 0 1 3 3 3 13.33333 Green 0.3065424
I used the function Summarize() but this function is no longer working.
I need to create the same bar graph I made for this previous data set, which I can't do without "n", "sd", or "se". (I created "se" using "n" and "sd" - it didn't come with the initial function output).
I am confused about how a function can stop working. Is there an alternative function I am not aware of?
Please let me know if this doesn't make any sense.
The following R packages on CRAN all provide a function called "Summarize" with a capital S:
> collidr::CRAN_packages_and_functions() %>% filter(function_names == "Summarize")
package_names function_names
1 alakazam Summarize
2 basket Summarize
3 bayesm Summarize
4 ChemoSpec Summarize
5 ChemoSpecUtils Summarize
6 cold Summarize
7 dataMaid Summarize
8 fastJT Summarize
9 FSA Summarize
10 GLMpack Summarize
11 LAGOSNE Summarize
12 lslx Summarize
13 MapGAM Summarize
14 MetaIntegrator Summarize
15 NetMix Summarize
16 PKNCA Summarize
17 ppclust Summarize
18 qad Summarize
19 radiant.model Summarize
20 ssmrob Summarize
Of course it is not guaranteed you made the previous summary with one of them, but hopefully this helps you find the right one.
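If the original package cannot be identified, the same statistics can be computed with base R. The sketch below is an assumption about what each column means, judging only from the names in the summary above (the layout resembles the output of FSA::Summarize, but that is a guess); summarize_vec is a hypothetical helper, and the input is the green column from the data set in the question.

```r
# Hypothetical helper: one named vector of summary statistics per numeric column
summarize_vec <- function(x) {
  n <- length(x)
  s <- sd(x)
  c(
    n = n,
    mean = mean(x),
    sd = s,
    min = min(x),
    Q1 = unname(quantile(x, 0.25)),
    median = median(x),
    Q3 = unname(quantile(x, 0.75)),
    max = max(x),
    percZero = 100 * mean(x == 0),  # percentage of values equal to zero
    se = s / sqrt(n)                # standard error, as derived in the question
  )
}

# "green" column of the 15 rows shown in the question
green <- c(3, 0, 2, 1, 1, 2, 2, 0, 1, 0, 2, 1, 1, 1, 2)
res <- summarize_vec(green)
res
```

Applying the helper to several columns with sapply() gives one column of statistics per variable, which can be transposed with t() to get one row per variable as in the summary above.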

R: Sum column from table 2 based on value in table 1, and store result in table 1

I am an R noob, and hope some of you can help me.
I have two data sets:
- store (containing store data, including location coordinates (x,y). The location are integer values, corresponding to GridIds)
- grid (containing all gridIDs (x,y) as well as a population variable TOT_P for each grid point)
What I want to achieve is this:
For each store I want to loop over the grid data, and sum the population of the grid IDs close to the store's grid ID.
I.e., basically SUMIF the grid population variable, with the condition that
grid(x) <= store(x) + 1 &
grid(x) >= store(x) - 1 &
grid(y) <= store(y) + 1 &
grid(y) >= store(y) - 1
How can I accomplish that? My own take has been trying to use different things like merge, sapply, etc, but my R inexperience stops me from getting it right.
Thanks in advance!
Edit:
Sample data:
StoreName StoreX StoreY
Store1 3 6
Store2 5 2
TOT_P GridX GridY
8 1 1
7 2 1
3 3 1
3 4 1
22 5 1
20 6 1
9 7 1
28 1 2
8 2 2
3 3 2
12 4 2
12 5 2
15 6 2
7 7 2
3 1 3
3 2 3
3 3 3
4 4 3
13 5 3
18 6 3
3 7 3
61 1 4
25 2 4
5 3 4
20 4 4
23 5 4
72 6 4
14 7 4
178 1 5
407 2 5
26 3 5
167 4 5
58 5 5
113 6 5
73 7 5
76 1 6
3 2 6
3 3 6
3 4 6
4 5 6
13 6 6
18 7 6
3 1 7
61 2 7
25 3 7
26 4 7
167 5 7
58 6 7
113 7 7
The output I am looking for is
StoreName StoreX StoreY SUM_P
Store1 3 6 721
Store2 5 2 119
I.e., for Store1 it is the sum of TOT_P for grid cells with X=[2-4] and Y=[5-7].
One approach would be to use dplyr to calculate the difference between each store and all grid points and then group and sum based on these new columns.
# import library
library(dplyr)
# create example store table
StoreName <- paste0("Store", 1:2)
StoreX <- c(3, 5)
StoreY <- c(6, 2)
df.store <- data.frame(StoreName, StoreX, StoreY)
# create example population data (copied example table from OP)
df.pop
# add dummy column to each table to enable cross join
df.store$k <- 1
df.pop$k <- 1
# dplyr to join, calculate absolute distance, filter and sum
df.store %>%
  inner_join(df.pop, by = "k") %>%
  mutate(x.diff = abs(StoreX - GridX), y.diff = abs(StoreY - GridY)) %>%
  filter(x.diff <= 1, y.diff <= 1) %>%
  group_by(StoreName) %>%
  summarise(StoreX = max(StoreX), StoreY = max(StoreY), tot.pop = sum(TOT_P))
#output:
StoreName StoreX StoreY tot.pop
<fctr> <dbl> <dbl> <int>
1 Store1 3 6 721
2 Store2 5 2 119
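The same neighbourhood sum can also be sketched in base R without a join. The data frame names store and grid are assumptions based on the question's description; the grid is rebuilt from the 49 sample rows, which run through GridX 1 to 7 within each GridY.

```r
# Sample grid from the question: TOT_P listed row by row (GridY = 1..7, GridX = 1..7)
grid <- data.frame(
  TOT_P = c(
      8,   7,  3,   3,  22,  20,   9,
     28,   8,  3,  12,  12,  15,   7,
      3,   3,  3,   4,  13,  18,   3,
     61,  25,  5,  20,  23,  72,  14,
    178, 407, 26, 167,  58, 113,  73,
     76,   3,  3,   3,   4,  13,  18,
      3,  61, 25,  26, 167,  58, 113
  ),
  GridX = rep(1:7, times = 7),
  GridY = rep(1:7, each = 7)
)

store <- data.frame(
  StoreName = c("Store1", "Store2"),
  StoreX = c(3, 5),
  StoreY = c(6, 2)
)

# For each store, sum TOT_P over grid cells at most 1 away in both x and y
store$SUM_P <- mapply(
  function(x, y) {
    near <- abs(grid$GridX - x) <= 1 & abs(grid$GridY - y) <= 1
    sum(grid$TOT_P[near])
  },
  store$StoreX, store$StoreY
)

store
```

On the sample data this gives 721 for Store1 and 119 for Store2, matching the dplyr output above. Because it checks every store against every grid cell, it suits small tables best.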

How to do a co-occurrence matrix from multiple data frames in R

My first language isn't English, so I apologize in advance for any mistakes. I'm a newbie in R, but you will notice that anyway.
I'm trying to build a co-occurrence matrix. I have several data frames and I am interested in 3 variables: idT, numname and numstim.
This is the single data frame that contains the merged data:
z=rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,
df15,df16,df17,df18,df19,df20,df21,df22,df23,df24,df25,df26,df27,df28,df29,df30,df31,df32)
write.csv(z, file = ".../listz.csv")
Then I extracted the 3 variables with :
#Extract columns 3 & 6 from all the files within the list
z1 = z[,c(3,6)]
#Create a new variable 'numname' to convert name groups into numeric groups,
#then obtain levels with facNum
z1$numname <- as.numeric(z1$namegroup)
colnames(z1) <- c("namegroup", "idT", "numname")
facNum <- factor(z1$numname)
write.csv(z1, file = "...D:/z1.csv")
And data look like :
namegroup idT numname
1 GLISSEVIBREVITE 1 6
2 CINETIQUE 1 3
3 VIBRATIONS_LEGERES 1 20
4 DIFFUS 1 5
5 LIQUIDE 1 8
6 PICOTEMENTS 1 10
How to read the table: each idT is classified into a group (namegroup), and this group is then converted to a numeric variable (numname).
# Specify z1 as a data frame to make next operations
z1 = as.data.frame(z1, idT = z1$numstim, numgroup = z1$numname)
tab1 <- table(z1)
write.csv(tab1, file = ".../tab1test.csv")
out1 <- data.matrix(tab1 %*% t(tab1))
write.csv(out1, file = ".../bmtest.csv")
But the bmtest matrix doesn't look like a count of idT pairs: only 22 users participated and there are 32 idT, yet some of the numbers are much higher:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 24 10 7 7 11 7 7 8 10 8 11 8 6 11 11 12
2 10 32 27 7 5 4 7 4 4 4 5 3 2 6 6 14
3 7 27 40 0 3 1 0 2 0 0 2 2 1 2 0 15
4 7 7 0 30 7 14 15 9 15 13 13 7 5 12 13 5
5 11 5 3 7 24 7 9 20 12 13 10 19 14 20 12 7
I want a matrix that shows how many times each pair of idT was grouped together. The matrix should look like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 15 3 2 2 3 3 2 1 2 1 3 3 1 3 3 5
2 3 15 9 2 0 1 2 0 0 0 0 0 0 0 1 3
3 2 9 15 0 2 1 0 2 0 0 1 1 1 2 0 2
4 2 2 0 15 1 6 5 1 7 5 6 2 0 1 3 2
5 3 0 2 1 15 1 2 12 4 5 3 13 9 11 3 2
In other words, I want to see which idT have been paired together. I've looked at this topic but didn't find a way to solve my problem.
Also, I tried :
library(igraph)
library(tnet)
idT_numname <- cbind(z1$idT, z1$numname)
igraph <- graph.data.frame(idT_numname)
item_item <- projecting_tm(net = idT_numname, method="sum")
item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
itemmat <- get.adjacency(item_item,attr="weight")
itemmat #8x8 matrix of items to items
But I get an error message, and I don't know how to get past the "duplicated entries in the edgelist", because it seems to me that duplicated entries are necessary to build a co-occurrence matrix:
> idT_numname <- cbind(z1$idT, z1$numname)
> item_item <- projecting_tm(idT_numname, method="sum")
Error in as.tnet(net, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet", method="sum")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet", method = "sum") :
unused argument (method = "sum")
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
Your help is greatly appreciated.
I like to do data analysis and I want to learn more and more everyday !
Thank you
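One possible reading of the goal is that two idT co-occur whenever the same participant put them in the same group. Under that assumption, a minimal sketch builds an incidence matrix with one column per (participant, group) combination and multiplies it by its own transpose. The toy data and the user column are hypothetical; the z1 shown in the question has no participant identifier, so one would have to be kept when merging the data frames.

```r
# Hypothetical toy data: 2 participants each sort 3 stimuli (idT) into named groups
toy <- data.frame(
  user = c(1, 1, 1, 2, 2, 2),
  idT = c(1, 2, 3, 1, 2, 3),
  namegroup = c("A", "A", "B", "A", "B", "B")
)

# Incidence matrix: rows are idT, columns are the (user, group) combinations in use
inc <- unclass(table(toy$idT, interaction(toy$user, toy$namegroup, drop = TRUE)))

# cooc[i, j] counts the participants who put idT i and idT j in the same group
cooc <- tcrossprod(inc)
cooc
```

In this toy case stimuli 1 and 2 share a group for one participant, 2 and 3 for the other, and 1 and 3 never do. The diagonal holds the number of participants who rated each stimulus. Counting by group name alone, without the participant, inflates the totals because every pair of rows sharing a name is counted across participants.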

How to select columns that contain specific values of a chosen row in R?

I have a dataset that looks like this
Site <- c(1,2,3,4,5,6,7,8,9,10,"kingdom","phylum","class")
A <- c(0,0,1,2,4,5,6,7,13,56,"Eukaryota","Arthropoda","Insecta")
B <- c(1,0,0,0,0,4,5,7,7,8,"Eukaryota","Arthropoda","Insecta")
C <- c(2,3,0,0,4,5,67,8,43,21,"Eukaryota","Arthropoda","")
D <- c(134,0,0,2,0,0,9,0,45,55,"Eukaryota","Arthropoda","Arachnida")
site.species.sample <- data.frame(Site,A,B,C,D)
I want to select only the columns from this dataset where the row "class" is "Insecta" (i.e. in this example only columns A and B satisfy this condition). I tried this code:
site.species.sample <- site.species.sample[,site.species.sample["class",]=="Insecta"]
But got an error:
Error in `[.data.frame`(site.species.sample, , site.species.sample["class", :
undefined columns selected
So how do I do it? Thanks
Below is an option
site.species.sample[,c(TRUE,subset(site.species.sample[,-1],site.species.sample$Site=="class")=="Insecta")]
Site A B
1 1 0 1
2 2 0 0
3 3 1 0
4 4 2 0
5 5 4 0
6 6 5 4
7 7 6 5
8 8 7 7
9 9 13 7
10 10 56 8
11 kingdom Eukaryota Eukaryota
12 phylum Arthropoda Arthropoda
13 class Insecta Insecta
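The attempt in the question fails because site.species.sample["class", ] looks for a row named "class", but the row names are 1 to 13, so it returns a row of NA and the resulting column index is undefined. A step-by-step sketch of the same idea as the option above, with the hypothetical intermediate names class_row and keep:

```r
Site <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, "kingdom", "phylum", "class")
A <- c(0, 0, 1, 2, 4, 5, 6, 7, 13, 56, "Eukaryota", "Arthropoda", "Insecta")
B <- c(1, 0, 0, 0, 0, 4, 5, 7, 7, 8, "Eukaryota", "Arthropoda", "Insecta")
C <- c(2, 3, 0, 0, 4, 5, 67, 8, 43, 21, "Eukaryota", "Arthropoda", "")
D <- c(134, 0, 0, 2, 0, 0, 9, 0, 45, 55, "Eukaryota", "Arthropoda", "Arachnida")
site.species.sample <- data.frame(Site, A, B, C, D)

# Pull out the row whose Site value is "class", as a named character vector
is_class <- site.species.sample$Site == "class"
class_row <- vapply(site.species.sample[is_class, ], as.character, character(1))

# Keep the Site column plus every column whose class is "Insecta"
keep <- names(class_row) == "Site" | class_row == "Insecta"
result <- site.species.sample[, keep]
names(result)
```

Storing the taxonomy in a separate table with one row per species, rather than as extra rows of the count table, would keep the count columns numeric and make this kind of filter simpler.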

R: grouped data table with proportions

I have copied my code below. I start with a list of 50 small integers, representing the number of televisions owned by 50 families. My objective is shown in the object 'tv.final' below. My effort seems very wordy and inefficient.
Question: is there a better way to start with a list of 50 integers and end with a grouped data table with proportions? (Just taking my first baby steps with R, sorry for such a stupid question, but inquiring minds want to know.)
tv.data <- read.table("Tb02-08.txt",header=TRUE)
str(tv.data)
# 'data.frame': 50 obs. of 1 variable:
# $ TVs: int 1 1 1 2 6 3 3 4 2 4 ...
tv.table <- table(tv.data)
tv.table
# tv.data
# 0 1 2 3 4 5 6
# 1 16 14 12 3 2 2
tv.prop <- prop.table(tv.table)*100
tv.prop
# tv.data
# 0 1 2 3 4 5 6
# 2 32 28 24 6 4 4
tvs <- rbind(tv.table,tv.prop)
tvs
# 0 1 2 3 4 5 6
# tv.table 1 16 14 12 3 2 2
# tv.prop 2 32 28 24 6 4 4
tv.final <- t(tvs)
tv.final
# tv.table tv.prop
# 0 1 2
# 1 16 32
# 2 14 28
# 3 12 24
# 4 3 6
# 5 2 4
# 6 2 4
You can treat the object returned by table() as any other vector/matrix:
tv.table <- table(tv.data)
round(100 * tv.table/sum(tv.table))
That will give you the proportions in rounded percentage points.
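To get counts and percentages side by side in one step, the table can be turned into a data frame directly. This is a minimal sketch: the file Tb02-08.txt is not available here, so the 50 values are rebuilt from the counts printed in the question, and the column names TVs, count and percent are my own choice.

```r
# Rebuild 50 observations from the counts shown in the question (0..6 TVs)
tvs <- rep(0:6, times = c(1, 16, 14, 12, 3, 2, 2))

tab <- table(tvs)

# One row per number of TVs, with the count and the percentage of families
tv.final <- data.frame(
  TVs = as.integer(names(tab)),
  count = as.vector(tab),
  percent = as.vector(100 * prop.table(tab))
)

tv.final
```

With the real data, tvs would be replaced by tv.data$TVs.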
