I am trying to create a summary for this data set
Morph ID black white orange green
1 O 1 2 1 0 3
2 O 2 2 1 3 0
3 O 3 2 1 1 2
4 O 4 3 0 2 1
5 O 5 3 0 2 1
6 O 6 3 0 1 2
7 O 7 3 0 1 2
8 O 8 3 0 3 0
9 O 9 0 3 2 1
10 O 10 3 0 3 0
11 O 11 3 0 1 2
12 O 12 0 3 2 1
13 O 13 3 0 2 1
14 O 14 3 0 2 1
15 O 15 2 1 1 2
I created the summary below before with a data set that has the exact same format.
n mean sd min Q1 median Q3 max percZero Choice se
sum.greenO 15 0.8666667 1.187234 0 0 0 2 3 60.00000 Orange 0.3065424
sum.greenG 15 2.1333333 1.187234 0 1 3 3 3 13.33333 Green 0.3065424
I used the function Summarize() but this function is no longer working.
I need to create the same bar graph I made for this previous data set, which I can't do without "n", "sd", or "se". (I created "se" using "n" and "sd" - it didn't come with the initial function output).
I am confused about how a function can simply stop working. Is there an alternative function I am not aware of?
Please let me know if this doesn't make any sense.
The following R packages on CRAN all provide a function called "Summarize" with a capital S:
> collidr::CRAN_packages_and_functions() %>% filter(function_names == "Summarize")
package_names function_names
1 alakazam Summarize
2 basket Summarize
3 bayesm Summarize
4 ChemoSpec Summarize
5 ChemoSpecUtils Summarize
6 cold Summarize
7 dataMaid Summarize
8 fastJT Summarize
9 FSA Summarize
10 GLMpack Summarize
11 LAGOSNE Summarize
12 lslx Summarize
13 MapGAM Summarize
14 MetaIntegrator Summarize
15 NetMix Summarize
16 PKNCA Summarize
17 ppclust Summarize
18 qad Summarize
19 radiant.model Summarize
20 ssmrob Summarize
Of course it is not guaranteed you made the previous summary with one of them, but hopefully this helps you find the right one.
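If none of these packages turns out to be the one, the same columns can be rebuilt with base R alone. The following is only a minimal sketch: `dat` re-types the orange and green columns from the question, `summarise_col` is a hypothetical helper name, and the quartiles use the default `quantile()` method, which may differ slightly from whatever the original function used.

```r
# Orange and green counts re-typed from the data set shown in the question
dat <- data.frame(
  orange = c(0, 3, 1, 2, 2, 1, 1, 3, 2, 3, 1, 2, 2, 2, 1),
  green  = c(3, 0, 2, 1, 1, 2, 2, 0, 1, 0, 2, 1, 1, 1, 2)
)

# Hypothetical helper: one row of summary statistics for a numeric vector
summarise_col <- function(x) {
  n <- sum(!is.na(x))                      # number of non-missing values
  s <- sd(x, na.rm = TRUE)                 # sample standard deviation
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(n = n,
    mean = mean(x, na.rm = TRUE),
    sd = s,
    min = min(x, na.rm = TRUE),
    Q1 = q[1], median = q[2], Q3 = q[3],
    max = max(x, na.rm = TRUE),
    percZero = 100 * mean(x == 0, na.rm = TRUE),  # percentage of zeros
    se = s / sqrt(n))                             # standard error of the mean
}

# One row per column, one column per statistic
summ <- t(sapply(dat[c("orange", "green")], summarise_col))
summ
```

The `se` column is computed as `sd / sqrt(n)`, which is how the question describes deriving it by hand.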
I have two data frames, CDD26_FF (5593 rows) and CDD_HI (5508 rows), with the structure (columns) shown below. CDDs are "consecutive dry days", and the two tables show species exposure to CDD in the far future (FF) and in the historical period (HI).
I want to focus only on the "Biom" and "Species_name" columns.
As you can see, the two tables share the "Species_name" and "Biom" columns ("Biom" identifies areas of the world with the same climatic conditions; its values go from 0 to 15). However, a "Species_name" does not always appear in both tables (e.g. Abrocomo_ben). Furthermore, the two tables do not always contain the same combinations of "Species_name" and "Biom" (a combination is simply a population of that species belonging to that Biom).
CDD26_FF:
CDD26_FF  AreaCell  Area_total  Biom  Species_name  AreaCellSuAreaTotal
1         1         13          10    Abrocomo_ben  0.076923
1         1         8           1     Abrocomo_cin  0.125000
1         1         30          10    Abrocomo_cin  0.033333
1         2         10          1     Abrothrix_an  0.200000
1         1         44          10    Abrothrix_an  0.022727
1         3         6           2     Abrothrix_je  0.500000
1         1         7           12    Abrothrix_lo  0.142857
CDD_HI:
CDD_HI  AreaCell  Area_total  Biom  Species_name  AreaCellSuAreaTot_HI
1       1         8           1     Abrocomo_cin  0.125000
1       5         30          10    Abrocomo_cin  0.166666
1       1         5           2     Abrocomo_cin  0.200000
1       1         10          1     Abrothrix_an  0.100000
1       1         44          10    Abrothrix_an  0.022727
1       6         18          1     Abrothrix_je  0.333333
1       1         23          4     Abrothrix_lo  0.130434
I want to highlight the rows that have the same combination of "Species_name" and "Biom" in both tables: in the example these are lines 2, 3, 4, 5 from CDD26_FF, matching lines 1, 2, 4, 5 from CDD_HI, respectively. I want to store these lines in a new table, keeping not only the "Species_name" and "Biom" columns (as the "compare()" function seems to do) but all the other columns as well.
More precisely, I then want to calculate the ratio "AreaCellSuAreaTotal" / "AreaCellSuAreaTot_HI" for the highlighted lines.
How can I do that?
Aside from "compare()", I tried a "for" loop, but the lengths of the tables differ, so I tried a 3-level nested for loop, still without results. I also tried "compareDF()" and "semi_join()". No results until now. Thank you for your help.
You could use an inner join (provided by dplyr). An inner join returns only the rows whose key values are present in both tables/data.frames (in this case: matching "Biom" and "Species_name"), and it keeps the other columns of both tables.
Subsequently it's easy to calculate some ratio using mutate:
library(dplyr)
cdd26_f %>%
  inner_join(cdd_hi, by = c("Biom", "Species_name")) %>%
  mutate(ratio = AreaCellSuAreaTotal / AreaCellSuAreaTot_HI) %>%
  select(Biom, Species_name, ratio)
returns
# A tibble: 4 x 3
Biom Species_name ratio
<dbl> <chr> <dbl>
1 1 Abrocomo_cin 1
2 10 Abrocomo_cin 0.200
3 1 Abrothrix_an 2
4 10 Abrothrix_an 1
Note: remove the select() step if you need all columns, or adapt it to keep other columns.
Data
cdd26_f <- readr::read_table2("CDD26_FF AreaCell Area_total Biom Species_name AreaCellSuAreaTotal
1 1 13 10 Abrocomo_ben 0.076923
1 1 8 1 Abrocomo_cin 0.125000
1 1 30 10 Abrocomo_cin 0.033333
1 2 10 1 Abrothrix_an 0.200000
1 1 44 10 Abrothrix_an 0.022727
1 3 6 2 Abrothrix_je 0.500000
1 1 7 12 Abrothrix_lo 0.142857")
cdd_hi <- readr::read_table2("CDD_HI AreaCell Area_total Biom Species_name AreaCellSuAreaTot_HI
1 1 8 1 Abrocomo_cin 0.125000
1 5 30 10 Abrocomo_cin 0.166666
1 1 5 2 Abrocomo_cin 0.200000
1 1 10 1 Abrothrix_an 0.100000
1 1 44 10 Abrothrix_an 0.022727
1 6 18 1 Abrothrix_je 0.333333
1 1 23 4 Abrothrix_lo 0.130434")
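For readers who prefer not to load dplyr, roughly the same result can be sketched with base R's `merge()`, which performs an inner join by default. The data frames `ff` and `hi` below are hypothetical names; they re-type the example rows by hand and leave out the constant first column. The `suffixes` argument is only there to tell apart the non-key columns that exist in both tables.

```r
# Example rows re-typed by hand (the constant first column is left out)
ff <- data.frame(
  AreaCell = c(1, 1, 1, 2, 1, 3, 1),
  Area_total = c(13, 8, 30, 10, 44, 6, 7),
  Biom = c(10, 1, 10, 1, 10, 2, 12),
  Species_name = c("Abrocomo_ben", "Abrocomo_cin", "Abrocomo_cin",
                   "Abrothrix_an", "Abrothrix_an", "Abrothrix_je",
                   "Abrothrix_lo"),
  AreaCellSuAreaTotal = c(0.076923, 0.125, 0.033333, 0.2, 0.022727,
                          0.5, 0.142857)
)
hi <- data.frame(
  AreaCell = c(1, 5, 1, 1, 1, 6, 1),
  Area_total = c(8, 30, 5, 10, 44, 18, 23),
  Biom = c(1, 10, 2, 1, 10, 1, 4),
  Species_name = c("Abrocomo_cin", "Abrocomo_cin", "Abrocomo_cin",
                   "Abrothrix_an", "Abrothrix_an", "Abrothrix_je",
                   "Abrothrix_lo"),
  AreaCellSuAreaTot_HI = c(0.125, 0.166666, 0.2, 0.1, 0.022727,
                           0.333333, 0.130434)
)

# Inner join on both keys; all other columns from both tables are kept
matched <- merge(ff, hi, by = c("Biom", "Species_name"),
                 suffixes = c("_FF", "_HI"))

# Ratio of far-future to historical exposure for the matched populations
matched$ratio <- matched$AreaCellSuAreaTotal / matched$AreaCellSuAreaTot_HI
matched
```

Note that `merge()` sorts the result by the key columns, so the row order can differ from the dplyr output.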
My first language isn't English, so I apologize in advance for any mistakes. I'm a newbie in R, but you will notice that anyway.
I'm trying to build a co-occurrence matrix. I have several dataframes and I am interested in 3 variables: idT, numname and numstim.
This is the unique dataframe that contains the merged data :
z=rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,
df15,df16,df17,df18,df19,df20,df21,df22,df23,df24,df25,df26,df27,df28,df29,df30,df31,df32)
write.csv(z, file = ".../listz.csv")
Then I extracted the 3 variables with :
#Extract columns 3 & 6 from all the files within the list
z1 = z[,c(3,6)]
#Create a new variable 'numname' to convert name groups into numeric groups,
#then obtain levels with facNum
z1$numname <- as.numeric(z1$namegroup)
colnames(z1) <- c("namegroup", "idT", "numname")
facNum <- factor(z1$numname)
write.csv(z1, file = "...D:/z1.csv")
And data look like :
namegroup idT numname
1 GLISSEVIBREVITE 1 6
2 CINETIQUE 1 3
3 VIBRATIONS_LEGERES 1 20
4 DIFFUS 1 5
5 LIQUIDE 1 8
6 PICOTEMENTS 1 10
How to read the table: each idT is classified into a group (namegroup), and this group is then converted into a numeric variable (numname).
# Specify z1 as a data frame to make next operations
z1 = as.data.frame(z1, idT = z1$numstim, numgroup = z1$numname)
tab1 <- table(z1)
write.csv(tab1, file = ".../tab1test.csv")
out1 <- data.matrix(tab1 %*% t(tab1))
write.csv(out1, file = ".../bmtest.csv")
But the bmtest matrix doesn't look like it is counting pairs of idT: only 22 users participated and there are 32 idT, yet some of the numbers are much higher:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 24 10 7 7 11 7 7 8 10 8 11 8 6 11 11 12
2 10 32 27 7 5 4 7 4 4 4 5 3 2 6 6 14
3 7 27 40 0 3 1 0 2 0 0 2 2 1 2 0 15
4 7 7 0 30 7 14 15 9 15 13 13 7 5 12 13 5
5 11 5 3 7 24 7 9 20 12 13 10 19 14 20 12 7
I wanna have a matrix which shows the results of a count of idT paired together. The matrix has to look like :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 15 3 2 2 3 3 2 1 2 1 3 3 1 3 3 5
2 3 15 9 2 0 1 2 0 0 0 0 0 0 0 1 3
3 2 9 15 0 2 1 0 2 0 0 1 1 1 2 0 2
4 2 2 0 15 1 6 5 1 7 5 6 2 0 1 3 2
5 3 0 2 1 15 1 2 12 4 5 3 13 9 11 3 2
In other words, I want to see which idT have been paired together. I've looked at this topic but didn't find a way to solve my problem.
Also, I tried :
library(igraph)
library(tnet)
idT_numname <- cbind(z1$idT, z1$numname)
igraph <- graph.data.frame(idT_numname)
item_item <- projecting_tm(net = idT_numname, method="sum")
item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
itemmat <- get.adjacency(item_item,attr="weight")
itemmat #8x8 matrix of items to items
But I get an error message and I don't know how to get past the "duplicated entries in the edgelist", because it seems to me that duplicated entries are necessary in order to build a co-occurrence matrix:
> idT_numname <- cbind(z1$idT, z1$numname)
> item_item <- projecting_tm(idT_numname, method="sum")
Error in as.tnet(net, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet", method="sum")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet", method = "sum") :
unused argument (method = "sum")
> item_item <- as.tnet(net = idT_numname, type ="binary two-mode tnet")
Error in as.tnet(net = idT_numname, type = "binary two-mode tnet") :
There are duplicated entries in the edgelist
Your help is greatly appreciated.
I like to do data analysis and I want to learn more and more everyday !
Thank you
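One possible way to get such a pair-count matrix is sketched below. It rests on an assumption that is not in the data shown: that there is a column identifying the participant (called `user` here, a hypothetical name), so that two idT count as paired only when the same user put them into the same group. The toy data frame is made up purely for illustration.

```r
# Toy data: 'user' is a hypothetical column, not part of the z1 shown above
toy <- data.frame(
  user    = c(1, 1, 1, 1, 2, 2, 2, 2),
  idT     = c(1, 2, 3, 4, 1, 2, 3, 4),
  numname = c(6, 6, 3, 3, 5, 8, 5, 8)
)

# One label per (user, group) combination, so that groups made by
# different users are never mixed together
toy$usergroup <- interaction(toy$user, toy$numname, drop = TRUE)

# Incidence matrix: rows = idT, columns = user-specific groups (0/1 entries)
inc <- unclass(table(toy$idT, toy$usergroup))

# Co-occurrence: entry [i, j] counts the users who put idT i and j together;
# the diagonal counts the users who classified each idT
cooc <- tcrossprod(inc)
cooc
```

In this toy example idT 1 and 2 are grouped together once (by user 1), idT 1 and 3 once (by user 2), and idT 1 and 4 never.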
I have a dataset that looks like this
Site <- c(1,2,3,4,5,6,7,8,9,10,"kingdom","phylum","class")
A <- c(0,0,1,2,4,5,6,7,13,56,"Eukaryota","Arthropoda","Insecta")
B <- c(1,0,0,0,0,4,5,7,7,8,"Eukaryota","Arthropoda","Insecta")
C <- c(2,3,0,0,4,5,67,8,43,21,"Eukaryota","Arthropoda","")
D <- c(134,0,0,2,0,0,9,0,45,55,"Eukaryota","Arthropoda","Arachnida")
site.species.sample <- data.frame(Site,A,B,C,D)
I want to select only the columns from this dataset where the row "class" is "Insecta" (i.e. in this example only columns A and B satisfy this condition). I tried this code:
site.species.sample <- site.species.sample[,site.species.sample["class",]=="Insecta"]
But got an error:
Error in `[.data.frame`(site.species.sample, , site.species.sample["class", :
undefined columns selected
So how do I do it? Thanks
Below is an option
site.species.sample[,c(TRUE,subset(site.species.sample[,-1],site.species.sample$Site=="class")=="Insecta")]
Site A B
1 1 0 1
2 2 0 0
3 3 1 0
4 4 2 0
5 5 4 0
6 6 5 4
7 7 6 5
8 8 7 7
9 9 13 7
10 10 56 8
11 kingdom Eukaryota Eukaryota
12 phylum Arthropoda Arthropoda
13 class Insecta Insecta
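The original attempt fails because "class" is a value in the Site column, not a row name, so `site.species.sample["class", ]` returns a row of NA and the column index becomes NA. A slightly more explicit variant of the same idea is sketched below; `class_row` and `insecta_cols` are hypothetical helper names.

```r
# Data as defined in the question
Site <- c(1,2,3,4,5,6,7,8,9,10,"kingdom","phylum","class")
A <- c(0,0,1,2,4,5,6,7,13,56,"Eukaryota","Arthropoda","Insecta")
B <- c(1,0,0,0,0,4,5,7,7,8,"Eukaryota","Arthropoda","Insecta")
C <- c(2,3,0,0,4,5,67,8,43,21,"Eukaryota","Arthropoda","")
D <- c(134,0,0,2,0,0,9,0,45,55,"Eukaryota","Arthropoda","Arachnida")
site.species.sample <- data.frame(Site, A, B, C, D)

# Pick the row whose Site value is "class", dropping the Site column itself
class_row <- site.species.sample[site.species.sample$Site == "class", -1]

# Names of the species columns whose class entry is "Insecta"
is_insecta <- vapply(class_row, as.character, character(1)) == "Insecta"
insecta_cols <- names(class_row)[is_insecta]

# Keep Site plus the matching columns
result <- site.species.sample[, c("Site", insecta_cols)]
names(result)
```

Selecting by column name rather than by a logical vector makes it easier to check which columns were kept.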
I would like to analyse the proportion of infected bees (DWV/TOTAL) as a function of time (DAY_SINCE_TREATMENT), but how do I create a new variable for the proportion of infected bees (DWV/TOTAL)?
The dataset looks like this:
COLONY DAY_SINCE_TREATMENT CTRL DWV TOTAL
1 A 11 0 1 1
2 A 13 4 3 7
3 A 15 17 8 25
4 A 17 3 0 3
5 A 18 7 1 8
6 A 19 6 1 7
We can create the PROP variable by
DF1$PROP <- DF1$DWV/DF1$TOTAL
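As a small reproducible sketch, the six rows shown in the question can be typed in and the proportion added with `transform()`. The `ifelse()` guard is an assumption on my part: it returns NA when TOTAL is 0, a case that does not occur in the rows shown but would otherwise give NaN.

```r
# The six rows shown in the question, re-typed by hand
DF1 <- data.frame(
  COLONY = rep("A", 6),
  DAY_SINCE_TREATMENT = c(11, 13, 15, 17, 18, 19),
  CTRL = c(0, 4, 17, 3, 7, 6),
  DWV = c(1, 3, 8, 0, 1, 1),
  TOTAL = c(1, 7, 25, 3, 8, 7)
)

# Proportion of infected bees; NA instead of NaN if a day has no bees at all
DF1 <- transform(DF1, PROP = ifelse(TOTAL > 0, DWV / TOTAL, NA))
DF1
```

The new column can then be used directly, for example in `plot(PROP ~ DAY_SINCE_TREATMENT, data = DF1)`.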
I have copied my code below. I start with a list of 50 small integers, representing the number of televisions owned by 50 families. My objective is shown in the object 'tv.final' below. My effort seems very wordy and inefficient.
Question: is there a better way to start with a list of 50 integers and end with a grouped data table with proportions? (Just taking my first baby steps with R, sorry for such a stupid question, but inquiring minds want to know.)
tv.data <- read.table("Tb02-08.txt",header=TRUE)
str(tv.data)
# 'data.frame': 50 obs. of 1 variable:
# $ TVs: int 1 1 1 2 6 3 3 4 2 4 ...
tv.table <- table(tv.data)
tv.table
# tv.data
# 0 1 2 3 4 5 6
# 1 16 14 12 3 2 2
tv.prop <- prop.table(tv.table)*100
tv.prop
# tv.data
# 0 1 2 3 4 5 6
# 2 32 28 24 6 4 4
tvs <- rbind(tv.table,tv.prop)
tvs
# 0 1 2 3 4 5 6
# tv.table 1 16 14 12 3 2 2
# tv.prop 2 32 28 24 6 4 4
tv.final <- t(tvs)
tv.final
# tv.table tv.prop
# 0 1 2
# 1 16 32
# 2 14 28
# 3 12 24
# 4 3 6
# 5 2 4
# 6 2 4
You can treat the object returned by table() as any other vector/matrix:
tv.table <- table(tv.data)
round(100 * tv.table/sum(tv.table))
That will give you the proportions in rounded percentage points.
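To end up with the two-column layout of 'tv.final' directly, the counts and percentages can be bound column-wise in one step. This is only a sketch: the vector `tvs` is rebuilt here from the counts printed in the question because the file Tb02-08.txt is not available, and the column names `count` and `percent` are my own choice.

```r
# Rebuild the 50 observations from the frequency table shown in the question
tvs <- rep(0:6, times = c(1, 16, 14, 12, 3, 2, 2))

tab <- table(tvs)

# Counts and percentages side by side, one row per number of TVs
tv.final <- cbind(count = tab, percent = 100 * prop.table(tab))
tv.final
```

Because `cbind()` puts each vector in its own column, the separate `rbind()` and `t()` steps are not needed.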