I would like to extract the number of camera trap nights (CTN) (one column in my data frame) per camera trap station (another column in the same data frame) so I can work out relative abundance indices for each camera station. For example, Station 1 has had 5 triggers/events (of the same species) and 30 CTN, so it is listed in my database 5 times (it has 5 rows). I want to extract the unique CTN for Station 1 and subsequently for all the other stations in the data frame.
Data frame:
EventID CameraStation CTN
001 Station 1 30
002 Station 1 30
003 Station 1 30
004 Station 1 30
005 Station 2 29
006 Station 2 29
007 Station 2 29
008 Station 2 29
009 Station 2 29
010 Station 3 31
011 Station 3 31
I have tried to use 'unique' and 'with' but do not get the result I want.
with(unique(rai.PS[c("CameraStation", "CTN")]), table(CameraStation))
I expect to get the following result:
CameraStation CTN
Station 1 30
Station 2 29
Station 3 31
I.e. each station is listed only once, together with its CTN, in a new data frame.
But instead I get:
CameraStation
Station 1 Station 2 Station 3
        1         1         1
I assume it is counting each unique station once instead of returning its CTN.
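A minimal sketch of one way to get the expected result, assuming the data frame is called rai.PS as in the attempt above (the sample data is re-created here for illustration, and ctn_per_station is a made-up name). The table() call is what reduces the output to counts, so keeping the result of unique() on the two columns leaves one row per station together with its CTN.

```r
# Re-create the sample data from the question (illustrative values only)
rai.PS <- data.frame(
  EventID = sprintf("%03d", 1:11),
  CameraStation = rep(c("Station 1", "Station 2", "Station 3"), times = c(4, 5, 2)),
  CTN = rep(c(30, 29, 31), times = c(4, 5, 2))
)

# Keep only the two columns of interest and drop duplicated rows
ctn_per_station <- unique(rai.PS[c("CameraStation", "CTN")])
rownames(ctn_per_station) <- NULL  # reset row names carried over from the original rows

ctn_per_station
```

Note that if a station ever appeared with two different CTN values, this would return two rows for that station, which can serve as a quick consistency check.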
I have a data frame of 17 samples with the condition location, and I would like to plot the counts for a single gene, "photosystem II protein D1 1", grouped by that condition.
View(metadata)
sample location
<chr> <chr>
1 X1344 West
2 X1345 West
3 X1365 West
4 X1366 West
5 X1367 West
6 X1419 West
7 X1420 West
8 X1421 West
9 X1473 Mid
10 X1475 Mid
11 X1528 Mid
12 X1584 East
13 X1585 East
14 X1586 East
15 X1678 East
16 X1679 East
17 X1680 East
View(countdata)
func X1344 X1345 X1365 X1366 X1367 X1419 X1420 X1421 X1473 X1475 X1528 X1584 X1585 X1586 X1678 X1679 X1680
photosystem II protein D1 1 11208 6807 3483 4091 12198 7229 7404 5606 6059 7456 4007 2514 5709 2424 2346 4447 5567
countdata contains thousands of genes, but I am only showing the headers and the gene of interest.
ddsMat has been created like this:
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                                 colData = metadata,
                                 design = ~ location)
When plotting:
library(DESeq2)
plotCounts(ddsMat, "photosystem II protein D1 1", intgroup=c("location"))
By default, the function plots the conditions alphabetically, e.g. East-Mid-West, but I would like them to appear on the graph in the order West-Mid-East.
[Image: plotCounts output]
Is there a way of doing this?
Thanks,
I have found that you can manually change the order like this:
ddsMat$location <- factor(ddsMat$location, levels=c("West", "Mid", "East"))
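As a small illustration of why this works, assuming (as the default plot order suggests) that the groups are drawn in factor-level order: a character vector converted to a factor gets alphabetically sorted levels unless levels is given explicitly. The sketch below uses plain base R with a shortened, made-up location vector rather than the real ddsMat object.

```r
# Shortened stand-in for metadata$location (hypothetical values)
location <- c("West", "West", "Mid", "East", "East")

# Default: levels are sorted alphabetically
levels(factor(location))
# [1] "East" "Mid"  "West"

# Explicit levels fix the order used when grouping
location_ordered <- factor(location, levels = c("West", "Mid", "East"))
levels(location_ordered)
# [1] "West" "Mid"  "East"
```

Setting the levels on metadata$location before building ddsMat should have the same effect as changing them afterwards.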
I have two datasets. The first one is called Buildings; it lists each Building ID with its respective characteristics.
Building_ID Address Year BCR
1 Machida, TY 1994 80
2 Ueno, TY 1972 50
3 Asakusa, TY 1990 70
4 Machida, TY 1982 60
.
.
.
54634 Chiyoda, TY 2002 70
The second dataset is called Residential ID. It has only one column, consisting of the Building IDs (the same IDs as in the 'Buildings' dataset) that have residential usage.
Building_ID
2
3
14
23
39
44
45
133
393
423
.
.
or something like that. What I want to do is add a new column to my first dataset based on my second dataset. I want to mark which buildings are residential and which are not (basically, I want to take all the Building IDs listed in my second dataset and label them as Residential in my first dataset). If a building is residential, we can name it 'Residential', and otherwise 'NR', so it could look something like this:
Building_ID Address Year BCR Category
1 Machida, TY 1994 80 NR
2 Ueno, TY 1972 50 Residential
3 Asakusa, TY 1990 70 Residential
4 Machida, TY 1982 60 NR
.
.
.
54634 Chiyoda, TY 2002 70 NR
I was thinking it has something to do with ifelse or grepl but so far my code doesn't work.
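A minimal sketch of the ifelse idea, assuming the two datasets are data frames named Buildings and Residential (the second name is made up here) that share a Building_ID column. Since the task is exact ID membership rather than pattern matching, %in% is probably a better fit than grepl.

```r
# Small stand-ins for the two datasets (values taken from the question)
Buildings <- data.frame(
  Building_ID = 1:4,
  Address = c("Machida, TY", "Ueno, TY", "Asakusa, TY", "Machida, TY"),
  Year = c(1994, 1972, 1990, 1982),
  BCR = c(80, 50, 70, 60)
)
Residential <- data.frame(Building_ID = c(2, 3, 14, 23))

# %in% tests, for each building, whether its ID appears in the residential list
Buildings$Category <- ifelse(
  Buildings$Building_ID %in% Residential$Building_ID,
  "Residential", "NR"
)
Buildings
```

IDs that appear only in the residential list (14 and 23 in this stand-in) are simply ignored, because the test runs over the rows of Buildings.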
I have a data frame which contains information about sales branches, customers and sales.
branch <- c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","LA","LA","LA","LA","LA","LA","LA","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa","Tampa")
customer <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)
sales <- c(33816,24534,47735,1467,39389,30659,21074,20195,45165,37606,38967,41681,47465,3061,23412,22993,34738,19408,11637,36234,23809)
data <- data.frame(branch, customer, sales)
What I need to accomplish is to iterate over each branch, take each customer in the branch and divide the sales for that customer by the total for the branch, so I can find out how much each customer contributes to the total sales of the corresponding branch. E.g. for customer 1 I would like to divide 33816/177600 and store this value in a new column (177600 is the total for the Chicago branch).
I have tried to write a function to iterate over each row in a for loop but I am not sure how to do it at a branch level. Any guidance is appreciated.
Consider base R's ave, which computes a group-wise aggregate inline for the new column; this version also handles a customer with multiple records within the same branch:
data$customer_contribution <- ave(data$sales, data$customer, FUN=sum) /
ave(data$sales, data$branch, FUN=sum)
data
# branch customer sales customer_contribution
# 1 Chicago 1 33816 0.190405405
# 2 Chicago 2 24534 0.138141892
# 3 Chicago 3 47735 0.268778153
# 4 Chicago 4 1467 0.008260135
# 5 Chicago 5 39389 0.221784910
# 6 Chicago 6 30659 0.172629505
# 7 LA 7 21074 0.083576241
# 8 LA 8 20195 0.080090263
# 9 LA 9 45165 0.179117441
# 10 LA 10 37606 0.149139610
# 11 LA 11 38967 0.154537126
# 12 LA 12 41681 0.165300433
# 13 LA 13 47465 0.188238887
# 14 Tampa 14 3061 0.017462291
# 15 Tampa 15 23412 0.133560003
# 16 Tampa 16 22993 0.131169705
# 17 Tampa 17 34738 0.198172193
# 18 Tampa 18 19408 0.110718116
# 19 Tampa 19 11637 0.066386372
# 20 Tampa 20 36234 0.206706524
# 21 Tampa 21 23809 0.135824795
Or less wordy:
data$customer_contribution <- with(data, ave(sales, customer, FUN=sum) /
ave(sales, branch, FUN=sum))
We can use dplyr::group_by and dplyr::mutate to calculate fractional sales of total by branch.
library(dplyr)
data %>%
  group_by(branch) %>%
  mutate(sales.norm = sales / sum(sales))
## A tibble: 21 x 4
## Groups: branch [3]
# branch customer sales sales.norm
# <fct> <dbl> <dbl> <dbl>
# 1 Chicago 1. 33816. 0.190
# 2 Chicago 2. 24534. 0.138
# 3 Chicago 3. 47735. 0.269
# 4 Chicago 4. 1467. 0.00826
# 5 Chicago 5. 39389. 0.222
# 6 Chicago 6. 30659. 0.173
# 7 LA 7. 21074. 0.0836
# 8 LA 8. 20195. 0.0801
# 9 LA 9. 45165. 0.179
#10 LA 10. 37606. 0.149
Since my other question got closed, here is the required data.
What I'm trying to do is have R total the last column, 'count', by city or state so I can map the data, so I need some code that does this. For example, I want to show how many participants (in count) are in the state of Hawaii (HI).
zip city state latitude longitude count
96860 Pearl Harbor HI 24.859832 -168.021815 36
96863 Kaneohe Bay HI 21.439867 -157.74772 39
99501 Anchorage AK 61.216799 -149.87828 12
99502 Anchorage AK 61.153693 -149.95932 17
99506 Elmendorf AFB AK 61.224384 -149.77461 2
what I've tried is
match <- c(match(datazip$state, datazip$number))
but I'm really stuck trying to find a solution, since I don't even know how to describe this briefly. My plan afterwards is to make a choropleth map with the data, and believe me, by now I've seen almost all the pages that try to give advice. So your help is much appreciated. Thanks
# I read your sample data to a data frame
> df
zip city state latitude longitude count
1 96860 Pearl_Harbor HI 24.85983 -168.0218 36
2 96863 Kaneohe_Bay HI 21.43987 -157.7477 39
3 99501 Anchorage AK 61.21680 -149.8783 12
4 99502 Anchorage AK 61.15369 -149.9593 17
5 99506 Elmendorf_AFB AK 61.22438 -149.7746 2
# If you want to sum the number of counts by state
library(plyr)
> ddply(df, .(state), transform, count2 = sum(count))
zip city state latitude longitude count count2
1 99501 Anchorage AK 61.21680 -149.8783 12 31
2 99502 Anchorage AK 61.15369 -149.9593 17 31
3 99506 Elmendorf_AFB AK 61.22438 -149.7746 2 31
4 96860 Pearl_Harbor HI 24.85983 -168.0218 36 75
5 96863 Kaneohe_Bay HI 21.43987 -157.7477 39 75
Maybe aggregate would be a nice and simple solution for you:
df
zip city state latitude longitude count
1 96860 Pearl Harbor HI 24.85983 -168.0218 36
2 96863 Kaneohe Bay HI 21.43987 -157.7477 39
3 99501 Anchorage AK 61.21680 -149.8783 12
4 99502 Anchorage AK 61.15369 -149.9593 17
5 99506 Elmendorf AFB AK 61.22438 -149.7746 2
aggregate(df$count,by=list(df$state),sum)
Group.1 x
1 AK 31
2 HI 75
aggregate(df$count,by=list(df$city),sum)
Group.1 x
1 Anchorage 29
2 Elmendorf AFB 2
3 Kaneohe Bay 39
4 Pearl Harbor 36
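A variant that may be handier for mapping, sketched here on a re-created copy of the sample data (latitude and longitude omitted; state_totals is a made-up name): the formula interface of aggregate keeps the original column names instead of Group.1 and x.

```r
# Re-create the relevant columns of the sample data
df <- data.frame(
  zip = c(96860, 96863, 99501, 99502, 99506),
  city = c("Pearl Harbor", "Kaneohe Bay", "Anchorage", "Anchorage", "Elmendorf AFB"),
  state = c("HI", "HI", "AK", "AK", "AK"),
  count = c(36, 39, 12, 17, 2)
)

# Formula interface: sum count within each state, keeping the column names
state_totals <- aggregate(count ~ state, data = df, FUN = sum)
state_totals
#   state count
# 1    AK    31
# 2    HI    75
```

The result is a plain data frame with one row per state, which is the shape most choropleth functions expect to join against a state identifier.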