This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 1 year ago.
The data frame above is an example of the original one. I am trying to create the following new data frame from it:
Thank you!
We can use xtabs from base R
xtabs(abundance ~ StationCode + SpeciesCode, df1)
-output
SpeciesCode
StationCode AME BCF BKB CAP
O-01 2 1 5 0
O-02 1 0 1 1
O-03 0 4 2 0
O-04 0 0 8 1
data
df1 <- structure(list(SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
"CAP", "BKB", "BKB", "BKB", "BKB"), StationCode = c("O-01", "O-02",
"O-03", "O-01", "O-04", "O-02", "O-04", "O-01", "O-02", "O-03"
), abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-10L))
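If you need an ordinary data frame rather than an xtabs table (for example, to write it to CSV), base R's as.data.frame.matrix() keeps the wide layout. A minimal sketch that rebuilds the same example data; the names tab and wide are just illustrative:

```r
# Same values as df1 above
df1 <- data.frame(
  SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
                  "CAP", "BKB", "BKB", "BKB", "BKB"),
  StationCode = c("O-01", "O-02", "O-03", "O-01", "O-04",
                  "O-02", "O-04", "O-01", "O-02", "O-03"),
  abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)
)

# Cross-tabulate: rows = stations, columns = species, cells = summed abundance
tab <- xtabs(abundance ~ StationCode + SpeciesCode, df1)

# Convert the table to a regular data frame, keeping the wide layout
wide <- as.data.frame.matrix(tab)

# Move the station row names into an ordinary column
wide <- cbind(StationCode = rownames(wide), wide, row.names = NULL)
wide
```

Combinations that never occur (e.g. O-01 / CAP) come out as 0, as in the xtabs output.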
This question already has answers here:
How to merge multiple rows by a given condition and sum?
(2 answers)
Closed 2 years ago.
I have a data frame like this:
Disease Genemutation Mean. Total.No.of.pateints No.of.pateints.
cancertype1 BRCA1 1 10 2
cancertype2 BRCA2 5 10 3
cancertype3 BRCA2 7 10 4
cancertype1 BRCA1 8 10 1
cancertype3 BRCA2 4 10 4
cancertype2 BRCA1 6 10 1
How do I create a new category called cancertype4 (combining cancertype2 and cancertype3) that gives the number of patients in the merged group?
We can use replace with %in% to recode those values (assuming 'Disease' is of character class), then group and sum:
library(dplyr)
df %>%
group_by(Disease = replace(Disease,
Disease %in% c("cancertype2", "cancertype3"), "cancertype4")) %>%
summarise(TotalNoofpateints = sum(Total.No.of.pateints))
-output
# A tibble: 2 x 2
# Disease TotalNoofpateints
# <chr> <int>
#1 cancertype1 20
#2 cancertype4 40
Here is a base R option using aggregate
aggregate(
Total.No.of.pateints ~ Disease,
transform(
df,
Disease = replace(Disease, Disease %in% c("cancertype2", "cancertype3"), "cancertype4")
),
sum
)
giving
Disease Total.No.of.pateints
1 cancertype1 20
2 cancertype4 40
Data
> dput(df)
structure(list(Disease = c("cancertype1", "cancertype2", "cancertype3",
"cancertype1", "cancertype3", "cancertype2"), Genemutation = c("BRCA1",
"BRCA2", "BRCA2", "BRCA1", "BRCA2", "BRCA1"), Mean. = c(1L, 5L,
7L, 8L, 4L, 6L), Total.No.of.pateints = c(10L, 10L, 10L, 10L,
10L, 10L), No.of.pateints. = c(2L, 3L, 4L, 1L, 4L, 1L)), class = "data.frame", row.names = c(NA,
-6L))
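Both answers above sum Total.No.of.pateints. If the count you actually want is the No.of.pateints. column (or both), aggregate() accepts several columns on the left-hand side via cbind(). A sketch on a trimmed copy of the example data (Genemutation and Mean. dropped for brevity; res is an illustrative name):

```r
df <- data.frame(
  Disease = c("cancertype1", "cancertype2", "cancertype3",
              "cancertype1", "cancertype3", "cancertype2"),
  Total.No.of.pateints = c(10L, 10L, 10L, 10L, 10L, 10L),
  No.of.pateints. = c(2L, 3L, 4L, 1L, 4L, 1L)
)

# Recode cancertype2 and cancertype3 into a single group
df$Disease <- replace(df$Disease,
                      df$Disease %in% c("cancertype2", "cancertype3"),
                      "cancertype4")

# Sum both patient-count columns per (recoded) disease
res <- aggregate(cbind(Total.No.of.pateints, No.of.pateints.) ~ Disease,
                 data = df, FUN = sum)
res
```

This gives cancertype1 with 20 and 3, and cancertype4 with 40 and 12.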
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 2 years ago.
I have a data frame in which patients have multiple observations of medication use over time. Some patients used medication consistently and others have gaps; I am trying to count the patients who have never used medication.
I can't share the actual data, but here is an example data frame of what I am working with:
patid meds
1 0
1 1
1 1
2 0
2 0
3 1
3 1
3 1
4 0
5 1
5 0
So in this example, two patients (2 and 4) never used medication. That's what I'm looking for.
I'm fairly new to R and have no idea how to do this; any help would be appreciated.
Here is another alternative using the dplyr package:
library(dplyr)
df <- data.frame(patid = c(1,1,1,2,2,3,3,3,4,5,5),
meds = c(0,1,1,0,0,1,1,1,0,1,0))
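df %>%
# one row per patient/meds combination
distinct(patid, meds) %>%
# put meds == 1 rows first, so a patient who ever used medication is seen first
arrange(desc(meds)) %>%
# keep meds == 0 rows only for patients not already seen with meds == 1
filter(meds == 0 & !duplicated(patid))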
# patid meds
#1 2 0
#2 4 0
Try this:
library(dplyr)
#Data
df <- structure(list(patid = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L,
5L, 5L), meds = c(0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-11L))
#Code
df %>%
group_by(patid) %>%
summarise(sum = sum(meds, na.rm = TRUE)) %>%
filter(sum == 0)
# A tibble: 2 x 2
patid sum
<int> <int>
1 2 0
2 4 0
A base R solution could be:
subset(aggregate(meds ~ patid, df, sum), meds == 0)
which returns
patid meds
2 2 0
4 4 0
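Since the question also asks to count these patients, here is a base R sketch using tapply(): a patient never used medication exactly when their maximum meds value is 0. It assumes meds is coded 0/1 with no missing values. The names max_meds, never_ids and n_never are illustrative:

```r
df <- data.frame(patid = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5),
                 meds  = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0))

# Highest meds value per patient: 0 means the patient never had a 1
max_meds <- tapply(df$meds, df$patid, max)

# IDs of patients who never used medication, and how many there are
never_ids <- as.numeric(names(max_meds)[max_meds == 0])
n_never <- length(never_ids)
never_ids
n_never
```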
This question already has answers here:
Recode dates to study day within subject
(2 answers)
Closed 3 years ago.
I have data structured as below:
ID Day DesiredOutput
1 1 1
1 1 1
1 1 1
1 2 2
1 2 2
1 3 3
2 4 1
2 4 1
2 5 2
3 6 1
3 6 1
Is it possible to create a sequence for the desired output without using a loop? The dataset is quite large, so a loop won't work. Is it possible to do this with the dplyr package, or maybe with a combination of cumsum/diff?
An option is to group by 'ID' and then match each 'Day' against the unique values of the 'Day' column within that group:
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(desired = match(Day, unique(Day)))
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L), Day = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L, 6L)), row.names = c(NA,
-11L), class = "data.frame")
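The question also mentions cumsum/diff. A base R sketch along those lines uses ave() to apply, within each ID, a counter that starts at 1 and increases whenever Day changes from the previous row. Unlike the match() approach, this assumes equal Day values are contiguous within each ID (e.g. the data is sorted by ID and Day):

```r
df1 <- data.frame(ID  = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
                  Day = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 6L, 6L))

# Within each ID: start at 1, add 1 every time Day differs from the previous row
df1$desired <- ave(df1$Day, df1$ID,
                   FUN = function(d) cumsum(c(1, diff(d) != 0)))
df1
```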
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I am trying to remove some rows of my data by merging them into another row as additional columns. Is there a way to group rows together by a certain variable?
I have tried using group_by from the dplyr package, but it does not seem to solve my issue.
library(dplyr)
late <- read.csv(file.choose())
late <- group_by(late, state, add = FALSE)
The data set (named "late") is currently in this form:
ontime state count
0 AL 1
1 AL 44
null AL 3
0 AR 5
1 AR 50
...
But I would like it to be:
state count0 count1 countnull
AL 1 44 3
AR 5 50 null
...
Ultimately, I want to calculate count0/count1 for each state, so if there is a better way of going about this, I am open to suggestions.
You could do this with dcast() from the reshape2 package
library(reshape2)
df = data.frame(
ontime = c(0,1,NA,0,1),
state = c("AL","AL","AL","AR","AR"),
count = c(1,44,3,5,50)
)
dcast(df, state ~ ontime, value.var = "count")
With spread:
library(dplyr)
library(tidyr)
df %>%
mutate(ontime = paste0('count', ontime)) %>%
spread(ontime, count)
Output:
state count0 count1 countnull
1 AL 1 44 3
2 AR 5 50 NA
Data:
df <- structure(list(ontime = structure(c(1L, 2L, 3L, 1L, 2L), .Label = c("0",
"1", "null"), class = "factor"), state = structure(c(1L, 1L,
1L, 2L, 2L), .Label = c("AL", "AR"), class = "factor"), count = c(1L,
44L, 3L, 5L, 50L)), class = "data.frame", row.names = c(NA, -5L
))
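Because the end goal is count0/count1 per state, here is a base R sketch with reshape() that widens the data and then computes that ratio. It assumes ontime is stored as character ("0", "1", "null"), as in the question's table; reshape() names the new columns count.0, count.1 and count.null, and the ratio column name is an illustrative choice. (In current tidyr, spread() is superseded by pivot_wider(), though the answer above still works.)

```r
late <- data.frame(
  ontime = c("0", "1", "null", "0", "1"),
  state  = c("AL", "AL", "AL", "AR", "AR"),
  count  = c(1, 44, 3, 5, 50)
)

# One row per state, one count.<ontime> column per ontime value
wide <- reshape(late, idvar = "state", timevar = "ontime", direction = "wide")

# The ratio the question is ultimately after
wide$ratio <- wide$count.0 / wide$count.1
wide
```

Missing combinations (AR has no "null" row) come out as NA, and the ratio is 1/44 for AL and 0.1 for AR.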
This question already has answers here:
Convert integer to class Date
(3 answers)
Closed 2 years ago.
I am trying to read a date column from a data frame; it is stored as a string (see column 'Date_String'). This is what my data looks like:
X. Date_String ASIN Stars positive_rating
1 0 20150430 B00GKKI4IE 5 0
2 1 20150430 B00GKKI4IE 5 0
3 2 20150430 B00GKKI4IE 5 0
4 3 20150429 B00GKKI4IE 5 0
5 4 20150428 B00GKKI4IE 5 0
6 5 20150428 B00GKKI4IE 5 0
This is what I am using to convert this column to Date:
data$date.of.review <- as.Date(data$Date_String, "%Y%m%d")
and I get this error message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Any ideas how to solve it? Thanks.
Here is the structure of the dataframe:
structure(list(X. = 0:5, Date_String = c(20150430L, 20150430L,
20150430L, 20150429L, 20150428L, 20150428L), ASIN = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("B00GKKI4IE", "B00I4OBXWI", "B00IB17BFM",
"B00IN2WD5C", "B00J58F0IA", "B00K7NCS9G", "B00KJEZIBS", "B00KZER5GS",
"B00MK39H68", "B00O1GTTWY"), class = "factor"), Stars = c(5L,
5L, 5L, 5L, 5L, 5L), positive_rating = c(0, 0, 0, 0, 0, 0)), .Names = c("X.",
"Date_String", "ASIN", "Stars", "positive_rating"), row.names = c(NA,
6L), class = "data.frame")
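The class of Date_String is integer, not character. If you wrap the variable in as.character, as.Date can read it with the "%Y%m%d" format. (With a numeric input, as.Date treats the second positional argument as the origin rather than the format, so "%Y%m%d" itself gets parsed as a date string, which is what triggers the charToDate error.)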
sapply(data, class)
data$date.of.review <- as.Date(as.character(data$Date_String), "%Y%m%d")
data
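A small sketch of the fix, plus a check worth adding afterwards: once a format is supplied, as.Date() returns NA for strings that do not match it instead of raising an error, so anyNA() is a cheap way to catch bad rows. The vector date_int is a hypothetical stand-in for data$Date_String:

```r
date_int <- c(20150430L, 20150430L, 20150429L, 20150428L)
class(date_int)   # "integer", not "character"

# Convert to character first so "%Y%m%d" is used as the format
dates <- as.Date(as.character(date_int), format = "%Y%m%d")
dates

# Strings that do not match the format become NA rather than an error
bad <- as.Date(c("20150430", "not a date"), format = "%Y%m%d")
anyNA(bad)
```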