How to distill this data.frame into more condensed data? [duplicate] - r

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Calculate the mean by group
(9 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
ID Error
EID0062, EID0175 1
EID0063 1
EID0063 1
EID0064 1
EID0069 1
EID0069 0
EID0072 0
EID0075 0
EID0075 0
EID0093 1
EID0023 0
EID0013 1
EID0062, EID0175 1
I have ~200 rows with ~150 unique IDs. I would like to create a new data.frame with just the unique IDs, where the Error column indicates whether that person ever had an error. For example, EID0069 has both an error row and a non-error row, but I would like the new df to show that person as an error. Like this:
ID Error
EID0062, EID0175 1
EID0063 1
EID0064 1
EID0069 1
EID0072 0
EID0075 0
EID0093 1
EID0023 0
EID0013 1
All the best!
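One possible way to get this result, sketched in base R: take the maximum of Error within each ID, so any ID with at least one 1 ends up as 1. The data frame name `df` and the step that restores the original ID order are assumptions made for illustration; the linked duplicate questions cover other approaches.

```r
# Sample data copied from the question; `df` is an assumed name
df <- data.frame(
  ID = c("EID0062, EID0175", "EID0063", "EID0063", "EID0064", "EID0069",
         "EID0069", "EID0072", "EID0075", "EID0075", "EID0093",
         "EID0023", "EID0013", "EID0062, EID0175"),
  Error = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1)
)

# max() per ID is 1 if the ID ever had an error, 0 otherwise
res <- aggregate(Error ~ ID, data = df, FUN = max)

# aggregate() sorts by ID; reorder to the order of first appearance
res <- res[match(unique(df$ID), res$ID), ]
rownames(res) <- NULL
res
```

Using `max` relies on Error being coded as 0/1; with a logical column, `any` would play the same role.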

Related

R won't interpret table column as numerical values [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Count number of occurences for each unique value
(14 answers)
Closed 2 years ago.
I am just trying to convert a column of numbers, which R thinks are characters, to numerical values.
I have the following table:
> longtab=as.data.frame(table(long));head(longtab)
long Freq
1 189485 1
2 189486 1
3 189487 1
4 189488 1
5 189489 1
6 189490 1
I've created a new table from those data as follows:
> q=head(longtab);q
long Freq
1 189485 1
2 189486 1
3 189487 1
4 189488 1
5 189489 1
6 189490 1
When I test whether the "long" column is numeric, R tells me that it is not.
> is.numeric(q$long)
[1] FALSE
When I try to coerce "long" values to be numeric using as.numeric(), I get the following:
> as.numeric(q$long)
[1] 1 2 3 4 5 6
But these are the row numbers not the values in the "long" column. This seems like it should be a simple problem to fix but I am struggling and have been at this a while. Any help would be greatly appreciated.
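The output 1 2 3 4 5 6 is what you would expect if `long` is a factor, which is what `as.data.frame(table(...))` produces: `as.numeric()` on a factor returns the internal level codes rather than the labels. A minimal sketch, assuming that is the case here (the `long` vector below is made-up sample data matching the printed table), converts through character first:

```r
# Made-up input that reproduces the table shown in the question
long <- 189485:189490
longtab <- as.data.frame(table(long))
q <- head(longtab)

is.factor(q$long)    # the column is a factor, not a number
as.numeric(q$long)   # level codes, not the original values

# Convert the labels to character first, then to numeric
q$long <- as.numeric(as.character(q$long))
is.numeric(q$long)
```

An equivalent form is `as.numeric(levels(f))[f]` for a factor `f`, as discussed in the linked question on converting factors.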

how to change my dataframe based on value of a column [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 3 years ago.
There is a data frame with two columns, as below, and I want to change it into a data frame with three columns:
df <- data.frame(key=c('a','a','a','b','b'),value=c(1,2,2,1,3))
I have tried this in Python and it works, but in R I have no idea how to do it.
The expected output should look like this:
1 2 3
a 1 2 0
b 1 0 1
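library(data.table)
# data.table's dcast() method expects a data.table, so convert df first
dcast(as.data.table(df), key ~ value, value.var = "value", fun.aggregate = length)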
# key 1 2 3
# 1 a 1 2 0
# 2 b 1 0 1
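For comparison, the same counts can be sketched in base R without extra packages: `table()` cross-tabulates the two columns, and `as.data.frame.matrix()` keeps the wide layout. The name `wide` is an assumption for illustration.

```r
df <- data.frame(key = c("a", "a", "a", "b", "b"), value = c(1, 2, 2, 1, 3))

# Cross-tabulate key against value; each cell is a frequency
counts <- table(df$key, df$value)

# Keep the wide shape: keys become row names, values become columns
wide <- as.data.frame.matrix(counts)
wide
```

Unlike the `dcast()` result, the keys end up as row names here rather than in a `key` column.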

How to convert 0=..., 1=... columns, into 1 single column [duplicate]

This question already has answers here:
collapse mulitple columns into one column and generate an index variable
(4 answers)
Reshaping data.frame from wide to long format
(8 answers)
Closed 3 years ago.
I have been tasked to tidy up some data and am having issues with trying to transform the data from this format:
id occupation_busdriver occupation_cashier occupation_nurse
1 0 0 1
2 0 1 0
3 1 0 0
My actual dataset is significantly larger, but this is the area in which I am struggling, so an example for this set would be much appreciated.
I have already tried using the gather and select functions.
I am looking to have the data in this format:
id occupation
1 nurse
2 cashier
3 busdriver
We can use max.col to get the column index of the max value per row and, based on that index, get the column names.
data.frame(df1[1], occupation = sub(".*_", "", names(df1))[-1][max.col(df1[-1])])
# id occupation
#1 1 nurse
#2 2 cashier
#3 3 busdriver
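The same idea as a self-contained sketch, with the steps split out. The construction of `df1` from the question's sample and the `ties.method = "first"` argument are additions for illustration; `max.col()` breaks ties at random by default, which does not matter when each row has exactly one 1.

```r
# Sample data from the question; df1 matches the name used in the answer
df1 <- data.frame(
  id = 1:3,
  occupation_busdriver = c(0, 0, 1),
  occupation_cashier   = c(0, 1, 0),
  occupation_nurse     = c(1, 0, 0)
)

# Strip the common prefix from the indicator column names
labels <- sub("^occupation_", "", names(df1)[-1])

# For each row, find which indicator column holds the 1
idx <- max.col(df1[-1], ties.method = "first")

tidy <- data.frame(id = df1$id, occupation = labels[idx])
tidy
```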

Remove columns from data frame that only contain zeros [duplicate]

This question already has answers here:
Remove columns with zero values from a dataframe
(10 answers)
Closed 4 years ago.
I'm trying to recreate a data frame (DC5_prod) that has hundreds of columns, but many without any values other than zero.
The first column in the data frame is text and the rest are numeric. Is there a way to ignore the first column while simultaneously eliminating the remainder of columns that are composed entirely of zeros?
DC5_Prod
a b c d e f
1 AK 0 0 0 0 1
2 JI 0 0 0 0 0
The above is a snippet of how it currently stands; I would like an output of:
DC5_Prod
a f
1 AK 1
2 JI 0
When I try the solution given for a similar question on this site:
DC5_prod[, colSums(DC5_prod != 0) > 0]
it essentially just returns the first column without removing any.
Try this base R approach:
> ind <- sapply(DC5_Prod, function(x) sum(x==0)) != nrow(DC5_Prod)
> DC5_Prod[,ind]
a f
1 AK 1
2 JI 0
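A variant that handles the text column explicitly, as a sketch: always keep the first column, and test only the remaining columns for non-zero values. The data frame below is rebuilt from the snippet in the question, and `keep` and `result` are assumed names.

```r
# Rebuilt from the snippet in the question
DC5_Prod <- data.frame(
  a = c("AK", "JI"),
  b = c(0, 0), c = c(0, 0), d = c(0, 0), e = c(0, 0),
  f = c(1, 0)
)

# TRUE for the first column, then TRUE for each numeric column
# that contains at least one non-zero value
keep <- c(TRUE, colSums(DC5_Prod[-1] != 0) > 0)

# drop = FALSE keeps a data frame even if only one column survives
result <- DC5_Prod[, keep, drop = FALSE]
result
```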

Row numbering by group and date [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
numbering by groups [duplicate]
(8 answers)
Closed 6 years ago.
I have a question about numbering rows by group AND by one further condition. I know how to do this by group but not by adding one further condition.
Suppose I have the ID and the DATE and want to create NUM as shown in the table:
ID DATE NUM
1 20160103 1
1 20160104 1
1 20160104 2
1 20160105 1
1 20160105 2
1 20160105 3
1 20160106 1
2 20160103 1
2 20160103 2
2 20160105 1
Does anyone know how to do this?
We can use ave from base R
df$NUM <- with(df, ave(ID, ID, DATE, FUN = seq_along))
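A runnable sketch of the same approach on the question's sample data. The data frame is rebuilt from the table in the question; passing `seq_along(ID)` as the first argument, instead of `ID`, is a small variation so that NUM comes back as an integer column.

```r
# Rebuilt from the table in the question
df <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  DATE = c(20160103, 20160104, 20160104, 20160105, 20160105,
           20160105, 20160106, 20160103, 20160103, 20160105)
)

# ave() applies FUN within each ID/DATE combination;
# seq_along() numbers the rows of each group 1, 2, 3, ...
df$NUM <- with(df, ave(seq_along(ID), ID, DATE, FUN = seq_along))
df
```

The numbering follows the existing row order within each group, so sort the data first if a particular order matters.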
