How to put dates on the top column of my output in R

I have three columns of data
Category Date Value
A 10/12/2018 1
A 10/14/2018 2
B 10/12/2018 3
B 10/13/2018 4
C 10/12/2018 5
C 10/14/2018 6
How can I transform my output so that the output has dates on the top like this and groups the Categories?
10/12/2018 10/13/2018 10/14/2018
A 1 2
B 3 4
C 5 6
I've tried searching for crosstab and some basic R functions and appreciate your thoughts on this.

What you want is called "wide" format. There are many packages and methods in R that do this kind of formatting. Bing Sun pointed to the dplyr method. I prefer the data.table method.
## loading your data here
library(readr)
x <- read_delim("Category Date Value
A 10/12/2018 1
A 10/14/2018 2
B 10/12/2018 3
B 10/13/2018 4
C 10/12/2018 5
C 10/14/2018 6", delim = " ")
## casting your data to wide format
library(data.table)
xcast <- dcast(x, Category~Date, value.var = "Value")
xcast
returns...
Category 10/12/2018 10/13/2018 10/14/2018
1 A 1 NA 2
2 B 3 4 NA
3 C 5 NA 6

This is a reshape problem.
library(tidyr)
df %>% spread(Date,Value)
Category 10/12/2018 10/13/2018 10/14/2018
1 A 1 NA 2
2 B 3 4 NA
3 C 5 NA 6
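For comparison, here is a minimal base R sketch of the same long-to-wide reshape, with no extra packages. It assumes each Category/Date pair occurs at most once (so sum() just passes the single value through), and it returns a matrix rather than a data frame. The data frame x is rebuilt from the question's sample.

```r
# Sample data from the question, rebuilt by hand
x <- data.frame(
  Category = c("A", "A", "B", "B", "C", "C"),
  Date     = c("10/12/2018", "10/14/2018", "10/12/2018",
               "10/13/2018", "10/12/2018", "10/14/2018"),
  Value    = 1:6
)

# Cross-tabulate Value by Category (rows) and Date (columns).
# Combinations that never occur are filled with NA.
wide <- tapply(x$Value, list(x$Category, x$Date), sum)
wide
```

Note that the dates here are plain strings, so the columns are ordered alphabetically. For data spanning several months or years, converting Date with as.Date() first would likely give a more sensible column order.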

Related

r- dynamically detect excel column names format as date (without df slicing)

I am trying to detect column dates that come from an excel format:
library(openxlsx)
df <- read.xlsx('path/df.xlsx', sheet=1, detectDates = T)
Which reads the data as follows:
# a b c 44197 44228 d
#1 1 1 NA 1 1 1
#2 2 2 NA 2 2 2
#3 3 3 NA 3 3 3
#4 4 4 NA 4 4 4
#5 5 5 NA 5 5 5
I tried to specify a fix index slice and then transform these specific columns as follows:
names(df)[4:5] <- format(as.Date(as.numeric(names(df)[4:5]),
origin = "1899-12-30"), "%m/%d/%Y")
This works well when the dates sit in those specific columns. Unfortunately, the column indices could change, say from names(df)[4:5] to names(df)[2:3], in which case the conversion returns coerced NA values instead of dates.
data:
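Note: check.names = FALSE keeps the column names as 44197 and 44228, which is how read.xlsx() reads them; without it, data.frame() would rename them to X44197 and X44228.
df <- data.frame(a=rep(1:5), b=rep(1:5), c=NA, "44197"=rep(1:5), '44228'=rep(1:5), d=rep(1:5), check.names = FALSE)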
Expected Output:
Note: this is the original excel format for these above columns:
# a b c 01/01/2021 01/02/2021 d
#1 1 1 NA 1 1 1
#2 2 2 NA 2 2 2
#3 3 3 NA 3 3 3
#4 4 4 NA 4 4 4
#5 5 5 NA 5 5 5
How can I detect these Excel-format column names directly and convert them to dates without having to slice the data frame?
We may need to only get those column names that are numbers
i1 <- !is.na(as.integer(names(df)))
and then use
names(df)[i1] <- format(as.Date(as.numeric(names(df)[i1]),
origin = "1899-12-30"), "%m/%d/%Y")
Or with dplyr
library(dplyr)
df %>%
rename_with(~ format(as.Date(as.numeric(.),
origin = "1899-12-30"), "%m/%d/%Y"), matches('^\\d+$'))
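As a self-contained sketch of the same idea, the version below picks the columns with a regular expression instead of as.integer(), which avoids the "NAs introduced by coercion" warning. The variable name is_serial is just illustrative, and the sketch assumes every all-digit column name is an Excel date serial.

```r
# Sample data; check.names = FALSE keeps the numeric-looking names as they are
df <- data.frame(a = 1:5, b = 1:5, c = NA,
                 "44197" = 1:5, "44228" = 1:5, d = 1:5,
                 check.names = FALSE)

# TRUE for column names made up only of digits
is_serial <- grepl("^[0-9]+$", names(df))

# Convert the Excel serial numbers to formatted dates, wherever they sit
names(df)[is_serial] <- format(
  as.Date(as.numeric(names(df)[is_serial]), origin = "1899-12-30"),
  "%m/%d/%Y"
)
names(df)
```

Because the selection is by pattern, it keeps working if the date columns move to other positions.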

How to rearrange a data in R

I have a long data list similar to the following one:
set.seed(9)
part_number<-sample(1:5,5,replace=TRUE)
Type<-sample( c("A","B","C"),5, replace=TRUE)
rank<-sample(1:20,5,replace=TRUE)
data<-data.frame(cbind(part_number,Type,rank))
data
part_number Type rank
1 2 A 3
2 1 B 1
3 2 B 18
4 2 C 7
5 3 C 10
I want to rearrange the data in the following way:
part_number A B C
1 1
2 3 18 7
3 10
I think I need to use the reshape library. But I am not sure.
library(tidyr)
data %>% spread(Type,rank)
# part_number A B C
# 1 1 <NA> 1 <NA>
# 2 2 3 18 7
# 3 3 <NA> <NA> 10
You would go about doing the following:
data <- reshape(data, idvar = "part_number", timevar = "Type", direction = "wide")
data
To format it exactly as you asked, I would add in,
library(tidyverse)
data %>%
arrange(part_number) %>%
dplyr::select(part_number, A = rank.A, B = rank.B, C = rank.C)
If, however, you had many more columns to rename, I would use the gsub function to rename them by pattern. In addition, since the row names are now messy,
rownames(data) <- c()
Let me know if this doesn't work or this wasn't what you had in mind.
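The gsub renaming mentioned above could look roughly like this. It is a sketch in base R only, with the question's printed data typed in by hand; the object name wide is an assumption made for the example.

```r
# Data as printed in the question
data <- data.frame(
  part_number = c(2, 1, 2, 2, 3),
  Type        = c("A", "B", "B", "C", "C"),
  rank        = c(3, 1, 18, 7, 10)
)

# Long to wide: one row per part_number, one rank.* column per Type
wide <- reshape(data, idvar = "part_number", timevar = "Type",
                direction = "wide")

# Strip the "rank." prefix from every column name in one pass
names(wide) <- gsub("^rank\\.", "", names(wide))

# Sort by part_number and reset the row names
wide <- wide[order(wide$part_number), ]
rownames(wide) <- NULL
wide
```

The pattern-based rename scales to any number of Type levels, whereas listing each column in select() has to be updated by hand.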

R Sum columns by index

I need to find a way to sum columns by their index. I'm working on a bigread.csv file, so I'll show a sample of the problem here. For example, I'd like to sum the 2nd through 5th columns and the 6th through 7th columns of the following matrix:
a 1 3 3 4 5 6
b 2 1 4 3 4 1
c 1 3 2 1 1 5
d 2 2 4 3 1 3
The result has to be like this:
a 11 11
b 10 5
c 7 6
d 11 4
The columns have all different names
We can use rowSums on the subset of columns i.e 2:5 and 6:7 separately and then create a new data.frame with the output.
data.frame(df1[1], Sum1=rowSums(df1[2:5]), Sum2=rowSums(df1[6:7]))
# id Sum1 Sum2
#1 a 11 11
#2 b 10 5
#3 c 7 6
#4 d 11 4
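The answer does not show how df1 was built, so here is a reproducible sketch with the question's sample typed in by hand. The column names id and v1 to v6 are made up for the example, since the question only says the columns all have different names.

```r
# Sample matrix from the question; column names are hypothetical
df1 <- data.frame(
  id = c("a", "b", "c", "d"),
  v1 = c(1, 2, 1, 2),
  v2 = c(3, 1, 3, 2),
  v3 = c(3, 4, 2, 4),
  v4 = c(4, 3, 1, 3),
  v5 = c(5, 4, 1, 1),
  v6 = c(6, 1, 5, 3)
)

# Sum columns 2 to 5 and columns 6 to 7 by position, keeping the id column
res <- data.frame(df1[1],
                  Sum1 = rowSums(df1[2:5]),
                  Sum2 = rowSums(df1[6:7]))
res
```

Indexing by position means the column names never have to be spelled out.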
The package dplyr has a function exactly made for that purpose:
require(dplyr)
df1 = data.frame(a=c(1,2,3,4,3,3),b=c(1,2,3,2,1,2),c=c(1,2,3,21,2,3))
df2 = df1 %>% transmute(sum1 = a+b , sum2 = b+c)
df2 = df1 %>% transmute(sum1 = .[[1]]+.[[2]], sum2 = .[[2]]+.[[3]])

R - Output of aggregate and range gives 2 columns for every column name - how to restructure?

I am trying to produce a summary table showing the range of each variable by group. Here is some example data:
df <- data.frame(group=c("a","a","b","b","c","c"), var1=c(1:6), var2=c(7:12))
group var1 var2
1 a 1 7
2 a 2 8
3 b 3 9
4 b 4 10
5 c 5 11
6 c 6 12
I used the aggregate function like this:
df_range <- aggregate(df[,2:3], list(df$group), range)
Group.1 var1.1 var1.2 var2.1 var2.2
1 a 1 2 7 8
2 b 3 4 9 10
3 c 5 6 11 12
The output looked normal, but the dimensions are 3x3 instead of 3x5 and there are only 3 names:
names(df_range)
[1] "Group.1" "var1" "var2"
How do I get this back to the normal data frame structure with one name per column? Or alternatively, how do I get the same summary table without using aggregate and range?
That is the documented behaviour: when FUN returns more than one value per group, aggregate() stores the results as a matrix column within the data frame. You can undo the effect with:
newdf <- do.call(data.frame, df_range)
# Group.1 var1.1 var1.2 var2.1 var2.2
#1 a 1 2 7 8
#2 b 3 4 9 10
#3 c 5 6 11 12
dim(newdf)
#[1] 3 5
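To see what is going on, the sketch below inspects the aggregate() result before and after flattening. It uses only the question's data and base R; the object name flat is just for illustration.

```r
df <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                 var1 = 1:6, var2 = 7:12)

df_range <- aggregate(df[, 2:3], list(df$group), range)

# Each var column is a two-column matrix (min, max) stored inside the data frame
is.matrix(df_range$var1)
dim(df_range)

# do.call() passes each element to data.frame(), which splits the
# matrix columns into ordinary columns
flat <- do.call(data.frame, df_range)
names(flat)
dim(flat)
```

The printed table looks the same in both cases, which is why the nested structure only shows up through names() or dim().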
Here's an approach using dplyr:
library(dplyr)
df %>%
group_by(group) %>%
summarise(across(c(var1, var2), ~ max(.x) - min(.x)))

Condensing Data Frame in R

I just have a simple question. I really appreciate everyone's input; you have been a great help to my project. I have an additional question about data frames in R.
I have a data frame that looks similar to this:
C <- c("","","","","","","","A","B","D","A","B","D","A","B","D")
D <- c(NA,NA,NA,2,NA,NA,1,1,4,2,2,5,2,1,4,2)
G <- list(C=C,D=D)
T <- as.data.frame(G)
T
C D
1 NA
2 NA
3 NA
4 2
5 NA
6 NA
7 1
8 A 1
9 B 4
10 D 2
11 A 2
12 B 5
13 D 2
14 A 1
15 B 4
16 D 2
I would like to be able to condense all the repeat characters into one, and look similar to this:
J B C E
1 2 1
2 A 1 2 1
3 B 4 5 4
4 D 2 2 2
So of course, the data is all the same, it is just that it is condensed and new columns are formed to hold the data. I am sure there is an easy way to do it, but from the books I have looked through, I haven't seen anything for this!
EDIT I edited the example because it wasn't working with the answers so far. I wonder if the NA's, blanks, and unevenness from the blanks are contributing??
Here's a reshape solution:
require(reshape)
cast(T, C ~ ., function(x) x)
Changed T to my.df to avoid a bad habit (T is shorthand for TRUE). This returns a list, which may not be what you want, but you can convert from there.
C <- c("A","B","D","A","B","D","A","B","D")
D <- c(1,4,2,2,5,2,1,4,2)
my.df <- data.frame(id=C,val=D)
ret <- function(x) x
by.df <- by(my.df$val,INDICES=my.df$id,ret)
This seems to get the results you are looking for. I'm assuming it's OK to remove the NA values since that matches the desired output you show.
T <- na.omit(T)
T$ind <- ave(1:nrow(T), T$C, FUN = seq_along)
reshape(T, direction = "wide", idvar = "C", timevar = "ind")
# C D.1 D.2 D.3
# 4 2 1 NA
# 8 A 1 2 1
# 9 B 4 5 4
# 10 D 2 2 2
library(reshape2)
dcast(T, C ~ ind, value.var = "D", fill = "")
# C 1 2 3
# 1 2 1
# 2 A 1 2 1
# 3 B 4 5 4
# 4 D 2 2 2
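The key step in the answer above is the ind column, which numbers the occurrences of each value of C so that reshape() knows which wide column each row belongs to. Here is a small sketch on a cut-down version of the data (two repeats per letter, no blanks or NAs), using df rather than T as the object name:

```r
df <- data.frame(C = c("A", "B", "D", "A", "B", "D"),
                 D = c(1, 4, 2, 2, 5, 2))

# Within each value of C, count occurrences: 1 for the first, 2 for the second
df$ind <- ave(seq_len(nrow(df)), df$C, FUN = seq_along)
df$ind

# Use the counter as the "time" variable when going wide
wide <- reshape(df, direction = "wide", idvar = "C", timevar = "ind")
wide
```

Without such a counter there is nothing to distinguish the repeated rows of a group, so reshape() has no way to spread them across columns.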
