xts dropping column names - r

I have a data.frame
res0 = structure(list(year = "2017", il = 11200000), .Names = c("year",
"il"), row.names = c(NA, -1L), class = "data.frame")
However, when I try to convert this to an xts object, I lose the column names:
as.xts(x = res0[,2:ncol(res0)], order.by = as.POSIXct(paste0(res0$year,"-01-01")), name = NULL)
This returns:
[,1]
2017-01-01 11200000
instead of
il
2017-01-01 11200000

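Subscripting in R drops dimensions by default: selecting a single column of a data frame with [ returns a plain vector, so the column name is lost before as.xts ever sees it. Use drop = FALSE to prevent this.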
res0[, 2:ncol(res0), drop = FALSE]
Also note that this works to create an n x 1 zoo series with year as the index.
library(zoo)
z <- read.zoo(res0, FUN = c, drop = FALSE)
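To make the effect of drop visible, here is a minimal sketch using the data from the question. The base R part needs no packages. The final xts call is guarded and only runs if xts happens to be installed, which is an assumption about your setup.

```r
res0 <- data.frame(year = "2017", il = 11200000, stringsAsFactors = FALSE)

# Default behaviour: a single selected column collapses to an unnamed vector
v <- res0[, 2:ncol(res0)]
is.data.frame(v)   # FALSE
names(v)           # NULL

# With drop = FALSE the result stays a data.frame and keeps its column name
d <- res0[, 2:ncol(res0), drop = FALSE]
colnames(d)        # "il"

# Optional: pass the one-column data.frame to as.xts (only if xts is available)
if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::as.xts(d, order.by = as.POSIXct(paste0(res0$year, "-01-01")))
  print(colnames(x))
}
```

Because d is still a data.frame, as.xts receives the column name along with the values.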

Related

The first two columns defined as "rownames"

I want to define the first two columns of a data frame as rownames. Actually I want to do some calculations and the data frame has to be numeric for that.
data.frame <- data_frame(id=c("A1","B2"),name=c("julia","daniel"),BMI=c("20","49"))
The values for BMI are numeric (checked with is.numeric), but the data frame as a whole is not. How can I define the first two columns (id and name) as row names?
Thank you in advance for any suggestions
You can combine the id and name columns into one and then use that column as the row names:
library(dplyr)  # provides %>%
data.frame %>%
  tidyr::unite(rowname, id, name) %>%
  tibble::column_to_rownames()
# BMI
#A1_julia 20
#B2_daniel 49
In base R, you can do the same in steps as
data.frame <- as.data.frame(data.frame)
rownames(data.frame) <- paste(data.frame$id, data.frame$name, sep = "_")
data.frame[c('id', 'name')] <- NULL
I am not sure whether the code and result below are what you are after:
dfout <- `rownames<-`(data.frame(BMI = as.numeric(df$BMI)),paste(df$id,df$name))
such that
> dfout
BMI
A1 julia 20
B2 daniel 49
DATA
df <- structure(list(id = structure(1:2, .Label = c("A1", "B2"), class = "factor"),
name = structure(2:1, .Label = c("daniel", "julia"), class = "factor"),
BMI = structure(1:2, .Label = c("20", "49"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
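Since the stated goal is a purely numeric object for calculations, the base R steps above can be followed by a conversion of BMI. This is a small sketch under the assumption that BMI is stored as character strings, as in the question; if it is a factor, as in the DATA block above, convert with as.numeric(as.character(...)) instead.

```r
df <- data.frame(id = c("A1", "B2"),
                 name = c("julia", "daniel"),
                 BMI = c("20", "49"),
                 stringsAsFactors = FALSE)

# Move id and name into the row names, then drop the two columns
rownames(df) <- paste(df$id, df$name, sep = "_")
df[c("id", "name")] <- NULL

# Convert the remaining column from character to numeric
df$BMI <- as.numeric(df$BMI)

# With only numeric columns left, as.matrix() gives a numeric matrix
m <- as.matrix(df)
is.numeric(m)  # TRUE
```

The row names survive the conversion, so values can still be looked up by label, for example m["B2_daniel", "BMI"].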

R Programming: Creating rows on basis of date difference

I am trying to create a dataset based on the number of days between a start date and an end date. For example:
Name   Start_Date  End_Date
Alice  1-1-2017    3-1-2017
John   4-3-2017    5-3-2017
Peter  12-3-2017   12-3-2017
The final dataset should contain one row per day from the start date to the end date, both inclusive. It should look something like:
Name   Date
Alice  1-1-2017
Alice  2-1-2017
Alice  3-1-2017
John   4-3-2017
John   5-3-2017
Peter  12-3-2017
Any help is appreciated. Thanks!
We can use Map to get the sequence of dates for each row and melt the resulting list into a data.frame:
df1[-1] <- lapply(df1[-1], as.Date, format = "%d-%m-%Y")
lst <- setNames(Map(function(x, y) seq(x, y, by = "1 day"),
df1$Start_Date, df1$End_Date), df1$Name)
library(reshape2)
melt(lst)[2:1]
data
df1 <- structure(list(Name = c("Alice", "John", "Peter"), Start_Date = structure(c(17167,
17229, 17237), class = "Date"), End_Date = structure(c(17169,
17230, 17237), class = "Date")), .Names = c("Name", "Start_Date",
"End_Date"), row.names = c(NA, -3L), class = "data.frame")
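The same Map idea can also be written without reshape2. The sketch below is a base R variant, not the answer's exact code: it starts from character dates as shown in the question and builds the long table with rep and do.call.

```r
df1 <- data.frame(Name = c("Alice", "John", "Peter"),
                  Start_Date = c("1-1-2017", "4-3-2017", "12-3-2017"),
                  End_Date = c("3-1-2017", "5-3-2017", "12-3-2017"),
                  stringsAsFactors = FALSE)

# Parse the day-month-year strings into Date objects
df1[-1] <- lapply(df1[-1], as.Date, format = "%d-%m-%Y")

# One inclusive daily sequence per row
dates <- Map(seq, df1$Start_Date, df1$End_Date, MoreArgs = list(by = "1 day"))

# Repeat each name once per generated date and stack the sequences
out <- data.frame(Name = rep(df1$Name, lengths(dates)),
                  Date = do.call(c, dates),
                  stringsAsFactors = FALSE)
out
```

A row whose start and end dates are equal, such as Peter's, yields a sequence of length one and therefore a single output row.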
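This uses the expandRows function from the package splitstackshape. Each row is repeated once per day in its range (end date included), and the dates are then filled in per group. It assumes Start_Date and End_Date are already of class Date:
library(dplyr)
library(splitstackshape)
df = df %>%
  mutate(n_days = as.numeric(End_Date - Start_Date) + 1,  # inclusive day count
         id = row_number()) %>%                            # one id per original row
  expandRows("n_days") %>%
  group_by(id) %>%
  mutate(Date = seq(first(Start_Date),
                    first(End_Date),
                    by = 1)) %>%
  ungroup()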
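Using a foreach loop (this assumes the dates are still character strings in day-month-year form, since they are parsed with dmy):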
library(data.table)
library(foreach)
library(lubridate)
setDT(df)
names = df[, unique(Name)]
l = foreach(i = 1:length(names)) %do% {
  # make a date sequence per name
  d = df[Name == names[i], ]
  s = seq(from = dmy(d$Start_Date), to = dmy(d$End_Date), by = "days")
  # bind the results in a data.table
  dx = data.table(name = rep(names[i], length(s)))
  dx = cbind(dx, date = s)
}
rbindlist(l)

Apply a function over several columns

I am trying to use values from a look up table, to multiply corresponding values in a main table.
This is an example of some data
The look up
lu = structure(list(year = 0:12, val = c(1.6422, 1.6087, 1.5909, 1.4456,
1.4739, 1.4629, 1.467, 1.4619, 1.2588, 1.1233, 1.1664, 1.1527,
1.2337)), .Names = c("year", "val"), class = "data.frame", row.names = c(NA,
-13L))
Main data
dt = structure(list(year = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L,
1L), x = 1:10, y = 1:10), .Names = c("year", "x", "y"), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
I can produce the results I want by merging and then multiplying one column at a time
library(data.table)
dt = merge(dt, lu, by = "year")
dt[, xnew := x*val][, ynew := y*val]
However, I have many variables to apply this over. There have been many questions on this, but I cannot get it to work.
Using ideas from How to apply same function to every specified column in a data.table , and R Datatable, apply a function to a subset of columns , I tried
dt[, (c("xnew", "ynew")):=lapply(.SD, function(i) i* val), .SDcols=c("x", "y")]
Error in FUN(X[[i]], ...) : object 'val' not found
for (j in c("x", "y")) set(dt, j = j, value = val* dt[[j]])
Error in set(dt, j = j, value = val * dt[[j]]) : object 'val' not found
Just trying the multiplication without assigning (from Data table - apply the same function on several columns to create new data table columns) also didn't work.
dt[, lapply(.SD, function(i) i* val), .SDcols=c("x", "y")]
Error in FUN(X[[i]], ...) : object 'val' not found
Please could you point out my error. Thanks.
I'm using data.table version v1.9.6.
We can join the two tables first and then specify the columns to multiply with .SDcols:
dt[lu, on = .(year), nomatch =0
][, c("x_new", "y_new") := lapply(.SD, `*`, val), .SDcols = x:y][]
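For comparison, the same look-up-then-multiply logic can be sketched in base R. This is not the data.table answer above but an equivalent written with match and lapply on plain data frames. The name main is used here only for illustration.

```r
lu <- data.frame(year = 0:12,
                 val = c(1.6422, 1.6087, 1.5909, 1.4456, 1.4739, 1.4629, 1.467,
                         1.4619, 1.2588, 1.1233, 1.1664, 1.1527, 1.2337))
main <- data.frame(year = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L, 1L),
                   x = 1:10, y = 1:10)

cols <- c("x", "y")

# Look up the multiplier for each row of the main table
val <- lu$val[match(main$year, lu$year)]

# Multiply every selected column by the looked-up value and store as <name>new
main[paste0(cols, "new")] <- lapply(main[cols], `*`, val)
head(main, 3)
```

Only the cols vector has to change to cover more variables. Unlike merge, match also keeps the original row order.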

How to save the column names and their corresponding type in R into excel?

I have an R data set with more than 200 columns. I need to find the class of each column and export that to Excel as two columns: the column name and its corresponding class.
1. Using lapply/sapply with stack/melt
You could do this using lapply/sapply to get the class of each column and then using stack from base R or melt from reshape2 to get the 2 column data.frame.
res <- stack(lapply(df, class))
#or
library(reshape2)
res1<- melt(lapply(df, class))
Then use write.csv, or any of the specialized packages for writing Excel files, e.g. XLConnect or WriteXLS.
write.csv(res, file="file1.csv", row.names=FALSE, quote=FALSE)
.csv files can be opened in excel
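One detail worth checking with this approach is that class() can return more than one value for a column (for example a POSIXct date-time), in which case stack produces several rows for that column. The sketch below is a variant that collapses the classes into a single string per column. The example data and the "/" separator are assumptions made for illustration.

```r
df <- data.frame(a = 1:3,
                 b = c("x", "y", "z"),
                 c = c(0.5, 1, 1.5),
                 stringsAsFactors = FALSE)
df$when <- as.POSIXct("2017-01-01", tz = "UTC") + 0:2  # has two classes

# One row per column: its name and its class(es) collapsed into one string
res <- data.frame(column = names(df),
                  class = vapply(df, function(col) paste(class(col), collapse = "/"),
                                 character(1)),
                  row.names = NULL,
                  stringsAsFactors = FALSE)

# Write to a temporary CSV file that Excel can open
out_file <- tempfile(fileext = ".csv")
write.csv(res, file = out_file, row.names = FALSE)
res
```

Replace tempfile() with a real path to keep the file.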
2. From the output of str
Or you could use capture.output and a regular expression to extract the required information from the output of str, then convert it to a data.frame using read.table:
v1 <- capture.output(str(df))
v2 <- grep("\\$", v1, value=TRUE)
res2 <- read.table(text=gsub(" +\\$ +(.*)\\: +([A-Za-z]+) +.*", "\\1 \\2", v2),
sep="",header=FALSE,stringsAsFactors=FALSE)
head(res2,2)
# V1 V2
#1 t02.clase Factor
#2 Std_A_CLI_monto_sucursal_1 chr
data
df <-structure(list(t02.clase = structure(c(1L, 1L, 1L), .Label = "AK",
class = "factor"),Std_A_CLI_monto_sucursal_1 = c("0", "0", "0"),
Std_A_CLI_monto_sucursal_2 = c(0, 0.01303586, 0), Std_A_CLI_monto_sucursal_3 =
c(0.051311597, 0.003442244, 0.017347593), Std_A_CLI_monto_sucursal_4 = c(0L,
0L, 0L), Std_A_CLI_promociones = c(0.4736842, 0.5, 0), Std_A_CLI_dias_cliente =
c(0.57061341, 0.55492154, 0.05991441), Std_A_CLI_sucursales = c(0.05555556,
0.05555556, 0.05555556)), .Names = c("t02.clase", "Std_A_CLI_monto_sucursal_1",
"Std_A_CLI_monto_sucursal_2", "Std_A_CLI_monto_sucursal_3",
"Std_A_CLI_monto_sucursal_4", "Std_A_CLI_promociones", "Std_A_CLI_dias_cliente",
"Std_A_CLI_sucursales"), row.names = c("1", "2", "3"), class = "data.frame")

Creating new dataframe using weighted averages from dataframes within list

I have many dataframes stored in a list, and I want to create weighted averages from these and store the results in a new dataframe. For example, with the list:
dfs <- structure(list(df1 = structure(list(A = 4:5, B = c(8L, 4L), Weight = c(TRUE, TRUE), Site = c("X", "X")),
.Names = c("A", "B", "Weight", "Site"), row.names = c(NA, -2L), class = "data.frame"),
df2 = structure(list(A = c(6L, 8L), B = c(9L, 4L), Weight = c(FALSE, TRUE), Site = c("Y", "Y")),
.Names = c("A", "B", "Weight", "Site"), row.names = c(NA, -2L), class = "data.frame")),
.Names = c("df1", "df2"))
In this example, I want to use columns A, B, and Weight for the weighted averages. I also want to move over related data such as Site, and want to sum the number of TRUE and FALSE. My desired result would look something like:
result <- structure(list(Site = structure(1:2, .Label = c("X", "Y"), class = "factor"),
A.Weight = c(4.5, 8), B.Weight = c(6L, 4L), Sum.Weight = c(2L,
1L)), .Names = c("Site", "A.Weight", "B.Weight", "Sum.Weight"
), class = "data.frame", row.names = c(NA, -2L))
Site A.Weight B.Weight Sum.Weight
1 X 4.5 6 2
2 Y 8.0 4 1
The above is just a very simple example, but my real data have many dataframes in the list, and many more columns than just A and B for which I want to calculate weighted averages. I also have several columns similar to Site that are constant in each dataframe and that I want to move to the result.
I'm able to manually calculate weighted averages using something like
weighted.mean(dfs$df1$A, dfs$df1$Weight)
weighted.mean(dfs$df1$B, dfs$df1$Weight)
weighted.mean(dfs$df2$A, dfs$df2$Weight)
weighted.mean(dfs$df2$B, dfs$df2$Weight)
but I'm not sure how I can do this in a shorter, less "manual" way. Does anyone have any recommendations? I've recently learned how to lapply across dataframes in a list, but my attempts have not been so great so far.
The trick is to create a function that works for a single data.frame, then use lapply to iterate across your list. Since lapply returns a list, we'll then use do.call to rbind the resulting objects together:
foo <- function(data, meanCols = LETTERS[1:2], weightCol = "Weight", otherCols = "Site") {
  means <- t(sapply(data[, meanCols], weighted.mean, w = data[, weightCol]))
  sumWeight <- sum(data[, weightCol])
  # You said all the other data was constant, so we can just grab the first row
  others <- data[1, otherCols, drop = FALSE]
  out <- data.frame(others, means, sumWeight)
  return(out)
}
In action:
do.call(rbind, lapply(dfs, foo))
---
Site A B sumWeight
df1 X 4.5 6 2
df2 Y 8.0 4 1
Since you said this was a minimal example, here's one approach to expanding this to other columns. We'll use grepl() and use regular expressions to identify the right columns. Alternatively, you could write them all out in a vector. Something like this:
do.call(rbind, lapply(dfs, foo,
meanCols = grepl("A|B", names(dfs[[1]])),
otherCols = grepl("Site", names(dfs[[1]]))
))
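If the output should carry the column names from the question (A.Weight, B.Weight, Sum.Weight), the same per-data-frame function plus lapply/rbind pattern can be adapted as below. This is a sketch rather than the answer's exact code. The helper name summarise_one and its arguments are hypothetical, and the columns to average are passed by name.

```r
dfs <- list(
  df1 = data.frame(A = 4:5, B = c(8L, 4L), Weight = c(TRUE, TRUE),
                   Site = c("X", "X"), stringsAsFactors = FALSE),
  df2 = data.frame(A = c(6L, 8L), B = c(9L, 4L), Weight = c(FALSE, TRUE),
                   Site = c("Y", "Y"), stringsAsFactors = FALSE)
)

# Hypothetical helper: summarise a single data.frame into one row
summarise_one <- function(data, mean_cols, weight_col = "Weight", other_cols = "Site") {
  w <- data[[weight_col]]
  # Weighted mean of every requested column
  means <- vapply(data[mean_cols], weighted.mean, numeric(1), w = w)
  names(means) <- paste0(mean_cols, ".Weight")
  # Constant columns are taken from the first row
  data.frame(data[1, other_cols, drop = FALSE],
             as.list(means),
             Sum.Weight = sum(w),
             row.names = NULL)
}

result <- do.call(rbind, lapply(dfs, summarise_one, mean_cols = c("A", "B")))
rownames(result) <- NULL
result
```

To cover more columns, extend mean_cols and other_cols; the rest stays the same.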
Using dplyr and tidyr:
library(dplyr)
library('devtools')
install_github('hadley/tidyr')
library(tidyr)
unnest(dfs) %>%
group_by(Site) %>%
filter(Weight) %>%
mutate(Sum=n()) %>%
select(-Weight) %>%
summarise_each(funs(mean=mean(., na.rm=TRUE)))
gives the result
# Site A B Sum
#1 X 4.5 6 2
#2 Y 8.0 4 1
Or using data.table
library(data.table)
DT <- rbindlist(dfs)
DT[(Weight)][, c(lapply(.SD, mean, na.rm = TRUE),
Sum=.N), by = Site, .SDcols = c("A", "B")]
# Site A B Sum
#1: X 4.5 6 2
#2: Y 8.0 4 1
Update
In response to @jazzuro's comment: using dplyr 0.3, I get
unnest(dfs) %>%
group_by(Site) %>%
summarise_each(funs(weighted.mean=stats::weighted.mean(., Weight),
Sum.Weight=sum(Weight)), -starts_with("Weight")) %>%
select(Site:B_weighted.mean, Sum.Weight=A_Sum.Weight)
# Site A_weighted.mean B_weighted.mean Sum.Weight
#1 X 4.5 6 2
#2 Y 8.0 4 1

Resources