I did some calculations in R and I want to write them to Excel like this:
DATA1 DATA2
54.364 2.05
56.532
54.21
41.485
65.8745
54.0546
75.156
but instead it comes out like this:
DATA1 DATA2
54.364 2.05
56.532 2.05
54.21 2.05
41.485 2.05
65.8745 2.05
54.0546 2.05
75.156 2.05
The function call I use to write it to Excel is:
write.xlsx(c(data.frame(DATA1),data.frame(DATA2)))
DATA1 has the values 54.364, 56.532, 54.21, 41.485, 65.8745, 54.0546, 75.156, while DATA2 has only the single value 2.05.
Excel has a rather bizarre "copy down" feature where it copies a function returning a scalar into every cell in the calling range. It appears that this is what is happening to you here.
One way to work around this is to use Application.Caller at the top of the function that's called directly. This returns a Range object denoting the calling range. You can then pad your function's return values with #N/A. You do this by inserting variant types into your array set to VT_ERROR, with the error values set to xlErrNa; CVErr(xlErrNa) does that in one step. Padding with #N/A matches what Excel does with oversized calling ranges for functions returning arrays.
The following code can also be used (using #akrun's data from https://stackoverflow.com/questions/25547210/how-to-produce-this-order-in-r):
DATA1 <- c(54.364, 56.532, 54.21, 41.845, 65.8745, 54.0546, 75.156)
DATA2 <- 2.05
DATA3 <- c(2.2, 2.4, 2.32)
outdf <- data.frame(data1 = numeric(), data2 = numeric(), data3 = numeric())
for (i in seq_along(DATA1)) outdf[i, ] <- c(DATA1[i], 0, 0)
for (i in seq_along(DATA2)) outdf$data2[i] <- DATA2[i]
for (i in seq_along(DATA3)) outdf$data3[i] <- DATA3[i]
outdf
data1 data2 data3
1 54.3640 2.05 2.20
2 56.5320 0.00 2.40
3 54.2100 0.00 2.32
4 41.8450 0.00 0.00
5 65.8745 0.00 0.00
6 54.0546 0.00 0.00
7 75.1560 0.00 0.00
Then you can use outdf with write.xlsx.
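If you want the blank cells shown in the desired output rather than zeros, a shorter sketch is to pad the shorter vectors with NA; with openxlsx's write.xlsx the NA cells come out blank by default (with the xlsx package you may need showNA = FALSE). The file name here is just a placeholder:
n <- max(length(DATA1), length(DATA2), length(DATA3))
pad <- function(x) c(x, rep(NA, n - length(x)))  # pad a vector with NA up to the common length
outdf <- data.frame(DATA1 = pad(DATA1), DATA2 = pad(DATA2), DATA3 = pad(DATA3))
write.xlsx(outdf, "out.xlsx")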
I need to merge two lists with each other, but I am not getting what I want, and I think it is because the "Date" column is in two different formats. I have a list called li, and in this list there are 12 lists, each with the following format:
> tail(li$fxe)
Date fxe
3351 2020-06-22 0.0058722768
3352 2020-06-23 0.0044256216
3353 2020-06-24 -0.0044998220
3354 2020-06-25 -0.0027309539
3355 2020-06-26 0.0002832672
3356 2020-06-29 0.0007552346
I am trying to merge each of these lists with a different list called factors, which looks like:
> tail(factors)
Date Mkt-RF SMB HML RF
3351 20200622 0.0071 0.83 -1.42 0.000
3352 20200623 0.0042 0.15 -0.56 0.000
3353 20200624 -0.0261 -0.52 -1.28 0.000
3354 20200625 0.0112 0.25 0.50 0.000
3355 20200626 -0.0243 0.16 -1.37 0.000
3356 20200629 0.0151 1.25 1.80 0.000
The reason I need this structure is that I am trying to send them to a function I wrote to do linear regressions. The first line of my function merges these lists, but when I do the merge I end up with a null structure, even though my lists clearly have the same number of rows. In my function, df is li. The embedded lists of li are confusing me. Can someone help please?
Function I want to use:
Bf <- function(df, fac){
  # This function calculates the betas of the Fama-French factors using linear regression
  # Input:  df  = a data frame containing returns of the security
  #         fac = a data frame containing the excess market return and the Fama-French 3 factors
  # Output: a vector of betas for the Fama-French model
  temp <- merge(df, fac, by = "Date")
  temp <- temp[, !names(temp) %in% "Date"]
  temp[, 1] <- temp[, 1] - temp$RF
  return(lm(temp[, 1] ~ temp[, 2] + temp[, 3] + temp[, 4])$coeff)
}
a: you are dealing with data frames, not lists
b: if you want to merge them, you need to modify the factors$Date column to match the format of li$fxe$Date
Try:
factors$Date <- as.Date(strptime(factors$Date, format = "%Y%m%d"))
This should convert the factors Date column to "Date" format (note %m for month; %M would be minutes).
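After the conversion the merge should line up; here is a quick sketch of applying the question's Bf function over the whole list (names taken from the question):
factors$Date <- as.Date(strptime(factors$Date, format = "%Y%m%d"))
betas <- lapply(li, Bf, fac = factors)  # one beta vector per data frame in li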
I have a data table with 411 rows and 6 variables and looks like this:
> head(datapp)
TP220170823pos-42 TP220170823pos-43 TP220170823pos-44 TP220170823pos-48 TP220170823pos-49 TP220170823pos-50
1 22744.04 43727.39 25901.80 36771.81 23925.68 30524.43
2 0.00 0.00 0.00 37894.31 37103.62 33042.50
3 88752.27 139099.31 79979.00 130399.10 90345.01 90159.49
4 40225.14 72674.44 40121.52 61857.07 48736.48 46398.95
5 0.00 0.00 0.00 19587.50 10146.16 17582.49
6 0.00 0.00 0.00 54149.82 52733.90 54033.67
I want to replace each zero value with a new random number in each variable. I first changed the names of the variables to make them identical and then used the following code:
names(datapp)[1:6] <- "intensity"
library(data.table)
setDT(as.data.frame(datapp))[intensity == 0, intensity := (runif(.N, 0, 1))]
However, when I run this code it only changes the 0's in the first column and not the other 5. Is there a way to also change the 0's with a random number in the other columns?
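One possible approach (a sketch, untested against the original data): keep the original column names and let data.table update every column via .SD, drawing a fresh runif value for each zero:
library(data.table)
setDT(datapp)  # note: setDT(as.data.frame(datapp)) converts a copy, so datapp itself is never changed
cols <- names(datapp)
# replace the zeros in every column with fresh uniform draws
datapp[, (cols) := lapply(.SD, function(x) replace(x, x == 0, runif(sum(x == 0), 0, 1))),
       .SDcols = cols]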
I have some results from a model in Python which I have saved as a .txt file to render in R Markdown.
The .txt file looks like this:
precision recall f1-score support
0 0.71 0.83 0.77 1078
1 0.76 0.61 0.67 931
avg / total 0.73 0.73 0.72 2009
I read the file into R as:
x <- read.table(file = 'report.txt', fill = T, sep = '\n')
When I do this, R reads the results as one column (V1) instead of 5 columns, as below:
V1
1 precision recall f1-score support
2 0 0.71 0.83 0.77 1078
3 1 0.76 0.61 0.67 931
4 avg / total 0.73 0.73 0.72 2009
I tried using strsplit() to split the columns, but it doesn't work:
strsplit(as.character(x$V1), split = "|", fixed = T)
Maybe strsplit() is not the right approach? How do I get around this so that I have a [4x5] data frame?
Thanks a lot.
Not very elegant, but this works. First we read the raw text, then we use a regex to strip the spaces inside "avg / total", drop blank lines, and convert each line to CSV format. Then we read the CSV.
library(stringr)
library(magrittr)
library(purrr)
text <- str_replace_all(readLines("~/Desktop/test.txt"), "\\s(?=/)|(?<=/)\\s", "") %>%  # "avg / total" -> "avg/total"
  .[which(nchar(.) > 0)] %>%              # drop empty lines
  str_split(pattern = "\\s+") %>%         # split each line on runs of whitespace
  map(., ~paste(.x, collapse = ",")) %>%  # rejoin the fields with commas
  unlist
read.csv(textConnection(text))
#> precision recall f1.score support
#> 0 0.71 0.83 0.77 1078
#> 1 0.76 0.61 0.67 931
#> avg/total 0.73 0.73 0.72 2009
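An alternative sketch along the same lines, in base R only and assuming the layout shown in the question: fix the "avg / total" label first so it survives whitespace splitting, then drop the header (it has one field fewer than the data rows) and supply column names by hand:
lines <- readLines("report.txt")
lines <- gsub("avg / total", "avg/total", lines, fixed = TRUE)  # keep the label as one field
lines <- lines[nchar(lines) > 0]                                # drop blank lines
read.table(text = lines[-1],                                    # drop the header row
           col.names = c("class", "precision", "recall", "f1_score", "support"))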
Since it is much simpler to have Python output a CSV, I am posting an alternative here in case it is useful; even in Python it needs some work.
import pandas as pd

def report_to_csv(report, title):
    report_data = []
    lines = report.split('\n')
    # loop through the per-class lines (skip the header and the summary block)
    for line in lines[2:-3]:
        row = {}
        row_data = line.split()  # split on runs of whitespace
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    df = pd.DataFrame.from_dict(report_data)
    # read the final summary line; "avg / total" splits into three tokens
    line_data = lines[-2].split()
    summary_dat = []
    row2 = {}
    row2['class'] = ' '.join(line_data[:-4])
    row2['precision'] = float(line_data[-4])
    row2['recall'] = float(line_data[-3])
    row2['f1_score'] = float(line_data[-2])
    row2['support'] = float(line_data[-1])
    summary_dat.append(row2)
    summary_df = pd.DataFrame.from_dict(summary_dat)
    # concatenate both data frames
    report_final = pd.concat([df, summary_df], axis=0)
    report_final.to_csv(title + 'cm_report.csv', index=False)
Function inspired by this solution.
I would like to read in data and compute a summation over one of the columns, with 18000 data points. The catch is that the summation takes the variable Tc and subtracts its value from five iterations earlier. I don't know how to make the summation start 5 data points in, so that it does not give me an error about having nothing to subtract for the first few data points.
Here is what a small portion of the data looks like:
head(data)
Time Record Ux Uy Uz Ts Tc Tn To Tp Tq
1 2016-09-07 09:00:00.1 38651948 0.46 1.21 -0.26 19.53 19.31726 20.43197 19.39093 19.54993 NAN
2 2016-09-07 09:00:00.2 38651949 0.53 1.24 -0.24 19.48 19.30391 20.43744 19.37996 19.51704 NAN
3 2016-09-07 09:00:00.3 38651950 0.53 1.24 -0.24 19.48 19.31249 20.43269 19.3752 19.44648 NAN
4 2016-09-07 09:00:00.4 38651951 0.53 1.24 -0.24 19.48 19.30391 20.40221 19.33919 19.41596 NAN
5 2016-09-07 09:00:00.5 38651952 0.53 1.24 -0.24 19.48 19.24906 20.36079 19.31178 19.38068 NAN
6 2016-09-07 09:00:00.6 38651953 0.51 1.28 -0.28 19.44 19.20519 20.32008 19.30629 19.42693 NAN
Here is the code:
data <- read.csv('TOA5_10815.raw_data5411_2016_09_07_0900.dat',
                 header = FALSE,
                 dec = ",",
                 col.names = c("Time", "Record", "Ux", "Uy", "Uz", "Ts", "Tc", "Tn", "To", "Tp", "Tq"),
                 skip = 4)
Tc = data$Tc
sum = 0
m = 18000
j = 5
for (k in 1:(m-j)){
  inner = (Tc[[k]] - Tc[[k-j]])
  sum = sum + inner
}
final = 1/(m-j)*sum
Welcome to Stack Overflow!
I would suggest you make a more reproducible example for your next questions here (see here).
To answer your question, you can either do this in a for loop, as you have been doing, or in a much more efficient way using one of the apply functions (here: lapply). You can read more about these functions here.
Creating data set:
set.seed(1)
Tc<-rnorm(18000)
The lapply call. Note that we start at 6, since for x = 5 the lagged index x - 5 would be 0, and Tc[0] is not a valid element. The result is called diffs below rather than sum, to avoid shadowing base R's sum() function.
diffs <- unlist(lapply(6:18000, function(x) Tc[x] - Tc[x - 5]))
Done!
Verifying by typing in the console:
> head(diffs)
[1] -0.1940146 0.3037857 1.5739533 -1.0194995 -0.6348962 2.3322496
> Tc[6]-Tc[1]
[1] -0.1940146
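If you also want the final value from the question (the mean of these lagged differences), a one-line follow-up, reusing the question's m and j:
m <- 18000; j <- 5
final <- sum(diffs) / (m - j)  # same as the question's 1/(m-j)*sum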
I have a .csv file with data like this:
RI Na Mg Al Si K Ca Ba Fe Type
1 1.51793 12.79 3.50 1.12 73.03 0.64 8.77 0.00 0.00 BWF
2 1.51643 12.16 3.52 1.35 72.89 0.57 8.53 0.00 0.00 VWF
3 1.51793 13.21 3.48 1.41 72.64 0.59 8.43 0.00 0.00 BWF
4 1.51299 14.40 1.74 1.54 74.55 0.00 7.59 0.00 0.00 TBL
5 1.53393 12.30 0.00 1.00 70.16 0.12 16.19 0.00 0.24 BWNF
6 1.51655 12.75 2.85 1.44 73.27 0.57 8.79 0.11 0.22 BWNF
I want to create histograms for the distribution of each of the columns.
I've tried this:
data<-read.csv("glass.csv")
names<-(attributes(data)$names)
for (name in names) {
  dev.new()
  hist(data$name)
}
But I keep getting this error: Error in hist.default(data$name) : 'x' must be numeric
I'm assuming that this error is because attributes(data)$names returns a set of strings, "RI" "Na" "Mg" "Al" "Si" "K" "Ca" "Ba" "Fe" "Type"
But I'm unable to convert them to the necessary format.
Any help is appreciated!
You were close. Note that even with the subsetting fixed, the loop would eventually hit Type, which is not numeric, so we keep only the numeric columns.
data <- read.csv("glass.csv")
# names <- (attributes(data)$names)
names <- names(data)
classes <- sapply(data, class)
for (name in names[classes == 'numeric']) {
  dev.new()
  hist(data[, name])  # subset with [] not $
}
You could also just loop through the columns directly:
for (column in data[classes == 'numeric']) {
  dev.new()
  hist(column)
}
But ggplot2 is designed for multiple plots. Try it like this:
library(ggplot2)
library(reshape2)
ggplot(melt(data),aes(x=value)) + geom_histogram() + facet_wrap(~variable)
Rather than drawing lots of histograms, a better solution is to draw one plot with histograms in panels.
For this, you'll need the reshape2 and ggplot2 packages.
library(reshape2)
library(ggplot2)
First, you'll need to convert your data from wide to long form.
long_data <- melt(data, id.vars = "Type", variable.name = "Element")
Then create a ggplot of the value column (you can change its name by passing value.name = "whatever" in the call to melt above) with histograms in each panel, one per element.
(histograms <- ggplot(long_data, aes(value)) +
geom_histogram() +
facet_wrap(~ Element)
)
hist(data$name) looks for a column literally named "name", which isn't there (so it returns NULL). Use hist(data[, name]) instead.
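A minimal sketch of the difference, using one of the question's columns ("Na") as an example:
name <- "Na"
# hist(data$name)   # fails: $ does not evaluate the variable; it looks for a column literally called "name"
hist(data[, name])  # works: [ , ] evaluates `name` and selects the Na column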