I have data in Excel in the format shown below. The user may add comments to the Score column using the 'Insert Comment' option in Excel. I would like to extract the comments added to the Score column and put them in the 'Comments' column. Is this possible? Can you please help?
Report Component Score Comments
R1 C1 1
R2 C2 2
R3 C3 3
R4 C4 4
R5 C5 5
Here is the code I have written so far. I'm not sure how to proceed further. Please help.
require(readxl)
df <- read_excel("Testfile01.xlsx")  # reads cell values only; comments are not returned
I have yet to see this kind of functionality in read_excel, but in the meantime you could perhaps write the comments into cell content using a small VBA function just prior to importing the file into R.
From ExtendOffice:
Function GetComments(pRng As Range) As String
'Updateby20140509
'Returns the comment text of the given cell, or an empty string if it has none
    If Not pRng.Comment Is Nothing Then
        GetComments = pRng.Comment.Text
    End If
End Function
You can then use the GetComments function in a worksheet cell, e.g. =GetComments(A1), and fill it down the Comments column before importing.
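If you'd rather stay entirely in R, the tidyxl package can read cell metadata, including comments. A minimal sketch, assuming Score is the third column of the sheet:
library(tidyxl)
cells <- xlsx_cells("Testfile01.xlsx")
# Keep the Score column's cells (column 3 here), skipping the header row;
# the comment field is NA for cells without a comment.
score_cells <- cells[cells$col == 3 & cells$row > 1, c("row", "comment")]
score_cells$comment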
I have already loaded the data in R from various sheets. Below you can see the sheet names, along with the data and code:
df<-data.frame(
Tables=c("Export_A1.xlsx","Export_A2.xlsx","Export_A10.xlsx"))
Now I want to extract the specific names; in other words, I want to remove the text "Export_" and ".xlsx", so that, for example, "Export_A1.xlsx" becomes "A1". Can anybody help me solve this problem?
You can use the following code:
df$Tables <- gsub(".*[_]([^.]+)[.].*", "\\1", df$Tables)  # keep only the text between the last "_" and the "."
df
Output:
Tables
1 A1
2 A2
3 A10
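If the file names always follow this pattern, the same result can be had without a capture group, using a base helper to strip the extension first:
# tools::file_path_sans_ext() drops ".xlsx"; sub() then drops the prefix.
df$Tables <- sub("^Export_", "", tools::file_path_sans_ext(df$Tables))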
I have a data frame in which the first column is email addresses and the rest of the columns are statements the person agreed with. I want to filter out the rows in which the person agreed with nothing; if they agreed with at least one thing they stay in (so below, only doug's row would be removed).
Email C1 C2 C3
q#gmail.com agree agree
bob#gmail.com agree
doug#gmail.com
I tried this in Excel without luck, and I haven't found R code that can do it.
One base R solution uses rowSums() to count the non-blank cells per row. The Email cell always counts as one, so keeping rows with more than one non-blank cell keeps exactly those with at least one agreement:
df <- df[rowSums(df != "") > 1, ]
> df
Email C1 C2 C3
1 q#gmail.com agree agree
2 bob#gmail.com agree
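A dplyr alternative, assuming the blanks are empty strings as above (if they load as NA, test with !is.na(.x) instead):
library(dplyr)
# Keep rows where at least one column other than Email is non-blank.
df %>% filter(if_any(-Email, ~ .x != ""))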
I'm currently doing this in Perl, and I'd like to find a more efficient/faster way to do it. Any advice is appreciated!
What I'm trying to do is extract certain data from a csv/xlsx file and write it into Excel in a form that Bloomberg can read.
Here is an example of the csv file:
Account.Name Source.Number Source.Name As.Of.Date CUSIP.ID Value
AR PSF30011202 DK 3/31/2016 111165194 100.00
AR PSF30011602 MOF 3/31/2016 11VVA0WE4 150.00
AR PSF30014002 OZM 3/31/2016 11VVADWF3 125.00
FI PSF30018502 FS 3/31/2016 11VVA2625 170.00
FI PSF30018102 IP 3/31/2016 11VVAFPH2 115.00
....
What I want to have in the Excel file is that if Account.Name = AR, then:
Cell A1 =Source.Name. E.g. DK.
Cell A2 = the weight of Value. E.g. the weight of DK is 0.151515 (100/660, where 660 is the total of the Value column).
Cell A3 = =BDH("CUSIP.ID CUSIP","PX_LAST","01/01/2000","As.Of.Date","PER=CM"). E.g. =BDH("111165194 CUSIP","PX_LAST","01/01/2000","03/31/2016","PER=CM")
Cell D1 =MOF
Cell D2 =0.227273
Cell D3 = =BDH("11VVA0WE4 CUSIP","PX_LAST","01/01/2000","03/31/2016","PER=CM")
There are two columns in between because, if DK's CUSIP is valid, A3 and below will hold the dates, B3 and below the monthly prices from Bloomberg, and C4 and below the log returns of the monthly prices (=LN(B4/B3)).
Below is what it should look like: [screenshot of the desired Excel layout omitted]
I don't know anything about Perl and I'm not sure what you are doing, but it looks like you are getting stock prices. Is that right? Maybe you can download what you need from Yahoo Finance and get rid of Bloomberg altogether. Take a look at the link below and see if it helps you get what you need.
http://www.financialwisdomforum.org/gummy-stuff/Yahoo-data.htm
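If you do decide to move away from Perl, the layout above is also straightforward to build in R. Here is a rough sketch with the openxlsx package; the input file name and sheet name are assumptions, the As.Of.Date may need reformatting to mm/dd/yyyy, and Bloomberg's Excel add-in still has to be installed for the BDH formulas to evaluate:
library(openxlsx)

dat <- read.csv("positions.csv")        # assumed file name
ar  <- subset(dat, Account.Name == "AR")
ar$weight <- ar$Value / sum(dat$Value)  # e.g. 100/660 = 0.151515 for DK

wb <- createWorkbook()
addWorksheet(wb, "AR")

# Each source gets its own block starting in columns A, D, G, ..., leaving
# two spare columns for the prices and log returns that BDH spills out.
for (i in seq_len(nrow(ar))) {
  col <- (i - 1) * 3 + 1
  writeData(wb, "AR", ar$Source.Name[i], startCol = col, startRow = 1)
  writeData(wb, "AR", ar$weight[i], startCol = col, startRow = 2)
  f <- sprintf('BDH("%s CUSIP","PX_LAST","01/01/2000","%s","PER=CM")',
               ar$CUSIP.ID[i], ar$As.Of.Date[i])
  writeFormula(wb, "AR", f, startCol = col, startRow = 3)
}

saveWorkbook(wb, "bloomberg_input.xlsx", overwrite = TRUE)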
I apologize in advance for the lack of reproducibility here. I am doing an analysis on a very large (for me) dataset from the CMS Open Payments database.
I downloaded four files from that website, read them into R using readr, manipulated them a bit to make them smaller (column removal), and then stuck them all together using rbind. I would like to write my pared-down file out to an external hard drive so I don't have to read in all the data and redo the paring each time I want to work on it. (Obviously it's all scripted, but it takes about 45 minutes, so I'd like to avoid it if possible.)
So I wrote out the data and read it back in, but now I am getting different results. Below is about as close as I can get to a good example. The data frame is named sa_all. There is a column in the table for the source; it can only take on two values, gen or res, and it is added as part of the analysis rather than coming in with the data.
table(sa_all$src)
gen res
14837291 822559
So I save the sa_all data frame to a CSV file.
write.csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv',
row.names = FALSE)
Then I open it:
sa_all2 <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
table(sa_all2$src)
g gen res
1 14837289 822559
I did receive the following parsing warnings.
Warning: 4 parsing failures.
row col expected actual
5454739 pmt_nature embedded null
7849361 src delimiter or quote 2
7849361 src embedded null
7849361 NA 28 columns 54 columns
Since I manually add the src column and it can only take on two values, I don't see how this could cause any parsing errors.
Has anyone had any similar problems using readr? Thank you.
Just to follow up on the comment:
write_csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv')
sa_all2a <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
Warning: 83 parsing failures.
row col expected actual
1535657 drug2 embedded null
1535657 NA 28 columns 25 columns
1535748 drug1 embedded null
1535748 year an integer No
1535748 NA 28 columns 27 columns
Even more parsing errors, and it looks like some columns are getting shuffled entirely:
table(sa_all2a$src)
100000000278 Allergan Inc. gen GlaxoSmithKline, LLC.
1 1 14837267 1
No res
1 822559
There are columns for manufacturer names and it looks like those are leaking into the src column when I use the write_csv function.
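As a side note: if the CSV is only a cache between sessions, a binary format side-steps CSV parsing (and the embedded nulls) entirely. A minimal sketch with base R's RDS format:
# saveRDS() stores the data frame exactly as it is in memory, so
# readRDS() brings it back with no parsing step to go wrong.
saveRDS(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.rds')
sa_all2 <- readRDS('D:\\Open_Payments\\data\\written_files\\sa_all.rds')
table(sa_all2$src)  # should match the original gen/res counts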
I want to read a file and calculate the mean of one of its columns.
> list
[1] "book1.csv" "book2.csv"
For book1:
observation1
23
24
65
76
34
In the books I have a variable: an observation1 column for book 1 and an observation2 column for book 2. I want to write a function that calculates the mean of that column. I am new to R and not able to subset the variable from the books. Can anyone please help me write the function?
Try this. file is the file to be read in (e.g. book1.csv) and variable is the column to take the mean over (e.g. observation1):
read.mean <- function(file, variable) {
  df <- read.csv(file)             # read the csv into a data frame
  mean.df <- mean(df[, variable])  # mean of the chosen column
  return(mean.df)
}
Make sure to pass your arguments in quotes, i.e. read.mean("book1.csv", "observation1"). There is a way to do it without the quotes (see "Passing a variable name to a function in R") but it is more complicated.
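To run it over both files at once, pairing each file with its column, something like mapply() works (the names follow the question's description):
files <- c("book1.csv", "book2.csv")
vars  <- c("observation1", "observation2")
mapply(read.mean, files, vars)  # returns one mean per file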