Changing data type in data frame

I have some data tables in Excel spreadsheets which I am using in R. Some of the tables store numbers as text, i.e. numeric values are stored as characters.
To clarify, the problem is not the formatting but the numbers themselves. Excel (and R) sees such numbers as characters, like letters, rather than as numbers.
Because formatting is not the issue, the addStyle function in openxlsx did not work for me.
After some googling, I decided to write a nested for loop that checks each value individually and overwrites it if it is a number (code is below). This seems logically sound, but the values do not get overwritten, i.e. the values that were stored as text are still there.
library(readxl)
library(openxlsx)
wb <- loadWorkbook(choose.files())
data0 <- as.data.frame(read_excel(choose.files(), sheet = 1, range = "B1:E1131"))
data <- data0
for (i in 1:ncol(data)) {
  for (j in 1:nrow(data)) {
    if (is.numeric(as.numeric(data[j, i])) && !is.na(as.numeric(data[j, i]))) {
      data[j, i] <- as.numeric(data[j, i])
    }
  }
}
Desired outcome:
I would like to convert the data in column "Expenses" (in the picture below) into the data shown in the column to its right, using R.

Following up on my comment:
You can use the col_types argument of readxl::read_excel() to force columns to be read as text, numeric, date, and so on.
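A minimal sketch of both routes, with some assumptions: the function names read_as_numeric and convert_numeric_columns and the small example data frame are made up for illustration, and the sheet and range are simply copied from the question. The first function lets readxl coerce the columns while reading (a single col_types value is recycled to every column). The second converts an already loaded data frame one whole column at a time. Assigning a number into a single cell of a character column turns it back into text, which is probably why the cell-by-cell loop seemed to do nothing.

```r
# Route 1: let readxl coerce while reading.
# `path` is a hypothetical argument; sheet and range are taken from the question.
read_as_numeric <- function(path) {
  as.data.frame(readxl::read_excel(path, sheet = 1, range = "B1:E1131",
                                   col_types = "numeric"))
}

# Route 2: convert whole columns of a data frame that is already loaded.
convert_numeric_columns <- function(df) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      converted <- suppressWarnings(as.numeric(df[[col]]))
      # Replace the column only if every non-missing entry parsed as a number.
      if (all(is.na(converted) == is.na(df[[col]]))) {
        df[[col]] <- converted
      }
    }
  }
  df
}

# Made-up example data: "Expenses" holds numbers stored as text.
example <- data.frame(Item = c("rent", "food"),
                      Expenses = c("1200", "350.5"),
                      stringsAsFactors = FALSE)
converted <- convert_numeric_columns(example)
str(converted)
```

Route 2 leaves columns such as Item untouched because at least one entry fails to parse as a number.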

Related

Exporting multiple R data frames to a single Excel sheet

I would like to export multiple data frames from R to a single Excel sheet. By using the following code:
write.xlsx(DF1, file="C:\\Users\\Desktop\\filename.xlsx", sheetName="sheet1",
           col.names=TRUE, row.names=TRUE, append=FALSE)
write.xlsx(DF2, file="C:\\Users\\Desktop\\filename.xlsx", sheetName="sheet2",
           col.names=TRUE, row.names=TRUE, append=TRUE)
I can export two data frames to a single Excel workbook, but in two separate sheets. I would like to export them to a single sheet and, if possible, to specify the cells in which these data frames will be placed.
Any suggestions more than welcome.

This is not a ready-to-use answer, but it should get you to your target. It would be a mess to write it in a comment.
1. Create the combined df with the tools of R
2. Write the df to Excel
A few notes on point 1:
- Vertically offset the second df from the first with Reduce(rbind,c(list(mtcars),rep(list(NA),3))), e.g. for a 3-row offset.
- rbind the colnames to your df: rbind(names(mtcars),mtcars)
- Use numbers as colnames so that you will not have a problem rbinding data frames with different variables: names(mtcars) <- seq_along(mtcars)
On point 2:
- Since your colnames are now numbers, make sure you set col.names to FALSE when writing.
Hope this helps and you can get your desired output.
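The notes above can be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe for your data: as_block and stack_blocks are hypothetical helper names, everything is converted to character so that the header rows can sit in the same columns as the values, and row names are dropped. The final write call is only shown as a comment and assumes the xlsx package used in the question.

```r
# Turn a data frame into a character matrix whose first row holds the column names.
as_block <- function(df) {
  body <- do.call(cbind, lapply(df, as.character))
  unname(rbind(names(df), body))
}

# Stack several data frames vertically, separated by `gap` empty rows.
stack_blocks <- function(dfs, gap = 3) {
  blocks <- lapply(dfs, as_block)
  width <- max(vapply(blocks, ncol, integer(1)))
  # Pad narrower blocks with empty columns so all blocks have the same width.
  pad <- function(m) cbind(m, matrix(NA_character_, nrow(m), width - ncol(m)))
  spacer <- matrix(NA_character_, gap, width)
  out <- NULL
  for (i in seq_along(blocks)) {
    if (i > 1) out <- rbind(out, spacer)
    out <- rbind(out, pad(blocks[[i]]))
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

combined <- stack_blocks(list(head(mtcars, 2), head(iris, 3)), gap = 3)
print(combined)

# Writing step (not run here; the file name is a placeholder):
# xlsx::write.xlsx(combined, file = "filename.xlsx", sheetName = "sheet1",
#                  col.names = FALSE, row.names = FALSE, showNA = FALSE)
```

Because the values are written as text, Excel will see the numbers as characters. If that matters, writing each data frame separately at a chosen start cell is the cleaner route.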

Following most of your suggestions, I realized that cbind.data.frame gives me output that is not optimal, but the time I need to restructure the data in Excel is insignificant. So I will proceed with this for the time being.
Thanks

I can't comment yet, so I'll provide my input here:
Using write.xlsx in R, how to write in a Specific Row or column in excel file
That question suggests organizing your data in a single data frame and then writing that data frame to the Excel sheet. You should have a look at it.
As slackline suggested, this is quite easy with his methods if your columns or rows are the same.
Edit: To add space in between, just insert empty columns before writing.

R studio numeric integer display format options

I don't want a display format like this: 2.150209e+06
The format I want is 2150209,
because when I export data, a format like 2.150209e+06 causes me a lot of trouble.
I did some searching and found a function that could help me:
formatC(numeric_summary$mean, digits=1,format="f").
Can I set an option to change this permanently? I don't want to apply this function to every variable in my data, because I have this problem very often.
One more question: can I change the class of all integer variables to numeric automatically? With integer columns, summing a whole column usually causes trouble and R says "integer overflow - use sum(as.numeric(.))".
I don't need the integer class; all I need is numeric. Can I set an option to change the integer class to numeric?

I don't know how you are exporting your data, but when I use write.csv with a data frame containing numeric data, I don't get scientific notation, I get the full number written out, including all decimal precision. Actually, I also get the full number written out even with factor data. Have a look here:
df <- data.frame(c1=c(2150209.123, 10001111),
                 c2=c('2150209.123', '10001111'))
write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt")
Output file:
"","c1","c2"
"1",2150209.123,"2150209.123"
"2",10001111,"10001111"
Update:
It is possible that you are just dealing with a data rendering issue. What you see in the R console or in your spreadsheet does not necessarily reflect the precision of the underlying data. For instance, in Excel you can highlight a numeric cell, press CTRL + 1 and then change the format. You should then be able to see the full precision of the underlying data. Similarly, the number you see printed in the R console might use scientific notation only for ease of reading (scientific notation was invented partly for this very reason).

Thank you all.
For the example above, I tried this:
df <- data.frame(c1=c(21503413542209.123, 10001111),
c2=c('2150209.123', '100011413413111'))
c1 in df is shown in scientific notation, c2 is not.
Then I ran write.csv(df, file="C:\\Users\\tbiegeleisen\\temp.txt").
It does output all digits.
Can I disable scientific notation in R? It still causes me trouble, even though all digits were exported to the txt file.
Sometimes I want to visually compare two big numbers. For example, if I run
df <- data.frame(c1=c(21503413542209.123, 21503413542210.123),
c2=c('2150209.123', '100011413413111'))
df will be
          c1              c2
2.150341e+13     2150209.123
2.150341e+13 100011413413111
The two values for c1 are actually different, but I cannot tell them apart in R unless I export them to txt. The numbers here are fake, but I encounter the same problem every day.
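A short sketch of what might help here, under a few assumptions. options(scipen = ...) is the base R setting that penalises scientific notation when printing; putting the call in your .Rprofile should make it apply to every session. As far as I know there is no option that turns integers into numerics globally, so the sketch uses a small helper instead. integers_to_double and the counts data frame are made up for illustration.

```r
x <- c(21503413542209.123, 21503413542210.123)

# A large scipen penalty makes print() prefer fixed notation for this session.
old_options <- options(scipen = 999)
print(x)

# formatC() gives explicit control over a single vector.
plain <- formatC(x, format = "f", digits = 0)
print(plain)

options(old_options)  # restore the previous setting

# Convert every integer column to double so that sum() cannot overflow.
integers_to_double <- function(df) {
  df[] <- lapply(df, function(col) if (is.integer(col)) as.numeric(col) else col)
  df
}

# Made-up example: summing these two values as integers would overflow.
counts <- data.frame(id = c("a", "b"),
                     n = c(.Machine$integer.max, 1L),
                     stringsAsFactors = FALSE)
counts <- integers_to_double(counts)
total <- sum(counts$n)
print(total)
```

With scientific notation penalised, the two c1 values print in full and can be compared by eye.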

R: referencing a character string

Is it possible to feed R a character string and for it to know I'm looking for the data frame with that name?
Example:
TestData <- matrix(1:100,nrow=10, ncol=10)
Then, if I want to reference it later, can I do something similar to this to have R pull the dataset?
paste("TestData$",x[1,],sep="")
When entered this way, it comes up as a character string and obviously returns no data. For context, I'm trying to do it this way because I'm creating a loop that goes through several data sets (and columns within those datasets), but does similar operations, so I would like to be able to dynamically change the referenced dataset.
Any help is appreciated. Thanks!
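One possible approach, sketched under assumptions: base R's get() looks up an object by its name, and [[ ]] accepts a column name as a string, so there is no need to paste a "TestData$..." expression together. The matrix is turned into a data frame here so that it has named columns (V1 to V10). The variable names dataset_name and column_name and the second dataset Other are made up for illustration.

```r
TestData <- as.data.frame(matrix(1:100, nrow = 10, ncol = 10))

dataset_name <- "TestData"   # name of the object, held as a string
column_name <- "V3"          # name of the column, held as a string

# get() fetches the object by name; [[ ]] picks the column by name.
values <- get(dataset_name)[[column_name]]
print(values)

# For a loop over several datasets, a named list is usually easier to handle than get().
datasets <- list(TestData = TestData, Other = TestData * 2)
totals <- vapply(names(datasets), function(nm) {
  as.numeric(sum(datasets[[nm]][[column_name]]))
}, numeric(1))
print(totals)
```

The named list keeps all datasets in one place, so the loop can walk over names(datasets) without touching the global environment.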

Strangeness with filtering in R and showing summary of filtered data

I have a data frame loaded using the CSV Library in R, like
mySheet <- read.csv("Table.csv", sep=";")
I now can print a summary on that mySheet object
summary(mySheet)
and it will show me a summary for each column. For example, one column named Diagnose has the unique values RCM, UCM and HCM, and the summary shows the number of occurrences of each of these values.
I now filter by a diagnose, like
subSheet <- mySheet[mySheet$Diagnose=='UCM',]
which seems to be working, when I just type subSheet in the console it will print only the rows where the value has been matched with 'UCM'
However, if I do a summary on that subSheet, like
summary(subSheet)
it still 'knows' about the other two possibilities RCM and HCM and prints them with a count of 0. However, I expected that the newly created object would NOT know about the possible values of the original mySheet I initially loaded.
Is there any way to get rid of those other possible values after filtering? I also tried subset, but that just seems to be a kind of shortcut to '[' for interactive use. I also tried DROP=TRUE as an option, but that didn't change anything.
This is really puzzling :D Any help is highly appreciated!

What you are dealing with here are factors, created when the csv file was read. You can get subSheet to forget the unused factor levels with
subSheet$Diagnose <- droplevels(subSheet$Diagnose)
or
subSheet$Diagnose <- subSheet$Diagnose[ , drop=TRUE]
just before you do summary(subSheet).
Personally I dislike factors, as they cause me too many problems, and I only convert strings to factors when I really need to. So I would have started with something like
mySheet <- read.csv("Table.csv", sep=";", stringsAsFactors=FALSE)
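To illustrate the effect, here is a small self-contained sketch. The data are made up and stand in for Table.csv. It uses the data frame method of droplevels(), which drops unused levels from every factor column at once.

```r
# Made-up stand-in for the data read from Table.csv.
mySheet <- data.frame(Diagnose = factor(c("RCM", "UCM", "HCM", "UCM")),
                      Value = 1:4)

subSheet <- mySheet[mySheet$Diagnose == "UCM", ]
levels_before <- levels(subSheet$Diagnose)   # still lists all three diagnoses

# Drop unused levels from all factor columns of the filtered data frame.
subSheet <- droplevels(subSheet)
levels_after <- levels(subSheet$Diagnose)

summary(subSheet)
```

Note that since R 4.0.0 read.csv uses stringsAsFactors = FALSE by default, so on a recent version the column is read as character and this issue only appears if you convert it to a factor yourself.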

nested for loops in R to parse csv files?

Edit: I've corrected the typo in the code (copy and paste error). I can't add an example of the csv files, as they are too complex to model in a simple example (I tried).
I've spent hours looking through similarly titled questions to solve a for loop problem in R, and have tried a lot of different approaches, but I'm having no luck.
I have many different csv files, each of which has a set of 10 separate strings (variables) identifying a specific row (e.g., names = c("Delta values", "Scream factor", "nightmare mode")). Two rows below such a string, I need the max value of that row of data. I can create a loop that scans the csv files for one such value using the following.
Test files:
test1.csv, test2.csv, test3.csv, test4.csv
names <- list.files(pattern = ".csv")
DF <- NULL
for (i in names) {
  dat <- read.csv(i, header = FALSE, stringsAsFactors = FALSE)
  index <- which(dat == "Delta values", arr.ind = TRUE)
  row <- as.numeric(rownames(dat)[index[1]])
  aver <- dat[row + 2, ]
  p <- max(na.omit(as.numeric(aver)))
  DF <- rbind(DF, p)
  colnames(DF) <- dat[index]
}
However, my problem comes in trying to generalize it, so that I get a data frame returned indicating the file each value was retrieved from as a row (not "p") and looping over the files so that I can retrieve the next several variables, while appending to the same data frame so that I end up with a data frame listing by row the filename the variable was derived from, and each variable listed in a separate column.
I'm pretty sure I need a nested loop listing the values I want to retrieve as calculated by "p" but I can't find any good examples describing how to iteratively loop using such an approach, and append the new variables to the growing data frame while staying consistent with the row numbering by file.
please help!
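One way this could be generalized, as a rough sketch under assumptions: the real files are not shown, so the two temporary csv files below are invented stand-ins with the label in the first cell of a row and the values two rows below. max_below_label and the labels vector are hypothetical names. The outer lapply runs over files and the inner vapply over labels, which plays the role of the nested loop and yields one row per file and one column per variable.

```r
labels <- c("Delta values", "Scream factor")   # hypothetical subset of the 10 strings

# Return the max of the row `offset` rows below the first cell equal to `label`.
max_below_label <- function(dat, label, offset = 2) {
  index <- which(dat == label, arr.ind = TRUE)
  if (nrow(index) == 0) return(NA_real_)
  values <- suppressWarnings(as.numeric(unlist(dat[index[1, "row"] + offset, ])))
  if (all(is.na(values))) NA_real_ else max(values, na.rm = TRUE)
}

# Invented demo files standing in for test1.csv, test2.csv, ...
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
writeLines(c("Delta values,,", "unit,unit,unit", "1,5,3",
             "Scream factor,,", "unit,unit,unit", "7,2,4"),
           file.path(demo_dir, "test1.csv"))
writeLines(c("Delta values,,", "unit,unit,unit", "10,8,9",
             "Scream factor,,", "unit,unit,unit", "0.5,0.25,1"),
           file.path(demo_dir, "test2.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One row per file: the file name plus one column per label.
rows <- lapply(files, function(f) {
  dat <- read.csv(f, header = FALSE, stringsAsFactors = FALSE)
  maxima <- vapply(labels, function(lbl) max_below_label(dat, lbl), numeric(1))
  data.frame(file = basename(f), t(maxima),
             check.names = FALSE, stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
print(result)
```

Building the rows in a list and binding them once at the end also avoids growing the data frame inside the loop.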
