How to use sprintf() function to format columns in R - r

I am new to R, so please excuse the basic question.
I am stuck on an error and do not understand what exactly it means:
"Error in [.data.frame: undefined columns selected"
Can anyone help me with this, please?
When I use the below:
sprintf("%s", "Return (%)")
It works fine and renders the table correctly.
But if I use sprintf("%.1f", "Return (%)"), I get the error
"Error in sprintf: invalid format '%.1f'; use format %s for character objects"
Next, I converted the column to numeric with
sprintf("%.1f", as.numeric("Return (%)")).
Then I get the below error again:
"Error in [.data.frame: undefined columns selected"

R uses the extract operator to access information within an object. Since a data frame is also an R object, we use the extract operator to access columns in a data frame.
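The first call, sprintf("%s", "Return (%)"), works because %s accepts a character string. The second call, sprintf("%.1f", "Return (%)"), fails with the invalid format error because "Return (%)" is plain text and %.1f requires a number.
Wrapping the text in as.numeric() does not help: as.numeric("Return (%)") tries to convert the literal string, not the column, and returns NA. The "undefined columns selected" error is raised by the extract operator when the data frame has no column with the requested name, which is the case here because read.csv() changed the column name (see "Why did the error occur?" below).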
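A minimal sketch that reproduces each message from the question. It assumes the data frame was created with read.csv() from a header containing "Return (%)". tryCatch() is used only to capture the messages so the script keeps running.

```r
# Hypothetical data mirroring the question; read.csv() renames the header.
stockReturns <- read.csv(text = "symbol,Return (%)\nAAPL,23.1\nGOOG,30.5")

# %.1f needs a number, so a character string triggers the format error.
fmt_msg <- tryCatch(sprintf("%.1f", "Return (%)"),
                    error = function(e) conditionMessage(e))

# as.numeric() converts the literal text, not a column, and yields NA.
na_value <- suppressWarnings(as.numeric("Return (%)"))

# Asking the data frame for a name it does not have raises the extract error.
col_msg <- tryCatch(stockReturns[, "Return (%)"],
                    error = function(e) conditionMessage(e))

print(fmt_msg)
print(na_value)
print(col_msg)
```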
To extract columns from a data frame, multiple forms of the extract operator can be used:
1. dataframe[rows,columns]
2. dataframe[,"columnName"]
3. dataframe[["columnName"]]
4. dataframe$columnName
Since columns can be referenced by name or position (from left to right), one way to solve the problem described in the question is to use the column number with the extract operator to reference the percentage return column, as follows.
textFile <- "symbol,Return (%)
AAPL,23.1
GOOG,30.5
IBM,11.8
MSFT,14"
stockReturns <- read.csv(text=textFile)
# version that works: Return (%) is second column in data frame
sprintf("%.1f",stockReturns[,2])
...and the output:
> sprintf("%.1f",stockReturns[,2])
[1] "23.1" "30.5" "11.8" "14.0"
>
Why did the error occur?
When we print the data frame, we see how R parsed the column name when reading the header via read.csv().
stockReturns
> stockReturns
  symbol Return....
1   AAPL       23.1
2   GOOG       30.5
3    IBM       11.8
4   MSFT       14.0
>
Interesting! R converted the space and the special characters in the header to periods, so the column name became "Return....". Now let's try sprintf() again with the column name as R interpreted it, and the [[ form of the extract operator.
> sprintf("%.1f", stockReturns[["Return...."]])
[1] "23.1" "30.5" "11.8" "14.0"
>
Now that we know the problem is caused by special characters in the column name, we can use the colnames() function to rename the Return (%) column to something we can more easily reference in R code. This allows us to use multiple forms of the extract operator to access the column from a data frame.
colnames(stockReturns) <- c("symbol","return_pct")
sprintf("%.1f",stockReturns[["return_pct"]])
sprintf("%.1f",stockReturns$return_pct)
sprintf("%.1f",stockReturns[,"return_pct"])
...and the output:
> colnames(stockReturns) <- c("symbol","return_pct")
> sprintf("%.1f",stockReturns[["return_pct"]])
[1] "23.1" "30.5" "11.8" "14.0"
> sprintf("%.1f",stockReturns$return_pct)
[1] "23.1" "30.5" "11.8" "14.0"
> sprintf("%.1f",stockReturns[,"return_pct"])
[1] "23.1" "30.5" "11.8" "14.0"
>
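An alternative to renaming, sketched here as one possible approach: read.csv() has a check.names argument, and setting it to FALSE keeps the header as written in the file. The name rawReturns is introduced only for this illustration.

```r
textFile <- "symbol,Return (%)
AAPL,23.1
GOOG,30.5
IBM,11.8
MSFT,14"

# make.names() is the conversion read.csv() applies to headers by default.
print(make.names("Return (%)"))

# check.names = FALSE keeps the header exactly as written in the file.
rawReturns <- read.csv(text = textFile, check.names = FALSE)
print(names(rawReturns))

# A name with spaces or symbols can be passed as a string to [[ ]].
formatted <- sprintf("%.1f", rawReturns[["Return (%)"]])
print(formatted)
```

With the original name preserved, the $ form needs backticks around the name, so the [[ form is usually the more convenient one.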
Note: this answer references content from my blog article, Forms of the Extract Operator.

Related

Replacing all semicolons with a space pt2

I'm trying to run text analysis on a list of 2000+ rows of keywords, but they are listed like
"Strategy;Management Styles;Organizations"
So when I use tm to remove punctuation it becomes
"StrategyManagement StylesOrganizations"
and I assume this somehow breaks my frequently-used-terms analysis.
I've tried using
vector<-gsub(';', " ",vector)
but this turns my vector data ("List of 2000") into a value described as "Large character (3 elements)". When I inspected this value, it showed a really long list of words that took forever to load. Any ideas what I'm doing wrong?
Should I use gsub on my vector or on my corpus? They are just
vector<-VectorSource(dataset$Keywords)
corpus<-VCorpus(vector)
I tried using
inspect(corpus[[1]])
on my corpus after using gsub to make it a value, but I got error "no applicable method for 'inspect' applied to an object of class "character""
You need to split the data into a vector of strings. One way to do this is with the stringr package, as follows:
library(tm)
library(stringr)
vector <- c("Strategy;Management Styles;Organizations")
keywords <- unlist(stringr::str_split(vector, ";"))
vector <- VectorSource(keywords)
corpus <- VCorpus(vector)
inspect(corpus[[1]])
#<<PlainTextDocument>>
# Metadata: 7
#Content: chars: 8
#Strategy
Maybe you can try strsplit
X <- c("Global Mindset;Management","Auditor;Accounting;Selection Process","segmantation;banks;franchising")
res <- Map(function(v) unlist(strsplit(v,";")),X)
such that
> res
$`Global Mindset;Management`
[1] "Global Mindset" "Management"
$`Auditor;Accounting;Selection Process`
[1] "Auditor" "Accounting" "Selection Process"
$`segmantation;banks;franchising`
[1] "segmantation" "banks" "franchising"
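If the goal is only to replace the semicolons with spaces, as the title asks, a minimal sketch is to run gsub() on the raw character column before it is wrapped in VectorSource(). The keywords vector below is hypothetical and stands in for dataset$Keywords. The commented-out line assumes the tm package is installed.

```r
# Hypothetical keyword column standing in for dataset$Keywords.
keywords <- c("Strategy;Management Styles;Organizations",
              "Auditor;Accounting;Selection Process")

# Replace the separator on the raw character vector, before building a source.
cleaned <- gsub(";", " ", keywords, fixed = TRUE)
print(cleaned)

# With tm installed, the corpus would then be built from the cleaned text:
# corpus <- tm::VCorpus(tm::VectorSource(cleaned))
```

Applying gsub() to the object returned by VectorSource() coerces it to a plain character vector, which is why inspect() then reports no applicable method.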

R is changing my variable value by itself

I have a dataframe that has an id field with values as these two:
587739706883375310
587739706883375408
The problem is that, when I ask R to show these two numbers, the output that I get is the following:
587739706883375360
587739706883375360
which are not the real values of my ID field, how do I solve that?
For your information: I have executed options(scipen = 999) so that R does not convert my number to scientific notation.
This problem also happens in the R console: if I enter these example numbers, I get the same output as shown above.
EDIT: someone asked
dput(yourdata$id)
I did that and the result was:
c(587739706883375360, 587739706883375360, 587739706883375488, 587739706883506560, 587739706883637632, 587739706883637632, 587739706883703040)
To compare, the original data in the csv file is:
587739706883375310,587739706883375408,587739706883375450,587739706883506509,587739706883637600,587739706883637629,587739706883703070
I also did the following test with one of these numbers:
> 587739706883375408
[1] 587739706883375360
> as.double(587739706883375408)
[1] 587739706883375360
> class(as.double(587739706883375408))
[1] "numeric"
> is.double(as.double(587739706883375408))
[1] TRUE
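R stores these values as doubles, which hold only about 15 to 16 significant decimal digits, so 18-digit IDs get rounded. You can use the bit64 package to represent such large integers exactly: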
library(bit64)
as.integer64("587739706883375408")
# integer64
# [1] 587739706883375408
as.integer64("587739706883375408") + 1
# integer64
# [1] 587739706883375409
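A small base-R sketch of the underlying limit, plus one way to keep the IDs intact on import by reading them as text. The csvText string and the variable names are made up for illustration. If arithmetic on the IDs is needed, the character values can be passed to as.integer64() as in the example above.

```r
# Doubles carry 53 bits of integer precision, so 2^53 + 1 is not representable.
print(2^53 + 1 == 2^53)

# Reading the IDs as text avoids the conversion to double entirely.
csvText <- "587739706883375310,587739706883375408"
ids <- read.csv(text = csvText, header = FALSE, colClasses = "character")
id_vec <- unlist(ids, use.names = FALSE)
print(id_vec)
```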

Paste function to construct existing data frame name and evaluate in R

I am working with a long list of data frames.
Here is a simple hypothetical example of a data frame:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
I am trying to retrieve a specified column of the data frame using the paste function.
get(paste("DFrame","$","ColTwo",sep=""))
The get function returns the following error, when trying to retrieve a specified column:
Error in get(paste("DFrame", "$", "ColTwo", sep = "")) :object 'DFrame$ColTwo' not found
When I enter the constructed name of the data frame DFrame$ColTwo it returns the desired output of the second column.
If I reconstruct an example without the '$' sign, then I get the desired answer from the get function. For example, the following code yields 2:
Ans <- 2
get(paste("An","s",sep=""))
[1] 2
I am looking for the same desired outcome, but struggling to get past the error that the object could not be found.
I also attempted using the following format, but the quotation in the column name breaks the paste function:
paste("DFrame","[,"ColTwo"]",sep="")
Thank you very much for the input,
Kind regards
You can do that using the following syntax:
get("DFrame")[,"ColTwo"]
You can use paste() in both of these strings, for example:
get(paste("D", "Frame", sep=""))[,paste("Col", "Two", sep="")]
Edit: Despite someone downvoting this answer without leaving a comment, this does exactly what the original poster asked for. If you feel that it does not or is in some way dangerous, I would encourage you to leave a comment.
Stop trying to use paste and get entirely.
The whole point of having a list (of data frames, say) is that you can reference them using names:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
#A list of data frames
l <- list(DFrame,DFrame)
#The data frames in the list can have names
names(l) <- c("DF1",'DF2')
# Now you just use `[[`
> l[["DF1"]][["ColOne"]]
[1] 1 0
> l[["DF1"]][["ColTwo"]]
[1] Yes No
Levels: No Yes
If you have to, you can use paste to construct the indices passed inside [[.
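A minimal sketch of that last point, reusing the list from above. The variable names df_name and col_name are introduced here only for illustration.

```r
DFrame <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"))

# A named list of data frames
l <- list(DF1 = DFrame, DF2 = DFrame)

# Build the list name and the column name as strings
df_name <- paste("DF", 1, sep = "")
col_name <- paste("Col", "Two", sep = "")

# Pass the constructed strings straight to [[ ; no get() or parsing needed
result <- l[[df_name]][[col_name]]
print(result)
```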

How to call expression result after paste command in R?

I want to get a cell value after dynamically passing its address. So I am trying paste command to join the address of the cell like following:
paste0("DT1$", eval(cols[1]),"[1]")
where DT1 is a data table, cols[1] refers to the first column, and [1] is the first row of that column. Running this gives me a string (the address of the cell):
> paste0("DT1$", eval(cols[1]),"[1]")
[1] "DT1$BCC1[1]"
But I want the value of the cell like if I run:
> DT1$BCC1[1]
[1] 0
So how do I evaluate the result of the paste expression to get the value of the cell, like the 0 in the previous example? I tried eval() and do.call(), but nothing seems to work. I am sorry for this basic question, as I am new to R. Any help is really appreciated.
You can use eval(), but you have to parse the string "DT1$BCC1[1]" first:
str <-paste0("DT1$", eval(cols[1]),"[1]")
eval(parse(text = str))
The $ operator is best suited to interactive console use (it does partial name matching). You should use the subsetting operator [ instead.
For example, you can call it like this:
DT1[1,cols[1]]
Or, more generally:
x= 1
y = "BCC1"
DT1[x,y]
Note that DT1 here is a data.frame, not a data.table. You can do the same thing with a data.table:
DT1[x,y,with=FALSE]
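A self-contained sketch of the same idea on a plain data.frame. The contents of DT1 and cols are made up here, since the question does not show them. Because [[ accepts a column name held in a variable, no string has to be built and parsed.

```r
# Hypothetical data; the question does not show the real table.
DT1 <- data.frame(BCC1 = c(0, 5), BCC2 = c(3, 7))
cols <- c("BCC1", "BCC2")

# Column chosen by name at run time, then the first row of that column
value <- DT1[[cols[1]]][1]

# Equivalent row/column form of the subsetting operator
same <- DT1[1, cols[1]]

print(value)
print(same)
```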

Compute Column in R

What is the difference between the two statements below? They render different outcomes, and since I am coming to R from SPSS, I am a little confused.
ds$share.all <- ds[132]/ ds[3]
mean(ds$share.all, na.rm=T)
and
ds$share.all2 <- ds$col1/ ds$Ncol2
mean(ds$share.all2, na.rm=T)
They render the same mean, but for the first, the output is printed as
col1
0.02669424
and the second only prints the .02xxxxx.
Any help will be much appreciated.
Selecting a column of a data frame with single brackets (your first example) produces a data frame containing just that column, whereas the $ operator (as in your second example) produces a plain vector. Printing an object also prints its names if it has any (the col1 in your first example). The data frame you get with ds[132] has a names attribute, but the vector you get with ds$col1 does not. The equivalent of ds$col1 is to use double instead of single brackets: ds[[132]]. For example:
> x<-data.frame(1:10)
> names(x)<-"var"
> class(x$var)
[1] "integer"
> class(x[1])
[1] "data.frame"
> identical(x[1],x$var)
[1] FALSE
> identical(x[[1]],x$var)
[1] TRUE
