Need some help for my math python project [duplicate] - math

This question already has answers here:
How can I read inputs as numbers?
(10 answers)
Closed 6 months ago.
I have this simple python program. I ran it and it prints yes, when in fact I expect it to not print anything because 14 is not greater than 14.
I saw this related question, but it isn't very helpful.
#! /usr/bin/python
import sys
hours = "14"
if (hours > 14):
print "yes"
What am I doing wrong?

Convert the string to an integer with int:
hours = int("14")
if (hours > 14):
print "yes"
In CPython2, when comparing two non-numerical objects of different types, the comparison is performed by comparing the names of the types. Since 'int' < 'string', any int is less than any string.
In [79]: "14" > 14
Out[79]: True
In [80]: 14 > 14
Out[80]: False
This is a classic Python pitfall. In Python3 this wart has been corrected -- comparing non-numerical objects of different type raises a TypeError by default.
As explained in the docs:
CPython implementation detail: Objects of different types except
numbers are ordered by their type names; objects of the same types
that don’t support proper comparison are ordered by their address.

I think the best way is to convert hours to an integer, by using int(hours) .
hours = "14"
if int(hours) > 14:
print("yes")```

Related

R extract operator: [ vs $ [duplicate]

This question already has answers here:
What is the meaning of the dollar sign "$" in R function()?
(2 answers)
Closed 2 years ago.
There are multiple post in the internet regarding the differences and similarities about [ and $. I see some post where $ is recommended only for interactive use but not for programming. However, I am not sure I understand if this is a preference or there is an explanation behind this idea.
Now lets say I am writing a package or function, if I am extracting an element by name (e.g., mtcars[["mpg"]]) why I should avoid using mtcars$mpg?
There are two differences that really matter between [[ and $:
[[ - works with strings (i.e. it supports variable substitution), $ doesn't. If you have my_var = "mpg", you can use mtcars[[my_var]], but there isn't a good way to use my_var with $.
$ auto-completes, if a partial column name is unambiguous. mtcars$m will return the mpg column, mtcars[["m"]] will return NULL. mtcars$d will return NULL because multiple columns start with a "d".
#1 makes [[ more flexible for programming - it's extremely common in programmatic use to be working with column names stored as strings.
#2 makes $ more dangerous - you should not use abbreviated column names in programming, however in interactive use it can be nice and quick. (Though this is largely moot with RStudio's auto-completion features, if you use that IDE.)
$ does partial matching: if you have a column named xxx in a dataframe dat, then dat$xx will return the xxx column (unless you also have a xx column). This can be dangerous.
I always use [["..."]] for another reason: I use RStudio, and there is a nice highlighting for strings, whereas there's no highlighting with $.

R is adding extra numbers while reading file

I have been trying to read a file which has date field and a numeric field. I have the data in an excel sheet and looks something like below -
Date X
1/25/2008 0.0023456
12/23/2008 0.001987
When I read this in R using the readxl::read_xlsx function, the data in R looks like below -
Date X
1/25/2008 0.0023456000000000
12/23/2009 0.0019870000000000
I have tried limiting the digits using functions like round, format (nsmall = 7), etc. but nothing seems to work. What am I doing wrong? I also tried saving the data as a csv and a txt and read it using read.csv and read.delim but I face the same issue again. Any help would be really appreciated!
As noted in the comments to the OP and the other answer, this problem is due to the way floating point math is handled on the processor being used to run R, and its interaction with the digits option.
To illustrate, we'll create an Excel spreadsheet with the data from the OP, and demonstrate what happens as we adjust the options(digits=) option.
Next, we'll write a short R script to illustrate what happens when we adjust the digits option.
> # first, display the number of significant digits set in R
> getOption("digits")
[1] 7
>
> # Next, read data file from Excel
> library(xlsx)
>
> theData <- read.xlsx("./data/smallNumbers.xlsx",1,header=TRUE)
>
> head(theData)
Date X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
>
> # change digits to larger number to replicate SO question
> options(digits=17)
> getOption("digits")
[1] 17
> head(theData)
Date X
1 2008-01-25 0.0023456000000000002
2 2008-12-23 0.0019870000000000001
>
However, the behavior of printing significant digits varies by processor / operating system, as setting options(digits=16) results in the following on a machine running an Intel i7-6500U processor with Microsoft Windows 10:
> # what happens when we set digits = 16?
> options(digits=16)
> getOption("digits")
[1] 16
> head(theData)
Date X
1 2008-01-25 0.0023456
2 2008-12-23 0.0019870
>
library(formattable)
x <- formattable(x, digits = 7, format = "f")
or you may want to add this to get the default formatting from R:
options(defaultPackages = "")
then, restart your R.
Perhaps the problem isn't your source file as you say this happens with .csv and .txt as well.
Try checking to see the current value of your display digits option by running options()$digits
If the result is e.g. 14 then that is likely the problem.
In which case, try running r command options(digits=8) which will set the display digits=8 for the session.
Then, simply reprint your dataframe to see the change has already taken effect with respect to how the decimals are displayed by default to the screen.
Consult ?options for more info about digits display setting and other session options.
Edit to improve original answer and to clarify for future readers:
Changing options(digits=x) either up or down does not change the value that is stored or read into into internal memory for floating point variables. The digits session option merely changes how the floating point values print i.e. display on the screen for common print functions per the '?options` documentation:
digits: controls the number of significant digits to print when printing numeric values.
What the OP showed as the problem he was having (R displaying more decimals after last digit in a decimal number than the OP expected to see) was not caused by the source file having been read from Excel - i.e. given the OP had the same problem with CSV and TXT the import process didn't cause a problem.
If you are seeing more decimals than you want by default in your printed/displayed output (e.g. for dataframes and numeric variables) try checking options()$digits and understand that option is simply the default for the number of digits used by R's common display and printing methods. HOWEVER, it does not affect floating point storage on any of your data or variables.
Regarding floating point numbers though, another answer here shows how setting option(digits=n) higher than the default can help demonstrate some precision/display idiosyncrasies that are related to floating point precision. That is a separate problem to what the OP displayed in his example but it's well worth understanding.
For a much more detailed and topic specific discussion of floating point precision than would be appropriate to rehash here, it's well worth reading this definitive SO question+answer: Why are these numbers not equal?
That other question+answer+discussion covers issues specifically around floating point precision and contains a long, well presented list of references that you will find helpful if you need more information on the subject.

Replace words that start with a period [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I'm trying to fix a dataset that has some errors of decimal numbers wrongly typed. For example, some entries were typed as ".15" instead of "0.15". Currently this column is chr but later I need to convert it to numeric.
I'm trying to select all of those "words" that start with a period "." and replace the period with "0." but it seems that the "^" used to anchor the start of the string doesn't work nicely with the period.
I tried with:
dataIMN$precip <- str_replace (dataIMN$precip, "^.", "0.")
But it puts a 0 at the beginning of all the entries, including the ones that are correctly typed (those that don't start with a period).
If you need to do as you've stated, brackets [] are regex for 'find exact', or you can use '\\' which escapes a character, such as a period:
Option 1:
gsub("^[.]","0.",".54")
[1] "0.54"
Option 2:
gsub("^\\.","0.",".54")
[1] "0.54"
Otherwise, as.numeric should also take care of it automatically.

Scientific notation issue in R

I have an ID variable with 20 digits. Once i read the data in R , it changes to Scientific notation and then if i write the same id to csv file, the value of ID changes.
For example , running the below code should print me the value of x as "12345678912345678912",but it prints "12345678912345679872":
Code:
options(scipen=999)
x <- 12345678912345678912
print(x)
Output:
[1] 12345678912345679872
My questions are :
1) Why it is happening ?
2) How to fix this problem ?
I know it has to do with the storage of data types in R but still i think there should be some way to deal with this problem. I hope i am clear with this question.
I don't know if this question was asked or not in so point me to a link if its a duplicate.I will remove this post
I have gone through this, so i can relate with the issue of mine, but i am unable to fix it.
Any help would be highly appreciated. Thanks
R does not by default handle integers numerically larger than 2147483647L.
If you append an L to your number (to tell R its an integer), you get:
x <- 12345678912345678912L
#Warning message:
#non-integer value 12345678912345678912L qualified with L; using numeric value
This also explains the change of the last digits as R stores the number as a double.
I think the gmp-package should be able to handle large numbers in general. You should therefore either accept the loss of precision, store them as character stings, or use a data-type from the gmp package.
To circumvent the problem due to number storing/representation, you can import your ID variable directly as character with the option colClasses, for example, if using read.csv and importing a data.frame with the ÌD column and another numeric column:
mydata<-read.csv("file.csv",colClasses=c("character","numeric"),...)
Using readr you can do
mydata <- readr::read_csv("file.csv", col_types = list(ID=col_character()))
where "ID" is the name of your ID column

Variable name restrictions in R

What are the restrictions as to what characters (and maybe other restrictions) can be used for a variable name in R?
(This screams of general reference, but I can't seem to find the answer)
You might be looking for the discussion from ?make.names:
A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way" are not valid, and neither are the
reserved words.
In the help file itself, there's a link to a list of reserved words, which are:
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_
NA_character_
Many other good notes from the comments include the point by James to the R FAQ addressing this issue and Josh's pointer to a related SO question dealing with checking for syntactically valid names.
Almost NONE! You can use 'assign' to make ridiculous variable names:
assign("1",99)
ls()
# [1] "1"
Yes, that's a variable called '1'. Digit 1. Luckily it doesn't change the value of integer 1, and you have to work slightly harder to get its value:
1
# [1] 1
get("1")
# [1] 99
The "syntactic restrictions" some people might mention are purely imposed by the parser. Fundamentally, there's very little you can't call an R object. You just can't do it via the '<-' assignment operator. "get" will set you free :)
The following may not directly address your question but is of great help.
Try the exists() command to see if something already exists and this way you know you should not use the system names for your variables or function.
Example...
> exists('for')
[1] TRUE
>exists('myvariable')
[1] FALSE
Using the make.names() function from the built in base package may help:
is_valid_name<- function(x)
{
length_condition = if(getRversion() < "2.13.0") 256L else 10000L
is_short_enough = nchar(x) <= length_condition
is_valid_name = (make.names(x) == x)
final_condition = is_short_enough && is_valid_name
return(final_condition)
}

Resources