> I look forward to comments telling me how wrong I am.
Okay, I'll try to explain.
The job of a parser is to perform the opposite of stringification as precisely as possible. That string source is not always a human typing on a keyboard.
If your stringifier doesn't output scientific notation, then you reject scientific notation because it's invalid data, plain and simple. It's completely irrelevant that to a human it still looks like it could be some number. It's like picking up the wrong baby at the hospital. It might be lovely and all, but it wasn't what you were supposed to get, and that matters. It's a sign that someone screwed up, potentially badly, and your job is to hunt down where the bug was and how to address it, not just roll with whatever came your way and win some competition on how long you can ignore problems.
If you want to parse natural language then sure, it makes sense. That's what flags and different functions are good for. It doesn't mean that has to be the default. As far as programming languages go, the default assumption is that your Parse() is parsing whatever your ToString() produced on the same type.
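For concreteness, here's a minimal sketch of that round-trip contract in C, with snprintf and strtol standing in for ToString and Parse:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        long original = 1000000000;
        char buf[32];
        snprintf(buf, sizeof buf, "%ld", original);  /* "ToString" */
        long parsed = strtol(buf, NULL, 10);         /* "Parse" */
        assert(parsed == original);                  /* Parse(ToString(x)) == x */
        return 0;
    }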
By that logic, I think most toString/fromString pairs are broken. Many “string to int” functions accept some strings that their counterpart “int to string” never produces, for example “-0”, “0000”, and “01”.
(Chances are ‘fromString’ accepts strings with very long prefixes of zeroes)
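A quick demonstration of the asymmetry, again using C's strtol and snprintf as the pair (the inputs are just the examples above):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Strings the parser accepts but the formatter never emits. */
        const char *inputs[] = { "-0", "0000", "01", "00000000000000001" };
        for (int i = 0; i < 4; i++) {
            long v = strtol(inputs[i], NULL, 10);
            char back[32];
            snprintf(back, sizeof back, "%ld", v);
            printf("parse(\"%s\") = %ld, which formats back as \"%s\"\n",
                   inputs[i], v, back);
        }
        return 0;
    }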
Having said that, I do think accepting “1E9” may go too far. If you accept that, why would you disallow “12E3”, “1.23E4”, “12300E-2”?
If you’re going to widen what the function accepts, I would add support for hexadecimal (“0x34”) and binary (“0b1001”) before adding this.
I’m not sure why this is a “proposal” for other string to int parsers rather than a function the author wrote themselves. It seems rather trivial to implement on top of something like strtol (or whatever your language’s equivalent is).
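A rough sketch of what that could look like on top of strtol (the name parse_int_e and the error convention are mine, and I've chosen to reject negative exponents for simplicity):

    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>

    /* Parses "123" or "1E9"-style strings; returns 0 on success,
       -1 on syntax error or overflow. */
    int parse_int_e(const char *s, long *out) {
        char *end;
        errno = 0;
        long mantissa = strtol(s, &end, 10);
        if (end == s || errno == ERANGE) return -1;
        if (*end == '\0') { *out = mantissa; return 0; }  /* plain integer */
        if (*end != 'E' && *end != 'e') return -1;
        const char *exp_start = end + 1;
        errno = 0;
        long exponent = strtol(exp_start, &end, 10);
        if (end == exp_start || errno == ERANGE || *end != '\0') return -1;
        if (exponent < 0) return -1;  /* rejects "12300E-2", though it is an integer */
        while (exponent-- > 0) {
            if (mantissa > LONG_MAX / 10 || mantissa < LONG_MIN / 10)
                return -1;  /* multiplying by 10 would overflow */
            mantissa *= 10;
        }
        *out = mantissa;
        return 0;
    }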
“E” is also a plausible typo in numeric input, though, because it sits right next to the number row on the keyboard.
This is the kind of thing Services on macOS used to be for: type 1e9 into a text box, hit command-option-control-alt-meta-escape-shift-B, and it converts it to 1000000000.
Why limit to a single digit integer for the mantissa? I might just as well want to input 243E9 to get 243 billion.
Keeping it simple. Once you've got the mantissa and exponent out, a simple bound check on the exponent tells you whether the number is in range.
For a 32-bit signed integer the limit is just over 2E9 (INT32_MAX is 2,147,483,647). That means any exponent from 0 to 8 is fine, and an exponent of 9 works only if the mantissa is 1 or 2. This check only works with a single-digit mantissa.
With more digits in the mantissa a robust range check can still be done, but it gets more complicated, and string-to-integer functions are conservatively written pieces of code designed to be quick.
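The check itself is tiny. A sketch, assuming a positive single-digit mantissa and a 32-bit signed target (the names are mine):

    #include <stdbool.h>

    /* INT32_MAX is 2,147,483,647, i.e. just over 2E9. */
    bool fits_int32(int mantissa, int exponent) {  /* mantissa in 1..9 */
        if (exponent < 0 || exponent > 9) return false;
        if (exponent <= 8) return true;   /* at most 9E8 = 900,000,000 */
        return mantissa <= 2;             /* exponent 9: only 1E9 or 2E9 fit */
    }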
> While we’re on the subject of hex numbers, I may be following this up with a proposal that “H” should mean “times 16 to the power of” in a similar style, but that’ll be for another day.
I like this idea but I think it should be "HE" for hexadecimal exponent.