Could you tell me why `'?\\\?'=='?\\\\?'` gives `True`? That drives me crazy and I can't find a reasonable answer…
```
>>> list('?\\\?')
['?', '\\', '\\', '?']
>>> list('?\\\\?')
['?', '\\', '\\', '?']
```
Basically, it's because Python is slightly lenient in backslash processing. Quoting from https://docs.python.org/2.0/ref/strings.html:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.
(Emphasis in the original)
Therefore, in Python, it isn't that three backslashes are equal to four; it's that when you follow a backslash with a character like `?`, the two together come through as two characters, because `\?` is not a recognized escape sequence.
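To see the rule in action, the sketch below feeds the source text of a literal to `ast.literal_eval`, which compiles it the way the interpreter would. Worth knowing: since Python 3.6, unrecognized escapes like `\?` trigger a DeprecationWarning at compile time (a SyntaxWarning since 3.12), so the helper records any warnings instead of letting them print. The helper name `decode_literal` is just for illustration.

```python
import ast
import warnings


def decode_literal(source: str):
    """Evaluate the source text of a string literal, collecting any warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        value = ast.literal_eval(source)
    return value, [str(w.message) for w in caught]


# Raw strings hold the *source text*, so the backslashes reach the compiler untouched.
unknown, unknown_warnings = decode_literal(r"'\?'")  # \? is not an escape
known, _ = decode_literal(r"'\n'")                    # \n is an escape

print(list(unknown))     # ['\\', '?']  (backslash kept)
print(list(known))       # ['\n']       (a single newline character)
print(unknown_warnings)  # version-dependent; may be empty on older interpreters
```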
Because `\c` in a string literal, when `c` is not one of the special backslashable characters like `n`, `r`, `t`, `0`, etc., evaluates to a string with a backslash and then a `c`.

```
>>> '\?'
'\\?'
```
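Here is a small sketch that checks, character by character, whether a backslash in front of it is kept. `escape_survives` is a hypothetical helper; it again uses `ast.literal_eval` on the literal's source text and silences the invalid-escape warnings. Characters such as `x`, `u`, `U` and `N` are left out on purpose: in Python 3 they start longer escapes (`\xhh`, `\uXXXX`, `\N{...}`), so a bare `\x` is a syntax error rather than a kept backslash.

```python
import ast
import warnings


def escape_survives(ch: str) -> bool:
    """True if backslash + ch inside a literal is left as the two characters."""
    source = "'\\" + ch + "'"  # source text: quote, backslash, ch, quote
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # hide invalid-escape warnings (3.6+)
        value = ast.literal_eval(source)
    return value == "\\" + ch


for ch in "nrt0?qd":
    status = "left as-is" if escape_survives(ch) else "recognized escape"
    print(f"\\{ch}: {status}")
```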
The Python lexical analysis page (https://docs.python.org/2/reference/lexical_analysis.html) has a table under String Literals that lists all the recognized escape sequences:
- `\\` is an escape sequence that means a single `\`
- `\?` is not an escape sequence, so it stays as the two characters `\?`
- so `'?\\\\?'` is `?` + `\\` + `\\` + `?`, which is `?\\?` (two escaped `\`)
- and `'?\\\?'` is `?` + `\\` + `\?`, which is also `?\\?` (one escaped `\` and one raw `\`)
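The left-to-right pairing described above can be mimicked with a toy decoder. This is a hypothetical, deliberately simplified sketch, not CPython's implementation: it only knows `\\`, the two quote escapes, `\n`, `\t` and `\r`, and keeps any other backslash pair unchanged, which is enough to reproduce the three-versus-four result.

```python
# Hypothetical, simplified decoder: handles only a few single-character
# escapes; real Python also supports octal, \x, \N{...}, \u, \U, etc.
SIMPLE_ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t", "r": "\r"}


def decode_simple(body: str) -> str:
    """Decode the text between the quotes of a (non-raw) string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])  # recognized: the pair becomes one char
            else:
                out.append("\\" + nxt)           # unrecognized: keep both characters
            i += 2                               # either way, consume the pair
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(list(decode_simple(r"?\\\?")))   # ['?', '\\', '\\', '?']
print(list(decode_simple(r"?\\\\?")))  # ['?', '\\', '\\', '?']
```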
Also, it should be noted that Python does not distinguish between single and double quotes surrounding a string literal, unlike some other languages. So 'String' and "String" are exactly the same thing in Python.
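A quick check that the quote style changes nothing about the resulting value, including how backslash escapes are processed:

```python
single = 'String'
double = "String"
print(single == double)          # True
print('It\'s' == "It's")         # True: only which quote needs escaping differs
print('?\\\\?' == "?\\\\?")      # True: backslash escapes work the same in both
print('''String''' == single)    # True: triple quotes also give an ordinary str
```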
This is because a backslash acts as an escape character for the character(s) immediately following it, if the combination represents a valid escape sequence. The dozen or so escape sequences are listed in the String Literals section of the Python documentation. They include the obvious ones such as newline `\n`, horizontal tab `\t` and carriage return `\r`, and more obscure ones such as named Unicode characters using `\N{...}`, e.g. `\N{WAVY DASH}`, which represents the Unicode character `\u3030`. The key point, though, is that if the escape sequence is not known, the character sequence is left in the string as is.
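A short sketch of the recognized escapes mentioned above, using the standard `unicodedata` module to confirm the character name:

```python
import unicodedata

wavy = "\N{WAVY DASH}"           # named Unicode escape
print(wavy == "\u3030")          # True: same character, written by code point
print(unicodedata.name(wavy))    # WAVY DASH
# Each recognized escape becomes a single character in the value:
print(len("a\nb"), len("a\tb"), len("a\rb"))  # 3 3 3
```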
Part of the problem might also be that the Python interpreter output is misleading you. The interactive prompt displays the repr() of a string, in which each backslash is escaped (shown doubled). However, if you print those strings, you will see the extra backslashes disappear.
```
>>> '?\\\?'
'?\\\\?'
>>> print('?\\\?')
?\\?
>>> '?\\\?' == '?\\?'    # I don't know why you think this is True???
False
>>> '?\\\?' == r'?\\?'   # but if you use a raw string for '?\\?'
True
>>> '?\\\\?' == '?\\\?'  # this is the same string... see below
True
```
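The gap between what the prompt echoes and what `print()` shows comes down to `repr()` versus `str()`. A minimal sketch:

```python
s = "?\\\\?"          # escaped form: two backslashes between the question marks
print(len(s))         # 4
print(repr(s))        # '?\\\\?'  <- what the >>> prompt echoes
print(s)              # ?\\?      <- what print() shows
print(s == r"?\\?")   # True: the raw form spells the same two backslashes
```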
For your specific examples, in the first case `'?\\\?'`, the first backslash escapes the second, leaving a single backslash, but the third backslash remains as a backslash because `\?` is not a valid escape sequence. Hence the resulting string is `?\\?`.
For the second case `'?\\\\?'`, the first backslash escapes the second, and the third backslash escapes the fourth, which results in the string `?\\?`.
So that's why three backslashes are the same as four:

```
>>> '?\\\?' == '?\\\\?'
True
```
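Generalizing the two cases: each pair of backslashes collapses to one, and a leftover single backslash before `?` is kept, so n backslashes in the source give (n + 1) // 2 in the value. The sketch below checks this for n = 1..6; the helper name is hypothetical, and it evaluates generated source text with `ast.literal_eval` while silencing the invalid-escape warnings.

```python
import ast
import warnings


def backslashes_after_decoding(n: int) -> int:
    """Count backslashes in the value of the literal '?' + n backslashes + '?'."""
    source = "'?" + "\\" * n + "?'"
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # odd n leaves an unrecognized \? escape
        value = ast.literal_eval(source)
    return value.count("\\")


for n in range(1, 7):
    print(n, "->", backslashes_after_decoding(n))  # 3 and 4 both give 2
```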
If you want to create a string with three backslashes, you can escape each backslash:

```
>>> '?\\\\\\?'
'?\\\\\\?'
>>> print('?\\\\\\?')
?\\\?
```
or you might find "raw" strings more understandable:

```
>>> r'?\\\?'
'?\\\\\\?'
>>> print(r'?\\\?')
?\\\?
```
This turns off escape sequence processing for the string literal. See String Literals in the Python documentation for more details.
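Putting the options side by side, here is a sketch that builds the same three-backslash string by escaping, with a raw string, and by plain concatenation (the variable names are just for illustration):

```python
escaped = '?\\\\\\?'           # each backslash doubled
raw = r'?\\\?'                 # raw string: backslashes taken literally
built = '?' + '\\' * 3 + '?'   # built explicitly, no escape rules to reason about
print(escaped == raw == built) # True
print(raw.count('\\'))         # 3
print(raw)                     # ?\\\?
```

One caveat with raw strings: a raw literal still cannot end in an odd number of backslashes, so something like `r'abc\'` is a syntax error.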