- Python 2 has two built-in string types: str holds bytes, and unicode holds Unicode characters.
- If we only deal with 7-bit ASCII characters (characters in the range of 0-127), we can save some memory by using strs.
- However, we should be careful if we use an 8-bit character set.
- In general, it is not always possible to determine which 8-bit encoding a particular string uses simply by examining its bytes.
- The safest approach is to use strs for 7-bit ASCII and for raw 8-bit binary data, and unicode otherwise.
- Note: The good news is that Python 3.x has no separate Unicode string type. Every str is a Unicode string, and raw bytes have their own bytes type.
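A minimal Python 3 sketch of the text-versus-bytes distinction described above. The variable names are illustrative, and the sample word is an assumption chosen only because it contains one non-ASCII character.

```python
# Python 3: str holds Unicode text, bytes holds raw 8-bit values.
text = "caf\u00e9"                 # str: four Unicode characters (the last is e-acute)
data = text.encode("utf-8")        # bytes: the UTF-8 encoding of the text

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 (the accented letter takes two bytes)

# The bytes alone do not say which encoding produced them.
right = data.decode("utf-8")            # the encoding actually used
latin1_guess = data.decode("latin-1")   # also succeeds, but yields different text
print(right == text, latin1_guess == text)   # True False
```

Decoding with the wrong 8-bit encoding does not necessarily fail; it can silently produce the wrong characters, which is why the encoding has to be known in advance.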
Learn Python - Python tutorial - python strings - Python examples - Python programs
String Literals
- Python strings are fairly easy to use. But there are so many ways to write them in our code:
Quoted Strings
- Single and double quote characters are the same.
- The reason for supporting both is that it allows us to embed a quote character of the other variety inside a string without escaping it with a backslash.
- Python concatenates adjacent string literals in any expression.
- If we add commas between these strings, we'll have a tuple not a string.
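The points above can be sketched as follows. The sample strings are illustrative only.

```python
# Single and double quotes produce the same string.
assert 'spam' == "spam"

# Using the other quote type avoids a backslash escape.
a = "knight's"
b = 'knight"s'
c = 'knight\'s'      # same value as a, written with an escape
print(a, b, a == c)  # knight's knight"s True

# Adjacent string literals are concatenated into one string.
title = "Meaning " 'of' " Life"
print(title)         # Meaning of Life

# Commas between the literals build a tuple instead.
parts = "Meaning ", 'of', " Life"
print(type(parts).__name__, len(parts))  # tuple 3
```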
Escape Sequences
- Backslashes follow a general pattern in strings: they introduce special byte codings known as escape sequences.
- Escape sequences let us embed byte codes in strings that cannot easily be typed on a keyboard.
- The character \, and one or more characters following it in the string literal, are replaced with a single character in the resulting string object.
- The object has the binary value specified by the sequence. For instance, here is a five-character string that embeds a newline and a tab:
- The two characters \n stand for a single character - the byte containing the binary value of the newline character in our character set which is ASCII code 10.
- The sequence \t is replaced with the tab character. The way this string looks when printed depends on how we print it.
- The interactive echo shows the special characters as escapes, but print interprets them instead:
- We can check how many characters are in the string.
- So, the string is five characters long. It contains an ASCII a, a newline, an ASCII b, etc.
- The backslash characters are not really stored with the string in memory.
- They are just used to tell Python to store special byte values in the string. Here are string backslash characters:
Escape | Meaning |
---|---|
\newline | Ignored (continuation line) |
\\ | Backslash (stores one \) |
\' | Single quotes (stores ') |
\" | Double quotes (stores ") |
\a | Bell |
\b | Backspace |
\f | Formfeed |
\n | Newline (linefeed) |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\xhh | Character with hex value hh (at most 2 digits) |
\ooo | Character with octal value ooo (up to 3 digits) |
\0 | Null: binary 0 character (doesn't end string) |
\N{ id } | Unicode database ID |
\uhhhh | Unicode 16-bit hex |
\Uhhhhhhhh | Unicode 32-bit hex |
\other | Not an escape (keeps both \ and other) |
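
To make the table concrete, here is a minimal sketch using a five-character string with an embedded newline and tab, like the one described before the table. The exact string is an assumption.

```python
s = 'a\nb\tc'        # a, newline, b, tab, c

print(repr(s))       # 'a\nb\tc'  (the form the interactive echo shows)
print(s)             # a, then a new output line holding b, a tab, and c
print(len(s))        # 5

# Each escape is stored as a single character with the given code.
print(ord(s[1]), ord(s[3]))      # 10 9
assert '\x41' == '\101' == 'A'   # hex 41 and octal 101 are both code 65
assert len('\\') == 1            # an escaped backslash stores one character
```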
- Some escape sequences allow us to embed binary values into the bytes of a string. Here we have a five-character string with two binary zeros:
- The zero (null) byte does not terminate a string. Instead, Python keeps the string's length and text in memory. Here we have a string with a binary 1 and 2 (in octal) and 3 (in hex):
- Here, Python prints out nonprintable characters in hex, regardless of how they are specified. Here we have "Picasso", a tab, a newline, and a zero value coded in hex:
- If Python does not recognize the character after a backslash (\) as an escape code, it simply keeps the backslash in the string (recent Python 3 releases also emit a warning for such sequences):
- As mentioned before, Python 3.x doesn't have a special Unicode string type/class, and every string is a Unicode string.
- So unichr() is no longer needed; chr() accepts any Unicode code point.
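The cases in this list can be sketched together as below. The literal values are assumptions reconstructed from the descriptions, and the unrecognized-escape example is evaluated with warnings silenced because newer Python versions warn about such literals.

```python
import warnings

# Null characters do not terminate a Python string.
s = 'a\0b\0c'
print(len(s), repr(s))          # 5 'a\x00b\x00c'

# Octal and hex escapes each store one character; repr shows them in hex.
codes = '\001\002\x03'
print(len(codes), repr(codes))  # 3 '\x01\x02\x03'

# Illustrative string: "Picasso", a tab, a newline, and a hex-coded zero.
p = 'Picasso\t\n\x00'
print(repr(p))                  # 'Picasso\t\n\x00'

# An unrecognized escape such as \p keeps its backslash.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    path = eval(r"'c:\py\code'")
print(len(path), path)          # 10 c:\py\code

# chr() accepts any Unicode code point in Python 3; ord() is its inverse.
print(ord(chr(8364)), chr(8364) == '\u20ac')   # 8364 True
```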
Raw String with Escape Sequences
- Let's look at the following code for opening a file:
- The problem is that \n is considered as a newline character, and \t as a tab.
- This is where raw strings help. If the letter r (uppercase or lowercase) appears before the opening quote of a string, it suppresses the escape mechanism.
- The result is that Python keeps our backslashes literally. In other words, backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
- Regular expression patterns are usually written in Python code using this raw string notation.
- So, to fix the filename problem, we can just add the letter r:
- Or, since two backslashes are really an escape sequence for one backslash, we can keep our backslash by doubling them:
- We sometimes need this doubling when we print strings with embedded backslashes:
- As we've seen in numeric representation, the default format at the interactive prompt prints results as they were coded.
- So, the escaped backslashes appear in the output. The print function provides a more user-friendly format that shows there is actually only one backslash in each spot.
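A minimal sketch of the filename problem and both fixes. The Windows-style path is hypothetical and no file is actually opened; the regular expression at the end is likewise only an illustration.

```python
import re

bad = 'C:\new\text.txt'        # \n and \t are read as newline and tab
raw = r'C:\new\text.txt'       # raw string: backslashes kept literally
doubled = 'C:\\new\\text.txt'  # each doubled backslash stores one backslash

print(len(bad), len(raw))      # 13 15
print(raw == doubled)          # True
print(repr(raw))               # 'C:\\new\\text.txt'  (echo form, backslashes escaped)
print(raw)                     # C:\new\text.txt      (print form)

# r"\n" is two characters; "\n" is one.
print(len(r"\n"), len("\n"))   # 2 1

# Raw strings keep regular expression patterns readable.
print(re.findall(r"\d+", "room 101, floor 7"))  # ['101', '7']
```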
Triple Quotes for Multiline Block Strings
- A block string is a string literal format with triple-quotes. It is for coding multiline text data.
- Though the string spans three lines, Python collects all the triple-quoted text into a single multiline string with embedded newline characters (\n) at the places where our code has line breaks.
- If we print it instead of echoing:
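A short sketch of a three-line block string. The text itself is an arbitrary assumption.

```python
verse = """The quick brown fox
  jumps over
the lazy dog"""

print(repr(verse))   # 'The quick brown fox\n  jumps over\nthe lazy dog'
print(verse)         # prints three lines, with the indentation preserved
print(verse.count("\n"), len(verse.splitlines()))  # 2 3
```

Everything between the triple quotes is kept, including leading spaces on continuation lines, so the code's own indentation ends up in the string.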
Indexing and Slicing
- We can access string components by position because strings are ordered collections of characters.
- Python offsets start at 0 and end at one less than the length of the string.
- It also lets us fetch items from sequences such as strings using negative offsets.
- A negative offset is added to the length of a string to derive a positive offset.
- We can also think of negative offsets as counting backward from the end.
- The basics of slicing are straightforward. When we index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section.
- The left offset is taken to be the lower bound (inclusive) and the right is the upper bound (noninclusive).
- In other words, Python fetches all items from the lower bound up to but not including the upper bound.
- Then, it returns a new object containing the fetched items. If omitted, the left and right bounds default to 0 and the length of the object, respectively.
Indexing
- S[i] fetches components at offsets:
- The first item is at offset 0.
- Negative indexes mean to count backward from the end or right.
- S[0] fetches the first item.
- S[-2] fetches the second item from the end (same as S[len(S)-2]).
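The indexing rules above can be checked with a short sketch. The sample string is arbitrary.

```python
S = "python"

print(S[0], S[-2])             # p o
print(S[-2] == S[len(S) - 2])  # True: a negative offset is added to the length
print(S[len(S) - 1] == S[-1])  # True: the last valid offset is len(S) - 1

# Offsets outside the string raise IndexError.
try:
    S[len(S)]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```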
Slicing
- S[i:j] extracts contiguous sections of sequences:
- The upper bound is noninclusive.
- Slice boundaries default to 0 and the sequence length, if omitted.
- S[1:3] fetches items at offsets 1 up to but not including 3.
- S[1:] fetches items at offset 1 through the end (the sequence length).
- S[:3] fetches items at offset 0 up to but not including 3.
- S[:-1] fetches items at offset 0 up to but not including the last item.
- S[:] fetches items at offset 0 through the end - this effectively performs a top-level copy of S.
- The last item is a very common trick. It makes a full top-level copy of a sequence object, which is an object with the same value but a distinct piece of memory.
- This isn't very useful for immutable objects like strings but it is very useful for objects that may be changed in-place such as lists.
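A small sketch of the slice forms listed above, followed by the copy trick applied to a list. The names `items` and `copy` are illustrative.

```python
S = "python"

print(S[1:3])   # yt
print(S[1:])    # ython
print(S[:3])    # pyt
print(S[:-1])   # pytho
print(S[:])     # python

# For a mutable sequence, [:] yields a distinct top-level copy.
items = [1, 2, 3]
copy = items[:]
copy.append(4)
print(items, copy)    # [1, 2, 3] [1, 2, 3, 4]
print(copy is items)  # False
```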
The Third Limit and Slice Objects
- Slice expressions have an optional third index, used as a step or stride, written X[i:j:k].
- That means "extract all the items in X, from offset i through j-1 by k."
- A stride of -1 indicates that the slice should go from right to left. The net effect is to reverse the sequence:
- With a negative stride, the meanings of the first two bounds are reversed.
- In other words, the slice S[5:1:-1] fetches the items from 2 to 5, in reverse order:
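A brief sketch of strides and of the equivalent slice objects. The sample string is arbitrary.

```python
S = "abcdefghij"

print(S[1:8:2])    # bdfh  (offsets 1, 3, 5, 7)
print(S[::2])      # acegi
print(S[::-1])     # jihgfedcba
print(S[5:1:-1])   # fedc  (offsets 5, 4, 3, 2)

# A slice object is the same operation written out explicitly.
print(S[slice(1, 8, 2)] == S[1:8:2])        # True
print(S[slice(None, None, -1)] == S[::-1])  # True
```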
1. Strings are Immutable
Once a string is defined, it cannot be changed.
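A minimal sketch of what immutability means in practice. The sample value is an assumption, not necessarily the one used in the original listing.

```python
a = "geeks"
try:
    a[0] = "G"                 # item assignment is not supported for str
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error) # TypeError: 'str' object does not support item assignment

print(a)                       # geeks (unchanged)
```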
But below code works fine.
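A sketch of code that works because it builds a new string instead of modifying one in place. The name `original` is added here only to show that the first string survives.

```python
a = "geeks"
original = a              # keep a second reference to the first string
a = a + "for"             # builds a new string and rebinds the name a

print(a)                  # geeksfor
print(original)           # geeks (the first string is untouched)
print(a is original)      # False
```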
In the second program, the interpreter builds a new string from the original rather than modifying it in place. The expression a = a + 'for' does not change the existing string; it rebinds the variable a to the newly created string, and the previous string is discarded.
2. Three ways to create strings:
Single quotes and double quotes work the same way for creating strings. Triple quotes are used when we need to write a string across multiple lines and print it as is, without using any escape sequences.
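The three forms can be compared with a short sketch. The sample text is arbitrary.

```python
single = 'hello'
double = "hello"
triple = '''hello
world'''

print(single == double)   # True
print(triple)             # prints hello and world on separate lines
print(repr(triple))       # 'hello\nworld'
```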
How to print single quote or double quote on screen?
We can do that in the following two ways:
- The first is to use the escape character (a backslash) before the quote we want to display.
- The second is to mix quotes: use double quotes as delimiters when we want to print a single quote, and vice versa.
Example-
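A minimal sketch of both approaches. The sentences are illustrative.

```python
# Way 1: escape the quote with a backslash.
escaped_single = 'It\'s here'
escaped_double = "She said \"hi\""

# Way 2: delimit with the other kind of quote.
mixed_single = "It's here"
mixed_double = 'She said "hi"'

print(escaped_single)                  # It's here
print(escaped_double)                  # She said "hi"
print(escaped_single == mixed_single)  # True
print(escaped_double == mixed_double)  # True
```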