That Blue Square Thing

AQA Computer Science GCSE

This page is up to date for the AQA 8525 syllabus for exams from 2022.

Programming Concepts - String handling

String Handling

Strings are one of the main data types you need to know about. They represent words - sets of characters. We show that data is a string by putting it inside quote marks, like: "Boris Budge".

A character is a specific data type. It's just a single keyboard character. We put these inside quotes as well, although sometimes just single quotes are used: 'B'.

You can read more about data types on the Variables & Data Types page.

Some of this is linked to character encoding, which is part of Unit 3 and deals with ASCII code and Unicode and the idea of character sets.

Dealing with Strings

Strings should be thought of as sequences of characters.

This means that the string "Asparagus" is made up of a sequence of 9 characters in the right order. We can write the sequence as ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's'].

Note that this is exactly the same way of writing that we use when we're dealing with arrays. A string is really just an array of characters - but they are such important data types that we simplify the way we deal with them by writing them in a simpler way (like, "Asparagus").
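As a small illustration of the idea, Python can turn a string into a list of its characters using list(). This is only a sketch to show the link between the two ways of writing it, and the variable names are made up for the example:

```python
theString = "Asparagus"

# break the string up into a list of single characters
theCharacters = list(theString)

print(theCharacters)       # ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's']
print(len(theCharacters))  # 9
```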

There are six main sets of operations you need to be able to do on strings:

  1. Find the length
  2. Find the position of a character
  3. Concatenate strings (join them together)
  4. Create and use substrings
  5. Convert strings to numbers (and back)
  6. Convert to and from character codes (ASCII code)

Length

Strings have a length. This is just the number of characters in the string - including any spaces.

In Pseudocode you'd see:

theLength <- LEN(theString)

In Python this becomes:

theLength = len(theString)

The length of a string is helpful to know when you want to iterate over it - to use a loop to work through character by character.
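A minimal sketch of that idea in Python, assuming a simple for loop that uses the length to visit each index in turn (the list called characters is just there to collect the results):

```python
theString = "Asparagus"
theLength = len(theString)

# visit each index from 0 up to theLength - 1
characters = []
for i in range(theLength):
    characters.append(theString[i])

print(theLength)   # 9
print(characters)  # ['A', 's', 'p', 'a', 'r', 'a', 'g', 'u', 's']
```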

Position

Just like arrays, each character in a string can be identified using the index of the element. This uses a number to identify each character in the string.

Just like with arrays, the index of a string usually starts from 0. So, in the code below, the first character of the string (the 'A') is index 0, the second (the first 's') is index 1 and so on. The string has 9 characters and so a length of 9, but the last letter in the string (the second 's') is index 8.

As with arrays, square brackets are used to access individual characters.

theString <- "Asparagus"

OUTPUT theString[0] # outputs 'A'
OUTPUT theString[1] # outputs 's'
OUTPUT theString[7] # outputs 'u'
OUTPUT theString[9] # index out of range error
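Python uses the same square brackets. As a rough sketch (the try/except is only there so the out of range case can be shown without stopping the program, and the variable names are made up for the example):

```python
theString = "Asparagus"

firstChar = theString[0]                   # 'A'
secondChar = theString[1]                  # 's'
lastChar = theString[len(theString) - 1]   # 's' at index 8

# index 9 is one past the end of the string, so Python raises an IndexError
try:
    theString[9]
    outOfRange = False
except IndexError:
    outOfRange = True

print(firstChar, secondChar, lastChar, outOfRange)
```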

It is also possible to find the position in the string of a particular character. This uses the command POSITION and will find the first time that the character appears in the string. If the character doesn't appear at all in the string then -1 is returned.

theString <- "Asparagus"

posg <- POSITION(theString, "g") # find the position of g
posa <- POSITION(theString, "a") # find the position of a
posA <- POSITION(theString, "A") # find the position of A
posz <- POSITION(theString, "z") # find the position of z

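The values returned by POSITION in each case would be:

posg is 6
posa is 3 (the first lower case 'a' - the capital 'A' at index 0 doesn't match)
posA is 0
posz is -1 (there is no 'z' in the string)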

The fact that -1 is returned if the value isn't in the string can be very helpful. Say, for example, you wanted to check a user had entered a valid e-mail address. One of the things you could check would be if the character '@' is included. You can then use the logic:

posAtSymbol <- POSITION(theEMail, "@")
IF posAtSymbol = -1 THEN
    OUTPUT "That is not a valid e-mail address"
ENDIF

The Python equivalent of POSITION is the built in string method find(). You put it after the name of the string, with a dot in between. It also returns -1 if the character isn't in the string.

theString = "Boris Budge"

posd = theString.find("d")
print(posd)

Remember that you can use "d" or 'd' to represent a character. I prefer to use "d" in Python as it causes fewer problems when I want to use an apostrophe in a word like "can't".
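As a sketch, the e-mail check described earlier might look like this in Python. The function name hasAtSymbol and the sample addresses are made up for illustration, and a real e-mail check would need to test more than this:

```python
def hasAtSymbol(theEMail):
    # find() returns -1 when "@" is not in the string
    posAtSymbol = theEMail.find("@")
    return posAtSymbol != -1

for address in ["boris@example.com", "boris.example.com"]:
    if hasAtSymbol(address):
        print(address + " contains an @ symbol")
    else:
        print("That is not a valid e-mail address")

# the "Asparagus" examples work the same way
theString = "Asparagus"
posg = theString.find("g")  # 6
posz = theString.find("z")  # -1
```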

Concatenation

Concatenation involves joining two or more strings together. This uses the operator +.

stringOne <- "Asparagus"
stringTwo <- "Butter"

stringThree <- stringOne + stringTwo

This produces the string "AsparagusButter". If you want to add a space you need to say so!

stringOne <- "Asparagus"
stringTwo <- "Butter"

stringThree <- stringOne + " " + stringTwo

The major problem with concatenation comes when you try and concatenate a string variable with a number variable of some kind. This won't work - you need to convert the number variable to a string first:

aString <- "Exam mark"
score <- 42

stringOne <- aString + score # will not work

stringOne <- aString + INT_TO_STRING(score) # convert the integer first

You can also use REAL_TO_STRING if necessary to convert a decimal number to a string.

The Python code to do this conversion to a string is simpler:

stringOne = aString + str(score)

This works whether the variable score is an integer or a real number.
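Here's a short sketch showing both cases, plus what happens if you forget to convert. The variable average and the message in the except block are made up for the example:

```python
aString = "Exam mark "
score = 42        # an integer
average = 37.5    # a real number (float)

stringOne = aString + str(score)
stringTwo = aString + str(average)
print(stringOne)  # Exam mark 42
print(stringTwo)  # Exam mark 37.5

# without str() Python refuses to join a string and a number
try:
    stringThree = aString + score
except TypeError:
    print("Can't join a string and a number directly")
```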

Substrings

Substrings are strings created from part of a longer string. There are built in commands which make this easy to do. They can be useful for all sorts of things.

aString <- "Amazing shoes"

stringOne <- SUBSTRING(0, 3, aString)
OUTPUT stringOne

This will output the string "Amaz" - the characters from index 0 to index 3, including both of those positions.

If you wanted the substring of just the word "shoes" you'd use:

stringOne <- SUBSTRING(8, 12, aString)

Take a look at each of these and work them out:

aString <- "Amazing shoes"

stringOne <- SUBSTRING(1, 3, aString)
stringTwo <- SUBSTRING(5, 8, aString)
stringThree <- SUBSTRING(3, 3, aString)

These evaluate to:

stringOne is "maz"
stringTwo is "ng s"
stringThree is "z"

In Python you do this slightly differently using a technique called slicing.

myString = "banana republic"

stringOne = myString[0:6] # returns "banana"
stringTwo = myString[7:15] # returns "republic"
stringThree = myString[3:8] # returns "ana r"

Just like range() in for loops, in Python the last number in the square brackets is the one after the last character you want. I know this is annoying, but it's the way substrings (and for loops) work in Python.
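To see the difference between the two approaches, here's a sketch of a Python function that behaves like the pseudocode SUBSTRING. The function name substring is my own - it isn't built in to Python - and it simply adds 1 to the end position before slicing:

```python
def substring(start, end, theString):
    # the pseudocode end position is included, so add 1 for the slice
    return theString[start:end + 1]

aString = "Amazing shoes"

stringOne = substring(0, 3, aString)
stringTwo = substring(8, 12, aString)

print(stringOne)  # Amaz
print(stringTwo)  # shoes
```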

Converting Strings to/from Numbers

Sometimes you need to be able to convert a number into a string and vice-versa.

For example, when using input() data is always stored as a string - even if you enter "42".

I mentioned this on the page dealing with using input and it's covered in the concatenation section above as well, but here's a summary of the methods to use:

theInteger = int(input("Enter a number: ")) # to integer
theFloat = float(input("Enter a number: ")) # to real (float)

# convert to a string to concatenate
print("The number is " + str(theInteger))
print("The number is " + str(theFloat))

Note that both Integers and Floats are converted to a String using the same method.
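A quick sketch of why the conversion matters. It uses a fixed string in place of input() so that it runs without anyone typing anything, and the variable names are made up for the example:

```python
entered = "42"  # what input() would give back if the user typed 42

joined = entered + "1"    # still a string, so + joins the characters
added = int(entered) + 1  # converted to an integer, so + does arithmetic

print(joined)  # 421
print(added)   # 43
```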

The Pseudocode to do the same thing is here. You might see this in an exam:

STRING_TO_INT(aString) - converts from a string to an integer
INT_TO_STRING(anInteger) - the opposite
STRING_TO_REAL(aString) - converts string to real number
REAL_TO_STRING(aRealNumber) - the opposite

You're more likely to see these written in pseudocode in an exam question and have to know what they mean than to have to write them yourself - which isn't too difficult.

Converting Characters to ASCII Codes

You can convert a character to its ASCII code representation in most programming languages. This can be helpful sometimes and is something you need to know for exams.

Here's the pseudocode to do this:

theName <- "Doris Budge"
theChar <- theName[3]

theCode <- CHAR_TO_CODE(theChar)

This stores the character at index 3 in theName (the letter i) in theChar and then converts it to its ASCII code number, which is stored in theCode (i is ASCII code 105 - which is different to I, which is ASCII code 73). A space would be converted to 32 - the ASCII code value for a space character.

The Python to do this:

theName = "Doris Budge"
theChar = theName[3]

theCode = ord(theChar)

ord() is the Python equivalent of CHAR_TO_CODE. Perhaps the only time that Pseudocode is easier to use than Python!
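As a sketch, you could use ord() inside a loop to get the code for every character in a string. The list called codes is just made up to hold the results:

```python
theName = "Doris Budge"

# work through the string one character at a time
codes = []
for theChar in theName:
    codes.append(ord(theChar))

print(codes[3])  # 105 - the code for 'i'
print(codes[5])  # 32 - the code for the space
```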

Converting ASCII Codes to Characters

You can also convert from a character code to a character using CODE_TO_CHAR:

theCode <- USERINPUT

theChar <- CODE_TO_CHAR(theCode)
OUTPUT theChar

This code uses USERINPUT to allow you to enter a character code. When you enter a number in Python it gets stored as a string, so when using Python we need to make sure we convert to an integer first.

theCode = int(input("Enter a character code: "))

theChar = chr(theCode)
print(theChar)

This time Python uses the built in function chr() to convert from an integer to a character.
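Here's a short sketch showing that chr() and ord() undo each other. It uses fixed values rather than input() so it can be run as it is, and the variable names are made up for the example:

```python
theCode = 65
theChar = chr(theCode)   # 'A'

backAgain = ord(theChar) # 65 again

# adding 1 to a character code gives the next character
nextChar = chr(ord("a") + 1)

print(theChar)    # A
print(backAgain)  # 65
print(nextChar)   # b
```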