PHP: Count the Number of Characters in a String

How to count the number of characters in a multi-byte string using PHP.

Edited: 2020-07-27 17:46

To count the number of characters in a string in PHP, we can use a regular expression. The strlen function might look like the obvious choice, but it counts bytes rather than characters, so it will not accurately count multi-byte characters.

Multi-byte characters can appear in the strings of any application that supports UTF-8.

For example, the alphabetic characters [a-z] each take up only one byte, so strlen counts them correctly; but if we used it on a string that contained multi-byte characters, we would end up with an inaccurate result.

This happens because some characters take up more than one byte. For example, the shitty emoji character "💩" takes up 4 bytes rather than one; that is no fun at all. In fact, you might call it extremely crappy.

To also count multi-byte characters, we can use the preg_split function with the u modifier, and then count the resulting array:

function count_characters(string $string): int {
  // An empty pattern with the u modifier splits the string between UTF-8 characters
  return count(preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY));
}

Then, to count characters, we would simply call this function:

echo count_characters('abcd'); // Should result in "4"
echo count_characters('abcd💩'); // Should result in "5"
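
One case the function above does not cover is input that is not valid UTF-8: with the u modifier, preg_split returns false for such a string instead of an array. The following is only a minimal sketch of how that could be handled; the name count_characters_safe and the choice to return null are assumptions made for illustration, not part of the original function.

```php
<?php
// Hypothetical helper (not from the article): returns null instead of
// failing when the input is not valid UTF-8.
function count_characters_safe(string $string): ?int {
  // With the u modifier, preg_split returns false if the subject is not valid UTF-8.
  $parts = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
  if ($parts === false) {
    return null;
  }
  return count($parts);
}

var_dump(count_characters_safe('abcd💩'));  // int(5)
var_dump(count_characters_safe("abc\xFF")); // NULL, because the byte 0xFF never occurs in valid UTF-8
```

Whether to return null, throw an exception, or fall back to a byte count depends on the application; null is used here only to keep the sketch short.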

Counting multi-byte characters in PHP

When working with UTF-8, counting the number of characters in a string is not as simple as calling strlen, because strlen only counts the bytes in a string, not the characters themselves. It still works for single-byte encodings, such as ISO-8859-1, but not for UTF-8-aware applications.

Using strlen on a string that contains a single 4-byte character will result in a highly inaccurate character count:

echo strlen('abcd💩'); // Should result in "8"

Here, the first four characters are 1-byte characters, making up a total of 4 bytes; the "crap" emoji at the end takes up another 4 bytes on its own. The result is eight, which is an inaccurate count; the correct count would be five.
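
To make the difference concrete, the byte count and the character count can be printed side by side. This is a small sketch rather than a definitive comparison; the mb_strlen call is an alternative that the article does not use, and it is only available when the optional mbstring extension is loaded, hence the function_exists check.

```php
<?php
$string = 'abcd💩';

// Bytes: four 1-byte letters plus one 4-byte emoji.
$bytes = strlen($string);

// Characters, using the same preg_split approach as count_characters.
$characters = count(preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY));

echo "bytes: $bytes, characters: $characters\n"; // bytes: 8, characters: 5

// Assumption: mbstring may not be installed, so check before calling mb_strlen.
if (function_exists('mb_strlen')) {
  echo 'mb_strlen: ' . mb_strlen($string, 'UTF-8') . "\n"; // mb_strlen: 5
}
```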

UTF-8 characters take up between 1 and 4 bytes each; the alphabetic characters [a-z] only take up one byte, so the problem only applies to strings that actually contain multi-byte characters.
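
The range of 1 to 4 bytes can be checked by calling strlen on one character of each width. The sample characters below are illustrative choices, written as Unicode escapes (PHP 7 or later) so the result does not depend on how the source file is saved.

```php
<?php
// One sample character per UTF-8 width: a, é, € and 💩.
$samples = ['a', "\u{E9}", "\u{20AC}", "\u{1F4A9}"];

$widths = [];
foreach ($samples as $character) {
  // strlen reports how many bytes the single character occupies.
  $widths[$character] = strlen($character);
  echo $character . ': ' . $widths[$character] . " byte(s)\n";
}
// Prints 1, 2, 3 and 4 byte(s) for a, é, € and 💩 respectively.
```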
