runes concept: single sentence per line (#2825)
kotp authored Jul 29, 2024
1 parent a6a216b commit 9a86e56
Showing 2 changed files with 38 additions and 13 deletions.
27 changes: 20 additions & 7 deletions concepts/runes/about.md
@@ -1,10 +1,14 @@
# About

The `rune` type in Go is an alias for `int32`. Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value. However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
The `rune` type in Go is an alias for `int32`.
Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value.
However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
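
The alias relationship described above can be checked directly. This is a minimal sketch, not part of the commit; the helper name `typeNames` is made up for illustration.

```go
package main

import "fmt"

// typeNames (hypothetical helper) returns the type names that fmt
// reports for a rune value and for an int32 value.
func typeNames() (string, string) {
	var r rune = 'A'
	var i int32 = 65
	return fmt.Sprintf("%T", r), fmt.Sprintf("%T", i)
}

func main() {
	var r rune = 'A'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	fmt.Println(r == i)         // true
	fmt.Println(typeNames())    // int32 int32
	fmt.Printf("%d %c\n", r, r) // 65 A
}
```

Because `rune` is an alias rather than a distinct type, the assignment compiles without a conversion and `%T` prints `int32` for both values.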

## Unicode and Unicode Code Points

Unicode is a superset of ASCII that represents characters by assigning a unique number to every character. This unique number is called a Unicode code point. Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
Unicode is a superset of ASCII that represents characters by assigning a unique number to every character.
This unique number is called a Unicode code point.
Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.

In Go, the `rune` type represents a single Unicode code point.
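
As a small illustration of the character/code point relationship, the sketch below prints a few runes as a character, a decimal number, and a code point. The helper `describe` is a hypothetical name chosen for this example.

```go
package main

import "fmt"

// describe (hypothetical helper) formats a rune as
// "<character> <decimal value> <Unicode code point>".
func describe(r rune) string {
	return fmt.Sprintf("%c %d %U", r, r, r)
}

func main() {
	for _, r := range []rune{'a', 'ä', '❗'} {
		fmt.Println(describe(r))
	}
	// a 97 U+0061
	// ä 228 U+00E4
	// ❗ 10071 U+2757
}
```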

@@ -21,7 +25,9 @@ The following table contains example Unicode characters along with their Unicode

## UTF-8

UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes. Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data. That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes.
Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data.
That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.

Go source code files are encoded using UTF-8.
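
The variable width of UTF-8 can be observed with the standard `unicode/utf8` package. The sketch below is illustrative only, and `encodedLen` is a hypothetical wrapper name. Note that a `rune` value holds the code point itself rather than its encoded bytes; the largest valid code point, `utf8.MaxRune` (U+10FFFF), fits within 32 bits.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen (hypothetical helper) returns the number of bytes UTF-8
// needs for r, or -1 if r is not a valid code point.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	for _, r := range []rune{'a', 'ä', '❗', '😀'} {
		// "% x" prints the encoded bytes in hex, separated by spaces.
		fmt.Printf("%U -> %d byte(s): % x\n", r, encodedLen(r), []byte(string(r)))
	}
	// U+0061 -> 1 byte(s): 61
	// U+00E4 -> 2 byte(s): c3 a4
	// U+2757 -> 3 byte(s): e2 9d 97
	// U+1F600 -> 4 byte(s): f0 9f 98 80

	fmt.Printf("%U %d\n", utf8.MaxRune, utf8.UTFMax) // U+10FFFF 4
}
```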

@@ -76,9 +82,14 @@ fmt.Printf("myRune Unicode character: %c\n", myRune)

## Runes and Strings

Strings in Go are encoded using UTF-8 which means they contain Unicode characters. Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes. However, runes are stored as 1, 2, 3, or 4 bytes depending on the character. Due to this, strings are really just a sequence of bytes. In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
Strings in Go are encoded using UTF-8 which means they contain Unicode characters.
Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes.
However, runes are stored as 1, 2, 3, or 4 bytes depending on the character.
Due to this, strings are really just a sequence of bytes.
In Go, slices are used to represent sequences and these slices can be iterated over using `range`.

Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes. In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes.
In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:

```go
myString := "❗hello"
@@ -94,7 +105,8 @@ for index, char := range myString {
// Index: 7 Character: o Code Point: U+006F
```

Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string. Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string.
Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:

```go
import "unicode/utf8"
@@ -118,7 +130,8 @@ fmt.Println(myString)
// Output: exercism
```
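
The byte-length versus rune-count distinction, together with the conversion between strings and rune slices, can be tried out in one place. This is a rough sketch rather than the code elided from the diff, and `lengths` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths (hypothetical helper) returns the length of s in bytes
// and the number of runes it contains.
func lengths(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := lengths("❗hello")
	fmt.Println(b, r) // 8 6

	// Converting to []rune decodes the string into code points.
	rs := []rune("❗hello")
	fmt.Println(len(rs))        // 6
	fmt.Println(string(rs[1:])) // hello
}
```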

Similarly, a string can be type converted to a slice of runes. Remember, without formatting verbs, printing a rune yields its integer (decimal) value:
Similarly, a string can be type converted to a slice of runes.
Remember, without formatting verbs, printing a rune yields its integer (decimal) value:

```go
myString := "exercism"
24 changes: 18 additions & 6 deletions concepts/runes/introduction.md
@@ -1,10 +1,14 @@
# Introduction

The `rune` type in Go is an alias for `int32`. Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value. However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
The `rune` type in Go is an alias for `int32`.
Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value.
However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.

## Unicode and Unicode Code Points

Unicode is a superset of ASCII that represents characters by assigning a unique number to every character. This unique number is called a Unicode code point. Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
Unicode is a superset of ASCII that represents characters by assigning a unique number to every character.
This unique number is called a Unicode code point.
Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.

In Go, the `rune` type represents a single Unicode code point.

@@ -21,7 +25,9 @@ The following table contains example Unicode characters along with their Unicode

## UTF-8

UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes. Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data. That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes.
Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data.
That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.

Go source code files are encoded using UTF-8.

@@ -67,9 +73,14 @@ fmt.Printf("myRune Unicode code point: %U\n", myRune)

## Runes and Strings

Strings in Go are encoded using UTF-8 which means they contain Unicode characters. Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes. However, runes are stored as 1, 2, 3, or 4 bytes depending on the character. Due to this, strings are really just a sequence of bytes. In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
Strings in Go are encoded using UTF-8 which means they contain Unicode characters.
Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes.
However, runes are stored as 1, 2, 3, or 4 bytes depending on the character.
Due to this, strings are really just a sequence of bytes.
In Go, slices are used to represent sequences and these slices can be iterated over using `range`.

Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes. In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes.
In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:

```go
myString := "❗hello"
@@ -85,7 +96,8 @@ for index, char := range myString {
// Index: 7 Character: o Code Point: U+006F
```

Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string. Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string.
Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:

```go
import "unicode/utf8"
