13. Strings

Lab materials

Lab tasks

This lab consists of 2 tasks. The first focuses on the string.h library, and in the second we’ll look at manipulating strings manually.

Task 1: introducing the string.h library

In this task we’ll be focusing on the functions in the string.h library.

Requirements
  • Follow the step-by-step guide when solving this task!
  • The program must ask for a predefined password when starting. You must not allow the user past it until a correct password is entered.
  • The user is asked for a sentence. The program shows how many characters the sentence contains (including spaces and punctuation).
  • The user is asked for a search phrase, after which the program outputs whether the phrase was present in the originally entered sentence (yes/no answer).
  • The user is asked for 2 words, which are used to compose a sentence.
    • You can choose the types of words yourself (e.g. verbs, nouns, names of items, names of people, etc.)
    • You can also choose the sentence you wish to compose
    • Combine the given words with others to compose a simple sentence consisting of at least 4 words.
    • One of the words given by the user must be the first word in this sentence.
    • The composed sentence must be written in a completely new (empty) variable, which must fit the composed sentence even in the worst case scenario (e.g. maximum possible length for the user-entered words).
Helper function for debugging

If you have any trouble figuring out what is actually in the string, use this function. It prints the string, followed by each individual character (one per line), showing the index, the ASCII code and the character it represents. This makes it easy to spot a stray newline, for example.
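
The original helper is not reproduced in this text, so here is a minimal sketch of what such a function could look like. The name DebugString() and the exact output layout are assumptions. Characters that cannot be printed, such as a newline, are shown as '?' so that every entry stays on one line.

```c
#include <stdio.h>

/* Hypothetical debugging helper: prints the string, then one line per
 * character with its index, ASCII code and the character itself.
 * Returns the number of characters found before the terminator. */
int DebugString(const char str[])
{
    int i;

    printf("String: '%s'\n", str);
    for (i = 0; str[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char)str[i];

        /* Non-printable characters (e.g. '\n', code 10) are shown as '?' */
        printf("[%2d] %3d '%c'\n", i, c, (c >= 32 && c < 127) ? c : '?');
    }
    return i;
}
```

If the last line of the output shows the code 10, the string still contains the newline from the enter key.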

Step-by-step guide
1. step: getting user input

We will go through this step in the class together! In this step we will create two functions necessary for our program.

For starters, let’s create a function to read a string. In the solution, it is important that we can read strings consisting of multiple words, i.e. strings containing spaces. There are multiple ways to do this. In this sample, we will approach it using the function fgets(). If you wish to take a different approach, you are welcome to do so.

The function fgets() is meant to be used for reading files. However, everything is a file, including the data stream coming from the keyboard. To use it, we will read the data from a stream called stdin. Secondly, the function requires us to specify the size of the character array. This is for safety: fgets() reads at most one character fewer than the given size and adds the string terminator after what it read, so a long input cannot overflow the buffer. Thirdly, and most problematic for us, when we press the enter key while writing text, the newline \n created by the enter key is also stored in the array.

We will approach this with the idea of creating a wrapper function. Wrappers surround an existing function or functions while providing extra features, convenience or safety.

Our wrapper needs two inputs – the character array (string) where we will store the input and the maximum length of that string.

In the solution for this function, I’ve left question marks in 3 places where you will have to fill the gaps! Hint: if the read string is 10 characters long, then index 8 holds the last character that the user entered and that matters to us. It is followed by the unwanted newline character at index 9, which we must get rid of. To do so, we will replace it with the string terminator symbol.

If you need, you can use the helper function provided earlier to see the contents of the read string.

In order for it to compile, the two lines handling the newline correction are commented out. Once you have replaced the question marks, uncomment them for it to work correctly.

Once the reading is done, let’s write another wrapper, this time for our GetString() function. This way it will be a lot cleaner to ask for input.

Now let’s try to get some input. In the example I have a character array sentence[], which has a length of MAX_STR, defined as a macro. The function call will look like this:
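
The listings for this step are not reproduced in this text, so what follows is only a sketch of how the two wrappers and the call could fit together. It removes the newline with strcspn() instead of the index calculation from the hint, so the question-mark exercise is still yours to solve. The names StripNewline() and PromptString(), the prompt text and the value of MAX_STR are assumptions.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STR 128 /* assumed value; use the one from your own program */

/* Hypothetical helper: cuts the string at the first newline, if there is one */
void StripNewline(char str[])
{
    /* strcspn() gives the index of the first '\n',
     * or the length of the string if there is none */
    str[strcspn(str, "\n")] = '\0';
}

/* Wrapper around fgets(): reads one line from the keyboard into str,
 * which can hold max bytes */
void GetString(char str[], int max)
{
    if (fgets(str, max, stdin) == NULL)
    {
        str[0] = '\0'; /* nothing could be read: leave an empty string */
        return;
    }
    StripNewline(str);
}

/* Hypothetical second wrapper: shows a prompt, then reads the answer */
void PromptString(const char prompt[], char str[], int max)
{
    printf("%s: ", prompt);
    GetString(str, max);
}

/*
 * Example call from main() or another function:
 *     char sentence[MAX_STR];
 *     PromptString("Enter a sentence", sentence, MAX_STR);
 */
```

Passing the prompt text as a parameter means the same wrapper can be reused for the sentence, the search phrase and the password.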

2. step: read a sentence and print its length

Read a sentence from the user. Find and print the length of the sentence the user entered.
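
As a small sketch of this step, assuming the newline has already been removed from the sentence, strlen() from string.h gives the count directly. The function name PrintLength() and the output text are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical function: prints and returns the length of the sentence */
size_t PrintLength(const char sentence[])
{
    /* strlen() counts every character before the terminator,
     * including spaces and punctuation */
    size_t len = strlen(sentence);

    printf("The sentence is %zu characters long\n", len);
    return len;
}
```

Note that strlen() returns a size_t, which is printed with %zu rather than %d.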

3. step: search phrase

Add a new function to the program, where you will ask the user to input a search phrase. The function should then print whether the phrase existed in the previously entered sentence or not.

A yes/no answer is enough. To achieve this, you can just check the return value as such:
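
The original snippet is not reproduced in this text. One way to do the check, sketched here with the hypothetical function name ContainsPhrase(), is to compare the return value of strstr() against NULL.

```c
#include <string.h>

/* Hypothetical function: returns 1 if phrase occurs in sentence, 0 otherwise */
int ContainsPhrase(const char sentence[], const char phrase[])
{
    /* strstr() returns a pointer to the first match,
     * or NULL if the phrase does not occur */
    return strstr(sentence, phrase) != NULL;
}
```

The search is case sensitive, and an empty search phrase is always reported as found, so you may want to handle that input separately.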

4. step: password prompt

Add a function to the program that will ask the user for their password. It could look something like this:

The prompt must be inside a loop so that the user cannot proceed to use your program before they have entered the correct password. The password check must be case sensitive. If you wish, you can add hints on incorrect input or limit the number of tries the user has.
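
A minimal sketch of such a prompt is shown below. The password, the buffer size and the function names are assumptions, and the input is read with fgets() directly so that the block does not depend on your own wrappers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PASSWORD "Secret123" /* hypothetical predefined password */
#define MAX_PASS 64          /* assumed size of the input buffer */

/* Returns 1 if the input matches the password exactly, 0 otherwise */
int IsPasswordCorrect(const char input[])
{
    /* strcmp() returns 0 only for identical strings, so it is case sensitive */
    return strcmp(input, PASSWORD) == 0;
}

/* Keeps asking until the correct password has been entered */
void AskPassword(void)
{
    char input[MAX_PASS];

    do
    {
        printf("Enter password: ");
        if (fgets(input, MAX_PASS, stdin) == NULL)
        {
            exit(EXIT_FAILURE); /* input ended: stop instead of looping forever */
        }
        input[strcspn(input, "\n")] = '\0'; /* drop the newline before comparing */
    } while (!IsPasswordCorrect(input));
}
```

Without removing the newline first, the comparison would fail even for a correctly typed password.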

5. step: composing a sentence

Add a function to your program that will compose a simple sentence of at least 4 words. Since we don’t have any good inputs or outputs for the function, we can create it as follows (typically avoid void-void functions!):

Think! How long should the variable sentence be to hold both user-entered words in the worst case scenario, as well as all the characters you are using to formulate the sentence? The size can be approximated, but must be sufficient!

Now think of a sentence you wish to create. The sentence will have two gaps that the user will have to fill. You can choose which type of words go in there (names of objects, people, verbs, adjectives, …). One of the words that the user enters must be at the start of the sentence. The location of the second word is up to you. E.g. <word1> is a <word2> name! .

Once you have asked the user for input and read the words, you must compose the sentence. The sentence must be written into a new, unused (empty) character array. You must account for the worst case scenario for fitting the sentence (when the user decides to enter words of the maximum possible length for both inputs).
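
The size calculation and the composition could be sketched as follows, using the example sentence from above. MAX_WORD, MAX_SENTENCE and ComposeSentence() are assumed names and values. Unlike the void-void function suggested earlier, this sketch takes the words as parameters so that it can be tried out without user input.

```c
#include <string.h>

#define MAX_WORD 32 /* assumed size of one word buffer, terminator included */

/* Worst case: two words of MAX_WORD - 1 characters each, plus the fixed
 * text " is a " and " name!" (12 characters) and one terminator.
 * sizeof() of the literal counts those 12 characters plus its '\0'. */
#define MAX_SENTENCE (2 * (MAX_WORD - 1) + sizeof(" is a  name!"))

/* Writes "<word1> is a <word2> name!" into sentence.
 * Returns 1 on success, 0 if the result would not fit into size bytes. */
int ComposeSentence(char sentence[], size_t size,
                    const char word1[], const char word2[])
{
    size_t needed = strlen(word1) + strlen(" is a ")
                  + strlen(word2) + strlen(" name!") + 1;

    if (needed > size)
    {
        return 0; /* refuse to write past the end of the array */
    }
    strcpy(sentence, word1);    /* strcpy() starts the new string */
    strcat(sentence, " is a "); /* strcat() appends to what is already there */
    strcat(sentence, word2);
    strcat(sentence, " name!");
    return 1;
}
```

With MAX_WORD set to 32, MAX_SENTENCE works out to 75 bytes: 62 for the two words, 12 for the fixed text and 1 for the terminator.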

Example
Advanced task: Alternative count

Create a new function to do a manual count and add statistics

  • Count and show how many alphabetical characters [a-zA-Z] there were. Do not count punctuation, spaces etc.
  • Calculate and print the percentage of non-alphabetical characters in the sentence.
  • Show the percentage with one decimal place.
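
A manual count along these lines could be sketched as below. The function names are assumptions, and the letter check relies on the fact that the letters a-z and A-Z are stored as consecutive codes in ASCII.

```c
#include <string.h>

/* Hypothetical function: counts the characters in the range [a-zA-Z] */
int CountAlpha(const char str[])
{
    int count = 0;

    for (int i = 0; str[i] != '\0'; i++)
    {
        char c = str[i];

        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        {
            count++;
        }
    }
    return count;
}

/* Hypothetical function: percentage of characters that are not letters */
double NonAlphaPercent(const char str[])
{
    size_t len = strlen(str);

    if (len == 0)
    {
        return 0.0; /* avoid dividing by zero for an empty sentence */
    }
    return 100.0 * (double)(len - (size_t)CountAlpha(str)) / (double)len;
}
```

To show the result with one decimal place, print it with the format "%.1f%%", where the doubled percent sign prints a single % character.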

Example

Task 2: Generating e-mail addresses from CSV

In this task we’re introducing you to a widely used file format for keeping data. The purpose of this task is to practice working with characters inside of a string manually.

Download the starter code: 12_2_csv_starter.c

CSV format

CSV stands for comma-separated values. CSV files are used for storing structured data. Every data field in a CSV file, as the name suggests, is separated by a comma. It’s one of the most widely used formats for storing and backing up data next to database systems themselves. The primary benefit of CSV is its simplicity, making it supported by almost all applications that process any kind of data.

In the simplest case, all data fields are separated by commas:

We are using the same complexity level for this lab task. To see how more complicated data is stored when you also need to store commas and quotes within the data fields, as well as add column headers, check here: https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules

Requirements
  • The task must be built on the starter code.
  • The program generates an e-mail address for every person
    • The name part of the e-mail address must be composed of the first 3 characters of the first name, followed by the first 3 characters of the last name.
    • The name part is followed by a domain of your choosing
    • You can only have lower case characters in the e-mail address
    • The e-mail address must be stored in a new character array that you create. Print it to the screen from that array. You are not allowed to print the characters on the fly while processing the entry.
  • The program must print:
    • The full name of the person. First and last name must be separated by a space.
    • Generated e-mail address
  • You are not allowed to change the code present in the starter code without confirmation from us. The starting point of the code you write is in the function ProcessPerson(). You are welcome (and encouraged) to add more functions to the code.
Example
Hints
  • By knowing the location of the comma, you will also be able to calculate the position of the first character from the last name
  • Lower and upper case ASCII characters differ from each other by a single bit, which is valued at 32 (e.g. A is 65, a is 97)
  • All operations in this task, besides adding the domain at the end, are most easily done one character at a time. Library functions from string.h can be used, but they might be unnecessarily complex for now.
  • The most common mistake in this task is forgetting to add a string terminator to the end after copying over the name part of the address!
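
The hints above can be combined into a sketch like the following. It is not based on the starter code: the entry layout "First,Last", the sample name and the function names are assumptions, and it expects both names to have at least 3 characters (shorter names are left for the advanced task).

```c
#include <string.h>

#define NAME_PART 3 /* characters taken from each name */

/* Hypothetical helper: converts an upper case ASCII letter to lower case */
char ToLower(char c)
{
    if (c >= 'A' && c <= 'Z')
    {
        c = (char)(c | 32); /* set the bit valued at 32: 'A' (65) becomes 'a' (97) */
    }
    return c;
}

/* Builds the address for an entry of the assumed form "First,Last".
 * email must hold at least 2 * NAME_PART + strlen(domain) + 1 bytes. */
void MakeEmail(const char entry[], const char domain[], char email[])
{
    int comma = 0;
    int n = 0;

    /* Find the position of the comma one character at a time */
    while (entry[comma] != ',' && entry[comma] != '\0')
    {
        comma++;
    }
    if (entry[comma] == '\0')
    {
        email[0] = '\0'; /* no comma found: return an empty string */
        return;
    }
    for (int i = 0; i < NAME_PART; i++)
    {
        email[n++] = ToLower(entry[i]); /* start of the first name */
    }
    for (int i = 0; i < NAME_PART; i++)
    {
        email[n++] = ToLower(entry[comma + 1 + i]); /* last name starts after the comma */
    }
    email[n] = '\0';       /* terminate the name part before appending */
    strcat(email, domain); /* the domain is the one part added as a whole */
}
```

For the hypothetical entry "Mari,Maasikas" and the domain "@ttu.ee", this produces marmaa@ttu.ee. If the terminator were left out, strcat() would not know where the name part ends.
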
Advanced task 1: short names

Change the way you are generating the e-mail addresses to accommodate people with shorter first or last names.

Example: Ly Kask -> lykask@ttu.ee

Advanced task 2: unique e-mail addresses

Change the way you are generating the e-mail addresses in such a way that for similarly starting names the resulting e-mail address would be different.

Change your data array to the following:

Requirements

  • The e-mail addresses must be unique
  • The name part of the address must be 6 characters
  • The addresses must still refer to the owner’s name as much as possible
  • The exact algorithm and thus the final form of the address is up to you. You will defend your decision when presenting your solution.

After this lesson, you should

  • Know that there are various ways of encoding characters, including ASCII and Unicode
  • Know what an ASCII table is and how to use it.
  • Know how strings work in C.
  • Know how to terminate a string in C and why it’s necessary.
  • Know that strings in C are also related to byte-arrays.
  • Know what CSV is, where it’s used and why.
  • Know how to use the string.h library to manipulate strings.
  • Be able to write your own string manipulation functions (manipulating characters)
  • Know what a buffer overflow is and the attacks related to it.

Additional content