Lab materials
- Slides: Strings
- Sample: Some simpler character assignments
Lab tasks
In this lab you have two tasks. The first task focuses on using the string.h library and reading input from the user. The second task looks at manipulating strings manually and introduces you to the CSV (comma separated value) format.
Task 1 [W13-1]: Introducing the string.h library
In this task, there are two main goals. First, we’ll look at reading input from the user safely. The second goal is to get acquainted with some of the standard functions in the string.h library. This will be done in a step-by-step manner.
Requirements
- Follow the step-by-step guide to solve the task!
- The program must ask for a predefined password. The user must not be allowed to proceed before the correct password is entered.
- Ask the user for a sentence. The program must show how many characters there were in the given sentence (including spaces and punctuation).
- The user is asked for a search phrase. Check if the search phrase was present in the previously entered sentence (yes/no answer).
- The user is asked for two words, which are used to compose a sentence.
- You can choose the types of words yourself (e.g. verbs, nouns, names of items, names of people etc)
- You can also choose the sentence you wish to compose
- Combine the given words with others to compose a simple sentence consisting of at least 4 words.
- One of the words given by the user must be the first word in this sentence.
- The composed sentence must be written in a completely new (empty) variable, which must fit the composed sentence even in the worst case scenario (e.g. maximum possible length for both user-entered words).
Helper function for debugging
To better understand the contents of a string (i.e. the character array), the following helper function is provided for you. It prints the string out first, followed by printing each individual character (one per line). It will print the index, ASCII code and the character it represents. This should help you identify any stray or unwanted characters. Note that you can’t identify uninitialized array slots with this function!
```c
void DebugString(char str[])
{
    int i = 0;
    printf("String is: '%s'\n", str);
    while (str[i] != '\0')
    {
        printf("str[%d] = %3hhu %c\n", i, str[i], str[i]);
        i++;
    }
    printf("\n");
}
```
Step-by-step guide
1. step: getting user input
The first step will be done in class together, but you can also follow this guide instead. In this step we will create two functions necessary for our program.
In the first function, the focus is on reading input safely. The solution should also be able to read input that spans multiple words, i.e. includes spaces. There are multiple solutions to solve this, but we’ll use fgets() function within this guide.
The function fgets() is meant to be used for reading files. However, everything is a file, including the data stream coming from the keyboard. To take advantage of this, the data will be read from a file stream named stdin, which you are familiar with as the standard input stream. Secondly, the function requires specifying the maximum length for the input. This is for safety: fgets() never stores more than that many bytes (terminator included), so overly long input cannot overflow the buffer. The third gotcha is that pressing the Enter key produces a newline \n, which is also stored into the array. This needs to be cleaned up as well.
To make these functions more convenient to use, we’ll create wrappers around them. The goal of a wrapper is to provide extra features, convenience or safety not present in the language itself. The wrapper for reading a string needs two inputs: the character array (string) to store the input and the maximum length of that string. The downside is a small overhead from the extra function call, which is negligible for a program like this.
There are three blanks left in the provided solution for you to fill in. Hint: if the string that was read is 10 characters long, the last character the user actually typed is at index 8. It is followed at index 9 by the unwanted newline character, which needs to be removed by replacing it with the string terminator. You can use the helper function to better understand the indexes and the position where the correction is needed.
In order for the code to compile, the two lines handling the newline correction are commented out. Once you have replaced the question marks, uncomment these lines for the function to work correctly.
Note that it also contains a new type size_t . It’s a data type that is most commonly used for storing any kind of lengths, indexing arrays and counting. It’s an unsigned integer type.
```c
void GetString(char str[], int max)
{
    // Read the string from keyboard
    fgets(str, max, stdin);

    // TODO: Find the length of the actual string we just read
    // size_t len = ???;

    // TODO: Write the string terminator in place of the newline to fix the string
    // str[ ??? ] = ???;
}
```
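To see why the cleanup matters, it can help to check what a buffer looks like right after fgets() has filled it. The sketch below is not part of the lab code: EndsWithNewline() is a hypothetical helper name, and it only assumes that the buffer holds a properly terminated string.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper, not part of the lab code: reports whether the text
// stored by fgets() still ends with the newline from the Enter key.
// Returns 1 if the last character before the terminator is '\n', else 0.
int EndsWithNewline(const char str[])
{
    assert(str != NULL);

    size_t len = strlen(str);   // number of characters before '\0'
    return len > 0 && str[len - 1] == '\n';
}
```

If the user types more characters than fit into the array, fgets() stops early and no newline is stored. In that case the last character in the buffer is real input, so a check like this tells you whether there is a newline to remove at all.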
Once the reading is completed, let’s write another wrapper, this time for our GetString() function. This provides us with an option to read strings either with or without providing a prompt. There are other ways to do this without separate functions, but they are considered too complex for this course.
```c
void PromptString(char str[], int max, char prompt[])
{
    printf("%s: ", prompt);
    GetString(str, max);
}
```
Now let’s try to get some input. In the example I have a character array sentence[], which has a length of STR_MAX, defined as a macro. The function call will look like this:
```c
PromptString(sentence, STR_MAX, "Please enter a sentence");
```
2. step: read a sentence and print its length
Read a sentence from the user. Find and print the length of the sentence the user entered.
3. step: search phrase
Add a new function, where you will ask the user to input a search phrase. The function should then print whether the phrase existed in the previously entered sentence or not.
A yes/no answer is enough. To achieve this, you can just check the return value of strstr() .
Note that the following example uses NULL, which refers to a memory address or an object that doesn’t exist. We call this the NULL pointer. It is needed because the function strstr() doesn’t just return a yes/no answer, but instead gives you the location (memory address) where the search phrase starts. The return value will be NULL if it can’t find that location, i.e. the search phrase is not in the sentence.
```c
if (strstr() != NULL)
{

}
else
{

}
```
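The fact that strstr() returns a location rather than a yes/no answer can be illustrated with a small sketch. FindOffset() is a hypothetical helper, not something the task requires. It converts the returned address into an array index by subtracting the start of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns the index where phrase starts inside text,
// or -1 if strstr() reports that the phrase is not present.
int FindOffset(const char text[], const char phrase[])
{
    assert(text != NULL && phrase != NULL);

    const char *found = strstr(text, phrase);
    if (found == NULL)
        return -1;

    // Both pointers refer to the same array, so subtracting them
    // gives the distance in characters from the start of text.
    return (int)(found - text);
}
```

For example, FindOffset("an old friend for dinner", "old friend") gives 3, because the phrase starts at index 3. The search is case sensitive, so looking for "Old" in the same text gives -1.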
4. step: password prompt
Add a function to the program that will ask the user for their password. It could look something like this:
```c
void PromptPassword(char correctPassword[])
{
    char userEnteredPassword[STR_MAX];

    // Write your loop for password prompt here
}
```
The prompt must be inside a loop in such a way that the user cannot proceed to use your program before they’ve entered the correct password. The password prompt must be case sensitive. If you wish, you can add hints on incorrect input or limit the number of tries the user has.
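One possible way to compare the entered text against the expected password is strcmp() from string.h, which returns 0 when two strings are identical. The sketch below only shows the comparison, not the loop. IsSamePassword() is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: returns 1 when both strings hold exactly the same
// characters, 0 otherwise. strcmp() returns 0 for equal strings.
int IsSamePassword(const char entered[], const char correct[])
{
    assert(entered != NULL && correct != NULL);

    return strcmp(entered, correct) == 0;
}
```

Because strcmp() compares character codes, "Hunter2" and "hunter2" are different. A leftover newline also makes the strings differ, which is one reason the input needs to be cleaned up before comparing.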
5. step: composing a sentence
Add a function to your program that will compose a simple sentence of at least 4 words. Since we don’t have any good inputs or outputs for the function, we can create it as follows (typically avoid void-void functions!):
```c
void FormulateSentence(void)
{
    // String where the final sentence will be held
    char sentence[ ??? ];

    // Strings for the two user-entered words


    // Prompt the user for the two words


    // Formulating the final sentence


    // Print the final formulated sentence
    printf("Result: %s\n", sentence);
}
```
Think! How long should the variable sentence be in the worst case scenario, so that it would be able to hold both user entered words, as well as all the characters you are using to formulate the sentence? The size can be approximated, but must be sufficient!
Now think of a sentence you wish to create. The sentence will have two gaps that the user will have to fill. You can choose which type of words go in there (names of objects, people, verbs, adjectives, …). One of the words that the user enters must be at the start of the sentence. The location of the second word is up to you. E.g. <word1> is a <word2> name! .
Once you have asked the user for input and read the words, you must compose the sentence. The sentence must be written into a new, unused (empty) character array. You must account for the worst case scenario for fitting the sentence (when the user decides to enter words of the maximum possible length for both inputs).
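The worst-case arithmetic can be sketched as follows. This is only an illustration under assumptions: the template "<word1> likes the <word2>." and the helper name WorstCaseSize() are made up for the example, and wordMax stands for the size of each word's character array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: how many bytes are needed to hold the sentence
// "<word1> likes the <word2>." when each word was read into an array
// of wordMax characters.
size_t WorstCaseSize(size_t wordMax)
{
    assert(wordMax >= 1);

    // An array of wordMax chars holds at most wordMax - 1 visible
    // characters, because one slot is reserved for the terminator.
    size_t longestWord = wordMax - 1;

    // Characters that the program adds around the two words
    size_t fixedText = strlen(" likes the ") + strlen(".");

    // Two words, the fixed text, and one terminator for the result
    return 2 * longestWord + fixedText + 1;
}
```

With word arrays of 16 characters this gives 2 * 15 + 12 + 1 = 43 bytes. Rounding the size up is fine, rounding it down is not.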
Example
```
Please enter a password: password
Invalid password! Try again!

Please enter a password: again
Invalid password! Try again!

Please enter a password: hunter2
Password accepted. Welcome AzureDiamond!

Enter a sentence: I do wish we could chat longer, but I'm having an old friend for dinner.
The length of the entered sentence is 72

Please enter a search phrase: old friend
Your search phrase "old friend" exists in the originally entered sentence

Enter a name: Pauline
Enter an adjective: awesome

Result: Pauline is an awesome person!
```
Extra task [W13-3]: Alternative count
Create a new function to do a manual count and add statistics:
- Count and show how many alphabetical characters [a-zA-Z] there were. Do not count punctuation, spaces etc.
- Calculate and print the percentage of non-alphabetical characters in the sentence.
- Show the percentage with one decimal place.
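Computing a percentage from two integer counts has a common pitfall: integer division throws away the fractional part. The sketch below shows one way to avoid it and to print one decimal place. FormatPercent() is a hypothetical helper name, and the counting itself is left out on purpose.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: writes "part out of total" as a percentage with one
// decimal place into out, e.g. 3 out of 8 becomes "37.5%".
void FormatPercent(char out[], size_t outSize, int part, int total)
{
    assert(out != NULL && outSize > 0 && total > 0);

    // 100.0 is a double, so the division is done in floating point
    // and is not truncated to a whole number.
    double percent = 100.0 * part / total;

    // "%.1f" keeps one decimal place, "%%" prints a literal percent sign
    snprintf(out, outSize, "%.1f%%", percent);
}
```

The same format string works directly in printf(). Writing 100 * part / total with only integers would give 37 instead of 37.5 for 3 out of 8.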
Example
```
Sentence entered: Hi, Bob!
Sentence length: 8
Alphabetical characters: 5
Percentage of other characters: 37.5%
```
Task 2 [W13-2]: Generating e-mail addresses from CSV
The goal of this task is to introduce you to the CSV (comma separated value) format. You will also be practicing manipulating characters individually.
Download the starter code: 13_2_csv_starter.c
CSV format
CSV stands for comma separated value. CSV files are used for storing structured data. Every data field in a CSV file, as the name suggests, is separated by a comma. It’s one of the most widely used formats for storing and backing up data next to database systems themselves. The primary benefit of CSV is its simplicity, making it supported by almost all applications that process any kind of data.
In the simplest case, all data fields are separated by commas:
```
Mari,Maasikas,112222IACB,49001013333
Toomas,Toomingas,111111MVEB,39002204444
```
We are using the same complexity level for this lab task. To see how more complicated data is stored when you also need to store commas and quotes within the data fields, as well as add column headers, check here: https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules.
Requirements
- The task must be built on the starter code.
- The program must create an e-mail address for all people listed in the CSV
- The name part of the e-mail address must be composed of the first 3 characters of the first name, followed by the first 3 characters of the last name.
- The name part is followed by a domain of your choosing
- You can only have lower case characters in the e-mail address
- The e-mail address must be stored in a new character array that you create. Print it to the screen from that array. You are not allowed to print the characters on the fly while processing the entry.
- The program must print:
- The full name of the person. First and last name must be separated by a space.
- Generated e-mail address
- You are not allowed to change the code present in the starter code without a confirmation from us. The starting point of the code you write is in the function ProcessPerson(). You are free (and encouraged) to add more functions to the code.
Example
```
Number of CSV lines: 3

Processing line: 'Maria,Kask'
Name: Maria Kask
E-mail: markas@ttu.ee

Processing line: 'Johanna-Maria,Kask'
Name: Johanna-Maria Kask
E-mail: johkas@ttu.ee

Processing line: 'Kalev Kristjan,Kuusk'
Name: Kalev Kristjan Kuusk
E-mail: kalkuu@ttu.ee
```
Hints
- By knowing the location of the comma, you will also be able to calculate the position of the first character from the last name
- Lower and upper case ASCII characters differ from each other by a single bit, which is valued at 32 (e.g. A is 65, a is 97)
- All operations in this task, besides adding the domain address at the end, are easiest to do one character at a time. Library functions from string.h can be used, but they might be unnecessarily complex for now.
- The most common mistake in this task is forgetting to add a string terminator to the end after copying over the name part of the address!
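The first two hints can be sketched as small character-level helpers. Both function names are hypothetical and not part of the starter code. They only illustrate the building blocks, not the full e-mail generation.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: converts one ASCII upper case letter to lower case
// and leaves every other character unchanged.
char ToLowerAscii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char)(c + 32);   // 'A' (65) + 32 == 'a' (97)
    return c;
}

// Hypothetical helper: returns the index of the first comma in line,
// or -1 if the line contains no comma.
int FindComma(const char line[])
{
    assert(line != NULL);

    for (int i = 0; line[i] != '\0'; i++)
    {
        if (line[i] == ',')
            return i;
    }
    return -1;
}
```

For the line "Maria,Kask" the comma is at index 5, so the last name starts at index 6. Characters that are not upper case letters, such as the hyphen in "Johanna-Maria", pass through ToLowerAscii() unchanged.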
Extra task 1 [W13-4]: Short names
Change the way you are generating the e-mail addresses to accommodate people with shorter first or last names.
Example: Ly Kask -> lykask@ttu.ee
Extra task 2 [W13-5]: Unique e-mail addresses
Change the way you are generating the e-mail addresses in such a way that for similarly starting names the resulting e-mail address would be different.
Change your data array to the following:
```c
char *data[] = {"Maria,Kask",
                "Johanna-Maria,Kask",
                "Kalev Kristjan,Kuusk",
                "Margit,Kasemets",
                "Maris,Kase",
                "Marko,Kasvataja",
                "Margus,Kasevee"};
```
Requirements
- The e-mail addresses must be unique
- The name part of the address must be 6 characters
- The addresses must still refer to the owner’s name as much as possible
- The exact algorithm and thus the final form of the address is up to you. You will defend your decision when presenting your solution.
After this lesson, you should
- Know that there are various ways of encoding characters, including ASCII and Unicode
- Know what an ASCII table is and how to use it.
- Know how strings work in C.
- Know how to terminate a string in C and why it’s necessary.
- Know that strings in C are also related to byte-arrays.
- Know what CSV is, where it’s used and why.
- Know how to use the string.h library to manipulate strings.
- Be able to write your own string manipulation functions (manipulating characters)
- Know what a buffer overflow is and the attacks related to it.
Additional content
- Characters, Symbols and the Unicode Miracle – Computerphile
  https://www.youtube.com/watch?v=MijmeoH9LT4
- A beginners guide away from scanf
  https://www.sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
- ASCII
  https://en.wikipedia.org/wiki/ASCII
- ASCII table
  https://www.rapidtables.com/code/text/ascii-table.html
- Character encoding
  https://en.wikipedia.org/wiki/Character_encoding
- String.h library
  https://www.cplusplus.com/reference/cstring/
- Strings in C
  https://www.geeksforgeeks.org/strings-in-c-2/
- CSV
  https://en.wikipedia.org/wiki/Comma-separated_values