PR2EN2: Enumerations

Lab materials

Tasks

The lab has two tasks. They cover many topics from Programming 1 and add the enumeration topic introduced this week. Both tasks also benefit from the pointers covered last week.

Lab task 1: File categorization

In this task, you will create a utility that counts how many files of each category exist (e.g. how many image files there are in a directory and its subdirectories). We will only create the part that categorizes files and counts the totals.

To find the names of files, we will use knowledge from the Linux task last semester. We will use a tool called find to list the files recursively and pipe them to our program. By doing this, we will be able to index and categorize any number of files in all subdirectories.

Note: To demonstrate the potential of combining programs, you need to be in a Linux environment. The easiest way is to test in the university environment (use a lab computer, Horizongate, or create an SSH tunnel to one of our servers or lab computers).

Requirements

Create a program that

  • Accepts an unknown number of file names from the standard input ( stdin ).
    • You are not allowed to preemptively ask for number of inputs or have a designated string to stop reading inputs.
    • To stop reading the strings and show the statistics, listen for the EOF  (end of file) signal.
  • Categorizes those files into groups based on the identifiable extensions and counts how many files are in each category.
  • Displays how many files were in each category.
  • Categories must be represented by an enumeration type in the code.
  • One of the functions you need to have is specified for you. It needs to take the file extension as a parameter and return the enum value of the category. Proposed function prototype:
    enum FileCategory GetFileType(char *extension);
  • Your program can provide a prompt when started (i.e. instructions), but must not write anything to the output in between inputs.
Categories and extensions
  • Archives: zip, rar, 7z, tar, gz
  • Data: csv, xls, xlsx, ods
  • Documents: pdf, doc, docx, rtf, odt
  • Code: c, h, cpp, hpp, py
  • Text: txt
  • Images: jpg, jpeg, png, svg
  • Other: all files with extensions, but not in the previously listed types
  • No extension
Template for the task

To give you a better idea of the expected structure and of how the reading and processing work, you are provided with a template to base your solution on.

Recommended steps for solving the task
  1. Add a function that removes the trailing newline from the string that was read.
    E.g. void FixTrailingNewline(char *str);
  2. Add a function that finds the position of the last point (.) symbol in the string to identify the start of the file extension.
    E.g. int GetLastPointPos(char *str);
  3. Add the category enumeration to your code and create an initialized array of counters for the categories.
  4. Add a function to print the array of counters (result)
  5. Add a function that will, based on the given extension, find the category of the file.
    E.g. enum FileCategory GetFileType(char *extension);
Hints and warnings:
  • Check out the additional enum example on this page. It is based on a similar categorization task and offers quite a few ideas on code structure.
  • You should recognize various subtasks from last semester, e.g. parts of the first and second strings lab tasks and the age classifier homework.
  • If you add a count item after the last enumerated item, it will tell you the number of items in the list. This will only work if you allow it to automatically number all items!

    This allows you to automatically declare the correct length array for counters.

  • By using what you learned about pointers last week (i.e. pointer arithmetic),  you can use the location of the point as an offset to calculate the address where the extension starts. That new address would also be pointing at a string.
  • The number of iterations of the reading loop is not known in advance. fgets() returns NULL when EOF (end of file, indicating no more inputs) is reached.
  • fgets() stores the trailing newline character, which needs to be removed.
  • Input for your program comes from a pipe to your program's stdin.
  • To quickly test without a pipe, you can press Ctrl+D to send the EOF signal.
Testing manually when creating the program

To test manually, run the program normally and type in the file names, pressing Enter after each one. Once done, press Ctrl+D to send the EOF (end of file) signal.

Testing correctness

To test the correctness, we will index a folder that I have prepared for you. Your numbers for each category should match the ones presented in this example.

We use a tool called find to search for files and folders, limit it to files only, and print the names without the path. The first argument is the location to search in, -type f restricts the results to files (omitting folders), and -printf '%f\n' prints each file name without its path. The output is piped into the program we just created.

Command executed: find ~/M/risto.heinsar/lab_cat/ -type f -printf '%f\n' | ./task1_category

Hint: if you’re curious, you can also test your own P drive and add extensions and/or categories.

Backup for when university network fails

Note: if you are unable to demonstrate the correctness because of networking issues or because the systems are offline, you can demonstrate it using the following archived version of the directory structure.

https://blue.pri.ee/ttu/files/iax0584/andmefailid/2_1_file_cat_directory_structure.zip

The structure is the same as on the M drive.

Lab task 2: Distance conversion

You have been provided activity data from a group of employees in an international company. Your task is to convert all data to the desired output units, show the results and give basic statistics.

Requirements
  • Program takes 2 command line arguments
    • First argument is the name of the input file
    • Second argument is the desired output unit of distance (available options: m  for meters, ft  for feet and km  for kilometers)
  • Input file is a basic ASCII text file (first command line argument).
    • Each line in the file contains one entry
    • Each entry consists of two fields, separated by a space: <distance> <unit>
    • Distances are given as real numbers
    • Units are given as strings. Units in the input file can only be in feet or meters.
  • Calculate and display all distances, converted to the desired output unit
  • Calculate and output the average and total distances walked.
  • All distances are shown with 2 decimal places.
  • Units must be handled using enums. Recommended list is provided:
  • Conversion coefficients are also provided
Data files

There are 3 files provided for you to test your program with. See the Testing section for what to look out for when testing with each of the files!

Download the test files: https://blue.pri.ee/ttu/files/iax0584/andmefailid/2_2_converter_data.zip

Hints and tricks

There are a lot of units in play, and printing the correct one can be a bit tricky. Here are two ideas to help you:

Option 1: Create a function that prints the unit and call it whenever you need to print the correct unit according to the task.

Option 2: Create a function that returns a pointer to a string containing the unit. Since the string is written as a literal (a constant), it remains available in memory after the function returns. This makes it really convenient to use in print statements, e.g. printf("%.2f %s\n", distance, ReturnPrintableUnit(unit));

Testing

This program has a lot of ways it can go wrong. Make sure to test for all constraints!

Test 1 – 3: Invalid arguments

This actually consists of 3 different tests, all of which pass wrong arguments to the program.

Test 4, 5: Problematic arguments

The next two tests are about parsing the arguments themselves and making sure that the file exists and that the unit is within the allowed list.

Test 6: empty file

The purpose of this test is to make sure that our program does not crash when there is no data to process.

task2_data1.txt contents:

And the results for this data file: 

Test 7 – 9: Conversion tests

In these tests we will go over all of the possible input and output unit conversions. We use a simple data file that allows us to easily observe if our answers are correct.

task2_data2.txt contents:

And the results for this data file:

Test 10: Different file

The emphasis of this test is on running your program with a different data file, with a different length and different units, to make sure that nothing got past us.

task2_data3.txt contents:

And the results for this data file:

Sidenote: did you notice what we didn't test for, but could also be important?

Advanced task: comprehensive converter

The advanced task is based on lab task 2 and must be an extension of the base task. Disregard the concept of “walking” and consider the task as just a distance converter with statistics.

Requirements
  • Add support for additional distance units
    • Yard (yd)
    • Inch (in)
    • Decimeter (dm)
  • You must allow all 6 units to be both inputs and outputs for the program.
  • Design the conversions in an expandable fashion, so that adding more units would not require large overhauls of the code. The complexity of adding another unit must not expand the codebase exponentially!

Warning! Even though the expected method for conversion is simple to implement and manage, it may increase the error of the final result due to rounding of the conversion coefficients outside the metric system. Be careful with tasks requiring high precision!

After the class, you should

  • Be able to work with enumerations
    • Declaring new enum types
    • Declaring variables based on enum types
    • Passing enums to functions and returning enums from functions

Additional content

Note: most sites explaining enumerations can’t even follow the same coding style on a single page! Use the style guide provided by us!