Category archive: pretest

PRETEST_PR2_25

Task

Your task is to create a program that analyzes written text (e.g. public-domain books) and compares it to a dictionary containing correctly written words.

Generic requirements

  • Program must be written in the C90, C99, C11 or C17 standard. By agreement, you can also use C23. If you choose a standard newer than C99, you should also use the features of that newer standard.
  • You are permitted to use GCC extensions, C POSIX and GNU C. You are free to make your case for additional libraries, but they must be agreed upon before use.
  • The program must compile under either
    • Ubuntu 24.04 with GCC-13 (based on the recommended software)
    • OpenSUSE Linux 15 SP5 with GCC-12 (lab configuration)
  • Solution must be divided into source and header files as appropriate
    • You are expected to provide a Makefile with a make all recipe that compiles the entire solution from scratch
  • The code is expected to conform to practices widely accepted for C programs, including style, code division, commenting etc.
  • You are allowed to use global variables sparingly where appropriate, but they must be limited to file scope and must not harm code reuse (e.g. keeping a global struct of settings for logging). This does NOT include a pointer to your data structure used to avoid passing data to functions.
  • The user experience for the program must be clear.
    • Any errors that occur must be described to the user in a clear manner. The program is not allowed to “just close” without informing the user what happened.
    • Successful operations must also be confirmed (e.g. successfully read dictionary containing 120 391 words)
  • The length of the dictionary file must not cause significant impacts on the performance of the application
    • Ideally, the size of the dictionary should not affect searching at all and should only minimally affect loading and unloading the application.
    • You are therefore recommended to implement a Trie data structure; however, you can also look for alternative data structures and algorithms, some of which are faster.
  • Program must manage its memory dynamically
    • Do not impose an arbitrary limit on word length or on the number of words
    • The program must not excessively over-allocate memory. Memory usage should be either exactly as needed or close to what’s needed.
    • Memory must be deallocated before exit. Deallocation will be checked using valgrind. 0 bytes in use is expected at exit.
  • You are expected to create and use a custom enum type. If you cannot find a suitable use case for a new type within the specified task requirements, you will need to add a feature of your own choosing to the task. Some ideas that might work:
    • Error handling with specified error cases
    • Multiple output file types (CSV, TSV, space-separated file)
    • Logging that uses logging levels (info, warning, error)
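
The Trie recommendation, the dynamic memory rules and the custom enum requirement can be sketched together as follows. This is only a minimal illustration under stated assumptions, not a reference solution: words are assumed to consist solely of the ASCII letters a-z (so a word such as "don't" would be rejected), and all names (TrieStatus, TrieNode, trie_create, trie_insert, trie_contains, trie_free) are hypothetical. It uses C99 features.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26 /* assumption: words use only the letters a-z */

/* Hypothetical status codes; one possible use for a custom enum type. */
typedef enum {
    TRIE_OK,
    TRIE_ERR_ALLOC,
    TRIE_ERR_INVALID_CHAR
} TrieStatus;

typedef struct TrieNode {
    struct TrieNode *children[ALPHABET_SIZE];
    bool is_word; /* true if a dictionary word ends at this node */
} TrieNode;

/* Allocates an empty node; returns NULL if the allocation fails. */
TrieNode *trie_create(void)
{
    TrieNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->is_word = false;
    return node;
}

/* Inserts a word case-insensitively; the cost depends only on the word length. */
TrieStatus trie_insert(TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    TrieNode *node = root;
    for (const char *p = word; *p != '\0'; p++) {
        int c = tolower((unsigned char)*p);
        if (c < 'a' || c > 'z')
            return TRIE_ERR_INVALID_CHAR; /* nodes created so far stay unmarked */
        int idx = c - 'a';
        if (node->children[idx] == NULL) {
            node->children[idx] = trie_create();
            if (node->children[idx] == NULL)
                return TRIE_ERR_ALLOC;
        }
        node = node->children[idx];
    }
    node->is_word = true;
    return TRIE_OK;
}

/* Returns true if the word was inserted before (case-insensitive). */
bool trie_contains(const TrieNode *root, const char *word)
{
    const TrieNode *node = root;
    for (const char *p = word; node != NULL && *p != '\0'; p++) {
        int c = tolower((unsigned char)*p);
        if (c < 'a' || c > 'z')
            return false;
        node = node->children[c - 'a'];
    }
    return node != NULL && node->is_word;
}

/* Frees the whole trie recursively so that nothing is left allocated at exit. */
void trie_free(TrieNode *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        trie_free(node->children[i]);
    free(node);
}
```

A lookup walks at most one node per character, so its cost does not grow with the number of words in the dictionary. The fixed array of 26 child pointers per node trades memory for simplicity; whether that counts as acceptable over-allocation is a design decision to weigh against the memory requirement above.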

Task requirements

  • The program must offer a basic interactive experience to perform tasks (e.g. a menu or step-by-step input prompts).
    • The program can also offer command line options, if the author so chooses, but it’s not a requirement. If command line arguments are used, the entered arguments should skip the relevant prompts and be documented in the readme.
  • Program must provide the following features
    • Read a dictionary (reference list of correctly written words)
    • Analyze a plain text document (book, short story, paragraph of text)
    • Provide analysis reports of the text document
    • User must be able to choose both the names of the dictionary and (/or) the text document to analyze
  • The user must be able to get the following reports (depending on their selection)
    • The list of words in the document in alphabetical order. Output must include both the word and number of occurrences in the text.
    • The list of unrecognized words (not present in the dictionary), ordered by the number of times they were used in the document.
  • The result will be written into a text file that is formatted as a CSV file with a header row. The name of the results file must contain a timestamp when the result file was created and must be stored in an appropriate subdirectory of the program
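
One way to build a timestamped result file name and write a CSV report with a header row is sketched below. The subdirectory name "results", the file name pattern, the column names and all function names are assumptions made for illustration; the task only asks for a timestamp in the name and an appropriate subdirectory. mkdir comes from POSIX, which the generic requirements permit.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical subdirectory for result files. */
#define RESULTS_DIR "results"

/*
 * Writes "results/<prefix>_YYYYMMDD_HHMMSS.csv" into buf.
 * Returns 0 on success, -1 if the time cannot be converted
 * or the buffer is too small.
 */
int build_result_path(char *buf, size_t size, const char *prefix, time_t now)
{
    assert(buf != NULL && prefix != NULL);
    char stamp[32];
    struct tm *tm_info = localtime(&now);
    if (tm_info == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", tm_info) == 0)
        return -1;
    int n = snprintf(buf, size, "%s/%s_%s.csv", RESULTS_DIR, prefix, stamp);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Creates the results directory if it does not exist yet. */
int ensure_results_dir(void)
{
    if (mkdir(RESULTS_DIR, 0755) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Writes the CSV header row followed by one "word,count" row per entry. */
int write_report(FILE *out, const char *const words[],
                 const unsigned long counts[], size_t n)
{
    assert(out != NULL);
    if (fprintf(out, "word,count\n") < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, "%s,%lu\n", words[i], counts[i]) < 0)
            return -1;
    }
    return 0;
}
```

A caller would typically call ensure_results_dir(), then build_result_path() with time(NULL), open the resulting path with fopen, and report both success and failure to the user.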

Data

Your programs are expected to work with ASCII-encoded text files; however, you are free to test and play around with other encodings, such as UTF-8 or UTF-16.

Dictionary files are plain-text files where each word is on a separate line.

Documents are plain-text files that contain written sentences that can form a longer story. They may include double empty lines, punctuation marks, and upper and lower case words. Words containing numbers can be ignored (e.g. 1st, 6th). Words written using different capitalization must be identified as the same word (e.g. “HELLO!!!!”, “Hello!” and “hello” all contain the same word).
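
The normalization rules above can be sketched as a small in-place function. This is a simplified illustration: it assumes the text has already been split into whitespace-delimited tokens, and it drops every non-letter character, so an inner apostrophe or hyphen is removed as well ("don't" becomes "dont"). The name normalize_token is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Normalizes a whitespace-delimited token in place: keeps only letters,
 * converted to lower case. Returns false if the token should be skipped,
 * i.e. it contains a digit or no letters remain. When false is returned,
 * the contents of the token are unspecified.
 */
bool normalize_token(char *token)
{
    assert(token != NULL);
    size_t out = 0;
    for (size_t in = 0; token[in] != '\0'; in++) {
        unsigned char c = (unsigned char)token[in];
        if (isdigit(c))
            return false; /* e.g. "1st", "6th" */
        if (isalpha(c))
            token[out++] = (char)tolower(c);
        /* any other character (punctuation) is dropped */
    }
    token[out] = '\0';
    return out > 0;
}
```

Because the result is never longer than the input, no extra allocation is needed, and the normalized word can be passed directly to the dictionary lookup.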

Useful links

Note: The Estonian dictionary and many texts in Gutenberg are provided with UTF-8 encoding. Your program will be tested only using ASCII strings, but they are a good source of realistic data.

Word lists for English: https://github.com/dwyl/english-words

Word list for Estonian: https://github.com/binoternary/diceware-ee

Public domain books as plain text files: https://www.gutenberg.org

Lorem Ipsum generator: https://www.lipsum.com

Submitting

The preferred method of submitting is by providing a Private Git repository that contains all the source files, Makefile and data files to test the solution (at least 2 dictionary files and 2 written text files, one of which should be a simpler case to test correctness and one containing large files to test for performance). A README file should describe the project and how to build it, including any other necessary details.

You are also expected to provide some examples (screenshots) with supporting text explanations of your program running, in a format of your choosing. For example, they can be written into the Wiki section, provided as a part of the README.md or a secondary markdown file, or included as a PDF in the repository.

It’s preferred to use our department GitLab instance https://gitlab.pld.ttu.ee, accessible using your Uni-ID. Create a private project and add your instructor to the project (handle: risto.heinsar).

Once the task is completed, notify your instructor in Mattermost.

NB! If you are not comfortable using Git, you can also provide the solution as a .zip file through Mattermost.

Task extension to homework 3

Note that this task can also be extended into Homework 3. All the features included in the extension are additive to this task and are agreed upon separately, including a separately agreed deadline. The extension will include additional features as well as implementing a local SQLite database. To extend the task, ask the instructor separately about the requirements.

pretest_23

Task

Your task is to generate the results of an F1 racing event. The results will be displayed on the screen with an option to store them to a file.

Generic requirements

  • Program must be written in the C90, C99 or C11 standard. GNU extensions are permitted. The necessary C standard version must be specified in the header of the code file.
  • You can only use standard libraries and the GNU extensions.
  • The program must compile under either
    • Ubuntu 22.04 with GCC-12 (based on the recommended software)
    • OpenSUSE Linux 15 SP4 with GCC-11 (lab configuration)
  • All arrays must be created with static sizes
    • Arrays must be large enough that the program can operate within the specified limitations. The limits are listed under the task specific requirements
    • Program does not need to work with larger inputs, but also must not crash when those are entered. The decision on how to handle improper input is left to the developer.
  • If the program uses dynamic memory allocation, all of the best practices regarding it must be followed. All dynamically allocated memory must be freed before exit. Note that dynamic memory is not a part of Programming 1, so it is not necessary to use it.
  • Program must be divided into functions.
  • Use of global variables is forbidden
  • Use of GOTO statements is forbidden
  • The user experience for the program must be clear. Any errors that occur must be described to the user in a clear manner. The program is not allowed to “just close” without informing the user what happened.

Program flow general description

  1. The program will read the names of the pilots from an input file and pick the participating pilots.
  2. The number of laps will be entered by the user while the program is running.
  3. All lap times are generated as random numbers.
  4. Total time is calculated for each pilot who participated.
  5. Pilots, their lap times and the total time are shown on the screen as a table.
  6. If an output file name was specified as a command line argument, all results are also written to the output file with that name as a CSV file.

Task specific requirements

  • The program will take either 1 or 2 arguments from the command line
    • Execution pattern: ./binary_name number_of_pilots [name_of_output_file]
    • First argument is the number of pilots participating. This argument is mandatory.
    • Second argument is the name of the output file. This argument is optional. If this argument is given, an output file with that name will be created. Results will be stored into that file.
  • The program must support up to 10 participating pilots. The maximum length for the name of a participant is 32 characters. The number of laps can range from 5 to 15.
  • The first n pilots will be chosen to participate in the race, based on the order that they are given in the input file. n is the number of pilots, given from the command line. You must validate that you have enough pilots given in the data file.
  • Each lap time will be a randomly generated number between 70 and 125 seconds.
  • Each pilot will have a 3% probability that they have to abandon the race during any lap. If a pilot needs to abandon, they will no longer drive the following laps. The decision (dice roll) will be redone on each lap.
  • For pilots who abandon the race, a - symbol will be used for the lap that they stopped on and all the following laps. Previous lap times must be present. The total time for that pilot will be presented as DNF (did not finish).
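
The lap time and abandonment rules can be sketched for a single pilot as follows. This is an illustration under assumptions, not the expected solution: lap times are treated as whole seconds, the abandonment roll is made before each lap time is generated, and the names RaceResult, LAP_ABANDONED, random_in_range and simulate_pilot are hypothetical. The caller is expected to seed the generator once with srand. The sketch uses C99 features and a statically sized array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LAPS 15
#define LAP_MIN 70
#define LAP_MAX 125
#define ABANDON_PERCENT 3
#define LAP_ABANDONED (-1) /* hypothetical marker, shown as "-" in the table */

typedef struct {
    int laps[MAX_LAPS]; /* lap time in seconds, or LAP_ABANDONED */
    int total;          /* sum of the completed lap times */
    bool finished;      /* false means the total is shown as DNF */
} RaceResult;

/* Returns a pseudo-random integer in [lo, hi]; slight modulo bias is accepted. */
static int random_in_range(int lo, int hi)
{
    return lo + rand() % (hi - lo + 1);
}

/* Fills in the lap times for one pilot over lap_count laps. */
void simulate_pilot(RaceResult *r, int lap_count)
{
    assert(r != NULL && lap_count >= 1 && lap_count <= MAX_LAPS);
    r->total = 0;
    r->finished = true;
    for (int lap = 0; lap < lap_count; lap++) {
        /* The dice roll is redone on each lap while the pilot is still racing. */
        if (r->finished && random_in_range(1, 100) <= ABANDON_PERCENT)
            r->finished = false;
        if (r->finished) {
            r->laps[lap] = random_in_range(LAP_MIN, LAP_MAX);
            r->total += r->laps[lap];
        } else {
            r->laps[lap] = LAP_ABANDONED;
        }
    }
}
```

When printing the table, a lap equal to LAP_ABANDONED would be rendered as - and the total replaced by DNF whenever finished is false.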

Data file requirements

Input file

The names of pilots will be read by the program from a file called pilots.dat. The data file must be in the same directory as the binary file. The name of the file will be hardcoded into the program.

The data file is formatted as an ASCII text file, containing all pilots with the format initial.last_name. Names are separated by a space. The number of names can range from 5 to 10.
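
Reading the names into a statically sized array could look like the sketch below. It assumes the limits stated in the task (up to 10 pilots, names of up to 32 characters); the function name read_pilots is hypothetical. The field width in the format string must match MAX_NAME_LEN so that an overlong name cannot overflow the buffer.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PILOTS 10
#define MAX_NAME_LEN 32

/*
 * Reads up to MAX_PILOTS whitespace-separated names from in.
 * Returns the number of names read. A name longer than MAX_NAME_LEN
 * characters is split by the %32s width; a complete solution may
 * prefer to detect and reject such input.
 */
int read_pilots(FILE *in, char names[][MAX_NAME_LEN + 1])
{
    int count = 0;
    assert(in != NULL);
    while (count < MAX_PILOTS && fscanf(in, "%32s", names[count]) == 1)
        count++;
    return count;
}
```

The returned count can then be compared with the number of pilots given on the command line to validate that the data file contains enough names.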

Names for the input file

Output file

The program will generate a CSV file. File must contain a header with the column names. As data, you need to include the name, all lap times and total time. The output file will be created in the same directory as the binary file.

Examples of expected output

Example 1:

Example 2:

Example 3: