Text Analyzer

Objectives:  Use sequential files, text extraction, parallel arrays, sorting, and searching.

Description:  This program requires that you maintain parallel arrays--one that stores every word that appears in a text excerpt, and another that stores the frequency with which each word appears.  
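
As a minimal illustration of the parallel-array idea (the sample words, counts, and the helper name FormatEntry are made up for this sketch and are not part of the assignment), the same index selects a word in one array and its frequency in the other:

```vb
Imports System

Module ParallelArraysSketch
    ' Hypothetical helper: formats the entry stored at index i of both arrays.
    Function FormatEntry(words() As String, freq() As Integer, i As Integer) As String
        Return words(i) & ": " & freq(i).ToString()
    End Function

    Sub Main()
        ' words(i) appeared freq(i) times; the arrays must always be updated together.
        Dim words() As String = {"cat", "dog", "the"}
        Dim freq() As Integer = {2, 1, 5}

        For i As Integer = 0 To words.Length - 1
            Console.WriteLine(FormatEntry(words, freq, i))
        Next
    End Sub
End Module
```

Whenever an element moves in one array (for example during a sort), the element at the same index in the other array has to move with it.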

Deliverable 1 consists of steps 1-4.  (Dump the word array to a list box for grading purposes.)
Deliverable 2 consists of steps 5-6.

This is manageable as a single deliverable, but is broken into two in order to keep you on schedule.

  1. First read a text excerpt from a file.  

  2. Next, extract individual words from the text.  

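  3. Compare each extracted word to the words already in the word array.  If the word is already there, increment its frequency; otherwise, add it to the word array with a frequency of 1.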

  4. Sort the parallel arrays in alphabetical order, and then print the arrays.

  5. Sort the parallel arrays in ascending order by frequency, and then print the arrays.  

  6. Finally, read a list of words from another file and search the concordance for each word.  If the word was included in the original file, indicate how many times it appeared.  If it was not included, list it as having appeared 0 times.
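
The lookup in step 6 might be sketched as follows.  This is only an illustration, not the required solution: the function name CountOf, the sample data, and the choice of StringComparer.Ordinal are assumptions.  Array.BinarySearch returns a negative number when the target is not found, which is what lets a missing word be reported as 0.

```vb
Imports System

Module SearchSketch
    ' Hypothetical helper: returns how many times target appeared, or 0 if it is absent.
    ' Assumes the first numWords entries of words are in ascending ordinal order.
    Function CountOf(words() As String, freq() As Integer,
                     numWords As Integer, target As String) As Integer
        Dim pos As Integer = Array.BinarySearch(words, 0, numWords, target, StringComparer.Ordinal)
        If pos >= 0 Then Return freq(pos)   ' found: the same index holds the count
        Return 0                            ' negative result: not in the list
    End Function

    Sub Main()
        Dim words() As String = {"cat", "dog", "the"}
        Dim freq() As Integer = {2, 1, 5}

        For Each target As String In {"dog", "emu"}
            Console.WriteLine(target & " appeared " &
                              CountOf(words, freq, 3, target).ToString() & " times")
        Next
    End Sub
End Module
```

The comparer passed to the search should match the ordering used when the list was built; an ordinal comparison is assumed here.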

A demo of the Text Analyzer has been posted.  Use a text box to display the text being scanned.  (Set the MultiLine property to True and the ScrollBars property to Vertical.)  Use synchronized list boxes to display the results of the analysis.  Be sure to see the Clues page for coding hints.  Perform all sorting operations through the use of code rather than through form controls.

Your instructor may present a structure chart (module chart) in class, but an outline version follows:

    prepareText  (Steps 1 & 2)
            read text
            echo text in text box
            remove hyphens from text using Replace
            strip end-of-line characters from text using Replace
            split text into an array using Split
            process individual words to remove punctuation
                    using IndexOf and Substring
            process individual words to remove leading and trailing spaces
                    using Trim, TrimStart, or TrimEnd
            convert all words to lower-case using ToLower
    createWordList using Insertion Sort  (Steps 3 & 4)
            find insertion point
            update word list
            update frequency list
            increment numWords
    printAlphaList  (Step 4)
    searchforWords  (Step 6)
            read Search list
            binary search (Array.BinarySearch)
            print in list box
    sortListByFrequency  (Step 5)
            sort word list (Array.Sort)
            sort frequency list
            synchronize indexes
    printFreqList  (Step 5)
            synchronize alpha list
            synchronize freq list
            synchronize search list

The lines above which are the farthest indented do not necessarily represent entire modules, but may instead represent the low-level tasks that make up a module.  Click on the links above for references in the notes that you may find helpful.
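
The text-cleaning tasks listed under prepareText could look roughly like the sketch below.  It works on a string rather than a file so that it stands alone.  The punctuation set, the helper name StripPunctuation, and the decision to replace hyphens and line breaks with a space (rather than deleting them) are assumptions made for illustration.

```vb
Imports System
Imports Microsoft.VisualBasic

Module PrepareTextSketch
    ' Hypothetical punctuation set; extend it to suit the excerpt being scanned.
    Private Const Punctuation As String = ".,;:!?""()"

    ' Removes every punctuation mark from one word using IndexOf and Substring.
    Function StripPunctuation(word As String) As String
        For Each mark As Char In Punctuation
            Dim pos As Integer = word.IndexOf(mark)
            While pos >= 0
                word = word.Substring(0, pos) & word.Substring(pos + 1)
                pos = word.IndexOf(mark)
            End While
        Next
        Return word
    End Function

    ' Returns the cleaned, lower-case words found in text.
    Function PrepareText(text As String) As String()
        ' Assumption: hyphens and line breaks become spaces so adjoining words stay separate.
        text = text.Replace("-", " ")
        text = text.Replace(vbCrLf, " ").Replace(vbCr, " ").Replace(vbLf, " ")

        Dim pieces() As String = text.Split(" "c)
        Dim count As Integer = 0
        For i As Integer = 0 To pieces.Length - 1
            Dim word As String = StripPunctuation(pieces(i)).Trim().ToLower()
            If word.Length > 0 Then        ' skip empty pieces left by repeated spaces
                pieces(count) = word
                count += 1
            End If
        Next
        ReDim Preserve pieces(count - 1)   ' keep only the cleaned words
        Return pieces
    End Function

    Sub Main()
        Dim sample As String = "The well-known cat;" & vbCrLf & "the DOG!"
        Console.WriteLine(String.Join("|", PrepareText(sample)))
        ' prints: the|well|known|cat|the|dog
    End Sub
End Module
```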

Why do steps 5 and 6 as listed in the requirements above appear reversed in the module outline?  It has something to do with the requirements of the VB search algorithm.
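
For the sortListByFrequency module in the outline, one way to keep the two arrays synchronized is the Array.Sort overload that takes a key array and an item array and reorders both together.  The sketch below is an assumption about how that module could be written, not the required design; the procedure name and sample data are invented, and the order of words with equal frequencies is not guaranteed because Array.Sort is not a stable sort.

```vb
Imports System

Module SortByFrequencySketch
    ' Sorts the first numWords entries by ascending frequency,
    ' moving each word along with its count.
    Sub SortListByFrequency(words() As String, freq() As Integer, numWords As Integer)
        Array.Sort(freq, words, 0, numWords)   ' freq holds the keys, words the items
    End Sub

    Sub Main()
        ' Arrays are larger than numWords, as they would be in a partially filled list.
        Dim words() As String = {"cat", "dog", "the", "", ""}
        Dim freq() As Integer = {2, 1, 5, 0, 0}
        Dim numWords As Integer = 3

        SortListByFrequency(words, freq, numWords)

        For i As Integer = 0 To numWords - 1
            Console.WriteLine(words(i) & " " & freq(i).ToString())
        Next
        ' prints dog 1, cat 2, the 5 on separate lines
    End Sub
End Module
```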

Students must follow the modularization as detailed above unless alternative approaches are approved on an individual basis.  Use the methods noted above, such as insertion sort, and make your interface appear as much like the demo interface as possible.
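
The four tasks listed under createWordList (find insertion point, update word list, update frequency list, increment numWords) might be sketched as follows.  The capacity MaxWords, the procedure name AddWord, and the use of an ordinal string comparison are assumptions for illustration; the class notes may organize this differently.

```vb
Imports System

Module CreateWordListSketch
    Public Const MaxWords As Integer = 1000      ' hypothetical capacity
    Public words(MaxWords - 1) As String
    Public freq(MaxWords - 1) As Integer
    Public numWords As Integer = 0

    ' Inserts word in alphabetical position, or bumps its count if already present.
    Sub AddWord(word As String)
        ' Find insertion point: the first slot whose word is not less than the new word.
        Dim pos As Integer = 0
        While pos < numWords AndAlso String.CompareOrdinal(words(pos), word) < 0
            pos += 1
        End While

        ' Word already listed: only the frequency list changes.
        If pos < numWords AndAlso words(pos) = word Then
            freq(pos) += 1
            Return
        End If

        If numWords = MaxWords Then Throw New InvalidOperationException("word list is full")

        ' Shift later entries right by one in BOTH arrays to open a slot.
        For i As Integer = numWords To pos + 1 Step -1
            words(i) = words(i - 1)
            freq(i) = freq(i - 1)
        Next
        words(pos) = word
        freq(pos) = 1
        numWords += 1
    End Sub

    Sub Main()
        For Each w As String In {"the", "cat", "the", "dog"}
            AddWord(w)
        Next
        For i As Integer = 0 To numWords - 1
            Console.WriteLine(words(i) & " " & freq(i).ToString())
        Next
        ' prints cat 1, dog 1, the 2 on separate lines
    End Sub
End Module
```

Because each word is placed in order as it arrives, the list is already alphabetical when the last word has been processed.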

Important Note:

As you can see from the module list above, this is a complex system.  However, most modules are fairly simple or build on code provided in class notes.  When developing your code, the difficulty will be greatly reduced if you use the approach known as incremental development.  In incremental development, you design and code one module, such as prepareText, test it, and when it works properly set it aside.  Design and code the second module, such as createWordList, test it, and when it is working properly integrate it with the first module.  When they work properly together, set the combined module aside and begin work on module 3, printAlphaList.  Follow this procedure until all modules have been coded, tested, and integrated and the program is complete.  This enables you to view the problem as a series of subsystems, all of which are easily managed.  Proper planning and management will greatly reduce the apparent complexity of this problem.

Sample Interface