CIS 220 - Program 9
String Methods


Objectives:  Use sequential files, text extraction, parallel arrays, sorting, and searching.

Description:  This program requires you to maintain two parallel arrays: one that stores each distinct word that appears in a text excerpt, and another that stores how many times each word appears.  Element i of the frequency array always holds the count for element i of the word array.

The program is manageable as a single deliverable, but it is divided into two deliverables to keep you on schedule.
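As a minimal sketch of the parallel-array layout, the VB.NET module below declares a word array, a frequency array, and a count of the slots actually in use.  The capacity MAX_WORDS and the names words, frequencies, numWords, and DescribeEntry are illustrative assumptions, not required names.

```vbnet
Imports System

' Hypothetical names throughout; only the parallel-array idea comes from the assignment.
Module ParallelArraysSketch
    ' Assumed fixed capacity for the excerpt's distinct words.
    Public Const MAX_WORDS As Integer = 1000

    ' words(i) and frequencies(i) always describe the same entry.
    Public words(MAX_WORDS - 1) As String
    Public frequencies(MAX_WORDS - 1) As Integer

    ' Number of elements actually in use (0 .. numWords - 1).
    Public numWords As Integer = 0

    ' Formats entry i as "word (count)" by reading the same index from both arrays.
    Public Function DescribeEntry(i As Integer) As String
        Return words(i) & " (" & frequencies(i).ToString() & ")"
    End Function
End Module
```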


Deliverable 1

Deliverable 1 consists of steps 1-3.  (Dump the word array to a list box for grading purposes.)

  1. First read a text excerpt from a file.  

  2. Next, extract individual words from the text into an array and "sanitize" them (remove hyphens, line breaks, and punctuation, and convert them to lower case).  

  3. Insert each word into an alphabetized array of words.

    • Compare the extracted word to each word in the word array.

    • If it is already in the array, increment the corresponding element in the frequency array.

    • If it is not found in the array, insert it at its alphabetical position and set its frequency to 1.

    • Print the arrays.
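A minimal sketch of Steps 1 and 2 follows, assuming the excerpt is a plain text file.  It uses the methods named in the outline (Replace, ToLower, Split, and a version of Trim).  Two choices are assumptions: hyphens are replaced with spaces so that hyphenated words split apart, and only leading and trailing punctuation is stripped, so an internal apostrophe as in "don't" survives.  The module name, function names, and punctuation set are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

' Sketch of readInputFile (Step 1) and prepareText (Step 2); names are hypothetical.
Module PrepareTextSketch
    ' Characters trimmed from the ends of each word (an assumed, not required, set).
    Private ReadOnly Punctuation As Char() = {"."c, ","c, ";"c, ":"c, "!"c, "?"c, """"c, "'"c, "("c, ")"c}

    ' Step 1: read the whole excerpt from a sequential text file.
    Public Function ReadInputFile(path As String) As String
        Return File.ReadAllText(path)
    End Function

    ' Step 2: turn raw text into an array of sanitized, lower-case words.
    Public Function PrepareText(text As String) As String()
        ' Assumption: a hyphen becomes a space, so "well-known" yields two words.
        text = text.Replace("-", " ")
        ' Replace Windows line endings (CR+LF), then any remaining CR or LF, with spaces.
        text = text.Replace(vbCrLf, " ").Replace(vbCr, " ").Replace(vbLf, " ")
        text = text.ToLower()

        Dim result As New List(Of String)
        For Each token As String In text.Split(" "c)
            ' Remove surrounding whitespace, then leading and trailing punctuation.
            Dim word As String = token.Trim().Trim(Punctuation)
            ' Skip tokens that were empty (double spaces) or pure punctuation.
            If word.Length > 0 Then result.Add(word)
        Next
        Return result.ToArray()
    End Function
End Module
```

For example, PrepareText("Well-known words," & vbCrLf & "WORDS!  End.") returns the words well, known, words, words, and end.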

An outline version of the program structure follows:

Go_Click
    readInputFile (Step 1)
            read text
            echo text in text box
    prepareText  (Step 2)
            remove hyphens from text using Replace
            strip end-of-line characters from text using Replace
            convert all words to lower-case using ToLower
            split text into an array using Split
            process individual words to remove punctuation 
                    using IndexOf and SubString (or a version of Trim)
            process individual words to remove leading and trailing spaces 
                    using Trim, TrimStart, or TrimEnd
    createWordList using Insertion Sort   (Step 3)
            find insertion point
            update word list
            update frequency list
            increment numWords
            printAlphaList       
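The insertion step of createWordList might look like the sketch below.  It finds the insertion point, then either increments an existing frequency or shifts later entries one slot to the right in both arrays and stores the new word with a frequency of 1.  It assumes pre-sized arrays plus a numWords counter, and it compares words ordinally with String.CompareOrdinal; whatever comparison you choose, Array.BinarySearch in Deliverable 2 must later use the same ordering.  The procedure name InsertWord is hypothetical.

```vbnet
Imports System

' Sketch of insertWord for createWordList (Step 3); names are hypothetical.
Module InsertWordSketch
    Public Sub InsertWord(word As String, words As String(), frequencies As Integer(), ByRef numWords As Integer)
        ' Find insertion point: the first slot whose word is not less than the new word.
        Dim pos As Integer = 0
        Do While pos < numWords AndAlso String.CompareOrdinal(words(pos), word) < 0
            pos += 1
        Loop

        ' Word already present: only its frequency changes.
        If pos < numWords AndAlso words(pos) = word Then
            frequencies(pos) += 1
            Return
        End If

        ' Guard against overflowing the pre-sized arrays.
        If numWords >= words.Length Then
            Throw New InvalidOperationException("The word array is full.")
        End If

        ' New word: shift later entries right in BOTH arrays to keep them parallel.
        For i As Integer = numWords To pos + 1 Step -1
            words(i) = words(i - 1)
            frequencies(i) = frequencies(i - 1)
        Next
        words(pos) = word
        frequencies(pos) = 1
        numWords += 1
    End Sub
End Module
```

Inserting we, have, of, we, and a into empty arrays leaves a, have, of, we with frequencies 1, 1, 1, 2 and numWords equal to 4.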

The most deeply indented lines above do not necessarily represent entire modules; they may instead represent the low-level tasks that make up a module.  See the class notes for references that you may find helpful.

Students must follow the modularization as detailed above unless alternative approaches are approved on an individual basis.  Use the methods noted above, such as insertion sort.


Deliverable 2

Deliverable 2 consists of steps 4-5.

  4. Sort the parallel arrays in ascending order by frequency, and then print the arrays.  

  5. Finally, read a list of words from another file and search the concordance (the alphabetized word list) for each word.  If a word appeared in the original file, indicate how many times it appeared; if it did not, list it as having appeared 0 times.
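Step 5 can be sketched with Array.BinarySearch, searching only the first numWords elements.  A negative return value means the word was not found (it is the bitwise complement of the index where the word would be inserted), so the sketch reports 0 in that case.  It assumes the search file holds one word per line and passes StringComparer.Ordinal, which must match the ordering used to build the alphabetized list.  All module, function, and report-format choices here are hypothetical.

```vbnet
Imports System
Imports System.IO

' Sketch of readSearchList and performSearch (Step 5); names are hypothetical.
Module SearchSketch
    ' Assumption: the search file contains one word per line.
    Public Function ReadSearchList(path As String) As String()
        Return File.ReadAllLines(path)
    End Function

    ' Returns how often target appears in the alphabetized list, or 0 if absent.
    Public Function FrequencyOf(target As String, words As String(), frequencies As Integer(), numWords As Integer) As Integer
        ' Search only slots 0 .. numWords - 1, using the same ordinal ordering as the list.
        Dim index As Integer = Array.BinarySearch(Of String)(words, 0, numWords, target, StringComparer.Ordinal)
        ' A negative index means "not found".
        If index >= 0 Then
            Return frequencies(index)
        End If
        Return 0
    End Function

    ' Builds one report line per search word, for example "of: 10".
    Public Function BuildSearchReport(searchWords As String(), words As String(), frequencies As Integer(), numWords As Integer) As String()
        Dim report(searchWords.Length - 1) As String
        For i As Integer = 0 To searchWords.Length - 1
            ' Sanitize the search word the same way as the excerpt's words.
            Dim w As String = searchWords(i).Trim().ToLower()
            report(i) = w & ": " & FrequencyOf(w, words, frequencies, numWords).ToString()
        Next
        Return report
    End Function
End Module
```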

An outline version of the program structure follows:

    searchForWords    (Step 5)
            read Search list
            binary search (Array.BinarySearch)
            print in list box
    sortListByFrequency        (Step 4)
            sort word list (Array.Sort)
            sort frequency list
            synchronize indexes
    printFreqList            (Step 4)
    synchronizeListScrolling
            synchronize alpha list
            synchronize freq list
            synchronize search list
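One way to sort by frequency while keeping the arrays parallel is the keys/items overload of Array.Sort: the frequency array supplies the keys, and the word array is rearranged in step, so one call covers "sort word list," "sort frequency list," and "synchronize indexes."  This sketch sorts in place, so the alphabetical order is lost afterward.  The procedure name is hypothetical, and your own code may organize these tasks differently.

```vbnet
Imports System

' Sketch of sortListByFrequency (Step 4); names are hypothetical.
Module SortByFrequencySketch
    ' Sorts the first numWords entries by ascending frequency.
    ' frequencies() supplies the keys; words() is rearranged in step,
    ' so words(i) still matches frequencies(i) afterward.
    Public Sub SortListByFrequency(words As String(), frequencies As Integer(), numWords As Integer)
        Array.Sort(Of Integer, String)(frequencies, words, 0, numWords)
    End Sub
End Module
```

Because Array.Sort is unstable, words that share a frequency may come out in any order relative to one another.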

The most deeply indented lines above do not necessarily represent entire modules; they may instead represent the low-level tasks that make up a module.  See the class notes for references that you may find helpful.

Why do steps 4 and 5 as listed in the requirements above appear reversed in the module outline?  It has to do with the requirements of the VB search method.  (Remember that Array.BinarySearch requires the array to be sorted in alphabetical order, so we have to search before we sort the arrays by frequency.)

Students must follow the modularization as detailed above unless alternative approaches are approved on an individual basis.  Use the methods noted above, such as Array.BinarySearch and Array.Sort.


Sample structure chart

The following structure chart appears in the comments at the beginning of my solution program:

' Structure chart appears below. Asterisk (*) indicates sub or function.
'
' Main
'    |
'    |- readInputFile
'    |          |
'    |          |- readFile*
'    |          |- echoText
'    |
'    |- prepareText
'    |          |
'    |          |- removeHyphen*
'    |          |- removeEOLN*
'    |          |- convertToLowerCase
'    |          |- splitIntoArray
'    |          |- removePunctuation*
'    |
'    |- createWordList
'    |          |
'    |          |- insertWord*
'    |          |- printAlphaList*
'    |
'    |- searchForWords
'    |          |
'    |          |- readSearchList*
'    |          |- performSearch*
'    |                      |
'    |                      |- binarySearch (Array.BinarySearch)
'    |                      |- updateSearchListboxes*
'    |
'    |- sortListByFrequency
'                |
'                |- sort (Array.Sort)
'                |- secondarySort*
'                |- printFreqList*


Overall Notes:

A demo of the Text Analyzer has been posted.  Use a text box to display the text being scanned.  (Set its Multiline property to True and its ScrollBars property to Vertical.)  Use synchronized list boxes to display the results of the analysis.  Be sure to see the Clues page for coding hints.  Perform all sorting in code rather than through control properties such as a list box's Sorted property.

Important Note:

As you can see from the module list above, this is a complex system.  However, most modules are fairly simple or build on code provided in the class notes.  The difficulty will be greatly reduced if you use the approach known as incremental development.  In incremental development you design and code one module, such as readInputFile, test it, and, when it works properly, set it aside.  Next, design and code the second module, such as prepareText, test it, and, when it works properly, integrate it with the first module.  When the two work properly together, set the combined module aside and begin work on the third module, createWordList.  Follow this procedure until all modules have been coded, tested, and integrated and the program is complete.  This lets you view the problem as a series of subsystems, each of which is easily managed.  Proper planning and management will greatly reduce the apparent complexity of this problem.


Sample Interface

Make your interface appear as much like the demo interface as possible.

Note that the sorted order is not quite what you may want.  For example, three words occurred 10 times, but they are listed as 'we', 'have', and 'of', while you might prefer 'have', 'of', and 'we'.  That is because Array.Sort is an unstable sort: elements with equal keys are not guaranteed to keep their original relative order, so words that were alphabetized before the frequency sort can come out scrambled within a frequency group.  If you want words with the same frequency to appear in alphabetical order, you will need to use the secondary key sort algorithm in the class notes.
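The class notes describe the secondary key sort algorithm; the sketch below is just one possible approach, not necessarily the one in the notes.  After the frequency sort, it finds each run of entries with equal frequency and alphabetizes the words in that run with Array.Sort.  Because every frequency within a run is the same, only the word array needs to move, and the two arrays stay parallel.  The procedure name and the ordinal comparer are assumptions.

```vbnet
Imports System

' Sketch of secondarySort; names are hypothetical.
Module SecondarySortSketch
    ' Call after the arrays are sorted by frequency. Alphabetizes each run of
    ' words that share a frequency; frequencies() itself never changes here.
    Public Sub SecondarySort(words As String(), frequencies As Integer(), numWords As Integer)
        Dim start As Integer = 0
        Do While start < numWords
            ' Find the end (exclusive) of the run of equal frequencies beginning at start.
            Dim finish As Integer = start + 1
            Do While finish < numWords AndAlso frequencies(finish) = frequencies(start)
                finish += 1
            Loop
            ' Alphabetize words(start) .. words(finish - 1).
            Array.Sort(Of String)(words, start, finish - start, StringComparer.Ordinal)
            start = finish
        Loop
    End Sub
End Module
```

With words a, we, have, of, the and frequencies 3, 10, 10, 10, 25, the word array becomes a, have, of, we, the.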