How do I count the number of times a word occurs in a text file in Java?

Popularity: 61   Published: 2023-08-02 11:14:05.0

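I am very new to Java, and I am working on a program that should read a .txt file named by the user and then ask the user for a word to search for in that file. I can't figure out how to count the number of times the entered word appears in the .txt file. Instead, the code I have only counts the number of lines the word shows up in. Can anyone help me figure out how to make my program count the number of times the word appears, rather than the number of lines it appears in? Thanks! Here is the code: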

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TextSearch {

    public static void main(String[] args) throws FileNotFoundException {
        Scanner txt;
        File file = null;
        String Default = "/eng/home/tylorkun/workspace/09.1/src/Sample.txt";

        try {
            txt = new Scanner(System.in);
            System.out.print("Please enter the text file name or type  'Default' for a default file. ");
            file = new File(txt.nextLine());

            txt = new Scanner(file);

            while (txt.hasNextLine()) {
                String line = txt.nextLine();
                System.out.println(line);
            }
            txt.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        try {
            txt = new Scanner(file);
            Scanner in = new Scanner(System.in);
            in.nextLine();
            System.out.print("Please enter a string to search for. Please do not enter a string longer than 16 characters. ");
            String wordInput = in.nextLine();

            //If too long
            if (wordInput.length() > 16) {
                System.out.println("Please do not enter a string longer than 16 characters. Try again. ");
                wordInput = in.nextLine();
            }

            //Search
            int count = 0;
            while (txt.hasNextLine()) //Should txt be in? 
            {
                String line = txt.nextLine();
                count++;
                if (line.contains(wordInput)) //count > 0
                {
                    System.out.println("'" + wordInput + "' was found " + count + " times in this document. ");
                    break;
                }
            //else
                //{
                //    System.out.println("Word was not found. ");
                //}
            }
        } catch (FileNotFoundException e) {
            System.out.println("Word was not found. ");
        }
    } //main ends
} //TextSearch ends

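Your problem is that you increment the count on every line, whether or not the word is present on it. In addition, you have no code that counts multiple matches within a single line.

Instead, use a regular-expression search to find the matches, and increment the count for each match found: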

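//Search (needs: import java.util.regex.Matcher; import java.util.regex.Pattern;)
int count = 0;
// LITERAL: treat wordInput as plain text, not as regex syntax
Pattern pattern = Pattern.compile(wordInput, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
while (txt.hasNextLine()) {
    Matcher m = pattern.matcher(txt.nextLine());

    // Loop through all matches
    while (m.find()) {
        count++;
    }
}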
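
As a self-contained illustration of the regex approach above, here is a minimal sketch that wraps the loop in a helper method. The class name RegexWordCount, the method countMatches, and the sample text are made up for this example and are not part of the original question. It reads from a string instead of a file so that it can be run as-is.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {

    // Hypothetical helper: counts every case-insensitive occurrence of
    // `word` in `text`, including occurrences inside longer words.
    static int countMatches(String text, String word) {
        if (word.isEmpty()) {
            return 0; // an empty pattern would match at every position
        }
        // LITERAL treats the input as plain text, so characters such as
        // '.' or '*' in the search string are not read as regex syntax.
        Pattern pattern = Pattern.compile(word, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        int count = 0;
        try (Scanner txt = new Scanner(text)) {
            while (txt.hasNextLine()) {
                Matcher m = pattern.matcher(txt.nextLine());
                // Each successful find() is one more match on this line
                while (m.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "The cat sat.\nThen the other cat sat there.";
        System.out.println(countMatches(sample, "the")); // prints 5
        System.out.println(countMatches(sample, "cat")); // prints 2
    }
}
```

Note that "the" is also counted inside "Then", "other" and "there". If only whole words should count, one option is to drop the LITERAL flag and compile "\\b" + Pattern.quote(word) + "\\b" instead.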

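Note: I'm not sure what you are using this for, but if you only need the functionality, you can combine the grep and wc (word count) command-line utilities. The original answer linked to a description of how to do this.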

Since the word does not have to stand alone, you can use a for loop with indexOf to count how many times your word occurs in each line.

public static void main(String[] args) throws Exception {
    String wordToSearch = "the";
    String data = "the their father them therefore then";
    int count = 0;
    for (int index = data.indexOf(wordToSearch); 
             index != -1; 
             index = data.indexOf(wordToSearch, index + 1)) {
        count++;
    }

    System.out.println(count);
}

Result:

6

So the search section of your code could look like this:

//Search
int count = 0;
while (txt.hasNextLine()) 
{
    String line = txt.nextLine();
    for (int index = line.indexOf(wordInput); 
             index != -1; 
             index = line.indexOf(wordInput, index + 1)) {
        count++;
    }        
}

System.out.println(count);
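
To try the indexOf loop outside the original program, the search section can be wrapped in a small runnable class. This is only a sketch: the class name IndexOfWordCount, the method countOccurrences, and the command-line usage are assumptions made for the example. It also adds a guard for an empty search string, because indexOf("") always succeeds and the loop would otherwise never end.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class IndexOfWordCount {

    // Hypothetical helper: counts case-sensitive occurrences of wordInput
    // across all remaining lines of txt, including overlapping ones.
    static int countOccurrences(Scanner txt, String wordInput) {
        if (wordInput.isEmpty()) {
            return 0; // indexOf("") never returns -1, so the loop would not end
        }
        int count = 0;
        while (txt.hasNextLine()) {
            String line = txt.nextLine();
            // Restart the search one character after each hit
            for (int index = line.indexOf(wordInput);
                     index != -1;
                     index = line.indexOf(wordInput, index + 1)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumed usage: java IndexOfWordCount.java <file> <word>
        // Without two arguments, a small built-in sample is searched instead.
        String word = args.length > 1 ? args[1] : "the";
        try (Scanner txt = args.length > 1
                ? new Scanner(new File(args[0]))
                : new Scanner("the their father them therefore then")) {
            System.out.println("'" + word + "' was found "
                    + countOccurrences(txt, word) + " times.");
        }
    }
}
```

Run without arguments, this prints 'the' was found 6 times. Unlike the regex version with CASE_INSENSITIVE, this search is case-sensitive, so "The" would not be counted when searching for "the".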