[FEATURE REQUEST] Add TopKFrequentWords algorithm in strings package #7298

@oleksii-tumanov

What would you like to Propose?

The proposed implementation returns the k most frequent words from an input array of strings.
https://en.wikipedia.org/wiki/Top-k_problem

Test Case (feature demonstration)
Input:

String[] words = {"the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"};
int k = 4;

Expected output:

["the", "is", "sunny", "day"]

Issue details

Algorithm details:

  • Input: an array of words (String[] words) and an integer k.
  • Output: a list of the k most frequent words.
  • Ordering:
    • Higher frequency first.
    • If frequencies are equal, lexicographically smaller word first.
  • Behavior:
    • If k == 0 or the input array is empty, return an empty list.
    • If k is greater than the number of unique words, return all unique words in ranked order.
  • Validation:
    • Throws IllegalArgumentException when:
      • words is null
      • k < 0
      • any element in words is null
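
The rules above could be sketched roughly as follows. This is only an illustration of the requested behavior, not the final implementation: the method name findTopKFrequentWords is hypothetical, and the sketch assumes that validation runs before the k == 0 shortcut, which the issue does not specify. It counts words in a HashMap and sorts all unique words; a bounded heap would be an alternative when k is much smaller than the number of unique words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class TopKFrequentWords {
    private TopKFrequentWords() {
    }

    // Hypothetical method name; the issue does not fix the signature.
    public static List<String> findTopKFrequentWords(String[] words, int k) {
        if (words == null) {
            throw new IllegalArgumentException("words must not be null");
        }
        if (k < 0) {
            throw new IllegalArgumentException("k must not be negative");
        }

        // Count occurrences, rejecting null elements along the way.
        // Assumption: null elements are rejected even when k == 0.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (word == null) {
                throw new IllegalArgumentException("words must not contain null");
            }
            counts.merge(word, 1, Integer::sum);
        }

        if (k == 0 || counts.isEmpty()) {
            return new ArrayList<>();
        }

        // Rank: higher frequency first, then lexicographically smaller word first.
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        // If k exceeds the number of unique words, return all of them.
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```

With the test case from this issue (k = 4), the sketch yields ["the", "is", "sunny", "day"], since the counts are the = 4, is = 3, sunny = 2, day = 1.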

Additional Information

No response
