Table of Contents
Problem Statement
The problem “Minimum sum of squares of character counts in a given string after removing k characters” states that you are given a string containing lower case characters only. You are allowed to remove k characters from the string such that in the remaining string the sum of squares of frequency of each character is minimum. Find that minimum value and print it.
Examples
str = "aabcd" k = 2
3
str = "aabbcc" k = 2
6
str = "aaaaabbxd" k = 1
22
Naive Approach
It is always optimal to remove the character with maximum frequency.
- In the given original string, count the frequency of all the characters and store it in freq array.
- Sort the freq array in decreasing order.
- Repeat step 4 ‘k’ times.
- Reduce the frequency of first element in the freq array by 1 and sort the freq array again.
- The minimum value is the sum of squares of values present in the freq array.
Time Complexity = O(n + k * c)
Space Complexity = O(1)
where c is constant which is equals to the number of unique characters in the string.
Code to find Minimum sum of squares of character counts in a given string after removing k characters
Java Code
import java.util.Arrays; import java.util.Collections; class MinimumSumOfSquaresOfCharacterCountsInAGivenStringAfterRemovingKCharacters { private static int minSumSquares(String str, int k) { char sArr[] = str.toCharArray(); int minSum = 0; // create a freq array, with initial value as 0 Integer freq[] = new Integer[26]; for (int i = 0; i < 26; i++) { freq[i] = 0; } // traverse in the original string and calculate the frequency of every character for (int i = 0; i < sArr.length; i++) { freq[sArr[i] - 'a']++; } // sort the array in descending order Arrays.sort(freq, Collections.reverseOrder()); // remove k characters for (int i = 0; i < k; i++) { // remove the character with maximum freq and reduce the freq freq[0]--; // sort the array in descending order Arrays.sort(freq, Collections.reverseOrder()); } // minSum is the sum of squares of all the elements in freq array for (int i = 0; i < 26; i++) { minSum += (freq[i] * freq[i]); } return minSum; } public static void main(String[] args) { // Example 1 String str1 = "aabcd"; int k1 = 2; System.out.println(minSumSquares(str1, k1)); // Example 2 String str2 = "aabbcc"; int k2 = 2; System.out.println(minSumSquares(str2, k2)); // Example 3 String str3 = "aaaaabbxd"; int k3 = 1; System.out.println(minSumSquares(str3, k3)); } }
3 6 22
C++ Code
#include<bits/stdc++.h> using namespace std; int minSumSquares(string str, int k) { int minSum = 0; // create a freq array, with initial value as 0 int freq[26]; for (int i = 0; i < 26; i++) { freq[i] = 0; } // traverse in the original string and calculate the frequency of every character for (int i = 0; i < str.length(); i++) { freq[str[i] - 'a']++; } // sort the array in descending order sort(freq, freq + 26, greater<int>()); // remove k characters for (int i = 0; i < k; i++) { // remove the character with maximum freq and reduce the freq freq[0]--; // sort the array in descending order sort(freq, freq + 26, greater<int>()); } // minSum is the sum of squares of all the elements in freq array for (int i = 0; i < 26; i++) { minSum += (freq[i] * freq[i]); } return minSum; } int main() { // Example 1 string str1 = "aabcd"; int k1 = 2; cout<<minSumSquares(str1, k1)<<endl; // Example 2 string str2 = "aabbcc"; int k2 = 2; cout<<minSumSquares(str2, k2)<<endl; // Example 3 string str3 = "aaaaabbxd"; int k3 = 1; cout<<minSumSquares(str3, k3)<<endl; return 0; }
3 6 22
Optimal Approach
The goal of removing the character with maximum frequency can be optimally achieved using a Priority Queue.
- Create a Priority Queue q, which is a max heap.
- In the original string, count the frequency of all the characters and make a priority queue of them.
- Repeat step 4 ‘k’ times.
- Remove the top of priority queue, reduce it by 1 and if it is not zero push it back to the priority queue.
- The minimum value is the sum of squares of values present in the priority queue.
Time Complexity = O(n + k * log(c))
Space Complexity = O(c)
where c is constant which is equals to the number of unique characters in the string. Since we are removing k elements from the priority queue. Insertion and removal from priority queue take O(log N) time. And thus a factor logN came. The space complexity is because we created a frequency array but since the size of frequency array is independent of the input. And thus space complexity is constant.
Code to find Minimum sum of squares of character counts in a given string after removing k characters
Java code
import java.util.Collections; import java.util.PriorityQueue; class MinimumSumOfSquaresOfCharacterCountsInAGivenStringAfterRemovingKCharacters { private static int minSumSquares(String str, int k) { char[] sArr = str.toCharArray(); // create a frequency array int freq[] = new int[26]; // traverse in the original string and count the frequency of each character for (int i = 0; i < sArr.length; i++) { freq[sArr[i] - 'a']++; } // create a priority queue and store the frequency of each character PriorityQueue<Integer> priorityQueue = new PriorityQueue<>(Collections.reverseOrder()); for (int i = 0; i < 26; i++) { if (freq[i] != 0) priorityQueue.add(freq[i]); } // remove k characters for (int i = 0; i < k; i++) { // remove the character with maximum frequency int curr = priorityQueue.poll(); // reduce the frequency curr--; if (curr != 0) priorityQueue.add(curr); } int minSum = 0; // min sum is the sum of squares of elements in priority queue for (Integer i : priorityQueue) { minSum += (i * i); } return minSum; } public static void main(String[] args) { // Example 1 String str1 = "aabcd"; int k1 = 2; System.out.println(minSumSquares(str1, k1)); // Example 2 String str2 = "aabbcc"; int k2 = 2; System.out.println(minSumSquares(str2, k2)); // Example 3 String str3 = "aaaaabbxd"; int k3 = 1; System.out.println(minSumSquares(str3, k3)); } }
3 6 22
C++ Code
#include<bits/stdc++.h> using namespace std; int minSumSquares(string str, int k) { int minSum = 0; // create a freq array, with initial value as 0 int freq[26]; for (int i = 0; i < 26; i++) { freq[i] = 0; } // traverse in the original string and calculate the frequency of every character for (int i = 0; i < str.length(); i++) { freq[str[i] - 'a']++; } // create a priority queue and store the frequency of each character priority_queue<int> pq; for (int i = 0; i < 26; i++) { if (freq[i] != 0) { pq.push(freq[i]); } } // remove k characters for (int i = 0; i < k; i++) { // remove the character with maximum frequency int curr = pq.top(); pq.pop(); // reduce the frequency curr--; if (curr != 0) { pq.push(curr); } } // min sum is the sum of squares of elements in priority queue while (!pq.empty()) { minSum += (pq.top() * pq.top()); pq.pop(); } return minSum; } int main() { // Example 1 string str1 = "aabcd"; int k1 = 2; cout<<minSumSquares(str1, k1)<<endl; // Example 2 string str2 = "aabbcc"; int k2 = 2; cout<<minSumSquares(str2, k2)<<endl; // Example 3 string str3 = "aaaaabbxd"; int k3 = 1; cout<<minSumSquares(str3, k3)<<endl; return 0; }
3 6 22