Algorithm Basics - Discretization

discretization

This refers specifically to the discretization of integers: a sequence with a relatively scattered distribution is mapped to a centralized sequence (the span of the numbers in the sequence is relatively large but sparse), such as:

a[]  1    3    100  2000 500000
     ↓    ↓    ↓    ↓    ↓
b[]  0    1    2    3    4

The following problems occur:

  1. There may be duplicate elements in a[] (de-duplication)
  2. How to calculate the value of a[] after discretization

Solution:

vector<int> alls;//Store all values ​​to be discretized
sort(alls.begin(),alls.end())//sort all values
alls.erase(unique(alls.begin(),alls.end()),alls.end())//remove duplicate elements

//Divide to find the discretized value corresponding to x
int find(int x)
{
    int i=0, r=alls.size()-1;
    while(l<r)
    {
        int mid=l+r>>1;
        if(alls[mid]>=x) r=mid;
        else l=mid+1;
    }
    return r+1;
}

interval sum

Topic description

Suppose there is an infinite number line, and the number on each coordinate on the number line is 0.

Now, we first perform n operations, each operation adding c to the number at a certain position x.

Next, make m queries, each query contains two integers l and r, and you need to find the sum of all numbers in the interval [l,r].

input format

The first line contains two integers n and m.

The next n lines each contain two integers x and c.

Next m lines, each containing two integers l and r.

output format

There are m lines in total, and each line outputs the sum of numbers in the interval sought in the query.

data range

−109≤x≤109,
1≤n,m≤105,
−109≤l≤r≤109,
−10000≤c≤10000

input sample

3 3
1 2
3 6
7 5
1 3
4 6
7 8

Sample output

8
0
5

answer

The range of numbers is [−109, 109], so if you use violence, it will explode directly. Since the number of operations is 1e5, in fact, there are only 1e5 numbers on the number line that need to be operated at most, but the range of the numbers is scattered in the entire number line range [−109, 109], so consider first performing the operation on the data given by the operation. Discretization processing, which can greatly reduce the range that needs to be traversed. Then add the c added in each operation to the processed data, and use the prefix sum to complete the query after all processing is completed.

So now there are three problems to solve:

  1. How to achieve discretization
  2. How to deal with the coincidence of elements in the original array during the discretization process
  3. How to locate the position of the original element in the new discretized array

First deal with the first problem: in the process of discretization, we often use vector dynamic arrays to store data. We store the data to be processed in the add dynamic array, each group of data is stored in the form of a pair, first stores the number x that needs to be operated, and the number c to be added to second. The data to be queried is stored in the query dynamic array in the same form, the first stores the left endpoint l of the interval, and the second stores the right endpoint r of the interval. Use int type dynamic array to store all the numbers to be discretized. Put all x, l, r into the all dynamic array on input for the next discretization. Finally, the a array is used to represent the result after all processing, and the s array is used to represent its prefix sum.

Next, deal with the second problem: delete the duplicate elements in the all array, using the unique and erase functions.

The last question: We use the binary search method to find the position of the number x to be processed in the all array, and assign its value to the result array a. (Because the all dynamic array has been re-operated, the index of the last a array is the same as that of all.)

Finally, the prefix and operation are completed.

code

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
const int N = 3e5 + 7;
int a[N], s[N];
vector<int> all;
vector<pair<int, int>> add, query;
int find(int x)
{
	int l = 0, r = all.size() - 1;
	while (l < r)
	{
		int mid = l + r >> 1;
		if (all[mid] >= x) r = mid;
		else l = mid + 1;
	}
	return r + 1;
}
int main(void)
{
	int n, m;
	cin >> n >> m;
	while (n--)
	{
		int x, c;
		cin >> x >> c;
		add.push_back({ x, c });
		all.push_back(x);
	}
	while (m--)
	{
		int l, r;
		cin >> l >> r;
		query.push_back({ l, r });
		all.push_back(l);
		all.push_back(r);
	}
	//deduplication
	sort(all.begin(), all.end());
	all.erase(unique(all.begin(), all.end()), all.end());
	//insert
	for (auto i : add) a[find(i.first)] += i.second;
	//Preprocessing prefix and
	for (int i = 1; i <= all.size(); ++i) s[i] = s[i - 1] + a[i];
	//handle inquiries
	for (auto i : query) cout << s[find(i.second)] - s[find(i.first) - 1] << endl;
	return 0;
}

TIP

The unique() function is a deduplication function. The function of unique in STL is to remove adjacent duplicate elements (retain only one).

There is another feature that is easy to ignore: it does not really delete the duplicate elements, but moves the duplicate elements to the back, still saves them in the original array, and finally returns the address of the last element after deduplication.

It is a function in c++, so #include<iostream> and #include<algorithm> should be added to the header file. The specific usage is as follows:

Because unique removes adjacent duplicate elements, it is generally sorted before use.

Note: The unique function does not really delete elements, so it is generally used in conjunction with the erase member function or the resize member function.

Tags: Algorithm C++ data structure

Posted by grumpy on Fri, 04 Nov 2022 02:38:59 +0530