Service Mesh (ServiceMesh) and High-Concurrency, High-Availability Design Techniques


This post continues from the previous one, The Evolution of Internet Architecture.

1, ServiceMesh architecture design
2, High-availability design techniques
3, High-concurrency design techniques

First, I recommend a very useful tool for drawing software design diagrams: GitMind.


The picture only shows what a drawing made with GitMind looks like; please ignore its content.

Concise, clear and effective.

The main text begins.

ServiceMesh Architecture Design

Sinking common business logic

Business programs should not have to care about communication components, but today these are imported as jar packages.
Infrastructure components are hard to upgrade:
every service component version upgrade
requires the business teams to replace the jar package,
which limits the infrastructure team's delivery capability and speed.
Communication between multiple programming languages is a problem:
every language needs its own implementation of the infrastructure, which is costly,
because the communication component is coupled with the application.

The business R&D team and the infrastructure team must be physically decoupled.

One set of infrastructure supports development in multiple languages.
Applications can be written in any of several languages.
Infrastructure capabilities sink from the application into a separate process.

Service mesh

1, An independent process
2, Handles communication between services
3, Both stateful and stateless services will run on cloud-native platforms (Docker, k8s) in the future
4, A lightweight network proxy
5, Deployed alongside the application and transparent to it

Service mesh architecture

The sidecar is an RPC service.

Application A -> sidecar A -> sidecar B -> application B
The reverse direction works the same way.

Every hop communicates over TCP.
The data protocol is pb (Protocol Buffers).

Whatever language the application is written in, a single sidecar implementation is enough.

Upgrading the sidecar does not depend on the business teams.
The business teams can iterate quickly.

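Why the sidecar must be deployed together with the application

They run on the same physical machine or in the same k8s pod.
The local hop between them is lightweight and based on TCP.
On this hop there is no need to consider load balancing, retries, service registration, a configuration center, or routing.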

The earliest ServiceMesh

Case 1 - Baidu Space

It used the pull model.

Baidu Space had a large volume of data and its data-consistency requirements were not strict, so it used an asynchronous architecture.

Case 2 - Social IM

The PC desktop client uses a long-lived TCP connection.

The web client cannot use a long-lived TCP connection,
so it simulates one with long polling over HTTP.

As with live streaming, the difficulty is withstanding a burst of simultaneous requests.

It is a synchronous architecture, because messages are real-time.
The routing layer is unique to IM and can be ignored here.

Horizontal split

At the very beginning,
the business is simple and the team is small, so keeping all business logic in one process is easy to maintain.
The granularity is coarse.

Vertical split

Extract a common logic layer:

1, Componentization: package it as a jar
2, Servitization: sink it into an independent service that provides a compatible interface

Then continue to split horizontally.

Internet core technology practice

1, High-availability design techniques
2, High-concurrency design techniques (at the architecture, code, and algorithm levels)
3, Stateless services (one of the high-availability techniques)
4, Load balancing (common load-balancing algorithms, and load balancing in a broader sense)
5, Service idempotency (relies on distributed locks)
a A request is repeated several times; the service guarantees that the final result is exactly the same
b Only one item is left in stock and several users try to buy it at the same time; the item must not be oversold
c A user places an order; until that order is finished, a repeat order is not allowed
d A message is delivered to the MQ several times; the downstream consumer must deduplicate it
6, Distributed transactions
7, Service degradation, rate limiting, and circuit breaking
8, Grayscale release
9, Full-link stress testing of services
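
Case d under item 5 (a message delivered several times must take effect only once) can be sketched as below. This is a minimal single-process illustration, not a production design: the class name IdempotentConsumer, the handle method, and the in-memory set are assumptions made for the sketch, and the local lock only stands in for the distributed lock the list mentions.

```python
import threading


class IdempotentConsumer:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self._seen = set()              # IDs of messages already applied
        self._lock = threading.Lock()   # stands in for a distributed lock
        self.balance = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        with self._lock:
            if message_id in self._seen:
                return False
            self._seen.add(message_id)
            self.balance += amount
            return True


consumer = IdempotentConsumer()
# The same message is delivered three times; only the first delivery counts.
results = [consumer.handle("msg-1", 10) for _ in range(3)]
```

In a real deployment the set of processed IDs would have to live in shared storage so that every consumer instance sees it.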

High Availability Design

Hardware always has a life cycle

An x86 server with 32 CPUs, 128 GB of memory, a 10-gigabit network card, and a 1 TB hard disk costs about 50,000
and is basically used for 3 years.

The chance of downtime grows with the number of servers: the more machines, the more likely one of them is down.

CAP in distributed systems

A standalone machine is CA: there is no network partition.

Distributed systems are mostly AP or CP: with many machines and disks in the data center, the system must stay available when the network partitions.

Software always has bugs

Evaluation dimensions

  • Unscientific (because traffic has peaks and troughs)
Percentage of downtime per year / quarter / month
Two nines (99%) allows 365*24*1% = 87.6, about 88 hours of downtime per year

One nine is 90%
365*24*10% = 876, about 880 hours of downtime per year
  • Scientific
Requests affected by downtime / total requests
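
The two ways of measuring availability can be compared with a few lines of arithmetic. The function names and the request counts below are made up for illustration; only the formulas come from the text above.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability):
    """Allowed downtime per year, in hours, for a time-based availability."""
    return HOURS_PER_YEAR * (1 - availability)


def request_availability(total_requests, failed_requests):
    """Request-based availability: the share of requests not hit by downtime."""
    return 1 - failed_requests / total_requests


two_nines = downtime_hours_per_year(0.99)    # about 87.6 hours
one_nine = downtime_hours_per_year(0.90)     # about 876 hours
# Hypothetical outage that hit 50,000 of 1,000,000 requests.
peak_outage = request_availability(1_000_000, 50_000)
```

The request-based figure weighs an outage by the traffic it actually hit, which is why it is the more meaningful one when traffic has peaks and troughs.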

Redundancy: deploy multiple copies

Deploy the copies of a service in different cabinets and different racks.

Stateless (all instances fully equivalent), for rapid scale-out and elastic scale-in

For example, two machines that each hold their own full session data are not stateless.

If one instance crashes and restarts, its sessions are gone, so the instances are no longer equivalent.

Load balancing

Gateway -> business logic layer instance 1 and business logic layer instance 2


If business logic instance 1 goes down:
the load-balancing component detects that 1 is down,
removes 1 from rotation,
sends the traffic to 2,
and adds 1 back once it has recovered.
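
The kick-out and restore sequence can be sketched with a small round-robin balancer. This is an illustrative sketch only: the class, its method names, and the backend names "logic-1" and "logic-2" are assumptions, and a real load balancer would detect failures through health checks rather than explicit mark_down calls.

```python
class RoundRobinBalancer:
    """Round-robin over healthy backends; unhealthy ones are skipped."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is left."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backend available")


lb = RoundRobinBalancer(["logic-1", "logic-2"])
before = [lb.pick() for _ in range(2)]      # traffic alternates between 1 and 2
lb.mark_down("logic-1")                     # 1 is down: kick it out
during = [lb.pick() for _ in range(3)]      # everything goes to 2
lb.mark_up("logic-1")                       # 1 recovered: restore it
after = sorted(lb.pick() for _ in range(2))
```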

Asynchrony is a technique for both high concurrency and high availability

Use it when the caller does not care much about the returned result
and the call is not on the critical path of the request.

Core flows are synchronous; non-core flows are asynchronous.

High availability is not only about the service layer;
the data layer must be highly available too.

Real-time service monitoring

How real-time monitoring is implemented:

It is based on logs.
Write a log record every 5 seconds.
Record the elapsed time in the log.
Write the log to local disk.
Collect the data with Flume.
Send it to a Kafka message queue.
Then compute the latency statistics in real time with Spark.

Service tiering: reducing and avoiding service failures

How to stop online services gracefully

Goal: stopping the service does no harm to users;
that is, if your request was accepted, it will be processed 100%.

New requests are rejected at the gateway layer.

The gateway's hot switch

Flipping the hot switch

For example, to stop at 8:00:
all requests that arrived before 8:00 are processed;
for requests after 8:00 the switch goes from 0 to 1 and all of them are rejected.

When can we be sure that the requests from before 8:00 have been processed?

1, Inelegant: check whether each layer is still writing logs. A layer that is still logging is still processing.
2, Elegant: front-end requests have a timeout.
If the timeout is 5 seconds,
every request from before 8:00 has either finished or timed out by 8:00:05.

Stop the services layer by layer, from the upper layer down to the lower layer.
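
The gateway switch and the timeout-based wait can be sketched as follows. This is a minimal in-process model, not a real gateway: the Gateway class, its method names, and the in-flight counter are assumptions made for the sketch, with the 0/1 switch value and the 5-second timeout taken from the example above.

```python
import threading


class Gateway:
    """Gateway with a hot switch: 0 accepts requests, 1 rejects them."""

    def __init__(self):
        self._switch = 0
        self._in_flight = 0
        self._cond = threading.Condition()

    def try_accept(self):
        """Admit a request unless the switch is on."""
        with self._cond:
            if self._switch == 1:
                return False
            self._in_flight += 1
            return True

    def finish(self):
        """Called when an admitted request has been fully processed."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def stop(self, request_timeout_seconds):
        """Flip the switch, then wait at most one request timeout for draining."""
        with self._cond:
            self._switch = 1
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=request_timeout_seconds
            )


gateway = Gateway()
accepted_before = gateway.try_accept()           # arrives before the stop time
worker = threading.Timer(0.05, gateway.finish)   # that request finishes shortly after
worker.start()
drained = gateway.stop(request_timeout_seconds=5)
accepted_after = gateway.try_accept()            # arrives after the switch flipped
worker.join()
```

Once stop returns, the layers behind the gateway can be shut down in order, since nothing new can reach them.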

Without a hot switch

Configure the machine's firewall so that, from a given point in time, traffic can only go out and not come in.

Request flow

High Concurrency Design

Reduce latency
Increase throughput
Keep the system load at a reasonable level

Trading space for time

Cache database data in memory,
because reading from the database is more expensive than reading from memory.

Trading time for space

Compress HTTP traffic with gzip before sending it over the network; compressing and decompressing costs CPU time.
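
The time-for-space trade can be seen with Python's standard gzip module. The payload below is invented for illustration; real savings depend on how compressible the response body is.

```python
import gzip

# A repetitive JSON-like payload, standing in for an HTTP response body.
payload = b'{"category": "phones", "items": []}' * 200

compressed = gzip.compress(payload)      # costs CPU time on the sender
restored = gzip.decompress(compressed)   # costs CPU time on the receiver

ratio = len(compressed) / len(payload)   # fraction of the original size sent
```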

Data that rarely changes

1, Shopping categories on an app page (for example, level 1: phones; level 2: computers)
change rarely, so do not pull them on every login.
Use a version number to tell whether they were updated, and download only when they were.

2, The friend list changes infrequently, so pulling it on every login is wasteful.
The list data has a version number, kept by both the server and the client.
Check whether the version number has changed, and pull once only if it has.
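
A sketch of the version-number check, with both sides modeled in one process. CategoryServer, CategoryClient, and fetch_if_newer are hypothetical names; the only idea taken from the text is that the client sends its version and downloads the list only when the server's version differs.

```python
class CategoryServer:
    """Holds the category list and a version number bumped on every change."""

    def __init__(self, categories):
        self.version = 1
        self._categories = list(categories)

    def update(self, categories):
        self._categories = list(categories)
        self.version += 1

    def fetch_if_newer(self, client_version):
        """Return (version, data); data is None when the client is up to date."""
        if client_version == self.version:
            return self.version, None
        return self.version, list(self._categories)


class CategoryClient:
    def __init__(self, server):
        self._server = server
        self.version = 0        # 0 means nothing cached yet
        self.categories = []
        self.downloads = 0

    def login(self):
        version, data = self._server.fetch_if_newer(self.version)
        if data is not None:
            self.version, self.categories = version, data
            self.downloads += 1


server = CategoryServer(["phones", "computers"])
client = CategoryClient(server)
client.login()                  # first login: download
client.login()                  # unchanged: no download
server.update(["phones", "computers", "tablets"])
client.login()                  # version changed: download again
```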

Which requests take the most time

1, Suppose a service cluster handles 40,000 QPS.
Optimize the top 5 interfaces, which account for 90% of the traffic.

The remaining QPS cannot be ignored either: slow queries there must be fixed too.

2, Check how many RPC requests one service call makes.
Core steps such as data and algorithm calls run synchronously.

Non-core steps run asynchronously,
split out into separate modules, for example behind a message queue.

3, One piece of logic calls several RPC interfaces.
If there are no data dependencies between the interfaces, consider calling them in parallel.
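
Item 3 can be sketched with a thread pool. The fake_rpc function and the three service names are placeholders; any RPC client whose calls are independent of each other could be substituted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_rpc(name, delay=0.05):
    """Stand-in for a remote call; sleeps to imitate network latency."""
    time.sleep(delay)
    return f"{name}-result"


def call_in_parallel(names):
    """Issue independent calls concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_rpc, names))


results = call_in_parallel(["user", "inventory", "coupon"])
```

Called one after another, the three requests would take roughly the sum of their latencies; in parallel the total is close to the slowest single call.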

Optimization levels

Code level

1, Do not make RPC requests inside a loop.
   Call a batch interface instead and assemble the data afterwards.

2, Avoid creating too many useless objects. For example, check isDebugEnabled() before building the message for log.debug(), so that no message objects are created when debug logging is off.

3, Is the initial capacity of ArrayList and HashMap set reasonably?
Resizing is expensive.
For example, initialize the size to 1 million directly if the business volume calls for it. It uses some memory, but performance is guaranteed.

4, Reuse data already fetched from RPC interfaces instead of querying it again.

5, Decide whether to work on a copy of the data or modify it in place.
For read-heavy, write-light data use CopyOnWriteArrayList.

6, With capacity pre-allocated, StringBuilder performs about 15 times better than String concatenation.

7, Is the data initialized at the right time? Globally shared data should use eager ("hungry") initialization,
   that is, it is initialized before any user access.
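
Item 1 of this list (batch interface instead of RPC calls in a loop) is illustrated below by counting round trips. UserService is a stand-in written for this sketch, not a real client; the point is only that the batch call pays the network cost once.

```python
class UserService:
    """Stand-in for a remote service that counts round trips."""

    def __init__(self, users):
        self._users = users
        self.round_trips = 0

    def get_user(self, user_id):
        self.round_trips += 1
        return self._users[user_id]

    def get_users(self, user_ids):
        self.round_trips += 1      # one round trip for the whole batch
        return {uid: self._users[uid] for uid in user_ids}


users = {i: f"user-{i}" for i in range(100)}
ids = list(range(100))

looped = UserService(users)
names_looped = [looped.get_user(uid) for uid in ids]      # 100 round trips

batched = UserService(users)
by_id = batched.get_users(ids)                             # 1 round trip
names_batched = [by_id[uid] for uid in ids]                # assemble locally
```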


Database level

1, For a status value that fits within 255, use unsigned tinyint; store an IP as int instead of varchar

2, Where you would use enum, use tinyint instead, because extending an enum requires a table change

3, Prohibit select *; it wastes IO, memory, CPU, and network

4, Analyze the query scenarios and build appropriate indexes
Analyze field selectivity and index length; use a prefix index on long varchar fields

5, Declare fields NOT NULL

Fields that allow NULL need extra storage space to handle NULL
and are hard to optimize

The goal is to reduce the server's CPU usage, IO traffic, memory usage, and network consumption, and to shorten response time.
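
Storing an IP as an integer, as item 1 suggests, needs a conversion on the way in and out. The sketch below uses Python's standard ipaddress module and assumes IPv4 and an unsigned 32-bit column; the helper names are made up for the example.

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted IPv4 string to the unsigned 32-bit integer to store."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value):
    """Convert the stored integer back to dotted notation."""
    return str(ipaddress.IPv4Address(value))


stored = ip_to_int("192.168.1.1")   # 3232235777
shown = int_to_ip(stored)
```

The integer takes 4 bytes, while the text form needs up to 15 characters.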

The principle of locality

A two-dimensional array is stored as a one-dimensional array in memory.

The first snippet (traversal by row) takes 140 ms.
The second snippet (traversal by column) takes 2700 ms.

The closer to the CPU, the faster.

1, Speed increases along the multilevel cache hierarchy: memory -> L3 -> L2 -> L1

2, Memory is one large one-dimensional array.
A 2D array is laid out in memory row by row:
row a[0] is stored first, then row a[1].

3, Traversal by row follows the principle of locality, so the cache hit rate is high

4, Traversal by column: consecutively accessed elements are not contiguous in memory,
   which is likely to cause cache misses

5, On a cache miss the CPU has to load the data from main memory into the L1 cache, which is much slower
(main memory about 100 ns, L1 cache about 0.5 ns)

6, Use a cache wherever you can, whether a local cache or a distributed cache

7, Data that is accessed frequently and has low freshness requirements is well suited to caching, for example ad slots.
   Data with high freshness requirements raises cache-consistency issues and is not suited to caching, for example transaction data.
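
The row-by-row layout from items 2 to 4 can be made concrete by computing where each element sits in the underlying one-dimensional array. This sketch only models memory offsets; it does not measure timings, and the function names are chosen for the example.

```python
def offset(i, j, cols):
    """Position of a[i][j] in the flat, row-by-row layout."""
    return i * cols + j


def row_order(rows, cols):
    """Offsets visited when traversing by row."""
    return [offset(i, j, cols) for i in range(rows) for j in range(cols)]


def column_order(rows, cols):
    """Offsets visited when traversing by column."""
    return [offset(i, j, cols) for j in range(cols) for i in range(rows)]


def strides(offsets):
    """Distance between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


by_row = row_order(2, 3)            # [0, 1, 2, 3, 4, 5]
by_column = column_order(2, 3)      # [0, 3, 1, 4, 2, 5]
row_strides = strides(by_row)       # always 1: neighbours in memory
column_strides = strides(by_column) # jumps of a whole row
```

With a stride of 1 each cache line fetched from memory is fully used; with a stride of one row, a large array touches a new cache line on almost every access.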

Code logic needs to adapt as the data changes

1, explain: shows the SQL execution plan
2, possible_keys: idx_addtime
key: NULL means no index was used

Once a query reads more than about 30% of the table's rows, the index is skipped and a full table scan is done.

Report queries

Compute only the incremental data and merge it with the previously computed results.
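
A minimal sketch of incremental report computation. It assumes rows carry an increasing ID so that new rows can be told apart from already merged ones; the IncrementalReport class and the row format are assumptions for the example.

```python
from collections import Counter


class IncrementalReport:
    """Keeps running totals and only processes rows it has not seen yet."""

    def __init__(self):
        self.totals = Counter()
        self.last_id = 0          # highest row id already merged

    def refresh(self, rows):
        """rows: iterable of (row_id, key, amount), in increasing row_id order."""
        processed = 0
        for row_id, key, amount in rows:
            if row_id <= self.last_id:
                continue          # already merged in an earlier run
            self.totals[key] += amount
            self.last_id = row_id
            processed += 1
        return processed


orders = [(1, "phones", 100), (2, "books", 20)]
report = IncrementalReport()
first = report.refresh(orders)        # merges both rows
orders.append((3, "phones", 50))
second = report.refresh(orders)       # merges only the new row
```

Against a real database the filter would be pushed into the query (rows with an ID greater than last_id), so old rows are never read at all.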

Concurrency and lock optimization

CAS-based lock-free designs (reads take no lock; only writes are guarded) perform better than a mutex (both reads and writes take the lock).

Case 1 - E-commerce flash sale (seckill) system

Validate data layer by layer.
The upper layers filter out as many invalid requests as possible;
this filtering can be inexact.
Apply rate limiting at every layer; the last layer does the data-consistency check and deducts the inventory.

Funnel pattern

1, Static data: HTML, JS, and CSS static files go on the CDN and are cached on the client (app/browser)

2, Non-real-time dynamic data: cached at a point on the access path close to the user (product title, product description, whether the user is eligible for the flash sale, whether the flash sale has ended)

3, Real-time data: marketing data (red envelopes, discounts) and product inventory, used to filter out users

How to make sure you do not oversell

A DB transaction guarantees consistency.
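
One common way to let the database prevent overselling is a conditional update inside a transaction. The sketch below uses Python's built-in sqlite3 with an in-memory database; the stock table, its columns, and try_buy are assumptions for illustration, not the schema of any real system.

```python
import sqlite3


def try_buy(conn, product_id):
    """Deduct one unit inside a transaction; return True only if stock was left."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE stock SET quantity = quantity - 1 "
            "WHERE product_id = ? AND quantity > 0",
            (product_id,),
        )
        return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 1)")   # only one item left
conn.commit()

attempts = [try_buy(conn, 1) for _ in range(3)]   # three buyers, one item
remaining = conn.execute(
    "SELECT quantity FROM stock WHERE product_id = 1"
).fetchone()[0]
```

Because the check and the deduction happen in one statement, the quantity can never go below zero, however many buyers arrive.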

Case 2 - Feed system

Hot data is cached at a point on the call path closer to the user.

1, Memory stores the most active data

2, The L1 cache is small and is responsible for absorbing the hottest data.
The L2 cache is designed for capacity and caches a wider range of data,
such as ordinary users' timelines.
Very hot data is cached separately. For example, set up a whitelist so that big-V (high-profile verified) users' data is cached separately.

3, The first 3 pages of the feed take 97% of accesses, so the first few pages are cached in the L1 cache as hot data

4, The business logic layer often keeps some caches of its own for hot data, such as the IDs of big-V users

Push mode

With push, push only to active users.
For example, with 10,000 users to push to, the business logic layer splits them into batches of 100 users
and pushes the 100 batches in parallel.

Optimize by strategy

Active users first

How to distinguish active users?

Keep an active-user list with a length of 1 million.
When a user comes online, add the user to the list; when the user goes offline, delete the user.

Weibo data storage solution

1, Pika: persistent key-value storage
2, Object storage: Ceph or FastDFS

WeChat Moments combines push and pull

1, The new-message reminder on the Discover tab is push
2, Tapping to open Moments is pull

Weibo: logic for displaying the latest data

Say you have 500 friends. Take the 100 latest posts from each, 50,000 posts in total, and sort them in reverse chronological order in the business logic layer.
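
The merge step can be sketched as below. The text only says the posts are sorted in reverse chronological order; using heapq.merge on per-friend lists that are already newest-first is one possible way to do it and is an assumption of this sketch, as are the post tuples and the latest_feed name.

```python
import heapq
from itertools import islice


def latest_feed(timelines, limit):
    """Merge per-friend timelines (each newest first) into one newest-first page."""
    merged = heapq.merge(*timelines, key=lambda post: post[0], reverse=True)
    return list(islice(merged, limit))


# Each post is (timestamp, author, text); every friend's list is newest first.
alice = [(105, "alice", "a3"), (101, "alice", "a2"), (90, "alice", "a1")]
bob = [(103, "bob", "b2"), (99, "bob", "b1")]

page = latest_feed([alice, bob], limit=3)
timestamps = [post[0] for post in page]
```

Since the merge is lazy, taking only the first page does not require sorting all 50,000 posts.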

Websocket and long polling solve the same problem

Web pages cannot open raw TCP connections.
Long polling simulates a long-lived connection with repeated HTTP requests.
Websocket starts with an HTTP handshake and then upgrades the connection to a persistent, full-duplex channel over TCP, so it does not need to poll.


I will continue to share ServiceMesh practices in future posts.


Posted by Dilbert137 on Fri, 03 Jun 2022 07:39:12 +0530