Android WebSocket State Management Framework - WebSocketGo

Reading this article takes Integer.MAX_VALUE minutes.

story background

The main business of the company where the author is located is smart home, and the author is responsible for the development of Android App in the company. About smart home, it is estimated that 80 to 90% of children's shoes have heard of it, but it is estimated that the majority of people who really understand or use it. This article does not talk about industry prospects, only technology.

In order to make it easier for everyone to understand the background of the story, by the way, let's learn about smart home, and if you want to go straight to the WebSocket theme, you can go directly to the second chapter.

Smart home is a typical application scenario of the Internet of Things. What is the Internet of Things? It literally means connecting many objects to form a network. The English is Internet of things (IoT). As small as your mobile phone and Bluetooth headset, as large as every corner of a city, the ultimate form is "Internet of Everything". (Why do you suddenly think of Wanfo Chaozong -_-!)

IoT actually doesn't care about the protocol of the network, nor what it is connected to. As far as this form itself is concerned, it is IoT. In fact, this concept has been around for a long time. When I was in college, I was not afraid of revealing my age. It is said that I heard the word Internet of Things in 2009-10. Do you still remember the GPRS intelligent meter reading system in the experimental class? You may not think that it is very tall. Isn’t it just inserting a SIM card into the water meter and sending the value to the client regularly, so that you are no longer afraid of being visited by someone to check the water meter~ No Wrong, this is also a manifestation of IoT.

But for so many years, the Internet of Things has been tepid. As for why it has been proposed again in recent years, smart home plays a very important role in it. In addition, the fire of AI has also played a role in fueling the flames, and any product must be added with the word "intelligent". So there is AI + IoT, which is AIOT. I won't go into details in detail, and I can write a separate article to introduce the Internet of Things in detail in the future.

Speaking of smart home, smart home is a family-based Internet of Things. Simply put, you can control the lights, sockets, monitors, electrical appliances, etc.

Words count as a ball, a diagram is a thousand worries, and the architecture diagram is above

To briefly explain, the role of the gateway is to connect all the devices in the house. The gateway and the device generally do not use HTTP (except for individual items), but use near-field communication protocols such as zigbee and lora (or directly use wired methods). More stable), because the power consumption is low and it is relatively power-saving, you think that if you install dozens of switch panels in your home, you can save a dozen or twenty dollars in electricity bills every month. Of course, the most important reason is because the program is relatively mature. Therefore, before using this type of device, you need to have a "networking" operation to group the device to the gateway. If disconnected, the gateway considers the device offline.

The gateway will report the status of the device to the cloud platform, and the cloud platform will send the status to the client. Again, client control of the device is a reverse process. Of course, if 5G is popularized in the future, the device will directly connect to the cloud platform through the 5G network, because 5G has low power consumption, low latency, and fast speed, so there is no need for a gateway.

The overall architecture is relatively simple. Of course, there are a lot of complex logics in the middle, such as equipment status, personnel, permissions, house management, etc., which are not concerned here.

Push is involved when the server sends the device status. Because the smart home has the concept of scenario mode, such as going home mode, executing a request and turning on more than a dozen lights, it is not realistic to judge the status by the return value of the request, and you can only rely on push.

Due to the particularity of the business, the push of smart home needs to be highly real-time. For example, if the user turns on the light (whether it is turned on on the app or directly), the status of the light on the app needs to be turned on immediately. If you wait 3 or 5 seconds before the state changes, such a user experience is not good.

We first used a certain optical push, and found that the delay was serious, and sometimes it took more than ten seconds to receive the push. After all, the applicable scenarios are different after all. So I decided to build a long connection by myself and push it. After selection, we decided to use the protagonist of this article, WebSocket.

Existing Issues with WebSocket State Management

There are many ready-made WebSocket connection libraries on the market, the more famous ones are Java-WebSocket, and OkHttp also comes with WebSocket support.

At first, OkHttp was used directly because OkHttp was already connected to the project. The way to use it is very simple. Children who are familiar with OkHttp should understand it very well.

OkHttpClient client = OkHttpClient.Builder().build();
Request request = new Request.Builder().build();
client.newWebSocket(request, new WebSocketListener() {
    public void onOpen(okhttp3.WebSocket webSocket, Response response) {}

    public void onMessage(okhttp3.WebSocket webSocket, String text) {}

    public void onClosed(okhttp3.WebSocket webSocket, int code, String reason) {}

    public void onFailure(okhttp3.WebSocket webSocket, Throwable t, Response response) {}
copy code 

It is convenient to call, and the callback status is also very clear. Java-Websocket is almost similar, but in general there are the following problems:

1. Heartbeat mechanism

Both OkHttp and Java-WebSocket have a heartbeat mechanism. However, the heartbeat interval of OkHttp, that is, pingInterval, is fixed when the client is created, and it does not support midway adjustment. (Jake also replied to related questions on github, which means that the caller does not need to pay attention to these. But there is often such a demand 😢😢😢)

For example, when the application is in the foreground, the heartbeat may be a little more frequent, but when the application is in the background, for power saving optimization, the heartbeat interval can be set a little longer. If you need to adjust the heartbeat, you need to create a new client, disconnect the old one, and reconnect. There are two strategies:

  • Break the old first, then connect the new. If there is no caching strategy for messages on the server side, messages may be missed between disconnections and new connections.
  • Connect the new first, then break the old. This will have two connections connected to the server in an instant, and the server may have an exception to the status of the message.

The above two strategies require the client and the server to agree first, and the server does additional processing, which increases the cost of research and development, and also increases the probability of errors.

2. Disconnect and reconnect

Usually the connection is broken in the following situations:

  • The client actively disconnects (no need to reconnect)
  • The server is actively disconnected (no need to reconnect)
  • The client is not active due to network reasons, such as the mobile phone disconnecting from the network (reconnecting is required)
  • Carrier disconnection due to idle connection (reconnect required)

For reconnection, another issue that needs to be considered is the reconnection time interval. If the mobile phone network is disconnected, it will also fail to reconnect continuously. Therefore, you need to formulate a relationship between the number of reconnections and the time interval.

In short, the logic and strategy of reconnection need to be maintained by ourselves.

3. Multithreading

The state callback in OkHttp occurs in the child thread created by OkHttp, and we need to initiate a reconnection according to the state. At this time, if our application switches between the front and the back and needs to create a new connection, various connections (reconnections) are intertwined, it will be a mess. Issues to consider are:

  • When multiple connections are initiated at the same time, it is necessary to wait for the execution of the previous connection request to complete, and then initiate the next connection request according to the request. If the previous connection is connected, the next waiting connection request can be skipped directly.
  • Connection requests occur in different threads, state synchronization


Because OkHttp is used in the author's project, the initial solutions are all for OkHttp. The general idea is as follows:

1. Heartbeat mechanism

1.1 OkHttp WebSocket Heartbeat Analysis

As mentioned earlier, OkHttp does not support dynamic adjustment of the heartbeat, so how does OkHttp maintain the heartbeat? Let's analyze its source code:

To analyze the source code, we start by calling the entry client.newWebSocket:


 * Internally use the RealWebSocket class to initiate a connection
@Override public WebSocket newWebSocket(Request request, WebSocketListener listener) {
    // Note the pingInterval here, so it is fixed when RealWebSocket is created and cannot be modified.
    RealWebSocket webSocket = new RealWebSocket(request, listener, new Random(), pingInterval);
    return webSocket;
copy code


public RealWebSocket(Request request, WebSocketListener listener, Random random, long pingIntervalMillis) {
    this.originalRequest = request;
    this.listener = listener;
    this.random = random;
    this.pingIntervalMillis = pingIntervalMillis;
    // ...
    // The writerRunnable here is used to send messages to the server, which will be detailed later
    this.writerRunnable = new Runnable() {
      @Override public void run() {
        try {
          while (writeOneFrame()) {
        } catch (IOException e) {
          failWebSocket(e, null);
  public void connect(OkHttpClient client) {
    // WebSocket protocol related header s
    client = client.newBuilder()
    final Request request = originalRequest.newBuilder()
        .header("Upgrade", "websocket")
        .header("Connection", "Upgrade")
        .header("Sec-WebSocket-Key", key)
        .header("Sec-WebSocket-Version", "13")
    call = Internal.instance.newWebSocketCall(client, request);
    call.enqueue(new Callback() {
      @Override public void onResponse(Call call, Response response) {
        try {
        } catch (ProtocolException e) {
          failWebSocket(e, response);

        // Promote the HTTP streams into web socket streams.
        StreamAllocation streamAllocation = Internal.instance.streamAllocation(call);
        streamAllocation.noNewStreams(); // Prevent connection pooling!
        Streams streams = streamAllocation.connection().newWebSocketStreams(streamAllocation);

        // Process all web socket messages.
        try {
          // The connection is successful, and the onOpen of the listener is called
          listener.onOpen(RealWebSocket.this, response);
          String name = "OkHttp WebSocket " + request.url().redact();
          // Focus 1: Create reader s and writer s, see below
          initReaderAndWriter(name, streams);
          // Focus 2: Polling Read
        } catch (Exception e) {
          failWebSocket(e, null);

      @Override public void onFailure(Call call, IOException e) {
        failWebSocket(e, null);
   public void initReaderAndWriter(String name, Streams streams) throws IOException {
    synchronized (this) {
      this.streams = streams;
      this.writer = new WebSocketWriter(streams.client, streams.sink, random);
      // Create Scheduled thread pool
      this.executor = new ScheduledThreadPoolExecutor(1, Util.threadFactory(name, false));
      if (pingIntervalMillis != 0) {
        // Using the thread pool to run PingRunnable regularly, it is getting closer and closer to the truth
            new PingRunnable(), pingIntervalMillis, pingIntervalMillis, MILLISECONDS);
      if (!messageAndCloseQueue.isEmpty()) {
        runWriter(); // Send messages that were enqueued before we were connected.
    // Create a reader to read messages
    reader = new WebSocketReader(streams.client, streams.source, this);
copy code 

Focus on what PingRunnable has done

private final class PingRunnable implements Runnable {
    PingRunnable() {

    @Override public void run() {
      // send ping frame

  void writePingFrame() {
    WebSocketWriter writer;
    int failedPing;
    synchronized (this) {
      if (failed) return;
      writer = this.writer;
      // Determine whether to wait for pong, if waiting for failedPing set to sentPingCount, otherwise set to -1
      failedPing = awaitingPong ? sentPingCount : -1;
      awaitingPong = true;

    if (failedPing != -1) {
      // The condition for running here is that the last pong has not returned, and an error is reported to exit
      failWebSocket(new SocketTimeoutException("sent ping but didn't receive pong within "
          + pingIntervalMillis + "ms (after " + (failedPing - 1) + " successful ping/pongs)"),

    try {
      // send a ping message
    } catch (IOException e) {
      failWebSocket(e, null);
copy code 

So how to judge whether the pong has been received, you need to use the reader. Let's take a look at what's going on in the reader.

Remember the loopReader marked in point 2 above?


/** Receive frames until there are no more. Invoked only by the reader thread. */
public void loopReader() throws IOException {
    while (receivedCloseCode == -1) {
        // This method call results in one or more onRead* methods being called on this thread.
        // Process the received frame

 * According to the header to judge whether it is a message frame or a control frame, the heartbeat pong belongs to the control frame
void processNextFrame() throws IOException {
    if (isControlFrame) {
    } else {

 * Parse opcode
private void readControlFrame() throws IOException {
    // ...
    switch (opcode) {
        // Note: receive pong, trigger onReadPing
        // ...
        frameCallback.onReadClose(code, reason);
        closed = true;
        throw new ProtocolException("Unknown control opcode: " + toHexString(opcode));
    * The counter is incremented automatically, and the awaitingPong state is set to false
  @Override public synchronized void onReadPong(ByteString buffer) {
    // This API doesn't expose pings.
    awaitingPong = false;
copy code 

After chasing it all the way, I found that after receiving the pong, okhttp simply changed the state of the awaitingPong flag, which means that the last pong message was received.

To sum up, the heartbeat process of okhttp,

  1. Create a schedule thread pool when ReadWebSocket is connected, and periodically execute PingRunnable to send ping messages; (send other request messages in WriteRunnable)
  2. At the same time, create a reader to continuously receive messages;
  3. Before sending a ping, use the awaitingPong mark to determine whether the last pong has returned;
  4. For the received message, parse it, and change the awaitingPong state if the received pong is received.

1.2 Dynamically adjust okhttp heartbeat

We found that by changing the delayTime of the thread pool executing PingRunable, we can dynamically change the ping rate without recreating OkHttpClient.

So you can use reflection to shutDown the original thread pool, create a new thread pool, and execute pingRunnable with a new pingInterval.

Class clazz = Class.forName("");
Field field = clazz.getDeclaredField("executor");
ScheduledExecutorService oldService = (ScheduledExecutorService) field.get(mWebSocket);

Class[] innerClasses = Class.forName("").getDeclaredClasses();
for (Class innerClass : innerClasses) {
    if ("PingRunnable".equals(innerClass.getSimpleName())) {
        // Create a new instance of PingRunnable
        Constructor constructor = innerClass.getDeclaredConstructor(RealWebSocket.class);
        Object pingRunnable = constructor.newInstance(mWebSocket);

        // Create a new thread pool
        ScheduledThreadPoolExecutor newService = new ScheduledThreadPoolExecutor(1, Util.threadFactory("ws-ping", false));
        newService.scheduleAtFixedRate((Runnable) pingRunnable, interval, interval, unit);
        field.set(mWebSocket, newService);

        // shutdown old thread pool
copy code 

In this way, we achieve the purpose of dynamically adjusting the heartbeat.

The ping of Java-WebSocket achieves the purpose of sending ping by calling sendPing() externally. There is no ping/pong state mechanism internally, so we need to maintain this relationship ourselves. In fact, you can use the timing message to send ping like OkHttp, and then parse the pong to maintain the heartbeat state. Not much to elaborate here.

2. Disconnect and reconnect

As mentioned earlier, disconnecting and reconnecting only needs to handle several disconnected state logic. Fortunately, most websocket libraries are divided into close and error callbacks, which can distinguish between abnormal disconnection and active disconnection.

What you need to maintain yourself is the relationship between the reconnection interval and the number of retries, similar to the exponential backoff algorithm.

3. Multithreading

For the multi-threading problem mentioned above, the main problem is that when multiple connection requests occur at the same time, instead of adding various synchronizations to multi-threading, it is better to parallelize the serial. In a similar way to a message queue, message processing is done in a separate thread. This eliminates the need for thread synchronization, while also ensuring the orderliness of the state.

In fact, in Android, the use of HandlerThread can perfectly meet the above two points. At first, the author did use HandlerThread to do it. Later, I felt that it could be independent of the platform or run on a pure Java platform, so I implemented it manually. At the same time, you can maintain the message queue yourself, and you can better handle the priority, delay, etc. yourself.


As I said before, the author's project is based on OkHttp, but I found that the state management part can be abstracted separately, so I have the title of this article WebSocketGo (hereinafter referred to as WsGo).

The data flow is as follows:

  1. Dispatcher is the message queue mentioned above, which is the producer-consumer mode, maintaining two queues, sending command s, and receiving event s.


    • The main event s are: OnConnect, onMessage, onSend, onRetry, onDisConnect, onClose

  2. Channel Manager is mainly responsible for state management.

  3. The WebSocket interface can be understood as the adaptation layer, which is responsible for calling the WebSocket library.

  4. Event Listener is the status callback of the foreground.

Access to WsGo

implementation 'com.gnepux:wsgo:1.0.2'
// use okhttp
implementation 'com.gnepux:wsgo-okwebsocket:1.0.1'
// use java websocket
implementation 'com.gnepux:wsgo-jwebsocket:1.0.1'
copy code 


WsConfig config = new WsConfig.Builder()
        .debugMode(true)    // true to print log
        .setUrl(pushUrl)    // ws url
        .setHttpHeaders(headerMap)  // http headers
        .setConnectTimeout(10 * 1000L)  // connect timeout
        .setReadTimeout(10 * 1000L)     // read timeout
        .setWriteTimeout(10 * 1000L)    // write timeout
        .setPingInterval(10 * 1000L)    // initial ping interval
        .setWebSocket(OkWebSocket.create()) // websocket client
        .setRetryStrategy(retryStrategy)    // retry count and delay time strategy
        .setEventListener(eventListener)    // event listener
copy code 


// connect
// Send a message
WsGo.getInstance().send("hello from WsGo");
// disconnect
WsGo.getInstance().disconnect(1000, "close");
// change the heartbeat
WsGo.getInstance().changePingInterval(10, TimeUnit.SECONDS);
// freed
copy code 

More configuration of WsConfig

setWebSocket(WebSocket socket)

WsGo already supports OkHttp and Java WebSocket

// for OkHttp (wsgo-okwebsocket)
// for Java WebSocket (wsgo-jwebsocket)
copy code 

If you need to use other WebSocket libraries or custom clients, just implement a WebSocket interface and pass the corresponding result to ChannelCallback. For the rest of the connection management, WsGo will do it for you.

public interface WebSocket {
    void connect(WsConfig config, ChannelCallback callback);
    void reconnect(WsConfig config, ChannelCallback callback);
    boolean disconnect(int code, String reason);
    void changePingInterval(long interval, TimeUnit unit);
    boolean send(String msg);
copy code 

setRetryStrategy(RetryStrategy retryStrategy)

For abnormal disconnection, WsGo will automatically reconnect. RetryStrategy refers to the relationship between the number of reconnections and the delay.

WsGo has a DefaultRetryStrategy by default. If you need to adjust it yourself, you can implement the onRetry method in the RetryStrategy interface.

public interface RetryStrategy {
     * The relationship between the number of retries and the delay
     * @param retryCount number of retries
     * @return delay time
    long onRetry(long retryCount);
copy code 

setEventListener(EventListener eventListener)

Add event callbacks. It should be noted that the callback runs in a thread created by WsGo itself, not in the calling thread. If necessary, manually switch threads will be called.

public interface EventListener {
    void onConnect();
    void onDisConnect(Throwable throwable);
    void onClose(int code, String reason);
    void onMessage(String text);
    void onReconnect(long retryCount, long delayMillSec);
    void onSend(String text, boolean success);
copy code 


At present, WsGo is not associated with the change notification of the mobile phone network (that is, you can receive a callback when the network is disconnected, and the network will not be automatically reconnected), because I think this part does not belong to the state management category of WebSocket itself, and it is no longer platform-independent. If you have this convenience, you need to initiate a connection from the outside. It's ok to simply encapsulate it in the project~


This article is a bit long, starting from the initial smart home scenario, introducing the existing problems of websocket state management, then giving the author's solution, and finally the general state management framework WebSocketGo. It can also be regarded as the author's thinking and summary in the work. In fact, think about it, not only WebSocket, but all long connections should face the same problem. But the angle of thinking and the starting point of the solution should be the same.

end of the article


If you want to become an architect, you should not be limited to coding and business, but you must be able to select and expand, and improve your programming thinking. In addition, a good career plan is also very important, and the habit of study is very important, but the most important thing is to be persistent. Any plan that cannot be implemented consistently is empty talk.

If you have no direction, here is a set of "Advanced Notes on Android Eight Modules" written by senior architects of Ali to help you organize the messy, scattered and fragmented knowledge systematically and efficiently. Master all knowledge points of Android development.

Compared with the fragmented content we usually read, the knowledge points of this note are more systematic, easier to understand and remember, and are arranged in strict accordance with the knowledge system.

1. Essential skills for architects to build foundations

1. In-depth understanding of Java generics
2. Annotate in-depth explanation
3. Concurrent programming
4. Data transmission and serialization
5. Principle of Java Virtual Machine
6. Efficient IO

Second, the source code analysis of Android top 100 frameworks

1.Retrofit 2.0 source code analysis
2.Okhttp3 source code analysis
3.ButterKnife source code analysis
4.MPAndroidChart source code analysis
5.Glide source code analysis
6.Leakcanary source code analysis
7.Universal-lmage-Loader source code analysis
8.EventBus 3.0 source code analysis
9.zxing source code analysis
10.Picasso source code analysis
11.LottieAndroid usage details and source code analysis
12.Fresco source code analysis - image loading process

Third, the actual analysis of Android performance optimization

  • Tencent Bugly: A little understanding of string matching algorithm
  • iQIYI: Android APP crash capture solution - xCrash
  • ByteDance: In-depth understanding of one of the Gradle frameworks: Plugin, Extension, buildSrc
  • Baidu APP technology: Android H5 first screen optimization practice
  • Alipay Client Architecture Analysis: "Garbage Collection" for Android Client Startup Speed ​​Optimization
  • Ctrip: Seeing the practice of componentized architecture from the Zhixing Android project
  • NetEase News Construction Optimization: How to Make Your Construction Speed ​​"Lightning"?
  • ...

4. Advanced kotlin strengthens actual combat

1. Introduction to Kotlin
2. Kotlin combat pit avoidance guide
3. Project combat "Kotlin Jetpack combat"

  • Start with a Demo that worships the great god

  • What is the experience of writing Gradle scripts in Kotlin?

  • The triple realm of Kotlin programming

  • Kotlin Higher-Order Functions

  • Kotlin Generics

  • Kotlin extension

  • Kotlin delegates

  • Coroutines "Unknown" Debugging Tips

  • Graphical coroutine: suspend

Five, Android advanced UI open source framework advanced decryption

1. Use of SmartRefreshLayout
2. Analysis of the source code of the PullToRefresh control of Android
3. Basic usage of Android-PullToRefresh pull-down refresh library
4.LoadSir - an efficient and easy-to-use loading feedback page management framework
5. Detailed explanation of Android general LoadingView loading framework
6.MPAndroidChart implements LineChart (line chart)
7.hellocharts-android usage guide
8.SmartTable User Guide
9. Introduction of open source project android-uitableview
10.ExcelPanel User Guide
11. In-depth analysis of the Android open source project SlidingMenu
12.MaterialDrawer User Guide

6. NDK module development

1. NDK module development
2. JNI module
3. Native development tools
4. Linux programming
5. Bottom image processing
6. Audio and video development
7. Machine Learning

7. Advanced Flutter technology

1. Overview of Flutter cross-platform development
2. Building the Flutter development environment in Windows
3. Write your first Flutter APP
4. Flutter development environment construction and debugging
5. The basic grammar of Dart grammar (1)
6. The use and source code analysis of the collection of Dart grammar articles (2)
7. Collection operator functions and source code analysis of Dart grammar articles (3)

8. WeChat applet development

1. Mini Program Overview and Getting Started
2. Mini program UI development
3. API operation
4. The actual combat of the shopping mall project...

Full set of video materials:

A collection of interviews

Second, the source code analysis collection

3. Collection of open source frameworks

Welcome everyone to support with one click and three consecutive links. If you need the information in the text, please click the CSDN official certified WeChat card at the end of the text to get it for free [guaranteed 100% free] ↓↓↓

Tags: Android IoT websocket

Posted by bluedot on Wed, 01 Jun 2022 00:05:53 +0530