Talking about js deep copy

preface

Recently, after sorting out js related documents, I recall a question that the front-end is easy to say and can test the level of professionalism (basic skills): the depth of js copy. I believe that the front-end students know less or more about it and understand its root causes. I won't repeat it here. Let's simply ask a few questions:

  • How to understand deep and shallow copy?
  • How to implement a perfect deep copy?

Let's talk about how to understand deep and shallow copy.

Values and references

As we know, many languages can be changed into languages. Assignment and parameter passing can be done through value copy or reference copy, depending on the syntax.

A reference in JavaScript points to a value. If a value has more than one reference, these references all point to the same value, and they have no reference / pointing relationship with each other.

The assignment / passing of values and references by JavaScript is syntactically indistinguishable and depends entirely on the value type.

Take a look at the following:

var a = 3;
var b = a; // b is a copy of the value of a
b++;
a = 3;
b = 4;

var c = [0, 1, 2];
var d = c; // d is a reference to [0, 1, 2]
d.push(3);
c; // [1, 2, 3, 4]
d; // [1, 2, 3, 4]

Simple values (i.e. scalar basic types) are always assigned / passed through value copying, including null, undefined, string, number, Boolean value, symbol in ES6 and the latest bigint.

compound value - objects (including arrays and encapsulated objects) and functions are always assigned / passed by reference copy.

In the above example, 3 is a scalar basic data type, so variable a holds a copy of the value and b holds another copy of it. When b is changed, the value of a remains unchanged.

c and d respectively point to two different references of the same compliance value [0, 1, 2]. Note that c and d simply point to the values [0, 1, 2], not hold. So they change the same value (call the push method), and then they all point to the new value after the change [0, 1, 2, 3].

Because a reference points to the value itself, not to a variable, one reference cannot change the point of another.

var a = [1, 2, 3];
var b = a;
a; // [1, 2, 3]
b; // [1, 2, 3]

b = [4, 5, 6];
a; // [1, 2, 3]
b; // [4, 5, 6]

b=[4, 5, 6] does not affect a's pointing to [1, 2, 3], unless b points to a pointer to a instead of an array reference, so b's assignment will affect a, but this does not exist in JavaScript.

Note: there is no pointer concept in JavaScript, and the working mechanism of reference is different. In JavaScript, a variable cannot be a reference to another variable.

<img: src= "$withbase ('/ stack memory.Jpg')" alt= "foo" >

Let's take another example:

var m = [1, 2, 3];

function fn(n) {
    n.push(4);
    n; // [1, 2, 3, 4]
    
    n = [4, 5, 6];
    n.push(7);
    n; // [4, 5, 6, 7]
}
fn(m);
m; // Is [1, 2, 3, 4] instead of [4, 5, 6, 7]

When the function is called to pass parameters, a copy of the reference m is actually assigned to N. therefore, when the push(4) operation is called, it points to the same object [1, 2, 3, 4]. However, when the reference of n is manually changed, M is not affected, so the final m is [1, 2, 3, 4] instead of [4, 5, 6, 7].

If we do not want the variables outside the function to be involved, we can create a copy first, so that the original value will not be affected. For example:

fn(n.slice())

The slice() method without parameters will return a shallow copy of the current array. Since the reference to the copy is passed to the function, the internal operation n will no longer affect m.

So far, we have roughly understood that different types of data may affect each other during replication, so it is necessary to implement a deep copy.

Next, we will introduce it step by step, and finally realize a more perfect deep copy.

Serialized version

Yes, this is also the easiest one to think of. It is only realized through serialization and deserialization.

JSON.parse(JSON.stringify());

Although this method can cope with most application scenarios, it still has great defects, such as copying functions, circular reference structures, other types of objects (Reg, Map), etc.

Use json Stringify note:

  1. undefined, arbitrary functions, and symbol values are ignored during serialization
  2. If you execute this method on an object that contains circular references (objects are referenced to each other to form an infinite loop), an error will be thrown
  3. Other types of objects, including Map/Set/WeakMap/WeakSet, will only serialize enumerable properties

Therefore, this method is only applicable to objects with simple data format.

Attribute copy (shallow copy)

If it is a shallow copy, it is similar to the extend in jqery. It simply traverses the attributes and assigns values:

function clone(target) {
    let _target = {};
    for (const key in target) {
        _target[key] = target[key];
    }
    return _target;
};

If it is a deep copy and we do not know the hierarchy of nested objects, we can use recursion to implement:

function deepClone(target) {
    if (typeof target === 'object') {
        let _target = {};
        for (const key in target) {
            _target[key] = deepClone(target[key]);
        }
        return _target;
    } else {
        return target;
    }
};

Although a deep copy demo is basically implemented here, we should think of something missing. Yes, Array. In fact, considering the fact that Array is also very simple, we should take a moment.

Array and object literals

In fact, the idea is very simple. It is ok to make the array compatible each time we traverse to create a new object:

function deepClone(target) {
    if (typeof target === 'object') {
        let _target = Array.isArray(target) ? [] : {};
        for (const key in target) {
            _target[key] = deepClone(target[key]);
        }
        return _target;
    } else {
        return target;
    }
};

So far, we have basically implemented a deep copy example. However, just as we will consider the array, there is another problem that is not common but can not be ignored: circularReference.

Circular reference

We execute the following test case:

const target = {
    refer: 'circularReference'
};
target.refer = target;

We output the following on the console:

<img :src="$withBase('/circleReference.jpg')" alt="foo">

You can see an infinitely expanded structure (that is, the object's attributes indirectly or directly reference itself).

First, let's analyze the circular reference structure: if we do not deal with the structure with circular references, each recursion will point to its own object, which will cause memory leakage. To solve this problem, we start from the fundamental point. For the loop structure, we can find a chche to store the current object every time we loop. When we copy the next time, we can go to the cache to find whether there is a current object. If there is one, we will return. If not, we will continue to traverse the copy. Let's implement this first:

function deepClone(target, cache = []) {
      if (target === null || typeof target !== 'object') {
       return target
    }
    
    let circleStructure = cache.filter(e => e.original === target)[0]
    if (circleStructure) {
        return circleStructure.copy
    }
    
    let copy = Array.isArray(target) ? [] : {}
    cache.push({
        original: target,
        copy
    })
    
    Object.keys(target).forEach(key => {
        copy[key] = deepClone(target[key], cache)
    })
    
    return copy
}

Defects of this method:

  1. You can only clone objects in {} format, but you can't do anything about objects with prototype chains
  2. You cannot clone objects of other types (iteratable collections, RegExp, Symbol, Date, TypedArray, etc.)

We are looking for solutions to the current defects:

  • Since the prototype chain cannot be copied for {} type objects, we can copy its prototype objects and extend its familiarity
  • For iteratable sets (Map, Set) because object Keys () can't facilitate it, so we can use their own constructors
  • For other types of objects, we should also use their respective constructors to copy

To distinguish object types, we must first find a method that can strictly judge object types. Previously, I saw a method to strictly judge the object type when looking at the vue source code. I passed the object The toString method can return the specific type of the object:

function getPlainObjType(obj) {
    return Object.prototype.toString.call(obj)
}

I believe that this method can be seen everywhere when reading other component libraries. Secondly, let's think about it (excluding function first). For collection types, we can use the map of key value pairs for storage. However, if we use an object as the key of the mapping, even if all references of the object are released later and the memory of the object begins to be collected at a certain time (GC), the map itself will still maintain its entries (value objects), unless the entries (clear) are manually removed to support GC.

At this time, the function of WeakMap will be shown. In fact, their external behavior characteristics are basically the same. The difference is reflected in the working mode of memory allocation.

WeakMap only accepts objects as keys, and these objects are weakly held. That is, if the key object itself is garbage collected, the item in the WeakMap will be automatically removed. This is why WeakMap is better than Map in this respect. Let's take an example:

var wm = new WeakMap();

var x = {id: 1},
    y = {id: 2},
    z = {id: 3},
    w = {id: 4};

wm.set(x, y);

x = null;       // {id: 1} can be garbage collected
y = null;       // {id: 2} can be garbage collected. In fact, x = null; The items in the weakmap are recycled

wm.set(z, w);

y = null;         // {id: 4} is not recycled because the key is also soft associated with the object {id: 4}

Next, we will improve the deepClone method above. Lodash implements a relatively comprehensive divine copy. We can learn from lodash's ideas to implement a simplified version of:

  • Declare several tool functions required for cloning
  • Replace cache with WeakMap
  • If it is judged that the data is of basic type, it is returned directly
  • Declare deepInit
  • If it is a map or a set, use their own add method copy
  • If it is an array or {}, use object Keys traverse copy properties
  • If it is a wrapper type object or an object of type Date, RegExp, or Symbol, use their constructors to copy
var boolTag = '[object Boolean]',
    dateTag = '[object Date]',
    errTag = '[object Error]',
    mapTag = '[object Map]',
    arrTag = '[object Array]',
    objTag = '[object Object]',
    numberTag = '[object Number]',
    regexpTag = '[object RegExp]',
    setTag = '[object Set]',
    stringTag = '[object String]',
    argsTag = '[object Arguments]',
    symbolTag = '[object Symbol]';


function getPlainObjType(obj) {
    return Object.prototype.toString.call(obj)
}

// Judge object type
function isObject(obj) {
    var type = typeof obj;
    return obj != null && (type == 'object' || type == 'function');
}

// Other (built-in) reference type objects
function isReferObj(type) {
    return ~([dateTag, errTag, regexpTag, symbolTag].indexOf(type))
}

function isSet(type) {
    return type === '[object Set]'
}

function isMap(type) {
    return type === '[object Map]'
}

// Return the incoming object constructor so that the prototype chain properties can be copied
function deepInit(obj) {
    const Ctor = obj.constructor;
    return new Ctor();
}

function cloneObjByTag(object, tag) {
    var Ctor = object.constructor;
    switch (tag) {
      case dateTag:
        return new Ctor(+object);

      case errTag:
        return new Ctor(object);

      case regexpTag:
        return cloneRegExp(object);

      case symbolTag:
        return cloneSymbol(object);
  }
}

function cloneRegExp(object) {
    let reFlags = /\w*$/;
    let result = new object.constructor(object.source, reFlags.exec(object));
    result.lastIndex = object.lastIndex;
    return result;
}

function cloneSymbol(object) {
    return Object(Symbol.prototype.valueOf.call(object));
}

function deepClone(target, wm = new WeakMap()) {
    if (!isObject(target)) {
        return target;
    }
    
    let type = getPlainObjType(target);
    let copy = deepInit(target);
    
    // Determine whether there is a circular reference structure
    let hit = wm.get(target);
    if (hit) {
        return hit;
    }
    wm.set(target, copy);
    
    if(isReferObj(type)) {
        copy = cloneObjByTag(target, type);
        return copy;
    }
    
    if (isSet(type)) {
        target.forEach(value => {
            copy.add(deepClone(value));
        });
        return copy;
    }
    
    if (isMap(type)) {
        target.forEach((value, key) => {
            copy.set(key, deepClone(value));
        });
        return copy;
    }
    
    Object.keys(target).forEach(key => {
        copy[key] = deepClone(target[key], wm)
    });
    
    return copy;
}

Note: object is used when copying object attributes For the time being, keys does not consider typedArray type objects and functions. We divide the listed object types into traversable objects and non traversal objects (built-in objects). Different traversal methods are used to copy attributes. Map and Set types can use their own forEach traversal. Objects and arrays use object Keys, and other built-in reference type objects directly use their constructors to regenerate new objects.

In fact, writing here is equivalent to completing a large part. Lodash has done a lot of detailed optimization work. For example, when there are many object levels, lodash has made some special efforts to traverse:

function arrayEach(array, iteratee) {
  var index = -1,
      length = array == null ? 0 : array.length;

  while (++index < length) {
    if (iteratee(array[index], index, array) === false) {
      break;
    }
  }
  return array;
}

Compare for, for In, while loop, because for In will traverse the enumerable attributes other than Symbol on the entire object (prototype chain), so it will be slower. However, the test results of many online Posts show that there is little difference between for and while. Generally speaking, while is faster in terms of execution time.

There are also some methods that use iterators to traverse, such as forEach, every, and some. The only difference between them is that they handle the return value of the callback function differently: forEach will traverse all of the array and ignore the return value of the callback function. Every will run until the callback returns false, and some will run until the callback returns true. The above example simulates the every behavior to traverse.

Other traversals are used to access object properties: object Keys, object Getownpropertynames. These differences from in: object Keys only enumerates the enumerable attributes directly contained in the object Getownpropertynames iterates over the properties directly contained in the object (whether they are enumerable or not). The in operator will find whether the attribute of the object (including the prototype chain) exists (whether it can be traversed or not), and for In will traverse the enumerable attributes other than Symbol on the entire object (prototype chain).

In addition, for Of can traverse the values of the array (if the object itself defines an iterator).

You can clearly see that using while rewrites forEach. The array here should be noted:

  • If an object is traversed, the key and value need to be swapped, because the key of the object is the value of the array rather than the subscript
  • When customizing the iteration callback function, you can set the return value according to different logic to interrupt the traversal

We can change the traversal logic in deepClone:

let keys = Array.isArray(target) ? undefined : Object.keys(target);

// Object.keys(target).forEach(key => {
//     copy[key] = deepClone(target[key], wm);
// });

arrayEach(keys || target, (value, key) => {
    if (keys) {
        key = value
    }
    copy[key] = deepClone(target[key], wm);
})

At present, the typedArray of function type and binary array has not yet implemented deep copy. However, at present, the serialization and serialization versions are still used most in daily development. Moreover, I don't often use the deep copy of this package. I only write these things for learning and expansion. If it is really used, lodash is enough. I will continue to study the copy of function and arrayBuffer in the future.

Since there are too many edge case s to consider in deep copy, I believe you will have a lot of discussion. It is not easy to write a deep copy. The specific advantages and disadvantages also need to be combined with the business.

reference

Summary

  1. For deep and shallow copy, we first introduce values and references. Simple values (i.e. scalar basic types) are always assigned / transferred by value replication. Compound values - objects (including arrays and encapsulated objects) and functions are always assigned / transferred by reference replication, and references do not affect each other. Therefore, we point out the relevant ideas of deep copy.
  2. First, the most extensive serialization is used for copying, but the serialization is limited to automatically ignoring function, collection, wrapper object and reference object, which leads to recursive copying.
  3. Considering the circular reference problem, the cache is used to avoid the copy from falling into an endless loop.
  4. Because the attributes of the prototype chain are not considered, it leads to the use of constructors to copy objects, which leads to deep copies of more data types. For objects in the form of collections, we refer to the WeakMap, which supports better memory allocation, as the cache object. This leads to the problem of copying objects such as symbol, regular, and other reference types.
  5. Finally, we analyze the performance optimization of lodash for traversal when copying, and give us an idea when traversing a large amount of data.

Tags: Javascript ECMAScript Front-end

Posted by jpmm76 on Tue, 31 May 2022 16:24:14 +0530