1, Introduction to Unicode
Unicode (also called the unified code or universal code) is an industry standard in computing that covers a character set, encoding schemes, and related rules. It was created to overcome the limitations of traditional character encodings: it assigns one unique code to every character in every language, so that text can be converted and processed across languages and platforms.
In a Unicode environment the encodings of different languages do not conflict, and text in any language can be displayed on the same screen. This is Unicode's greatest advantage. The original design encoded every character in two bytes, which gives 65536 values and is enough for most characters in common use across the world's languages. Later versions extended the code space beyond two bytes, up to 0x10FFFF.
Unicode can be thought of as a very thick dictionary that records a number for every character it covers. How those numbers are allocated is not our concern here; what matters is that Unicode assigns each character a number that represents it.
A common misunderstanding is that Unicode dictates how text is stored. At its core it is a character set: it specifies which number (code point) belongs to each character, and that number alone does not say how it is stored as bytes. The idea is very simple: give each character a number that represents it, and that is all.
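To make the "one number per character" idea concrete, here is a minimal Java sketch. The class name `CodePointLookup` and the helper `describe` are hypothetical and exist only for illustration; `String.codePointAt` and `String.format` are standard JDK calls.

```java
// Illustrative sketch: look up the Unicode number (code point) of a character.
class CodePointLookup {

    // Formats a code point in the conventional U+XXXX notation.
    static String describe(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // The CJK character is written as an escape so the file compiles
        // regardless of the source file encoding.
        String text = "A\u4E2D";
        System.out.println(describe(text.codePointAt(0)));
        System.out.println(describe(text.codePointAt(1)));
    }
}
```

Run as written, this prints `U+0041` and `U+4E2D`: the same kind of number for a Latin letter and a Chinese character, with nothing yet said about how either is stored.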
2, Coding method
Unicode is a character coding scheme, developed by international organizations, that aims to accommodate all the characters and symbols in the world. It maps characters to numbers in the range 0 to 0x10FFFF, which gives room for 1114112 code points (0x10FFFF + 1). A code point is a number that can be assigned to a character. UTF-8, UTF-16 and UTF-32 are encoding schemes that convert these numbers into the bytes a program stores or transmits.
UCS stands for Universal Character Set, the character set defined by ISO/IEC 10646, which is kept synchronized with Unicode. Its early encoding forms were called UCS-2 and UCS-4. UCS-2 encodes each character in two bytes, and UCS-4 in four bytes. UCS-4 is divided into 2^7 = 128 groups according to its highest byte, whose top bit is always 0. Each group is divided into 256 planes according to the second-highest byte. Each plane is divided into 256 rows according to the third byte, and each row has 256 code points (cells). Plane 0 of group 0 is called the BMP (Basic Multilingual Plane). UCS-2 is obtained by removing the two leading zero bytes from the UCS-4 codes of the BMP.
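The difference between a code point and its encoded bytes can be sketched in Java as follows. The class `EncodingSizes` and the method `byteLength` are hypothetical names. The sketch also assumes the running JDK provides a `UTF-32BE` charset, as OpenJDK builds do; it is not one of the charsets every JVM is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: one code point, three encodings, three byte counts.
class EncodingSizes {

    // Assumption: the JDK ships UTF-32BE. It is looked up by name because
    // it is not among the charsets guaranteed on every Java platform.
    static final Charset UTF_32BE = Charset.forName("UTF-32BE");

    // Number of bytes needed to store one code point in the given encoding.
    static int byteLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Latin letter, CJK character, and a character outside the BMP.
        int[] samples = {0x41, 0x4E2D, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    cp,
                    byteLength(cp, StandardCharsets.UTF_8),
                    byteLength(cp, StandardCharsets.UTF_16BE),
                    byteLength(cp, UTF_32BE));
        }
    }
}
```

The big-endian variants are used so that no byte order mark is added to the counts. U+0041 takes 1, 2 and 4 bytes; U+4E2D takes 3, 2 and 4; U+1F600 takes 4 bytes in all three.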
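The group / plane / row / cell layout described above amounts to reading the four bytes of a UCS-4 value one at a time. A minimal sketch, with the hypothetical class name `Ucs4Fields` and hypothetical method names:

```java
// Illustrative sketch: split a UCS-4 value into the four fields described above.
class Ucs4Fields {

    // Highest byte; only 7 bits are used, hence 128 groups.
    static int group(int code) { return (code >>> 24) & 0x7F; }

    // Second-highest byte: 256 planes per group.
    static int plane(int code) { return (code >>> 16) & 0xFF; }

    // Third byte: 256 rows per plane.
    static int row(int code)   { return (code >>> 8) & 0xFF; }

    // Lowest byte: 256 cells per row.
    static int cell(int code)  { return code & 0xFF; }

    // The BMP is plane 0 of group 0, so its codes fit in the two low bytes.
    static boolean isBmp(int code) { return group(code) == 0 && plane(code) == 0; }

    public static void main(String[] args) {
        int code = 0x4E2D;
        System.out.printf("group=%d plane=%d row=0x%02X cell=0x%02X bmp=%b%n",
                group(code), plane(code), row(code), cell(code), isBmp(code));
    }
}
```

For 0x4E2D this gives group 0, plane 0, row 0x4E, cell 0x2D, which is why dropping the two leading zero bytes loses nothing for BMP characters.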
Unicode encoding for Chinese:
In Unicode version 5.0.0, 238605 code points are defined. Of these, 65534 * 2 lie in the two private-use planes (planes 15 and 16), 6400 in the private use area of plane 0, and 2048 in the surrogate range, which leaves 238605 - 65534*2 - 6400 - 2048 = 99089. These remaining 99089 defined code points are distributed over plane 0, plane 1, plane 2 and plane 14, and correspond to the 99089 characters defined by Unicode, including 71226 Chinese characters. Planes 0, 1, 2 and 14 define 52080, 3419, 43253 and 337 characters respectively. All 43253 characters in plane 2 are Chinese characters, and a further 27973 Chinese characters are defined in plane 0.
Most computers support ASCII (American Standard Code for Information Interchange), a 7-bit coding scheme that represents the upper- and lower-case English letters, digits, punctuation marks and control characters. Unicode contains ASCII: the characters '\u0000' to '\u007F' correspond to the 128 ASCII characters. Unicode can be used in Java.
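The counts in this paragraph can be checked mechanically. The sketch below uses 2048 for the surrogate range (U+D800 to U+DFFF), which is the value that makes the subtraction come out to 99089. Reading the subtracted terms as the two private-use planes, the plane 0 private use area, and the surrogate range is an assumption based on the sizes of those ranges. The class name `Unicode5Counts` is hypothetical.

```java
// Illustrative sketch: re-check the Unicode 5.0.0 counts quoted in the text.
class Unicode5Counts {

    static int remainingDefined() {
        int total = 238605;               // starting figure quoted in the text
        int privateUsePlanes = 65534 * 2; // assumed: planes 15 and 16
        int bmpPrivateUse = 6400;         // assumed: U+E000 to U+F8FF
        int surrogates = 2048;            // assumed: U+D800 to U+DFFF
        return total - privateUsePlanes - bmpPrivateUse - surrogates;
    }

    // Characters on planes 0, 1, 2 and 14 as listed in the text.
    static int sumByPlane() {
        return 52080 + 3419 + 43253 + 337;
    }

    // Chinese characters on plane 2 plus those on plane 0.
    static int chineseTotal() {
        return 43253 + 27973;
    }

    public static void main(String[] args) {
        System.out.println(remainingDefined()); // 99089
        System.out.println(sumByPlane());       // 99089
        System.out.println(chineseTotal());     // 71226
    }
}
```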
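The statement that Unicode contains ASCII can be verified in Java. This is a small sketch, with the hypothetical class name `AsciiSubset`, which checks that each of the first 128 code points encodes in UTF-8 to a single byte equal to its ASCII value:

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: the first 128 Unicode code points coincide with ASCII.
class AsciiSubset {

    // True when the code point lies in the ASCII range U+0000 to U+007F.
    static boolean isAscii(int codePoint) {
        return codePoint >= 0 && codePoint <= 0x7F;
    }

    // Checks that every ASCII code point becomes one identical byte in UTF-8.
    static boolean utf8MatchesAscii() {
        for (int c = 0; c <= 0x7F; c++) {
            byte[] bytes = String.valueOf((char) c).getBytes(StandardCharsets.UTF_8);
            if (bytes.length != 1 || bytes[0] != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A Unicode escape in Java source denotes the same char as the literal.
        System.out.println('\u0041' == 'A');    // true
        System.out.println(utf8MatchesAscii()); // true
        System.out.println(isAscii(0x4E2D));    // false
    }
}
```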
3, Code implementation
Controller:
package com.sducsrp.csrp.controller.ToolsController;

import com.sducsrp.csrp.common.Constants;
import com.sducsrp.csrp.common.Result;
import org.springframework.web.bind.annotation.*;

@RestController
public class UnicodeController {

    // Encodes every UTF-16 code unit of the input as a four-digit hex escape.
    @RequestMapping("/tools/Unicode/encode")
    public @ResponseBody Result encode(@RequestParam("data") String data) {
        if (data == null || "".equals(data)) {
            return Result.error();
        }
        StringBuilder unicode = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            // %04x always pads to four digits, including one- and three-digit values.
            unicode.append(String.format("\\u%04x", (int) data.charAt(i)));
        }
        return new Result(Constants.CODE_200, null, unicode.toString());
    }

    // Decodes four-digit hex escapes and copies all other text unchanged.
    @RequestMapping("/tools/Unicode/decode")
    public @ResponseBody Result decode(@RequestParam("data") String data) {
        if (data == null || "".equals(data)) {
            return Result.error();
        }
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        int i;
        while ((i = data.indexOf("\\u", pos)) != -1) {
            sb.append(data, pos, i); // text before the escape
            String hex = i + 6 <= data.length() ? data.substring(i + 2, i + 6) : "";
            if (hex.matches("[0-9a-fA-F]{4}")) {
                sb.append((char) Integer.parseInt(hex, 16));
                pos = i + 6;
            } else {
                // Malformed or truncated escape: keep it literally and move on,
                // so the loop always advances.
                sb.append("\\u");
                pos = i + 2;
            }
        }
        sb.append(data.substring(pos)); // text after the last escape
        return new Result(Constants.CODE_200, null, sb.toString());
    }
}
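The escaping logic can also be tried without Spring. The following is a standalone sketch, not the project's code: the class name `UnicodeEscapeSketch` is hypothetical, and it assumes the intended format is one four-digit hex escape per UTF-16 code unit, with text that is not a well-formed escape copied through unchanged.

```java
// Illustrative sketch: escape and unescape text, independent of any web framework.
class UnicodeEscapeSketch {

    // Turns each UTF-16 code unit into a backslash, the letter u, and four hex digits.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            out.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return out.toString();
    }

    // Replaces well-formed escapes with their characters; copies everything else.
    static String decode(String escaped) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            boolean isEscape = escaped.startsWith("\\u", i)
                    && i + 6 <= escaped.length()
                    && escaped.substring(i + 2, i + 6).matches("[0-9a-fA-F]{4}");
            if (isEscape) {
                out.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(escaped.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "A\u4E2D";
        String escaped = encode(original);
        System.out.println(escaped);
        System.out.println(decode(escaped).equals(original)); // true
    }
}
```

Because the escapes work on UTF-16 code units, a character outside the BMP becomes two escapes (its surrogate pair), and decoding the two restores the original character.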
Vue:
<template>
  <br>
  <br>
  <!-- <p>Base64 Online encoding and decoding</p>-->
  <el-card shadow="always" style="text-align: center; margin-right: 100px;margin-left: 100px;height: 600px">
    <el-row>
      <el-col :span="8"> </el-col>
      <el-col :span="8">
        <div style="font-size: x-large;background-color: darkgreen;color:white;margin: 5px;">
          UNICODE Online encoding and decoding
        </div>
        <el-button style="text-align: right" size="small" icon="el-icon-thumb">
          <el-link href="https://blog.csdn.net/hyongilfmmm/article/details/112037596" target="_blank" style="font-size: 20px;color: darkgreen">Unicode Code explanation</el-link>
        </el-button>
      </el-col>
      <el-col :span="8"> </el-col>
    </el-row>
    <div class="el-common-layout">
      <div class="text-area" style="margin: 10px">
        <!-- <textarea v-model="textdata" placeholder="Please enter the encoded string or the string to be decoded">-->
        <!-- </textarea>-->
        <el-input v-model="textdata" type="textarea" placeholder="Please input" style="width: 700px;font-size: 20px" :rows=5 />
      </div>
    </div>
    <el-button @click="encode" type="info" style="margin: 20px">EnCode</el-button>
    <el-button @click="decode" type="info">DeCode</el-button>
    <el-card style="width: 40%;height: 100px;margin-left: 30%;margin-top: 5%">
      <p style="text-align: left">code/Decoding result:</p>
      <p>{{ myresult }}</p>
    </el-card>
  </el-card>
</template>

<script>
import request from "@/utils/request";

export default {
  data() {
    return {
      textdata: '',
      myresult: ''
    }
  },
  methods: {
    encode() {
      // alert(this.textdata)
      request.get("/tools/Unicode/encode", {
        params: {
          data: this.textdata
        }
      }).then(res => {
        //alert(res.data);
        this.myresult = res.data
      })
    },
    decode() {
      request.get("/tools/Unicode/decode", {
        params: {
          data: this.textdata
        }
      }).then(res => {
        this.myresult = res.data
      })
    }
  }
}
</script>

<style scoped>
</style>