<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts | Yitao Li's Homepage</title><link>https://www.liyitao.cn/post/</link><atom:link href="https://www.liyitao.cn/post/index.xml" rel="self" type="application/rss+xml"/><description>Posts</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 27 Feb 2023 00:00:00 +0000</lastBuildDate><image><url>https://www.liyitao.cn/media/icon_hue23a2bef3590e076477741ababc11dea_177963_512x512_fill_lanczos_center_3.png</url><title>Posts</title><link>https://www.liyitao.cn/post/</link></image><item><title>Particle Swarm Optimization with Penalty Function</title><link>https://www.liyitao.cn/post/%E4%B8%80%E6%96%87%E8%AE%A4%E7%9F%A5%E5%B8%A6%E7%BD%9A%E5%87%BD%E6%95%B0%E7%9A%84%E7%B2%92%E5%AD%90%E7%BE%A4%E7%AE%97%E6%B3%95/</link><pubDate>Mon, 27 Feb 2023 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%B8%80%E6%96%87%E8%AE%A4%E7%9F%A5%E5%B8%A6%E7%BD%9A%E5%87%BD%E6%95%B0%E7%9A%84%E7%B2%92%E5%AD%90%E7%BE%A4%E7%AE%97%E6%B3%95/</guid><description>&lt;p>This post is synchronously published on my CSDN blog:
&lt;a href="https://blog.csdn.net/weixin_45766278/article/details/129244334" target="_blank" rel="noopener">https://blog.csdn.net/weixin_45766278/article/details/129244334&lt;/a>&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="1-overview-of-algorithm">1. Overview of algorithm&lt;/h2>
&lt;p>Particle Swarm Optimization with Penalty Function is a commonly used optimization algorithm, which is mainly used to solve constrained optimization problems.&lt;/p>
&lt;p>In traditional particle swarm optimization, particles move through the search space to find the best solution. However, practical problems usually involve &lt;strong>constraints&lt;/strong>, such as the ranges of variables or inequality constraints on functions. These constraints may result in a &lt;strong>discontinuous or even infeasible search space&lt;/strong>. Therefore, particle swarm optimization with penalty function &lt;strong>introduces penalties in the objective function to penalize particles that do not meet the constraints&lt;/strong>, steering them away from the infeasible region and gradually toward the feasible region during the search.&lt;/p>
&lt;p>Simply put, with the particle swarm algorithm with penalty function, the constraints can be folded into the objective function of the particle swarm algorithm, so that the algorithm reaches the optimal solution under the constraints.&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="2-objective-functions-and-updated-formulas">2. Objective functions and updated formulas&lt;/h2>
&lt;p>Specifically, the objective function of the particle swarm optimization with penalty function can be expressed as:&lt;/p>
&lt;center> F(x) = f(x) + p(x) &lt;/center>
&lt;p>Where x is the variable vector to be optimized, f(x) is the original objective function, and p(x) is the penalty function. It punishes particles that do not meet the constraint conditions according to the degree of violation of the variable vector. There are many forms of penalty function, such as linear penalty function, quadratic penalty function, exponential penalty function, and so on. Which penalty function to choose depends on the characteristics of the actual problem.&lt;/p>
&lt;p>In particle swarm optimization with penalty function, each particle needs to check whether its current position satisfies the constraints when updating its position and velocity. &lt;strong>If it does not, the particle is punished by the penalty function to discourage it from entering the infeasible region.&lt;/strong> The update formulas are as follows:&lt;/p>
&lt;p>Speed update: v[i][j] = w * v[i][j] + c1 * rand() * (pbest[i][j] - x[i][j]) + c2 * rand() * (gbest[j] - x[i][j])&lt;/p>
&lt;p>Location update: x[i][j] = x[i][j] + v[i][j]&lt;/p>
&lt;p>(The above two formulas are conventional formulas of particle swarm optimization algorithm)&lt;/p>
&lt;p>Where v[i][j] is the velocity of the ith particle in the jth dimension, x[i][j] is the position of the ith particle in the jth dimension, pbest[i][j] is the historical best position of the ith particle, gbest[j] is the global best position, w, c1 and c2 are the inertia weight, the individual (cognitive) learning factor and the global (social) learning factor respectively, and rand() returns a uniform random number in [0, 1].&lt;/p>
&lt;p>When updating the location, if it is found that the variable on a dimension exceeds the value range, it needs to be corrected.&lt;/p>
&lt;p>Specifically, if the lower bound is exceeded, it is adjusted to the lower bound plus a random number multiplied by the difference between the upper and lower bounds. If the upper bound is exceeded, it is adjusted to the upper bound minus a random number multiplied by the difference between the upper and lower bounds.&lt;/p>
&lt;p>At the same time, for particles that do not satisfy the constraints, a penalty value is computed by the penalty function and added to the particle&amp;rsquo;s objective value, so that infeasible particles rank lower in fitness.&lt;/p>
&lt;pre>&lt;code>if (x[i][j] &amp;lt; lb[j]) or (x[i][j] &amp;gt; ub[j]):
    x[i][j] = lb[j] + (ub[j] - lb[j]) * rand()
p[i] = f(x[i]) + P(x[i])
&lt;/code>&lt;/pre>
&lt;p>(The above three formulas are the main manifestations of the penalty function part)&lt;/p>
&lt;p>Note that lb[j] and ub[j] represent the lower and upper bounds for variables on the jth dimension, respectively.&lt;/p>
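&lt;p>Putting the update formulas, the out-of-range correction and the penalized fitness together, a minimal self-contained sketch in Python might look as follows (the quadratic penalty, the test problem and all parameter values are illustrative choices, not prescribed by the algorithm):&lt;/p>

```python
import random

def sphere(x):
    """Original objective f(x): minimize the sum of squares."""
    return sum(v * v for v in x)

def quad_penalty(x):
    """Quadratic penalty p(x) for the constraint g(x) = x[0] + x[1] - 1 <= 0."""
    g = x[0] + x[1] - 1.0
    return 1000.0 * max(0.0, g) ** 2

def pso_penalty(f, p, lb, ub, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5):
    dim = len(lb)
    rand = random.random
    # Initialize positions uniformly inside the bounds, velocities at zero
    x = [[lb[j] + rand() * (ub[j] - lb[j]) for j in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_val = [f(xi) + p(xi) for xi in x]  # fitness = f(x) + p(x)
    g_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                # Standard PSO velocity and position updates
                v[i][j] = (w * v[i][j]
                           + c1 * rand() * (pbest[i][j] - x[i][j])
                           + c2 * rand() * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
                # Out-of-range correction: re-draw inside the bounds
                if x[i][j] < lb[j] or x[i][j] > ub[j]:
                    x[i][j] = lb[j] + (ub[j] - lb[j]) * rand()
            val = f(x[i]) + p(x[i])  # penalized objective
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, x[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, x[i][:]
    return gbest, gbest_val
```

&lt;p>Here the unconstrained optimum of the sphere function already satisfies the constraint, so the penalty mainly keeps particles out of the infeasible half-plane during the search.&lt;/p>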
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="3-summary-of-advantages-and-disadvantages">3. Summary of advantages and disadvantages&lt;/h2>
&lt;p>The advantage of particle swarm optimization with penalty function is that it can handle constrained optimization problems, and has good global search ability and convergence performance.&lt;/p>
&lt;p>However, it needs to select an appropriate penalty function according to the specific problem, and attention should be paid to the weight of penalties and the selection of parameters of penalty function in practical application to avoid the algorithm falling into the local optimal solution.&lt;/p>
&lt;p>&lt;br>&lt;/br>
&lt;br>&lt;/br>&lt;/p></description></item><item><title>Sources and Effects of Noise in Kalman Filter</title><link>https://www.liyitao.cn/post/%E4%B8%80%E6%96%87%E8%AF%A6%E8%A7%A3%E5%8D%A1%E5%B0%94%E6%9B%BC%E6%BB%A4%E6%B3%A2%E4%B8%A4%E5%A4%84%E5%99%AA%E5%A3%B0%E7%9A%84%E6%9D%A5%E6%BA%90%E5%8F%8A%E5%BD%B1%E5%93%8D/</link><pubDate>Sat, 24 Dec 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%B8%80%E6%96%87%E8%AF%A6%E8%A7%A3%E5%8D%A1%E5%B0%94%E6%9B%BC%E6%BB%A4%E6%B3%A2%E4%B8%A4%E5%A4%84%E5%99%AA%E5%A3%B0%E7%9A%84%E6%9D%A5%E6%BA%90%E5%8F%8A%E5%BD%B1%E5%93%8D/</guid><description>&lt;p>This post is synchronously published on my CSDN blog:
&lt;a href="https://blog.csdn.net/weixin_45766278/article/details/128426596" target="_blank" rel="noopener">https://blog.csdn.net/weixin_45766278/article/details/128426596&lt;/a>&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="1-theoretical-assumptions-of-two-noises-in-the-description-of-state-space">1. Theoretical assumptions of two noises in the description of state space&lt;/h2>
&lt;p>First, the basic formulas:
State equation: x(k) = A x(k-1) + B u(k-1) + w(k-1)
Observation equation: y(k) = C x(k) + v(k)
Where w(k-1) is the process noise, whose covariance matrix is usually denoted Q, and v(k) is the observation noise, whose covariance matrix is denoted R.
The standard Kalman filter places four main requirements on Q and R:&lt;/p>
&lt;ol>
&lt;li>Uncorrelated&lt;/li>
&lt;li>Zero Mean&lt;/li>
&lt;li>Gauss white noise sequence&lt;/li>
&lt;li>Q and R are, respectively, positive semi-definite and positive definite matrices with known values&lt;/li>
&lt;/ol>
&lt;p>i.e.
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://s3.bmp.ovh/imgs/2023/03/06/4a166e12c023f426.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Where the $\delta_{kj}$ is the Kronecker $\delta$ function,
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://s3.bmp.ovh/imgs/2023/03/06/be06cbe233001ace.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="2-how-to-obtain-two-noises-from-engineering-application">2. How to obtain two noises from engineering application&lt;/h2>
&lt;p>&lt;strong>Process noise Q:&lt;/strong>
Construct the &amp;ldquo;ideal state&amp;rdquo; of the problem under study, compare it with the actual situation, and use the sample variance of the deviation as Q&lt;/p>
&lt;p>For example, when studying the motion of a slider, one can compare motion data on a relatively smooth surface with that on the actual rough surface. Or consider controlling an unmanned car: in each time step &amp;Delta;t it actually travels along an arc, but we often approximate its motion with a linear model, so the resulting systematic error yields a range from which Q can be estimated&lt;/p>
&lt;p>&lt;strong>Observation noise R:&lt;/strong>
This noise is relatively easy to obtain: usually, based on the accuracy of the sensor, one directly conducts observation experiments and uses the sample variance as R&lt;/p>
&lt;p>For example, if the error of a thermometer is &amp;plusmn;0.5, the observation noise is R = 0.5^2 = 0.25&lt;/p>
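&lt;p>To illustrate where Q and R enter the computation, here is a sketch of a scalar Kalman filter applied to the thermometer example (a constant-state model with A = C = 1; the Q value and the simulated data are illustrative assumptions):&lt;/p>

```python
import random

def kalman_1d(measurements, Q=1e-4, R=0.25, x0=0.0, P0=1.0):
    """Scalar Kalman filter for x(k) = x(k-1) + w(k-1), y(k) = x(k) + v(k).

    Q is the process-noise variance, R the observation-noise variance.
    """
    x, P = x0, P0
    estimates = []
    for y in measurements:
        # Predict: A = 1, B = 0, so the state carries over; uncertainty grows by Q
        P = P + Q
        # Update: C = 1; the Kalman gain weighs prediction against measurement via R
        K = P / (P + R)
        x = x + K * (y - x)
        P = (1.0 - K) * P
        estimates.append(x)
    return estimates

# Example: estimating a constant temperature of 20 degrees with a thermometer
# whose error is about +/-0.5, i.e. R = 0.5**2 = 0.25 (values illustrative).
random.seed(0)
readings = [20.0 + random.gauss(0.0, 0.5) for _ in range(200)]
estimates = kalman_1d(readings, Q=1e-4, R=0.25)
```

&lt;p>A smaller Q makes the filter trust its model more and smooth harder; a smaller R makes it trust the sensor more and track measurements more closely.&lt;/p>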
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="3-effect-of-two-noise-sources-on-kalman-filter-estimation-error">3. Effect of two noise sources on Kalman filter estimation error&lt;/h2>
&lt;p>In fact, I think it is difficult at this stage to directly construct a relationship between the specific values of the process noise Q and the observation noise R and the estimation error of the Kalman filter. Current research mainly focuses on finding the optimal combination of Q and R.&lt;/p>
&lt;p>For example, genetic algorithm is used to find the best combination of variances in this paper:&lt;/p>
&lt;p>[1] Guo Ying Shi, Wang Chang, Zhang Yaqi. The influence of noise variance on Kalman filter result analysis [J]. Computer Engineering and Design, 2014, 35(02): 641-645. DOI: 10.16208/j.issn1000-7024.02.016.&lt;/p>
&lt;p>However, it should be noted that Q and R must exist objectively in practical engineering problems. We try our best to reduce Q and R, which usually helps us to obtain better estimation results.&lt;/p>
&lt;p>&lt;br>&lt;/br>
&lt;br>&lt;/br>&lt;/p></description></item><item><title>Mobile Communication - Three modes of RRC in 5G/NR (idle, active, inactive)</title><link>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-5g-nr%E4%B8%ADrrc%E7%9A%84%E4%B8%89%E7%A7%8D%E7%8A%B6%E6%80%81/</link><pubDate>Wed, 21 Sep 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-5g-nr%E4%B8%ADrrc%E7%9A%84%E4%B8%89%E7%A7%8D%E7%8A%B6%E6%80%81/</guid><description>&lt;p>This post is synchronously published on my CSDN blog:
&lt;a href="https://blog.csdn.net/weixin_45766278/article/details/126928988" target="_blank" rel="noopener">https://blog.csdn.net/weixin_45766278/article/details/126928988&lt;/a>&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="introduction-of-rrc_-inactive-mode">Introduction of RRC_ INACTIVE mode&lt;/h2>
&lt;p>Before the introduction of RRC_INACTIVE mode, LTE had only two RRC states: RRC_IDLE and RRC_CONNECTED. In R13, LTE RRC introduced this new mode.&lt;/p>
&lt;p>The R15 specification of 5G/NR carries over the inactive mode introduced in R13; that is, RRC under NR has three states: &lt;strong>IDLE, INACTIVE, and ACTIVE (CONNECTED)&lt;/strong>.&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h4 id="so-why-introduce-rrc_-inactive-mode">So why introduce RRC_ INACTIVE mode?&lt;/h4>
&lt;p>In fact, this is similar to NB-IoT in LTE: NB-IoT targets low-power deployment scenarios, and RRC_INACTIVE mode is mainly used to reduce terminal energy consumption and latency. mMTC, one of the three typical application scenarios of 5G/NR, includes NB-IoT. (The RRC_INACTIVE state is not limited to mMTC scenarios, however.)&lt;/p>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h4 id="how-to-reduce-energy-consumption-and-delay-in-rrc_-inactive-mode">How to reduce energy consumption and delay in RRC_ INACTIVE mode?&lt;/h4>
&lt;p>There are three reasons:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The UE retains the core-network context when it enters RRC_INACTIVE mode. When data needs to be received or sent and the UE must transition to RRC_CONNECTED mode, it only needs to carry its unique core-network UE identifier through the resume procedure, and the gNB can receive and send data packets once the connection is resumed.
&lt;br>&lt;/br>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Compared with the transition from RRC_IDLE directly to RRC_CONNECTED (where the context held by the core network has been released, and signaling interaction with the core-network side is required to apply for a new context), this process can be omitted when transitioning from RRC_INACTIVE to RRC_CONNECTED.
&lt;br>&lt;/br>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When the UE receives signaling messages from the gNB, it needs to blindly decode the PDCCH to learn the resource location of the signaling. When transitioning from RRC_INACTIVE to RRC_CONNECTED mode, the UE does not release its context and the core-network side does not need to re-allocate it, so signaling reception is reduced. This lowers both the energy consumed by UE blind decoding and the transmission time over the air interface.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;br>&lt;/br>
&lt;br>&lt;/br>&lt;/p>
&lt;h2 id="differences-and-explanations-of-the-three-models">Differences and explanations of the three models&lt;/h2>
&lt;ul>
&lt;li>ACTIVE (CONNECTED):    UE &amp;mdash; NG-RAN :connected   NG-RAN &amp;mdash; 5GC :connected&lt;/li>
&lt;li>IDLE:                UE &amp;mdash; NG-RAN :released     NG-RAN &amp;mdash; 5GC :released&lt;/li>
&lt;li>INACTIVE:               UE &amp;mdash; NG-RAN :suspend      NG-RAN &amp;mdash; 5GC :connected&lt;/li>
&lt;/ul>
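&lt;p>The connection table above can be summarized in a small sketch (the state names follow this post; the helper names and simplified logic are illustrative only):&lt;/p>

```python
from enum import Enum

class RRCState(Enum):
    IDLE = "idle"            # UE-NG-RAN released, NG-RAN-5GC released
    INACTIVE = "inactive"    # UE-NG-RAN suspended, NG-RAN-5GC still connected
    CONNECTED = "connected"  # both connections established

# Which links are up in each state: (UE to NG-RAN, NG-RAN to 5GC)
LINKS = {
    RRCState.IDLE: ("released", "released"),
    RRCState.INACTIVE: ("suspended", "connected"),
    RRCState.CONNECTED: ("connected", "connected"),
}

def resume_cost(state: RRCState) -> str:
    """Sketch of why INACTIVE resumes faster than IDLE."""
    if state is RRCState.INACTIVE:
        return "resume with stored UE context (no core-network signaling)"
    if state is RRCState.IDLE:
        return "full setup: re-establish context with the core network"
    return "already connected"
```

&lt;p>The key point the table encodes is that in INACTIVE the NG-RAN&amp;ndash;5GC leg stays up, so only the suspended UE&amp;ndash;NG-RAN leg needs to be resumed.&lt;/p>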
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h4 id="rrc_-idle">RRC_ IDLE：&lt;/h4>
&lt;ol>
&lt;li>PLMN selection;&lt;/li>
&lt;li>Broadcasting system information;&lt;/li>
&lt;li>Cell reselection mobility;&lt;/li>
&lt;li>Paging for mobile-terminated data is initiated by 5GC;&lt;/li>
&lt;li>The paging area for mobile-terminated data is managed by 5GC;&lt;/li>
&lt;li>DRX configured by NAS for CN paging.&lt;/li>
&lt;/ol>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h4 id="rrc_-inactive">RRC_ INACTIVE：&lt;/h4>
&lt;ol>
&lt;li>PLMN selection;&lt;/li>
&lt;li>Broadcasting system information;&lt;/li>
&lt;li>Cell reselection mobility;&lt;/li>
&lt;li>Paging is initiated by NG-RAN (RAN paging);&lt;/li>
&lt;li>The RAN-based notification area (RNA) is managed by NG-RAN;&lt;/li>
&lt;li>RAN paging DRX configured by NG-RAN;&lt;/li>
&lt;li>Establish 5GC-NG-RAN connection for UE (including control plane/user plane);&lt;/li>
&lt;li>UE AS message is stored in NG-RAN and UE;&lt;/li>
&lt;li>NG-RAN knows the RNA of UE.&lt;/li>
&lt;/ol>
&lt;p>&lt;br>&lt;/br>&lt;/p>
&lt;h4 id="rrc_-active-connected">RRC_ ACTIVE (CONNECTED)：&lt;/h4>
&lt;ol>
&lt;li>Establish 5GC-NG-RAN connection for UE (including control plane/user plane);&lt;/li>
&lt;li>UE AS message is stored in NG-RAN and UE;&lt;/li>
&lt;li>NG-RAN knows the cell to which the UE belongs;&lt;/li>
&lt;li>Transmit unicast data to or from UE;&lt;/li>
&lt;li>Network-controlled mobility, including measurement.&lt;/li>
&lt;/ol></description></item><item><title>Machine Learning - AWS series of course exercises</title><link>https://www.liyitao.cn/post/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-aws%E7%B3%BB%E5%88%97%E8%AF%BE%E7%A8%8B%E7%BB%83%E4%B9%A0%E9%A2%98%E9%9B%86/</link><pubDate>Sat, 13 Aug 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-aws%E7%B3%BB%E5%88%97%E8%AF%BE%E7%A8%8B%E7%BB%83%E4%B9%A0%E9%A2%98%E9%9B%86/</guid><description>&lt;p>This post is synchronously published on my CSDN blog:
&lt;a href="https://blog.csdn.net/weixin_45766278/article/details/127181466" target="_blank" rel="noopener">https://blog.csdn.net/weixin_45766278/article/details/127181466&lt;/a>&lt;/p>
&lt;h3 id="module-2---introducing-machine-learning">[Module 2 - Introducing Machine Learning]&lt;/h3>
&lt;ol>
&lt;li>Machine learning is &lt;strong>the scientific study of algorithms and statistical models that rely on inference rather than explicit instructions to perform tasks&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Reinforcement learning&lt;/strong> learns, through interaction with the environment, to take actions that obtain the maximum reward&lt;/li>
&lt;li>Create a machine learning solution for a call center, whose goal is to transfer customers to the appropriate department (there are eight possible departments in total).
This scenario describes a &lt;strong>multi-class classification problem&lt;/strong>&lt;/li>
&lt;li>The system needs to respond to environmental changes to improve performance: &lt;strong>reinforcement learning&lt;/strong>&lt;/li>
&lt;li>In the &lt;strong>data preparation&lt;/strong> stage of the machine learning pipeline, you verify whether the data are all of the same type&lt;/li>
&lt;li>Data for multiple countries or regions are listed as abbreviations. Which stage involves converting these abbreviations into numerical values: &lt;strong>data preparation&lt;/strong>&lt;/li>
&lt;li>If a model performs well on the training data but poorly on the evaluation data, it is overfitting &lt;strong>Correct&lt;/strong>&lt;/li>
&lt;li>Which resources are Python libraries used for machine learning problems: &lt;strong>pandas, scikit-learn&lt;/strong>&lt;/li>
&lt;li>Which Amazon service can be used to deploy machine learning instances and run Jupyter notebooks: &lt;strong>Amazon SageMaker&lt;/strong>&lt;/li>
&lt;li>What are the requirements for selecting machine learning as the development method: &lt;strong>large data sets containing a large number of variables&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h3 id="module-3---implementing-a-machine-learning-pipeline-with-amazon-sagemaker">[Module 3 - Implementing a Machine Learning pipeline with Amazon SageMaker]&lt;/h3>
&lt;ol>
&lt;li>What resources help define a machine learning problem: &lt;strong>access to labeled data, domain experts to consult&lt;/strong>&lt;/li>
&lt;li>What attributes should data prepared for supervised classification learning have: &lt;strong>the data should be labeled and should be representative of production data&lt;/strong>&lt;/li>
&lt;li>What can be found by checking data statistics: &lt;strong>anomalous data&lt;/strong>&lt;/li>
&lt;li>How to split training data for a preprocessed data set that can be used for training model: &lt;strong>80% is used for training, and the rest is split into test data (10%) and verification data (10%)&lt;/strong>&lt;/li>
&lt;li>Both single models and multiple models can be managed through Amazon SageMaker &lt;strong>Correct&lt;/strong>&lt;/li>
&lt;li>Role of the confusion matrix: &lt;strong>show true/false positives and true/false negatives&lt;/strong>&lt;/li>
&lt;li>What does the correlation Heatmap show: &lt;strong>correlation degree (positive correlation / negative correlation) between data set characteristics&lt;/strong>&lt;/li>
&lt;li>Which of the following file formats does pandas support importing data: &lt;strong>JSON, CSV&lt;/strong>&lt;/li>
&lt;li>Which Amazon service is used to deploy machine learning instances and run Jupyter notebooks: &lt;strong>Amazon SageMaker&lt;/strong>&lt;/li>
&lt;li>What is the goal of an Amazon SageMaker hyperparameter tuning job: &lt;strong>optimize model parameters to generate the best model&lt;/strong>&lt;/li>
&lt;/ol>
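&lt;p>The 80%/10%/10% split from question 4 above can be sketched in plain Python (a minimal illustration; function and variable names are made up for this example, and in practice a library helper such as scikit-learn&amp;rsquo;s train_test_split is commonly used instead):&lt;/p>

```python
import random

def split_80_10_10(rows, seed=42):
    """Shuffle a dataset and split it into 80% train, 10% validation, 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # deterministic shuffle for reproducibility
    n = len(rows)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = split_80_10_10(range(1000))
```
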
&lt;h3 id="module-4---implementing-a-machine-learning-pipeline-with-amazon-sagemaker">[Module 4 - Implementing a Machine Learning pipeline with Amazon SageMaker]&lt;/h3>
&lt;ol>
&lt;li>What are the common patterns in time series data: &lt;strong>trend and seasonality&lt;/strong>&lt;/li>
&lt;li>Which use cases are suitable for prediction: &lt;strong>predict the necessary inventory of goods in the warehouse and predict the energy consumption of the Office&lt;/strong>&lt;/li>
&lt;li>Which data sets can be used as time series data sets (&lt;strong>core: including time&lt;/strong>): &lt;strong>sales data including goods, purchase date and quantity; Web log with IP address, page and timestamp&lt;/strong>&lt;/li>
&lt;li>For the data set of temperature readings of a weather station (recorded every 5 minutes), several values are missing every day. What countermeasures can be taken: &lt;strong>fill the missing values forward / backward&lt;/strong>&lt;/li>
&lt;li>Which scenarios show appropriate downsampling examples: &lt;strong>use the mean function to convert per-minute temperature readings into hourly readings; use the sum function to convert the day&amp;rsquo;s sales orders into a daily total&lt;/strong>&lt;/li>
&lt;li>What seasonal examples can be observed in the time series data: &lt;strong>every quarter, every year, spring, summer, autumn and winter&lt;/strong>&lt;/li>
&lt;li>Amazon Forecast generates P10, P50 and P90 prediction results. If you use Amazon Forecast to forecast the sales volume of shoes and boots, what can you learn from P10, P50 and P90 (&lt;strong>core: a Pxx value means the true value will be below the predicted value xx% of the time&lt;/strong>): &lt;strong>P10 indicates that the order volume will be below the predicted value 10% of the time&lt;/strong>&lt;/li>
&lt;li>What data sets (&lt;strong>core: time series&lt;/strong>) are needed to generate retail forecasts using Amazon Forecast: &lt;strong>time series data including timestamps, goods and quantities&lt;/strong>&lt;/li>
&lt;li>What steps should be performed to generate the best model using the available data: &lt;strong>use pandas to split the data into training and test data sets by time; use the training data set in Amazon Forecast by specifying the backtest window; use the test data set to compare predicted and actual values&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h3 id="module-5---introducing-computer-vision-cv">[Module 5 - Introducing Computer Vision (CV)]&lt;/h3>
&lt;ol>
&lt;li>What are the common use cases of computer vision: &lt;strong>image analysis, face recognition, family safety&lt;/strong>&lt;/li>
&lt;li>What indicates an object&amp;rsquo;s position in an image: &lt;strong>bounding box&lt;/strong>&lt;/li>
&lt;li>What functions does Amazon Rekognition provide: &lt;strong>search image and video libraries, identify faces, and perform emotion analysis on images&lt;/strong>&lt;/li>
&lt;li>When Amazon Rekognition performs a prediction, it also provides a score indicating the confidence of the prediction &lt;strong>Correct&lt;/strong>&lt;/li>
&lt;li>What will Amazon Rekognition do with the results after completing video analysis: &lt;strong>publish the results to an Amazon SNS topic&lt;/strong>&lt;/li>
&lt;li>What functions does Amazon Rekognition Custom Labels have: a UI for labeling images and defining bounding boxes; &lt;strong>automatic selection of the machine learning algorithm&lt;/strong>&lt;/li>
&lt;li>To use Amazon SageMaker Ground Truth to automatically label data, the minimum number of images required is &lt;strong>1250&lt;/strong>&lt;/li>
&lt;li>What is the confusion matrix used for: &lt;strong>determining the accuracy of the model in classifying objects&lt;/strong>&lt;/li>
&lt;li>What types of data are included in the Amazon SageMaker Ground Truth manifest file: &lt;strong>confidence value, creation date, and class name&lt;/strong>&lt;/li>
&lt;li>Which of the following steps are used to prepare custom data sets for object detection: &lt;strong>collect images and train models&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h3 id="module-6---introducing-natural-language-processing">[Module 6 - Introducing Natural Language Processing]&lt;/h3>
&lt;ol>
&lt;li>Which of the following is not the main challenge of natural language processing (NLP): &lt;strong>storage limitation&lt;/strong>&lt;/li>
&lt;li>What are the common preprocessing tasks of NLP applications: &lt;strong>eliminate noise and normalize similar words&lt;/strong>&lt;/li>
&lt;li>NLP appeared earlier than machine learning systems &lt;strong>Correct&lt;/strong>&lt;/li>
&lt;li>What are the common machine learning models for NLP applications: &lt;strong>bag-of-words, term frequency&amp;ndash;inverse document frequency (TF-IDF)&lt;/strong>&lt;/li>
&lt;li>Which of the following does not belong to the text analysis category: &lt;strong>AutoCorrect text&lt;/strong>&lt;/li>
&lt;li>What functions does Amazon Transcribe support: &lt;strong>convert streaming audio into text; build multilingual subtitles&lt;/strong>&lt;/li>
&lt;li>How can I change how Amazon Polly pronounces words: &lt;strong>add Speech Synthesis Markup Language (SSML) tags to the text&lt;/strong>&lt;/li>
&lt;li>Which functions belong to Amazon Comprehend: &lt;strong>identify the language used in a document; identify the sentiment contained in a document (positive, negative, neutral, or mixed)&lt;/strong>&lt;/li>
&lt;li>Which of the following AWS services can be used to start a workflow based on input from an Amazon Lex chatbot: &lt;strong>AWS Lambda&lt;/strong>&lt;/li>
&lt;li>When working for a company that builds applications for a global audience, what services can be used to analyze how customers use the applications: &lt;strong>Amazon Comprehend, Amazon Translate&lt;/strong>&lt;/li>
&lt;/ol></description></item><item><title>Cloud Computing - Total Cost of Ownership (TCO)</title><link>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E6%80%BB%E6%8B%A5%E6%9C%89%E6%88%90%E6%9C%ACtco/</link><pubDate>Sun, 10 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E6%80%BB%E6%8B%A5%E6%9C%89%E6%88%90%E6%9C%ACtco/</guid><description>&lt;h3 id="overview">Overview&lt;/h3>
&lt;p>Total Cost of Ownership (TCO) is a technical evaluation standard often adopted by companies. Its core idea is to evaluate the total cost, including the acquisition cost and the annual operating costs, over a certain time range. In some cases, this total cost is averaged over a period of 3 to 5 years to obtain a comparable current expenditure.&lt;/p>
&lt;p>In summary, TCO is the cost of retaining and maintaining all software within the company.&lt;/p>
&lt;h3 id="introduction">Introduction&lt;/h3>
&lt;p>Almost no institution or enterprise can avoid cost budgeting in the process of informatization, especially when investing in IT projects, which calls for a scientific and reasonable value-evaluation method. The topic of this article, total cost of ownership (TCO), is an economic evaluation mechanism established to accomplish this task.&lt;/p>
&lt;h3 id="concepts-of-tco">Concepts of TCO&lt;/h3>
&lt;p>The authority Gartner defines TCO as:
TCO is the holistic view of costs across enterprise boundaries over time. TCO is a quantitative means for understanding the qualitative performance of the IS organization. TCO is a comprehensive set of methodologies, models and tools to help IS organizations better:&lt;/p>
&lt;ul>
&lt;li>Measure costs&lt;/li>
&lt;li>Manage costs&lt;/li>
&lt;li>Reduce costs&lt;/li>
&lt;li>Improve overall value of IT investments&lt;/li>
&lt;li>Align IT support to the business mission&lt;/li>
&lt;/ul>
&lt;p>Total cost of ownership (TCO) is often an important part of an organization&amp;rsquo;s IT strategy. It is a technical evaluation standard often adopted by companies, and its core is to evaluate the total cost, including the acquisition cost and the annual costs, that an enterprise bears within a certain time range. Compared with a simple return-on-investment (ROI) calculation, TCO focuses on long-term and in-depth analysis, so it has become one of the most effective tools for the economic evaluation of storage. In some cases, this total cost is averaged over a period of 3 to 5 years to obtain a comparable current expenditure. Specifically, we can evaluate the TCO value from the following aspects:&lt;/p>
&lt;h3 id="accounting-of-tco">Accounting of TCO&lt;/h3>
&lt;p>Total cost of ownership accounting generally consists of two parts: technology and business. The technical cost includes the cost of hardware, software (including maintenance and upgrading), installation and training, as well as the manpower expenses such as operation, support and consultation.&lt;/p>
&lt;p>Business costs involve financial issues related to availability, performance and recovery. Availability covers both the benefits brought by improved availability and the costs incurred when data is unavailable. The commercial benefits of improved availability include solution-related gains, saved human-resource costs and the productivity improvement brought by new investment. The performance factor evaluates the contribution of performance to data availability. Performance improvement is calculated not as a one-time benefit but as a benefit that accumulates continuously over time.&lt;/p>
&lt;p>Recovery cost refers to the time and money spent to recover to normal operation once the storage facility fails. It also includes the resulting loss of business value and the impact on productivity, as well as other miscellaneous expenses.&lt;/p>
&lt;p>It is very useful for companies of all sizes to consider IT costs through total cost of ownership. This means looking beyond end users&amp;rsquo; hardware costs to all of the related costs they bring:&lt;/p>
&lt;ul>
&lt;li>Additional asset costs - software, IT support software and network architecture.&lt;/li>
&lt;li>Technical support costs - hardware and software deployment, technical support personnel, system maintenance.&lt;/li>
&lt;li>Management costs - finance, supplier management, user training, and asset management.&lt;/li>
&lt;li>End-user operating costs - downtime costs, user support and expensive IT technician support.&lt;/li>
&lt;/ul>
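&lt;p>As a toy illustration of averaging TCO over a multi-year window (all cost figures below are made up for this example):&lt;/p>

```python
def tco(acquisition, annual_costs):
    """Total cost of ownership: acquisition cost plus the annual costs over the period."""
    return acquisition + sum(annual_costs)

def average_annual_tco(acquisition, annual_costs):
    """Average TCO per year, giving a comparable current expenditure."""
    years = len(annual_costs)
    return tco(acquisition, annual_costs) / years

# Hypothetical comparison: low upfront cost / high maintenance vs. the reverse.
a = average_annual_tco(10_000, [8_000, 8_000, 8_000])  # cheap to buy, costly to run
b = average_annual_tco(25_000, [3_000, 3_000, 3_000])  # costly to buy, cheap to run
# Over three years the two options end up with the same average TCO,
# even though the timing of the spending is very different.
```

&lt;p>This is exactly the limitation discussed below: the averaged figure hides when the money is spent.&lt;/p>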
&lt;h3 id="pros--cons-of-tco">Pros &amp;amp; Cons of TCO&lt;/h3>
&lt;h4 id="pros">Pros&lt;/h4>
&lt;p>The outstanding advantage of TCO is that it provides a powerful cost-estimation method when people are unclear, at the initial purchase stage, about a project&amp;rsquo;s possible future costs. However, since this method focuses only on cost, companies that rely on TCO tend to adopt a strategy of minimizing expenses rather than considering how to maximize return. For this reason, such companies may purchase the cheapest application software and rarely choose the software that can have the greatest impact on the company&amp;rsquo;s bottom line.&lt;/p>
&lt;p>In most cases, calculating the total cost of ownership is a process that requires continuous efforts. It needs to consider both technical and non-technical factors. If you want to have a complete understanding of the continuous cost of the relevant application software, it is best to calculate TCO over a period of at least 3 years. All costs including software and hardware costs, as well as consulting and related support costs in the preparation work before the software is actually launched, shall be taken into account when calculating TCO. After the software is really launched, the maintenance and upgrade costs in the following years, as well as the costs of training users and it support, must also be considered.&lt;/p>
&lt;p>The annual TCO figure for each year is an excellent indicator of the current cost situation and serves budget objectives well, making it well suited to budget-oriented projects. Compared with other similar methods, the average TCO value over a period of time provides a more reasonable standard. However, the average TCO cannot provide insight into the timing of costs. Products with low upfront cost and high ongoing maintenance can look more attractive than those with high upfront cost and low maintenance; yet, analyzed over a period of time, the two types of products are similar in TCO, with little difference.&lt;/p>
&lt;h4 id="cons">Cons&lt;/h4>
&lt;p>The problem with total cost of ownership is that, used in isolation, it offers only a very narrow view of the cost of an application. TCO does not consider profit at all. What matters is not just choosing the cheapest application software, but choosing the software that can bring the company the most profit or return.&lt;/p>
&lt;p>Similarly, TCO cannot help you optimize a project, because optimizing purely for the lowest cost would mean never investing in the project at all. Real optimization requires the company to weigh cost, income and investment risk before finally deciding how to take the first step.&lt;/p>
&lt;p>TCO imposes no financial discipline of its own, yet every company should have a clear understanding of the continuing costs of every technology investment it makes. Because TCO is easy to compute and yields a concrete figure for spending-based decisions, many companies and analysts have adopted it as an important standard. However, technology can have a great impact on the bottom line. Looking beyond TCO at all of the costs and benefits can help you determine that the solution you adopt, although not the cheapest, is the best one for your company.&lt;/p>
&lt;h3 id="summary">Summary&lt;/h3>
&lt;p>From the above analysis, TCO has both strengths and weaknesses. It involves no financial discipline of its own, but through TCO every company can know the current cost of each technology investment. Precisely because TCO provides visible, tangible figures, many companies treat it as a basic technical evaluation criterion: it lets an enterprise easily see the cost flow and make decisions on the basis of the lowest possible expenditure. We must remain clear, however, that technology has a significant impact on the bottom line. We should not be limited to the TCO method; we should consider all factors, including cost and income, to ensure that the selected solution, while not necessarily the cheapest, is the best.&lt;/p></description></item><item><title>Cloud Computing - Content Delivery Network (CDN)</title><link>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E5%86%85%E5%AE%B9%E5%88%86%E5%8F%91%E7%BD%91%E7%BB%9Ccdn/</link><pubDate>Wed, 06 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E5%86%85%E5%AE%B9%E5%88%86%E5%8F%91%E7%BD%91%E7%BB%9Ccdn/</guid><description>&lt;h3 id="overview">Overview&lt;/h3>
&lt;p>The full name of CDN is Content Delivery Network. Its purpose is to add a new layer of network architecture on top of the existing Internet, publishing website content to the network edge closest to users so that they can obtain the required content nearby, improving response speed when visiting the website. A CDN differs from simple mirroring in that it is more intelligent; as a metaphor: CDN = smarter mirroring + caching + traffic steering. CDN therefore noticeably improves the efficiency of information flow on the Internet. Technically, it addresses the combined problems of limited bandwidth, large numbers of users and unevenly distributed points of presence, and improves the response speed of users visiting websites.&lt;/p>
&lt;p>To better understand CDN, let&amp;rsquo;s take a look at the CDN workflow:
When a user visits a website that has joined a CDN service, DNS redirection first determines the best CDN node closest to the user and directs the request to that node. When the request reaches the designated node, the CDN server (the cache on the node) serves the requested content. The specific process is as follows: the user enters the domain name in the browser; the browser asks the local DNS to resolve it; the local DNS forwards the request to the website&amp;rsquo;s authoritative DNS; the authoritative DNS selects the most appropriate CDN node at that moment according to a set of policies and returns the resolution result (an IP address) to the user; the user then requests the content from that CDN node.&lt;/p>
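&lt;p>The node-selection step can be sketched in a few lines. This is an illustration only, not a real DNS implementation; the node names, IP addresses and latency figures are invented:&lt;/p>

```python
# Candidate CDN edge nodes as seen by the authoritative DNS for one client.
nodes = {
    "edge-beijing": {"ip": "203.0.113.10", "latency_ms": 12},
    "edge-shanghai": {"ip": "203.0.113.20", "latency_ms": 35},
    "edge-guangzhou": {"ip": "203.0.113.30", "latency_ms": 58},
}

def resolve_best_node(nodes):
    """Return the IP of the node judged best (here: lowest latency)."""
    best = min(nodes.values(), key=lambda n: n["latency_ms"])
    return best["ip"]

# The DNS answer the client receives instead of the origin server's IP.
print(resolve_best_node(nodes))  # 203.0.113.10
```

&lt;p>A real CDN combines many more signals in this policy step (geography, node load, link cost), but the shape of the decision is the same.&lt;/p>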
&lt;p>Because the origin server is a performance bottleneck, CDN technology caches its content on multiple nodes. When a user requests the site&amp;rsquo;s domain name, the request is dispatched to the service node closest to the user, which responds directly and quickly, effectively reducing access latency and improving availability.&lt;/p>
&lt;p>The advantages of CDN are obvious:
(1) CDN nodes solve the problem of cross-ISP and cross-region access, greatly reducing access latency;
(2) Most requests are completed at the CDN edge nodes, so the CDN offloads traffic and relieves the load on the origin server.&lt;/p>
&lt;h3 id="related-technologies">Related technologies&lt;/h3>
&lt;p>The implementation of CDN relies on many network technologies, chiefly load balancing, dynamic content distribution and replication, and caching.&lt;/p>
&lt;h4 id="load-balancing-technology">Load balancing technology&lt;/h4>
&lt;p>Load balancing technology is not only applied in CDN, but also widely used in many fields of the network, such as server load balancing and network traffic load balancing.&lt;/p>
&lt;p>As the name implies, load balancing distributes network traffic as evenly as possible across several servers or network nodes that can perform the same task, so that no node becomes overloaded. This improves both throughput and the overall performance of the network.&lt;/p>
&lt;p>In a CDN, load balancing is divided into server load balancing and global server load balancing (GSLB). Server load balancing allocates tasks among servers of different capacities, ensuring both that weaker servers do not become the bottleneck of the system and that the resources of stronger servers are fully utilized. Global server load balancing lets web hosting providers, portals and enterprises distribute content and services by geographical location, and improves fault tolerance and availability: multi-site content and services prevent outages caused by local or regional network interruptions, power failures or natural disasters. In a CDN scheme, global server load balancing plays a central role, and its performance directly affects the performance of the entire CDN.&lt;/p>
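&lt;p>A minimal sketch of the server load balancing idea described above: send each request to the server with the lowest load relative to its capacity, so a weaker server never becomes the bottleneck while a stronger one is fully used. Capacities and request counts are invented:&lt;/p>

```python
# Two servers of unequal capacity; "active" counts requests assigned so far.
servers = [
    {"name": "s1", "capacity": 100, "active": 0},
    {"name": "s2", "capacity": 50, "active": 0},
]

def pick_server(servers):
    """Choose the server with the lowest load relative to its capacity."""
    return min(servers, key=lambda s: s["active"] / s["capacity"])

for _ in range(30):
    pick_server(servers)["active"] += 1

print([s["active"] for s in servers])  # [20, 10]: twice the work to the twice-as-big server
```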
&lt;h4 id="dynamic-content-distribution-and-replication-technology">Dynamic content distribution and replication technology&lt;/h4>
&lt;p>As we all know, website response speed depends on many factors: bandwidth bottlenecks, congestion and delay along the route, the processing capacity of the web server, and the access distance. In most cases, response speed is closely related to the distance between the visitor and the server: when that distance is too long, communication passes through many routing hops and processing steps, and network delay is inevitable.&lt;/p>
&lt;p>An effective remedy is content distribution and replication: distribute and copy the static web pages, images and streaming media data that make up the bulk of a site to acceleration nodes in various places. Dynamic content distribution and replication is therefore another key technology required by a CDN.&lt;/p>
&lt;h4 id="cache-technology">Cache technology&lt;/h4>
&lt;p>Caching is not a new technology. Web caching improves user response time in several ways, such as proxy caching, transparent proxy caching, and transparent proxy caching combined with redirection. With a web cache, traffic over the WAN is minimized when users access web pages. For corporate intranet users, this means content is cached locally instead of being retrieved over a dedicated WAN; for Internet users, it means content is stored in their ISP&amp;rsquo;s cache rather than retrieved over the Internet. Either way, user access speed improves. Since the core role of a CDN is to speed up network access, caching is another major technology it adopts.&lt;/p></description></item><item><title>Cloud Computing - Virtual eXtensible Local Area Network (VxLAN)</title><link>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-vxlan/</link><pubDate>Sun, 12 Jun 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-vxlan/</guid><description>&lt;h2 id="vxlan-overview-and-causes">VxLAN overview and causes&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Virtual machine scale is limited by the specifications of network equipment.
Server virtualization multiplies the number of virtual machines, and with it the number of virtual NIC MAC addresses. The MAC address table capacity of the original layer 2 access devices cannot keep up with the rapidly growing number of virtual machines.
&lt;strong>VxLAN solution:&lt;/strong>
VxLAN encapsulates the frames sent by virtual machines in the same administrator-planned segment into new UDP packets, using the IP and MAC addresses of the physical network as the outer headers, so that other devices in the network see only the outer encapsulation and the inner virtual machine MAC addresses no longer consume access-switch table entries.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The isolation capability of the network is limited.
Public clouds and large-scale virtualized server farms need tens of thousands of tenants or more, and the existing 12-bit VLAN ID space (about 4000 usable VLANs) is insufficient for this demand.
&lt;strong>VxLAN solution:&lt;/strong>
VxLAN introduces a user identifier similar to the VLAN ID, called the VxLAN Network Identifier (VNI). It is 24 bits long, so it can support up to 16M VxLAN segments.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Virtual machine migration scope is limited.
A traditional layer 2 network cannot accommodate virtual machine migration, which has become routine, without restricting the migration range or the availability of services.
&lt;strong>VxLAN solution:&lt;/strong>
VxLAN encapsulates the original frames sent by the virtual machine and transmits them through a VxLAN tunnel, which can span any network. Virtual machines in the same segment are therefore logically in the same layer 2 domain. In other words, VxLAN builds a virtual layer 2 network on top of a layer 3 network: as long as a virtual machine is reachable by routing, it can be placed in the same layer 2 network.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="principle-of-vxlan">Principle of VxLAN&lt;/h2>
&lt;p>VxLAN is a network virtualization technology. It encapsulates the packets sent by the original host in UDP, using the IP and MAC addresses of the physical network as the outermost headers, and transmits them over the IP network. Upon reaching the destination, the tunnel endpoint decapsulates the packet and delivers the data to the target host.
UDP port number of VxLAN: 4789&lt;/p>
&lt;h2 id="vxlan-message-encapsulation-format">VxLAN message encapsulation format&lt;/h2>
&lt;p>VxLAN communication process:&lt;/p>
&lt;ol>
&lt;li>The sender sends a data frame to the receiver, which contains the virtual MAC addresses of the sender and the receiver.&lt;/li>
&lt;li>The VTEP node attached to the sender receives the data frame, encapsulates it, and sends it toward the destination VTEP node.&lt;/li>
&lt;li>The message is transmitted to the destination VTEP node through the underlay network.&lt;/li>
&lt;li>After receiving the message, the destination VTEP node decapsulates it and delivers the inner data frame to the receiver.&lt;/li>
&lt;li>The receiver receives the data frame and completes the transmission.&lt;/li>
&lt;/ol>
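&lt;p>The encapsulation step can be illustrated with the 8-byte VxLAN header itself (RFC 7348): a flags byte of 0x08 meaning &amp;ldquo;VNI present&amp;rdquo;, 3 reserved bytes, the 24-bit VNI, and one final reserved byte. The sketch below builds only the VxLAN header; on the wire it sits between the outer Ethernet/IP/UDP headers (destination port 4789) and the inner frame:&lt;/p>

```python
import struct

VXLAN_PORT = 4789  # well-known UDP destination port for VxLAN

def vxlan_header(vni):
    """Build the 8-byte VxLAN header: flags 0x08, reserved, 24-bit VNI, reserved."""
    if vni // 2**24:
        raise ValueError("VNI must be a 24-bit unsigned value")
    # First 32-bit word: flags byte 0x08 followed by 3 reserved bytes.
    # Second word: the VNI occupies the top 24 bits, i.e. shifted up one byte.
    return struct.pack("!II", 0x08000000, vni * 256)

def vni_of(header):
    """Recover the VNI from a VxLAN header (what the receiving VTEP does)."""
    _, word2 = struct.unpack("!II", header)
    return word2 // 256

header = vxlan_header(5000)
print(len(header), vni_of(header))  # 8 5000
```

&lt;p>Because the VNI field is 24 bits wide, 16,777,216 distinct segments are possible, which is where the 16M figure comes from.&lt;/p>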
&lt;h2 id="vxlan-configuration-instance">VxLAN Configuration instance&lt;/h2>
&lt;div align=center>
&lt;img src = 'https://s3.bmp.ovh/imgs/2022/07/13/75ba1a3b4bb833a8.png'>&lt;/img>
&lt;div align=left></description></item><item><title>Cloud Computing - Non-Uniform Memory Access (NUMA)</title><link>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E9%9D%9E%E4%B8%80%E8%87%B4%E6%80%A7%E5%86%85%E5%AD%98%E8%AE%BF%E9%97%AEnuma/</link><pubDate>Sat, 04 Jun 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E4%BA%91%E8%AE%A1%E7%AE%97%E5%9F%BA%E7%A1%80-%E9%9D%9E%E4%B8%80%E8%87%B4%E6%80%A7%E5%86%85%E5%AD%98%E8%AE%BF%E9%97%AEnuma/</guid><description>&lt;h3 id="three-system-architectures--two-memory-sharing-methods">Three system architectures &amp;amp; Two memory sharing methods&lt;/h3>
&lt;p>From the perspective of system architecture, the current commercial servers can be broadly divided into three categories:&lt;/p>
&lt;ol>
&lt;li>Symmetric Multi-Processor (SMP)&lt;/li>
&lt;li>Non-Uniform Memory Access (NUMA)&lt;/li>
&lt;li>Massive Parallel Processing (MPP)&lt;/li>
&lt;/ol>
&lt;p>There are two technologies for shared memory multiprocessors:&lt;/p>
&lt;ol>
&lt;li>Uniform-Memory-Access (UMA)&lt;/li>
&lt;li>Nonuniform-Memory-Access (NUMA)&lt;/li>
&lt;/ol>
&lt;h3 id="uniform-memory-access-uma">Uniform-Memory-Access (UMA)&lt;/h3>
&lt;p>UMA is a shared memory architecture for parallel computers: physical memory is shared uniformly by all processors, with the same access time to every memory word. Each processor can have a private cache, and peripherals can also be shared in some form. UMA suits general-purpose, multi-user time-sharing applications; in time-critical settings it can also be used to accelerate the execution of a single large program.&lt;/p>
&lt;h3 id="nonuniform-memory-access-numa">Nonuniform-Memory-Access (NUMA)&lt;/h3>
&lt;p>NUMA is a memory design for multiprocessor computing in which memory access time depends on the location of the memory relative to the processor. Under NUMA, a processor accesses its local memory faster than non-local memory (the local memory of another processor, or memory shared between processors).&lt;/p>
&lt;h3 id="virtual-nonuniform-memory-access-vnuma">Virtual Nonuniform-Memory-Access (vNUMA)&lt;/h3>
&lt;p>vNUMA removes the NUMA transparency between the VM and the guest operating system, exposing the NUMA architecture directly to the VM&amp;rsquo;s operating system. For wide VMs, the NUMA topology of the VM spans multiple physical NUMA nodes. Once a vNUMA-enabled VM first boots, the architecture presented to the operating system is permanently defined and cannot be modified. This limitation is usually positive, because changing the vNUMA architecture could destabilize the operating system, but it may cause performance problems if the VM is migrated via vMotion to a hypervisor with a different NUMA architecture. It is worth noting that although most applications can use vNUMA, most VMs are small enough to fit within a single NUMA node; recent optimizations for wide-VM support and vNUMA do not affect them.&lt;/p>
&lt;p>How the guest operating system or its applications place processes and memory can therefore significantly affect performance. The advantage of exposing the NUMA topology to the VM is that it lets the guest make optimal decisions based on the underlying NUMA architecture; the hypervisor assumes the guest operating system will make the best decisions given the exposed vNUMA topology, rather than interleaving memory between NUMA clients itself.&lt;/p>
&lt;h3 id="importance-of-numa">Importance of NUMA&lt;/h3>
&lt;p>Multithreaded applications should access memory local to the CPU core they run on. When they must use remote memory, performance suffers from the added latency: accessing remote memory is much slower than accessing local memory, so respecting NUMA improves performance. Modern operating systems try to schedule processes onto NUMA nodes (local memory + local CPU = a NUMA node) so that processes access memory through their local node. ESXi likewise applies NUMA to wide virtual machines: when a VM has more than 8 virtual cores, the cores are distributed across multiple NUMA nodes at boot time, which improves performance because each virtual core accesses local memory.&lt;/p>
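&lt;p>The effect can be shown with a back-of-the-envelope calculation. The latency figures below are invented (real values depend on the hardware); the point is only that effective memory latency is a weighted mix of local and remote accesses:&lt;/p>

```python
# Hypothetical access latencies in nanoseconds.
LOCAL_NS = 100
REMOTE_NS = 200  # remote NUMA access assumed about twice as slow

def effective_latency(local_fraction):
    """Average memory latency when local_fraction of accesses stay local."""
    return local_fraction * LOCAL_NS + (1 - local_fraction) * REMOTE_NS

print(effective_latency(1.0))  # 100.0: perfectly NUMA-aware placement
print(effective_latency(0.5))  # 150.0: half the accesses go remote
```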
&lt;h3 id="summary">summary&lt;/h3>
&lt;p>Whether you allocate more virtual sockets with fewer virtual cores each, or fewer virtual sockets with more cores each, the choice does not change the number of NUMA nodes. Virtual sockets mainly affect software licensing, not performance.&lt;/p>
&lt;p>&amp;ldquo;High availability&amp;rdquo; usually describes that a system is specially designed to reduce downtime and maintain high availability of its services.&lt;/p>
&lt;p>For example, we expect power and water services to be highly available systems.
The reliability of a computer system is measured by the mean time to failure (MTTF): how long the system runs normally before a failure occurs. The higher the reliability, the longer the MTTF. Maintainability is measured by the mean time to repair (MTTR): the average time taken to repair the system and restore normal operation after a failure. The better the maintainability, the shorter the MTTR. The availability of a computer system is then defined as MTTF / (MTTF + MTTR) * 100%, i.e. the percentage of time the system is operating normally.&lt;/p>
&lt;h3 id="ha-of-load-balancing-server">HA of load balancing server&lt;/h3>
&lt;p>To mask failures of the load balancing server itself, a backup machine is needed. Both the primary server and the backup run a high availability monitoring program and watch each other&amp;rsquo;s health by exchanging messages such as &amp;ldquo;I am alive&amp;rdquo;. When the backup receives no such message within a set time, it takes over the primary&amp;rsquo;s service IP and continues providing the service; when it hears &amp;ldquo;I am alive&amp;rdquo; from the primary again, it releases the service IP and the primary resumes cluster management. To keep the system working after a primary failure, the configuration of the load-balancing cluster is synchronized and backed up between the primary and backup machines so that the two stay essentially consistent.&lt;/p>
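&lt;p>The &amp;ldquo;I am alive&amp;rdquo; logic described above can be sketched as follows. This is an illustrative model, not production HA software, and the timeout value is arbitrary:&lt;/p>

```python
TIMEOUT = 3.0  # seconds without a heartbeat before the backup takes over

class Backup:
    def __init__(self):
        self.last_heartbeat = 0.0
        self.holds_service_ip = False

    def on_heartbeat(self, now):
        """Primary said 'I am alive': remember it, release the IP if held."""
        self.last_heartbeat = now
        self.holds_service_ip = False

    def check(self, now):
        """Periodic check: take over the service IP if the primary went quiet."""
        if now - self.last_heartbeat > TIMEOUT:
            self.holds_service_ip = True

b = Backup()
b.on_heartbeat(0.0)
b.check(2.0)
print(b.holds_service_ip)  # False: heartbeat is recent
b.check(5.0)
print(b.holds_service_ip)  # True: takeover after the timeout
b.on_heartbeat(6.0)
print(b.holds_service_ip)  # False: primary recovered, IP released
```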
&lt;h3 id="fault-tolerant-backup-operation-process-of-ha">Fault tolerant backup operation process of HA&lt;/h3>
&lt;h4 id="auto-detect-phase">Auto-Detect phase&lt;/h4>
&lt;p>Software on each host monitors the other through redundant detection lines, using monitoring logic to judge the peer&amp;rsquo;s state. The items checked include host hardware (CPU and peripherals), host network, host operating system, database engine and other applications, and the connection between the host and the disk array. To ensure correct detection and prevent false judgments, a safety detection window can be configured, including the detection interval and the number of checks, to tune the safety factor; the host&amp;rsquo;s redundant communication links record the collected information for maintenance reference.&lt;/p>
&lt;h4 id="auto-switch-phase">Auto-Switch phase&lt;/h4>
&lt;p>Once a host confirms that its peer has failed, the healthy host not only continues its original tasks but also takes over the preset backup procedures according to the configured fault-tolerant backup mode, carrying out the subsequent procedures and services.&lt;/p>
&lt;h4 id="auto-recovery-phase">Auto-Recovery phase&lt;/h4>
&lt;p>After the healthy host has taken over for the faulty one, the faulty host can be repaired offline. Once repaired, it reconnects with the healthy host over the redundant communication line and the service automatically switches back to it. The whole recovery process is completed automatically by EDI-HA, and the recovery action can be configured in advance as semi-automatic or disabled.&lt;/p>
&lt;h3 id="three-working-modes-of-ha">Three working modes of HA:&lt;/h3>
&lt;ol>
&lt;li>Master-slave mode (asymmetric mode)
Working principle: the primary machine works while the standby machine monitors in readiness. When the primary goes down, the standby takes over all of its work; after the primary recovers, service is switched back automatically or manually according to user settings. Data consistency is maintained through a shared storage system.&lt;/li>
&lt;li>Dual-machine duplex mode (mutual standby)
Working principle: two hosts run their own services simultaneously and monitor each other. When either host goes down, the other immediately takes over all of its work to keep services running. The key data of the application services resides on a shared storage system.&lt;/li>
&lt;li>Cluster working mode (multi-server mutual backup)
Working principle: multiple hosts work together, each running one or several services, and each service has one or more standby hosts defined. When a host fails, the services running on it are taken over by other hosts.&lt;/li>
&lt;/ol>
&lt;h3 id="measures-of-ha">Measures of HA&lt;/h3>
&lt;p>Calculation formula:
% availability = (Total Elapsed Time - Sum of Inoperative Times) / Total Elapsed Time
where Total Elapsed Time = operating time + downtime.
Availability is related to the failure rate of system components. One indicator of equipment failure rate is MTBF (mean time between failures); this metric is usually applied to components of the system, such as disks.
MTBF = Total Operating Time / Total Number of Failures
where Operating Time is the time the system is in use (excluding downtime).&lt;/p>
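&lt;p>A worked example of these formulas, with invented figures: a system observed for 1000 hours, with 2 failures causing 5 hours of downtime in total:&lt;/p>

```python
total_elapsed = 1000.0  # hours (operating time plus downtime)
downtime = 5.0          # sum of inoperative times, hours
failures = 2

availability = (total_elapsed - downtime) / total_elapsed * 100
mtbf = (total_elapsed - downtime) / failures  # operating time / number of failures

print(availability)  # 99.5 (percent)
print(mtbf)          # 497.5 (hours between failures)
```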
&lt;h3 id="design-of-ha-system">Design of HA system&lt;/h3>
&lt;p>To design for availability, the most important thing is to meet user needs: a failure affects the availability index only when the service outage is enough to affect the system&amp;rsquo;s users. User sensitivity depends on the application the system provides. For example, a failure repaired within 1 second may go unnoticed in some online transaction processing systems, but would be unacceptable in a real-time scientific computing application.
The high availability design therefore depends on your application. If a planned downtime of several hours is acceptable, the storage system need not support hot-swappable disks; otherwise, you might want a disk system that supports hot swapping and mirroring.
A high availability design therefore needs to:
Determine the tolerable duration of business interruption. From the HA index calculated by the formula, the allowable interruption time over a period can be derived; note, however, that many short interruptions may be tolerable while a few long ones are not.
Statistics show that not all unplanned downtime is caused by hardware: hardware accounts for only 40%, software 30%, human error 20%, and environment 10%. Your high availability design should take all of these factors into account as far as possible.&lt;/p>
&lt;h3 id="factors-leading-to-planned-downtime">Factors leading to planned downtime&lt;/h3>
&lt;ul>
&lt;li>Periodic backup&lt;/li>
&lt;li>Software upgrade&lt;/li>
&lt;li>Hardware expansion or maintenance&lt;/li>
&lt;li>System configuration changes&lt;/li>
&lt;li>Data change&lt;/li>
&lt;/ul>
&lt;h3 id="factors-leading-to-unplanned-downtime">Factors leading to unplanned downtime&lt;/h3>
&lt;ul>
&lt;li>Hardware failure&lt;/li>
&lt;li>File system full error&lt;/li>
&lt;li>Memory overflow&lt;/li>
&lt;li>Backup failed&lt;/li>
&lt;li>Disk full&lt;/li>
&lt;li>Power supply failure&lt;/li>
&lt;li>Network failure&lt;/li>
&lt;li>Application failed&lt;/li>
&lt;li>Natural disaster&lt;/li>
&lt;li>Operation or management error&lt;/li>
&lt;/ul>
&lt;p>Through targeted design, losses caused by all or some of the above factors can be avoided. Of course, a 100% highly available system does not exist.&lt;/p>
&lt;h3 id="create-a-highly-available-computer-system">Create a highly available computer system&lt;/h3>
&lt;p>A common and effective industry practice for building a highly available computer system on UNIX is to use a Cluster system: a group of host systems organically combined through a network or other means to provide services to the outside world. The cluster software realizes high availability by combining redundant hardware and software components to eliminate single points of failure:&lt;/p>
&lt;ul>
&lt;li>Eliminate single point failure of power supply&lt;/li>
&lt;li>Eliminate single point of failure of disk&lt;/li>
&lt;li>Eliminate single point of failure of SPU (system Process Unit)&lt;/li>
&lt;li>Eliminate network single point of failure&lt;/li>
&lt;li>Eliminate software single point of failure&lt;/li>
&lt;li>Try to eliminate single point of failure during single system operation&lt;/li>
&lt;/ul></description></item><item><title>Mobile Communication - Wireless Channel Fading</title><link>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-%E6%97%A0%E7%BA%BF%E4%BF%A1%E9%81%93%E8%A1%B0%E8%90%BD/</link><pubDate>Tue, 26 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-%E6%97%A0%E7%BA%BF%E4%BF%A1%E9%81%93%E8%A1%B0%E8%90%BD/</guid><description>&lt;h2 id="radio-wave-propagation-effect">Radio wave propagation effect&lt;/h2>
&lt;p>A wireless communication channel is a time-varying channel: a radio signal passing through it suffers several kinds of fading. The total power of the received signal combines path loss, shadow fading and the multipath effect.
(Multipath propagation: when radio waves meet obstacles, reflection, diffraction and scattering occur and interfere with the direct wave, i.e. multiple paths exist between transmitter and receiver.)&lt;/p>
&lt;ol>
&lt;li>Path loss: the signal strength varies with distance over a large range (hundreds or thousands of wavelengths); in free space the received power falls roughly with the square of the distance. In essence it is a spreading of the wave energy.&lt;/li>
&lt;li>Shadowing: the median value of the signal level in the medium range changes slowly (hundreds of wavelengths). Due to the slow fading caused by the topographic relief in the propagation environment and the shielding of buildings and other obstacles, the median value of the signal changes slowly. The fading depth is related to the frequency and obstacles.&lt;/li>
&lt;li>Multipath effect (fading): the instantaneous value of the signal in a small range changes rapidly (tens of wavelengths). Due to the fast fading caused by multipath propagation, the instantaneous value of the field strength of the received signal changes rapidly.&lt;/li>
&lt;/ol>
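&lt;p>The square-law behaviour of path loss can be checked against the standard free-space path loss formula, FSPL(dB) = 20*log10(d) + 20*log10(f) + 32.44, with d in km and f in MHz (the formula is standard, though not given in the text above):&lt;/p>

```python
import math

def fspl_db(d_km, f_mhz):
    """Free-space path loss in dB for distance d_km and frequency f_mhz."""
    return 20 * math.log10(d_km) + 20 * math.log10(f_mhz) + 32.44

# Doubling the distance adds about 6 dB of loss, i.e. received power
# falls with the square of the distance.
delta = fspl_db(2.0, 900) - fspl_db(1.0, 900)
print(round(delta, 2))  # 6.02
```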
&lt;h2 id="wireless-channel-fading">Wireless channel fading&lt;/h2>
&lt;p>According to the propagation effect of radio wave, the fading of radio channel is generally divided into two categories: large-scale fading and small-scale fading (the small scale is generally the same order of magnitude as the signal wavelength). The scale refers to the size of time or distance.&lt;/p>
&lt;h3 id="large-scale-fading">Large-scale fading&lt;/h3>
&lt;p>(including transmission loss and shadow fading; large-scale fading is slow fading, but slow fading is not necessarily large-scale fading.)&lt;/p>
&lt;p>Transmission loss (path loss): when a radio signal is transmitted through a large-scale distance channel, with the increase of the transmission path, the radio wave energy diffuses, resulting in the average power attenuation of the received signal. The attenuation is related to the transmission distance. The greater the distance, the more the attenuation.&lt;/p>
&lt;p>Shadow fading: when the radio signal is transmitted in the medium-scale distance channel, the shadow area is formed behind the obstacles due to the undulation of the terrain or the blocking of tall buildings, resulting in the random change of the average power of the received signal. Its fading characteristics obey lognormal distribution.&lt;/p>
&lt;h3 id="small-scale-fading">Small-scale fading&lt;/h3>
&lt;p>(caused by multipath effect or Doppler effect. When the transmission channel changes in small scale (distance or time), the radio signal is reflected, diffracted and scattered by surrounding obstacles during transmission, and its amplitude or phase changes rapidly.)&lt;/p>
&lt;p>According to the delay spread caused by the multipath effect, small-scale fading is divided into frequency-selective fading (the bandwidth over which the channel has constant gain and linear phase, i.e. the coherence bandwidth, is smaller than the bandwidth of the transmitted signal) and frequency-nonselective / flat fading (the coherence bandwidth of the channel is larger than the bandwidth of the transmitted signal, with constant gain and linear phase within that range);&lt;/p>
&lt;p>According to the Doppler (frequency domain) spread generated by Doppler effect, small-scale fading is divided into fast fading (the coherence time of the channel is shorter than the period of the transmitted signal, and the bandwidth of the baseband signal is smaller than the Doppler spread) and slow fading (the coherence time of the channel is much longer than the period of the transmitted signal, and the bandwidth of the baseband signal is much larger than the Doppler spread).&lt;/p></description></item><item><title>Communication &amp; Computer Network - "Propagation delay" and "Transmission delay"</title><link>https://www.liyitao.cn/post/%E9%80%9A%E4%BF%A1%E4%B8%8E%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80-%E7%BD%91%E7%BB%9C%E4%BC%A0%E6%92%AD%E6%97%B6%E5%BB%B6propagation-delay%E4%B8%8E%E4%BC%A0%E8%BE%93%E6%97%B6%E5%BB%B6transmission-delay/</link><pubDate>Thu, 21 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E9%80%9A%E4%BF%A1%E4%B8%8E%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80-%E7%BD%91%E7%BB%9C%E4%BC%A0%E6%92%AD%E6%97%B6%E5%BB%B6propagation-delay%E4%B8%8E%E4%BC%A0%E8%BE%93%E6%97%B6%E5%BB%B6transmission-delay/</guid><description>&lt;h2 id="propagation-delay">Propagation delay&lt;/h2>
&lt;p>The time it takes for data (more precisely, the electrical or optical signal carrying it) to propagate from one end of the network, through the medium, to the other end.&lt;/p>
&lt;h3 id="determinant">Determinant&lt;/h3>
&lt;p>This mainly depends on the propagation speed of the signal in the medium and the length of the medium between the two ends.&lt;/p>
&lt;h2 id="transmission-delay">Transmission delay&lt;/h2>
&lt;p>It may be clearer to call the transmission delay the &amp;ldquo;sending delay&amp;rdquo;. &lt;em>Computer Network&lt;/em> (seventh edition, edited by Xie Xiren) renders it as transmission delay, which is easily ambiguous in Chinese; in the text below it should be understood as the sending delay.&lt;/p>
&lt;h3 id="determinant-1">Determinant&lt;/h3>
&lt;p>It refers to the time required from the start of sending the data to the completion of sending it, which mainly depends on the transmission rate of the channel. Note that &amp;ldquo;rate&amp;rdquo; here is unambiguous, whereas &amp;ldquo;speed&amp;rdquo; could be confused with the propagation speed.&lt;/p>
&lt;h2 id="an-analogy-used-to-easily-understand-the-difference-between-the-two">An analogy used to easily understand the difference between the two&lt;/h2>
&lt;p>Transmission delay is also called sending delay.
A bus makes a good analogy: before the bus leaves the platform, the time the passengers spend boarding, from the first passenger getting on to the last, is the transmission delay; the time the bus takes to carry the passengers from one station to the next is the propagation delay.
In a network, the process of placing data onto the medium corresponds to the passengers boarding; the time the data then takes to travel across the medium as an electromagnetic or optical signal is the propagation delay.&lt;/p>
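&lt;p>A minimal numerical sketch of the two delays (the link parameters are made up; about 2e8 m/s is a typical signal speed in fiber or copper):&lt;/p>

```python
def transmission_delay_s(packet_bits, rate_bps):
    # Time to push every bit of the packet onto the medium.
    return packet_bits / rate_bps

def propagation_delay_s(length_m, speed_m_per_s=2e8):
    # Time for the signal to travel the length of the medium.
    return length_m / speed_m_per_s

# A 1500-byte packet on a 100 km link running at 100 Mbit/s:
print(transmission_delay_s(1500 * 8, 100e6))  # 0.00012 s
print(propagation_delay_s(100e3))             # 0.0005 s
```

For long links at modest rates, propagation delay dominates; for short links at low rates, transmission delay does.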
&lt;h2 id="other-delays">Other delays&lt;/h2>
&lt;h3 id="queuing-delay">Queuing delay&lt;/h3>
&lt;p>When a packet travels through a network, it passes through many routers. After entering a router, the packet must wait in the input queue to be processed; after the router determines the forwarding interface, the packet must wait in the output queue to be forwarded.&lt;/p>
&lt;h3 id="nodal-processing-delay">Nodal processing delay&lt;/h3>
&lt;p>The time a node takes to store and forward a message (e.g., parsing the message, looking up the routing table).&lt;/p></description></item><item><title>Mobile Communication - TCP Three-way Handshake and Four-Way Wave</title><link>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-tcp%E4%B8%89%E6%AC%A1%E6%8F%A1%E6%89%8B%E5%92%8C%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B/</link><pubDate>Sun, 17 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E7%A7%BB%E5%8A%A8%E9%80%9A%E4%BF%A1%E5%9F%BA%E7%A1%80-tcp%E4%B8%89%E6%AC%A1%E6%8F%A1%E6%89%8B%E5%92%8C%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B/</guid><description>&lt;h2 id="three-way-handshake">Three-way Handshake&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>First handshake: the client sends a SYN packet (syn=x) to the server and enters SYN_SEND status, waiting for server confirmation;&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Second handshake: when the server receives the SYN packet, it must acknowledge the client&amp;rsquo;s SYN (ack=x+1) and send its own SYN packet (syn=y), i.e., a SYN+ACK packet. The server then enters the SYN_RECV state;&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Third handshake: after receiving the SYN+ACK packet from the server, the client sends an acknowledgement packet ACK (ack=y+1) to the server. Once this packet is sent, the client and the server enter the ESTABLISHED state, completing the three-way handshake.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The packets transmitted during the handshake carry no data. Only after the three handshakes do the client and the server formally begin to transmit data. Ideally, once a TCP connection is established, it is maintained until one of the communicating parties actively closes it.&lt;/p>
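&lt;p>The exchange above can be sketched as a sequence of (flags, seq, ack) records; this is an illustrative toy, not a real TCP implementation, and the initial sequence numbers are arbitrary:&lt;/p>

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged, given the client's initial
    sequence number x and the server's initial sequence number y."""
    return [
        {"dir": "client->server", "flags": "SYN",     "seq": x},
        {"dir": "server->client", "flags": "SYN+ACK", "seq": y,     "ack": x + 1},
        {"dir": "client->server", "flags": "ACK",     "seq": x + 1, "ack": y + 1},
    ]

for seg in three_way_handshake(x=100, y=300):
    print(seg)
```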
&lt;h2 id="four-way-wavehand">Four-Way Wavehand&lt;/h2>
&lt;p>Similar to the &amp;ldquo;three-way handshake&amp;rdquo; used to establish a connection, closing a TCP connection requires a &amp;ldquo;four-way wave&amp;rdquo;.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>First wave: the active closing party sends a FIN to close data transmission from itself to the passive closing party; that is, it tells the passive closing party that it will send no more data (for data sent before the FIN packet, the active closing party will still retransmit if the corresponding ACK is not received). However, the active closing party can still receive data at this point.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Second wave: after receiving the FIN packet, the passive closing party sends an ACK to the other party, with an acknowledgement number equal to the received sequence number + 1 (like a SYN, a FIN occupies one sequence number).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Third wave: the passive closing party sends a FIN to close data transmission from itself to the active closing party; that is, it tells the active closing party that its data has all been sent and no more will follow.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Fourth wave: after receiving the FIN, the active closing party sends an ACK to the passive closing party, with an acknowledgement number equal to the received sequence number + 1. This completes the four-way wave.&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Communication &amp; Computer Network - "Single-hop" and "Multi-hop"</title><link>https://www.liyitao.cn/post/%E9%80%9A%E4%BF%A1%E4%B8%8E%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80-%E5%8D%95%E8%B7%B3%E4%B8%8E%E5%A4%9A%E8%B7%B3/</link><pubDate>Thu, 07 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E9%80%9A%E4%BF%A1%E4%B8%8E%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C%E5%9F%BA%E7%A1%80-%E5%8D%95%E8%B7%B3%E4%B8%8E%E5%A4%9A%E8%B7%B3/</guid><description>&lt;h2 id="definition-of-single-hop">Definition of single-hop&lt;/h2>
&lt;p>In a traditional wireless LAN, each client accesses the network through a wireless link to an AP (access point). If users want to communicate with each other, they must first access a fixed access point. Such a network is called a &lt;strong>single-hop&lt;/strong> network.&lt;/p>
&lt;h2 id="definition-of-multi-hop">Definition of multi-hop&lt;/h2>
&lt;p>In a wireless multi-hop network, any wireless device can act as both an AP and a router. Every node in the network can send and receive signals, and each node can communicate directly with one or more peer nodes. Such a network is called a &lt;strong>multi-hop&lt;/strong> network.
It can also be understood as follows: the transmission of information is completed through forwarding by multiple nodes along the path, each of which can communicate directly with one or more peers. Multi-hop means multiple forwardings.&lt;/p>
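&lt;p>A minimal sketch of multi-hop forwarding (the four-node topology is hypothetical, and breadth-first search stands in for a real routing protocol):&lt;/p>

```python
from collections import deque

def multi_hop_path(network, src, dst):
    """Breadth-first search for a node sequence from src to dst; every
    intermediate node on the returned path acts as a forwarder."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbor in network[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # dst unreachable

# A and D are out of each other's radio range; B and C forward the packet.
net = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(multi_hop_path(net, "A", "D"))  # ['A', 'B', 'C', 'D']
```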
&lt;h2 id="concrete-explanation-and-application-of-multi-hop-intermediate-node">Concrete explanation and application of multi-hop intermediate node&lt;/h2>
&lt;p>In a wireless multi-hop network, the typical path from the source node to the destination node consists of multiple hops, and the intermediate nodes on the path act as forwarding nodes. Therefore, a node in a wireless multi-hop network has two functions:&lt;/p>
&lt;ol>
&lt;li>Act as an end node to generate or receive data packets;&lt;/li>
&lt;li>Act as a router, forwarding data packets from other nodes.
The main applications are wireless ad hoc networks, wireless sensor networks (WSN) and wireless mesh networks.&lt;/li>
&lt;/ol></description></item><item><title>Control_Typical_Block</title><link>https://www.liyitao.cn/post/%E5%85%B8%E5%9E%8B%E7%9A%84%E7%8E%AF%E8%8A%82%E4%B8%8E%E7%B3%BB%E7%BB%9F%E6%96%B9%E6%A1%86%E5%9B%BEcontrol_typical_block/</link><pubDate>Mon, 21 Mar 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E5%85%B8%E5%9E%8B%E7%9A%84%E7%8E%AF%E8%8A%82%E4%B8%8E%E7%B3%BB%E7%BB%9F%E6%96%B9%E6%A1%86%E5%9B%BEcontrol_typical_block/</guid><description>&lt;h2 id="1-typical-components">1 Typical Components&lt;/h2>
&lt;p>A so-called typical component is a class of physical elements that share the same form of transfer function. These transfer functions are the most classical and typical ones in control systems.&lt;/p>
&lt;h3 id="11-proportional-component">1.1 Proportional Component&lt;/h3>
&lt;p>Transfer function
$$
G(s) = K
$$&lt;/p>
&lt;p>This component simply multiplies the input by a constant $K$ and outputs the result, hence the name proportional component. A typical example is a potentiometer, whose output is proportional to the position of its wiper.&lt;/p>
&lt;h3 id="12-differential-component">1.2 Differential Component&lt;/h3>
&lt;p>Transfer function
$$
G(s) = s
$$&lt;/p>
&lt;p>This is just the differentiation theorem of the Laplace transform: multiplying the input by $s$ corresponds to differentiation in the time domain.&lt;/p>
&lt;h3 id="13-integral-component">1.3 Integral Component&lt;/h3>
&lt;p>Transfer function
$$
G(s) = \frac{1}{s}
$$&lt;/p>
&lt;p>This follows from the integration theorem of the Laplace transform: dividing the input by $s$ corresponds to integration in the time domain.&lt;/p>
&lt;h3 id="14-inertia-component">1.4 Inertia Component&lt;/h3>
&lt;p>Transfer function
$$
G(s) = \frac{1}{Ts+1}
$$&lt;/p>
&lt;p>This function describes a response that dies away gradually. Writing out its time-domain impulse response:
$$
h(t) = \frac{1}{T}e^{-\frac{t}{T}}
$$&lt;/p>
&lt;p>As time increases, the function value decays exponentially, lagging behind the input as if held back by inertia before finally dying out. Here $T$ is the &lt;strong>time constant&lt;/strong>, which determines how fast the system decays.&lt;/p>
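&lt;p>A quick numerical check of this decay (the value of $T$ is arbitrary): after one time constant, the impulse response has fallen to $e^{-1}\approx 36.8\%$ of its initial value, regardless of $T$:&lt;/p>

```python
import math

def inertia_impulse_response(t, T):
    # h(t) = (1/T) * exp(-t/T), the impulse response of G(s) = 1/(T*s + 1)
    return math.exp(-t / T) / T

T = 2.0  # arbitrary time constant
ratio = inertia_impulse_response(T, T) / inertia_impulse_response(0.0, T)
print(round(ratio, 3))  # 0.368
```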
&lt;h3 id="15-sscillation-component">1.5 Sscillation Component&lt;/h3>
&lt;p>Transfer function
$$
G(s) = \frac{1}{T^2s^2+2\xi Ts+1}
$$&lt;/p>
&lt;p>The function describes an oscillatory behavior, which will be analyzed in detail later in the discussion of second-order system responses.&lt;/p>
&lt;h3 id="16-first-order-composite-differential-component">1.6 First-order Composite Differential Component&lt;/h3>
&lt;p>Transfer function
$$
\tau s+1
$$&lt;/p>
&lt;h3 id="17-second-order-composite-differential-component">1.7 Second-order Composite Differential Component&lt;/h3>
&lt;p>Transfer function
$$
\tau^2s^2+2\xi \tau s+1
$$&lt;/p>
&lt;h3 id="18-summary">1.8 Summary&lt;/h3>
&lt;p>Any transfer function can be regarded as a combination of typical components, which can form a variety of systems.&lt;/p>
&lt;h2 id="2-load-effect-problem">2 Load Effect Problem&lt;/h2>
&lt;p>Now consider two circuits connected together: the first stage has input $u_r$ and output $u_a$; the second stage has input $u_a$ and output $u_c$. However, if the two circuits are simply coupled, the second stage will in fact affect the first, which is called the load effect. When a load is connected to the system, the system may no longer have a simple input-output relationship.&lt;/p>
&lt;h2 id="3-system-block-diagram">3 System Block Diagram&lt;/h2>
&lt;p>One way to describe the control system is to use the block diagram, in which the flow direction of the signal is represented by the directed line segment, the system module is represented by the box, and the adder is represented by the circle. The block diagram indicates the flow of signals and the processing of signals by various parts.&lt;/p>
&lt;p>With the block diagram of the system, the next step is to solve the transfer function of the whole system.&lt;/p>
&lt;h2 id="4-transfer-function-solution">4 Transfer Function Solution&lt;/h2>
&lt;h3 id="41-digestion-coefficient-method">4.1 Digestion Coefficient Method&lt;/h3>
&lt;p>Write down all the signals in the block diagram, write the input-output relationship of each module according to how it processes its signals, and then eliminate all intermediate variables to obtain the relationship between the final output and the input. This method is the simplest, but it becomes unwieldy when the system is very complex.&lt;/p>
&lt;h3 id="42-equivalent-transformation-method">4.2 Equivalent Transformation Method&lt;/h3>
&lt;p>There are many equivalent transformation rules for block diagrams; applying them flexibly yields the transfer function of the system. This process is also called simplification.&lt;/p>
&lt;h4 id="421-feedback-equivalence">4.2.1 Feedback equivalence&lt;/h4>
&lt;p>The forward module of the system is $G(s)$, the feedback module is $H(s)$, and the system is equivalent to&lt;/p>
&lt;p>$$
\frac{G(s)}{1+G(s)H(s)}
$$&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209e4f5baa1a80abc346de.jpg">&lt;/img>
&lt;div align=left>
&lt;h4 id="422-series-equivalence">4.2.2 Series equivalence&lt;/h4>
&lt;p>Systems $G_1(s)$ and $G_2(s)$ in series are equivalent to&lt;/p>
&lt;p>$$
G_1(s)\cdot G_2(s)
$$&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209dfd5baa1a80abc3158d.jpg">&lt;/img>
&lt;div align=left>
&lt;h4 id="423-parallel-equivalence">4.2.3 Parallel equivalence&lt;/h4>
&lt;p>Systems $G_1(s)$ and $G_2(s)$ in parallel are equivalent to&lt;/p>
&lt;p>$$
G_1(s)+G_2(s)
$$&lt;/p>
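&lt;p>The series, parallel and feedback equivalences can be checked numerically by representing a transfer function as a (numerator, denominator) pair of polynomial coefficients; this sketch uses NumPy&amp;rsquo;s polynomial helpers, and the example systems are chosen arbitrarily:&lt;/p>

```python
import numpy as np

# A transfer function is a (num, den) pair of polynomial coefficient
# lists in descending powers of s.
def series(g1, g2):
    # G1*G2: multiply numerators and denominators.
    return (np.polymul(g1[0], g2[0]), np.polymul(g1[1], g2[1]))

def parallel(g1, g2):
    # G1+G2 = (n1*d2 + n2*d1) / (d1*d2)
    num = np.polyadd(np.polymul(g1[0], g2[1]), np.polymul(g2[0], g1[1]))
    return (num, np.polymul(g1[1], g2[1]))

def feedback(g, h):
    # G/(1+GH) = (nG*dH) / (dG*dH + nG*nH)
    num = np.polymul(g[0], h[1])
    den = np.polyadd(np.polymul(g[1], h[1]), np.polymul(g[0], h[0]))
    return (num, den)

# Unity feedback around an integrator 1/s yields the inertia component 1/(s+1):
integrator = ([1.0], [1.0, 0.0])   # G(s) = 1/s
unity = ([1.0], [1.0])             # H(s) = 1
num, den = feedback(integrator, unity)
print(num, den)  # [1.] [1. 1.]
```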
&lt;h4 id="434-comparison-point-forward">4.3.4 Comparison point forward&lt;/h4>
&lt;blockquote>
&lt;p>The adder is also called a comparator because it effectively subtracts its two input signals (signal 1 plus the negative of signal 2).&lt;/p>
&lt;/blockquote>
&lt;p>The comparison point is originally after $G(s)$; moving it forward means the incoming signal would now pass through $G(s)$ when originally it did not, so it must first be multiplied by the inverse system $\frac{1}{G(s)}$.&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209e425baa1a80abc33dc6.jpg">&lt;/img>
&lt;div align=left>
&lt;h4 id="435-comparison-point-backward">4.3.5 Comparison point backward&lt;/h4>
&lt;p>Conversely, the comparison point is originally in front of $G(s)$, meaning the incoming signal passes through $G(s)$; after moving it backward the signal no longer does, so it must be multiplied by $G(s)$.&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209e125baa1a80abc32422.jpg">&lt;/img>
&lt;div align=left>
&lt;h4 id="436-lead-out-point-forward">4.3.6 Lead out point forward&lt;/h4>
&lt;p>If the lead-out point is behind the system, the extracted signal has already passed through the system. Moving the point to the front means the signal is led out without passing through the system, so it must be multiplied by $G(s)$.&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209e245baa1a80abc32da1.jpg">&lt;/img>
&lt;div align=left>
&lt;h4 id="437-lead-out-point-backward">4.3.7 Lead out point backward&lt;/h4>
&lt;p>If the lead-out point is in front of the system, the extracted signal has not passed through the system. After moving it behind, the signal is led out only after passing through the system, so it must be multiplied by the inverse system $\frac{1}{G(s)}$.&lt;/p>
&lt;div align=center>
&lt;img src="https://pic.imgdb.cn/item/62209e375baa1a80abc338b6.jpg">&lt;/img>
&lt;div align=left></description></item><item><title>Machine Learning - Perceptron</title><link>https://www.liyitao.cn/post/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-%E6%84%9F%E7%9F%A5%E6%9C%BAperceptron/</link><pubDate>Mon, 07 Mar 2022 00:00:00 +0000</pubDate><guid>https://www.liyitao.cn/post/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-%E6%84%9F%E7%9F%A5%E6%9C%BAperceptron/</guid><description>&lt;h2 id="1-introduction">1 Introduction&lt;/h2>
&lt;p>The perceptron is a simple model for binary classification. Its underlying idea is to use a hyperplane to divide the data into positive and negative classes, outputting $+1$ or $-1$. The perceptron can be optimized by gradient descent, but the algorithm converges only when the data are linearly separable. The perceptron was proposed by Rosenblatt in 1957.&lt;/p>
&lt;h2 id="2-model-construction">2 Model construction&lt;/h2>
&lt;p>Following the idea above, the perceptron maps data from a feature space $X \subseteq R^n$ to the output space $Y=\{+1,-1\}$. Let $x\in X$ denote the feature vector of a training sample and $y\in Y$ its output value. As mentioned, the perceptron separates the positive class from the negative class with a hyperplane, whose equation is&lt;/p>
&lt;p>$$
w\cdot x +b=0
$$&lt;/p>
&lt;p>where $w$ is called the weight vector, because the dot product of $w$ and $x$ is a weighted sum of the input features. Next, the result of the hyperplane computation must be mapped to $+1$ and $-1$, for which the perceptron uses the $sgn$ function. The mathematical expression of the perceptron model is therefore&lt;/p>
&lt;p>$$
f(x) = sgn(w\cdot x+b)
$$&lt;/p>
&lt;p>Its parameters are the weight vector $w$ and the constant $b$. Corresponding to the hyperplane, $w$ is the normal vector of the hyperplane and $b$ is the intercept of the hyperplane.&lt;/p>
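&lt;p>As a sketch, the model can be written directly (the hyperplane $w=(1,1)$, $b=-3$ is a made-up example; the boundary case $w\cdot x+b=0$ is mapped to $+1$ by convention):&lt;/p>

```python
import numpy as np

def perceptron_predict(x, w, b):
    # f(x) = sgn(w . x + b)
    if np.dot(w, x) + b >= 0:
        return 1
    return -1

# Hypothetical hyperplane x1 + x2 - 3 = 0, i.e. w = (1, 1), b = -3:
w, b = np.array([1.0, 1.0]), -3.0
print(perceptron_predict(np.array([3.0, 3.0]), w, b))  # 1
print(perceptron_predict(np.array([1.0, 1.0]), w, b))  # -1
```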
&lt;h2 id="3-learning-strategies-of-perceptron">3 Learning strategies of perceptron&lt;/h2>
&lt;p>To optimize the parameters of the perceptron, we need a loss function. A natural choice is the number of misclassifications, but that count is discrete and the resulting function is not differentiable, so gradient descent cannot be applied. Instead, consider the total distance from the misclassified points to the hyperplane $S$. The distance from a point $x_0$ to the hyperplane is&lt;/p>
&lt;p>$$
\frac{1}{||w||}|w\cdot x_0+b|
$$&lt;/p>
&lt;p>where $||w||$ is the $L_2$ norm of $w$.&lt;/p>
&lt;p>Consider the two ways the perceptron can misclassify: judging a positive sample negative, or judging a negative sample positive. In the first case $y_i=+1$ but $w\cdot x_i+b &amp;lt;0$, so $-y_i(w\cdot x_i+b)&amp;gt;0$; in the second case $y_i=-1$ but $w\cdot x_i+b &amp;gt;0$, so again $-y_i(w\cdot x_i+b)&amp;gt;0$. Hence, whenever the perceptron misclassifies a sample, we always have&lt;/p>
&lt;p>$$
-y_i(w\cdot x_i+b)&amp;gt;0
$$&lt;/p>
&lt;p>Therefore, for a misclassified point, the absolute value in the distance formula can be removed, giving
$$
-\frac{1}{||w||}y_i(w\cdot x_i+b)
$$
Summing over all misclassified points $x_i\in M$ gives the total distance
$$
-\frac{1}{||w||}\sum_{x_i\in M}y_i(w\cdot x_i+b)
$$&lt;/p>
&lt;p>Ignoring the factor $\frac{1}{||w||}$, the loss function of the perceptron is
$$
L(w,b) = -\sum_{x_i\in M}y_i(w\cdot x_i+b)
$$&lt;/p>
&lt;p>Intuitively, the fewer the misclassified points and the closer they are to the hyperplane, the smaller this loss function becomes.&lt;/p>
&lt;h2 id="4-optimization">4 Optimization&lt;/h2>
&lt;p>The optimization problem of perceptron is to solve the parameters $w$ and $b$ of the minimum loss function.&lt;/p>
&lt;p>$$
\min_{w,b}L(w,b) = -\sum_{x_i\in M}y_i(w\cdot x_i+b)
$$&lt;/p>
&lt;p>Then stochastic gradient descent is used for optimization: starting from random initial values $w_0$ and $b_0$, compute the gradients.&lt;/p>
&lt;p>For normal vector $w$
$$
\nabla_w L(w,b) = -\sum_{x_i\in M}y_ix_i
$$&lt;/p>
&lt;p>For intercept $b$
$$
\nabla_b L(w,b) = -\sum_{x_i\in M}y_i
$$&lt;/p>
&lt;p>Therefore, when a misclassified sample is randomly selected, the update rule is
$$
\begin{cases}
w^* = w-(-\eta y_ix_i)=w+\eta y_ix_i \\
b^* = b-(-\eta y_i) = b+\eta y_i
\end{cases}
$$&lt;/p>
&lt;p>Where $\eta$ is the learning rate.&lt;/p>
&lt;p>Therefore, the algorithm suitable for programming implementation should be&lt;/p>
&lt;ol>
&lt;li>Randomly initialize $w_0$ and $b_0$&lt;/li>
&lt;li>Select a sample $(x_i, y_i)$ from the training set&lt;/li>
&lt;li>If $y_i(w\cdot x_i+b)\leq 0$, update the parameters using the update rule above&lt;/li>
&lt;li>Return to step 2 until no sample is misclassified.&lt;/li>
&lt;/ol>
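&lt;p>The steps above can be sketched as follows (a toy linearly separable dataset is assumed, and the loop cycles through the data instead of sampling randomly):&lt;/p>

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Perceptron learning by stochastic gradient descent on data X
    (one sample per row) with labels y in {+1, -1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) > 0:
                continue                      # correctly classified
            w = w + eta * y_i * x_i           # gradient step for w
            b = b + eta * y_i                 # gradient step for b
            mistakes += 1
        if mistakes == 0:                     # no misclassified data: stop
            break
    return w, b

# Toy linearly separable data: two positive points, one negative point.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = train_perceptron(X, y)
print(w, b)  # [1. 1.] -3.0
```

After training, every sample satisfies $y_i(w\cdot x_i+b)&amp;gt;0$, which is exactly the stopping condition of the algorithm.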
&lt;p>So far, the construction of perceptron, learning strategy and optimization algorithm have been explained.&lt;/p></description></item></channel></rss>