The way to advance big data -- Spark SQL log analysis

Basic scheme. User behavior log: all behavior data (access, browse, search, click...) generated each time a user visits the website. User behavior track and traffic log. Log data content: 1) System properties of the access: operating system, browser, etc. 2) Access characteristics: the URL clicked, which ur...

Posted by jtron85 on Thu, 07 Oct 2021 07:00:11 +0530

RDD serialization && dependencies && persistence && partitioner

RDD serialization. Closure detection: as we already know, in Spark the code outside an operator runs on the Driver, while the code inside an operator runs on the Executor. Therefore, the data in the Driver has to be transmitted to the Executor through networ...

Posted by Mod-Jay on Thu, 07 Oct 2021 09:25:58 +0530
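
The serialization requirement in the excerpt above can be illustrated with a minimal sketch. It uses Python's standard `pickle` module as a stand-in for the check Spark performs on an operator's closure; it is not Spark's own mechanism. The helper `can_ship` and the two sample records are hypothetical names introduced only for this illustration.

```python
import pickle
import threading


def can_ship(obj):
    """Return True if obj can be serialized, which is what a driver-side
    value needs before it can be sent over the network to an executor.
    (Hypothetical helper; pickle stands in for Spark's serializer.)"""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# A record made of plain values serializes without trouble.
plain_record = {"name": "alice", "age": 30}

# A record holding a thread lock does not: locks cannot be pickled.
record_with_lock = {"name": "alice", "lock": threading.Lock()}

print(can_ship(plain_record))      # True
print(can_ship(record_with_lock))  # False
```

The point of checking up front is the same as in the excerpt: a value that cannot be serialized is better rejected on the Driver than discovered as a failure once the job is already running.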

Construction of big data platform

1. Preparation before installation: 1) View firewall status 2) Turn off firewall. [root@slave2 ~]# systemctl status firewalld.service ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: acti...

Posted by moe180 on Thu, 07 Oct 2021 22:35:43 +0530

Spark series tutorial: introduction to operation modes

Spark operation mode. Apache Spark is a unified analysis engine for large-scale data processing. It provides high-level APIs for Java, Scala, Python and R, as well as an optimized engine that supports general execution graphs. Spark Core is the core module of Spark, whic...

Posted by eheia on Fri, 08 Oct 2021 07:17:26 +0530

Spark advanced: Kafka installation

Kafka plays a very important role in the Spark ecosystem. Kafka is a high-throughput, low-latency distributed publish-subscribe messaging system based on ZooKeeper and written in Scala. It can process large amounts of message data in real time to meet various needs. In actual development...

Posted by talor123 on Sat, 09 Oct 2021 00:06:55 +0530

Spark advanced: using MLlib for collaborative filtering and movie recommendation

1. MLlib introduction. MLlib is the implementation of commonly used machine learning algorithms and libraries on the Spark platform. MLlib is the underlying component of MLBase, a machine learning research project at AMPLab. MLBase is a machine learning platform, MLI is an interface layer...

Posted by kliopa10 on Thu, 14 Oct 2021 12:32:28 +0530

Day67_Spark(2) Spark RDD operations

Course outline (course content: topic, mastery objective). Spark execution process: WordCount execution process (master), Spark job submission process (master). RDD operations: RDD initialization (master), RDD operations (master), variables (master). Sorting: advanced sorting (master). 1. Spark Execution Process (...

Posted by pkmleo on Tue, 19 Oct 2021 21:40:35 +0530

Spark Streaming in Spark learning

Spark Streaming. Spark Streaming is used for streaming data processing. It supports many data input sources, such as Kafka, Flume, Twitter, ZeroMQ and simple TCP sockets. Once data has been received, you can process it with Spark's high-level primitives, such as map, reduce, join and window. The res...

Posted by pradee on Wed, 10 Nov 2021 04:09:20 +0530
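
The map, reduce and window primitives named in the excerpt above can be sketched in plain Python, without Spark, to show how they combine. This is an illustration of the idea only, not the Spark Streaming API: the function `windowed_word_counts`, the `window_size` parameter and the sample batches are all hypothetical, and each inner list stands for one micro-batch of text lines.

```python
from collections import Counter, deque


def windowed_word_counts(batches, window_size):
    """Yield word counts over the most recent `window_size` micro-batches."""
    # The deque drops the oldest batch automatically once it is full.
    window = deque(maxlen=window_size)
    for batch in batches:
        # map: split every line into words; reduce: count words in this batch.
        batch_counts = Counter(word for line in batch for word in line.split())
        window.append(batch_counts)
        # window: merge the counts of all batches still inside the window.
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Three hypothetical micro-batches of input lines.
batches = [["a b", "a"], ["b c"], ["c c"]]
results = list(windowed_word_counts(batches, window_size=2))
for result in results:
    print(result)
```

With a window of two batches, the third result no longer includes the words seen only in the first batch, which is the behavior a sliding window is meant to give.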

Graphic big data | comprehensive case - mining retail transaction data using Spark analysis

Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/177 Notice: All Rights Reserved. Please contact the platform and the author for reprint and indicate the source. Introduction: E-commerce and new retail are...

Posted by ganesh129 on Tue, 08 Mar 2022 19:17:23 +0530

Graphic big data | Spark machine learning - workflow and Feature Engineering

Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/180 Notice: All Rights Reserved. Please contact the platform and the author for reprint and indicate the source. 1. Spark machine learning workflow 1) Spark mllib...

Posted by proctk on Tue, 08 Mar 2022 20:13:32 +0530