Anaconda software installation
Download the Anaconda installation package
Download the installer from the official website (https://www.anaconda.com/products/distribution#Downloads), choosing the version for your operating system,
or fetch it directly from the download link https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh.
Place the downloaded installer in the softs directory.
Then run the installer script.
Follow the installation prompts, entering the appropriate values at each step.
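For example (the ~/softs directory is an assumption from the step above; adjust the path to your own layout):

```shell
# Download the installer into the softs directory, then run it with bash.
wget -P ~/softs https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
bash ~/softs/Anaconda3-2021.11-Linux-x86_64.sh
```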
```
Welcome to Anaconda3 2021.11

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>                # just read through the license agreement here

Do you accept the license terms? [yes|no]
[no] >>> yes       # after reading, you must accept to continue

# specify the Anaconda install directory according to our environment plan:
# /home/hadoop/apps/anaconda3
Anaconda3 will now be installed into this location:
/home/hadoop/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/hadoop/anaconda3] >>> /home/hadoop/apps/anaconda3
PREFIX=/home/hadoop/apps/anaconda3
Unpacking payload ...
```
After the installation is complete, there will be the following prompt, telling us that the installation is complete and asking if we need to initialize.
```
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>>
```
Let's pause here for a moment. If you are not sure whether to initialize, take a look at the official documentation first: https://docs.anaconda.com/anaconda/install/linux/.
According to the documentation, if you answer "no", your shell startup scripts are left unmodified, and initialization can be completed later by running conda init. From this it is easy to guess that initialization adds the environment variables and shell configuration that conda needs. The official recommendation is to answer "yes".
Let's take a look at the current environment configuration file first.
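One simple way to record its current state, so we can compare it after initialization (assuming a bash login with a ~/.bashrc file):

```shell
# Note what the end of the file looks like before conda init modifies it.
tail -n 5 ~/.bashrc
```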
Enter yes at the installation interface and continue.
```
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
```
As you can see, the installer tells us:
- The installation was successful
- Modified /home/hadoop/.bashrc for us
- If you don't want to activate the base environment at startup, you can disable it with the conda config --set auto_activate_base false command
Let's see what has changed in /home/hadoop/.bashrc.
You can see that there is an extra section of initialization configuration in the configuration file.
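The appended section typically looks like the following sketch (the exact contents depend on the conda version; the paths here match our install directory /home/hadoop/apps/anaconda3):

```shell
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/hadoop/apps/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/hadoop/apps/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/hadoop/apps/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/hadoop/apps/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
```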
When we log out and log back in to the system, the base environment is automatically activated.
Configure domestic mirroring
Although Anaconda bundles many Python libraries, we often still need to install additional ones ourselves. To speed up downloads, we can configure domestic (China-based) mirrors.
Edit the file ~/.condarc
```yaml
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
```
After the configuration is complete, you can use the following command to check whether it takes effect.
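For example (standard conda commands; clearing the index cache ensures the new mirror URLs are actually used on the next install):

```shell
# Remove cached package indexes fetched from the old channels.
conda clean -i
# Print the active config files and their contents; ~/.condarc should appear.
conda config --show-sources
```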
Create a virtual environment
After Anaconda is installed, we create a separate virtual environment so that its packages stay isolated from other environments.
conda create -n pyspark python=3.9
And activate the newly created virtual environment.
conda activate pyspark
Install dependent libraries.
pip install pyspark pyhive pymysql jieba -i https://pypi.tuna.tsinghua.edu.cn/simple
Configure environment variables
Configure the environment variable PYSPARK_PYTHON in the .bashrc file to point to the Python interpreter of the pyspark environment.
This matters because the pyspark launcher script contains the check
if [[ -z "$PYSPARK_PYTHON" ]]; then
If PYSPARK_PYTHON is not set, python3 is used as the interpreter by default; if python3 is not installed on the system, launching pyspark fails with an error.
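The fallback can be sketched as follows (abridged, not the verbatim script; the real launcher also handles PYSPARK_DRIVER_PYTHON and other cases):

```shell
# Abridged sketch of the fallback in the pyspark launcher:
# if PYSPARK_PYTHON is unset, fall back to the system's python3.
if [[ -z "$PYSPARK_PYTHON" ]]; then
  PYSPARK_PYTHON=python3
fi
export PYSPARK_PYTHON
```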
```shell
export JAVA_HOME=/home/hadoop/apps/java
export HADOOP_HOME=/home/hadoop/apps/hadoop
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
export SPARK_HOME=/home/hadoop/apps/spark
export PYSPARK_PYTHON=/home/hadoop/apps/anaconda3/envs/pyspark/bin/python3
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
```
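After editing, reload the file so the variables take effect in the current shell, and verify the interpreter path; for example:

```shell
source ~/.bashrc
# Should print the pyspark environment's interpreter path configured above.
echo $PYSPARK_PYTHON
```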