Anaconda software installation

Download the Anaconda installation package

Download the installer for your operating system from the official website: https://www.anaconda.com/products/distribution#Downloads

Alternatively, download the Linux installer directly from https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh

Place the downloaded installer in the softs directory.
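Before running a downloaded installer it can be worth checking that the file is intact. The following is a minimal sketch, not part of the original procedure: it computes the SHA-256 digest of the installer so that you can compare it by eye with the hash Anaconda publishes for that file. The helper name sha256_of is hypothetical, and the path assumes the softs directory used above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large installer is never loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: the softs directory mentioned in the text.
    installer = Path("softs/Anaconda3-2021.11-Linux-x86_64.sh")
    if installer.exists():
        print(sha256_of(installer))
    else:
        print(f"{installer} not found")
```

If the printed digest does not match the published one, download the installer again before continuing.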

Install Anaconda

Install the software

Run the installer script directly.

sh softs/Anaconda3-2021.11-Linux-x86_64.sh

Follow the prompts and enter the appropriate values where required.

Welcome to Anaconda3 2021.11

In order to continue the installation process, please review the license
agreement.
# Press ENTER here to page through the license agreement
Please, press ENTER to continue
>>> 
# After reading the license agreement, type yes to accept it and continue
Do you accept the license terms? [yes|no]
[no] >>> yes
# Following our own directory layout, specify the Anaconda installation directory /home/hadoop/apps/anaconda3
Anaconda3 will now be installed into this location:
/home/hadoop/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/hadoop/anaconda3] >>> /home/hadoop/apps/anaconda3
PREFIX=/home/hadoop/apps/anaconda3
Unpacking payload ...

When the installation finishes, the installer reports that it is done and asks whether to initialize Anaconda3 by running conda init.

installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> 

Let's pause here for a moment. If you are not sure whether to initialize, read the official documentation first: https://docs.anaconda.com/anaconda/install/linux/

According to the official documentation, if you answer "no", conda will not modify your shell scripts at all, and you can still initialize later by running conda init. It is easy to guess that initialization adds the environment variables and shell configuration that conda needs. The official recommendation is to answer "yes".

First, let's look at the current shell configuration files so that we can compare them afterwards.

cat ~/.bashrc

cat ~/.bash_profile

Enter yes at the installer prompt to continue.

installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes


As you can see, the installer tells us:

  • The installation was successful
  • It modified /home/hadoop/.bashrc for us
  • If you don't want the base environment to be activated every time a shell starts, you can disable this with the command conda config --set auto_activate_base false

Let's see what has changed in /home/hadoop/.bashrc.

You can see that the file now contains an extra block of initialization configuration.
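The block that conda init writes is delimited by the comment lines "# >>> conda initialize >>>" and "# <<< conda initialize <<<". As a minimal sketch, the following pulls that block out of the text of a shell configuration file so that it can be inspected on its own. The function name conda_init_block is hypothetical and the sample text is made up for illustration.

```python
START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def conda_init_block(text):
    """Return the lines of the conda-managed block, or None if it is absent."""
    lines = [line.strip() for line in text.splitlines()]
    try:
        start = lines.index(START_MARKER)
        end = lines.index(END_MARKER, start)
    except ValueError:
        # One of the markers is missing, so conda init has not run on this file.
        return None
    return lines[start:end + 1]


if __name__ == "__main__":
    # Made-up sample; a real .bashrc would hold conda's own setup code here.
    sample = "\n".join([
        "export EDITOR=vi",
        START_MARKER,
        "# managed content goes here",
        END_MARKER,
    ])
    print(conda_init_block(sample))
```

To run it against the real file, pass in the contents of /home/hadoop/.bashrc read with open().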

When we log out and log back in to the system, the base environment is automatically activated.
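One way to confirm this from Python is to read the CONDA_DEFAULT_ENV environment variable, which conda sets to the name of the active environment. This is a minimal sketch; the function name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("active environment:", active_conda_env())
    # sys.prefix shows which installation the running interpreter belongs to.
    print("interpreter prefix:", sys.prefix)
```

In a fresh login shell after initialization, the first line should report base.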

Configure a domestic mirror

Although Anaconda bundles many Python libraries, we sometimes still need to install others ourselves. To speed up downloads, we can configure a domestic mirror, here the Tsinghua University (TUNA) mirror.

Edit the file ~/.condarc so that it contains the following:

channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

After saving the file, run the following command and check that the channel URLs in its output point to the mirror.

conda info
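To make the custom_channels mapping above more concrete: each entry maps a channel name to a base location, and the channel is served from that location followed by the channel name. The following is a simplified model of that lookup, not conda's actual implementation. The function name channel_url is hypothetical, and only two of the entries from the file are copied into the dictionary.

```python
# Default used by conda for channel names that have no custom location.
DEFAULT_CHANNEL_ALIAS = "https://conda.anaconda.org"

# Two entries copied from the custom_channels section of ~/.condarc.
CUSTOM_CHANNELS = {
    "conda-forge": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud",
    "pytorch": "https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud",
}


def channel_url(name, custom_channels=CUSTOM_CHANNELS, alias=DEFAULT_CHANNEL_ALIAS):
    """Return the URL a named channel resolves to in this simplified model."""
    base = custom_channels.get(name, alias)
    return f"{base.rstrip('/')}/{name}"


if __name__ == "__main__":
    for channel in ("conda-forge", "pytorch"):
        print(channel, "->", channel_url(channel))
```

With this configuration, a command such as conda install -c conda-forge would therefore fetch packages from the mirror rather than from the default host.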

Create a virtual environment

With Anaconda installed, we create a separate virtual environment so that the PySpark dependencies are isolated from the base environment and from other projects.

conda create -n pyspark python=3.9

Then activate the newly created virtual environment.

conda activate pyspark

Install the required libraries.

pip install pyspark pyhive pymysql jieba -i https://pypi.tuna.tsinghua.edu.cn/simple
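A quick way to confirm that the libraries landed in the active environment is to ask Python whether each one can be found. This is a minimal sketch using the standard library; the function name missing_modules is hypothetical, and the import names are assumed to be the lowercase package names from the pip command above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["pyspark", "pyhive", "pymysql", "jieba"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```

Run it with the pyspark environment activated; anything listed as missing was not installed into that environment.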

Configure environment variables

Set the environment variable PYSPARK_PYTHON in the .bashrc file so that it points to the Python interpreter of the new virtual environment.

This is needed because the pyspark launcher script contains the following code:

if [[ -z "$PYSPARK_PYTHON" ]]; then
  PYSPARK_PYTHON=python3
fi

If PYSPARK_PYTHON is unset or empty, the script falls back to python3 as the interpreter command. If no python3 can be found on the system PATH, pyspark fails with an error.
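The fallback described above can be mirrored in a few lines of Python, which is handy for checking what interpreter pyspark would pick in the current shell. This is a sketch of the same logic, not code taken from Spark; both function names are hypothetical.

```python
import os
import shutil


def resolve_pyspark_python(environ=os.environ):
    """Return the interpreter command, falling back to python3 like the script."""
    # The shell test -z is true for an unset or empty variable; "or" matches that.
    return environ.get("PYSPARK_PYTHON") or "python3"


def interpreter_available(command):
    """Return True if the command can be located as an executable."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    command = resolve_pyspark_python()
    print("interpreter:", command)
    print("available:", interpreter_available(command))
```

If the second line prints False, pyspark would fail to start with the current settings.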

Add the following lines to .bashrc, adjusting the paths to your own layout:

export JAVA_HOME=/home/hadoop/apps/java
export HADOOP_HOME=/home/hadoop/apps/hadoop
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/apps/hadoop/etc/hadoop
export SPARK_HOME=/home/hadoop/apps/spark
export PYSPARK_PYTHON=/home/hadoop/apps/anaconda3/envs/pyspark/bin/python3
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
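After reloading the shell, a small check can catch typos in these paths before Spark does. This is a minimal sketch, assuming the variable names exported above; the function name broken_path_variables is hypothetical.

```python
import os

# Variables from .bashrc that are expected to hold an existing path.
PATH_VARIABLES = [
    "JAVA_HOME",
    "HADOOP_HOME",
    "HADOOP_CONF_DIR",
    "YARN_CONF_DIR",
    "SPARK_HOME",
    "PYSPARK_PYTHON",
]


def broken_path_variables(environ=os.environ, names=PATH_VARIABLES):
    """Return the variables that are unset or point to a path that does not exist."""
    return [name for name in names if not os.path.exists(environ.get(name, ""))]


if __name__ == "__main__":
    broken = broken_path_variables()
    print("broken:", broken if broken else "none")
```

An empty result only shows that the paths exist; it does not verify that the software under them is installed correctly.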

Tags: Linux Python programming language

Posted by indian98476 on Tue, 13 Dec 2022 12:58:18 +0530