Python-for-Automation/C2/File-Backup-and-Compression/English

Revision as of 17:22, 3 January 2025 by Madhurig (Talk | contribs)



Visual Cue Narration
Show slide: Welcome to the Spoken Tutorial on “File Backup and Compression”
Show slide:

Learning Objectives

In this tutorial, we will learn to
  • Automate file backup using Python
  • Compress backups in various formats such as zip, tar.gz, tar.bz2
  • Schedule the backup process to run at a fixed time every day
Show slide: To record this tutorial, I am using
  • Ubuntu Linux OS version 22.04
  • Python 3.12.3
Show slide:

Prerequisite

https://www.spoken-tutorial.org

To follow this tutorial:
  • You must have basic knowledge of using Linux Terminal and Python
  • For prerequisite Linux and Python tutorials, please visit this website
  • Python libraries required for automation must be installed
Show slide:

Code files

● The files used in this tutorial are provided in the Code files link.

● Please download and extract the files.

● Make a copy and then use them while practicing.

Show the folder Sample I have created a folder Sample which contains some files for demonstration.
Point to the file in Downloads folder


file_backup.py contains the code for the backup and compression of directories

run_backup.py is used to call the functions defined in file_backup.py

Ensure that both source files are saved in the same location.

schedule.txt contains the commands required to schedule the backup process every day.

Open file_backup.py file Now, let us go through the file_backup.py file in the text editor.
Highlight: Import the necessary modules for File backup and compression
Highlight: First, we define a function that performs the backup operation.

We pass the source and backup directories and compression format as parameters.

Highlight:

compress=False

compress is a boolean value that determines if the backup needs to be compressed.
Highlight:

if not os.path.exists(backup_dir):

First, we check whether the backup directory exists using the exists function.
Highlight:

os.makedirs(backup_dir)

If it does not exist, we create it using the makedirs function.
Highlight: Then we check if there is an existing backup directory named current_backup.

This existing current_backup directory is removed using rmtree function.

This prevents multiple copies of the backup directory from being created.

Highlight: We create a path for the new backup directory as current_backup using the join function.
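The directory checks described above can be sketched as follows. This is only a minimal illustration and not the code from file_backup.py; the helper name prepare_backup_dir and its return value are assumptions.

```python
import os
import shutil


def prepare_backup_dir(backup_dir):
    """Return a fresh path for current_backup inside backup_dir.

    Hypothetical helper: the name and return value are assumptions.
    """
    # Create the backup directory if it does not exist yet.
    if not os.path.exists(backup_dir):
        os.makedirs(backup_dir)

    # Build the path <backup_dir>/current_backup.
    current_backup_dir = os.path.join(backup_dir, "current_backup")

    # Remove a previous current_backup so that only one copy is kept.
    if os.path.exists(current_backup_dir):
        shutil.rmtree(current_backup_dir)

    return current_backup_dir
```

The function only returns the path; the current_backup directory itself is created later, when the files are copied.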
Highlight:

for root, dirs, files in os.walk(source_dir):

Next, we use the walk method to recursively traverse through the source directory.

We then try to recreate the same directory structure in the backup folder.

Highlight:

root

Here, root is the current directory being traversed.
Highlight:

relative_path = os.path.relpath(root, source_dir)

The relpath function computes the path of root relative to the source folder.

This helps in maintaining the same structure as the source directory.

Highlight: Then the join function is used to construct the path for the destination directory.
Highlight:

if not os.path.exists(dest_dir):

os.makedirs(dest_dir)

We now check if the destination directory exists.

If it does not, then the destination directory is created.
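The traversal steps above can be put together in a short sketch. The function name mirror_directories and the returned list are assumptions for illustration; the actual code in file_backup.py may be organised differently.

```python
import os


def mirror_directories(source_dir, backup_root):
    """Recreate the folder structure of source_dir under backup_root.

    Hypothetical helper; returns the list of destination directories.
    """
    destinations = []
    for root, dirs, files in os.walk(source_dir):
        # Path of the current folder relative to the source folder.
        relative_path = os.path.relpath(root, source_dir)
        # Same relative location inside the backup folder.
        # normpath removes the "." that relpath gives for the top folder.
        dest_dir = os.path.normpath(os.path.join(backup_root, relative_path))
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        destinations.append(dest_dir)
    return destinations
```

For a source folder that contains a subfolder named reports, relpath gives reports, so the destination becomes the backup folder followed by reports.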

Highlight: This for loop constructs the path to the source file in the source directory.

It will also do the same for the destination files in the backup directory.

Highlight:

if not os.path.exists(dest_file):

First we check if the source file already exists in the backup directory.

If it does not exist, then the file is copied.

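Highlight: Next, the getmtime function returns the time a file was last modified. We compare the times of the source file and its backup copy to check if the file has been modified since the last backup.

The file gets copied only if it is modified.

This helps to minimize the processing when we perform the backup every day.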

Highlight:

shutil.copy2(source_file, dest_file)

The copy2 function copies the source file to the destination directory and also preserves file metadata such as the modification time.
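The copy decision described above can be sketched as a small function. The name copy_if_changed and the True or False return value are assumptions added for illustration.

```python
import os
import shutil


def copy_if_changed(source_file, dest_file):
    """Copy source_file to dest_file only when needed.

    Hypothetical helper; returns True if a copy was made.
    """
    # The file is not in the backup yet, so copy it.
    if not os.path.exists(dest_file):
        shutil.copy2(source_file, dest_file)
        return True

    # The source was modified after the backup copy was made.
    if os.path.getmtime(source_file) > os.path.getmtime(dest_file):
        shutil.copy2(source_file, dest_file)
        return True

    # The backup copy is up to date, so nothing is copied.
    return False
```

Because copy2 keeps the modification time of the source, an unchanged file has the same time in both places and is skipped on the next run.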
Highlight: If compress is set to True, we generate a compressed backup.
Highlight: We call the compress_backup function to compress current_backup folder.
Highlight:

shutil.rmtree(current_backup_dir)

After compression, we use the rmtree function to remove the current_backup directory.
Highlight: Finally, we print a success message indicating where the backup is stored.
Highlight: Next, we define a function to compress the backup folder
Highlight: In the function, we check which compression format has been passed.

If the format is zip, we use the make_archive function to create a zip archive.

Highlight: If the format is tar dot gz or tar dot bz2, we use the tarfile dot open function.
This function creates tar archives with gzip or bzip2 compression.
The add method then adds the contents of the source directory to the tar archive.
Highlight:

os.path.basename(source_dir)

os dot path dot basename gives the last part of the source path, which sets the name under which the folder is stored in the tar archive.
Highlight: If an unsupported compression format is provided, we raise a ValueError.
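A compression function of this kind can be sketched as shown here. It is not the code from file_backup.py; the parameter names output_base and compression_format and the returned archive path are assumptions.

```python
import os
import shutil
import tarfile


def compress_backup(source_dir, output_base, compression_format="zip"):
    """Compress source_dir and return the path of the archive.

    output_base is the archive path without extension (assumed parameter).
    """
    if compression_format == "zip":
        # make_archive adds the ".zip" extension and returns the archive path.
        return shutil.make_archive(output_base, "zip", source_dir)

    if compression_format in ("tar.gz", "tar.bz2"):
        # "w:gz" writes with gzip compression, "w:bz2" with bzip2.
        mode = "w:gz" if compression_format == "tar.gz" else "w:bz2"
        archive_path = f"{output_base}.{compression_format}"
        with tarfile.open(archive_path, mode) as tar:
            # arcname stores the folder under its own name, not its full path.
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        return archive_path

    raise ValueError(f"Unsupported compression format: {compression_format}")
```

In this sketch, an unsupported value such as rar reaches the last line and raises the ValueError.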
Save file_backup.py Save the code as file_backup.py in the Downloads folder.
Open file:

run_backup.py

Open Downloads folder and open the run_backup.py file.

Now, let us go through this code.

Highlight:

import file_backup

We need to import the file_backup Python code as a module into the run_backup file.
  • In Downloads point to sample Files
Go to the Downloads folder.

This is our sample directory that we will use for testing.

Highlight:

source_directory = ''

backup_directory_path = ''

source_directory holds the path where the Sample folder is saved.

backup_directory_path holds the path where the backup has to be saved.

Please change the path according to the location of your directory.

Highlight: Finally, we call the perform_backup function and pass all the parameters.
Set compress=False Let us set the compress parameter to False and the format to zip.
Running the code:

Press Ctrl + Alt + T

Save the code file run_backup.py
Open the terminal by pressing Control, Alt and T keys simultaneously.
Type in terminal:

source Automation/bin/activate

We will open the virtual environment we created for the Automation series.

Type source space Automation forward slash bin forward slash activate.

Then press enter.

Type in terminal:

> cd Downloads

> python3 run_backup.py

Highlight:

In the terminal, type cd Downloads and press enter.

Next type python3 run_backup.py and press enter.

We see a message that backup is done successfully and it is stored in this location.

Let us see the output in this directory.

Switch to Downloads folder


Open current_backup

Go to the Downloads folder, and we can see a directory named current_backup.

Open the current_backup directory and you will find all the files from our source directory.

So we have successfully completed the file backup.

Next we will check the working of file compression.
Go to Downloads folder and in the Sample directory you will see a document named Doc3.odt

Let us modify this file to see whether the file compression is working properly.

Open the Doc3.odt and add some images or text.

Save the file.

  • In Downloads
  • Right click on Sample, select properties
  • Highlight size
Go to Downloads, right click on the Sample directory and select properties.

Here we can see the size of this directory.

After compression, the backup should be smaller than this.

In run_backup.py type:

compress=True

Save run_backup.py

Go to run_backup.py file and set compress to True.

This will compress the backup directory.

Save the file.

In terminal type:

python3 run_backup.py

Switch to the terminal and type python3 run_backup.py again and press Enter.

We see a message that indicates successful backup and compression.

  • Point to current_backup.zip
  • Right click on current_backup.zip and select properties
  • Highlight size
This time the current_backup directory is compressed in zip format.

Again right click on the compressed current_backup.zip and select properties.

We can see that the size of the compressed folder is smaller than that of the Sample folder.
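The same size comparison can also be made from Python instead of the Properties dialog. The sketch below adds up the sizes of all files in a folder; the function name directory_size is an assumption for illustration.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            # getsize returns the size of one file in bytes.
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Calling directory_size on the Sample folder and os.path.getsize on current_backup.zip gives two numbers in bytes that can be compared directly. The paths to pass depend on where these are saved on your system.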

  • Open current_backup
This is how the Python backup and compression code works.
Switch to terminal Switch back to the terminal.

Next we will learn how to schedule this program to run automatically at a specified time.

Type EDITOR=nano crontab -e We will use crontab to schedule the program with cron, the Linux scheduler that runs commands at specified times.

To open the crontab file for editing in nano, type EDITOR=nano space crontab space hyphen e in the terminal.

We will type our commands here.

Open schedule.txt

Copy commands from schedule.txt and paste in crontab

Go to the Downloads folder and open schedule.txt

This file contains the necessary commands required to run a scheduler.

Copy these commands.

Switch back to the terminal and paste the commands at the end of the crontab editor.

Type:

30 11 * * *

Here, the first number represents the minutes and the second number represents the hours.

I will set this to 11:30 a.m. because that is when I want the backup to be scheduled.

Highlight:

* * *

The three asterisks stand for the day of the month, the month and the day of the week. This indicates that the backup will be scheduled every day at the specified time.
Highlight

/usr/bin/python3

This is the path to the python3 interpreter.
Highlight

/home/jasmine/Downloads/run_backup.py

Next, we add the path to the run_backup.py file.
Highlight

>> /home/jasmine/Downloads/bkp_logfile.log

This appends the output of our code to a log file saved in the Downloads folder.

Please change the path according to your system.

Highlight

2>&1

Save crontab: Ctrl+X Y Enter

This redirects standard error to standard output, so that error messages are also written to the log file.

Now, press Ctrl plus X and then press Y followed by Enter to save and exit.

Now we have scheduled the run_backup.py file to be executed at 11:30 a.m. every day.
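To see how the highlighted parts fit together in one crontab line, here is a small Python sketch that assembles them. The function build_cron_entry is hypothetical and the exact contents of schedule.txt may differ; the paths are the ones highlighted in this tutorial.

```python
def build_cron_entry(minute, hour, interpreter, script, logfile):
    """Return a crontab line that runs script every day at hour:minute."""
    # Five time fields: minute, hour, day of month, month, day of week.
    schedule = f"{minute} {hour} * * *"
    # ">>" appends output to the log file; "2>&1" sends errors there too.
    return f"{schedule} {interpreter} {script} >> {logfile} 2>&1"


entry = build_cron_entry(
    30,
    11,
    "/usr/bin/python3",
    "/home/jasmine/Downloads/run_backup.py",
    "/home/jasmine/Downloads/bkp_logfile.log",
)
print(entry)
```

The printed line starts with 30 11 followed by three asterisks, which means 11:30 a.m. on every day of every month.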

The current date and time as of the creation of this video is 28th September 11:38

We will check if the scheduler has completed the backup the next day.

Now the date is 29th September 11:35 a.m.
Open current_backup

Directory

Point to Doc4.txt

Let us go to the Downloads folder, extract the current_backup archive and open it.

We can see that all the files have been backed up.

Narration We can also check if there were any output messages or errors after the backup.
Point to bkp_logfile.log

Open bkp_logfile.log Highlight output in bkp_logfile.log

In the Downloads folder, we see that a bkp_logfile.log has been created.

Here, the output or error messages of the program will be stored after the cron job is completed.

Narration In this way, we can back up and compress a directory at a predetermined time and date.
Type in terminal:

deactivate

In the terminal, type deactivate. This will allow you to exit the virtual environment.
Show slide:

Summary

This brings us to the end of the tutorial. Let us summarize.

In this tutorial, we have learnt to

  • Automate file backup using Python
  • Compress backups in various formats (zip, tar.gz, tar.bz2)
  • Schedule the backup process to run at a fixed time each day
Show slide:

Assignment

As an assignment, do the following:
  1. Change the compression format to tar.gz in run_backup.py file
  2. In the crontab editor, change the schedule to once a week.
Show slide:

About the Spoken Tutorial Project

The video at the following link summarizes the Spoken Tutorial Project. Please download and watch it.
Show Slide:

Spoken Tutorial Workshops

The Spoken Tutorial Project team conducts workshops and gives certificates.

For more details, please write to us.

Show Slide: Answers for THIS Spoken Tutorial Please post your timed queries in this forum.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Automation, visit the FOSSEE forum and post your question.
Show slide:

Acknowledgement

The Spoken Tutorial Project was established by the Ministry of Education, Government of India.
Show slide:

Thank You

This is Jasmine Tresa Jose, a FOSSEE Summer Fellow 2024, IIT Bombay signing off.

Thanks for joining.

Contributors and Content Editors

Madhurig, Nirmala Venkat
