Python-for-Automation/C2/File-Conversion/English
Visual Cue | Narration |
Show slide:
Welcome |
Hello and welcome to the Spoken Tutorial on "File Conversion". |
Show Slide:
Learning Objectives |
In this tutorial, we will learn how to
|
Show Slide:
System Requirements |
To record this tutorial, I am using
|
Show Slide:Pre-requisites | To follow this tutorial
|
Show Slide:
Code Files |
|
Show Slide: File Conversion | File conversion is the process of transforming a file from one format to another.
This can involve changing the file type, the structure of its contents, or both. This makes the file compatible with different applications and user requirements. |
Show Slide:
File Conversion - Libraries |
To automate the conversion of one file type to another, we need the following libraries:
|
Show Slide:
Install - espeak package
|
Note that the espeak package must be installed for this tutorial as a prerequisite.
Please install it using the following commands. |
Open the Downloads Folder
Files App Downloads Folder logo.jpg |
First let us see how to convert a JPG file into PNG format.
logo.jpg is the file we use for this demonstration. Right-click on the image and then select Properties to see the type of image. This is a JPG. |
Open the Downloads Folder
Files App Downloads Folder imgconversion.py Open the Text Editor with the source file |
I have created the source file imgconversion.py for the file conversion.
Now, we will go through the source code in the text editor. |
Highlight:
from PIL import Image |
First, we import the Image module from the Python Imaging Library (PIL), which is provided by the Pillow package. |
Highlight:
def conversion(input_file, output_file): |
We define a function conversion with two parameters.
Here input_file is a JPG file and the output_file will be in PNG format. |
Highlight:
with Image.open(input_file) as img: |
The image specified by input_file is opened. |
Highlight:
img.save(output_file, "PNG") |
We can now save the opened image to the specified output file in PNG format. |
Highlight:
print("Conversion successful.") |
If no exception is raised, the message "Conversion successful." is printed. |
Highlight:
except FileNotFoundError: print("Error: Input file not found.") |
If the input file is not found, an exception is caught and a message is printed to the terminal. |
Highlight:
except Exception as e: print(f"An error occurred: {str(e)}") |
Other exceptions are caught here and printed to the terminal. |
Highlight:
input_file = "logo.jpg" output_file = "logo_converted.png" |
The paths or names of the input and output files are specified. |
Highlight:
conversion(input_file, output_file) |
We can now call the function to initiate the conversion process. |
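Putting the highlighted lines together, the script might look like the minimal sketch below. It is an assumption-based reconstruction, not the exact source file: the try block is inferred from the except clauses shown above, and the guarded import and the True/False return value are additions made only so the outcome can be checked.

```python
try:
    from PIL import Image  # provided by the Pillow package
except ImportError:  # assumption: report a clear message if Pillow is missing
    Image = None


def conversion(input_file, output_file):
    """Save input_file as a PNG; return True on success, False otherwise."""
    if Image is None:
        print("Error: Pillow is not installed.")
        return False
    try:
        # Open the source image and write it back out in PNG format.
        with Image.open(input_file) as img:
            img.save(output_file, "PNG")
        print("Conversion successful.")
        return True
    except FileNotFoundError:
        print("Error: Input file not found.")
        return False
    except Exception as e:
        print(f"An error occurred: {e}")
        return False


if __name__ == "__main__":
    conversion("logo.jpg", "logo_converted.png")
```

Returning a boolean is a hypothetical design choice; it lets a calling script decide what to do when the conversion fails instead of relying only on the printed message.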
Save the Code in the Downloads Folder | Save the code as imgconversion.py in the Downloads folder.
Let us execute the program and see the results. |
Open the terminal (Ctrl + Alt + T)
Start Virtual Environment Type source Automation/bin/activate |
Open the terminal by pressing Control + Alt + T keys simultaneously.
We will activate the virtual environment we created for the Automation series. Type source space Automation forward slash bin forward slash activate. Then press Enter. |
Running the Code
Type cd Downloads python3 imgconversion.py |
Now type, cd Downloads.
Then type python3 imgconversion.py and press Enter to run the code. |
Observing the Output | Once the script is executed, the text “Conversion successful” is displayed on the terminal.
|
Navigating to Downloads
Files App Downloads Folder logo_converted.png |
Go to the Downloads folder and double click to open the logo_converted.png file.
The image is the same as the one we provided as input. However, it is now saved in PNG format. Let us check the properties of this image as well to confirm the image type. This is a PNG. |
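The image type can also be confirmed from Python instead of the Properties dialog. The sketch below reads the first bytes of a file and compares them with the standard PNG and JPEG signatures. The function name detect_image_type is hypothetical and is not part of the tutorial's code files.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # first 3 bytes of a JPEG file


def detect_image_type(path):
    """Return "PNG", "JPEG" or "unknown" based on the file's leading bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(PNG_SIGNATURE):
        return "PNG"
    if header.startswith(JPEG_SIGNATURE):
        return "JPEG"
    return "unknown"
```

Calling detect_image_type("logo_converted.png") after the conversion should report PNG, since the check looks at the file contents and not at the file extension.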
PDF to Audio File Conversion | Next, we will look at the conversion of a PDF to an audio file. |
Open the Downloads Folder
Files App Downloads Folder test.pdf |
I have created a PDF file, test.pdf, for this tutorial.
You can download it from the code file section and use it or you can create one with some basic text. |
Open the Downloads Folder
Files App Downloads Folder pdftoaudio.py Open the Text Editor with the source file |
I have created the source file pdftoaudio.py for demonstration.
Let us review it in the text editor. |
Highlight:
import pyttsx3 import pdfplumber from PyPDF2 import PdfReader |
First, the libraries necessary to convert a PDF to an audio file are imported. |
Highlight:
file = 'test.pdf' |
We will use test.pdf for audio conversion. |
Highlight:
all_text = [] |
An empty list is initialized to store the extracted text from each page of the PDF. |
Highlight:
s = pyttsx3.init() |
We initialize a text to speech engine using the pyttsx3 library. |
Highlight:
pdf_reader = PdfReader(file) |
We create a PdfReader object, which opens the PDF file and gives access to its pages. |
Highlight:
pages = len(pdf_reader.pages) |
Here, the number of pages in the PDF is determined. |
Highlight:
with pdfplumber.open(file) as pdf: |
We will use the pdfplumber library and open the PDF. |
Highlight:
for i in range(pages): page = pdf.pages[i] text = page.extract_text() |
A for loop iterates over each page and extracts the text from it. |
Highlight:
if text: |
This checks if any text was extracted. |
Highlight:
all_text.append(text) |
We add the extracted text to the empty list we created earlier. |
Highlight:
print(f"\n{text}\n") |
The extracted text is printed on the terminal. |
Highlight:
audio_file_name = f'audio_page_{i + 1}.mp3' |
A string with the current page number is generated to act as the filename. |
Highlight:
s.save_to_file(text, audio_file_name) |
We now save the spoken text to an MP3 file with the generated name. |
Highlight:
s.runAndWait() |
We wait for all the text to speech conversions to finish before we move on. |
Highlight:
except FileNotFoundError: print('File not found. Please check the file path and name.') |
If the input file is not found, an exception is caught and a message is printed to the terminal. |
Highlight:
except Exception as e: print(f'An error occurred: {str(e)}') |
Other exceptions are caught here and printed to the terminal. |
Highlight:
print('Audio files saved successfully.') |
The print statement shows that the audio files have been saved successfully. |
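The page loop described above has two details that are easy to get wrong: pages without extractable text are skipped, and file names are numbered from 1 while the loop index starts at 0. The sketch below isolates that logic in a small helper so it can be tried without a PDF or a speech engine. The helper name page_audio_jobs is hypothetical and is not part of pdftoaudio.py.

```python
def page_audio_jobs(page_texts):
    """Yield (audio file name, text) for every page that has text.

    page_texts holds one entry per page, in page order. With pdfplumber it
    could be built as [page.extract_text() for page in pdf.pages].
    """
    for i, text in enumerate(page_texts):
        if text:  # skip pages where nothing was extracted (empty or None)
            yield f"audio_page_{i + 1}.mp3", text


if __name__ == "__main__":
    # Page 2 has no text, so no audio file name is generated for it.
    for name, text in page_audio_jobs(["First page", "", "Third page"]):
        print(name, "<-", text)
```

Each pair produced this way could then be passed to the speech engine's save_to_file method, as in the walkthrough above.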
Save the code in the Downloads Folder | Save the code as pdftoaudio.py in the Downloads folder. |
Type:
python3 pdftoaudio.py |
Switch back to the terminal.
Let us execute the program. Now, type python3 pdftoaudio.py to run your code. |
Observing the Output - in the terminal
|
Once the script is executed, the text of the PDF is displayed on the terminal.
|
Navigating to the Downloads
Files App Downloads Folder audio_page_1.mp3 Play the audio File |
Let us now go to the Downloads folder and double click to open the audio_page_1.mp3 file.
The saved file being an audio file, we can play and pause the MP3 file as we please. Let us play the audio file. |
PDF to DOCX Conversion | Finally, we shall see how to convert a PDF to a DOCX file. |
Open the Downloads Folder
Files App Downloads Folder newsletter.pdf |
I have a PDF file, newsletter.pdf, for this tutorial.
Let us now look at the code that will convert this PDF to a Word document. |
Highlight:
from pdf2docx import Converter |
First, we import the Converter class from the pdf2docx library. |
Highlight:
def pdf_to_docx(pdf_file, docx_file): |
We then define a function to convert the PDF to a DOCX file. |
Highlight:
cv = Converter(pdf_file) |
We create an instance of the Converter class and pass it the PDF file to convert. |
Highlight:
cv.convert(docx_file) |
The convert method is called on the instance we created earlier. |
Highlight:
cv.close() |
We can now close the instance as the conversion is over. |
Highlight:
print(f'Conversion complete: {pdf_file} to {docx_file}') |
This is to print a message indicating the conversion is complete. |
Highlight:
pdf_file = 'newsletter.pdf' docx_file = 'doc_output_text.docx' |
We assign the PDF file we want to convert and the path where the DOCX file will be saved. |
Highlight:
pdf_to_docx(pdf_file, docx_file) |
The function is called to initiate the conversion process. |
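The function walked through above has no error handling, so a missing input file or a failed conversion would leave the converter open. The sketch below is one possible variation, not the tutorial's source file: the file-existence check, the guarded import, the try/finally block and the True/False return value are all assumptions added for illustration.

```python
from pathlib import Path

try:
    from pdf2docx import Converter
except ImportError:  # assumption: report a clear message if pdf2docx is missing
    Converter = None


def pdf_to_docx(pdf_file, docx_file):
    """Convert pdf_file to docx_file; return True on success, False otherwise."""
    if not Path(pdf_file).is_file():
        print(f"Error: {pdf_file} not found.")
        return False
    if Converter is None:
        print("Error: pdf2docx is not installed.")
        return False
    cv = Converter(pdf_file)
    try:
        cv.convert(docx_file)
    finally:
        cv.close()  # runs even if convert() raises an exception
    print(f"Conversion complete: {pdf_file} to {docx_file}")
    return True


if __name__ == "__main__":
    pdf_to_docx("newsletter.pdf", "doc_output_text.docx")
```

Checking for the input file before creating the converter keeps the error message short and avoids depending on which exception the library raises for a missing file.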
Save the code in the Downloads Folder | Save the code as pdftodoc.py in the Downloads folder. |
Switch to Terminal
Type: python3 pdftodoc.py Highlight:Conversion Complete |
Switch back to the terminal, type python3 pdftodoc.py and press Enter.
Once the code is executed, we see a message which indicates the conversion was completed. |
Navigating to the Downloads
Files App Downloads Folder |
Go to the Downloads folder and double click to open the doc_output_text.docx file.
As we can see, it has the same content as the PDF, but in a DOCX file. Earlier, it was not editable because it was in PDF format. Now that it is a DOCX file, we can edit it. Let us change the date to August 2024. |
Closing the virtual environment
Type deactivate |
Switch back to the terminal to close the virtual environment.
Type deactivate. |
Show Slide:
Summary |
This brings us to the end of this tutorial. Let us summarise.
In this tutorial, we have learnt to
|
Show Slide:
Assignment |
As an assignment, please do the following:
|
Show Slide:About the Spoken Tutorial Project | The video at the following link summarises the Spoken Tutorial Project.
Please download and watch it |
Show Slide:
Spoken Tutorial Workshops |
The Spoken Tutorial Project team conducts workshops and gives certificates.
For more details, please write to us. |
Show Slide:
Answers for THIS Spoken Tutorial |
Please post your timed queries in this forum. |
Show Slide:
FOSSEE Forum |
For any general or technical questions on Python for Automation, visit the FOSSEE forum and post your question. |
Show Slide:
Acknowledgement |
The Spoken Tutorial Project was established by the Ministry of Education, Government of India. |
Show Slide:
Thank You |
This is Sai Sathwik, a FOSSEE Semester Long Intern 2024, IIT Bombay signing off.
Thanks for joining. |