Difference between revisions of "Python-for-Automation/C2/File-Conversion/English"
Revision as of 20:57, 30 September 2024
Visual Cue | Narration |
Show slide:
Welcome |
Hello and welcome to the Spoken Tutorial on "File Conversion" |
Show Slide:
Learning Objectives |
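In this tutorial, we will learn how to
* Convert JPG images to PNG format
* Extract text from a PDF file and convert it to audio
* Convert a PDF file to DOCX format
|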
Show Slide:
System Requirements |
To record this tutorial, I am using
* Ubuntu Linux OS version 22.04
* Python 3.12.3
|
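Show Slide: Pre-requisites https://www.spoken-tutorial.org | To follow this tutorial
* You must have basic knowledge of using the Linux terminal and Python
* For pre-requisite Linux and Python tutorials, please visit this website
* Python libraries required for automation must be installed
|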
Show Slide:
Code Files |
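* The files used in this tutorial are provided in the Code files link.
* Please download and extract the files.
* Make a copy and then use them while practicing.
|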
Show Slide: File Conversion | File conversion is the process of transforming a file from one format to another.
This can involve changing the file type, the structure of its contents, or both. This makes the file compatible with different applications and user requirements. |
Show Slide:
File Conversion - Libraries |
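To automate the conversion of one file type to another, we need the following libraries:
* PyPDF2 to handle PDF manipulation
* pyttsx3 to convert text to speech
* pdfplumber to help with extracting content
* PIL for image processing and manipulation
* pdf2docx to convert PDF files to DOCX format
* os to handle file manipulation and operating system interactions
|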
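Before running the scripts, it can help to confirm that the libraries are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative and not part of the tutorial's code files.

```python
from importlib.util import find_spec

# Import names of the third-party libraries used in this tutorial.
# PIL is the import name provided by the Pillow package.
REQUIRED_MODULES = ["PyPDF2", "pyttsx3", "pdfplumber", "PIL", "pdf2docx"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Any name reported as missing would need to be installed in the virtual environment before continuing.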
Show Slide:
Install - espeak package
* sudo apt-get update
* sudo apt-get install espeak
|
Note that the espeak package must be installed for this tutorial as a prerequisite.
Please install it using the following commands. |
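One way to confirm that the installation worked is to check whether the `espeak` command is on the PATH. This is a small sketch using the standard library only; the function name `espeak_available` is hypothetical.

```python
import shutil


def espeak_available(command="espeak"):
    """Return True if the given command can be found on the PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if espeak_available():
        print("espeak is installed.")
    else:
        print("espeak was not found. Please install it first.")
```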
Open the Downloads Folder
Files App Downloads Folder logo.jpg |
First, let us see how to convert a JPG file into PNG format.
logo.jpg is the file which we use for this demonstration. Right-click on the image and then select Properties to see the type of image. This is a JPG. |
Open the Downloads Folder
Files App Downloads Folder imgconversion.py Open the Text Editor with the source file |
I have created the source file imgconversion.py for the file conversion.
Now, we will go through the source code in the text editor. |
Highlight:
from PIL import Image |
First, we import the Image module from the Python Imaging Library. |
Highlight:
def conversion(input_file, output_file): |
We define a function conversion with two parameters.
Here input_file is a JPG file and the output_file will be in PNG format. |
Highlight:
with Image.open(input_file) as img: |
The image specified by input_file is opened. |
Highlight:
img.save(output_file, "PNG") |
We can now save the opened image to the specified output file in PNG format. |
Highlight:
print("Conversion successful.") |
If no exceptions are raised, the message "Conversion successful." is printed. |
Highlight:
except FileNotFoundError:
    print("Error: Input file not found.") |
If the input file is not found, an exception is caught and a message is printed to the terminal. |
Highlight:
except Exception as e:
    print(f"An error occurred: {str(e)}") |
Other exceptions are caught here and printed to the terminal. |
Highlight:
input_file = "logo.jpg"
output_file = "logo_converted.png" |
The paths or names of the input and output files are specified. |
Highlight:
conversion(input_file, output_file) |
We can now call the function to initiate the conversion process. |
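Putting the highlighted lines together, the script might look like the sketch below. The `try:` line is not among the highlighted fragments and is inferred from the `except` clauses. The `ImportError` guard, the boolean return value and the `__main__` check are additions for illustration and are not part of the original imgconversion.py.

```python
try:
    from PIL import Image  # provided by the Pillow package
except ImportError:
    # Assumption: report a missing library instead of crashing on import.
    Image = None


def conversion(input_file, output_file):
    """Save input_file as a PNG; return True on success, False otherwise."""
    if Image is None:
        print("Error: Pillow is not installed.")
        return False
    try:
        # Open the source image and write it out again in PNG format.
        with Image.open(input_file) as img:
            img.save(output_file, "PNG")
        print("Conversion successful.")
        return True
    except FileNotFoundError:
        print("Error: Input file not found.")
    except Exception as e:
        print(f"An error occurred: {str(e)}")
    return False


if __name__ == "__main__":
    input_file = "logo.jpg"
    output_file = "logo_converted.png"
    conversion(input_file, output_file)
```

The specific `FileNotFoundError` clause is placed before the general `Exception` clause so that a missing input file gets its own message.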
Save the Code in the Downloads Folder | Save the code as imgconversion.py in the Downloads folder.
Let us execute the program and see the result. |
Open the terminal (Ctrl + Alt + T)
Start Virtual Environment
Type
source Automation/bin/activate |
Open the terminal by pressing Control + Alt + T keys simultaneously.
We will activate the virtual environment we created for the Automation series. Type source space Automation forward slash bin forward slash activate. Then press Enter. |
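To confirm from within Python that a virtual environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a sketch that assumes the environment was created with the standard `venv` module; the function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when Python is running inside a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
```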
Running the Code
Type
cd Downloads
python3 imgconversion.py |
Now type cd Downloads.
Then type python3 imgconversion.py and press Enter to run the code. |
Observing the Output | Once the script is executed, the text “Conversion successful” is displayed on the terminal.
Let us check the output image. |
Navigating to Downloads
Files App Downloads Folder logo_converted.png |
Go to the Downloads folder and double click to open the logo_converted.png file.
The image is the same as the one we provided as input. However, it is now saved in PNG format. Let us check the properties of this image as well to confirm the image type. This is a PNG. |
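The image type can also be checked programmatically by reading the first bytes of the file. PNG files begin with a fixed 8-byte signature and JPEG files begin with the bytes FF D8 FF. The sketch below uses only the standard library; the function name `detect_image_type` is hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_image_type(path):
    """Return "PNG", "JPEG" or "unknown" based on the file's leading bytes."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header.startswith(PNG_SIGNATURE):
        return "PNG"
    if header.startswith(JPEG_SIGNATURE):
        return "JPEG"
    return "unknown"
```

Calling `detect_image_type("logo_converted.png")` in the Downloads folder would be expected to report PNG after a successful conversion.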
PDF to Audio File Conversion | Next, we will look at the conversion of a PDF to an audio file. |
Open the Downloads Folder
Files App Downloads Folder test.pdf |
I have created a pdf file test.pdf for this tutorial.
You can download it from the code file section and use it or you can create one with some basic text. |
Open the Downloads Folder
Files App Downloads Folder pdftoaudio.py Open the Text Editor with the source file |
I have created the source file pdftoaudio.py for demonstration.
Let us review it in the text editor. |
Highlight:
import pyttsx3
import pdfplumber
from PyPDF2 import PdfReader |
First, the libraries necessary to convert a PDF to an audio file are imported. |
Highlight:
file = 'test.pdf' |
We will use test.pdf for audio conversion. |
Highlight:
all_text = [] |
An empty list is initialized to store the extracted text from each page of the PDF. |
Highlight:
s = pyttsx3.init() |
We initialize a text to speech engine using the pyttsx3 library. |
Highlight:
pdf_reader = PdfReader(file) |
We create an object which opens the PDF file and accesses pages. |
Highlight:
pages = len(pdf_reader.pages) |
Here, the number of pages in the PDF is determined. |
Highlight:
with pdfplumber.open(file) as pdf: |
We will use the pdfplumber library and open the PDF. |
Highlight:
for i in range(pages):
    page = pdf.pages[i]
    text = page.extract_text() |
A for loop iterates over each page and extracts all the text from it. |
Highlight:
if text: |
This checks if any text was extracted. |
Highlight:
all_text.append(text) |
We add the extracted text to the empty list we created earlier. |
Highlight:
print(f"\n{text}\n") |
The extracted text is printed on the terminal. |
Highlight:
audio_file_name = f'audio_page_{i + 1}.mp3' |
A string with the current page number is generated to act as the filename. |
Highlight:
s.save_to_file(text, audio_file_name) |
We now save the spoken text to an MP3 file with the generated name. |
Highlight:
s.runAndWait() |
We wait for all the text to speech conversions to finish before we move on. |
Highlight:
except FileNotFoundError:
    print('File not found. Please check the file path and name.') |
If the input file is not found, an exception is caught and a message is printed to the terminal. |
Highlight:
except Exception as e:
    print(f'An error occurred: {str(e)}') |
Other exceptions are caught here and printed to the terminal. |
Highlight:
print('Audio files saved successfully.') |
The print statement shows that the audio files have been saved successfully. |
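The per-page logic walked through above (skip pages with no text, name each audio file after its page number) can be sketched without any of the PDF or speech libraries. The function name `plan_audio_files` and the list-of-strings input are assumptions made for illustration. In the actual script the text comes from `page.extract_text()` and each pair would be passed to `s.save_to_file(text, audio_file_name)`.

```python
def plan_audio_files(page_texts):
    """Pair each non-empty page text with its audio file name.

    page_texts holds one entry per page; an entry may be None or an
    empty string when no text could be extracted from that page.
    """
    jobs = []
    for i, text in enumerate(page_texts):
        if text:  # same check as in the script: skip pages without text
            audio_file_name = f"audio_page_{i + 1}.mp3"
            jobs.append((audio_file_name, text))
    return jobs


if __name__ == "__main__":
    for name, text in plan_audio_files(["First page text", None, "Third page text"]):
        print(name, "<-", text)
```

Page numbering starts at 1 because `i + 1` is used in the file name, so a page without text leaves a gap in the numbering rather than shifting later files.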
Save the code in the Downloads Folder | Save the code as pdftoaudio.py in the Downloads folder. |
Type:
python3 pdftoaudio.py |
Switch back to the terminal.
Let us execute the program. Now, type python3 pdftoaudio.py to run your code. |
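Observing the Output - in the terminal
Listen to the Output |
Once the script is executed, the text of the PDF is displayed on the terminal.
The text from the PDF is converted to audio using the text to speech engine. |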
Navigating to the Downloads
Files App Downloads Folder audio_page_1.mp3 Play the audio File |
Let us now go to the Downloads folder and double click to open the audio_page_1.mp3 file.
Because the saved file is an audio file, we can play and pause it as we please. Let us play the audio file. |
PDF to DOCX Conversion | Finally, we shall see how to convert a PDF to a DOCX file. |
Open the Downloads Folder
Files App Downloads Folder newsletter.pdf |
I have a pdf file newsletter.pdf for this tutorial.
Let us now look at the code that will convert this PDF to a Word document. |
Highlight:
from pdf2docx import Converter |
First, we import the Converter class from the pdf2docx library. |
Highlight:
def pdf_to_docx(pdf_file, docx_file): |
We then define a function to convert the PDF to a DOCX file. |
Highlight:
cv = Converter(pdf_file) |
We create an instance of the Converter class and pass it the PDF file. |
Highlight:
cv.convert(docx_file) |
The convert method is called on the instance we created earlier. |
Highlight:
cv.close() |
We can now close the instance as the conversion is over. |
Highlight:
print(f'Conversion complete: {pdf_file} to {docx_file}') |
This is to print a message indicating the conversion is complete. |
Highlight:
pdf_file = 'newsletter.pdf'
docx_file = 'doc_output_text.docx' |
We assign the PDF file we want to convert and the path where the DOCX file will be saved. |
Highlight:
pdf_to_docx(pdf_file, docx_file) |
The function is called to initiate the conversion process. |
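Unlike the earlier two scripts, the highlighted code here has no error handling. The sketch below shows one possible way to add input checks and to make sure the converter is always closed. The function name `pdf_to_docx_checked`, the default output name and the lazy import are assumptions for illustration. Only `Converter(pdf_file)`, `convert(docx_file)` and `close()` are taken from the tutorial.

```python
from pathlib import Path


def pdf_to_docx_checked(pdf_file, docx_file=None):
    """Validate the input, convert it to DOCX and return the output path."""
    pdf_path = Path(pdf_file)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {pdf_file}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"File not found: {pdf_file}")

    # Assumption: default to the same name with a .docx extension.
    docx_path = Path(docx_file) if docx_file else pdf_path.with_suffix(".docx")

    # Imported here so the checks above work even without pdf2docx installed.
    from pdf2docx import Converter

    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(docx_path))
    finally:
        cv.close()  # release the PDF even if the conversion fails
    print(f"Conversion complete: {pdf_path} to {docx_path}")
    return docx_path
```

With this sketch, `pdf_to_docx_checked('newsletter.pdf', 'doc_output_text.docx')` would perform the same conversion as the tutorial's function once the checks pass.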
Save the code in the Downloads Folder | Save the code as pdftodoc.py in the Downloads folder. |
Switch to Terminal
Type: python3 pdftodoc.py Highlight:Conversion Complete |
Switch back to the terminal, type python3 pdftodoc.py, and press Enter.
Once the code is executed, we see a message which indicates the conversion was completed. |
Navigating to the Downloads
Files App Downloads Folder |
Go to the Downloads folder and double click to open the doc_output_text.docx file.
As we can see, it has the same content as the PDF, but in a DOCX file. Earlier, it was not editable because it was in PDF format. Now that it is a DOCX file, we can edit it. Let us change the date to August 2024. |
Closing the virtual environment
Type deactivate |
Switch back to the terminal to close the virtual environment.
Type deactivate. |
Show Slide:
Summary |
This brings us to the end of this tutorial. Let us summarise.
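In this tutorial, we have learnt to
* Convert JPG images to PNG format
* Extract text from a PDF file and convert it to audio
* Convert a PDF file to DOCX format
|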
Show Slide:
Assignment |
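As an assignment, please do the following:
* Take a GIF and convert it to an image in PNG format.
* Take a PDF of your favorite book and convert it to an audio book.
|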
Show Slide: About the Spoken Tutorial Project | The video at the following link summarises the Spoken Tutorial Project. Please download and watch it. |
Show Slide:
Spoken Tutorial Workshops |
The Spoken Tutorial Project team conducts workshops and gives certificates.
For more details, please write to us. |
Show Slide:
Answers for THIS Spoken Tutorial |
Please post your timed queries in this forum. |
Show Slide:
FOSSEE Forum |
For any general or technical questions on Python for Automation, visit the FOSSEE forum and post your question. |
Show Slide:
Acknowledgement |
The Spoken Tutorial Project was established by the Ministry of Education, Government of India. |
Show Slide:
Thank You |
This is Sai Sathwik, a FOSSEE Semester Long Intern 2024, IIT Bombay signing off.
Thanks for joining. |