Python-for-Automation/C2/File-Conversion/English

Latest revision as of 21:11, 30 September 2024

Visual Cue Narration
Show Slide:

Welcome

Hello and welcome to the Spoken Tutorial on "File Conversion".
Show Slide:

Learning Objectives

In this tutorial, we will learn how to
  • Convert JPG images to PNG format
  • Extract text from PDF file and convert to audio
  • Convert a PDF file to DOCX format
Show Slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 22.04
  • Python 3.12.3
Show Slide:

Pre-requisites

https://www.spoken-tutorial.org

To follow this tutorial
  • You must have basic knowledge of using the Linux terminal and Python
  • For pre-requisite Linux and Python tutorials, please visit this website
  • Python libraries required for automation must be installed
Show Slide:

Code Files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide:

File Conversion

File conversion is the process of transforming a file from one format to another.

This can involve changing the file type, the structure of its contents, or both.

This makes the file compatible with different applications and user requirements.

Show Slide:

File Conversion - Libraries

To automate the conversion of one file type to another, we need the following libraries:
  • PyPDF2 to handle PDF manipulation
  • pyttsx3 to convert the text to speech
  • pdfplumber to help with extracting content
  • PIL for image processing and manipulation
  • pdf2docx to convert PDF files to DOCX format
  • os to handle file manipulation and operating system interactions
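Before running the scripts, it can help to confirm that these libraries can be imported inside the virtual environment. The sketch below is a hypothetical helper, not part of the tutorial's code files. It uses `importlib.util.find_spec` from the standard library, and the function name `missing_modules` is an assumption. Note that the import name differs from the pip package name for PIL, which is installed as Pillow.

```python
import importlib.util

# Import names of the libraries listed on the slide.
# (The pip package name differs for PIL, which is installed as "Pillow".)
REQUIRED_MODULES = ["PyPDF2", "pyttsx3", "pdfplumber", "PIL", "pdf2docx", "os"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Run it with the same `python3` that will run the tutorial scripts. Any name it reports still needs to be installed with pip.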
Show Slide:

Install - espeak package

  • sudo apt-get update
  • sudo apt-get install espeak
Note that the espeak package must be installed for this tutorial as a prerequisite.

Please install it using the following commands.
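After installation, you can check that the speech synthesiser is visible to Python by looking it up on the PATH. This is a minimal sketch that uses `shutil.which`. The helper name `espeak_binary` is hypothetical, and the extra check for an `espeak-ng` executable is an assumption about how some distributions package eSpeak.

```python
import shutil


def espeak_binary():
    """Return the path of an eSpeak executable on PATH, or None if none is found."""
    # Checking both names is an assumption: depending on the distribution,
    # the synthesiser may be installed as "espeak" or as "espeak-ng".
    for name in ("espeak", "espeak-ng"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    path = espeak_binary()
    print(f"eSpeak found at {path}" if path else "eSpeak not found; please install it first.")
```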

Open the Downloads Folder

Files App > Downloads Folder > logo.jpg

First, let us see how to convert a JPG file to PNG format.

We will use logo.jpg for this demonstration.

Right-click on the image and select Properties to see the image type.

This is a JPG.

Open the Downloads Folder

Files App > Downloads Folder > imgconversion.py

Open the Text Editor with the source file

I have created the source file imgconversion.py for the file conversion.

Now, we will go through the source code in the text editor.

Highlight:

from PIL import Image

First, we import the Image module from the Python Imaging Library (PIL), which is installed as the Pillow package.
Highlight:

def conversion(input_file, output_file):

We define a function named conversion with two parameters.

Here, input_file is the JPG file to read, and output_file is the PNG file to create.

Highlight:

with Image.open(input_file) as img:

The image specified by input_file is opened.
Highlight:

img.save(output_file, "PNG")

We can now save the opened image to the specified output file in PNG format.
Highlight:

print("Conversion successful.")

If no exception is raised, the message "Conversion successful." is printed.
Highlight:

except FileNotFoundError:

print("Error: Input file not found.")

If the input file is not found, an exception is caught and a message is printed to the terminal.
Highlight:

except Exception as e:

print(f"An error occurred: {str(e)}")

Other exceptions are caught here and printed to the terminal.
Highlight:

input_file = "logo.jpg"

output_file = "logo_converted.png"

The paths or names of the input and output files are specified.
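You do not have to type both file names. The output name can be derived from the input name with `os.path.splitext`, which also makes the same code usable for other input formats, such as the GIF in the assignment. This is a minimal sketch that assumes Pillow is installed. The names `png_name_for` and `convert_to_png` are hypothetical and are not the ones used in imgconversion.py.

```python
import os


def png_name_for(input_file):
    """Derive an output name such as 'logo_converted.png' from 'logo.jpg'."""
    base, _extension = os.path.splitext(input_file)
    return f"{base}_converted.png"


def convert_to_png(input_file, output_file=None):
    """Save any image Pillow can open as a PNG and return the output path."""
    # Pillow is imported here (rather than at the top) only so that
    # png_name_for() can be used without Pillow installed.
    from PIL import Image

    if output_file is None:
        output_file = png_name_for(input_file)
    with Image.open(input_file) as img:
        img.save(output_file, "PNG")
    return output_file
```

For example, `convert_to_png("logo.jpg")` would write logo_converted.png in the same folder as the input file.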
Highlight:

conversion(input_file, output_file)

We can now call the function to initiate the conversion process.
Save the code in the Downloads Folder

Save the code as imgconversion.py in the Downloads folder.

Let us execute the program and see the results.

Open the terminal (Ctrl + Alt + T)

Start Virtual Environment

Type

source Automation/bin/activate

Open the terminal by pressing Control + Alt + T keys simultaneously.

We will open the virtual environment we created for the Automation series.

Type source space Automation forward slash bin forward slash activate.

Then press Enter.
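To confirm that the Automation environment is active before you run the scripts, Python itself can report it. This is a minimal sketch based on the standard venv behaviour, where `sys.prefix` differs from `sys.base_prefix` inside a virtual environment. The function name is hypothetical. After the source command above, running it with python3 should report True.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_environment())
    print("Interpreter prefix:", sys.prefix)
```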

Running the Code

Type

cd Downloads

python3 imgconversion.py

Now type cd Downloads and press Enter.

Then type python3 imgconversion.py and press Enter to run the code.

Observing the Output

Once the script is executed, the text “Conversion successful” is displayed on the terminal.

Let us check the output image.

Navigating to Downloads

Files App > Downloads Folder > logo_converted.png

Go to the Downloads folder and double-click the logo_converted.png file to open it.

The image is the same as the one we provided as input.

However, it is saved in PNG format.

Let us check the properties of this image as well to confirm the image type.

This is a PNG.
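The file extension alone does not prove the format. What matters is the data inside the file. PNG, JPEG and GIF files begin with fixed byte signatures, so the check done through Properties can also be scripted. This is a standard-library-only sketch for confirming the conversion. The function names are hypothetical, and only these three formats are recognised.

```python
# Leading bytes ("signatures") that identify common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def sniff_image_type(header):
    """Return the image format named by the leading bytes, or 'unknown'."""
    for signature, name in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return "unknown"


def image_type_of(path):
    """Read the first 8 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(8))


if __name__ == "__main__":
    for path in ("logo.jpg", "logo_converted.png"):
        print(path, "->", image_type_of(path))
```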

PDF to Audio File Conversion

Next, we will look at the conversion of a PDF to an audio file.
Open the Downloads Folder

Files App > Downloads Folder > test.pdf

I have created a PDF file, test.pdf, for this tutorial.

You can download it from the Code files link and use it, or you can create one with some basic text.

Open the Downloads Folder

Files App > Downloads Folder > pdftoaudio.py

Open the Text Editor with the source file

I have created the source file pdftoaudio.py for demonstration.

Let us review it in the text editor.

Highlight:

import pyttsx3

import pdfplumber

from PyPDF2 import PdfReader

First, the libraries necessary to convert a PDF to an audio file are imported.
Highlight:

file = 'test.pdf'

We will use test.pdf for audio conversion.
Highlight:

all_text = []

An empty list is initialized to store the extracted text from each page of the PDF.
Highlight:

s = pyttsx3.init()

We initialize a text-to-speech engine using the pyttsx3 library.
Highlight:

pdf_reader = PdfReader(file)

We create a PdfReader object, which opens the PDF file and gives access to its pages.
Highlight:

pages = len(pdf_reader.pages)

Here, the number of pages in the PDF is determined.
Highlight:

with pdfplumber.open(file) as pdf:

We then open the same PDF with the pdfplumber library.
Highlight:

for i in range(pages):
    page = pdf.pages[i]
    text = page.extract_text()

A for loop iterates over each page, and the text of the current page is extracted.
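pdfplumber's `pdf.pages` is already a list of pages, so the loop can iterate over it directly. That makes it optional to open the file a second time with PyPDF2 just to count the pages. (PyPDF2 is no longer maintained; its successor is the pypdf package.) This is a minimal sketch that assumes pdfplumber is installed. The function names are hypothetical. Page numbers start at 1 to match the audio_page_1.mp3 naming.

```python
def extract_page_texts(pages):
    """Return (page_number, text) pairs for pages that contain extractable text.

    `pages` can be any sequence of objects with an extract_text() method,
    such as pdfplumber's pdf.pages.
    """
    results = []
    for number, page in enumerate(pages, start=1):
        text = page.extract_text()
        if text:  # skip pages with no text layer, e.g. scanned images
            results.append((number, text))
    return results


def extract_pdf_texts(path):
    """Open a PDF with pdfplumber and extract the text of every page."""
    import pdfplumber  # imported here so extract_page_texts() has no dependency

    with pdfplumber.open(path) as pdf:
        return extract_page_texts(pdf.pages)
```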
Highlight:

if text:

This checks if any text was extracted.
Highlight:

all_text.append(text)

We add the extracted text to the all_text list we created earlier.
Highlight:

print(f"\n{text}\n")

The extracted text is printed on the terminal.
Highlight:

audio_file_name = f'audio_page_{i + 1}.mp3'

A filename containing the current page number is generated for the audio file.
Highlight:

s.save_to_file(text, audio_file_name)

This queues the page text to be spoken and saved to an MP3 file with the generated name.
Highlight:

s.runAndWait()

The runAndWait method processes the queued command, so we wait for the text-to-speech conversion to finish before we move on.
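Because save_to_file() only queues work, you can also call runAndWait() once after every page has been queued. This sketch shows that variant under stated assumptions. The engine is passed in as a parameter (normally the object returned by `pyttsx3.init()`), so the logic can be exercised with a stand-in object. The function name `save_pages_as_audio` is hypothetical.

```python
def save_pages_as_audio(engine, page_texts):
    """Queue one audio file per page, then run the engine once.

    `engine` is expected to behave like the object returned by pyttsx3.init():
    save_to_file(text, filename) queues a command, and runAndWait() processes
    everything that has been queued.
    `page_texts` is a list of (page_number, text) pairs.
    """
    file_names = []
    for number, text in page_texts:
        file_name = f"audio_page_{number}.mp3"
        engine.save_to_file(text, file_name)
        file_names.append(file_name)
    if file_names:
        engine.runAndWait()  # blocks while the queued commands are processed
    return file_names


if __name__ == "__main__":
    import pyttsx3

    engine = pyttsx3.init()
    print(save_pages_as_audio(engine, [(1, "Hello from page one.")]))
```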
Highlight:

except FileNotFoundError:

print('File not found. Please check the file path and name.')

If the input file is not found, an exception is caught and a message is printed to the terminal.
Highlight:

except Exception as e:

print(f'An error occurred: {str(e)}')

Other exceptions are caught here and printed to the terminal.
Highlight:

print('Audio files saved successfully.')

The print statement shows that the audio files have been saved successfully.
Save the code in the Downloads Folder

Save the code as pdftoaudio.py in the Downloads folder.
Type:

python3 pdftoaudio.py

Switch back to the terminal.

Let us execute the program.

Now, type python3 pdftoaudio.py and press Enter to run the code.

Observing the Output - in the terminal


Listen to the Output

Once the script is executed, the text of the PDF is displayed on the terminal.

The text from the PDF is converted to audio using the text-to-speech engine.

Navigating to the Downloads

Files App > Downloads Folder > audio_page_1.mp3

Play the audio file

Let us now go to the Downloads folder and double-click the audio_page_1.mp3 file to open it.

Since the saved file is an audio file, we can play and pause it as we please.

Let us play the audio file.

PDF to DOCX Conversion

Finally, we shall see how to convert a PDF to a DOCX file.
Open the Downloads Folder

Files App > Downloads Folder > newsletter.pdf

I have a PDF file, newsletter.pdf, for this tutorial.

Let us now look at the code that will convert this PDF to a Word document.

Highlight:

from pdf2docx import Converter

First, we import the Converter class from the pdf2docx library.
Highlight:

def pdf_to_docx(pdf_file, docx_file):

We then define a function to convert the PDF to a DOCX file.
Highlight:

cv = Converter(pdf_file)

We create an instance of the Converter class and pass it the PDF file.
Highlight:

cv.convert(docx_file)

The convert method is called on this instance to write the DOCX file.
Highlight:

cv.close()

We can now close the instance as the conversion is over.
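If convert() raises an exception, the cv.close() line is never reached. Wrapping the conversion in try/finally makes sure the converter is closed either way. This is a minimal sketch that assumes pdf2docx is installed. The function name and the optional `converter_class` parameter are hypothetical additions. The parameter exists only so the logic can be tried with a stand-in class.

```python
def pdf_to_docx_safely(pdf_file, docx_file, converter_class=None):
    """Convert a PDF to DOCX, closing the converter even if conversion fails."""
    if converter_class is None:
        from pdf2docx import Converter  # imported lazily; requires pdf2docx
        converter_class = Converter

    cv = converter_class(pdf_file)
    try:
        cv.convert(docx_file)
    finally:
        cv.close()  # runs whether or not convert() raised
    print(f"Conversion complete: {pdf_file} to {docx_file}")


if __name__ == "__main__":
    pdf_to_docx_safely("newsletter.pdf", "doc_output_text.docx")
```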
Highlight:

print(f'Conversion complete: {pdf_file} to {docx_file}')

This prints a message indicating that the conversion is complete.
Highlight:

pdf_file = 'newsletter.pdf'

docx_file = 'doc_output_text.docx'

We assign the PDF file we want to convert and the path where the DOCX file will be saved.
Highlight:

pdf_to_docx(pdf_file, docx_file)

The function is called to initiate the conversion process.
Save the code in the Downloads Folder

Save the code as pdftodoc.py in the Downloads folder.
Switch to Terminal

Type:

python3 pdftodoc.py

Highlight: Conversion complete

Switch back to the terminal, type python3 pdftodoc.py and press Enter.

Once the code is executed, we see a message indicating that the conversion is complete.

Navigating to the Downloads

Files App > Downloads Folder

Go to the Downloads folder and double-click the doc_output_text.docx file to open it.

As we can see, it has the same content as the PDF, but now in a DOCX file.

Earlier, we could not easily edit the content because it was in PDF format.

Now that it is a DOCX file, we can edit it.

Let us change the date to August 2024.

Closing the virtual environment

Type

deactivate

Switch back to the terminal to close the virtual environment.

Type deactivate.

Show Slide:

Summary

This brings us to the end of this tutorial. Let us summarise.

In this tutorial, we have learnt to

  • Convert JPG images to PNG format
  • Extract text from PDF file and convert to audio
  • Convert a PDF file to DOCX format
Show Slide:

Assignment

As an assignment, please do the following:
  • Take a GIF and convert it to an image in PNG format.
  • Take a PDF of your favorite book and convert it to an audio book.
Show Slide:

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial Project.

Please download and watch it.

Show Slide:

Spoken Tutorial Workshops

The Spoken Tutorial Project team conducts workshops and gives certificates.

For more details, please write to us.

Show Slide:

Answers for THIS Spoken Tutorial

Please post your timed queries in this forum.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Automation, visit the FOSSEE forum and post your question.
Show Slide:

Acknowledgement

The Spoken Tutorial Project was established by the Ministry of Education, Government of India.
Show Slide:

Thank You

This is Sai Sathwik, a FOSSEE Semester Long Intern 2024 from IIT Bombay, signing off.

Thanks for joining.

Contributors and Content Editors

Madhurig, Nirmala Venkat