How to Read Data From a PDF File Using VBScript

December 10, 2021

These samples show how to extract all text from a PDF file into a TXT (plain text) file using the Bytescout PDF Extractor SDK.

To code along, you need the Bytescout SDK installed on your machine. A free trial is available from Bytescout.

We've provided source code below for the following languages and frameworks:

ASP.NET
C#
Visual Basic .NET
VBScript

Input PDF file and output TXT file with extracted text

ASP.NET

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.IO;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using Bytescout.PDFExtractor;

namespace ExtractAllText
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // This test file will be copied to the project directory on the pre-build event (see the project properties).
            String inputFile = Server.MapPath("sample2.pdf");

            // Create Bytescout.PDFExtractor.TextExtractor instance
            TextExtractor extractor = new TextExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile(inputFile);

            Response.Clear();
            Response.ContentType = "text/html";

            // Save extracted text to output stream
            extractor.SaveTextToStream(Response.OutputStream);

            Response.End();
        }
    }
}

C#

using System;
using Bytescout.PDFExtractor;

namespace ExtractAllText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.TextExtractor instance
            TextExtractor extractor = new TextExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";

            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample2.pdf");

            // Save extracted text to file
            extractor.SaveTextToFile("output.txt");

            // Open output file in default associated application
            System.Diagnostics.Process.Start("output.txt");
        }
    }
}

VB.NET

Imports Bytescout.PDFExtractor

Class Program
    Friend Shared Sub Main(args As String())

        ' Create Bytescout.PDFExtractor.TextExtractor instance
        Dim extractor As New TextExtractor()
        extractor.RegistrationName = "demo"
        extractor.RegistrationKey = "demo"

        ' Load sample PDF document
        extractor.LoadDocumentFromFile("sample2.pdf")

        ' Save extracted text to file
        extractor.SaveTextToFile("output.txt")

        ' Open output file in default associated application
        System.Diagnostics.Process.Start("output.txt")
    End Sub
End Class

VBScript

' Create Bytescout.PDFExtractor.TextExtractor object
Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

' Load sample PDF document
extractor.LoadDocumentFromFile("..\..\sample2.pdf")

' Save extracted text to file
extractor.SaveTextToFile("output.txt")

' Open output file in default associated application
Set shell = CreateObject("WScript.Shell")
shell.Run "output.txt", 1, false
Set shell = Nothing

Set extractor = Nothing

Though the source code is simple and straightforward, let's analyze it briefly.

1. Create Bytescout.PDFExtractor.TextExtractor instance.

TextExtractor extractor = new TextExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";

After creating an instance, we need to set the registration name and key, which we receive when registering for the Bytescout SDK.
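
In VBScript, this step goes through CreateObject, which raises a run-time error if the SDK's COM class is not registered on the machine. The following is a minimal sketch of guarding against that, assuming the ProgID used in the VBScript sample; the CreateExtractor helper is a hypothetical name introduced here for illustration and is not part of the SDK.

```vbscript
Option Explicit

' Hypothetical helper (not part of the SDK): returns the extractor object,
' or Nothing if the COM class cannot be created.
Function CreateExtractor()
    Dim obj
    Set obj = Nothing
    On Error Resume Next
    Set obj = CreateObject("Bytescout.PDFExtractor.TextExtractor")
    If Err.Number <> 0 Then
        ' Creation failed; obj is still Nothing
        Err.Clear
    End If
    On Error GoTo 0
    Set CreateExtractor = obj
End Function

Dim extractor
Set extractor = CreateExtractor()
If extractor Is Nothing Then
    WScript.Echo "Could not create Bytescout.PDFExtractor.TextExtractor. Is the SDK installed?"
    WScript.Quit 1
End If

' Same registration values as in the samples
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"
```

Run it with cscript.exe so that the message goes to the console instead of a message box.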

2. Load PDF document

extractor.LoadDocumentFromFile("sample2.pdf");

Here we're simply loading the input PDF document. There are other versions of this method for scenarios such as loading a password-protected PDF or loading only a few pages of a document. We can also use the 'LoadDocumentFromStream' method to load a document from memory or from any other stream.
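
A relative file name such as sample2.pdf is typically resolved against the current working directory, which is not necessarily the folder that contains the script. As a sketch, the VBScript below builds an absolute path from the script's own location and checks that the file exists before loading it. It assumes sample2.pdf sits next to the .vbs file, which is an assumption made for this example rather than something the SDK requires.

```vbscript
Option Explicit

Dim fso, scriptDir, inputFile
Set fso = CreateObject("Scripting.FileSystemObject")

' Folder that contains this .vbs file
scriptDir = fso.GetParentFolderName(WScript.ScriptFullName)

' Assumption: the sample PDF is stored next to the script
inputFile = fso.BuildPath(scriptDir, "sample2.pdf")

If Not fso.FileExists(inputFile) Then
    WScript.Echo "Input file not found: " & inputFile
    WScript.Quit 1
End If

Dim extractor
Set extractor = CreateObject("Bytescout.PDFExtractor.TextExtractor")
extractor.RegistrationName = "demo"
extractor.RegistrationKey = "demo"

' Load the PDF using the absolute path built above
extractor.LoadDocumentFromFile inputFile
```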

3. Save extracted text to file

extractor.SaveTextToFile("output.txt");

In this step, we extract the text and save it to the output file. To write the output to a stream instead, we can use the 'SaveTextToStream' method.

That's all, guys. I hope you now have an idea of how to use the Bytescout PDFExtractor assembly to extract text from PDF documents.
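
Once the text is saved, a script can read it back for further processing. Here is a minimal VBScript sketch using the standard Scripting.FileSystemObject. It assumes output.txt already exists in the current directory and is ANSI-encoded, and ReadAllText is a hypothetical helper name introduced for this example.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: reads a whole text file into a string.
' Returns "" for an empty file, because ReadAll raises an error at end of stream.
Function ReadAllText(path)
    Dim fso, stream
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Assumption: ANSI text. Pass -1 (TristateTrue) as the fourth argument
    ' to OpenTextFile if the file is UTF-16 instead.
    Set stream = fso.OpenTextFile(path, ForReading)
    If stream.AtEndOfStream Then
        ReadAllText = ""
    Else
        ReadAllText = stream.ReadAll
    End If
    stream.Close
End Function

Dim text, lines
text = ReadAllText("output.txt")

' Split on Windows line endings to process the text line by line
lines = Split(text, vbCrLf)

WScript.Echo "Characters extracted: " & Len(text)
WScript.Echo "Lines extracted: " & (UBound(lines) + 1)
```

If the extracted text contains garbled characters, the encoding assumption is the first thing to check.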

Happy Coding!

Source: https://bytescout.com/products/developer/pdfextractorsdk/extract-text-from-pdf
