A guide to read excel file using Python

Introduction

This tutorial shows a guide on how to read excel file using Python. You know that excel is great for generating reporting data. It also allows you to perform calculation on data, build complex formula which could be used to apply on different data. You can use this easily accessible excel tool to organize, analyze and store your data in tables. What’s more, this software is widely used in many different application fields all over the world and it also applies to data science.

We have seen various operations on excel files using wonderful API – Apache POI in Java technology and it requires few more code has to be written to read from or write to excel files.

But to read excel file using Python is very easy with a few lines of code. I will use a sample excel file to read here. You may also download the sample excel file through Google search and give it a try.

Related Posts:

Prerequisites

Python 3.6.5 to 3.8.5

There are various packages available to work with the excel spreadsheets and such packages are pandas, openpyxl, xlrd, xlutils and pyexcel. In this example I am going to use xlrd package to read excel file using Python.

Read Excel

In your actual project, most of the time you will be working on the data which are either generated through some process or static excel data that have to be read for further processing in your project or application.

Preparing Workspace

Preparing your workspace is one of the first things that you can do to make sure that you start off well. The first step is to check your working directory.

When you are working in the Python terminal, you need first navigate to the directory, where your file is located and then start up Python, i.e., you have to make sure that your file is located in the directory where you want to work from.

Installing Required Modules

Check for modules xlrd, datetime in Python terminal. Type the command as shown below to check xlrd and datetime. If you do not get any error message then those modules exist otherwise you have to install those non-existence modules.

>>>import xlrd
>>>import datetime

For example, if you do not have xlrd module then please find below steps to install it.

Installing xlrd Module

1. Open cmd prompt in Administrator mode (type cmd in Windows search and do right click and Run As Administrator).

2. Type below command in the cmd prompt to install it. On successful installation you will find successful message.

python read excel file

Now let’s move on to the example read excel file using Python.

Create Python Script

In the below image you see I have opened a cmd prompt and navigated to the directory where I have put the excel file that has to be read.

read excel file using python

I will read the below excel file using Python programming language. I am reading only first four rows from the excel sheet to simplify the example. You would generally read whole file to process your business needs in your application.

read excel file using python

You may also download the excel file shown in the above image here OrderData.

The above excel file should be put into the C:\py_scripts directory where we will also put the Python script to read the excel file.

Now create a Python script read_excel.py under the C:\py_scripts for reading the above excel file. Here py is extension of the Python file.

In the below Python script we imported datetime and xlrd modules.

Make sure if you do not have datetime module available in the Pyhton package then you need to install it following the same steps we installed xlrd module in above.

Then I create class OrderDetails and define constructor with all fields shown in the excel file.

I also define _str_(self) to represent an object’s instance to string.

Then I read the excel file OrderData.xlsx and get the instance wordbook wb.

I retrieve the sheets from wb instance and iterate through it. Here we have only one sheet but if we have multiple sheets then it will iterate through all sheets.

I skip the first row because it contains header names for the column values.

I have taken care of date column values to format the date value into your desired date format. That’s why I import date time module in the below Python script. Notice also how I have determined the cell type of date column.

Finally I print items through iterating the items array and at the same time I am accessing the single attribute of the object’s instance to show know how to access.

import datetime
import xlrd
from xlrd import open_workbook

class OrderDetails(object):

    def __init__(self, order_date, region, rep, item, units, unit_cost, total):
        self.order_date = order_date
        self.region = region
        self.rep = rep
        self.item = item
        self.units = units
        self.unit_cost = unit_cost
        self.total = total
		
    def __str__(self):
        return("Order details:\n"
            "  Order Date = {0}\n"
            "  Region = {1}\n"
            "  Representative = {2}\n"
            "  Item = {3}\n"
            "  Units = {4} \n"
            "  Unit Cost = {5} \n"
            "  Total = {6} \n"
            .format(self.order_date, self.region, self.rep, self.item, self.units, self.unit_cost, self.total))
			
wb = open_workbook('OrderData.xlsx')

for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols
    items = []
    rows = []
                
    #skip the first row as it contains headers
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            if(sheet.cell_type(row,col) == 3):
                value  = (sheet.cell(row,col).value)
                try:
                    value = datetime.datetime(* (xlrd.xldate_as_tuple(sheet.cell(row,col).value, wb.datemode))).strftime('%d.%m.%Y')
                except ValueError:
                    pass
                finally:
                    values.append(value)
            else:
                value  = (sheet.cell(row,col).value)
                try:
                    value = str(int(value))
                except ValueError:
                    pass
                finally:
                    values.append(value)
        item = OrderDetails(*values)
        items.append(item)
		
for item in items:
    print(item)
    print("Accessing single attribut's value (eg. Order Date): {0}\n".format(item.order_date))
    print

Testing the Python Script

I am done with the coding part of the Python programming to read excel file.

Now it’s time to test your application whether you are getting expected output or not.

Run the below command in the cmp prompt:

python read_excel.py

Once you hit Enter or Return key, you should see final output in console as shown below in the image.

read excel file using python

That’s all. Hope you can implement similar concept to your application to read excel file using Python.

Source Code

Download source code

Thanks for reading.

Leave a Reply

Your email address will not be published. Required fields are marked *