我们在python中引入openpyxl模块来操控excel文件。一个以.xlsx为扩张名的excel文件打开后叫工作簿workbook,每个工作簿可以包括多张表单worksheet,正在操作的这张表单被认为是活跃的active sheet。每张表单有行和列,行号1、2、3…,列号A、B、C...。在某一个特定行和特定列的小格子叫单元格cell。
python程序从excel文件中读数据基本遵循以下步骤:
1、import openpyxl
2、调用openpyxl模块下的load_workbook(‘你的文件名.xlsx’)函数打开excel文件,得到一个工作簿(workbook)对象wb
3、通过wb.active或wb的方法函数get_sheet_by_name(‘你想要访问的表单名称’)得到表单对象ws
4、通过索引获取单元格:ws[‘B2’]
通过表单的方法函数cell()获取单元格:ws.cell(row=1, column=2)
通过单元格的属性value,row,column,coordinate对单元格进行多方向立体式访问
5、行对象ws[10],列对象[‘C’],行切片ws[5:10]、列切片ws[‘C:D’],单元格左上角和右下角左边共同划定表单指定区域ws['A1':'C3']
6、ws.max_row和ws.max_column给出数据用到的最大行和列
7、from openpyxl.utils import get_column_letter, column_index_from_string引进来的两个函数实现excel表格列字母和数字的转换
实例部分我们用了automate the boring stuff里的一个程序,该程序对美国2010年的人口统计表进行了再处理。本想转成天朝的,发现不是免费的遂放弃。如果用天朝的情况做类比就是国家要做人口普查,由省及市,有市再把任务下达到各区(比如厦门各区:海沧、同安、思明、翔安、集美),这样报上来的表格就是各区的人口,程序要做的就是把每个城市各区人口累加,得到每个城市人口以及每个城市分区数目。统计完的数据我们通过pprint生成字符串,保存到了census2010.py文件里,如果想用结果数据,只要将census2010作为模块import到程序里即可。
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
# Getting sheets from the workbook
# print(wb.sheetnames)
#
# for sheet in wb:
# print(sheet.title)
#
# mySheet = wb.create_sheet('mySheet')
# print(wb.sheetnames)
# sheet3 = wb.get_sheet_by_name('Sheet3')
# sheet4 = wb['mySheet']
# Getting cells from the sheets
ws = wb.active
# print(ws)
# print(ws['A1'])
# print(ws['A1'].value)
# c = ws['B1']
# print('Row {}, Column {} is {}'.format(c.row, c.column, c.value))
# print('Cell {} is {}\n'.format(c.coordinate, c.value))
#
# print(ws.cell(row=1, column=2))
# print(ws.cell(row=1, column=2).value)
# for i in range(1hcgghjjrxb, 8, 2):
# print(i, ws.cell(row=i, column=2).value)
# Getting rows and columns from the sheets
# colC = ws['C']
# row6 = ws[6]
# col_range = ws['B:C']
# row_range = ws[2:6]
# for col in col_range:
# for cell in col:
# print(cell.value)
#
# for row in row_range:
# for cell in row:
# print(cell.value)
# for row in ws.iter_rows(min_row=1, max_row=2, max_col=2):
# for cell in row:
# print(cell)
# print(tuple(ws.rows))
# cell_range = ws['A1:C3']
#
# for rowOfCellObjects in cell_range:
# for cellObj in rowOfCellObjects:
# print(cellObj.coordinate, cellObj.value)
# print('------End of Row----------')
# print('{} * {}'.format(ws.max_row, ws.max_column))
from openpyxl.utils import get_column_letter, column_index_from_string
print(get_column_letter(2), get_column_letter(47), get_column_letter(900))
print(column_index_from_string('AAH'))
# readCensusExcel.py - Tabulates population and number of census tracts
# for each county.
import openpyxl, pprint
# Read the spreadsheet data
print('Opening workbook')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.active
countryData = {}
# Fill in countryData with each city's pop and tracts
for row in range(2, sheet.max_row+1):
# Each row in the spreasheet has data
state = sheet['B' + str(row)].value
country = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
# make sure the key state exists
countryData.setdefault(state, {})
# make sure the key for country in state exists
countryData[state].setdefault(country,{'tracts':0, 'pop':0})
# Each row represents one census tract, so increment by one
countryData[state][country]['tracts'] += 1
# Increase the country pop by the pop in this census tract
countryData[state][country]['pop'] += int(pop)
# Open a new text file and write the contents fo countryData to it
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countryData))