How To Read a File Line by Line Into a List in Python
In this article, I will discuss how to open a file for reading with the built-in function open() and the use of Pandas library to manipulate data in the file.
Most of the time, we read data from a file so that we can manipulate it in memory. This data can be numeric, string, or a combination of both. In this article, I will discuss how to open a file for reading with the built-in function open() and how to use the Pandas library to manipulate the data in the file. This includes reading the contents of a file line by line and saving them to a list.
Things To Learn in This Article
- Open the file for reading
- Read the contents of a file line by line
- Store the read lines into a list data type
  - For loop
  - List comprehension
  - Readlines
  - Readline
- Read file using pandas
1. Open the File for Reading
Python's built-in function open() can be used to open a file for reading or writing. Its signature, as given in the Python documentation, is shown below.
open(file, mode='r', buffering=-1, encoding=None,
errors=None, newline=None, closefd=True, opener=None)
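These are the supported values for the mode.
- 'r': open for reading (default)
- 'w': open for writing, truncating the file first
- 'x': open for exclusive creation, failing if the file already exists
- 'a': open for writing, appending to the end of the file if it exists
- 'b': binary mode
- 't': text mode (default)
- '+': open for updating (reading and writing)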
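As a rough illustration of a few common modes, the sketch below writes, appends to, and then reads back a scratch file. The file name modes_demo.txt and the temporary directory are assumptions made only so the example runs anywhere; they are not part of the article's scripts.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or truncates an existing one) and opens it for writing.
with open(demo_path, mode='w', encoding='utf-8') as f:
    f.write('Australia\n')

# 'a' opens the file for writing and appends to the end of it.
with open(demo_path, mode='a', encoding='utf-8') as f:
    f.write('China\n')

# 'r' (the default) opens the file for reading in text mode and returns str.
with open(demo_path, mode='r', encoding='utf-8') as f:
    text = f.read()

# 'rb' opens the file for reading in binary mode and returns bytes.
with open(demo_path, mode='rb') as f:
    raw = f.read()

print(repr(text))
print(type(raw).__name__)
```

Passing encoding='utf-8' explicitly is a choice made here to keep the result independent of the platform default.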
Here is an example script called main.py that will open the file countries.txt for reading.
main.py
with open('countries.txt', mode='r') as f:
    pass  # other stuff
An Alternative Method of Opening a File
An alternative way of opening a file for reading is the following.
f = open('countries.txt', mode='r')
# other stuff
f.close()
You have to close the file object explicitly with close(). Use the with statement whenever possible: its context manager closes the file automatically when the block exits, even if an exception is raised, so an explicit call to close() is not needed.
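To make the comparison concrete, here is a minimal sketch of the two styles side by side. It assumes a scratch copy of countries.txt created in a temporary directory so that it runs anywhere, and it wraps the manual version in try/finally, which is roughly what the with statement does for you.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w', encoding='utf-8') as out:
    out.write('Australia\nChina\nPhilippines')

# Manual open/close: try/finally guarantees close() even if reading raises.
f = open(path, mode='r', encoding='utf-8')
try:
    first_line = f.readline().rstrip()
finally:
    f.close()

# The with statement performs the same cleanup automatically.
with open(path, mode='r', encoding='utf-8') as g:
    same_line = g.readline().rstrip()

print(first_line, same_line)
print(f.closed, g.closed)  # both file objects are closed at this point
```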
2. Read the Contents of a File Line by Line
Let us go back to our code in main.py. I will add a block of code that reads the contents of the file line by line and prints each line.
main.py
with open('countries.txt', mode='r') as f:
    for lines in f:
        line = lines.rstrip()  # rstrip() removes trailing whitespace, including the newline character
        print(line)  # print to console
countries.txt
Australia
China
Philippines
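Ensure that main.py and countries.txt are in the same directory, and run the script from that directory. The code above opens countries.txt by a relative path, which Python resolves against the current working directory. In my case, both files are in the F:\Project\8thesource path.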
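If the script may be started from a different working directory, one option is to build the path from the script's own location. The helper path_beside below is hypothetical and only a sketch; inside main.py it would typically be called with __file__ as the first argument.

```python
from pathlib import Path


def path_beside(script_file, name):
    """Return the path of `name` in the same directory as `script_file`.

    Hypothetical helper for illustration; not part of the original main.py.
    """
    return Path(script_file).resolve().parent / name


# Inside main.py the call would look like this:
#     countries_path = path_beside(__file__, 'countries.txt')
#     with open(countries_path, mode='r') as f:
#         ...
example = path_beside('project/main.py', 'countries.txt')
print(example.name)
```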
Execute main.py from the command line.
PS F:\Project\8thesource> python main.py
output
Australia
China
Philippines
There we have it. We read countries.txt line by line using the open() function and by iterating over the file object. The first line printed was Australia, followed by China, and finally Philippines. This matches the order in which they were written in the countries.txt file.
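A note on rstrip(): called without arguments it removes all trailing whitespace, not only the newline. If trailing spaces or tabs matter, rstrip('\n') is the narrower choice. The sample string below is made up for illustration.

```python
raw_line = 'New Zealand  \n'  # hypothetical line with two trailing spaces

stripped_all = raw_line.rstrip()          # removes all trailing whitespace
stripped_newline = raw_line.rstrip('\n')  # removes trailing newline characters only

print(repr(stripped_all))
print(repr(stripped_newline))
```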
3. Store the Read Lines Into a List Data Type
Python has a popular data type called list that can store other object types or a combination of object types. A list of integers could be [1, 2, 3]. A list of strings could be ['one', 'two', 'three']. A list of integers and strings could be [1, 'city', 45]. A list of lists could be [[1, 2], [4, 6]]. A list of tuples could be [(1, 2), ('a', 'b')]. A list of dictionaries could be [{'fruit': 'mango'}, {'count': 100}].
I will modify main.py to store the read lines into a list.
a) For Loop
main.py
data_list = []  # a list as container for read lines
with open('countries.txt', mode='r') as f:
    for lines in f:
        line = lines.rstrip()  # remove the newline character
        data_list.append(line)  # add the line to the list
print(data_list)
countries.txt is the file name. We open it for reading with mode 'r'. We use the for loop to read each line and save it to a list called data_list. After saving all the lines to the list via the append method, the list is printed.
After executing main.py, we get the following output.
output
['Australia', 'China', 'Philippines']
b) List Comprehension
Another option to save the read lines into the list is a list comprehension. It uses a for loop behind the scenes and is more compact, but less beginner-friendly.
with open('countries.txt', mode='r') as f:
    data = [item.rstrip() for item in f]
print(data)
output
['Australia', 'China', 'Philippines']
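A list comprehension can also filter while it reads. The sketch below assumes a variant of countries.txt that contains a blank line, written to a temporary directory so the example is self-contained, and keeps only the non-empty lines.

```python
import os
import tempfile

# Hypothetical sample file containing a blank line between entries.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w', encoding='utf-8') as out:
    out.write('Australia\n\nChina\nPhilippines\n')

with open(path, mode='r', encoding='utf-8') as f:
    # Keep only lines that still contain text after stripping whitespace.
    data = [line.rstrip() for line in f if line.strip()]

print(data)
```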
c) Readlines
Yet another option to save the read lines in a list is the method readlines().
with open('countries.txt', mode='r') as f:
    data = f.readlines()
print(data)
The output still has the newline character \n.
output
['Australia\n', 'China\n', 'Philippines']
This newline character can be removed by stripping each item in the list. The readlines method is not an ideal solution if the file is big, because it reads every line into memory at once.
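Two ways to drop the newline characters are sketched here: stripping each item returned by readlines(), or reading the whole text and calling splitlines(). Both still load the entire file into memory. The scratch file is an assumption that mirrors countries.txt.

```python
import os
import tempfile

# Hypothetical scratch copy of countries.txt (no newline after the last line).
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w', encoding='utf-8') as out:
    out.write('Australia\nChina\nPhilippines')

with open(path, mode='r', encoding='utf-8') as f:
    with_newlines = f.readlines()

# Option 1: strip the newline from each item after reading.
stripped = [line.rstrip('\n') for line in with_newlines]

# Option 2: read the whole text and split it on line boundaries.
with open(path, mode='r', encoding='utf-8') as f:
    split = f.read().splitlines()

print(with_newlines)
print(stripped)
print(split)
```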
d) Readline
Another option is to read one line at a time with the readline method. It returns an empty string only at the end of the file, so that is the condition that ends the loop.
data_list = []
with open('countries.txt', mode='r') as f:
    while True:
        line = f.readline()
        if line == '':  # readline() returns '' only at end of file
            break
        data_list.append(line.rstrip())  # remove the newline character \n
print(data_list)
output
['Australia', 'China', 'Philippines']
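One subtlety with readline(): a blank line in the middle of a file comes back as '\n', while only the end of the file yields ''. A loop that tests for the end of the file before stripping therefore does not stop early on blank lines. The sketch below assumes Python 3.8 or later for the := operator, and a made-up sample file with a blank line.

```python
import os
import tempfile

# Hypothetical sample file with a blank line in the middle.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w', encoding='utf-8') as out:
    out.write('Australia\n\nChina\n')

data_list = []
with open(path, mode='r', encoding='utf-8') as f:
    # Stop only when readline() returns '', which signals end of file.
    while (line := f.readline()) != '':
        data_list.append(line.rstrip('\n'))

print(data_list)
```

The blank line is kept as an empty string in the result; filter it out afterwards if it is not wanted.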
4. Read File Using Pandas
For people aspiring to become data scientists, knowledge of processing files is a must. One of the tools worth learning is the Pandas library, which can be used to manipulate data. It can read files, including the popular csv, or comma-separated values, format.
Here is a sample scenario: we are given a capitals.csv file that contains the name of the country in the first column and the corresponding capital in the second column. Our job is to get a list of country names and a list of capital names.
capitals.csv
Country,Capital
Australia,Canberra
China,Beijing
Philippines,Manila
Japan,Tokyo
For this particular job, it is convenient to use the Pandas library. The expected outputs are the country list [Australia, China, Philippines, Japan] and the capital list [Canberra, Beijing, Manila, Tokyo].
Let us create capitals.py to read capitals.csv using Pandas.
capitals.py
"""
requirements:
pandas
Install pandas with
pip install pandas
"""
import pandas as pd
# Build a dataframe from the csv file.
df = pd.read_csv('capitals.csv')
print(df)
command line
PS F:\Project\8thesource> python capitals.py
output
       Country   Capital
0    Australia  Canberra
1        China   Beijing
2  Philippines    Manila
3        Japan     Tokyo
Now we need to get the values in the Country and Capital columns and convert them to lists.
import pandas as pd
# Build a dataframe from the csv file.
df = pd.read_csv('capitals.csv')
print(df)
# Get the lists of country and capital names.
country_names = df['Country'].to_list()
capital_names = df['Capital'].to_list()
Selecting a column with df['Country'] returns a Pandas Series, and its to_list() method converts the values to a plain Python list.
Now let us print those lists.
import pandas as pd
# Build a dataframe from the csv file.
df = pd.read_csv('capitals.csv')
print(df)
# Get the lists of country and capital names.
country_names = df['Country'].to_list()
capital_names = df['Capital'].to_list()
# Print names
print('Country names:')
print(country_names)
print('Capital names:')
print(capital_names)
output
       Country   Capital
0    Australia  Canberra
1        China   Beijing
2  Philippines    Manila
3        Japan     Tokyo
Country names:
['Australia', 'China', 'Philippines', 'Japan']
Capital names:
['Canberra', 'Beijing', 'Manila', 'Tokyo']
That is it. We got the country and capital names as lists.
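When installing Pandas is not an option, the same two lists can be built with the csv module from the standard library. This is an alternative sketch, not the approach used above; it writes its own copy of capitals.csv to a temporary directory so that it is self-contained.

```python
import csv
import os
import tempfile

# Hypothetical scratch copy of capitals.csv so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'capitals.csv')
with open(path, mode='w', encoding='utf-8', newline='') as out:
    out.write('Country,Capital\n'
              'Australia,Canberra\n'
              'China,Beijing\n'
              'Philippines,Manila\n'
              'Japan,Tokyo\n')

# The csv documentation recommends newline='' when opening csv files.
with open(path, mode='r', encoding='utf-8', newline='') as f:
    rows = list(csv.DictReader(f))  # one dict per data row, keyed by header

country_names = [row['Country'] for row in rows]
capital_names = [row['Capital'] for row in rows]

print(country_names)
print(capital_names)
```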
5. Conclusion
We use the built-in function open() to open a file and a for loop to read its contents line by line, then save each line to a list, a Python data type. There are also other options, such as a list comprehension, readlines, and readline, to save the data into a list. Depending on the task and the file given, we can use the Pandas library to process a csv file.
For further reading, have a look at Python's built-in function open() and the very useful Pandas Python library.
Published at DZone with permission of Ankur Ranpariya. See the original article here.