
Building a Simple CSV Parser in C# [Beginner]


As several people have pointed out, the code in this tutorial does not create a true CSV parser. We did, however, publish a follow-up tutorial that builds a fully functional CSV parser.


So you are sitting around and you somehow have 100 comma-separated value (CSV) files and you are not quite sure what the best way to read them is. Well, if you are using Visual Studio and C#, you are in quite a bit of luck, because you can read a CSV file quite easily. With one very small function you can spit out a list of values, separated conveniently by rows and columns. Then you can take this list and use it however you want, perhaps in a DataTable or GridView object.

The parser we are going to build today is extremely simple, and it will in fact break on more complicated CSV files (files that have commas inside the data itself, quoted fields, and so on). But for CSV files whose values never contain the separator, this will work fine - and look for a tutorial in the near future about building a parser that can deal with even the most convoluted of CSV files.
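
To make that limitation concrete, here is a small sketch of what a plain Split(',') does to a quoted field that contains a comma. The class name SplitLimitationDemo and the sample line are made up for illustration; they are not part of the tutorial's code.

```csharp
using System;

public static class SplitLimitationDemo
{
    // Hypothetical sample line: the second field is quoted and contains a comma.
    public const string SampleLine = "1,\"Smith, John\",42";

    // Splits the line at every comma, the same way the simple parser
    // in this tutorial does. Quotes get no special treatment.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Calling SplitLimitationDemo.NaiveSplit(SplitLimitationDemo.SampleLine) gives back four pieces rather than the three fields a person would read in that line, because the comma inside the quotes is treated like any other separator.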

To start off, you need to open up Visual Studio and start a new C# Windows Forms application project, so go ahead and do that, naming the project whatever you want. Once your project is up and ready, you need to find a place to build and call your parser function. If you right click on your Form1.cs, then go to 'View Code' you will get the code for Form1. Inside the main Form1 : Form class, under your public Form1() constructor, is the perfect place for your function for now. Later on you can move it to somewhere more permanent, but for now we will get the function working.

Sadly, one of the namespaces we will be using is not declared by default in the standard 'using' statements at the top of our file. But all you have to do is add it below all the others:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO; //System.IO is not used by default
 
Now we can tear up some serious code. The first step is to declare our parser function. It will look something like:
public List<string[]> parseCSV(string path)
{
}
 
This is a pretty simple function, which will take in a string that represents the path of the CSV file and spit out a List of string arrays. Now you may be asking why not just use a string array of string arrays (string[][])? Well, an array has a fixed size, so adding an element means allocating a new array and copying everything over, and we do not know how many lines the file has until we have read it. A list can be added to, removed from, and is just generally a lot more flexible.
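
As a rough illustration of that trade-off, the sketch below grows a List one row at a time and converts it to a jagged array only at the end. ListVersusArrayDemo and BuildRows are hypothetical names used just for this example.

```csharp
using System.Collections.Generic;

public static class ListVersusArrayDemo
{
    // Builds up rows one at a time. A List grows as needed, while an
    // array's length is fixed at the moment it is created.
    public static string[][] BuildRows()
    {
        List<string[]> rows = new List<string[]>();
        rows.Add(new string[] { "a", "b" });
        rows.Add(new string[] { "c", "d" });

        // If a string[][] is really needed, convert once at the end.
        return rows.ToArray();
    }
}
```

So if some other piece of code insists on a string[][], nothing is lost by collecting the rows in a list first.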

The next thing we need to do is declare our return variable, which is really just one line. So inside the function, as the first line, we have:
List<string[]> parsedData = new List<string[]>();
 
This is just declaring a List of string arrays that will hold our file information as we read each line. Now the cool thing is that the System.IO namespace has this neat class called StreamReader, which can open a text based file and read it line by line. This gives us the advantage of just calling a method that reads the file line by line rather than byte by byte. StreamReader is extremely handy for reading text files and is a perfect candidate for us in this case.

We are going to declare our StreamReader with a using statement so it will be disposed once it leaves scope, and when it is disposed the file will be closed automatically. So declaring the new StreamReader object will look something like:
using (StreamReader readFile = new StreamReader(path))
{
}
 
Take notice that the actual declaration is inside the using statement. Inside this statement we will be doing everything involving reading the file and building our list of string arrays. A comma separated values file is exactly what you would think - a file full of values separated by commas. Each line corresponds to a row of data, so all we have to do is read the file line by line, then separate the values. Since the StreamReader class can read a file line by line, all we really need to do is take the line and split it. But first we have to declare some variables, inside the using block of course.

We will need two variables, one to hold the line as it is read and an array to hold the separated values. We will call these line and row:
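using (StreamReader readFile = new StreamReader(path))
{
  string line;   // holds the current line as it is read
  string[] row;  // holds the separated values of that line

  // the reading loop will go here
}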
 
Next we have to read the file line by line, which can be done with a very simple while statement. We will be reading the file until the current line is null, which will be the case when there are no more lines to read. To do this we set our line variable to our currently read line, then when the line is null, stop reading the file. It will look like so:
while ((line = readFile.ReadLine()) != null)
{
}
 
This very simple while loop runs through the file until ReadLine returns null, which only happens at the end of the file. *Take note that a line is not null if it is blank or contains only spaces; such a line comes back as an empty or whitespace string. ReadLine returns null only when there is truly nothing left to read.* Inside this loop, all we need to do is split each line at the commas, then add the resulting array to our list. Splitting a string is such a common need that the basic string class has a Split method to do just this. After we split the line, adding it to the list is just as simple: we call parsedData.Add(). So with two lines of code we can do what we need to. After our additions our while statement will look like:
while ((line = readFile.ReadLine()) != null)
{
  row = line.Split(',');
  parsedData.Add(row);
}
 
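
The difference between a blank line and the end of the input can be checked with a small sketch like the one below. It uses a StringReader instead of a StreamReader only so that no file on disk is needed; both are TextReaders and their ReadLine methods return null at the end of the input. ReadLineDemo and ReadAllLines are hypothetical names for this example.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every line of the given text using the same loop shape
    // as the parser: keep reading until ReadLine returns null.
    public static List<string> ReadAllLines(string text)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line); // a blank line arrives as "", not as null
            }
        }
        return lines;
    }
}
```

For the input "a,b\n\nc,d" this collects three lines, the middle one being an empty string, so a blank line in the middle of a file does not stop the loop. In the parser, such a blank line would end up as a row holding a single empty value.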
The loop is simple yet effective. So simple in fact, that you are not limited to comma separated files: any file whose values are separated by a single standard character can be read. All you have to do is change the Split() call to use whatever character is separating the values. As mentioned above, the first line sets our row variable to the values of our split string, and the second line adds that string array to our list. Not difficult to understand at all.
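
One way to make the separator swappable is to pass it in as a parameter. The following is only a sketch of that idea, not the function this tutorial builds: DelimitedReader and Parse are hypothetical names, and the method takes a TextReader (which a StreamReader is) so it can be tried without a file.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DelimitedReader
{
    // Same read-and-split loop as the tutorial's parser, except that the
    // separator character is supplied by the caller. The caller owns the
    // reader and is responsible for disposing it.
    public static List<string[]> Parse(TextReader reader, char separator)
    {
        List<string[]> parsedData = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            parsedData.Add(line.Split(separator));
        }
        return parsedData;
    }
}
```

For a tab separated file you could wrap a StreamReader in a using statement, just as before, and call DelimitedReader.Parse(readFile, '\t') inside it.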

The end of the while loop actually means the end of our using block as well. After the using block we have a completely filled list of string arrays, which represent rows of data in our CSV file. All we need to do now is put the whole thing in a try-catch block, which will catch any errors we may get when attempting to open or read the file.

We don't actually need anything fancy, in fact we will just catch any exception we get in the using block (since there are a bunch of different kinds that could be thrown). So our final function will look something like:
public List<string[]> parseCSV(string path)
{
  List<string[]> parsedData = new List<string[]>();

  try
  {
    using (StreamReader readFile = new StreamReader(path))
    {
      string line;
      string[] row;

      while ((line = readFile.ReadLine()) != null)
      {
        row = line.Split(',');
        parsedData.Add(row);
      }
    }
  }
  catch (Exception e)
  {
    MessageBox.Show(e.Message);
  }

  return parsedData;
}
 
Notice that we just take the message from the exception and display it with the standard MessageBox class. This will work, and on an error our function simply returns whatever it managed to parse before the exception (an empty list if the file could not be opened at all), which means that our code doesn't crash, we just don't get all of the data. So after we return our parsedData list, whether filled or not, our function ends.
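
If you later move the function out of the form, showing a MessageBox from inside it may no longer be what you want. A possible variation, sketched below under the assumption that the caller would rather decide how to report problems, hands the error text back instead. SafeCsvReader and TryParseCsv are hypothetical names, and catching only IOException and UnauthorizedAccessException is a choice made for this sketch, not something the tutorial prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeCsvReader
{
    // Returns true and the parsed rows on success.
    // Returns false, an empty list, and the error text on failure.
    public static bool TryParseCsv(string path, out List<string[]> rows, out string error)
    {
        rows = new List<string[]>();
        error = null;
        try
        {
            using (StreamReader readFile = new StreamReader(path))
            {
                string line;
                while ((line = readFile.ReadLine()) != null)
                {
                    rows.Add(line.Split(','));
                }
            }
            return true;
        }
        catch (IOException e) // includes FileNotFoundException and DirectoryNotFoundException
        {
            error = e.Message;
        }
        catch (UnauthorizedAccessException e)
        {
            error = e.Message;
        }

        rows.Clear(); // do not hand back a partially read file
        return false;
    }
}
```

Inside the form you could still call MessageBox.Show(error) when the method returns false, which keeps the behavior of the original function while letting other callers log the error or retry.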

A small function that is easy to understand and even easier to build. Even better, since it returns a basic list object, you can use the data returned to do anything from filling a grid to making complex calculations. You can also make this function read any type of separated file, just change the separator in the Split() call.
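
As one example of putting the returned list to use, the sketch below copies it into a DataTable. CsvTableBuilder and ToDataTable are hypothetical names, and the helper assumes that the first row of the file holds unique column names and that no later row has more values than the header row.

```csharp
using System.Collections.Generic;
using System.Data;

public static class CsvTableBuilder
{
    // Copies parsed rows into a DataTable.
    // Assumption: the first row contains the (unique) column names.
    public static DataTable ToDataTable(List<string[]> parsedData)
    {
        DataTable table = new DataTable();
        if (parsedData.Count == 0)
        {
            return table; // nothing was parsed, so return an empty table
        }

        foreach (string columnName in parsedData[0])
        {
            table.Columns.Add(columnName);
        }

        // Every remaining row becomes one DataRow; rows shorter than the
        // header leave their trailing columns empty.
        for (int i = 1; i < parsedData.Count; i++)
        {
            table.Rows.Add(parsedData[i]);
        }
        return table;
    }
}
```

On a form with a DataGridView (called dataGridView1 here purely as an example), something like dataGridView1.DataSource = CsvTableBuilder.ToDataTable(parseCSV(path)); should be enough to show the file's contents in the grid.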

Using our new function to fill a DataGrid

I hope this tutorial was informative and most of all useful.
