DataScholars

A blog about data science, computer science, machine learning, artificial intelligence, computational social science, data mining, analysis, and visualization.

GDF: A CSV Like Format For Graphs

cross-blogged by reiver

Sometimes the data you are dealing with is a graph.

Before you even start considering how you will view your graph data (such as by using Gephi or Guess, etc.) you need to (serialize and) store your graph in some kind of file format.

If we were dealing with tabular data (instead of graph data) we would probably use a CSV file, if we wanted to keep things simple. But is there anything like a CSV file format for graphs?

The answer is yes! The answer is: GDF.

CSV Format

The CSV format is a very simple format for storing spreadsheets. It looks like this:


"Name","DOB","Sex"
"Joe Blow","1922-11-15","male"
"Jane Doe","1980-02-14","female"
"Homer Jay Simpson","1955-05-12","male"
"Philip J. Fry","1974-08-09","male"
Figure 1. An example CSV file. This CSV file has 3 columns: name, DOB and sex. Also, this CSV file has 4 rows.

As you can see it is a very simple text-based format that is very easy for someone with basic programming skills to generate.
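To give a sense of how little code that takes, here is a minimal sketch in Python that rebuilds the file from figure 1 using the standard library's csv module. The variable names (rows, buffer, csv_text) are just for illustration, and quoting every field is a choice made here to match the figure, not something CSV requires.

```python
import csv
import io

# The header row plus the four data rows from figure 1.
rows = [
    ["Name", "DOB", "Sex"],
    ["Joe Blow", "1922-11-15", "male"],
    ["Jane Doe", "1980-02-14", "female"],
    ["Homer Jay Simpson", "1955-05-12", "male"],
    ["Philip J. Fry", "1974-08-09", "male"],
]

# Write into an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text, end="")
```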

It is also very well supported in various software. All the major spreadsheet software supports it. And software that exports (to a spreadsheet file or "to Excel") tends to export to CSV.

As far as tabular data is concerned, CSV is ubiquitous.

In a spreadsheet software package, that example CSV file in figure 1 might look something like this:

Figure 2. Screenshot of the CSV shown in figure 1 loaded into LibreOffice Calc.

GDF Format

But CSV is for tabular-type data. Is there something as simple for graph-type data?

The answer is yes!

That is what GDF is.

The GDF format is a very simple format for storing graphs. It looks like this:


nodedef> name
a
b
c
d
e
edgedef> node1,node2
a,b
b,c
b,d
d,e
Figure 3. An example GDF file. This GDF file has 5 nodes: a, b, c, d and e. Also, this GDF file has 4 edges.

As you can see this too is a very simple text-based format that is very easy for someone with basic programming skills to generate.
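As a rough illustration, here is a small Python sketch that produces the file from figure 3. The function name write_gdf and its arguments are made up for this example, and it assumes every node name is simple enough to be written without quotes.

```python
def write_gdf(nodes, edges):
    """Return GDF text for a list of node names and a list of (node1, node2) pairs.

    Assumption: node names are plain (start with a letter, no commas or
    spaces), so no quoting is needed.
    """
    lines = ["nodedef> name"]
    lines.extend(nodes)
    lines.append("edgedef> node1,node2")
    lines.extend(f"{node1},{node2}" for node1, node2 in edges)
    return "\n".join(lines) + "\n"


gdf_text = write_gdf(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")],
)
print(gdf_text, end="")
```

Saving gdf_text to a file with a .gdf extension should give you something you can load into a graph tool.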

It is also very well supported, in various software. (Both Gephi and Guess support it.)

In graph visualization software, it might look something like this:

Figure 4. Screenshot of the GDF file shown in figure 3 loaded into Gephi.

Here are some (more) complex GDF files, just to give you a sense of what GDF files can look like....


nodedef> name
cherry
c
n235
"Joe Blow"
"a,b,c"
edgedef> node1,node2
cherry,c
n235,c
c,"Joe Blow"
"Joe Blow","a,b,c"
"a,b,c",c
Figure 5. An example GDF file. This GDF file has 5 nodes: cherry, c, n235, "Joe Blow" and "a,b,c". Also, this GDF file has 5 edges.

And....


nodedef> name,label
me,"Joe Blow"
friend_1,"Jane Doe"
friend_2,"Homer Jay Simpson"
friend_3,"Philip J. Fry"
friend_4,"Sheldon Lee Cooper"
edgedef> node1,node2,weight
me,friend_1,1.1
me,friend_2,7.4
me,friend_3,100.0003
me,friend_4,3.14159265358979323846264338327950
friend_1,friend_2,33.3
friend_1,friend_3,0.000001
friend_2,friend_3,12.345
Figure 6. An example GDF file. This GDF file has 5 nodes: me, friend_1, friend_2, friend_3 and friend_4. Also, this GDF file has 7 edges. This GDF file also makes use of the built-in node label and weight columns, in the node and edge sections, respectively.

Two Sections To A GDF File Format

You probably already noticed that a GDF file has 2 sections: a node-section and an edge-section.

If we consider the GDF file in figure 3, the node-section is:


nodedef> name
a
b
c
d
e
Figure 7. Part of the GDF file from figure 3. This part of the GDF file lists the nodes in the graph.

And the edge-section is:


edgedef> node1,node2
a,b
b,c
b,d
d,e
Figure 8. Part of the GDF file from figure 3. This part of the GDF file lists the edges in the graph.

If you take what is in figure 7 and figure 8 and combine them (to get what you have in figure 3) you have a complete GDF file.
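Going the other way, the two-section layout also makes GDF easy to read back in. Below is a minimal Python sketch, not a full GDF parser: read_gdf is a hypothetical helper, it assumes the column names after nodedef> and edgedef> are plain names, and it leans on Python's csv module to handle quoted values such as "a,b,c".

```python
import csv


def read_gdf(text):
    """Split GDF text into (node_rows, edge_rows), each a list of dicts."""
    sections = {"nodedef>": [], "edgedef>": []}
    columns = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        for marker in sections:
            if line.startswith(marker):
                # A section header: remember which section we are in
                # and which columns it declares.
                current = marker
                header = line[len(marker):]
                columns[current] = [name.strip() for name in header.split(",")]
                break
        else:
            # Not a header, so it is a data row of the current section.
            if current is None:
                raise ValueError("data row found before nodedef>")
            values = next(csv.reader([line]))
            sections[current].append(dict(zip(columns[current], values)))
    return sections["nodedef>"], sections["edgedef>"]


figure_3 = """nodedef> name
a
b
c
d
e
edgedef> node1,node2
a,b
b,c
b,d
d,e
"""

nodes, edges = read_gdf(figure_3)
print(len(nodes), "nodes and", len(edges), "edges")
```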

Node Labels

Nodes can be given a label. (Labels are human-readable text.)

You do this in the node-section with something like:


nodedef> name,label
a,"Apple"
b,"Banana"
c,"Cherry"
d,"Did it!"
e,"Ed 209"
edgedef> node1,node2
a,b
b,c
b,d
d,e
Figure 9. An example GDF file, with labels on the nodes.

In graph visualization software, it might look something like this:

Figure 10. Screenshot of the GDF file shown in figure 9 loaded into Gephi.

Edge Weights

Weights can be given to edges. (These are weights in the graph theory sense of the word.)

Building on what we have in figure 9, an example of this is:


nodedef> name,label
a,"Apple"
b,"Banana"
c,"Cherry"
d,"Did it!"
e,"Ed 209"
edgedef> node1,node2,weight
a,b,2
b,c,30
b,d,0.4
d,e,200
Figure 11. An example GDF file, with weights on the edges.

In graph visualization software, the GDF file in figure 11 might look something like:

Figure 12. Screenshot of the GDF file shown in figure 11 loaded into Gephi.

Note that the graph visualization software used to create figure 12 represented the edge weights by the thickness and the length of the edges.
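If you are generating a file like the one in figure 11 from a program, a sketch along these lines might do. The function name write_weighted_gdf and the shapes of its arguments are assumptions made for this example. It also assumes the node names need no quoting and that no label contains a double quote.

```python
def write_weighted_gdf(labels, weighted_edges):
    """Return GDF text with a label column on nodes and a weight column on edges.

    labels: dict mapping node name -> human-readable label.
    weighted_edges: list of (node1, node2, weight) tuples.
    Assumption: names are plain and labels contain no double quotes.
    """
    lines = ["nodedef> name,label"]
    for name, label in labels.items():
        lines.append(f'{name},"{label}"')
    lines.append("edgedef> node1,node2,weight")
    for node1, node2, weight in weighted_edges:
        lines.append(f"{node1},{node2},{weight}")
    return "\n".join(lines) + "\n"


weighted_text = write_weighted_gdf(
    {"a": "Apple", "b": "Banana", "c": "Cherry", "d": "Did it!", "e": "Ed 209"},
    [("a", "b", 2), ("b", "c", 30), ("b", "d", 0.4), ("d", "e", 200)],
)
print(weighted_text, end="")
```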

That's All Folks

And that is really the important stuff when it comes to GDF files.

There are some other capabilities, but for most of your needs this is enough to know.

One thing to keep in mind though: make sure to start the node names with a letter. (Else, put the node name into a quoted string.)
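In generated files it is easiest to let a small helper make that decision. Here is one possible sketch in Python. The name gdf_name is hypothetical, and the rule it uses (bare names start with a letter and contain only letters, digits and underscores) is a deliberately cautious assumption rather than the official GDF rule. Since escaping a double quote inside a name is not covered here, the helper simply refuses such names.

```python
import re

# Assumption: a name is safe to write bare only if it starts with a letter
# and contains nothing but letters, digits and underscores.
_BARE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def gdf_name(name):
    """Return the node name as it should appear in a GDF file."""
    if '"' in name:
        # How to escape quotes is not covered here, so do not guess.
        raise ValueError("node names containing a double quote are not supported")
    if _BARE_NAME.fullmatch(name):
        return name
    return '"' + name + '"'


for example in ["cherry", "n235", "235", "Joe Blow", "a,b,c"]:
    print(gdf_name(example))
```

With this rule, cherry and n235 are written as they are, while 235, Joe Blow and a,b,c each get wrapped in quotes.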

And BTW, this is what my generated GDF files tend to look like (although usually much, much larger):


nodedef> name,label
n11157,"James Wilson"
n12008,"Fred Olson"
n12009,"John Williams"
n12407,"Natalie McDonald"
n14773,"Gwen Nelson"
edgedef> node1,node2
n11157,n12008
n12008,n12009
n12008,n12407
n12407,n14773
Figure 13. An example GDF file, more typical of what I generate.