blog/content/posts/bubble-graph.md
Asger Gitz-Johansen 479a1a3d27 feat: I found my old blog
It only had 2 posts, but whatever
2025-04-13 19:51:55 +02:00

3.5 KiB

+++ date = '2019-03-13' draft = false title = 'Plotting Circular Bubble Graph in LaTeX' tags = ['latex', 'programming'] categories = ['technical'] +++

When doing web-intelligence, you might need to visualize what's called inter-sentence similarity in a particular format. I personally haven't found an official name for these kinds of graphs, so I just simply call them Circular Bubble Graphs.

During a university project we needed such a graph in our report, and I got the idea of automatically plotting it through gnuplot and integrating it directly into our report with gnuplottex. You can see an example of the outcome of the script.

Sentence similarity bubble graph example

The script operates on a comma-seperated file (.csv). The data should be provided as a matrix of sentences assumed to be symmetric, with the cells containing a real number from 0 to 1, indicating the similarity between the column and the row. Because of this symmetric property, half of the matrix is ignored by the script. (It also ignores the diagonal, since sentence 23 will always have a maximum similarity to sentence 23. It would also be hard to plot that line)

The whole script can be seen below, but you can also download it as a file here. Make sure to set the my_dataset variable to your desired dataset. Example matrix can be downloaded here.

# Sentence similarity graph plotter
# uncomment this for manual operation of the dataset plotted
# my_dataset = "./sentence_similarities.csv" # ARG1
set parametric
set size square

# Styling
set pointsize 7.5
set style fill solid 1.0 border rgb 'grey30'
set style line 1 lc rgb 'black' pt 6 lw 0.5

# Basically a one-dimensional circular coordinate system
fx(t) = cos(t)
fy(t) = sin(t)
rownum = floor(system("wc -l ".my_dataset."")) +1
coord(k) = (k/real(rownum))*(2*pi)
fxx(t) = cos(coord(t))
fyy(t) = sin(coord(t))

set trange [0:2*pi-(coord(1.0))]
set sample rownum
set noborder
unset tics
set xrange [-1.2:1.2]
set yrange [-1.2:1.2]
set title "Sentence inter-similarity graph"
set multiplot
refloptimization = 0
do for [i = 0:rownum-1] {
  do for [j = refloptimization:rownum-1] {
    if (i != j) {
      # Get how many columns there are in the dataset.
      arrwidth = real(system("awk 'FNR == ".(i+1)." {print $".(j+1)."}' ".my_dataset.""))
      if (arrwidth > 0.0) {
        bubblerad = 0.125
        x1 = fxx(i)
        y1 = fyy(i)
        x2 = fxx(j)
        y2 = fyy(j)
        
        dvx = x2-x1
        dvy = y2-y1
        dvl = sqrt((dvx ** 2) + (dvy ** 2))
        x1 = x1 + (dvx/dvl)*bubblerad
        y1 = y1 + (dvy/dvl)*bubblerad
        x2 = x2 - (dvx/dvl)*bubblerad
        y2 = y2 - (dvy/dvl)*bubblerad
        # Overleaf's arrow-width rendering is pretty terrible, 
        # so we use a color-gradient to determine connection-strength.
        if (arrwidth > 0.2) { 
          col = "#000000" 
        } else { 
          if (arrwidth < 0.1) { 
            col = "#B8B8B8"
          } else { 
            col = "#E4E4E4" 
          } 
        }
        
        set arrow "".i.j."" from x1,y1 to x2,y2 nohead lw 0.5 lc rgb col
        #set label "H" at (fxx(j)-fxx(i)),(fyy(j)-fyy(i))
        show arrow "".i.j.""
      }
    }
  }
  refloptimization = refloptimization + 1
}
# Plot the circles
plot '+' u (fx(t)):(fy(t)) w p ls 1 notitle

# Plot the sentence labels
plot '+' u (fx(t)):(fy(t)):(sprintf("s.%d",$0+1)) with labels notitle

{{< centered image="/6616144.png" >}}