Background 

This control is inspired by Wordle, the free web-based word cloud generator.

In fact, the control is a spin-off of my project at http://sourcecodecloud.codeplex.com .

I really loved the visualizations produced by Wordle, but my goal was a local, non-web-based solution for processing large amounts of sensitive data. I found a number of components on the web, but most of them either performed very poorly when processing text or produced a visualization or layout that was not what I expected.

Architecture and usage 

There are 4 phases when visualizing the word cloud:

Processing data such as text, HTML or source code and extracting the relevant words while ignoring the rest. As examples, I have implemented two extractors. TextExtractor extracts all words from a text string, ignoring spaces and all non-letter characters. UriExtractor fetches the content of a URL and tries to strip HTML tags and JavaScript. To be honest, I implemented it only as a showcase, and its filtering capabilities are very poor.

To plug in your own data source, just implement the IWordExtractor interface.

    public interface IWordExtractor
    {
        IEnumerable<string> GetWords();
    } 
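As an illustration, a custom extractor might look like the sketch below. LinesExtractor is a hypothetical class, not part of the control; it assumes the text is already available as a sequence of lines and treats every run of letters as a word. The interface is repeated only to keep the sample self-contained.

```csharp
using System.Collections.Generic;
using System.Text;

// Interface as shown in the article, repeated to keep the sample self-contained.
public interface IWordExtractor
{
    IEnumerable<string> GetWords();
}

// Hypothetical extractor (not part of the control): yields every run of
// letters found in a sequence of in-memory lines.
public class LinesExtractor : IWordExtractor
{
    private readonly IEnumerable<string> _lines;

    public LinesExtractor(IEnumerable<string> lines)
    {
        _lines = lines;
    }

    public IEnumerable<string> GetWords()
    {
        var word = new StringBuilder();
        foreach (string line in _lines)
        {
            foreach (char c in line)
            {
                if (char.IsLetter(c))
                {
                    word.Append(c);
                }
                else if (word.Length > 0)
                {
                    // Any non-letter character ends the current word.
                    yield return word.ToString();
                    word.Clear();
                }
            }

            // The end of a line also ends the current word.
            if (word.Length > 0)
            {
                yield return word.ToString();
                word.Clear();
            }
        }
    }
}
```

Because GetWords is an iterator, words are produced lazily, so a large source does not have to be split into words up front.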

Counting words and ignoring those on the blacklist.

The result is an enumeration of pairs, each consisting of a word and an integer giving the number of occurrences of that word in the text. You can sort this sequence either by weight (number of occurrences) or alphabetically and pass it to the control for visualization.

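// Words on the blacklist are excluded from the count.
IBlacklist blacklist = new CommonBlacklist(new[] {"the", "and", "to", "for", "out"});
WordCounter counter = new WordCounter(blacklist);
// progress, extractor and wordRegistry are assumed to be declared elsewhere, e.g. as fields of the form.
progress = new ProgressBarWrapper(progressBar);
extractor = new StringExtractor(textBox.Text, progress);
wordRegistry = counter.Count(extractor);
// Sort by number of occurrences and hand the result to the control.
KeyValuePair<string, int>[] pairs = wordRegistry.GetSortedByOccurances();

cloudControl.WeightedWords = pairs;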
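To make the counting phase more concrete, here is a minimal sketch of how such a step could work. It is not the WordCounter implementation; WordCountSketch and CountAndSort are hypothetical names, and the case-insensitive comparison and the alphabetical tie-break are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the control's WordCounter.
public static class WordCountSketch
{
    // Counts the occurrences of each word, skipping blacklisted ones.
    // Words are compared case-insensitively (an assumption of this sketch).
    public static KeyValuePair<string, int>[] CountAndSort(
        IEnumerable<string> words, IEnumerable<string> blacklist)
    {
        var excluded = new HashSet<string>(blacklist, StringComparer.OrdinalIgnoreCase);
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in words)
        {
            if (excluded.Contains(word))
            {
                continue;
            }

            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }

        // Heaviest words first; ties are broken alphabetically.
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

The returned array has the same shape as the pairs assigned to WeightedWords above, so a result like this could be passed to the control directly.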

Layout. I use a QuadTree data structure to create a non-overlapping map of words on the control's graphics surface. The same data structure is also used to query the control for the words under a given rectangular area or point. This query is used to redraw only a particular area when needed, or to perform some action when the control is clicked. It is therefore very useful for finding out which word was clicked in order to perform a word-related action, such as showing statistics or navigating to a URL.

        private void cloudControl_Click(object sender, EventArgs e)
        {
            LayoutItem itemUnderMouse;
            // MousePosition is in screen coordinates; convert it to control coordinates.
            Point mousePositionRelativeToControl = cloudControl.PointToClient(MousePosition);
            if (!cloudControl.TryGetItemAtLocation(mousePositionRelativeToControl, out itemUnderMouse))
            {
                // No word under the cursor.
                return;
            }
            MessageBox.Show(itemUnderMouse.Word);
        }
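The idea behind the QuadTree queries can be sketched as follows. This is a simplified illustration, not the implementation used by the control: Rect, QuadTreeNode and the MinSize threshold are hypothetical, and rectangles whose edges merely touch are treated as intersecting so that a zero-sized rectangle can serve as a point query.

```csharp
using System.Collections.Generic;

// Axis-aligned rectangle; a stand-in for System.Drawing.RectangleF.
public struct Rect
{
    public readonly float X, Y, Width, Height;

    public Rect(float x, float y, float width, float height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    // True if 'other' lies completely inside this rectangle.
    public bool Contains(Rect other)
    {
        return other.X >= X && other.Y >= Y
            && other.X + other.Width <= X + Width
            && other.Y + other.Height <= Y + Height;
    }

    // Inclusive test: rectangles that only share an edge count as intersecting.
    public bool IntersectsWith(Rect other)
    {
        return other.X <= X + Width && X <= other.X + other.Width
            && other.Y <= Y + Height && Y <= other.Y + other.Height;
    }
}

// Hypothetical minimal quadtree node; not the control's implementation.
public class QuadTreeNode<T>
{
    private const float MinSize = 8f; // do not subdivide nodes smaller than this
    private readonly Rect _bounds;
    private readonly List<KeyValuePair<Rect, T>> _items = new List<KeyValuePair<Rect, T>>();
    private QuadTreeNode<T>[] _children; // created on first insert

    public QuadTreeNode(Rect bounds)
    {
        _bounds = bounds;
    }

    public void Insert(Rect area, T item)
    {
        if (_bounds.Width > MinSize && _bounds.Height > MinSize)
        {
            if (_children == null) Subdivide();
            foreach (QuadTreeNode<T> child in _children)
            {
                // Push the item down into the quadrant that fully contains it.
                if (child._bounds.Contains(area))
                {
                    child.Insert(area, item);
                    return;
                }
            }
        }

        // The item straddles a quadrant boundary (or the node is minimal): keep it here.
        _items.Add(new KeyValuePair<Rect, T>(area, item));
    }

    public IEnumerable<T> Query(Rect area)
    {
        foreach (KeyValuePair<Rect, T> pair in _items)
        {
            if (pair.Key.IntersectsWith(area)) yield return pair.Value;
        }

        if (_children == null) yield break;

        foreach (QuadTreeNode<T> child in _children)
        {
            // Skip whole quadrants that cannot contain a match.
            if (!child._bounds.IntersectsWith(area)) continue;
            foreach (T item in child.Query(area)) yield return item;
        }
    }

    private void Subdivide()
    {
        float w = _bounds.Width / 2f, h = _bounds.Height / 2f;
        _children = new[]
        {
            new QuadTreeNode<T>(new Rect(_bounds.X, _bounds.Y, w, h)),
            new QuadTreeNode<T>(new Rect(_bounds.X + w, _bounds.Y, w, h)),
            new QuadTreeNode<T>(new Rect(_bounds.X, _bounds.Y + h, w, h)),
            new QuadTreeNode<T>(new Rect(_bounds.X + w, _bounds.Y + h, w, h))
        };
    }
}
```

In this sketch, a layout would call Query with a candidate rectangle before placing a word and accept the position only if nothing comes back; a click handler would call Query with a zero-sized rectangle at the mouse position.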

Configuring the Word Cloud Control 

There are several things you can vary on this control:

You can change the font type and size.

cloudControl.MinFontSize = 6;
cloudControl.MaxFontSize = 60;
cloudControl.Font = new Font(new FontFamily("Verdana"), 8, FontStyle.Regular); 

Use different colours.

cloudControl.Palette = new Brush[] {Brushes.DarkRed, Brushes.Red, Brushes.LightPink};  

Use a different layout. Currently there are two layouts implemented. You can implement your own by deriving from BaseLayout or by implementing the ILayout interface yourself.

cloudControl.LayoutType = LayoutType.Typewriter; 
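To show what the MinFontSize and MaxFontSize settings above are for, here is a sketch of how a word's weight could be turned into a font size. The linear interpolation is an assumption made for illustration; the control may scale sizes differently, and FontScaleSketch and FontSizeFor are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the control.
public static class FontScaleSketch
{
    // Linearly maps a weight in [minWeight, maxWeight] to a font size
    // in [minFontSize, maxFontSize]. The linear mapping is an assumption.
    public static float FontSizeFor(int weight, int minWeight, int maxWeight,
                                    float minFontSize, float maxFontSize)
    {
        if (maxWeight <= minWeight)
        {
            return minFontSize; // all words weigh the same
        }

        float ratio = (float)(weight - minWeight) / (maxWeight - minWeight);
        ratio = Math.Max(0f, Math.Min(1f, ratio)); // clamp out-of-range weights
        return minFontSize + ratio * (maxFontSize - minFontSize);
    }
}
```

With the values used above (6 and 60), the rarest word would be drawn at size 6 and the most frequent one at size 60 under this mapping.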

The layout logic and the drawing of graphics are strictly separated by the IGraphicEngine interface, so I think it would not be a big deal to port the control to WPF or Silverlight in the future.

Credits 

Thanks to Michael Coyle for the great article A Simple QuadTree Implementation in C#: http://www.codeproject.com/KB/recipes/QuadTree.aspx

Thanks to Jonathan Feinberg, the creator of Wordle, for that beautiful cloud and for hints about the algorithms behind it: http://stackoverflow.com/questions/342687/algorithm-to-implement-something-like-wordle
