In an earlier blog post, we explained how we keep on adding sources and references to our database of generic trademarks.
This week I analysed the quality and independence of the different sources.
Let us talk about whether the different sources we use are independent from each other.
I want to avoid using a source that is just a copy-paste of another source. The best way to analyse this is a correlation check.
The image below shows the correlation check I ran between the thirteen biggest sources we use on this website:
The maximum correlation is 0.6, meaning that no two sources contain exactly the same content. In other words, none of the sources is just a "copy-paste" of another one. So, good news.
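For readers who want to try a similar check, here is a minimal sketch of how such a correlation matrix could be computed. It is not the exact tooling behind the image: the source names, the trademark lists and the helper names `pearson` and `correlation_matrix` are all made up for illustration. It also assumes each source is represented as a 0/1 vector over every trademark that appears in at least one source.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: the trademarks each source lists as generic.
# "A_copy" is deliberately identical to "A" to show what a copy-paste looks like.
sources = {
    "A": {"aspirin", "escalator", "thermos", "zipper"},
    "A_copy": {"aspirin", "escalator", "thermos", "zipper"},
    "B": {"aspirin", "escalator", "kerosene", "linoleum"},
}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treated as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)

def correlation_matrix(sources):
    """Pairwise correlation between the 0/1 membership vectors of all sources."""
    universe = sorted(set().union(*sources.values()))
    vectors = {
        name: [1 if trademark in listed else 0 for trademark in universe]
        for name, listed in sources.items()
    }
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }

matrix = correlation_matrix(sources)
highest_pair = max(matrix, key=matrix.get)
print(highest_pair, round(matrix[highest_pair], 2))
```

In this toy data the copied source shows up as the pair with a correlation of 1, which is the pattern a copy-paste check looks for.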
As the same table shows, "source D" appears to be the most independent source: it is not correlated with the other sources. I am working on increasing the number of references from that gem.
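One way to put a number on "most independent" is to average the absolute correlation of each source with all the others and pick the lowest. The sketch below does that. The correlation values and the names S1 to S4 are invented for illustration and are not the values from the table, and `mean_abs_correlation` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up pairwise correlations, for illustration only (not the real matrix).
pairwise = {
    ("S1", "S2"): 0.6,
    ("S1", "S3"): 0.3,
    ("S1", "S4"): 0.05,
    ("S2", "S3"): 0.4,
    ("S2", "S4"): -0.02,
    ("S3", "S4"): 0.01,
}

def mean_abs_correlation(pairwise):
    """Average absolute correlation of each source with every other source."""
    values = defaultdict(list)
    for (a, b), r in pairwise.items():
        values[a].append(abs(r))
        values[b].append(abs(r))
    return {name: sum(v) / len(v) for name, v in values.items()}

scores = mean_abs_correlation(pairwise)
most_independent = min(scores, key=scores.get)
print(most_independent)
```

Using the absolute value is a design choice: it treats a strongly negative correlation as a form of dependence too, which may or may not be what you want.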
Quality of sources
Using the same correlation matrix, I noticed five instances of negative correlation, e.g. between source A and source B. A negative correlation means that, in most cases, there are more differences between the two sources than similarities. This, of course, does not seem logical.
What's the reason for this negative correlation? Work in progress...
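A possible starting point for that investigation, offered as a sketch rather than an answer, is to count for one pair of sources how many trademarks appear in both, in only one of them, or in neither. For 0/1 data the correlation (the phi coefficient) is negative exactly when "both × neither" is smaller than "only A × only B". The trademark lists and the helper names `contingency` and `phi` below are hypothetical.

```python
from math import sqrt

def contingency(a, b, universe):
    """Counts of trademarks listed by both, only a, only b, or neither source."""
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(universe - a - b)
    return both, only_a, only_b, neither

def phi(both, only_a, only_b, neither):
    """Phi coefficient, i.e. the Pearson correlation of two 0/1 vectors."""
    numerator = both * neither - only_a * only_b
    denominator = sqrt(
        (both + only_a) * (both + only_b) * (neither + only_a) * (neither + only_b)
    )
    return numerator / denominator if denominator else 0.0

# Hypothetical lists for two sources.
source_a = {"aspirin", "escalator", "thermos", "zipper"}
source_b = {"aspirin", "escalator", "kerosene", "linoleum"}

# Universe 1: only trademarks that at least one of the two sources lists.
small_universe = source_a | source_b
# Universe 2: the same, plus ten placeholder trademarks that neither source lists.
large_universe = small_universe | {f"other_{i}" for i in range(10)}

r_small = phi(*contingency(source_a, source_b, small_universe))
r_large = phi(*contingency(source_a, source_b, large_universe))
print(round(r_small, 2), round(r_large, 2))
```

In this toy example the same two lists give a negative correlation in the small universe and a positive one in the large universe, so the set of trademarks the matrix is computed over is one thing worth checking. Whether that explains the five negative values in our matrix is still open.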