7. Run the batch.bat again. (It's possible to output dump.txt and dump2.txt at the grep step [which could be included in the batch.bat] and change the batch to output different files)
(all of them with one batch)
...
...
@@ -128,7 +128,9 @@ EXIT
9. Compare both unique.txt files manually. (That was my approach; automation with reproducible results would be welcome.)
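The manual comparison in step 9 could be sketched in Python roughly as follows. This is only an illustration under assumptions: each file is taken to hold one name per line, and the file names passed on the command line (for example `unique_a.txt` and `unique_b.txt`) are hypothetical, since the steps above only speak of "both unique.txt files".

```python
import sys
from pathlib import Path


def read_names(path):
    """Return the set of non-empty, stripped lines in a text file."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_name_files(path_a, path_b):
    """Return (only_in_a, only_in_b, in_both) as sorted lists."""
    a = read_names(path_a)
    b = read_names(path_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Hypothetical usage: python compare.py unique_a.txt unique_b.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b, both = compare_name_files(sys.argv[1], sys.argv[2])
    print("only in first file:", len(only_a))
    print("only in second file:", len(only_b))
    print("in both files:")
    for name in both:
        print("  " + name)
```

Because the result depends only on the two input files, running it twice on the same files gives the same lists, which is the reproducibility the manual comparison lacks.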
----
Old version:
This document describes what I did to obtain two lists with bridge names in order to compare them.
Process the files // Since manually copying each line to a new file is slow, I used grep for Windows. I had it installed already, but had rarely used it.
// Linux is wonderful here, Windows lacks this function.
...
...
@@ -186,7 +190,8 @@ where not up 24 hours.
// I renamed the file to "relays unsorted uncleaned.txt"
Sort the lines // I considered it useful to sort the lines. Windows isn't able to sort the content of files.
// Since I used Notepad++ for looking into the files I wanted to use it
...
...
@@ -214,7 +219,8 @@ and vote digest
// by default it should not be hard to reproduce this.
Try trimming list // To compare nicknames at all, it should be much easier
// if each nickname appears only once.
...
...
@@ -230,11 +236,12 @@ imported because spreadsheet programs are limited. I therefor had to split the
list first.
// I used LibreOffice 3.5, but Microsoft Excel has a limited number
// of lines as well.
Split the list // Windows is able to split files, but I don't know how well.
// I used GSplit, because I knew it could split after x occurrences of a pattern. This includes special characters
...
...
@@ -264,7 +271,7 @@ Keep the wanted // I considered nickname and fingerprint to be va
// the fingerprint makes identification easier.
16. I loaded each part in a spreadsheet application. //Calc from LibreOffice 3.5
17. I used spaces as separator and made sure every column is treated as text
...
...
@@ -280,7 +287,8 @@ Keep the wanted // I considered nickname and fingerprint to be va
// separated by space, no empty lines in between.
Trim the list // Now both files contained the nick and the fingerprint, but still multiple times
// I wanted to remove the duplicates.
...
...
@@ -324,7 +332,8 @@ which now contains 9469 lines. //strange there are not so many relays
// I decided to go on, even though it was strange.
Unnamed relays // Before I started I wondered if Unnamed relays would tell me anything.
// I looked at "Unnamed" and counted them; whole word, match case
...
...
@@ -354,7 +363,8 @@ which now contains 9469 lines. //strange there are not so many relays
// Should have names changed that often?
Bridges
26. I downloaded the bridge data
...
...
@@ -398,15 +408,18 @@ which now contains 9469 lines. //strange there are not so many relays
35. Copied lines I found from "bridges sorted.txt" and "relay sorted cleaned.txt" to "findings.txt"
The files I really worked with are "unique relay names only sorted.csv", "bridges names only unique.csv", "bridges sorted.txt" and "relay sorted cleaned.txt".
I did not know if the other files I created along the way would be useful, so I saved them. In the end I did not use them.
My approach, as I planned it, was to look at the bridge names and compare them to the relay names, mainly because there are many more relays.
Would and should my approach be different if there were 50000 bridges?
I'm sure some will call me (something) for not taking a shortcut. I'm sure I could remove or skip a few steps if I knew the right tools. Also, I'm on Windows.
After doing all this, I was quite sure that it can be done with a script. A more experienced user would be better at this.
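As a rough idea of what such a script might look like, here is a minimal Python sketch of the trimming steps described above: removing duplicate nickname/fingerprint pairs, counting "Unnamed" entries (whole word, match case), and listing nicknames that occur among both relays and bridges. It assumes input lines of the form `nickname fingerprint` separated by whitespace, as produced by the spreadsheet step. The function names are made up for illustration, and leaving "Unnamed" out of the shared list is a choice of this sketch, not something the process above prescribes.

```python
def parse_records(lines):
    """Parse 'nickname fingerprint' lines into a set of unique pairs.

    Storing the pairs in a set removes duplicate lines; lines with
    fewer than two fields (for example empty lines) are skipped.
    """
    records = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            records.add((parts[0], parts[1]))
    return records


def count_unnamed(records):
    """Count records whose nickname is exactly 'Unnamed' (case-sensitive)."""
    return sum(1 for nick, _ in records if nick == "Unnamed")


def shared_nicknames(relays, bridges):
    """Return the sorted nicknames found in both sets, ignoring 'Unnamed'."""
    relay_nicks = {nick for nick, _ in relays}
    bridge_nicks = {nick for nick, _ in bridges}
    return sorted((relay_nicks & bridge_nicks) - {"Unnamed"})
```

Since sets are used for the lookups, the comparison stays fast as the lists grow, so a much larger number of bridges should not require a different approach in this sketch.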