I ran into a problem with some unit test code where Visual Studio Code (VSCode) was not displaying certain characters properly. I'd opened up a unit test project and suddenly a bunch of tests were failing because certain apostrophe characters and dash characters were -- to use a technical term -- "messed up".
Displaying as Weird Question Mark Character
The specific character codes were the 8217 apostrophe (the curly right single quote, U+2019) and the 8212 dash (the em dash, U+2014). Instead of displaying as the characters they were supposed to be, they were displaying as a weird little question mark character that you can see in the image below.
This was all super strange because this is code that had been working properly for a long time. I started digging in and decided to look at what the character codes were for the messed up chars. So I put in a little code to view the contents of the string as a char[] array and created a breakpoint.
var array = expectedVoiceOver.ToCharArray();
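When I looked at the value in the watch window, that apostrophe character was showing as char 65533. What was weird was that the dash character was also showing up as char 65533. That value is U+FFFD, the Unicode replacement character, which a decoder substitutes for bytes it can't interpret.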
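Here's a minimal sketch of how two different characters could both turn into 65533. It assumes the file on disk was really stored as Windows-1252, where the 8217 apostrophe is the single byte 0x92 and the 8212 dash is the single byte 0x97, and that something then read those bytes as UTF-8. The class and method names are made up for illustration; this isn't code from my test project.

```csharp
using System;
using System.Text;

public static class ReplacementCharDemo
{
    // Decodes raw bytes as UTF-8 the way a lenient reader would:
    // any byte that isn't valid UTF-8 is replaced with U+FFFD (65533).
    public static int[] DecodeAsUtf8(byte[] bytes)
    {
        string text = Encoding.UTF8.GetString(bytes);
        char[] chars = text.ToCharArray();

        // Convert each char to its numeric code, like the watch window shows.
        int[] codes = new int[chars.Length];
        for (int i = 0; i < chars.Length; i++)
        {
            codes[i] = chars[i];
        }
        return codes;
    }
}
```

Calling ReplacementCharDemo.DecodeAsUtf8(new byte[] { 0x92 }) and ReplacementCharDemo.DecodeAsUtf8(new byte[] { 0x97 }) should each give back a single 65533. Neither byte is valid UTF-8 on its own, so the original character is lost and both collapse to the same replacement character.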
This was screaming "encoding problems" to me but VSCode was saying that the file was encoded as UTF-8.
Since I work across a couple of different machines, sometimes on Windows and sometimes on a Mac, I started wondering if this was just a problem on the Mac. Nope. Same behavior in VSCode on Windows, too.
But it worked just fine in Visual Studio 2022.
All the characters are displaying just fine in Visual Studio. WHY???
Answer: It's Visual Studio's Problem
I'm not exactly sure if it is actually a problem in Visual Studio or not but -- well -- it seems to have something to do with the encoding that Visual Studio is choosing. It was using something called "Western European (Windows)" on my Windows machine, which is the Windows-1252 code page. I only found that out after I fixed the problem though.
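If you'd rather check a file than guess, here's a rough sketch of two checks you could run against its raw bytes: does it start with the UTF-8 signature (BOM), and is it actually valid UTF-8? The EncodingCheck class and its method names are hypothetical. A file that fails the second check is a good hint that it was saved in some other encoding, though this doesn't tell you which one.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // True if the bytes start with the UTF-8 signature (BOM): EF BB BF.
    public static bool HasUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // True if the whole byte array decodes cleanly as UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // quietly substituting U+FFFD for bad bytes.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

To use it on a real file, you could pass in File.ReadAllBytes(path) (from System.IO), where path is whatever file is giving you trouble. The bytes 0x69 0x74 0x92 0x73 ("it's" with a Windows-1252 apostrophe) should fail IsValidUtf8, while 0x69 0x74 0xE2 0x80 0x99 0x73 (the same text in UTF-8) should pass.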
Fix: Tell Visual Studio to 'Save with Encoding'
The fix is to resave the file from Visual Studio 2022 using the proper encoding.
The way I did this was to go to the File menu and choose Save As... for the file that's got the problem.
Once you do that, you should see the file Save As dialog. This next part is a little tricky. The Save button has a little drop-down menu attached to it. Click the drop-down menu and choose Save with Encoding... from the menu.
You'll get a dialog called Advanced Save Options. Change the Encoding value to be Unicode (UTF-8 with signature) - Codepage 65001. Then click the OK button.
That will save the file with the correct encoding.
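If you have a lot of files to fix, the same re-save could probably be scripted. This is only a sketch, and it assumes two things: the files really are Windows-1252, and you're on .NET 5 or later, where CodePagesEncodingProvider ships with the framework (older targets may need the System.Text.Encoding.CodePages package). The ResaveAsUtf8 class and its methods are hypothetical names, not anything Visual Studio provides.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResaveAsUtf8
{
    // Re-encodes bytes that are ASSUMED to be Windows-1252 into
    // UTF-8 with the signature (BOM) in front.
    public static byte[] ConvertWindows1252ToUtf8WithBom(byte[] windows1252Bytes)
    {
        // Code page encodings like 1252 aren't registered by default
        // on .NET Core / .NET 5+, so register the provider first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);
        string text = windows1252.GetString(windows1252Bytes);

        var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
        byte[] preamble = utf8WithBom.GetPreamble();
        byte[] body = utf8WithBom.GetBytes(text);

        // Stitch the BOM and the encoded text together.
        byte[] result = new byte[preamble.Length + body.Length];
        preamble.CopyTo(result, 0);
        body.CopyTo(result, preamble.Length);
        return result;
    }

    // Hypothetical helper: rewrites one file in place.
    public static void ResaveFile(string path)
    {
        byte[] original = File.ReadAllBytes(path);
        File.WriteAllBytes(path, ConvertWindows1252ToUtf8WithBom(original));
    }
}
```

Be careful to run something like this only on files that are not already UTF-8. Running it on a file that's already UTF-8 would treat each UTF-8 byte as a separate Windows-1252 character and garble the text in a whole new way.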
PRO TIP: MAKE SURE YOU CHECK IN YOUR CHANGES TO VERSION CONTROL!!! This is a pretty significant change to that file but I almost forgot to check it in to Git.
Re-open the file in VSCode
Now that the file has been saved with the correct encoding, you can re-open the file in Visual Studio Code. Now when you look at that string with problems, the problems are all gone.
Summary
In summary, something weird was happening in Visual Studio 2022. I'm not exactly sure where the problem was being introduced or how, but the encoding for certain files wasn't correct. VSCode would look at the file, think it was UTF-8, and try to display it. The dotnet test runner and/or xUnit were having trouble, too. So somewhere along the way it done got totally borkened.
The fix is just to resave with encoding in VS2022.
I hope this helps.
-Ben