RUBY Encoding format
RUBY Encoding format
Hey guys, I'm trying to parse some txt files for emails and I get an error within Ruby:
Encoding: (in method 'event')::ConverterNotFoundError: code converter not found (ASCII-8BIT to UTF-8)
Now, I can parse emails one by one, which is fine, but as soon as I try to do a whole directory it gives this error. I am going through each one and adding them to an array. Take a look:
- Code: Select all
@db = []
Dir.foreach('C:\S') do |item|
  next if item == '.' or item == '..'
  newfile = 'C:/S/' << item
  f = File.open(newfile)
  content = f.read
  r = Regexp.new(/\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/)
  emails = content.scan(r).uniq
  addy = emails[-1]
  @db << addy
  output @db
end
So as I said, it works fine until I try to parse a lot of files. It gets through about 15 or so before stating the error. I have just exported my emails from Thunderbird, so they are .eml files.
It's weird because it goes through like 15 before spitting out the error.
I have checked online and tried a few solutions to no avail. Does anyone here have experience with the Encoding functions? Why give an error after doing a few and then stop?
- Drnkhobo
- Posts: 312
- Joined: Sun Aug 19, 2012 7:13 pm
- Location: ZA
Re: RUBY Encoding format
You said you tried something from the web without specifying what exactly, so excuse me if this is not helpful.
This tells Ruby to interpret the data as UTF-8 without touching the byte sequence of the file:
- Code: Select all
content = f.read.force_encoding("utf-8")
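A minimal sketch of what "without touching the byte sequence" means, assuming a stock Ruby rather than the FlowStone build. The sample string is made up for illustration: it holds the single byte 0xE9 ("é" in ISO-8859-1), which is not valid UTF-8 on its own. `force_encoding` only changes the label, so `valid_encoding?` is a quick way to see whether the new label actually fits the bytes.

```ruby
# Made-up sample: "caf" followed by the byte 0xE9, labelled as binary.
raw = "caf\xE9".force_encoding("ASCII-8BIT")

# Relabel a copy as UTF-8. No bytes are converted or removed.
labelled = raw.dup.force_encoding("UTF-8")

puts labelled.bytes.to_a == raw.bytes.to_a  # the byte sequence is unchanged
puts labelled.encoding                      # the label is now UTF-8
puts labelled.valid_encoding?               # whether the bytes are valid UTF-8
```

If `valid_encoding?` comes back false for a file, relabelling alone has not made the content usable as UTF-8 text.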
Another possibility is that the byte sequence is read in wrong. This does a conversion, then again tells Ruby to handle the result as UTF-8:
- Code: Select all
content = f.read.encode("iso-8859-1").force_encoding("utf-8")
Lastly, a double conversion might help:
- Code: Select all
content = f.read.encode("iso-8859-1").encode("utf-8")
ASCII-8BIT shows that the data is somehow being interpreted as binary instead of as a string, but I don't know why.
"There lies the dog buried" (German saying translated literally)
- tulamide
- Posts: 2714
- Joined: Sat Jun 21, 2014 2:48 pm
- Location: Germany
Re: RUBY Encoding format
Thanks Tulamide
I will give it a go now now. I did try to force encoding, but your solution looks like it might just work for me (how do I know?)
It's weird that if I put 10 emails in my folder it does its job fine without complaining. As soon as it's more than that, the error comes up. Strange. . .
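One possible explanation for "fine with a few files, fails with more" is that only some of the files contain bytes outside 7-bit ASCII, and the error appears when the first such file is reached. This is a guess, not something confirmed in the thread. A sketch for finding those files, assuming `File.binread` and `String#ascii_only?` are available in the Ruby build in use; `non_ascii_files` is a hypothetical helper name:

```ruby
# Hypothetical helper: list the .eml files in `dir` that contain at least
# one byte outside the 7-bit ASCII range.
def non_ascii_files(dir)
  Dir.glob(File.join(dir, "*.eml")).sort.reject do |path|
    # binread returns the raw bytes as an ASCII-8BIT string.
    File.binread(path).ascii_only?
  end
end

# Example call, using the folder from the first post:
# puts non_ascii_files("C:/S")
```

Any file it reports is a candidate for the one that triggers the conversion error.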
- Drnkhobo
- Posts: 312
- Joined: Sun Aug 19, 2012 7:13 pm
- Location: ZA
Re: RUBY Encoding format
OK, I thought about the weird behavior. Don't laugh: are you sure the folder contains nothing other than human-readable text files? Maybe there's a hidden system file among them, or some other binary?
I ask because your code does not protect against reading those in.
To be absolutely sure, you could extend the if statement with
- Code: Select all
next if item == '.' or item == '..' or item[-3, 3].casecmp("eml") != 0
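A small variation on the check above, as a sketch only. `item[-3, 3]` returns nil for names shorter than three characters, so the `casecmp` call would raise on a file called, say, `a`, and a name like `xeml` with no dot would slip through. Comparing the extension reported by `File.extname` avoids both; `eml_file?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true only when the name ends in ".eml" (any case).
# File.extname returns "" for ".", ".." and names without an extension.
def eml_file?(name)
  File.extname(name).casecmp(".eml") == 0
end

# Inside the loop from the first post this could replace the guard:
#   next unless eml_file?(item)
```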
- tulamide
- Posts: 2714
- Joined: Sat Jun 21, 2014 2:48 pm
- Location: Germany
Re: RUBY Encoding format
tulamide wrote: ASCII-8bit shows that somehow the data is interpreted as binary instead of string. But I don't know why.
This might give you a clue...
- Code: Select all
Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII"]
If I do the same thing on my standard Windows Ruby 1.9.3 install, I get 99 different available encodings!
So, the FS version of Ruby has had huge amounts of the Encoding class ripped out. I can only guess that this is partly to enforce compatibility with the "green" strings, which are always ASCII, and maybe just to reduce the Ruby interpreter size for embedding into exports. It is very annoying - I had hoped that Ruby could be used to allow proper Unicode support, but it seems not!
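A sketch of a way to test for a converter up front instead of hitting the error halfway through a directory. It assumes `Encoding::Converter` is present, which is true of a stock Ruby 1.9+ but has not been checked in the FS build; `converter_available?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true if Ruby can find a conversion path between
# the two encodings, false if it raises ConverterNotFoundError.
def converter_available?(from, to)
  Encoding::Converter.search_convpath(from, to)
  true
rescue Encoding::ConverterNotFoundError
  false
end

puts converter_available?("ISO-8859-1", "UTF-8")
```

If this prints false for the pair you need, `encode` between those two encodings cannot work in that interpreter, and only relabelling with `force_encoding` is left.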
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: RUBY Encoding format
trogluddite wrote: This might give you a clue...
It does
Tbh, I didn't even think of checking it. I just assumed a fully functional Ruby.
@Drnkhobo
Ignore the last two "content" examples. They won't work under these circumstances. And let me know if one of the other tips helped you.
- tulamide
- Posts: 2714
- Joined: Sat Jun 21, 2014 2:48 pm
- Location: Germany
Re: RUBY Encoding format
I spoke to Malc about the problem with international characters (edit boxes in edit state, and the difference between the string prim and the text prim) because I can't display some things properly in my language, and he said he will check it. Since FS is expanding into less audio-focused areas, I think this issue may land on the priority list.
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: RUBY Encoding format
Would be a valuable thing for users of any language - it would open the doors for working with many kinds of external documents, and most of the Windows API assumes multi-byte character encodings as standard. For example, I have had to use some really ugly hacks to make the DLLs I'm working on call Windows functions successfully when they have string arguments.
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: RUBY Encoding format
Don't forget attachments: .eml files can contain binary attachments.
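One way to keep attachments out of the scan, sketched under the assumption that only addresses in the message headers matter. In an .eml file the headers end at the first blank line, so the text can be cut there before applying the regular expression from the first post. `header_emails` is a hypothetical helper name.

```ruby
# Same pattern as in the first post, kept as a constant.
EMAIL_RE = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/

# Hypothetical helper: scan only the header block of a raw message.
# Everything after the first blank line (body and attachments) is ignored.
def header_emails(raw)
  headers = raw.split(/\r?\n\r?\n/, 2).first.to_s
  headers.scan(EMAIL_RE).uniq
end

# Example call with raw bytes read from a file:
# p header_emails(File.binread("C:/S/some_message.eml"))
```

The file name in the example call is made up. Addresses that appear only in the body are lost with this approach, so it suits sender and recipient lookups rather than a full-text search.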
"Two things are infinite: the universe and human stupidity; and I'm not sure about the the universe."
Albert Einstein
- JB_AU
- Posts: 171
- Joined: Tue May 21, 2013 11:01 pm
Re: RUBY Encoding format
I'm not 100% sure what sequence of events you need.
It's possible to set the preference file to view mail as HTML or plain text, which is one less conversion step.
Retrieve mail, parse tags, dump it, do stuff.
For the long-range drone (50 km away), I parse SMS as plain text:
parse the sender tags for a specific sender ID,
then set a predefined condition by subject tag.
I'm not using Ruby that I know of!
Hope this helps?
- JB_AU
- Posts: 171
- Joined: Tue May 21, 2013 11:01 pm