RUBY Encoding format

For general discussion related to FlowStone


Postby Drnkhobo » Mon Aug 11, 2014 9:46 am

Hey guys, I'm trying to parse some text files for email addresses and I get an error within Ruby:

Encoding: (in method 'event')::ConverterNotFoundError: code converter not found (ASCII-8BIT to UTF-8)

Now, I can parse emails one by one, which is fine, but as soon as I try to do a whole directory it gives this error. I am going through each file and adding the result to an array. Take a look:

Code: Select all
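@db = []

Dir.foreach('C:/S') do |item|
  # Skip the current- and parent-directory entries
  next if item == '.' or item == '..'
  newfile = 'C:/S/' + item

  # Read the whole file, then release the file handle
  f = File.open(newfile)
  content = f.read
  f.close

  # Collect the unique email addresses found in this file
  r = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/
  emails = content.scan(r).uniq

  # Keep only the last address found (nil if the file has none)
  addy = emails[-1]
  @db << addy
  output @db
end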


So as I said, it works fine until I try to parse a lot of files. Weirdly, it gets through about 15 or so before raising the error. I have just exported my emails from Thunderbird, so they are .eml files.

I have checked online and tried a few solutions to no avail. Does anyone here have experience with the encoding functions? Why would it give an error only after processing a few files?
Drnkhobo
 
Posts: 312
Joined: Sun Aug 19, 2012 7:13 pm
Location: ZA

Re: RUBY Encoding format

Postby tulamide » Mon Aug 11, 2014 2:42 pm

You said you tried something from the web without specifying what exactly, so excuse me if this is not helpful.

This tells Ruby to interpret the content as UTF-8 without touching the byte sequence of the file:
Code: Select all
content = f.read.force_encoding("utf-8")


Another possibility is that the byte sequence is read in wrongly. This does a conversion, but then again tells Ruby to handle the result as UTF-8:
Code: Select all
content = f.read.encode("iso-8859-1").force_encoding("utf-8")


Lastly, a double conversion might help:
Code: Select all
content = f.read.encode("iso-8859-1").encode("utf-8")



ASCII-8BIT shows that somehow the data is interpreted as binary instead of string. But I don't know why.
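
To make the difference between the calls above concrete, here is a minimal sketch in plain Ruby (not FlowStone-specific; the sample string and variable names are made up for illustration). `force_encoding` only changes the label on the bytes, while `encode` has to transcode and raises an `EncodingError` subclass when it cannot. Which subclass you see may differ between Ruby builds.

```ruby
# Sample bytes tagged as binary (ASCII-8BIT), as they might come from a file read.
# The two bytes \xC3\xA9 are the UTF-8 encoding of "e with acute accent".
raw = "caf\xC3\xA9 user@example.com".dup.force_encoding("ASCII-8BIT")

# force_encoding relabels a copy; the bytes themselves are untouched.
relabelled = raw.dup.force_encoding("UTF-8")
same_bytes = (relabelled.bytes.to_a == raw.bytes.to_a)
looks_valid = relabelled.valid_encoding?

# encode has to transcode, which needs a usable conversion from the
# source encoding. From ASCII-8BIT with high bytes there is none.
transcode_error = nil
begin
  raw.encode("UTF-8")
rescue EncodingError => e
  transcode_error = e.class
end

puts "same bytes: #{same_bytes}, valid UTF-8: #{looks_valid}"
puts "encode raised: #{transcode_error}"
```

Note that `valid_encoding?` is worth checking after relabelling, since `force_encoding` itself never complains about bytes that do not fit the new label.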
"There lies the dog buried" (German saying translated literally)
tulamide
 
Posts: 2714
Joined: Sat Jun 21, 2014 2:48 pm
Location: Germany

Re: RUBY Encoding format

Postby Drnkhobo » Mon Aug 11, 2014 2:58 pm

Thanks Tulamide :D

I will give it a go now now. I did try to force the encoding, but your solution looks like it might just work for me (how would I know :roll: :lol: )

It's weird that if I put 10 emails in my folder it does its job fine without complaining. As soon as it's more than that, the error pops up. Strange...
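
One way to narrow down which files trigger the error is to list the ones that contain bytes outside 7-bit ASCII, since those are the likeliest candidates for an encoding problem. This is only a sketch in plain Ruby; the helper name `files_with_high_bytes` is made up for illustration and is not part of FlowStone.

```ruby
# Returns the names of regular files in dir whose contents include
# at least one byte outside the 7-bit ASCII range.
def files_with_high_bytes(dir)
  Dir.entries(dir).sort.select do |name|
    path = File.join(dir, name)
    next false unless File.file?(path)   # skips '.', '..' and subfolders

    # "rb" reads the raw bytes without any newline or encoding translation
    data = File.open(path, "rb") { |f| f.read }
    !data.ascii_only?
  end
end
```

Calling `files_with_high_bytes('C:/S')` on the folder from the first post would then show whether the failure lines up with particular files rather than with the number of files.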

Re: RUBY Encoding format

Postby tulamide » Mon Aug 11, 2014 4:57 pm

OK, I thought about the weird behavior. Don't laugh: are you sure the folder contains nothing other than human-readable text files? Maybe there's a hidden system file or another binary among them.
I ask because your code does not protect against reading those in.

To be absolutely sure, you could extend the "next if" statement like this:

Code: Select all
next if item == '.' or item == '..' or File.extname(item).casecmp(".eml") != 0
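
As a variation on the same idea, the filter can be pulled into a small helper that also skips subfolders. This is a sketch only; `eml_files` is a made-up name, and it assumes that a case-insensitive ".eml" extension is enough to identify the exported mails.

```ruby
# Returns the names of regular files in dir that have an .eml extension,
# compared case-insensitively. Directories and other entries are skipped.
def eml_files(dir)
  Dir.entries(dir).sort.select do |name|
    File.file?(File.join(dir, name)) &&
      File.extname(name).casecmp(".eml") == 0
  end
end
```

The loop from the first post could then iterate over `eml_files('C:/S')` instead of every directory entry.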

Re: RUBY Encoding format

Postby trogluddite » Mon Aug 11, 2014 8:05 pm

tulamide wrote: ASCII-8BIT shows that somehow the data is interpreted as binary instead of string. But I don't know why.

This might give you a clue...
Code: Select all
Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII"]

If I do the same thing on my standard Windows Ruby 1.9.3 install, I get 99 different available encodings!
So the FS version of Ruby has had huge amounts of the Encoding class ripped out. I can only guess that this is partly to enforce compatibility with the "green" strings, which are always ASCII, and maybe partly to reduce the Ruby interpreter size for embedding into exports. It is very annoying. I had hoped that Ruby could be used to allow proper Unicode support, but it seems not!
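
With so few encodings available, one option is to avoid transcoding altogether. The regex from the first post only contains ASCII characters, so it can be run directly against the raw bytes, and the matches are pure ASCII and can safely be relabelled. A minimal sketch, assuming that approach fits your case (`EMAIL_RE` and `extract_emails` are made-up names, and I have not run this inside FlowStone):

```ruby
# Same pattern as in the first post; it contains ASCII characters only,
# so it can be matched against a binary (ASCII-8BIT) string.
EMAIL_RE = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/

# Scans raw bytes for email addresses without converting the input.
def extract_emails(raw_bytes)
  data = raw_bytes.dup.force_encoding("ASCII-8BIT")
  # Each match consists of ASCII characters only, so labelling it
  # as UTF-8 changes no bytes and always gives a valid string.
  data.scan(EMAIL_RE).uniq.map { |m| m.force_encoding("UTF-8") }
end
```

For example, `extract_emails(File.open(newfile, "rb") { |f| f.read })` would replace the read and scan steps of the original loop.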
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
trogluddite
 
Posts: 1730
Joined: Fri Oct 22, 2010 12:46 am
Location: Yorkshire, UK

Re: RUBY Encoding format

Postby tulamide » Tue Aug 12, 2014 11:07 am

trogluddite wrote:This might give you a clue...

It does :mrgreen:
Tbh, I didn't even think of checking it. I just assumed a fully functional Ruby. :cry:

@Drnkhobo
Ignore the last two "content" examples; they won't work under these circumstances. And let me know if one of the other tips helped you.

Re: RUBY Encoding format

Postby tester » Tue Aug 12, 2014 2:31 pm

I spoke to Malc about the problem with international characters (edit boxes in edit state, and the difference between the string prim and the text prim) because I can't display some things properly in my language, and he said he will check it. Since FS is expanding into less audio-focused areas, I think this issue may land on the priority list.
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
tester
 
Posts: 1786
Joined: Wed Jan 18, 2012 10:52 pm
Location: Poland, internet

Re: RUBY Encoding format

Postby trogluddite » Tue Aug 12, 2014 10:47 pm

It would be a valuable thing for users of any language. It would open the doors for working with many kinds of external documents, and most of the Windows API assumes multi-byte character encodings as standard. For example, I have had to use some really ugly hacks to make the DLLs I'm working on call Windows functions successfully when they have string arguments.

Re: RUBY Encoding format

Postby JB_AU » Mon Aug 18, 2014 9:48 am

Don't forget attachments: .eml files can contain binary attachments.
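
If only the addresses in the mail headers matter, one way to stay clear of attachment data is to scan just the header section, which in a mail message ends at the first empty line. A sketch under that assumption (`header_section` is a made-up helper name; it only returns the top-level headers, not headers nested inside multipart bodies):

```ruby
# Returns everything before the first empty line of a raw mail message,
# i.e. the top-level header block. Works on raw bytes, no transcoding.
def header_section(raw_message)
  data = raw_message.dup.force_encoding("ASCII-8BIT")
  # Limit of 2 splits once, at the first blank line (CRLF or LF endings)
  data.split(/\r?\n\r?\n/, 2).first.to_s
end
```

Scanning `header_section(content)` instead of `content` would ignore the body and any attachments entirely.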
"Two things are infinite: the universe and human stupidity; and I'm not sure about the the universe."

Albert Einstein
JB_AU
 
Posts: 171
Joined: Tue May 21, 2013 11:01 pm

Re: RUBY Encoding format

Postby JB_AU » Mon Aug 18, 2014 10:08 am

I'm not 100% sure what sequence of events you need.

So it's possible to set the preference file to view mail as HTML or plain text, which saves one conversion step.
Retrieve the mail, parse the tags, dump it, do stuff.

For the long-range drone (50 km away), I parse SMS as plain text.
Parse the sender tags for a specific sender ID.
Then set a predefined condition by subject tag.

I'm not using Ruby that I know of!

Hope this helps?
